Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8302590: Add String.indexOf(int ch, int fromIndex, int toIndex) #12600

Closed
wants to merge 8 commits into from

Conversation

rgiulietti
Copy link
Contributor

@rgiulietti rgiulietti commented Feb 16, 2023

Add an indexOf() variant allowing to specify both a lower and an upper bound on the search.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8302680 to be approved

Issues

  • JDK-8302590: Add String.indexOf(int ch, int fromIndex, int toIndex)
  • JDK-8302680: Add String.indexOf(int ch, int fromIndex, int toIndex) (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12600/head:pull/12600
$ git checkout pull/12600

Update a local copy of the PR:
$ git checkout pull/12600
$ git pull https://git.openjdk.org/jdk pull/12600/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12600

View PR using the GUI difftool:
$ git pr show -t 12600

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12600.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 16, 2023

👋 Welcome back rgiulietti! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 16, 2023
@openjdk
Copy link

openjdk bot commented Feb 16, 2023

@rgiulietti The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added core-libs core-libs-dev@openjdk.org csr Pull request needs approved CSR before integration labels Feb 16, 2023
@mlbridge
Copy link

mlbridge bot commented Feb 16, 2023

@rgiulietti
Copy link
Contributor Author

The underlying functionality is already present in non-publicly accessible methods. These are intrinsified, leveraging SIMD instructions where available.

This PR proposed to add a public API point to the underlying methods.

@ExE-Boss
Copy link

Could we also get a similar overload for String::lastIndexOf?

@rgiulietti
Copy link
Contributor Author

@ExE-Boss See the second half here for an explanation of why this is (currently) not part of this PR.

@rgiulietti
Copy link
Contributor Author

May I ask for comments or for a review?

@RogerRiggs
Copy link
Contributor

Is there any performance impact? Are there relevant JHM tests? Or should be added.
Also to verify that the intrinsics are still valid though the parameter names have changed.

@rgiulietti
Copy link
Contributor Author

@RogerRiggs Which performance impacts are you concerned with? Perhaps the additional invocation detour in indexOf(int ch, int fromIndex)?

AFAICT, there's nothing in vmIntrinsic.[hpp|cpp] that would relate to the Java parameter names. Only the method name and the (encoded) parameter types appear there. But of course, it would cost nothing to change them back if I'm wrong.

* {@code toIndex}, then {@code -1} is returned.
*
* <p>There are no restrictions on the value of {@code fromIndex} and
* {@code toIndex}. Negative values have the same effect as it they were zero.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"as it" -> "as if"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JMH does not indicate any performance regression.

W.r.t. renaming Java parameters in @IntrinsicCandidate methods, I crosschecked with a runtime compiler guy. He confirms that changing names in the Java implementations is irrelevant for the intrinsified code. There might be references to the names in native comments, but this is not the case for indexOf. However, to be on the safe side I reverted those renames back to the original.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming

@RogerRiggs
Copy link
Contributor

There should be no or little impact from the extra level of calls but its worth confirming.

* as if they were equal to the length of this string.
*
* <p>As consequence of these rules, if {@code fromIndex} is greater than
* or equal to {@code toIndex}, then {@code -1} is returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other examples in String where the equivalent of fromIndex > toIndex doesn't throw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply extrapolated the behavior from `indexOf(int ch,int fromIndex), which has a similar note:

     * There is no restriction on the value of {@code fromIndex}. If it
     * is negative, it has the same effect as if it were zero: this entire
     * string may be searched. If it is greater than the length of this
     * string, it has the same effect as if it were equal to the length of
     * this string: {@code -1} is returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the behavior of the underlying implementation as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would expect string.substring(fromIndex, toIndex).indexOf(ch) to behave isomorphically to string.indexOf(ch, fromIndex, toIndex)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you would also expect string.substring(fromIndex).indexOf(ch) to behave isomorphically to string.indexOf(ch, fromIndex) in current releases, right?
It does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlanBateman I can propose variants of forward-searching some-prefixIndexOf() which are consistent with substring() in their handling of the indices, but I'd then rather prefer to file another JBS issue and prepare another PR.

The proposed 3 parameters variant of indexOf() is the real "primitive" of the family. The other ones are mere shortcuts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parking this PR while exploration is done into other options is okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlanBateman Both from an @apiNote and from an implementation perspective, the checked some-prefixIndexOf()variants would better make use of the public 3 params indexOf() API point proposed here, rather than relying on StringLatin1 and StringUTF16 internals.

Thus, parking this PR would mean that the other PR could not refer to the API point from here until this one is integrated.

Perhaps I'm missing something.
Is it perhaps better to add the checked variants directly in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion is to write down some alternatives to the 3-arg indexOf method and work through some examples to see if they are less error prone. It might be that you decide to move forward with the current proposal or it might be that you decide a different method would help avoid some common mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An obvious alternative that comes to mind is an additional checkedIndexOf(int ch, int fromIndex, int toIndex) that behaves like substring() in checking the indices, and like the 3 params indexOf() when not throwing.

Depending on the invocation context, this might help in some cases, in particular when the arguments have not been validated before: a programmer can then rely on the "battery-included" check.

*/
public int indexOf(int ch, int fromIndex, int toIndex) {
return isLatin1() ? StringLatin1.indexOf(value, ch, fromIndex, toIndex)
: StringUTF16.indexOf(value, ch, fromIndex, toIndex);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you've add @since 21 before integrating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reluctant to add it, as it is not yet known when the PR will be integrated. I'll add it in the next commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the CSR is approved is a good trigger for adding @SInCE. But its fine to be optimistic in the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

@rgiulietti
Copy link
Contributor Author

Added a checkedIndexOf(int ch, int fromIndex, int toIndex) variant that throws if
0 <= fromIndex <= toIndex <= this.length()
is not met.
Otherwise it behaves like indexOf(int ch, int fromIndex, int toIndex).

@rgiulietti
Copy link
Contributor Author

CSR updated accordingly.

@stuart-marks
Copy link
Member

In concept, having APIs that search a string subrange is a fine idea. Unfortunately the exception handling policy is an issue. We need to think through this carefully.

Currently, almost everything that takes some kind of String index or subrange will throw IndexOutOfBoundsException if the arguments are ill-specified in some fashion. There are a few notable outliers though:

indexOf(int ch, int fromIndex)
indexOf(String str, int fromIndex)
lastIndexOf(int ch, int fromIndex)
lastIndexOf(String str, int fromIndex)
regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)
regionMatches(int toffset, String other, int ooffset, int len)

They don't throw any exceptions for ill-defined values; instead, they return -1 or false which is indistinguishable from "not found". These APIs date all the way back to 1.0. Note that the JDK 1.0 String wasn't internally consistent. For example, other 1.0-era methods like substring throw StringIndexOutOfBoundsException.

The prevailing API style since that time has been to check arguments and throw the appropriate exceptions, instead of masking ill-defined values by returning "not found". This tends to reveal errors earlier instead of covering them up. Compared to the existing indexOf methods (which specify a single starting index), the new indexOf method specifies a subrange; this allows a new class of "end < start" errors. It seems strange not to throw any exception for this additional class of errors.

In understand the desire to be consistent with the existing methods. Adding a new non-throwing method seems like a mistake though. It's perpetuating an old API design mistake for the sake of consistency, while also being inconsistent with current API design style.

I also don't think it's necessary to have both throwing and non-throwing methods.

I'd suggest returning to the original indexOf(ch, from, to) proposal, but instead having it check its subrange arguments and throwing IndexOutOfBoundsException. I don't think the exception handling inconsistency with the existing methods is that bad. If people really, really object to it, then maybe it would be necessary to choose a new name for the new method (and not introduce two versions). But choosing a good name is hard, and it introduces issues such as discoverability and questions about how the new thing differs from the existing methods, so I'm skeptical that choosing a different name would be any better.

Another possible mitigation is to add API notes to highlight the unusual behavior of the old non-throwing methods. Some of these old methods don't mention their handling of illegal index values at all. (This could be done as part of a separate PR.)

@rgiulietti
Copy link
Contributor Author

@stuart-marks I agree.

My insistence on preserving old (but bad) behavior was well intended, but I'm now convinced that the new 3 params indexOf() better throws on illegal indices.

I'll thus add a check in its implementation, adapt the spec, remove the additional checkedIndexOf() method, and add API notes to the existing methods to highlight their unusual behavior w.r.t. illegal indices (assuming that adding these notes can be done in the same CSR issue).

@rgiulietti
Copy link
Contributor Author

The proposed indexOf() method now throws a StringIndexOutOfBoundsException on illegal indices.
The CSR has been updated.

@RogerRiggs
Copy link
Contributor

This is looking good.
When the single char version settles, a separate issue/PR should consider adding indexOf(String str, int fromIndex, int toIndex) providing the similar index checking variant for finding strings.

@rgiulietti
Copy link
Contributor Author

@RogerRiggs Yes, now that we settled on the "throwing" behavior, it is simpler to have a similar behavior with a future indexOf(String str, int fromIndex, int toIndex)

*
* @apiNote
* An invocation of this method returns -1 when {@code fromIndex} happens
* to be too large. The result is thus indistinguishable from a genuine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an apiNote to the existing indexOf(int, int) is good but I think it will need a bit word smithing, e.g. "happens to be too large" is a bit too casual. I think I would start with saying the method returns -1 if fromFrom is negative or >= the string length, it does not throw an exception if called with an out of range index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
(A negative fromIndex does not necessarily result in -1, though.)

* @throws StringIndexOutOfBoundsException if {@code fromIndex}
* is negative, or {@code toIndex} is larger than the length of
* this {@code String} object, or {@code fromIndex} is larger than
* {@code toIndex}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, I think you've got to a good place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stuart-marks was quite convincing ;-)

@RogerRiggs
Copy link
Contributor

The commit comments could be more informative and useful.

@rgiulietti
Copy link
Contributor Author

The last commit renames the new method's parameters to align with other methods (like substring(int beginIndex, int endIndex)) that check that indices are in range.
It also rephrases the @apiNote on indexOf(int ch, int fromIndex).

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Copy link
Contributor

@AlanBateman AlanBateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version looks good. Thanks for going through the process of getting this to a good place. One minor comment is that the test should probably in the java/lang/String rather than the CompactString sub-directory.

@rgiulietti
Copy link
Contributor Author

@AlanBateman There's a IndexOf.java test file already in the CompactString folder, which is why the new test file is located there, side-by-side to the current one. Both classes extend class CompactString, also residing in that folder, to get access to String constants defined there.

I have no problem in moving the new test file to the parent folder, but would like to understand more about the distinction between the two groups of tests. Could you expand a bit on this to help me getting a better picture?
TIA

@AlanBateman
Copy link
Contributor

@AlanBateman There's a IndexOf.java test file already in the CompactString folder, which is why the new test file is located there, side-by-side to the current one. Both classes extend class CompactString, also residing in that folder, to get access to String constants defined there.

I have no problem in moving the new test file to the parent folder, but would like to understand more about the distinction between the two groups of tests. Could you expand a bit on this to help me getting a better picture? TIA

String dates from JDK 1.0 and didn't historically have many tests except for some regression tests that accumulated along the way. Recent work on new String APIs (repeat, indent, ..) added to the tests in java/lang/String. The CompactString sub-directory is from the JEP 254 work in JDK 9 - it needed good tests to exercise all methods/cases. It might get onto someone radar sometime to do some consolidation and make it less confusing as to whether to put new tests.

… toIndex)"

Moved and renamed test file to parent folder
@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Mar 3, 2023
@openjdk
Copy link

openjdk bot commented Mar 3, 2023

@rgiulietti This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8302590: Add String.indexOf(int ch, int fromIndex, int toIndex)

Reviewed-by: rriggs, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 248 new commits pushed to the master branch:

  • cd4b88d: 8292269: Replace FileMapInfo::fail_continue() with Unified Logging
  • ae29054: 8303175: (fs) Deprecate com.sun.nio.file.SensitivityWatchEventModifier for removal
  • a04b104: 8303413: (fs) Ignore polling interval sensitivity modifiers in PollingWatchService
  • 9944314: 8303587: Remove VMOutOfMemoryError001 test from the problem list after 8303198
  • a50dc67: 8303527: update for deprecated sprintf for jdk.hotspot.agent
  • 40c5edf: 8303181: Memory leak in ClassLoaderExt::setup_app_search_path
  • 29ee7c3: 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java
  • e3016c1: 8303472: Display name for region TR
  • ae797c6: 8301117: Remove old_size param from ResizeableResourceHashtable::resize()
  • 5085bd5: 8297936: Use reachabilityFence to manage liveness in ClassUnload tests
  • ... and 238 more: https://git.openjdk.org/jdk/compare/52388179e65d4703ec33569dcc7c1351c57e6056...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 3, 2023
@rgiulietti
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Mar 3, 2023

Going to push as commit 5b2e2e4.
Since your change was applied there have been 248 commits pushed to the master branch:

  • cd4b88d: 8292269: Replace FileMapInfo::fail_continue() with Unified Logging
  • ae29054: 8303175: (fs) Deprecate com.sun.nio.file.SensitivityWatchEventModifier for removal
  • a04b104: 8303413: (fs) Ignore polling interval sensitivity modifiers in PollingWatchService
  • 9944314: 8303587: Remove VMOutOfMemoryError001 test from the problem list after 8303198
  • a50dc67: 8303527: update for deprecated sprintf for jdk.hotspot.agent
  • 40c5edf: 8303181: Memory leak in ClassLoaderExt::setup_app_search_path
  • 29ee7c3: 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java
  • e3016c1: 8303472: Display name for region TR
  • ae797c6: 8301117: Remove old_size param from ResizeableResourceHashtable::resize()
  • 5085bd5: 8297936: Use reachabilityFence to manage liveness in ClassUnload tests
  • ... and 238 more: https://git.openjdk.org/jdk/compare/52388179e65d4703ec33569dcc7c1351c57e6056...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 3, 2023
@openjdk openjdk bot closed this Mar 3, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 3, 2023
@openjdk
Copy link

openjdk bot commented Mar 3, 2023

@rgiulietti Pushed as commit 5b2e2e4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@rgiulietti rgiulietti deleted the JDK-8302590 branch March 3, 2023 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

6 participants