8302590: Add String.indexOf(int ch, int fromIndex, int toIndex) #12600
👋 Welcome back rgiulietti! A progress list of the required criteria for merging this PR into |
@rgiulietti The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
The underlying functionality is already present in non-publicly accessible methods. These are intrinsified, leveraging SIMD instructions where available. This PR proposes to add a public API point to the underlying methods.
Could we also get a similar overload for |
May I ask for comments or for a review?
Is there any performance impact? Are there relevant JMH tests, or should some be added?
@RogerRiggs Which performance impacts are you concerned with? Perhaps the additional invocation detour in AFAICT, there's nothing in |
* {@code toIndex}, then {@code -1} is returned.
*
* <p>There are no restrictions on the value of {@code fromIndex} and
* {@code toIndex}. Negative values have the same effect as it they were zero.
"as it" -> "as if"
addressed
JMH does not indicate any performance regression.
W.r.t. renaming Java parameters in @IntrinsicCandidate methods, I crosschecked with a runtime compiler guy. He confirms that changing names in the Java implementations is irrelevant for the intrinsified code. There might be references to the names in native comments, but this is not the case for indexOf. However, to be on the safe side I reverted those renames back to the original.
Thanks for confirming
There should be little or no impact from the extra level of calls, but it's worth confirming.
* as if they were equal to the length of this string.
*
* <p>As consequence of these rules, if {@code fromIndex} is greater than
* or equal to {@code toIndex}, then {@code -1} is returned.
Are there other examples in String where the equivalent of fromIndex > toIndex doesn't throw?
I simply extrapolated the behavior from `indexOf(int ch, int fromIndex)`, which has a similar note:
* There is no restriction on the value of {@code fromIndex}. If it
* is negative, it has the same effect as if it were zero: this entire
* string may be searched. If it is greater than the length of this
* string, it has the same effect as if it were equal to the length of
* this string: {@code -1} is returned.
This is the behavior of the underlying implementation as well.
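For readers following along, the lenient behavior described in that note can be observed directly with the existing two-argument method. The sketch below is only an illustration; the class name `LenientFromIndexDemo` and the `search` wrapper are made up for this example and are not part of the PR.

```java
// Illustration only: LenientFromIndexDemo and search() are hypothetical names.
class LenientFromIndexDemo {
    // Thin wrapper around the existing String.indexOf(int ch, int fromIndex).
    static int search(String s, char ch, int fromIndex) {
        return s.indexOf(ch, fromIndex);
    }

    public static void main(String[] args) {
        String s = "hello";
        // A negative fromIndex behaves like 0, so the whole string is searched.
        System.out.println(search(s, 'h', -5));  // prints 0
        // A fromIndex beyond the length does not throw; it yields -1.
        System.out.println(search(s, 'o', 99));  // prints -1
        // A character that is genuinely absent also yields -1.
        System.out.println(search(s, 'z', 0));   // prints -1
    }
}
```

The last two calls return the same value, which is why an out-of-range argument cannot be told apart from "not found".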
Personally I would expect string.substring(fromIndex, toIndex).indexOf(ch) to behave isomorphically to string.indexOf(ch, fromIndex, toIndex)
Then you would also expect string.substring(fromIndex).indexOf(ch) to behave isomorphically to string.indexOf(ch, fromIndex) in current releases, right?
It does not.
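A small sketch of that mismatch, using only existing `String` methods. The class and helper names (`SubstringVsIndexOfDemo`, `viaSubstring`, `direct`) are invented for illustration.

```java
// Illustration only: the class and helper names are hypothetical.
class SubstringVsIndexOfDemo {
    // Result of s.substring(fromIndex).indexOf(ch), or "throws" if substring rejects the index.
    static String viaSubstring(String s, char ch, int fromIndex) {
        try {
            return String.valueOf(s.substring(fromIndex).indexOf(ch));
        } catch (StringIndexOutOfBoundsException e) {
            return "throws";
        }
    }

    // Result of the existing two-argument s.indexOf(ch, fromIndex).
    static String direct(String s, char ch, int fromIndex) {
        return String.valueOf(s.indexOf(ch, fromIndex));
    }

    public static void main(String[] args) {
        String s = "banana";
        // In range: substring re-bases the result, so the two differ by the offset.
        System.out.println(direct(s, 'n', 3) + " vs " + viaSubstring(s, 'n', 3));   // 4 vs 1
        // Negative index: indexOf treats it as 0, substring throws.
        System.out.println(direct(s, 'n', -1) + " vs " + viaSubstring(s, 'n', -1)); // 2 vs throws
        // Index beyond the length: indexOf returns -1, substring throws.
        System.out.println(direct(s, 'n', 7) + " vs " + viaSubstring(s, 'n', 7));   // -1 vs throws
    }
}
```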
@AlanBateman I can propose variants of forward-searching some-prefixIndexOf() which are consistent with substring() in their handling of the indices, but I'd then rather prefer to file another JBS issue and prepare another PR.
The proposed 3 parameters variant of indexOf() is the real "primitive" of the family. The other ones are mere shortcuts.
Parking this PR while exploration is done into other options is okay.
@AlanBateman Both from an @apiNote and from an implementation perspective, the checked some-prefixIndexOf() variants would better make use of the public 3 params indexOf() API point proposed here, rather than relying on StringLatin1 and StringUTF16 internals.
Thus, parking this PR would mean that the other PR could not refer to the API point from here until this one is integrated.
Perhaps I'm missing something.
Is it perhaps better to add the checked variants directly in this PR?
The suggestion is to write down some alternatives to the 3-arg indexOf method and work through some examples to see if they are less error prone. It might be that you decide to move forward with the current proposal or it might be that you decide a different method would help avoid some common mistakes.
An obvious alternative that comes to mind is an additional checkedIndexOf(int ch, int fromIndex, int toIndex) that behaves like substring() in checking the indices, and like the 3 params indexOf() when not throwing.
Depending on the invocation context, this might help in some cases, in particular when the arguments have not been validated before: a programmer can then rely on the "battery-included" check.
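To make the idea concrete, here is a rough sketch of what such a checked variant might look like when written as a static helper on top of existing public APIs. It is an assumption-laden illustration, not the code proposed in this PR: `CheckedIndexOfSketch` and its static `checkedIndexOf` are hypothetical, the bounds check uses `Objects.checkFromToIndex` (which throws a plain `IndexOutOfBoundsException`), and the search simply delegates to the two-argument `indexOf` and discards matches that do not fit below `toIndex`.

```java
import java.util.Objects;

// Hypothetical sketch; not the JDK implementation and not part of this PR.
class CheckedIndexOfSketch {
    static int checkedIndexOf(String s, int ch, int fromIndex, int toIndex) {
        // Same range rule as substring(): 0 <= fromIndex <= toIndex <= length.
        Objects.checkFromToIndex(fromIndex, toIndex, s.length());
        int i = s.indexOf(ch, fromIndex);
        // A match counts only if all of its chars lie before toIndex
        // (supplementary code points occupy two chars).
        int end = i + Character.charCount(ch);
        return (i >= 0 && end <= toIndex) ? i : -1;
    }

    public static void main(String[] args) {
        System.out.println(checkedIndexOf("banana", 'n', 0, 3)); // prints 2
        System.out.println(checkedIndexOf("banana", 'n', 3, 4)); // prints -1
        try {
            checkedIndexOf("banana", 'n', 4, 2);                 // fromIndex > toIndex
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that this sketch may scan past `toIndex` before giving up, so it is meant to show the semantics only, not the performance of an intrinsified bounded search.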
     */
    public int indexOf(int ch, int fromIndex, int toIndex) {
        return isLatin1() ? StringLatin1.indexOf(value, ch, fromIndex, toIndex)
                : StringUTF16.indexOf(value, ch, fromIndex, toIndex);
    }
I assume you'll add @since 21 before integrating.
I was reluctant to add it, as it is not yet known when the PR will be integrated. I'll add it in the next commit.
When the CSR is approved is a good trigger for adding @since. But it's fine to be optimistic in the PR.
addressed
Added a |
CSR updated accordingly. |
In concept, having APIs that search a string subrange is a fine idea. Unfortunately the exception handling policy is an issue. We need to think through this carefully. Currently, almost everything that takes some kind of String index or subrange will throw IndexOutOfBoundsException if the arguments are ill-specified in some fashion. There are a few notable outliers though:
They don't throw any exceptions for ill-defined values; instead, they return -1 or false, which is indistinguishable from "not found". These APIs date all the way back to 1.0. Note that the JDK 1.0 String wasn't internally consistent. For example, other 1.0-era methods like

The prevailing API style since that time has been to check arguments and throw the appropriate exceptions, instead of masking ill-defined values by returning "not found". This tends to reveal errors earlier instead of covering them up. Compared to the existing

I understand the desire to be consistent with the existing methods. Adding a new non-throwing method seems like a mistake though. It's perpetuating an old API design mistake for the sake of consistency, while also being inconsistent with current API design style. I also don't think it's necessary to have both throwing and non-throwing methods. I'd suggest returning to the original

Another possible mitigation is to add API notes to highlight the unusual behavior of the old non-throwing methods. Some of these old methods don't mention their handling of illegal index values at all. (This could be done as part of a separate PR.)
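As an illustration of how a non-throwing range search can cover up a caller's mistake, consider the sketch below. The `lenientIndexOf` helper is hypothetical: it only mimics the clamping rules from the earlier revision of the spec (negative values act like zero, values above the length act like the length) and, for brevity, handles BMP characters only.

```java
// Hypothetical sketch of the lenient semantics; BMP characters only.
class MaskedRangeErrorDemo {
    static int lenientIndexOf(String s, int ch, int fromIndex, int toIndex) {
        int from = Math.max(fromIndex, 0);        // negative acts like 0
        int to = Math.min(toIndex, s.length());   // too large acts like length
        for (int i = from; i < to; i++) {
            if (s.charAt(i) == ch) {
                return i;
            }
        }
        return -1;                                // also reached when from >= to
    }

    public static void main(String[] args) {
        String s = "key=value";
        int eq = s.indexOf('=');                  // 3
        // Intended call: search for 'v' after the '='.
        System.out.println(lenientIndexOf(s, 'v', eq + 1, s.length())); // prints 4
        // Buggy call with the bounds swapped: silently reports "not found".
        System.out.println(lenientIndexOf(s, 'v', s.length(), eq + 1)); // prints -1
        // A character that really is absent gives the same answer.
        System.out.println(lenientIndexOf(s, 'z', eq + 1, s.length())); // prints -1
    }
}
```

With a throwing policy, the swapped-bounds call would fail immediately instead of looking like a legitimate miss.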
@stuart-marks I agree. My insistence on preserving old (but bad) behavior was well intended, but I'm now convinced that the new 3 params I'll thus add a check in its implementation, adapt the spec, remove the additional |
The proposed |
This is looking good.
@RogerRiggs Yes, now that we settled on the "throwing" behavior, it is simpler to have a similar behavior with a future |
*
* @apiNote
* An invocation of this method returns -1 when {@code fromIndex} happens
* to be too large. The result is thus indistinguishable from a genuine
Adding an apiNote to the existing indexOf(int, int) is good, but I think it will need a bit of wordsmithing, e.g. "happens to be too large" is a bit too casual. I think I would start with saying the method returns -1 if fromIndex is negative or >= the string length; it does not throw an exception if called with an out-of-range index.
Done.
(A negative fromIndex does not necessarily result in -1, though.)
* @throws StringIndexOutOfBoundsException if {@code fromIndex}
* is negative, or {@code toIndex} is larger than the length of
* this {@code String} object, or {@code fromIndex} is larger than
* {@code toIndex}.
Thanks for the update, I think you've got to a good place.
@stuart-marks was quite convincing ;-)
The commit comments could be more informative and useful.
The last commit renames the new method's parameters to align with other methods (like |
Looks good
The latest version looks good. Thanks for going through the process of getting this to a good place. One minor comment is that the test should probably be in the java/lang/String directory rather than the CompactString sub-directory.
@AlanBateman There's a I have no problem in moving the new test file to the parent folder, but would like to understand more about the distinction between the two groups of tests. Could you expand a bit on this to help me get a better picture?
String dates from JDK 1.0 and didn't historically have many tests except for some regression tests that accumulated along the way. Recent work on new String APIs (repeat, indent, ..) added to the tests in java/lang/String. The CompactString sub-directory is from the JEP 254 work in JDK 9; it needed good tests to exercise all methods/cases. It might get onto someone's radar sometime to do some consolidation and make it less confusing as to where to put new tests.
… toIndex)" Moved and renamed test file to parent folder
@rgiulietti This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 248 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
/integrate |
Going to push as commit 5b2e2e4.
Your commit was automatically rebased without conflicts. |
@rgiulietti Pushed as commit 5b2e2e4. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Add an indexOf() variant that allows specifying both a lower and an upper bound on the search.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12600/head:pull/12600
$ git checkout pull/12600
Update a local copy of the PR:
$ git checkout pull/12600
$ git pull https://git.openjdk.org/jdk pull/12600/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12600
View PR using the GUI difftool:
$ git pr show -t 12600
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12600.diff