8065554: MatchResult should provide values of named-capturing groups #10000

Closed
wants to merge 7 commits into from

Conversation

rgiulietti
Contributor

@rgiulietti rgiulietti commented Aug 24, 2022

Add support for named groups to java.util.regex.MatchResult


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires a CSR request to be approved

Issues

  • JDK-8065554: MatchResult should provide values of named-capturing groups
  • JDK-8292872: MatchResult should provide values of named-capturing groups (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10000/head:pull/10000
$ git checkout pull/10000

Update a local copy of the PR:
$ git checkout pull/10000
$ git pull https://git.openjdk.org/jdk pull/10000/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10000

View PR using the GUI difftool:
$ git pr show -t 10000

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10000.diff

bridgekeeper bot commented Aug 24, 2022

👋 Welcome back rgiulietti! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the csr (Pull request needs approved CSR before integration) and rfr (Pull request is ready for review) labels Aug 24, 2022

openjdk bot commented Aug 24, 2022

@rgiulietti The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs (core-libs-dev@openjdk.org) label Aug 24, 2022

* this method to work. However, overriding this method directly
* might be preferable for other reasons.
*
* @since 20
Member

@dfuch dfuch Aug 24, 2022

Should the method declare that it throws UnsupportedOperationException?
Because that is what will happen if namedGroups is not overridden/implemented.

Same comment for the other new methods.

Contributor Author

Not sure.
If the convention is to document every single RuntimeException that methods invoked by this one could throw, then yes.
In other words, should RuntimeExceptions thrown deep in an invocation stack be documented in every caller method?

Member

In principle, yes. In practice, I see that namedGroups doesn't have an @throws UnsupportedOperationException but has an @implSpec that says that the default implementation throws UnsupportedOperationException. This seems strange to me - maybe @stuart-marks or @jddarcy can comment.

What I was hinting at here, however, is that we might want to extend the @implSpec of the new methods to note that these methods will throw UnsupportedOperationException if namedGroups is not implemented (like the @implSpec of namedGroups does).

Contributor Author

Ah, I see.
So what you mean is not adding another @throws clause but either improving the @implNote or, better, adding an @implSpec.

Member

I left some inline comments on the @implSpec. But I do think that these methods require @throws UnsupportedOperationException for the cases where they don't support named groups.

@rgiulietti
Contributor Author

Addressed concerns about undocumented UnsupportedOperationException.

* @implSpec
* The default implementation of this method throws
* {@link UnsupportedOperationException} if {@link #namedGroups()} is not
* overridden.
Member

The essential thing for @implSpec is to describe "self-use" of methods on this object. This is important for subclassers to know whether they can inherit the default implementation or whether they should override it. It looks like start(String) does the following:

  • calls namedGroups() to obtain a mapping from group names to group numbers, propagating UOE if namedGroups() throws it
  • if name is not present in the group map, throws IAE
  • calls start() on the group number obtained from the map, and returns that value

I don't think we need to go to the level of detail about whether get or containsKey is called on the map, but I think the self-calls to namedGroups() and start(int) are important.

Similar comments apply to the @implSpec comments of end(String) and group(String).
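
The three steps listed above can be sketched as follows. This is a minimal illustration of the self-use pattern, not the JDK source: `SketchMatchResult` is a hypothetical stand-in for `MatchResult` that declares only the members needed here, and the exception messages are made up.

```java
import java.util.Map;

// Hypothetical stand-in for java.util.regex.MatchResult; only the members
// needed to illustrate the self-use pattern are declared.
interface SketchMatchResult {
    int start(int group);

    // Assumed default: an implementation without named-group support throws UOE.
    default Map<String, Integer> namedGroups() {
        throw new UnsupportedOperationException("namedGroups() not implemented");
    }

    default int start(String name) {
        // Step 1: self-call to namedGroups(); a UOE thrown there propagates.
        Map<String, Integer> groups = namedGroups();
        // Step 2: a name that is absent from the map is rejected with IAE.
        Integer number = groups.get(name);
        if (number == null) {
            throw new IllegalArgumentException("No group with name <" + name + ">");
        }
        // Step 3: self-call to start(int) with the resolved group number.
        return start(number);
    }
}

class SelfUseDemo {
    public static void main(String[] args) {
        // Overrides only start(int) and namedGroups(); start(String) is inherited.
        SketchMatchResult r = new SketchMatchResult() {
            @Override public int start(int group) { return group * 10; }
            @Override public Map<String, Integer> namedGroups() { return Map.of("year", 1); }
        };
        System.out.println(r.start("year")); // resolves "year" to group 1, then calls start(1)
    }
}
```

A subclasser reading such an @implSpec can tell that overriding namedGroups() and start(int) is enough for the inherited start(String) to work.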

* The default implementation of this method makes use of the map returned
* by {@link #namedGroups()}. It is thus sufficient to override
* {@link #namedGroups()} for this method to work. However, overriding this
* method directly might be preferable for performance or for other reasons.
Member

This @implNote text is repeated in three different methods. Consider moving it to the class specification. It might make it a bit easier for implementors to see a central overview instead of having this information in each method.

* {@link UnsupportedOperationException}.
*
* @apiNote
* This method must be overridden by an implementation.
Member

This is a bit odd. It sounds like existing MatchResult implementations (outside the JDK) are now invalid. I think it really means something like, "This method must be overridden by an implementation in order to provide valid information about whether this MatchResult contains a match." I'm not sure whether saying this is necessary; it could be omitted.

Probably also needs an @throws UnsupportedOperationException in case the match information is unavailable.

r.end("noSuchGroup");
r.group("noSuchGroup");
} catch (IllegalArgumentException e) { // swallowing intended
}
Member

If MatchResult is behaving correctly, the call to r.start("noSuchGroup") will always throw an exception and the subsequent calls to r.end and r.group will never be executed. This potentially will miss testing of those methods.
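
One way to avoid the problem is to check each call on its own. The sketch below is only an illustration, not the test code from this PR: `throwsType` and `matched` are hypothetical helpers, and it uses `Matcher` (whose name-based methods predate this change) rather than the new `MatchResult` methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparateChecks {
    // Hypothetical helper: true only if the action throws an instance of `expected`.
    static boolean throwsType(Class<? extends Throwable> expected, Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            return expected.isInstance(t);
        }
        return false; // completed normally, so nothing was thrown
    }

    // Hypothetical helper: a Matcher holding a successful match with one named group.
    static Matcher matched() {
        Matcher m = Pattern.compile("(?<year>\\d{4})").matcher("2022");
        if (!m.matches()) {
            throw new AssertionError("setup failed: expected a match");
        }
        return m;
    }

    public static void main(String[] args) {
        Matcher m = matched();
        // One check per method, so a throw from start() cannot hide end() or group().
        boolean startOk = throwsType(IllegalArgumentException.class, () -> m.start("noSuchGroup"));
        boolean endOk = throwsType(IllegalArgumentException.class, () -> m.end("noSuchGroup"));
        boolean groupOk = throwsType(IllegalArgumentException.class, () -> m.group("noSuchGroup"));
        System.out.println(startOk && endOk && groupOk);
    }
}
```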

result.end("noSuchGroup");
result.group("noSuchGroup");
} catch (IllegalArgumentException e) { // swallowing intended
}
Member

Similar issue here as above.

@stuart-marks
Member

Overall the specs, code, and tests look pretty good. I do think some areas of the spec need updating; sorry I didn't get to this before you created the CSR.

The test is OK, but it's starting to get to the point where it would be profitable to use TestNG data providers to collapse some of the test cases. We have two implementations of MatchResult: one is Matcher itself, and the other is the internal implementation returned by toMatchResult(). The setup for them differs somewhat, but the assertions should all be the same. This is kind of hard to see with separate test methods for Matcher and MatchResult. The test is reasonable as it stands, but we'll see what it looks like after the cases for checking exceptions from start(String), end(String), and group(String) are expanded.

@rgiulietti
Contributor Author

Addressed concerns about spec details.
CSR will be updated once the discussion about the spec has settled and the wording has stabilized.

@stuart-marks
Member

Good updates to the test, the @throws clauses, and the @implNote in the class specification. I don't know if I was clear before, but all the new default methods require @implSpec clauses that explain in some detail what they do, in particular self-use -- see my previous comment. (Or maybe you're still working on this.)

@rgiulietti
Contributor Author

If the most recent commit is OK in terms of Javadoc, I'll update the CSR accordingly

@stuart-marks
Member

Spec looks good. Let me know when you're done updating the CSR so I can mark it reviewed.

testMatchResultStartEndGroup1();
testMatchResultStartEndGroup2();
testMatchResultStartEndGroup3();
}
Member

The three numbered tests here are a little hard to follow. Looks like these tests are

  1. test existing group names, with no match
  2. test existing group names, with a successful match
  3. test nonexistent group names, with a successful match

On test names, sometimes people provide extremely verbose test names such as testThatExistingGroupNameWithMatchReturnsNegativeOrNull which I think is overkill, but having a name that's somewhat descriptive would be helpful.

It looks like a case is missing, which is a test for a nonexistent group name on a MatchResult that doesn't have a successful match. I'm not sure which is checked first; I think the implementation would throw IAE, because of the nonexistent name, regardless of whether or not the MatchResult has a match.

However, I don't think we've specified this, and in fact I don't think we want to. When multiple error conditions can arise in the same operation, the general style is not to constrain implementations to check them in a particular order. Thus either IAE or ISE would be acceptable. Perhaps a test should be added for that. (Hm, might want to take another look at the specs regarding this issue.)
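
A test for that missing case could accept either exception. The following is a rough sketch under stated assumptions: `thrownBy` and `isIaeOrIse` are hypothetical helpers, and `Matcher` is used as the object under test because its name-based methods already exist independently of this change.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EitherException {
    // Hypothetical helper: runs the action and returns what it threw, or null.
    static Throwable thrownBy(Runnable action) {
        try {
            action.run();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    // Accepts either type, so the order of the two checks stays unconstrained.
    static boolean isIaeOrIse(Throwable t) {
        return t instanceof IllegalArgumentException
                || t instanceof IllegalStateException;
    }

    // Hypothetical helper: a Matcher on which no match operation has run yet.
    static Matcher unmatched() {
        return Pattern.compile("(?<year>\\d{4})").matcher("2022");
    }

    public static void main(String[] args) {
        Matcher m = unmatched();
        // Nonexistent group name AND no match: two error conditions at once.
        System.out.println(isIaeOrIse(thrownBy(() -> m.start("noSuchGroup"))));
        System.out.println(isIaeOrIse(thrownBy(() -> m.end("noSuchGroup"))));
        System.out.println(isIaeOrIse(thrownBy(() -> m.group("noSuchGroup"))));
    }
}
```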

Contributor Author

(Hm, might want to take another look at the specs regarding this issue.)

Not sure who wants to take another look. If that's you, then I'll wait with the CSR.
I'll change the method names to something a bit more descriptive.

Member

Sorry, I should have been more specific. "Somebody" should take another look. :-) Well, anyway, I did, and the specification as written does not indicate which error condition is checked first. I think this is OK, so I don't think any changes are necessary. You might mention this in the text of the CSR; I know that Joe and I have discussed this issue previously, and he might have a recommendation.

testMatchResultHasMatch();

testMatchResultStartEndGroup();
}
Member

I haven't gone through all the tests in great detail (yet), but it occurs to me that we potentially have THREE implementations of some of the logic, and strictly speaking we should test all the code paths. The three implementations are in:

  1. Matcher
  2. Matcher$ImmutableMatchResult
  3. MatchResult's default methods

I took a quick look and it looks like Matcher and Matcher$ImmutableMatchResult override the default methods, so the default methods themselves need to be tested. This is essentially testing the @implSpec. The typical way to do that is to have the test create its own MatchResult implementation(s). There might need to be implementations that do and do not implement namedGroups, in order to test UOE. They might also need some state to cover various cases of no-match, has-match with and without group names, etc.

Contributor Author

An implementation that overrides namedGroups without overriding the other methods accepting group names is Matcher$ImmutableMatchResult, which is already exercised in the tests.

Member

OK, as long as all the code paths are covered.

@rgiulietti
Contributor Author

rgiulietti commented Aug 31, 2022

Added an implementation of MatchResult in the test class that does not override any of the default methods.
Made the method names more descriptive.

@rgiulietti
Contributor Author

CSR updated to current status.

* @return an unmodifiable map from capturing group names to group numbers
*
* @throws UnsupportedOperationException
* The default implementation of this method always throws
Member

The @throws clause needs to state a general contract over all implementations, so it should say something like, "@throws UOE if the implementation does not support named groups". Then, the specification for the default implementation always throwing UOE should be moved to @implSpec.
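
As a sketch of that split (not the wording that was eventually integrated), the general contract goes in @throws and the behavior of the default goes in @implSpec. `NamedGroupsDoc` is a hypothetical stand-in interface, and the first sentence of the Javadoc and the exception message are assumptions.

```java
import java.util.Map;

// Hypothetical stand-in for the interface under review.
interface NamedGroupsDoc {
    /**
     * Returns an unmodifiable map from capturing group names to group numbers.
     *
     * @implSpec
     * The default implementation of this method always throws
     * {@link UnsupportedOperationException}.
     *
     * @return an unmodifiable map from capturing group names to group numbers
     *
     * @throws UnsupportedOperationException if the implementation does not
     *         support named groups
     */
    default Map<String, Integer> namedGroups() {
        // Matches the @implSpec above: the default has no named-group support.
        throw new UnsupportedOperationException("named groups are not supported");
    }
}
```

Written this way, the @throws clause holds for every implementation, while the @implSpec tells subclassers what they get if they do not override the method.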

Contributor Author

Done

* @return whether {@code this} contains a valid match
*
* @throws UnsupportedOperationException
* The default implementation of this method always throws
Member

Similar comment here as above, though it has nothing to do with named groups. The @throws clause should say it throws UOE if "the implementation cannot report whether or not it has a match" or some such. This is a bit odd, but the specification needs to be permissive enough so that it doesn't invalidate existing implementations outside the JDK.

As before, the "always throws" should be moved to @implSpec.

Contributor Author

Done

@openjdk openjdk bot removed the csr (Pull request needs approved CSR before integration) label Sep 23, 2022
Member

@stuart-marks stuart-marks left a comment

OK I took another look at everything and I think this is good to integrate.

The tests seem adequate but it seems like they would benefit from some refactoring. It might be an interesting exercise to revisit them and try out the new JUnit 5 APIs.

openjdk bot commented Sep 28, 2022

@rgiulietti This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8065554: MatchResult should provide values of named-capturing groups

Reviewed-by: smarks

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 401 new commits pushed to the master branch:

  • 5e1e449: 8290920: sspi_bridge.dll not built if BUILD_CRYPTO is false
  • d827fd8: 8294430: RISC-V: Small refactoring for movptr_with_offset
  • 9d76ac8: 8292158: AES-CTR cipher state corruption with AVX-512
  • e5b65c4: 8290482: Update JNI Specification of DestroyJavaVM for better alignment with JLS, JVMS, and Java SE API Specifications
  • f8d9fa8: 8294483: Remove vmTestbase/nsk/jvmti/GetThreadState tests.
  • 6ad151d: 8293143: Workaround for JDK-8292217 when doing "step over" of bytecode with unresolved cp reference
  • 22b59b6: 8294471: SpecTaglet is inconsistent with SpecTree for inline property
  • 763d4bf: 8293592: Remove JVM_StopThread, stillborn, and related cleanup
  • 739fdec: 8289162: runtime/NMT/ThreadedMallocTestType.java should print out memory allocations to help debug
  • a11477c: 8289797: tools/launcher/I18NArgTest.java fails on Japanese Windows environment
  • ... and 391 more: https://git.openjdk.org/jdk/compare/71ab5c95af28497fb31aba8ba9597da71bc4d3d5...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Sep 28, 2022
@rgiulietti
Contributor Author

I'll integrate now but agree to open another PR to make use of JUnit in the tests, perhaps later this week or next week.

@rgiulietti
Contributor Author

/integrate

openjdk bot commented Sep 29, 2022

Going to push as commit ce85cac.
Since your change was applied there have been 422 commits pushed to the master branch:

  • 1decdce: 8294492: RISC-V: Use li instead of patchable movptr at non-patchable callsites
  • 8491fd5: 8294551: Put java/io/BufferedInputStream/TransferTo.java on problem list
  • 6f8f28e: 8294160: misc crash dump improvements
  • 8873192: 8293515: heapShared.cpp: rename JavaThread parameter to current
  • 76f1865: 8293563: [macos-aarch64] SA core file tests failing with sun.jvm.hotspot.oops.UnknownOopException
  • 9db95ed: 8215788: Clarify JarInputStream Manifest access
  • 9309786: 8294472: Remove redundant rawtypes suppression in AbstractChronology
  • 3b7fc80: 8294411: SA should provide more useful info when it fails to start up due to "failed to workaround classshareing"
  • 4fb424b: 8293961: Unused ClassPathZipEntry::contents_do
  • 7515b30: 8279283: BufferedInputStream should override transferTo
  • ... and 412 more: https://git.openjdk.org/jdk/compare/71ab5c95af28497fb31aba8ba9597da71bc4d3d5...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Sep 29, 2022
@openjdk openjdk bot closed this Sep 29, 2022
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Sep 29, 2022

openjdk bot commented Sep 29, 2022

@rgiulietti Pushed as commit ce85cac.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
core-libs (core-libs-dev@openjdk.org), integrated (Pull request has been integrated)
3 participants