8311939: Excessive allocation of Matcher.groups array #14894

Closed
wants to merge 7 commits into from

Conversation

deathy
Contributor

@deathy deathy commented Jul 15, 2023

Reduces excessive allocation of the Matcher.groups array when the original Pattern has no groups or fewer than 9 groups.

The original clamping to a minimum of 10 groups is possibly due to this documented behavior from the javadoc:
"In this class, \1 through \9 are always interpreted as back references, "

With only the Matcher changes, RegExTest.backRefTest fails when back references to non-existent groups are present.
I added a match-failure condition in Pattern that fixes the failing tests.

As per existing java.util.regex.Pattern.BackRef#match: "// If the referenced group didn't match, neither can this"

A group that does not exist in the original Pattern can never match, so neither can a backref to that group.
If the group existed in the original Pattern, it would have had space allocated in Matcher.groups for that group index.
So a group index outside the groups array length must never match.
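
The observable behavior described above can be checked with a small standalone program. This is only an illustrative sketch: the class name MissingGroupBackRef and the helper finds are hypothetical and not part of the PR. It relies solely on the public java.util.regex API.

```java
import java.util.regex.Pattern;

final class MissingGroupBackRef {
    // Hypothetical helper: true if the regex finds a match anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: number of capturing groups, not counting group 0.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Group 1 exists and captures "aa", so \1 can match the trailing "aa".
        System.out.println(finds("(a+)bc\\1", "aabcaa"));

        // \1 is always parsed as a back reference, even with no group 1.
        // The pattern compiles, but the back reference can never match.
        System.out.println(groupCount("abcdef\\1"));
        System.out.println(finds("abcdef\\1", "abcdef"));

        // Same idea with one real group and a reference to a missing group 2.
        System.out.println(finds("(a)\\2", "aa"));
    }
}
```

The last two find calls are the situations where the group index lies outside a groups array sized to the actual number of groups.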


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8311939: Excessive allocation of Matcher.groups array (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14894/head:pull/14894
$ git checkout pull/14894

Update a local copy of the PR:
$ git checkout pull/14894
$ git pull https://git.openjdk.org/jdk.git pull/14894/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14894

View PR using the GUI difftool:
$ git pr show -t 14894

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14894.diff

Webrev

Link to Webrev Comment

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label Jul 15, 2023
@bridgekeeper

bridgekeeper bot commented Jul 15, 2023

Hi @deathy, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user deathy" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk

openjdk bot commented Jul 15, 2023

@deathy The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Jul 15, 2023
@bridgekeeper bridgekeeper bot removed the oca Needs verification of OCA signatory status label Jul 18, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 18, 2023
@mlbridge

mlbridge bot commented Jul 18, 2023

@deathy
Contributor Author

deathy commented Jul 22, 2023

Updated to also reduce allocation in java.util.regex.Matcher#usePattern.

All jdk_util tests pass.

I ran the JMH java.util.regex.FindPattern benchmark. Times seem better, but the benchmark is pretty light (large errors compared to the average score).

Ran JMH with -prof gc (first table: baseline, second table: with this change):

Benchmark                                 (patternString)                                      (text)  Mode  Cnt    Score    Error   Units
FindPattern.testFind:·gc.alloc.rate.norm     [^A-Za-z0-9]  abcdefghijklmnop1234567890ABCDEFGHIJKLMNOP  avgt    3  207.999 ±  0.030    B/op
FindPattern.testFind:·gc.alloc.rate.norm     [^A-Za-z0-9]  ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,  avgt    3  208.000 ±  0.001    B/op
FindPattern.testFind:·gc.alloc.rate.norm      [A-Za-z0-9]  abcdefghijklmnop1234567890ABCDEFGHIJKLMNOP  avgt    3  207.861 ±  4.395    B/op
FindPattern.testFind:·gc.alloc.rate.norm      [A-Za-z0-9]  ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,  avgt    3  207.999 ±  0.031    B/op


Benchmark                                 (patternString)                                      (text)  Mode  Cnt    Score     Error   Units
FindPattern.testFind:·gc.alloc.rate.norm     [^A-Za-z0-9]  abcdefghijklmnop1234567890ABCDEFGHIJKLMNOP  avgt    3   56.181 ±   5.713    B/op
FindPattern.testFind:·gc.alloc.rate.norm     [^A-Za-z0-9]  ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,  avgt    3  136.000 ±   0.001    B/op
FindPattern.testFind:·gc.alloc.rate.norm      [A-Za-z0-9]  abcdefghijklmnop1234567890ABCDEFGHIJKLMNOP  avgt    3   56.000 ±   0.010    B/op
FindPattern.testFind:·gc.alloc.rate.norm      [A-Za-z0-9]  ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,  avgt    3  135.999 ±   0.028    B/op

As expected, there is a 72-byte/op reduction for the case when the pattern doesn't match.
Unexpected: there seems to be double that reduction in allocations for the case when the pattern matches. I'm not completely sure where that is coming from. Maybe some optimizations in loops, since these patterns don't have groups and the array is always 2 elements?
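
A rough accounting for the 72-byte figure, as a sketch under stated assumptions: a 64-bit HotSpot JVM with compressed class pointers (16-byte array header, 8-byte object alignment), and a groups array holding two ints per group, sized for at least 10 groups before the change and for the actual group count (including group 0) after it. The class and method names are hypothetical.

```java
final class GroupsArrayFootprint {
    // Assumption: 64-bit HotSpot with compressed class pointers, so an int[]
    // has a 16-byte header and object sizes are rounded up to 8 bytes.
    static final int ARRAY_HEADER_BYTES = 16;
    static final int ALIGNMENT_BYTES = 8;

    // Estimated heap size of an int[] of the given length.
    static long intArrayBytes(int length) {
        long raw = ARRAY_HEADER_BYTES + 4L * length;
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // groupCount includes group 0, so a pattern without groups has groupCount 1.
    // Assumed sizing before the change: clamped to at least 10 groups.
    static int oldLength(int groupCount) {
        return Math.max(groupCount, 10) * 2;
    }

    // Assumed sizing after the change: exactly two ints per group.
    static int newLength(int groupCount) {
        return groupCount * 2;
    }

    static long savedBytes(int groupCount) {
        return intArrayBytes(oldLength(groupCount)) - intArrayBytes(newLength(groupCount));
    }

    public static void main(String[] args) {
        // Pattern without capturing groups: int[20] (96 bytes) vs int[2] (24 bytes).
        System.out.println(savedBytes(1)); // 72
    }
}
```

Under these assumptions the saving shrinks as the group count grows and reaches zero at 10 groups (counting group 0), where both sizings agree.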

@deathy
Contributor Author

deathy commented Jul 28, 2023

ping @shipilev, if you could take a look here

Member

@shipilev shipilev left a comment

Shouldn't a similar change be made in CIBackRef.match too? The fact that the current tests do not catch it makes me uneasy: the test coverage seems to be rather low there.

We need a regex expert to look at it. @rgiulietti @igraves might help us out here?

src/java.base/share/classes/java/util/regex/Pattern.java (review thread: outdated, resolved)
@deathy
Contributor Author

deathy commented Jul 28, 2023

RegExTest#backRefTest seems to be pretty extensive, but only for BackRef, not CIBackRef.

I saw one CIBackRef-related test added in #7501, but it's very simple/specific.

I triggered a test failure locally by duplicating the backRefTest 1-9 loop with (?i) in the pattern, so yes, it seems like CIBackRef needs the same change.

But I'm not sure about the test; duplicating the loop seems odd. Maybe the entire RegExTest#backRefTest needs to be duplicated into a ciBackRefTest with all patterns prepended with (?i)? That wouldn't look too clean.
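
A standalone approximation of that 1-9 loop, run with and without the (?i) prefix, might look like the sketch below. It does not reproduce the actual RegExTest code; the class CiBackRefLoop and its helpers are hypothetical, and the pattern "abcdef\N" is chosen here only because it contains no capturing groups.

```java
import java.util.regex.Pattern;

final class CiBackRefLoop {
    // Hypothetical helper: true if the regex finds a match anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Counts how many of the back references \1..\9, none of which refers
    // to an existing group, nevertheless produce a match on "abcdef".
    static int unexpectedMatches(String prefix) {
        int count = 0;
        for (int i = 1; i < 10; i++) {
            if (finds(prefix + "abcdef\\" + i, "abcdef")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Case-sensitive back references.
        System.out.println(unexpectedMatches(""));
        // Case-insensitive back references via the embedded flag.
        System.out.println(unexpectedMatches("(?i)"));
    }
}
```

Both counts should be 0: a back reference to a group that does not exist must fail to match in either mode.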

@deathy
Contributor Author

deathy commented Jul 28, 2023

Also made the changes in CIBackRef and copied/adapted the test into a new ciBackRefTest.

Not pretty, since changes to one could miss the other, but all patterns are different. (For what it's worth, the tests pass...)

There's also a special case with the supplementary character tests: since toSupplementaries generates an invalid pattern when given (?i), I had to change those like this:
pattern = Pattern.compile("(?i)" + toSupplementaries("(a*)bc\\1"));

Definitely needs a close look by a regex expert.
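
As an aside, one alternative that is not what the PR does: case-insensitive matching can also be requested through the Pattern.CASE_INSENSITIVE flag argument of Pattern.compile, which leaves the pattern string itself untouched. A minimal sketch comparing the two forms follows; the class name CaseInsensitiveBackRef and its methods are hypothetical.

```java
import java.util.regex.Pattern;

final class CaseInsensitiveBackRef {
    // Case-insensitivity requested with the embedded flag expression.
    static boolean findsWithEmbeddedFlag(String regex, String input) {
        return Pattern.compile("(?i)" + regex).matcher(input).find();
    }

    // Case-insensitivity requested with the compile-time flag argument.
    static boolean findsWithFlagArgument(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    // Plain case-sensitive matching, for comparison.
    static boolean findsCaseSensitive(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "(a+)bc\\1";
        String input = "aabcAA";
        // The group captures "aa"; only a case-insensitive \1 accepts "AA".
        System.out.println(findsCaseSensitive(regex, input));
        System.out.println(findsWithEmbeddedFlag(regex, input));
        System.out.println(findsWithFlagArgument(regex, input));
    }
}
```

Whether the flag-argument form would have been cleaner for the supplementary character tests is a judgment call for the test authors.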

Contributor

@rgiulietti rgiulietti left a comment

The change looks good.
However, I'm not a Reviewer.

src/java.base/share/classes/java/util/regex/Pattern.java (review thread: outdated, resolved)
src/java.base/share/classes/java/util/regex/Pattern.java (review thread: outdated, resolved)
@deathy
Contributor Author

deathy commented Aug 3, 2023

Thanks. I committed the changes suggested for the if conditions.

@deathy
Contributor Author

deathy commented Aug 10, 2023

@shipilev anything I should do to continue here?

@shipilev
Member

> @shipilev anything I should do to continue here?

Not really, we need more reviewers.

@igraves
Member

igraves commented Aug 15, 2023

This looks good to me, too. Like @rgiulietti I'm also not a reviewer.

Tagging in @stuart-marks and @RogerRiggs for reviewer status.

@shipilev
Member

@deathy, please merge from master to get clean test runs.

@RogerRiggs
Contributor

Looks good.
Non-"R"-reviewers can review and approve (and get recognition for reviewing).

Contributor

@RogerRiggs RogerRiggs left a comment

LGTM

@openjdk

openjdk bot commented Aug 16, 2023

@deathy This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8311939: Excessive allocation of Matcher.groups array

Reviewed-by: rriggs, igraves

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 14 new commits pushed to the master branch:

  • ed585d1: 8314280: StructuredTaskScope.shutdown should document that the state of completing subtasks is not defined
  • 6f1071f: 8314213: DocLint should warn about unknown standard tags
  • 4331193: 8314423: Multiple patterns without unnamed variables
  • 249dc37: 8314321: Remove unused field jdk.internal.util.xml.impl.Attrs.mAttrIdx
  • b78f5a1: 8314076: ICC_ColorSpace#minVal/maxVal have the opposite description
  • 2a1176b: 8314276: Improve PtrQueue API around size/capacity
  • 0c3bc71: 8281169: Expand discussion of elements and types
  • f143380: 8314240: test/jdk/sun/security/pkcs/pkcs7/SignerOrder.java fails to compile
  • 6b396da: 8062795: (fs) Files.setPermissions requires read access when NOFOLLOW_LINKS specified
  • 7b28d36: 8314330: java/foreign tests should respect vm flags when start new processes
  • ... and 4 more: https://git.openjdk.org/jdk/compare/b80001de0c0aeedeb412430660a4727fc26be98b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev, @rgiulietti, @RogerRiggs, @igraves) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 16, 2023
@deathy
Contributor Author

deathy commented Aug 17, 2023

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Aug 17, 2023
@openjdk

openjdk bot commented Aug 17, 2023

@deathy
Your change (at version 9e33c2e) is now ready to be sponsored by a Committer.

@rgiulietti
Contributor

/sponsor

@openjdk

openjdk bot commented Aug 17, 2023

Going to push as commit 32efd23.
Since your change was applied there have been 14 commits pushed to the master branch (the same commits listed in the pre-integration comment above).

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 17, 2023
@openjdk openjdk bot closed this Aug 17, 2023
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated), rfr (Pull request is ready for review), and sponsor (Pull request is ready to be sponsored) labels Aug 17, 2023
@openjdk

openjdk bot commented Aug 17, 2023

@rgiulietti @deathy Pushed as commit 32efd23.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
5 participants