Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8259074: regex benchmarks and tests #1940

Conversation

@Martin-Buchholz
Copy link
Member

@Martin-Buchholz Martin-Buchholz commented Jan 5, 2021


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1940/head:pull/1940
$ git checkout pull/1940

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Jan 5, 2021

👋 Welcome back martin! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Jan 5, 2021

@Martin-Buchholz The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs label Jan 5, 2021
@Martin-Buchholz Martin-Buchholz force-pushed the Martin-Buchholz:JDK-8259074-regex-benchmarks branch from 652767f to cf0922f Jan 30, 2021
@Martin-Buchholz Martin-Buchholz marked this pull request as ready for review Jan 30, 2021
@openjdk openjdk bot added the rfr label Jan 30, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Jan 30, 2021

@Martin-Buchholz
Copy link
Member Author

@Martin-Buchholz Martin-Buchholz commented Jan 30, 2021

@amalloy - you are invited to comment on regex content
@cl4es @shipilev - you are invited to point out my jmh bad practices

1111111
false 1

// Unary power of two (8), reluctant quantifier

This comment has been minimized.

@amalloy

amalloy Jan 30, 2021
Contributor

I think the comment here is meant as "Giving a power of 2 as input to the primarily tester", but coming right after a unary primality tester commented with "Unary prime", it sounds like you are labeling this as a power-of-2 detector.

assert ! simple_find();
assert ! possessive_find();
assert ! possessive2_find();
assert ! possessive2_matches();
assert ! possessive3_find();
assert ! lookBehind_find();
assert ! find_loop_two_matchers();
assert ! find_loop_usePattern();
Comment on lines 112 to 119

This comment has been minimized.

@shipilev

shipilev Jan 31, 2021
Contributor

At a risk of muddling the profiles a bit, this can be rewritten as:

 if (!simple_find()) throw new IllegalStateException("simple_find is incorrect");
 ...

This comment has been minimized.

@Martin-Buchholz

Martin-Buchholz Feb 1, 2021
Author Member

Thanks Aleksey. But I was hoping for something more magical.

We really want the checking of the result of the benchmark method to be co-located in the source code with the method, but with zero disturbance to the benchmark numbers. Is such magic possible?

This comment has been minimized.

@Martin-Buchholz

Martin-Buchholz Feb 2, 2021
Author Member

I refactored the correctness checking, no longer using the assert facility.

@cl4es
Copy link
Member

@cl4es cl4es commented Feb 1, 2021

The assertion discussion aside, the micros look fine to me.

With an eye towards reducing total run time I'd ask you to consider if all parameter combinations are useful or if we can get the same value after some pruning.

@Martin-Buchholz
Copy link
Member Author

@Martin-Buchholz Martin-Buchholz commented Feb 1, 2021

@stuart-marks you are invited to review

@Martin-Buchholz
Copy link
Member Author

@Martin-Buchholz Martin-Buchholz commented Feb 1, 2021

@cl4es I agree pruning is a good idea. I settled on 3 data points with 16x separations as good enough to clearly show the difference between O(1) O(N) O(N^2) and O(2^N) (although O(2^N) would "run forever").

(although ... please tell me you're not actually running these benchmarks in an automated fashion ... too expensive, and needs a human to interpret the results)

@mlbridge
Copy link

@mlbridge mlbridge bot commented Feb 2, 2021

Mailing list message from Claes Redestad on core-libs-dev:

On 2021-02-01 21:54, Martin Buchholz wrote:

On Mon, 1 Feb 2021 10:22:14 GMT, Claes Redestad <redestad at openjdk.org> wrote:

@amalloy - you are invited to comment on regex content
@cl4es @shipilev - you are invited to point out my jmh bad practices

The assertion discussion aside, the micros look fine to me.

With an eye towards reducing total run time I'd ask you to consider if all parameter combinations are useful or if we can get the same value after some pruning.

@cl4es I agree pruning is a good idea. I settled on 3 data points with 16x separations as good enough to clearly show the difference between O(1) O(N) O(N^2) and O(2^N) (although O(2^N) would "run forever").

(although ... please tell me you're not actually running these benchmarks in an automated fashion ... too expensive, and needs a human to interpret the results)

No, we don't automatically pick up and run new microbenchmarks, but I
still think it's good if the checked in parameter combinations are a
reasonable approximation of what _someone_ should be running every now
and then.

A manual exploration of a new set of micros would naturally start with
the default config, so if such a config runs forever, that would be poor
ergonomics IMHO. I don't think such configurations should be checked in
in an active state. I'd opt for a comment on how to produce a regex that
would exponentially backtrack in such ways, for those who might want to
explore such corner cases.

(It would be nice with some JMH analogue of jtreg's "manual", so
that one can mark benchmarks that should are excluded from automatic or
wildcard executions. Maybe there exists some way to do this already?)

@Martin-Buchholz
Copy link
Member Author

@Martin-Buchholz Martin-Buchholz commented Feb 2, 2021

A manual exploration of a new set of micros would naturally start with
the default config, so if such a config runs forever, that would be poor
ergonomics IMHO. I don't think such configurations should be checked in
in an active state.

We're actually in agreement. There's no actual O(2^N) operation here, and jmh is similar to jtreg in having timeouts indicating failure.

Although I've been using shell loops as included in the class comments, I'll make sure running the tests using all the defaults gives sensible results.

I'm surprised to see jmh use the same number (5) of warmup and measurement iterations. Unless you're looking for very small effects, one warmup run should be sufficient.

@Martin-Buchholz
Copy link
Member Author

@Martin-Buchholz Martin-Buchholz commented Feb 2, 2021

I added annotations for sensible (faster) default, including @WarmUp(iterations = 1)

@cl4es
Copy link
Member

@cl4es cl4es commented Feb 2, 2021

I added annotations for sensible (faster) default, including @WarmUp(iterations = 1)

Perhaps just a personal preference.. if you instead tune down time per iteration ( @Warmup(iterations = 5, time = 2)) you would get the same warmup time budget but also some immediate feedback that the code is stabilizing well before switching over to measurement iterations.

@cl4es
cl4es approved these changes Feb 2, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Feb 2, 2021

@Martin-Buchholz This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8259074: regex benchmarks and tests

Reviewed-by: redestad

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 23 new commits pushed to the master branch:

  • 189b65b: 8260264: Move common os_ inline methods to a common posix source file
  • 288a4fe: 8260643: Remove parallel version handling in CardTableRS::younger_refs_in_space_iterate()
  • ddd2951: 8260571: Add PrintMetaspaceStatistics to print metaspace statistics upon VM exit
  • fe407cf: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint
  • 474dba2: 8257086: Clarify differences between {Float, Double}.equals and ==
  • 54e7a64: 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java
  • a6d9505: 8260864: ProblemList two security/krb5 tests on Linux
  • 9880c4c: 8260860: ProblemList tools/jlink/plugins/CompressorPluginTest.java
  • 55d62a5: 8213226: [TESTBUG] Reduce the usage of CDSTestUtils.executeAndLog()
  • b6a7367: 8260349: Cannot programmatically retrieve Metaspace max set via JAVA_TOOL_OPTIONS
  • ... and 13 more: https://git.openjdk.java.net/jdk/compare/a61ff87cd49c0a73be4aeddba154667405421bc9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Feb 2, 2021
@Martin-Buchholz
Copy link
Member Author

@Martin-Buchholz Martin-Buchholz commented Feb 8, 2021

@cl4es Your comment sent me back to jmh re-education camp. I never looked at warmup iterations numbers before, but I now see they can be very useful for eyeballing how long warmup takes. Very JVM-dependent of course. In practice I will stick with my rule of thumb that 10 seconds of warmup is good enough for any simple program, and that happens to be the JMH warmup time default. OTOH I think more than one warmup iteration is overkill for most benchmarks.

@Martin-Buchholz
Copy link
Member Author

@Martin-Buchholz Martin-Buchholz commented Feb 8, 2021

/integrate

@openjdk openjdk bot closed this Feb 8, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Feb 8, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Feb 8, 2021

@Martin-Buchholz Since your change was applied there have been 118 commits pushed to the master branch:

  • d6d5d9b: 8261231: Windows IME was disabled after DnD operation
  • 29a428f: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3
  • 48c932e: 8231286: HTML font size too large with high-DPI scaling and W3C_LENGTH_UNITS
  • dbc35f6: 8261094: Open javax/swing/text/html/CSS/4765271/bug4765271.java
  • db0ca2b: 8261161: Clean up warnings in hotspot/jtreg/vmTestbase tests
  • 2c28e36: 8237352: Update DatagramSocket to add support for joining multicast groups
  • d0a8f2f: 8260593: javac can skip a temporary local variable when pattern matching over a local variable
  • deb0544: 8261251: Shenandoah: Use object size for full GC humongous compaction
  • d45343e: 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register
  • 9d59dec: 8248876: LoadObject with bad base address created for exec file on linux
  • ... and 108 more: https://git.openjdk.java.net/jdk/compare/a61ff87cd49c0a73be4aeddba154667405421bc9...master

Your commit was automatically rebased without conflicts.

Pushed as commit 351d788.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants