Avoid locking for put methods for RE2. Fixes #46 #121

Merged
merged 6 commits into from
Mar 11, 2021

charlesmunger
Contributor

The locking could be avoided entirely by allocating wrapper objects for the stack nodes, but I'm not sure if that's desirable. I've also provided another approach that is more complex but offers lock-freedom when the cache is empty.

@google-cla google-cla bot added the cla: yes label Oct 28, 2020
@alandonovan
Contributor

A commit message for a change to introduce a lock-free concurrent data structure needs to say a lot more about background, problem, alternative solutions, technical explanation, and benchmarks than the commit message for this change does.

@charlesmunger
Contributor Author

I've updated the commit description. I am not sure how to resolve the GWT failure.

java/com/google/re2j/RE2.java
@@ -213,25 +214,43 @@ int numberOfCapturingGroups() {
// get() returns a machine to use for matching |this|. It uses |this|'s
// machine cache if possible, to avoid unnecessary allocation.
Machine get() {
// Treiber stack (if reusing nodes) suffers from ABA problem on pop. Avoid by unlinking the
// entire stack, and stashing it in a pop-only stack guarded by a lock. This also reduces
// contention on the AtomicReference between putters and the getter.
Contributor


> reduces contention

How is that? Calls to get() must still acquire a semaphore (exclusive lock).

Contributor Author


Having two stacks, with batch moves between them, reduces CAS traffic on the AtomicReference. A getter either does a plain read (when the shared stack is empty), pops from its own stack, or imports a whole batch at once, which amortizes the cost of the CAS across several get() calls.
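
For illustration, a minimal sketch of the two-stack scheme described here. `Node` is a hypothetical stand-in for `Machine`, and `NodePool`, `shared`, and `local` are made-up names for this example; none of this is copied from RE2.java.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for Machine: the pooled object doubles as its own stack node.
final class Node {
  Node next;
  final int id;

  Node(int id) {
    this.id = id;
  }
}

// Hypothetical pool illustrating the two-stack idea; not the actual RE2J code.
final class NodePool {
  // Shared Treiber stack: put() pushes here with CAS and never takes the lock.
  private final AtomicReference<Node> shared = new AtomicReference<>();
  // Pop-only stack, touched only while holding the lock.
  private Node local;

  // Lock-free push. Nodes are never popped one at a time from the shared
  // stack, so reusing them here does not expose the ABA problem.
  void put(Node n) {
    Node head;
    do {
      head = shared.get();
      n.next = head;
    } while (!shared.compareAndSet(head, n));
  }

  // Returns a pooled node, or null if the pool is empty.
  synchronized Node get() {
    if (local == null && shared.get() != null) {
      // Import the whole batch in one atomic step.
      local = shared.getAndSet(null);
    }
    Node n = local;
    if (n != null) {
      local = n.next;
      n.next = null;
    }
    return n;
  }
}
```

In this sketch a getter performs at most one atomic write on the shared reference per batch, however many nodes the batch contains.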

Contributor


Right, but before reaching the read, they must first enter a synchronized block, whose semaphore requires an atomic decrement. How does the performance compare to a single stack?

Contributor Author


I can't use a single stack without allocating a new object per put(). That would make the whole system lock-free, but would also mean allocating on every call (once there's more than one concurrent call ever).
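
A rough sketch of that first alternative, assuming a hypothetical `WrapperPool` class with a private `Cell` wrapper (neither name exists in RE2J). Each put() allocates a fresh immutable cell, so a popper can never see a recycled node, and in a garbage-collected runtime that is what removes the ABA hazard. The price is the per-put() allocation mentioned above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free pool that allocates one wrapper per put(); not RE2J code.
final class WrapperPool<T> {
  // Immutable wrapper: a fresh one is created for each put() and never reused.
  private static final class Cell<T> {
    final T item;
    final Cell<T> next;

    Cell(T item, Cell<T> next) {
      this.item = item;
      this.next = next;
    }
  }

  private final AtomicReference<Cell<T>> head = new AtomicReference<>();

  // Lock-free push; allocates a new cell on every attempt.
  void put(T item) {
    Cell<T> h;
    Cell<T> c;
    do {
      h = head.get();
      c = new Cell<>(item, h);
    } while (!head.compareAndSet(h, c));
  }

  // Lock-free pop; returns null if the pool is empty.
  T get() {
    Cell<T> h;
    do {
      h = head.get();
      if (h == null) {
        return null;
      }
    } while (!head.compareAndSet(h, h.next));
    return h.item;
  }
}
```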

Or I could use a single stack and guard the pop section with a lock - but this would mean locking and atomic operations on every get() call. You would, however, be able to use a double-checked-locking approach to avoid synchronization if the pool is empty:

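    // Unsynchronized check first: if the pool looks empty, skip the lock entirely.
    Machine head = machine.get();
    if (head != null) {
      // Lock necessary here because we're reusing nodes; otherwise pop suffers from the ABA problem.
      synchronized (this) {
        while (true) {
          // Re-read under the lock; another getter may have emptied the pool.
          head = machine.get();
          if (head == null) {
            break;
          }
          // The CAS can still fail because put() pushes without the lock; retry.
          if (machine.compareAndSet(head, head.next)) {
            head.next = null;
            return head;
          }
        }
      }
    }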

That approach felt a bit more complicated and seemed like more of a departure from the existing code, but if you prefer it I can use it instead. Actual performance cost depends on target architecture, amount of contention, etc. The checked-in benchmarks for RE2 don't exercise threads, and even if there were concurrent benchmarks it'd be hard to generalize them across platforms.

@charlesmunger
Contributor Author

With my change:

Benchmark                           (impl)  (regex)  (repeats)  Mode  Cnt      Score      Error  Units
BenchmarkBacktrack.matched             JDK      N/A          5  avgt    5      0.546 ±    0.018  us/op
BenchmarkBacktrack.matched             JDK      N/A         10  avgt    5     18.374 ±    2.611  us/op
BenchmarkBacktrack.matched             JDK      N/A         15  avgt    5    699.405 ±   26.860  us/op
BenchmarkBacktrack.matched             JDK      N/A         20  avgt    5  27318.340 ± 3808.455  us/op
BenchmarkBacktrack.matched            RE2J      N/A          5  avgt    5      0.926 ±    0.049  us/op
BenchmarkBacktrack.matched            RE2J      N/A         10  avgt    5      2.876 ±    0.234  us/op
BenchmarkBacktrack.matched            RE2J      N/A         15  avgt    5      5.837 ±    0.156  us/op
BenchmarkBacktrack.matched            RE2J      N/A         20  avgt    5      9.333 ±    0.839  us/op
BenchmarkCompile.compile               JDK     DATE        N/A  avgt    5    657.609 ±   30.343  ns/op
BenchmarkCompile.compile               JDK    EMAIL        N/A  avgt    5    209.188 ±    4.231  ns/op
BenchmarkCompile.compile               JDK    PHONE        N/A  avgt    5    320.981 ±   13.061  ns/op
BenchmarkCompile.compile               JDK   RANDOM        N/A  avgt    5   1018.124 ±   18.428  ns/op
BenchmarkCompile.compile               JDK   SOCIAL        N/A  avgt    5    283.678 ±   12.677  ns/op
BenchmarkCompile.compile               JDK   STATES        N/A  avgt    5   1321.436 ±  141.171  ns/op
BenchmarkCompile.compile              RE2J     DATE        N/A  avgt    5   2649.448 ±   32.502  ns/op
BenchmarkCompile.compile              RE2J    EMAIL        N/A  avgt    5    959.094 ±   24.034  ns/op
BenchmarkCompile.compile              RE2J    PHONE        N/A  avgt    5   1513.050 ±  100.937  ns/op
BenchmarkCompile.compile              RE2J   RANDOM        N/A  avgt    5   5404.207 ±  118.983  ns/op
BenchmarkCompile.compile              RE2J   SOCIAL        N/A  avgt    5   1067.002 ±   11.441  ns/op
BenchmarkCompile.compile              RE2J   STATES        N/A  avgt    5   7525.129 ±  638.113  ns/op
BenchmarkFullMatch.matched             JDK      N/A        N/A  avgt    5    102.320 ±    7.576  ns/op
BenchmarkFullMatch.matched            RE2J      N/A        N/A  avgt    5    486.554 ±   22.257  ns/op
BenchmarkFullMatch.notMatched          JDK      N/A        N/A  avgt    5     61.846 ±    4.451  ns/op
BenchmarkFullMatch.notMatched         RE2J      N/A        N/A  avgt    5    437.695 ±   21.006  ns/op
BenchmarkSubMatch.findPhoneNumbers     JDK      N/A        N/A  avgt    5      3.030 ±    0.152  ms/op
BenchmarkSubMatch.findPhoneNumbers    RE2J      N/A        N/A  avgt    5     13.751 ±    1.253  ms/op

Without:

Benchmark                           (impl)  (regex)  (repeats)  Mode  Cnt      Score      Error  Units
BenchmarkBacktrack.matched             JDK      N/A          5  avgt    5      0.563 ±    0.023  us/op
BenchmarkBacktrack.matched             JDK      N/A         10  avgt    5     19.759 ±    0.452  us/op
BenchmarkBacktrack.matched             JDK      N/A         15  avgt    5    717.583 ±   20.566  us/op
BenchmarkBacktrack.matched             JDK      N/A         20  avgt    5  30465.382 ± 1059.643  us/op
BenchmarkBacktrack.matched            RE2J      N/A          5  avgt    5      1.052 ±    0.015  us/op
BenchmarkBacktrack.matched            RE2J      N/A         10  avgt    5      3.245 ±    0.226  us/op
BenchmarkBacktrack.matched            RE2J      N/A         15  avgt    5      6.825 ±    0.196  us/op
BenchmarkBacktrack.matched            RE2J      N/A         20  avgt    5     11.549 ±    1.129  us/op
BenchmarkCompile.compile               JDK     DATE        N/A  avgt    5    785.358 ±   20.005  ns/op
BenchmarkCompile.compile               JDK    EMAIL        N/A  avgt    5    255.705 ±   13.109  ns/op
BenchmarkCompile.compile               JDK    PHONE        N/A  avgt    5    385.573 ±   15.514  ns/op
BenchmarkCompile.compile               JDK   RANDOM        N/A  avgt    5   1195.869 ±  106.540  ns/op
BenchmarkCompile.compile               JDK   SOCIAL        N/A  avgt    5    343.272 ±   12.794  ns/op
BenchmarkCompile.compile               JDK   STATES        N/A  avgt    5   1569.633 ±  164.334  ns/op
BenchmarkCompile.compile              RE2J     DATE        N/A  avgt    5   3208.025 ±  225.822  ns/op
BenchmarkCompile.compile              RE2J    EMAIL        N/A  avgt    5   1136.255 ±   71.899  ns/op
BenchmarkCompile.compile              RE2J    PHONE        N/A  avgt    5   1803.507 ±  106.871  ns/op
BenchmarkCompile.compile              RE2J   RANDOM        N/A  avgt    5   6472.174 ±  236.211  ns/op
BenchmarkCompile.compile              RE2J   SOCIAL        N/A  avgt    5   1273.778 ±   17.144  ns/op
BenchmarkCompile.compile              RE2J   STATES        N/A  avgt    5   9010.883 ±  734.132  ns/op
BenchmarkFullMatch.matched             JDK      N/A        N/A  avgt    5    119.059 ±    6.297  ns/op
BenchmarkFullMatch.matched            RE2J      N/A        N/A  avgt    5    573.888 ±   15.052  ns/op
BenchmarkFullMatch.notMatched          JDK      N/A        N/A  avgt    5     69.659 ±    2.206  ns/op
BenchmarkFullMatch.notMatched         RE2J      N/A        N/A  avgt    5    515.374 ±   25.949  ns/op
BenchmarkSubMatch.findPhoneNumbers     JDK      N/A        N/A  avgt    5      3.509 ±    0.640  ms/op
BenchmarkSubMatch.findPhoneNumbers    RE2J      N/A        N/A  avgt    5     15.647 ±    2.617  ms/op

Looks like an overall improvement, even for the single-threaded case. Avoiding the lock and the ArrayDeque bookkeeping probably helps.
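
To put rough numbers on that, a small helper (the `Improvement` class and `percentFaster` method are hypothetical names for this example) that computes the relative reduction in average time per op from a few RE2J scores copied from the two tables above. It ignores the reported error bars, so treat the results as approximate.

```java
// Hypothetical helper; the scores are copied from the benchmark tables above.
final class Improvement {
  // Percentage by which `after` is lower than `before`.
  static double percentFaster(double before, double after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) {
    // Arguments are (score without the change, score with the change).
    System.out.printf("FullMatch.matched:    %.1f%%%n", percentFaster(573.888, 486.554));
    System.out.printf("FullMatch.notMatched: %.1f%%%n", percentFaster(515.374, 437.695));
    System.out.printf("SubMatch.findPhone:   %.1f%%%n", percentFaster(15.647, 13.751));
    System.out.printf("Backtrack (20):       %.1f%%%n", percentFaster(11.549, 9.333));
  }
}
```

The JDK rows can be fed through the same function as a rough check on run-to-run variation between the two runs.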

@alandonovan
Contributor

Nice; looks like about a 10% improvement across the board in the single-threaded case. That probably paid for your time in CPUs already, never mind contention. : )

I don't see the latest changes we discussed yet. Ping me when they're ready.

Contributor

@alandonovan alandonovan left a comment


Looks good; thanks for the optimization, and for your patience.

@charlesmunger
Contributor Author

Any updates on blockers for this CL?

@charlesmunger charlesmunger changed the title from "Avoid locking for put methods for RE2." to "Avoid locking for put methods for RE2. Fixes #46" on Mar 9, 2021
@sjamesr
Contributor

sjamesr commented Mar 11, 2021

This change should work against GWT 2.9.0, which you can specify in the build.gradle file.

Charles Munger added 6 commits March 11, 2021 12:15
Google-wide profiling indicated that this was a significant source of Java lock contention. This new approach uses a Treiber stack so that returning an object to the pool is a lock-free operation. It uses the existing objects as nodes in the linked stack. Because a Treiber stack suffers from the ABA problem on pop when nodes are reused, removing an item from the pool is done by moving the whole stack to a simple linked stack guarded by the existing lock.

The locking could be avoided entirely by allocating wrapper objects for the stack nodes, but I'm not sure if that's desirable, since the goal of the pool was to avoid allocation.
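
To make the ABA hazard concrete, here is a deterministic single-threaded replay of an interleaving that two threads could produce on a Treiber stack with reused nodes. `AbaDemo` and `Item` are hypothetical names for this illustration, and the "Thread 2" steps use plain `set` calls where real code would use CAS.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of the ABA problem; not RE2J code.
final class AbaDemo {
  static final class Item {
    final String name;
    Item next;

    Item(String name) {
      this.name = name;
    }
  }

  // Replays the interleaving and returns the name of the node left on top.
  static String replay() {
    Item a = new Item("A");
    Item b = new Item("B");
    a.next = b; // stack: A -> B
    AtomicReference<Item> top = new AtomicReference<>(a);

    // "Thread 1" starts a pop: it reads the head and its successor, then stalls.
    Item seenHead = top.get(); // A
    Item seenNext = seenHead.next; // B

    // "Thread 2" pops A, then pops B (B is now in use by a caller),
    // then returns A to the pool.
    top.set(b);
    a.next = null;
    top.set(null);
    top.set(a); // stack: A

    // "Thread 1" resumes. The head is A again, so the CAS succeeds even though
    // the stack changed underneath it, and checked-out B becomes the top.
    boolean swapped = top.compareAndSet(seenHead, seenNext);
    return swapped ? top.get().name : "CAS failed";
  }

  public static void main(String[] args) {
    System.out.println("top after stale CAS: " + replay());
  }
}
```

After the replay, B is both in use by a caller and reachable from the pool, so a later pop could hand the same object to a second caller. Moving the whole stack out with a single atomic swap avoids this, because no per-node pop is ever attempted on the shared stack.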
@charlesmunger
Contributor Author

Updated, and tests now pass.
