-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid locking for put methods for RE2. Fixes #46 #121
Conversation
A commit message for a change to introduce a lock-free concurrent data structure needs to say a lot more about background, problem, alternative solutions, technical explanation, and benchmarks than the commit message for this change does. |
dd7bfda
to
a9da081
Compare
I've updated the commit description. I am not sure how to resolve the GWT failure. |
java/com/google/re2j/RE2.java
Outdated
@@ -213,25 +214,43 @@ int numberOfCapturingGroups() { | |||
// get() returns a machine to use for matching |this|. It uses |this|'s | |||
// machine cache if possible, to avoid unnecessary allocation. | |||
Machine get() { | |||
// Treiber stack (if reusing nodes) suffers from ABA problem on pop. Avoid by unlinking the | |||
// entire stack, and stashing it in a pop-only stack guarded by a lock. This also reduces | |||
// contention on the AtomicReference between putters and the getter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reduces contention
How is that? Calls to get must incur a semaphore (exclusive lock).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the two stacks and batch moves between them means that there's reduced CAS traffic on the AtomicReference, because getters either do a read (empty stack) or pop from their own stack, or import a whole batch, which amortizes the CAS on get.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but before reaching the read, they must first enter a synchronized block, whose semaphore requires an atomic decrement. How does the performance compare to a single stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't use a single stack without allocating a new object per put(). That would make the whole system lock-free, but would also mean allocating on every call (once there's more than one concurrent call ever).
Or I could use a single stack and guard the pop section with a lock - but this would mean locking and atomic operations on every get() call. You would, however, be able to use a double-checked-locking approach to avoid synchronization if the pool is empty:
Machine head = machine.get();
if (head != null) {
// Lock necessary here because we're reusing nodes - otherwise suffer from ABA problem
synchronized (this) {
while (true) {
head = machine.get();
if (head == null) {
break;
}
if (machine.compareAndSet(head, head.next)) {
head.next = null;
return head;
}
head = machine.get();
}
}
}
That approach felt a bit more complicated and seemed like more of a departure from the existing code, but if you prefer it I can use it instead. Actual performance cost depends on target architecture, amount of contention, etc. The checked-in benchmarks for RE2 don't exercise threads, and even if there were concurrent benchmarks it'd be hard to generalize them across platforms.
With my change:
Without:
Looks like an overall improvement, even for the single thread case. Avoiding the lock and avoiding the arraydeque bookkeeping probably helps. |
Nice; looks like about a 10% improvement across the board in the single threaded case. That probably paid for your time in CPUs already, never mind contention. : ) I don't see the latest changes we discussed yet. Ping me when they're ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good; thanks for the optimization, and for your patience.
Any updates on blockers for this CL? |
This change should work against gwt 2.9.0, which you can specify in the |
Google-wide-profiling indicated that this was a significant source of java lock contention. This new approach uses a Treiber stack to make adding an operation back into the pool a lock-free operation. It uses the existing objects as nodes in the linked stack - the Treiber stack suffers from an ABA problem when popping if nodes are reused, so removing an item from the pool is done by moving the whole stack to a simple linked stack guarded by the existing lock. The locking could be avoided entirely by allocating wrapper objects for the stack nodes, but I'm not sure if that's desirable, since the goal of the pool was to avoid allocation.
69a614f
to
82b62cd
Compare
Updated, and tests now pass. |
The locking could be avoided entirely by allocating wrapper objects for the stack nodes, but I'm not sure if that's desirable. I've also provided another approach that is more complex but offers lock-freedom when the cache is empty.