Excess contention in ExecutorService #2118

Open
carl-mastrangelo opened this issue Aug 2, 2016 · 8 comments
@carl-mastrangelo
Contributor

When profiling a client with 200K active RPCs, there is a point of contention on the Executor. Each RPC gets its own SerializingExecutor, which executes work on an underlying executor. Currently, that underlying executor is a ThreadPoolExecutor in almost all cases, which itself has a BlockingQueue. That queue is heavily contended, showing up as minutes of wasted time:

```
141.17mins 79.41% 79.41% 141.22mins 79.44%  java.util.concurrent.LinkedBlockingQueue.offer LinkedBlockingQueue.java
 36.51mins 20.54% 99.95%  36.52mins 20.54%  java.util.concurrent.LinkedBlockingQueue.take LinkedBlockingQueue.java
```

One idea to fix this is to use some sort of striped executor, so that this contention is spread across several queues instead of a single shared one.
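A rough sketch of the striping idea (illustrative only; `StripedExecutor` and the hash-based stripe selection are made up for this example, not existing gRPC API):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: route each RPC's tasks to one of N single-threaded
// executors, so streams contend on N smaller queues instead of one shared queue.
final class StripedExecutor {
  private final ExecutorService[] stripes;

  StripedExecutor(int numStripes) {
    stripes = new ExecutorService[numStripes];
    for (int i = 0; i < numStripes; i++) {
      stripes[i] = Executors.newSingleThreadExecutor();
    }
  }

  /** Picks a stripe by key (e.g. the RPC/stream identity). */
  Executor stripeFor(Object key) {
    int index = (key.hashCode() & 0x7fffffff) % stripes.length;
    return stripes[index];
  }
}
```

Each SerializingExecutor could then be backed by `stripeFor(call)` instead of the shared pool, so offers land on per-stripe queues.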

@carl-mastrangelo
Contributor Author

@ejona86 suggested using ForkJoinPool. While this is not possible in general, since we are limited to Java 6 APIs, running a local server/client with it does in fact reduce the contention.

@ejona86
Member

ejona86 commented Aug 3, 2016

And it's unknown how much better ForkJoinPool does when receiving runnables from threads outside of the pool, but it seems worth a check.
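For reference, a minimal sketch of wiring a ForkJoinPool in as the application executor (using Java 7+ ForkJoinPool for simplicity; the address, plaintext setting, and parallelism level are placeholders, not part of the benchmark above):

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.ForkJoinPool;

public class FjpExecutorCheck {
  public static void main(String[] args) {
    // Hand gRPC a ForkJoinPool as the application executor, so the per-RPC
    // SerializingExecutors submit into its work-stealing queues instead of a
    // single shared LinkedBlockingQueue.
    ForkJoinPool appExecutor = new ForkJoinPool(32); // parallelism is a placeholder

    ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 50051)
        .usePlaintext(true) // benchmark-style setup; address and port are placeholders
        .executor(appExecutor)
        .build();
    // ... drive the benchmark workload against `channel` ...
  }
}
```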

@carl-mastrangelo
Contributor Author

A spot check shows the contention gone, but QPS drops by half (86 kqps -> 42 kqps). Run with:

carl-mastrangelo@f2ab548

@carl-mastrangelo
Contributor Author

Hmmm, running on a 32-thread machine it does speed up a lot. Maybe there is a threshold.

@buchgr
Collaborator

buchgr commented Aug 3, 2016

On how many cores, and for how long, did this client run? Also, I believe those numbers might be cumulative over all threads. So if, say, 32 threads are trying to add to the queue concurrently and one holds the lock for 100 micros, the other 31 each wait that long, which adds up to 3.1 millis of contention.

@carl-mastrangelo
Contributor Author

@buchgr The numbers are cumulative. It is a 32-core client talking to a 32-core server. I can't recall if I looked at the client or the server, but since they both use the executor in the same way, it doesn't matter which.

The contention profiler records how long a thread waits for a lock that it then acquires. (So the thread that already holds the lock and then releases it will not be recorded.)

Running last night with FJP showed a 3x perf jump (~460 kqps), so this contention matters a lot in high-QPS cases.

@carl-mastrangelo carl-mastrangelo changed the title Extreme contention in ExecutorService Excess contention in ExecutorService Aug 3, 2016
@buchgr
Collaborator

buchgr commented Aug 9, 2016

It might be worth mentioning that Netty backported the FJP so that it can be used with Java 1.6: https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/chmv8/ForkJoinPool.java

We could check if Netty's FJP is on the classpath and, if so, use it?
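A minimal sketch of that check (assuming the backport keeps the JDK's `ForkJoinPool(int parallelism)` constructor; the fallback pool and error handling are simplified for illustration):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Sketch only: probe for Netty's backported ForkJoinPool and fall back to a
// regular pool when it is not on the classpath.
final class BackportedFjpProvider {
  static Executor create(int parallelism) {
    try {
      Class<?> fjpClass = Class.forName("io.netty.util.internal.chmv8.ForkJoinPool");
      return (Executor) fjpClass.getConstructor(int.class).newInstance(parallelism);
    } catch (Exception e) {
      return Executors.newCachedThreadPool(); // backport not available
    }
  }
}
```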

@carl-mastrangelo
Contributor Author

@buchgr FJP depends heavily on the number of cores actually available for use. For example, running on a 32-core machine that is under 50% load from other processes, FJP does worse at parallelism level 32 than at 16. Picking the number too high or too low causes painful performance swings, so it would be hard to set a good default.

Also, blocking calls are going to make it behave poorly. Only Future/Async clients (and servers) really benefit from it. It's a good optimization, but only once you recognize that it applies to the use case.
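To make the defaulting problem concrete (a sketch; the sizing heuristic below is just the obvious naive one, not a proposal):

```java
import java.util.concurrent.ForkJoinPool;

final class FjpDefaults {
  // availableProcessors() reports 32 on the machine described above, but says
  // nothing about the ~50% of CPU already used by other processes, so a pool
  // sized this way can end up slower than one sized at 16.
  static ForkJoinPool naiveDefault() {
    return new ForkJoinPool(Runtime.getRuntime().availableProcessors());
  }
}
```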
