TaskRunner's concurrent performance declines significantly under poor network conditions #8388
How to reproduce: call an external API that can serve only a limited number of TCP connections and takes a long time to establish each one.
Can you show a stack trace of the thread that holds the lock? Or attach the entire jstack output.
Can you point to other profiling showing the slowdown? As for locks vs synchronized: I looked at this recently in #8366. Most detailed investigations show that for contended work, locks are the more performant option. So before switching to synchronized, do you have links or benchmarks showing a benefit from synchronized?
This particular lock is not expected to be held for more than a few cycles – every acquire reads and writes a few lock-guarded fields and then releases the lock. But under highly-concurrent use, the fact that we're using locking at all is likely to cause some contention. Can you tell me how many concurrent calls you're making? If it's not very many (say, 10 or fewer) and you're seeing this problem, then it's likely we've got a bug in our implementation: we're holding the lock longer than we expect to. If you're doing lots of concurrent calls and this lock is a bottleneck, then I think we should invest in finding a lock-free alternative to TaskRunner, either by writing one or by using something else in the JDK.
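To make the "lock-free alternative" idea concrete: the usual approach replaces lock-guarded state with atomics updated via compare-and-set loops. A minimal sketch, assuming nothing about okhttp's internals (the class and method names here are hypothetical, not okhttp API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a capacity counter maintained with a CAS loop
// instead of a lock-guarded field. No thread ever blocks on a lock here.
class LockFreeCoordinator {
  private final AtomicInteger busyThreads = new AtomicInteger(0);
  private final int maxThreads;

  LockFreeCoordinator(int maxThreads) {
    this.maxThreads = maxThreads;
  }

  /** Reserve a thread slot without holding any lock; false if at capacity. */
  boolean tryReserveThread() {
    while (true) {
      int current = busyThreads.get();
      if (current >= maxThreads) return false; // no capacity left
      if (busyThreads.compareAndSet(current, current + 1)) return true;
      // CAS lost to a concurrent update; retry with the fresh value.
    }
  }

  void releaseThread() {
    busyThreads.decrementAndGet();
  }

  int busy() {
    return busyThreads.get();
  }
}
```

Contended CAS loops still cost cache-line traffic, so this only wins when critical sections are as short as described above.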
@yschimke Hi, this is the full stacktrace; we can see that threads were blocked at
Running on okhttp alpha-14.
I don't wish to change back to synchronized; synchronized is even slower than a lock.
Hi @swankjesse, it's about 4 concurrent calls. But as mentioned before, the external server's network is not stable or reliable; sometimes the server takes a long time to ack TCP.
Thanks for the stacktrace, it's really helpful. Apologies for not taking this seriously initially; it does look like a legit issue. The move from synchronized to ReentrantLock does lose us the thread-dump info of who holds the lock. I can't see any thread in your trace that would hold the lock while blocked. I can trivially reproduce what you see, not via a slowdown, but by taking a lock and just not releasing it – but it sounds like that is not what you're seeing. I'll try to reproduce this with some additional tracing and socket delays. Another option would be to give you a slightly tweaked build with some tracing to capture the thread that has the lock at these times.
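On recovering the lost owner information: `ReentrantLock` does track its owner, but `getOwner()` is protected, so a diagnostics-only subclass can expose it for exactly this kind of tracing. A sketch under that assumption, not something okhttp ships:

```java
import java.util.concurrent.locks.ReentrantLock;

// Diagnostic-only sketch: ReentrantLock.getOwner() is protected, so a
// subclass can surface the owning thread for logging alongside stack dumps.
class TraceableLock extends ReentrantLock {
  /** Returns the thread currently holding this lock, or null if unheld. */
  Thread owner() {
    return getOwner();
  }
}
```

A tweaked build could swap this in for the TaskRunner lock and log `owner()` whenever acquisition stalls.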
I'm not seeing anything concrete. In the creative testing I'm doing, the two things holding locks and taking a non-trivial amount of time are TaskRunner.awaitTaskToRun and RealBackend.execute. I'll see if I can work out where the time is going. RealBackend.execute seems to occasionally take 1-3 ms on my laptop. But I'm really not confident about what I'm seeing here.
I think we might just be missing a
|
@swankjesse I think you know this code a lot better than I do, so I'll leave it to you. Let me know if I can help.
Maybe the
My hypothesis is that when TaskRunner is busy, it potentially launches unnecessary threads here (okhttp/okhttp/src/main/kotlin/okhttp3/internal/concurrent/TaskRunner.kt, lines 208 to 211 at aede7c5):

```kotlin
// Also start another thread if there's more work or scheduling to do.
if (multipleReadyTasks || !coordinatorWaiting && readyQueues.isNotEmpty()) {
  backend.execute(this@TaskRunner, runnable)
}
```

These threads exit quickly, but they contend for the lock in the process, and that exacerbates the problem. Fixing this code to start only the threads it needs should be a straightforward fix.
We've got a race where we'll start a thread when we need one, even if we've already started a thread. This changes TaskRunner's behavior to never add a thread if we're still waiting for a recently-added one to start running. This is intended to reduce the number of threads contending for the TaskRunner lock, as reported in this issue: #8388
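The guard described above amounts to a "launch pending" flag: set when a new worker thread is requested, cleared by that worker as its first action once it is actually running. A simplified sketch of the idea (names are illustrative, not the real TaskRunner change):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of the fix: never start another worker while a
// previously requested one hasn't begun running yet.
class ThreadLauncher {
  private final AtomicBoolean launchPending = new AtomicBoolean(false);

  /**
   * Request a new worker thread. Returns true only if no previously
   * requested worker is still waiting to start; the caller should
   * actually start a thread only when this returns true.
   */
  boolean requestThread() {
    return launchPending.compareAndSet(false, true);
  }

  /** The new worker calls this as its first action once it is running. */
  void threadStarted() {
    launchPending.set(false);
  }
}
```

With the flag in place, a burst of wake-ups produces at most one not-yet-running thread instead of one thread per wake-up, which is where the extra lock contention came from.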
Our server needs to call some external APIs, and the stability of the API providers is quite poor and not reliable. We found that during network fluctuations, okhttp's internal `TaskRunner` causes many threads to keep waiting on a condition, eventually resulting in all threads that have attempted HTTP calls becoming suspended. We can check the jstack screenshot below.

After debugging we found that every `OkHttpClient` uses `TaskRunner.INSTANCE` by default, and nearly every method in `TaskRunner` acquires an instance-level lock. In other words, all `OkHttpClient` instances configured with defaults compete for the same lock instance in some scenarios, especially when the external network is poor and unreliable.

Is there any plan to improve the concurrent performance of `TaskRunner`? E.g. use atomic values rather than `ReentrantLock`.
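The sharing described in this report can be illustrated in miniature: every client built with defaults points at the same singleton runner, so they all contend on one lock no matter how many client instances exist. A toy model, not okhttp code (names are illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the singleton-sharing this issue describes: each "client"
// constructed with defaults references the same shared runner, and
// therefore the same lock.
class SharedRunner {
  static final SharedRunner INSTANCE = new SharedRunner();
  final ReentrantLock lock = new ReentrantLock();
}

class Client {
  final SharedRunner runner;

  Client() {
    this.runner = SharedRunner.INSTANCE; // default: the shared singleton
  }
}
```

Because the runner is process-wide, creating more client instances does not add lock capacity; it only adds contenders for the one lock.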