Deadlock with virtual threads #849

Closed
sakuuj opened this issue Jul 25, 2024 · 2 comments
Labels: Area: Transport, Category: Bug

Comments

sakuuj commented Jul 25, 2024

Java API client version

8.13.4

Java version

21

Elasticsearch Version

8.14.3

Problem description

I have been performance-testing ElasticsearchOperations (Spring Data Elasticsearch, which uses org.elasticsearch.client.RestClient underneath) by calling my controller endpoints, and I noticed a deadlock when using virtual threads together with a custom IOReactorConfig whose IoThreadCount is set to a low number (1 in this example). Using hatoo/oha, I set both the number of concurrent requests and the total number of requests to 100, and the application deadlocked. As I found out after checking the thread dump from jcmd, pinned virtual threads were stuck in a synchronized block:

#773 "tomcat-handler-715" virtual
      java.base/java.lang.Object.wait0(Native Method)
      java.base/java.lang.Object.wait(Object.java:366)
      java.base/java.lang.Object.wait(Object.java:339)
      org.apache.http.concurrent.BasicFuture.get(BasicFuture.java:82)
      org.apache.http.impl.nio.client.FutureWrapper.get(FutureWrapper.java:70)
      org.elasticsearch.client.RestClient.performRequest(RestClient.java:300)
      ...

and some of the unmounted virtual threads were waiting for the lock in AbstractNIOConnPool:

#779 "tomcat-handler-721" virtual
      java.base/java.lang.VirtualThread.park(VirtualThread.java:596)
      java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
      java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
      java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
      java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
      java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
      org.apache.http.nio.pool.AbstractNIOConnPool.lease(AbstractNIOConnPool.java:278)
      ...

And the elasticsearch thread that should notify the threads waiting in the synchronized block was itself blocked on the ReentrantLock in AbstractNIOConnPool:

#36 "elasticsearch-rest-client-0-thread-2"
      java.base/jdk.internal.misc.Unsafe.park(Native Method)
      java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
      java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
      java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
      org.apache.http.nio.pool.AbstractNIOConnPool.release(AbstractNIOConnPool.java:344)
     ...

Maybe it happened because:

  1. virtual threads that got into AbstractNIOConnPool.lease were unmounted due to some I/O operations inside the method, while still holding the AbstractNIOConnPool ReentrantLock;
  2. other virtual threads were mounted instead and got pinned by blocking in Object.wait() inside a synchronized block;
  3. no more platform (carrier) threads were left;
  4. elasticsearch-rest-client-0-thread-2 could never acquire the needed lock and stayed blocked forever.

Or maybe that is wrong and there were some other interleavings. But the fact is that a deadlock was detected, and it would be nice to change the current RestClient implementation.
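
To illustrate steps 2–3, here is a minimal, self-contained sketch (hypothetical demo code, not taken from the client) of the pinning pattern visible in the first thread dump. On JDK 21, a virtual thread that blocks in Object.wait() inside a synchronized block pins its carrier thread, so a few such waiters can exhaust a small scheduler pool:

    // Run on JDK 21 with -Djdk.virtualThreadScheduler.maxPoolSize=2.
    // Expected outcome: waiters 0 and 1 enter Object.wait() and pin both carrier
    // threads; waiters 2 and 3 never get mounted, mirroring the starvation above.
    public class PinnedWaitDemo {

        private static final Object MONITOR = new Object();

        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 4; i++) {
                int id = i;
                Thread.startVirtualThread(() -> {
                    System.out.println("waiter " + id + " starting");
                    synchronized (MONITOR) {
                        try {
                            MONITOR.wait(); // blocks while pinned, like BasicFuture.get
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            }
            Thread.sleep(60_000); // keep the JVM alive; inspect with jcmd Thread.dump_to_file
        }
    }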

The tested program configuration:

  • dependencies:
    Spring Boot 3.3.1, Spring Data Elasticsearch, Spring Boot MVC
  • properties:
    -Djdk.virtualThreadScheduler.maxPoolSize=8
    server.tomcat.threads.max: 8
    spring.threads.virtual.enabled: true
  • other:
    IOReactorConfig with IoThreadCount set to 1

Hardware: a processor with 8 physical cores.
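
For reference, here is a minimal sketch of how a RestClient with such an IOReactorConfig might be built (host, port, and scheme are placeholders; in the actual test the client was created through Spring Data Elasticsearch configuration):

    import org.apache.http.HttpHost;
    import org.apache.http.impl.nio.reactor.IOReactorConfig;
    import org.elasticsearch.client.RestClient;

    public class LowIoThreadClient {

        public static RestClient build() {
            return RestClient.builder(new HttpHost("localhost", 9200, "http"))
                .setHttpClientConfigCallback(httpClientBuilder ->
                    httpClientBuilder.setDefaultIOReactorConfig(
                        IOReactorConfig.custom()
                            .setIoThreadCount(1) // the low value used in this report
                            .build()))
                .build();
        }
    }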

How to fix the problem (at the cost of throughput)?
The problem could be worked around by wrapping all the methods of RestClient in ReentrantLock.lock / ReentrantLock.unlock statements (for example, using Spring AOP), as sketched below. At least no deadlocks were detected in that case when manually testing with hatoo/oha at 10,000 concurrent requests. But such a solution degrades the overall throughput.
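
For illustration, here is a sketch of the same idea as a plain wrapper instead of Spring AOP (the wrapper class name is made up, and only the synchronous performRequest overload is shown):

    import java.io.IOException;
    import java.util.concurrent.locks.ReentrantLock;
    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    public final class SerializedRestClient {

        private final RestClient delegate;
        // Virtual threads parking on a j.u.c. lock unmount instead of pinning their
        // carrier, so at most one thread at a time can get stuck in the pinned
        // Object.wait() inside the client.
        private final ReentrantLock lock = new ReentrantLock();

        public SerializedRestClient(RestClient delegate) {
            this.delegate = delegate;
        }

        public Response performRequest(Request request) throws IOException {
            lock.lock();
            try {
                return delegate.performRequest(request);
            } finally {
                lock.unlock();
            }
        }
    }

Serializing every call through one lock is exactly why throughput drops: the client handles one request at a time regardless of the connection pool size.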

l-trotta commented Aug 5, 2024

Hello! As you already figured out, this is a problem with the underlying RestClient used by the Java client. Unfortunately, updating it probably won't be enough to fix the issue: many users have been reporting problems when combining virtual threads with locks and synchronized blocks, this thread being one of many examples.
This is likely going to be fixed in Java 23, as explained in this JEP draft; until then we have no way of ensuring compatibility with virtual threads.

l-trotta added the Area: Transport and Category: Bug labels on Aug 5, 2024

sakuuj commented Aug 6, 2024

Hello! Thank you for your informative response.

l-trotta closed this as completed on Aug 6, 2024