Hibernate Reactive: ReactiveSessionProducer blocks event loop thread #18268

Closed
markusdlugi opened this issue Jun 30, 2021 · 17 comments · Fixed by #18458
Labels
area/hibernate-reactive area/persistence env/windows Impacts Windows machines kind/bug Something isn't working

@markusdlugi
Contributor

markusdlugi commented Jun 30, 2021

Describe the bug

Most likely since Quarkus 2.0.0.Alpha2, with the introduction of these changes, Hibernate Reactive can sometimes block the event loop thread forever when injecting a Mutiny.Session via the ReactiveSessionProducer, making the entire application unresponsive.

Expected behavior

Hibernate Reactive should properly dispose its sessions without blocking.

Actual behavior

It is actually blocking the event loop thread; see the following thread dump from one of our applications. The thread remains in this state forever, and as a consequence the entire application cannot handle any incoming requests anymore.

"vert.x-eventloop-thread-1" #25 prio=5 os_prio=0 cpu=3312.11ms elapsed=815.13s tid=0x00007f778f573000 nid=0x2f waiting on condition  [0x00007f772698d000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.9.1/Native Method)
    - parking to wait for  <0x00000000ffadedf8> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.9.1/LockSupport.java:194)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.9.1/CompletableFuture.java:1796)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.9.1/ForkJoinPool.java:3128)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.9.1/CompletableFuture.java:1823)
    at java.util.concurrent.CompletableFuture.join(java.base@11.0.9.1/CompletableFuture.java:2043)
    at io.smallrye.context.CompletableFutureWrapper.join(CompletableFutureWrapper.java:166)
    at io.quarkus.hibernate.reactive.runtime.ReactiveSessionProducer.disposeMutinySession(ReactiveSessionProducer.java:54)
    at io.quarkus.hibernate.reactive.runtime.ReactiveSessionProducer_ProducerMethod_createMutinySession_1321d110ee9e92bda147899150401e0a136779c7_Bean.destroy(ReactiveSessionProducer_ProducerMethod_createMutinySession_1321d110ee9e92bda147899150401e0a136779c7_Bean.zig:180)
    at io.quarkus.hibernate.reactive.runtime.ReactiveSessionProducer_ProducerMethod_createMutinySession_1321d110ee9e92bda147899150401e0a136779c7_Bean.destroy(ReactiveSessionProducer_ProducerMethod_createMutinySession_1321d110ee9e92bda147899150401e0a136779c7_Bean.zig:212)
    at io.quarkus.arc.impl.InstanceHandleImpl.destroyInternal(InstanceHandleImpl.java:90)
    at io.quarkus.arc.impl.ContextInstanceHandleImpl.destroy(ContextInstanceHandleImpl.java:20)
    at io.quarkus.arc.impl.RequestContext.destroyContextElement(RequestContext.java:184)
    at io.quarkus.arc.impl.RequestContext$$Lambda$538/0x0000000100730c40.accept(Unknown Source)
    at java.util.concurrent.ConcurrentHashMap.forEach(java.base@11.0.9.1/ConcurrentHashMap.java:1603)
    at io.quarkus.arc.impl.RequestContext.destroy(RequestContext.java:170)
    - locked <0x00000000ffa55538> (a java.util.concurrent.ConcurrentHashMap)
    at io.quarkus.arc.impl.RequestContext.destroy(RequestContext.java:154)
    at io.quarkus.resteasy.reactive.common.runtime.ArcThreadSetupAction$1.close(ArcThreadSetupAction.java:27)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.close(AbstractResteasyReactiveContext.java:101)
    at org.jboss.resteasy.reactive.server.core.ResteasyReactiveRequestContext.close(ResteasyReactiveRequestContext.java:358)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:182)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:829)
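
For context, a minimal sketch (not the actual Quarkus source) of the kind of disposer that produces the join() in the stack trace above - the reactive close() gets converted into a CompletionStage and joined on the calling thread, which here is an event loop thread:

import javax.enterprise.inject.Disposes;
import org.hibernate.reactive.mutiny.Mutiny;

// Hypothetical sketch only: the real ReactiveSessionProducer differs in detail,
// but the stack trace shows this same pattern of blocking on the reactive close().
public class BlockingDisposerSketch {

    public void disposeMutinySession(@Disposes Mutiny.Session session) {
        session.close()                       // reactive close, assumed to return Uni<Void>
               .subscribeAsCompletionStage()  // CompletableFuture<Void>
               .join();                       // blocks the caller; fatal on an event loop thread
    }
}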

To Reproduce

I haven't yet been able to create a reproducer, but it most likely happens when a failure occurs in a reactive chain in which Hibernate Reactive was previously used (the failure is not necessarily in Hibernate itself).

Configuration

# Add your application.properties here, if applicable.

Screenshots

(If applicable, add screenshots to help explain your problem.)

Environment (please complete the following information):

Output of uname -a or ver

Microsoft Windows [Version 10.0.18363.1556]

Output of java -version

OpenJDK 64-Bit Server VM Corretto-11.0.10.9.1 (build 11.0.10+9-LTS, mixed mode)

GraalVM version (if different from Java)

N/A

Quarkus version or git rev

2.0.0.Final

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.6.3

Additional context

(Add any other context about the problem here.)

@markusdlugi markusdlugi added the kind/bug Something isn't working label Jun 30, 2021
@quarkus-bot

quarkus-bot bot commented Jun 30, 2021

/cc @DavideD, @Sanne, @gavinking

@Sanne
Member

Sanne commented Jun 30, 2021

Very interesting, thanks @markusdlugi !

@Sanne
Member

Sanne commented Jul 5, 2021

@cescoffier, @gavinking I'm going to need a suggestion here; this seems to highlight multiple problems:

  • how to reliably test to catch such mistakes?

  • how to properly integrate a reactive resource with CDI? Specifically, ReactiveSessionProducer in Quarkus models a CDI @Disposes method, which by the CDI spec needs to be modelled as a blocking operation. The Hibernate Reactive Mutiny.Session is now returning a non-blocking Uni because Vert.x 4 decided closing a connection was being transformed into a reactive method as well.

Workaround

I honestly don't know how to solve this in a clean way; I'm thinking of applying the following should-work bandaid:

rather than converting this into a CompletionStage, I'll subscribe on it (not sure which kind of subscription to use), and store a reference in a ThreadLocal.

Then any subsequent invocation of createMutinySession needs to be performed as a subsequent event, after the previous session was closed; this is to prevent excessive connections being opened and to avoid possible deadlocks on limited resources such as the connection pool.
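
To make the idea concrete, a hypothetical sketch of what such a bandaid could look like (not an actual patch; it assumes close() returns a Uni<Void> and anticipates the Uni<Mutiny.Session> producer type mentioned in the next comment):

import io.smallrye.mutiny.Uni;
import org.hibernate.reactive.mutiny.Mutiny;

// Hypothetical sketch, CDI wiring omitted: subscribe to the reactive close()
// instead of joining it, remember the pending close in a ThreadLocal, and chain
// the next session opening after it so sessions are serialized per thread.
public class SerializedSessionProducerSketch {

    private static final ThreadLocal<Uni<Void>> PENDING_CLOSE =
            ThreadLocal.withInitial(() -> Uni.createFrom().voidItem());

    public Uni<Mutiny.Session> createMutinySession(Mutiny.SessionFactory factory) {
        // only open a new session once the previously disposed one has closed
        return PENDING_CLOSE.get().chain(factory::openSession);
    }

    public void disposeMutinySession(Mutiny.Session session) {
        Uni<Void> pendingClose = session.close().memoize().indefinitely();
        PENDING_CLOSE.set(pendingClose);
        // non-blocking: fire the close and let it complete asynchronously
        pendingClose.subscribe().with(ignored -> { }, Throwable::printStackTrace);
    }
}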

WDYT?

I could also use some suggestions for testing; I'm going to play with the idea and measure impact under load.

@Sanne
Member

Sanne commented Jul 5, 2021

Actually, to go with my workaround I would also need to change the CDI producer from type Mutiny.Session to Uni<Mutiny.Session>, otherwise I'd have the same problem on opening.

@Sanne
Member

Sanne commented Jul 5, 2021

So I have a POC which works fine, but injecting a Uni<Mutiny.Session> is a PITA in terms of usability; I converted some tests, and even for these simple use cases the code is no longer readable as it's now all deeply nested:

@Inject
Uni<Mutiny.Session> sessionUni;

sessionUni.chain(session -> session.withTransaction(tx -> session.doStuff ...... )))))))))))))))));

wouldn't anyone prefer injecting the SessionFactory instead?

@Inject
SessionFactory sessionFactory;

sessionFactory.withTransaction((session, tx) -> {   ....   });

So I'm inclined to proceed with testing this change, but I expect this lookup strategy to be used by Panache exclusively; perhaps we shouldn't expose this as a CDI bean but rather move this logic into Panache. cc @FroMage

Please let me know if you see a simpler way forward
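
For comparison, here is roughly how the SessionFactory-based variant could read in application code (illustrative only; the helper method name is made up):

import javax.inject.Inject;
import io.smallrye.mutiny.Uni;
import org.hibernate.reactive.mutiny.Mutiny;

// Illustrative sketch of the SessionFactory injection style discussed above.
public class SessionFactoryUsageSketch {

    @Inject
    Mutiny.SessionFactory sessionFactory;

    // hypothetical helper: persist any entity inside a managed transaction
    public <T> Uni<T> persistInTransaction(T entity) {
        // withTransaction opens a session, begins a transaction, and closes both
        // once the returned Uni completes - no manual dispose step is needed
        return sessionFactory.withTransaction((session, tx) -> session.persist(entity))
                             .replaceWith(entity);
    }
}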

@gavinking

Vert.x 4 decided closing a connection was being transformed into a reactive method as well.

Damn, this change has caused multiple bugs, one at each level. Ouch.

ReactiveSessionProducer in Quarkus models a CDI @Disposes method, which by the CDI spec needs to be modelled as a blocking operation.

So I think it would make sense to add some sort of non-blocking equivalent.

However, what I would do here is, instead of using @Disposes, use a CDI interceptor which appends a stage to the stream. Would that work?

@gavinking

rather than converting this into a CompletionStage, I'll subscribe on it (not sure which kind to subscription to use), and store a reference in a ThreadLocal.

I'm not quite sure what you mean by this, but it sounds very fragile and possibly broken to me.

@gavinking

wouldn't anyone prefer injecting the SessionFactory instead

I think that's what I was doing in my code examples.

See: quarkusio/quarkus-quickstarts#871

@Sanne
Member

Sanne commented Jul 5, 2021

However, what I would do here is, instead of using @Disposes, use a CDI interceptor which appends a stage to the stream. Would that work?

I wish... we don't have a stream, since the CDI entry points are blocking - or did I not understand your idea?

wouldn't anyone prefer injecting the SessionFactory instead

I think that's what I was doing in my code examples.

Yeah, me too: and that's why I'm raising this with you and Clement: the notions of contexts from CDI don't integrate well.

FWIW I was originally planning to remove all reactive X.Session producers, but the way Panache works you need a notion of "the current session" for those static methods injected on the entities to make sense.

rather than converting this into a CompletionStage, I'll subscribe on it (not sure which kind to subscription to use), and store a reference in a ThreadLocal.

I'm not quite sure what you mean by this, but it sounds very fragile and possibly broken to me.

Sure, I hate it. Which is why I'm referring to it as "bandaid" and workaround - IMO a proper solution would require designing proper semantics at the ArC / CDI level, for example to allow declaring reactive versions of both producers and disposers - which the implementation would need to wire up correctly with other components on the stack, so that, assuming the other ones are also reactive, it all works well. (Not sure what to do when some are not, but that's a different discussion, also addressed on the mailing list.)

However, consider that the request scope is modelled as a simple threadlocal in ArC - and these are request scoped components. So while I agree it's fragile and horrible, it could work fine until the greater design issue is addressed?

@gavinking

or did I not understand your idea?

in an interceptor

@gavinking

gavinking commented Jul 5, 2021

the notions of contexts from CDI don't integrate well.

I think the notions work fine. I think we simply have not integrated them well, because we have not adjusted our thinking to what contexts make sense in a reactive invocation. We're trying to carry ideas over literally from servlets, and of course they don't make sense because the threading model is just different. I was talking about this with Max and Emmanuel today.

allow declaring reactive versions of both producers and disposers

That is probably better, but it still seems to me that you can do this in an interceptor. (Just like you can do tx management in an interceptor.)

@gavinking

However, consider that the request scope is modelled as a simple threadlocal in ArC - and these are request scoped components.

Well as I've been advocating in a different place, as long as the "request context" maps directly to a Vert.x context, we're fine, I think.

@FroMage
Member

FroMage commented Jul 6, 2021

I don't mind at all if Panache gets its session from somewhere other than this producer; I just didn't want to introduce a second storage for it as long as it existed.

@Sanne
Member

Sanne commented Jul 6, 2021

I've taken a step back from the more complicated solution I mentioned earlier and went with this:

I believe it should work because we technically don't need to wait for the close to have happened in such an aggressive way: the availability of connections in the pool will provide a natural ordering, prioritizing operations which close sessions over those opening new ones when saturated.
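
In other words, a hypothetical sketch of that simpler approach (not the merged change): the disposer just subscribes to the reactive close() and returns immediately, relying on the pool to throttle new sessions while closes are still in flight:

import javax.enterprise.inject.Disposes;
import org.hibernate.reactive.mutiny.Mutiny;

// Hypothetical non-blocking dispose: fire the close and return right away.
public class NonBlockingDisposerSketch {

    public void disposeMutinySession(@Disposes Mutiny.Session session) {
        session.close()   // assumed to return Uni<Void>
               .subscribe()
               .with(ignored -> { }, Throwable::printStackTrace);
    }
}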

@quarkus-bot quarkus-bot bot added this to the 2.1 - main milestone Jul 6, 2021
@Sanne
Member

Sanne commented Jul 6, 2021

I don't mind at all if Panache gets its session from somewhere other than this producer; I just didn't want to introduce a second storage for it as long as it existed.

Yeah, I think that's reasonable. If the CDI producer exists, we should make sure that Panache uses the same Session instance.

On the other hand, I'll consider removing all CDI producers and providing an alternative for Panache. Creating a new Session and storing it in the Vert.x context is easy... the tricky part is defining when the scope should end.

@Sanne
Member

Sanne commented Jul 6, 2021

(I've just closed this issue BTW)
