Reducing use of F.unsafeRunSync #2735

Closed · Dennis4b opened this issue Jul 21, 2019 · 7 comments

Labels: bug (Determined to be a bug in http4s), module:blaze-server

@Dennis4b

(See also my last comment in #2695)

When running with a very low number of threads in order to provoke deadlocks that could manifest under load in production, I run into a deadlock in websocket/Http4sWSStage.scala:

138   override protected def stageShutdown(): Unit = {                                                                         
139     F.toIO(deadSignal.set(true)).unsafeRunSync()                                                                           
140     super.stageShutdown()                                                                                                  
141   } 

Here unsafeRunSync is reached from within an effect (the call chain runs from sendClose all the way to stageShutdown) and cannot synchronously run deadSignal.set because no more threads are available:


"pool-1-thread-1" #15 prio=5 os_prio=0 tid=0x00007f09e6d75800 nid=0x73b0 waiting on condition [0x00007f0943ffe000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000fa9d0920> (a cats.effect.internals.IOPlatform$OneShotLatch)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at cats.effect.internals.IOPlatform$.$anonfun$unsafeResync$2(IOPlatform.scala:50)
        at cats.effect.internals.IOPlatform$$$Lambda$2210/25411324.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57)
        at scala.concurrent.package$.blocking(package.scala:146)
        at cats.effect.internals.IOPlatform$.unsafeResync(IOPlatform.scala:50)
        at cats.effect.IO.unsafeRunTimed(IO.scala:325)
        at cats.effect.IO.unsafeRunSync(IO.scala:240)
        at org.http4s.blazecore.websocket.Http4sWSStage.stageShutdown(Http4sWSStage.scala:139)
        at org.http4s.blaze.pipeline.Tail.closePipeline(Stages.scala:78)
        at org.http4s.blaze.pipeline.Tail.closePipeline$(Stages.scala:77)
        at org.http4s.blazecore.websocket.Http4sWSStage.closePipeline(Http4sWSStage.scala:19)
        at org.http4s.blazecore.websocket.Http4sWSStage.$anonfun$stageStartup$1(Http4sWSStage.scala:116)
        at org.http4s.blazecore.websocket.Http4sWSStage$$Lambda$2222/996996997.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at zio.internal.FiberContext.evaluateNow(FiberContext.scala:410)
        at zio.internal.FiberContext.$anonfun$evaluateLater$1(FiberContext.scala:599)
        at zio.internal.FiberContext$$Lambda$1657/745424370.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
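
For illustration, here is a minimal sketch (not http4s code, and assuming a cats-effect 2 style IO) of the failure mode in the trace: an effect occupying the pool's only thread calls unsafeRunSync on another effect whose async boundary must resume on that same thread, so neither side can make progress.

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import cats.effect.IO

object UnsafeRunSyncDeadlock extends App {
  // Stand-in for an exhausted worker pool: a single thread.
  val pool = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

  val outer = IO.shift(pool).flatMap { _ =>
    IO {
      // We now occupy the pool's only thread. unsafeRunSync parks it until the
      // inner IO finishes, but the inner IO's async boundary is queued onto
      // this same single-thread pool, so it can never run.
      IO.shift(pool).flatMap(_ => IO.unit).unsafeRunSync()
    }
  }

  // Never completes; a thread dump shows the thread parked on a OneShotLatch,
  // just like the trace above.
  outer.unsafeRunAsync(_ => ())
}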

Is there a reason this is done synchronously? With the following change:

138   override protected def stageShutdown(): Unit = {                                                                         
139     F.toIO(deadSignal.set(true)).unsafeRunAsync(_ => ())                                                                   
140     super.stageShutdown()                                                                                                  
141   }

I can run my app with a single thread.

@rossabaker
Member

I thought blaze internals required at least two threads, but it's possible we overcame that. I haven't tried, so I can't certify a single thread as viable. That said, I think making this async would be reasonable. There's no guarantee that wsStream's finalizers run before the stage itself is shut down anyway.

@rossabaker
Member

Regarding #2695, I think we have to set the deadSignal in stageShutdown, which can be triggered in a lot of ways, to get the finalizers to run. I think the unsafeRunAsync would address the deadlock concerns.

I would also log any lefts in the unsafeRunAsync callback. .unsafeRunAsync(_ => ()) is roughly the same as writing an empty catch block in synchronous code.

@Dennis4b
Author

Thinking about it some more: even though using more than one thread solves the immediate deadlock, there is always a (very small but non-zero) risk of all worker threads (4 on a 4-core system?) being involved in cleaning up a connection at the same time and then deadlocking, since none of them gives up its thread until it can run its effect on another one.

I'm not sure about coding style and error logging specifics of http4s, so I hope someone else can make the PR or change.

Thank you for your quick replies, Ross.

@rossabaker
Member

Your proposal looks close to good to me, if you want to try a PR. I'd say:

  1. Target the series/0.20 branch, and we can make it a patch.
  2. For error handling, make it something like the following (a combined sketch of the patched method follows this list):
unsafeRunAsync {
  case Left(t) => logger.error(t)("Error setting dead signal")
  case Right(_) => ()
}
  3. Don't forget to run test:scalafmt.
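
Combining the error handling above with the async change proposed earlier gives roughly this (a sketch only; the logger value is assumed to exist in the stage):

override protected def stageShutdown(): Unit = {
  F.toIO(deadSignal.set(true)).unsafeRunAsync {
    case Left(t)  => logger.error(t)("Error setting dead signal")
    case Right(_) => ()
  }
  super.stageShutdown()
}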

If you don't have time, I can take it in the next day or two, but I've got a few others in flight.

@Dennis4b
Author

I don't know what it is about using ZIO instead of cats.effect.IO as the effect type, or if there is some kind of synchronization point in scheduling/networking/proxying/logging, but all 12 worker threads ended up deadlocked like this after an hour of uptime with moderate load using 0.20.6.

I've created a pull request, but it's small so feel free to just add the change into some other work you're doing. I thought I could get by with 0.20.6 but am now using a local 0.20.7-SNAPSHOT.

@rossabaker added the bug (Determined to be a bug in http4s) and module:blaze-server labels on Apr 6, 2021
@vasilmkd
Contributor

This looks like it has already been addressed?

@rossabaker
Member

Yeah, CE3 cleared that up.
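
For context, here is a minimal sketch (not the actual http4s code) of the cats-effect 3 pattern that removes this class of deadlock: a Dispatcher submits the effect to an already-running runtime instead of parking the calling thread. The object, method, and parameter names below are hypothetical.

import cats.effect.IO
import cats.effect.std.Dispatcher

object Ce3ShutdownSketch {
  // Fire-and-forget: the effect is handed to the dispatcher's runtime and no
  // thread is blocked waiting for it to complete.
  def stageShutdownHook(dispatcher: Dispatcher[IO], setDeadSignal: IO[Unit]): Unit =
    dispatcher.unsafeRunAndForget(setDeadSignal)
}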
