New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskLocal not always restoring context after execution #624

Closed
oleg-py opened this Issue Mar 27, 2018 · 14 comments

Comments

Projects
None yet
4 participants
@oleg-py
Collaborator

oleg-py commented Mar 27, 2018

I finally managed to reproduce the issue I was talking about on gitter.

The app below crashes with MatchError because variable becomes true on 2nd iteration (on my machine).

Happens on both RC1 and #613 (commit # 63044eb)

import monix.execution.Scheduler.Implicits.global
import monix.eval._
import scala.concurrent.duration._

object Fakes extends App {
  implicit val opts = Task.Options(false, true)
  val local = TaskLocal[Boolean](false).runSyncUnsafeOpt(Duration.Inf)

  def attempt = local.read flatMap {
    case false => local.bind(true) {
      Task.unit
    }
  }

  for (_ <- 1 to 1000) {
      attempt.runSyncUnsafeOpt(Duration.Inf)
  }
}

Ironically, this works when LCP is disabled

/cc @leandrob13

@leandrob13

This comment has been minimized.

Contributor

leandrob13 commented Mar 28, 2018

@oleg-py got it, will test and debug asap

@alexandru

This comment has been minimized.

Member

alexandru commented Mar 28, 2018

So this sample should probably work because it's on the same thread, don't know what is wrong here; but it got me thinking ... your usage of runSyncUnsafeOpt in that loop is incompatible with the chosen model and this can't work for all tasks.

The problem gets more complicated when you introduce forks:

  def attempt = local.read flatMap {
    case false => local.bind(true) {
      Task.shift // <-- fork
    }
  }

In such a case the restore will happen on another thread entirely, captured in a ThreadLocal, which will never be seen from main. Basically the restore happens for the bind continuation of that Task, so the loop that should work is this:

def loop(n: Int = 1000): Task[Unit] =
  if (n > 0) attempt.flatMap(_ => loop(n - 1))
  else Task.unit

Is this a use case we care about?

If yes, then we cannot deliver on TaskLocal. If no, then we need to warn people in the docs, but if somebody as capable as you @oleg-py couldn't see the problem in this, then I'm afraid that TaskLocal is too error prone.

@alexandru

This comment has been minimized.

Member

alexandru commented Mar 28, 2018

Btw, our implementation is inspired by the implementation of Twitter's Future and Local.

But it has one big difference. Back when we implemented this, @leandrob13 used an immutable Map in that ThreadLocal which I liked very much, however Twitter's implementation uses a mutable Array.

I'm now thinking that we might have made the wrong decision and that we also need a mutable Array.

@oleg-py

This comment has been minimized.

Collaborator

oleg-py commented Mar 28, 2018

@alexandru I used shift initially in my testing, but as I went on simplifying, unit was enough.

Honestly, my understanding of concurrency issues is somewhat superficial. I prefer to rely on good practices (immutables, pure functions) and abstractions (Task) to avoid having them 😅.

As I mentioned on gitter, I encounter similar problem when managing connections in oleg-py/quill-monix. The Task I produce is eventually run by http4s server, so I have little control over how it gets evaluated.

TaskLocal is a key for supporting transactions in quill-monix which has the potential to be a decent alternative for doobie in FP setting. It also has other uses like per-request logging data (and horrible ones, like avoiding passing arguments to functions :).

@alexandru

This comment has been minimized.

Member

alexandru commented Mar 28, 2018

Yeah. I'm a little busy at work, but will play around with it this week. Maybe we need to go back to the basics. I'm curious how it behaves with a mutable Array in it, like Twitter is doing.

I somehow thought we outsmarted them, but I think they outsmarted us 😀

@leandrob13

This comment has been minimized.

Contributor

leandrob13 commented Mar 31, 2018

Getting back from holidays. Should I pause this until @alexandru tests the mutable Array alternative?

@alexandru alexandru added this to the 3.0.0-RC2 milestone Apr 1, 2018

@alexandru alexandru added the bug label Apr 1, 2018

@leandrob13

This comment has been minimized.

Contributor

leandrob13 commented Apr 1, 2018

@oleg-py I tested and it seems that the problem lies within the bindL implementation based on bracket.

Reverted back to this:

def bind[R](value: A)(task: Task[R]): Task[R] =
    Task.suspend {
      val saved = ref.value
      ref.update(value)
      task.doOnFinish(_ => restore(saved))
    }

and it works as you expect. The problem is that bracket forces an async boundary by using the onCancelRaiseError:

def apply[A, B](
    acquire: Task[A],
    use: A => Task[B],
    release: (A, Either[Option[Throwable], B]) => Task[Unit]): Task[B] = {

    acquire.flatMap { a =>
      val next = try use(a) catch { case NonFatal(e) => Task.raiseError(e) }
      next.onCancelRaiseError(isCancel).flatMap(new ReleaseFrame(a, release))
    }
  }

Since the boundary is forced, the release is done in another thread instead of the current one. That is something that is ok with bracket. Tested for #601 and the test I based on your reports passes. I can do the revert in #613 if you guys think it is ok.

@alexandru

This comment has been minimized.

Member

alexandru commented Apr 1, 2018

The async boundary in onCancelRaiseError is a light one, being managed by TrampolineRunnable instances pushed in the Scheduler, which knows how to execute them on the current thread.

That piece of logic never forks the execution.

What happens actually is that in the TaskRunLoop we are using Local.bind on executing stuff, restoring the previous context on the current thread after execution has completed or forked on another thread.

So on one of those Async values we end up doing:

// Saving the current context
savedLocals = Local.getContext()

// ...

Local.bind(savedLocals) {
  // Executing the bind continuation ...
  startFull(Now(value), context, callback, this, bFirst, bRest, runLoopIndex())
} // <-- restore happens afterwards
@alexandru

This comment has been minimized.

Member

alexandru commented Apr 1, 2018

This is also consistent with how the TracingScheduler works:

final class TracingRunnable(r: Runnable, context: Context = Local.getContext())
  extends Runnable {

  def run(): Unit = {
    val prev: Context = Local.getContext() // <-- saves current context
    Local.setContext(context) // <-- sets the context we had on scheduler.execute()
    try r.run() finally Local.setContext(prev) // <-- restores saved context
  }
}

Note that here too, whatever reads or writes happen for locals in that r.run(), they'll get overridden by that finally.

oleg-py added a commit to oleg-py/monix that referenced this issue Apr 1, 2018

Fix monix#624
The sync part of Task run-loop is implicitly side effecting on Local context. Those side effects were not wrapped in `bind`, so all the values written by evaluating tasks were persisted after asynchronous boundary or completion.
@leandrob13

This comment has been minimized.

Contributor

leandrob13 commented Apr 1, 2018

@alexandru @oleg-py As we discussed in gitter, I came up with this that seems to fix the problem reported:

def bindL[R](value: Task[A])(task: Task[R]): Task[R] =
    Task.eval(ref.value).flatMap { saved =>
      value.bracket(v => ref.bind(v)(task))(_ => restore(saved))
    }

If this is approved I can add it in #613.

@leandrob13

This comment has been minimized.

Contributor

leandrob13 commented Apr 1, 2018

@oleg-py Nevermind, this breaks the other tests. I had a premature conclusion on this one.

@alexandru

This comment has been minimized.

Member

alexandru commented Nov 6, 2018

Will close the issue for now. If you think it's still relevant, you can reopen.

@alexandru alexandru closed this Nov 6, 2018

@deusaquilus

This comment has been minimized.

deusaquilus commented Nov 23, 2018

Definitely think this is still relevant! Would love to have monix with quill-jdbc but this issue is a blocker!
Edit: Wait a second, this appears to be solved in RC2. Awesome!

@deusaquilus

This comment has been minimized.

deusaquilus commented Dec 10, 2018

Hi @oleg-py, @leandrob13, and @alexandru. We're talking about TaskLocal vs using a externally scoped Local with Task in getquill/quill#1261. Any input you have would be greatly appreciated!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment