Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TracingScheduler issues when mixing Future with Task #848

Closed
mdedetrich opened this issue Mar 22, 2019 · 30 comments · Fixed by #1213
Closed

TracingScheduler issues when mixing Future with Task #848

mdedetrich opened this issue Mar 22, 2019 · 30 comments · Fixed by #1213
Milestone

Comments

@mdedetrich
Copy link
Contributor

mdedetrich commented Mar 22, 2019

The equivalent issue can be found here mdedetrich/monix-mdc#1. For basic scenarios when mixing Task and Future along with a custom MDCAdaptor that uses TaskLocal, everything seems to work fine (i.e. you can use MDC.get/MDC.put interchangeably and stuff works). However some more complex test scenarios don't seem to work, i.e. these tests currently fail

https://github.com/mdedetrich/monix-mdc/blob/master/src/test/scala/MDCFutureSpec.scala#L56-L106

Where as the equivalent ones (that just use Task) work fine, i.e.

https://github.com/mdedetrich/monix-mdc/blob/master/src/test/scala/MDCBasicSpec.scala#L54-L66

The easiest way to replicate this is to just clone https://github.com/mdedetrich/monix-mdc/ and run test. Note that the repo uses 3.0.0-RC2-SNAPSHOT-9e79718-SNAPSHOT which contains some TaskLocal fixes.

Its also entirely possible that this is an issue with the way that MonixMDCAdapter is written

@mdedetrich
Copy link
Contributor Author

mdedetrich commented Mar 25, 2019

So just an update on this, I redid the tests in monix-mdc (I now use property based tests to make sure the state is clean on each test run) and figured out that the issue occurs when you put a value into TaskLocal initially when using a Future and then try to modify the TaskLocal later within a Task.

If you put a value into the TaskLocal initially using Task, then you can retrieve the values (correctly) when using a Future afterwards.

@oleg-py
Copy link
Collaborator

oleg-py commented Mar 25, 2019

Our main problem currently is that this test would fail, if added:

  testAsync("multiple futures operation") {
    implicit val traced = Scheduler.traced
    val local = Local(42)
    for {
      f1 <- Future { local := 11 }
      f2 <- Future { local.get }
    } yield assertEquals(f2, 11)
  }

It seems that, for simple case like this:

// slightly desugared version of Future(X).flatMap(Y)
val x = Future { X } (ec)
val y = x.flatMap(Y)(ec)

Both X and Y would receive a copy of a current/parent/callsite state. Therefore, any modifications done by X will not be seen when it's time for Y to be called. We need to somehow "link" the changes across two call sites, or, stated in a different way, share the context state between the two, like a reference to a mutable state.

@mdedetrich
Copy link
Contributor Author

So here is an example of how people carry over MDC context when using a custom ExecutionContext. The logic required would be along the same lines (just instead of the MDC map you have the current TaskLocal context)

http://yanns.github.io/blog/2014/05/04/slf4j-mapped-diagnostic-context-mdc-with-play-framework/ . See the 'Second solution with a custom execution context' section

@Avasil
Copy link
Collaborator

Avasil commented May 21, 2019

#866 changes are included in the snapshot: 3.0.0-RC2-9c49e02-SNAPSHOT

Does it fix issues mentioned here?

@Avasil
Copy link
Collaborator

Avasil commented Jul 6, 2019

I think this one should be fixed by #866
Please let us know if that's not the case

@Avasil Avasil closed this as completed Jul 6, 2019
@mdedetrich
Copy link
Contributor Author

mdedetrich commented Jul 6, 2019

Trying it now (although note, the problem also occurs in Doobie when just using IO/Future

@Avasil
Copy link
Collaborator

Avasil commented Jul 6, 2019

Thanks @mdedetrich
FYI it's included in 3.0.0-RC3

@mdedetrich
Copy link
Contributor Author

@Avasil Still getting the same issue

Just run the tests in https://github.com/mdedetrich/monix-mdc/ to replicate

[info] - can Write and get different values concurrently and mixed with Future first *** FAILED ***
C?p�>") was not equal to List("/n,
ii
C?p�>", "E��'JfZr'#��v9

@Avasil
Copy link
Collaborator

Avasil commented Jul 13, 2019

cc @oleg-py

@Avasil
Copy link
Collaborator

Avasil commented Jul 14, 2019

@oleg-py @mdedetrich

I guess it is the issue with too much sharing:

"Write and get different values concurrently and mixed with Future first" in {
    val keyMultipleValues = KeyMultipleValues("key", List("v1", "v2", "v3", "v4", "v5", "v6"))

    def getAndPut(key: String, value: String): Task[String] =
      for {
        _ <- Task.deferFuture {
              Future {
                MDC.put(key, value)
              }
            }
        get <- Task {
                MDC.get(key)
              }
      } yield get

    val tasks = keyMultipleValues.values.map { value =>
      getAndPut(keyMultipleValues.key, value).executeAsync
    }

    val task = Task.gather(tasks)

// List("v1", "v5", "v1", "v1", "v5", "v5") was not equal to 
// List("v1", "v2", "v3", "v4", "v5", "v6")
    task.runToFutureOpt.map { retrievedValues =>
      retrievedValues shouldBe keyMultipleValues.values
    }
}

I tried to do Local.isolate { getAndPut } but it didn't change anything.

@mdedetrich
Copy link
Contributor Author

with too much sharing

What do you mean precisely? Also we should reopen the ticket

@Avasil Avasil reopened this Jul 14, 2019
@Avasil
Copy link
Collaborator

Avasil commented Jul 14, 2019

In current model, tasks running in parallel share locals, a bit of explanation is in this comment

IIRC the way not to share them is to either call Local.isolate { } or run them separately (separate runToFuture etc.)

Does it prevent usage in MDC/OpenTracing? I haven't had the chance to try it myself yet. Ideally we could support these integrations without having to isolate manually

@oleg-py
Copy link
Collaborator

oleg-py commented Jul 15, 2019

@Avasil

I tried to do Local.isolate { getAndPut } but it didn't change anything.

You should do TaskLocal.isolate, not Local.isolate.

@Avasil
Copy link
Collaborator

Avasil commented Jul 15, 2019

Thanks @oleg-py , it works with TaskLocal.isolate

Actually I'm starting to like the current model, at least the situation is clear and not full of special cases.
I suppose it's intended @mdedetrich

The only thing I don't like is if it requires users to do TaskLocal.isolate { } but I feel like it should be either not necessary (separate task running in response to HTTP request so Local is not shared) or very rare (isolating just once at the top) in pretty much all of the use cases so not that big of an issue.

@mdedetrich
Copy link
Contributor Author

mdedetrich commented Jul 15, 2019

Well the issue is more that people using Future with a TracingScheduler are now expected to use a TaskLocal.isolate for something that has nothing to do with Task.

For example with OpenTracing, if you have concurrent Future's that creates a new span based on a span context which is currently stored in TaskLocal, it should just work and with other tracing solutions it does.

It might mean in the context of OpenTracing that we always do TaskLocal.isolate, I have to digest the changes I guess. The idea (at least in context of the current code base) is that other projects that use Future which have no concept of Task (i.e. they might even be other libraries that you have no control over) can just use a TracingScheduler and it would work.

@oleg-py
Copy link
Collaborator

oleg-py commented Jul 15, 2019

Local.isolate should work with Futures due to their eagerness.

testAsync("captures snapshot in actual async execution") {

It's lazy structures like Task that require a different method of isolation

@Avasil Avasil modified the milestones: 3.0.0-RC3, 3.0.0 Jul 18, 2019
@Avasil
Copy link
Collaborator

Avasil commented Jul 18, 2019

@mdedetrich when using Task it's a little better because if we run tasks separately instead of relying on Task parallelism, the locals will be isolated automatically. So replying to requests in Akka HTTP (because of .runToFuture), http4s too I think, Observable.mapEval etc.

Though it might be cumbersome to enable localContextPropagation at times. What do you think about enabling it by default? @alexandru @oleg-py

BTW @mdedetrich are there cases with OpenTracing with a lot of modification to Local ? Do you have an example (doesn't have to be code, just description) ?

@alexandru
Copy link
Member

alexandru commented Jul 28, 2019

With this PR the "local context propagation" would effectively be enabled all the time if a TracingScheduler is used:

#960

@alexandru
Copy link
Member

Would it fix the problem?

@Avasil
Copy link
Collaborator

Avasil commented Jul 28, 2019

@alexandru I dont think so, it is a consequence of current model. Local is shared by default unless you isolate it. It is done in Task.runXXX but not Task.gather and other parallel operators and failing test uses gather.

@mdedetrich
Copy link
Contributor Author

Hey guys I was sick last week with a viral infection, will resume looking into this

@Avasil
Copy link
Collaborator

Avasil commented Jul 29, 2019

Thank you a lot @mdedetrich and hope you're feeling better! Feedback from you is very valuable.

@Avasil Avasil modified the milestones: 3.0.0, 3.1.0 Aug 6, 2019
@mdedetrich
Copy link
Contributor Author

mdedetrich commented Dec 25, 2019

Okay so I finally had time to get back to this, I have updated OpenTracing to the latest version as well as fixing the tests and here are the issues that I have found. Basically the following type of tests fail

  1. Do a computation in Task
  2. Run the task with Task.runToFutureOpt
  3. And then do future operations with Future inside a for comprehension

There are two variants of these tests

In both cases, the issue seems to be that when you are have a computation inside of aTask {...} and then run that Task with runToFutureOpt, this doesn't appear to register inside the Local for the task. For example https://github.com/mdedetrich/monix-opentra)cing/blob/master/core/src/test/scala/FutureTaskTracingSchedulerSpec.scala#L78 fails with a NullPointerException but thats because even though https://github.com/mdedetrich/monix-opentracing/blob/master/core/src/test/scala/FutureTaskTracingSchedulerSpec.scala#L66-L69 executes, nothing ends up being stored in the Local.

Note that the same type of test also fails in monix-mdc (https://github.com/mdedetrich/monix-mdc/blob/master/src/test/scala/MDCFutureSpec.scala#L37-L50) so its not really specific to OpenTracing.

Apart from this, everything seems to work as expected in latest version of Monix

@Avasil
Copy link
Collaborator

Avasil commented Jan 4, 2020

@mdedetrich @alexandru @oleg-py

  • Do a computation in Task
  • Run the task with Task.runToFutureOpt
  • And then do future operations with Future inside a for comprehension

This is the following scenario:

val local = Local(0)

for {
  _ <- Task(local.update(1)).runToFutureOpt
  x <- Future(local.get)
} yield assertEquals(x, 1)

local.get doesn't register update because running a Task automatically isolates it.
I think it is desired (e.g. running to Future as a response to request in Akka HTTP) behavior overalI but I wonder if we could/want to make it work for cases like above.

@mdedetrich
Copy link
Contributor Author

mdedetrich commented Jan 4, 2020

running to Future as a response to request in Akka HTTP

The thing is even in cases like this you may want to propagate the Local rather than isolating it because even in cases like akka-http you often give the future to another function which may want to access the data inside the Local.

This is actually a case with monix-mdc, i.e. you will put values into the MDC context but unmarshallers may end up logging values from the MDC context and this is after the Task has already been converted to a Future.

@Avasil
Copy link
Collaborator

Avasil commented Jan 5, 2020

That's a good use case, I will see what we can do about it without removing automatic isolation where it makes sense

@mdedetrich
Copy link
Contributor Author

Note that I am fine with enabling this as an option if you don't want to break any backwards behavior (I would just have to document that you need 2 options available, the new one along with enableLocalContextPropagation).

I do question if there is any actual benefit to isolation in this case, I don't see a problem with making the Local available in the same Thread where runToFutureOpt is executed so the Future can pick it up.

@mdedetrich
Copy link
Contributor Author

@Avasil Is there a chance this would be merged by 3.2.x?

@Avasil
Copy link
Collaborator

Avasil commented Feb 17, 2020

@mdedetrich Thanks for the reminder, I hope we can!

Any thoughts @alexandru @oleg-py ?
I didn't look too closely into it yet but I hope there is a way to have both (auto-isolate + proper context in the Future)

If we can't then I guess we need to provide configs like in #1044

@mdedetrich
Copy link
Contributor Author

So a PR has been set up to solve this issue, it creates another option called isolateLocalContextOnRun which if you set to false will keep the Local context even after a Future is run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants