
Add Storage.CrossThreadLocal #961

Merged: 2 commits merged into kamon-io:master on Mar 24, 2021
Conversation

jatcwang (Contributor) commented Mar 20, 2021

This storage implements the "unoptimized" version of thread-local storage, which allows the scope to be closed on another thread because it re-resolves the thread-local context in `close`.

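To make the mechanism concrete, here is a minimal sketch of that idea (illustrative only, with simplified stand-ins for Kamon's Context and Scope; this is not the actual Storage.CrossThreadLocal code):

```scala
final case class Context(tags: Map[String, String])
object Context { val Empty: Context = Context(Map.empty) }

trait Scope extends AutoCloseable

object CrossThreadLocalSketch {
  private val tls = new ThreadLocal[Context] {
    override def initialValue(): Context = Context.Empty
  }

  def current(): Context = tls.get()

  def store(newContext: Context): Scope = {
    val previous = tls.get()
    tls.set(newContext)
    new Scope {
      // close() re-resolves the ThreadLocal: tls.set() writes the slot of
      // whichever thread calls close(), so closing from another thread
      // cannot write into the creating thread's context cell.
      override def close(): Unit = tls.set(previous)
    }
  }
}
```
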
Rerunning the benchmark:

[info] # JMH version: 1.21
[info] # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM GraalVM CE 21.0.0.2, 25.282-b07-jvmci-21.0-b06
[info] # VM invoker: /usr/lib/jvm/java-8-graalvm/jre/bin/java
[info] # VM options: -XX:MaxInlineLevel=24 -XX:MaxInlineSize=270
[info] # Warmup: 3 iterations, 10 s each
[info] # Measurement: 5 iterations, 10 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 2 threads, will synchronize iterations
[info] # Benchmark mode: Average time, time/op
[info] # Benchmark: kamon.bench.ThreadLocalStorageBenchmark.crossThreadLocal
[info] # Run progress: 0.00% complete, ETA 00:02:40
[info] # Fork: 1 of 1
[info] Picked up JAVA_TOOL_OPTIONS: -XX:MaxInlineLevel=24 -XX:MaxInlineSize=270
[info] # Warmup Iteration   1: 13.610 ns/op
[info] # Warmup Iteration   2: 13.379 ns/op
[info] # Warmup Iteration   3: 14.535 ns/op
[info] Iteration   1: 14.325 ns/op
[info] Iteration   2: 14.308 ns/op
[info] Iteration   3: 14.341 ns/op
[info] Iteration   4: 14.323 ns/op
[info] Iteration   5: 14.383 ns/op
[info] Result "kamon.bench.ThreadLocalStorageBenchmark.crossThreadLocal":
[info]   14.336 ±(99.9%) 0.110 ns/op [Average]
[info]   (min, avg, max) = (14.308, 14.336, 14.383), stdev = 0.029
[info]   CI (99.9%): [14.226, 14.446] (assumes normal distribution)
[info] # JMH version: 1.21
[info] # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM GraalVM CE 21.0.0.2, 25.282-b07-jvmci-21.0-b06
[info] # VM invoker: /usr/lib/jvm/java-8-graalvm/jre/bin/java
[info] # VM options: -XX:MaxInlineLevel=24 -XX:MaxInlineSize=270
[info] # Warmup: 3 iterations, 10 s each
[info] # Measurement: 5 iterations, 10 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 2 threads, will synchronize iterations
[info] # Benchmark mode: Average time, time/op
[info] # Benchmark: kamon.bench.ThreadLocalStorageBenchmark.fastThreadLocal
[info] # Run progress: 50.00% complete, ETA 00:01:20
[info] # Fork: 1 of 1
[info] Picked up JAVA_TOOL_OPTIONS: -XX:MaxInlineLevel=24 -XX:MaxInlineSize=270
[info] # Warmup Iteration   1: 12.372 ns/op
[info] # Warmup Iteration   2: 12.327 ns/op
[info] # Warmup Iteration   3: 12.380 ns/op
[info] Iteration   1: 12.363 ns/op
[info] Iteration   2: 12.541 ns/op
[info] Iteration   3: 12.448 ns/op
[info] Iteration   4: 13.155 ns/op
[info] Iteration   5: 13.782 ns/op
[info] Result "kamon.bench.ThreadLocalStorageBenchmark.fastThreadLocal":
[info]   12.858 ±(99.9%) 2.323 ns/op [Average]
[info]   (min, avg, max) = (12.363, 12.858, 13.782), stdev = 0.603
[info]   CI (99.9%): [10.534, 15.181] (assumes normal distribution)
[info] # Run complete. Total time: 00:02:40
[info] Benchmark                                     Mode  Cnt   Score   Error  Units
[info] ThreadLocalStorageBenchmark.crossThreadLocal  avgt    5  14.336 ± 0.110  ns/op
[info] ThreadLocalStorageBenchmark.fastThreadLocal   avgt    5  12.858 ± 2.323  ns/op

TBH, I don't think the gain from the optimized thread-local implementation is worth keeping it (the large variability of fastThreadLocal is especially interesting). But I'm biased: I primarily work with the Typelevel stack, so I'd like to minimize the steps required to get it working with Kamon, especially since failing to do so leads to horrible cross-thread context contamination.

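For illustration, this is the shape of the problem (a hypothetical example reusing the CrossThreadLocalSketch above, not Kamon's instrumentation): the scope is opened on one thread and closed on another, as happens when a cats-effect or Monix fiber migrates threads.

```scala
import java.util.concurrent.Executors

object CrossThreadCloseExample extends App {
  val pool = Executors.newFixedThreadPool(1)

  // Open the scope on the main thread...
  val scope = CrossThreadLocalSketch.store(Context(Map("traceID" -> "abc")))

  // ...but close it on a pool thread, as a fiber runtime would after
  // shifting execution. CrossThreadLocal restores the closing thread's
  // slot; a storage that captured the creating thread's cell would
  // contaminate that thread's context instead.
  pool.submit(new Runnable {
    override def run(): Unit = scope.close()
  })
  pool.shutdown()
}
```
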
Commit: This storage implements the "unoptimized" version of thread local storage which allows for the scope to be closed in another thread, as it re-resolves the thread local context in `close`

dpsoft (Contributor) commented Mar 22, 2021

@jatcwang looks good to me!

btw, on my local machine I got these bench results:

[info] # JMH version: 1.25
[info] # VM version: JDK 1.8.0_252, OpenJDK 64-Bit Server VM, 25.252-b09
[info] # VM invoker: /Users/dparra/.sdkman/candidates/java/8.0.252.hs-adpt/jre/bin/java
[info] # VM options: <none>
[info] # Warmup: 5 iterations, 10 s each
[info] # Measurement: 10 iterations, 10 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 2 threads, will synchronize iterations
...
[info] Benchmark                                         Mode  Cnt   Score   Error  Units
[info] ThreadLocalStorageBenchmark.currentThreadLocal    avgt   50  17.712 ± 0.425  ns/op
[info] ThreadLocalStorageBenchmark.fastThreadLocal       avgt   50  13.102 ± 0.255  ns/op

the variability could be caused by some background process; if I don't close Chrome, my benchmarks are garbage. :(

jatcwang (Author) commented Mar 22, 2021

Cool. @dpsoft, can you retrigger CI? It seems like a flaky test unrelated to my changes.

dpsoft (Contributor) commented Mar 22, 2021

I just ran the tests again!

SimunKaracic (Contributor) left a comment

Great PR, nice job.

I would only change the sys.props thing, otherwise it's good to go!

@@ -141,8 +141,10 @@ object ContextStorage {
    * instrumentation follows them around.
    */
   private val _contextStorage: Storage = {
-    if(sys.props("kamon.context.debug") == "true")
+    if (sys.props("kamon.context.debug") == "true")
SimunKaracic (Contributor) replied:

Now that the choice is between a couple of actual implementations, this should probably be loaded from the configuration file, not the environment.

ivantopo (Contributor) commented:

Folks, I'm thinking that we should make CrossThreadLocal the default context storage implementation. The reasons are:

  • It will not affect the behavior of any of the manual/automatic instrumentation. We always aimed to close scopes on the same thread where we created them, and in those situations this implementation produces the same results (see the sketch after this list). In 99% of the cases we need to close the scope on the same thread we created it on anyway; the Cats/Monix implementations are an exception to that common case, so we will continue to recommend closing scopes on the same thread unless there is a very clear reason for doing otherwise, as there is for Cats/Monix.
  • It makes it easier to get started. If there is one thing I have learned over the years, it is that users almost never read the fine print 😄.. if users need to add a JVM property, but that property is controlled from the outside world (IDE, command-line parameters, or container orchestrators), there will be several cases of things "not working" in certain environments. I would rather go with an option that works by default for everyone.
  • In the specific cases where someone needs that little boost in performance (I expect very few cases here), they can add the extra JVM options to use the optimized TLS.
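
To make the first point concrete, this is the common same-thread pattern, written as a small helper over the CrossThreadLocalSketch types from earlier (illustrative names, not Kamon's API):

```scala
object SameThreadUsage {
  // Create and close the scope on the same thread: the optimized and the
  // CrossThreadLocal storages behave identically in this pattern.
  def withContext[A](context: Context)(body: => A): A = {
    val scope = CrossThreadLocalSketch.store(context)
    try body
    finally scope.close() // closed on the same thread that called store()
  }
}
```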

What do you think about that?

SimunKaracic (Contributor) replied:

This all sounds reasonable, and simplifying semantics is always good!
I'm just scared because this is a "breaking" change, and I have no intuition for how often this code is called.
If you're sure that we won't see a big performance drop, let's do it; just make sure to document it clearly and make note of it in the release notes.

dpsoft (Contributor) commented Mar 23, 2021

I agree 100% that the default context storage should be CrossThreadLocal, but I would leave the optimized one as an option for experienced/power users.

jatcwang (Author) replied:

I totally agree, @ivantopo (makes my life easier, that's for sure ;)). Happy to make the change if @SimunKaracic agrees too.

SimunKaracic (Contributor) replied:

I agree 🤝

jatcwang (Author) commented Mar 23, 2021

This is done.
@SimunKaracic I did not make it a config setting because the kamon.context.Storage.Debug docstring says we don't allow this to be discovered from configuration, since it can cause initialization issues when Kamon is first initialized via instrumentation trying to access the current Context; I think that still applies. I added a kamon.context.storageType system property though.

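For reference, a sketch of how that selection could look. Only the two system property names (kamon.context.debug from the diff above, kamon.context.storageType from this comment) come from the PR; the accepted values, constructor shapes, and stand-in types below are assumptions, and the real logic lives in kamon-core.

```scala
object ContextStorageSelectionSketch {
  // Stand-ins so the sketch compiles on its own; in Kamon these live in
  // kamon.context.Storage.
  sealed trait Storage
  object Storage {
    case class Debug() extends Storage
    case class ThreadLocal() extends Storage       // optimized TLS
    case class CrossThreadLocal() extends Storage  // new default
  }

  val contextStorage: Storage =
    if (sys.props.get("kamon.context.debug").contains("true"))
      Storage.Debug()
    else
      sys.props.get("kamon.context.storageType") match {
        case Some("ThreadLocal") => Storage.ThreadLocal()
        case _                   => Storage.CrossThreadLocal()
      }
}
```
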
core/kamon-core/src/main/scala/kamon/context/Storage.scala: review comments resolved

Commit:
- Fix Debug storage to allow closing the scope in another thread without context contamination.
- Add a system property to choose a different context storage type

jatcwang (Author) commented:
Can someone please retrigger the CI? It seems like a flaky test.

SimunKaracic merged commit ce27fb5 into kamon-io:master on Mar 24, 2021
SimunKaracic (Contributor) commented:
Thank you for the contribution! :D
