Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds JDK Flight Recorder context correlation #843

Merged
merged 7 commits into from Dec 12, 2018

Conversation

Projects
None yet
7 participants
@adriancole
Copy link
Contributor

adriancole commented Dec 11, 2018

This adds trace and span IDs to JDK Flight Recorder "Scope"
events so that you can correlate with Zipkin UI or logs accordingly.

This currently requires JDK 11. JVMs that use this need to add the below
to their VM start arguments:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

To view the flight recordings, you need JDK Mission Control 7.

With the above in place, you can configure brave.Tracing with
JfrScopeDecorator like so:

tracing = Tracing.newBuilder()
    .currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder()
       .addScopeDecorator(JfrScopeDecorator.create())
       .build()
    )
    ...
    .build();

After a flight is recorded, you can look for "Zipkin/Scope" in the
Event browser like so:

flight recording

Users could then copy/paste the trace ID into the zipkin UI, or use log
correlation to further debug a problem.

Thanks to @thegreystone for hints on how to get this working!

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

I orginally found out about flight recorder events when I saw opentracing decided not to expose parent span ID eventhough it is frequently used in correlation systems (as is the sampled flag) opentracing/opentracing-java#323. This led me to a JFR wrapper https://github.com/opentracing-contrib/java-jfr-tracer/tree/master/opentracing-jfr-tracer

After a chat with @thegreystone I noticed the JFR team had no ambition to be exclusively used with opentracing, so figured I would post the far simpler code which works with or without an opentracing bridge.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

PS since no-one asked for this, yet, we need at least 3 sites to thumbsup it. Zipkin has a policy of not adding features unless end users ask for them.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

PS screen shots are from the https://github.com/openzipkin/brave-webmvc-example which uses @anuraaga's nice scope decorator design as it allows you to add several types of correlation, in this case SLF4J and also JFR without wrapping the tracer or a current context component.

*/
public final class JfrScopeDecorator implements ScopeDecorator {

@Category("Zipkin")

This comment has been minimized.

Copy link
@adriancole

adriancole Dec 11, 2018

Author Contributor

super cool that from here down is less than 50loc. it is very easy to implement this, almost copy/pasteable!

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

PS the phrase "Scope Event" looks awkward in the list as other events have nicer names and don't say "event" either. Ex "Socket Read" "Allocation in new TLAB". I only used that name from the opentracing example.

It might be best to name it something like "Span in Scope", which matches the actual operation.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

@nicmunroe FYI this is fun and doesn't rely on out-of-band data. It might work well for you in wingtips.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

I will rename "Scope Event" to "Span in Scope" unless someone says they hate it :P

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

This is more reaction than most features we've done lately :P will merge soon!

@schrepfler

This comment has been minimized.

Copy link

schrepfler commented Dec 11, 2018

Being a ThreadLocal, I suppose this would be of limited applicability to things like Akka, Twitter Future or Monix Task? Can we somehow make it generic/pluggable to be usable with other concurrency mechanisms (also Fibers will be coming our way).

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

@schrepfler the implementation is not dependent on thread local, it is actually a scoping api (try/finally). if the scoping doesn't work you have bigger problems than JFR :P

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

Is there a problem with keeping this as part of the OpenTracing community? Taking the idea wholesale (http://hirt.se/blog/?p=1081, https://oracle.rainfocus.com/widget/oracle/oow18/catalogcodeone18?search=hirt) and jamming it into Brave, apart from making me feel slightly violated, makes duplicate events possible if running with the OpenTracing contrib part. Even worse, the JMC specific extensions we're planning will be for the open tracing events.

@devinsba

This comment has been minimized.

Copy link
Member

devinsba commented Dec 11, 2018

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

#846 for reducing the cost of hex as this would be used alongside MDC and if care isn't taken overhead can be crap.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

@devinsba anything derived from CurrentTraceContext will work including executors.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

@thegreystone I cited your work, maybe not all of the links you just did, but attempted to!

The code here does not prevent opentracing from doing anything, unless I somehow misunderstand. We are using a separate category, clearly marked as Zipkin. How does that interfere?

The below is the only code similar to what you have. Are you saying that using the jdk.jfr package in this way is somehow proprietary?

  @Category("Zipkin")
  @Label("Span In Scope")
  @Description("Zipkin event representing a span being placed in scope")
  static final class SpanInScope extends Event {
    @Label("Trace Id") String traceId;
    @Label("Parent Id") String parentId;
    @Label("Span Id") String spanId;
  }

adriancole added some commits Dec 11, 2018

Adds JDK Flight Recorder context correlation
This adds trace and span IDs to JDK Flight Recorder "Scope Events" so
that you can correlate with Zipkin UI or logs accordingly.

This currently requires JDK 11. JVMs that use this need to add the below
to their VM start arguments:
```bash
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder
```

To view the flight recordings, you need [JDK Mission Control 7](http://jdk.java.net/jmc/).

With the above in place, you can configure `brave.Tracing` with
`JfrScopeDecorator` like so:

```java
tracing = Tracing.newBuilder()
    .currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder()
       .addScopeDecorator(JfrScopeDecorator.create())
       .build()
    )
    ...
    .build();
```

After a flight is recorded, you can look for "Zipkin/Scope Event" in the
Event browser like so:

<img width="1021" alt="flight recording" src="https://user-images.githubusercontent.com/64215/49773912-f0328b00-fd2d-11e8-9acd-26b82694aea9.png">

Users could then copy/paste the trace ID into the zipkin UI, or use log
correlation to further debug a problem.

Thanks to @thegreystone for hints on how to get this working!
@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

This currently requires JDK 11. JVMs that use this need to add the below
to their VM start arguments:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

--> Wrong. -XX:+UnlockCommercialFeatures is not supported on JDK 11. You should absolutely not add -XX:+UnlockCommercialFeatures.

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

The code here does not prevent opentracing from doing anything, unless I somehow misunderstand. >We are using a separate category, clearly marked as Zipkin. How does that interfere?

It does not, but it produces redundant data, but perhaps that is okay.

The below is the only code similar to what you have. Are you saying that using the jdk.jfr package > in this way is somehow proprietary?

No, it is fine. I should probably take a close look at the code. The problem starts on the JMC side of things - we were planning on building nice support for distributed tracers in JMC itself. If every tracer runs off and does custom integration, it will be hard for us to do this nicely. I wish there was a standardised nice decorator API that was shared among the distributed tracers, but there is not.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

@thegreystone brave isn't exactly "any tracer" it is the most used library, far more used than opentracing. We do have users that use the opentracing bridge, but it will never be an entrypoint for all apps due to various reasons mentioned in their issues list the last few years, plus not everyone wants to use it.

Ideally, JMC will like to work with zipkin as it is the largest ecosystem... if you'd prefer not to, that's ok, too.. your call! Meanwhile, people here have already engaged in your ideas far more than the other project. So, take the win?

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

I was going to merge this today, but since there's some mild drama will wait a day

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

I was going to merge this today, but since there's some mild drama will wait a day

Sorry for providing some drama. I can also provide review feedback. The latter will hopefully be useful.

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

Want me to leave the review comments here, like I did with the unlock commercial features one?

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

thanks @thegreystone looking forward to it.

FYI as I noticed even in our impl overhead is crap if you are correlating multiple things, you can't just paste https://github.com/openzipkin/brave/pull/843/files#diff-2c4683e99993fb2e9fe338cf2ded9b35 you'll need to build this or use latest snapshot before pasting.

Some technical note is that I left out JDK 1.8 as it used jrocket apis which seem scary to reason with at the moment. Better for folks to ask later for that, but it would complicate the build to try, so we might simply not do it and instead ask someone to copy/paste similarly.. time will tell.

Another thing in your impl I didn't do was dual-recording span timing also into flight recorder. Currently our users of correlation don't require this and it is more complicated and higher overhead to attempt. So, anyway hope these thoughts help add color to what I saw in your work, but didn't attempt in this PR.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

@thegreystone I have no idea why you can't make a line comment.. I quickly tried to find if there's even a setting to restrict this and couldn't. We've never restricted line comments intentionally. sorry it isn't working for whatever reason.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

renamed to Scope, that's a better name. Thanks @thegreystone. I'll wait until other feedback is in before re-doing the screen shot.

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

Also fix the comment about commercial features - for JDK 11 Oracle donated flight recorder etc. The flag will give you a warning now, and since you only support JDK 11 with this patch (not JDK 8) the comment should be changed.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

wrt "about commercial features" will nix.. easier anyway!

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

We're being stupid in our implementation too...

For the scope event, just call commit:
No need for this:
if (!event.shouldCommit()) return;
event.end();
event.commit();

Also, in JDK 11 - no need to end the event first. Just call commit.

.travis.yml Outdated
@@ -10,7 +10,7 @@ cache:
language: java

jdk:
- oraclejdk9
- oraclejdk11

This comment has been minimized.

Copy link
@anuraaga

anuraaga Dec 11, 2018

Contributor

Just wondering, should we make this openjdk11? I guess it might be causing the confusion about commercial features. Targeting the baseline seems reasonable

This comment has been minimized.

Copy link
@thegreystone

thegreystone Dec 11, 2018

Yes. Definitely.

This comment has been minimized.

Copy link
@adriancole

adriancole Dec 11, 2018

Author Contributor

ah yeah that was unintentional.. I just needed something 11 :P can try to change to openjdk11 presuming travis has it

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

Good thing about this is that I get to clean up my code. ;)

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

Any chance all you tracer folks could get together and agree on a nice vendor neutral decorator API? ;)

scope.close();
if (!event.shouldCommit()) return;
event.end();
event.commit();

This comment has been minimized.

Copy link
@thegreystone

thegreystone Dec 11, 2018

No need to check shouldCommit, no need to end before commit (in JDK 11). Simply just call event.commit()

.travis.yml Outdated
@@ -10,7 +10,7 @@ cache:
language: java

jdk:
- oraclejdk9
- oraclejdk11

This comment has been minimized.

Copy link
@thegreystone
@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

Any chance all you tracer folks could get together and agree on a nice vendor neutral decorator API? ;)

you mean besides ours? :P anyway feel free to propose our design to opentracing. they sometimes accept change requests.

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

@thegreystone I have no idea why you can't make a line comment

Seems to work now. Weird.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

ok I think I got all the feedback in, but check me. I need to be away for a 90m meeting then sleep. Anything else will address tomorrow.

Thanks @thegreystone this is better thanks to you!

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

you mean besides ours? :P anyway feel free to propose our design to opentracing. they sometimes >accept change requests.

Well, it would need to be a common one shared among all participating tracers. So, in a namespace that you could all agree on. But it would certainly make life easier.

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

Thanks @thegreystone this is better thanks to you!

Aside from "existing", but I'll take it. ;) I still think it would be better if we could somehow find a vendor independent way of doing this.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 11, 2018

@thegreystone I'll do you a favor and not dig into my opinion on how in the world people claim a group led by end users is vendor biased, and yet things run by vendors claim they are neutral! Let's leave that for a different github issue :D

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

Hey, I'm not a tracing guy, so I don't care how you all go about standardising and/or agreeing on standards. :) My life would just be easier if you do.

@anuraaga
Copy link
Contributor

anuraaga left a comment

It's cool connecting profilers and tracing like this. At some point I'm going to need to see if it's applicable to this too - https://github.com/GoogleCloudPlatform/cloud-profiler-java

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Dec 11, 2018

It's cool connecting profilers and tracing like this.

That said, JFR is much more than a profiler.

@adriancole adriancole merged commit 7a428c8 into master Dec 12, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@adriancole adriancole deleted the jfr branch Dec 12, 2018

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Dec 12, 2018

out in 5.6.0 https://github.com/openzipkin/brave/tree/master/context/jfr (added explicit credits, though we should also not forget that @thegreystone is also to thank for JMC itself!)

@felixbarny

This comment has been minimized.

Copy link

felixbarny commented Jan 22, 2019

This is really cool :)

@thegreystone Is it possible to correlate events with method profiling? I'm thinking about implementing a feature to create a flame graph of a particular trace/span.

@adriancole Is it guaranteed that the activation and corresponding deactivation is on the same thread? If so, you might be able to reuse the event object in a thread local. Because spans can be nested, it would have to be something like a ThreadLocal<List<ScopeEvent>>. And then there is the problem of class loader leaks with thread locals...

@schrepfler

This comment has been minimized.

Copy link

schrepfler commented Jan 22, 2019

Would be good to cover also Monix TaskLocal (https://medium.com/pragmatic-scala/local-variable-propagation-with-monix-part-1-106b80e652bb ) and related Twitter's Contexts https://twitter.github.io/finagle/guide/Contexts.html CC'ng @alexandru and @cacoco for awareness.

@nicmunroe

This comment has been minimized.

Copy link

nicmunroe commented Jan 22, 2019

@nicmunroe FYI this is fun and doesn't rely on out-of-band data. It might work well for you in wingtips.

Thanks for the heads up! I'll definitely need to take a look.

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Jan 22, 2019

@thegreystone Is it possible to correlate events with method profiling? I'm thinking about implementing a feature to create a flame graph of a particular trace/span.

https://github.com/thegreystone/jmc-flame-view

@thegreystone

This comment has been minimized.

Copy link

thegreystone commented Jan 22, 2019

So Flame View, plus techniques hinted at in this article, is what you are looking for:
hirt.se/blog/?p=1081

Then you can simply select a Trace ID looking interesting, "Store and set as active selection", and then you can look at a Flame View over not only method profiling events (though you can do that too), but a Flame View of the aggregated stack traces for the allocation hotspots, or whatever else you fancy.

Though, honestly, I did the Flame View mostly for fun. I find the normal stack trace view in JMC to be much faster to work with.

@adriancole

This comment has been minimized.

Copy link
Contributor Author

adriancole commented Jan 22, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.