Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds JDK Flight Recorder context correlation #843

Merged
merged 7 commits into from Dec 12, 2018
Merged

Adds JDK Flight Recorder context correlation #843

merged 7 commits into from Dec 12, 2018

Conversation

codefromthecrypt
Copy link
Member

@codefromthecrypt codefromthecrypt commented Dec 11, 2018

This adds trace and span IDs to JDK Flight Recorder "Scope"
events so that you can correlate with Zipkin UI or logs accordingly.

This currently requires JDK 11. JVMs that use this need to add the below
to their VM start arguments:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

To view the flight recordings, you need JDK Mission Control 7.

With the above in place, you can configure brave.Tracing with
JfrScopeDecorator like so:

tracing = Tracing.newBuilder()
    .currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder()
       .addScopeDecorator(JfrScopeDecorator.create())
       .build()
    )
    ...
    .build();

After a flight is recorded, you can look for "Zipkin/Scope" in the
Event browser like so:

flight recording

Users could then copy/paste the trace ID into the zipkin UI, or use log
correlation to further debug a problem.

Thanks to @thegreystone for hints on how to get this working!

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Dec 11, 2018

I orginally found out about flight recorder events when I saw opentracing decided not to expose parent span ID eventhough it is frequently used in correlation systems (as is the sampled flag) opentracing/opentracing-java#323. This led me to a JFR wrapper https://github.com/opentracing-contrib/java-jfr-tracer/tree/master/opentracing-jfr-tracer

After a chat with @thegreystone I noticed the JFR team had no ambition to be exclusively used with opentracing, so figured I would post the far simpler code which works with or without an opentracing bridge.

@codefromthecrypt
Copy link
Member Author

PS since no-one asked for this, yet, we need at least 3 sites to thumbsup it. Zipkin has a policy of not adding features unless end users ask for them.

@codefromthecrypt
Copy link
Member Author

PS screen shots are from the https://github.com/openzipkin/brave-webmvc-example which uses @anuraaga's nice scope decorator design as it allows you to add several types of correlation, in this case SLF4J and also JFR without wrapping the tracer or a current context component.

*/
public final class JfrScopeDecorator implements ScopeDecorator {

@Category("Zipkin")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super cool that from here down is less than 50loc. it is very easy to implement this, almost copy/pasteable!

@codefromthecrypt
Copy link
Member Author

PS the phrase "Scope Event" looks awkward in the list as other events have nicer names and don't say "event" either. Ex "Socket Read" "Allocation in new TLAB". I only used that name from the opentracing example.

It might be best to name it something like "Span in Scope", which matches the actual operation.

@codefromthecrypt
Copy link
Member Author

@nicmunroe FYI this is fun and doesn't rely on out-of-band data. It might work well for you in wingtips.

@codefromthecrypt
Copy link
Member Author

I will rename "Scope Event" to "Span in Scope" unless someone says they hate it :P

@codefromthecrypt
Copy link
Member Author

This is more reaction than most features we've done lately :P will merge soon!

@schrepfler
Copy link

Being a ThreadLocal, I suppose this would be of limited applicability to things like Akka, Twitter Future or Monix Task? Can we somehow make it generic/pluggable to be usable with other concurrency mechanisms (also Fibers will be coming our way).

@codefromthecrypt
Copy link
Member Author

@schrepfler the implementation is not dependent on thread local, it is actually a scoping api (try/finally). if the scoping doesn't work you have bigger problems than JFR :P

@thegreystone
Copy link

thegreystone commented Dec 11, 2018

Is there a problem with keeping this as part of the OpenTracing community? Taking the idea wholesale (http://hirt.se/blog/?p=1081, https://oracle.rainfocus.com/widget/oracle/oow18/catalogcodeone18?search=hirt) and jamming it into Brave, apart from making me feel slightly violated, makes duplicate events possible if running with the OpenTracing contrib part. Even worse, the JMC specific extensions we're planning will be for the open tracing events.

@devinsba
Copy link
Member

devinsba commented Dec 11, 2018 via email

@codefromthecrypt
Copy link
Member Author

#846 for reducing the cost of hex as this would be used alongside MDC and if care isn't taken overhead can be crap.

@codefromthecrypt
Copy link
Member Author

@devinsba anything derived from CurrentTraceContext will work including executors.

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Dec 11, 2018

@thegreystone I cited your work, maybe not all of the links you just did, but attempted to!

The code here does not prevent opentracing from doing anything, unless I somehow misunderstand. We are using a separate category, clearly marked as Zipkin. How does that interfere?

The below is the only code similar to what you have. Are you saying that using the jdk.jfr package in this way is somehow proprietary?

  @Category("Zipkin")
  @Label("Span In Scope")
  @Description("Zipkin event representing a span being placed in scope")
  static final class SpanInScope extends Event {
    @Label("Trace Id") String traceId;
    @Label("Parent Id") String parentId;
    @Label("Span Id") String spanId;
  }

Adrian Cole added 5 commits December 11, 2018 22:22
This adds trace and span IDs to JDK Flight Recorder "Scope Events" so
that you can correlate with Zipkin UI or logs accordingly.

This currently requires JDK 11. JVMs that use this need to add the below
to their VM start arguments:
```bash
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder
```

To view the flight recordings, you need [JDK Mission Control 7](http://jdk.java.net/jmc/).

With the above in place, you can configure `brave.Tracing` with
`JfrScopeDecorator` like so:

```java
tracing = Tracing.newBuilder()
    .currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder()
       .addScopeDecorator(JfrScopeDecorator.create())
       .build()
    )
    ...
    .build();
```

After a flight is recorded, you can look for "Zipkin/Scope Event" in the
Event browser like so:

<img width="1021" alt="flight recording" src="https://user-images.githubusercontent.com/64215/49773912-f0328b00-fd2d-11e8-9acd-26b82694aea9.png">

Users could then copy/paste the trace ID into the zipkin UI, or use log
correlation to further debug a problem.

Thanks to @thegreystone for hints on how to get this working!
@thegreystone
Copy link

thegreystone commented Dec 11, 2018

This currently requires JDK 11. JVMs that use this need to add the below
to their VM start arguments:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

--> Wrong. -XX:+UnlockCommercialFeatures is not supported on JDK 11. You should absolutely not add -XX:+UnlockCommercialFeatures.

@thegreystone
Copy link

The code here does not prevent opentracing from doing anything, unless I somehow misunderstand. >We are using a separate category, clearly marked as Zipkin. How does that interfere?

It does not, but it produces redundant data, but perhaps that is okay.

The below is the only code similar to what you have. Are you saying that using the jdk.jfr package > in this way is somehow proprietary?

No, it is fine. I should probably take a close look at the code. The problem starts on the JMC side of things - we were planning on building nice support for distributed tracers in JMC itself. If every tracer runs off and does custom integration, it will be hard for us to do this nicely. I wish there was a standardised nice decorator API that was shared among the distributed tracers, but there is not.

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Dec 11, 2018

@thegreystone brave isn't exactly "any tracer" it is the most used library, far more used than opentracing. We do have users that use the opentracing bridge, but it will never be an entrypoint for all apps due to various reasons mentioned in their issues list the last few years, plus not everyone wants to use it.

Ideally, JMC will like to work with zipkin as it is the largest ecosystem... if you'd prefer not to, that's ok, too.. your call! Meanwhile, people here have already engaged in your ideas far more than the other project. So, take the win?

@codefromthecrypt
Copy link
Member Author

I was going to merge this today, but since there's some mild drama will wait a day

@thegreystone
Copy link

I was going to merge this today, but since there's some mild drama will wait a day

Sorry for providing some drama. I can also provide review feedback. The latter will hopefully be useful.

@thegreystone
Copy link

Want me to leave the review comments here, like I did with the unlock commercial features one?

@codefromthecrypt
Copy link
Member Author

thanks @thegreystone looking forward to it.

FYI as I noticed even in our impl overhead is crap if you are correlating multiple things, you can't just paste https://github.com/openzipkin/brave/pull/843/files#diff-2c4683e99993fb2e9fe338cf2ded9b35 you'll need to build this or use latest snapshot before pasting.

Some technical note is that I left out JDK 1.8 as it used jrocket apis which seem scary to reason with at the moment. Better for folks to ask later for that, but it would complicate the build to try, so we might simply not do it and instead ask someone to copy/paste similarly.. time will tell.

Another thing in your impl I didn't do was dual-recording span timing also into flight recorder. Currently our users of correlation don't require this and it is more complicated and higher overhead to attempt. So, anyway hope these thoughts help add color to what I saw in your work, but didn't attempt in this PR.

@thegreystone
Copy link

Don't have access to commenting directly. Perhaps that could be fixed?

@codefromthecrypt
Copy link
Member Author

@thegreystone I have no idea why you can't make a line comment.. I quickly tried to find if there's even a setting to restrict this and couldn't. We've never restricted line comments intentionally. sorry it isn't working for whatever reason.

@codefromthecrypt
Copy link
Member Author

renamed to Scope, that's a better name. Thanks @thegreystone. I'll wait until other feedback is in before re-doing the screen shot.

@thegreystone
Copy link

Also fix the comment about commercial features - for JDK 11 Oracle donated flight recorder etc. The flag will give you a warning now, and since you only support JDK 11 with this patch (not JDK 8) the comment should be changed.

@codefromthecrypt
Copy link
Member Author

wrt "about commercial features" will nix.. easier anyway!

@thegreystone
Copy link

We're being stupid in our implementation too...

For the scope event, just call commit:
No need for this:
if (!event.shouldCommit()) return;
event.end();
event.commit();

Also, in JDK 11 - no need to end the event first. Just call commit.

.travis.yml Outdated
@@ -10,7 +10,7 @@ cache:
language: java

jdk:
- oraclejdk9
- oraclejdk11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, should we make this openjdk11? I guess it might be causing the confusion about commercial features. Targeting the baseline seems reasonable

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Definitely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah that was unintentional.. I just needed something 11 :P can try to change to openjdk11 presuming travis has it

@thegreystone
Copy link

Good thing about this is that I get to clean up my code. ;)

@thegreystone
Copy link

Any chance all you tracer folks could get together and agree on a nice vendor neutral decorator API? ;)

scope.close();
if (!event.shouldCommit()) return;
event.end();
event.commit();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check shouldCommit, no need to end before commit (in JDK 11). Simply just call event.commit()

.travis.yml Outdated
@@ -10,7 +10,7 @@ cache:
language: java

jdk:
- oraclejdk9
- oraclejdk11

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openjdk11

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Dec 11, 2018

Any chance all you tracer folks could get together and agree on a nice vendor neutral decorator API? ;)

you mean besides ours? :P anyway feel free to propose our design to opentracing. they sometimes accept change requests.

@thegreystone
Copy link

@thegreystone I have no idea why you can't make a line comment

Seems to work now. Weird.

@codefromthecrypt
Copy link
Member Author

ok I think I got all the feedback in, but check me. I need to be away for a 90m meeting then sleep. Anything else will address tomorrow.

Thanks @thegreystone this is better thanks to you!

@thegreystone
Copy link

you mean besides ours? :P anyway feel free to propose our design to opentracing. they sometimes >accept change requests.

Well, it would need to be a common one shared among all participating tracers. So, in a namespace that you could all agree on. But it would certainly make life easier.

@thegreystone
Copy link

Thanks @thegreystone this is better thanks to you!

Aside from "existing", but I'll take it. ;) I still think it would be better if we could somehow find a vendor independent way of doing this.

@codefromthecrypt
Copy link
Member Author

@thegreystone I'll do you a favor and not dig into my opinion on how in the world people claim a group led by end users is vendor biased, and yet things run by vendors claim they are neutral! Let's leave that for a different github issue :D

@thegreystone
Copy link

Hey, I'm not a tracing guy, so I don't care how you all go about standardising and/or agreeing on standards. :) My life would just be easier if you do.

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cool connecting profilers and tracing like this. At some point I'm going to need to see if it's applicable to this too - https://github.com/GoogleCloudPlatform/cloud-profiler-java

@thegreystone
Copy link

It's cool connecting profilers and tracing like this.

That said, JFR is much more than a profiler.

@codefromthecrypt codefromthecrypt merged commit 7a428c8 into master Dec 12, 2018
@codefromthecrypt
Copy link
Member Author

out in 5.6.0 https://github.com/openzipkin/brave/tree/master/context/jfr (added explicit credits, though we should also not forget that @thegreystone is also to thank for JMC itself!)

@felixbarny
Copy link

This is really cool :)

@thegreystone Is it possible to correlate events with method profiling? I'm thinking about implementing a feature to create a flame graph of a particular trace/span.

@adriancole Is it guaranteed that the activation and corresponding deactivation is on the same thread? If so, you might be able to reuse the event object in a thread local. Because spans can be nested, it would have to be something like a ThreadLocal<List<ScopeEvent>>. And then there is the problem of class loader leaks with thread locals...

@schrepfler
Copy link

Would be good to cover also Monix TaskLocal (https://medium.com/pragmatic-scala/local-variable-propagation-with-monix-part-1-106b80e652bb ) and related Twitter's Contexts https://twitter.github.io/finagle/guide/Contexts.html CC'ng @alexandru and @cacoco for awareness.

@nicmunroe
Copy link

@nicmunroe FYI this is fun and doesn't rely on out-of-band data. It might work well for you in wingtips.

Thanks for the heads up! I'll definitely need to take a look.

@thegreystone
Copy link

@thegreystone Is it possible to correlate events with method profiling? I'm thinking about implementing a feature to create a flame graph of a particular trace/span.

https://github.com/thegreystone/jmc-flame-view

@thegreystone
Copy link

So Flame View, plus techniques hinted at in this article, is what you are looking for:
hirt.se/blog/?p=1081

Then you can simply select a Trace ID looking interesting, "Store and set as active selection", and then you can look at a Flame View over not only method profiling events (though you can do that too), but a Flame View of the aggregated stack traces for the allocation hotspots, or whatever else you fancy.

Though, honestly, I did the Flame View mostly for fun. I find the normal stack trace view in JMC to be much faster to work with.

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Jan 22, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants