New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds JDK Flight Recorder context correlation #843
Conversation
I orginally found out about flight recorder events when I saw opentracing decided not to expose parent span ID eventhough it is frequently used in correlation systems (as is the sampled flag) opentracing/opentracing-java#323. This led me to a JFR wrapper https://github.com/opentracing-contrib/java-jfr-tracer/tree/master/opentracing-jfr-tracer After a chat with @thegreystone I noticed the JFR team had no ambition to be exclusively used with opentracing, so figured I would post the far simpler code which works with or without an opentracing bridge. |
PS since no-one asked for this, yet, we need at least 3 sites to thumbsup it. Zipkin has a policy of not adding features unless end users ask for them. |
PS screen shots are from the https://github.com/openzipkin/brave-webmvc-example which uses @anuraaga's nice scope decorator design as it allows you to add several types of correlation, in this case SLF4J and also JFR without wrapping the tracer or a current context component. |
*/ | ||
public final class JfrScopeDecorator implements ScopeDecorator { | ||
|
||
@Category("Zipkin") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super cool that from here down is less than 50loc. it is very easy to implement this, almost copy/pasteable!
PS the phrase "Scope Event" looks awkward in the list as other events have nicer names and don't say "event" either. Ex "Socket Read" "Allocation in new TLAB". I only used that name from the opentracing example. It might be best to name it something like "Span in Scope", which matches the actual operation. |
@nicmunroe FYI this is fun and doesn't rely on out-of-band data. It might work well for you in wingtips. |
I will rename "Scope Event" to "Span in Scope" unless someone says they hate it :P |
c0cb6a2
to
976279e
Compare
This is more reaction than most features we've done lately :P will merge soon! |
Being a ThreadLocal, I suppose this would be of limited applicability to things like Akka, Twitter Future or Monix Task? Can we somehow make it generic/pluggable to be usable with other concurrency mechanisms (also Fibers will be coming our way). |
@schrepfler the implementation is not dependent on thread local, it is actually a scoping api (try/finally). if the scoping doesn't work you have bigger problems than JFR :P |
Is there a problem with keeping this as part of the OpenTracing community? Taking the idea wholesale (http://hirt.se/blog/?p=1081, https://oracle.rainfocus.com/widget/oracle/oow18/catalogcodeone18?search=hirt) and jamming it into Brave, apart from making me feel slightly violated, makes duplicate events possible if running with the OpenTracing contrib part. Even worse, the JMC specific extensions we're planning will be for the open tracing events. |
So,to clarify, works with wrapped executor services?
…________________________________
From: Adrian Cole <notifications@github.com>
Sent: Tuesday, December 11, 2018 8:40 AM
To: openzipkin/brave
Cc: Subscribed
Subject: Re: [openzipkin/brave] Adds JDK Flight Recorder context correlation (#843)
ex https://github.com/openzipkin/brave/blob/master/context/rxjava2/src/main/java/brave/context/rxjava2/internal/TraceContextSubscriber.java#L35
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub<#843 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAIiohNYJMGCNmAaXZ8iSkZxupSmCRECks5u37W1gaJpZM4ZMilX>.
|
#846 for reducing the cost of hex as this would be used alongside MDC and if care isn't taken overhead can be crap. |
@devinsba anything derived from |
@thegreystone I cited your work, maybe not all of the links you just did, but attempted to! The code here does not prevent opentracing from doing anything, unless I somehow misunderstand. We are using a separate category, clearly marked as Zipkin. How does that interfere? The below is the only code similar to what you have. Are you saying that using the @Category("Zipkin")
@Label("Span In Scope")
@Description("Zipkin event representing a span being placed in scope")
static final class SpanInScope extends Event {
@Label("Trace Id") String traceId;
@Label("Parent Id") String parentId;
@Label("Span Id") String spanId;
} |
This adds trace and span IDs to JDK Flight Recorder "Scope Events" so that you can correlate with Zipkin UI or logs accordingly. This currently requires JDK 11. JVMs that use this need to add the below to their VM start arguments: ```bash -XX:+UnlockCommercialFeatures -XX:+FlightRecorder ``` To view the flight recordings, you need [JDK Mission Control 7](http://jdk.java.net/jmc/). With the above in place, you can configure `brave.Tracing` with `JfrScopeDecorator` like so: ```java tracing = Tracing.newBuilder() .currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder() .addScopeDecorator(JfrScopeDecorator.create()) .build() ) ... .build(); ``` After a flight is recorded, you can look for "Zipkin/Scope Event" in the Event browser like so: <img width="1021" alt="flight recording" src="https://user-images.githubusercontent.com/64215/49773912-f0328b00-fd2d-11e8-9acd-26b82694aea9.png"> Users could then copy/paste the trace ID into the zipkin UI, or use log correlation to further debug a problem. Thanks to @thegreystone for hints on how to get this working!
05657e3
to
524bcb5
Compare
This currently requires JDK 11. JVMs that use this need to add the below -XX:+UnlockCommercialFeatures -XX:+FlightRecorder --> Wrong. -XX:+UnlockCommercialFeatures is not supported on JDK 11. You should absolutely not add -XX:+UnlockCommercialFeatures. |
It does not, but it produces redundant data, but perhaps that is okay.
No, it is fine. I should probably take a close look at the code. The problem starts on the JMC side of things - we were planning on building nice support for distributed tracers in JMC itself. If every tracer runs off and does custom integration, it will be hard for us to do this nicely. I wish there was a standardised nice decorator API that was shared among the distributed tracers, but there is not. |
@thegreystone brave isn't exactly "any tracer" it is the most used library, far more used than opentracing. We do have users that use the opentracing bridge, but it will never be an entrypoint for all apps due to various reasons mentioned in their issues list the last few years, plus not everyone wants to use it. Ideally, JMC will like to work with zipkin as it is the largest ecosystem... if you'd prefer not to, that's ok, too.. your call! Meanwhile, people here have already engaged in your ideas far more than the other project. So, take the win? |
I was going to merge this today, but since there's some mild drama will wait a day |
Sorry for providing some drama. I can also provide review feedback. The latter will hopefully be useful. |
Want me to leave the review comments here, like I did with the unlock commercial features one? |
thanks @thegreystone looking forward to it. FYI as I noticed even in our impl overhead is crap if you are correlating multiple things, you can't just paste https://github.com/openzipkin/brave/pull/843/files#diff-2c4683e99993fb2e9fe338cf2ded9b35 you'll need to build this or use latest snapshot before pasting. Some technical note is that I left out JDK 1.8 as it used jrocket apis which seem scary to reason with at the moment. Better for folks to ask later for that, but it would complicate the build to try, so we might simply not do it and instead ask someone to copy/paste similarly.. time will tell. Another thing in your impl I didn't do was dual-recording span timing also into flight recorder. Currently our users of correlation don't require this and it is more complicated and higher overhead to attempt. So, anyway hope these thoughts help add color to what I saw in your work, but didn't attempt in this PR. |
Don't have access to commenting directly. Perhaps that could be fixed? |
@thegreystone I have no idea why you can't make a line comment.. I quickly tried to find if there's even a setting to restrict this and couldn't. We've never restricted line comments intentionally. sorry it isn't working for whatever reason. |
renamed to Scope, that's a better name. Thanks @thegreystone. I'll wait until other feedback is in before re-doing the screen shot. |
Also fix the comment about commercial features - for JDK 11 Oracle donated flight recorder etc. The flag will give you a warning now, and since you only support JDK 11 with this patch (not JDK 8) the comment should be changed. |
wrt "about commercial features" will nix.. easier anyway! |
We're being stupid in our implementation too... For the scope event, just call commit: Also, in JDK 11 - no need to end the event first. Just call commit. |
.travis.yml
Outdated
@@ -10,7 +10,7 @@ cache: | |||
language: java | |||
|
|||
jdk: | |||
- oraclejdk9 | |||
- oraclejdk11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, should we make this openjdk11? I guess it might be causing the confusion about commercial features. Targeting the baseline seems reasonable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Definitely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah that was unintentional.. I just needed something 11 :P can try to change to openjdk11 presuming travis has it
Good thing about this is that I get to clean up my code. ;) |
Any chance all you tracer folks could get together and agree on a nice vendor neutral decorator API? ;) |
scope.close(); | ||
if (!event.shouldCommit()) return; | ||
event.end(); | ||
event.commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check shouldCommit, no need to end before commit (in JDK 11). Simply just call event.commit()
.travis.yml
Outdated
@@ -10,7 +10,7 @@ cache: | |||
language: java | |||
|
|||
jdk: | |||
- oraclejdk9 | |||
- oraclejdk11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
openjdk11
you mean besides ours? :P anyway feel free to propose our design to opentracing. they sometimes accept change requests. |
Seems to work now. Weird. |
ok I think I got all the feedback in, but check me. I need to be away for a 90m meeting then sleep. Anything else will address tomorrow. Thanks @thegreystone this is better thanks to you! |
Well, it would need to be a common one shared among all participating tracers. So, in a namespace that you could all agree on. But it would certainly make life easier. |
Aside from "existing", but I'll take it. ;) I still think it would be better if we could somehow find a vendor independent way of doing this. |
@thegreystone I'll do you a favor and not dig into my opinion on how in the world people claim a group led by end users is vendor biased, and yet things run by vendors claim they are neutral! Let's leave that for a different github issue :D |
Hey, I'm not a tracing guy, so I don't care how you all go about standardising and/or agreeing on standards. :) My life would just be easier if you do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's cool connecting profilers and tracing like this. At some point I'm going to need to see if it's applicable to this too - https://github.com/GoogleCloudPlatform/cloud-profiler-java
That said, JFR is much more than a profiler. |
out in 5.6.0 https://github.com/openzipkin/brave/tree/master/context/jfr (added explicit credits, though we should also not forget that @thegreystone is also to thank for JMC itself!) |
This is really cool :) @thegreystone Is it possible to correlate events with method profiling? I'm thinking about implementing a feature to create a flame graph of a particular trace/span. @adriancole Is it guaranteed that the activation and corresponding deactivation is on the same thread? If so, you might be able to reuse the event object in a thread local. Because spans can be nested, it would have to be something like a |
Would be good to cover also Monix |
Thanks for the heads up! I'll definitely need to take a look. |
|
So Flame View, plus techniques hinted at in this article, is what you are looking for: Then you can simply select a Trace ID looking interesting, "Store and set as active selection", and then you can look at a Flame View over not only method profiling events (though you can do that too), but a Flame View of the aggregated stack traces for the allocation hotspots, or whatever else you fancy. Though, honestly, I did the Flame View mostly for fun. I find the normal stack trace view in JMC to be much faster to work with. |
hi felix. the scope api is absolutely intended to be opened and closed on
the same thread. thanks for asking!
…On Tue, Jan 22, 2019, 10:03 PM Felix Barnsteiner ***@***.*** wrote:
This is really cool :)
@thegreystone <https://github.com/thegreystone> Is it possible to
correlate events with method profiling? I'm thinking about implementing a
feature to create a flame graph of a particular trace/span.
@adriancole <https://github.com/adriancole> Is it guaranteed that the
activation and corresponding deactivation is on the same thread? If so, you
might be able to reuse the event object in a thread local. Because spans
can be nested, it would have to be something like a
ThreadLocal<List<ScopeEvent>>. And then there is the problem of class
loader leaks with thread locals...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#843 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAD610h0JFRBawUo__5ZO6jIHS2ldr5Hks5vFxo1gaJpZM4ZMilX>
.
|
This adds trace and span IDs to JDK Flight Recorder "Scope"
events so that you can correlate with Zipkin UI or logs accordingly.
This currently requires JDK 11. JVMs that use this need to add the below
to their VM start arguments:
To view the flight recordings, you need JDK Mission Control 7.
With the above in place, you can configure
brave.Tracing
withJfrScopeDecorator
like so:After a flight is recorded, you can look for "Zipkin/Scope" in the
Event browser like so:
Users could then copy/paste the trace ID into the zipkin UI, or use log
correlation to further debug a problem.
Thanks to @thegreystone for hints on how to get this working!