Trace thread state is removed when root spans are completed #22

carterkozak · 2018-10-25T19:08:16Z

Multiple root spans are not allowed, traceId state should not
leak between multiple root spans.


carterkozak · 2018-10-25T19:09:29Z

carterkozak · 2018-10-25T19:10:51Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

    /** Returns the globally unique identifier for this thread's trace. */
    public static String getTraceId() {
-        return currentTrace.get().getTraceId();
+        return Preconditions.checkNotNull(currentTrace.get(), "There is no root span").getTraceId();


I considered returning a newly generated ID when no trace is available, but that could potentially result in bad data if we initialize a new span assuming that traceId will be maintained.

Possibly want to use the safelogging Preconditions here so these messages will definitely not get redacted?

https://github.com/palantir/safe-logging#preconditions

I agree that would be ideal, but we don't currently depend on safe-logging. I'd prefer not to add deps in this change.

should we expose a public static boolean hasTraceId() to allow checking if tracing is currently enabled?

Once this PR goes through we can obviously clean up https://github.com/palantir/tritium/blob/develop/tritium-tracing/src/main/java/com/palantir/tritium/tracing/TracingInvocationEventHandler.java#L71 as getAndClearTrace will now do the right thing and clear state, but there may be cases where we want to know if there is a root trace ID already set.

+1 to a hasTraceId or a new method to return Optional<String>... otherwise you can't eyeball a method like this and be confident it's safe:

    @Override
    public HttpConnection create(URL url) throws IOException {
        HttpConnection connection = delegate.create(url);
        String traceId = Tracer.getTraceId();
        connection.setRequestProperty(TraceHttpHeaders.TRACE_ID, traceId);
        return connection;
    }

Added.

I searched for usages of getTraceId, and everything that I found called it after creating a new span, so it should be safe, though there may be cases that I'm not aware of. For what it's worth, anything that does throw here is also responsible for leaking traceIds.
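To make the concern in this thread concrete, here is a minimal sketch of a caller that guards on hasTraceId() before calling getTraceId(). It is not the library's code: SketchTracer, TraceHeaders, withTraceHeader and the header value are hypothetical stand-ins, and Objects.requireNonNull is used in place of Preconditions.checkNotNull so the example has no dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for Tracer; only the null handling mirrors the diff above. */
final class SketchTracer {
    // null means "no root span is active on this thread".
    private static final ThreadLocal<String> currentTraceId = new ThreadLocal<>();

    private SketchTracer() {}

    static void startRootSpan(String traceId) {
        currentTraceId.set(traceId);
    }

    static void completeRootSpan() {
        currentTraceId.remove();
    }

    static boolean hasTraceId() {
        return currentTraceId.get() != null;
    }

    static String getTraceId() {
        // Stand-in for Preconditions.checkNotNull: throws NullPointerException when absent.
        return Objects.requireNonNull(currentTraceId.get(), "There is no root span");
    }
}

/** Hypothetical caller that must work both inside and outside a trace. */
final class TraceHeaders {
    static final String TRACE_ID = "X-B3-TraceId"; // assumed header name

    private TraceHeaders() {}

    /** Returns a copy of the headers, adding the trace ID only when a trace is active. */
    static Map<String, String> withTraceHeader(Map<String, String> headers) {
        Map<String, String> result = new HashMap<>(headers);
        if (SketchTracer.hasTraceId()) {
            result.put(TRACE_ID, SketchTracer.getTraceId());
        }
        return result;
    }
}
```

With the guard in place the caller no longer depends on a span having been opened earlier on the same thread; without it, the call throws when no root span exists.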

iamdanfox · 2018-10-26T16:56:04Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

-        MDC.put(Tracers.TRACE_ID_KEY, trace.getTraceId());
-        return trace;
-    });
+    private static final ThreadLocal<Trace> currentTrace = new ThreadLocal<>();


OK so I guess this is the key change - this guy now uses null to represent the idea that we're not in a trace?

I guess we just have to be super careful now everywhere that we use currentTrace.get() that it might actually be null!

That is correct. Otherwise trace thread state leaks outside of root spans, which causes multiple root spans unless code directly creates a new traceId.
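The difference between the two thread-local shapes can be sketched as follows. This is a simplified model, not the library's code: LeakSketch and its methods are hypothetical, and a plain String stands in for the Trace object.

```java
import java.util.UUID;

/** Hypothetical model contrasting an eagerly initialised thread-local with a nullable one. */
final class LeakSketch {
    // Old shape: a value is invented on first access and then stays on the thread.
    static final ThreadLocal<String> eager =
            ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    // New shape: null until a root span starts, removed when it completes.
    static final ThreadLocal<String> nullable = new ThreadLocal<>();

    private LeakSketch() {}

    /** Simulates one root span on the eager thread-local and returns the trace ID it saw. */
    static String rootSpanEager() {
        return eager.get(); // nothing resets the value afterwards
    }

    /** Simulates one root span on the nullable thread-local and returns the trace ID it saw. */
    static String rootSpanNullable() {
        nullable.set(UUID.randomUUID().toString());
        try {
            return nullable.get();
        } finally {
            nullable.remove(); // state does not outlive the root span
        }
    }
}
```

On a single thread, two calls to rootSpanEager() observe the same ID, while two calls to rootSpanNullable() observe different IDs and leave the thread-local empty in between.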

iamdanfox · 2018-10-26T16:57:15Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

-                .filter(openSpan -> currentTrace.get().isObservable())
-                .map(openSpan -> toSpan(openSpan, metadata))
-                .ifPresent(Tracer::notifyObservers);
+        if (isTraceObservable()) {


Why put this behind the isTraceObservable check?

ack, this is a bug. fix incoming

iamdanfox · 2018-10-26T17:04:47Z

I think this makes sense directionally, but I want to be super careful changing this stuff because it's been stable for quite a long time and it underpins pretty much every one of our services!

iamdanfox · 2018-10-26T17:07:25Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

+        return trace;
+    }
+
+    private static void clearCurrentTrace() {


Looks like the only two places this can be called are:

- public Trace getAndClearTrace()
- when a call to fastCompleteSpan() -> popCurrentSpan() pops the last remaining span

That is correct
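The two clearing paths described above can be sketched with a small span stack. Everything here is an assumption made for illustration: StackSketch, MiniTrace and the generated IDs are hypothetical, and the real Trace and OpenSpan types carry more state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of per-thread trace state that is cleared with the last span. */
final class StackSketch {
    /** Minimal trace: an ID plus a stack of open span names. */
    static final class MiniTrace {
        final String traceId;
        final Deque<String> openSpans = new ArrayDeque<>();

        MiniTrace(String traceId) {
            this.traceId = traceId;
        }
    }

    private static final ThreadLocal<MiniTrace> currentTrace = new ThreadLocal<>();
    private static final AtomicLong ids = new AtomicLong();

    private StackSketch() {}

    /** Opens a span, creating a new trace if the thread has none. */
    static void startSpan(String name) {
        MiniTrace trace = currentTrace.get();
        if (trace == null) {
            trace = new MiniTrace("trace-" + ids.incrementAndGet());
            currentTrace.set(trace);
        }
        trace.openSpans.push(name);
    }

    /** Completing a span clears thread state once the last open span is popped. */
    static Optional<String> completeSpan() {
        MiniTrace trace = currentTrace.get();
        if (trace == null) {
            return Optional.empty();
        }
        Optional<String> popped = Optional.ofNullable(trace.openSpans.poll());
        if (trace.openSpans.isEmpty()) {
            currentTrace.remove();
        }
        return popped;
    }

    /** Hands the whole trace to the caller and clears thread state. */
    static MiniTrace getAndClearTrace() {
        MiniTrace trace = currentTrace.get();
        currentTrace.remove();
        return trace;
    }

    static boolean hasTrace() {
        return currentTrace.get() != null;
    }

    static String currentTraceId() {
        MiniTrace trace = currentTrace.get();
        return trace == null ? null : trace.traceId;
    }
}
```

Because the state is removed when the stack empties, the next startSpan call on the same thread begins a fresh trace instead of inheriting the previous trace ID.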

schlosna

Supportive of this PR as it will allow removing the workarounds added in palantir/tritium#134

Have a few questions/nits below

schlosna · 2018-10-26T17:14:25Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

+        Optional<OpenSpan> prevState = trace.top();
        if (prevState.isPresent()) {
            spanBuilder.parentSpanId(prevState.get().getSpanId());
        }


nit: could simplify

Suggested change

        }
        trace.top().ifPresent(openSpan -> spanBuilder.parentSpanId(openSpan.getSpanId()));

I opted to avoid the lambda allocation since this is a relatively hot path. We allocate lambdas in plenty of other places, so I could go either way.

makes sense
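For reference, the two forms discussed in this thread behave the same and differ only in how they are expressed. The sketch below uses hypothetical names (ParentIdSketch, Builder) and a String in place of OpenSpan. The allocation comment reflects typical JVM behaviour for capturing lambdas, which a JIT may or may not optimise away.

```java
import java.util.Optional;

/** Hypothetical comparison of the explicit check and the ifPresent form. */
final class ParentIdSketch {
    /** Minimal builder holding only the field this example needs. */
    static final class Builder {
        String parentSpanId;

        Builder parentSpanId(String id) {
            this.parentSpanId = id;
            return this;
        }
    }

    private ParentIdSketch() {}

    /** Explicit check: no lambda object is involved. */
    static Builder explicit(Optional<String> top) {
        Builder builder = new Builder();
        if (top.isPresent()) {
            builder.parentSpanId(top.get());
        }
        return builder;
    }

    /** Functional form: the lambda captures builder, so it usually needs a new object per call. */
    static Builder functional(Optional<String> top) {
        Builder builder = new Builder();
        top.ifPresent(id -> builder.parentSpanId(id));
        return builder;
    }
}
```

Both methods leave parentSpanId unset for an empty Optional and set it to the contained value otherwise, so the choice is about allocation on a hot path rather than behaviour.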

schlosna · 2018-10-26T17:15:06Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

+        if (trace != null) {
+            boolean observable = isTraceObservable();
+            Optional<OpenSpan> span = popCurrentSpan();
+            if (observable) {


should we reuse the trace to avoid an additional thread local get?

Suggested change

-            if (observable) {
+            if (isObservable(trace)) {

good call, I think we want trace.isObservable() rather than a new method unless I'm misunderstanding.

All the call sites should have a separate null check on the trace, so I think this will be okay.
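One reason to work from the local trace reference: once popping the last span clears the thread-local, a static lookup made afterwards no longer sees the trace. The following sketch illustrates that ordering issue under simplified assumptions; ObservableOrderSketch, MiniTrace and both completeSpan variants are hypothetical and are not the code in this PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Hypothetical model of why the observable flag is read from a local reference. */
final class ObservableOrderSketch {
    static final class MiniTrace {
        final boolean observable;
        final Deque<String> openSpans = new ArrayDeque<>();

        MiniTrace(boolean observable) {
            this.observable = observable;
        }

        boolean isObservable() {
            return observable;
        }
    }

    private static final ThreadLocal<MiniTrace> currentTrace = new ThreadLocal<>();

    private ObservableOrderSketch() {}

    static void startRoot(String name, boolean observable) {
        MiniTrace trace = new MiniTrace(observable);
        trace.openSpans.push(name);
        currentTrace.set(trace);
    }

    static boolean isTraceObservable() {
        MiniTrace trace = currentTrace.get();
        return trace != null && trace.isObservable();
    }

    private static Optional<String> popCurrentSpan(MiniTrace trace) {
        Optional<String> span = Optional.ofNullable(trace.openSpans.poll());
        if (trace.openSpans.isEmpty()) {
            currentTrace.remove(); // last span gone: clear thread state
        }
        return span;
    }

    /** Returns the span to report to observers, using one thread-local read. */
    static Optional<String> completeSpan() {
        MiniTrace trace = currentTrace.get();
        if (trace == null) {
            return Optional.empty();
        }
        Optional<String> span = popCurrentSpan(trace);
        // The local reference is still usable after thread state is cleared.
        return trace.isObservable() ? span : Optional.empty();
    }

    /** Buggy variant for comparison: the static lookup after the pop finds no trace. */
    static Optional<String> completeSpanLookupAfterPop() {
        MiniTrace trace = currentTrace.get();
        if (trace == null) {
            return Optional.empty();
        }
        Optional<String> span = popCurrentSpan(trace);
        return isTraceObservable() ? span : Optional.empty();
    }
}
```

For an observable root span, completeSpan() returns the span while completeSpanLookupAfterPop() returns empty, which is why the flag has to come either from a value captured before the pop or from the local trace.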

schlosna · 2018-10-26T17:16:55Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

-                notifyObservers(span);
-            }
-        });
+        if (maybeSpan.isPresent() && observable) {


should we reuse the trace to avoid additional thread local get and simplify?

Suggested change

-        if (maybeSpan.isPresent() && observable) {
+        if (isObservable(trace)) {
+            maybeSpan.ifPresent(Tracer::notifyObservers);
+        }

schlosna · 2018-10-26T17:17:56Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

    public static boolean isTraceObservable() {
-        return currentTrace.get().isObservable();
+        Trace trace = currentTrace.get();
+        return trace != null && trace.isObservable();


related to above comments, could add isObservable(Trace) to avoid additional thread local lookups

Suggested change

-        return trace != null && trace.isObservable();
+        return isObservable(currentTrace.get());
+    }
+
+    private static boolean isObservable(Trace trace) {
+        return trace != null && trace.isObservable();
+    }

schlosna · 2018-10-26T17:29:49Z

tracing/src/main/java/com/palantir/tracing/Tracer.java

    /** Returns the globally unique identifier for this thread's trace. */
    public static String getTraceId() {
-        return currentTrace.get().getTraceId();
+        return Preconditions.checkNotNull(currentTrace.get(), "There is no root span").getTraceId();


should we expose a public static boolean hasTraceId() to allow checking if tracing is currently enabled?

Once this PR goes through we can obviously clean up https://github.com/palantir/tritium/blob/develop/tritium-tracing/src/main/java/com/palantir/tritium/tracing/TracingInvocationEventHandler.java#L71 as getAndClearTrace will now do the right thing and clear state, but there may be cases where we want to know if there is a root trace ID already set.

carterkozak · 2018-10-26T18:06:31Z

should we expose a public static boolean hasTraceId() to allow checking if tracing is currently enabled?

I think that would be helpful, but I would prefer to propose that in a separate change.

uschi2000 · 2018-10-28T12:23:45Z

I don't see a test for the changed behavior, could you add one, please?

carterkozak · 2018-10-28T19:18:41Z

This was tested (though admittedly indirectly) by this assertion: assertThat(MDC.get(Tracers.TRACE_ID_KEY)).isNull();

Added a better test for this specific change.

iamdanfox

Overall, I think this change looks good. It forces users to be a bit more precise rather than just assuming a root span will be invented.

I've also done a quick search internally for any users of the getTraceId method and it seems that very few people use it directly.


Trace thread state is removed when root spans are completed

38b237a

Multiple root spans are not allowed, traceId state should not leak between multiple root spans.

carterkozak requested a review from a team as a code owner October 25, 2018 19:08


iamdanfox reviewed Oct 26, 2018


Tracer.fastCompleteSpan fix

10b89ba

schlosna reviewed Oct 26, 2018


carterkozak added 2 commits October 26, 2018 13:54

Prefer Trace.isObservable over static lookups

de812a3

lambda warning comments

1cfd152

Test that closing a root span completes the trace

102b5df

Implement Tracer.hasTraceId

1bbfefa

iamdanfox approved these changes Oct 29, 2018


iamdanfox merged commit 4e93239 into palantir:develop Oct 29, 2018

carterkozak added a commit to palantir/tritium that referenced this pull request Oct 29, 2018

Upgrade tracing-java to 2.0.0

e441801

This version removes the need for the traceId thread state cleanup code. See: palantir/tracing-java#22

carterkozak mentioned this pull request Oct 29, 2018

Upgrade tracing-java to 2.0.0 palantir/tritium#136

Merged

schlosna pushed a commit to palantir/tritium that referenced this pull request Oct 29, 2018

Upgrade tracing-java to 2.0.0 (#136)

1be119e

This version removes the need for the traceId thread state cleanup code. See: palantir/tracing-java#22

