Skip to content

Conversation

@carterkozak
Copy link
Contributor

Multiple root spans are not allowed, traceId state should not
leak between multiple root spans.

Multiple root spans are not allowed, traceId state should not
leak between multiple root spans.
@carterkozak carterkozak requested a review from a team as a code owner October 25, 2018 19:08
@carterkozak
Copy link
Contributor Author

@schlosna fysa

/** Returns the globally unique identifier for this thread's trace. */
public static String getTraceId() {
return currentTrace.get().getTraceId();
return Preconditions.checkNotNull(currentTrace.get(), "There is no root span").getTraceId();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered returning a newly generated ID when no trace is available, but that could potentially result in bad data if we initialize a new span assuming that traceId will be maintained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly want to use the safelogging Preconditions here so these messages will definitely not get redacted?

https://github.com/palantir/safe-logging#preconditions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that would be ideal, but we don't currently depend on safe-logging. I'd prefer not to add deps in this change.

Copy link
Contributor

@schlosna schlosna Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we expose a public static boolean hasTraceId() to allow checking if tracing is currently enabled?

Once this PR goes through we can obviously clean up https://github.com/palantir/tritium/blob/develop/tritium-tracing/src/main/java/com/palantir/tritium/tracing/TracingInvocationEventHandler.java#L71 as getAndClearTrace will now do the right thing and clear state, but there may be cases where we want to know if there is a root trace ID already set.

Copy link
Contributor

@iamdanfox iamdanfox Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to a hasTraceId or a new method to return Optional<String>... otherwise you can't eyeball I method like this and be confident it's safe:

    @Override
    public HttpConnection create(URL url) throws IOException {
        HttpConnection connection = delegate.create(url);
        String traceId = Tracer.getTraceId();
        connection.setRequestProperty(TraceHttpHeaders.TRACE_ID, traceId);
        return connection;
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

I searched for usages of getTraceId, and everything that I found called it after creating a new span, so it should be safe, though there may be cases that I'm not aware of. For what it's worth, anything that does throw here is also responsible for leaking traceIds.

MDC.put(Tracers.TRACE_ID_KEY, trace.getTraceId());
return trace;
});
private static final ThreadLocal<Trace> currentTrace = new ThreadLocal<>();
Copy link
Contributor

@iamdanfox iamdanfox Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so I guess this is the key change - this guy now uses null to represent the idea that we're not in a trace?

I guess we just have to be super careful now everywhere that we use currentTrace.get() that it might actually be null!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, otherwise trace thread state leaks outside of root spans, which causes multiple root spans unless code directly creates a new traceId.

.filter(openSpan -> currentTrace.get().isObservable())
.map(openSpan -> toSpan(openSpan, metadata))
.ifPresent(Tracer::notifyObservers);
if (isTraceObservable()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why put this behind the isTraceObservable check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, this is a bug. fix incoming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@iamdanfox
Copy link
Contributor

I think this makes sense directionally, but I want to be super careful changing this stuff because it's been stable for quite a long time and it underpins pretty much every one of our services!

return trace;
}

private static void clearCurrentTrace() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the only two places this can be called are:

  • public Trace getAndClearTrace()
  • when a call to fastCompleteSpan() -> popCurrentSpan() (pops the last remaining span)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct

Copy link
Contributor

@schlosna schlosna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supportive of this PR as it will allow removing the workarounds added in palantir/tritium#134

Have a few questions/nits below

Optional<OpenSpan> prevState = trace.top();
if (prevState.isPresent()) {
spanBuilder.parentSpanId(prevState.get().getSpanId());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could simplify

Suggested change
}
trace.top().ifPresent(openSpan -> spanBuilder.parentSpanId(openSpan.getSpanId()));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to avoid the lambda allocation since this is a relatively hot path. We do a lot elsewhere, so I could go either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

if (trace != null) {
boolean observable = isTraceObservable();
Optional<OpenSpan> span = popCurrentSpan();
if (observable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we reuse the trace to avoid an additional thread local get?

Suggested change
if (observable) {
if (isObservable(trace)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, I think we want trace.isObservable() rather than a new method unless I'm misunderstanding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the call sites should have a separate null check on the trace, so I think this will be okay.

notifyObservers(span);
}
});
if (maybeSpan.isPresent() && observable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we reuse the trace to avoid additional thread local get and simplify?

Suggested change
if (maybeSpan.isPresent() && observable) {
if (isObservable(trace)) {
maybeSpan.ifPresent(Tracer::notifyObservers);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

public static boolean isTraceObservable() {
return currentTrace.get().isObservable();
Trace trace = currentTrace.get();
return trace != null && trace.isObservable();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to above comments, could add isObservable(Trace) to avoid additional thread local lookups

Suggested change
return trace != null && trace.isObservable();
return isObservable(currentTrace.get());
}
private static boolean isObservable(Trace trace) {
return trace != null && trace.isObservable();
}

/** Returns the globally unique identifier for this thread's trace. */
public static String getTraceId() {
return currentTrace.get().getTraceId();
return Preconditions.checkNotNull(currentTrace.get(), "There is no root span").getTraceId();
Copy link
Contributor

@schlosna schlosna Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we expose a public static boolean hasTraceId() to allow checking if tracing is currently enabled?

Once this PR goes through we can obviously clean up https://github.com/palantir/tritium/blob/develop/tritium-tracing/src/main/java/com/palantir/tritium/tracing/TracingInvocationEventHandler.java#L71 as getAndClearTrace will now do the right thing and clear state, but there may be cases where we want to know if there is a root trace ID already set.

@carterkozak
Copy link
Contributor Author

should we expose a public static boolean hasTraceId() to allow checking if tracing is currently enabled?

I think that would be helpful, but I would prefer to propose that in a separate change.

@uschi2000
Copy link

I don't see a test for the changed behavior, could you add one, please?

@carterkozak
Copy link
Contributor Author

Was tested (though admittedly indirectly) by this assertion: assertThat(MDC.get(Tracers.TRACE_ID_KEY)).isNull();

Added a better test for this specific change.

Copy link
Contributor

@iamdanfox iamdanfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this change looks good. It forces users to be a bit more precise rather than just assuming a root span will be invented.

I've also done a quick search internally for any users of the getTraceId method and it seems that very few people use it directly.

@iamdanfox iamdanfox merged commit 4e93239 into palantir:develop Oct 29, 2018
carterkozak added a commit to palantir/tritium that referenced this pull request Oct 29, 2018
This version removes the need for the traceId thread state
cleanup code.
See: palantir/tracing-java#22
schlosna pushed a commit to palantir/tritium that referenced this pull request Oct 29, 2018
This version removes the need for the traceId thread state
cleanup code.
See: palantir/tracing-java#22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants