Skip to content

Conversation

@carterkozak
Copy link
Contributor

@carterkozak carterkozak commented Jun 20, 2019

Before this PR

Operations which are fast in the common case were infeasible to trace because regardless of the trace rate, span creation performance was worse than the operation being measured.

Benchmark                                                      (observability)  Mode  Cnt      Score       Error   Units
TracingBenchmark.nestedSpans                                            SAMPLE  avgt    3  39710.354 ± 36443.873   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                             SAMPLE  avgt    3   3425.430 ±  3215.533  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                        SAMPLE  avgt    3  41560.021 ±     0.434    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space                    SAMPLE  avgt    3   3444.571 ±  3116.221  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm               SAMPLE  avgt    3  41795.982 ±  7880.516    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space                SAMPLE  avgt    3      0.386 ±     2.219  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm           SAMPLE  avgt    3      4.741 ±    30.088    B/op
TracingBenchmark.nestedSpans:·gc.count                                  SAMPLE  avgt    3    137.000              counts
TracingBenchmark.nestedSpans:·gc.time                                   SAMPLE  avgt    3     87.000                  ms
TracingBenchmark.nestedSpans                                     DO_NOT_SAMPLE  avgt    3  19958.082 ±  6584.116   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                      DO_NOT_SAMPLE  avgt    3   3506.264 ±  1147.188  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                 DO_NOT_SAMPLE  avgt    3  21416.011 ±     0.230    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space             DO_NOT_SAMPLE  avgt    3   3547.447 ±  1525.303  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm        DO_NOT_SAMPLE  avgt    3  21666.731 ±  4251.611    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space         DO_NOT_SAMPLE  avgt    3      0.383 ±     0.331  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm    DO_NOT_SAMPLE  avgt    3      2.341 ±     1.300    B/op
TracingBenchmark.nestedSpans:·gc.count                           DO_NOT_SAMPLE  avgt    3    139.000              counts
TracingBenchmark.nestedSpans:·gc.time                            DO_NOT_SAMPLE  avgt    3     86.000                  ms
TracingBenchmark.nestedSpans                                         UNDECIDED  avgt    3  22570.008 ±  7874.344   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                          UNDECIDED  avgt    3   3590.635 ±  1268.870  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                     UNDECIDED  avgt    3  24792.658 ±    44.976    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space                 UNDECIDED  avgt    3   3674.586 ±  3668.504  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm            UNDECIDED  avgt    3  25360.986 ± 16430.980    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space             UNDECIDED  avgt    3      0.217 ±     2.148  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm        UNDECIDED  avgt    3      1.509 ±    15.297    B/op
TracingBenchmark.nestedSpans:·gc.count                               UNDECIDED  avgt    3     90.000              counts
TracingBenchmark.nestedSpans:·gc.time                                UNDECIDED  avgt    3     58.000                  ms

After this PR

Added Tracer.fastStartSpan which can effectively no-op for unsampled operations.

Benchmark                                                      (observability)  Mode  Cnt      Score       Error   Units
TracingBenchmark.nestedSpans                                            SAMPLE  avgt    3  37969.436 ± 25282.797   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                             SAMPLE  avgt    3   3475.679 ±  2277.025  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                        SAMPLE  avgt    3  40360.062 ±     1.751    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space                    SAMPLE  avgt    3   3511.677 ±  2743.986  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm               SAMPLE  avgt    3  40771.767 ±  7164.960    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space                SAMPLE  avgt    3      0.384 ±     0.428  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm           SAMPLE  avgt    3      4.451 ±     2.688    B/op
TracingBenchmark.nestedSpans:·gc.count                                  SAMPLE  avgt    3    148.000              counts
TracingBenchmark.nestedSpans:·gc.time                                   SAMPLE  avgt    3     88.000                  ms
TracingBenchmark.nestedSpans                                     DO_NOT_SAMPLE  avgt    3    908.402 ±    79.755   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                      DO_NOT_SAMPLE  avgt    3   2820.275 ±   290.026  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                 DO_NOT_SAMPLE  avgt    3    784.001 ±     0.011    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space             DO_NOT_SAMPLE  avgt    3   2834.984 ±   293.831  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm        DO_NOT_SAMPLE  avgt    3    788.123 ±   162.926    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space         DO_NOT_SAMPLE  avgt    3      0.223 ±     1.492  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm    DO_NOT_SAMPLE  avgt    3      0.062 ±     0.416    B/op
TracingBenchmark.nestedSpans:·gc.count                           DO_NOT_SAMPLE  avgt    3    126.000              counts
TracingBenchmark.nestedSpans:·gc.time                            DO_NOT_SAMPLE  avgt    3     77.000                  ms
TracingBenchmark.nestedSpans                                         UNDECIDED  avgt    3   1254.746 ±   233.856   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                          UNDECIDED  avgt    3   2945.625 ±   514.073  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                     UNDECIDED  avgt    3   1130.854 ±    22.743    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space                 UNDECIDED  avgt    3   2993.199 ±  1125.922  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm            UNDECIDED  avgt    3   1149.047 ±   258.339    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space             UNDECIDED  avgt    3      0.279 ±     0.890  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm        UNDECIDED  avgt    3      0.107 ±     0.355    B/op
TracingBenchmark.nestedSpans:·gc.count                               UNDECIDED  avgt    3    135.000              counts
TracingBenchmark.nestedSpans:·gc.time                                UNDECIDED  avgt    3     81.000                  ms

@carterkozak carterkozak requested a review from a team as a code owner June 20, 2019 16:43
@carterkozak carterkozak force-pushed the ckozak/simplify_tracer branch from 1fc38fb to 49c6b99 Compare June 20, 2019 16:55
@carterkozak carterkozak force-pushed the ckozak/simplify_tracer branch from 49c6b99 to f08617b Compare June 20, 2019 17:04
Doesn't change performance.
Copy link
Contributor

@schlosna schlosna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 the 4x speedup and 4x reduction in allocations for common UNDECIDED case and 10x improvements for DO_NOT_SAMPLE are excellent 😄

/**
* Like {@link #startSpan(String, String, SpanType)}, but does not return an {@link OpenSpan}.
*/
abstract void fastStartSpan(String operation, String parentSpanId, SpanType type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps what's warranted here is just a brand new tracing implementation? I don't think continuing to hack the current implementation even further.

I'm also perplexed by why this would be the primary target for perf improvement, we do lots of expensive processing throughout the stack of calls across our services, and while Tracing is a constant overhead it's likely dwarfed by serde costs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps what's warranted here is just a brand new tracing implementation?

Sure, there are several problems we could solve in a new tracing implementation, I would be happy to help with that project when I have cycles, but it's not something that I can drive at the moment.

I'm also perplexed by why this would be the primary target for perf improvement, we do lots of expensive processing throughout the stack of calls across our services, and while Tracing is a constant overhead it's likely dwarfed by serde costs.

That is certainly true of the standard conjure service stack, however this change is not targeting problems with conjure services. We have an internal application with unusually high throughput and low latency requirements, built on our service library (for configuration, status, logging, metrics, etc) where interoperability with the standard stack allows us to avoid reinventing these components.

@carterkozak
Copy link
Contributor Author

4x speedup and 4x reduction in allocations for common UNDECIDED case

This sounded lower than expected, the benchmark was set to 10% sample rate instead of 1% for the UNDECIDED case. I have updated the main comment with the more accurate numbers.

* not thread-safe and is intended to be used in a thread-local context.
*/
public final class Trace {
public abstract class Trace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure, but I think this might be an ABI break.

A quick search internally suggested that nobody is actually importing the com.palantir.tracing.Trace class - perhaps we could even make it package-private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s exposed by Tracer.getAndClearTrace, however all methods are package private — it is only available as a marker


abstract Optional<OpenSpan> top();

abstract Optional<OpenSpan> pop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the contract of these two methods a bit confusing... normally an Optional.empty means an element was not present, but in this case the UnsampledTrace always returns Optional.empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, this PR introduces a subtle change in behavior where completing an unsampled span (even using the slow complete span method) no longer returns a present value, because we know it is not recorded (in fact, sampling an unsampled span would be a bug).

Our tracing API is missing clean isolation between producers and consumers -- if I am creating spans, I don't need to know whether or not they are sampled, or what the span IDs/timing looks like, just how to start and stop them. Conceptually there's no reason to consume these values because that functionality is a concern of the SpanObserver.

The contract is defined by

/**
* Completes and returns the current span (if it exists) and notifies all {@link #observers subscribers} about the
* completed span.
* If the return value is not used, prefer {@link Tracer#fastCompleteSpan()}.
*/
@CheckReturnValue
public static Optional<Span> completeSpan() {

There's a lot going on in that comment: "returns the current span (if it exists)". If we're not sampled, we don't really have a current span because we didn't record one. We have to track how many we would have created in the sampled case, but it does not violate the contract to return an empty value. This is reinforced by the next statement: "notifies all subscribers about the completed span", because subscribers are only notified if spans are sampled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, this is a very surprising PR. It is very much a break in behavior, and excusing it on a possible implied reading of the javadoc is not reasonable. This was done as a minor version bump (although given our current tooling a major version bump wouldn't have made a huge difference).

Conceptually there's no reason to consume these values because that functionality is a concern of the SpanObserver.

If that were true we would only ever be using fastCompleteSpan(). As that isn't true, this is not an implementation detail but an API contract.


@Override
Optional<OpenSpan> pop() {
validateDepth();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm in the Sampled class, calling pop() repeatedly on an empty trace just returns Optional.empty over and over again

In this case though, it will throw. Should we make these behave the same? or is this difference important?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pop validates that the value is positive, this should never throw unless we write a bug. I can add a comment explaining that, or remove the validation.


@Override
boolean isEmpty() {
validateDepth();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this call should be safe to delete right? As pop() is the only method that can decrement depth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, validateDepth only exists to make sure I didn't write a bug :-)

We already have it in most cases, no reason to look up from the
threadlocal.
@ferozco
Copy link
Contributor

ferozco commented Jun 24, 2019

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants