[improvement] Tracer spans are inexpensive for unsampled operations #180

carterkozak · 2019-06-20T16:43:41Z

Before this PR

Operations that are fast in the common case were infeasible to trace because, regardless of the trace rate, creating a span cost more than the operation being measured.

Benchmark                                                      (observability)  Mode  Cnt      Score       Error   Units
TracingBenchmark.nestedSpans                                            SAMPLE  avgt    3  39710.354 ± 36443.873   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                             SAMPLE  avgt    3   3425.430 ±  3215.533  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                        SAMPLE  avgt    3  41560.021 ±     0.434    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space                    SAMPLE  avgt    3   3444.571 ±  3116.221  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm               SAMPLE  avgt    3  41795.982 ±  7880.516    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space                SAMPLE  avgt    3      0.386 ±     2.219  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm           SAMPLE  avgt    3      4.741 ±    30.088    B/op
TracingBenchmark.nestedSpans:·gc.count                                  SAMPLE  avgt    3    137.000              counts
TracingBenchmark.nestedSpans:·gc.time                                   SAMPLE  avgt    3     87.000                  ms
TracingBenchmark.nestedSpans                                     DO_NOT_SAMPLE  avgt    3  19958.082 ±  6584.116   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                      DO_NOT_SAMPLE  avgt    3   3506.264 ±  1147.188  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                 DO_NOT_SAMPLE  avgt    3  21416.011 ±     0.230    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space             DO_NOT_SAMPLE  avgt    3   3547.447 ±  1525.303  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm        DO_NOT_SAMPLE  avgt    3  21666.731 ±  4251.611    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space         DO_NOT_SAMPLE  avgt    3      0.383 ±     0.331  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm    DO_NOT_SAMPLE  avgt    3      2.341 ±     1.300    B/op
TracingBenchmark.nestedSpans:·gc.count                           DO_NOT_SAMPLE  avgt    3    139.000              counts
TracingBenchmark.nestedSpans:·gc.time                            DO_NOT_SAMPLE  avgt    3     86.000                  ms
TracingBenchmark.nestedSpans                                         UNDECIDED  avgt    3  22570.008 ±  7874.344   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                          UNDECIDED  avgt    3   3590.635 ±  1268.870  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                     UNDECIDED  avgt    3  24792.658 ±    44.976    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space                 UNDECIDED  avgt    3   3674.586 ±  3668.504  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm            UNDECIDED  avgt    3  25360.986 ± 16430.980    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space             UNDECIDED  avgt    3      0.217 ±     2.148  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm        UNDECIDED  avgt    3      1.509 ±    15.297    B/op
TracingBenchmark.nestedSpans:·gc.count                               UNDECIDED  avgt    3     90.000              counts
TracingBenchmark.nestedSpans:·gc.time                                UNDECIDED  avgt    3     58.000                  ms

After this PR

Added Tracer.fastStartSpan, which is effectively a no-op for unsampled operations.

Benchmark                                                      (observability)  Mode  Cnt      Score       Error   Units
TracingBenchmark.nestedSpans                                            SAMPLE  avgt    3  37969.436 ± 25282.797   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                             SAMPLE  avgt    3   3475.679 ±  2277.025  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                        SAMPLE  avgt    3  40360.062 ±     1.751    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space                    SAMPLE  avgt    3   3511.677 ±  2743.986  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm               SAMPLE  avgt    3  40771.767 ±  7164.960    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space                SAMPLE  avgt    3      0.384 ±     0.428  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm           SAMPLE  avgt    3      4.451 ±     2.688    B/op
TracingBenchmark.nestedSpans:·gc.count                                  SAMPLE  avgt    3    148.000              counts
TracingBenchmark.nestedSpans:·gc.time                                   SAMPLE  avgt    3     88.000                  ms
TracingBenchmark.nestedSpans                                     DO_NOT_SAMPLE  avgt    3    908.402 ±    79.755   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                      DO_NOT_SAMPLE  avgt    3   2820.275 ±   290.026  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                 DO_NOT_SAMPLE  avgt    3    784.001 ±     0.011    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space             DO_NOT_SAMPLE  avgt    3   2834.984 ±   293.831  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm        DO_NOT_SAMPLE  avgt    3    788.123 ±   162.926    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space         DO_NOT_SAMPLE  avgt    3      0.223 ±     1.492  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm    DO_NOT_SAMPLE  avgt    3      0.062 ±     0.416    B/op
TracingBenchmark.nestedSpans:·gc.count                           DO_NOT_SAMPLE  avgt    3    126.000              counts
TracingBenchmark.nestedSpans:·gc.time                            DO_NOT_SAMPLE  avgt    3     77.000                  ms
TracingBenchmark.nestedSpans                                         UNDECIDED  avgt    3   1254.746 ±   233.856   ns/op
TracingBenchmark.nestedSpans:·gc.alloc.rate                          UNDECIDED  avgt    3   2945.625 ±   514.073  MB/sec
TracingBenchmark.nestedSpans:·gc.alloc.rate.norm                     UNDECIDED  avgt    3   1130.854 ±    22.743    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space                 UNDECIDED  avgt    3   2993.199 ±  1125.922  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Eden_Space.norm            UNDECIDED  avgt    3   1149.047 ±   258.339    B/op
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space             UNDECIDED  avgt    3      0.279 ±     0.890  MB/sec
TracingBenchmark.nestedSpans:·gc.churn.PS_Survivor_Space.norm        UNDECIDED  avgt    3      0.107 ±     0.355    B/op
TracingBenchmark.nestedSpans:·gc.count                               UNDECIDED  avgt    3    135.000              counts
TracingBenchmark.nestedSpans:·gc.time                                UNDECIDED  avgt    3     81.000                  ms
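
The fast path described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the tracing-java implementation: `SketchTracer`, its fields, and the method bodies are assumptions made for the example (only the method names echo the PR). The idea is that when a trace is known to be unsampled, starting a span only increments a counter, so no span object, ID, or timestamp is allocated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration only; not the tracing-java API or implementation. */
final class SketchTracer {
    /** A span that is being timed; allocated only on the sampled path. */
    private static final class OpenSpan {
        final String operation;
        final long startNanos;

        OpenSpan(String operation, long startNanos) {
            this.operation = operation;
            this.startNanos = startNanos;
        }
    }

    private final boolean sampled;
    private final Deque<OpenSpan> openSpans = new ArrayDeque<>();
    private final List<String> recorded = new ArrayList<>();
    private long totalRecordedNanos;
    private int unsampledOpenCount;

    SketchTracer(boolean sampled) {
        this.sampled = sampled;
    }

    /** Starts a span without returning it, so the unsampled path can skip allocation. */
    void fastStartSpan(String operation) {
        if (sampled) {
            openSpans.push(new OpenSpan(operation, System.nanoTime()));
        } else {
            unsampledOpenCount++; // no span object, ID, or timestamp is created
        }
    }

    /** Completes the innermost open span; only sampled spans are recorded. */
    void fastCompleteSpan() {
        if (sampled) {
            OpenSpan span = openSpans.poll();
            if (span != null) {
                totalRecordedNanos += System.nanoTime() - span.startNanos;
                recorded.add(span.operation);
            }
        } else if (unsampledOpenCount > 0) {
            unsampledOpenCount--;
        }
    }

    int openSpanCount() {
        return sampled ? openSpans.size() : unsampledOpenCount;
    }

    List<String> recordedSpans() {
        return recorded;
    }
}
```

Callers would pair the two calls in a try/finally block; because neither method returns a value, the caller cannot tell (and does not need to know) whether the span was sampled.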

schlosna

👍 the 4x speedup and 4x reduction in allocations for common UNDECIDED case and 10x improvements for DO_NOT_SAMPLE are excellent 😄

markelliot · 2019-06-20T22:01:19Z

tracing/src/main/java/com/palantir/tracing/Trace.java

+    /**
+     * Like {@link #startSpan(String, String, SpanType)}, but does not return an {@link OpenSpan}.
+     */
+    abstract void fastStartSpan(String operation, String parentSpanId, SpanType type);


perhaps what's warranted here is just a brand new tracing implementation? I don't think we should continue to hack the current implementation even further.

I'm also perplexed by why this would be the primary target for perf improvement, we do lots of expensive processing throughout the stack of calls across our services, and while Tracing is a constant overhead it's likely dwarfed by serde costs.

perhaps what's warranted here is just a brand new tracing implementation?

Sure, there are several problems we could solve in a new tracing implementation. I would be happy to help with that project when I have cycles, but it's not something that I can drive at the moment.

I'm also perplexed by why this would be the primary target for perf improvement, we do lots of expensive processing throughout the stack of calls across our services, and while Tracing is a constant overhead it's likely dwarfed by serde costs.

That is certainly true of the standard conjure service stack; however, this change is not targeting problems with conjure services. We have an internal application with unusually high throughput and low latency requirements, built on our service library (for configuration, status, logging, metrics, etc.), where interoperability with the standard stack allows us to avoid reinventing these components.

carterkozak · 2019-06-21T02:02:56Z

4x speedup and 4x reduction in allocations for common UNDECIDED case

This sounded lower than expected: the benchmark was set to a 10% sample rate instead of 1% for the UNDECIDED case. I have updated the main comment with the more accurate numbers.
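
For reference, the ratios implied by the tables in the PR description can be recomputed with a few lines of arithmetic. The sketch below only divides the "Before" scores by the "After" scores copied from those tables; the class and method names are hypothetical.

```java
/** Recomputes before/after ratios from the JMH scores quoted in the PR description. */
final class SpeedupFromTables {
    /** Returns how many times larger the "before" score is than the "after" score. */
    static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Average time per operation (ns/op)
        System.out.printf("SAMPLE time:          %.2fx%n", ratio(39710.354, 37969.436));
        System.out.printf("DO_NOT_SAMPLE time:   %.2fx%n", ratio(19958.082, 908.402));
        System.out.printf("UNDECIDED time:       %.2fx%n", ratio(22570.008, 1254.746));
        // Normalized allocation rate (B/op)
        System.out.printf("DO_NOT_SAMPLE allocs: %.2fx%n", ratio(21416.011, 784.001));
        System.out.printf("UNDECIDED allocs:     %.2fx%n", ratio(24792.658, 1130.854));
    }
}
```

With the numbers as posted, this works out to roughly 22x (DO_NOT_SAMPLE) and 18x (UNDECIDED) less time per operation, while the SAMPLE case is essentially unchanged. Given the wide error bars on three iterations, these should be read as approximate.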

iamdanfox · 2019-06-21T12:01:05Z

tracing/src/main/java/com/palantir/tracing/Trace.java

 * not thread-safe and is intended to be used in a thread-local context.
 */
-public final class Trace {
+public abstract class Trace {


Not 100% sure, but I think this might be an ABI break.

A quick search internally suggested that nobody is actually importing the com.palantir.tracing.Trace class - perhaps we could even make it package-private

It's exposed by Tracer.getAndClearTrace; however, all of its methods are package-private, so it is only available as a marker.

iamdanfox · 2019-06-21T12:22:55Z

tracing/src/main/java/com/palantir/tracing/Trace.java

+
+    abstract Optional<OpenSpan> top();
+
+    abstract Optional<OpenSpan> pop();


I find the contract of these two methods a bit confusing... normally an Optional.empty means an element was not present, but in this case the UnsampledTrace always returns Optional.empty?

That is correct, this PR introduces a subtle change in behavior where completing an unsampled span (even using the slow complete span method) no longer returns a present value, because we know it is not recorded (in fact, sampling an unsampled span would be a bug).

Our tracing API is missing clean isolation between producers and consumers -- if I am creating spans, I don't need to know whether or not they are sampled, or what the span IDs/timing looks like, just how to start and stop them. Conceptually there's no reason to consume these values because that functionality is a concern of the SpanObserver.

The contract is defined by

tracing-java/tracing/src/main/java/com/palantir/tracing/Tracer.java

Lines 182 to 188 in 42aafc8

/**
 * Completes and returns the current span (if it exists) and notifies all {@link #observers subscribers} about the
 * completed span.
 * If the return value is not used, prefer {@link Tracer#fastCompleteSpan()}.
 */
@CheckReturnValue
public static Optional<Span> completeSpan() {

There's a lot going on in that comment: "returns the current span (if it exists)". If we're not sampled, we don't really have a current span because we didn't record one. We have to track how many we would have created in the sampled case, but it does not violate the contract to return an empty value. This is reinforced by the next statement: "notifies all subscribers about the completed span", because subscribers are only notified if spans are sampled.
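
The behavior discussed in this reply can be sketched as follows. This is a hypothetical stand-in for the PR's Trace type, not its actual code: `SketchTrace`, the nested class names, and the use of `String` in place of `OpenSpan` are assumptions for illustration. It shows why an empty Optional from the unsampled variant does not mean "no open span", only "no recorded span to hand back".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Hypothetical stand-in for the PR's Trace type; names and bodies are assumptions. */
abstract class SketchTrace {
    abstract void push(String operation);

    abstract Optional<String> top();

    abstract Optional<String> pop();

    abstract boolean isEmpty();

    /** Sampled: real spans are kept, so top() and pop() return them when present. */
    static final class Sampled extends SketchTrace {
        private final Deque<String> stack = new ArrayDeque<>();

        @Override
        void push(String operation) {
            stack.push(operation);
        }

        @Override
        Optional<String> top() {
            return Optional.ofNullable(stack.peek());
        }

        @Override
        Optional<String> pop() {
            return Optional.ofNullable(stack.poll());
        }

        @Override
        boolean isEmpty() {
            return stack.isEmpty();
        }
    }

    /** Unsampled: only a count is kept, so there is never a span to return. */
    static final class Unsampled extends SketchTrace {
        private int numberOfSpans;

        @Override
        void push(String operation) {
            numberOfSpans++; // the operation name is deliberately discarded
        }

        @Override
        Optional<String> top() {
            return Optional.empty();
        }

        @Override
        Optional<String> pop() {
            if (numberOfSpans > 0) {
                numberOfSpans--;
            }
            return Optional.empty();
        }

        @Override
        boolean isEmpty() {
            return numberOfSpans == 0;
        }
    }
}
```

In this sketch, callers that need to know whether any span is open have to ask `isEmpty()` rather than inspect the Optional, which is the contract subtlety raised in the review.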

Hey, this is a very surprising PR. It is very much a break in behavior, and excusing it on a possible implied reading of the javadoc is not reasonable. This was done as a minor version bump (although given our current tooling a major version bump wouldn't have made a huge difference).

Conceptually there's no reason to consume these values because that functionality is a concern of the SpanObserver.

If that were true we would only ever be using fastCompleteSpan(). As that isn't true, this is not an implementation detail but an API contract.

iamdanfox · 2019-06-21T12:36:39Z

tracing/src/main/java/com/palantir/tracing/Trace.java

+
+        @Override
+        Optional<OpenSpan> pop() {
+            validateDepth();


Hmmm, in the Sampled class, calling pop() repeatedly on an empty trace just returns Optional.empty over and over again.

In this case, though, it will throw. Should we make these behave the same, or is this difference important?

pop validates that the value is positive, so this should never throw unless we write a bug. I can add a comment explaining that, or remove the validation.
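
A minimal sketch of the kind of internal sanity check being discussed. The exact condition checked by validateDepth is not shown in this thread, so the guard below (rejecting a pop when no span is open) is an assumption, and `CountingTrace` is a hypothetical name.

```java
/** Hypothetical counter-backed trace with an internal sanity check; not the PR's code. */
final class CountingTrace {
    private int depth;

    void push() {
        depth++;
    }

    /** Decrements the depth; throws if no span is open, which would indicate a bug in the caller. */
    void pop() {
        validateDepth();
        depth--;
    }

    boolean isEmpty() {
        return depth == 0;
    }

    // Assumed guard: correct callers always pair push() with pop(), so this should never fire.
    private void validateDepth() {
        if (depth <= 0) {
            throw new IllegalStateException("Unexpected depth: " + depth);
        }
    }
}
```

The trade-off raised in the review is visible here: the guard surfaces unbalanced start/complete calls loudly, whereas a stack-backed trace that returns Optional.empty would silently tolerate them.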

iamdanfox · 2019-06-21T12:37:12Z

tracing/src/main/java/com/palantir/tracing/Trace.java

+
+        @Override
+        boolean isEmpty() {
+            validateDepth();


In theory this call should be safe to delete, right? pop() is the only method that can decrement depth.

Yep, validateDepth only exists to make sure I didn't write a bug :-)

ferozco · 2019-06-24T19:16:39Z

👍

carterkozak added the do not merge label Jun 20, 2019

carterkozak requested a review from a team as a code owner June 20, 2019 16:43

carterkozak force-pushed the ckozak/simplify_tracer branch from 1fc38fb to 49c6b99 Compare June 20, 2019 16:55

carterkozak added 2 commits June 20, 2019 13:04

Split Tracer into Sampled+Unsampled types

9ab758a

error prone validation to prefer the fast path

f08617b

carterkozak force-pushed the ckozak/simplify_tracer branch from 49c6b99 to f08617b Compare June 20, 2019 17:04

carterkozak added 2 commits June 20, 2019 13:49

Refactor startSpan logic into Tracer to reduce duplication

6063502

Added final fastStartSpan overload

Clean up TracingBenchmark a bit

e1adc3c

Doesn't change performance.

carterkozak removed the do not merge label Jun 20, 2019

carterkozak requested review from dsd987, ellisjoe, esword, schlosna and wenhoujx June 20, 2019 18:15

schlosna mentioned this pull request Jun 20, 2019

[improvement] Avoid trace ID and span allocations when not observing #168

Closed

schlosna approved these changes Jun 20, 2019

markelliot reviewed Jun 20, 2019

Match default sample rate

41896d4

iamdanfox reviewed Jun 21, 2019

iamdanfox reviewed Jun 21, 2019

Annotate Trace internal methods with CheckReturnValue

90e45fb

iamdanfox reviewed Jun 21, 2019

iamdanfox reviewed Jun 21, 2019

carterkozak added 4 commits June 21, 2019 08:50

s/depth/numberOfSpans

64e9214

style

752fb36

class level design documentation

1b57d51

Trace.push is an internal concern

80340a3

carterkozak added 2 commits June 23, 2019 18:33

rename validateDepth -> validateNumberOfSpans

9d06c36

popCurrentSpan takes a trace

da00afb

We already have it in most cases, no reason to look up from the threadlocal.

carterkozak mentioned this pull request Jun 24, 2019

Support for tracing asynchronous operations #124

Merged

carterkozak added the merge when ready label Jun 24, 2019

bulldozer-bot bot merged commit 4178047 into develop Jun 24, 2019

bulldozer-bot bot deleted the ckozak/simplify_tracer branch June 24, 2019 19:28

carterkozak mentioned this pull request Jun 25, 2019

Upgrade tracing-java to take advantage of low-garbage fastStartSpan palantir/tritium#291

Merged

