
AWS Propagation support #510

Merged: 1 commit into master from aws-xray on Oct 15, 2017
Conversation

@codefromthecrypt (Member) commented Oct 4, 2017

Starting with trace ID propagation and ad-hoc reporting:

  • extracting trace identifiers costs more than B3 because we have to parse fields within a single header value.
    • I did get a 12% performance improvement by avoiding String.split (a rough parsing sketch follows this list).
  • injecting trace identifiers costs less because we only write one string.
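
For illustration, here is a minimal parsing sketch (not the code in this PR) that scans a value like Root=1-59d5a836-0123456789abcdef01234567;Parent=53995c3f42cd8ad8;Sampled=1 with indexOf instead of String.split, so no intermediate array is allocated:

static String entryValue(String header, String name) {
  // look for "name=" and read up to the next ';' (or end of string), avoiding String.split
  int i = header.indexOf(name + '=');
  if (i == -1) return null;
  int beginValue = i + name.length() + 1;
  int endValue = header.indexOf(';', beginValue);
  return header.substring(beginValue, endValue == -1 ? header.length() : endValue);
}

// entryValue(value, "Root")   -> "1-59d5a836-0123456789abcdef01234567"
// entryValue(value, "Parent") -> "53995c3f42cd8ad8"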

Below is reporting from within a Brave tracer in Lambda, with a custom Zipkin reporter:

[screenshot: 2017-10-06 6:42:22 pm]

Next step is cleanup, including readable demo code, and moving the code to the zipkin-aws repo.

Benchmark                          Mode  Cnt   Score   Error   Units
PropagationBenchmarks.extract_aws  thrpt   15   4.307 ± 0.122  ops/us
PropagationBenchmarks.extract_b3   thrpt   15  10.617 ± 0.924  ops/us
PropagationBenchmarks.inject_aws   thrpt   15  13.092 ± 0.794  ops/us
PropagationBenchmarks.inject_b3    thrpt   15  11.132 ± 0.944  ops/us
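
For context, a rough sketch (my assumptions, not the PR's PropagationBenchmarks) of how one of the injection benchmarks could be structured with JMH, using Brave's built-in B3 propagation:

import brave.propagation.Propagation;
import brave.propagation.TraceContext;
import java.util.LinkedHashMap;
import java.util.Map;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class InjectB3Sketch {
  // fixed 128-bit context so each iteration measures only the header writing
  final TraceContext context = TraceContext.newBuilder()
      .traceIdHigh(Long.parseUnsignedLong("59d5a83601234567", 16))
      .traceId(Long.parseUnsignedLong("89abcdef01234567", 16))
      .spanId(Long.parseUnsignedLong("53995c3f42cd8ad8", 16))
      .sampled(true)
      .build();
  final TraceContext.Injector<Map<String, String>> injector =
      Propagation.B3_STRING.injector(Map::put);

  @Benchmark public Map<String, String> inject_b3() {
    Map<String, String> carrier = new LinkedHashMap<>();
    injector.inject(context, carrier);
    return carrier;
  }
}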

End-to-end (overhead) benchmarks (currently propagation only, not conversion):

Benchmark                                            Mode  Cnt    Score   Error  Units
EndToEndBenchmarks.server_get                        avgt   15  149.558 ± 4.030  us/op
EndToEndBenchmarks.traced128Server_get               avgt   15  171.492 ± 8.606  us/op
EndToEndBenchmarks.traced128Server_get_resumeTrace   avgt   15  179.316 ± 5.602  us/op
EndToEndBenchmarks.tracedServer_get                  avgt   15  171.560 ± 4.287  us/op
EndToEndBenchmarks.tracedServer_get_resumeTrace      avgt   15  178.494 ± 7.080  us/op
EndToEndBenchmarks.tracedxrayServer_get              avgt   15  172.102 ± 6.651  us/op
EndToEndBenchmarks.tracedxrayServer_get_resumeTrace  avgt   15  178.151 ± 3.580  us/op
EndToEndBenchmarks.unsampledServer_get               avgt   15  158.549 ± 2.813  us/op

@codefromthecrypt changed the title from "Starts on AWS X-Ray propagation support" to "AWS X-Ray support" on Oct 4, 2017
@codefromthecrypt (Member, Author)

Added the extract implementation, tests, and benchmarks. Parsing within a single header value is more expensive than reading separate headers. I got it as fast as I could within a reasonable amount of time.

@codefromthecrypt (Member, Author)

Note: there's a behavior difference between X-Ray and B3: you can send only a trace ID (no span ID). This implied a separate commit, which I can pull out separately later. That commit makes it possible to extract just the trace ID (ELB specifies one for log correlation purposes, with no span ID, as ELB doesn't write to X-Ray).
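
For illustration, a minimal sketch of what trace-ID-only extraction can hand back (my assumption of the shape, not the committed code; it assumes Brave's TraceIdContext and TraceContextOrSamplingFlags types, whose factory methods may differ slightly in this version):

import brave.propagation.TraceContextOrSamplingFlags;
import brave.propagation.TraceIdContext;

class TraceIdOnlyExtraction {
  // root looks like "1-59d5a836-0123456789abcdef01234567": version, epoch seconds in hex, then 96 random bits
  static TraceContextOrSamplingFlags extractTraceIdOnly(String root) {
    String hex = root.substring(2, 10) + root.substring(11); // 32 hex chars = a 128-bit trace ID
    TraceIdContext idContext = TraceIdContext.newBuilder()
        .traceIdHigh(Long.parseUnsignedLong(hex.substring(0, 16), 16))
        .traceId(Long.parseUnsignedLong(hex.substring(16, 32), 16))
        .build();
    return TraceContextOrSamplingFlags.create(idContext); // no span ID, so no full TraceContext
  }
}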

}

TraceContext nextContext(@Nullable TraceContext parent, SamplingFlags samplingFlags) {
TraceContext nextContext(
(Contributor)

Since the actual precedence is TraceContext and then TraceIdContext, what about reflecting this in the signature?

@codefromthecrypt (Member, Author)

good idea. done

@cemo commented Oct 4, 2017

@adriancole this is the best thing I have seen in a while. Do you have any idea when this support will be ready for testing?

@jcchavezs (Contributor)

Awesome work @adriancole

@codefromthecrypt (Member, Author)

Added end-to-end overhead benchmarks, which show the overhead related to this isn't statistically significant in the big picture.

@cemo glad you like it! Trying to get a converting reporter working today. After that, I'll need to step back and make sure no tech debt was added, but it won't be long. Watch this thread.

@codefromthecrypt (Member, Author) commented Oct 5, 2017

So far so good. Propagation works (creating single-host spans). The trace ID prefix of 59d5a836 is legit (1507174454 in epoch seconds, which is right):

[screenshot: 2017-10-05 11:35:29 am]
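
As a quick check (plain arithmetic, nothing from the PR), the prefix decodes like this:

// the first 8 hex characters of an X-Ray trace ID are its start time in epoch seconds
long epochSeconds = Long.parseLong("59d5a836", 16);                // 1507174454
System.out.println(java.time.Instant.ofEpochSecond(epochSeconds)); // 2017-10-05T03:34:14Z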

Here's a diff of https://github.com/openzipkin/brave-webmvc-example changed to use this so far:

diff --git a/servlet3/pom.xml b/servlet3/pom.xml
index 3e0967a..0ccec3b 100755
--- a/servlet3/pom.xml
+++ b/servlet3/pom.xml
@@ -13,7 +13,7 @@
 
   <properties>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
-    <brave.version>4.8.1</brave.version>
+    <brave.version>4.8.2-SNAPSHOT</brave.version>
     <zipkin-reporter2.version>2.0.2</zipkin-reporter2.version>
     <spring.version>4.3.8.RELEASE</spring.version>
     <log4j.version>2.8.2</log4j.version>
diff --git a/servlet3/src/main/java/brave/webmvc/TracingConfiguration.java b/servlet3/src/main/java/brave/webmvc/TracingConfiguration.java
index dd1f7e4..465501b 100644
--- a/servlet3/src/main/java/brave/webmvc/TracingConfiguration.java
+++ b/servlet3/src/main/java/brave/webmvc/TracingConfiguration.java
@@ -3,6 +3,7 @@ package brave.webmvc;
 import brave.Tracing;
 import brave.context.log4j2.ThreadContextCurrentTraceContext;
 import brave.http.HttpTracing;
+import brave.propagation.XRayPropagation;
 import brave.spring.web.TracingClientHttpRequestInterceptor;
 import brave.spring.webmvc.TracingHandlerInterceptor;
 import java.util.ArrayList;
@@ -44,6 +45,8 @@ public class TracingConfiguration extends WebMvcConfigurerAdapter {
   @Bean Tracing tracing() {
     return Tracing.newBuilder()
         .localServiceName("brave-webmvc-example")
+        .traceId128Bit(true)
+        .propagationFactory(new XRayPropagation.Factory())
         .currentTraceContext(ThreadContextCurrentTraceContext.create()) // puts trace IDs into logs
         .spanReporter(spanReporter()).build();
   }

@codefromthecrypt (Member, Author)

Next step was a custom UDP reporter that converts Zipkin spans. This should probably live in zipkin-aws when finished. Here's an example from the demo app:

[screenshot: 2017-10-05 9:52:49 pm]
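
Not the eventual zipkin-aws code, just a sketch of the idea: the X-Ray daemon listens on UDP 127.0.0.1:2000 by default and expects each segment document to be prefixed with a small JSON header line, so a converting reporter's send path boils down to something like:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class XRayUdpSketch {
  /** Sends one already-converted X-Ray segment document to the local daemon. */
  static void send(String segmentJson) throws Exception {
    byte[] message = ("{\"format\": \"json\", \"version\": 1}\n" + segmentJson)
        .getBytes(StandardCharsets.UTF_8);
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.send(new DatagramPacket(message, message.length,
          new InetSocketAddress("127.0.0.1", 2000)));
    }
  }
}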

@codefromthecrypt changed the base branch from traceId128ForXray to master on October 5, 2017 14:42
@codefromthecrypt changed the base branch from master to extract-just-traceID on October 5, 2017 14:44
@cemo commented Oct 5, 2017

🕺

@codefromthecrypt (Member, Author)

Last experiments before moving this code to the zipkin-aws repo:

  • incoming B3 -> outgoing AWS trace ID
  • incoming AWS -> outgoing B3
  • reporting within Lambda (to understand things)

In the zipkin-aws repo, I think we'll end up with a converter, a collector, and a reporter component, similar to what we have for Stackdriver. The zipkin-aws repo contains stock Zipkin components, which will work with anything like Brave, Sleuth, Wingtips, etc.

Trace ID propagation goes beyond X-Ray (e.g. multiple AWS services use the same header format), and the implementation is Brave-specific (Brave's propagation API is currently only for Brave), so I think the AWSPropagation class might stay here, even if in a different module.
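
For reference, here's how the two wire formats line up for the same identifiers (illustrative values):

B3 headers:
  X-B3-TraceId: 59d5a8360123456789abcdef01234567
  X-B3-SpanId:  53995c3f42cd8ad8
  X-B3-Sampled: 1

Amazon header:
  X-Amzn-Trace-Id: Root=1-59d5a836-0123456789abcdef01234567;Parent=53995c3f42cd8ad8;Sampled=1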

@codefromthecrypt (Member, Author)

Below is reporting from within a Brave tracer in Lambda, with a custom Zipkin reporter:

[screenshot: 2017-10-06 6:42:22 pm]

Next step is cleanup, including readable demo code, and moving the code to the zipkin-aws repo.

@devinsba (Member) commented Oct 6, 2017 via email

@codefromthecrypt (Member, Author)

Renamed to Amazon and made a utility to extract the trace header from the environment. Checking, but I think Amazon is the right name, as the header format is also used by non-X-Ray products.
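
A tiny sketch of what such an environment utility could look like (assumption: it reads Lambda's _X_AMZN_TRACE_ID variable; the actual utility in this PR may differ):

class AmazonEnv {
  /** Returns the trace header AWS Lambda sets, or null when not running in Lambda. */
  static String traceHeaderFromEnv() {
    return System.getenv("_X_AMZN_TRACE_ID");
  }
}

The value has the same Root/Parent/Sampled shape as the X-Amzn-Trace-Id header, so it can be fed through the same extractor.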

@codefromthecrypt changed the title from "AWS X-Ray support" to "AWS Propagation support" on Oct 11, 2017
@codefromthecrypt (Member, Author)

This code is ready to go. The other part will be in the zipkin-aws repo (the XRayUDPSender). Once that's released, you will be able to make a Brave app. I'm not releasing this yet, as there are currently various problems across Travis and Maven Central. Will try tomorrow.

@codefromthecrypt (Member, Author)

Here's where the reporter will go: openzipkin/zipkin-aws#56

@codefromthecrypt (Member, Author)

Release won't happen until #514 is in, as that supports the extra fields needed in AWS (plus replaces a yet-unreleased API with a better one).

@codefromthecrypt (Member, Author)

OK, the implementation no longer drops "extra" fields (besides the Self references, which should be dropped). Later implementations can expose a means to access these, but in the meantime we can release.
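
To illustrate with made-up values, an incoming header like

X-Amzn-Trace-Id: Self=1-59d5a836-00f067aa0ba902b7463d45cf;Root=1-59d5a836-0123456789abcdef01234567;CalledFrom=app;Parent=53995c3f42cd8ad8;Sampled=1

now carries CalledFrom=app through to outgoing requests, while the Self entry (such as the one a load balancer adds) is still dropped.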

Will merge and release tomorrow morning unless I hear screams.

@jcchavezs (Contributor)

Lgtm

@codefromthecrypt merged commit 40a43cd into master on Oct 15, 2017
@codefromthecrypt deleted the aws-xray branch on October 15, 2017 02:10
marcingrzejszczak added a commit to spring-cloud/spring-cloud-sleuth that referenced this pull request Jan 19, 2018
With this pull request we have rewritten the whole of Sleuth's internals to use Brave. That way we can leverage all the functionality and instrumentation that Brave already has (https://github.com/openzipkin/brave/tree/master/instrumentation).

Migration guide is available here: https://github.com/spring-cloud/spring-cloud-sleuth/wiki/Spring-Cloud-Sleuth-2.0-Migration-Guide

fixes #711 - Brave instrumentation
fixes #92 - we move to Brave's Sampler
fixes #143 - Brave is capable of passing context
fixes #255 - we've moved away from Zipkin Stream server
fixes #305 - Brave has GRPC instrumentation (https://github.com/openzipkin/brave/tree/master/instrumentation/grpc)
fixes #459 - Brave (openzipkin/brave#510) & Zipkin (openzipkin/zipkin#1754) will deal with the AWS XRay instrumentation
fixes #577 - Messaging instrumentation has been rewritten