Amazon X-Ray interop #1754

Closed
adriancole opened this issue Oct 2, 2017 · 12 comments

@adriancole adriancole commented Oct 2, 2017

Just as users asked for Google Stackdriver compatibility last year, users are now explicitly requesting Amazon X-Ray support. There is also general support and interest from @abhiksingh (X-Ray product lead), plus indirect comments, which I can't currently find, about tension between Lambda and Zipkin architecture*.

There's no doubt that Zipkin and Amazon interop have been important in the past. Many of our core team rely on AWS and/or make custom components for AWS. This issue will explore how we could fit in, and how 3rd party tracers designed for B3 could support X-Ray with the smallest impact.

Similar to StackDriver, there are two major concerns: propagation and out-of-band data.

Unlike StackDriver, propagation is wider than the X-Ray service. For example, in AWS the same propagation format is used even when X-Ray isn't: ELB uses the same format even though ELB doesn't write to X-Ray. There are also interesting concerns, such as that API Gateway restarts traces at its edge, also in X-Ray format. These types of concerns weren't present when we integrated with StackDriver, although Google is building in the new trace-context format, targeted initially at StackDriver and gRPC services inside Google.
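To make the propagation concern concrete, here is a minimal Python sketch of translating between B3 identifiers and the X-Ray header. It assumes the X-Ray header is `X-Amzn-Trace-Id` with `Root=1-<8 hex>-<24 hex>`, `Parent` and `Sampled` fields, and that the B3 trace ID is 128-bit. The function names are illustrative and not taken from any Zipkin library, and treating `Parent` as the B3 span ID is a simplification.

```python
def b3_to_xray_header(trace_id, span_id, sampled=None):
    """Render B3 identifiers as an X-Amzn-Trace-Id header value.

    Assumes trace_id is 32 lower-hex characters; X-Ray expects the first
    8 of them to encode epoch seconds.
    """
    if len(trace_id) != 32:
        raise ValueError("X-Ray needs a 128-bit (32 hex character) trace ID")
    fields = ["Root=1-%s-%s" % (trace_id[:8], trace_id[8:]), "Parent=" + span_id]
    if sampled is not None:
        fields.append("Sampled=" + ("1" if sampled else "0"))
    return ";".join(fields)


def xray_header_to_b3(header):
    """Parse an X-Amzn-Trace-Id value into B3-style header names."""
    fields = dict(
        part.strip().split("=", 1) for part in header.split(";") if "=" in part
    )
    version, epoch_hex, random_hex = fields["Root"].split("-")
    if version != "1":
        raise ValueError("unsupported X-Ray trace ID version: " + version)
    b3 = {"X-B3-TraceId": epoch_hex + random_hex}
    if "Parent" in fields:
        b3["X-B3-SpanId"] = fields["Parent"]
    # Only pass through explicit decisions; other values are left undecided.
    if fields.get("Sampled") in ("0", "1"):
        b3["X-B3-Sampled"] = fields["Sampled"]
    return b3
```

A tracer that only understands B3 could apply a translation like this at the edge, which is the "smallest impact" option: the rest of the instrumentation keeps working with B3 identifiers.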

Also unlike StackDriver, the trace ID must include a 32-bit timestamp. This has some impact: if the timestamp is invalid, the service will drop any related data. For this reason, we must consider a pragmatic ID generation strategy. For example, creating "interop" IDs by default, where the first 32 bits are a timestamp and the following 96 bits are random. This discussion occurred at the end of #1262.
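As a rough illustration of such an "interop" ID, the following Python sketch builds a 128-bit trace ID as 32 lower-hex characters, with epoch seconds in the first 32 bits and 96 random bits after. The function name and the optional `now_seconds` parameter are assumptions made for the example; this is not Brave's implementation.

```python
import secrets
import time


def next_interop_trace_id(now_seconds=None):
    """Return a 128-bit trace ID as 32 lower-hex characters.

    Layout: 32 bits of epoch seconds, then 96 random bits.
    now_seconds is a hypothetical hook so callers can fix the clock.
    """
    if now_seconds is None:
        now_seconds = int(time.time())
    epoch_seconds = now_seconds & 0xFFFFFFFF  # keep only the low 32 bits
    random_96 = secrets.randbits(96)
    return "%08x%024x" % (epoch_seconds, random_96)
```

Because the result is still an ordinary 128-bit ID, systems that only understand B3 can treat it as opaque, while the time component is only consulted when converting to X-Ray.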

Reporting is very much like what we did for StackDriver. The X-Ray format is span structured, with an API to post data to. While it has more structure than Zipkin data, mapping rules like those that exist for StackDriver are more than possible. Zipkin-compatible (or otherwise) tracers could send data to X-Ray's daemon (automatically present in Lambda), the X-Ray POST API, or to a Zipkin destination that does one of the two.
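A minimal Python sketch of what such mapping rules might look like follows. It assumes Zipkin v2 JSON field names on the input side (`traceId`, `id`, `parentId`, `timestamp` and `duration` in microseconds, `localEndpoint.serviceName`, `tags`) and X-Ray segment field names on the output side (`name`, `id`, `trace_id`, `parent_id`, `start_time`, `end_time`, `annotations`). Storing tags as annotations with sanitized keys, and the daemon framing header, are assumptions of this sketch rather than rules settled in this issue.

```python
import json
import re


def zipkin_span_to_xray_segment(span):
    """Map a Zipkin v2 span (dict) to an X-Ray segment document (dict)."""
    trace_id = span["traceId"]
    if len(trace_id) != 32:
        raise ValueError("X-Ray needs a 128-bit trace ID with a time component")
    start = span["timestamp"] / 1_000_000.0  # microseconds to seconds
    segment = {
        "name": span.get("localEndpoint", {}).get("serviceName", "unknown"),
        "id": span["id"],
        "trace_id": "1-%s-%s" % (trace_id[:8], trace_id[8:]),
        "start_time": start,
        "end_time": start + span.get("duration", 0) / 1_000_000.0,
    }
    if "parentId" in span:
        segment["parent_id"] = span["parentId"]
    if span.get("tags"):
        # Assumption: keep annotation keys to letters, digits and underscores.
        segment["annotations"] = {
            re.sub(r"[^A-Za-z0-9_]", "_", key): value
            for key, value in span["tags"].items()
        }
    return segment


def frame_for_daemon(segment):
    """Prefix a segment with the header line assumed for the X-Ray daemon's UDP input."""
    return ('{"format": "json", "version": 1}\n' + json.dumps(segment)).encode("utf-8")
```

The same `zipkin_span_to_xray_segment` output could feed either destination mentioned above: framed for the local daemon, or batched into the POST API.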

The above looks fairly reliable from initial explorations, but could change with experience. This issue will track the exploration and any related issues on Zipkin's side.

* If you are running a Lambda (serverless) architecture, running a Zipkin server, even one that is just a simple proxy to X-Ray, could feel heavyweight. In this case, writing directly to X-Ray, or translating Zipkin to AWS via a Kinesis Lambda, could make sense.

@adriancole adriancole commented Oct 2, 2017

@yurishkuro yurishkuro commented Oct 2, 2017

So these are the interop models for the out-of-band data?

  • [zipkin SDK] -> [X-Ray backend] directly
  • [zipkin SDK] -> [zipkin collector] -> [X-Ray backend]

@adriancole adriancole commented Oct 2, 2017

@adriancole adriancole commented Oct 2, 2017

Random note: per http://docs.aws.amazon.com/xray/latest/api/API_PutTraceSegments.html, the POST format is a list of escaped JSON strings, as the doc implies.

Ex.

{"TraceSegmentDocuments": [
"{\"id\": \"0b89f1dec76af795\", ..."
]}
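One way to produce that shape, sketched in Python: serialize each segment document to a JSON string first, then serialize the outer object, so each document ends up as an escaped string. The function name is hypothetical, and this only builds the request body; it does not sign or send the request.

```python
import json


def put_trace_segments_body(segments):
    """Build a PutTraceSegments request body from segment dicts.

    Each segment is encoded twice: once on its own, and again as a
    string element of the TraceSegmentDocuments list.
    """
    documents = [json.dumps(segment) for segment in segments]
    return json.dumps({"TraceSegmentDocuments": documents})
```

Encoding twice is what yields the `\"id\"` style escaping shown in the example above.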
adriancole added a commit to openzipkin/brave that referenced this issue Oct 3, 2017
This customizes the trace ID generator to make the high bits convertible
to Amazon X-Ray trace ID format v1.

See openzipkin/zipkin#1754

@adriancole adriancole commented Oct 3, 2017

Added an example implementation of a trace ID with a time component. Unsurprisingly, it is slower than a fully random ID. However, the scale is still sub-microsecond (on my laptop™), and it only affects the root span: openzipkin/brave#509

Next step is to add a converter which proves the concept.

@adriancole adriancole commented Oct 4, 2017

Experimental work is starting in Brave here: openzipkin/brave#510

@adriancole adriancole commented Oct 5, 2017

Thanks to @jcarres-mdsol for making new trace ID provisioning instructions a bit simpler:

|---- 32 bits for epoch seconds --- | ----- 96 bits for random number --- |
It can potentially be implemented by:
High 64:
|---- 32 bits for epoch seconds --- | ----- 32 bits for random number --- |
Low 64:
| ----- 64 bits for random number --- |

Optional cheap sanity check that the high 32 bits are epoch seconds:
0x58000000 = 1476395008 = 2016-10-13 < prior to X-Ray and zipkin supporting 128-bit trace IDs
0x60000000 = 1610612736 = 2021-01-14 < of course you can future-proof even more
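The high/low split and the sanity check could be sketched in Python as below. The function and constant names are made up for illustration, the IDs are modeled as unsigned integers rather than Java longs, and the bounds are the two values quoted above; a real deployment would want a later upper bound.

```python
import secrets
import time

MIN_EPOCH_SECONDS = 0x58000000  # 1476395008, 2016-10-13
MAX_EPOCH_SECONDS = 0x60000000  # 1610612736, 2021-01-14; raise to future-proof


def next_trace_id_high(now_seconds=None):
    """High 64 bits: 32 bits of epoch seconds, then 32 random bits."""
    if now_seconds is None:
        now_seconds = int(time.time())
    return ((now_seconds & 0xFFFFFFFF) << 32) | secrets.randbits(32)


def next_trace_id_low():
    """Low 64 bits: fully random."""
    return secrets.randbits(64)


def looks_like_epoch_seconds(
    trace_id_high, min_seconds=MIN_EPOCH_SECONDS, max_seconds=MAX_EPOCH_SECONDS
):
    """Cheap check that the top 32 bits fall inside a plausible time window."""
    epoch_seconds = (trace_id_high >> 32) & 0xFFFFFFFF
    return min_seconds <= epoch_seconds < max_seconds
```

The check is only a heuristic: a fully random ID lands inside the window some of the time, so it can reject clearly incompatible IDs but cannot prove an ID was generated with a time component.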

marcingrzejszczak added a commit to spring-cloud/spring-cloud-sleuth that referenced this issue Jan 19, 2018
With this pull request we have rewritten the whole Sleuth internals to use Brave. That way we can leverage all the functionality and instrumentation that Brave already has (https://github.com/openzipkin/brave/tree/master/instrumentation).

Migration guide is available here: https://github.com/spring-cloud/spring-cloud-sleuth/wiki/Spring-Cloud-Sleuth-2.0-Migration-Guide

fixes #711 - Brave instrumentation
fixes #92 - we move to Brave's Sampler
fixes #143 - Brave is capable of passing context
fixes #255 - we've moved away from Zipkin Stream server
fixes #305 - Brave has GRPC instrumentation (https://github.com/openzipkin/brave/tree/master/instrumentation/grpc)
fixes #459 - Brave (openzipkin/brave#510) & Zipkin (openzipkin/zipkin#1754) will deal with the AWS XRay instrumentation
fixes #577 - Messaging instrumentation has been rewritten

@devinsba devinsba commented May 9, 2019

Closing, as I believe we have addressed this in zipkin-aws.

@devinsba devinsba closed this May 9, 2019

@msmsimondean msmsimondean commented May 24, 2019

@devinsba the issue is about providing interoperability with AWS X-Ray. Looking at zipkin-aws, that doesn't seem to provide that interoperability; from what I can tell, it interops with other AWS services (SQS, Kinesis and Elasticsearch Service) but not AWS X-Ray. I can see some code in zipkin-aws that mentions X-Ray (e.g. reporter-xray-udp and storage-xray-udp). Does zipkin-aws provide some X-Ray integration that isn't documented in the codebase's README? Thanks in advance!

@devinsba devinsba commented May 24, 2019

We have support for the XRay propagation format and for sending traces/spans to XRay. Is there some kind of integration you are looking for that isn't either of these? Also, any feature requests for more integration should be handled in the zipkin-aws repo.

@msmsimondean msmsimondean commented May 24, 2019

@devinsba that's great. Is there any documentation available for setting it up or is that still to come? I'm just thinking of the good documentation at https://cloud.google.com/trace/docs/zipkin and https://github.com/openzipkin/zipkin-gcp for the equivalent Google Stackdriver Trace integration. Thanks

@adriancole adriancole commented May 28, 2019
