Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Cassandra client and server integration #414

Closed
wants to merge 2 commits into from

Conversation

codefromthecrypt
Copy link
Member

@codefromthecrypt codefromthecrypt commented May 12, 2017

This contains tracing instrumentation for Cassandra and the DataStax Java Driver.

brave.cassandra.Tracing extracts trace state from the custom payload
of incoming requests. How long each request takes, each suboperation,
and relevant tags like the session ID are reported to Zipkin.

brave.cassandra.driver.TracingSession tracks the client-side of cassandra and
adds trace context to the custom payload of outgoing requests. If
server integration is in place, cassandra will contribute data to these
RPC spans.

@codefromthecrypt
Copy link
Member Author

cc @openzipkin/cassandra This is intentionally conservative in tagging and nested span policy inside cassandra. For example, this makes annotations, not sub-spans. I think this is a good start and would be useful in zipkin tuning

screen shot 2017-05-12 at 6 02 50 pm

screen shot 2017-05-12 at 6 03 07 pm


// By default, B3 style is used, so instrumented clients do something like this
Map<String, ByteBuffer> payload = new LinkedHashMap<>();
payload.set("X-B3-TraceId", byteBuffer("463ac35c9f6413ad"));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that brave allows you to change the propagation format. I intentionally decided to not go with the finagle binary encoding as most people don't understand what that is. B3 is reasonably well documented, so a better default IMHO.

@codefromthecrypt
Copy link
Member Author

buffed the CassandraDriverParser, so it looks prettier and accepts Tagger not Span

@codefromthecrypt
Copy link
Member Author

I'm going to rename CassandraDriverX to CassandraClientX in a separate commit. @llinder etc let me know if you disagree with this choice.

@codefromthecrypt
Copy link
Member Author

OK I think this is good for a first cut. I'll integrate in zipkin and if it works have no more changes planned for the initial impl. We can always add more functionality later (and likely will need to)

@codefromthecrypt codefromthecrypt force-pushed the cassandra branch 3 times, most recently from 73fa549 to 0f3eca9 Compare May 15, 2017 02:38
@codefromthecrypt
Copy link
Member Author

updated to use the SpanCustomizer type

@codefromthecrypt codefromthecrypt force-pushed the cassandra branch 2 times, most recently from 3ee0eba to 8f0a91a Compare May 15, 2017 07:25
@codefromthecrypt
Copy link
Member Author

I'm not going to merge this until some folks savvy with cassandra have a chance to play around with it.

Adrian Cole added 2 commits May 15, 2017 17:06
This contains tracing instrumentation for [Cassandra](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tracing/Tracing.java) and the [DataStax Java Driver](https://github.com/datastax/java-driver).

`brave.cassandra.Tracing` extracts trace state from the custom payload
of incoming requests. How long each request takes, each suboperation,
and relevant tags like the session ID are reported to Zipkin.

`brave.cassandra.driver.TracingSession` tracks the client-side of cassandra and
adds trace context to the custom payload of outgoing requests. If
server integration is in place, cassandra will contribute data to these
RPC spans.
@codefromthecrypt
Copy link
Member Author

ps considering the scope of this includes 3 modules, it might make more sense to bump this to a separate repository brave-cassandra. Any thoughts?

@devinsba
Copy link
Member

👍 to new repo if there will be a bunch of churn on that. Not sure what the cadence is for the cassandra project

@llinder
Copy link
Member

llinder commented May 15, 2017

Separate repository sounds good to me.

@llinder
Copy link
Member

llinder commented May 15, 2017

Looks like a solid start. I will give this a go in one of our services in pre-production as soon as I can. Just want to finish up the Ratpack work first.

if (!span.isNoop()) parser.request(statement, span.kind(CLIENT));

// o.a.c.tracing.Tracing.newSession must use the same propagation format
if (version.compareTo(ProtocolVersion.V4) >= 0) {
Copy link
Member

@michaelsembwever michaelsembwever May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it break the request if outgoingPayload is added to it against a server that doesn't support it (ie < V4) ?
(if not, then why not just add it anyway…)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'd know better than me :) To test this though, I suppose we could add an invoker test that uses an older version of cassandra..

span.finish();
throw e;
}
Futures.addCallback(result, new FutureCallback<ResultSet>() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we any idea on the impact adding all the callbacks will cause.
EG in a client that processes 10k+ requests per second this could cause trouble, and a suitable load-shedding mechanism be appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh actually looking at the code, this is a bit silly.. I am checking span.isNoop for each callback. Will buff this out and move the conditional in front. I don't think we'd want to sample 10K spans/second, so this should take care of that concern.

Anyway, you highlight something which is that we don't yet have benchmarks on this for normal vs instrumented case.

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented May 16, 2017 via email

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented May 16, 2017 via email

ZipkinTraceState state = (ZipkinTraceState) get();
if (state != null) state.incoming.finish();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come you didn't also override public void doneWithLocalSession(TraceState) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly an oversight?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or public TraceState initializeFromMessage(final MessageIn<?> message) ?
Or public Map<String, byte[]> getTraceHeaders()

(by not implementing outbound tracing, you won't be getting the tracing across the nodes within the cluster)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly I only implemented barebones to get things off the ground. Notably there's a lot of moving parts just in basics, so I was hoping to get something together that people can contribute to to add more features.


@Override protected final TraceState newTraceState(InetAddress coordinator, UUID sessionId,
TraceType traceType) {
throw new AssertionError();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this an AssertionError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because this method shouldn't be called because we override what calls it. That said, if the upstream code drifts that could cease to be the case.

@codefromthecrypt
Copy link
Member Author

I've moved all this code here for future updates. feel free to adjust anything! https://github.com/openzipkin/brave-cassandra

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented May 16, 2017 via email

@codefromthecrypt
Copy link
Member Author

ok! the brave-cassandra repo is in business. For example, you can grab the snapshot cassandra jar here http://oss.jfrog.org/oss-snapshot-local/io/zipkin/brave/cassandra/brave-instrumentation-cassandra/0.0.1-SNAPSHOT/

Thanks for the review folks, and please direct all future things here, por favor! https://github.com/openzipkin/brave-cassandra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants