As a developer/operator I want to be able to see what queries Atlas is running #495

Closed
schlosna opened this issue May 24, 2016 · 13 comments

@schlosna
Contributor

When running AtlasDB under real workloads, one wants to be able to enable additional tracing and produce tracing spans in a format that could be consumed into a distributed system tracing tool such as Zipkin. This would include the raw Cassandra Thrift, CQL, and/or SQL queries being executed.

@davidscottcohen
Contributor

I believe that Brave is the library we'd want to look at: https://github.com/openzipkin/brave

@jboreiko
Contributor

Do we have a good sense of how often this would be used? This question seems to have come up several times, and I would like a better understanding of its value in the field. I’m having trouble seeing it prioritized anytime soon given our current spree of correctness issues, as well as our more general lack of debugging tools until console can be easily deployed.

@schlosna
Contributor Author

@jboreiko I filed this mainly for tracking, as I had talked about some of this with @rjullman. Now that we've migrated the C*KVS and DbKVS over, we've lost some of the more granular distributed tracing we previously had, which is very useful when tracking down service latency/response time in cases where Atlas is one piece of a broader service composition.

This partially relates to palantir/conjure-java-runtime#115 and the main challenge here is hooking up cross-thread tracing via PTExecutors and determining where/how we want the produced trace spans handled to integrate with existing systems like Zipkin and/or log consumers. I'll look into getting a PR for this at some point in the next few weeks if no one grabs it before then.
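To make the cross-thread part concrete, here is a minimal Java sketch assuming the trace ID lives in a `ThreadLocal`. `TraceContext` and `TracingExecutor` are hypothetical names for illustration only, not the PTExecutors or Brave API: the wrapper captures the submitting thread's trace ID at submit time and restores it on the worker thread only while the task runs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical per-thread trace context; not the Brave or PTExecutors API.
final class TraceContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TraceContext() {}

    static String currentTraceId() {
        return CURRENT.get();
    }

    static void setTraceId(String traceId) {
        if (traceId == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(traceId);
        }
    }

    // Captures the submitting thread's trace ID now and restores it on the
    // worker thread only for the duration of the task.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = currentTraceId();
        return () -> {
            String previous = currentTraceId();
            setTraceId(captured);
            try {
                return task.call();
            } finally {
                // Reset so pooled threads do not leak the trace ID into unrelated work.
                setTraceId(previous);
            }
        };
    }
}

// Hypothetical executor decorator: every submitted task inherits the caller's trace ID.
final class TracingExecutor {
    private final ExecutorService delegate;

    TracingExecutor(ExecutorService delegate) {
        this.delegate = delegate;
    }

    <T> Future<T> submit(Callable<T> task) {
        return delegate.submit(TraceContext.wrap(task));
    }
}
```

Restoring the previous value in `finally` matters for pooled threads: without it, a later task that was submitted without a trace would run under a stale trace ID.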

@jkozlowski
Contributor

Is there a plan for Cassandra? I know that since 3.4 it has allowed custom tracers to be plugged in, as per http://thelastpickle.com/blog/2015/12/07/using-zipkin-for-full-stack-tracing-including-cassandra.html, but both Atlas and Phoenix are on much older versions. I was wondering what thoughts you had about this.

@schlosna
Contributor Author

http-remoting recently added Zipkin support via Brave in palantir/conjure-java-runtime#142 though we probably need to help push openzipkin/brave#166 along if we want to use that for internal service tracing.

@jkozlowski
Contributor

I received approval from internal open source to work on this, but I am currently hosed by other work. I also talked extensively to @adriancole about the design, so somebody just needs to do it.

@schlosna
Contributor Author

Created http-remoting PR palantir/conjure-java-runtime#235 to put the initial plumbing in place so that Dropwizard-based services and clients have a Brave tracer available. We still need to fix the span collection plumbing, sampling configuration, and a few more things.

@clockfort mentioned he's looking at adding tracing, so assuming we inject a Brave instance into the transaction manager, we should be able to create the tracing wrappers appropriately.
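A rough sketch of what such a tracing wrapper could look like, assuming a hypothetical `QueryRunner` interface standing in for whatever layer actually issues the Thrift/CQL/SQL queries. The decorator times each query and records a span carrying the raw query text; in a real integration the finished span would presumably be reported through the injected Brave instance rather than kept in a list. `QueryRunner`, `Span`, and `TracingQueryRunner` are illustrative names, not existing AtlasDB or Brave types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the layer that issues raw Thrift/CQL/SQL queries.
interface QueryRunner {
    String execute(String query);
}

// Hypothetical finished span; a real integration would report it via Brave instead.
final class Span {
    final String name;
    final String query;
    final long durationNanos;
    final boolean failed;

    Span(String name, String query, long durationNanos, boolean failed) {
        this.name = name;
        this.query = query;
        this.durationNanos = durationNanos;
        this.failed = failed;
    }
}

// Decorator that records one span per query, including queries that throw.
final class TracingQueryRunner implements QueryRunner {
    private final String spanName;
    private final QueryRunner delegate;
    private final List<Span> reported = new ArrayList<>();

    TracingQueryRunner(String spanName, QueryRunner delegate) {
        this.spanName = spanName;
        this.delegate = delegate;
    }

    @Override
    public String execute(String query) {
        long start = System.nanoTime();
        boolean failed = true;
        try {
            String result = delegate.execute(query);
            failed = false;
            return result;
        } finally {
            // Runs on success and on exception, so failed queries are traced too.
            record(new Span(spanName, query, System.nanoTime() - start, failed));
        }
    }

    private synchronized void record(Span span) {
        reported.add(span);
    }

    // Returns a snapshot copy so callers never see concurrent modification.
    synchronized List<Span> reportedSpans() {
        return new ArrayList<>(reported);
    }
}
```

Keeping the tracing as a decorator means it could be switched on or off by choosing which instance gets injected, without touching the query code itself.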

@jboreiko
Contributor

@schlosna glad to help get this pushed through. I was looking at this a bit a few weeks ago and was unsure how to get this to work when AtlasDB doesn't control the service endpoints. Sounds like this solves that problem.

@rhero
Contributor

rhero commented Nov 13, 2016

Slight update to this ticket: we're going to be supporting Cassandra 3.7 soon (#1147), which will have Zipkin integration.

@gsheasby
Contributor

gsheasby commented Jan 6, 2017

@schlosna @jboreiko @clockfort ping: is anyone doing anything about this ticket, or planning to do anything soon? It's been a P1 for almost 2 months with no action.

@gsheasby
Contributor

After #1385 goes in, we'll need to connect the trace IDs to the Cassandra pool.

@schlosna is running point on this one.

@rhero
Contributor

rhero commented Mar 17, 2017

Out of curiosity - is this already done?

@jboreiko
Contributor

Not that I'm aware of; I wouldn't consider this done until we can extract tracing from AtlasDB. It looks like either @gsheasby or @schlosna would have the insight here.

@gmaretic closed this as completed Dec 7, 2018

8 participants