Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can a tracer append data to an existing span, i.e. span ID that was created in another process #22

Closed
yurishkuro opened this issue Jan 15, 2016 · 16 comments

Comments

@yurishkuro
Copy link
Member

A use case we're working on right now is we want haproxy to add Logs to the current trace that's passing through it. We already have something in place that can sniff headers from passing requests and write the out to kafka along with some timers. The plan is to have a custom reader of that kafka topic that will extract the timestamps and trace IDs of when the request was passing through haproxy and create additional span Logs that will be saved to Cassandra via somewhat of a back door. It would be nice to have a clean OpenTracing way of doing that, which requires not only the ability to inject the timestamps into Logs externally (per #20), but in our case do that without starting/finishing the span. I know Ben you will be against that last part :-), and say we could just create a new span instead, but doing so would require modifying in-flight request inside haproxy to replace the client trace/span ID with haproxy span ID, and that is much more intrusive way of doing it vs. just adding logs.

@yurishkuro
Copy link
Member Author

@bensigelman said

Oh, just saw your second message: so you basically want to be able to record logs independent of spans?

Most like adding logs to a span that was managed elsewhere.

@codefromthecrypt
Copy link
Contributor

@bensigelman https://github.com/bensigelman said

Oh, just saw your second message: so you basically want to be able to
record logs independent of spans?

Most like adding logs to a span that was managed elsewhere.

I think we are getting quite close to how zipkin (and I guess dapper)
worked. Ex. Trace.record (Span.addAnnotation in zipkin) allows unrelated
systems, who only have access to the span identifiers, to participate
out-of-band. In fact, in zipkin, this is why each log (annotation) had an
endpoint, as this was to tell who actually logged it (ex if it was a proxy).

In implementation, finagle prefers to send complete spans, but will still
flush "incomplete ones" such as those containing just annotations.
https://github.com/twitter/finagle/blob/develop/finagle-zipkin/src/main/scala/com/twitter/finagle/zipkin/thrift/DeadlineSpanMap.scala#L69

In practical terms, I'd expect OpenTracing's Tracer to be used as a partial
api, as it may never have all features (ex Trace.clear()), etc. So one way
to address this is just to fill features in the zipkin or otherwise
tracers, and use deleting that code as a driver for an issue like this. Of
course, I'd prefer to be able to account for "unmanaged events" now, as
that solves various existing use cases.

@bhs
Copy link
Contributor

bhs commented Jan 31, 2016

@yurishkuro I'm going to rename this to be more general, hope that's ok. rename it back if not.

@bhs bhs changed the title should we have some way of appending data to existing span? we need a way to describe spans and logs/events that happened in the past Jan 31, 2016
@yurishkuro
Copy link
Member Author

@bensigelman don't mind the title change, they are both applicable and likely to be resolved by the same API.

I think one of the reasons Zipkin did not have the timestamp & duration in the spans is because of its backend ability to receive multiple submissions of the same span ID and aggregate events from all of them into a single timeline, which was then used to infer startTs and duration. This is what this ticket is really about. We chose a simpler API in OT for reporting spans via start/finish methods, which is great as it covers the majority of use cases, but for these less frequent cases it doesn't work and we need something else. It's not necessarily "past spans" as the routing layer may be emitting an event into the ongoing span ID, but both can be solved with the same API.

BTW, did Dapper require at most one process per span, the same way that htrace does? If so, I'm curious how it dealt with the ^^^ use cases.

@bhs
Copy link
Contributor

bhs commented Jan 31, 2016

Dapper used two processes for each RPC span (which I think is regrettable in retrospect)...

@bhs bhs changed the title we need a way to describe spans and logs/events that happened in the past we need a way to specify explicit timestamps for spans and logs/events Jan 31, 2016
@bhs
Copy link
Contributor

bhs commented Jan 31, 2016

(renamed again!)

@bhs
Copy link
Contributor

bhs commented Feb 10, 2016

This is basically the same thing as #20... I will close that one and cross-ref.

@bhs
Copy link
Contributor

bhs commented Feb 17, 2016

@yurishkuro close-able now, yes?

@yurishkuro
Copy link
Member Author

Let me rename it to the original subject and keep open (and cc @adriancole). I think the use case kind of works for us due to Zipkin specifics.

The scenario is:

  • Process A calls process B over RPC
    • Per Zipkin spec, they both report the same span ID
  • Process A marks the span with span.kind=RPC_CLIENT
  • Process B marks the span with span_kind=RPC_SERVER
    • The span kind tags cause tracer to logs cs/cr events in A, and sr/ss events in B.

Meanwhile, the actual RPC request goes through haproxy, which is not able to report spans itself, but writes logs that are converted to spans by a separate process H:

  • H has access to headers and can do extractor.join_trace(timestamp=x)
    • per Zipkin convention that creates the same span ID as A and B had
  • H logs two events, haproxy-in and haproxy-out, with historical timestamps
  • H calls finish(duration=y) on the span to get it reported

I think this should work (in Zipkin), due to the following "features":

  • Span from H is treated as local span, and therefore does not log cs/cr/ss/sr events
  • join_trace returns span with the same ID as on the wire
  • Zipkin UI merges submitted spans with the same ID, adjusts for clock skew in H and B timestamps, and merges startTime=min(...) and duration=max(...)

I don't think it will work in tracers that assign unique span ID to each process (two spans per RPC).

@yurishkuro yurishkuro changed the title we need a way to specify explicit timestamps for spans and logs/events How can a tracer append data to an existing span, i.e. span ID that was created in another process Feb 17, 2016
@bhs
Copy link
Contributor

bhs commented Feb 17, 2016

I suppose a tracing impl could accept the span_id (or any other "tracer state") as a tag and use the fully-overloaded version of Tracer.start_span(...) to hard-code everything about said span. Not really in the spirit of "Span Tags" but would get the job done IIUC.

@codefromthecrypt
Copy link
Contributor

@yurishkuro followed you until we got to where H is actually the one finishing the span.

What it sounds like is that the haproxy won't actually change the width of the span (startTs, duration), rather it would add annotations around the "sr" "ss".. I wouldn't say this has to be a "local span", rather a contribution to an existing span. Ex. H is adding two annotations, and needs to send required ids (currently trace_id, span_id, parent_id) to allocate them properly *. The (local)component tag is just a nice to have.

sideline on clock skew:
Currently clock skew adjustment works in relation to the "core annotations", so I'm not sure if skew from H would actually .. unskew? solvable probably!

  • ps see this for maybe not requiring parent_id.

@yurishkuro
Copy link
Member Author

@adriancole

followed you until we got to where H is actually the one finishing the span.
What it sounds like is that the haproxy won't actually change the width of the span (startTs, duration),

well, yes that's the end goal, but OpenTracing API does not have a way to report anything about a span without starting and finishing it. That's why it's hacky - Zipkin will min/max the startTs/duration, so it will effectively "suppress" the fact that H started/finished span, and just treat it as if H simply added two non-core annotations. The fact that a span is "local' by default is important to ensure H does not emit core annotations.

You may be right about the clock skew. We probably need to revisit that thing anyway, to ensure startTs+duration take priority over core annotations in skew calculations.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Feb 18, 2016 via email

@yurishkuro
Copy link
Member Author

@bensigelman coming back to this thread. We have a number of routing services, including haproxy, that need this kind of "add logs to existing span" solution, because injecting a child span between client and server is not always feasible (due to routing service's implementation language), and not always desirable (as it confuses the RPC parent/child view and service dependencies). And there's still no way in OT to attach data to an existing span from another process.

So concrete proposal: what if we had a special type of span reference for this scenario? Don't know the name of it, something like annotationsFor:

spCtx := tracer.Extract(Format.xxx, carrier)
span := tracer.StartSpan("router",
    opentracing.SpanReference(AnnotationsFor, spCtx)
)

Given this reference type, the implementations have a lot of flexibility on how to handle it. They can choose to share spanID or create a unique id, can merge the data into the lead span before or after storage, can handle it differently in the UI, etc.

@bhs
Copy link
Contributor

bhs commented Oct 19, 2016

@yurishkuro sorry for the delay!

Hm, seems to me that this could be more like tracer.StartSpan("router", opentracing.SpanReference(Identity, spCtx)), which would generate a Span with the same ids/etc as spCtx. (Much like what jaeger / "zipkin classic" do for RPC server spans)

Thoughts?

If we wanted something more in the spirit of AnnotationsFor, I would probably suggest opentracing.About or something more general than AnnotationsFor, but that's just me.

@yurishkuro
Copy link
Member Author

About sounds good. Identity has a bit more confusing connotation - I am not sure I want to have that as a standard ref type. For example, in the classic zipkin model the server side span, even though it shares the span ID with the client span, nonetheless has its own start/end timestamps. With About span I could see tracing system ignoring the start/end timestamps and only using the logs/tags - it's really up to the implementation (specifically the UI).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants