How can a tracer append data to an existing span, i.e. span ID that was created in another process #22

yurishkuro · 2016-01-15T15:43:53Z

A use case we're working on right now is we want haproxy to add Logs to the current trace that's passing through it. We already have something in place that can sniff headers from passing requests and write the out to kafka along with some timers. The plan is to have a custom reader of that kafka topic that will extract the timestamps and trace IDs of when the request was passing through haproxy and create additional span Logs that will be saved to Cassandra via somewhat of a back door. It would be nice to have a clean OpenTracing way of doing that, which requires not only the ability to inject the timestamps into Logs externally (per #20), but in our case do that without starting/finishing the span. I know Ben you will be against that last part :-), and say we could just create a new span instead, but doing so would require modifying in-flight request inside haproxy to replace the client trace/span ID with haproxy span ID, and that is much more intrusive way of doing it vs. just adding logs.

yurishkuro · 2016-01-15T15:46:42Z

@bensigelman said

Oh, just saw your second message: so you basically want to be able to record logs independent of spans?

Most like adding logs to a span that was managed elsewhere.

codefromthecrypt · 2016-01-16T02:42:16Z

@bensigelman https://github.com/bensigelman said

Oh, just saw your second message: so you basically want to be able to
record logs independent of spans?

Most like adding logs to a span that was managed elsewhere.

I think we are getting quite close to how zipkin (and I guess dapper)
worked. Ex. Trace.record (Span.addAnnotation in zipkin) allows unrelated
systems, who only have access to the span identifiers, to participate
out-of-band. In fact, in zipkin, this is why each log (annotation) had an
endpoint, as this was to tell who actually logged it (ex if it was a proxy).

In implementation, finagle prefers to send complete spans, but will still
flush "incomplete ones" such as those containing just annotations.
https://github.com/twitter/finagle/blob/develop/finagle-zipkin/src/main/scala/com/twitter/finagle/zipkin/thrift/DeadlineSpanMap.scala#L69

In practical terms, I'd expect OpenTracing's Tracer to be used as a partial
api, as it may never have all features (ex Trace.clear()), etc. So one way
to address this is just to fill features in the zipkin or otherwise
tracers, and use deleting that code as a driver for an issue like this. Of
course, I'd prefer to be able to account for "unmanaged events" now, as
that solves various existing use cases.

bhs · 2016-01-31T18:49:20Z

@yurishkuro I'm going to rename this to be more general, hope that's ok. rename it back if not.

yurishkuro · 2016-01-31T20:26:45Z

@bensigelman don't mind the title change, they are both applicable and likely to be resolved by the same API.

I think one of the reasons Zipkin did not have the timestamp & duration in the spans is because of its backend ability to receive multiple submissions of the same span ID and aggregate events from all of them into a single timeline, which was then used to infer startTs and duration. This is what this ticket is really about. We chose a simpler API in OT for reporting spans via start/finish methods, which is great as it covers the majority of use cases, but for these less frequent cases it doesn't work and we need something else. It's not necessarily "past spans" as the routing layer may be emitting an event into the ongoing span ID, but both can be solved with the same API.

BTW, did Dapper require at most one process per span, the same way that htrace does? If so, I'm curious how it dealt with the ^^^ use cases.

bhs · 2016-01-31T21:41:36Z

Dapper used two processes for each RPC span (which I think is regrettable in retrospect)...

bhs · 2016-01-31T21:42:22Z

(renamed again!)

bhs · 2016-02-10T05:46:33Z

This is basically the same thing as #20... I will close that one and cross-ref.

bhs · 2016-02-17T22:24:59Z

@yurishkuro close-able now, yes?

yurishkuro · 2016-02-17T22:47:18Z

Let me rename it to the original subject and keep open (and cc @adriancole). I think the use case kind of works for us due to Zipkin specifics.

The scenario is:

Process A calls process B over RPC
- Per Zipkin spec, they both report the same span ID
Process A marks the span with span.kind=RPC_CLIENT
Process B marks the span with span_kind=RPC_SERVER
- The span kind tags cause tracer to logs cs/cr events in A, and sr/ss events in B.

Meanwhile, the actual RPC request goes through haproxy, which is not able to report spans itself, but writes logs that are converted to spans by a separate process H:

H has access to headers and can do extractor.join_trace(timestamp=x)
- per Zipkin convention that creates the same span ID as A and B had
H logs two events, haproxy-in and haproxy-out, with historical timestamps
H calls finish(duration=y) on the span to get it reported

I think this should work (in Zipkin), due to the following "features":

Span from H is treated as local span, and therefore does not log cs/cr/ss/sr events
join_trace returns span with the same ID as on the wire
Zipkin UI merges submitted spans with the same ID, adjusts for clock skew in H and B timestamps, and merges startTime=min(...) and duration=max(...)

I don't think it will work in tracers that assign unique span ID to each process (two spans per RPC).

bhs · 2016-02-17T22:55:42Z

I suppose a tracing impl could accept the span_id (or any other "tracer state") as a tag and use the fully-overloaded version of Tracer.start_span(...) to hard-code everything about said span. Not really in the spirit of "Span Tags" but would get the job done IIUC.

codefromthecrypt · 2016-02-18T07:14:29Z

@yurishkuro followed you until we got to where H is actually the one finishing the span.

What it sounds like is that the haproxy won't actually change the width of the span (startTs, duration), rather it would add annotations around the "sr" "ss".. I wouldn't say this has to be a "local span", rather a contribution to an existing span. Ex. H is adding two annotations, and needs to send required ids (currently trace_id, span_id, parent_id) to allocate them properly *. The (local)component tag is just a nice to have.

sideline on clock skew:
Currently clock skew adjustment works in relation to the "core annotations", so I'm not sure if skew from H would actually .. unskew? solvable probably!

ps see this for maybe not requiring parent_id.

yurishkuro · 2016-02-18T12:21:35Z

@adriancole

followed you until we got to where H is actually the one finishing the span.
What it sounds like is that the haproxy won't actually change the width of the span (startTs, duration),

well, yes that's the end goal, but OpenTracing API does not have a way to report anything about a span without starting and finishing it. That's why it's hacky - Zipkin will min/max the startTs/duration, so it will effectively "suppress" the fact that H started/finished span, and just treat it as if H simply added two non-core annotations. The fact that a span is "local' by default is important to ensure H does not emit core annotations.

You may be right about the clock skew. We probably need to revisit that thing anyway, to ensure startTs+duration take priority over core annotations in skew calculations.

codefromthecrypt · 2016-02-18T12:40:15Z

thx for the loopback i-i

yurishkuro · 2016-10-17T19:25:23Z

@bensigelman coming back to this thread. We have a number of routing services, including haproxy, that need this kind of "add logs to existing span" solution, because injecting a child span between client and server is not always feasible (due to routing service's implementation language), and not always desirable (as it confuses the RPC parent/child view and service dependencies). And there's still no way in OT to attach data to an existing span from another process.

So concrete proposal: what if we had a special type of span reference for this scenario? Don't know the name of it, something like annotationsFor:

spCtx := tracer.Extract(Format.xxx, carrier)
span := tracer.StartSpan("router",
    opentracing.SpanReference(AnnotationsFor, spCtx)
)

Given this reference type, the implementations have a lot of flexibility on how to handle it. They can choose to share spanID or create a unique id, can merge the data into the lead span before or after storage, can handle it differently in the UI, etc.

bhs · 2016-10-19T17:16:07Z

@yurishkuro sorry for the delay!

Hm, seems to me that this could be more like tracer.StartSpan("router", opentracing.SpanReference(Identity, spCtx)), which would generate a Span with the same ids/etc as spCtx. (Much like what jaeger / "zipkin classic" do for RPC server spans)

Thoughts?

If we wanted something more in the spirit of AnnotationsFor, I would probably suggest opentracing.About or something more general than AnnotationsFor, but that's just me.

yurishkuro · 2016-10-19T18:36:24Z

About sounds good. Identity has a bit more confusing connotation - I am not sure I want to have that as a standard ref type. For example, in the classic zipkin model the server side span, even though it shares the span ID with the client span, nonetheless has its own start/end timestamps. With About span I could see tracing system ignoring the start/end timestamps and only using the logs/tags - it's really up to the implementation (specifically the UI).

yurishkuro mentioned this issue Jan 15, 2016

need some way (pan-platform) to record a Span that happened in the past #20

Closed

This was referenced Jan 27, 2016

Merge Tracer.join_trace into Tracer.start_trace opentracing/opentracing-python#13

Closed

Simplify the API: removing TraceContext and TraceContextSource opentracing/opentracing-java#8

Merged

bhs changed the title ~~should we have some way of appending data to existing span?~~ we need a way to describe spans and logs/events that happened in the past Jan 31, 2016

bhs mentioned this issue Jan 31, 2016

Simplify the API: logging and timestamped annotations opentracing/opentracing-java#9

Merged

bhs changed the title ~~we need a way to describe spans and logs/events that happened in the past~~ we need a way to specify explicit timestamps for spans and logs/events Jan 31, 2016

yurishkuro mentioned this issue Feb 1, 2016

How can a user request a trace to be sampled #43

Closed

2 tasks

bhs mentioned this issue Feb 10, 2016

consolidate span creation methods opentracing/opentracing-go#50

Merged

yurishkuro changed the title ~~we need a way to specify explicit timestamps for spans and logs/events~~ How can a tracer append data to an existing span, i.e. span ID that was created in another process Feb 17, 2016

opentracing-importer mentioned this issue Nov 15, 2016

How can a tracer append data to an existing span, i.e. span ID that was created in another process opentracing/specification#3

Open

bhs closed this as completed Nov 15, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How can a tracer append data to an existing span, i.e. span ID that was created in another process #22

How can a tracer append data to an existing span, i.e. span ID that was created in another process #22

yurishkuro commented Jan 15, 2016

yurishkuro commented Jan 15, 2016

codefromthecrypt commented Jan 16, 2016

bhs commented Jan 31, 2016

yurishkuro commented Jan 31, 2016

bhs commented Jan 31, 2016

bhs commented Jan 31, 2016

bhs commented Feb 10, 2016

bhs commented Feb 17, 2016

yurishkuro commented Feb 17, 2016

bhs commented Feb 17, 2016

codefromthecrypt commented Feb 18, 2016

yurishkuro commented Feb 18, 2016

codefromthecrypt commented Feb 18, 2016 via email

yurishkuro commented Oct 17, 2016

bhs commented Oct 19, 2016

yurishkuro commented Oct 19, 2016

How can a tracer append data to an existing span, i.e. span ID that was created in another process #22

How can a tracer append data to an existing span, i.e. span ID that was created in another process #22

Comments

yurishkuro commented Jan 15, 2016

yurishkuro commented Jan 15, 2016

codefromthecrypt commented Jan 16, 2016

bhs commented Jan 31, 2016

yurishkuro commented Jan 31, 2016

bhs commented Jan 31, 2016

bhs commented Jan 31, 2016

bhs commented Feb 10, 2016

bhs commented Feb 17, 2016

yurishkuro commented Feb 17, 2016

bhs commented Feb 17, 2016

codefromthecrypt commented Feb 18, 2016

yurishkuro commented Feb 18, 2016

codefromthecrypt commented Feb 18, 2016 via email

yurishkuro commented Oct 17, 2016

bhs commented Oct 19, 2016

yurishkuro commented Oct 19, 2016