Passing trace context to signalfx #509
Conversation
Gerald Rule: Copy Observability on Veneur, Unilog, Falconer pull requests
cc @stripe/observability
@@ -47,15 +47,16 @@ func (c *collection) submit(ctx context.Context, cl *trace.Client) error {
 	errorCh := make(chan error, len(c.pointsByKey)+1)

 	submitOne := func(client dpsink.Sink, points []*datapoint.Datapoint) {
-		span, ctx := trace.StartSpanFromContext(ctx, "")
+		span, childCtx := trace.StartSpanFromContext(ctx, "")
+		span.SetTag("datapoint_count", len(points))
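For context on the change itself: inside the closure, span, ctx := ... shadows the outer ctx, while binding the derived context to a distinct childCtx makes it explicit that downstream sink calls receive the child span's context and therefore inherit the parent's trace. A minimal runnable sketch of that pattern, using context.WithValue as a stand-in for the trace package (all names here are illustrative, not the PR's code):

package main

import (
	"context"
	"fmt"
)

type spanKey struct{}

// startSpanFromContext is a stand-in (assumed, not veneur's API) that
// derives a child context carrying the new span's name.
func startSpanFromContext(ctx context.Context, name string) (string, context.Context) {
	return name, context.WithValue(ctx, spanKey{}, name)
}

func main() {
	ctx := context.WithValue(context.Background(), spanKey{}, "collection.submit")

	submitOne := func(points []int) {
		// Naming the derived context childCtx, rather than shadowing ctx,
		// makes it clear which context the sink call should receive; in the
		// real code this is what gets passed along to the SignalFx client.
		span, childCtx := startSpanFromContext(ctx, "submitOne")
		fmt.Printf("span=%s datapoint_count=%d child_ctx_span=%v\n",
			span, len(points), childCtx.Value(spanKey{}))
	}
	submitOne(make([]int, 10))
}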
This might be best done as a Log...
It seems fine to do it this way - I can imagine interesting correlations that we'd lose sight of if we didn't have the information available in tracing.
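To make the trade-off in this thread concrete, a hedged sketch of the two options (the span type and logger here are stand-ins for illustration, not code from this PR):

package main

import (
	log "github.com/sirupsen/logrus"
)

// fakeSpan is a stand-in for a trace span, just to contrast the two options.
type fakeSpan struct{ tags map[string]interface{} }

func (s *fakeSpan) SetTag(k string, v interface{}) { s.tags[k] = v }

func main() {
	points := make([]int, 25)
	span := &fakeSpan{tags: map[string]interface{}{}}

	// Option kept in the PR: tag the span, so datapoint_count is queryable
	// alongside the trace and correlates with span duration.
	span.SetTag("datapoint_count", len(points))

	// Alternative raised above: log it instead; cheaper per-span, but the
	// value is no longer attached to the trace.
	log.WithField("datapoint_count", len(points)).Info("submitting datapoints")
}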
proxy.go
Outdated
// We do this after we've fetched info so we don't hold the lock during long
// queries, timeouts or errors. The flusher can lock the mutex and prevent us
// from updating at the same time.
func updateRing(destinations []string, ring *consistent.Consistent, mtx *sync.Mutex) {
	mtx.Lock()
	defer mtx.Unlock()
I realize this was here before, but if we care about performance enough to be talking about "at the last moment" in this segment, then it's better not to use a defer.
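For illustration, a version without the defer might look like the sketch below (a hypothetical variant, not the commit that landed; the import path and the ring.Set body are assumptions). With a critical section this short, the explicit Unlock keeps the lock hold time obvious and skips defer's small per-call overhead:

package main

import (
	"sync"

	"stathat.com/c/consistent" // assumed import path for the hash ring
)

// updateRing without the defer: the critical section is a single call, so
// unlocking explicitly keeps it tight on this hot path.
func updateRing(destinations []string, ring *consistent.Consistent, mtx *sync.Mutex) {
	mtx.Lock()
	ring.Set(destinations) // assumed body: replace the ring's member list
	mtx.Unlock()
}

func main() {
	ring := consistent.New()
	var mtx sync.Mutex
	updateRing([]string{"a:8080", "b:8080"}, ring, &mtx)
}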
Fixed.
80f39bc to 5c56af9
I've found another change that I'd like to add here. The gauges tracking batch size in forwarding should be using counters. Since these aren't being globalized, we can just use the rollup designation to get either an easy sum of all metrics forwarded or an average of batch size. With a gauge, this just captures the last batch size of a gauge window.
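As a concrete illustration of the counter-versus-gauge point, a minimal sketch using the DataDog statsd client as a stand-in for however veneur reports these internally (the client, address, and metric names are all assumptions for illustration):

package main

import (
	"log"

	"github.com/DataDog/datadog-go/statsd" // stand-in client, illustration only
)

func main() {
	client, err := statsd.New("127.0.0.1:8125")
	if err != nil {
		log.Fatal(err)
	}
	batch := make([]int, 50) // stand-in for one forwarded batch

	// Counter: backend rollups can sum across batches (total metrics
	// forwarded) or divide by the batch count for an average batch size.
	client.Count("veneur_proxy.forward.datapoints", int64(len(batch)), nil, 1.0)

	// Gauge (the current behavior being questioned): aggregation keeps only
	// the last reported batch size in each gauge window.
	client.Gauge("veneur_proxy.forward.batch_size", float64(len(batch)), nil, 1.0)
}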
Summary
Motivation
Attempting to get better visibility into the veneur-proxy forwarding process.
Test plan
No additional tests required. Will deploy to a subset in QA for testing when ready.
Rollout/monitoring/revert plan
After deployment, traces in the SignalFx sink will be reordered to reflect parent-child span relationships.