Vulture Improvements #898

zalegrala · 2021-08-19T20:26:08Z

What this PR does:
Refactor vulture to send traces using the otlp exporter.
Add option to send traces over a longer period of time, in addition to short traces.
Begin comparing the trace received from a query to the trace we sent.

Which issue(s) this PR fixes:
Fixes #791

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Here we ensure that the tempo service receives a trace before vulture is done sending spans for that trace ID. This ensures that tempo is tested for retrieving a trace that may exist in multiple blocks.

joe-elliott · 2021-08-20T12:08:09Z

cmd/tempo-vulture/main.go

 		tm.missingSpans++
 	}

+	count := spanCount(trace)
+	if count != complete.spanCount {


Excellent improvement, and we will merge as is, but the long-term vision is to regenerate the entire trace from the seed and check all events, attributes, and spans, i.e. revalidate the entire trace.
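For readers following the hunk above, here is a minimal sketch of what a span-count check could look like. The types are simplified stand-ins rather than the real Tempo protobuf types, and this `spanCount` is an assumption about what the helper does (totalling spans across every batch), not the code in this PR.

```go
package main

import "fmt"

// span, spanGroup, batch and trace are hypothetical stand-ins for the
// nested structure of a queried trace: batches hold groups of spans.
type span struct{ Name string }

type spanGroup struct{ Spans []span }

type batch struct{ Groups []spanGroup }

type trace struct{ Batches []batch }

// spanCount walks every batch and group and totals the spans.
// A nil trace counts as zero spans.
func spanCount(t *trace) int {
	if t == nil {
		return 0
	}
	count := 0
	for _, b := range t.Batches {
		for _, g := range b.Groups {
			count += len(g.Spans)
		}
	}
	return count
}

func main() {
	received := &trace{Batches: []batch{
		{Groups: []spanGroup{{Spans: []span{{Name: "a"}, {Name: "b"}}}}},
		{Groups: []spanGroup{{Spans: []span{{Name: "c"}}}}},
	}}
	expected := 3 // the number of spans the sender recorded for this trace
	if got := spanCount(received); got != expected {
		fmt.Printf("incorrect span count: got %d, want %d\n", got, expected)
		return
	}
	fmt.Println("span count matches")
}
```

A count check only catches dropped or duplicated spans. It says nothing about span contents, which is the gap the comment above points at.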

One thing I couldn't figure out was why attributes weren't being sent. There is this notion of "recording" or not, and the otlp code seems to skip doing anything with attributes if the exporter isn't recording. If we can figure out how to enable that bit, then checking the attributes should be extensible from the structure that is in place now.

There is another way to get the attributes on the trace. I'll see if I can get something in place to include more detail that we can check for in the trace.

I've updated this to inject some randomness into the attributes and ensure they get validated on the query.
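As an illustration of the validation idea, the sketch below compares the attributes that were sent with the attributes returned by a query. It is a hedged example only: `diffAttributes`, the flat `map[string]string` representation, and the `vulture-N` key names are assumptions made for illustration, not the code added in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// diffAttributes returns the sorted keys from expected whose values are
// missing from got or differ in got. Hypothetical helper.
func diffAttributes(expected, got map[string]string) []string {
	var mismatched []string
	for k, want := range expected {
		if v, ok := got[k]; !ok || v != want {
			mismatched = append(mismatched, k)
		}
	}
	sort.Strings(mismatched)
	return mismatched
}

func main() {
	// Attributes attached when the trace was sent.
	sent := map[string]string{"vulture-0": "x7f3", "vulture-1": "k2p9"}
	// Attributes read back from the query; one value has changed.
	received := map[string]string{"vulture-0": "x7f3", "vulture-1": "zzzz"}

	// Any key listed here failed validation.
	fmt.Println(diffAttributes(sent, received))
}
```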

joe-elliott · 2021-08-20T12:12:29Z

cmd/tempo-vulture/vulture.go

+
+// Vulture is used to send traces to Tempo, and then read those traces back out to verify that the service is operating correctly.
+type Vulture struct {
+	completeChan     chan completeTrace


The original design was to seed the random number generator with known values anchored on timestamps. This way vulture can query for any trace in the past 2 weeks and know exactly what it should look like without having to keep anything in memory. I'd prefer keeping that design unless there are strong reasons to do otherwise.

It seemed like there was a race happening where we'd try to discover a past timestamp that was never sent, check for it, and get a 404 back. I like Vulture being stateless, in that nothing gets persisted between restarts. Is there a middle ground here between trying to rediscover an old timestamp and keeping track of every trace ID? We could perhaps continue to query traces that had been sent in a sampled way if checking for older traces is important. I definitely didn't factor that in. That said, the long-running traces should ensure that traces are pulled from multiple blocks.

Generating and then validating a trace from a past timestamp is stateless :) which is one reason I like it so much. It's easy to validate a complete trace that was pushed last week without having to store it somewhere. I'd really like to keep it.

I'm willing to endure a small number of 404s to keep this design.

> We could perhaps continue to query traces that had been sent in a sampled way if checking for older traces is important.

Querying older traces is very important because it ensures that nothing changes as the trace is cut, flushed to the backend, and repeatedly compacted over 2 weeks.

zalegrala · 2021-08-20T20:39:14Z

I'll close this out and open from my fork.

zalegrala added 7 commits August 18, 2021 15:12

Include test for vulture helper

6e2e59e

Refactor to use otlp exporter

2d88e8e

Refactor go routines into Vulture struct

f5f9726

Update vendor to reflect new otel package usage

b2e740a

Add new vulture metric for incorrect span count

0842fa6

Send long traces early and then add to them later

f88e2c6

Here we ensure that the tempo service receives a trace before vulture is done sending spans for that trace ID. This ensures that tempo is tested for retrieving a trace that may exist in multiple blocks.

Implement config option for long running traces

a6c887d

zalegrala force-pushed the vultureHandling branch 2 times, most recently from 7fa1630 to e3dab69 on August 19, 2021 21:32

Clean up for lint

bf3f2e7

zalegrala force-pushed the vultureHandling branch from e3dab69 to bf3f2e7 on August 19, 2021 21:39

joe-elliott reviewed Aug 20, 2021

Inject random attributes to validate on query

de883b5

zalegrala closed this Aug 20, 2021

zalegrala deleted the vultureHandling branch August 20, 2021 20:49
