New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vulture Improvements #898
Vulture Improvements #898
Conversation
Here we ensure that the tempo service receives a trace before vulture is done sending spans for that trace ID. This ensures that tempo is tested for retrieving a trace that may exist in multiple blocks.
7fa1630
to
e3dab69
Compare
e3dab69
to
bf3f2e7
Compare
tm.missingSpans++ | ||
} | ||
|
||
count := spanCount(trace) | ||
if count != complete.spanCount { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent improvement and we will merge as is, but the long term vision is to regenerate the entire trace from the seed and check all events/attributes/spans. i.e. revalidate the entire trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I couldn't figure out was why attributes weren't being sent. There is this notion of "recording" or not, and the otlp code seems to skip doing anything with attributes if the exporter isn't recording. If we can figure out how to enable that bit, then checking the attributes should be extensible from the structure that is in place now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another way to get the attributes on the trace. I'll see if I can get something in place to include more detail that we can check for in the trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this to inject some randomness into the attributes and ensure they get validated on the query.
|
||
// Vulture is used to send traces to Tempo, and then read those traces back out to verify that the service is operating correctly. | ||
type Vulture struct { | ||
completeChan chan completeTrace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the original design was to seed the random number generator with known values anchored on timestamps. this way vulture can query for any trace in the past 2 weeks and know exactly what it should look like without having to keep anything in memory. i'd prefer keeping that design unless there are strong reasons to do otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed like there was a race happening where we'd try to discover a past time stamp that was never sent and then check for it, and get a 404 back. I like Vulture being stateless, in that nothing gets persisted between restarts. Is there a middle ground here between trying to rediscover an old timestamp and keeping track of every trace id? We could perhaps continue to query traces that had been sent in a sampled way if checking for older traces is important. I definitely didn't factor that in. Though, the long running traces should ensure that traces pull from multiple blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generating and then validating a trace from a past timestamp is stateless :) which is one reason I like it so much. It's easy to validate a complete trace that was pushed last week without having to store it somewhere. I'd really like to keep it.
I'm willing to endure a small number of 404s to keep this design.
We could perhaps continue to query traces that had been sent in a sampled way if checking for older traces is important.
Querying older traces is very important b/c it ensures that nothing changes as the trace is cut, flushed to the backend and repeatedly compacted over 2 weeks.
I'll close this out and open from my fork. |
What this PR does:
Refactor vulture to send traces using the oltp exporter.
Add option to send traces over a longer period of time, in addition to short traces.
Begin comparing the trace received from a query to the trace we sent.
Which issue(s) this PR fixes:
Fixes #791
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]