Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vulture Improvements #898

Closed
wants to merge 9 commits into from
Closed

Vulture Improvements #898

wants to merge 9 commits into from

Conversation

zalegrala
Copy link
Contributor

@zalegrala zalegrala commented Aug 19, 2021

What this PR does:
Refactor vulture to send traces using the oltp exporter.
Add option to send traces over a longer period of time, in addition to short traces.
Begin comparing the trace received from a query to the trace we sent.

Which issue(s) this PR fixes:
Fixes #791

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala zalegrala force-pushed the vultureHandling branch 2 times, most recently from 7fa1630 to e3dab69 Compare August 19, 2021 21:32
tm.missingSpans++
}

count := spanCount(trace)
if count != complete.spanCount {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent improvement and we will merge as is, but the long term vision is to regenerate the entire trace from the seed and check all events/attributes/spans. i.e. revalidate the entire trace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I couldn't figure out was why attributes weren't being sent. There is this notion of "recording" or not, and the otlp code seems to skip doing anything with attributes if the exporter isn't recording. If we can figure out how to enable that bit, then checking the attributes should be extensible from the structure that is in place now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another way to get the attributes on the trace. I'll see if I can get something in place to include more detail that we can check for in the trace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to inject some randomness into the attributes and ensure they get validated on the query.


// Vulture is used to send traces to Tempo, and then read those traces back out to verify that the service is operating correctly.
type Vulture struct {
completeChan chan completeTrace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the original design was to seed the random number generator with known values anchored on timestamps. this way vulture can query for any trace in the past 2 weeks and know exactly what it should look like without having to keep anything in memory. i'd prefer keeping that design unless there are strong reasons to do otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed like there was a race happening where we'd try to discover a past time stamp that was never sent and then check for it, and get a 404 back. I like Vulture being stateless, in that nothing gets persisted between restarts. Is there a middle ground here between trying to rediscover an old timestamp and keeping track of every trace id? We could perhaps continue to query traces that had been sent in a sampled way if checking for older traces is important. I definitely didn't factor that in. Though, the long running traces should ensure that traces pull from multiple blocks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generating and then validating a trace from a past timestamp is stateless :) which is one reason I like it so much. It's easy to validate a complete trace that was pushed last week without having to store it somewhere. I'd really like to keep it.

I'm willing to endure a small number of 404s to keep this design.

We could perhaps continue to query traces that had been sent in a sampled way if checking for older traces is important.

Querying older traces is very important b/c it ensures that nothing changes as the trace is cut, flushed to the backend and repeatedly compacted over 2 weeks.

@zalegrala
Copy link
Contributor Author

I'll close this out and open from my fork.

@zalegrala zalegrala closed this Aug 20, 2021
@zalegrala zalegrala deleted the vultureHandling branch August 20, 2021 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend vulture to generate long-running traces
2 participants