TraceQL: Grouping #2490

Merged: 19 commits, May 23, 2023

@joe-elliott (Member) commented May 18, 2023

What this PR does:

  • Adds by() and coalesce() to the language. Purposefully only documents by() b/c I can't think of a use case for coalesce() and I want to keep docs focused.
  • Reduces the tempodb search tests from 250 -> 50 traces for test sanity.
  • Adds a rebatchIterator to do the work of persisting the SecondPass callback spansets through the iterators.
  • Correctly persist matched and other spanset attributes to the frontend!
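
To illustrate what grouping means here, the following is a minimal sketch of splitting one set of spans into several groups by an attribute value. It is not Tempo's implementation: the `span` type, the `groupBy` helper, and the attribute names are hypothetical, and spans that lack the attribute are simply grouped under the empty string.

```go
package main

import "fmt"

// span is a hypothetical, simplified span: an id plus string attributes.
type span struct {
	id    string
	attrs map[string]string
}

// groupBy splits spans into groups keyed by the value of attr.
// Groups are returned in first-seen order so the output is deterministic.
// Spans without the attribute land in the group for the empty string.
func groupBy(spans []span, attr string) [][]span {
	index := make(map[string]int) // attribute value -> position in groups
	var groups [][]span
	for _, s := range spans {
		v := s.attrs[attr]
		i, seen := index[v]
		if !seen {
			i = len(groups)
			index[v] = i
			groups = append(groups, nil)
		}
		groups[i] = append(groups[i], s)
	}
	return groups
}

func main() {
	spans := []span{
		{id: "a", attrs: map[string]string{"service.name": "app"}},
		{id: "b", attrs: map[string]string{"service.name": "db"}},
		{id: "c", attrs: map[string]string{"service.name": "app"}},
	}
	for _, g := range groupBy(spans, "service.name") {
		fmt.Println(g[0].attrs["service.name"], len(g))
	}
}
```

Each returned group plays the role of one spanset, which is why a single trace can yield more than one spanset once grouping is involved.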

TODO:

  • Correctly handle recombination at the query frontend
  • Make sure that the order of returned spanset attributes is pinned

Which issue(s) this PR fixes:
Fixes #2136
Fixes #2307

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <number101010@gmail.com>
@knylander-grafana (Contributor) left a comment

Thank you for adding doc! Updates look good.

@joe-elliott (Member Author) commented:

After much hand-wringing I decided to deprecate the current "SpanSet" field on the return in favor of "SpanSets".

For now the SpanSet field is a "random" spanset from the slice. For queries that do not create multiple spansets per trace (i.e. everything but by()) there will be one SpanSet in the slice and it will match the old field.
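
A client that has to work with both old and new responses could read the two fields roughly as follows. This is only a sketch under assumptions: `SpanSet` and `TraceResult` are simplified stand-ins for the real response types, and `allSpanSets` is a hypothetical helper, not part of Tempo.

```go
package main

import "fmt"

// SpanSet is a simplified stand-in for the real spanset message.
type SpanSet struct {
	Matched uint32
}

// TraceResult is a hypothetical per-trace result carrying both fields.
type TraceResult struct {
	SpanSet  *SpanSet   // deprecated: a single spanset
	SpanSets []*SpanSet // preferred: every spanset for the trace
}

// allSpanSets prefers the new slice and falls back to the deprecated
// field when talking to a server that only fills in SpanSet.
func allSpanSets(r *TraceResult) []*SpanSet {
	if len(r.SpanSets) > 0 {
		return r.SpanSets
	}
	if r.SpanSet != nil {
		return []*SpanSet{r.SpanSet}
	}
	return nil
}

func main() {
	old := &TraceResult{SpanSet: &SpanSet{Matched: 3}}
	grouped := &TraceResult{SpanSets: []*SpanSet{{Matched: 1}, {Matched: 2}}}
	fmt.Println(len(allSpanSets(old)), len(allSpanSets(grouped)))
}
```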

@mdisibio (Contributor) left a comment

LGTM. Very nice solution to get spans back in order for the second pass after grouping. A few q's but just surface-level stuff.

pkg/traceql/engine.go
@@ -90,34 +96,23 @@ type Spanset struct {
 	RootServiceName string
 	StartTimeUnixNanos uint64
 	DurationNanos uint64
-	Attributes map[string]Static
+	Attributes []*SpansetAttribute

Contributor:

What is the reason for the slice over a map? Also, map[string]Static wasn't right; it probably should have been map[Attribute]Static.

Member Author:

the order is meaningful and needs to be preserved b/c it's established by the query, but the map can't preserve the order.
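
The ordering point can be shown with a small sketch. `SpansetAttribute` below is a simplified, hypothetical stand-in for the type in the diff (the real one holds a `Static` value, not a string), and `attributeNames` is an illustrative helper.

```go
package main

import "fmt"

// SpansetAttribute is a simplified stand-in: a name/value pair whose
// position in the slice carries meaning.
type SpansetAttribute struct {
	Name string
	Val  string // assumption: a string here, for brevity
}

// attributeNames returns the names in slice order, which is the order
// in which the query produced them.
func attributeNames(attrs []*SpansetAttribute) []string {
	names := make([]string, 0, len(attrs))
	for _, a := range attrs {
		names = append(names, a.Name)
	}
	return names
}

func main() {
	attrs := []*SpansetAttribute{
		{Name: "by(.namespace)", Val: "prod"},
		{Name: "by(.service.name)", Val: "app"},
		{Name: "avg(duration)", Val: "1.32ms"},
	}
	// Ranging over a slice always visits elements in index order.
	// Ranging over a Go map holding the same pairs gives no ordering
	// guarantee, so the query's order would be lost.
	fmt.Println(attributeNames(attrs))
}
```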

func spansetID(ss *tempopb.SpanSet) string {
id := ""

for _, s := range ss.Attributes {

Contributor:

Initial thought is that the whole Attributes slice could uniquely define the spanset. Does it contain attributes we don't want?

Member Author:

yeah, if the attributes are:

by(.namespace) = "prod"
by(.service.name) = "app"
avg(duration) = 1.32ms

we only want to use the first 2 to identify the spanset
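
A rough sketch of that idea follows. It is not the function from the PR: the `attribute` and `spanSet` types are simplified stand-ins, `groupID` is a hypothetical name, and recognizing grouping attributes by a `by(` key prefix is an assumption made purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// attribute and spanSet are simplified stand-ins for the real messages.
type attribute struct {
	Key   string
	Value string
}

type spanSet struct {
	Attributes []attribute
}

// groupID builds an identifier from the grouping attributes only.
// Assumption: grouping attributes are the ones whose key starts with "by(".
// Aggregates such as avg(duration) are skipped, so two partial results for
// the same group get the same id even if their aggregates differ.
func groupID(ss *spanSet) string {
	var sb strings.Builder
	for _, a := range ss.Attributes {
		if !strings.HasPrefix(a.Key, "by(") {
			continue
		}
		sb.WriteString(a.Key)
		sb.WriteString("=")
		sb.WriteString(a.Value)
		sb.WriteString(";")
	}
	return sb.String()
}

func main() {
	a := &spanSet{Attributes: []attribute{
		{"by(.namespace)", "prod"}, {"by(.service.name)", "app"}, {"avg(duration)", "1.32ms"},
	}}
	b := &spanSet{Attributes: []attribute{
		{"by(.namespace)", "prod"}, {"by(.service.name)", "app"}, {"avg(duration)", "2.10ms"},
	}}
	fmt.Println(groupID(a) == groupID(b))
}
```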

tempodb/encoding/vparquet2/block_traceql.go (outdated)
// use otherEntryCallbackSpansetKey to indicate to the rebatchIterator that either
// 1) this is the last span in the spanset, or 2) there are more spans in the spanset
span.cbSpansetFinal = idx == len(ss.Spans)-1
span.cbSpanset = ss

Contributor:

Instead of a bool on the last span, we could probably rebatch them based on equal ss pointers.

Member Author:

we could do that, but then the rebatchIterator would have to call Next() until it received a nil span to guarantee its batches were complete. Then it would be able to dump all batches. I did this to preserve the "streaming" nature of the iterators
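
The trade-off can be sketched as below. This is not the PR's rebatchIterator: the `span`, `spanset`, and `rebatcher` types and the `sliceIter` helper are hypothetical. It only shows how a "final" flag on the last span lets a batch be emitted as soon as it is complete, without draining the upstream iterator first.

```go
package main

import "fmt"

// spanset identifies the group a span came from (compared by pointer).
type spanset struct {
	name string
}

// span carries a back-reference to its spanset plus a flag that is true
// only on the last span of that spanset.
type span struct {
	id    int
	owner *spanset
	final bool
}

// rebatcher regroups a stream of spans into their original spansets.
type rebatcher struct {
	next    func() *span // upstream iterator; returns nil when exhausted
	pending map[*spanset][]*span
}

func newRebatcher(next func() *span) *rebatcher {
	return &rebatcher{next: next, pending: make(map[*spanset][]*span)}
}

// Next pulls spans only until some batch is complete, then returns it.
// It returns nil once upstream is exhausted. Batches that never see a
// final span are dropped in this sketch.
func (r *rebatcher) Next() []*span {
	for {
		s := r.next()
		if s == nil {
			return nil
		}
		r.pending[s.owner] = append(r.pending[s.owner], s)
		if s.final {
			batch := r.pending[s.owner]
			delete(r.pending, s.owner)
			return batch
		}
	}
}

// sliceIter adapts a slice to the upstream iterator shape.
func sliceIter(spans []*span) func() *span {
	i := 0
	return func() *span {
		if i >= len(spans) {
			return nil
		}
		s := spans[i]
		i++
		return s
	}
}

func main() {
	a, b := &spanset{name: "a"}, &spanset{name: "b"}
	r := newRebatcher(sliceIter([]*span{
		{id: 1, owner: a}, {id: 2, owner: b},
		{id: 3, owner: a, final: true}, {id: 4, owner: b, final: true},
	}))
	for batch := r.Next(); batch != nil; batch = r.Next() {
		fmt.Println(batch[0].owner.name, len(batch))
	}
}
```

Grouping purely by equal pointers would need the same `pending` map but could only release batches after `next` returned nil, since any later span might still belong to an earlier spanset.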

@joe-elliott joe-elliott merged commit 7ad15d6 into grafana:main May 23, 2023
14 checks passed
3 participants