Design Proposal: TraceQL Concepts #1378
Signed-off-by: Joe Elliott <number101010@gmail.com>

Note that the second expression requires both conditions to be true on the same span. The entire expression inside of `{}` must be evaluated as true on a single span for it to be included in the resultset.

## Combining Spansets
Maybe add a section introducing the result set data model and define `spanset` there.
Good point. I am going to reword the "structure" section above to better highlight how integral spansets are to the language and add some details about the resultset.

A query is an expression that is evaluated on one trace at a time, and iterates, evaluating each trace. The expression is evaluated on each trace as a whole, one at a time. If the evaluation produces a set of spans, then the trace and the set of spans is part of the result set.

The structure of a query is an expression or set of chained expressions (a pipeline). Chained expressions are evaluated left to right. Each expression selects or discards this trace from being included in the results set.
There are a few occurrences of "resultset" in this document. What does that signify? Is it the set of results displayed in the search result table? Why does it contains only those spans that match the conditions and not the entire trace?
I will add some clarification on spanset/resultset (requested by Yuri above as well). I really like only returning those spans that match the criteria specified in the pipeline. If you were to write
{ http.status = 200 }
and we only returned the matching spans, then a UI could provide links to both the matching trace as a whole and deep links to the few spans that matched the requested criteria.
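
A minimal Python sketch of the evaluation model discussed in this thread: each trace is evaluated on its own, a filter such as `{ http.status = 200 }` keeps only the matching spans, and a trace appears in the resultset only if at least one span survives. The `Span` type, the `evaluate` helper, and the trace and span IDs are hypothetical and exist only for illustration; this is not Tempo's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    span_id: str
    attributes: dict = field(default_factory=dict)


def evaluate(traces, predicate):
    """Evaluate one spanset filter on each trace, one trace at a time.

    Returns {trace_id: [matching spans]}. Traces with no matching spans
    are left out of the resultset entirely.
    """
    resultset = {}
    for trace_id, spans in traces.items():
        spanset = [s for s in spans if predicate(s)]
        if spanset:
            resultset[trace_id] = spanset
    return resultset


traces = {
    "trace-a": [
        Span("a1", {"http.status": 200}),
        Span("a2", {"http.status": 500}),
    ],
    "trace-b": [Span("b1", {"http.status": 404})],
}

# Hypothetical stand-in for the filter { http.status = 200 }
result = evaluate(traces, lambda s: s.attributes.get("http.status") == 200)
matched_ids = {tid: [s.span_id for s in spans] for tid, spans in result.items()}
print(matched_ids)
```

With a result shaped like this, a UI could link to `trace-a` as a whole and deep-link to span `a1`, while `trace-b` never appears.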
`{ namespace = "prod" }`

Two attributes appear on the same span:
`{ namespace = "prod" && http.status = 200 }`
Hmm, I wonder if we need the `&&` operator for multiple conditions on the same span. I wonder if we could use `,` instead; that would be more aligned with PromQL/LogQL:
{ namespace = "prod", http.status = 200 }
That's fair. I would like to support a variety of operators here, so I prefer the `&&`.
{ namespace = "prod" || http.status = 200 }
Of course that's equivalent to:
{ namespace = "prod" } || { http.status = 200 }
so perhaps comma makes sense? Interested in others' thoughts.
-1 on comma, it's an additional mental barrier while `&&` is universally understood.
> of course that's equivalent to

@joe-elliott Technically those are not equivalent, maybe edit this comment to prevent any confusion?

### Attribute fields

We can refer to dynamic attributes (also known as tags) on the span or the span's resource.
Does this imply that there is no way to distinguish between resource attributes and span attributes?
In our internal doc we have suggested using a prefix like the below to explicitly select resource or span attributes. If "span" or "resource" is not present then we would look on both the resource and the span. The exact details here are still under discussion. Input is appreciated.
{ span.key = "val" }
{ resource.key = "val" }
I kind of hate the term "resource" and prefer the Jaeger terminology "process", but we will likely align with OTel on naming conventions.
> If "span" or "resource" is not present then we would look on both the resource and the span

I would love the ability to specify span-level or resource-level, if for no reason other than improved search times and efficiency. From my standpoint, I think we're waiting for the right syntax. `{ key=val }` is great for usability but not precise... And `{span.key=val}` feels wordy, and could conflict if another attribute is named `span.key`.
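
To make the scoping rule in this thread concrete, here is a small Python sketch. It assumes the prefix syntax floated above (`span.` and `resource.`), which is still under discussion; the `lookup` helper and the sample attributes are hypothetical. It also shows the naming conflict raised in the previous comment.

```python
def lookup(name, span_attrs, resource_attrs):
    """Return every value an attribute reference could resolve to."""
    if name.startswith("span."):
        key = name[len("span."):]
        return [span_attrs[key]] if key in span_attrs else []
    if name.startswith("resource."):
        key = name[len("resource."):]
        return [resource_attrs[key]] if key in resource_attrs else []
    # Unscoped reference: look on both the span and the resource.
    values = []
    if name in span_attrs:
        values.append(span_attrs[name])
    if name in resource_attrs:
        values.append(resource_attrs[name])
    return values


span_attrs = {"region": "eu-west-0", "span.key": "literal"}
resource_attrs = {"region": "eu-west-1", "key": "from-resource"}

both = lookup("region", span_attrs, resource_attrs)
span_only = lookup("span.region", span_attrs, resource_attrs)
resource_only = lookup("resource.key", span_attrs, resource_attrs)
# The conflict: "span.key" is parsed as scope + key, so the attribute
# that is literally named "span.key" is never found.
shadowed = lookup("span.key", span_attrs, resource_attrs)
```

The unscoped lookup has to consult two places, which is the search-time cost mentioned above; the scoped forms consult one.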
Note: there's a new OTEP for a new "scope attribute".
`{ duration > 1s }`

The trace as a whole has a duration of over one second:
`max(end) - min(start) > 1s`
I expect Tempo to store the final trace duration determined exactly the same way, so could this be simplified? I think this sets an inefficient precedent, as it requires scanning all spans for two values instead of reading a single value per trace. Suggested change (replace the first line with the second):
`max(end) - min(start) > 1s`
`duration > 1s`
I realize this kind of conflicts with the span-level duration, so maybe another intrinsic property like `totalDuration`, or `duration` could just work differently depending on whether it is wrapped in `{ }` or not.
There is currently no way to reference the "trace" as a thing in TraceQL. All of the operations select and filter sets of spans. There are a few "trace level" attributes that I wonder whether it would make sense to include, but we would have to think quite carefully about how we would integrate them into our current "pipeline of spansets" approach.
Examples of trace level attributes
- duration
- rootSpanName
- ??
To help with the second one we have considered adding a "root" intrinsic so you could ask something like:
{ name = "do the thing" && root = true }
but that's still up in the air.
Trace duration is absolutely an attribute that people would query often. Perhaps having a `root` level would be already sufficient in terms of performance? The trace duration would then be `max(end) - root.start`?
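
The two ways of computing trace duration mentioned in this thread can be compared with a short Python sketch. The `Span` type and both helper functions are hypothetical. The second variant assumes the trace has exactly one root span and that the root starts before every other span, which may not hold with clock skew or missing spans.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    start: float  # seconds since some epoch
    end: float


def trace_duration(spans):
    """max(end) - min(start): scans every span for two values."""
    return max(s.end for s in spans) - min(s.start for s in spans)


def duration_from_root(spans):
    """max(end) - root.start: assumes the root span starts first."""
    root = next(s for s in spans if s.parent_id is None)
    return max(s.end for s in spans) - root.start


spans = [
    Span("root", None, 0.0, 0.5),
    Span("child", "root", 0.25, 1.5),
]

full = trace_duration(spans)
from_root = duration_from_root(spans)
over_one_second = full > 1.0
```

Both variants still need the latest `end` across all spans, so a stored per-trace duration, as suggested above, would avoid the scan entirely.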
Co-authored-by: Annanay Agarwal <annanay25@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
Does your AGPL apply to this language as well?
@till This is a good question. There are at least a few components here: the language spec itself, documents such as this that describe the language, and the implementation of the language. I am working with people internally and will provide you with an honest answer as soon as I can.
1. A bare condition is wrapped in `{}`
`http.status = 200` -> `{ http.status = 200 }`

2. Multiple bare conditions are individually wrapped in `{}`
I like the idea of shorthand, but I also think that shorthand can become a source of confusion.
We have `&&` in field expressions, where we can write a query like `{ http.status >= 200 && http.status < 300 }`, and `&&` is also used to write a query like `{ namespace = "prod" } && { http.status = 200 }`.
Let's say I write two queries:
{ namespace = "prod" } && { http.status = 200 }
{ namespace = "prod" && http.status = 200 }
Are these two queries the same?
They are not the same. (1) selects traces where some spans match the 1st and 2nd conditions independently. (2) selects traces where both conditions hold on the same span.
I see, thanks.
In that case, I believe the suggested shorthand 2 with multiple bare conditions might lead to confusion / unexpected results.
If a user writes `http.status = 200 && namespace = "prod"`, they might expect it to return results for `{ http.status = 200 && namespace = "prod" }` (the whole query gets wrapped in `{}`, as happens for shorthand 1).
But it will return results for `{ http.status = 200 } && { namespace = "prod" }` (each part is wrapped in `{}`, as suggested in shorthand 2).
One way to avoid this is to always wrap the whole query in `{}` (only keep shorthand 1) for both cases.
I am very flexible on the shorthand idea. I mainly wanted the user to be able to type something very quick, easy and intuitive and get a meaningful response. Thanks for this feedback!
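
The difference between the two expansions discussed in this thread can be shown with a small Python sketch. Spans are modelled as plain dicts and both helpers are hypothetical. Returning the union of both sides for `{ A } && { B }` is an assumption made here for illustration; the thread only states that the two conditions are matched independently.

```python
def same_span(spans, *conds):
    """{ A && B }: spans on which every condition holds at once."""
    return [s for s in spans if all(c(s) for c in conds)]


def spanset_and(spans, cond_a, cond_b):
    """{ A } && { B }: each side must match some span, independently.

    Assumption: when both sides are non-empty, return their union.
    """
    left = [s for s in spans if cond_a(s)]
    right = [s for s in spans if cond_b(s)]
    if not left or not right:
        return []
    return left + [s for s in right if s not in left]


def is_prod(span):
    return span.get("namespace") == "prod"


def is_ok(span):
    return span.get("http.status") == 200


# One trace in which the two conditions hold on different spans.
trace = [
    {"id": "1", "namespace": "prod"},
    {"id": "2", "http.status": 200},
]

same_span_result = same_span(trace, is_prod, is_ok)
independent_result = spanset_and(trace, is_prod, is_ok)
```

For this trace the bare query `http.status = 200 && namespace = "prod"` would select nothing under shorthand 1 and two spans under shorthand 2, which is the surprise described above.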
Hey, very nice work!
@metalmatze Good questions.
These examples include
Since these tags are arbitrary it can look like whatever you want for grpc. You are welcome to name the attributes anything you'd like. However, like http, there are some published conventions for rpc as well.
Find traces that contain spans whose duration is greater than 2 seconds:
`{ duration > 2s }`

Find traces that contain a span named "HTTP POST":
So far I have only seen this "contain" wording. I searched for:
- wildcard, *
- regex, regular, expression
- starts with, ends with
- match
- like, %
I would presume regex usage like that which already exists in PromQL. Is it addressed or intentionally excluded?
Yes, we intend to support regex using the same syntax as promql.
name =~ ".*test.*"
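
A brief Python sketch of how such a matcher might behave. PromQL regex matchers are fully anchored; whether TraceQL will anchor `=~` the same way is an assumption here, and the `regex_match` helper is hypothetical.

```python
import re


def regex_match(value, pattern):
    """Match the whole value against the pattern.

    Assumption: fully anchored matching, as in PromQL label matchers.
    """
    return re.fullmatch(pattern, value) is not None


names = ["integration test run", "HTTP POST", "test"]

# Stand-in for: name =~ ".*test.*"
matches = [n for n in names if regex_match(n, ".*test.*")]

# With anchoring, a bare "test" does not match a longer span name.
anchored = regex_match("integration test run", "test")
```

Under anchored semantics, "contains" has to be written as `.*test.*`, "starts with" as `test.*`, and "ends with" as `.*test`.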

## Aggregators

All of the above expressions involve asking questions about individual spans. However, sometimes we want to ask questions about a set of spans. For that we have aggregate functions.
Aggregation functionality is something I am missing from the APM world, as traces/spans seem to be a bursty data flow. However, when it comes to usage scenarios, it was not always about attributes of the captured span/method/class/etc. In an HTTP scenario, the aggregation key can be a query parameter like '?&operation=getItems' or an attribute inside a JSON POST body. For even more complex cases, one might need to skip the root and/or a few spans and extract the key aggregation parameter a couple of levels below.
The good news is that you generally do not need such aggregation for categorization on every node a transaction visits; it is generally needed at the very first contact point of an application or system.
I do not think TraceQL should cover all such aspects at once. Just sharing my previous needs in the last decade regarding aggregation.
Thanks for the thoughts. Since TraceQL can operate on custom attributes you could extract query params and add them as attributes to the span. Then you would be able to aggregate on them easily.
We did our best to cover the use cases we found the most important, but of course we will find users with new ideas and needs as the project grows. After we are all using TraceQL please keep bringing these ideas back to the table with real world examples so we can continue to improve the language.
You are doing not great, covering a lot at once, no discussions on that. Thanks!
One thing I want to add to this piece of thread as a side note (where I will try to bring back on usage) is that:
Such TraceQL in my point of view as well needs to be available on the collection point with the aggregation feature. I would prefer to use TraceQL as a processor on the collector, especially for aggregation. This is a known tactic to overcome some usual cons. Filter more, have less cardinality, less noise... so to solve some of them with aggregation as a layer before it reaches to a storage layer.
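
As a rough illustration of the aggregate step discussed in this thread, the Python sketch below selects a spanset per trace, computes one value over it, and keeps the trace only if that value passes a condition. The `aggregate_filter` helper, the count and average aggregates, and the sample data are hypothetical and are not taken from the proposal.

```python
def aggregate_filter(traces, select, aggregate, keep):
    """Keep a trace's spanset only if an aggregate over it passes `keep`."""
    result = {}
    for trace_id, spans in traces.items():
        spanset = [s for s in spans if select(s)]
        if spanset and keep(aggregate(spanset)):
            result[trace_id] = spanset
    return result


def is_db(span):
    return span["name"] == "db"


def avg_duration(spanset):
    return sum(s["duration"] for s in spanset) / len(spanset)


traces = {
    "t1": [
        {"name": "db", "duration": 0.5},
        {"name": "db", "duration": 1.5},
        {"name": "http", "duration": 0.25},
    ],
    "t2": [{"name": "db", "duration": 0.25}],
}

# Traces with more than one "db" span.
many_db = aggregate_filter(traces, is_db, len, lambda n: n > 1)

# Traces whose "db" spans average at least one second.
slow_db = aggregate_filter(traces, is_db, avg_duration, lambda d: d >= 1.0)
```

Aggregating on a query parameter, as suggested earlier in the thread, would only require that parameter to be present as a span attribute so that `select` or `aggregate` can read it.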
## Structure

A query is an expression that is evaluated on one trace at a time. The query is structured as a set of chained expressions (a pipeline). Each expression in the pipeline selects or discards spansets from being included in the results set. E.g.
What's a spanset?
I think it's explained later on: it's the set of spans from a trace matching the given condition.
Yup. When you write a traceql expression you are building a pipeline of operations on a spanset. A trace enters the pipeline whole and if any subset of it "survives" the filters and selectors applied in the pipeline then it is considered part of the resultset.

If the trace being evaluated contains no spans with an attribute `http.status` with the value `200` then no spans will be selected and this trace will not appear in our resultset.

If the trace does contain spans with an attribute `http.status` with the value `200` then only those spans will be returned. The trace is reduced to only the set of spans that match the condition inside the `{}`. This set of spans (and therefore trace) will be included in our resultset.
Perhaps I'm having some trouble with the terminology, but to me, a trace is the representation of a transaction, potentially across process boundaries. In other words, a trace is the set of all available spans for a given transaction. The result is then not the whole trace, but a partial trace with span set(s) matching the criteria. Would it be worth extending this to clarify that it's not the whole trace that is returned?
That is correct. In the general case the results of a traceql expression will be a collection of spansets. Each spanset will be a subset of spans from a given trace that "survive" the pipeline. I will add a bit more detail here. Thanks
Note the difference between the above and the following:
`{ region = "eu-west-0" && region = "eu-west-1" }`

The second expression will return no traces because it's impossible for both conditions to be simultaneously true on the same span.
I believe it's possible to have repeated tags so that it's possible for a span to have two "region" tags, one with each value.
This is a good point I had not considered. Technically the proto does not prevent it, but the specification requires that for a set of attributes the key name is unique:
So span attributes should be unique, but technically that attribute could exist on both the resource and the span.
I agree the wire-format may allow it, but the spec does require uniqueness if I am reading it right.
| `duration` | end - start time of the span |
| `name` | operation or span name |
| `status` | status values are "error", "ok", or "unset" |
| `parent` | the parent of this span |
The `parent` intrinsic, I feel, has some hidden complexity, and I wonder if it is a worthwhile element of the language. A basic spanset operator such as `{ A && B}` is evaluated on an individual span, which is a straightforward concept and allows for simple storage mechanisms and algorithms. However, `{ parent.A && B }` evaluates across multiple spans. The implied downstream implementation must now theoretically have both spans present in memory and evaluated at the same time. For a worst-case example, consider the possibility of `parent.parent.parent...`.
The `parent` intrinsic could be replaced with the existing child operator: `{ parent.A && B }` is equal to `{A} > {B}`, which is more consistent with the rest of the language, and also the implementation is clear.
I do like parent for expressions such as:
{ parent.duration - duration > 1s }
or
{ parent.service.name != service.name }
Also, I'm fine with restricting parent to only being allowed to have depth of 1. Let's dig into the details as we work toward an internal POC. This is still an evolving spec and we can make adjustments.
@mdisibio your complexity argument doesn't sit well with me. The whole point of a dedicated trace QL to me is its ability to express queries across spans of the trace; if it didn't do that I could just as well stick with SQL. So any sensible implementation of TraceQL would have to deal with conditions not defined on individual spans, this complexity is unavoidable.
The language purity argument makes sense, I think it's a matter of how difficult it is to express the same clauses that Joe showed above using the `{A} > {B}` notation.
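
A short Python sketch of what evaluating the two `parent` expressions from this thread could look like when `parent` is restricted to a depth of 1, as suggested above. Spans are plain dicts and the `with_parent` helper is hypothetical. It also shows the cost raised earlier: each span has to be evaluated together with its parent.

```python
def with_parent(spans):
    """Yield (span, parent) pairs; parent is None for a root span."""
    by_id = {s["id"]: s for s in spans}
    for s in spans:
        yield s, by_id.get(s["parent_id"])


spans = [
    {"id": "1", "parent_id": None, "service.name": "frontend", "duration": 3.0},
    {"id": "2", "parent_id": "1", "service.name": "frontend", "duration": 2.5},
    {"id": "3", "parent_id": "2", "service.name": "cart", "duration": 1.0},
]

# Stand-in for: { parent.service.name != service.name }
cross_service = [
    s["id"]
    for s, p in with_parent(spans)
    if p is not None and p["service.name"] != s["service.name"]
]

# Stand-in for: { parent.duration - duration > 1s }
large_gap = [
    s["id"]
    for s, p in with_parent(spans)
    if p is not None and p["duration"] - s["duration"] > 1.0
]
```

Both conditions compare a field on the parent with a field on the child, which is the kind of clause that is awkward to express when the two sides are filtered separately, as in `{A} > {B}`.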
`{ } >> { }`
This is the descendant operator. The spans returned from this operator will match the right hand side conditions while also being descendants of spans that match the left hand side conditions.

`{ } > { }`
While `>` for direct parent-child relation looks intuitive when thinking of spans, I can imagine that overloading the `>` character will confuse users with less familiarity with the trace model.
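
To contrast the two structural operators, here is a minimal Python sketch. It follows the description of `>>` in the diff above (return right-hand spans that have a matching ancestor) and treats `>` as the same check limited to the direct parent, as this comment describes it. The helpers and span data are hypothetical, and the sketch assumes well-formed traces without parent cycles.

```python
def ancestors(span, by_id):
    """Yield the parent, grandparent, ... of a span."""
    parent_id = span["parent_id"]
    while parent_id is not None and parent_id in by_id:
        parent = by_id[parent_id]
        yield parent
        parent_id = parent["parent_id"]


def descendant(spans, lhs, rhs):
    """{ lhs } >> { rhs }: rhs spans with any ancestor matching lhs."""
    by_id = {s["id"]: s for s in spans}
    return [
        s for s in spans
        if rhs(s) and any(lhs(a) for a in ancestors(s, by_id))
    ]


def child(spans, lhs, rhs):
    """{ lhs } > { rhs }: rhs spans whose direct parent matches lhs."""
    by_id = {s["id"]: s for s in spans}
    matched = []
    for s in spans:
        parent = by_id.get(s["parent_id"])
        if rhs(s) and parent is not None and lhs(parent):
            matched.append(s)
    return matched


def named(name):
    return lambda span: span["name"] == name


spans = [
    {"id": "1", "parent_id": None, "name": "api"},
    {"id": "2", "parent_id": "1", "name": "auth"},
    {"id": "3", "parent_id": "2", "name": "db"},
]

api_descendant_db = [s["id"] for s in descendant(spans, named("api"), named("db"))]
api_child_db = [s["id"] for s in child(spans, named("api"), named("db"))]
auth_child_db = [s["id"] for s in child(spans, named("auth"), named("db"))]
```

The `db` span is a descendant of `api` but not its direct child, so only `>>` selects it for that pair.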

If the trace does contain spans with an attribute `http.status` with the value `200` then only those spans will be returned. The trace is reduced to only the set of spans that match the condition inside the `{}`. The resultset will contain only this subset of spans matching the condition.

### Intrinsic fields
I think span kind is missing from this list. Span kind should be especially useful with the descendant operator.
Updated with the latest
@till Apologies for the late response. After discussing this internally we have decided that all code related to TraceQL will be licensed under AGPLv3.
Here we go!
What this PR does:
Adds a design proposal that outlines the concepts and basic syntax of TraceQL, the upcoming Tempo query language. The full specification for this language is still a WIP, but we feel the core concepts are ready to be shared. We invite anyone to provide comments and criticisms to better create a language that serves the needs of the tracing community!
Special thanks to @yurishkuro, @tomwilkie, @mdisibio and @cyriltovena for helping build this language over the past months.