Design Proposal: TraceQL Concepts #1378

joe-elliott · 2022-04-12T20:31:32Z

What this PR does:

Adds a design proposal that outlines the concepts and basic syntax of TraceQL, the upcoming Tempo query language. The full specification for this language is still a WIP, but we feel the core concepts are ready to be shared. We invite anyone to provide comments and criticisms to better create a language that serves the needs of the tracing community!

Special thanks to @yurishkuro, @tomwilkie, @mdisibio and @cyriltovena for helping build this language over the past months.

Signed-off-by: Joe Elliott <number101010@gmail.com>

yurishkuro · 2022-04-12T23:38:15Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+
+Note that the second expression requires both conditions to be true on the same span. The entire expression inside of `{}` must be evaluated as true on a single span for it to be included in the resultset.
+
+## Combining Spansets


maybe add a section introducing the result set data model and define spanset there

Good point. I am going to reword the "structure" section above to better highlight how integral spansets are to the language and add some details about the resultset.


annanay25 · 2022-04-13T09:22:41Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+
+A query is an expression that is evaluated on one trace at a time, and iterates, evaluating each trace. The expression is evaluated on each trace as a whole, one at a time. If the evaluation produces a set of spans, then the trace and the set of spans is part of the result set.
+
+The structure of a query is an expression or set of chained expressions (a pipeline). Chained expressions are evaluated left to right.  Each expression selects or discards this trace from being included in the results set.


There are a few occurrences of "resultset" in this document. What does that signify? Is it the set of results displayed in the search result table? Why does it contain only those spans that match the conditions and not the entire trace?

I will add some clarification on spanset/resultset. (Requested by Yuri as well above). I really like only returning those spans that match the criteria specified in the pipeline. If you were to write

{ http.status = 200 }

and only the matching spans were returned, then a UI could provide links to both the matching trace as a whole and deep links to the few spans that matched the requested criteria.
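To make the spanset idea concrete, here is a minimal Python sketch of how a filter such as `{ http.status = 200 }` might reduce each trace to its matching spans. It is an illustration only, not Tempo's implementation: the `Span` type, `filter_spanset`, `query`, and the sample data are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    span_id: str
    attrs: dict = field(default_factory=dict)


def filter_spanset(spans, predicate):
    """Return only the spans of one trace that satisfy the predicate."""
    return [s for s in spans if predicate(s)]


def query(traces, predicate):
    """Evaluate the filter one trace at a time and keep non-empty spansets."""
    results = {}
    for trace_id, spans in traces.items():
        spanset = filter_spanset(spans, predicate)
        if spanset:  # a trace with no surviving spans is dropped entirely
            results[trace_id] = spanset
    return results


# Hypothetical sample data: trace t1 has one matching span, t2 has none.
traces = {
    "t1": [Span("a", {"http.status": 200}), Span("b", {"http.status": 500})],
    "t2": [Span("c", {"http.status": 404})],
}

# Rough equivalent of { http.status = 200 }
results = query(traces, lambda s: s.attrs.get("http.status") == 200)
matched_ids = {tid: [s.span_id for s in spans] for tid, spans in results.items()}
```

Here `matched_ids` holds only trace `t1` with span `a`, which is the information a UI would need to link both to the trace and directly to the matching span.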


annanay25 · 2022-04-13T09:26:36Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+`{ namespace = "prod" }`
+
+Two attributes appear on the same span:  
+`{ namespace = "prod" && http.status = 200 }`


Hmm, I wonder if we need the && operator for multiple conditions on the same span. Could we use a comma instead? That would be more aligned with PromQL/LogQL:

{ namespace = "prod", http.status = 200 }

That's fair. I would like to support a variety of operators here so I prefer the &&

{ namespace = "prod" || http.status = 200 }

of course that's equivalent to:

{ namespace = "prod" } || { http.status = 200 }

so perhaps comma makes sense? interested in others' thoughts.

-1 on comma, it's an additional mental barrier while && is universally understood

of course that's equivalent to

@joe-elliott Technically those are not equivalent, maybe edit this comment to prevent any confusion?


pavolloffay · 2022-04-13T12:26:51Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+
+### Attribute fields
+
+We can refer to dynamic attributes (also known as tags) on the span or the span's resource.


Does this imply that there is no way to distinguish between resource attributes and span attributes?

In our internal doc we have suggested using a prefix like the below to explicitly select resource or span attributes. If "span" or "resource" is not present then we would look on both the resource and the span. The exact details here are still under discussion. Input is appreciated.

{ span.key = "val" }
{ resource.key = "val" }

I kind of hate the term "resource" and prefer the Jaeger terminology "process", but we will likely align with OTel on naming conventions.

If "span" or "resource" is not present then we would look on both the resource and the span

I would love the ability to specify span-level or resource-level, if for no other reason than improved search times and efficiency. From my standpoint, I think we're waiting for the right syntax. { key=val } is great for usability but not precise. And { span.key=val } feels wordy, and could conflict if another attribute is named span.key.

Note: there's a new OTEP for a new "scope attribute".
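The prefix idea discussed in this thread can be sketched as follows. This is a minimal Python illustration under the assumptions stated in the comments above: a `span.` prefix looks only at span attributes, a `resource.` prefix looks only at resource attributes, and an unprefixed key looks at both. The function names `lookup` and `matches` are hypothetical, and the exact semantics were still under discussion at this point.

```python
def lookup(span_attrs, resource_attrs, key):
    """Resolve a query key to the list of candidate attribute values."""
    if key.startswith("span."):
        scopes, name = [span_attrs], key[len("span."):]
    elif key.startswith("resource."):
        scopes, name = [resource_attrs], key[len("resource."):]
    else:
        # Assumption: an unprefixed key is looked up on both scopes.
        scopes, name = [span_attrs, resource_attrs], key
    return [scope[name] for scope in scopes if name in scope]


def matches(span_attrs, resource_attrs, key, value):
    """True if any scope selected by the key holds the requested value."""
    return value in lookup(span_attrs, resource_attrs, key)


# Hypothetical sample attributes for one span and its resource.
span_attrs = {"key": "val"}
resource_attrs = {"key": "other", "cluster": "prod"}

span_hit = matches(span_attrs, resource_attrs, "span.key", "val")
resource_miss = matches(span_attrs, resource_attrs, "resource.key", "val")
unscoped_hit = matches(span_attrs, resource_attrs, "key", "other")
```

Note that this sketch cannot address an attribute whose literal name is `span.key`, which is exactly the naming conflict raised in the comment above. The scoped forms also only need to read one attribute map, which is where the hoped-for efficiency gain would come from.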


mdisibio · 2022-04-13T12:47:35Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+`{ duration > 1s }`
+
+The trace as a whole has a duration of over one second:  
+`max(end) - min(start) > 1s`


I expect Tempo to store the final trace duration determined exactly the same way, so could this be simplified? I think this sets an inefficient precedent, as it requires scanning all spans for two values instead of reading a single value per trace.

Suggested change

`max(end) - min(start) > 1s`

`duration > 1s`

I realize this kind of conflicts with the span-level duration, so maybe another intrinsic property like totalDuration, or duration could just work differently if wrapped in { } or not.

There is currently no way to reference the "trace" as a thing in traceql. All of the operations are selecting and filtering sets of spans. There are a few "trace level" attributes that I wonder if it would make sense to include or not, but we would have to think quite carefully about how we would integrate them into our current "pipeline of spansets" approach.

Examples of trace level attributes

duration

rootSpanName

??

To help with the second one we have considered adding a "root" intrinsic so you could ask something like:

{ name = "do the thing" && root = true }

but that's still up in the air.

Trace duration is absolutely an attribute that people would query often. Perhaps having a root level would be already sufficient in terms of performance? The trace duration would then be max(end) - root.start?
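The three notions of duration in this thread can be compared with a short Python sketch. It is only an illustration: the `Span` type, the `is_root` flag, and millisecond integer timestamps are assumptions made for the example, and the root-based variant additionally assumes that the root span has been received and starts before every other span.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start_ms: int
    end_ms: int
    is_root: bool = False


def span_duration(span):
    """Span-level duration: end - start of a single span."""
    return span.end_ms - span.start_ms


def trace_duration(spans):
    """max(end) - min(start): scans every span for two values."""
    return max(s.end_ms for s in spans) - min(s.start_ms for s in spans)


def trace_duration_from_root(spans):
    """max(end) - root.start: still scans for max(end), but takes start from the root."""
    root = next(s for s in spans if s.is_root)
    return max(s.end_ms for s in spans) - root.start_ms


# Hypothetical trace: no single span exceeds 1s, but the trace as a whole does.
spans = [Span(0, 600, is_root=True), Span(500, 1200)]

any_span_over_1s = any(span_duration(s) > 1000 for s in spans)
trace_over_1s = trace_duration(spans) > 1000
```

In this sample `{ duration > 1s }` would select nothing while the trace-level condition holds, which is why the two need to be distinguishable in the language. A stored per-trace duration, as suggested above, would avoid the scan entirely.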


Co-authored-by: Annanay Agarwal <annanay25@users.noreply.github.com>

Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>

Co-authored-by: Martin Disibio <mdisibio@gmail.com>

Signed-off-by: Joe Elliott <number101010@gmail.com>

till · 2022-04-13T18:54:40Z

Does your AGPL apply to this language as well?

joe-elliott · 2022-04-14T17:55:48Z

Does your AGPL apply to this language as well?

@till This is a good question. There are at least a few components here: the language spec itself, documents such as this that describe the language, and the implementation of the language. I am working with people internally and will provide you with an honest answer as soon as I can.

electron0zero · 2022-04-18T18:44:17Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+1. A bare condition is wrapped in `{}`  
+  `http.status = 200` -> `{ http.status = 200 }`
+
+2. Multiple bare conditions are individually wrapped in `{}`    


I like the idea of shorthand, but I also think that shorthand can become a source of confusion.

We have && in field expressions where we can write a query like { http.status >= 200 && http.status < 300 }, and && is also being used to write a query like { namespace = "prod" } && { http.status = 200 }

let's say I write two queries

{ namespace = "prod" } && { http.status = 200 }

{ namespace = "prod" && http.status = 200 }

Are these two queries the same?

they are not the same. (1) selects traces where some spans match 1st and 2nd condition independently. (2) selects traces where both conditions hold on the same spans.

I see, thanks.

In that case, I believe the suggested shorthand 2 with multiple bare conditions might lead to confusion or unexpected results.

If a user writes http.status = 200 && namespace = "prod", they might expect it to return results for { http.status = 200 && namespace = "prod" } (the whole query gets wrapped in {}, as happens for shorthand 1).

But it will return results for { http.status = 200 } && { namespace = "prod" } (each part is wrapped in {}, as suggested in shorthand 2).

One way to avoid this is to always wrap whole query into {} (only keep shorthand 1) for both cases.

I am very flexible on the shorthand idea. I mainly wanted the user to be able to type something very quick, easy and intuitive and get a meaningful response. Thanks for this feedback!
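The difference between the two queries in this thread can be shown with a small Python sketch. It is an illustration under assumptions, not the specified semantics: spans are plain dicts, the helper names are hypothetical, and the spanset-level `&&` is assumed to return the union of both sides when both are non-empty.

```python
def spans_matching(trace, predicate):
    """Select the spans of one trace that satisfy a span-level predicate."""
    return [s for s in trace if predicate(s)]


def is_prod(span):
    return span.get("namespace") == "prod"


def is_ok(span):
    return span.get("http.status") == 200


def same_span_and(trace):
    # { namespace = "prod" && http.status = 200 }
    # Both conditions must hold on the same span.
    return spans_matching(trace, lambda s: is_prod(s) and is_ok(s))


def spanset_and(trace):
    # { namespace = "prod" } && { http.status = 200 }
    # Each side is evaluated independently; the trace survives only if
    # both sides matched at least one span.
    left = spans_matching(trace, is_prod)
    right = spans_matching(trace, is_ok)
    if left and right:
        # Assumption: the result is the union of both spansets.
        return left + [s for s in right if s not in left]
    return []


# Hypothetical trace: the two conditions hold, but on different spans.
trace = [
    {"id": "a", "namespace": "prod", "http.status": 500},
    {"id": "b", "namespace": "dev", "http.status": 200},
]

same_span_ids = [s["id"] for s in same_span_and(trace)]
spanset_ids = [s["id"] for s in spanset_and(trace)]
```

For this trace the single-filter form selects nothing, while the spanset form selects both spans, so wrapping each bare condition individually (shorthand 2) can change the result compared with wrapping the whole query.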

metalmatze · 2022-04-19T09:29:29Z

Hey, very nice work!
Looking at the provided example labels like http.status reminds me more of hierarchical structures than Prometheus labels. Is this a limitation that came from the existing labels in OpenTracing/OpenTelemetry? Is this a limitation to begin with? Does the dot mean anything?
What would this look like with grpc for example? Will the grpc.method and grpc.service show up in the spans and then be able to be queried just the same?

joe-elliott · 2022-04-19T12:59:06Z

@metalmatze Good questions.

Looking at the provided example labels like http.status reminds me more of hierarchical structures than Prometheus labels. Is this a limitation that came from the existing labels in OpenTracing/OpenTelemetry? Is this a limitation to begin with? Does the dot mean anything?

These examples include http.status simply because it is listed in the OpenTelemetry semantic conventions and is easily understood. The language is able to select arbitrary attribute key/value pairs that appear on your spans and is not limited to anything specific. The dot is only meaningful in terms of the linked conventions where it kind of "namespaces" http properties.

What would this look like with grpc for example? Will the grpc.method and grpc.service show up in the spans and then be able to be queried just the same?

Since these tags are arbitrary it can look like whatever you want for grpc. You are welcome to name the attributes anything you'd like. However, like http, there are some published conventions for rpc as well.

devrimdemiroz · 2022-04-26T23:29:51Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+Find traces that contain spans whose duration is greater than 2 seconds:  
+`{ duration > 2s }`
+
+Find traces that contain a span named "HTTP POST":  


So far I have seen only this "contain" wording. I searched for:

wildcard , *

regex , regular, expression

starts with, ends with

match

like , %

I would presume regex usage like what already exists in PromQL. Is it addressed or intentionally excluded?

Yes, we intend to support regex using the same syntax as promql.

name =~ ".*test.*"
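As a rough illustration of what `name =~ ".*test.*"` could mean, here is a minimal Python sketch. PromQL regex matches are fully anchored, so the sketch uses `re.fullmatch`. Whether TraceQL would anchor the same way is an assumption here, and Python's `re` dialect differs in places from the RE2 syntax used by PromQL. The helper name `regex_match` is hypothetical.

```python
import re


def regex_match(value, pattern):
    """Anchored match: the pattern must cover the whole value, as in PromQL."""
    return re.fullmatch(pattern, value) is not None


# With anchoring, a bare substring does not match; wildcards are needed.
wildcard_hit = regex_match("run-test-suite", ".*test.*")
bare_substring_hit = regex_match("run-test-suite", "test")
alternation_hit = regex_match("HTTP POST", "HTTP (GET|POST)")
```

Under these assumptions the "starts with", "ends with" and "contains" cases listed above all reduce to patterns such as `test.*`, `.*test` and `.*test.*`.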

devrimdemiroz · 2022-04-26T23:47:12Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+
+## Aggregators
+
+All of the above expressions involve asking questions about individual spans. However, sometimes we want to ask questions about a set of spans. For that we have aggregate functions.


Aggregation functionality is something I miss from the APM world, as traces/spans seem to be a bursty data flow. However, in my usage scenarios the aggregation key was not always an attribute of the captured span/method/class/etc. In an HTTP scenario, the aggregation key can be a query parameter like '?&operation=getItems' or an attribute inside a JSON POST body. For even more complex cases, one might need to skip the root and/or a few spans and extract the key aggregation parameter a couple of levels below.

The good news is that you generally do not need such aggregation to categorize every node a transaction visits; it is usually needed only at the very first contact point of an application or system.

I do not think TraceQL should cover all such aspects at once. Just sharing my previous needs in the last decade regarding aggregation.

Thanks for the thoughts. Since TraceQL can operate on custom attributes you could extract query params and add them as attributes to the span. Then you would be able to aggregate on them easily.

We did our best to cover the use cases we found the most important, but of course we will find users with new ideas and needs as the project grows. After we are all using TraceQL please keep bringing these ideas back to the table with real world examples so we can continue to improve the language.

You are doing great, covering a lot at once, no discussion on that. Thanks!

One thing I want to add to this thread as a side note (which I will try to bring back with usage examples):
In my view, TraceQL with the aggregation feature also needs to be available at the collection point. I would prefer to use TraceQL as a processor on the collector, especially for aggregation. This is a known tactic to overcome some of the usual drawbacks: filter more, have less cardinality and less noise, by aggregating in a layer before the data reaches a storage layer.

jpkrohling · 2022-04-27T14:50:04Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+## Structure
+
+A query is an expression that is evaluated on one trace at a time. The query is structured as a set of chained
+expressions (a pipeline). Each expression in the pipeline selects or discards spansets from being included in the results set. E.g.


What's a spanset?

I think it's explained later on: it's the set of spans from a trace matching the given condition.

Yup. When you write a traceql expression you are building a pipeline of operations on a spanset. A trace enters the pipeline whole and if any subset of it "survives" the filters and selectors applied in the pipeline then it is considered part of the resultset.
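The "pipeline of operations on a spanset" described here can be sketched in Python as a list of stages, each taking a spanset and returning a (possibly empty) spanset. This is only an illustration: `run_pipeline`, `span_filter`, and `count_greater_than` are hypothetical names, and the count-based stage merely stands in for the aggregate functions the proposal mentions.

```python
def run_pipeline(trace, stages):
    """Feed the whole trace into the pipeline; stop once nothing survives."""
    spanset = list(trace)
    for stage in stages:
        spanset = stage(spanset)
        if not spanset:
            return []  # the trace is discarded from the resultset
    return spanset


def span_filter(predicate):
    """Stage that keeps only the spans satisfying a span-level predicate."""
    return lambda spanset: [s for s in spanset if predicate(s)]


def count_greater_than(n):
    """Stage that keeps the spanset as-is only if it has more than n spans."""
    return lambda spanset: spanset if len(spanset) > n else []


# Hypothetical trace with three spans.
trace = [
    {"id": "a", "http.status": 200},
    {"id": "b", "http.status": 200},
    {"id": "c", "http.status": 500},
]

ok_filter = span_filter(lambda s: s.get("http.status") == 200)
survivors = run_pipeline(trace, [ok_filter, count_greater_than(1)])
discarded = run_pipeline(trace, [ok_filter, count_greater_than(2)])
```

In the first run the two spans with status 200 survive both stages, so that subset of the trace is part of the resultset. In the second run the spanset is emptied by the last stage and the trace does not appear at all.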

jpkrohling · 2022-04-27T15:53:11Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+
+If the trace being evaluated contains no spans with an attribute `http.status` with the value `200` then no spans will be selected and this trace will not appear in our resultset.
+
+If the trace does contain spans with an attribute `http.status` with the value `200` then only those spans will be returned. The trace is reduced to only the set of spans that match the condition inside the `{}`. This set of spans (and therefore trace) will be included in our resultset.


Perhaps I'm having some trouble with the terminology, but to me, a trace is the representation of a transaction, potentially across process boundaries. In other words, a trace is the set of all available spans for a given transaction. The result is then not the whole trace, but a partial trace with span set(s) matching the criteria. Would it be worth extending this to clarify that it's not the whole trace that is returned?

That is correct. In the general case the results of a traceql expression will be a collection of spansets. Each spanset will be a subset of spans from a given trace that "survive" the pipeline. I will add a bit more detail here. Thanks


jpkrohling · 2022-04-27T17:13:44Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+
+### Attribute fields
+
+We can refer to dynamic attributes (also known as tags) on the span or the span's resource.


Note: there's a new OTEP for a new "scope attribute".

jpkrohling · 2022-04-27T17:17:07Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+Note the difference between the above and the following:  
+`{ region = "eu-west-0"  && region = "eu-west-1" } `
+
+The second expression will return no traces because it's impossible for both conditions to be simultaneously true on the same span.


I believe it's possible to have repeated tags so that it's possible for a span to have two "region" tags, one with each value.

This is a good point I had not considered. Technically the proto does not prevent it, but the specification requires that for a set of attributes the key name is unique:

https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/common#attribute-collections

So span attributes should be unique, but technically that attribute could exist on both the resource and the span.

I agree the wire-format may allow it, but the spec does require uniqueness if I am reading it right.


jpkrohling · 2022-04-27T17:32:06Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+`{ duration > 1s }`
+
+The trace as a whole has a duration of over one second:  
+`max(end) - min(start) > 1s`


Trace duration is absolutely an attribute that people would query often. Perhaps having a root level would be already sufficient in terms of performance? The trace duration would then be max(end) - root.start?

mdisibio · 2022-04-27T20:05:23Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+| `duration` | end - start time of the span |
+| `name`     | operation or span name |
+| `status`   | status values are "error", "ok", or "unset" |
+| `parent`   | the parent of this span | 


The parent intrinsic I feel has some hidden complexity and wonder if it is a worthwhile element of the language. A basic spanset operator such as { A && B} is evaluated on an individual span which is a straightforward concept and allows for simple storage mechanisms and algorithms. However { parent.A && B } evaluates across multiple spans. The implied downstream implementation must now theoretically have both spans present in-memory and evaluated at the same time. For a worst case example consider the possibility of parent.parent.parent....

The parent intrinsic could be replaced with the existing child operator: { parent.A && B } is equal to {A} > {B}, which is more consistent with the rest of the language, and also the implementation is clear.

I do like parent for expressions such as:
{ parent.duration - duration > 1s }
or
{ parent.service.name != service.name }

Also, I'm fine with restricting parent to only being allowed to have depth of 1. Let's dig into the details as we work toward an internal POC. This is still an evolving spec and we can make adjustments.

@mdisibio your complexity argument doesn't sit well with me. The whole point of a dedicated trace QL to me is its ability to express queries across spans of the trace; if it didn't do that I could just as well stick with SQL. So any sensible implementation of TraceQL would have to deal with conditions not defined on individual spans, this complexity is unavoidable.

The language purity argument makes sense, I think it's a matter of how difficult it is to express the same clauses that Joe showed above using the {A} > {B} notation.
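To illustrate the trade-off discussed in this thread, here is a minimal Python sketch of a depth-1 `parent` intrinsic next to the child operator. It is a sketch under assumptions, not the proposed implementation: spans are dicts with hypothetical `id`, `parent_id`, and `duration_ms` keys, and all spans of the trace are assumed to be in memory at once, which is the cost raised above.

```python
def index_by_id(trace):
    """Map span id to span so a parent can be found in one lookup."""
    return {span["id"]: span for span in trace}


def parent_intrinsic(trace, condition):
    """Select spans where condition(parent, span) holds (parent depth of 1)."""
    index = index_by_id(trace)
    selected = []
    for span in trace:
        parent = index.get(span.get("parent_id"))
        if parent is not None and condition(parent, span):
            selected.append(span)
    return selected


def child_operator(trace, left, right):
    """{left} > {right}: spans matching right whose direct parent matches left."""
    return parent_intrinsic(trace, lambda parent, span: left(parent) and right(span))


# Hypothetical trace: a -> b -> c
trace = [
    {"id": "a", "parent_id": None, "service.name": "frontend", "duration_ms": 1500},
    {"id": "b", "parent_id": "a", "service.name": "frontend", "duration_ms": 300},
    {"id": "c", "parent_id": "b", "service.name": "db", "duration_ms": 100},
]

# { parent.service.name != service.name }
cross_service = parent_intrinsic(
    trace, lambda p, s: p["service.name"] != s["service.name"]
)
# { parent.duration - duration > 1s }
slow_gap = parent_intrinsic(
    trace, lambda p, s: p["duration_ms"] - s["duration_ms"] > 1000
)
# { service.name = "frontend" } > { service.name = "db" }
frontend_to_db = child_operator(
    trace,
    lambda p: p["service.name"] == "frontend",
    lambda s: s["service.name"] == "db",
)
```

In this sketch the child operator is the special case where the condition splits into one independent predicate per span. The two `parent` expressions compare values across the two spans, so they do not split that way, which is the part that is hard to express with `{A} > {B}` alone.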

Signed-off-by: Joe Elliott <number101010@gmail.com>

mmanciop · 2022-05-03T08:02:03Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+`{ } >> { }`  
+  This is the descendant operator. The spans returned from this operator will match the right hand side conditions while also being descendants of spans that match the left hand side conditions.
+
+`{ } > { }`  


While > for direct parent-child relation looks intuitive when thinking of spans, I can imagine that overloading the > character will confuse users with less familiarity with the trace model.
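For readers less familiar with the trace model, the difference between `>>` (descendant) and `>` (direct child) can be sketched in Python. This is an illustration only: spans are dicts with hypothetical `id`, `parent_id`, and `name` keys, and the helper names are not part of the proposal.

```python
def ancestors(span, index):
    """Yield the parent, grandparent, and so on, guarding against cycles."""
    seen = set()
    current = index.get(span.get("parent_id"))
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])
        yield current
        current = index.get(current.get("parent_id"))


def descendant_operator(trace, left, right):
    """{left} >> {right}: spans matching right with any ancestor matching left."""
    index = {s["id"]: s for s in trace}
    return [
        s for s in trace
        if right(s) and any(left(a) for a in ancestors(s, index))
    ]


def child_operator(trace, left, right):
    """{left} > {right}: spans matching right whose direct parent matches left."""
    index = {s["id"]: s for s in trace}
    selected = []
    for s in trace:
        parent = index.get(s.get("parent_id"))
        if right(s) and parent is not None and left(parent):
            selected.append(s)
    return selected


# Hypothetical trace: api -> auth -> db
trace = [
    {"id": "a", "parent_id": None, "name": "api"},
    {"id": "b", "parent_id": "a", "name": "auth"},
    {"id": "c", "parent_id": "b", "name": "db"},
]


def is_api(span):
    return span["name"] == "api"


def is_db(span):
    return span["name"] == "db"


descendants = descendant_operator(trace, is_api, is_db)
children = child_operator(trace, is_api, is_db)
```

The `db` span is a descendant of `api` but not its direct child, so only the `>>` form selects it in this sample.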

kvrhdn · 2022-05-09T12:40:52Z

docs/design-proposals/2022-04 TraceQL Concepts.md

+
+If the trace does contain spans with an attribute `http.status` with the value `200` then only those spans will be returned. The trace is reduced to only the set of spans that match the condition inside the `{}`. The resultset will contain only this subset of spans matching the condition.
+
+### Intrinsic fields


I think span kind is missing from this list. Span kind should be especially useful with the descendant operator.

Signed-off-by: Joe Elliott <number101010@gmail.com>

joe-elliott · 2022-06-06T20:12:28Z

Updated with the latest

Shorthand removed.
Updated all attributes to use the latest syntax.
Added a section on resource vs span attributes
Removed start/end intrinsics.

joe-elliott · 2022-06-06T20:14:13Z

@till Apologies for the late response. After discussing this internally we have decided that all code related to TraceQL will be licensed under AGPLv3.

joe-elliott · 2022-06-08T18:03:41Z

Here we go!

lets go

94baed3

Signed-off-by: Joe Elliott <number101010@gmail.com>

joe-elliott requested review from KMiller-Grafana, annanay25, mdisibio, dgzlopes, mapno, kvrhdn and zalegrala as code owners April 12, 2022 20:31

joe-elliott added 2 commits April 12, 2022 16:37

Fixed examples and table

5ac599b

Signed-off-by: Joe Elliott <number101010@gmail.com>

formatting

d006ed0

Signed-off-by: Joe Elliott <number101010@gmail.com>

yurishkuro reviewed Apr 12, 2022

annanay25 reviewed Apr 13, 2022

pavolloffay reviewed Apr 13, 2022

mdisibio reviewed Apr 13, 2022

mdisibio reviewed Apr 13, 2022

joe-elliott and others added 6 commits April 13, 2022 11:04

Update docs/design-proposals/2022-04 TraceQL Concepts.md

38a6a65

Co-authored-by: Annanay Agarwal <annanay25@users.noreply.github.com>

Update docs/design-proposals/2022-04 TraceQL Concepts.md

dee0f75

Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>

Update docs/design-proposals/2022-04 TraceQL Concepts.md

ad444e6

Co-authored-by: Martin Disibio <mdisibio@gmail.com>

claned up max example

5b1ea88

Signed-off-by: Joe Elliott <number101010@gmail.com>

reworked structure section

8dd0b80

Signed-off-by: Joe Elliott <number101010@gmail.com>

Moved service name

c534ad0

Signed-off-by: Joe Elliott <number101010@gmail.com>

electron0zero reviewed Apr 18, 2022

devrimdemiroz reviewed Apr 26, 2022

jpkrohling reviewed Apr 27, 2022

mdisibio reviewed Apr 27, 2022

joe-elliott added 2 commits April 27, 2022 16:30

formatted table properly for @jpkrohling

3dc0a83

Signed-off-by: Joe Elliott <number101010@gmail.com>

small clarification on resultsets

f897a21

Signed-off-by: Joe Elliott <number101010@gmail.com>

mmanciop reviewed May 3, 2022

kvrhdn reviewed May 9, 2022

joe-elliott mentioned this pull request May 19, 2022

TraceQL Parser #1436

Merged

3 tasks

updates

3005558

Signed-off-by: Joe Elliott <number101010@gmail.com>

joe-elliott requested a review from knylander-grafana as a code owner June 6, 2022 20:10

mdisibio approved these changes Jun 8, 2022


joe-elliott merged commit 02a945d into grafana:main Jun 8, 2022

