Using trace duration in traceQL #2311

Closed
amitsetty opened this issue Apr 7, 2023 · 7 comments

@amitsetty
Contributor

As I understand it, duration in TraceQL queries span durations. That doesn't serve my use case: I want to find traces that take a long time overall, since a trace's duration tells me much more than any single span's. Is this planned, or is it something I could contribute to?

@joe-elliott
Member

We are discussing adding a trace scope here:

#1989

One of the primary reasons is that it's much more efficient to search for trace duration than it is to search for span duration. For me the biggest question is how to introduce it into the language. The change is not particularly complex, but it would touch a lot of different pieces (parser, engine, and fetch layer).

Going to add a comment in that other issue to get some discussion going.

@mdisibio
Contributor

A common convention is to have a root span covering the whole execution. Its span duration is the same as the trace duration. Would that help you in the meantime?

@amitsetty
Contributor Author

amitsetty commented Apr 17, 2023

Hey,
TL;DR: it's not a matter of convention. No single actor is involved in the whole flow through the system, so no root span covers the whole trace duration.

Your suggestion would work well in a case where there is a root span.

The issue is that my systems communicate using queues, file systems, and other offline, asynchronous methods.
The first actor performs an action and then puts a message on a queue (while being instrumented). The second actor picks the message up from the queue and acts on it (also instrumented). What I actually want to measure is the duration from the beginning of the first span to the end of the last, but no single actor is involved in the whole process, so span durations and span metrics don't let me look at my data with a service-centric view. Being able to query trace duration would let us use TraceQL to find problematic traces and, in the future, when trace aggregations exist, to figure out whether a service is running correctly. Trace metrics would also help solve this problem, but they open up a lot of unknowns.

@amitsetty
Contributor Author

Hey, any plans or updates around this?

@joe-elliott
Member

I think the main blocker is deciding on a syntax. We've been very focused on performance the last month or so and haven't pushed on this. Discussion here:

#1989

@mdisibio
Contributor

This was merged in #2503. The trace duration can be referenced like { traceDuration > 10s } and combined with any other conditions.

@zalegrala
Contributor

I believe this is resolved with #2503. Reopen with a comment if not.
