Tempo-cli trace-summary issue #3690

Closed · edgarkz opened this issue May 20, 2024 · 5 comments · Fixed by #3697

Comments
@edgarkz (Contributor) commented May 20, 2024

Hello,

I'm trying to debug and understand the root cause of a huge/infinite trace we have in our system. The compactors show the following:

level=warn ts=2024-05-20T09:33:45.925427878Z caller=compactor.go:247 msg="max size of trace exceeded" tenant=single-tenant traceId=53a5802e218b95xxxxxxxxxxxx rootSpanName= rootServiceName= discarded_span_count=44817

Grafana Tempo search can't fetch the trace due to size limits, so I tried using tempo-cli to get some idea about that specific trace. However, the CLI command just hangs while iterating over the backend blocks and does not provide any info; even after waiting 30 minutes there was no progress.

 go run ./cmd/tempo-cli query trace-summary 53a5802e218b95xxxxxxxxxxxx single-tenant  --backend=s3 --bucket=axxxxxxxxxx --s3-endpoint s3.us-east-1.amazonaws.com
total blocks to search:  772
..............0.......................................................^Csignal: interrupt

I'm running with the latest tempo-cli source code and couldn't find any way to limit or interrupt the search to get partial info.

Please advise how to proceed.

Expected behavior
tempo-cli query trace-summary should provide stats and a summary (or at least partial info for the scanned blocks) regardless of trace size.

Environment:

  • Infrastructure: EKS 1.26 + s3 backend
  • Deployment tool: helm 2.4 distributed
@joe-elliott (Member)

I've never seen this command hang, and we use it semi-regularly. Could you use fmt.Println calls to determine where it is hanging?

@edgarkz (Contributor, Author) commented May 20, 2024

@joe-elliott can you please provide an example of how to use the function above? I'm not familiar with it.

@edgarkz (Contributor, Author) commented May 20, 2024

Some updates: I managed to get output using the additional argument --percentage 0.01, but it's pretty light on info.

Is there an option to add a --percentage param to tempo-cli query blocks so I can try to dump those spans?

total blocks to search:  7
..0.....Number of blocks: 4 
Span count: 176817 
Trace size: 130369402 B 
Trace duration: 111822 seconds 
Root service name:  
Root span info:
No root span found
top frequent service.names: 
[xxxxx.Service xxxx.Service xxxx.Service xxxx.Service xxxx.Service

@joe-elliott (Member)

> @joe-elliott can you please provide an example of how to use the function above? I'm not familiar with it.

Just place fmt.Println("foo") anywhere in the codebase and you will see it print when that line is hit. That can give you a sense of where the application is hanging. Since you are running the CLI directly from source with go run ./cmd/tempo-cli, it's quite easy.
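For illustration, here is a minimal standalone sketch of that approach. The block IDs and the per-block search function below are made up for the example, not the real ones in cmd/tempo-cli/cmd-query-trace-summary.go:

```go
package main

import (
	"fmt"
	"time"
)

// searchBlock stands in for the real per-block search logic in the CLI;
// here it just simulates work.
func searchBlock(id string) {
	time.Sleep(10 * time.Millisecond)
}

func main() {
	// Placeholder block IDs; in the real CLI these come from the backend.
	blockIDs := []string{"block-0001", "block-0002", "block-0003"}

	for i, id := range blockIDs {
		// A print like this inside the loop shows which block the CLI is stuck on.
		fmt.Printf("searching block %d/%d: %s\n", i+1, len(blockIDs), id)
		searchBlock(id)
	}
	fmt.Println("done")
}
```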

> Some updates: I managed to get output using the additional argument --percentage 0.01, but it's pretty light on info.

Scanning 1% of your total blocks, you found 176K spans and 130 MB of trace data, so the CLI is likely hanging because this trace is enormous. If trace-summary isn't returning, then query blocks definitely won't. Extrapolating, the full trace is on the order of 17M spans and 13 GB.

> Is there an option to add a --percentage param to tempo-cli query blocks so I can try to dump those spans?

Sure, give it a shot. Here is the code that restricts the search to a percentage of blocks:

https://github.com/grafana/tempo/blob/main/cmd/tempo-cli/cmd-query-trace-summary.go#L95-L103

You could copy that into the query blocks code here:

https://github.com/grafana/tempo/blob/main/cmd/tempo-cli/cmd-query-blocks.go#L86
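For reference, a rough sketch of what that percentage-based sampling could look like once ported over. The names below are illustrative only, not the actual Tempo source:

```go
package main

import (
	"fmt"
	"math/rand"
)

// limitBlocks keeps roughly the given fraction of blocks, mirroring the spirit of
// the --percentage handling in cmd-query-trace-summary.go. Illustrative sketch,
// not a copy of the Tempo code.
func limitBlocks(blockIDs []string, percentage float32) []string {
	if percentage <= 0 || percentage >= 1 {
		return blockIDs // no sampling requested
	}
	limited := make([]string, 0, len(blockIDs))
	for _, id := range blockIDs {
		if rand.Float32() < percentage {
			limited = append(limited, id)
		}
	}
	return limited
}

func main() {
	blocks := []string{"b01", "b02", "b03", "b04", "b05", "b06", "b07", "b08", "b09", "b10"}
	// With --percentage 0.3, only ~30% of the blocks would be searched.
	fmt.Println("blocks to search:", limitBlocks(blocks, 0.3))
}
```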

@edgarkz (Contributor, Author) commented May 20, 2024

Thank you @joe-elliott.
I'll raise a PR with those changes since it's really useful to have this limit in place.
