Timeout & cache on metrics generator local-blocks processor #3768
Comments
Hi, thanks for posting your configuration. Can you also give an estimate of how many spans/s (tempo_distributor_spans_received_total) this cluster is receiving? Also, a file listing or information about the blocks in the generator (maybe default To start, yes, there is local caching in the local-blocks processor, which is why it is faster on the next call. Thanks for testing async; I agree that sync is generally faster, which is why it is the default. Aggregate By is not heavy on the frontend and queriers; the load is mainly on the metrics generator. Here are the next steps I would recommend:
Around 500k spans/s.
For example, one file in the WAL:
After updating parquet_row_group_size_bytes to 100MB
WAL "blocks" are composed of internal flushes, which are mini-parquet files. This looks like flush 191, and it is 130MB. That number of flushes for a WAL block is rather high. Are flush_check_period and max_block_duration set to their default values? With the defaults (flush_check_period=10s, max_block_duration=1m), there are 6 flushes per WAL block and 60 blocks in total for the last hour. The final blocks are in For 500K spans/s you may need 50+ generators to get the "cold" latency to your target.
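The flush and block counts quoted above can be reproduced with a small sketch. This is only an illustration, assuming exactly one flush per flush_check_period and spans spread evenly across generators; the function names are hypothetical and are not part of Tempo.

```python
def wal_layout(flush_check_period_s, max_block_duration_s, window_s=3600):
    """Return (flushes per WAL block, WAL blocks covering the window).

    Assumption: exactly one flush every flush_check_period_s seconds.
    """
    flushes_per_block = max_block_duration_s // flush_check_period_s
    blocks_in_window = window_s // max_block_duration_s
    return flushes_per_block, blocks_in_window


def spans_per_generator(total_spans_per_s, generators):
    """Average spans/s handled by each generator, assuming an even spread."""
    return total_spans_per_s / generators


if __name__ == "__main__":
    print(wal_layout(10, 60))                 # defaults: 10s flush, 1m block
    print(wal_layout(10, 180))                # 10s flush, 3m block
    print(spans_per_generator(500_000, 50))   # 500k spans/s over 50 generators
```

Under these assumptions the defaults give 6 flushes per block and 60 blocks per hour, and a 3m max_block_duration gives 18 flushes per block and 20 blocks per hour, so a flush numbered 191 is well above either figure.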
One of the block files:
Does this make sense? I am using the default MaxBlockBytes (500MB) and max_block_duration set to 3m.
This is listing the directory. Can you list the files inside the folder (i.e. data.parquet)?
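One way to produce such a listing is sketched below. It assumes only that the block is a directory on local disk; the helper name list_block_files is hypothetical, and the block path is passed on the command line.

```python
import sys
from pathlib import Path


def list_block_files(block_dir):
    """Return (relative path, size in bytes) for every file under block_dir.

    Results are sorted largest first, with ties broken by name.
    """
    root = Path(block_dir)
    files = [
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    ]
    return sorted(files, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one line per file: size in bytes, then the path inside the block.
    for name, size in list_block_files(sys.argv[1]):
        print(f"{size:>14,d}  {name}")
```

Sorting by size puts the largest file first, which makes it easy to see how large data.parquet is relative to the rest of the block.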
Yes, definitely. This blog post has a walkthrough and to use
Got it, example on data.parquet:
@icemanDD Hi, did the new settings and dedicated columns help?
Hi @mdisibio, that helps a bit, but we realized that TraceQL metrics works better. Does it follow the same pattern for optimization, or does it need querier scaling for better performance, especially when querying older data?
Describe the bug
When onboarding the metrics-generator local-blocks processor (Aggregate by), the first query always times out, but the second or third query loads the results, even though we did not set up any cache layer.
Is there any local cache for this feature?
How should we optimize query-frontend, querier and metrics generator to get local-blocks working on millions of spans?
To Reproduce
tempo config:
Using the async iterator did not help:
VPARQUET_ASYNC_ITERATOR="1"
Environment
Tempo 2.4.2
Expected behavior
local-blocks should return results in less than 10-15 seconds with the proper configuration, or return only the data queried within 10-15 seconds instead of triggering the timeout.