mimirtool backfill: failed uploading block #8164
Comments
Hello, it seems that the block validation step (performed by Mimir after the block is fully uploaded, but before it is "accepted") has failed, and now the block can no longer be uploaded or validated. Such a block is treated as an incomplete upload and will eventually be deleted by Mimir. After that, you can try uploading it again. Alternatively, you can try to delete |
Ah good to know thanks! Though it appears I get this issue on every block I try to import 🤔 |
Is there any way to see how the validation failed? It's a block taken from a classic Prometheus instance, so I figured it should work without issues? (Though maybe it's too far in the past?) |
You should be able to see failure reasons in compactor logs. Based on mimirtool output, these blocks look quite big (40 GB?), so validation can take a while, and compactor must be running during the process without interruption. |
Yeah I only get that:
Guess there is some timeout to tweak somewhere? |
to be precise*, so it seems to fail after 80s ? |
I don't know of any timeout on the compactor side for validating or downloading blocks, but individual storage clients can have operation timeouts (e.g. Swift storage has a request timeout config option). |
@pstibrany so you think the error comes from the storage provider? I'm using GCS, and there don't seem to be any configurable timeouts :/ |
My thinking was that the client side could have some timeout, but in fact I wouldn't expect that to be logged as context canceled. We use the GCS client internally, and don't see timeouts when reading data from it. I'm wondering ... Is that the full compactor log, or are there possibly more messages that could be related to validation (block upload feature)? Any other error logged by the compactor that would explain what's happening? |
The 3 lines were next to each other in the compactor logs. So I guess there are no other messages related to this upload? |
Could you try grepping on block ID, maybe something still shows up? |
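The suggestion above (grepping on the block ID) can be sketched in Python as follows. This is only an illustration: it assumes the compactor logs have been saved as plain text lines, and the helper names `lines_for_block` and `block_ids_seen` are hypothetical, not part of Mimir or mimirtool. The one fact it relies on is that block IDs are ULIDs (26 characters of Crockford base32).

```python
import re
from typing import Iterable, List

# Block IDs are ULIDs: 26 characters of Crockford base32 (no I, L, O, U).
ULID_RE = re.compile(r"\b[0-9A-HJKMNP-TV-Z]{26}\b")


def lines_for_block(lines: Iterable[str], block_id: str) -> List[str]:
    """Return every log line that mentions block_id (hypothetical helper)."""
    return [line for line in lines if block_id in line]


def block_ids_seen(lines: Iterable[str]) -> List[str]:
    """Return the distinct ULID-looking tokens, in first-seen order."""
    seen = {}
    for line in lines:
        for token in ULID_RE.findall(line):
            seen.setdefault(token, None)
    return list(seen)
```

Running `block_ids_seen` over the whole log first can show whether the compactor mentions the uploaded block at all, before narrowing down with `lines_for_block`.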
I just tested with another block, live tailing the logs only gets me that:
|
To unblock you, you can disable validation ( |
Yes, it looks like it's uploaded correctly. It's not yet shown in Mimir though, I guess I need to wait some time? Can I refresh or reset some cache somewhere? |
It can take about 30 minutes for the block to become queryable: the compactor needs to include the block in the bucket-index, and then queriers and store-gateways need to reload the bucket index and fetch parts of the block (store-gateways do that). If you try to query it before then, empty results may be cached in the results cache. You can flush the results cache by restarting memcached. |
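One way to check the first step described above (whether the compactor has already listed the block in the bucket-index) is sketched below. It rests on assumptions that should be verified against your Mimir version: that the tenant's bucket index has been downloaded from object storage as a gzip-compressed JSON file, and that it contains a `blocks` list whose entries carry a `block_id` field. The function name `find_block` is hypothetical.

```python
import gzip
import json
from typing import Optional


def find_block(index_bytes: bytes, block_id: str) -> Optional[dict]:
    """Return the bucket-index entry for block_id, or None if it is absent.

    Assumes a gzip-compressed JSON document with a top-level "blocks" list
    whose entries have a "block_id" field (an assumption, not verified here).
    """
    index = json.loads(gzip.decompress(index_bytes))
    for entry in index.get("blocks", []):
        if entry.get("block_id") == block_id:
            return entry
    return None
```

If the function returns `None`, the block is most likely still waiting for the compactor's next bucket-index update, so querying it would be premature.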
Okay I might see some empty cached results, though I have deployed mimir without memcached enabled, is it still used somewhere? |
Well I restarted my 3 store gateways, but still nothing. I'll wait a bit more, but it's been over 1h already 🤔 |
What time range does the uploaded block cover? Can you see if store-gateways loaded your new block? |
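To answer the time-range question, the block's own `meta.json` can be read directly: Prometheus TSDB blocks record `ulid`, `minTime` and `maxTime` there, with times in milliseconds since the Unix epoch. A minimal sketch, with hypothetical helper names `block_time_range` and `overlaps`:

```python
import json
from datetime import datetime, timezone


def block_time_range(meta_json: str):
    """Return (ulid, start, end) for a block, with times as UTC datetimes."""
    meta = json.loads(meta_json)

    def to_utc(ms: int) -> datetime:
        # meta.json stores timestamps in milliseconds since the epoch.
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

    return meta["ulid"], to_utc(meta["minTime"]), to_utc(meta["maxTime"])


def overlaps(meta_json: str, query_start: datetime, query_end: datetime) -> bool:
    """True if the query window intersects the block (maxTime is exclusive)."""
    _, start, end = block_time_range(meta_json)
    return query_start < end and start <= query_end
```

Comparing the query window against the block's range this way helps rule out the simple case where the query does not touch the uploaded data at all.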
The uploaded block covers roughly January 2024 to a few weeks ago. I can see in the store gateway:
If I grep a block I uploaded. Another for instance:
|
👍 That shows that the blocks are now in the bucket-index, and store-gateways know about them. Queriers should too, since they load the same bucket-index. Since you don't have any cache ("I have deployed mimir without memcached enabled"), I don't see any other reason why your queries should not work. You can try something like |
Well, I'm getting a timeout with this query 😅, even on a 5-minute interval, if it's from a few weeks ago. In the querier logs:
Do you have any ideas? |
Seems it's working now after the night! I'll take it! Thanks for the help! |
Unfortunately I'm not familiar with this one. |
Describe the bug
I'm trying to backfill Prometheus blocks to Mimir.
Maybe linked to my blocks ?
Mimir/mimirtool version: 2.12.0
So the first time I upload a block (kinda big, ~50G), I get the following; 10.52.1.224 is the IP of the compactor:
And once I try again:
Any ideas?
Environment