Log flooding with "skipping resharding" warning (remote_write with write_relabel_configs) #7124
Comments
Thank you for your report. It looks like there are a couple of things happening here:
Fixing the first case would also make it almost impossible to trigger the second case, as low volumes are unlikely to ever reshard above the minimum anyway, so perhaps let's just start with that.
Hello, this may be an overly simple attempt at fixing this problem; I am still learning. Please let me know if I am heading in the correct direction on my fork: https://github.com/thaniri/prometheus/pull/2/files If I understand what is going on here, resharding will be skipped if this if statement evaluates to true. Copying and pasting this if statement here means that the warning is not printed when we evaluate calculateDesiredShards(); instead, after we evaluate calculateDesiredShards(), we later decide not to reshard because we have not written to remote storage in the configured time frame. What I would like to know is what we should do with the failing test cases in queue_manager_test.go
The test case is correctly saying this is wrong, because we have moved the logic that "decides" not to reshard to a higher level in the code. I cannot find an existing test suite for Can you please offer up any advice on how we should handle this test case? It seems like an actually important case to capture, but I am not sure where to move this test to (or if I need to create a completely new test suite).
Thanks for looking into this @thaniri! The PR you linked looks like a good approach to me, I like that As far as testing goes, I would say you should change that test to focus on testing updating sharding at the higher level. Perhaps break the inner part of
Thank you @csmarchbanks. I have now put up a pull request: #7143 As for the testing, I realized that the test case that was failing actually just needed to be tweaked a little bit, not that we needed an entirely new test suite.
Fixed in #7143
What did you do?
Installed Prometheus, which scrapes some targets and writes metrics with the label longterm=true (with the help of write_relabel_configs) to remote storage (InfluxDB v1.7.10). Prometheus has been up and running for 5 minutes.
What did you expect to see?
Samples of metrics with the label longterm=true in remote storage, and a clean Prometheus log without errors or warnings.
What did you see instead? Under which circumstances?
Remote write is working and the samples I need are being written to remote storage; however, Prometheus constantly floods the log with "Skipping resharding, last successful send was beyond threshold". I believe this is not normal for production use. When I disable write_relabel_configs and Prometheus starts to write all metrics to remote storage, I no longer see the warnings; however, writing all metrics to remote storage is not what I need.
Environment
System information:
Prometheus runs in the Docker container prom/prometheus:v2.17.1 (on Kubernetes 1.11) with Linux 4.19.88 x86_64
Prometheus version:
I've also tried the Prometheus Docker image v2.15.1, which I have on another cluster. The result is the same.
I've also tried VictoriaMetrics v1.34.7 as remote storage. The result is the same. I believe this is a Prometheus issue and is not related to the remote storage. When I disable write_relabel_configs there are no more warnings, but I don't want to write all metrics from Prometheus to remote storage, only some of them.