query shows bogus results #3776
brian-brazil added the kind/bug and component/local storage labels on Feb 1, 2018

That's not good. What versions of Prometheus have been run using the same storage?

Also, incidentally, you should read https://www.robustperception.io/federation-what-is-it-good-for/

Unlikely, but do you have a proxy or cache in front of the Prometheus instance you're querying?

The same storage was used by the previous version, Prometheus 2.0.0. We have this setup in 16 regions, but only one showed the bogus result above. No proxy or cache is involved.

Are there any indications of disk data corruption on the machine? We have checksums, so we should catch that, but you might have gotten really unlucky.
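
For illustration, here is a minimal sketch of the kind of verification such checksums make possible, assuming a payload followed by a 4-byte CRC32 (Castagnoli) digest in the spirit of what the TSDB stores alongside chunk and index data; the layout and names below are illustrative only, not Prometheus's actual code:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// Castagnoli is the CRC32 polynomial commonly used for on-disk checksums.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// verifySegment assumes the (hypothetical) layout <payload><4-byte big-endian CRC32>.
func verifySegment(b []byte) ([]byte, error) {
	if len(b) < 4 {
		return nil, errors.New("segment too short")
	}
	payload, sum := b[:len(b)-4], binary.BigEndian.Uint32(b[len(b)-4:])
	if crc32.Checksum(payload, castagnoli) != sum {
		return nil, errors.New("checksum mismatch: possible on-disk corruption")
	}
	return payload, nil
}

func main() {
	payload := []byte("example chunk data")
	buf := make([]byte, len(payload)+4)
	copy(buf, payload)
	binary.BigEndian.PutUint32(buf[len(payload):], crc32.Checksum(payload, castagnoli))

	if _, err := verifySegment(buf); err != nil {
		fmt.Println("corrupt:", err)
	} else {
		fmt.Println("ok")
	}

	buf[0] ^= 0xff // flip a bit to simulate silent corruption
	if _, err := verifySegment(buf); err != nil {
		fmt.Println("corrupt:", err)
	}
}
```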

There were issues in the past related to the retention fix: #3534. However, Prometheus 2.0.0 showed the correct results.

@auhlig, sorry about the delay on this as it is quite severe. If you roll back to Prometheus 2.0.0, do the results become correct again? How big are your […]

Hi @fabxc, […]

Without the data, we can only presume that this is fixed unless someone else reports it.

brian-brazil closed this on Mar 8, 2018

Nothing in the logs? Leftover .tmp folders would usually indicate some kind of failed compaction.
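
As a rough way to gauge how many such leftovers exist, here is a small sketch that lists the top-level *.tmp entries in the data directory; the path is an assumption, not taken from this report:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	dataDir := "/var/lib/prometheus" // assumed --storage.tsdb.path; adjust for your setup
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		panic(err)
	}
	count := 0
	for _, e := range entries {
		// Interrupted compactions typically leave directories ending in ".tmp".
		if e.IsDir() && strings.HasSuffix(e.Name(), ".tmp") {
			fmt.Println(filepath.Join(dataDir, e.Name()))
			count++
		}
	}
	fmt.Println("leftover .tmp directories:", count)
}
```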

Can you provide us one or more blocks for affected time ranges before wiping this time? Probably fine to skip the chunk files – at least initially.
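
A hedged sketch of one way to do that, copying a block directory while leaving out its chunks/ subdirectory; the paths and the helper name are placeholders, and the assumed layout (meta.json, index, tombstones next to chunks/) is the Prometheus 2.x block format:

```go
package main

import (
	"io"
	"os"
	"path/filepath"
)

// copyBlockWithoutChunks copies a block directory to dst, skipping the bulky
// chunks/ subdirectory so only the small metadata and index files are shared.
func copyBlockWithoutChunks(src, dst string) error {
	return filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		if info.IsDir() {
			if rel == "chunks" {
				return filepath.SkipDir // leave the chunk segments out
			}
			return os.MkdirAll(filepath.Join(dst, rel), 0o755)
		}
		in, err := os.Open(path)
		if err != nil {
			return err
		}
		defer in.Close()
		out, err := os.Create(filepath.Join(dst, rel))
		if err != nil {
			return err
		}
		defer out.Close()
		_, err = io.Copy(out, in)
		return err
	})
}

func main() {
	// Both paths are placeholders; point src at one affected block's ULID directory.
	if err := copyBlockWithoutChunks("/var/lib/prometheus/<block-ULID>", "/tmp/block-copy"); err != nil {
		panic(err)
	}
}
```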

Grepping for (1) […]

Something like the following?
Most of the 14221 *.tmp folders look like: […]

That's pretty unfortunate for remote debugging. Log message (3) confuses me, as it apparently cannot remove the chunk directory. I thought it was an issue in the code, but we are actually calling os.RemoveAll. What Prometheus version are you using by now? Note that 2.1.0 and 2.2.0 both have issues and you should definitely upgrade to 2.2.1.

fabxc reopened this on Mar 22, 2018

Regarding the blocks: […]
Regarding (1): It's always the same series, as shown below. […]
Regarding os.RemoveAll: Is the directory we're trying to delete used by another goroutine? That could potentially lead to the error. We're still on the same Prometheus version: […]
I'll update staging today and continue with our production regions next week.
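
On the goroutine question, here is a timing-dependent sketch of the mechanism by which os.RemoveAll can fail while something is still writing into the directory: it deletes the entries it sees and then removes the directory itself, so a concurrent writer can make that final removal return "directory not empty". This only illustrates the mechanism; it is not a claim about what happened here:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync/atomic"
)

func main() {
	dir, err := os.MkdirTemp("", "chunks")
	if err != nil {
		panic(err)
	}

	var stop atomic.Bool
	go func() {
		// Simulate another goroutine still creating files in the directory.
		for i := 0; !stop.Load(); i++ {
			os.WriteFile(filepath.Join(dir, fmt.Sprintf("%06d.tmp", i)), nil, 0o644)
		}
	}()

	for i := 0; i < 1000; i++ {
		os.MkdirAll(dir, 0o755) // recreate if the previous RemoveAll won the race
		if err := os.RemoveAll(dir); err != nil {
			// Whether this triggers depends on scheduling; typically "directory not empty".
			fmt.Println("RemoveAll failed:", err)
			break
		}
	}
	stop.Store(true)
}
```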

We believe this was fixed in 2.3.0 as part of the big PromQL changes; if you still see this after that version, please let us know.

brian-brazil closed this on Jun 15, 2018

lock bot commented on Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

auhlig commented on Feb 1, 2018 (edited)

What did you do?

We're using two Prometheis in a federation: a "collector" and a "frontend". The collector gets the metrics from, in this case, cAdvisor. The frontend federates aggregated data from the collector (config). While querying for container_cpu_system_seconds_total in the collector returns the expected results, the same query in the frontend shows other metrics. See the screenshot below.
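
One way to narrow down where the unrelated series enter is to look at what the collector's /federate endpoint actually returns for that selector; the sketch below is only illustrative, the host name is a placeholder and the parsing is deliberately rough:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	q := url.Values{}
	q.Set("match[]", `container_cpu_system_seconds_total`)

	// "collector" is a placeholder for the collector Prometheus host.
	resp, err := http.Get("http://collector:9090/federate?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Collect the distinct metric names present in the exposition output.
	names := map[string]struct{}{}
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i] // metric name ends at the first '{' or space
		}
		names[name] = struct{}{}
	}
	for n := range names {
		fmt.Println(n)
	}
}
```

If only container_cpu_system_seconds_total is printed, the unrelated series would have to come from the frontend's own storage rather than from federation.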

What did you expect to see?

Only container_cpu_system_seconds_total metrics.

What did you see instead? Under which circumstances?

As shown in the screenshot below, the query for container_cpu_system_seconds_total also shows unrelated metrics.

Environment
Linux 4.14.11-coreos x86_64
prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d8)
build user: root@6e784304d3ff
build date: 20180119-12:01:23
go version: go1.9.2
Can be found here.