[prometheus exporter] Metric value overflow on metrics_expiration #6935
Comments
@gouthamve, is this something you can help with?
Thanks @jpkrohling. 🙏🏼 I have more evidence to suggest this is a bug in the `prometheusremotewrite` exporter. The following are screenshots of exactly the same data, with identical queries and timestamp ranges. The first screenshot is from a locally running Grafana querying against a local prometheus server scraping data from my local OTEL collector's `prometheus` exporter endpoint. Looks fine, the data is as expected. The second screenshot is from an observability provider (M3 is the backing metrics store) where the data is sent via the `prometheusremotewrite` exporter. Logs show the number jumping to the MaxInt64 value as reported earlier:
I'm confused. The issue title says "prometheusremotewrite", but the sample config uses the `prometheus` exporter. It looks like what is happening here is that the data flow is something like: collector (`prometheus` exporter) → local Prometheus server → remote write → M3.
@albertteoh can you try building a collector from this branch and let me know if the issue persists?
Ah, sorry, silly mistake on my part. I had originally titled it with `prometheusremotewrite`, but the sample config does indeed use the `prometheus` exporter.

I tried your branch and it looks good 👍🏼 Thanks very much for the quick turnaround, @Aneurysm9!
Describe the bug
When a metric expires, the value for the metric appears to overflow the max int64 value: `9223372036854775808`.
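For context on the specific value: `9223372036854775808` is exactly 2^63, i.e. 1 + MaxInt64. A minimal Go sketch (purely illustrative arithmetic, not the collector's actual code path) shows two common ways a float-to-integer conversion can land on this number:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// MaxInt64 is 9223372036854775807; the reported value is one more,
	// i.e. 2^63. float64 cannot represent MaxInt64 exactly and rounds
	// it up to exactly 2^63.
	fmt.Printf("%d\n", int64(math.MaxInt64))     // 9223372036854775807
	fmt.Printf("%.0f\n", float64(math.MaxInt64)) // 9223372036854775808

	// Converting a NaN (e.g. a Prometheus staleness marker) to an
	// integer is platform-dependent in Go; on amd64 it yields
	// math.MinInt64, which reads as 2^63 when reinterpreted unsigned.
	nan := math.NaN()
	fmt.Println(uint64(int64(nan))) // 9223372036854775808 on amd64
}
```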
Steps to reproduce

1. Run the Jaeger HotROD example app, pointing it at the collector's Jaeger agent port `6835`:
   `docker run --rm --network="host" --env JAEGER_AGENT_HOST=localhost --env JAEGER_AGENT_PORT=6835 -p8080-8083:8080-8083 jaegertracing/example-hotrod:latest all`
2. Check that the `latency_bucket` metrics are correct for the first minute of data, and make a note of which bucket has `count > 0`. For example, where the metric has label `le = "250"`: `latency_bucket{service_name = "driver", le="250"}` (one way to watch the value is sketched after this list).
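One quick way to watch that bucket is to poll the collector's Prometheus scrape endpoint. The port `8889` below is an assumption for illustration; the actual endpoint depends on the config:

```sh
# Poll the collector's prometheus exporter endpoint (port 8889 assumed)
# and watch the le="250" latency bucket for the driver service.
while true; do
  curl -s http://localhost:8889/metrics | grep 'latency_bucket' | grep 'le="250"'
  sleep 10
done
```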
What did you expect to see?
After a minute (on metric expiry), the metric with `le = "250"` (example query `latency_bucket{service_name = "driver", le="250"}`) should no longer be query-able (a null value).

What did you see instead?
After a minute (on metric expiry), the metric with `le = "250"` (example query `latency_bucket{service_name = "driver", le="250"}`) will jump to `9223372036854775808`.

What version did you use?
Version: `main` branch

What config did you use?
Config:
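(The original config was attached to the issue and isn't reproduced here. Purely as an illustration of the path being exercised — a `spanmetrics` processor feeding the `prometheus` exporter with a short `metric_expiration` — a minimal config might look like the sketch below. All ports, protocols, and pipeline names are assumptions, not the reporter's actual values.)

```yaml
# Hypothetical minimal config for illustration only — not the
# reporter's actual config. Ports and protocols are assumed.
receivers:
  jaeger:
    protocols:
      thrift_compact:
        endpoint: "0.0.0.0:6835"
  otlp:
    protocols:
      grpc:

processors:
  spanmetrics:
    metrics_exporter: prometheus

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    metric_expiration: 1m  # short expiration to reproduce quickly
  logging:

service:
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [spanmetrics]
      exporters: [logging]
    metrics:
      # dummy pipeline so the prometheus exporter is instantiated
      receivers: [otlp]
      exporters: [prometheus]
```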
Environment
OS: "Ubuntu 20.04"
Compiler (if manually compiled): "go 1.17.5"
Additional context
Is there something the `spanmetrics` processor should be doing to avoid this problem? Should it be emitting metrics more frequently than the `metric_expiration` duration?

The following screenshots illustrate the problem for the default `5m` `metric_expiration` configuration:

Before Expiry
![before-expiry](https://user-images.githubusercontent.com/26584478/147091355-f41ae85e-10f7-4e7f-92cf-bfccd6a7483c.png)
After Expiry
![after-expiry](https://user-images.githubusercontent.com/26584478/147091346-cce82f75-460b-467c-bda8-c68e1b59bb11.png)
The logs also reflect the above screenshots, showing the correct metrics initially, then suddenly jumping to a very large count:

- Initially, the `le = "250"` bucket has a count of `3`.
- After expiry, the `le = "250"` bucket jumps to a very large number, which appears to be 1 + MaxInt64.