
Alpha.2 Missing metrics on scrape when a sample limit is set (even when sample limit is not being actually hit) #2769

Closed
checketts opened this Issue May 25, 2017 · 9 comments


checketts commented May 25, 2017

What did you do?
Running v2.0.0-alpha.2 and comparing the results against my existing v1.6.2 deployment.

What did you expect to see?
Using v1.6.2, I can see the scrape samples coming in regularly with the query scrape_samples_scraped{app="drydock"}.
[screenshot: scrape_samples_scraped{app="drydock"} graph on v1.6.2, with samples reported for every scrape]

What did you see instead? Under which circumstances?
With alpha.2, scraping the same targets, I see metrics being missed.
[screenshot: the same scrape_samples_scraped{app="drydock"} graph on alpha.2, showing missing datapoints]

Environment
Running in a Docker container on Kubernetes 1.4.1.

brian-brazil commented May 25, 2017

Do you know which metrics are missing? Are they always the same ones?

checketts commented May 25, 2017

I believe the same metrics are missing each time. The red line is my non-leader server, which reports fewer metrics; notice how it is not showing the fluctuation.

On the alpha.2 server there is a log line that corresponds to the dropped metrics:

time="2017-05-25T15:37:20Z" level=error msg="append failed" err="sample limit of 7500 exceeded" source="scrape.go:518" target="{__address__="192.168.198.182:8180", __metrics_path__="/prometheus", __scheme__="http", app="drydock", app_version="1.0.809_master", instance="192.168.198.182:8180", job="kubernetes-pods", kubernetes_namespace="drydock", kubernetes_pod_name="drydock-2762539047-fc53l", pod_template_hash="2762539047"}"

So that would imply it is exceeding the 7500 limit I had set, but as we can see from the v1.6.2 server, there are fewer than 1,000 metrics per scrape.

Also, the query prometheus_target_scrapes_exceeded_sample_limit_total{app="drydock"} is returning no datapoints on either server, so that log message seems to be in error.
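
For reference, the error in that log line is only supposed to appear when a single scrape returns more samples than the configured limit. A minimal, purely illustrative sketch of that pattern (hypothetical names, not the actual scrape.go code) would look like this:

```go
package main

import (
	"errors"
	"fmt"
)

// errSampleLimit stands in for the error behind the "sample limit of N
// exceeded" log line; the identifier is hypothetical.
var errSampleLimit = errors.New("sample limit exceeded")

// limitAppender sketches how a per-scrape sample limit is typically
// enforced: count each appended sample and fail once the count passes
// the configured limit. The counter is expected to start at zero for
// every scrape.
type limitAppender struct {
	limit int
	count int
}

func (a *limitAppender) Add(metric string, value float64) error {
	a.count++
	if a.count > a.limit {
		return errSampleLimit
	}
	// A real appender would hand the sample on to storage here.
	return nil
}

func main() {
	// With roughly 1,000 samples per scrape and a limit of 7,500, a single
	// scrape should never trip the limit.
	app := &limitAppender{limit: 7500}
	for i := 0; i < 1000; i++ {
		if err := app.Add(fmt.Sprintf("metric_%d", i), 1); err != nil {
			fmt.Println("append failed:", err)
			return
		}
	}
	fmt.Printf("scrape accepted, %d samples\n", app.count)
}
```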

brian-brazil commented May 25, 2017

Does the problem go away if you remove the limit? I see some issues with this code in 2.0, but nothing that'd cause this.

brian-brazil commented May 25, 2017

Ah, we're passing along the wrong error message.
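
As a purely illustrative sketch (hypothetical code, not the actual scrape.go), the pattern being described is that the real append error gets replaced by the sample-limit error before it reaches the log, so an unrelated failure shows up as "sample limit exceeded" even when the limit was never reached:

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errSampleLimit = errors.New("sample limit of 7500 exceeded")
	// errStorage stands in for whatever actually failed during the append.
	errStorage = errors.New("some unrelated storage error")
)

// reportWrongError maps every append failure to the limit error,
// discarding the real cause: the "wrong error message" pattern.
func reportWrongError(add func() error) error {
	if err := add(); err != nil {
		return errSampleLimit
	}
	return nil
}

// reportRealError passes the underlying error through, so the log line
// shows what actually went wrong.
func reportRealError(add func() error) error {
	if err := add(); err != nil {
		return err
	}
	return nil
}

func main() {
	failingAdd := func() error { return errStorage }

	fmt.Println("wrong:", reportWrongError(failingAdd)) // sample limit of 7500 exceeded
	fmt.Println("right:", reportRealError(failingAdd))  // some unrelated storage error
}
```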

checketts commented May 25, 2017

Removing the sample limit completely made the problem go away. Increasing the limit didn't work (though I had only tried tripling the value).

checketts changed the title from "Alpha.2 Missing metrics on scrape" to "Alpha.2 Missing metrics on scrape when a sample limit is set (even when sample limit is not being actually hit)" May 25, 2017

fabxc commented May 30, 2017

Is this fixed with #2772, @brian-brazil?

brian-brazil commented May 31, 2017

No, I've got the code to fix it; I just need to write the unit tests.

fabxc commented Jun 6, 2017

Fixed in #2787

fabxc closed this Jun 6, 2017

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
