
Prom2: 'Out of bounds' tsdb error drops target entirely #2894

Closed
johrstrom opened this Issue Jul 3, 2017 · 5 comments

johrstrom commented Jul 3, 2017

What did you do?
I'm scraping NetData targets that include the timestamp with each metric in the response, as seen below. Some metrics are rarely updated (or never updated after they're created), which leads to out-of-bounds errors.

Here's an example; I've added the comment to the out-of-bounds metric for the purpose of this ticket.

apps_vmem_tc_qos_helper{instance="host"} 402 1499115245472
apps_vmem_email{instance="host"} 0 1498781084668                 # this metric is out of bounds
apps_vmem_logs{instance="host"} 111669 1499115245472
apps_vmem_ssh{instance="host"} 10027 1499115245472

My settings for block durations seem reasonable:
storage.tsdb.max-block-duration 17h48m0s
storage.tsdb.min-block-duration 2h0m0s
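
For reference, the stale timestamp above (1498781084668) is 334,160,804 ms older than its neighbours (1499115245472), i.e. roughly 93 hours, or almost four days, in the past. With a 2h minimum block duration that lands far outside the head block's writable window, which is presumably what triggers the out-of-bounds error.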

What did you expect to see?
I would expect only the time series that are out of bounds to be dropped, not the target entirely. The logs indicate this is only a warning. I assume the expected behaviour is to drop just the erroneous time series during ingestion and append the rest that fall correctly within the bounds.

So, for example, if a target has three metrics that are out of bounds, I should ingest everything except those three.

What did you see instead? Under which circumstances?
All time series from a target with any 'out of bounds' metrics are dropped. So even if 1 sample out of 1,000 is out of bounds, all 1,000 get dropped and the target is considered 'down' due to the out-of-bounds errors.

  • Prometheus version:
    I'm a little bit ahead of the 2.0.0-alpha.2 release but behind the current dev-2.0 branch.
    prometheus, version 2.0.0-alpha.2a (branch: dev-2.0, revision: dfef6cf2345465285e27802fcd24219ccbab1523)
      build user:       jeff@localhost.localdomain
      build date:       20170616-16:54:53
      go version:       go1.8

Logs
Jul 03 21:15:21 ip prometheus[1829]: time="2017-07-03T21:15:21Z" level=warning msg="append failed" err="out of bounds" source="scrape.go:606" target=....

gouthamve (Member) commented Jul 4, 2017

This looks like a trivial fix where we need to add a case to this switch statement. Will send a PR soon.
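
Roughly the idea (a sketch only, not the final patch; the variable and counter names around scrape.go's append loop are approximate):

    // Per-sample error handling in the scrape loop. Today ErrOutOfBounds
    // falls through to the default case and aborts the whole scrape;
    // a dedicated case lets us skip just the offending sample.
    _, err := app.Add(lset, t, v)
    switch err {
    case nil:
        // sample appended successfully
    case storage.ErrOutOfOrderSample:
        numOutOfOrder++
        continue // drop this sample, keep scraping
    case storage.ErrDuplicateSampleForTimestamp:
        numDuplicates++
        continue
    case storage.ErrOutOfBounds: // the missing case
        numOutOfBounds++
        continue
    default:
        return err // unexpected error: fail the scrape as before
    }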

gouthamve pushed a commit to gouthamve/prometheus that referenced this issue Jul 4, 2017

Handle scrapes with OutOfBounds metrics better
fixes prometheus#2894

Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>

fabxc closed this Jul 5, 2017

johrstrom (Author) commented Jul 5, 2017

I've built and deployed at commit 24e9dea, but the issue persists. I'll try to track down what seems to have been missed, but to be clear, this issue is about timestamps in the far past, not the future.

These are the only relevant log lines I see:
level=warning msg="append failed" err="out of bounds" source="scrape.go:626"

I never see the other expected log lines, and when I turned on debug logging they never appeared either.

Can this be reopened?

gouthamve (Member) commented Jul 7, 2017

Fixed for good in #2906. Can you give it another try, @johrstrom?
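
Once you're on a build with the fix, the target should stay up and rejected samples should just be counted instead. If I remember the counter name right, rate(prometheus_target_scrapes_sample_out_of_bounds_total[5m]) on the Prometheus server should go non-zero while the target itself stays healthy.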

gouthamve closed this Jul 7, 2017

johrstrom (Author) commented Jul 7, 2017

Confirmed, thank you for the quick support.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
