Metrics exporter dropped when some of the metrics are too old to ingest / idle (msg="append failed" err="out of bounds") #3930

Closed
jhooyberghs opened this Issue Mar 8, 2018 · 2 comments

jhooyberghs commented Mar 8, 2018

What did you do?
We're scraping data from multiple node exporters. We make use of the textfile collector feature (https://github.com/prometheus/node_exporter#textfile-collector), which is populated with metrics every 15 to 30 minutes by a cronjob. However, when this cronjob fails, those textfile collector metrics become too old to ingest, and all of the node exporter's metrics get dropped with an "out of bounds" error.
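For reference, the files the cronjob drops into the textfile collector directory are in the Prometheus text exposition format. A line that carries an explicit timestamp (milliseconds since the Unix epoch) looks roughly like the sketch below; the metric name and path are only illustrative, and it is that trailing timestamp that goes stale when the cronjob stops refreshing the file:

# /var/lib/node_exporter/textfile_collector/backup_job.prom (illustrative path)
# The third field is an explicit timestamp in milliseconds since the epoch;
# once the file is no longer refreshed, it falls behind what the TSDB will accept.
backup_job_last_success 1 1520516306000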

While investigating the issue I also stumbled upon #2894, which sounds a lot like our issue, but I tried Prometheus 2.0 and 2.1, which include those fixes, and both exhibited the same behaviour.

What did you expect to see?
We expect the normal node exporter metrics to still be ingested, and only the "stale" metrics to be dropped.

What did you see instead? Under which circumstances?
Whenever the textfile collector metrics don't get updated, for example by simply disabling our cronjob, all newly scraped metrics are dropped and the entire exporter appears to be down from Prometheus's point of view. In the logfiles this shows up as:

Mar 08 13:38:26 prometheus2 prometheus[445]: level=warn ts=2018-03-08T13:38:26.766018935Z caller=scrape.go:921 component="scrape manager" scrape_pool=consul_sd target=http://172.22.4.178:9100/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=3
Mar 08 13:38:26 prometheus2 prometheus[445]: level=warn ts=2018-03-08T13:38:26.766064571Z caller=scrape.go:686 component="scrape manager" scrape_pool=consul_sd target=http://172.22.4.178:9100/metrics msg="append failed" err="out of bounds"
Mar 08 13:38:41 prometheus2 prometheus[445]: level=warn ts=2018-03-08T13:38:41.765552322Z caller=scrape.go:921 component="scrape manager" scrape_pool=consul_sd target=http://172.22.4.178:9100/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=3
Mar 08 13:38:41 prometheus2 prometheus[445]: level=warn ts=2018-03-08T13:38:41.76560626Z caller=scrape.go:686 component="scrape manager" scrape_pool=consul_sd target=http://172.22.4.178:9100/metrics msg="append failed" err="out of bounds"

Environment

  • System information:

Linux 3.16.0-5-amd64 x86_64

  • Prometheus version:

prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d8)
build user: root@6e784304d3ff
build date: 20180119-12:01:23
go version: go1.9.2

  • Prometheus configuration file:
---
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: master
rule_files:
- "/etc/prometheus/alert.rules"
- "/etc/prometheus/rules/filesystem_usage_checks.rules"
- "/etc/prometheus/rules/vualto_checks.rules"
- "/etc/prometheus/rules/aem_checks.rules"
scrape_configs:
- job_name: prometheus
  scrape_interval: 10s
  scrape_timeout: 10s
  static_configs:
  - labels:
      alias: Prometheus
    targets:
    - localhost:9090
  metric_relabel_configs:
  - source_labels:
    - __name__
    regex: go_(.*)
    action: drop
  - source_labels:
    - __name__
    regex: http_(.*)
    action: drop
- job_name: exported
  file_sd_configs:
  - files:
    - "/etc/prometheus/include/*.yaml"
  metric_relabel_configs:
  - source_labels:
    - __name__
    regex: go_(.*)
    action: drop
  - source_labels:
    - __name__
    regex: http_(.*)
    action: drop
  relabel_configs:
  - source_labels:
    - env
    regex: sandbox
    action: drop
- job_name: consul_sd
  honor_labels: true
  consul_sd_configs:
  - server: REPLACED
    scheme: https
    tls_config:
      insecure_skip_verify: true
  relabel_configs:
  - source_labels:
    - __meta_consul_service
    regex: "(consul)"
    target_label: job
    action: drop
  - source_labels:
    - __meta_consul_service
    regex: "(.*)"
    target_label: job
    replacement: "$1"
  - source_labels:
    - __meta_consul_node
    regex: "(.*)"
    target_label: instance
    replacement: "$1"
remote_read: []
remote_write: []
  • Logs:
Mar 08 13:38:26 prometheus2 prometheus[445]: level=warn ts=2018-03-08T13:38:26.766018935Z caller=scrape.go:921 component="scrape manager" scrape_pool=consul_sd target=http://172.22.4.178:9100/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=3
Mar 08 13:38:26 prometheus2 prometheus[445]: level=warn ts=2018-03-08T13:38:26.766064571Z caller=scrape.go:686 component="scrape manager" scrape_pool=consul_sd target=http://172.22.4.178:9100/metrics msg="append failed" err="out of bounds"
Mar 08 13:38:41 prometheus2 prometheus[445]: level=warn ts=2018-03-08T13:38:41.765552322Z caller=scrape.go:921 component="scrape manager" scrape_pool=consul_sd target=http://172.22.4.178:9100/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=3
Mar 08 13:38:41 prometheus2 prometheus[445]: level=warn ts=2018-03-08T13:38:41.76560626Z caller=scrape.go:686 component="scrape manager" scrape_pool=consul_sd target=http://172.22.4.178:9100/metrics msg="append failed" err="out of bounds"

brian-brazil commented Mar 8, 2018

You should not use timestamps with the textfile collector, and they will be forbidden in the next version.
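A minimal sketch of the usual cronjob pattern, with the directory and metric name as assumptions: export the job's completion time as an ordinary sample value rather than as an exposition-format timestamp, and write the file atomically so node_exporter never reads a half-written file. Prometheus then stamps the sample with the scrape time, so it can never become too old to ingest:

#!/bin/sh
# Illustrative cron script; directory and metric name are assumptions.
TEXTFILE_DIR=/var/lib/node_exporter/textfile_collector
OUT="$TEXTFILE_DIR/backup_job.prom"

# Record the completion time as a sample value, not as a per-sample timestamp.
echo "backup_job_last_success_timestamp_seconds $(date +%s)" > "$OUT.$$"

# Rename atomically so node_exporter never scrapes a partial file.
mv "$OUT.$$" "$OUT"

An expression along the lines of time() - backup_job_last_success_timestamp_seconds > 3600 could then drive an alert when the cronjob has stopped running.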

It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
