Results are nondeterministic for regex __name__ queries #3794

Closed
fmoessbauer opened this Issue Feb 4, 2018 · 6 comments

fmoessbauer commented Feb 4, 2018

What did you do?

Use regex query on medium time ranges (e.g. 1 week):
{__name__=~"mem_(active|buffers|cached|free|inactive).*", host="kronos"}

What did you expect to see?

All matching series for the selected time range. If the query is executed multiple times, I expect to see the same results (roughly the same, since the range extends up to now). The results should be deterministic.

What did you see instead? Under which circumstances?

Some series are missing for some time steps. If I repeat the query multiple times, the results are nondeterministic: sometimes all series are shown, sometimes not, and which series are shown completely also changes (see screenshots below). Prometheus retention is --storage.tsdb.retention=1d.

[Screenshots: prometheus-01, prometheus-02, prometheus-03]

Environment

  • System information:

Linux 4.13.0-25-generic x86_64

  • Prometheus version:
prometheus, version 2.1.0 (branch: master, revision: 460fe4dd0c352577395391ed861cec1b7d7d2312)
  build user:       root@94e5fb29216a
  build date:       20180202-18:18:34
  go version:       go1.9.2
  • Prometheus configuration file:
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus'

    scrape_interval: 15s

    static_configs:
      - targets: ['localhost:9090', 'collectd-exporter:9103']

  - job_name: 'linux-server'
    scrape_interval: 60s
    static_configs:
      - targets:
        - '192.168.2.xx:xxxx'

# [...]

rule_files:
  - alert-rules/*.rule

alerting:
  alertmanagers:
    - static_configs:
      - targets:
        - alertmanager:9093

# Remote write configuration (for Graphite, OpenTSDB, or InfluxDB).
remote_write:
  - url: "http://prom-io:9201/write"

# Remote read configuration (for InfluxDB only at the moment).
remote_read:
  - url: "http://prom-io:9201/read"

  • Logs:

Probably related (appears a few times, but not time-correlated with the issue):

level=error ts=2018-02-04T16:37:34.24095367Z caller=engine.go:527 component="query engine" msg="error selecting series set" err="error sending request: context canceled"

brian-brazil commented Feb 4, 2018

Using regexes on names like this is strongly discouraged (it's only for exploration/debugging); however, it should not cause this behaviour. Can you confirm that these time series exist continuously and there were no failed scrapes? Put another way, does this also happen when zoomed in to a 10m period?
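
One way to check for failed scrapes over the affected window (a sketch; the job name is taken from the config above, the address and window are assumptions) is to pull up{job="linux-server"} via the HTTP API and count zero samples:

# Check whether the target was scraped continuously over the affected window.
# The job name comes from the config above; host/port and the window are assumptions.
import time
import requests

PROM = "http://localhost:9090"

end = time.time()
start = end - 7 * 24 * 3600
resp = requests.get(f"{PROM}/api/v1/query_range",
                    params={"query": 'up{job="linux-server"}',
                            "start": start, "end": end, "step": 60})
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    values = [float(v) for _, v in series["values"]]   # sample values arrive as strings
    failed = sum(1 for v in values if v == 0.0)
    print(series["metric"].get("instance"), f"{failed} failed scrapes out of {len(values)} samples")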

fmoessbauer commented Feb 4, 2018

If I scroll using a 10m interval, the available series change all the time, in a non-deterministic way. If I scroll back and forth by the same time step, the reported series also vary. This also holds for 1m intervals.

Interestingly, for time periods within the Prometheus retention window (last 1d) the series are reported correctly. It seems that this is a bug in the remote-storage-adapter (using InfluxDB). Hence I checked that the data in InfluxDB is correct by querying just one (previously missing) series. That series is read back by Prometheus correctly and deterministically.
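
For reference, roughly how one can cross-check a single series directly in InfluxDB (a sketch; the InfluxDB host, port, database name, and the adapter's schema of metric name as measurement and labels as tags are assumptions, not exact values from this setup):

# Query one series directly from InfluxDB to compare with what Prometheus returns.
# Host, port, database and schema below are assumptions, not taken from this setup.
import requests

INFLUX = "http://prom-io:8086"   # InfluxDB itself, not the adapter listening on :9201
DB = "prometheus"                # assumed database name

q = "SELECT value FROM mem_active WHERE host = 'kronos' AND time > now() - 7d"
resp = requests.get(f"{INFLUX}/query", params={"db": DB, "q": q, "epoch": "s"})
resp.raise_for_status()
for result in resp.json().get("results", []):
    for series in result.get("series", []):
        print(series["name"], len(series["values"]), "points")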

The remote storage adapter I use is an up-to-date build of the official version.

brian-brazil commented Feb 4, 2018

Ah, does this happen without remote read being used? This is likely a bug in the adapter or InfluxDB, rather than in Prometheus itself.

fmoessbauer commented Feb 4, 2018

At least I did not see this weird behavior when querying local-only data.

brian-brazil commented Mar 8, 2018

That's likely a bug in your remote read endpoint then. I'd suggest verifying that the endpoint itself works as expected first.
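
One way to exercise only the remote read path (a sketch; the query is the one from the report, the address and timestamp offset are assumptions) is to evaluate the same instant query repeatedly at a fixed time older than the local retention of 1d, so the data can only come through remote read, and check that the answer is stable:

# Evaluate the same instant query repeatedly at a timestamp older than the local
# retention (1d here), so results must come through remote read, and diff the answers.
import time
import requests

PROM = "http://localhost:9090"   # assumed address
QUERY = '{__name__=~"mem_(active|buffers|cached|free|inactive).*", host="kronos"}'
ts = time.time() - 3 * 24 * 3600   # 3 days ago: outside --storage.tsdb.retention=1d

seen = None
for i in range(10):
    resp = requests.get(f"{PROM}/api/v1/query", params={"query": QUERY, "time": ts})
    resp.raise_for_status()
    series = {tuple(sorted(s["metric"].items()))
              for s in resp.json()["data"]["result"]}
    if seen is None:
        seen = series
    elif series != seen:
        print(f"run {i}: remote-read result changed ({len(series)} vs {len(seen)} series)")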

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
