Prometheus 2.0.0-rc.2 - runtime error: index out of range #3412

Closed
thesamet opened this Issue Nov 4, 2017 · 7 comments

thesamet commented Nov 4, 2017

What did you do?
Visiting localhost:9090/graph

What did you expect to see?
Being able to browse metrics

What did you see instead? Under which circumstances?
When visiting the above URL, the UI shows this alert message: "Error loading available metrics!"
The process logs:

level=error ts=2017-11-04T19:21:22.046095033Z caller=stdlib.go:89 component=web caller="http: panic serving 10.32.104.81:58067" msg="runtime error: index out of range"

This doesn't happen in Prometheus 1.8.1.
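For anyone trying to reproduce this outside the browser: the /graph page populates its metric dropdown from the v1 label-values API, so querying that endpoint directly should trigger the same panic. A minimal sketch; the endpoint path is the documented /api/v1/label/__name__/values, and localhost:9090 is the instance above.

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// The /graph page loads its list of available metrics from the
	// label-values endpoint; hitting it directly should reproduce the panic.
	resp, err := http.Get("http://localhost:9090/api/v1/label/__name__/values")
	if err != nil {
		// A server-side panic typically surfaces here as a closed connection.
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}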

Environment
Prometheus 2.0.0-rc2 on Linux. Service discovery using Consul.

  • System information:

      Linux 4.4.0-57-generic x86_64

  • Prometheus version:

      build user:       root@a6d2e4a7b8da
      build date:       20171025-18:42:54
      go version:       go1.9.1

  • Alertmanager version:

      Not installed.

  • Prometheus configuration file:

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: consul
    consul_sd_configs:
    - server: consul:8500
      services:
      - myservice
    relabel_configs:
    - source_labels: ["__meta_consul_address", "__meta_consul_service_port"]
      separator: ":"
      target_label: __address__
      regex: "(.*):9(.*)"
      replacement: "$1:19$2"
    - source_labels: ["__meta_consul_service"]
      target_label: job
    - source_labels: ["__meta_consul_tags"]
      regex: ".*(prod|stage).*"
      target_label: env
  • Logs:
level=error ts=2017-11-04T19:21:22.213504081Z caller=stdlib.go:89 component=web caller="http: panic serving 10.32.104.81:58069" msg="runtime error: index out of range"
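
(Side note on the __address__ relabel rule in the config above: Prometheus joins the two source labels with the configured separator, matches the anchored regex against the result, and writes the replacement into the target label, so a service advertised on a 9xxx port gets rewritten to the matching 19xxx port. A rough Go sketch of that transformation, using made-up label values:)

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Hypothetical values for the two source labels in the relabel rule.
	address := "10.0.0.5" // __meta_consul_address (made-up example)
	port := "9090"        // __meta_consul_service_port (made-up example)

	// Source labels are joined with the configured separator and the regex
	// is matched against the whole string (relabel regexes are anchored).
	joined := address + ":" + port
	re := regexp.MustCompile(`^(?:(.*):9(.*))$`)
	rewritten := re.ReplaceAllString(joined, "$1:19$2")

	fmt.Println(rewritten) // __address__ becomes 10.0.0.5:19090
}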
juliusv commented Nov 4, 2017

@thesamet can you reproduce this even with a wiped storage? It works for me with 2.0.0-rc.2.

If you had the data directory before 2.0.0-rc.2, you may have been affected by this:

Data written in previous pre-release versions may have been affected by the out-of-order bug. Reading this data may reveal artefacts and incorrect data. Starting with a clean storage directory is advised. The WAL directory may safely be kept.

(from https://github.com/prometheus/prometheus/blob/348ea482eadb59bcd5d89c08ea59f3a3a5cc8087/CHANGELOG.md#200-rc2--2017-10-25)
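
(For anyone in the same situation, a rough sketch of clearing the old blocks while keeping the WAL, per the changelog note above. The data directory path and its layout, block directories alongside a wal/ subdirectory, are assumptions based on the default 2.0 storage layout:)

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Assumed default storage path; adjust to your --storage.tsdb.path.
	dataDir := "data"

	entries, err := os.ReadDir(dataDir)
	if err != nil {
		fmt.Println("cannot read data dir:", err)
		return
	}
	for _, e := range entries {
		// Per the rc.2 changelog, the WAL directory may safely be kept;
		// everything else is removed so Prometheus starts from clean storage.
		if e.Name() == "wal" {
			continue
		}
		path := filepath.Join(dataDir, e.Name())
		fmt.Println("removing", path)
		if err := os.RemoveAll(path); err != nil {
			fmt.Println("failed to remove", path, ":", err)
		}
	}
}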

thesamet commented Nov 4, 2017

Looks like the issue disappeared after removing the data directory. However, it was a fresh install; I had been changing the config while experimenting with relabel_configs. Maybe some Ctrl-C's left my data directory in an inconsistent state?

juliusv commented Nov 4, 2017

@thesamet So the data directory was created by 2.0.0-rc.2?

/cc @fabxc @beorn7 anything come to mind here? Unfortunately there's no full stack trace, just the index out of range happening somewhere during the web API request (probably in the storage).

thesamet commented Nov 4, 2017

@juliusv yes

beorn7 commented Nov 6, 2017

Certainly, hitting Ctrl-C shouldn't corrupt your storage beyond repair.

Without a stack trace, it's hard to see what's going on, though.

The fact that the stack trace doesn't show up is an issue of its own. (I assume it's the recover in the parser that hides the call stack here?)

The question is whether this is a bug that should block the final 2.0.0 release. I'm missing the deeper insight to make that call.
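
(To illustrate the point about the missing stack: a bare recover() only yields the panic value, so anything logged at the recovery site loses the goroutine's call stack unless it is captured explicitly with debug.Stack(). A standalone sketch, not the actual Prometheus code:)

package main

import (
	"fmt"
	"runtime/debug"
)

func mightPanic() {
	var xs []int
	_ = xs[3] // runtime error: index out of range
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			// Logging only the recovered value reproduces what the issue shows:
			// the message without any indication of where the panic happened.
			fmt.Println("msg:", r)
			// Capturing the stack at the recovery point keeps the call chain.
			fmt.Printf("stack:\n%s", debug.Stack())
		}
	}()
	mightPanic()
}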

simonpasquier commented Jul 24, 2018

Given that it was filed against 2.0.0-rc2, I'm going to close the issue. @thesamet feel free to re-open it if you're still experiencing the problem.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
