prometheus crashes #3763

Closed
davidkarlsen opened this Issue Jan 30, 2018 · 3 comments

davidkarlsen commented Jan 30, 2018

What did you do?
Run prometheus 2.1.0

What did you expect to see?
Keep it running w/o crashes

What did you see instead? Under which circumstances?

It starts with this error, then emits a number of further error messages before it crashes (and Docker restarts the container):

Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: fatal error: concurrent map iteration and map write
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: 
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: goroutine 5983167 [running]:
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: runtime.throw(0x1c1d139, 0x26)
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: #011/usr/local/go/src/runtime/panic.go:605 +0x95 fp=0xc46d24cc50 sp=0xc46d24cc30 pc=0x42bca5
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: runtime.mapiternext(0xc46d24ce58)
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: #011/usr/local/go/src/runtime/hashmap.go:778 +0x6f1 fp=0xc46d24cce8 sp=0xc46d24cc50 pc=0x40a031
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: github.com/prometheus/prometheus/discovery/file.(*TimestampCollector).Collect(0xc4201bbf20, 0xc45a46a3c0)
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: #011/go/src/github.com/prometheus/prometheus/discovery/file/file.go:99 +0x17b fp=0xc46d24cf98 sp=0xc46d24cce8 pc=0xaacd7b
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc48698c9d0, 0xc45a46a3c0, 0x28f9b00, 0xc4201bbf20)
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: #011/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus/registry.go:382 +0x61 fp=0xc46d24cfc0 sp=0xc46d24cf98 pc=0x78d411
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: runtime.goexit()
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: #011/usr/local/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc46d24cfc8 sp=0xc46d24cfc0 pc=0x45cba1
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: created by github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: #011/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus/registry.go:380 +0x2e1
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: 
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: goroutine 1 [chan receive, 1204 minutes]:
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run(0xc421071c58, 0xc42030c900, 0x8)
Jan 30 14:01:18 alp-aot-ccm02 docker/7394567d59bb[17967]: #011/go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:43 +0xf6
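
For context: the Go runtime deliberately aborts the whole process when it detects a map being iterated while another goroutine writes to it, which is what the trace above shows inside the file-SD TimestampCollector's Collect. Below is a minimal, self-contained sketch (hypothetical names, not the actual Prometheus code) of that bug class and the usual mutex-based fix; removing the lock in collect() typically reproduces the same fatal error.

package main

import (
    "sync"
    "time"
)

// timestampCollector mimics a collector exposing one timestamp per SD file.
// (Hypothetical type; the real collector lives in discovery/file/file.go.)
type timestampCollector struct {
    mtx                 sync.Mutex // guards filenameToTimestamp
    filenameToTimestamp map[string]float64
}

// set records a timestamp for a file; called from discovery goroutines.
func (c *timestampCollector) set(filename string, ts float64) {
    c.mtx.Lock()
    defer c.mtx.Unlock()
    c.filenameToTimestamp[filename] = ts
}

// collect iterates over the map, as a Prometheus Collect method would.
// Without the lock this races with set(), and the Go runtime kills the
// process with "fatal error: concurrent map iteration and map write".
func (c *timestampCollector) collect() []float64 {
    c.mtx.Lock() // removing this lock typically reproduces the crash
    defer c.mtx.Unlock()
    out := make([]float64, 0, len(c.filenameToTimestamp))
    for _, ts := range c.filenameToTimestamp {
        out = append(out, ts)
    }
    return out
}

func main() {
    c := &timestampCollector{filenameToTimestamp: map[string]float64{}}

    // Writer: simulates file-SD refreshes updating timestamps.
    go func() {
        for i := 0; ; i++ {
            c.set("targets.yml", float64(i))
        }
    }()

    // Reader: simulates concurrent /metrics scrapes gathering the collector.
    for i := 0; i < 1000; i++ {
        c.collect()
        time.Sleep(time.Millisecond)
    }
}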

Environment

[root@alp-aot-ccm02 ~]# uname -a
Linux alp-aot-ccm02 3.10.0-693.1.1.el7.x86_64 #1 SMP Thu Aug 3 08:15:31 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@alp-aot-ccm02 ~]# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-61.git85d7426.el7.x86_64
 Go version:      go1.8.3
 Git commit:      85d7426/1.12.6
 Built:           Tue Sep 26 15:30:51 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-61.git85d7426.el7.x86_64
 Go version:      go1.8.3
 Git commit:      85d7426/1.12.6
 Built:           Tue Sep 26 15:30:51 2017
 OS/Arch:         linux/amd64
[root@alp-aot-ccm02 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.4 (Maipo)
[root@alp-aot-ccm02 ~]# 
  • System information:

See above

  • Prometheus version:

2.1.0

  • Prometheus configuration file:

# Managed by salt /platforms/ccm/prometheus/files/prometheus.yml.jinja

# my global config
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  scrape_timeout: 13s # Overrides the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'CCM'
    #finodsenv: production

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - '/etc/prometheus/dynarules/*.yml'

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    metrics_path: /prometheus/metrics
    static_configs:
      - targets: ['localhost:9090']


  - job_name: 'dynaconf'
    metrics_path: '/finods/metrics'
    file_sd_configs:
     - files: [ '/etc/prometheus/dynaconf/*.yml', '/etc/prometheus/dynaconf-pci/*.yml' ]
  - job_name: 'adcm'
    metrics_path: '/fsadcm/api/metrics/'
    static_configs:
      - targets: [ 'alp-ce-depot01.unix.cosng.net' ]
  - job_name: 'consul'
    consul_sd_configs:
      - server: 'alp-aot-ccm02.unix.cosng.net:8500'
        datacenter: 'production'
    metric_relabel_configs:
      - source_labels: [container_label_container_group]
        target_label: container_group
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*prom_monitored.*
        action: keep
      - source_labels: [__meta_consul_service]
        target_label: job 
      - source_labels: [__meta_consul_tags]
        regex: .*,alias-([^,]+),.*
        replacement: '${1}'
        target_label: alias
      - source_labels: [__meta_consul_tags]
        regex: .*,metrics_path=([^,]+),.*
        replacement: '${1}'
        target_label: __metrics_path__
      - source_labels: [__meta_consul_tags]
        regex: .*,finodsgroup=([^,]+),.*
        replacement: '${1}'
        target_label: finodsgroup
      - source_labels: [__meta_consul_tags]
        regex: .*,container_group=([^,]+),.*
        replacement: '${1}'
        target_label: container_group

  • Logs:
See the log excerpt at the top of this issue.

brian-brazil commented Jan 30, 2018

Thanks for reporting, already fixed in #3735

davidkarlsen commented Jan 31, 2018

Great - any ETA for next release?

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
