Stack overflow #3772

Closed
jdfalk opened this Issue Jan 31, 2018 · 2 comments

jdfalk commented Jan 31, 2018

What did you do?
The Prometheus server stopped suddenly with a stack overflow.

What did you expect to see?
The Prometheus server to keep running.

What did you see instead? Under which circumstances?
```
level=info ts=2018-01-31T01:04:56.703929274Z caller=compact.go:387 component=tsdb msg="compact blocks" count=3 mint=1517335200000 maxt=1517356800000
level=info ts=2018-01-31T01:07:31.081608777Z caller=compact.go:387 component=tsdb msg="compact blocks" count=3 mint=1517292000000 maxt=1517356800000
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow

runtime stack:
runtime.throw(0x1bfbb92, 0xe)
	/usr/local/go/src/runtime/panic.go:605 +0x95
runtime.newstack(0x0)
	/usr/local/go/src/runtime/stack.go:1050 +0x6e1
runtime.morestack()
	/usr/local/go/src/runtime/asm_amd64.s:415 +0x86
```

Environment
Docker

  • System information:

Linux 4.4.0-1049-aws x86_64

  • Prometheus version:

prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d8)
build user: root@6e784304d3ff
build date: 20180119-12:01:23
go version: go1.9.2

  • Alertmanager version:

N/A

  • Prometheus configuration file:
global:
  scrape_interval:     15s # Prometheus default: 1m
  scrape_timeout:      15s # Prometheus default: 10s
  evaluation_interval: 15s # Prometheus default: 1m

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
    monitor: 'bbs-ops'

rule_files:
  - 'rules/prometheus-aws-eu-west-1.rules.yml'
  - 'rules/node.rules.yml'
  - 'rules/cloudwatch.rules.yml'
  - 'rules/mysql.rules.yml'

scrape_configs:
  - job_name: 'prometheus'
    honor_labels: true
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:9090)?'
        replacement: '${1}'
 
  - job_name: 'jenkins'
    metrics_path: '/prometheus/'
    static_configs:
      - targets: ['redacted']
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:3000)?'
        replacement: '${1}'

  - job_name: 'libvirt'
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:.+)?'
        replacement: '${1}'

  - job_name: 'bind'
    static_configs:
      - targets: ['localhost:9119']
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:.+)?'
        replacement: '${1}'

  - job_name: 'grafana'
    static_configs:
      - targets: ['redacted:3000']
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:3000)?'
        replacement: '${1}'
  
  - job_name: 'Sensu'
    static_configs:
      - targets: ['10.143.251.244:9351']
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:9351)?'
        replacement: '${1}'

  - job_name: 'mysql'
    scrape_interval: 45s
    scrape_timeout: 40s
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:.+)?'
        replacement: '${1}'

  - job_name: 'node'
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:.+)?'
        replacement: '${1}'


  # Federate from regional prometheus servers.
  - job_name: 'Prometheus Federation'
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{job=~"(.+)"}'
    static_configs:
      - targets: ['redacted:9090']
  
  - job_name: 'OurJob'
    dns_sd_configs:
      - names: [
          # NODE
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          # metrics.node.redacted,
          metrics.node.redacted,
          # metrics.node.redacted,
          # metrics.node.redacted,
          metrics.node.redacted,
          metrics2.node.redacted,
          metrics3.node.redacted,
          metrics4.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          # metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          metrics.node.redacted,
          # metrics.node.redacted,
          metrics.node.redacted,
          # MYSQL
          # metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          # metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          # metrics.mysql.redacted,
          metrics.mysql.redacted,
          # metrics.mysql.redacted,
          # metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics2.mysql.redacted,
          metrics3.mysql.redacted,
          metrics4.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          metrics.mysql.redacted,
          # LIBVIRT
          # metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          # metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          # metrics.libvirt.redacted,
          # metrics.libvirt.redacted,
          # metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          metrics.libvirt.redacted,
          # MEMCACHED
          # metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          # metrics.memcached.redacted,
          metrics.memcached.redacted,
          # metrics.memcached.redacted,
          # metrics.memcached.redacted,
          # metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          metrics.memcached.redacted,
          ]
    relabel_configs:
      - source_labels: [ __address__ ]
        target_label: instance
        regex: '(.+?)(:.+?)?'
        replacement: '${1}'
      - source_labels: [ __address__ ]
        target_label: name
        regex: '(.+?)(:.+?)?'
        replacement: '${1}'
      - source_labels: ['__meta_dns_name']
        regex:         'metrics.?\.(.+?)\..+?\..+?\.redacted\.com'
        target_label:  job
        replacement:   '${1}'
      - source_labels: ['__meta_dns_name']
        regex:         'metrics.?\.(.+?)\..+?\.redacted\.com'
        target_label:  job
        replacement:   '${1}'
      - source_labels: ['__meta_dns_name']
        regex:         'metrics.?\.(.+?)\..+?\.redacted\.net'
        target_label:  job
        replacement:   '${1}'
      - source_labels: ['__meta_dns_name']
        regex:         'metrics.?\..+?\.(.+?)\..+?\.redacted\.com'
        target_label:  pa
        replacement:   '${1}'
      - source_labels: ['__meta_dns_name']
        regex:         'metrics.?\..+?\..+?\.(.+?)\.redacted\.com'
        target_label:  zone
        replacement:   '${1}'
      - source_labels: ['__meta_dns_name']
        regex:         'metrics.?\..+?\.(.+?)\.redacted\.net'
        target_label:  zone
        replacement:   '${1}'

  - job_name: 'cadvisor'
    honor_labels: true
    scrape_interval: 60s
    scrape_timeout: 60s
    ec2_sd_configs:
      - region: us-east-1
        port: 8080
      - region: us-east-2
        port: 8080
      - region: us-west-1
        port: 8080
      - region: us-west-2
        port: 8080
    relabel_configs:
      - source_labels: [__meta_ec2_tag_infra_monitoring_prometheus_cadvisor]
        regex: true
        action: keep
      # Use the instance ID as the instance label
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance
      - source_labels: [__meta_ec2_tag_Name]
        target_label: name

  # AWS
  - job_name: 'AWS Memcached'
    ec2_sd_configs:
      - region: us-east-1
        port: 9150
      - region: us-east-2
        port: 9150
      - region: us-west-1
        port: 9150
      - region: us-west-2
        port: 9150
    relabel_configs:
      - source_labels: [__meta_ec2_tag_infra_monitoring_prometheus_memcached]
        regex: true
        action: keep
      # Use the instance ID as the instance label
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance
      - source_labels: [__meta_ec2_tag_Name]
        target_label: name
        
  - job_name: 'AWS MYSQL'
    ec2_sd_configs:
      - region: us-east-1
        port: 9104
      - region: us-east-2
        port: 9104
      - region: us-west-1
        port: 9104
      - region: us-west-2
        port: 9104
    relabel_configs:
      - source_labels: [__meta_ec2_tag_infra_monitoring_prometheus_mysql]
        regex: true
        action: keep
      # Use the instance ID as the instance label
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance
      - source_labels: [__meta_ec2_tag_Name]
        target_label: name


  # Monitor basic AWS information
  - job_name: 'AWS Node Info'
    ec2_sd_configs:
      - region: us-east-1
        port: 9100
      - region: us-east-2
        port: 9100
      - region: us-west-1
        port: 9100
      - region: us-west-2
        port: 9100
    relabel_configs:
      # Only monitor instances with infra:monitoring:prometheus:node = "true"
      - source_labels: [__meta_ec2_tag_infra_monitoring_prometheus_node]
        regex: true
        action: keep
      # Use the instance ID as the instance label
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance
      - source_labels: [__meta_ec2_tag_Name]
        target_label: name

  • Alertmanager configuration file:
N/A
  • Logs:
    filteredlogs.txt
    (Removed messages about the scrape manager being unable to find systems that we recently decommissioned but have not yet fully removed from DNS.)
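Most jobs in the configuration above strip the port from `__address__` with a `replace` rule such as `regex: '(.+?)(:9090)?'` and `replacement: '${1}'`, the `OurJob` rules derive `job`, `pa`, and `zone` from `__meta_dns_name`, and the EC2 jobs use `action: keep` on a tag. Prometheus anchors relabel regexes at both ends, so the lazy `(.+?)` still has to extend until the optional port group and the end of the string can match. The following is a simplified sketch of the `replace` and `keep` actions with Go's `regexp` package; `applyReplace` and `keepTarget` are hypothetical helpers rather than Prometheus APIs, the sketch handles a single source label only (no `separator`), and the DNS name in `main` is a made-up example standing in for the redacted hosts:

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored wraps a relabel regex so it must match the whole source value,
// mirroring how Prometheus anchors relabel regexes.
func anchored(pattern string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + pattern + ")$")
}

// applyReplace is a hypothetical, simplified "replace" action: if the regex
// matches the source value, the expanded replacement is written to
// targetLabel; an empty result deletes the label; no match changes nothing.
func applyReplace(labels map[string]string, source, pattern, replacement, targetLabel string) {
	re := anchored(pattern)
	val := labels[source]
	idx := re.FindStringSubmatchIndex(val)
	if idx == nil {
		return // no match: no replacement takes place
	}
	res := re.ExpandString(nil, replacement, val, idx)
	if len(res) == 0 {
		delete(labels, targetLabel)
		return
	}
	labels[targetLabel] = string(res)
}

// keepTarget is a hypothetical, simplified "keep" action: the target
// survives only if the source value matches the anchored regex.
func keepTarget(labels map[string]string, source, pattern string) bool {
	return anchored(pattern).MatchString(labels[source])
}

func main() {
	// Port stripping, as in the 'prometheus' job.
	addr := map[string]string{"__address__": "localhost:9090"}
	applyReplace(addr, "__address__", `(.+?)(:9090)?`, "${1}", "instance")
	fmt.Println(addr["instance"]) // localhost

	// Deriving job from a made-up DNS name, as in the 'OurJob' rules.
	dns := map[string]string{"__meta_dns_name": "metrics2.mysql.pa1.zone1.redacted.com"}
	applyReplace(dns, "__meta_dns_name", `metrics.?\.(.+?)\..+?\..+?\.redacted\.com`, "${1}", "job")
	fmt.Println(dns["job"]) // mysql

	// The EC2 jobs keep only instances whose tag is exactly "true".
	tag := "__meta_ec2_tag_infra_monitoring_prometheus_node"
	fmt.Println(keepTarget(map[string]string{tag: "true"}, tag, "true")) // true
	fmt.Println(keepTarget(map[string]string{}, tag, "true"))            // false
}
```

Because of the anchoring, a pattern like `(.+?)(:.+?)?` cannot stop after the first character; it must consume the whole address, which is why `${1}` ends up holding the full host name.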

gouthamve commented Feb 15, 2018

Fixed in #3848

gouthamve closed this Feb 15, 2018

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
