
Prometheus crashes on `panic: runtime error: integer divide by zero` #2273

Closed
ichekrygin opened this Issue Dec 12, 2016 · 9 comments


ichekrygin commented Dec 12, 2016

What did you do?
Run Prometheus

What did you expect to see?
Prometheus up and running

What did you see instead? Under which circumstances?
Prometheus crashed on

time="2016-12-12T14:46:00Z" level=info msg="Collision detected for fingerprint 1e96a5019c8e7521, metric etcd_rafthttp_message_sent_latency_microseconds{instance=\"10.72.4.110:2379\", job=\"etcd\", msgType=\"MsgHeartbeat\", quantile=\"0.9\", remoteID=\"58d9fb3eff84d3cd\", sendingType=\"pipeline\"}, mapping to new fingerprint 0000000000000016." source="mapper.go:182" 
panic: runtime error: integer divide by zero

goroutine 376 [running]:
panic(0x18939a0, 0xc420018040)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/prometheus/storage/local/chunk.doubleDeltaEncodedChunk.Add(0xc43c592c00, 0x287, 0x400, 0x158f37f3864, 0x40569068fcfdb0df, 0x2d0c6e110b2a7f85, 0xc5ef313840, 0x54e34d, 0xc4218ed9c0, 0xc421872d50)
        /go/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:86 +0xf76
github.com/prometheus/prometheus/storage/local/chunk.(*doubleDeltaEncodedChunk).Add(0xc615c82080, 0x158f37f3864, 0x40569068fcfdb0df, 0xc5ef313940, 0x557434, 0xc4218ed9c0, 0xd7cbbb540abb1586, 0xc5a0bc5650)
        <autogenerated>:36 +0x7e
github.com/prometheus/prometheus/storage/local/chunk.(*Desc).Add(0xc5680d94c0, 0x158f37f3864, 0x40569068fcfdb0df, 0x0, 0x0, 0x2, 0x8020106, 0x2)
        /go/src/github.com/prometheus/prometheus/storage/local/chunk/chunk.go:136 +0x4a
github.com/prometheus/prometheus/storage/local.(*memorySeries).add(0xc5a0bc5650, 0x158f37f3864, 0x40569068fcfdb0df, 0xc5a0bc5650, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/storage/local/series.go:245 +0x115
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).Append(0xc42011e000, 0xc643f4dfe0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/storage/local/storage.go:858 +0x398
github.com/prometheus/prometheus/storage.Fanout.Append(0xc4201adba0, 0x2, 0x2, 0xc643f4dfe0, 0xc480817d40, 0x26f5670)
        /go/src/github.com/prometheus/prometheus/storage/storage.go:60 +0x66
github.com/prometheus/prometheus/storage.(*Fanout).Append(0xc42043b020, 0xc643f4dfe0, 0xc5ef313b50, 0xc5ef313b40)
        <autogenerated>:3 +0x6e
github.com/prometheus/prometheus/retrieval.ruleLabelsAppender.Append(0x265af40, 0xc42043b020, 0xc48081c3c0, 0xc643f4dfe0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/retrieval/target.go:241 +0x1b3
github.com/prometheus/prometheus/retrieval.(*ruleLabelsAppender).Append(0xc480813220, 0xc643f4dfe0, 0x0, 0x0)
        <autogenerated>:33 +0x6e
github.com/prometheus/prometheus/retrieval.(*scrapeLoop).append(0xc4807fea00, 0xc66c1aa000, 0x4a7e, 0x4c00)
        /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:456 +0x92
github.com/prometheus/prometheus/retrieval.(*scrapeLoop).run(0xc4807fea00, 0x6fc23ac00, 0x6fc23ac00, 0x0)
        /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:425 +0x602
created by github.com/prometheus/prometheus/retrieval.(*scrapePool).sync
        /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:240 +0x3e5

Environment

  • System information:
/prometheus # uname -srm
Linux 4.7.3-coreos-r2 x86_64
  • Prometheus version:
/prometheus # prometheus -version
prometheus, version 1.4.1 (branch: master, revision: 2a89e8733f240d3cd57a6520b52c36ac4744ce12)
  build user:       root@e685d23d8809
  build date:       20161128-09:59:22
  go version:       go1.7.3
  • Alertmanager version:

N/A

  • Prometheus configuration file:
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
    rule_files:
    - /etc/prometheus/recording.rules
    scrape_configs:
    - job_name: etcd
      static_configs:
        - targets:
          - 10.72.0.1:2379
          - 10.72.0.2:2379
          - 10.72.0.3:2379
          - 10.72.0.4:2379
          - 10.72.0.5:2379
    - job_name: 'prometheus'
      static_configs:
        - targets: ['localhost:9090']
    - job_name: 'cloudwatch'
      static_configs:
        - targets: ['prometheus-cloudwatch-exporter:80']
    
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-service-endpoints'
      scheme: https
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      scheme: https
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      scheme: https
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
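
(Aside: the __address__ rewrite rule in the kubernetes-service-endpoints job above can be sanity-checked with a small Go snippet. This is illustrative only; Prometheus fully anchors relabel regexes, which the ^(?:...)$ wrapper below makes explicit.)

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // source_labels values are joined with ";" before matching, i.e.
        // __address__ ";" the prometheus.io/port annotation value.
        re := regexp.MustCompile(`^(?:(.+)(?::\d+);(\d+))$`)
        // Strips any existing port from the address and appends the
        // annotated one; addresses without a port are left unchanged.
        fmt.Println(re.ReplaceAllString("10.72.4.110:2379;9100", "$1:$2")) // 10.72.4.110:9100
    }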
  • Alertmanager configuration file:

N/A

  • Logs:
time="2016-12-12T14:45:52Z" level=info msg="Collision detected for fingerprint 8b31db99ef569973, metric container_cpu_usage_seconds_total{beta_kubernetes_io_arch=\"amd64\", beta_kubernetes_io_instance_type=\"c3.8
xlarge\", beta_kubernetes_io_os=\"linux\", cpu=\"cpu03\", failure_domain_beta_kubernetes_io_region=\"us-west-2\", failure_domain_beta_kubernetes_io_zone=\"us-west-2b\", id=\"/system.slice/systemd-tmpfiles-setup.s
ervice\", instance=\"ip-10-72-21-104.us-west-2.compute.internal\", job=\"kubernetes-nodes\", kubernetes_io_hostname=\"ip-10-72-21-104.us-west-2.compute.internal\", role=\"minion\"}, mapping to new fingerprint 000
0000000000001." source="mapper.go:182" 
time="2016-12-12T14:45:52Z" level=info msg="Collision detected for fingerprint aeb7e8aa589c92cc, metric container_cpu_usage_seconds_total{beta_kubernetes_io_arch=\"amd64\", beta_kubernetes_io_instance_type=\"c3.8
xlarge\", beta_kubernetes_io_os=\"linux\", cpu=\"cpu05\", failure_domain_beta_kubernetes_io_region=\"us-west-2\", failure_domain_beta_kubernetes_io_zone=\"us-west-2b\", id=\"/system.slice/download-socat-binary.se
rvice\", instance=\"ip-10-72-21-104.us-west-2.compute.internal\", job=\"kubernetes-nodes\", kubernetes_io_hostname=\"ip-10-72-21-104.us-west-2.compute.internal\", role=\"minion\"}, mapping to new fingerprint 0000
000000000002." source="mapper.go:182" 
...
time="2016-12-12T14:46:00Z" level=info msg="Collision detected for fingerprint 1e96a5019c8e7521, metric etcd_rafthttp_message_sent_latency_microseconds{instance=\"10.72.4.110:2379\", job=\"etcd\", msgType=\"MsgHeartbeat\", quantile=\"0.9\", remoteID=\"58d9fb3eff84d3cd\", sendingType=\"pipeline\"}, mapping to new fingerprint 0000000000000016." source="mapper.go:182" 
panic: runtime error: integer divide by zero

goroutine 376 [running]:
panic(0x18939a0, 0xc420018040)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/prometheus/storage/local/chunk.doubleDeltaEncodedChunk.Add(0xc43c592c00, 0x287, 0x400, 0x158f37f3864, 0x40569068fcfdb0df, 0x2d0c6e110b2a7f85, 0xc5ef313840, 0x54e34d, 0xc4218ed9c0, 0xc421872d50)
        /go/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:86 +0xf76
github.com/prometheus/prometheus/storage/local/chunk.(*doubleDeltaEncodedChunk).Add(0xc615c82080, 0x158f37f3864, 0x40569068fcfdb0df, 0xc5ef313940, 0x557434, 0xc4218ed9c0, 0xd7cbbb540abb1586, 0xc5a0bc5650)
        <autogenerated>:36 +0x7e
github.com/prometheus/prometheus/storage/local/chunk.(*Desc).Add(0xc5680d94c0, 0x158f37f3864, 0x40569068fcfdb0df, 0x0, 0x0, 0x2, 0x8020106, 0x2)
        /go/src/github.com/prometheus/prometheus/storage/local/chunk/chunk.go:136 +0x4a
github.com/prometheus/prometheus/storage/local.(*memorySeries).add(0xc5a0bc5650, 0x158f37f3864, 0x40569068fcfdb0df, 0xc5a0bc5650, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/storage/local/series.go:245 +0x115
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).Append(0xc42011e000, 0xc643f4dfe0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/storage/local/storage.go:858 +0x398
github.com/prometheus/prometheus/storage.Fanout.Append(0xc4201adba0, 0x2, 0x2, 0xc643f4dfe0, 0xc480817d40, 0x26f5670)
        /go/src/github.com/prometheus/prometheus/storage/storage.go:60 +0x66
github.com/prometheus/prometheus/storage.(*Fanout).Append(0xc42043b020, 0xc643f4dfe0, 0xc5ef313b50, 0xc5ef313b40)
        <autogenerated>:3 +0x6e
github.com/prometheus/prometheus/retrieval.ruleLabelsAppender.Append(0x265af40, 0xc42043b020, 0xc48081c3c0, 0xc643f4dfe0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/retrieval/target.go:241 +0x1b3
github.com/prometheus/prometheus/retrieval.(*ruleLabelsAppender).Append(0xc480813220, 0xc643f4dfe0, 0x0, 0x0)
        <autogenerated>:33 +0x6e
github.com/prometheus/prometheus/retrieval.(*scrapeLoop).append(0xc4807fea00, 0xc66c1aa000, 0x4a7e, 0x4c00)
        /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:456 +0x92
github.com/prometheus/prometheus/retrieval.(*scrapeLoop).run(0xc4807fea00, 0x6fc23ac00, 0x6fc23ac00, 0x0)
        /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:425 +0x602
created by github.com/prometheus/prometheus/retrieval.(*scrapePool).sync
        /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:240 +0x3e5
brancz (Member) commented Dec 12, 2016

This is storage, so @beorn7 is probably best placed to judge, but it seems the problem is the division here. I don't have much time to look into it; would a zero-division check suffice, @beorn7?
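
For illustration, a minimal sketch of the kind of zero-division guard being suggested; the names below are made up, not the actual doubledelta.go code:

    package main

    import (
        "errors"
        "fmt"
    )

    // errZeroBaseDelta stands in for the error a guarded Add could return
    // instead of panicking. All names here are hypothetical.
    var errZeroBaseDelta = errors.New("chunk: zero base time delta")

    // encodeDelta mimics the failing shape: double-delta encoding divides by
    // a base delta derived from earlier samples; if corruption makes that
    // delta zero, a bare division panics with "integer divide by zero".
    func encodeDelta(baseTimeDelta, timeDelta int64) (int64, error) {
        if baseTimeDelta == 0 {
            return 0, errZeroBaseDelta // fail gracefully instead of crashing the scrape loop
        }
        return timeDelta / baseTimeDelta, nil
    }

    func main() {
        if _, err := encodeDelta(0, 1481554560000); err != nil {
            fmt.Println("recoverable:", err)
        }
    }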

brancz (Member) commented Dec 12, 2016

@ichekrygin is this reproducible, or did it only happen once?

ichekrygin (Author) commented Dec 12, 2016

Happened just once (so far)

brancz (Member) commented Dec 12, 2016

Thanks for the quick info, @ichekrygin. I'm not an expert on the storage, so it might take a few days until we get a fix. /cc @fabxc (we were thinking of cutting a new release, so we should probably wait for this fix before we do that)

ichekrygin (Author) commented Dec 12, 2016

brancz (Member) commented Dec 12, 2016

Yep, that's what I linked to above. I'm just unsure which value to use in the zero case, so I'm waiting for advice from @beorn7, as he's most familiar with the storage :)

beorn7 (Member) commented Dec 13, 2016

Back from vacation. Will look into this ASAP.

beorn7 (Member) commented Dec 13, 2016

This results from data corruption. I'll add some sanity checks to catch this without crashing.
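
A rough sketch of what such a sanity check could look like, using hypothetical types rather than the real chunk implementation:

    package main

    import "fmt"

    // chunkMeta is a made-up stand-in for chunk metadata; the point is the
    // shape of the sanity check, not the actual Prometheus types.
    type chunkMeta struct {
        baseTimeDelta int64 // time delta between the first two samples
        numSamples    int   // sample count recorded in the chunk
    }

    // validate turns impossible values, which corruption could produce, into
    // an ordinary error instead of a panic deeper in the encoding path.
    func (m chunkMeta) validate() error {
        if m.numSamples < 0 {
            return fmt.Errorf("chunk: corrupt metadata: negative sample count %d", m.numSamples)
        }
        if m.numSamples > 1 && m.baseTimeDelta == 0 {
            return fmt.Errorf("chunk: corrupt metadata: %d samples but zero base time delta", m.numSamples)
        }
        return nil
    }

    func main() {
        m := chunkMeta{baseTimeDelta: 0, numSamples: 3}
        if err := m.validate(); err != nil {
            fmt.Println("caught before use:", err)
        }
    }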

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
