
panic: runtime error: integer divide by zero #2953

Closed
Cplo opened this Issue Jul 16, 2017 · 14 comments

Cplo commented Jul 16, 2017

What did you do?
Run Prometheus on k8s

What did you expect to see?
No more crashes.

What did you see instead? Under which circumstances?
This happens in both of our k8s clusters after running for a few days.

Environment

  • System information:

    uname -srm
    Linux 4.4.64-1.el7.elrepo.x86_64 x86_64

  • Prometheus version:
    prometheus, version 1.7.1 (branch: master, revision: 3afb3ff)
    build user: root@0aa1b7fc430d
    build date: 20170612-11:44:05
    go version: go1.8.3

  • Alertmanager version:

    N/A

  • Prometheus configuration file:

**prometheus.yml**
global:
  scrape_interval: 1m
  scrape_timeout: 30s
  evaluation_interval: 1m
  external_labels:
    monitor: tos-monitor
rule_files:
- /etc/alert.rules
- /etc/recording.rules
scrape_configs:
- job_name: k8s-apiservers
  scrape_interval: 1m
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: https://10.10.10.1
    role: endpoints
    tls_config:
      ca_file: /srv/kubernetes/ca.crt
      cert_file: /srv/kubernetes/kubecfg.crt
      key_file: /srv/kubernetes/kubecfg.key
      insecure_skip_verify: false
    namespaces:
      names: []
  tls_config:
    ca_file: /srv/kubernetes/ca.crt
    cert_file: /srv/kubernetes/kubecfg.crt
    key_file: /srv/kubernetes/kubecfg.key
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: default;kubernetes;https
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_component]
    separator: ;
    regex: (.+)
    target_label: tos_component
    replacement: kube-apiserver
    action: replace
- job_name: k8s-kubelet
  scrape_interval: 1m
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: https://10.10.10.1
    role: node
    tls_config:
      ca_file: /srv/kubernetes/ca.crt
      cert_file: /srv/kubernetes/kubecfg.crt
      key_file: /srv/kubernetes/kubecfg.key
      insecure_skip_verify: false
    namespaces:
      names: []
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: []
    separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
- job_name: k8s-service-endpoints
  scrape_interval: 1m
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: https://10.10.10.1
    role: endpoints
    tls_config:
      ca_file: /srv/kubernetes/ca.crt
      cert_file: /srv/kubernetes/kubecfg.crt
      key_file: /srv/kubernetes/kubecfg.key
      insecure_skip_verify: false
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    separator: ;
    regex: (https?)
    target_label: __scheme__
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - source_labels: []
    separator: ;
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_name
    replacement: $1
    action: replace
- job_name: k8s-services
  params:
    module:
    - http_2xx
  scrape_interval: 1m
  scrape_timeout: 30s
  metrics_path: /probe
  scheme: http
  kubernetes_sd_configs:
  - api_server: https://10.10.10.1
    role: service
    tls_config:
      ca_file: /srv/kubernetes/ca.crt
      cert_file: /srv/kubernetes/kubecfg.crt
      key_file: /srv/kubernetes/kubecfg.key
      insecure_skip_verify: false
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: []
    separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - source_labels: []
    separator: ;
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_name
    replacement: $1
    action: replace
- job_name: k8s-pods
  scrape_interval: 1m
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: https://10.10.10.1
    role: pod
    tls_config:
      ca_file: /srv/kubernetes/ca.crt
      cert_file: /srv/kubernetes/kubecfg.crt
      key_file: /srv/kubernetes/kubecfg.key
      insecure_skip_verify: false
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - source_labels: []
    separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace
**recording.rules**
kubelet:node:cpu:usage{} = sum(rate(container_cpu_usage_seconds_total{id="/"}[5m])) BY (kubernetes_io_hostname) / sum(machine_cpu_cores) BY (kubernetes_io_hostname) * 100
kubelet:node:mem:usage{} = sum(container_memory_working_set_bytes{id="/"}) BY (kubernetes_io_hostname) / sum(machine_memory_bytes) BY (kubernetes_io_hostname) * 100
kubelet:namespace:cpu:usage{} = sum(rate(container_cpu_usage_seconds_total[5m])) BY (namespace) / scalar(sum(machine_cpu_cores)) * 100
kubelet:namespace:cpu:used{} = sum(rate(container_cpu_usage_seconds_total[5m])) BY (namespace)
kubelet:namespace:mem:usage{} = sum(container_memory_working_set_bytes) BY (namespace) / scalar(sum(machine_memory_bytes)) * 100
kubelet:namespace:mem:used{} = sum(container_memory_working_set_bytes) BY (namespace)
kubelet:namespace:network_received:used{} = sum(rate(container_network_receive_bytes_total[5m])) BY (namespace)
kubelet:namespace:network_sent:used{} = -sum(rate(container_network_transmit_bytes_total[5m])) BY (namespace)
kubelet:pod:cpu:used{} = sum(rate(container_cpu_usage_seconds_total[5m])) BY (pod_name, namespace, kubernetes_io_hostname)
kubelet:pod:mem:used{} = sum(container_memory_working_set_bytes) BY (pod_name, namespace, kubernetes_io_hostname)
kubelet:pod:network:received:used{} = sum(rate(container_network_receive_bytes_total[5m])) BY (pod_name, namespace, kubernetes_io_hostname)
kubelet:pod:network:sent:used{} = -sum(rate(container_network_transmit_bytes_total[5m])) BY (pod_name, namespace, kubernetes_io_hostname)
kubelet:node:update_latency_count:rate{} = rate(kubelet_node_status_update_latency_microseconds_count[5m])
kubelet:node:filesystem:usage{} = container_fs_usage_bytes{id="/"} / container_fs_limit_bytes{id="/"} * 100
kubelet:k8s_container:cpu:used{} = sum(rate(container_cpu_usage_seconds_total[5m])) BY (container_name, pod_name, namespace, kubernetes_io_hostname)
kubelet:docker_container:cpu:used{} = sum(rate(container_cpu_usage_seconds_total{id!="/",name!="",namespace=""}[5m])) BY (name)
kubelet:k8s_container:mem:used{} = sum(container_memory_working_set_bytes) BY (namespace, pod_name, container_name, kubernetes_io_hostname)
kubelet:k8s_container:mem:usage{} = sum(container_memory_working_set_bytes{container_name!="",container_name!="POD"}) BY (namespace, pod_name, container_name, kubernetes_io_hostname) / sum(container_spec_memory_limit_bytes) BY (namespace, pod_name, container_name, kubernetes_io_hostname) * 100
kubelet:docker_container:mem:used{} = sum(container_memory_working_set_bytes{id!="/",name!="",namespace=""}) BY (name)
kubelet:k8s_container:network_received:used{} = sum(rate(container_network_receive_bytes_total[5m])) BY (container_name, pod_name, namespace, kubernetes_io_hostname)
kubelet:k8s_container:network_sent:used{} = sum(rate(container_network_transmit_bytes_total[5m])) BY (container_name, pod_name, namespace, kubernetes_io_hostname)
kubelet:docker_container:network_received:used{} = sum(rate(container_network_receive_bytes_total{id!="/",name!="",namespace=""}[5m])) BY (name)
kubelet:docker_container:network_sent:used{} = sum(rate(container_network_transmit_bytes_total{id!="/",name!="",namespace=""}[5m])) BY (name)
kubelet:container:cpu_usage{} = sum(rate(container_cpu_usage_seconds_total[5m])) BY (container_name, pod_name, namespace, kubernetes_io_hostname)
kubelet:container:mem_usage{} = sum(container_memory_working_set_bytes) BY (container_name, pod_name, namespace, kubernetes_io_hostname)
apiserver:request_count:rate{} = sum(rate(apiserver_request_count[5m])) BY (instance)
apiserver:latency_histogram_quantile{quantile="0.99"} = histogram_quantile(0.99, sum(rate(apiserver_request_latencies_bucket{verb!~"CONNECT|WATCHLIST|WATCH"}[10m])) WITHOUT (instance, node, resource)) / 1000000
apiserver:latency_histogram_quantile{quantile="0.9"} = histogram_quantile(0.9, sum(rate(apiserver_request_latencies_bucket{verb!~"CONNECT|WATCHLIST|WATCH"}[10m])) WITHOUT (instance, node, resource)) / 1000000
apiserver:latency_histogram_quantile{quantile="0.5"} = histogram_quantile(0.5, sum(rate(apiserver_request_latencies_bucket{verb!~"CONNECT|WATCHLIST|WATCH"}[10m])) WITHOUT (instance, node, resource)) / 1000000
apiserver:etcd_access_latency{quantile="0.99"} = etcd_request_latencies_summary{quantile="0.99"} / 1000000
apiserver:etcd_access_latency{quantile="0.9"} = etcd_request_latencies_summary{quantile="0.9"} / 1000000
apiserver:etcd_access_latency{quantile="0.5"} = etcd_request_latencies_summary{quantile="0.5"} / 1000000
node_exporter:node_cpu_use:percent{} = 100 * (1 - avg(rate(node_cpu{mode="idle"}[5m])) BY (instance))
node_exporter:node_mem_use:percent{} = 100 * (1 - (node_memory_MemFree + node_memory_Cached + node_memory_Buffers) / (1 + node_memory_MemTotal))
node_exporter:node_disk_use:percent{} = 100 * (1 - node_filesystem_free{device=~"/dev.*",mountpoint!~"/etc.*|/run.*"} / (1 + node_filesystem_size))
node_exporter:node_swapmem_use:percent{} = 100 * (node_memory_SwapTotal - node_memory_SwapFree) / (node_memory_SwapTotal + 1)
kubestate:pod:running_count:percent{} = kubelet_running_pod_count * scalar(count(kube_node_status_capacity_pods)) / scalar(sum(kube_node_status_capacity_pods)) * 100
  • Alertmanager configuration file:
    N/A

  • Logs:

prometheus nodeA
panic: runtime error: integer divide by zero
goroutine 925330 [running]:
github.com/prometheus/prometheus/storage/local/chunk.doubleDeltaEncodedChunk.Add(0xc48584cc00, 0x237, 0x400, 0x15d43080e84, 0x7ff8000000000001, 0xc448fdf801, 0x1000000027489f0, 0xc4392c23f0, 0xc457f9b840, 0x15d4954)
  /go/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:86 +0xf78
github.com/prometheus/prometheus/storage/local/chunk.(*doubleDeltaEncodedChunk).Add(0xc4c86369e0, 0x15d43080e84, 0x7ff8000000000001, 0xc457f9b780, 0x4107fd, 0xc44df7e1c8, 0xc48ccbdb40, 0x0)
  <autogenerated>:39 +0x75
github.com/prometheus/prometheus/storage/local/chunk.(*Desc).Add(0xc499b25d40, 0x15d43080e84, 0x7ff8000000000001, 0xc4a08c6400, 0x453cc0, 0xc4a08c6400, 0x1b7fd00, 0xc48ccbdb58)
  /go/src/github.com/prometheus/prometheus/storage/local/chunk/chunk.go:142 +0x53
github.com/prometheus/prometheus/storage/local.(*memorySeries).add(0xc4392c23f0, 0x15d43080e84, 0x7ff8000000000001, 0xc4392c23f0, 0x0, 0x0)
  /go/src/github.com/prometheus/prometheus/storage/local/series.go:226 +0x109
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).Append(0xc42045c000, 0xc43132d480, 0x0, 0x0)
  /go/src/github.com/prometheus/prometheus/storage/local/storage.go:953 +0x324
github.com/prometheus/prometheus/storage.Fanout.Append(0xc420400020, 0x2, 0x2, 0xc43132d480, 0x19da820, 0xc457f9ba00)
  /go/src/github.com/prometheus/prometheus/storage/storage.go:60 +0x66
github.com/prometheus/prometheus/storage.(*Fanout).Append(0xc420400440, 0xc43132d480, 0x27489f0, 0xc448fdf8c0)
  <autogenerated>:3 +0x69
github.com/prometheus/prometheus/retrieval.(*countingAppender).Append(0xc48f9a79c0, 0xc43132d480, 0xc457f9ba78, 0xc448fdf9a8)
  /go/src/github.com/prometheus/prometheus/retrieval/target.go:261 +0x48
github.com/prometheus/prometheus/retrieval.ruleLabelsAppender.Append(0x2695400, 0xc48f9a79c0, 0xc46fed7230, 0xc43132d480, 0x0, 0x0)
  /go/src/github.com/prometheus/prometheus/retrieval/target.go:201 +0x1af
github.com/prometheus/prometheus/retrieval.(*ruleLabelsAppender).Append(0xc48f9a79e0, 0xc43132d480, 0x0, 0x0)
  <autogenerated>:35 +0x69
github.com/prometheus/prometheus/retrieval.(*scrapeLoop).append(0xc488c3d9d0, 0xc49b094000, 0x22a, 0x330, 0x2e80ac07, 0x2759c20, 0xc49b094000)
  /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:520 +0x2ec
github.com/prometheus/prometheus/retrieval.(*scrapeLoop).run(0xc488c3d9d0, 0xdf8475800, 0x6fc23ac00, 0x0)
  /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:430 +0x614
created by github.com/prometheus/prometheus/retrieval.(*scrapePool).reload.func1
  /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:195 +0x82
prometheus nodeB
panic: runtime error: integer divide by zero
goroutine 1614997 [running]:
github.com/prometheus/prometheus/storage/local/chunk.doubleDeltaEncodedChunk.Add(0xc488f93400, 0xbe, 0x400, 0x15d3c097f3a, 0x0, 0x0, 0xc4a2597b88, 0x42a15f, 0x1b86de0, 0xc4a2597b98)
  /go/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:86 +0xf78
github.com/prometheus/prometheus/storage/local/chunk.(*doubleDeltaEncodedChunk).Add(0xc43613ec40, 0x15d3c097f3a, 0x0, 0xc4a2597bb0, 0x1543cc2, 0xc490fe8680, 0x1b86de0, 0xc490fe8680)
  <autogenerated>:39 +0x75
github.com/prometheus/prometheus/storage/local/chunk.(*Desc).Add(0xc490fe8680, 0x15d3c097f3a, 0x0, 0x0, 0x14, 0x3, 0xc448f73520, 0xc494c67980)
  /go/src/github.com/prometheus/prometheus/storage/local/chunk/chunk.go:142 +0x53
github.com/prometheus/prometheus/storage/local.(*memorySeries).add(0xc425876a80, 0x15d3c097f3a, 0x0, 0xc425876a80, 0x0, 0x0)
  /go/src/github.com/prometheus/prometheus/storage/local/series.go:226 +0x109
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).Append(0xc420786000, 0xc47e5dfc20, 0x0, 0x0)
  /go/src/github.com/prometheus/prometheus/storage/local/storage.go:953 +0x324
github.com/prometheus/prometheus/storage.Fanout.Append(0xc4201c32c0, 0x2, 0x2, 0xc47e5dfc20, 0x0, 0x0)
  /go/src/github.com/prometheus/prometheus/storage/storage.go:60 +0x66
github.com/prometheus/prometheus/storage.(*Fanout).Append(0xc4201c3c40, 0xc47e5dfc20, 0x0, 0x0)
  <autogenerated>:3 +0x69
github.com/prometheus/prometheus/rules.(*Group).Eval.func1(0xc46bb9e8b0, 0x1af9836, 0x9, 0xc42015eaa0, 0x15d3c097f3a, 0x26a2d40, 0xc420c20ba0)
  /go/src/github.com/prometheus/prometheus/rules/manager.go:296 +0x23a
created by github.com/prometheus/prometheus/rules.(*Group).Eval
  /go/src/github.com/prometheus/prometheus/rules/manager.go:315 +0x138

beorn7 self-assigned this Jul 17, 2017


beorn7 commented Jul 17, 2017

That's caused by data corruption. I can certainly introduce a guard against it.


beorn7 commented Jul 17, 2017

That guard is already in place. That panic is truly a "must never happen" thing. There must be something weirder going on here.
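For context, the failing division sits in the chunk's length calculation: the per-sample size is derived from byte widths stored in the chunk header, so a zeroed header makes the divisor zero. A minimal sketch of the failure mode and of the kind of guard discussed here (the names and layout are hypothetical, not the actual Prometheus storage code):

```go
package main

import (
	"errors"
	"fmt"
)

// chunkLen computes the number of samples in a chunk payload by dividing
// by the per-sample size, which comes from byte widths in the chunk
// header. If the header is all zeros, the divisor is zero, and an
// unguarded division panics with "integer divide by zero".
func chunkLen(payloadBytes, timeBytes, valueBytes int) (int, error) {
	sampleSize := timeBytes + valueBytes
	if sampleSize == 0 {
		// The guard: turn a corrupted or uninitialized header into an
		// error (or a quarantined series) instead of a panic.
		return 0, errors.New("zero bytes per sample: corrupted or uninitialized chunk")
	}
	return payloadBytes / sampleSize, nil
}

func main() {
	n, err := chunkLen(1024, 2, 2) // healthy chunk: 4 bytes per sample
	fmt.Println(n, err)            // 256 <nil>

	_, err = chunkLen(1024, 0, 0) // zeroed header
	fmt.Println(err != nil)       // true
}
```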

beorn7 added a commit that referenced this issue Jul 17, 2017

WIP: Hunt down bug #2953
In the current state, this quarantines the series with a meaningful
error message and a chunk dump where it panic'd before.

However, this is definitely a highly irregular occurrence, thus this
is only for debugging, not for merging into master.

beorn7 commented Jul 17, 2017

@Cplo This is odd as it appears to happen in your case in a reproducible way (thus it doesn't seem to be a random data corruption), but it isn't seen anywhere else (or at least not frequently enough that we would see other reports).

I have created https://github.com/prometheus/prometheus/tree/beorn7/storage with a guard that quarantines the series in the case you encountered above and logs an error message. Could you build a binary from that branch and run it under the same conditions as above? When the error occurs, you will get a message like the following instead of a panic:

WARN[0005] Series quarantined.                           fingerprint=5ca8b9ba06c61814 metric=up{instance="localhost:9999", job="fake"} reason="zero bytes for time delta found while adding sample pair 0 @[1500290170.061], chunk dump: 0000010001" source="storage.go:1905"

If you could then post that message here, that would be great.


Cplo commented Jul 17, 2017

@beorn7 Thanks very much for your reply. I will build the binary from the beorn7/storage branch. If Prometheus crashes again, I will post the related logs here.


Cplo commented Jul 24, 2017

Hi @beorn7, the error occurred again. The related logs:

time="2017-07-24T19:53:43+08:00" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="(*config.TargetGroup)(nil)" 
time="2017-07-24T19:53:43+08:00" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="(*config.TargetGroup)(nil)" 
time="2017-07-24T19:53:43+08:00" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="(*config.TargetGroup)(nil)" 
time="2017-07-24T19:53:43+08:00" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="(*config.TargetGroup)(nil)" 
time="2017-07-24T19:53:43+08:00" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="(*config.TargetGroup)(nil)" 
time="2017-07-24T19:53:43+08:00" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="(*config.TargetGroup)(nil)" 
time="2017-07-24T19:53:43+08:00" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="(*config.TargetGroup)(nil)" 
time="2017-07-24T19:53:44+08:00" level=info msg="Collision detected for fingerprint dac7cf62f79709a4, metric container_cpu_usage_seconds_total{cpu="cpu15", executor="allowed", group="governor", id="/system.slice/docker-429885ec41007306f8fc4d696ef77d3fbb3cee64be025b4327ea4e9eeef3c660.scope/docker", instance="governor02", job="k8s-kubelet", kubernetes_io_hostname="governor02", pd="allowed"}, mapping to new fingerprint 0000000000000003." source="mapper.go:182" 
time="2017-07-24T19:53:45+08:00" level=info msg="Collision detected for fingerprint e0cc41b136ab4ffb, metric container_memory_failures_total{executor="allowed", group="governor", id="/system.slice/docker-429885ec41007306f8fc4d696ef77d3fbb3cee64be025b4327ea4e9eeef3c660.scope/docker", instance="governor02", job="k8s-kubelet", kubernetes_io_hostname="governor02", pd="allowed", scope="hierarchy", type="pgfault"}, mapping to new fingerprint 0000000000000004." source="mapper.go:182" 
time="2017-07-24T19:53:45+08:00" level=info msg="Collision detected for fingerprint 21ac2c0ae1757c25, metric container_memory_failures_total{executor="allowed", group="governor", id="/system.slice/docker-429885ec41007306f8fc4d696ef77d3fbb3cee64be025b4327ea4e9eeef3c660.scope/docker", instance="governor02", job="k8s-kubelet", kubernetes_io_hostname="governor02", pd="allowed", scope="hierarchy", type="pgmajfault"}, mapping to new fingerprint 0000000000000005." source="mapper.go:182" 
time="2017-07-24T19:53:46+08:00" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="(*config.TargetGroup)(nil)" 
panic: runtime error: integer divide by zero

goroutine 232623 [running]:
github.com/prometheus/prometheus/storage/local/chunk.(*doubleDeltaEncodedChunk).NewIterator(0xc43c8ca8e0, 0x7, 0x6)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:216 +0x316
github.com/prometheus/prometheus/storage/local/chunk.(*Desc).LastTime(0xc459c1d300, 0xc46956efe8, 0x7, 0x1)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/chunk/chunk.go:202 +0x73
github.com/prometheus/prometheus/storage/local.(*memorySeries).preloadChunksForRange(0xc428fad260, 0x9cd7fd1e22c6d5f4, 0x15d746d8d7f, 0x15d7472215f, 0xc42060e000, 0x26b80e0, 0xc44b0274c0, 0x0, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/series.go:478 +0x492
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).preloadChunksForRange(0xc42060e000, 0x9cd7fd1e22c6d5f4, 0xc428fad260, 0x15d746d8d7f, 0x15d7472215f, 0x0, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/storage.go:1097 +0xd1
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).QueryRange(0xc42060e000, 0x7fd0a3f50c50, 0xc460598a80, 0x15d746d8d7f, 0x15d7472215f, 0xc48439a1d0, 0x1, 0x1, 0x0, 0x0, ...)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/storage.go:595 +0x446
github.com/prometheus/prometheus/storage/local.memorySeriesStorageQuerier.QueryRange(0xc42060e000, 0x7fd0a3f50c50, 0xc460598a80, 0x15d746d8d7f, 0x15d7472215f, 0xc48439a1d0, 0x1, 0x1, 0xc45d75c640, 0x3, .
..)
        <autogenerated>:74 +0xa0
github.com/prometheus/prometheus/storage/fanin.querier.QueryRange.func1(0x26bca20, 0xc42060e000, 0x6, 0xc4245edc00, 0xc460c09330, 0x46cf73, 0xc4795c4068)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/fanin/fanin.go:70 +0x78
github.com/prometheus/prometheus/storage/fanin.querier.query(0x26bca20, 0xc42060e000, 0x278f978, 0x0, 0x0, 0x7fd0a3f50c50, 0xc460598a80, 0xc46956f458, 0x41211f, 0xc460c09448, ...)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/fanin/fanin.go:81 +0x58
github.com/prometheus/prometheus/storage/fanin.querier.QueryRange(0x26bca20, 0xc42060e000, 0x278f978, 0x0, 0x0, 0x7fd0a3f50c50, 0xc460598a80, 0x15d746d8d7f, 0x15d7472215f, 0xc48439a1d0, ...)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/fanin/fanin.go:71 +0x105
github.com/prometheus/prometheus/storage/fanin.(*querier).QueryRange(0xc46d71f050, 0x7fd0a3f50c50, 0xc460598a80, 0x15d746d8d7f, 0x15d7472215f, 0xc48439a1d0, 0x1, 0x1, 0x0, 0x18dc6e0, ...)
        <autogenerated>:2 +0xdc
github.com/prometheus/prometheus/promql.(*Engine).populateIterators.func1(0x7fd106a58060, 0xc45d75c640, 0x3bff6c43a35)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/engine.go:599 +0x3e7
github.com/prometheus/prometheus/promql.inspector.Visit(0xc46628c480, 0x7fd106a58060, 0xc45d75c640, 0x7fd106a58060, 0x4134a7)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/ast.go:306 +0x3a
github.com/prometheus/prometheus/promql.Walk(0x26a6c20, 0xc46628c480, 0x7fd106a58060, 0xc45d75c640)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/ast.go:255 +0x58
github.com/prometheus/prometheus/promql.Walk(0x26a6c20, 0xc46628c480, 0x26a6ba0, 0xc439dcca40)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/ast.go:275 +0x1d1
github.com/prometheus/prometheus/promql.Walk(0x26a6c20, 0xc46628c480, 0x7fd106a58008, 0xc457db1620)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/ast.go:285 +0x6ab
github.com/prometheus/prometheus/promql.Walk(0x26a6c20, 0xc46628c480, 0x7fd106a106e8, 0xc45d75c690)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/ast.go:278 +0x510
github.com/prometheus/prometheus/promql.Inspect(0x7fd106a106e8, 0xc45d75c690, 0xc46628c480)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/ast.go:316 +0x4b
github.com/prometheus/prometheus/promql.(*Engine).populateIterators(0xc4204ba960, 0x7fd0a3f50c50, 0xc460598a80, 0x26bc9c0, 0xc46d71f050, 0xc446972f30, 0xc47ebad860, 0xc460c09ae0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/engine.go:605 +0x12d
github.com/prometheus/prometheus/promql.(*Engine).execEvalStmt(0xc4204ba960, 0x7fd0a3f50c50, 0xc460598a80, 0xc42ac795c0, 0xc446972f30, 0x0, 0x0, 0x0, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/engine.go:454 +0x19b
github.com/prometheus/prometheus/promql.(*Engine).exec(0xc4204ba960, 0x7fd0a3f50c50, 0xc460598a80, 0xc42ac795c0, 0x0, 0x0, 0x0, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/engine.go:437 +0x3a0
github.com/prometheus/prometheus/promql.(*query).Exec(0xc42ac795c0, 0x7fd106a4f4e8, 0xc4201e5d70, 0x15d7472215f)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/promql/engine.go:281 +0x9e
github.com/prometheus/prometheus/rules.RecordingRule.Eval(0xc420b98fa7, 0x1e, 0x26aeb60, 0xc4207527d0, 0x0, 0x7fd106a4f4e8, 0xc4201e5d70, 0x15d7472215f, 0xc4204ba960, 0xc42027c880, ...)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/rules/recording.go:57 +0xcc
github.com/prometheus/prometheus/rules.(*RecordingRule).Eval(0xc420bcf0b0, 0x7fd106a4f4e8, 0xc4201e5d70, 0x15d7472215f, 0xc4204ba960, 0xc42027c880, 0x276f020, 0x10, 0xc42002cdf8, 0x0, ...)
        <autogenerated>:7 +0xb7
github.com/prometheus/prometheus/rules.(*Group).Eval.func1(0xc44b84c000, 0x1b045d2, 0x9, 0xc4207537c0, 0x15d7472215f, 0x26b8060, 0xc420bcf0b0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/rules/manager.go:277 +0x18c
created by github.com/prometheus/prometheus/rules.(*Group).Eval
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/rules/manager.go:315 +0x138

beorn7 commented Jul 24, 2017

Now you tickled the same bug in a different way. Perhaps that will actually give us a hint about what's going on.

I'll investigate. Maybe I have to give you yet another debug binary. We'll see... :o/

beorn7 added a commit that referenced this issue Jul 25, 2017

beorn7 added a commit that referenced this issue Jul 25, 2017


beorn7 commented Jul 25, 2017

@Cplo I have pushed another commit to https://github.com/prometheus/prometheus/tree/beorn7/storage . Could you build from there again and try it out?

This time, the server will again panic when it encounters the problem, but it will dump the chunk as part of the error message. That way, we'll at least get an idea of whether it is a chunk with corrupted data or a zeroed chunk that somehow snuck in.


Cplo commented Jul 26, 2017

@beorn7 ok


Cplo commented Aug 6, 2017

@beorn7 The latest crash logs:

panic: zero bytes for time delta found, chunk dump: 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

goroutine 432 [running]:
github.com/prometheus/prometheus/storage/local/chunk.doubleDeltaEncodedChunk.timeBytes(0xc534398000, 0x133, 0x400, 0xc4203cdb02)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:341 +0xfb
github.com/prometheus/prometheus/storage/local/chunk.doubleDeltaEncodedChunk.sampleSize(0xc534398000, 0x133, 0x400, 0xc42acc0d80)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:351 +0x3f
github.com/prometheus/prometheus/storage/local/chunk.doubleDeltaEncodedChunk.Len(0xc534398000, 0x133, 0x400, 0x89)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:362 +0x4f
github.com/prometheus/prometheus/storage/local/chunk.doubleDeltaEncodedChunk.Add(0xc534398000, 0x133, 0x400, 0x15db7727e2b, 0x0, 0xc50da9e601,
 0x10000000275ddd0, 0xc462c08cb0, 0xc4ddb19840, 0x15dcba4)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/chunk/doubledelta.go:87 +0x9e
github.com/prometheus/prometheus/storage/local/chunk.(*doubleDeltaEncodedChunk).Add(0xc48e4e8bc0, 0x15db7727e2b, 0x0, 0xc4ddb19701, 0x1a, 0xc43be24b00, 0x0, 0x0)
        <autogenerated>:39 +0x75
github.com/prometheus/prometheus/storage/local/chunk.(*Desc).Add(0xc4ddf48f80, 0x15db7727e2b, 0x0, 0x0, 0x1c, 0x3, 0xc481fa2680, 0xc500b110c0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/chunk/chunk.go:142 +0x53
github.com/prometheus/prometheus/storage/local.(*memorySeries).add(0xc462c08cb0, 0x15db7727e2b, 0x0, 0xc462c08cb0, 0x0, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/series.go:226 +0x109
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).Append(0xc42038c000, 0xc431322b60, 0x0, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/local/storage.go:953 +0x324
github.com/prometheus/prometheus/storage.Fanout.Append(0xc420284780, 0x2, 0x2, 0xc431322b60, 0x19e5360, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/storage/storage.go:60 +0x66
github.com/prometheus/prometheus/storage.(*Fanout).Append(0xc42000cb40, 0xc431322b60, 0x275ddd0, 0xc4e2cfcea0)
        <autogenerated>:3 +0x69
github.com/prometheus/prometheus/retrieval.(*countingAppender).Append(0xc517e186a0, 0xc431322b60, 0xc4ddb19a78, 0xc50da9e768)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/retrieval/target.go:261 +0x48
github.com/prometheus/prometheus/retrieval.ruleLabelsAppender.Append(0x26aa720, 0xc517e186a0, 0xc48f4e8fc0, 0xc431322b60, 0x0, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/retrieval/target.go:201 +0x1af
github.com/prometheus/prometheus/retrieval.(*ruleLabelsAppender).Append(0xc517e186c0, 0xc431322b60, 0x0, 0x0)
        <autogenerated>:35 +0x69
github.com/prometheus/prometheus/retrieval.(*scrapeLoop).append(0xc48377da40, 0xc4dc024000, 0x137b, 0x1400, 0xe0a36f0, 0x276f020, 0xc4dc024000)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/retrieval/scrape.go:520 +0x2ec
github.com/prometheus/prometheus/retrieval.(*scrapeLoop).run(0xc48377da40, 0xdf8475800, 0x6fc23ac00, 0x0)
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/retrieval/scrape.go:430 +0x614
created by github.com/prometheus/prometheus/retrieval.(*scrapePool).sync
        /home/chenpeng/mygo/src/github.com/prometheus/prometheus/retrieval/scrape.go:258 +0x3a6


beorn7 commented Aug 7, 2017

Thanks a lot. We got a smoking gun, but without the gunner. Or in other words: thanks to your dump above, we know it's an uninitialized chunk rather than a chunk with corrupted data. That's valuable information. I have no clue at the moment how an uninitialized chunk can slip in. It might be something very specific to your setup, as we haven't got any other report of this kind of crash.

I'll give it another try to stare at code at my next convenience. In the meantime, please let me know if there is anything special about your setup. This could be weird stuff like this problem only occurring on one special machine or one type of machine.
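An all-zero chunk like the one dumped above can be distinguished from merely corrupted data with a trivial scan; a sketch of such a check (a hypothetical helper for illustration, not part of the actual storage code):

```go
package main

import (
	"bytes"
	"fmt"
)

// isAllZero reports whether a chunk buffer consists entirely of zero
// bytes, i.e. it was allocated but never written to.
func isAllZero(chunk []byte) bool {
	return len(chunk) > 0 && bytes.Count(chunk, []byte{0x00}) == len(chunk)
}

func main() {
	uninitialized := make([]byte, 1024) // zeroed, like the dump above
	fmt.Println(isAllZero(uninitialized)) // true

	corrupted := []byte{0x00, 0x00, 0x01} // garbage, but not all zeros
	fmt.Println(isAllZero(corrupted)) // false
}
```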


beorn7 commented Aug 18, 2017

Sooo… I did some more thorough code analysis using all my Go Guru skills (slowly becoming a Go Guru guru ;). I couldn't find any entry point that would let a completely null'd doubleDeltaEncodedChunk slip in, even if it were read from a corrupted storage device.

@Cplo Did you think about my previous question, i.e. "Please let me know if there is anything special about your setup. This could be weird stuff like this problem only occurring on one special machine or one type of machine." I'm at a point where I start suspecting faulty hardware.

Of course, it would be helpful if anybody else in the world saw the same problem.


grobie commented Nov 12, 2017

Closing this due to the release of 2.0: there has been only a single report of this panic, we still have no idea how it can happen, and there has been no response for 3 months.

grobie closed this Nov 12, 2017


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
