Prometheus crashes and hangs on `fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)` #2263

Closed
ichekrygin opened this Issue Dec 7, 2016 · 9 comments

ichekrygin commented Dec 7, 2016

What did you do?
Ran Prometheus on Kubernetes (K8s).

What did you expect to see?
Prometheus keeps running without downtime.

What did you see instead? Under which circumstances?
Prometheus crashed with:

time="2016-12-07T13:33:47Z" level=info msg="Done checkpointing in-memory metrics and chunks in 9.001990326s." source="persistence.go:573" 
fatal error: scanobject n == 0
runtime: pointer 0xc44d7c60c0 to unused region of spanidx=0x16be3 span.base()=0x0 span.limit=0x0 span.state=0
runtime: found in object at *(0xc47ce2cc80+0x150)
object=0xc47ce2cc80 k=0x623e716 s.base()=0xc47ce2c000 s.limit=0xc47ce2de00 s.sizeclass=27 s.elemsize=640
 *(object+0) = 0xc433a40240
 *(object+8) = 0xc433a40280
 *(object+16) = 0xc433a402c0
 *(object+24) = 0xc433a40300
...
 *(object+328) = 0xc454e4b880
 *(object+336) = 0xc44d7c60c0 <==
 *(object+344) = 0xc453c958c0
...
 *(object+624) = 0x0
 *(object+632) = 0x0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
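
The numbers in the dump above are internally consistent, which can be checked with a little arithmetic. The Go sketch below is only an illustration (the helper `slotIndex` and the constant names are made up for this example and are not part of the Go runtime). It confirms that the hexadecimal offset `0x150` in `found in object at *(0xc47ce2cc80+0x150)` is the decimal offset 336 of the line flagged with `<==`, and that the object sits at a whole multiple of `s.elemsize` from `s.base()`.

```go
package main

import "fmt"

// Values copied from the crash dump in this report.
const (
	objectAddr = 0xc47ce2cc80 // object=
	spanBase   = 0xc47ce2c000 // s.base()=
	elemSize   = 640          // s.elemsize=
	badOffset  = 0x150        // offset in "found in object at *(0xc47ce2cc80+0x150)"
	wordSize   = 8            // pointer size on linux/amd64
)

// slotIndex (hypothetical helper) returns which fixed-size element of the span
// the object is, plus any leftover bytes if the address were misaligned.
func slotIndex(obj, base, size uint64) (index, remainder uint64) {
	off := obj - base
	return off / size, off % size
}

func main() {
	idx, rem := slotIndex(objectAddr, spanBase, elemSize)
	// prints "bad pointer at object+336 (word 42 of 80)"
	fmt.Printf("bad pointer at object+%d (word %d of %d)\n",
		badOffset, badOffset/wordSize, elemSize/wordSize)
	// prints "object is slot 5 of its span, remainder 0"
	fmt.Printf("object is slot %d of its span, remainder %d\n", idx, rem)
}
```

So the offending value `0xc44d7c60c0` is the 43rd pointer-sized word of a well-aligned 640-byte object. This says nothing about which code wrote it.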

Even on a crash, I would expect the Prometheus process to exit and let K8s handle the restart. Instead, it appears the process just hung.

Environment
Kubernetes on AWS (CoreOS)

Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.4", GitCommit:"3b417cc4ccd1b8f38ff9ec96bb50a81ca0ea9d56", GitTreeState:"clean", BuildDate:"2016-10-21T02:42:39Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
  • System information:

    Linux 4.7.3-coreos-r2 x86_64

  • Prometheus version:

prometheus, version 1.4.1 (branch: master, revision: 2a89e8733f240d3cd57a6520b52c36ac4744ce12)
  build user:       root@e685d23d8809
  build date:       20161128-09:59:22
  go version:       go1.7.3
  • Alertmanager version:

n/a

  • Prometheus configuration file:
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
    rule_files:
    - /etc/prometheus/recording.rules
    scrape_configs:
    - job_name: etcd
      static_configs:
        - targets:
          - 10.72.0.0:2379
          - 10.72.0.1:2379
          - 10.72.4.110:2379
          - 10.72.16.28:2379
          - 10.72.6.19:2379
    - job_name: 'prometheus'
      static_configs:
        - targets: ['localhost:9090']
    - job_name: 'cloudwatch'
      static_configs:
        - targets: ['prometheus-cloudwatch-exporter:80']
    
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-service-endpoints'
      scheme: https
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      scheme: https
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      scheme: https
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
  • Alertmanager configuration file:
    N/A

  • Logs:
    attached
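
For readers following the configuration above, the two `__address__` rewrite rules (in the `kubernetes-service-endpoints` and `kubernetes-pods` jobs) can be approximated in a few lines of Go. This is a minimal sketch, not Prometheus's own code: `rewriteAddress` is a hypothetical helper, and the address `10.2.3.4:8080` and port `9102` are invented sample values. It relies on two documented relabeling behaviours: source label values are joined with `;`, and the regex is matched fully anchored.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rewriteAddress (hypothetical helper) mimics a "replace" relabel step:
// the source label values are joined with ";", matched against the fully
// anchored regex, and the replacement template is expanded.
// If the regex does not match, the address is returned unchanged.
func rewriteAddress(pattern, replacement, address, portAnnotation string) string {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	joined := strings.Join([]string{address, portAnnotation}, ";")
	m := re.FindStringSubmatchIndex(joined)
	if m == nil {
		return address
	}
	return string(re.ExpandString(nil, replacement, joined, m))
}

func main() {
	// Rule from the kubernetes-service-endpoints job.
	fmt.Println(rewriteAddress(`(.+)(?::\d+);(\d+)`, "$1:$2", "10.2.3.4:8080", "9102")) // 10.2.3.4:9102
	// Rule from the kubernetes-pods job (same effect, braces in the template).
	fmt.Println(rewriteAddress(`(.+):(?:\d+);(\d+)`, "${1}:${2}", "10.2.3.4:8080", "9102")) // 10.2.3.4:9102
	// An address without a port does not match, so it is left as-is.
	fmt.Println(rewriteAddress(`(.+)(?::\d+);(\d+)`, "$1:$2", "10.2.3.4", "9102")) // 10.2.3.4
}
```

Both rules as written require the discovered address to already carry a port. An address without one falls through unmodified.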

@ichekrygin ichekrygin changed the title from "Prometheus crashes and hangs on `*(object+560) = 0x0 *(object+568) = 0x0 *(object+576) = 0x0 *(object+584) = 0x0 *(object+592) = 0x0 *(object+600) = 0x0 *(object+608) = 0x0 *(object+616) = 0x0 *(object+624) = 0x0 *(object+632) = 0x0 fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)`" to "Prometheus crashes and hangs on `fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)`" Dec 7, 2016

Member

brian-brazil commented Dec 7, 2016

This is likely a bug in Go. What version of Prometheus is this, and what version of Go was it compiled with?

Author

ichekrygin commented Dec 7, 2016

@brian-brazil, sorry, I keep hitting Ctrl+Enter, which submits the changes :)

Author

ichekrygin commented Dec 7, 2016

Author

ichekrygin commented Dec 7, 2016

@brian-brazil

prometheus, version 1.4.1 (branch: master, revision: 2a89e8733f240d3cd57a6520b52c36ac4744ce12)
  build user:       root@e685d23d8809
  build date:       20161128-09:59:22
  go version:       go1.7.3

Member

brian-brazil commented Dec 8, 2016

It doesn't look like there are any known issues in Go 1.7.3, so this is something new.

Member

fabxc commented Dec 8, 2016

Could also be an edge case in goleveldb, which certainly uses the unsafe package.

Author

ichekrygin commented Jan 12, 2017

Hi, any progress on this issue?

Member

gouthamve commented Sep 28, 2017

Are you still seeing this? This has been stale for some time, and with the new 2.0 release coming up it becomes obsolete. Closing this now, but feel free to re-open if you are still facing this in 1.7.x releases.

@gouthamve gouthamve closed this Sep 28, 2017

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
