Invalid memory address or nil pointer dereference during tsdb compact #3767

Closed
arslanm opened this Issue Jan 30, 2018 · 2 comments

arslanm commented Jan 30, 2018

What did you do?

This looks like a storage problem; let me know if it should be reported in the prometheus/tsdb repository instead.

I'm testing Prometheus in our environment. It had been running on the machine for about a week, accumulating roughly 25 GB of data, until it panicked. Runtime parameters:

PROMETHEUS_OPTS="--config.file=/etc/prometheus/prometheus.yml \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.console.templates=/etc/prometheus/consoles \
--web.enable-admin-api \
--storage.tsdb.path=/data \
--storage.tsdb.retention=744h \
--storage.tsdb.min-block-duration=15m \
--log.level=debug \
--storage.tsdb.max-block-duration=30m"

First I received a "too many open files" error, which produced this:

level=info ts=2018-01-30T07:07:36.729352613Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=1.901934286s
level=info ts=2018-01-30T07:22:30.039363062Z caller=compact.go:387 component=tsdb msg="compact blocks" count=1 mint=1517295600000 maxt=1517296500000
level=error ts=2018-01-30T07:22:35.142872779Z caller=compact.go:397 component=tsdb msg="removed tmp folder after failed compaction" err="open /data/01C52Z46RQ4HPAKTFKQGD5283D.tmp: too many open files"
level=error ts=2018-01-30T07:22:35.143850714Z caller=db.go:265 component=tsdb msg="compaction failed" err="persist head block: write merged meta: open /data/01C52Z46RQ4HPAKTFKQGD5283D.tmp/meta.json.tmp: too many open files"
level=error ts=2018-01-30T07:22:35.446422835Z caller=dns.go:164 component="discovery manager scrape" discovery=dns msg="Error refreshing DNS targets" err="could not load resolv.conf: open /etc/resolv.conf: too many open files"
level=info ts=2018-01-30T07:22:36.158462691Z caller=compact.go:387 component=tsdb msg="compact blocks" count=1 mint=1517295600000 maxt=1517296500000
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x120 pc=0x15933f9]

goroutine 261 [running]:
github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*LeveledCompactor).write(0xc42012a3c0, 0x7ffce0e4080e, 0x5, 0xc4a6da5810, 0xc482a4bcd0, 0x1, 0x1, 0x28f5400, 0xc49b3f2240)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/compact.go:431 +0x579
github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*LeveledCompactor).Write(0xc42012a3c0, 0x7ffce0e4080e, 0x5, 0x2901300, 0xc47d5b24a0, 0x16145dd8180, 0x16145eb3d20, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/compact.go:362 +0x2ad
github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).compact(0xc42014c300, 0x0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go:360 +0x21d
github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).run(0xc42014c300)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go:263 +0x30a
created by github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.Open
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go:227 +0x589
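
The panic comes from compact.go:431, immediately after a compaction attempt failed with "too many open files", so a plausible reading is that something left nil by the failed attempt is dereferenced on the next pass. Below is a hypothetical Go sketch of that failure shape; it is an illustration only, not the actual tsdb code:

// Hypothetical sketch, not the actual tsdb code: an open error that is
// logged but not propagated leaves a nil handle behind, and the next
// write through it panics exactly like the trace above.
package main

import (
	"log"
	"os"
)

type blockWriter struct{ f *os.File }

func newBlockWriter(path string) *blockWriter {
	f, err := os.Create(path)
	if err != nil {
		log.Println("create failed:", err) // the error is logged...
		return nil                         // ...but a nil writer escapes
	}
	return &blockWriter{f: f}
}

func main() {
	// os.Create fails here because the parent directory does not exist,
	// standing in for the "too many open files" failure in the logs above.
	w := newBlockWriter("/no/such/dir/meta.json.tmp")
	w.f.Write([]byte("{}")) // panic: invalid memory address or nil pointer dereference
}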

I then modified /etc/security/limits.conf, added the lines below, and restarted Prometheus:

prometheus	-	nofile  65536
prometheus	-	nproc   2048

It panicked again:

level=info ts=2018-01-30T18:27:39.972274201Z caller=compact.go:387 component=tsdb msg="compact blocks" count=1 mint=1517309100000 maxt=1517310000000
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x120 pc=0x15933f9]

goroutine 246 [running]:
github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*LeveledCompactor).write(0xc420164e60, 0x7ffda536d80e, 0x5, 0xc420256070, 0xc465d25cd0, 0x1, 0x1, 0x28f5400, 0xc42e4e6300)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/compact.go:431 +0x579
github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*LeveledCompactor).Write(0xc420164e60, 0x7ffda536d80e, 0x5, 0x2901300, 0xc425cac000, 0x16146ab7fe0, 0x16146b93b80, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/compact.go:362 +0x2ad
github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).compact(0xc42017e300, 0x1, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go:360 +0x21d
github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).run(0xc42017e300)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go:263 +0x30a
created by github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.Open
        /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go:227 +0x589
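
To confirm which limit the running daemon actually inherited, you can read /proc/<pid>/limits. The equivalent check from inside a Go program looks like this (a minimal, Linux-only sketch):

// Minimal Linux-only sketch: print the file-descriptor limit the current
// process is running with (equivalent to reading /proc/self/limits).
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	fmt.Printf("max open files: soft=%d hard=%d\n", rl.Cur, rl.Max)
}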

Environment

GCE instance running x86_64 CentOS 6.

  • System information:

Linux 3.18.17-13.el6.x86_64 x86_64

  • Prometheus version:
$ prometheus --version
prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d82a045d103ea7f3c89a91fba4a93e6367a)
  build user:       root@6e784304d3ff
  build date:       20180119-12:01:23
  go version:       go1.9.2
  • Alertmanager version:
N/A
  • Prometheus configuration file:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  # Scrape local prometheus
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        action: replace
        regex: ([^:]+)(?::\d+)?
        replacement: $1
        target_label: instance

  # Scrape GCE instances for Node Exporter
  - job_name: 'node_exporter'
    gce_sd_configs:
      - project: project_id
        zone: us-central1-a
        # excluding K8S nodes which are already scraped by K8S Prometheus
        filter: 'name ne ^gke.*'
        port: 9100
      - project: project_id
        zone: us-central1-b
        filter: 'name ne ^gke.*'
        port: 9100
      - project: project_id
        zone: us-central1-c
        filter: 'name ne ^gke.*'
        port: 9100
      - project: project_id
        zone: us-central1-f
        filter: 'name ne ^gke.*'
        port: 9100
    relabel_configs:
      - source_labels: [__meta_gce_public_ip]
        target_label: instance_extip
      - source_labels: [__meta_gce_instance_name]
        target_label: instance_name
      - source_labels: [__meta_gce_metadata_cluster_name]
        target_label: k8s_cluster
      - source_labels: [__meta_gce_project]
        target_label: instance_project
      - source_labels: [__meta_gce_tags]
        target_label: instance_tags
      - source_labels: [__meta_gce_zone]
        action: replace
        regex: (.+?)/([a-z0-9-]+)$
        replacement: $2
        target_label: instance_zone
      - source_labels: [__address__]
        action: replace
        regex: ([^:]+)(?::\d+)?
        replacement: $1
        target_label: instance

  # Scrape lower level prometheus
  - job_name: 'prometheus_federation'
    honor_labels: true
    metrics_path: /federate
    scheme: https
    basic_auth:
      username: user
      password: pass
    params:
      match[]:
        - '{job="prometheus"}'
        - '{job="kubernetes-service-endpoints"}'
        - '{job="kubernetes-apiservers"}'
        - '{job="kubernetes-cadvisor"}'
        - '{job="kubernetes-nodes"}'
        - '{job="kubernetes-pods"}'
    dns_sd_configs:
      - refresh_interval: 15s
        names:
        - prometheus-instances.domain
  • Alertmanager configuration file:
N/A
arslanm commented Jan 30, 2018

It looks like this was still caused by a "too many open files" error, even after adding per-user limits for prometheus to /etc/security/limits.conf:

prometheus	-	nofile  65536
prometheus	-	nproc   2048

Those entries aren't sufficient when the prometheus daemon shows up in the process list like this:

495      16238  149 38.5 26473384 2956360 ?    Ssl  18:56   2:26 /usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --web.console.libraries=/etc/prometheus/console_libraries --web.console.templates=/etc/prometheus/consoles --web.enable-admin-api --storage.tsdb.path=/data --storage.tsdb.retention=744h --storage.tsdb.min-block-duration=15m --storage.tsdb.max-block-duration=30m

Note the numeric UID in the first column instead of the username. Prometheus has been working fine after changing the /etc/security/limits.conf lines to the following, which apply the limits to everyone:

*	-	nofile  65536
*	-	nproc   2048

Closing this issue.
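
As a postscript: besides limits.conf, a process can raise its own soft limit up to the hard limit at startup. A minimal Go sketch for illustration (not something Prometheus itself does):

// Illustration only: raise this process's soft nofile limit up to the
// hard limit at startup, then report the result.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	rl.Cur = rl.Max // the soft limit may be raised up to the hard limit without privileges
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	fmt.Printf("nofile raised: soft=%d hard=%d\n", rl.Cur, rl.Max)
}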

@arslanm arslanm closed this Jan 30, 2018

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
