read symbols: read symbols: invalid checksum #5148

Closed
xudawei opened this Issue Jan 29, 2019 · 2 comments

xudawei commented Jan 29, 2019

Bug Report

What did you do?
Running the Prometheus server on a physical machine whose storage is a RAID 10 array built from SSDs.

What did you expect to see?
The head chunks should be released every two hours, and their memory should be freed.

What did you see instead? Under which circumstances?
After about two days, the head chunks at some point stop being released, and memory keeps growing until the process runs out of memory (OOM).
I found several "compaction failed" errors, such as:
reload blocks: open block /bankapp/prometheus/data/29.0.164.36/01D1Y9M7F00CRFGD6ZHTVMMMHA: read symbols: read symbols: invalid checksum
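The `invalid checksum` part of this error means a checksum stored in the block's index did not match one recomputed from the bytes read back from disk. As a simplified illustration only (not the actual TSDB reader), the sketch below assumes a section laid out as a 4-byte big-endian length, the payload, and a trailing 4-byte big-endian CRC32 using the Castagnoli polynomial, roughly as the TSDB index format describes for its symbol table. `encodeSection` and `verifySection` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// castagnoli is the CRC32 (Castagnoli polynomial) table used for the checksum.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

var errInvalidChecksum = errors.New("invalid checksum")

// encodeSection writes a hypothetical section: a 4-byte big-endian payload
// length, the payload, then a 4-byte big-endian CRC32 of the payload.
func encodeSection(payload []byte) []byte {
	buf := make([]byte, 4, 4+len(payload)+4)
	binary.BigEndian.PutUint32(buf, uint32(len(payload)))
	buf = append(buf, payload...)
	var sum [4]byte
	binary.BigEndian.PutUint32(sum[:], crc32.Checksum(payload, castagnoli))
	return append(buf, sum[:]...)
}

// verifySection recomputes the CRC32 over the payload and compares it with
// the stored value, returning errInvalidChecksum on a mismatch.
func verifySection(b []byte) ([]byte, error) {
	if len(b) < 8 {
		return nil, errors.New("section too short")
	}
	n := int(binary.BigEndian.Uint32(b[:4]))
	if len(b)-8 < n {
		return nil, errors.New("section truncated")
	}
	payload := b[4 : 4+n]
	stored := binary.BigEndian.Uint32(b[4+n : 8+n])
	if crc32.Checksum(payload, castagnoli) != stored {
		return nil, errInvalidChecksum
	}
	return payload, nil
}

func main() {
	section := encodeSection([]byte("example symbol table bytes"))

	_, err := verifySection(section)
	fmt.Println("intact section, error:", err) // prints "intact section, error: <nil>"

	section[6] ^= 0x01 // flip one bit inside the payload to simulate disk corruption
	_, err = verifySection(section)
	fmt.Println("read symbols:", err) // prints "read symbols: invalid checksum"
}
```

A CRC32 always detects a single flipped bit, so any such change to the stored bytes surfaces as this error when the block is opened.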

Environment
About 80K samples appended per second
About 500 GB of data files per day
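These two figures can be cross-checked with the capacity formula in the Prometheus storage documentation, `needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample`, which also notes that Prometheus stores on average only 1-2 bytes per sample. The sketch below assumes "500G" means 500 × 10^9 bytes per day and derives the implied bytes per sample; the helper names are hypothetical, and the issue does not say whether the 500 GB also includes the WAL or compaction rewrites.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60 // 86,400

// samplesPerDay converts an ingestion rate (samples/s) into samples per day.
func samplesPerDay(samplesPerSec float64) float64 {
	return samplesPerSec * secondsPerDay
}

// bytesPerSample is the average number of bytes written per ingested sample
// implied by a daily data volume and an ingestion rate.
func bytesPerSample(bytesPerDay, samplesPerSec float64) float64 {
	return bytesPerDay / samplesPerDay(samplesPerSec)
}

func main() {
	const rate = 80_000 // "About 80K samples appended per second"
	const daily = 500e9 // assumption: "500 GB" read as 500 * 10^9 bytes per day

	fmt.Printf("samples per day: %.0f\n", samplesPerDay(rate))                 // 6912000000
	fmt.Printf("implied bytes per sample: %.1f\n", bytesPerSample(daily, rate)) // 72.3
	// Daily volume if each sample cost 2 bytes, the upper end of the docs' estimate.
	fmt.Printf("at 2 bytes/sample: %.1f GB/day\n", samplesPerDay(rate)*2/1e9) // 13.8
}
```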

  • System information:

    RHEL 7.4 (official release)

  • Prometheus version:

    Prometheus 2.6.0

  • Prometheus configuration file:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
  external_labels:
    slave: '29.0.164.36:9090'
    slavezone: 'central,core,lscore'
    idc: 'gl'


alerting:
  alertmanagers:
    - static_configs:
      - targets: ["alert.gl.com.cn"]

scrape_configs:
  - job_name: 'node_in_consul'
    scrape_interval: 60s
    scrape_timeout: 10s
    params:
      collect[]:
        - cpu
        - textfile
        - meminfo
        - loadavg
        - filesystem
        - netdev
        - time
        - netstat
        - diskstats
    consul_sd_configs:
      - server: "consul.core:8500"
        services: ["node_exporter"]
        token: d0e3ef2e-2147-b996-aa92-b21891043efc
        datacenter: central
      - server: "consul.core:8500"
        services: ["node_exporter"]
        token: d0e3ef2e-2147-b996-aa92-b21891043efc
        datacenter: core
      - server: "consul.core:8500"
        services: ["node_exporter"]
        token: d0e3ef2e-2147-b996-aa92-b21891043efc
        datacenter: lscore
    relabel_configs:
      - source_labels: [__meta_consul_address]
        target_label: ip
      - source_labels: [__meta_consul_tags]
        regex: ',(?:[^,]+,){0}([^=]+)=([^,]+),.*'
        replacement: '${2}'
        target_label: '${1}'
      - source_labels: [__meta_consul_tags]
        regex: ',(?:[^,]+,){1}([^=]+)=([^,]+),.*'
        replacement: '${2}'
        target_label: '${1}'
      - source_labels: [__meta_consul_tags]
        regex: ',(?:[^,]+,){2}([^=]+)=([^,]+),.*'
        replacement: '${2}'
        target_label: '${1}'
      - source_labels: [__meta_consul_tags]
        regex: ',(?:[^,]+,){3}([^=]+)=([^,]+),.*'
        replacement: '${2}'
        target_label: '${1}'
      - source_labels: [__meta_consul_tags]
        regex: ',(?:[^,]+,){4}([^=]+)=([^,]+),.*'
        replacement: '${2}'
        target_label: '${1}'

  - job_name: 'prometheus'
    scrape_interval: 15s
    scrape_timeout: 10s
    static_configs:
      - targets: ['29.0.164.36:9090']

  • Logs:
Jan 24 01:09:53 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T17:09:53.129981241Z caller=head.go:567 component=tsdb msg="WAL checkpoint complete" first=2041 last=2185 duration=1m35.194529434s
Jan 24 03:09:19 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T19:09:19.07306972Z caller=compact.go:416 component=tsdb msg="write block" mint=1548259200000 maxt=1548266400000 ulid=01D1Y1ANWWS36DK493DT4PV83D
Jan 24 03:10:50 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T19:10:50.939357835Z caller=head.go:520 component=tsdb msg="head GC completed" duration=1m21.960888153s
Jan 24 03:12:31 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T19:12:31.157013625Z caller=head.go:567 component=tsdb msg="WAL checkpoint complete" first=2186 last=2338 duration=1m40.217584382s
Jan 24 05:13:27 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T21:13:27.881901938Z caller=compact.go:416 component=tsdb msg="write block" mint=1548266400000 maxt=1548273600000 ulid=01D1Y86D4PST8EEJ6A8N8MQQJ5
Jan 24 05:15:59 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T21:15:59.723350051Z caller=head.go:520 component=tsdb msg="head GC completed" duration=2m17.525086709s
Jan 24 05:17:48 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T21:17:48.13550594Z caller=head.go:567 component=tsdb msg="WAL checkpoint complete" first=2339 last=2504 duration=1m48.412106758s
Jan 24 05:24:40 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T21:24:40.019780013Z caller=compact.go:369 component=tsdb msg="compact blocks" count=3 mint=1548244800000 maxt=1548266400000 ulid=01D1Y975DJJ6PK9VF8XCXTV73K sources="[01D1XKK7CNNBMZGNZFP2AT71MH 01D1XTEYMRCZ9G0BGJ4F8ARVYH 01D1Y1ANWWS36DK493DT4PV83D]" duration=6m46.561626486s
Jan 24 05:51:33 EQUHST00004216 prometheus[3449]: level=info ts=2019-01-23T21:51:33.705059402Z caller=compact.go:369 component=tsdb msg="compact blocks" count=3 mint=1548201600000 maxt=1548266400000 ulid=01D1Y9M7F00CRFGD6ZHTVMMMHA sources="[01D1X028P9YAETKRCT9BS9QA6T 01D1XM6VM06QS1MT1HF2M340ZM 01D1Y975DJJ6PK9VF8XCXTV73K]" duration=26m32.168544173s
Jan 24 05:51:40 EQUHST00004216 prometheus[3449]: level=error ts=2019-01-23T21:51:40.851805092Z caller=db.go:324 component=tsdb msg="compaction failed" err="reload blocks: open block /bankapp/prometheus/data/29.0.164.36/01D1Y9M7F00CRFGD6ZHTVMMMHA: read symbols: read symbols: invalid checksum"
Jan 24 05:51:42 EQUHST00004216 prometheus[3449]: level=error ts=2019-01-23T21:51:42.607494072Z caller=db.go:324 component=tsdb msg="compaction failed" err="compact [/bankapp/prometheus/data/29.0.164.36/01D1X028P9YAETKRCT9BS9QA6T /bankapp/prometheus/data/29.0.164.36/01D1Y9M7F00CRFGD6ZHTVMMMHA /bankapp/prometheus/data/29.0.164.36/01D1XM6VM06QS1MT1HF2M340ZM /bankapp/prometheus/data/29.0.164.36/01D1Y975DJJ6PK9VF8XCXTV73K]: read symbols: read symbols: invalid checksum"
Jan 24 05:52:47 EQUHST00004216 prometheus[3449]: level=error ts=2019-01-23T21:52:47.418355665Z caller=db.go:324 component=tsdb msg="compaction failed" err="compact [/bankapp/prometheus/data/29.0.164.36/01D1X028P9YAETKRCT9BS9QA6T /bankapp/prometheus/data/29.0.164.36/01D1Y9M7F00CRFGD6ZHTVMMMHA /bankapp/prometheus/data/29.0.164.36/01D1XM6VM06QS1MT1HF2M340ZM /bankapp/prometheus/data/29.0.164.36/01D1Y975DJJ6PK9VF8XCXTV73K]: read symbols: read symbols: invalid checksum"
Jan 24 05:53:56 EQUHST00004216 prometheus[3449]: level=error ts=2019-01-23T21:53:56.299115296Z caller=db.go:324 component=tsdb msg="compaction failed" err="compact [/bankapp/prometheus/data/29.0.164.36/01D1X028P9YAETKRCT9BS9QA6T /bankapp/prometheus/data/29.0.164.36/01D1Y9M7F00CRFGD6ZHTVMMMHA /bankapp/prometheus/d…
  • Performance monitor:
    [screenshot: snipaste_2019-01-29_20-26-03]
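The five `relabel_configs` rules in the configuration above that read `__meta_consul_tags` each extract one `key=value` tag by position: `{N}` skips the first N comma-terminated tags, group 1 becomes the target label name and group 2 its value. Prometheus anchors relabel regexes at both ends and uses Go's RE2 `regexp` syntax, so the matching can be sketched in Go. The tag string below is a hypothetical example; Consul service discovery joins the tags with `tag_separator` (default `,`) and also puts the separator at both ends, which is why every pattern starts with a comma. `extractTag` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractTag mimics one of the relabel rules: it anchors the pattern the way
// Prometheus does and returns the (label name, value) pair for tag position n.
func extractTag(tags string, n int) (name, value string, ok bool) {
	pattern := fmt.Sprintf(`,(?:[^,]+,){%d}([^=]+)=([^,]+),.*`, n)
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	m := re.FindStringSubmatch(tags)
	if m == nil {
		return "", "", false // no match: a "replace" rule leaves the labels unchanged
	}
	return m[1], m[2], true
}

func main() {
	tags := ",env=prod,app=web,zone=gl," // hypothetical __meta_consul_tags value
	for n := 0; n <= 4; n++ {
		if name, value, ok := extractTag(tags, n); ok {
			fmt.Printf("rule %d: %s=%q\n", n, name, value)
		} else {
			fmt.Printf("rule %d: no match\n", n)
		}
	}
	// Expected output:
	// rule 0: env="prod"
	// rule 1: app="web"
	// rule 2: zone="gl"
	// rule 3: no match
	// rule 4: no match
}
```

With only three tags, the rules for positions 3 and 4 simply do not match, so a target with fewer tags gets fewer extra labels rather than an error.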

xudawei commented Jan 31, 2019

help wanted

simonpasquier commented Feb 1, 2019

It looks like you have corrupted files. In that case there's nothing much to do except stop Prometheus and remove the offending file. See https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects

I'm closing it for now. If you have further questions, please use our user mailing list, which you can also search.
