panic: chunk desc eviction requested with unknown chunk desc offset #2409

Closed
ncabatoff opened this Issue Feb 8, 2017 · 7 comments

ncabatoff commented Feb 8, 2017

What did you do?

Upgraded from 1.4.1 to 1.5 last week, then to 1.5.1 yesterday, at the same time slightly reducing memory chunks / max chunks to persist (from 14000000/7000000 to 12000000/6000000). It came up at about 17:02 my time and crashed almost 6 hours later, at 00:48 my time.
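
(For reference: the report only gives the values, not how they were set. In Prometheus 1.x these limits are normally controlled by the command-line flags shown below; the exact invocation here is an assumption, not part of the original report.)

-storage.local.memory-chunks=12000000
-storage.local.max-chunks-to-persist=6000000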

What did you expect to see?

No crash.

What did you see instead? Under which circumstances?

time="2017-02-08T05:48:51Z" level=info msg="Maintenance loop stopped." source="storage.go:1259" 
panic: chunk desc eviction requested with unknown chunk desc offset
goroutine 54 [running]:
panic(0x17bdd20, 0xcacea0c6d0)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/prometheus/storage/local.(*memorySeries).evictChunkDescs(0xc6cc46bb90, 0x1)
	/go/src/github.com/prometheus/prometheus/storage/local/series.go:287 +0x34d
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).maintainMemorySeries(0xc420150000, 0x65f00b0c7d3e0067, 0x15981c5a7b3, 0xc78fa56100)
	/go/src/github.com/prometheus/prometheus/storage/local/storage.go:1423 +0x2e9
github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).loop(0xc420150000)
	/go/src/github.com/prometheus/prometheus/storage/local/storage.go:1302 +0x268
created by github.com/prometheus/prometheus/storage/local.(*MemorySeriesStorage).Start
	/go/src/github.com/prometheus/prometheus/storage/local/storage.go:389 +0x449
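
For readers unfamiliar with the 1.x local storage internals, here is a heavily simplified sketch, not the actual storage/local code, of the bookkeeping invariant this panic guards: each in-memory series tracks how many chunk descriptors have already been evicted to disk, and an eviction request that refers to a descriptor outside that known window is treated as a fatal inconsistency. The field names and the exact check below are illustrative assumptions.

// Illustrative sketch only -- NOT the actual Prometheus storage/local code.
package main

import "fmt"

type memorySeries struct {
	chunkDescs       []int // in-memory chunk descriptors, reduced to ints here
	chunkDescsOffset int   // number of descriptors already evicted to disk
}

// evictChunkDescs drops all in-memory descriptors below the given absolute
// offset. If the offset points below what has already been evicted, the
// series' bookkeeping is inconsistent, which is the condition this issue reports.
func (s *memorySeries) evictChunkDescs(beforeOffset int) {
	idx := beforeOffset - s.chunkDescsOffset
	if idx < 0 || idx > len(s.chunkDescs) {
		panic("chunk desc eviction requested with unknown chunk desc offset")
	}
	s.chunkDescs = s.chunkDescs[idx:]
	s.chunkDescsOffset = beforeOffset
}

func main() {
	s := &memorySeries{chunkDescs: []int{3, 4, 5}, chunkDescsOffset: 3}

	s.evictChunkDescs(4) // fine: the descriptor at absolute offset 3 is dropped
	fmt.Println("after valid eviction:", s.chunkDescs, s.chunkDescsOffset)

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // the inconsistent request panics
		}
	}()
	s.evictChunkDescs(2) // offset 2 was already evicted: invariant violated
}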

Environment

  • System information:

Linux 4.9.4-100.fc24.x86_64 x86_64

  • Prometheus version:

v1.5.1 (from hub.docker.com).

  • Prometheus configuration file:
global:
  scrape_interval:     45s
  evaluation_interval: 4m
  scrape_timeout:      45s

rule_files:
  - "/etc/prometheus/production/rules.txt"
  - "/etc/prometheus/production/alerts.txt"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
        - targets: ['localhost:9292']

  - job_name: 'cadvisor'
    static_configs:
         - targets: ['cadvisor:8080']

  - job_name: 'node'
    static_configs:
         - targets: ['node-exporter:9100']

  - job_name: 'federate'
    metrics_path: '/federate'
    honor_labels: true
    params:
      'match[]':
        - '{__name__=~".+"}'
    file_sd_configs:
        - files: ['/etc/prometheus/production/conf.d/*.json', '/etc/prometheus/testing/conf.d/*.json']

alerting:
  alertmanagers:
    - static_configs:
      - targets: ['alertmanager:9093']

  • Logs:
    Last log message prior to crash:
time="2017-02-08T05:22:14Z" level=info msg="Done checkpointing in-memory metrics and chunks in 1m29.702801248s." source="persistence.go:639" 

beorn7 commented Feb 8, 2017

Yeah, found that, too. #2410 addresses it.

beorn7 commented Feb 8, 2017

For context: this was a long-existing bug, but without the panic it went undetected until it had already wreaked havoc and corrupted data.

beorn7 commented Feb 9, 2017

I can still reproduce the error under extreme load, even with #2410. Still investigating...

beorn7 commented Feb 9, 2017

I understand #2412 a bit better now, and also why this was tickled under memory pressure.
Still running more tests to see if this fixes the problem. As it occurs so rarely, it's difficult to reproduce.

beorn7 commented Feb 9, 2017

OK, much more insight by now. In about 6 hours, we'll have enough confidence to say whether what's currently in the release-1.5 branch and the master branch fixes this.

beorn7 commented Feb 10, 2017

Looks good. Will be out in 1.5.2 in a few hours.

beorn7 closed this Feb 10, 2017

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
