Panic: 2.8.0+ When prometheus is asked for a reload we almost every time get `panic: released unknown string` #5480

Open · beorn- opened this issue Apr 18, 2019 · 5 comments


beorn- commented Apr 18, 2019

Bug Report

Since 2.8.x, triggering a Prometheus daemon reload almost always crashes the instance.

The instance sustains a stable stream of ~150k datapoints/s.

We are running on Debian stretch, amd64.

  • System information:

    Linux 4.18.0-0.bpo.1-amd64 x86_64

  • Prometheus version:

    prometheus, version 2.9.1 (branch: HEAD, revision: ad71f27)
    build user: root@09f919068df4
    build date: 20190416-17:50:04
    go version: go1.12.4

  • Prometheus configuration file:

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  external_labels:
      monitor: 'prometheus'

rule_files:
  - "rules/*.yml"
  - "rules/app/*.yml"

scrape_configs:

[..]

remote_write:
  - url: "http://<redacted>:8000/write"
    queue_config:
      capacity: 1000000
      max_shards: 2000
  • Logs:
Apr 18 10:39:59 srv01 systemd[1]: Reloading Monitoring system and time series database.
Apr 18 10:39:59 srv01 prometheus[8124]: level=info ts=2019-04-18T10:39:59.297Z caller=main.go:724 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml 
Apr 18 10:39:59 srv01 systemd[1]: Reloaded Monitoring system and time series database.
Apr 18 10:39:59 srv01 prometheus[8124]: ts=2019-04-18T10:39:59.302Z caller=dedupe.go:111 component=remote level=info queue=0:http://lts01/url msg="Stopping remote storage..."
Apr 18 10:39:59 srv01 prometheus[8124]: ts=2019-04-18T10:39:59.459Z caller=dedupe.go:111 component=remote level=info queue=0:http://lts01/url msg="WAL watcher stopped" queue=0:http://lts01/url
Apr 18 10:40:00 srv01 prometheus[8124]: ts=2019-04-18T10:40:00.444Z caller=dedupe.go:111 component=remote level=info queue=0:http://lts01/url msg="Remote storage stopped."
Apr 18 10:40:00 srv01 prometheus[8124]: panic: released unknown string          
Apr 18 10:40:00 srv01 systemd[1]: prometheus.service: Main process exited, code=exited, status=2/INVALIDARGUMENT 

beorn- changed the title from "Panic: 2.8.0+ When prometheus is asked for a reload we get `panic: released unknown string`" to "Panic: 2.8.0+ When prometheus is asked for a reload we almost every time get `panic: released unknown string`" on Apr 18, 2019


cstyan commented Apr 18, 2019

Just to clarify: I believe the string interning was merged to master while 2.8 was the current release, so it didn't make it into a release until 2.9.

@beorn- any chance you have the whole goroutine dump from when the panic happened?
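
For context, the panic message comes from the reference-counted string-interning pool that remote write uses for label names and values (storage/remote/intern.go, as the traces below show). The sketch below is not the Prometheus code, just a minimal illustration of the pattern and of how releasing a string the pool no longer knows about produces exactly this panic:

```go
package main

import "sync"

// entry tracks how many references exist to one interned string.
type entry struct{ refs int }

// pool is a reference-counted string-interning pool (illustrative only).
type pool struct {
	mtx  sync.Mutex
	pool map[string]*entry
}

func newPool() *pool { return &pool{pool: map[string]*entry{}} }

// intern records s, or bumps its refcount if it is already interned.
func (p *pool) intern(s string) string {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	if e, ok := p.pool[s]; ok {
		e.refs++
		return s
	}
	p.pool[s] = &entry{refs: 1}
	return s
}

// release decrements the refcount of s; releasing a string that is not
// (or no longer) in the pool panics, which is the error in this issue.
func (p *pool) release(s string) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	e, ok := p.pool[s]
	if !ok {
		panic("released unknown string")
	}
	if e.refs--; e.refs == 0 {
		delete(p.pool, s)
	}
}

func main() {
	p := newPool()
	p.intern("job")
	p.release("job") // fine: refcount drops to zero and the entry is removed
	p.release("job") // panics: the string is no longer known to the pool
}
```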


jacob-scheatzle commented Apr 19, 2019

We're also seeing this panic on 2.9.1, but in our case it happens after a WAL checkpoint completes.

  • Version
prometheus, version 2.9.1 (branch: HEAD, revision: ad71f2785fc321092948e33706b04f3150eee44f)
  build user:       root@09f919068df4
  build date:       20190416-17:50:04
  go version:       go1.12.4
  • Logs
panic: released unknown string
goroutine 417 [running]:
github.com/prometheus/prometheus/storage/remote.(*pool).release(0xc0001702c0, 0x0, 0x0)
/app/storage/remote/intern.go:76 +0x1a6
github.com/prometheus/prometheus/storage/remote.release(0xc003e06000, 0xf, 0xf)
/app/storage/remote/queue_manager.go:389 +0x104
github.com/prometheus/prometheus/storage/remote.(*QueueManager).StoreSeries(0xc000f7c820, 0xc02cb2fc00, 0x4, 0x4, 0x1)
/app/storage/remote/queue_manager.go:362 +0x3d1
github.com/prometheus/prometheus/storage/remote.(*WALWatcher).readSegment(0xc0001a64d0, 0xc00f2ba000, 0x1, 0x0, 0x0, 0x0)
/app/storage/remote/wal_watcher.go:419 +0x51b
github.com/prometheus/prometheus/storage/remote.(*WALWatcher).readCheckpoint(0xc0001a64d0, 0xc030107f80, 0x29, 0x0, 0x0)
/app/storage/remote/wal_watcher.go:497 +0x406
github.com/prometheus/prometheus/storage/remote.(*WALWatcher).garbageCollectSeries(0xc0001a64d0, 0x4, 0x4, 0x1)
/app/storage/remote/wal_watcher.go:392 +0x605
github.com/prometheus/prometheus/storage/remote.(*WALWatcher).watch(0xc0001a64d0, 0x4, 0x1, 0x0, 0x0)
/app/storage/remote/wal_watcher.go:314 +0x113b
github.com/prometheus/prometheus/storage/remote.(*WALWatcher).run(0xc0001a64d0, 0x15a25d67, 0x38e19e0)
/app/storage/remote/wal_watcher.go:202 +0x5b5
github.com/prometheus/prometheus/storage/remote.(*WALWatcher).loop(0xc0001a64d0)
/app/storage/remote/wal_watcher.go:159 +0x10d
created by github.com/prometheus/prometheus/storage/remote.(*WALWatcher).Start
/app/storage/remote/wal_watcher.go:143 +0x1ab
prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Unit prometheus.service entered failed state.
prometheus.service failed.
prometheus.service holdoff time over, scheduling restart.
Stopped Prometheus Monitoring framework.
Started Prometheus Monitoring framework.

level=info ts=2019-04-18T23:00:08.747Z caller=compact.go:499 component=tsdb msg="write block" mint=1555617600000 maxt=1555624800000 duration=8.742463531s
level=info ts=2019-04-18T23:00:09.985Z caller=head.go:540 component=tsdb msg="head GC completed" duration=550.925943ms
level=info ts=2019-04-18T23:00:15.215Z caller=head.go:604 component=tsdb msg="WAL checkpoint complete" first=3 last=7 duration=5.22919895s
panic: released unknown string
goroutine 454 [running]:
prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Unit prometheus.service entered failed state.
prometheus.service failed.
prometheus.service holdoff time over, scheduling restart.
Stopped Prometheus Monitoring framework.
Started Prometheus Monitoring framework.
...
level=warn ts=2019-04-18T23:00:16.846Z caller=wal.go:116 component=tsdb msg="last page of the wal is torn, filling it with zeros" segment=/var/lib/prometheus/wal/00000018

beorn- commented Apr 19, 2019

> Just to clarify: I believe the string interning was merged to master while 2.8 was the current release, so it didn't make it into a release until 2.9.
>
> @beorn- any chance you have the whole goroutine dump from when the panic happened?

Nope, but I can reproduce that kind of thing with Delve.


beorn- commented Apr 19, 2019

I am in Delve trying to grab useful information; we now know what's happening, or at least we have the code path:

 0  0x000000000042da70 in runtime.fatalpanic
    at /usr/lib/go/src/runtime/panic.go:690
 1  0x000000000042d4c5 in runtime.gopanic
    at /usr/lib/go/src/runtime/panic.go:565
 2  0x0000000001a16f04 in github.com/prometheus/prometheus/storage/remote.(*pool).release
    at /home/rbaugue/development/prometheus/storage/remote/intern.go:80
 3  0x0000000001a19304 in github.com/prometheus/prometheus/storage/remote.release
    at /home/rbaugue/development/prometheus/storage/remote/queue_manager.go:389
 4  0x0000000001a18a98 in github.com/prometheus/prometheus/storage/remote.(*QueueManager).Stop
    at /home/rbaugue/development/prometheus/storage/remote/queue_manager.go:336
 5  0x0000000001a1ee54 in github.com/prometheus/prometheus/storage/remote.(*Storage).ApplyConfig
    at /home/rbaugue/development/prometheus/storage/remote/storage.go:106
 6  0x0000000001aa2274 in github.com/prometheus/prometheus/storage/remote.(*Storage).ApplyConfig-fm
    at /home/rbaugue/development/prometheus/storage/remote/storage.go:76
 7  0x0000000001a9d725 in main.reloadConfig
    at /home/rbaugue/development/prometheus/cmd/prometheus/main.go:742
 8  0x0000000001aa002d in main.main.func13
    at /home/rbaugue/development/prometheus/cmd/prometheus/main.go:565
 9  0x000000000082d8b7 in github.com/oklog/oklog/pkg/group.(*Group).Run.func1
    at /home/rbaugue/development/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38
10  0x000000000045ca91 in runtime.goexit
    at /usr/lib/go/src/runtime/asm_amd64.s:1337

When I look at frame 2's args I get:

p = *github.com/prometheus/prometheus/storage/remote.pool {
        mtx: sync.RWMutex {
                w: (*sync.Mutex)(0xc000298020),
                writerSem: 0,
                readerSem: 0,
                readerCount: 0,
                readerWait: 0,},
        pool: map[string]*github.com/prometheus/prometheus/storage/remote.entry [

[...]


                "": *nil,
                ...+43870 more
        ],}
s = ""

Which is expected.

@cstyan
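
From the frames above, Stop appears to hand each stored series' label strings back to the interner, and here the string being released is empty (s == ""). Purely as an illustration of that shape (names are made up, this is not the real queue_manager.go code):

```go
package main

// Illustrative stand-in for the release path seen in frames 2-4 above
// (QueueManager.Stop -> release -> (*pool).release).

type label struct{ Name, Value string }

// interner maps interned strings to their refcounts.
type interner map[string]int

func (in interner) intern(s string) { in[s]++ }

func (in interner) release(s string) {
	n, ok := in[s]
	if !ok {
		// the branch hit in storage/remote/intern.go
		panic("released unknown string")
	}
	if n == 1 {
		delete(in, s)
		return
	}
	in[s] = n - 1
}

// releaseSeries hands back all label strings of one stored series.
func releaseSeries(in interner, ls []label) {
	for _, l := range ls {
		in.release(l.Name)
		in.release(l.Value)
	}
}

func main() {
	in := interner{}
	in.intern("__name__")
	in.intern("up")
	// "instance" and "" were never interned (or were already released),
	// so releasing this series panics just like the reload/Stop path above.
	releaseSeries(in, []label{{"__name__", "up"}, {"instance", ""}})
}
```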


richinsr commented Apr 19, 2019

We are seeing the same thing in v2.9 when asking for a reload:


level=info ts=2019-04-19T18:37:05.720Z caller=main.go:724 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2019-04-19T18:37:05.721Z caller=dedupe.go:111 component=remote level=info queue="0:https://<redacted>:8086/api/v1/prom/write?db=foo" msg="Stopping remote storage..."
ts=2019-04-19T18:37:05.905Z caller=dedupe.go:111 component=remote level=info queue="0:https://<redacted>:8086/api/v1/prom/write?db=foo" msg="WAL watcher stopped" queue="0:https://<redacted>:8086/api/v1/prom/write?db=foo"
ts=2019-04-19T18:37:06.013Z caller=dedupe.go:111 component=remote level=info queue="0:https://<redacted>:8086/api/v1/prom/write?db=foo" msg="Remote storage stopped."
panic: released unknown string

goroutine 243 [running]:
github.com/prometheus/prometheus/storage/remote.(*pool).release(0xc00041e9c0, 0x0, 0x0)
	/app/storage/remote/intern.go:76 +0x1a6
github.com/prometheus/prometheus/storage/remote.release(0xc144239680, 0x11, 0x11)
	/app/storage/remote/queue_manager.go:386 +0x104
github.com/prometheus/prometheus/storage/remote.(*QueueManager).Stop(0xc149894680)
	/app/storage/remote/queue_manager.go:333 +0x328
github.com/prometheus/prometheus/storage/remote.(*Storage).ApplyConfig(0xc00018a690, 0xc2a4d36680, 0x0, 0x0)
	/app/storage/remote/storage.go:106 +0x3f4
main.reloadConfig(0x7fff3f37ee23, 0x1e, 0x257bba0, 0xc00093a6f0, 0xc0002e79c0, 0x7, 0x7, 0x0, 0x0)
	/app/cmd/prometheus/main.go:742 +0x285
main.main.func13(0x0, 0x0)
	/app/cmd/prometheus/main.go:565 +0x4ed
github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc0000d05a0, 0xc0000d0480, 0xc000284000)
	/app/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
created by github.com/oklog/oklog/pkg/group.(*Group).Run
	/app/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xbe
level=info ts=2019-04-19T18:37:14.492Z caller=main.go:321 msg="Starting Prometheus" version="(version=2.9.0, branch=HEAD, revision=1a9cdbd024fea05a1888c3164058087476cfdc3b)"
level=info ts=2019-04-19T18:37:14.492Z caller=main.go:322 build_context="(go=go1.12.3, user=root@4d7ed54ba6d9, date=20190415-09:06:49)"
level=info ts=2019-04-19T18:37:14.492Z caller=main.go:323 host_details="(Linux 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 <redacted>(none))"
level=info ts=2019-04-19T18:37:14.492Z caller=main.go:324 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-04-19T18:37:14.492Z caller=main.go:325 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-04-19T18:37:14.494Z caller=main.go:640 msg="Starting TSDB ..."
