
Queue Manager: panic: close of closed channel #5451

Closed
bismarck opened this Issue Apr 10, 2019 · 3 comments

bismarck commented Apr 10, 2019

**Bug Report**

**What did you do?**
Ran Prometheus with remote write enabled.

**What did you expect to see?**
No panic.

**What did you see instead? Under which circumstances?**
Got a panic: `close of closed channel`. I suspect it occurs because `(t *QueueManager) Stop()` is called in the middle of `(t *QueueManager) reshardLoop()`.
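
To illustrate what I mean (this is just a minimal Go sketch, not the actual queue_manager.go code): if both the `Stop()` path and the reshard path end up calling the shards' `stop()`, the second `close` of the same channel panics exactly like this. Guarding the close with a `sync.Once` (or a mutex-protected flag) is one common way to make it idempotent:

```go
package main

import "sync"

// shards here is only a stand-in for the real type, to show the race:
// two callers both reaching close(done) is what produces
// "panic: close of closed channel".
type shards struct {
	done     chan struct{}
	stopOnce sync.Once // hypothetical guard; one possible way to make stop() idempotent
}

func (s *shards) stop() {
	// Without the sync.Once, concurrent calls from Stop() and the reshard
	// path would both execute close(s.done), and the second one would panic.
	s.stopOnce.Do(func() {
		close(s.done)
	})
}

func main() {
	s := &shards{done: make(chan struct{})}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.stop() // safe with the guard; a double close panics without it
		}()
	}
	wg.Wait()
}
```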
**Environment**

  • System information:

    Linux 4.15.0-1037-azure x86_64

  • Prometheus version:

    prometheus, version 2.8.1 (branch: remote-write-fix, revision: 4a913999f7e6f50ac8186f589561498c07e6dd53)
    build user: user@machine
    build date: 20190405-17:39:10
    go version: go1.12.1

This was a build of PR #5429 and Brian's TSDB changes 1099999. I'm pretty sure this issue is present in master.

  • Prometheus configuration file:
remote_write:
  - url: "http://HOST:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 100000
      min_shards: 1
      max_retries: 10
  • Logs:
Apr 06 14:39:49 my-host prometheus[19149]: ts=2019-04-06T14:39:49.5887504Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/w
Apr 06 14:39:49 my-host prometheus[19149]: panic: close of closed channel
Apr 06 14:39:49 my-host prometheus[19149]: goroutine 183 [running]:
Apr 06 14:39:49 my-host prometheus[19149]: github.com/prometheus/prometheus/storage/remote.(*shards).stop(0xc03ee227e0)
Apr 06 14:39:49 my-host prometheus[19149]:         /Users/my.user/Work/github/prometheus/storage/remote/queue_manager.go:585 +0x59
Apr 06 14:39:49 my-host prometheus[19149]: github.com/prometheus/prometheus/storage/remote.(*QueueManager).Stop(0xc06a8da340)
Apr 06 14:39:49 my-host prometheus[19149]:         /Users/my.user/Work/github/prometheus/storage/remote/queue_manager.go:325 +0x236
Apr 06 14:39:49 my-host prometheus[19149]: github.com/prometheus/prometheus/storage/remote.(*Storage).ApplyConfig(0xc0001525b0, 0xc0020d6270, 0x0, 0x0)
Apr 06 14:39:49 my-host prometheus[19149]:         /Users/my.user/Work/github/prometheus/storage/remote/storage.go:106 +0x3f4
Apr 06 14:39:49 my-host prometheus[19149]: main.reloadConfig(0x7ffe37d62d0b, 0x2e, 0x24df6e0, 0xc0007f0270, 0xc0002d0b80, 0x7, 0x7, 0x0, 0x0)
Apr 06 14:39:49 my-host prometheus[19149]:         /Users/my.user/Work/github/prometheus/cmd/prometheus/main.go:742 +0x285
Apr 06 14:39:49 my-host prometheus[19149]: main.main.func13(0xc000129770, 0x125)
Apr 06 14:39:49 my-host prometheus[19149]:         /Users/my.user/Work/github/prometheus/cmd/prometheus/main.go:565 +0x4ed
Apr 06 14:39:49 my-host prometheus[19149]: github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc0002e8180, 0xc0002e80c0, 0xc000252030)
Apr 06 14:39:49 my-host prometheus[19149]:         /Users/my.user/Work/github/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
Apr 06 14:39:49 my-host prometheus[19149]: created by github.com/oklog/oklog/pkg/group.(*Group).Run
Apr 06 14:39:49 my-host prometheus[19149]:         /Users/my.user/Work/github/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xbe
Apr 06 14:39:50 my-host systemd[1]: prometheus.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 06 14:39:50 my-host systemd[1]: prometheus.service: Failed with result 'exit-code'.
Apr 06 14:41:20 my-host systemd[1]: prometheus.service: Service hold-off time over, scheduling restart.
Apr 06 14:41:20 my-host systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 2.
Apr 06 14:41:20 my-host systemd[1]: Stopped Prometheus TSDB service.
Apr 06 14:41:20 my-host systemd[1]: Started Prometheus TSDB service.

cstyan commented Apr 10, 2019

@bismarck do you have the logs from just before the ones you posted? Was there a reshard or config reload taking place?


bismarck commented Apr 10, 2019

@cstyan Looks like both a reshard and reload took place:

Apr 06 14:38:09 my-host prometheus[19149]: ts=2019-04-06T14:38:09.4844057Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Remote storage resharding" from=28 to=1
Apr 06 14:38:39 my-host prometheus[19149]: ts=2019-04-06T14:38:39.4844581Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Remote storage resharding" from=1 to=26
Apr 06 14:38:49 my-host prometheus[19149]: ts=2019-04-06T14:38:49.4843317Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Remote storage resharding" from=26 to=1
Apr 06 14:38:59 my-host prometheus[19149]: ts=2019-04-06T14:38:59.4844702Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Remote storage resharding" from=1 to=32
Apr 06 14:39:09 my-host prometheus[19149]: ts=2019-04-06T14:39:09.4843355Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Remote storage resharding" from=32 to=1
Apr 06 14:39:19 my-host prometheus[19149]: ts=2019-04-06T14:39:19.4843703Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Remote storage resharding" from=1 to=28
Apr 06 14:39:29 my-host prometheus[19149]: ts=2019-04-06T14:39:29.4845374Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Remote storage resharding" from=28 to=1
Apr 06 14:39:49 my-host prometheus[19149]: level=info ts=2019-04-06T14:39:49.5201524Z caller=main.go:724 msg="Loading configuration file" filename=/etc/prometheus/prometheus-platform-config.yml
Apr 06 14:39:49 my-host prometheus[19149]: ts=2019-04-06T14:39:49.5252961Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Stopping remote storage..."
Apr 06 14:39:49 my-host prometheus[19149]: ts=2019-04-06T14:39:49.5887504Z caller=dedupe.go:111 component=remote level=info queue=0:http://10.113.11.11:7201/api/v1/prom/remote/write msg="Remote storage stopped."
Apr 06 14:39:49 my-host prometheus[19149]: panic: close of closed channel
Apr 06 14:39:49 my-host prometheus[19149]: goroutine 183 [running]:

bismarck commented Apr 10, 2019

We adjusted `min_shards` in the remote write config from 1 to 5, and the panics haven't occurred since.
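
For reference, that just means bumping `min_shards` in the `queue_config` from the original report (everything else unchanged):

```yaml
remote_write:
  - url: "http://HOST:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 100000
      min_shards: 5   # was 1; the panics haven't recurred since raising this
      max_retries: 10
```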
