Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus 2: sighup not handled gracefully during initialization phase #2699

Closed
mwitkow opened this Issue May 9, 2017 · 4 comments

Comments

Projects
None yet
2 participants
@mwitkow
Copy link
Contributor

mwitkow commented May 9, 2017

What did you do?
Ran P2 from checkout ff0ee26.
We have an external job that SIGHUPs prometheus when config reloads. There is a potential race condition between the first SIGHUP and prometheus being ready. That bug was fixed in P1, but seems to be back in P2.

What did you see instead? Under which circumstances?

2017-05-09T14:00:15.768151000Z 2017/05/09 14:00:15 dinit: pid 37 finished: [/prometheus/prometheus -config.file=/prometheus.yml -log.level=info -query.staleness-delta=60s -query.max-concurrency=100 -query.timeout=2m -storage.local.path=/prometheus-data -storage.tsdb.appendable-blocks=2 -storage.tsdb.max-block-duration=36h -storage.tsdb.min-block-duration=2h -storage.tsdb.retention=360h -storage.tsdb.no-lockfile=true -web.external-url=http://hubmon2.h-te1.internal.improbable.io:9090/] with error: signal: hangup

The error: signal: hangup shouldn't happen. Note: this is after Prometheus starts with an existing volume and existing data, so the issue may be in the code initializing the TSDB.

@fabxc

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented May 9, 2017

Do you recall an issue for Prometheus 1.x on that? The signal handler registration has always been more towards the bottom. However, the storage was initialized asynchronously whereas TSDB does all initialization blocking in the constructor as it is generally expected to not take longer than a couple of seconds. (Unlike P1 crash recovery IIRC)
I'd generally like to stick with that as synchronous component initialization makes wiring everything together a bit easier to reason about.

Would just moving our signal handler registration to the start of main() fix it?

@mwitkow

This comment has been minimized.

Copy link
Contributor Author

mwitkow commented May 9, 2017

@fabxc There was definitely a bug related to crash-recovery and SIGHUPs, but my issue-search-fu is failing me.

It seems to me that moving at least the registration of a signal handler from:

// Wait for reload or termination signals. Start the handler for SIGHUP as
to above localstorage init will be both safe and would work around the issue.

mwitkow added a commit to improbable-io/prometheus that referenced this issue May 9, 2017

fabxc added a commit that referenced this issue May 10, 2017

@mwitkow

This comment has been minimized.

Copy link
Contributor Author

mwitkow commented May 10, 2017

Closed with the merge of #2700.

@mwitkow mwitkow closed this May 10, 2017

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.