Horcrux crashes and then cannot restart because of pid file #226
Comments
Thank you for the report! This is indeed due to an edge-case bug in the new nonce pre-sharing mechanism in v3.2.0. We have a patch in #227 and will cut a v3.2.1 patch release ASAP.
This edge case being triggered suggests that you might have a different issue as well, though. You may want to increase your
Is there any documentation about these settings I can go over? Also, it seems the PID mechanism needs a change: if the PID in the file doesn't exist, a new instance should be started and the PID updated. My understanding is that the PID file is supposed to stop two instances from running at the same time, so this change would let the instance recover from an unexpected restart or an edge case like this one.
Documentation for these parameters is here, but a fine-tuning guide would be a useful addition. I'd suggest determining the happy path minimum value for both:
I agree this would be a nice feature. Requiring manual removal of the PID file after an unclean shutdown was originally intentional, but I think it is more painful in operation than it is worth.
The PID mechanism is good, but it needs to check whether the process named in the file still exists. If it does not, it is safe to start a new process and update the PID in the file. To avoid any race condition, it could also just delete the PID file and continue to shut down, so that the next restart attempt succeeds (assuming this runs under some auto-restart mechanism such as systemd or Docker restarts).
Yes, exactly. We don't want to remove the PID mechanism, but we can delete the PID file on startup if (and only if) the process with the PID it contains no longer exists. That has been added to the PR with updated tests.
On second thought, the PID file is not necessary: if a new instance tries to run while one is already running, it will fail to bind the port and crash anyway. I would remove the PID mechanism, or at least provide a flag to disable it.
I agree with you. @agouin, correct us if we are wrong. In my setups I had added
and I have never had problems in the past.
Version v3.2.0
Here is the log for the crash
Since this is a crash, the PID file is not removed, and the service cannot restart properly afterwards. The signer stays stuck until the PID file is deleted manually.
I have not seen this error in previous versions, so I am assuming it is a bug in the latest release. For now, the only thing I can think of doing is downgrading to the previous version.