Make PID file handling a little more robust. #6738
First, we now use AtomicWriter to write out PID files, which should make them more resilient to sudden failures. In addition, instead of blindly unwrapping the result of read_pid, which returns a Result, we now check for an error and log it. Crucially, if read_pid returns an error, our branching logic returns None, which causes the corrupt PID file to be deleted and rewritten by the supervisor.

Now a 0-byte PID file is less likely to occur, and even if one does, the supervisor will proceed as though no PID file ever existed instead of crashing.

Signed-off-by: Josh Black <email@example.com>
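To illustrate the two ideas in the description, here is a minimal std-only sketch. It is not the actual diff and does not use Habitat's `AtomicWriter` API. `write_pid_atomically`, `read_pid`, and `existing_pid` are hypothetical helpers, and `eprintln!` stands in for whatever logger the supervisor really uses. The write goes to a temporary file that is synced and then renamed over the target, so a reader never sees a half-written file. The read path turns any error (missing, 0-byte, or non-numeric contents) into `None` instead of panicking.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not Habitat's `AtomicWriter`): write `pid` to a temp
/// file next to `path`, sync it to disk, then rename it over `path`. On POSIX
/// filesystems the rename replaces the target atomically, so readers see either
/// the old complete file or the new complete file, never a partial one.
pub fn write_pid_atomically(path: &Path, pid: u32) -> io::Result<()> {
    let tmp_path = path.with_extension("pid.tmp");
    {
        let mut tmp = File::create(&tmp_path)?;
        write!(tmp, "{}", pid)?;
        // Flush the bytes to disk before the rename makes them visible.
        tmp.sync_all()?;
    }
    fs::rename(&tmp_path, path)
}

/// Hypothetical parser: fails on a missing, empty (0-byte), or non-numeric file.
pub fn read_pid(path: &Path) -> io::Result<u32> {
    let contents = fs::read_to_string(path)?;
    contents.trim().parse::<u32>().map_err(|e| {
        io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{}: {}", path.display(), e),
        )
    })
}

/// Instead of unwrapping, log the error and report "no PID", so the caller
/// behaves as if the PID file never existed and can delete and rewrite it.
pub fn existing_pid(path: &Path) -> Option<u32> {
    if !path.exists() {
        return None;
    }
    match read_pid(path) {
        Ok(pid) => Some(pid),
        Err(err) => {
            // Stand-in for the supervisor's real logging.
            eprintln!("Ignoring unreadable PID file {}: {}", path.display(), err);
            None
        }
    }
}
```

With this shape, a truncated PID file costs one log line and a fresh write rather than a crash, which matches the behavior described above.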
christophermaier left a comment
One behavioral thing to note: if the Supervisor restarts while leaving services up, and one of those services' PID files is corrupted or otherwise unreadable, the Supervisor will now repeatedly try to start that service even though it is still running, because it has no way to truly synchronize its internal state with what is actually happening on the system. I don't think much can be done in that case, though (and it's certainly preferable to crashing the Supervisor). Killing the old process will fix the situation.
A future refactoring where the Supervisor can get the PID information directly from the Launcher will definitely solve that issue, though.