Reconsidering shutdown behaviour #47

Open
istathar opened this issue Aug 2, 2014 · 2 comments

Comments

@istathar
Owner

istathar commented Aug 2, 2014

Redoing the plumbing of Vault and Daemon has led me to speculate that graceful shutdown is not something we need to support.

At present, the various signal handlers set the shutdown MVar, which the running daemon(s) then observe, and which in theory should tell them to stop what they're doing. In practice they are blocked in a foreign ZMQ poll, and nothing will happen until that returns. When it does, handleMessages will do the next piece of work (if there is one) and only then loop back to checking the MVar. That could well be a while.
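A minimal sketch of the flag-checking loop described above, assuming the poll returns `Nothing` when it times out. The loop borrows the `handleMessages` name from the text, but its signature, `idlePoll` (a hypothetical 10 ms poll standing in for the ZMQ one) and `startDaemon` are illustrative assumptions, not the actual Vault code.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Data.Maybe (isJust)

-- | The daemon loop: check the shutdown flag, block in the poll, handle at
-- most one message, repeat. The flag is only looked at between iterations,
-- so a request to stop waits for the poll (and any message it returned).
handleMessages :: MVar () -> IO (Maybe msg) -> (msg -> IO ()) -> IO ()
handleMessages shutdown poll process = loop
  where
    loop = do
      stopping <- isJust <$> tryReadMVar shutdown
      if stopping
        then pure ()
        else do
          m <- poll          -- blocks until a message arrives or the poll times out
          mapM_ process m    -- finish the piece of work we were handed, if any
          loop

-- | Hypothetical poll stand-in: waits 10 ms and reports "no message".
idlePoll :: IO (Maybe String)
idlePoll = threadDelay 10000 >> pure Nothing

-- | Hypothetical: fork a daemon and return an MVar that is filled once its
-- loop has exited. A signal handler would stop it with @putMVar shutdown ()@.
startDaemon :: MVar () -> IO (MVar ())
startDaemon shutdown = do
  done <- newEmptyMVar
  _ <- forkIO (handleMessages shutdown idlePoll putStrLn >> putMVar done ())
  pure done
```

The delay lives in `loop`: the flag is read only before each poll, so stopping takes up to one poll timeout plus one message's worth of work.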

I suppose it's nice that it finishes the next piece of work (or the current one if it actually happens to be in one), but it's not really necessary.

We're worried about failure modes arising from abrupt termination, and in such cases we have no say in whether the finalizers are run (the usual "crash-only software" notion). I'm about to write a watchdog thread that will terminate the program if it gets close to the lock timeout. The only way it can do that is with System.Exit, hard.

On the Ceph side, a write will either complete or be lost. As far as I can tell, either outcome is fine for us.

Meanwhile, back to threads and asyncs: when you (or something like withAsync) call cancel, it just raises ThreadKilled on the target thread. That's just another exception to be bracketed and re-thrown, and likewise it is at the mercy of a foreign call. So that foreign work (i.e., a rados write) will complete before the thread dies.
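A small base-only sketch of that behaviour. `killThread` is `throwTo tid ThreadKilled`, and GHC documents that if the target thread is in a foreign call, the exception is not raised (and `throwTo` does not return) until the call has completed. No real rados write is available here, so the foreign call is modelled with `uninterruptibleMask_`; `fakeForeignWrite` and `cancelDuringWrite` are hypothetical names.

```haskell
import Control.Concurrent
import Control.Exception

-- | Append an entry to a shared event log.
logEvent :: MVar [String] -> String -> IO ()
logEvent events s = modifyMVar_ events (pure . (++ [s]))

-- | Hypothetical stand-in for a blocking foreign call such as a rados write.
-- throwTo cannot interrupt a thread that is inside a foreign call, so the
-- call is modelled with 'uninterruptibleMask_'.
fakeForeignWrite :: MVar [String] -> MVar () -> IO ()
fakeForeignWrite events started = uninterruptibleMask_ $ do
  logEvent events "write started"
  putMVar started ()   -- tell the main thread we are now "inside" the call
  threadDelay 100000   -- the foreign work itself
  logEvent events "write finished"

-- | Hypothetical: fork a worker, kill it while it is inside the write, and
-- return the order in which things happened.
cancelDuringWrite :: IO [String]
cancelDuringWrite = do
  events  <- newMVar []
  started <- newEmptyMVar
  done    <- newEmptyMVar
  tid <- forkIO $
    fakeForeignWrite events started
      `finally` (logEvent events "cleanup" >> putMVar done ())
  takeMVar started
  killThread tid       -- i.e. throwTo tid ThreadKilled; blocks until delivered
  takeMVar done
  readMVar events
```

`cancelDuringWrite` returns `["write started","write finished","cleanup"]`: the kill waits for the write to finish, and the `finally` handler still runs afterwards.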

As implemented so far, I pass the Async(s) back to the main program to wait on; wait sits there until cancelling those Async(s) results in the thread ending. (wait rethrows the ThreadKilled, which means you need to wrap a catch around it. A bit of extra mess, but no problem.)

What this all adds up to is that we don't need to pass around an MVar semaphore to signal shutdown; we can do it with cancel.
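A minimal sketch of shutting daemons down with cancel instead of a shutdown MVar, assuming the ThreadKilled-based cancel described above. `Worker`, `start`, `waitWorker`, `cancelWorker` and `shutdownAll` are hypothetical stand-ins for `Async`, `async`, `wait` and `cancel` from the `async` package, built on `forkFinally` so the sketch needs only `base`.

```haskell
import Control.Concurrent
import Control.Exception

-- | Hypothetical handle modelling an Async: the worker's ThreadId plus a
-- slot that receives its outcome.
data Worker = Worker ThreadId (MVar (Either SomeException ()))

-- | Start a daemon body in its own thread (cf. 'async').
start :: IO () -> IO Worker
start body = do
  outcome <- newEmptyMVar
  tid <- forkFinally body (putMVar outcome)
  pure (Worker tid outcome)

-- | Block until the worker ends, rethrowing its exception (cf. 'wait').
waitWorker :: Worker -> IO ()
waitWorker (Worker _ outcome) = readMVar outcome >>= either throwIO pure

-- | Ask the worker to stop by raising ThreadKilled in it (cf. 'cancel').
cancelWorker :: Worker -> IO ()
cancelWorker (Worker tid _) = killThread tid

-- | Stop every daemon, then wait for each one. ThreadKilled is the
-- expected result of our own cancel; any other exception propagates.
shutdownAll :: [Worker] -> IO ()
shutdownAll ws = do
  mapM_ cancelWorker ws
  mapM_ (\w -> waitWorker w `catch` stoppedByCancel) ws
  where
    stoppedByCancel :: AsyncException -> IO ()
    stoppedByCancel ThreadKilled = pure ()
    stoppedByCancel e            = throwIO e
```

The `catch` around `waitWorker` is the extra mess mentioned above. Later versions of `async` throw `AsyncCancelled` from `cancel` rather than `ThreadKilled`, so with the real library the handler would need to match that instead.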

I'm going to think a bit more about it, but unless you want to talk about it more @christian-marie I'll rip out that MVar mess and go with a pure async implementation.

AfC

@christian-marie
Contributor

The poll should only take 10ms, and waiting for the last received message to be processed was, and still is, the desired behaviour.

As for a watchdog, I'd put that in the withLock helpers in Daemon.hs, and they probably shouldn't use System.Exit; they want to be a little more drastic than that.

I suggest:

```haskell
import System.Posix.Signals

-- abort the whole process with SIGABRT
raiseSignal sigABRT
```
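One way such a watchdog could look, as a sketch: a hypothetical `withWatchdog` wrapper of the kind that might sit inside the withLock helpers. The drastic action is a parameter so the sketch can be exercised harmlessly; in production it would be `raiseSignal sigABRT` as above. Plain System.Exit would not do from a watchdog thread anyway: GHC's documentation notes that `exitWith` called from a thread other than the main one only throws its exit exception in that thread and does not end the process.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar
import Control.Exception (bracket)

-- | Hypothetical helper: run 'action'; if it is still running after
-- 'deadlineUs' microseconds, run 'onExpiry' from a separate watchdog thread.
-- In production 'onExpiry' would be something like @raiseSignal sigABRT@
-- (System.Posix.Signals, from the unix package); it is a parameter here so
-- the sketch can be exercised without killing the process.
withWatchdog :: Int -> IO () -> IO a -> IO a
withWatchdog deadlineUs onExpiry action =
  bracket
    (forkIO (threadDelay deadlineUs >> onExpiry)) -- arm the watchdog
    killThread                                    -- disarm it however the action ends
    (const action)
```

Because the watchdog is disarmed in the `bracket` release step, an action that finishes (or fails) in time never triggers it.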

As for the child thread merely receiving ThreadKilled being safe: whilst it'll probably complete the foreign call it is in, that certainly does not guarantee that an entire operation is complete. So we could have written one bucket, or one latest file, and so on. In theory the system is resistant to this, but we really need to audit it.
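A small sketch of that concern, again modelling each foreign write with `uninterruptibleMask_`. The hypothetical `writeBatch` does a "bucket" write and then a "latest" write; a cancel that arrives during the first lets it finish but stops the second. Wrapping the whole operation in `uninterruptibleMask_` (`writeBatchAtomically`) is one possible way, not something proposed in this thread, to make a cancel land only between complete operations, at the cost that shutdown then waits for the whole operation.

```haskell
import Control.Concurrent
import Control.Exception (uninterruptibleMask_)

-- | One storage write (hypothetical): signal that it has begun, block for a
-- while as a foreign write would, then record its effect in 'store'.
storeWrite :: MVar [String] -> MVar () -> String -> IO ()
storeWrite store inFlight name = do
  _ <- tryPutMVar inFlight ()
  threadDelay 50000
  modifyMVar_ store (pure . (++ [name]))

-- | Two writes, each uninterruptible on its own; a cancel can land between them.
writeBatch :: MVar [String] -> MVar () -> IO ()
writeBatch store inFlight = do
  uninterruptibleMask_ (storeWrite store inFlight "bucket")
  uninterruptibleMask_ (storeWrite store inFlight "latest")

-- | The same writes as one unit: a cancel is held off until both are done.
writeBatchAtomically :: MVar [String] -> MVar () -> IO ()
writeBatchAtomically store inFlight = uninterruptibleMask_ $ do
  storeWrite store inFlight "bucket"
  storeWrite store inFlight "latest"

-- | Run a batch, cancel it as soon as the first write is in flight, and
-- report which writes reached the store.
cancelMidBatch :: (MVar [String] -> MVar () -> IO ()) -> IO [String]
cancelMidBatch batch = do
  store    <- newMVar []
  inFlight <- newEmptyMVar
  done     <- newEmptyMVar
  tid <- forkFinally (batch store inFlight) (\_ -> putMVar done ())
  takeMVar inFlight
  killThread tid
  takeMVar done
  readMVar store
```

`cancelMidBatch writeBatch` returns `["bucket"]`, while `cancelMidBatch writeBatchAtomically` returns `["bucket","latest"]`.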

@christian-marie
Contributor

I'd be careful not to confuse two completely separate issues here:

  1. Behaviour on unexpected failure/loss of networking.
  2. Behaviour on shutdown.
