httpServe appears to be interfering with slave-thread #74

bitemyapp · 2015-06-24T20:25:20Z

Children aren't getting harvested on ctrl-c in GHCi, can see their output continue even though the web server itself successfully shut down.

I think slave-thread needs 1 a ThreadKilled 2 exception to bubble up to the parent thread (the one in main which invoked httpServe) in order for it to do its thing.

This looks like a good candidate for starting to narrow things down.

I'll try to put together a minimal repro in a bit.

Some context from IRC:

< mightybyte> One of our apps started exhibiting this behavior a few months back.
< bitemyapp> I think perhaps it is not letting the ThreadKilled bubble up.
< mightybyte> And I think it happened around the time that some code was added that probably forked some threads.
< mightybyte> So that all lines up with what you're saying.

The text was updated successfully, but these errors were encountered:

bitemyapp · 2015-06-24T20:50:52Z

Repro project: https://github.com/bitemyapp/snap-slave-thread-repro/blob/master/src/Main.hs#L34-L55 (lts-2.14)

Screenshot:

bitemyapp · 2015-06-24T21:29:17Z

Also :main vs. main makes no difference.

gregorycollins · 2015-06-25T05:06:21Z

What's the desired behaviour? You want httpServe to rethrow ThreadKilled?

We can do that if necessary, but the repro example doesn't seem kosher to me. Up to this point the idea has been that the server loops forever until it gets ThreadKilled from outside, and then it exits. Rethrowing it seemed superfluous, since the only reason httpServe would exit is if it got a ThreadKilled.

Looking at the example:

do
    ...
    SlaveThread.fork canary
    _ <- try $ httpServe conf site :: IO (Either SomeException ())
    cleanup

I think the only reason you would get the ctrl-c semantics you want in normal operation is that the main thread is special and uncaught exceptions from there kill the whole program immediately. And really immediately -- child threads don't get gracefully killed in this scenario.

Anyways it should probably look like this:

do
    ...
    bracket (SlaveThread.fork canary) killThread $ const $ do
        void $ try $ httpServe conf site :: IO (Either SomeException ())
        cleanup   -- cleanup should be bracketed here too, btw!

I think if you write it in this style the issue wouldn't occur.

mightybyte · 2015-06-25T14:17:36Z

Greg's bracket solution seems to work for me.

gregorycollins · 2015-06-25T17:26:26Z

...and if you want to do it really really correctly, you should wait for the child thread to finish processing before you exit from main --- these days I would use the async package for this, your bracket cleanup task turns into \x -> cancel x >> wait x in that scenario.

bitemyapp · 2015-06-25T19:08:05Z

@gregorycollins I would expect httpServe to rethrow ThreadKilled so that any cleanup/shutdown not strictly part of Snap can get the appropriate notice to do what it needs to do. I don't know why I would want my web server to hide SIGINT or similar signals.

This reproduction is not a perfect simulation of what we're doing in our application.

In our application, it's doing a SlaveThread.fork from the App init.

app :: SnapletInit App App
app = makeSnaplet "Server" "Server" Nothing $ do
...
    spawnAgent frequencyInMinutes f = liftIO $ SlaveThread.fork $ forever
        (threadDelay (frequencyInMinutes * 1000000 * 60) >>
            logExceptions f)

Snap preventing slave-thread from being able to do its thing when we're forking from within the httpServe invocation seems even worse than breaking the separately forked canary even though that should work fine as well.

Your suggested edit to the example doesn't fix the problem and isn't adaptable to the "real code". I could modify the example to be more like the real thing, but I don't think that'll help. I don't know the threadId of everything I forked via SlaveThread at the top-level where I fire off httpServe. The whole point of SlaveThread is, "when I die, kill my children". If you don't let the parent thread die, nobody that uses slave-thread can use Snap. Much of the point of slave-thread is to avoid using async for this particular problem.

bitemyapp · 2015-06-25T19:53:00Z

@gregorycollins @mightybyte Would you be willing to make it stop hiding ThreadKilled? Hiding ThreadKilled is like catching and gobbling OutOfMemoryError on the JVM.

gregorycollins · 2015-06-26T15:49:43Z

@bitemyapp I don't agree that ThreadKilled is in any way like OutOfMemoryError. ThreadKilled is a signal that the thread should clean itself up and exit, and that's what httpServe does. Semantically httpServe is bottom, i.e. it will never return unless it receives an async thread, so it "owns" the thread now and getting ThreadKilled forces it to exit. To me it's not cut and dried that rethrowing is preferable here, and changing this might break clients if they depend on the existing behavior.

The whole point of SlaveThread is, "when I die, kill my children".

I think you're misunderstanding the semantics here, read the implementation: if you call SlaveThread.fork from a normal thread, you don't get cleanup of children on exit -- and think about it, if you did, we wouldn't be having this discussion because your child threads would have killed themselves. That behaviour only happens for threads themselves forked by SlaveThread.fork (which makes sense, the forkIO'd thread has to be wrapped by the slave-thread cleanup routing).

Besides, like I said before, I don't think your finally handlers get a chance to run in the event of main-thread seppuku, i.e. you've been relying on unclean shutdown. A couple of questions before we consider changing this default:

does this work for you if you run the snap server from a forked slave-thread? You should get the slave-thread cleanup for the subthreads in this case.
in the short term why not try:

httpServe foo `finally` throwIO ThreadKilled

bitemyapp · 2015-06-26T16:36:09Z

@gregorycollins I did try the latter, didn't fix it. I'm left with a few possibilities.

GHCi is special (I don't think that applies here, but I could be wrong)
The way Snap manages the thread-pool is causing slave-thread not to clean up after itself when the web server dies (the forks are from within the app init)

So, I think you're right, if my rethrow didn't fix it – changing it won't fix it either. Do you have any ideas as to why a SlaveThread.fork from App init wouldn't clean up when the server gets ctrl-c?

gregorycollins · 2015-06-26T16:43:34Z

I mean, I wouldn't expect for those slave threads to be cleaned up, not unless

the thread that forked them was itself run by SlaveThread.fork, or
you explicitly killed them yourself using killThread in a finally or bracket block

gregorycollins · 2015-06-26T16:44:48Z

Also a potential reason why the behaviour is different in ghci: you don't get the main-thread-seppuku "feature" there because the ghci interpreter thread is not the main thread.

bitemyapp · 2015-06-26T16:48:12Z

@gregorycollins GHCi does still show the ThreadKilled exception, seems like GHCi shouldn't break that behavior but I don't know what it really does in this case.

I'm pretty sure your first item is impossible because it doesn't seem like Snap provides a way to control or configure thread pool management.

Second item is ~roughly what I was thinking I would have to do, but I don't have the ability to do this from outside httpServe without registering the ThreadIds somewhere.

gregorycollins · 2015-06-26T16:54:06Z

Running the snap initializer in a slave-thread itself would mean that threads it directly forked (using SlaveThread.fork) would get cleaned up on exit. Of course, that just means that you'd have to write a latch in the main thread to wait for clean shutdown on ctrl-c there.

If the bracket idiom is too much of a PITA you can try ResourceT.

bitemyapp · 2015-06-26T16:58:30Z

@gregorycollins I might try SlaveThread forking the httpServe and bracketing on an async value or something so I know when it ended.

bitemyapp · 2015-06-26T16:59:41Z

@gregorycollins you kicking this around with me is helping clear up my confusion on this, thank you very much!

bitemyapp · 2015-06-29T21:03:59Z

@gregorycollins I tested forking the httpServe invocation and cleanup via SlaveThread.fork and added a 10 second timeout to see if killing the thread would also kill the children -- it did not.

It seems right now like the most direct way to work around this is to have a manual registry of the task runner threads and harvest those on cleanup.

gregorycollins · 2015-07-01T16:41:52Z

Yes, I think you should keep a list of the worker threads. Alternatively if you're forking this thread within a snaplet initializer then we provide

onUnload :: IO () -> Initializer b v ()

You can use this to kill the thread when the snaplet is torn down.

bitemyapp · 2015-07-01T18:31:26Z

@gregorycollins this is a good alternative I'll keep in mind. Instead, we're catching exceptions from httpServe and performing cleanup in main.

In fact, there was a problem with using a lifted try from enclosed exceptions which you can see here.

I had to change it back to Control.Exception.try for it to work.

I just keep a TVar of a Services product type that tracks what services are live. They stash their ThreadId in the TVar on App Init. This works for now, but making it so users could provide their own fork for forkIO and forkOn in snap-server would make it feasible to use SlaveThread and Snap together.

gregorycollins · 2015-07-02T17:26:33Z

Code using try here still smells bad -- you can get another async exception between httpServe finishing and your cleanup handler running. IMO plain old bracket is best, but if the nesting is too onerous then ResourceT would be the next recommendation.

To do this kind of thing properly you have to carefully mask exceptions at the right places and make sure you don't leave a cancellation point that could cause an exception to terminate your computation unexpectedly. This is easy to get wrong and I have definitely gotten it wrong many times in the past. ResourceT will largely automate this for you while allowing you to keep your straight-line monadic code.

bitemyapp · 2015-07-04T21:49:44Z

@gregorycollins IIUC – I would be wrapping httpServe in bracket and putting all the cleanup logic in there? Seems reasonable.

gregorycollins · 2015-07-06T18:07:54Z

More or less: you should bracket wherever you're creating a slave thread using forkIO. E.g.

bracket (forkIO slaveThread) killThread $ const $ httpServe ...

When using bracket and I have to allocate more than one thing that requires destruction, I usually define construction/destruction helper functions to do the work. It's tricky because a lot of times you need to use bracketOnError in the construction function to make sure either all of the values get created or non of them do.

bitemyapp · 2015-07-06T18:31:07Z

@gregorycollins the try isn't for cleanup though, it's so we can log the error if one occurred. bracket doesn't do anything for us there AFAICT. The cleanup code is just piggy-backing off the fact that we already had to try so may as well do the cleanup after logging the error.

Is try still appropriate given that?

gregorycollins · 2015-07-07T17:29:22Z

Personally I would wrap the bracket with a surrounding catch, since bracket will rethrow -- by the time you end up in your exception handler (where you can log) all of the resources should be cleaned up.

bitemyapp · 2015-07-07T18:35:06Z

@gregorycollins seems sensible, I'll kick that around. Thank you!

bitemyapp mentioned this issue Jun 29, 2015

export slaves nikita-volkov/slave-thread#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

httpServe appears to be interfering with slave-thread #74

httpServe appears to be interfering with slave-thread #74

bitemyapp commented Jun 24, 2015

bitemyapp commented Jun 24, 2015

bitemyapp commented Jun 24, 2015

gregorycollins commented Jun 25, 2015

mightybyte commented Jun 25, 2015

gregorycollins commented Jun 25, 2015

bitemyapp commented Jun 25, 2015

bitemyapp commented Jun 25, 2015

gregorycollins commented Jun 26, 2015

bitemyapp commented Jun 26, 2015

gregorycollins commented Jun 26, 2015

gregorycollins commented Jun 26, 2015

bitemyapp commented Jun 26, 2015

gregorycollins commented Jun 26, 2015

bitemyapp commented Jun 26, 2015

bitemyapp commented Jun 26, 2015

bitemyapp commented Jun 29, 2015

gregorycollins commented Jul 1, 2015

bitemyapp commented Jul 1, 2015

gregorycollins commented Jul 2, 2015

bitemyapp commented Jul 4, 2015

gregorycollins commented Jul 6, 2015

bitemyapp commented Jul 6, 2015

gregorycollins commented Jul 7, 2015

bitemyapp commented Jul 7, 2015

httpServe appears to be interfering with slave-thread #74

httpServe appears to be interfering with slave-thread #74

Comments

bitemyapp commented Jun 24, 2015

bitemyapp commented Jun 24, 2015

bitemyapp commented Jun 24, 2015

gregorycollins commented Jun 25, 2015

mightybyte commented Jun 25, 2015

gregorycollins commented Jun 25, 2015

bitemyapp commented Jun 25, 2015

bitemyapp commented Jun 25, 2015

gregorycollins commented Jun 26, 2015

bitemyapp commented Jun 26, 2015

gregorycollins commented Jun 26, 2015

gregorycollins commented Jun 26, 2015

bitemyapp commented Jun 26, 2015

gregorycollins commented Jun 26, 2015

bitemyapp commented Jun 26, 2015

bitemyapp commented Jun 26, 2015

bitemyapp commented Jun 29, 2015

gregorycollins commented Jul 1, 2015

bitemyapp commented Jul 1, 2015

gregorycollins commented Jul 2, 2015

bitemyapp commented Jul 4, 2015

gregorycollins commented Jul 6, 2015

bitemyapp commented Jul 6, 2015

gregorycollins commented Jul 7, 2015

bitemyapp commented Jul 7, 2015