Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

httpServe appears to be interfering with slave-thread #74

Open
bitemyapp opened this issue Jun 24, 2015 · 24 comments
Open

httpServe appears to be interfering with slave-thread #74

bitemyapp opened this issue Jun 24, 2015 · 24 comments

Comments

@bitemyapp
Copy link

Children aren't getting harvested on ctrl-c in GHCi, can see their output continue even though the web server itself successfully shut down.

I think slave-thread needs 1 a ThreadKilled 2 exception to bubble up to the parent thread (the one in main which invoked httpServe) in order for it to do its thing.

This looks like a good candidate for starting to narrow things down.

I'll try to put together a minimal repro in a bit.

Some context from IRC:

< mightybyte> One of our apps started exhibiting this behavior a few months back.
< bitemyapp> I think perhaps it is not letting the ThreadKilled bubble up.
< mightybyte> And I think it happened around the time that some code was added that probably forked some threads.
< mightybyte> So that all lines up with what you're saying.
@bitemyapp
Copy link
Author

@bitemyapp
Copy link
Author

Also :main vs. main makes no difference.

@gregorycollins
Copy link
Member

What's the desired behaviour? You want httpServe to rethrow ThreadKilled?

We can do that if necessary, but the repro example doesn't seem kosher to me. Up to this point the idea has been that the server loops forever until it gets ThreadKilled from outside, and then it exits. Rethrowing it seemed superfluous, since the only reason httpServe would exit is if it got a ThreadKilled.

Looking at the example:

do
    ...
    SlaveThread.fork canary
    _ <- try $ httpServe conf site :: IO (Either SomeException ())
    cleanup

I think the only reason you would get the ctrl-c semantics you want in normal operation is that the main thread is special and uncaught exceptions from there kill the whole program immediately. And really immediately -- child threads don't get gracefully killed in this scenario.

Anyways it should probably look like this:

do
    ...
    bracket (SlaveThread.fork canary) killThread $ const $ do
        void $ try $ httpServe conf site :: IO (Either SomeException ())
        cleanup   -- cleanup should be bracketed here too, btw!

I think if you write it in this style the issue wouldn't occur.

@mightybyte
Copy link
Member

Greg's bracket solution seems to work for me.

@gregorycollins
Copy link
Member

...and if you want to do it really really correctly, you should wait for the child thread to finish processing before you exit from main --- these days I would use the async package for this, your bracket cleanup task turns into \x -> cancel x >> wait x in that scenario.

@bitemyapp
Copy link
Author

@gregorycollins I would expect httpServe to rethrow ThreadKilled so that any cleanup/shutdown not strictly part of Snap can get the appropriate notice to do what it needs to do. I don't know why I would want my web server to hide SIGINT or similar signals.

This reproduction is not a perfect simulation of what we're doing in our application.

In our application, it's doing a SlaveThread.fork from the App init.

app :: SnapletInit App App
app = makeSnaplet "Server" "Server" Nothing $ do
...
    spawnAgent frequencyInMinutes f = liftIO $ SlaveThread.fork $ forever
        (threadDelay (frequencyInMinutes * 1000000 * 60) >>
            logExceptions f)

Snap preventing slave-thread from being able to do its thing when we're forking from within the httpServe invocation seems even worse than breaking the separately forked canary even though that should work fine as well.

Your suggested edit to the example doesn't fix the problem and isn't adaptable to the "real code". I could modify the example to be more like the real thing, but I don't think that'll help. I don't know the threadId of everything I forked via SlaveThread at the top-level where I fire off httpServe. The whole point of SlaveThread is, "when I die, kill my children". If you don't let the parent thread die, nobody that uses slave-thread can use Snap. Much of the point of slave-thread is to avoid using async for this particular problem.

@bitemyapp
Copy link
Author

@gregorycollins @mightybyte Would you be willing to make it stop hiding ThreadKilled? Hiding ThreadKilled is like catching and gobbling OutOfMemoryError on the JVM.

@gregorycollins
Copy link
Member

@bitemyapp I don't agree that ThreadKilled is in any way like OutOfMemoryError. ThreadKilled is a signal that the thread should clean itself up and exit, and that's what httpServe does. Semantically httpServe is bottom, i.e. it will never return unless it receives an async thread, so it "owns" the thread now and getting ThreadKilled forces it to exit. To me it's not cut and dried that rethrowing is preferable here, and changing this might break clients if they depend on the existing behavior.

The whole point of SlaveThread is, "when I die, kill my children".

I think you're misunderstanding the semantics here, read the implementation: if you call SlaveThread.fork from a normal thread, you don't get cleanup of children on exit -- and think about it, if you did, we wouldn't be having this discussion because your child threads would have killed themselves. That behaviour only happens for threads themselves forked by SlaveThread.fork (which makes sense, the forkIO'd thread has to be wrapped by the slave-thread cleanup routing).

Besides, like I said before, I don't think your finally handlers get a chance to run in the event of main-thread seppuku, i.e. you've been relying on unclean shutdown. A couple of questions before we consider changing this default:

  • does this work for you if you run the snap server from a forked slave-thread? You should get the slave-thread cleanup for the subthreads in this case.
  • in the short term why not try:
httpServe foo `finally` throwIO ThreadKilled

@bitemyapp
Copy link
Author

@gregorycollins I did try the latter, didn't fix it. I'm left with a few possibilities.

  1. GHCi is special (I don't think that applies here, but I could be wrong)
  2. The way Snap manages the thread-pool is causing slave-thread not to clean up after itself when the web server dies (the forks are from within the app init)

So, I think you're right, if my rethrow didn't fix it – changing it won't fix it either. Do you have any ideas as to why a SlaveThread.fork from App init wouldn't clean up when the server gets ctrl-c?

@gregorycollins
Copy link
Member

I mean, I wouldn't expect for those slave threads to be cleaned up, not unless

  1. the thread that forked them was itself run by SlaveThread.fork, or
  2. you explicitly killed them yourself using killThread in a finally or bracket block

@gregorycollins
Copy link
Member

Also a potential reason why the behaviour is different in ghci: you don't get the main-thread-seppuku "feature" there because the ghci interpreter thread is not the main thread.

@bitemyapp
Copy link
Author

@gregorycollins GHCi does still show the ThreadKilled exception, seems like GHCi shouldn't break that behavior but I don't know what it really does in this case.

I'm pretty sure your first item is impossible because it doesn't seem like Snap provides a way to control or configure thread pool management.

Second item is ~roughly what I was thinking I would have to do, but I don't have the ability to do this from outside httpServe without registering the ThreadIds somewhere.

@gregorycollins
Copy link
Member

Running the snap initializer in a slave-thread itself would mean that threads it directly forked (using SlaveThread.fork) would get cleaned up on exit. Of course, that just means that you'd have to write a latch in the main thread to wait for clean shutdown on ctrl-c there.

If the bracket idiom is too much of a PITA you can try ResourceT.

@bitemyapp
Copy link
Author

@gregorycollins I might try SlaveThread forking the httpServe and bracketing on an async value or something so I know when it ended.

@bitemyapp
Copy link
Author

@gregorycollins you kicking this around with me is helping clear up my confusion on this, thank you very much!

@bitemyapp
Copy link
Author

@gregorycollins I tested forking the httpServe invocation and cleanup via SlaveThread.fork and added a 10 second timeout to see if killing the thread would also kill the children -- it did not.

It seems right now like the most direct way to work around this is to have a manual registry of the task runner threads and harvest those on cleanup.

@gregorycollins
Copy link
Member

Yes, I think you should keep a list of the worker threads. Alternatively if you're forking this thread within a snaplet initializer then we provide

onUnload :: IO () -> Initializer b v ()

You can use this to kill the thread when the snaplet is torn down.

@bitemyapp
Copy link
Author

@gregorycollins this is a good alternative I'll keep in mind. Instead, we're catching exceptions from httpServe and performing cleanup in main.

In fact, there was a problem with using a lifted try from enclosed exceptions which you can see here.

I had to change it back to Control.Exception.try for it to work.

I just keep a TVar of a Services product type that tracks what services are live. They stash their ThreadId in the TVar on App Init. This works for now, but making it so users could provide their own fork for forkIO and forkOn in snap-server would make it feasible to use SlaveThread and Snap together.

@gregorycollins
Copy link
Member

Code using try here still smells bad -- you can get another async exception between httpServe finishing and your cleanup handler running. IMO plain old bracket is best, but if the nesting is too onerous then ResourceT would be the next recommendation.

To do this kind of thing properly you have to carefully mask exceptions at the right places and make sure you don't leave a cancellation point that could cause an exception to terminate your computation unexpectedly. This is easy to get wrong and I have definitely gotten it wrong many times in the past. ResourceT will largely automate this for you while allowing you to keep your straight-line monadic code.

@bitemyapp
Copy link
Author

@gregorycollins IIUC – I would be wrapping httpServe in bracket and putting all the cleanup logic in there? Seems reasonable.

@gregorycollins
Copy link
Member

More or less: you should bracket wherever you're creating a slave thread using forkIO. E.g.

bracket (forkIO slaveThread) killThread $ const $ httpServe ...

When using bracket and I have to allocate more than one thing that requires destruction, I usually define construction/destruction helper functions to do the work. It's tricky because a lot of times you need to use bracketOnError in the construction function to make sure either all of the values get created or non of them do.

@bitemyapp
Copy link
Author

@gregorycollins the try isn't for cleanup though, it's so we can log the error if one occurred. bracket doesn't do anything for us there AFAICT. The cleanup code is just piggy-backing off the fact that we already had to try so may as well do the cleanup after logging the error.

Is try still appropriate given that?

@gregorycollins
Copy link
Member

Personally I would wrap the bracket with a surrounding catch, since bracket will rethrow -- by the time you end up in your exception handler (where you can log) all of the resources should be cleaned up.

@bitemyapp
Copy link
Author

@gregorycollins seems sensible, I'll kick that around. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants