New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
potential deadlock in DefaultApplicationLifecycle #7949
Comments
Just curious, is this something you encountered in a real Play app? I'd be ok with using the default thread pool here though. I can't see a benefit to using trampoline. |
Yeah it's fine to use the default thread pool here. The trampoline is a performance measure that's not really needed during shutdown. If we use the default thread pool we'd want to be careful about how it's shut down. It would need to be the last shutdown hook executed. Perhaps it would be safest for the shutdown code to make its own temporary thread or thread pool. |
I will be taking this.. :) |
@gmethvin, Yes, I did encounter this in a real app. We use a DB connection pool and, since we use a blocking driver, a dedicated thread pool that runs all the code that interacts with the DB. The thread pool registers its shutdown hook first, and then the connection pool registers its shutdown hook, and of course the latter runs the DB shutdown code in the aforementioned thread pool. This led to the issue I observed. Your remark about Play controlling the thread that |
@mberndt123 Sorry, that's not what I meant to say. The problem is that Play currently assumes it has the freedom to decide where the stop hooks are executed. The assumption is that you don't care where the thread that creates the future is executed, and if you had something that was sensitive to which thread pool it's in, you would return a Future that explicitly uses a specific thread pool. To be clear, I think this should be changed to use the default execution context. I'm just explaining the logic behind the way it was done originally. I would recommend we also update the documentation to specify that all stop hooks are executed in the default execution context. That way this behavior can be relied upon in the future. |
@gmethvin, IIRC Play's default execution context is the actor system's dispatcher. Since the actor system itself is also shut down using a stop hook, I doubt that's going to work. To me it seems that the stop hooks should run in the thread that calls the ApplicationLoader's load method when the application is started, since that is around before everything else. That would mean blocking that thread while it's waiting for the futures to complete – I guess that's acceptable in this case. |
Are you observing the shutdown hooks running in reverse order, because that's what should be happening? The DB shutdown code should run before the thread pool shutdown code, so the DB shutdown code should have a normal, functioning thread pool while it's shutting down. |
@richdougherty, I don't think it's useful to discuss the details of my application. I have written the above example code precisely to allow you to understand the problem without having to worry about any of that stuff, and I assure you that it captures the essence of what's going on in my application. I have also updated it just now to print some more useful messages and trigger the bug more consistently. If you need any more information to reproduce, understand or fix this, please let me know… |
Hi @mberndt123, thanks for the info you've provided so far. You have indeed clearly shown how there can be a dependency between the shutdown code for a thread pool and the thread pool itself, resulting in a deadlock. I think we can straightforwardly reproduce and fix the bug based on the code you showed above. My question was about something slightly different. Your message here suggests an interaction between two resources may be occurring, i.e. the thread pool should not shut down before the database is shut down. The way you phrased that suggested that the thread pool and database shutdown ordering is incorrect. I.e. the database depends on the thread pool but the thread pool shuts down first. That should not happen. The logic of shutdown hook registration and execution should prevent issues with shutting down a thread pool before a thread pool consumer has been shut down. If you're seeing issues with a thread pool shutting down before some resource that depends on it, then that's a subtly different bug for us to consider. If there's a bug like this then it wouldn't just be restricted to thread pools, and it wouldn't be fixed by running shutdown hooks on a safe threadpool. For example, this bug could occur if we shut down a cache before we shut down something depending on that cache - nothing to do with thread pools at all. If you could clarify what you meant in that message and whether you believe the shutdown hooks are running in the correct order then that would be helpful for us, so we can fix all the bugs in this area. |
Hi @richdougherty, I'm surprised you think so, because I don't think my phrasing suggests that. In any case, it is not what I meant, the shutdown hooks run in the correct order in my application. //edit: and by correct, I mean the reverse order of how they were registered |
Thanks for confirming. |
Hi there,
I ran into an interesting bug today. In
DefaultApplicationLifecycle
sstop()
method, there is a comment that reads as follows:The problem is, we actually do care if "one hook" is a hook that synchronously shuts down a thread pool (i. e. it waits for all threads to terminate) and "another hook" is a hook that executes in a thread in that thread pool. Then the hook that waits for all threads to terminate will itself run in the thread pool, leading to a deadlock.
If that isn't clear, here's some code to demonstrate the problem.
This will print "timeout" after 3 seconds.
The text was updated successfully, but these errors were encountered: