Fixed-size thread pool leaks on connection recovery #97
Comments
Feel free to submit a PR once you are more certain :)
|
Will do! I actually ended up going with a workaround where I pass a custom executor factory:

```ruby
connection = MarchHare.connect(..., executor_factory: CachedWorkaroundFactory.new(size))
```

and its implementation:

```ruby
class CachedWorkaroundFactory
  def initialize(size)
    @pool = MarchHare::ThreadPools.fixed_of_size(size)
  end

  def call
    @pool
  end
end
```

This tricks march hare into reusing the same thread pool instead of shutting down a pool and then booting up another, which is generally faster and more efficient (shutting down the pool has the advantage that any submitted tasks may still take some time to finish, so you may still have double the expected number of threads processing concurrently for a time). I am a bit unsure why exactly march hare doesn't do this by default and always creates a new pool. The code for this was added in 74ef8c7 in early 2014, so I'm guessing that at the time the pool was only ever created once (as there was no connection recovery) and that this decision was never revisited when connection recovery was added. Is that what happened, or am I missing some important detail in how the pool is used that means it should not be reused in such a way? |
Reusing a pool is not generally safe.
|
Using a cached executor service is a better option than what you have: it won't run into most of the issues with thread pool reuse, and idle threads will be cleaned up after 60 seconds of inactivity. |
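As a sketch of that suggestion (the class name and connection parameters here are hypothetical, and it assumes, as in the workaround above, that `executor_factory` only needs an object responding to `call`):

```ruby
require 'march_hare'
java_import java.util.concurrent.Executors

# Sketch of the suggestion above (class name and wiring are hypothetical).
# Each call hands march hare a fresh *cached* executor: it grows on demand and
# kills threads that have been idle for 60 seconds, so a pool abandoned after
# a recovery drains itself instead of pinning a fixed number of threads.
class CachedExecutorFactory
  def call
    Executors.new_cached_thread_pool
  end
end

connection = MarchHare.connect(
  uri: 'amqp://guest:guest@localhost',        # example endpoint
  executor_factory: CachedExecutorFactory.new
)
```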
My guess is that we need to make sure the consumer work pool is shut down before performing recovery. The problem is, there is no easy way to pick a timeout for the currently running threads to finish before forcing shutdown with |
I'm not sure reusing a pool is a problem in this case. According to https://www.rabbitmq.com/api-guide.html in the section "Consumer thread pool" they state that
So this means that the rabbitmq Java client will not "harm" or make the pool unusable in any way.
Since the pool is created by march hare using the factory, march hare is the sole user of that pool (unless someone implements a factory that leaks the pool in other ways). This is what led me to the "cached pool" workaround I suggested. |
See above. Both in the Java client and March Hare, it is not generally safe to reuse a pool because there can be things that are waiting to shut it down. This may or may not be a common occurrence in practice, which is why my comment above says "generally." Anyhow, the only solution I see at the moment is listed above. |
I'm still a bit unconvinced about the unsafety of reusing pools, so I've been looking at both the java rabbitmq implementation and march_hare in an attempt to confirm or disprove my thesis. This is my attempt to document my findings. So, starting with the java rabbitmq client: how can a thread pool be made invalid? Looking at the
Where are pools shut down?
The two lines of code above are all the

An important detail here is that any threads already working on work items will not observe that the connection is gone and will keep working on those tasks. Note that this only affects ongoing work, since new work is not going to be picked up, as that happens in the

So, as far as the Java rabbitmq library goes, I was able to find no evidence to suggest that any client using the Java rabbitmq library directly would run into issues with reusing thread pools, other than the detail that, whenever a thread pool is reused, some number of threads in the pool may still be occupied with tasks from the previous connection, but they will become available as soon as they finish them.

So I am led to believe that if pools are made unusable, this happens inside march_hare's codebase and not here, so the next step in my investigation is to find where that happens and then decide how hard it would be to change that to make reuse a safe operation vs changing march_hare to always shut down the pool and start anew. (So, stay tuned for the next episode ;) ) |
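For illustration, the claim above can be checked from JRuby against the Java client directly; this is a quick sketch assuming a local RabbitMQ with default credentials (the pool size of 5 is arbitrary):

```ruby
# Quick check of the claim above: a caller-provided consumer work pool is not
# shut down by the Java client when the connection is closed.
# Assumes a local RabbitMQ listening on localhost with default credentials.
require 'march_hare' # loads the RabbitMQ Java client jar onto the classpath
java_import com.rabbitmq.client.ConnectionFactory
java_import java.util.concurrent.Executors

pool = Executors.new_fixed_thread_pool(5)
conn = ConnectionFactory.new.new_connection(pool) # pass in our own work pool
conn.close
puts pool.is_shutdown # => false: closing the connection leaves our pool alone
pool.shutdown         # the caller remains responsible for shutting it down
```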
Thanks for the fairly detailed investigation. I certainly don't expect the issue to be in the Java client. The goal should not be |
I'm a bit puzzled: can you expand on why the goal is not to reuse? A single march hare connection would still have a single executor; it's just that the multiple underlying java rabbitmq connections being wrapped would share the same executor. Different march hare connections should of course have different executors, but conceptually, having multiple executors for the same march hare connection seems unnecessary. Are you really against that approach? If I did the work to make sure it is safe and submitted a PR, would you still not like it? I'm trying to understand the underlying design decision here, as I'm still a bit confused about this choice. |
Still confused about all this... If I have

Please tell me, is this intentional:

```ruby
require 'march_hare'

con = MarchHare.connect(
  automatic_recovery: false,
  uri: 'amqp://guest:guest@localhost',
  thread_pool_size: 5 # this line prevents JVM termination
)
con.create_channel # note: I need this channel to be created in order to replicate
```
|
Consumer dispatch pool size is orthogonal to connection recovery. If you provide a custom executor to the Java client, you are responsible for shutting it down so that the JVM can terminate cleanly. When automatic recovery isn't used, a connection termination should not necessarily result in a JVM termination. Use VisualVM or

Discussions belong to the mailing list (rabbitmq-users is fine). |
@nbekirov your problem may be because march hare's thread pool uses non-daemon threads. On the JVM, threads can be daemon or non-daemon. The only difference is that the JVM terminates when all non-daemon threads have finished. This means that while march hare's threads are alive, your app will not exit. |
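For illustration, one way around this is to hand march hare an executor built from daemon threads, so the consumer pool alone cannot keep the JVM alive. This is a sketch, not something march hare does by default; it assumes, as earlier in this thread, that `executor_factory` only needs an object (here a lambda) responding to `call`, and the connection parameters are placeholders:

```ruby
require 'march_hare'
java_import java.util.concurrent.Executors

# A ThreadFactory that marks every thread it creates as a daemon thread.
# Daemon threads do not prevent the JVM from terminating.
class DaemonThreadFactory
  include java.util.concurrent.ThreadFactory

  def new_thread(runnable)
    thread = java.lang.Thread.new(runnable)
    thread.daemon = true
    thread
  end
end

# Hypothetical wiring: a fixed pool of 5 daemon threads handed to march hare.
factory = -> { Executors.new_fixed_thread_pool(5, DaemonThreadFactory.new) }
conn = MarchHare.connect(
  uri: 'amqp://guest:guest@localhost',
  executor_factory: factory
)
```

The trade-off is that daemon threads are killed abruptly on JVM exit, so any deliveries they are still processing are dropped rather than finished.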
Java client certainly allows you to provide a custom

I will take a look at the state of those settings. IIRC you can even provide a custom |
When using the `thread_pool_size` option or specifying an `executor_factory`, march hare creates a new pool every time it connects to rabbitmq. This is the right and expected behavior when the first connection is made, but when the connection is lost and march hare tries to re-establish it, the `Session#automatically_recover` method calls either `Session#new_uri_connection_impl` or `Session#new_connection_impl`, both of which create a new thread pool and a new connection.

This means that the previous thread pool is never shut down, so if there are intermittent network issues that make the recovery behavior kick in regularly, the system slowly leaks threads until it cannot allocate any more.
A simple example that shows this is:
By executing this script and just quitting and restarting rabbitmq I see:
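As a rough sketch of such a reproduction (not the exact script referenced above; the connection parameters and the thread-counting approach are assumptions), something along these lines shows the thread count growing each time RabbitMQ is restarted:

```ruby
# Rough reproduction sketch. Connect with a fixed consumer pool (automatic
# recovery is on by default), open a channel, then watch the number of live
# JVM threads: each RabbitMQ restart should add roughly pool-size threads.
require 'march_hare'

conn = MarchHare.connect(
  uri: 'amqp://guest:guest@localhost',
  thread_pool_size: 5
)
conn.create_channel

loop do
  puts "live JVM threads: #{java.lang.Thread.active_count}"
  sleep 5
end
```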
I believe the fix would be to call `maybe_shut_down_executor` after the `sleep` on line https://github.com/ruby-amqp/march_hare/blob/master/lib/march_hare/session.rb#L249, but I still need to investigate it a bit more before being sure.
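For illustration only, the shape of that proposed change inside `Session#automatically_recover` might be something like the following sketch; this is not the actual march_hare source, and the `@network_recovery_interval` name and surrounding structure are assumptions:

```ruby
# Illustrative sketch only (not the actual march_hare source).
def automatically_recover
  sleep(@network_recovery_interval) # existing back-off before reconnecting
  maybe_shut_down_executor          # proposed addition: release the old pool

  # ...existing recovery code: new_uri_connection_impl / new_connection_impl
  # build the replacement connection (and, today, a brand-new thread pool),
  # then channels and consumers are re-registered...
end
```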