ActiveStorage controllers leak ActiveRecord database connections #44242
Comments
What confuses me in particular is that this is happening in the redirect controllers for blobs and representations. From my understanding they don't actually need to stream any data, so this seems to be a by-product of ActiveStorage::BaseController including the streaming concern. |
They don't need to be streamed and by including ActiveStorage::Streaming, they spin up new threads which can cause problems with connection pools as detailed in rails#44242
I merged #44244 which should limit the impact a bit, but this |
Ok, so I had a quick look, I'm pretty sure the I think |
That being said rails/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb Lines 633 to 646 in bd41655
So ultimately this should work. I wonder if your problem isn't that, since the newly spawned threads take a while to complete, you end up with many more active threads than you expect. I mean, since the original puma thread returns and is ready to pick up a new request, a single puma thread could end up spawning threads faster than they complete, so theoretically I don't think there's any limit to how many threads can be alive concurrently. Unless I'm missing something, that's a big problem. It might make sense to use a thread pool with a max size for AC::Live so that you can at least right-size your connection pool, e.g. 5 puma threads and 10 AC::Live threads = 15 connections. |
Quick correction on the above after a quick chat with Matthew. The main thread always blocks on the "live" thread, so you can only have at most as many "live" threads as "puma" threads.
If we're right, this means that in the current state of things you'd need to set the connection pool to twice the puma workers count. I'll see if I can figure something out so that it wouldn't be necessary. If anything, it's the main thread that should check the connection back in so that the live thread can use it. But I'm likely missing something. |
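The sizing rule in the comment above can be written out as a small calculation. This is only a sketch of the arithmetic, assuming each request thread may hold one connection while the single live thread it blocks on holds another; `required_pool_size` is a hypothetical helper, not a Rails or Puma API.

```ruby
# Hypothetical helper: worst-case number of connections needed when every
# request thread can block on one "live" thread that holds its own connection.
def required_pool_size(puma_threads:, live_threads_per_request: 1)
  puma_threads * (1 + live_threads_per_request)
end

# With the setup reported in this issue (five request threads):
pool_size_needed = required_pool_size(puma_threads: 5)
puts "pool size needed: #{pool_size_needed}"
```

Under those assumptions, a pool of five for five request threads is too small by a factor of two.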
I think a good solution to this would be to use a Fiber instead of a Thread. It would drastically simplify It's quite a lot of work though, so might take a while before I come up with a fix (if any). |
Indeed. However, the reason this doesn't show up more often is that ActiveRecord seems to clear the connection pool whenever there's no more room and all the owning threads are dead. I tested this by setting reaping_frequency to 0 and repeatedly calling the Active Storage redirect endpoint: once the pool fills up with dead threads, it clears all those connections and starts again, so the request never actually fails. The connection timeout error therefore only occurred under a "moderate" load, presumably whenever the conditions were just right that one thread wasn't notified that the pool had been cleared. Edit: didn't see your second comment, whoops, but it lines up with what we're seeing. |
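The behaviour described in this comment can be illustrated with a toy model in plain Ruby. It is a sketch under stated assumptions, not ActiveRecord's ConnectionPool: `ToyPool` and all of its methods are hypothetical. It only mimics two things, that a connection stays assigned to the thread that checked it out, and that connections owned by dead threads are reclaimed when a checkout finds the pool full.

```ruby
# Toy model only: NOT ActiveRecord's ConnectionPool. All names are hypothetical.
class ToyPool
  def initialize(size)
    @size = size
    @owners = {} # connection id => Thread that checked it out
    @lock = Mutex.new
  end

  # Check out a connection for the current thread. If the pool is full,
  # first reclaim connections whose owning thread has died.
  def checkout
    @lock.synchronize do
      reap_dead_owners if @owners.size >= @size
      raise "could not obtain a connection from the pool" if @owners.size >= @size

      id = (0...@size).find { |i| !@owners.key?(i) }
      @owners[id] = Thread.current
      id
    end
  end

  # Number of connections currently assigned to some thread, dead or alive.
  def in_use
    @lock.synchronize { @owners.size }
  end

  # Number of connections still assigned to a thread that has finished.
  def dead_owner_count
    @lock.synchronize { @owners.count { |_id, thread| !thread.alive? } }
  end

  private

  # Caller must already hold @lock.
  def reap_dead_owners
    @owners.delete_if { |_id, thread| !thread.alive? }
  end
end

toy_pool = ToyPool.new(5)

# Five short-lived threads check out a connection and exit without checking in.
5.times { Thread.new { toy_pool.checkout }.join }
leaked_before = toy_pool.dead_owner_count

# The next checkout finds the pool full, reclaims the dead owners, and succeeds.
toy_pool.checkout
in_use_after = toy_pool.in_use

puts "leaked before: #{leaked_before}, in use after: #{in_use_after}"
```

In this model the sixth checkout never fails, which matches the observation that requests usually succeed even though the pool keeps filling up with dead threads.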
Not sure if it's 100% related, but I noticed the mentioned exception as well. I have a small app which shows avatars for a list of people at once (maybe 40 people), and I get quite a few "could not obtain..." errors there. Setup is puma with 5 threads, sidekiq with 5 threads, and a connection pool of 10 on postgres. Update: currently I use disk storage. |
Am I missing something or did 5d08b95 by @gmcgibbon not make it into 7.0.2.3? It's definitely not in the changelog https://github.com/rails/rails/blob/v7.0.2.3/activestorage/CHANGELOG.md. Is there something we can track to see when this gets released? (sorry if this is a stupid question - I don't quite understand the release processes 🙈). |
Not a stupid question! Your idea to check the changelog is definitely the best way to see if that change has been released. Regarding releases since the change was merged to |
That's what I suspected as well, thx @skipkayhil 🙇 Let's hope |
@tisba if that patch is important to you, you can point your Gemfile to the |
The streaming behaviour also impacts direct_uploads_controller.rb since it also inherits from ActiveStorage::BaseController |
Since 7.0.3 has been released, I think this issue here can finally be closed. |
5d08b95 only fixes the redirect controller cases. From my understanding threads spawned by ActionController::Live still leak in the general case, and the recommended fix was to move from threads to fibres. Perhaps a new issue should be created for this? |
This issue has been automatically marked as stale because it has not been commented on for at least three months. |
Is it possible to manually check connections back in at the end of a Live action? We have a non-ActiveStorage controller that uses ActionController::Live and have hundreds of dead connections left lying around. |
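The general shape of "check the connection back in when the streaming thread finishes" can be sketched in plain Ruby with an `ensure` block. This is a toy model, not Rails code: `TinyPool` and `run_streaming_action` are hypothetical names. In a real application the release step would presumably be something like `ActiveRecord::Base.connection_pool.release_connection`, but whether that is sufficient for ActionController::Live is an assumption this thread does not confirm.

```ruby
# Toy stand-in for a connection pool; all names here are hypothetical.
class TinyPool
  def initialize
    @checked_out = {} # Thread => connection
    @lock = Mutex.new
  end

  def checkout
    @lock.synchronize { @checked_out[Thread.current] = :connection }
  end

  # Return the current thread's connection, if it holds one.
  def release
    @lock.synchronize { @checked_out.delete(Thread.current) }
  end

  def checked_out_count
    @lock.synchronize { @checked_out.size }
  end
end

# Stand-in for the thread a streaming action runs in. The ensure block
# releases the connection whether the body finishes normally or raises.
def run_streaming_action(pool)
  Thread.new do
    Thread.current.report_on_exception = false
    begin
      pool.checkout
      yield
    ensure
      pool.release
    end
  end.join
end

tiny_pool = TinyPool.new

# Normal completion.
run_streaming_action(tiny_pool) { :streamed }

# Failure part-way through the stream, e.g. the client going away.
begin
  run_streaming_action(tiny_pool) { raise IOError, "client disconnected" }
rescue IOError
  # Thread#join re-raises the thread's exception; ignored for this sketch.
end

puts "connections still checked out: #{tiny_pool.checked_out_count}"
```

The point of the `ensure` is that the release runs on the thread that owns the connection, before that thread dies, which is the behaviour the original report asks for.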
We've been noticing a lot of connection pool timeout errors in our application recently, all thrown from the Active Storage controllers, i.e. the notorious
could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use
After a bit of debugging we noticed that every time the timeout occurs, the connection pool is bunged up with dead threads spawned by ActionController::Live:
Our connection pool is set to five and our puma workers are capped at five as well, and we aren't using any multithreaded code in our app or gems, except for ActiveStorage, which spins up a thread to stream blobs back.
Should ActiveStorage/ActionController be checking these connections back in before the thread dies?
Steps to reproduce
Here's a test script that logs the connection pool every time a blob is requested. You can see it fill up over time with dead threads:
Actual behavior
Expected behavior
I would expect the threads to check their connections back in to the connection pool.
I presume this is happening because active storage is accessing some information on the blob records, which creates an implicit checkout.
System configuration
Rails version:
7.0.0
Ruby version:
3.0.3