ActiveStorage controllers leak ActiveRecord database connections #44242

lukel97 · 2022-01-24T15:26:19Z

We've been noticing a lot of connection pool timeout errors in our application as of recent, all being thrown from the active storage controllers. I.e., the notorious could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use.

After a bit of debugging we've noticed that every time the timeout occurs, the connection pool is bunged up with dead threads spawned by ActionController::Live:

{:size=>5, :connections=>5, :busy=>0, :dead=>5, :idle=>0, :waiting=>0, :checkout_timeout=>5.0}
#<Thread:0x00007f5d80ad97d0 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007f5d8811c208 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x0000558b8891b838 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007f5d80b488d8 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007f5d80811520 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>

Our connection pool is set to five and our puma workers are capped at five as well, and we aren't using any multithreaded code in our app or gems: Except for ActiveStorage, which spins up a thread to stream blobs back.

Should ActiveStorage/ActionController be checking in these threads before its thread dies?

Steps to reproduce

Here's an test file script that logs the connection pool every time a blob is requested. You can see it fill up over time with dead threads:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  # Activate the gem you are reporting the issue against.
  gem "rails", "~> 7.0.0"
  gem "sqlite3"
end

require "rack/test"
require "active_record/railtie"
require "action_controller/railtie"
require "active_storage/engine"
require "tmpdir"

class TestApp < Rails::Application
  config.root = __dir__
  config.hosts << "example.org"
  config.eager_load = false
  config.session_store :cookie_store, key: "cookie_store_key"
  # config.active_record.legacy_connection_handling = false
  secrets.secret_key_base = "secret_key_base"

  # config.logger = Logger.new($stdout)
  # Rails.logger  = config.logger

  config.active_storage.service = :local
  config.active_storage.service_configurations = {
    local: {
      root: Dir.tmpdir,
      service: "Disk"
    }
  }
end


db_path = Rails.root.join('test.sqlite3')
File.delete(db_path) if File.exists? db_path
ENV["DATABASE_URL"] = "sqlite3:#{db_path}"

Rails.application.initialize!

Rails.application.routes.disable_clear_and_finalize = true  #add this line
Rails.application.routes.draw do
  get "/log_connections/:signed_id/*filename" => 'log_connections#show'
end

require ActiveStorage::Engine.root.join("db/migrate/20170806125915_create_active_storage_tables.rb").to_s

ActiveRecord::Schema.define do
  CreateActiveStorageTables.new.change

  create_table :users, force: true
end

class User < ActiveRecord::Base
  has_one_attached :profile
end

class TestController < ActionController::Base
  include Rails.application.routes.url_helpers

  def index
    render plain: "Home"
  end
end

class LogConnectionsController < ActiveStorage::Blobs::RedirectController
  skip_before_action :verify_authenticity_token
  include Rails.application.routes.url_helpers
  def show
    s = ""
    ActiveRecord::Base.connection_pool.connections.each { |c|
      if c.owner.nil?
        s += "---\n"
      else
        s += "#{c.owner.inspect}\n"
      end
    }
    s += "Current thread: #{Thread.current.inspect}\n"
    puts s
    super
  end
end

require "minitest/autorun"

class BugTest < Minitest::Test
  include Rack::Test::Methods
  def test_upload_and_download
    user = User.create!(
      profile: {
        content_type: "text/plain",
        filename: "dummy.txt",
        io: ::StringIO.new("dummy"),
      }
    )

    10.times do
      get "/log_connections/#{user.profile.signed_id}/#{user.profile.filename}"
    end
  end

  private
  def app
    Rails.application
  end
end

Actual behavior

---
---
---
---
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
Current thread: #<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
---
---
---
#<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
Current thread: #<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
---
---
#<Thread:0x00007fac8de41678 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
Current thread: #<Thread:0x00007fac8de41678 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
---
#<Thread:0x00007fac8deaaf38 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8de41678 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
Current thread: #<Thread:0x00007fac8deaaf38 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8df115a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8deaaf38 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de41678 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
Current thread: #<Thread:0x00007fac8df115a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>

Expected behavior

I would expect the threads to check back in their connection to the thread pool.
I presume this is happening because active storage is accessing some information on the blob records, which creates an implicit checkout.

System configuration

Rails version:
7.0.0
Ruby version:
3.0.3

The text was updated successfully, but these errors were encountered:

lukel97 · 2022-01-24T16:32:03Z

In particular what confuses me about this is that this is happening in the redirect controllers for blobs and representations. They don't actually need to stream any data from my understanding, and it seems to be a by-product of the fact that activestorage::base controller includes the streaming concern

They don't need to be streamed and by including ActiveStorage::Streaming, they spin up new threads which can cause problems with connection pools as detailed in rails#44242

byroot · 2022-01-26T12:16:24Z

I merged #44244 which should limit the impact a bit, but this ActionController::Live thread leak seems concerning, I'll try to find some time to repro and investigate it.

casperisfine · 2022-01-26T13:03:23Z

Ok, so I had a quick look, I'm pretty sure the ConnectionPool::Reaper will get these threads back into the pool eventually. Problem is the default reap frequency is 60 seconds, which is huge.

I think AC::Live should try to check the connection back in eagerly. It's a bit tricky because Action Controller isn't supposed to know much about Active Record and connections, but we can probably figure something out.

casperisfine · 2022-01-26T13:18:18Z

That being said ConnectionPool#checkout does end up calling reap if there's no connection available:

rails/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb

Lines 633 to 646 in bd41655

    
           def acquire_connection(checkout_timeout) 
        
             # NOTE: we rely on <tt>@available.poll</tt> and +try_to_checkout_new_connection+ to 
        
             # +conn.lease+ the returned connection (and to do this in a +synchronized+ 
        
             # section). This is not the cleanest implementation, as ideally we would 
        
             # <tt>synchronize { conn.lease }</tt> in this method, but by leaving it to <tt>@available.poll</tt> 
        
             # and +try_to_checkout_new_connection+ we can piggyback on +synchronize+ sections 
        
             # of the said methods and avoid an additional +synchronize+ overhead. 
        
             if conn = @available.poll || try_to_checkout_new_connection 
        
               conn 
        
             else 
        
               reap 
        
               @available.poll(checkout_timeout) 
        
             end 
        
           end

So ultimately this should work. I wonder if you problem isn't that since the newly spawned thread take a while to complete, you end up with much more active threads than you expect.

I mean, since the original puma thread returns and is ready to pick up a new request, a single puma thread could end up spawning threads faster than they complete, so theoretically I don't think there's any limit to how many thread can be alive concurrently.

Unless I'm missing something that's a big problem. It might make sense to use a thread pool with a max size for AC::Live so that you can at least right size your connection pool. e..g 5 puma thread and 10 AC::Live threads = 15 connections.

casperisfine · 2022-01-26T13:35:47Z

Quick correction on the above after a quick chat with Matthew. The main thread always block on the "live" thread, so you can only have at most as many "live" threads than "puma" threads.

Our connection pool is set to five and our puma workers are capped at five as well

If we're right this means in the current state of things, you'd need to set the connection pool to twice the puma workers count.

I'll see if I can figure something so it wouldn't be necessary. If anything it's the main thread that should check the connection back in so that the live thread can use it. But I'm likely missing something.

casperisfine · 2022-01-26T13:59:09Z

I think a good solution to this would be to use a Fiber instead of a Thread. It would drastically simplify AC::Live and solve a lot of other wonky pitfalls.

It's quite a lot of work though, so might take a while before I come up with a fix (if any).

lukel97 · 2022-01-26T14:34:42Z

Ok, so I had a quick look, I'm pretty sure the ConnectionPool::Reaper will get these threads back into the pool eventually. Problem is the default reap frequency is 60 seconds, which is huge.

Indeed: however the reason why this doesn't show up as often though is because activerecord seems to clear the connection pool whenever there's no more room, and all the threads are dead: I tested this with setting the reaping_frequency to 0 and repeatedly calling the active storage redirect endpoint, and once it fills up with dead threads, it clears all those connections and starts again. So the request never actually fails.

So the connection timeout error only occurred under a "moderate" load, presumably whenever the conditions were just right that one thread isn't notified that the pool has been cleared.

Edit: didn't see your second comment Woops, but it lines up with what we're seeing

They don't need to be streamed and by including ActiveStorage::Streaming, they spin up new threads which can cause problems with connection pools as detailed in rails#44242

They don't need to be streamed and by including ActiveStorage::Streaming, they spin up new threads which can cause problems with connection pools as detailed in #44242

Skulli · 2022-03-30T06:39:02Z

Not sure if its 100% related, but i noticed the mentioned exception as well. I have a small app, which shows avatars for list of people at once (maybe 40 people) and i get quite a bit of could not obtain... errors there. Setup is puma with 5 threads, sidekiq 5 threads, connection pool 10 on postgres.
Found a similar description here:

https://discuss.rubyonrails.org/t/activerecord-connectiontimeouterror-as-soon-as-upgrading-from-rails-6-1-to-rails-7-what-changed-in-activerecord-or-activestorage/80276

update: currently i use disk storage

tisba · 2022-04-08T10:41:50Z

Am I missing something or did 5d08b95 by @gmcgibbon not make it into 7.0.2.3? It's definitely not in the changelog https://github.com/rails/rails/blob/v7.0.2.3/activestorage/CHANGELOG.md.

Is there something we can track to see when this gets released? (sorry if this is a stupid question - I don't quite understand the release processes 🙈).

skipkayhil · 2022-04-10T20:26:38Z

Is there something we can track to see when this gets released? (sorry if this is a stupid question - I don't quite understand the release processes 🙈).

Not a stupid question! Your idea to check the changelog is definitely the best way to see if that change has been released.

Regarding releases since the change was merged to 7-0-stable, I don't think there has been a release of 7-0-stable branch since 7.0.2 on Feb 8th. 7.0.2.1, 7.0.2.2, 7.0.2.3 were all security releases (which as far as I know means the security fix was added onto the previously released commit and no other changes were released). I would expect it to see it for sure in 7.0.3

tisba · 2022-04-11T07:01:30Z

That's what I suspected as well, thx @skipkayhil 🙇 Let's hope 7.0.3 is not too far out.

casperisfine · 2022-04-12T06:58:12Z

@tisba if that patch is important to you, you can point your Gemfile to the 7-0-stable branch until 7.0.3 is out.

jprosevear · 2022-04-16T11:50:01Z

The streaming behaviour also impacts direct_uploads_controller.rb since it also inherits from ActiveStorage::BaseController

tisba · 2022-05-17T16:14:02Z

Since 7.0.3 has been released, I think this issue here can finally be closed.

lukel97 · 2022-05-21T23:16:26Z

5d08b95 only fixes the redirect controller cases. From my understanding threads spawned by ActionController::Live still leak in the general case, and the recommended fix was to move from threads to fibres. Perhaps a new issue should be created for this?

rails-bot · 2022-08-19T23:18:12Z

This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 7-0-stable branch or on main, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

jdelStrother · 2022-09-20T21:08:16Z

Is it possible to manually check connections back in at the end of a Live action? We have a non-ActiveStorage controller that uses ActionController::Live and have hundreds of dead connections left lying around.

lukel97 mentioned this issue Jan 24, 2022

Don't stream redirect controller responses #44244

Merged

byroot self-assigned this Jan 26, 2022

p8 added activerecord activestorage labels Feb 1, 2022

gmcgibbon pushed a commit that referenced this issue Feb 23, 2022

Don't stream redirect controller responses

5d08b95

They don't need to be streamed and by including ActiveStorage::Streaming, they spin up new threads which can cause problems with connection pools as detailed in #44242

rails-bot bot added the stale label Aug 19, 2022

rails-bot bot closed this as completed Aug 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ActiveStorage controllers leak ActiveRecord database connections #44242

ActiveStorage controllers leak ActiveRecord database connections #44242

lukel97 commented Jan 24, 2022 •

edited

lukel97 commented Jan 24, 2022 •

edited

byroot commented Jan 26, 2022

casperisfine commented Jan 26, 2022

casperisfine commented Jan 26, 2022

casperisfine commented Jan 26, 2022

casperisfine commented Jan 26, 2022

lukel97 commented Jan 26, 2022 •

edited

Skulli commented Mar 30, 2022 •

edited

tisba commented Apr 8, 2022

skipkayhil commented Apr 10, 2022 •

edited

tisba commented Apr 11, 2022

casperisfine commented Apr 12, 2022

jprosevear commented Apr 16, 2022

tisba commented May 17, 2022

lukel97 commented May 21, 2022

rails-bot bot commented Aug 19, 2022

jdelStrother commented Sep 20, 2022

ActiveStorage controllers leak ActiveRecord database connections #44242

ActiveStorage controllers leak ActiveRecord database connections #44242

Comments

lukel97 commented Jan 24, 2022 • edited

Steps to reproduce

Actual behavior

Expected behavior

System configuration

lukel97 commented Jan 24, 2022 • edited

byroot commented Jan 26, 2022

casperisfine commented Jan 26, 2022

casperisfine commented Jan 26, 2022

casperisfine commented Jan 26, 2022

casperisfine commented Jan 26, 2022

lukel97 commented Jan 26, 2022 • edited

Skulli commented Mar 30, 2022 • edited

tisba commented Apr 8, 2022

skipkayhil commented Apr 10, 2022 • edited

tisba commented Apr 11, 2022

casperisfine commented Apr 12, 2022

jprosevear commented Apr 16, 2022

tisba commented May 17, 2022

lukel97 commented May 21, 2022

rails-bot bot commented Aug 19, 2022

jdelStrother commented Sep 20, 2022

lukel97 commented Jan 24, 2022 •

edited

lukel97 commented Jan 24, 2022 •

edited

lukel97 commented Jan 26, 2022 •

edited

Skulli commented Mar 30, 2022 •

edited

skipkayhil commented Apr 10, 2022 •

edited