Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ActiveStorage controllers leak ActiveRecord database connections #44242

Closed
lukel97 opened this issue Jan 24, 2022 · 17 comments
Closed

ActiveStorage controllers leak ActiveRecord database connections #44242

lukel97 opened this issue Jan 24, 2022 · 17 comments

Comments

@lukel97
Copy link
Contributor

lukel97 commented Jan 24, 2022

We've been noticing a lot of connection pool timeout errors in our application as of recent, all being thrown from the active storage controllers. I.e., the notorious could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use.

After a bit of debugging we've noticed that every time the timeout occurs, the connection pool is bunged up with dead threads spawned by ActionController::Live:

{:size=>5, :connections=>5, :busy=>0, :dead=>5, :idle=>0, :waiting=>0, :checkout_timeout=>5.0}
#<Thread:0x00007f5d80ad97d0 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007f5d8811c208 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x0000558b8891b838 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007f5d80b488d8 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007f5d80811520 /var/www/minm/shared/bundle/ruby/3.0.0/gems/actionpack-7.0.0/lib/action_controller/metal/live.rb:340 dead>

Our connection pool is set to five and our puma workers are capped at five as well, and we aren't using any multithreaded code in our app or gems: Except for ActiveStorage, which spins up a thread to stream blobs back.

Should ActiveStorage/ActionController be checking in these threads before its thread dies?

Steps to reproduce

Here's an test file script that logs the connection pool every time a blob is requested. You can see it fill up over time with dead threads:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  # Activate the gem you are reporting the issue against.
  gem "rails", "~> 7.0.0"
  gem "sqlite3"
end

require "rack/test"
require "active_record/railtie"
require "action_controller/railtie"
require "active_storage/engine"
require "tmpdir"

class TestApp < Rails::Application
  config.root = __dir__
  config.hosts << "example.org"
  config.eager_load = false
  config.session_store :cookie_store, key: "cookie_store_key"
  # config.active_record.legacy_connection_handling = false
  secrets.secret_key_base = "secret_key_base"

  # config.logger = Logger.new($stdout)
  # Rails.logger  = config.logger

  config.active_storage.service = :local
  config.active_storage.service_configurations = {
    local: {
      root: Dir.tmpdir,
      service: "Disk"
    }
  }
end


db_path = Rails.root.join('test.sqlite3')
File.delete(db_path) if File.exists? db_path
ENV["DATABASE_URL"] = "sqlite3:#{db_path}"

Rails.application.initialize!

Rails.application.routes.disable_clear_and_finalize = true  #add this line
Rails.application.routes.draw do
  get "/log_connections/:signed_id/*filename" => 'log_connections#show'
end

require ActiveStorage::Engine.root.join("db/migrate/20170806125915_create_active_storage_tables.rb").to_s

ActiveRecord::Schema.define do
  CreateActiveStorageTables.new.change

  create_table :users, force: true
end

class User < ActiveRecord::Base
  has_one_attached :profile
end

class TestController < ActionController::Base
  include Rails.application.routes.url_helpers

  def index
    render plain: "Home"
  end
end

class LogConnectionsController < ActiveStorage::Blobs::RedirectController
  skip_before_action :verify_authenticity_token
  include Rails.application.routes.url_helpers
  def show
    s = ""
    ActiveRecord::Base.connection_pool.connections.each { |c|
      if c.owner.nil?
        s += "---\n"
      else
        s += "#{c.owner.inspect}\n"
      end
    }
    s += "Current thread: #{Thread.current.inspect}\n"
    puts s
    super
  end
end

require "minitest/autorun"

class BugTest < Minitest::Test
  include Rack::Test::Methods
  def test_upload_and_download
    user = User.create!(
      profile: {
        content_type: "text/plain",
        filename: "dummy.txt",
        io: ::StringIO.new("dummy"),
      }
    )

    10.times do
      get "/log_connections/#{user.profile.signed_id}/#{user.profile.filename}"
    end
  end

  private
  def app
    Rails.application
  end
end

Actual behavior

---
---
---
---
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
Current thread: #<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
---
---
---
#<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
Current thread: #<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
---
---
#<Thread:0x00007fac8de41678 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
Current thread: #<Thread:0x00007fac8de41678 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
---
#<Thread:0x00007fac8deaaf38 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8de41678 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
Current thread: #<Thread:0x00007fac8deaaf38 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8df115a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>
#<Thread:0x00007fac8deaaf38 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de41678 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de714b8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
#<Thread:0x00007fac8de0a3a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 dead>
Current thread: #<Thread:0x00007fac8df115a8 /usr/local/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_controller/metal/live.rb:340 run>

Expected behavior

I would expect the threads to check back in their connection to the thread pool.
I presume this is happening because active storage is accessing some information on the blob records, which creates an implicit checkout.

System configuration

Rails version:
7.0.0
Ruby version:
3.0.3

@lukel97
Copy link
Contributor Author

lukel97 commented Jan 24, 2022

In particular what confuses me about this is that this is happening in the redirect controllers for blobs and representations. They don't actually need to stream any data from my understanding, and it seems to be a by-product of the fact that activestorage::base controller includes the streaming concern

@byroot byroot self-assigned this Jan 26, 2022
lukel97 added a commit to lukel97/rails that referenced this issue Jan 26, 2022
They don't need to be streamed and by including
ActiveStorage::Streaming, they spin up new threads which can cause
problems with connection pools as detailed in rails#44242
@byroot
Copy link
Member

byroot commented Jan 26, 2022

I merged #44244 which should limit the impact a bit, but this ActionController::Live thread leak seems concerning, I'll try to find some time to repro and investigate it.

@casperisfine
Copy link
Contributor

Ok, so I had a quick look, I'm pretty sure the ConnectionPool::Reaper will get these threads back into the pool eventually. Problem is the default reap frequency is 60 seconds, which is huge.

I think AC::Live should try to check the connection back in eagerly. It's a bit tricky because Action Controller isn't supposed to know much about Active Record and connections, but we can probably figure something out.

@casperisfine
Copy link
Contributor

That being said ConnectionPool#checkout does end up calling reap if there's no connection available:

def acquire_connection(checkout_timeout)
# NOTE: we rely on <tt>@available.poll</tt> and +try_to_checkout_new_connection+ to
# +conn.lease+ the returned connection (and to do this in a +synchronized+
# section). This is not the cleanest implementation, as ideally we would
# <tt>synchronize { conn.lease }</tt> in this method, but by leaving it to <tt>@available.poll</tt>
# and +try_to_checkout_new_connection+ we can piggyback on +synchronize+ sections
# of the said methods and avoid an additional +synchronize+ overhead.
if conn = @available.poll || try_to_checkout_new_connection
conn
else
reap
@available.poll(checkout_timeout)
end
end

So ultimately this should work. I wonder if you problem isn't that since the newly spawned thread take a while to complete, you end up with much more active threads than you expect.

I mean, since the original puma thread returns and is ready to pick up a new request, a single puma thread could end up spawning threads faster than they complete, so theoretically I don't think there's any limit to how many thread can be alive concurrently.

Unless I'm missing something that's a big problem. It might make sense to use a thread pool with a max size for AC::Live so that you can at least right size your connection pool. e..g 5 puma thread and 10 AC::Live threads = 15 connections.

@casperisfine
Copy link
Contributor

Quick correction on the above after a quick chat with Matthew. The main thread always block on the "live" thread, so you can only have at most as many "live" threads than "puma" threads.

Our connection pool is set to five and our puma workers are capped at five as well

If we're right this means in the current state of things, you'd need to set the connection pool to twice the puma workers count.

I'll see if I can figure something so it wouldn't be necessary. If anything it's the main thread that should check the connection back in so that the live thread can use it. But I'm likely missing something.

@casperisfine
Copy link
Contributor

I think a good solution to this would be to use a Fiber instead of a Thread. It would drastically simplify AC::Live and solve a lot of other wonky pitfalls.

It's quite a lot of work though, so might take a while before I come up with a fix (if any).

@lukel97
Copy link
Contributor Author

lukel97 commented Jan 26, 2022

Ok, so I had a quick look, I'm pretty sure the ConnectionPool::Reaper will get these threads back into the pool eventually. Problem is the default reap frequency is 60 seconds, which is huge.

Indeed: however the reason why this doesn't show up as often though is because activerecord seems to clear the connection pool whenever there's no more room, and all the threads are dead: I tested this with setting the reaping_frequency to 0 and repeatedly calling the active storage redirect endpoint, and once it fills up with dead threads, it clears all those connections and starts again. So the request never actually fails.

So the connection timeout error only occurred under a "moderate" load, presumably whenever the conditions were just right that one thread isn't notified that the pool has been cleared.

Edit: didn't see your second comment Woops, but it lines up with what we're seeing

public-rant pushed a commit to opensource-rant/rails that referenced this issue Feb 17, 2022
They don't need to be streamed and by including
ActiveStorage::Streaming, they spin up new threads which can cause
problems with connection pools as detailed in rails#44242
gmcgibbon pushed a commit that referenced this issue Feb 23, 2022
They don't need to be streamed and by including
ActiveStorage::Streaming, they spin up new threads which can cause
problems with connection pools as detailed in #44242
@Skulli
Copy link

Skulli commented Mar 30, 2022

Not sure if its 100% related, but i noticed the mentioned exception as well. I have a small app, which shows avatars for list of people at once (maybe 40 people) and i get quite a bit of could not obtain... errors there. Setup is puma with 5 threads, sidekiq 5 threads, connection pool 10 on postgres.
Found a similar description here:

https://discuss.rubyonrails.org/t/activerecord-connectiontimeouterror-as-soon-as-upgrading-from-rails-6-1-to-rails-7-what-changed-in-activerecord-or-activestorage/80276

update: currently i use disk storage

@tisba
Copy link

tisba commented Apr 8, 2022

Am I missing something or did 5d08b95 by @gmcgibbon not make it into 7.0.2.3? It's definitely not in the changelog https://github.com/rails/rails/blob/v7.0.2.3/activestorage/CHANGELOG.md.

Is there something we can track to see when this gets released? (sorry if this is a stupid question - I don't quite understand the release processes 🙈).

@skipkayhil
Copy link
Member

skipkayhil commented Apr 10, 2022

Is there something we can track to see when this gets released? (sorry if this is a stupid question - I don't quite understand the release processes 🙈).

Not a stupid question! Your idea to check the changelog is definitely the best way to see if that change has been released.

Regarding releases since the change was merged to 7-0-stable, I don't think there has been a release of 7-0-stable branch since 7.0.2 on Feb 8th. 7.0.2.1, 7.0.2.2, 7.0.2.3 were all security releases (which as far as I know means the security fix was added onto the previously released commit and no other changes were released). I would expect it to see it for sure in 7.0.3

@tisba
Copy link

tisba commented Apr 11, 2022

That's what I suspected as well, thx @skipkayhil 🙇 Let's hope 7.0.3 is not too far out.

@casperisfine
Copy link
Contributor

@tisba if that patch is important to you, you can point your Gemfile to the 7-0-stable branch until 7.0.3 is out.

@jprosevear
Copy link
Contributor

The streaming behaviour also impacts direct_uploads_controller.rb since it also inherits from ActiveStorage::BaseController

@tisba
Copy link

tisba commented May 17, 2022

Since 7.0.3 has been released, I think this issue here can finally be closed.

@lukel97
Copy link
Contributor Author

lukel97 commented May 21, 2022

5d08b95 only fixes the redirect controller cases. From my understanding threads spawned by ActionController::Live still leak in the general case, and the recommended fix was to move from threads to fibres. Perhaps a new issue should be created for this?

@rails-bot
Copy link

rails-bot bot commented Aug 19, 2022

This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 7-0-stable branch or on main, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

@rails-bot rails-bot bot added the stale label Aug 19, 2022
@rails-bot rails-bot bot closed this as completed Aug 27, 2022
@jdelStrother
Copy link
Contributor

Is it possible to manually check connections back in at the end of a Live action? We have a non-ActiveStorage controller that uses ActionController::Live and have hundreds of dead connections left lying around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants