
Something odd with Puma 6.0.0 / 6.0.1 #2998

Open
vitalinfo opened this issue Oct 24, 2022 · 46 comments

Comments

@vitalinfo

vitalinfo commented Oct 24, 2022

Info for 6.0.1: #2998 (comment)


I don't have any clue what is going wrong or where, but I can share our New Relic graphs (over the last 3 hours and the last 1 hour).
The graphs went back to normal once I downgraded back to Puma 5.6.5.
@nateberkopec any thoughts or ideas about what could have gone wrong and had such an impact on the app?
Thanks

image

image

@nateberkopec
Member

Probably not the problem, but can you try again with wait_for_less_busy_worker false?

@vitalinfo
Author

vitalinfo commented Oct 25, 2022

@nateberkopec we were already using wait_for_less_busy_worker with Puma 5, with a value of 0.01 (I haven't changed the config for Puma 6). Does it make sense to try false in this case?
Our current config:

require 'barnes'

workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 5)
threads threads_count, threads_count

# NOTE: Production testing at GitLab suggests values between 0.001 and 0.010 are best
wait_for_less_busy_worker(ENV['PUMA_WAIT_FOR_LESS_BUSY_WORKER'].to_f) unless ENV['PUMA_WAIT_FOR_LESS_BUSY_WORKER'].nil?
fork_worker
nakayoshi_fork

preload_app!

port        ENV['PORT']     || 80
environment ENV['RACK_ENV'] || 'production'

lowlevel_error_handler do |e, env|
  Airbrake.notify(e, env) if defined?(Airbrake)
  [500, {}, ["An error has occurred, and engineers have been informed.
              Please reload the page. If you continue to have problems, contact support\n"]]
end

before_fork do
  Barnes.start
end

where PUMA_WAIT_FOR_LESS_BUSY_WORKER is 0.01,
and of course nakayoshi_fork has been removed for Puma 6.

@dentarg
Member

dentarg commented Oct 25, 2022

Related to #2999 perhaps?

@dentarg
Member

dentarg commented Oct 25, 2022

Or #3000?

@nateberkopec
Member

@vitalinfo Thanks. You're right, we didn't change the implementation of wait_for_less_busy_worker in 6.0, so if it was already on for you in Puma 5, there's no need to change the setting.

I agree with @dentarg that it could be related to #2999 or #3000, though. An application that doesn't return a response promptly could easily cause the request queueing seen here.

@simonc

simonc commented Nov 5, 2022

Hi there,

I can confirm there's something going on. After upgrading a bunch of gems, we started to get huge memory spikes. Since each gem was upgraded in a separate commit (thanks dependabot), I was able to track down the faulty commit using git bisect: deploy, wait for stats, rinse & repeat. The commit I ended up on was the Puma upgrade.
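The bisect loop described above can be scripted end to end. Below is a self-contained sketch using a throwaway repo with made-up commits; the `grep` stands in for "deploy and watch the memory graphs", and `git bisect run` automates the good/bad marking:

```shell
#!/bin/sh
# Build a tiny repo where one commit "upgrades puma", then let git bisect
# find it. Commit contents are illustrative only.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name dev

echo "gem 'puma', '~> 5.6'" > Gemfile
git add Gemfile && git commit -qm "baseline"
good=$(git rev-parse HEAD)

# A few unrelated "dependabot" bumps...
for dep in rack rails nokogiri; do
  echo "gem '$dep'" >> Gemfile
  git add Gemfile && git commit -qm "bump $dep"
done

# ...the "faulty" upgrade...
sed -i.bak "s/5.6/6.0/" Gemfile && rm Gemfile.bak
git add Gemfile && git commit -qm "bump puma"

# ...and one more bump on top.
echo "gem 'dotenv'" >> Gemfile
git add Gemfile && git commit -qm "bump dotenv"

git bisect start HEAD "$good"
# bisect run marks a commit good when the command exits 0, bad otherwise;
# here "bad" simply means the Puma 6 bump is present.
git bisect run sh -c '! grep -q "~> 6" Gemfile' | tee bisect.log
git bisect reset
```

`git bisect run` replays the check at each step and prints the first bad commit, which here is "bump puma"; in the real workflow each step was a deploy plus a wait for New Relic data.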

Screenshot 2022-11-05 at 10 12 13

Every time Puma 6 was part of the upgraded gems, the memory went wild.

@dentarg
Member

dentarg commented Nov 5, 2022

@simonc @vitalinfo can you see if 3d33475 (#3002) resolves the issue for you?

@simonc

simonc commented Nov 6, 2022

@dentarg So far (2h) it's running smoothly ☺️ I'll leave it running for 24h and see if it continues this way. I'll keep you posted.

@GMorris-professional

We noticed exactly the same issue. @simonc did the proposed fix resolve your issue? :)

@simonc

simonc commented Nov 7, 2022

@GMorris-professional Not quite 24h later, but long enough, and it's still running normally; no huge memory spike or anything 😊

@dentarg
Member

dentarg commented Nov 12, 2022

Closing, as #3002, which seems to address this, has been merged (this issue was probably the same as #2999, which was also closed by #3002).

@vitalinfo
Author

@dentarg
unfortunately it doesn't help for me; I tried 6.0.1 today in production

image

image

image

@vitalinfo
Author

@dentarg after reverting Puma to 5.6.5
image
image
image

@MSP-Greg
Member

MSP-Greg commented Jan 3, 2023

@vitalinfo

Thank you for the report. I'm not sure how best to proceed. Your app has problems, but others (like #3022 (comment)) may not. So any additional info would be helpful: maybe your Gemfile.lock file, any particular features you may or may not be using (in both the app and Puma), etc.

Puma v6 has a handful of new tests that are a bit more 'real world'. Obviously, more are needed.

@vitalinfo
Author

@MSP-Greg here is our Gemfile

# frozen_string_literal: true

source 'https://rubygems.org'

ruby '3.1.3'

gem 'rails', '>= 7'

gem 'bootsnap'

gem 'pg'
gem 'sass-rails'
gem 'uglifier'

# Pages & CSS
gem 'bootstrap-sass'
gem 'browser'
gem 'haml-rails'
gem 'page_title_helper'
gem 'possessive'
gem 'sprockets'
gem 'tinymce-rails', '< 6'

# patterns
gem 'draper'
gem 'interactor'
gem 'interactor-rails'
gem 'resource_concern'

# activerecord
gem 'activerecord-import'
gem 'activerecord_touchy'
gem 'active_record_union'
gem 'acts_as_downcasable_on'
gem 'acts_as_paranoid', git: 'https://github.com/vitalinfo/acts_as_paranoid'
gem 'acts_as_strippable_on'
gem 'acts-as-taggable-on'
gem 'after_commit_action'
gem 'after_commit_state'
gem 'bcrypt_pbkdf'
gem 'counter_culture'
gem 'ed25519'
gem 'friendly_id'
gem 'net-sftp', '~> 4.0.0.rc1'
# TODO: upgrade to the 7.0.1 or latest after the net-sftp 4.0.0 release
gem 'net-ssh', '~> 7.0.0.beta1', require: 'net/ssh'
gem 'nilify_blanks'
gem 'pg_search', git: 'https://github.com/Casecommons/pg_search' # NOTE: switch back to gem, with version > 2.3.6
gem 'strong_migrations'
gem 'validates_unchangeable'
gem 'validate_url'
gem 'where_exists'

gem 'data_migrate', git: 'https://github.com/ilyakatz/data-migrate'

# javascript
gem 'bootstrap-datepicker-rails', '< 1.7' # while has deprecations
gem 'bootstrap-timepicker-rails-addon', git: 'https://github.com/vitalinfo/bootstrap-timepicker-rails-addon'
gem 'bootstrap-tooltip-rails'
gem 'cocoon'
gem 'coffee-rails'
gem 'font-awesome-rails', git: 'https://github.com/bokmann/font-awesome-rails' # NOTE: Rails 6.1 support
gem 'jbuilder'
gem 'jquery-rails'
gem 'jquery-ui-rails'
gem 'js-routes'
gem 'turbolinks'
gem 'webpacker'

# Users
gem 'devise', git: 'https://github.com/plataformatec/devise'
gem 'omniauth'
gem 'omniauth-apple'
gem 'omniauth-classlink'
gem 'omniauth-clever', git: 'https://github.com/ClassTagInc/omniauth-clever'
gem 'omniauth-facebook'
gem 'omniauth-google-oauth2'
gem 'omniauth-microsoft_graph', git: 'https://github.com/vitalinfo/omniauth-microsoft_graph'
gem 'pundit'

# Integrations
gem 'aws-sdk-lambda'
gem 'aws-sdk-mediaconvert'
gem 'aws-sdk-pinpoint'
gem 'aws-sdk-s3'
gem 'clever-ruby', '>= 3.0.1', git: 'https://github.com/ClassTagInc/clever-ruby', branch: 'api-v3'
gem 'faraday'
gem 'geocoder'
gem 'google-api-client'
gem 'googleauth'
gem 'google-cloud-translate-v3'
gem 'griddler-sendgrid'
gem 'grpc'
gem 'intercom'
gem 'net-ftp'
gem 'net-imap', require: false
gem 'net-pop', require: false
gem 'net-smtp', require: false
gem 'plivo', '> 4'
gem 'rack-tracker'
gem 'sendgrid-ruby'
gem 'shopify_api'
gem 'shopify_api_retry', git: 'https://github.com/ClassTagInc/shopify_api_retry.git'
gem 'smartystreets_ruby_sdk'
source 'https://gems.contribsys.com/' do
  gem 'sidekiq-pro'
end
gem 'sidekiq'
gem 'sidekiq-heroku-scaler'
gem 'sidekiq-scheduler'
gem 'stripe'
gem 'yard'

# Media
gem 'refile', require: 'refile/rails', git: 'https://github.com/vitalinfo/refile'
gem 'refile_cache', git: 'https://github.com/vitalinfo/refile_cache'
gem 'refile-mini_magick'
gem 'refile-s3', require: false, git: 'https://github.com/vitalinfo/refile-s3'
gem 'sinatra'

gem 'httparty'
gem 'icalendar'

gem 'admino'
gem 'kaminari' # paging gem
gem 'paper_trail' # track changes in models
gem 'responders'
gem 'simple_form', git: 'https://github.com/vitalinfo/simple_form'
gem 'timecop'

gem 'american_date', git: 'https://github.com/vitalinfo/ruby-american_date'
gem 'chronic'
gem 'groupdate'
gem 'phonelib'
gem 'retriable'
gem 'ziptz'

gem 'capybara', git: 'https://github.com/teamcapybara/capybara' # NOTE: support Puma 6
gem 'faker'
gem 'rack-canonical-hostname', require: 'rack/canonical_host'
gem 'rack-cors', require: 'rack/cors'
gem 'rails_autolink'
gem 'rails-settings-cached'
gem 'rqrcode'
gem 'selenium-webdriver'
gem 'shortener', git: 'https://github.com/jpmcgrath/shortener', branch: 'develop'
gem 'wicked_pdf'

# Gems for command line debugging
gem 'timezone'

# Gems for Stores
gem 'vacuum'
gem 'walmart_open'

group :development, :test, :staging do
  gem 'bullet'
  gem 'factory_bot_rails'
  gem 'graphiql-rails'
end

group :development, :test do
  gem 'awesome_print'
  gem 'bundler-audit', require: false
  gem 'dotenv-rails', git: 'https://github.com/bkeepers/dotenv'
  gem 'pry'
  gem 'rspec-context-private'
  gem 'rspec-rails', require: false
  gem 'rubocop-git', require: false
  gem 'rubocop-graphql', require: false
  gem 'rubocop-performance'
  gem 'rubocop-rails'
  gem 'rubocop-rspec', require: false
end

group :test do
  gem 'capybara-slow_finder_errors'
  gem 'climate_control'
  gem 'database_cleaner'
  gem 'fakeredis', git: 'https://github.com/vitalinfo/fakeredis'
  gem 'percy-capybara'
  gem 'pry-stack_explorer'
  gem 'puffing-billy'
  gem 'rack_session_access'
  gem 'rails-controller-testing'
  gem 'rspec-collection_matchers'
  gem 'rspec_junit_formatter'
  gem 'rspec-retry'
  gem 'rspec-wait'
  gem 'shoulda-matchers'
  gem 'vcr', git: 'https://github.com/vcr/vcr'
  gem 'webmock'
end

group :development do
  gem 'annotate'
  gem 'better_errors'
  gem 'binding_of_caller'
  gem 'coffeelint'
  gem 'derailed'
  gem 'haml-lint'
  gem 'launchy'
  gem 'letter_opener'
  gem 'listen'
  gem 'parallel_tests'
  gem 'web-console'

  gem 'flamegraph'
  gem 'memory_profiler'
  gem 'rack-mini-profiler', require: false
  gem 'rails_best_practices', require: false
  gem 'stackprof'

  gem 'wkhtmltopdf-binary', git: 'https://github.com/vitalinfo/wkhtmltopdf_binary_gem' # NOTE: Support Ubuntu 22
end

group :production, :staging do
  gem 'newrelic_rpm', '< 8.10'

  gem 'airbrake'
  gem 'nokogiri' # for premailer
  gem 'premailer-rails',
      git: 'https://github.com/vitalinfo/premailer-rails', branch: 'chores/rails-7' # NOTE: Rails 7 support
  gem 'puma', '< 6'
  gem 'rails_12factor'
  gem 'wkhtmltopdf-heroku',
      git: 'https://github.com/rposborne/wkhtmltopdf-heroku/', branch: '2.12.6.1-jammy' # NOTE: Stack-22 support
end

gem 'barnes'
gem 'connection_pool'
gem 'dalli'
gem 'graphql-batch'
gem 'graphql-client', '< 0.17' # NOTE: because of shopify_api
gem 'rack-timeout'
gem 'redis'
gem 'rubyzip'
gem 'zip-zip'

# NOTE: Ruby 2.7 Support
gem 'e2mmap'
gem 'thwait'

# NOTE: Ruby 3.0 Support
gem 'webrick'

gem 'safely_block'

gem 'github_api', require: false, git: 'https://github.com/vitalinfo/github', branch: 'fix-hashie-dependency'

I don't know what other information could be helpful. The Puma config is posted above. We are using ActionCable, and we don't use Heroku autoscaling, but we do use HireFire.

if you need anything else, please let me know

@MSP-Greg
Member

MSP-Greg commented Jan 3, 2023

Thanks.

we are using ActionCable

I'll see what I can find...

@MSP-Greg
Member

MSP-Greg commented Jan 3, 2023

@vitalinfo

There is some code in Puma that I found odd in both v5 & v6 related to the handling of '101 Switching Protocols' responses. Since no one had problems with v5, I thought I'd take another look after the holidays.

Two questions:

  1. Which version of Rack is being used? Are both using v3, v2, or mixed?
  2. Can you test a patch?

@vitalinfo
Author

@MSP-Greg

  1. Rack v2
  2. we can try

@dentarg dentarg reopened this Jan 4, 2023
@dentarg dentarg changed the title from "Something odd with Puma 6.0.0" to "Something odd with Puma 6.0.0 / 6.0.1" Jan 4, 2023
@dentarg
Member

dentarg commented Jan 4, 2023

@vitalinfo can you try to reproduce your issue outside of your application, in an app anyone can run?

@vitalinfo
Author

@dentarg I think that's the problem: it doesn't repro for us on staging, only on production, where the throughput is much higher

@dentarg
Member

dentarg commented Jan 4, 2023

If you can't provide a repro, I think you need to dig into your production environment and try to understand more about what's going on. The Puma open source project can't really do that or offer that level of support.

Sure, Greg may provide a patch for you to test (not something you should expect, really), but it would be great for everyone to be able to understand what's actually going on instead of us guessing.

@MSP-Greg
Member

MSP-Greg commented Jan 4, 2023

Greg may provide a patch for you

Not sure about that yet. Since our CI is passing, and ActionCable seems to be using Puma v6, I'm not sure what the issue is. It's been a while since I've run the Rails test suite, but I hope to do so very soon.

Re ActionCable, there are a few ways to set up WebSocket connections.

Currently, I think most apps are set up such that when they receive a GET request with WebSocket upgrade headers, the app grabs the socket and Puma is 'removed'. So the app handles the '101' response itself.

The app can also return headers for a '101' response, in which case Puma should send the response with the headers returned by the app, and then hand the socket over to the app.

I don't believe Puma is handling the second case correctly, but that's been true for both v6 and v5 (I haven't checked older versions). I have code for a fix, but I need to write tests, etc. Currently, a '101' response is considered a 'no_body' response, so the socket isn't handed over to the app. I think...
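The two setups described above can be sketched as plain Rack-style handlers. This is a minimal illustration of the distinction, not Puma's or ActionCable's actual code; the handler names and the frame payload are made up, and a StringIO stands in for the client socket:

```ruby
require 'stringio'

# Style 1: full hijack. The app calls env['rack.hijack'], takes the raw
# socket, and writes the '101 Switching Protocols' handshake itself; the
# server must not write anything (the returned status is ignored).
FULL_HIJACK = lambda do |env|
  env['rack.hijack'].call
  io = env['rack.hijack_io']
  io.write("HTTP/1.1 101 Switching Protocols\r\n" \
           "Upgrade: websocket\r\nConnection: Upgrade\r\n\r\n")
  [-1, {}, []]
end

# Style 2: partial hijack. The app returns 101 plus the upgrade headers and
# a 'rack.hijack' callback in the response headers; the *server* is expected
# to write the status line and headers, then hand the socket to the callback.
PARTIAL_HIJACK = lambda do |_env|
  [101,
   { 'Upgrade'     => 'websocket',
     'Connection'  => 'Upgrade',
     'rack.hijack' => ->(io) { io.write("(websocket frames...)") } },
   []]
end

# Demo with a StringIO standing in for the client socket.
socket = StringIO.new
env = { 'rack.hijack' => -> {}, 'rack.hijack_io' => socket }
status, _headers, _body = FULL_HIJACK.call(env)
puts status                      # -1: server must stay out of the way

status, headers, _body = PARTIAL_HIJACK.call({})
puts status                      # 101: server sends this, then calls the hook
headers['rack.hijack'].call(socket)
```

The issue described above concerns the second style: if the server treats a 101 as an ordinary bodyless response and never invokes the response's hijack callback, the socket is never handed over to the app.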

@johnnyshields
Contributor

Would be interesting to see object allocations and memory usage, which would indicate if there is a memory leak causing this. Scout APM has a good tool for it, you could sign up for a free trial and check it out.

@johnnyshields
Contributor

@vitalinfo have you been able to take any further steps to isolate the issue?

@vitalinfo
Author

@johnnyshields not yet, it's in my plan for Feb

@johnnyshields
Contributor

johnnyshields commented Mar 1, 2023

@vitalinfo did you manage to spend some further time on this? Again, I'd recommend Scout APM (they have a 14-day free trial), which can be really useful for pinpointing memory issues, speaking from first-hand experience.

@dentarg
Member

dentarg commented Mar 31, 2023

@vitalinfo any news?

@atiqueakhtar

atiqueakhtar commented Apr 3, 2023

Hi @dentarg ,

We recently upgraded our Puma version to 6.0.2 from 5.6.5, and everything appeared to be working fine on our staging environment. However, after reviewing @vitalinfo 's comment (#2998 (comment)) regarding their production app, we are hesitant about deploying Puma 6.0.2 on our own production app until we can identify the root cause of the issue.

Would you advise waiting until we have further information on @vitalinfo 's application before proceeding with the deployment?

@johnnyshields
Contributor

Same here ^^

@dentarg
Member

dentarg commented Apr 3, 2023

I would advise you to make your own decisions :-)

@simonc

simonc commented Apr 3, 2023

I really don't know about v6.0.x but we've been running on v6.1 for two months now and it's running smoothly 😉

@vitalinfo
Author

@dentarg sorry for the delay; today I tried one more time in production, this time with an almost-latest version of Puma, 6.2.1.
The New Relic graphs look pretty promising and the server works fine
image

but I've run into another problem: OpenSSL::SSL::SSLError
image

The exceptions start right after the new Puma deploy and stop once I revert to version 5.

We are using Redis 6.2 on Heroku, and it looks like it's configured correctly according to their docs: https://devcenter.heroku.com/articles/connecting-heroku-redis#connecting-in-ruby
We also have OpenSSL::SSL::VERIFY_NONE set for the ActionCable connection:

production:
  adapter: redis
  url: <%= ENV['REDIS_URL'] %>
  ssl_params:
    verify_mode: <%= OpenSSL::SSL::VERIFY_NONE %>
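For reference, the cable.yml above is meant to boil down to client options like these when the adapter builds the connection. This is a sketch assuming redis-rb 4.x (the version in the backtrace below); the URL is a placeholder, and `Redis.new` is left commented out so the snippet doesn't need a live server:

```ruby
require 'openssl'

# The setting the YAML entry is meant to produce: verify_mode must survive
# the YAML -> ActionCable adapter -> Redis.new hop. If it gets lost along
# the way, the client falls back to full certificate verification
# (VERIFY_PEER), which fails against Heroku Redis's self-signed certs.
redis_options = {
  url: ENV.fetch('REDIS_URL', 'rediss://user:secret@example.compose.com:6380'),
  ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE }
}

# redis = Redis.new(**redis_options)   # effectively what the adapter does
puts redis_options[:ssl_params][:verify_mode] == OpenSSL::SSL::VERIFY_NONE
```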

full backtrace

vendor/ruby-3.2.2/lib/ruby/3.2.0/openssl/buffering.rb:415 in syswrite_nonblock
vendor/ruby-3.2.2/lib/ruby/3.2.0/openssl/buffering.rb:415 in write_nonblock
/gems/redis-4.8.1/lib/redis/connection/ruby.rb:78 in block in write
/gems/redis-4.8.1/lib/redis/connection/ruby.rb:77 in loop
/gems/redis-4.8.1/lib/redis/connection/ruby.rb:77 in write
/gems/redis-4.8.1/lib/redis/connection/ruby.rb:378 in write
/gems/redis-4.8.1/lib/redis/client.rb:320 in block in write
/gems/redis-4.8.1/lib/redis/client.rb:299 in io
/gems/redis-4.8.1/lib/redis/client.rb:318 in write
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/redis.rb:182 in send_command
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/redis.rb:177 in unsubscribe
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/redis.rb:140 in block (2 levels) in remove_channel
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/redis.rb:160 in when_connected
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/redis.rb:140 in block in remove_channel
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/redis.rb:139 in synchronize
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/redis.rb:139 in remove_channel
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/subscriber_map.rb:31 in block in remove_subscriber
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/subscriber_map.rb:26 in synchronize
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/subscriber_map.rb:26 in remove_subscriber
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/redis.rb:36 in unsubscribe
/gems/actioncable-7.0.4.3/lib/action_cable/subscription_adapter/channel_prefix.rb:18 in unsubscribe
/gems/actioncable-7.0.4.3/lib/action_cable/channel/streams.rb:122 in block in stop_all_streams
/gems/actioncable-7.0.4.3/lib/action_cable/channel/streams.rb:121 in each
/gems/actioncable-7.0.4.3/lib/action_cable/channel/streams.rb:121 in stop_all_streams
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:400 in block in make_lambda
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:271 in block in simple
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:599 in block in invoke_after
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:599 in each
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:599 in invoke_after
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:132 in block in run_callbacks
/gems/airbrake-13.0.3/lib/airbrake/rails/action_cable/notify_callback.rb:10 in call
/gems/airbrake-13.0.3/lib/airbrake/rails/action_cable.rb:9 in block (2 levels) in <main>
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:127 in instance_exec
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:127 in block in run_callbacks
/gems/activesupport-7.0.4.3/lib/active_support/execution_wrapper.rb:88 in wrap
/gems/actioncable-7.0.4.3/lib/action_cable/engine.rb:77 in block (3 levels) in <class:Engine>
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:127 in instance_exec
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:127 in block in run_callbacks
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:138 in run_callbacks
/gems/actioncable-7.0.4.3/lib/action_cable/channel/base.rb:191 in unsubscribe_from_channel
/gems/actioncable-7.0.4.3/lib/action_cable/connection/subscriptions.rb:51 in remove_subscription
/gems/actioncable-7.0.4.3/lib/action_cable/connection/subscriptions.rb:64 in block in unsubscribe_from_all
/gems/actioncable-7.0.4.3/lib/action_cable/connection/subscriptions.rb:64 in each
/gems/actioncable-7.0.4.3/lib/action_cable/connection/subscriptions.rb:64 in unsubscribe_from_all
/gems/actioncable-7.0.4.3/lib/action_cable/connection/base.rb:188 in handle_close
/gems/actioncable-7.0.4.3/lib/action_cable/server/worker.rb:59 in block in invoke
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:118 in block in run_callbacks
/gems/activesupport-7.0.4.3/lib/active_support/tagged_logging.rb:99 in block in tagged
/gems/activesupport-7.0.4.3/lib/active_support/tagged_logging.rb:37 in tagged
/gems/activesupport-7.0.4.3/lib/active_support/tagged_logging.rb:99 in tagged
/gems/actioncable-7.0.4.3/lib/action_cable/connection/tagged_logger_proxy.rb:24 in tag
/gems/actioncable-7.0.4.3/lib/action_cable/server/worker/active_record_connection_management.rb:16 in with_database_connections
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:127 in block in run_callbacks
/gems/actioncable-7.0.4.3/lib/action_cable/engine.rb:71 in block (4 levels) in <class:Engine>
/gems/activesupport-7.0.4.3/lib/active_support/execution_wrapper.rb:92 in wrap
/gems/actioncable-7.0.4.3/lib/action_cable/engine.rb:66 in block (3 levels) in <class:Engine>
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:127 in instance_exec
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:127 in block in run_callbacks
/gems/activesupport-7.0.4.3/lib/active_support/callbacks.rb:138 in run_callbacks
/gems/actioncable-7.0.4.3/lib/action_cable/server/worker.rb:42 in work
/gems/actioncable-7.0.4.3/lib/action_cable/server/worker.rb:58 in invoke
/gems/actioncable-7.0.4.3/lib/action_cable/server/worker.rb:53 in block in async_invoke
/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:352 in run_task
/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:343 in block (3 levels) in create_worker
/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334 in loop
/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334 in block (2 levels) in create_worker
/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333 in catch
/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333 in block in create_worker

any thoughts?
thanks

@dentarg
Member

dentarg commented Apr 24, 2023

I have no thoughts on that; I can't see how Puma would be causing it, but then I don't know your full stack, setup, etc. I think you need to dig in and understand what's going on yourself. You can't expect support for your production environment in open source bug trackers (sorry if that sounds harsh, but it is the reality: we can only act on easily reproducible examples)

@vitalinfo
Author

@dentarg I think my issue could be related to #3122, but I need to dig into the ActionCable and Redis connection initialisation; verify_mode is getting lost somewhere

@johnnyshields
Contributor

johnnyshields commented May 9, 2023

In the case of #3122 it was fixed by upgrading the non-Puma third-party library. It's possible that what you're experiencing is a bug in either the Redis driver or ActionCable. I use Redis as a key-value store in my app, and I also push messages to it in pub/sub mode (but not with ActionCable), and in my use case on Puma 6.2.2 I am not seeing any issue.

@johnnyshields
Contributor

@vitalinfo try it without fork_worker in your puma.rb and see if that resolves the issue.

Here's what ChatGPT says, which seems plausible:

In Rails, I am using ActionCable and Redis with Puma. In Puma I am using the fork_worker option. How can I configure my puma.rb to avoid an issue when the worker forks?

When using the fork_worker option in Puma with Rails, ActionCable, and Redis, you might run into issues because the Redis connection isn't being properly re-established after forking. To avoid this issue, you can configure your puma.rb to properly handle the forking process by reconnecting to Redis in the on_worker_fork block.

Here's a sample puma.rb configuration file that demonstrates how to handle this:

# puma.rb

# Set the number of workers you want to use
workers ENV.fetch('PUMA_WORKERS') { 2 }.to_i

# Use the 'fork_worker' option
fork_worker

# Other Puma configurations
threads_count = ENV.fetch('RAILS_MAX_THREADS') { 5 }.to_i
threads threads_count, threads_count
port ENV.fetch('PORT') { 3000 }
environment ENV.fetch('RAILS_ENV') { 'development' }
plugin :tmp_restart

# Configure the 'on_worker_fork' block
on_worker_fork do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.connection.disconnect!
  end

  if defined?(ActionCable::Server::Redis)
    ActionCable.server.config.redis_connector.disconnect!

    # Re-establish the Redis connection for each worker after forking
    ActionCable.server.config.redis_connector = ActionCable::Server::RedisConnector.new(
      { url: ENV.fetch('REDIS_URL') { 'redis://localhost:6379/1' } },
      ActionCable.server.config.redis_pool_size
    )
  end
end

# Reconnect ActiveRecord and Redis after worker boots
on_worker_boot do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
end

This configuration ensures that each worker process establishes its own Redis connection after forking, avoiding issues with shared connections across worker processes.

@dentarg
Member

dentarg commented Jun 9, 2023

@vitalinfo have you made any progress?

I think we should close this issue, it seems to be something with your environment and application, not something odd with Puma, as many others have reported running Puma 6.2 without problems.

@vitalinfo
Author

@dentarg just wanted to share my progress:
I've updated to Puma 6.3 and added

on_worker_boot do
  ActionCable.server.restart
end

and for a few days it has looked solid on staging; I'll try it in production next week

@vitalinfo
Author

vitalinfo commented Jun 12, 2023

@dentarg production doesn't look stable with 6.3
image

after revert to Puma 5.6.5
image

@johnnyshields
Contributor

@vitalinfo have you confirmed that the issue is strictly limited to ActionCable? Does disabling ActionCable make the errors go away?

@TafadzwaD

@vitalinfo Were you able to determine the root cause of this? We're having the same issue on 6.x versions of Puma on Rails.

@johnnyshields
Contributor

@TafadzwaD can you give us further info about your environment?

@aovertus

aovertus commented Dec 20, 2023

Hi there, we are seeing a significant leak in the number of connections as well. Our servers would stop responding at some point.

I took a snapshot (not ideal) just before restarting. We run in cluster mode; one of the workers is completely fine, the other is running wild. We are taking a chance on downgrading to v5 for now.

We run on Fargate with 1 CPU and 4 GB RAM.
Rails 7.1
Ruby 3.2
sentry-ruby 5.15.1 (background worker 5)
Sidekiq Pro 7.2

close_wait.txt

@dentarg
Member

dentarg commented Dec 20, 2023

close_wait.txt

@aovertus Interesting, but can you spell out the column headers so we don't have to guess?

@aovertus

@dentarg sorry about that, I just updated the file

@aovertus

I'll throw in an extra note here: we've had no downtime since we downgraded to 5.6.7

10 participants