Something odd with Puma 6.0.0 / 6.0.1 #2998
Comments
Probably not the problem, but can you try again with |
@nateberkopec we used to
where |
Related to #2999 perhaps?
Or #3000?
@vitalinfo Thanks. You're right, we didn't change the implementation of wait_for_less_busy_worker in 6.0, so if it was already on for you in Puma 5, there's no need to change the setting. I agree with @dentarg that it could be related to #2999 or #3000, though. An application that doesn't return a response promptly could easily cause the request queueing noticed here.
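For reference, a minimal puma.rb sketch of that setting is below. The option name is Puma's wait_for_less_busy_worker DSL method; the worker and thread counts are illustrative assumptions, not this app's actual configuration.

# puma.rb (sketch)
workers 2
threads 5, 5

# Enable the behaviour explicitly: a busy worker briefly delays accepting
# from the listen socket so that a less busy worker can take the connection
# instead. Called without an argument it uses Puma's default delay.
wait_for_less_busy_worker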
Hi there, I can confirm there's something going on. After upgrading a bunch of gems, we started to get huge memory spikes. Since each gem was upgraded in a separate commit (thanks dependabot) I was able to track down the faulty commit using git bisect: deploy, wait for stats, rinse & repeat. The commit that I ended up on was the Puma upgrade. Every time Puma 6 was part of the upgraded gems, the memory went wild.
@simonc @vitalinfo can you see if 3d33475 (#3002) resolves the issue for you?
@dentarg So far (2h) running smoothly
We noticed exactly the same issue. @simonc did the proposed fix resolve your issue? :)
@GMorris-professional Not 24h later, but long enough, and it's still running normally, no huge memory spike or anything 😊
@dentarg |
@dentarg after Puma revert to 5.6.5 |
Thank you for the report. I'm not sure how best to proceed. Your app has problems, but others (like #3022 (comment)) may not. So, any additional info would be helpful: maybe your Gemfile.lock file, any particular features you may or may not be using (in both the app & Puma), etc. Puma v6 has a handful of new tests that are a bit more 'real world'. Obviously, more are needed.
@MSP-Greg here is our Gemfile
I don't know what other information could be helpful. I've posted the Puma config above; we are using ActionCable, and we don't use Heroku autoscaling but use Hirefire. If you need anything else, please let me know.
Thanks.
I'll see what I can find...
There is some code in Puma that I found odd in both v5 & v6 related to the handling of '101 Switching Protocols' responses. Since no one had problems with v5, I thought I'd take another look after the holidays. Two questions:
|
|
@vitalinfo can you try to reproduce your issue outside of your application, in an app anyone can run?
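For anyone attempting that, a bare-bones skeleton might look like the sketch below; the two files, the worker/thread counts and the 'hey' load-generator command are assumptions for illustration, not taken from the affected app.

# config.ru (sketch) -- a minimal Rack app anyone can run
run ->(env) { [200, { 'content-type' => 'text/plain' }, ['ok']] }

# puma.rb (sketch) -- mirror the production-style settings under suspicion
workers 2
threads 5, 5
wait_for_less_busy_worker

# Start it with:   bundle exec puma -C puma.rb config.ru
# Then drive load (e.g. with the 'hey' load generator) and watch memory
# and request queueing:   hey -z 60s -c 50 http://127.0.0.1:9292/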
@dentarg I think that's going to be a problem, because it doesn't repro for us on staging, only in production, where the throughput is much higher.
If you can't provide a repro I think you need to dig into your production environment and try to understand more what's going on. The puma open source project can't really do that or offer that level of support. Sure, Greg may provide a patch for you to test (not something you should expect really) but it would be great for everyone to be able to understand what's going on for real instead of us guessing.
Not sure about that yet. Since our CI is passing, and ActionCable seems to be using Puma v6, I'm not sure what the issue is. It's been a while since I've run the Rails test suite, but I hope to do so very soon.

Re ActionCable, there are a few ways to set up WebSocket connections. Currently, I think most apps are set up such that when they receive a GET request with WebSocket upgrade headers, the app grabs the socket and Puma is 'removed', so the app is handling the '101' response itself. The app can also return headers for a '101' response, in which case Puma should send the response with the headers returned by the app, and then hand the socket over to the app. I don't believe that Puma is handling the second case correctly, but that's been the case for both v6 and v5 (haven't checked older versions). I have code for a fix, but I need to write tests, etc. Currently, a '101' response is considered a 'no body' response, so the socket isn't handed over to the app. I think...
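To make the two cases above concrete, here is a rough Rack-level sketch; this is an assumed illustration of the full vs. partial hijack styles described, not ActionCable's or Puma's actual code.

# Case 1: full hijack. The app takes the socket immediately and writes the
# '101 Switching Protocols' response itself; the server sends nothing.
full_hijack_app = lambda do |env|
  io = env['rack.hijack'].call   # the app now owns the raw socket
  io.write "HTTP/1.1 101 Switching Protocols\r\n" \
           "Upgrade: websocket\r\nConnection: Upgrade\r\n\r\n"
  # ... speak the WebSocket protocol on io from here on ...
  [-1, {}, []]                   # status/headers/body are ignored after a full hijack
end

# Case 2: partial hijack. The app returns a 101 status and headers and
# expects the server to write them, then hand the socket to the callable
# stored under 'rack.hijack' in the response headers.
partial_hijack_app = lambda do |env|
  hijack = lambda do |io|
    # ... the app takes over io once the server has written the 101 headers ...
  end
  [101, { 'Upgrade' => 'websocket', 'Connection' => 'Upgrade', 'rack.hijack' => hijack }, []]
end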
Would be interesting to see object allocations and memory usage, which would indicate if there is a memory leak causing this. Scout APM has a good tool for it; you could sign up for a free trial and check it out.
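If setting up an APM isn't an option, a very crude way to watch this from Puma itself is sketched below (an assumed puma.rb snippet, not something this app is known to use); it logs live object slots and RSS per worker once a minute, Linux-only because it reads /proc.

# puma.rb (sketch): crude per-worker memory logging
on_worker_boot do
  Thread.new do
    loop do
      rss_kb = File.readlines("/proc/self/status")
                   .grep(/\AVmRSS/).first.to_s[/\d+/].to_i
      puts "[puma worker #{Process.pid}] live_slots=#{GC.stat(:heap_live_slots)} rss_kb=#{rss_kb}"
      sleep 60
    end
  end
end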
@vitalinfo have you been able to take any further steps to isolate the issue?
@johnnyshields not yet, it's in my plan for Feb
@vitalinfo did you manage to spend some further time on this? Again, I would recommend Scout APM (they have a 14-day free trial), which can be really useful for pinpointing memory issues, speaking from first-hand experience.
@vitalinfo any news?
Hi @dentarg, we recently upgraded our Puma version to 6.0.2 from 5.6.5, and everything appeared to be working fine in our staging environment. However, after reviewing @vitalinfo's comment (#2998 (comment)) regarding their production app, we are hesitant about deploying Puma 6.0.2 to our own production app until we can identify the root cause of the issue. Would you advise waiting until we have further information on @vitalinfo's application before proceeding with the deployment?
Same here ^^
I would advise you to make your own decisions :-)
I really don't know about v6.0.x but we've been running on v6.1 for two months now and it's running smoothly 😉
@dentarg sorry for the delay. Today I tried one more time in production, with an almost-latest version of Puma, 6.2.1, but I faced another problem: the exceptions start right after the new Puma deploy and stop once I revert to version 5. We are using Redis 6.2 on Heroku, and it looks like it is configured correctly according to their docs: https://devcenter.heroku.com/articles/connecting-heroku-redis#connecting-in-ruby
full backtrace
any thoughts?
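For comparison, the Heroku doc linked above amounts to something like the sketch below for Heroku Redis 6 (TLS with self-signed certificates, so verification is relaxed); this is an assumed illustration, not the app's actual initializer.

# config/initializers/redis.rb (sketch)
require "redis"
require "openssl"

# Heroku Redis 6+ connects over TLS (rediss://) with self-signed
# certificates, so certificate verification is disabled per their docs.
REDIS = Redis.new(
  url: ENV.fetch("REDIS_URL"),
  ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE }
)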
I have no thoughts on that, I can't see how Puma would be causing that but then I don't know your full stack, set up etc. I think you need to dig in and understand what's going on yourself. You can't expect support for your production environment in open source bug trackers (sorry if it sounds harsh but it is the reality - we can only act on easily reproducible examples)
In the case of #3122 it was fixed by upgrading the non-puma third party library. It's possible that what you are experiencing is a bug in either the redis driver or action cable. I use redis as a key-value store in my app, and I also push messages to it in pub-sub mode (but not with ActionCable), and in my use case on Puma 6.2.2 I am not seeing any issue.
@vitalinfo try it without
Here's what ChatGPT says, which seems plausible:
# puma.rb
# Set the number of workers you want to use
workers ENV.fetch('PUMA_WORKERS') { 2 }.to_i
# Use the 'fork_worker' option
fork_worker
# Other Puma configurations
threads_count = ENV.fetch('RAILS_MAX_THREADS') { 5 }.to_i
threads threads_count, threads_count
port ENV.fetch('PORT') { 3000 }
environment ENV.fetch('RAILS_ENV') { 'development' }
plugin :tmp_restart
# Configure the 'on_worker_fork' block
on_worker_fork do
ActiveSupport.on_load(:active_record) do
ActiveRecord::Base.connection.disconnect!
end
if defined?(ActionCable::Server::Redis)
ActionCable.server.config.redis_connector.disconnect!
# Re-establish the Redis connection for each worker after forking
ActionCable.server.config.redis_connector = ActionCable::Server::RedisConnector.new(
{ url: ENV.fetch('REDIS_URL') { 'redis://localhost:6379/1' } },
ActionCable.server.config.redis_pool_size
)
end
end
# Reconnect ActiveRecord and Redis after worker boots
on_worker_boot do
ActiveSupport.on_load(:active_record) do
ActiveRecord::Base.establish_connection
end
end
|
@vitalinfo have you made any progress? I think we should close this issue, it seems to be something with your environment and application, not something odd with Puma, as many others have reported running Puma 6.2 without problems.
@dentarg just wanted to share my progress
and after a few days it looks solid on staging; will try in production next week
@dentarg production doesn't look stable with 6.3
@vitalinfo have you confirmed that the issue is strictly limited to ActionCable? Does disabling ActionCable make the errors go away?
@vitalinfo Were you able to determine the root cause of this? We're having the same issue on 6.x versions of Puma on Rails.
@TafadzwaD can you give us further info about your environment?
Hi there, we are seeing a significant leak in the number of connections as well. Our servers would stop responding at some point. I took a snapshot (not ideal) just before restarting. We run in cluster mode; one of the workers is completely fine, the other is running wild. We are taking a chance on downgrading to v5 for now. We run on Fargate with 1 CPU and 4 GB RAM.
@aovertus Interesting, but can you spell out the column headers so we don't have to guess?
@dentarg sorry about that, I just updated the file
I'll throw an extra note here: we have had no downtime since we downgraded to |
Info for 6.0.1: #2998 (comment)
I don't have any clue what is going wrong or where, but I can share our New Relic graphs (last 3 hours, last 1 hour)
the point where the graphs go back to normal is when I downgraded back to Puma 5.6.5
@nateberkopec any thoughts or ideas on what could have gone wrong and had such an impact on the app?
Thanks