
Use puma worker count equal to processor count in production #46838

Merged
merged 1 commit into main from puma-production-worker-count on Dec 28, 2022

Conversation

dhh
Member

@dhh dhh commented Dec 27, 2022

Use all the processor performance of the host by default in production.
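In practice this means the generated config/puma.rb now defaults the worker count to the machine's core count in production, still overridable via WEB_CONCURRENCY. A minimal sketch of the idea (not the exact template wording):

```ruby
# config/puma.rb -- sketch of the new production default, assuming
# concurrent-ruby is available (it ships as a Rails dependency).
require "concurrent"

if ENV["RAILS_ENV"] == "production"
  # One worker per physical core unless WEB_CONCURRENCY overrides it.
  worker_count = Integer(ENV.fetch("WEB_CONCURRENCY") { Concurrent.physical_processor_count })
  workers worker_count if worker_count > 1
end
```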

@rails-bot rails-bot bot added the railties label Dec 27, 2022
@dhh dhh merged commit 839ac1e into main Dec 28, 2022
@dhh dhh deleted the puma-production-worker-count branch December 28, 2022 02:09
@lazaronixon
Contributor

lazaronixon commented Dec 28, 2022

@natematykiewicz
Contributor

natematykiewicz commented Dec 29, 2022

Optimizing the default production config for the cheapest possible Heroku plan seems pretty unreasonable. People who are on plans that cheap can set their WEB_CONCURRENCY env var if they don't have enough resources to run 2 processes (the dyno has 2 cores) with 512 MB of RAM.

This would be in 7.1, which is a "major" version in Rails terms, and it will only affect people who accept the diff on the generated file. A few commits ago, WEB_CONCURRENCY was in the file but commented out, and you were expected to uncomment it. So a decent number of people (myself included) have already set the value anyway.
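For context, the old template shipped with the workers line commented out, roughly like this (from memory, not an exact quote):

```ruby
# config/puma.rb (previous template, roughly): you had to uncomment this
# line yourself to run more than one worker process.
# workers ENV.fetch("WEB_CONCURRENCY") { 2 }
```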

The article you quoted says this:

to fully make use of multiple cores, your application should have a process count that matches the number of physical cores on the system.

That's exactly what this PR does. Seems like a pretty sane default.

@lazaronixon
Contributor

lazaronixon commented Dec 29, 2022

But even the Performance-M plan only gives you 2.5 GB of memory. For 4 cores the recommended memory is 8 GB, which is not offered by most cloud services in this range. IMO we could leave it as before, as a comment, and use 2 as the default.

The first line of the article makes it clear:

Increasing process count increases RAM utilization, which can be a limiting factor.

If memory usage were the same regardless of the number of workers, I wouldn't have any objections here.

@natematykiewicz
Contributor

natematykiewicz commented Dec 29, 2022

Performance-M is not a good choice for Rails apps. https://judoscale.com/guides/how-many-dynos


@lazaronixon
Contributor

So we are in an even worse situation: just 1024 MB of RAM. Even in the article they recommend a WEB_CONCURRENCY of 2, whereas with the current Puma configuration file this value would be set to 4 or even 8 (I'm not sure).

@natematykiewicz
Contributor

Looks like I misspoke earlier: the Free (now Eco) dynos have 1 CPU, which means this change does nothing for Eco dynos.

@lazaronixon
Contributor

lazaronixon commented Dec 29, 2022

No, it has 1 dyno, 4 physical cores, and 4 virtual cores, but I'm not sure whether you see 4 or 8 using Concurrent.physical_processor_count. 😬

https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#process-count-value

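An easy way to check which number a given machine or dyno actually reports (illustrative; run from a console on the target host):

```ruby
require "concurrent"

Concurrent.physical_processor_count # physical cores
Concurrent.processor_count          # logical cores, including hyperthreads
```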

@trevorturk
Contributor

trevorturk commented Jan 2, 2023

FWIW, on Heroku, with a Standard-1X dyno, Concurrent.physical_processor_count returns 4, whereas I believe you'll want to set WEB_CONCURRENCY=1 or your Dyno Load would go over 1: https://devcenter.heroku.com/articles/metrics#dyno-load.

I don't have a strong opinion on changing this default, but I do think it may be slightly problematic for Heroku users. Perhaps Heroku could update their documentation, or set a default ENV var for you. Rails Guides could also detail some caveats, and it may be worth considering a comment in this template file for clarity.

@lazaronixon
Contributor

lazaronixon commented Jan 2, 2023

That's the point 👍. On Heroku, you can set SENSIBLE_DEFAULTS and it will set the correct WEB_CONCURRENCY based on the plan, but the fact is that most cloud services don't give you enough memory to run a worker on every processor.

https://devcenter.heroku.com/changelog-items/618

@simi
Contributor

simi commented Jan 2, 2023

ℹ️ This could also be a problem in K8s deploys, where scaling should be done through pods, not by running daemons and multiple processes.

@trevorturk
Contributor

Oh interesting about Heroku's SENSIBLE_DEFAULTS, I hadn't seen that before. Thanks for the tip. In that case, it looks like 2 is Heroku's recommended WEB_CONCURRENCY for a Standard-1x Dyno. (I'm using Falcon instead of Puma, which wants 1 process per CPU.)

I still think it may be fine to change this default, but it seems like some additional documentation for cloud hosting providers might be nice to have.

zzak added a commit that referenced this pull request Jan 7, 2023
@bdewater
Contributor

bdewater commented Jan 8, 2023

ℹ️ This could also be a problem in K8s deploys, where scaling should be done through pods, not by running daemons and multiple processes.

Scaling with pods only means missing out on memory savings via copy-on-write.

I'm no K8s expert, but does it really matter from a container perspective whether it's a single worker or a master process with forked workers that is accepting requests? Based on puma/puma#2645 (comment) it doesn't seem to.
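For what it's worth, the copy-on-write savings come from preloading the app before forking, along these lines (a generic Puma example, not tied to this PR's template):

```ruby
# config/puma.rb -- generic forked-worker setup: preloading the app before the
# workers fork lets the processes share memory pages via copy-on-write.
workers ENV.fetch("WEB_CONCURRENCY") { 2 }
preload_app!

threads 5, 5
```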

@mindlace

mindlace commented Jan 10, 2023

This implementation doesn't return the right values when running in a container (it will always return 1).

To handle the container case, the code should look at /proc/1/sched and if that doesn't start with init, it should look at /sys/fs/cgroup/cpuset/cpuset.cpus to get the number allocated to that container.

For this specific PR, choosing to use the physical_processor_count method will also give incorrect values when running under jruby.
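A rough sketch of the container-aware lookup described above (assuming cgroup v1 paths and a cpuset list like "0-3" or "0,2-5"; a real patch would also need the /proc/1/sched check and cgroup v2 support):

```ruby
require "concurrent"

# Hypothetical helper, not part of concurrent-ruby: count the CPUs allotted to
# this container from the cgroup v1 cpuset, falling back to the host count.
def container_processor_count
  cpuset_path = "/sys/fs/cgroup/cpuset/cpuset.cpus"
  return Concurrent.physical_processor_count unless File.exist?(cpuset_path)

  File.read(cpuset_path).strip.split(",").sum do |part|
    first, last = part.split("-").map(&:to_i)
    (last || first) - first + 1
  end
end
```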

@dhh
Member Author

dhh commented Jan 10, 2023

The old default was 1, so that would amount to the same. But it would be great if we could get this fixed up so that we use all available cores inside a container too by default. Please do investigate a patch ✌️

@simi
Contributor

simi commented Jan 10, 2023

This implementation doesn't return the right values when running in a container (it will always return 1).

To handle the container case, the code should look at /proc/1/sched and if that doesn't start with init, it should look at /sys/fs/cgroup/cpuset/cpuset.cpus to get the number allocated to that container.

For this specific PR, choosing to use the physical_processor_count method will also give incorrect values when running under jruby.

Would you mind reporting this to https://github.com/ruby-concurrency/concurrent-ruby/issues/new?

@dhh
Member Author

dhh commented Jan 21, 2023

I just tried to replicate the problem, but couldn't. Concurrent.physical_processor_count reports the correct number of available cores when called from within a Docker container running on a VM.
