Run GC out-of-band by default #50449

Open
dhh opened this issue Dec 26, 2023 · 11 comments

dhh (Member) commented Dec 26, 2023

Ruby's GC can be a bit of a crude beast, and from the perspective of an application, it's awakened seemingly at random. When it runs, it smites the in-flight request with a serious latency penalty, through no fault of that request's own. We've seen dramatic improvements in our P99 and P100 latencies on HEY by running OOB GC, following @byroot's excellent deep-dive into the Shopify OOB GC work.

So let's enable this great trick by default, and if we need levers to control it, let's keep them few and simple.

We can use https://www.rubydoc.info/gems/puma/Puma%2FDSL:out_of_band.
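
For reference, a minimal sketch of what this could look like in config/puma.rb using Puma's out_of_band hook (the every-16-calls throttle is purely illustrative, not a recommended default):

    # Sketch only: run GC between requests via Puma's out_of_band hook,
    # throttled to roughly once every 16 out-of-band opportunities
    # (an arbitrary example value).
    oob_requests = 0
    out_of_band do
      oob_requests += 1
      if oob_requests >= 16
        GC.start
        oob_requests = 0
      end
    end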

cc @byroot

dhh modified the milestone: 8.0.0 on Dec 26, 2023
byroot (Member) commented Dec 27, 2023

As discussed on Campfire, it's debatable whether this is a good default or not.

Every major GC that triggers past the boot sequence is, in my opinion, a Ruby GC bug (or a misconfiguration). Peter Zhu and I fixed many of these "bugs" in Ruby 3.3, and I'll continue to work with him to make major GC as rare as possible in future Ruby versions.

OOB GC is a good workaround for these "bugs" if you really care about minimizing tail latency and don't mind trading off some throughput. Especially on small deployments, Puma's out_of_band would be a throughput killer. And from the online discussion chatter I see (pure biased perception on my part), most users seem to care more about throughput (which directly translates into hosting cost, so that makes sense).

Also, major GC gets really painful the larger your heap is. In Shopify's monolith a major GC takes ~4s, which is awful, but on more moderately sized applications, I don't think it's bad enough to be worth this tradeoff. I'd be curious to know how long GC.start takes on HEY.

That said, Ruby's GC can be a latency hog, so perhaps an alternative solution is to make GC pauses more visible and direct people toward https://github.com/Shopify/autotuner to improve things if they notice a problem.

Something that has been on my personal TODO for a while is to make it trivial to add extra timings to the request logs, because I wanted to make a gem that adds GC and GVL time in there. But now that I've convinced Koichi to add GC.total_time back in Ruby 3.1, we could actually do that by default.
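
For illustration, a minimal sketch of that kind of instrumentation as a Rack middleware, assuming Ruby 3.1+ for GC.total_time (the class name and log format are made up):

    require "logger"

    # Sketch only: logs how much GC time (GC.total_time, in nanoseconds,
    # Ruby 3.1+) was spent while serving each request.
    class GCTimeLogging
      def initialize(app, logger: Logger.new($stdout))
        @app = app
        @logger = logger
      end

      def call(env)
        gc_before = GC.total_time
        status, headers, body = @app.call(env)
        gc_ms = (GC.total_time - gc_before) / 1_000_000.0
        @logger.info("gc_time=#{gc_ms.round(2)}ms")
        [status, headers, body]
      end
    end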

dhh (Member, Author) commented Dec 27, 2023

What drawbacks do you see with running GC OOB as a bridging default while all the GC bugs and issues are getting worked out? That the GC might run too often? Curious to understand: if, say, a GC run takes 200ms on a medium-sized app (I believe that's what it was for HEY), would running the GC twice as frequently as it's otherwise triggered cost 2x 200ms or 2x 100ms?

Unless there's a large additional overhead to running the GC somewhat more often, it seems that running it OOB by default is a win? Maybe we could also try to autotune the GC variables on the basis of how much RAM is available to the VM when booting?

byroot (Member) commented Dec 27, 2023

What drawbacks do you see with running GC OOB

Throughput impact. While the process is running GC, it's not accepting or processing requests. It's even worse with multi-threaded Puma, because you first need to wait for all in-flight requests to complete before you are "out of band".

Maybe we could also try to autotune the GC variables on the basis of how much RAM is available to the VM

GC tuning is not only a function of available RAM, but mainly of the application profile, e.g. how many resident objects, how many allocations per request, etc.
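
As an illustration of those application-profile numbers, a minimal sketch of a Rack middleware that reports them from GC.stat (the class name and output format are made up):

    # Sketch only: surfaces allocations per request and resident heap size
    # from GC.stat, the kind of numbers GC tuning depends on.
    class AllocationStats
      def initialize(app)
        @app = app
      end

      def call(env)
        allocated_before = GC.stat(:total_allocated_objects)
        response = @app.call(env)
        allocated = GC.stat(:total_allocated_objects) - allocated_before
        live_slots = GC.stat(:heap_live_slots)
        $stdout.puts("allocations_per_request=#{allocated} heap_live_slots=#{live_slots}")
        response
      end
    end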

ko1 (Contributor) commented Dec 27, 2023

One idea is:

  • Support GC.disable(:major), which allows minor GC but not major GC
  • Call GC.start if GC.need_major_gc after processing a request
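
A minimal sketch of how a Rack middleware could express this idea (note that GC.disable(:major) and GC.need_major_gc are the proposed, not-yet-existing APIs from the bullets above; the middleware name is made up):

    # Sketch only: relies on the proposed GC.disable(:major) and
    # GC.need_major_gc APIs, neither of which exists in Ruby today.
    class DeferredMajorGC
      def initialize(app)
        @app = app
        GC.disable(:major) # hypothetical: keep minor GC, block major GC
      end

      def call(env)
        response = @app.call(env)
        # hypothetical predicate: run the deferred major GC once the request is processed
        GC.start if GC.need_major_gc
        response
      end
    end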

On a multi-threaded (Puma-based?) web application it does not work, though, because the other threads stop during a major GC.

More directly, the incremental GC is probably not tuned well enough.

byroot (Member) commented Dec 27, 2023

One idea is

I was thinking the same. I guess I need to write the feature request :) (FYI @peterzhu2118)

On a multi-threaded (Puma-based?) web application it does not work, though, because the other threads stop during a major GC.

Yes, Puma would need another hook to handle this correctly, but I'm sure that can be added to Puma if the Ruby feature justifies a new hook.

byroot (Member) commented Dec 27, 2023

Thinking more about it, this should also prevent object promotion; otherwise it may make the issue worse.

On paper, once boot is done, no object should ever survive a request cycle (of course it's never perfect in practice), so if we prevent promotion, it should also remove the need for a major GC.

dhh (Member, Author) commented Dec 28, 2023

I would like to see us pursue both paths at the same time: offer an OOB GC default (even if it requires a 1:1 worker/thread ratio), and also work to remove the need for GC'ing like this between requests at all.

byroot (Member) commented Dec 28, 2023

offer an OOB GC default

Again, OOB GC is a hard tradeoff that I'm not comfortable defaulting onto users; it's really something you need to know is there and be ready to tune. In other words, it's way too sharp a tool to be enabled by default.

Additionally, it's basically incompatible with small-scale deployments, e.g. a single-worker Puma on Heroku: when OOB GC triggers, you immediately go to 0 capacity for the duration of the GC, which is easily multiple hundred milliseconds. It's only viable if you have a decent number of processes.

But more importantly, it's not a boolean config, as you need to select a frequency that is heavily dependent on the app's size and profile, and also on what gems you use (native gems that create unprotected objects tend to cause major GC a lot more).

So I think we could add a commented-out snippet in the generated puma.rb, but I'm against enabling it by default.

dhh (Member, Author) commented Dec 28, 2023

So I guess what I'm most curious about is whether we can find a way to only run GC when it's actually close to being required anyway. Because here's how I see the two scenarios:

Request A triggers in-request GC. During this time, no new clients are accepted by the server AND the current client is made to wait. Request B needs to wait until the processing of A and the GC are complete to be served.

Request X puts the system over the GC limit. The GC is scheduled to run OOB after the request. Request Y needs to wait until X is done and the GC is done to be served.

The only way the second scenario seems worse than the first to me is if we are running GC more often than we normally would. Presumably we can find a way not to do that?

byroot (Member) commented Mar 11, 2024

We discussed this at a recent Shopify ruby-infra gathering and will try to get the feature described in #50449 (comment) into Ruby 3.4.

One slight note, though: we believe that disabling major GC must also disable promotion of objects to the old generation.

I'll update this issue once the feature request is open upstream.

byroot (Member) commented Apr 22, 2024
