gc.freeze() - an API to mark objects as uncollectable #75739
When you're forking many worker processes off of a parent process, the resulting children are initially very cheap in memory. They share memory pages with the base process until a write happens [1]_. Sadly, the garbage collector in Python touches every object's PyGC_Head during a collection, even if that object stays alive, undoing all the copy-on-write wins. Instagram disabled the GC completely for this reason [2]_. This fixed the COW issue but made the processes more vulnerable to memory growth due to new cycles being silently introduced when the application code is changed by developers. While we could fix the most glaring cases, it was hard to keep the memory usage at bay. We came up with a different solution that fixes both issues. It requires a new API to be added to CPython's garbage collector.

**gc.freeze()**

As soon as possible in the lifecycle of the parent process we disable the garbage collector. Then we call a new API called `gc.freeze()`, which moves all objects currently tracked by the collector into a permanent generation that is never scanned again. After calling `gc.freeze()` and forking, the child processes can safely re-enable the garbage collector: the permanent generation's PyGC_Head fields are never touched afterwards, so those shared pages stay shared.

Why do we need to disable the collector on the parent process as soon as possible? When the GC cleans up memory in the meantime, it leaves space in pages for new objects. Those pages become shared after fork, and as soon as the child process starts creating its own objects, they will likely be written to the shared pages, initiating a lot of copy-on-write activity.

In other words, we're wasting a bit of memory in the shared pages to save a lot of memory later (that would otherwise be wasted on copying entire pages after forking).

**Other attempts**

We also tried moving the GC head to another place in memory. This creates some indirection, but cache locality on that segment is great, so performance isn't really hurt. However, this change introduces two new pointers (16 bytes) per object. That doesn't sound like a lot, but given millions of objects and tens of processes per box, this alone can cost hundreds of megabytes per host, memory that we wanted to save in the first place. So that idea was scrapped.

**Attribution**

The original patch is by Zekun Li, with help from Jiahao Li, Matt Page, David Callahan, Carl S. Shapiro, and Chenyang Wu.

.. [1] https://en.wikipedia.org/wiki/Copy-on-write
.. [2] https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172
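To make the proposed workflow concrete, here is a minimal sketch of the prefork pattern described above. The `warm_up()` function and the worker count are hypothetical placeholders; `gc.freeze()` is the API this issue proposes, and `os.fork()` is POSIX-only:

```python
import gc
import os

def warm_up():
    # Hypothetical placeholder: import modules, parse config, build caches, etc.
    return [str(i) * 8 for i in range(100_000)]

gc.disable()       # disable the collector as early as possible in the parent
cache = warm_up()  # allocate the long-lived state meant to be shared
gc.freeze()        # move everything tracked so far into the permanent generation

workers = []
for _ in range(4):
    pid = os.fork()
    if pid == 0:
        gc.enable()  # children can re-enable the GC; frozen objects are skipped
        # ... serve requests using the shared, effectively read-only `cache` ...
        os._exit(0)
    workers.append(pid)

for pid in workers:
    os.waitpid(pid, 0)
```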
This is only useful if the parent process has a lot of memory that's never used by the child processes, right? Otherwise, you would still lose the pages to copy-on-write triggered by refcount updates.
Nice idea! I think it helps more than just forking applications that want to share memory; for example, web workers that load the application after fork (uWSGI's --lazy-app option) could use it too.
AFAIK, the Python shutdown process runs a full GC, so not touching the permanent generation makes shutdown faster. Of course, it also means cycles in the permanent generation are never collected, even at shutdown, though collecting the permanent generation during shutdown wouldn't make much sense anyway. So I think these notable downsides should be documented.
I think the basic idea makes a lot of sense, i.e. have a generation that is never collected. An alternative way to implement it would be to have an extra generation: rather than just 0, 1, and 2, also have generation 3. Collection would by default never touch generation 3, which would be equivalent to the frozen generation, but you could still force a collection by calling gc.collect(3). Whether that generation should be collected on shutdown would still be a question.

If this gets implemented, it will impact the memory-bitmap-based GC idea I have been prototyping. Currently I am thinking of using two bits for each small GC object. The bits would mean: 00 - untracked, 01 - gen 0, 10 - gen 1, 11 - gen 2. With the introduction of a frozen generation, I would have to use another bit, I think.

Another thought is that maybe we don't actually need three generations as they are currently used. We could have gen 0, which is collected frequently, and gen 1, which is collected rarely. The frozen objects could go into gen 2, which would not be automatically collected or would have a user-adjustable collection frequency. Collection of gen 1 would not automatically move objects into gen 2.

I think bpo-31105 (https://bugs.python.org/issue31105) is also related. The current GC thresholds are not very good. I've looked at what Go does: its GC collections are triggered by a relative increase in memory usage. Python could perhaps do something similar. The accounting of actual bytes allocated and deallocated is tricky because the *_Del/Free functions don't actually know how much memory is being freed, at least not in a simple way.
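For context, here is a short sketch of the three-generation interface this comment builds on; everything below except the hypothetical `gc.collect(3)` mentioned in the comments is the existing `gc` API:

```python
import gc

print(gc.get_threshold())  # (700, 10, 10) by default: per-generation thresholds
print(gc.get_count())      # pending allocation counts for generations 0, 1, 2

gc.collect(0)  # collect only the youngest generation
gc.collect(2)  # full collection across all three generations

# Under the idea sketched above, a hypothetical gc.collect(3) would force a
# collection of the extra, never-automatically-collected generation.
```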
I like the idea of a fourth generation that never gets collected. This would have been useful for the original problem that inspired me to add the ... It's unfortunate that you'd have to add a bit to handle this, but maybe you're right that we only really need three generations.
On 25/09/2017 at 20:55, Neil Schemenauer wrote:

API-wise it would sound better to have a separate gc.collect_frozen()... Though I think a gc.unfreeze() that moves the frozen generation into the oldest regular generation would be more useful.
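For reference, this is roughly the shape the API eventually took in Python 3.7: `gc.unfreeze()` moves the permanent generation back into the oldest regular generation rather than collecting it directly. A minimal sketch:

```python
import gc

gc.freeze()                   # move all currently tracked objects to the permanent generation
print(gc.get_freeze_count())  # how many objects are now frozen

gc.unfreeze()                 # move them back into the oldest regular generation
print(gc.get_freeze_count())  # 0: a subsequent gc.collect() can reach them again
```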
Yeah... It's worse than that. Take for example a bytearray object: the object itself is small, but its payload is allocated separately and can be arbitrarily large, and the deallocation functions don't know how much memory is actually being released.

IMHO, the only reliable way to use memory footprint to drive the GC is to ask the OS (or the allocator) for the process's actual memory usage (*).

(*) And let's not talk about hairier cases, such as having multiple objects share the same underlying buffer.

PS: every heuristic has its flaws, as I noted on python-(dev|ideas).
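A sketch of the "ask the OS" idea using the standard `resource` module (Unix-only; the 100 MiB figure is an arbitrary illustration, and `ru_maxrss` units differ between platforms):

```python
import resource

def peak_rss() -> int:
    # Peak resident set size as reported by the OS:
    # kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
payload = bytearray(100 * 1024 * 1024)  # large payload the GC cannot account for
after = peak_rss()
print(f"RSS grew by roughly {after - before} (platform-dependent units)")
```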
Alright, Python people, I don't see anybody on this thread who is against the idea. Can we get a review of the linked PR? I don't think it would be good form for me to accept it myself.
What about msg302790?
What we saw in prod is that memory fragmentation caused by the GC is the main reason shared memory shrinks. We figured out it was fragmentation by doing a full collection before fork and then keeping the GC disabled: that alone causes a bunch of copy-on-write in the child processes. This can't solve the copy-on-write caused by refcounting, but we're thinking about freezing the refcounts on those permanent objects too. So this is useful if you did some warm-up work in the parent process. It could also speed up the GC if you have a large number of permanent objects.
GC doesn't cause "memory fragmentation".
It may increase the cost of refcount operations, because it makes every INCREF and DECREF bigger.
I don't understand this statement.
Yes, this helps not only "prefork" applications but all long-running applications.
As Instagram's report shows, disabling the cyclic GC really helps even though refcounting is still in place.
Should gc.freeze() do gc.collect() right before freezing? I don't like the `gc.collect(); gc.freeze()` idiom. Other nitpicking: get_freeze_count() or get_frozen_count()?
So what we did is: we keep the GC **disabled** in the parent process and freeze after warm-up, then enable the GC in the child processes. The reason not to do a full collection is mentioned in the previous comments and the original ticket: what I called memory fragmentation. The observation is:

- With the GC disabled in both the parent and child processes, doing a full collection before fork makes the shared memory shrink a lot compared to doing no collection.
- A disabled GC has no way to touch the GC heads and trigger copy-on-write by itself.

Of course, enabling the GC makes the shared memory shrink even more, but the former effect accounts for more than the latter. So my understanding is that the GC frees some objects and makes some memory pages available for allocation in the child process, and allocating from those shared pages causes copy-on-write even without the GC running. Though maybe this behavior deserves a better name?
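One rough way to observe this on Linux (the smaps parsing is Linux-specific and the allocation count is arbitrary) is to fork and watch the child's shared pages turn private as it allocates:

```python
import os

def shared_kb() -> int:
    # Sum Shared_Clean + Shared_Dirty for this process (Linux only).
    total = 0
    with open("/proc/self/smaps") as f:
        for line in f:
            if line.startswith(("Shared_Clean:", "Shared_Dirty:")):
                total += int(line.split()[1])  # values are reported in kB
    return total

if os.fork() == 0:
    before = shared_kb()
    junk = [{} for _ in range(200_000)]  # new objects land in formerly shared pages
    print(f"child shared memory: {before} kB -> {shared_kb()} kB")
    os._exit(0)
os.wait()
```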
OK, now I get what you're talking about. But I don't think the "memory hole" is a big problem, because we already have refcounting. Solving the memory-hole issue is easy: just stop allocating new objects from existing pages.

Instead of trying to "share most data", I recommend the "use a small number of processes" approach. At my company, we don't use "prefork" but uWSGI's "--lazy-app" option for graceful reloading (i.e. "after fork"). So I prefer optimizing normal memory usage: it is good for all applications, not only "prefork" applications. In that light, I'm +1 on the gc.freeze() proposal because it can be used by single-process applications too.
Based on Inada-san's, Antoine's, Neil's, and Barry's reviews, I'm merging the change into 3.7.