Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31558: Add gc.freeze() #3705

Merged
merged 3 commits into from Oct 16, 2017

Conversation

Projects
None yet
7 participants
@brainfvck
Copy link
Contributor

commented Sep 23, 2017

Introduces a new API that allows for moving all objects currently tracked by the garbage collector to a permanent generation, effectively removing them from future collection events. This can be used to protect those objects from having their PyGC_Head mutated. In effect, this enables great copy-on-write stability at fork(). More details on the issue.

https://bugs.python.org/issue31558

Zekun Li
@the-knights-who-say-ni

This comment has been minimized.

Copy link

commented Sep 23, 2017

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

@ambv

This comment has been minimized.

Copy link
Contributor

commented Sep 23, 2017

I will add documentation, NEWS.d, and update Misc/ACKS in a separate commit.

@ambv ambv requested review from vstinner, 1st1 and nascheme Sep 23, 2017

@@ -185,6 +185,7 @@ struct _gc_runtime_state {
collections, and are awaiting to undergo a full collection for
the first time. */
Py_ssize_t long_lived_pending;
struct gc_generation permanent_generation;

This comment has been minimized.

Copy link
@ambv

ambv Sep 23, 2017

Contributor

I would put this field after generation0.

}

PyDoc_STRVAR(gc_get_freeze_stats__doc__,
"get_freeze_stats() -> n\n"

This comment has been minimized.

Copy link
@ambv

ambv Sep 23, 2017

Contributor

Since this is not equivalent to what get_stats() is returning, can you rename to get_frozen_count()?

);

static PyObject *
gc_freeze(PyObject *module)

This comment has been minimized.

Copy link
@ambv

ambv Sep 23, 2017

Contributor

We'd prefer if you used ArgumentClinic for those functions. It's very easy to use: https://docs.python.org/3/howto/clinic.html

@@ -813,6 +817,8 @@ collect(int generation, Py_ssize_t *n_collected, Py_ssize_t *n_uncollectable,
for (i = 0; i < NUM_GENERATIONS; i++)
PySys_FormatStderr(" %zd",
gc_list_size(GEN_HEAD(i)));
PySys_WriteStderr("\ngc: objects in permanent generation: %d",

This comment has been minimized.

Copy link
@elprans

elprans Sep 26, 2017

Contributor

%zd

Zekun Li
@@ -734,6 +734,10 @@ def test_get_stats(self):
self.assertEqual(new[1]["collections"], old[1]["collections"])
self.assertEqual(new[2]["collections"], old[2]["collections"] + 1)

def test_freeze(self):
gc.freeze()

This comment has been minimized.

Copy link
@pitrou

pitrou Oct 6, 2017

Member

You should really undo the freezing after this test. We don't want tests to have so large side effects.

This comment has been minimized.

Copy link
@ambv

ambv Oct 6, 2017

Contributor

Good point! This will require adding the "thaw" or "unfreeze" function that will move objects back from the permanent generation to a properly collected one. Do you think it would be enough for that call to move all of them to gen0? This won't be exactly "undoing the freeze" but doesn't require keeping track of which object belonged to which generation before.

This comment has been minimized.

Copy link
@pitrou

pitrou Oct 6, 2017

Member

Anything that empties the permanent generation is good enough IMHO. People can call gc.collect explicitly afterwards if they want to collect those objects.

PS: as a non-native English locutor, I much prefer "unfreeze" to "thaw" :-)

This comment has been minimized.

Copy link
@ambv

ambv Oct 6, 2017

Contributor

Oh, didn't notice your "general comment" until now. So it seems we agree that moving everything to a single generation makes sense. But you're suggesting the oldest generation (gen2) instead. Makes sense.

This comment has been minimized.

Copy link
@brainfvck

brainfvck Oct 7, 2017

Author Contributor

ok, I was just too lazy to add it but fine will update..

Freeze all current tracked objects and ignore them for future collections.
This can be used before a fork to make the gc copy-on-write friendly.

This comment has been minimized.

Copy link
@pitrou

pitrou Oct 6, 2017

Member

Rather than "a fork", perhaps use the wording "a POSIX fork() call", as that may not be obvious to all readers.

@pitrou

This comment has been minimized.

Copy link
Member

commented Oct 6, 2017

Two general comments:

  • this needs to be documented in Doc/library/gc.rst
  • we really need a way to unfreeze the current permanent collection; in which generation it gets unfrozen doesn't really matter overall, though it may sound logical to unfreeze it into the oldest generation
Zekun Li
gc_unfreeze_impl(PyObject *module)
/*[clinic end generated code: output=1c15f2043b25e169 input=2dd52b170f4cef6c]*/
{
gc_list_merge(&_PyRuntime.gc.permanent_generation.head, GEN_HEAD(NUM_GENERATIONS-1));

This comment has been minimized.

Copy link
@pitrou

pitrou Oct 7, 2017

Member

Do we need to update _PyRuntime.gc.permanent_generation.count after this?

This comment has been minimized.

Copy link
@brainfvck

brainfvck Oct 7, 2017

Author Contributor

No, it's always 0 since we don't use it, instead gc_list_size() is used for count.
count is not always the number of objects in generations(it is for 0 but not 1 and 2).

@pitrou

pitrou approved these changes Oct 8, 2017

Copy link
Member

left a comment

The current version of the PR looks basically good to me. Thank you!

@ambv ambv merged commit c75edab into python:master Oct 16, 2017

4 checks passed

bedevere/issue-number Issue number 31558 found
Details
bedevere/news "skip news" label found
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@Somewater

This comment has been minimized.

Copy link

commented Sep 11, 2018

Does fix already in 3.7.0 release?
I did some test script:
https://gist.github.com/Somewater/40d7a808d1efd7b2c77f22f8bcb73553
I expected that the script allocates 1-2 GB of memory at the beginning and does not require more memory in the process of work. But it does - just after start of iterations in workers:
http://pix.toile-libre.org/?img=1536607288.png
What I did wrong?

@pitrou

This comment has been minimized.

Copy link
Member

commented Sep 11, 2018

@Somewater I think you misunderstood what gc.freeze allows you to do. It prevents the garbage collector from touching existing objects, but reference counting still exists, so if you iterate over those objects, you will still write in their corresponding memory areas (because you're incref'ing then decref'ing them). In short, what gc.freeze allows to share are objects that are never touched after the workers have been started (such as docstrings in modules, etc.). Not "live" objects that are touched by the application.

@Somewater

This comment has been minimized.

Copy link

commented Sep 11, 2018

Thanks for answer. But can I manage with it without full disable of GC? I have big structure in parent process memory (python's objects) and have to read it from forked childs.

@pitrou

This comment has been minimized.

Copy link
Member

commented Sep 11, 2018

If you're actually using those objects in the children, they will be duplicated because of reference counting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.