Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31558: Add gc.freeze() #3705

Merged
merged 3 commits into from Oct 16, 2017
Merged

bpo-31558: Add gc.freeze() #3705

merged 3 commits into from Oct 16, 2017

Conversation

@brainfvck
Copy link
Contributor

@brainfvck brainfvck commented Sep 23, 2017

Introduces a new API that allows for moving all objects currently tracked by the garbage collector to a permanent generation, effectively removing them from future collection events. This can be used to protect those objects from having their PyGC_Head mutated. In effect, this enables great copy-on-write stability at fork(). More details on the issue.

https://bugs.python.org/issue31558

@the-knights-who-say-ni
Copy link

@the-knights-who-say-ni the-knights-who-say-ni commented Sep 23, 2017

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

Loading

@ambv
Copy link
Contributor

@ambv ambv commented Sep 23, 2017

I will add documentation, NEWS.d, and update Misc/ACKS in a separate commit.

Loading

@ambv ambv requested review from vstinner, 1st1 and nascheme Sep 23, 2017
@@ -185,6 +185,7 @@ struct _gc_runtime_state {
collections, and are awaiting to undergo a full collection for
the first time. */
Py_ssize_t long_lived_pending;
struct gc_generation permanent_generation;
Copy link
Contributor

@ambv ambv Sep 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this field after generation0.

Loading

}

PyDoc_STRVAR(gc_get_freeze_stats__doc__,
"get_freeze_stats() -> n\n"
Copy link
Contributor

@ambv ambv Sep 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not equivalent to what get_stats() is returning, can you rename to get_frozen_count()?

Loading

);

static PyObject *
gc_freeze(PyObject *module)
Copy link
Contributor

@ambv ambv Sep 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd prefer if you used ArgumentClinic for those functions. It's very easy to use: https://docs.python.org/3/howto/clinic.html

Loading

@@ -813,6 +817,8 @@ collect(int generation, Py_ssize_t *n_collected, Py_ssize_t *n_uncollectable,
for (i = 0; i < NUM_GENERATIONS; i++)
PySys_FormatStderr(" %zd",
gc_list_size(GEN_HEAD(i)));
PySys_WriteStderr("\ngc: objects in permanent generation: %d",
Copy link
Contributor

@elprans elprans Sep 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%zd

Loading

@@ -734,6 +734,10 @@ def test_get_stats(self):
self.assertEqual(new[1]["collections"], old[1]["collections"])
self.assertEqual(new[2]["collections"], old[2]["collections"] + 1)

def test_freeze(self):
gc.freeze()
Copy link
Member

@pitrou pitrou Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should really undo the freezing after this test. We don't want tests to have so large side effects.

Loading

Copy link
Contributor

@ambv ambv Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! This will require adding the "thaw" or "unfreeze" function that will move objects back from the permanent generation to a properly collected one. Do you think it would be enough for that call to move all of them to gen0? This won't be exactly "undoing the freeze" but doesn't require keeping track of which object belonged to which generation before.

Loading

Copy link
Member

@pitrou pitrou Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that empties the permanent generation is good enough IMHO. People can call gc.collect explicitly afterwards if they want to collect those objects.

PS: as a non-native English locutor, I much prefer "unfreeze" to "thaw" :-)

Loading

Copy link
Contributor

@ambv ambv Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, didn't notice your "general comment" until now. So it seems we agree that moving everything to a single generation makes sense. But you're suggesting the oldest generation (gen2) instead. Makes sense.

Loading

Copy link
Contributor Author

@brainfvck brainfvck Oct 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I was just too lazy to add it but fine will update..

Loading

Freeze all current tracked objects and ignore them for future collections.
This can be used before a fork to make the gc copy-on-write friendly.
Copy link
Member

@pitrou pitrou Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than "a fork", perhaps use the wording "a POSIX fork() call", as that may not be obvious to all readers.

Loading

@pitrou
Copy link
Member

@pitrou pitrou commented Oct 6, 2017

Two general comments:

  • this needs to be documented in Doc/library/gc.rst
  • we really need a way to unfreeze the current permanent collection; in which generation it gets unfrozen doesn't really matter overall, though it may sound logical to unfreeze it into the oldest generation

Loading

gc_unfreeze_impl(PyObject *module)
/*[clinic end generated code: output=1c15f2043b25e169 input=2dd52b170f4cef6c]*/
{
gc_list_merge(&_PyRuntime.gc.permanent_generation.head, GEN_HEAD(NUM_GENERATIONS-1));
Copy link
Member

@pitrou pitrou Oct 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update _PyRuntime.gc.permanent_generation.count after this?

Loading

Copy link
Contributor Author

@brainfvck brainfvck Oct 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's always 0 since we don't use it, instead gc_list_size() is used for count.
count is not always the number of objects in generations(it is for 0 but not 1 and 2).

Loading

pitrou
pitrou approved these changes Oct 8, 2017
Copy link
Member

@pitrou pitrou left a comment

The current version of the PR looks basically good to me. Thank you!

Loading

@Somewater
Copy link

@Somewater Somewater commented Sep 11, 2018

Does fix already in 3.7.0 release?
I did some test script:
https://gist.github.com/Somewater/40d7a808d1efd7b2c77f22f8bcb73553
I expected that the script allocates 1-2 GB of memory at the beginning and does not require more memory in the process of work. But it does - just after start of iterations in workers:
http://pix.toile-libre.org/?img=1536607288.png
What I did wrong?

Loading

@pitrou
Copy link
Member

@pitrou pitrou commented Sep 11, 2018

@Somewater I think you misunderstood what gc.freeze allows you to do. It prevents the garbage collector from touching existing objects, but reference counting still exists, so if you iterate over those objects, you will still write in their corresponding memory areas (because you're incref'ing then decref'ing them). In short, what gc.freeze allows to share are objects that are never touched after the workers have been started (such as docstrings in modules, etc.). Not "live" objects that are touched by the application.

Loading

@Somewater
Copy link

@Somewater Somewater commented Sep 11, 2018

Thanks for answer. But can I manage with it without full disable of GC? I have big structure in parent process memory (python's objects) and have to read it from forked childs.

Loading

@pitrou
Copy link
Member

@pitrou pitrou commented Sep 11, 2018

If you're actually using those objects in the children, they will be duplicated because of reference counting.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants