New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add cyclic garbage collection to NumPy #15065
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems somewhere between a cleanup and an enhancement. There are a lot of moving parts. I ran coverage locally and marked the new code paths that were not hit. I think refcount.c.src
might need a little more thought: it is very dense and hard to reason about now.
* or if it doesn't have any objects | ||
*/ | ||
if (!descr) { | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to cov, the !descr
path is never hit. Is it reasonable to remove the check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it PyArray_DATA(self) == NULL
for now, the issue is that allocation adds the object to the GC tracker too early on in a sense. It probably cannot happen that the traverse ever kicks in, since we are busy within C if it fails at this point. But that is what this seems to have been for (the change was by @J-Sand, maybe he has a quick note on it).
It is likely that this is just a band-aid solution, and the correct thing is to ensure that tracking is only enabled after everything is initialized. This might be a major rough edge :/ thought this was further along.
visitproc visit, void *arg) | ||
#else | ||
@inner_function@(char *data, PyArray_Descr *descr) | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really hard to reason about. Maybe it would be better to expand some of these templates at the cost of some code repitition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed slightly, not sure it is much better with inline #defines
for the arguments though.
NPY_COPY_PYOBJECT_PTR(&temp, data); | ||
Py_XDECREF(temp); | ||
for( i = 0; i < n; i++, data++) { | ||
HANDLE_ITEM_AT_UNALIGNED(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure this is never hit
Py_XDECREF(temp); | ||
while (it.index < it.size) { | ||
data = (PyObject **)it.dataptr; | ||
HANDLE_ITEM_AT_UNALIGNED(data); | ||
PyArray_ITER_NEXT(&it); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is ever hit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not, it is pretty rare to create unaligned object arrays. Same goes for object arrays actually holding NULL instead of None. Although, I think the first one at least may be simple enough with a structured dtype, which would indeed be a good test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, made a test for this, I think.
n = PyArray_SIZE(arr); | ||
if (obj == NULL) { | ||
for (i = 0; i < n; i++) { | ||
*optr++ = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code path obj == NULL
is never hit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it is exposed API, note that this function was not touched, the file was just modified so much it appears to be changed. Also it could just as well use XINCREF
and remove the branching, I doubt it is worth to micro-optimize...
/* if the array doesn't own its data, visiting fa->base is sufficient */ | ||
return 0; | ||
} | ||
/* NOTE: In the future, user dtypes could in principle be visited */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the dtype be visited anyway, since it has metadata that could refer back to the array? No need to implement the visiting in the dtype object yet, but I think we should call visit here to play it safe for when we fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is pretty esoteric, but yes. We should visit the descriptor maybe. That would mean we cannot untrack arrays though, since any array has a dtype. (at least unless we also check that the dtype has no metadata). I think I meant the comment as visiting a user-dtypes items (which is much further off).
Not sure what to best do. Could just not do the untracking, or do it anyway, but visit. Either way, this would fix the vast majority of use-cases.
numpy/core/src/multiarray/ctors.c
Outdated
/* | ||
* tp_alloc will have enabled GC tracking, but it's only necessary | ||
* for object arrays. Subtypes may also define additional fields, so | ||
* a subtype would have to handle this by itself. | ||
*/ | ||
PyObject_GC_UnTrack(fa); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is correct? What does cpython do about dict
subclasses, for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I have no thought to much about the untracking, it is the method I took from the original PR. But I am not sure how you think this can be incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I actually looked a bit at the Python code, and it seems to me that this is not correct and I would have expected it to run into asserts on a debug python build (under the assumption that such an array is added to container that implements cyclic gc). I am a bit surprised I did not notice that (although without checking I am not 100% we have CI which runs a python debug build).
There would be solution to this: One can implement tp_is_gc
which returns False
after untracking/for arrays which are not added to the GC. (There is probably only a single use case of this: Python uses it to distinguish between HEAP and static types.)
Right now, I think this is arguably a bug, and chances are that having a few more arrays does not matter (there must be a huge number of other python containers around that need to do more traversing).
Still planning to try out the speed penalty for keeping a huge object array around, but even if it is noticable, I am not sure it matters too much as to whether or not we should do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the object being tracked at this point already? From the docs:
PyObject_GC_Track
should be called once all the fields followed by the tp_traverse handler become valid, usually near the end of the constructor.
Could we just call PyObject_GC_Track
here instead? That way, the subtype will just call PyObject_GC_Track
itself if it adds new members, which ought to be safe, i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, aren't we supposed to call PyObject_GC_Track
ourselves anyway? I'm not seeing the bit of the docs that imply its called for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default tp_alloc
calls it for you when Py_TPFLAGS_HAVE_GC
. Strictly speaking: probably yes. But I also do not think it matters, since there we own the GIL the whole time here, and there is no call back into python before everything is finished up.
Anyway, if someone wants to review+merge this, I am happy to look at this, otherwise, for now we can also close until someone actually complains due to a memory leak. The new changes in python probably need a bit of cleaning, and the release notes may need to warn hypothetical hacks that subclass from C at least.
694b4cc
to
777a38d
Compare
Thought I might be able to unstick this a bit by limiting it to ndarray for now (the other part is preserved https://github.com/seberg/numpy/tree/cyclic-gc-iterators). But in any case the untracking is a pattern I have not found in cpython itself. However, it is completely fine to untrack (and even retrack), and it seems even fine to untrack multiple times. If this remains stuck, just as well... |
I suppose, the one real change to consider here is that might make sense to not bother doing this (possibly weird) untracking some arrays and just keep it tracked since the traverse will be very short. |
This commit is based on work of James Sanders in: numpy#8303 Closes numpy#6581. Co-authored-by: James Sanders <J-Sand@users.noreply.github.com>
I am -1 on spending more effort on this now. I don't see a clear use case for adding this complexity, and it does not seem to be causing users much pain. We can revisit it if a user comes forward with a clear case where the perceived memory leak is meaningful. |
OK, fair enough. |
This is probably right, but not really finished up. Just pushing since it was almost finished, since if it seems like unless this comes up again, we do not care enough about fixing this lingering bug.
@seberg Needs rebase. Do you want to keep it open. |
Well, if we can find someone to push it through, I am happy to rebase it. It needs a little bit of thought, since I am not quite sure about those attempts to untrack arrays that don't (really) need it – i.e. the ones that don't contain objects. Although, I know a thing or two more about it now, so maybe I can make sense of it now... In general, I think it fixes a couple of impossible to find memory leaks even though they seem reported rarely... |
This would also be nice to revive. I am not sure how much work it even is, the main thing to figure out (besides from trying to clean up the code a bit) is really how to best support subclasses. |
Wrong comment: This should be revived IMO, it is a pretty big bug in most ways you can look at it. I think it could be revived almost as-is (with some thoughts on whether we should even bother to selectively enable/disable tracking and how to do it correctly). But, it may also be that sometime soon, and with the HPy project making progress the approach should change here anyway (i.e. the current decref/incref code paths being replaced). |
This PR adds cyclic garbage collection to all (or at least most) objects within NumPy that can in theory form such cycles. It builds, and replaces, replaces gh-8303. Some thought may still be warrented in: