Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: reference cycle in np.vectorize #11977

Merged
merged 5 commits into from Jan 9, 2019
Merged

Conversation

mattip
Copy link
Member

@mattip mattip commented Sep 18, 2018

Issue #11867 demonstrated a leak when using np.vectorize on a bound class method in CPython 3. The test demonstrates the issue.

Still WIP since I have not found the root cause of the leak.

Edit: adding GC_Track to ufunc allows the garbage collector to break the cycle

@eric-wieser
Copy link
Member

My guess would be that the problem is np.ufunc not implementing a traverse method to tell the GC that it holds a cycle

@mattip mattip changed the title WIP, BUG: reference cycle in np.vectorize BUG: reference cycle in np.vectorize Sep 18, 2018
@mattip
Copy link
Member Author

mattip commented Sep 25, 2018

ping

def test_leaks(self):
# gh-11867
# np.frompyfunc(<class method>) creates a reference cycle
# that requires a gc collection cycle to break (on CPython 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we worked out why this is the case? Am I right in thinking that this patch changes it from being unbreakable to just requiring a gc to break?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I do not really understand the root cause and why only in this specific case

@mattip
Copy link
Member Author

mattip commented Sep 25, 2018

Squashed to a single commit.

@eric-wieser
Copy link
Member

eric-wieser commented Sep 26, 2018

I think the commit message is wrong - you didn't break the reference cycle, you simply allowed the GC to find it (and break it on the python side)

@eric-wieser
Copy link
Member

I wonder if you can use assert_no_gc_cycles here, which prints out the cycle upon failure, if possible

@mattip mattip force-pushed the vectorize-refcount branch 2 times, most recently from a358d0d to 8f75986 Compare October 10, 2018 07:03
@mattip
Copy link
Member Author

mattip commented Oct 10, 2018

fixed first commit message, reworked test to focus on class.atrribute = frompyfunc(class.method) where the reference cycle is actually created

@eric-wieser
Copy link
Member

Would be good to finish this up

@eric-wieser
Copy link
Member

eric-wieser commented Nov 10, 2018

tp_traverse will need minor changes to accommodate the extra reference added in #8955 - I suppose the PR of the two that doesn't go in first will need updating.

Fix will be as simple as:

if (ufunc->identity == PyUFunc_IdentityValue) {
    Py_VISIT(ufunc->identity_value);
}

@mattip
Copy link
Member Author

mattip commented Nov 10, 2018

merging with master to resolve conflicts

@mattip
Copy link
Member Author

mattip commented Nov 11, 2018

i will wait for #8955 and then rework this PR

@mattip mattip force-pushed the vectorize-refcount branch 2 times, most recently from d1279ce to b2c7901 Compare December 4, 2018 07:18
@mattip
Copy link
Member Author

mattip commented Dec 18, 2018

rebased at git merge-base master maintenance/1.16.x. Would it be possible to get this into 1.16?

return np.int(0)

def npy_bound(self, a):
return types.MethodType(np.frompyfunc(self.bound, 1, 1), a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very confused by what exactly we're trying to test here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.bound is part of a reference cycle that requires a gc collection cycle to break. I added more clarification in the test docstring. Improvements welcome

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me, if nobody complains will merge in a bit probably. The test couldbe a bit more obvious to read, or maybe add a trivial one:

def func(a):
    return 0
func.vectorized = np.frompyfunc(func)

giving a minimal cycle. The other part would probably be simpler by making it a realistic use case:

class A(object):
    def __init__(self):
        # Wrap bound_method into a ufunc with `np.frompyfunc`. This creates
        # a reference cycle:
        #     self.method -> self   # since it is a bound method (knows "self")
        #     self -> self.wrapped_method  # Set here
        #     self.wrapped_method -> self.method  # Stores self.method to call it
        self.wrapped_method = np.frompyfunc(self.method)

    def method(self):
        return 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess that breaks the parameterization. Although, I do not think you need the types.MethodType if the just do the self.bound_vectorized = np.frompyfunc(self.bound, 1, 1) before returning it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following. Where would I add that line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think what confuses me is that I believe you do not need this at all. You can simply remove the npy_bound and npy_unbound logic and just make it a.f = np.frompyfunc(func, 1, 1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted

@mattip
Copy link
Member Author

mattip commented Dec 18, 2018

should be squash-merged or I can squash and force-push

@charris
Copy link
Member

charris commented Dec 18, 2018

If you want it in 1.16, use the 1.16.1 milestone.

@mattip
Copy link
Member Author

mattip commented Dec 31, 2018

@eric-wieser ping

numpy/core/src/umath/ufunc_object.c Outdated Show resolved Hide resolved
return np.int(0)

def npy_bound(self, a):
return types.MethodType(np.frompyfunc(self.bound, 1, 1), a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess that breaks the parameterization. Although, I do not think you need the types.MethodType if the just do the self.bound_vectorized = np.frompyfunc(self.bound, 1, 1) before returning it?

@seberg
Copy link
Member

seberg commented Jan 9, 2019

Thanks @mattip lets put it in before we find something else to nitpick about ;).

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test now makes sense to me too - thanks for persisting

@seberg
Copy link
Member

seberg commented Jan 9, 2019

Cool, mac OS job is expected to fail. Next thing will be the object arrays ;).

@seberg seberg merged commit f5b6850 into numpy:master Jan 9, 2019
@mattip mattip deleted the vectorize-refcount branch May 20, 2019 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants