
Unbounded memory use when specifying otypes='d' in vectorize. #11867

Closed
carlohamalainen opened this issue Sep 2, 2018 · 5 comments
@carlohamalainen

@nadiahpk first encountered this in the way that SciPy's hypergeom uses vectorize.

Here is a stand-alone example for reproducing the memory behaviour:

import numpy as np

class myclass():
    def f_class_method(self, k): return 0

    def __init__(self):
        self.f_class_method_vec_d = np.vectorize(self.f_class_method, otypes='d')
        self.f_class_method_vec   = np.vectorize(self.f_class_method)

# Memory use keeps growing:
def main1():
    while True:
        myclass().f_class_method_vec_d(0)

# Memory use constant:
def main2():
    while True:
        myclass().f_class_method_vec(0)

# Manual workaround for main1, now has constant memory use:
def main3():
    while True:
        m = myclass()
        m.f_class_method_vec_d(0)
        del m.f_class_method_vec_d._ufunc

I believe the issue is that self._ufunc creates a circular reference when the function being vectorized is a bound method of a class.

I have a workaround here: carlohamalainen@9b5747b
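The cycle described above can be illustrated with a minimal sketch (the class here is hypothetical; vectorize's public pyfunc attribute holds the callable it wraps): the instance stores the vectorize object, the vectorize object stores the bound method, and the bound method's __self__ points back at the instance.

```python
import numpy as np

class A:
    def f(self, k):
        return 0

    def __init__(self):
        # the vectorize object lives on the instance...
        self.f_vec = np.vectorize(self.f, otypes='d')

a = A()
# ...while its pyfunc is the bound method, whose __self__ is the instance:
# instance -> vectorize -> bound method -> instance (a reference cycle)
assert a.f_vec.pyfunc.__self__ is a
```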

@mattip

mattip commented Sep 3, 2018

Are you sure there is no leak with your fix? When I tried a test with sys.getrefcount() I found I needed to activate the last two del statements to make it pass (CPython 3 only; CPython 2 passes without any deletes). Applying your fix did not change the situation for me. I am using the latest NumPy head.

import gc
import sys

import numpy as np
from numpy.testing import assert_equal

def test_leaks():
    class A(object):
        def f(self, k):
            return 0

        def __init__(self):
            self.f_vec_otypes = np.vectorize(self.f, otypes='d')
            self.f_vec = np.vectorize(self.f)

    a_tag = A()
    f_refcount = sys.getrefcount(A.f)
    for i in range(10):
        a = A()
        out = a.f_vec_otypes(0)
        out = a.f_vec(0)
        # del a.f_vec_otypes._ufunc
        # del a.f_vec._ufunc
        # del a.f_vec_otypes
        # del a.f_vec
    assert_equal(sys.getrefcount(A.f), f_refcount)

Still playing with it to understand where the extra reference(s) are being created. Adding a for i in range(10): gc.collect() loop influences the refcount, but does not entirely remove the need for the del statements.

@carlohamalainen

You're right, the memory usage still grows, just more slowly.

I added one del to my manual workaround and I see the memory usage (RES in htop) stay constant at 27528:

def main3():
    while True:
        m = myclass()
        m.f_class_method_vec_d(0)
        del m.f_class_method_vec_d._ufunc
        del m.f_class_method_vec_d  # added this del

@mattip

mattip commented Sep 7, 2018

vectorize calls frompyfunc only in the actual function __call__. I wonder if that is what is creating references that even the GC cannot break.
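A minimal sketch of that lazy pattern, with an illustrative class that is not NumPy's actual implementation: the ufunc is built with frompyfunc on the first call and cached, so a reference to the wrapped callable ends up held inside a C-level ufunc object.

```python
import numpy as np

def f(k):
    return 0.0

class VecLike:
    """Illustrative sketch of lazy ufunc caching; not NumPy's internals."""
    def __init__(self, pyfunc):
        self.pyfunc = pyfunc
        self._ufunc = None  # built lazily on first call

    def __call__(self, x):
        if self._ufunc is None:
            # frompyfunc wraps a Python callable in an object-dtype ufunc,
            # which keeps its own reference to pyfunc
            self._ufunc = np.frompyfunc(self.pyfunc, 1, 1)
        return self._ufunc(x)

v = VecLike(f)
assert v(0) == 0.0
```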

@seberg

seberg commented Jan 9, 2019

Oops, I forgot to add/keep a "Fixes" note in the commit. This was closed by gh-11977.

seberg closed this as completed Jan 9, 2019
@carlohamalainen

Thanks @mattip and @seberg :-)
