[MRG+3] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising; Propagate all errors to python interpreter level (#7811) #8002

Merged
merged 19 commits into from Jan 18, 2017

10 participants
@raghavrv
Member

raghavrv commented Dec 7, 2016

Fixes #7811

  • Fixes memory leak in MAE (Using safe_realloc instead of master's alloc/calloc-and-forget-about-previously-held-memory approach took care of the memory leak.)
  • Uniformly use safe_realloc everywhere (added StackRecord* and PriorityHeapRecord* to the fused type realloc_ptr)
  • Acquire GIL only when an error needs to be raised
  • Propagate all errors to python interpreter level. (Use except * when appropriate)
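
To make the first bullet concrete, here is a toy model in plain Python (not the Cython from this PR) of why allocating a fresh buffer on every reset leaks, while a realloc-style call does not. ToyHeap, reset_with_calloc and reset_with_realloc are hypothetical names made up for this sketch; the real code works on C pointers.

```python
class ToyHeap:
    """Toy stand-in for the C heap (hypothetical): tracks live blocks by handle."""

    def __init__(self):
        self._blocks = {}
        self._next_handle = 1

    def calloc(self, n_bytes):
        # Hand out a new block and remember its size.
        handle = self._next_handle
        self._next_handle += 1
        self._blocks[handle] = n_bytes
        return handle

    def realloc(self, handle, n_bytes):
        # Like C realloc on success: a None handle behaves like a fresh
        # allocation, otherwise the old block is released by the call.
        if handle is not None:
            del self._blocks[handle]
        return self.calloc(n_bytes)

    def free(self, handle):
        del self._blocks[handle]

    def live_bytes(self):
        return sum(self._blocks.values())


def reset_with_calloc(heap, n_resets, n_bytes):
    """Allocate-and-forget: every reset overwrites the only reference."""
    buf = None
    for _ in range(n_resets):
        buf = heap.calloc(n_bytes)  # the previous block is never freed
    heap.free(buf)                  # only the last block is released
    return heap.live_bytes()


def reset_with_realloc(heap, n_resets, n_bytes):
    """Realloc-style reset: the previous block is always handed back."""
    buf = None
    for _ in range(n_resets):
        buf = heap.realloc(buf, n_bytes)
    heap.free(buf)
    return heap.live_bytes()


leaked = reset_with_calloc(ToyHeap(), n_resets=5, n_bytes=100)
clean = reset_with_realloc(ToyHeap(), n_resets=5, n_bytes=100)
```

In this model the first variant leaves four of the five blocks behind, which is the shape of the leak described above.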

cc: @glouppe @jmschrei @nelson-liu @agramfort @lesteve

Is it okay to release gil in safe_realloc? Any reason why it was not done before?

Also ref my mails to cython-devel:

@raghavrv raghavrv added the Bug label Dec 7, 2016

@raghavrv raghavrv added this to the 0.19 milestone Dec 7, 2016

@nelson-liu

Contributor

nelson-liu commented Dec 7, 2016

Thanks for tackling this, @raghavrv. Did you run memory benchmarks to verify that safe_realloc fixes the issue? I thought it was more deep-seated than that... (though I'd be happy if it was not 😄)

@lesteve

Member

lesteve commented Dec 7, 2016

I ran the snippet from #7811 (comment) on this PR and the memory leak seems to be taken care of.

Not a cython expert so I won't comment about the actual changes introduced by this PR.

@raghavrv

Member

raghavrv commented Dec 7, 2016

Do you think such a snippet could be added as a test? I don't think there is precedent for memory leak tests, but I think we should have some... Maybe as a more general test as a part of #4841?

@raghavrv

Member

raghavrv commented Dec 7, 2016

@nelson-liu Yeah! Loic's snippet seems to be stable in this PR...

@lesteve

Member

lesteve commented Dec 7, 2016

Do you think such a snippet could be added as a test?

It takes more than one minute to run, so the short answer is probably not in its current form.

We have some memory usage based tests in joblib. Look at https://github.com/joblib/joblib/blob/master/joblib/test/test_numpy_pickle.py#L368 and https://github.com/joblib/joblib/blob/master/joblib/test/common.py#L40 for example. In our experience they tend to be a bit brittle (you need the memory peak to be not too short-lived for memory_profiler to have a chance to see it ...).
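
For illustration only, a leak check of this general kind could be sketched with the standard library's tracemalloc. This is an assumption-laden sketch, not the joblib approach linked above: traced_growth, leaky and clean are hypothetical names, and tracemalloc only sees allocations made through Python's allocators, so a raw C malloc leak like the one in this PR would still need a process-level measure such as memory_profiler.

```python
import gc
import tracemalloc


def traced_growth(fn, n_warmup=2, n_runs=20):
    """Return the growth in traced memory (bytes) over n_runs calls of fn."""
    # Warm-up calls let caches and lazy imports settle before measuring.
    for _ in range(n_warmup):
        fn()
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(n_runs):
            fn()
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_sink = []


def leaky():
    # Keeps every buffer alive, so memory grows with each call.
    _sink.append(bytearray(10_000))


def clean():
    # The buffer is dropped as soon as the call returns.
    bytearray(10_000)


leaky_growth = traced_growth(leaky)
clean_growth = traced_growth(clean)
```

A test would then assert that the growth stays under some threshold; measuring steady growth over many calls, rather than a short-lived peak, sidesteps some of the brittleness mentioned above.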

@jmschrei

Member

jmschrei commented Dec 7, 2016

From a cython POV this looks good, but I haven't tested it to make sure it fixes the issue.

if array == NULL:
    # no free; __dealloc__ handles that
    return -1
self.array_ = array

@nelson-liu

nelson-liu Dec 7, 2016

Contributor

not quite relevant to the issue at hand, but do you think these same fixes should be applied for the other data structures in utils.pyx?

@raghavrv

raghavrv Dec 8, 2016

Member

Indeed it can be extended to all the data structures but I first need to confirm if releasing gil in safe_realloc is safe... I can't figure out why it was not done so far... @glouppe or @jmschrei can help us figure that out maybe? :) If the current changes look okay, then I can replace all malloc calls with safe_realloc... There is also yet another line in tree.pyx where safe_realloc was explicitly avoided because it used to not release gil...

The thing is, the tree tests are not as extensive as they need to be, and I am paranoid that these kinds of clean-ups, extraneous to the issue at hand, might break user code in subtle ways...

Thx for raising the question!

@jmschrei

jmschrei Dec 8, 2016

Member

It should be fine. I'm guessing it wasn't put in originally because there was no need to. If it's not using any GIL operations anyway then there shouldn't be a difference.

@raghavrv raghavrv requested a review from glouppe Dec 8, 2016

raise MemoryError("could not allocate (%d * %d) bytes"
% (nelems, sizeof(p[0][0])))
with gil:
raise MemoryError("could not allocate (%d * %d) bytes"

@jnothman

jnothman Dec 8, 2016

Member

These exceptions aren't propagated i.e. I don't think it will actually end the function's execution.

@jnothman

jnothman Dec 8, 2016

Member

Try

%%cython
cdef a() nogil:
  with gil:
    raise ValueError('Something broke')
def b():
    a()

@raghavrv

raghavrv Dec 9, 2016

Member

True but having except * ensures that it propagates back to the calling function (and then to interpreter)

%%cython
cdef void a() nogil except *:
  with gil:
    raise ValueError('Something broke')
def b():
    a()
b()

@jnothman

jnothman Dec 9, 2016

Member

Okay, I forgot those details. Haven't checked now. Just check that all nogil paths continue to propagate by that means.

@raghavrv

raghavrv Dec 12, 2016

Member

Okay. Thanks!

And I think as long as gil is acquired when we raise the error, it will propagate when the except * construct is used... In fact, raising cannot be done without gil for the same reason... I think this is why the whole safe_realloc was not made to release gil? I have now changed it to acquire gil only when the memory is not sufficient and hence an error has to be raised...

@raghavrv

raghavrv Dec 12, 2016

Member

With that confirmation could I have your +1 for merge? This would be nice to have as long as there aren't any side effects...

@raghavrv

Member

raghavrv commented Dec 12, 2016

From a cython POV this looks good, but I haven't tested it to make sure it fixes the issue.

@jmschrei With Loic's confirmation that it fixes the issue, are you +1 for merge?

@jnothman

Member

jnothman commented Dec 13, 2016

@raghavrv

Member

raghavrv commented Dec 20, 2016

Sorry for the delay in the response...

I think we should use except * by default for all compiled methods / functions written in cython... But releasing the gil for the whole function should not affect propagation of error. It's the except *... Releasing gil raises a compile time exception that errors cannot be raised without gil... Hence apart from the current changes (which acquire gil only when an error needs to be raised), I think adding except * for all cdef functions should solve it... I'll push a commit in a moment...

Either way, I think we need to fix this memory leak soon...

@raghavrv

Member

raghavrv commented Dec 20, 2016

Or for this PR let's have the except * only for functions that would end up calling safe_realloc and revisit this issue of adding "except * in every cdef" at a later time...

@raghavrv

Member

raghavrv commented Dec 20, 2016

(Struck out: In nested calls, it seems like the calling function need not have except *.) Arghh, I was wrong; the try...except is what makes it print. The calling function and every higher-level function need to have this except *

%%cython

cdef void gil_noexcept():
    raise ValueError('At gil_noexcept: This will not be propagated')
    
cdef void nogil_noexcept() nogil:
    # Acquire gil only when you want to raise
    with gil:
        raise ValueError('At nogil_noexcept: This will also not be propagated')
    
cdef void gil_except() except *:
    raise ValueError('At gil_except: This will be propagated')

cdef void nogil_except() nogil except *:
    # Acquire gil only when you want to raise
    with gil:
        raise ValueError('At nogil_except: This will also be propagated')
            
cdef void super_function() except *:
    for fn in (gil_noexcept, nogil_noexcept,
               gil_except, nogil_except):
        try:
            fn()
        except Exception as e:
            print(e)

def python_fn():
    super_function()

python_fn()
At gil_except: This will be propagated
At nogil_except: This will also be propagated
Exception ignored in: '_cython_magic_c6c634898ce44ef1e9180f02f86d5f76.gil_noexcept'
ValueError: At gil_noexcept: This will not be propagated
Exception ignored in: '_cython_magic_c6c634898ce44ef1e9180f02f86d5f76.nogil_noexcept'
ValueError: At nogil_noexcept: This will also not be propagated
@raghavrv

Member

raghavrv commented Dec 21, 2016

So from cython docs -

If you don’t do anything special, a function declared with cdef that does not return a Python object has no way of reporting Python exceptions to its caller. If an exception is detected in such a function, a warning message is printed and the exception is ignored. If you want a C function that does not return a Python object to be able to propagate exceptions to its caller, you need to declare an exception value for it.

There is also a third form of exception value declaration:

cdef int spam() except *:

This form causes Cython to generate a call to PyErr_Occurred() after every call to spam, regardless of what value it returns. If you have a function returning void that needs to propagate errors, you will have to use this form, since there isn’t any return value to test. Otherwise there is little use for this form.

I think it's safe to assume that all cdef declarations need an except * (not except -1) since some functions return -1 and some have void as the return type...

@jnothman @glouppe @jmschrei WDYT? Can I also request a comment from other cython experts (@larsmans or @jakevdp) here please :)
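
As a rough mental model of the quoted docs, the caller-side checks can be mimicked in plain Python. This is a toy sketch, not Cython output: the module-level _error_indicator stands in for the state that PyErr_Occurred() queries, and every function name here is hypothetical.

```python
_error_indicator = None  # stand-in for the interpreter's pending-error state


def set_error(exc):
    global _error_indicator
    _error_indicator = exc


def error_occurred():
    return _error_indicator is not None


def clear_error():
    """Return the pending error (or None) and reset the indicator."""
    global _error_indicator
    exc, _error_indicator = _error_indicator, None
    return exc


def c_resize(ok):
    """Models `cdef int resize() except -1`: -1 signals an error."""
    if not ok:
        set_error(MemoryError("could not allocate"))
        return -1
    return 0


def c_void_reset(ok):
    """Models a void function: there is no return value to test."""
    if not ok:
        set_error(MemoryError("could not allocate"))


def call_except_minus_one(ok):
    # Check for `except -1`: only the return value is compared.
    if c_resize(ok) == -1:
        raise clear_error()
    return "ok"


def call_except_star(ok):
    # Check for `except *`: the indicator is queried after every call.
    c_void_reset(ok)
    if error_occurred():
        raise clear_error()
    return "ok"


def call_without_except_clause(ok):
    # No declared exception value: nothing checks the indicator, so the
    # error is dropped (Cython prints an "Exception ignored in" warning)
    # and execution simply continues.
    c_void_reset(ok)
    clear_error()
    return "ok"
```

The except -1 form only compares an integer that is already at hand, whereas except * queries the error state after every call, which is where any extra cost of the blanket form would come from.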

@raghavrv

Member

raghavrv commented Dec 21, 2016

(Manually checking which functions call others (that may in turn call others, and so on) that raise an error is error-prone and could change in the future, and we do not have tests for these kinds of memory errors / leaks.

Which is why I'm suggesting adding "except *" for all cdef declarations as a best practice in tree code...)

@raghavrv

Member

raghavrv commented Dec 26, 2016

@jnothman @glouppe @jmschrei @nelson-liu Bump? (sorry for the previous travis failure. There was a typo in the size)

@jmschrei

There seem to be a lot of functions which are entirely math that shouldn't need the except * statement. I had thought that only the functions which could raise memory errors and their parent functions would need this? I see your rationale for making it best practice, but I think that it is not necessary for some of these functions. Impurity improvement should never be -1 or raise an error. It may be more verbose and confusing to future developers if we just blanket every function with it.

@jnothman

Member

jnothman commented Dec 27, 2016

The overhead of except * should be taken into consideration. Frequently called things should probably not have it if they have no risk of exception (e.g. no Python functions called, no memory allocated).

@raghavrv

Member

raghavrv commented Dec 27, 2016

The reason why I did this is to follow a safe approach as there are some functions that quietly hide nested function calls that may raise an error which may be quietly ignored.

But I think I can try to avoid except * for purely mathematical functions and others that do not have nested calls, or where we are sure, up to one level of nesting, that they will not raise an error...

(BTW github's online merge conflict resolving is pretty nifty!)

@raghavrv raghavrv changed the title from [MRG] FIX Memory leak in MAE (#7811) to [MRG] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising (#7811) Dec 27, 2016

@raghavrv

Member

raghavrv commented Dec 27, 2016

Done! @jmschrei @jnothman I've removed except * from places that won't raise an error. I have also ensured that safe_realloc is used everywhere uniformly...

@raghavrv raghavrv changed the title from [MRG] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising; Propagate all errors to python interpreter level (#7811) to [MRG + 1] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising; Propagate all errors to python interpreter level (#7811) Jan 16, 2017

@raghavrv

Member

raghavrv commented Jan 17, 2017

thanks @ogrisel for the comments. Have fixed the docstring. Could you take a look now? Also Joel are you +1 for merge? Anyone else +1 for merge? @jmschrei @nelson-liu @glouppe @arjoly?

@nelson-liu

This LGTM, thanks for patching it up @raghavrv

@raghavrv

Member

raghavrv commented Jan 17, 2017

Thx @nelson-liu fixed...

@jnothman

Member

jnothman commented Jan 18, 2017

In a couple of places (e.g. WeightedPQueue.__cinit__), I see safe_realloc(ptr, n); if ptr == NULL: raise MemoryError(), despite the same logic inside safe_realloc itself. Am I correct in thinking this is redundant? Should we clean it up here, or in a next PR?

Otherwise, I've done a manual breadth first search and am satisfied that our except -1 arse is covered, so LGTM!
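
A manual breadth-first search of this kind could, in principle, be scripted. The sketch below is only an illustration under made-up assumptions: the call graph, the function names in it, and the helper callers_missing_except are all hypothetical and do not reflect the actual tree code.

```python
from collections import deque


def callers_missing_except(calls, raisers, has_except):
    """List functions that can reach a raiser but declare no except clause.

    calls:      dict mapping each function to the functions it calls
    raisers:    functions that may raise directly (e.g. on failed allocation)
    has_except: functions already declared with an exception value
    """
    # Invert the call graph so we can walk from callee to caller.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first search upwards from every function that may raise.
    seen = set(raisers)
    queue = deque(raisers)
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)

    return sorted(fn for fn in seen if fn not in has_except)


# Hypothetical call graph, for illustration only.
calls = {
    "build": ["init", "update"],
    "init": ["safe_realloc"],
    "update": ["impurity"],
    "impurity": [],
}
missing = callers_missing_except(
    calls, raisers={"safe_realloc"}, has_except={"safe_realloc", "init"}
)
```

Here only build is flagged: it reaches safe_realloc through init but has no except clause of its own, while update and impurity are left alone because they never reach a raising function.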

@jnothman jnothman changed the title from [MRG + 1] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising; Propagate all errors to python interpreter level (#7811) to [MRG+2] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising; Propagate all errors to python interpreter level (#7811) Jan 18, 2017

@raghavrv

Member

raghavrv commented Jan 18, 2017

I was thinking of merging this and fixing it in another PR... But appveyor is not green anyhow. I'll fix it here... And thanks for pointing that out; that is correct, safe_realloc and except would suffice to raise MemoryErrors...

@raghavrv

Member

raghavrv commented Jan 18, 2017

And done... The appveyor seems to be stuck in the queue for a long time...

@raghavrv

Member

raghavrv commented Jan 18, 2017

(Or it is taking its revenge on me for pushing so many times :@)

@jnothman

Member

jnothman commented Jan 18, 2017

A bit of both. I've just cancelled a whole lot of AppVeyor runs to see if it'll come unstuck. I've not yet found one in progress (not queued, not finished, not cancelled), so I don't know why it's stuck.

@NelleV

Member

NelleV commented Jan 18, 2017

I don't think the appveyor problem is related to scikit-learn. Matplotlib currently has the same problem.

@jnothman

Member

jnothman commented Jan 18, 2017

Indeed even the cancelling operation has very slow throughput.

@raghavrv

Member

raghavrv commented Jan 18, 2017

According to them, this build seems to be the issue... Anyway now I can see one build running...

@jnothman

Member

jnothman commented Jan 18, 2017

But things are starting again, now.

@raghavrv

Member

raghavrv commented Jan 18, 2017

Thanks!! Meanwhile can you take a look at the final commit and approve. The appveyor should pass in about 30 minutes. I'll merge in an hour...

@jnothman

Yes, I think those duplicate errors are correctly removed.

@jnothman jnothman changed the title from [MRG+2] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising; Propagate all errors to python interpreter level (#7811) to [MRG+3] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising; Propagate all errors to python interpreter level (#7811) Jan 18, 2017

@raghavrv raghavrv merged commit 4907029 into scikit-learn:master Jan 18, 2017

3 checks passed

ci/circleci: Your tests passed on CircleCI!
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed

@raghavrv

Member

raghavrv commented Jan 18, 2017

Thanks a lot for the reviews everyone!

@raghavrv raghavrv deleted the raghavrv:mae_mem_leak branch Jan 18, 2017

@glouppe

Member

glouppe commented Jan 19, 2017

Thanks for the fix!

(and sorry about not having time for a review, my bandwidth has been quite limited these days for scikit-learn, unfortunately...)

@raghavrv

Member

raghavrv commented Jan 19, 2017

@jnothman Do we need a bugfix whatsnew for this?

@raghavrv

Member

raghavrv commented Jan 19, 2017

(and sorry about not having time for a review, my bandwidth has been quite limited these days for scikit-learn, unfortunately...)

@glouppe No problem :) I'll have a new PR very soon for you ;P

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

[MRG+3] FIX Memory leak in MAE; Use safe_realloc; Acquire GIL only when raising; Propagate all errors to python interpreter level (#7811) (#8002)

* FIX MAE reg. criterion: Use safe_realloc to avoid memory leak

* Release GIL in safe_realloc and clean up scaffolding

* As gil is released in safe_realloc, no need of a with gil block

* Use except * to propagate error in all cdef functions

* Don't use except * for functions that return python objects

* Don't use except * for the comparison function passed to qsort

* Omissions and Errors

* Use safe_realloc now that gil is released there

* Fix realloc size

* Acquire GIL only if we need to raise

* Use except * more judiciously; Release gil only when raising; Add comments to clarify

* Actually that was unneeded; realloc will also allocate for the first time

* StackRecord*, PriorityHeapRecord* to fused type realloc_ptr; Use safe_realloc

* Use except -1 to propagate exceptions. This should avoid overheads

* Fix docstrings and add return 0 to reset methods

* TYPO

* REVIEW Remove redundant MemoryError raising calls

@Przemo10 Przemo10 referenced this pull request Mar 17, 2017

Closed

update fork (#1) #8606

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017


Tommalla Aug 14, 2017

Not sure whether this or #8623 is the more appropriate place to write this, but I believe I have an issue with RandomForestRegressor leaking memory when used with the MAE criterion.

My setup is the following: I'm drawing some learning curves, so I'm training and evaluating the model on increasingly bigger fragments of the dataset (I divide it into 30 "chunks").

My dataset is 6640 x 7, numpy.float32.

With every new iteration of train -> predict -> replace the model variable with a new instance, I get a significant increase in memory consumption, up to the point when I run out of memory (10GB, around 9GB free for the experiment).

If I skip the learning curves and just train the model once on the whole dataset, it fits nicely and consumes around 3GB of memory. Likewise, if I change the criterion to MSE and leave the other parameters unchanged, I can run the whole program, including incremental training, and the memory usage doesn't go above 200MB.

I believe the leak happens somewhere in the predict method, as reducing the number of predict calls after the model is trained significantly reduces the memory consumption (around 800MB per predict for that dataset).

My current configuration:

>>> import platform; print(platform.platform())
Linux-4.12.4-1-ARCH-x86_64-with-arch-Arch-Linux
>>> import sys; print("Python", sys.version)
Python 3.6.2 (default, Jul 20 2017, 03:52:27) 
[GCC 7.1.1 20170630]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.13.1
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 0.19.1
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.18.2

Parameters passed to the RandomForestRegressor are: n_estimators=5, max_features='auto', criterion='mae'
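The measurement described in this report can be sketched roughly as follows. This is not the reporter's script: the helper names (`prefix_sizes`, `peak_rss`, `track_peak_memory`, `fake_step`) are made up for illustration, and `fake_step` is only a stand-in for "fit a fresh model on the first `size` rows, then predict". It assumes a Unix-like system, since it relies on the standard-library `resource` module.

```python
import resource


def prefix_sizes(n_samples, n_chunks):
    """Sizes of n_chunks growing prefixes of a dataset; the last one is n_samples."""
    return [(i * n_samples) // n_chunks for i in range(1, n_chunks + 1)]


def peak_rss():
    """Peak resident set size of this process so far.

    The unit is platform dependent (kilobytes on Linux, bytes on macOS).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def track_peak_memory(step, sizes):
    """Call step(size) for each size and record the peak RSS after each call."""
    readings = []
    for size in sizes:
        step(size)
        readings.append((size, peak_rss()))
    return readings


def fake_step(size):
    # Hypothetical placeholder: a real run would fit and predict with a new
    # model instance on the first `size` rows of the dataset.
    data = [0.0] * size
    return sum(data)


if __name__ == "__main__":
    # 6640 rows split into 30 growing fragments, as in the report above.
    for size, rss in track_peak_memory(fake_step, prefix_sizes(6640, 30)):
        print(size, rss)
```

With scikit-learn installed, `fake_step` could be swapped for a function that builds a `RandomForestRegressor` with the parameters listed above, fits it on the first `size` rows, and calls `predict`. Peak RSS never decreases, so a steady climb across iterations is only a hint of a leak, not proof; comparing the same run with the MSE criterion, as the report does, gives a useful baseline.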

jnothman Aug 15, 2017

Member

Tommalla Aug 15, 2017

@jnothman My bad, missed the new release and thought this was included in 0.18.2. Works fine now.

Thank you!

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
