
Memory leak with %matplotlib inline #7270

Open
evyasonov opened this Issue Dec 19, 2014 · 23 comments


evyasonov commented Dec 19, 2014

Hey everyone

I've found a problem. Just launch the code and look at the memory. Then delete "%matplotlib inline" and launch again.

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

import os
import sys
import StringIO
import urllib, base64

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150

OUTPUT_FILENAME = "Asd"

def printHTML(html):
    with open(OUTPUT_FILENAME, "a") as outputFile:
        outputFile.write(html if type(html) == str else html.encode('utf8'))

def friendlyPlot():

    figure = plt.Figure()
    ax = plt.subplot2grid((1,2), (0,0))

    ax.plot( range(1000), range(1000) )


    #plt.show() 
    fig = plt.gcf()

    imgdata = StringIO.StringIO()
    fig.savefig(imgdata, format='png')
    imgdata.seek(0)  # rewind the data
    image = imgdata.getvalue().encode('base64').replace('\n', '')  # getvalue(), not .buf, returns the full buffer
    printHTML('<img src="data:image/png;base64,{0}" /><br />'.format(image))
    plt.close('all')
    imgdata.close()

open(OUTPUT_FILENAME, 'w').close()

for i in range(500):
    friendlyPlot()
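For anyone reproducing this on Python 3, here is a rough equivalent of the snippet above (my adaptation, not the original poster's code): io.BytesIO replaces StringIO, base64.b64encode replaces the removed str.encode('base64'), and matplotlib.use('agg') stands in for %matplotlib inline outside IPython. The loop is shortened to keep the sketch quick.

```python
import base64
import io

import matplotlib
matplotlib.use("agg")  # stand-in for %matplotlib inline outside IPython
import matplotlib.pyplot as plt

OUTPUT_FILENAME = "Asd"

def print_html(html):
    # Append a fragment of HTML to the output file.
    with open(OUTPUT_FILENAME, "a") as output_file:
        output_file.write(html)

def friendly_plot():
    fig, ax = plt.subplots()
    ax.plot(range(1000), range(1000))
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    image = base64.b64encode(buf.getvalue()).decode("ascii")
    print_html('<img src="data:image/png;base64,{0}" /><br />'.format(image))
    plt.close(fig)

open(OUTPUT_FILENAME, "w").close()  # truncate the output file
for _ in range(5):  # the original used 500 iterations to make the leak visible
    friendly_plot()
```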

@evyasonov evyasonov changed the title from Memory leak to Memory leak with %matplotlib inline Dec 19, 2014

@ellisonbg ellisonbg added this to the 4.0 milestone Jan 11, 2015

@minrk minrk modified the milestones: 4.1, 4.0 Jul 11, 2015

denfromufa commented Jul 30, 2015

I hit this bug as well, is there any way to get inline plots without memory leaks? I do not want to launch separate processes for each plot, since the arrays are quite large.

takluyver commented Jul 30, 2015

Can you check this when memory usage increases:

len(IPython.kernel.zmq.pylab.backend_inline.show._to_draw)

That's a list where figures are being stored. They should be there only temporarily, but maybe they're building up without getting cleared.
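A related registry worth checking (my suggestion, not something this comment verifies) is matplotlib's own figure-manager table, `_pylab_helpers.Gcf`, which comes up again later in this thread. A minimal sketch, using that private API as it exists in current matplotlib:

```python
import matplotlib
matplotlib.use("agg")  # headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
from matplotlib import _pylab_helpers  # private API; may change between versions

fig, ax = plt.subplots()
open_count = len(_pylab_helpers.Gcf.get_all_fig_managers())   # figure is registered
plt.close(fig)
closed_count = len(_pylab_helpers.Gcf.get_all_fig_managers())  # registry emptied
print(open_count, closed_count)
```

If that count grew across cells, figures would be pinned by matplotlib itself rather than by the inline backend.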

denfromufa commented Jul 30, 2015

len(IPython.kernel.zmq.pylab.backend_inline.show._to_draw)=0

BTW, I'm plotting using the .plot() method on pandas DataFrames.

takluyver commented Jul 30, 2015

OK, so much for that theory.

It's possible pandas itself keeps some data associated with plots internally as well. The original report doesn't involve pandas, though.

How much memory does each additional plot appear to add?

denfromufa commented Jul 30, 2015

OK, this seems to be my case: I was using pandas 0.16.0, but the issue is fixed in master:

pandas-dev/pandas#9814

takluyver commented Jul 30, 2015

Great, thanks. Leaving open since the original report didn't involve pandas.

tacaswell commented Sep 12, 2015

This can be reproduced more simply:

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

import os
import sys
import StringIO
import urllib, base64

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150



def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')


for i in range(500):
    friendlyPlot()

This second version does not leak memory, so it is something on the IPython side, not the pyplot side (I think).

import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import matplotlib.ticker



import os
import sys
import StringIO
import urllib, base64

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150



def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')


for i in range(500):
    friendlyPlot()
asteppke commented Oct 16, 2015

@tacaswell With your test code, IPython on Windows 7 consumes approximately 1.7 GB here, which is not freed afterwards. Running with a slightly higher number of iterations leads to a MemoryError. So this is still an issue.

tacaswell commented Oct 16, 2015

@asteppke The first or second block?

asteppke commented Oct 16, 2015

@tacaswell With your first test code (%matplotlib inline), memory consumption goes up to 1.7 GB. In contrast, when using the second piece (matplotlib.use('agg')), memory usage oscillates only between 50 MB and 100 MB.

Both tests are executed with Python 3.4 and IPython notebook version 4.0.5.

takluyver commented Jan 21, 2016

I've played with this a bit more. I notice that if I re-run the for loop in @tacaswell's example a few times, memory usage doesn't increase; it seems to be the number of figures you create in a single cell that matters. IPython certainly keeps a list of all the figures generated in a cell for the inline backend, but that list is quite definitely being cleared after the cell runs, and clearing it doesn't make memory usage drop, even after doing gc.collect().

Could our code be interacting badly with something in matplotlib? I thought _pylab_helpers.Gcf looked likely, but it doesn't seem to be holding on to anything.

I tried grabbing a reference to one of the figures and calling gc.get_referrers() on it; apart from the reference I had in user_ns, all the others looked like mpl objects, presumably many of them in reference loops. What object is something else most likely to be inappropriately keeping a reference to?
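The gc.get_referrers() technique described above can be sketched with the standard library alone; here Figure is a hypothetical stand-in class, and registry simulates whatever hidden container might be keeping real figures alive:

```python
import gc

class Figure:
    """Hypothetical stand-in for a matplotlib Figure, to keep the sketch stdlib-only."""

fig = Figure()
registry = [fig]  # simulates a hidden registry holding the figure

# In the real session this was called on an actual figure grabbed from
# user_ns; any referrer that is not a known matplotlib object would be
# the suspect keeping the figure alive.
referrers = gc.get_referrers(fig)
suspects = [r for r in referrers if r is registry]
print(len(suspects))
```

On a real figure, comparing each referrer's type and module against matplotlib's is a quick way to separate expected internal references from a leak candidate.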

takluyver commented Jan 25, 2016

I'm dropping this to milestone 'wishlist'. We want to fix it, but at the moment we're not sure how to make further progress in identifying the bug, and I don't think it's worth holding up releases for it.

Anyone who can make progress gets brownie points. Also cake.

@takluyver takluyver modified the milestones: 4.1, wishlist Jan 25, 2016

lucasb-eyer commented Mar 7, 2016

Not really progress, but the memory seems to be lost somewhere inside the kernel. Calling gc.collect() after or inside the loop doesn't help, summary.print_(summary.summarize(muppy.get_objects())) doesn't find any of the leaked memory, and setting all _N and _iN output-cache variables to None doesn't help either. It's really mysterious.
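As a complement to muppy, the standard library's tracemalloc can diff Python-level allocations between two snapshots; note that, like muppy, it would miss memory lost below the Python object layer, which is consistent with the symptoms in this thread. A minimal sketch with a stand-in allocation in place of the plotting loop:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for the plotting loop: keep some Python-level objects alive.
kept = [bytearray(10000) for _ in range(100)]

after = tracemalloc.take_snapshot()
# Diff the snapshots, grouped by source line; big positive size_diff
# entries point at where new memory was attributed.
stats = after.compare_to(before, "lineno")
for stat in stats[:3]:
    print(stat)

tracemalloc.stop()
```

If a leak does not show up here but the process RSS still grows, that is evidence the memory is being lost at the C level rather than in tracked Python objects.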

takluyver commented Mar 7, 2016

I also wondered if it was creating uncollectable objects, but those should end up in gc.garbage when there are no other references to them, and that's still empty when I see it using up loads of RAM.

I think someone who knows about these things is going to have to use C-level tools to track down what memory is not getting freed. There's no evidence of extra Python objects being kept alive anywhere we can find.
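For reference, the gc.garbage check mentioned above can be made more aggressive with gc.DEBUG_SAVEALL, which forces every object the collector frees into gc.garbage instead of deallocating it; a stdlib sketch with a hypothetical Node class forming a reference cycle:

```python
import gc

class Node:
    """Hypothetical cycle participant, standing in for mpl objects."""
    def __init__(self):
        self.ref = None

gc.set_debug(gc.DEBUG_SAVEALL)  # collected objects go to gc.garbage instead of being freed

a, b = Node(), Node()
a.ref, b.ref = b, a   # build a reference cycle
del a, b              # drop the only external references

gc.collect()
found = [o for o in gc.garbage if isinstance(o, Node)]
print(len(found))

gc.set_debug(0)       # restore normal collection
gc.garbage.clear()
```

That gc.garbage stayed empty here without DEBUG_SAVEALL supports the conclusion that no uncollectable Python objects are involved.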

thenomemac commented Aug 5, 2016

I'll second that a fix on this issue would be appreciated.

takluyver commented Aug 5, 2016

We know, but at present no-one has worked out the cause of the bug.

daidoji commented Oct 24, 2016

+1

akapocsi commented Nov 15, 2016

+1

denfromufa commented Nov 15, 2016

BTW, I'm still hitting this issue from time to time on latest matplotlib, pandas, jupyter, ipython. If anyone knows any debugger that can help to troubleshoot this multi-process communication, then please let me know.

akapocsi commented Nov 15, 2016

Could it perhaps have anything to do with the browser cache mechanism?

takluyver commented Nov 15, 2016

Good thought, but I don't think so. It's IPython's process taking up extra memory, not the browser, and @tacaswell's reproduction doesn't involve sending plots to the browser.

lucasb-eyer commented Jan 4, 2018

Hi, I believe I have found part of the culprit and a way to significantly, but not completely, reduce this problem!

After scrolling through the ipykernel/pylab/backend_inline.py code, I got the hunch that interactive mode does a lot of storing of "plot-things", though I don't understand it completely, so I am not able to pinpoint the exact reason with certainty.

Here is the code to verify this (based on @tacaswell's snippet above), useful for anyone trying to implement a fix.

Initialization:

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

matplotlib.rcParams['figure.figsize'] = (24, 6)
matplotlib.rcParams['figure.dpi'] = 150

from resource import getrusage
from resource import RUSAGE_SELF

def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')

Actual test:

print("before any:  {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
friendlyPlot()
print("before loop: {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
for i in range(50):
    friendlyPlot()
print("after loop:  {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
import gc ; gc.collect(2)
print("after gc:    {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))

Running it for 50 iterations of the loop, I get:

before any:    87708 kB
before loop:  106772 kB
after loop:   786668 kB
after gc:     786668 kB

Running it for 200 iterations of the loop, I get:

before any:    87708 kB
before loop:  100492 kB
after loop:  2824316 kB
after gc:    2824540 kB

which shows an almost linear increase in memory with the number of iterations.

Now to the fix/workaround: call matplotlib.interactive(False) before the test-snippet, and then run it.

With 50 iterations:

before any:    87048 kB
before loop:  104992 kB
after loop:   241604 kB
after gc:     241604 kB

And with 200 iterations:

before any:    87536 kB
before loop:  103104 kB
after loop:   239276 kB
after gc:     239276 kB

This confirms that only a constant increase (independent of the iteration count) is left.

Using these numbers, I make a rough estimate of the leak size per iteration:

(786668-(241604 - 104992))/50   = 13001.12
(2824316-(241604 - 104992))/200 = 13438.52

And for a single iteration of the loop, I get 13560. So the leak per iteration is significantly smaller than the image size, whether raw (>3 MB) or PNG-compressed (54 KB).

Also, strangely, running a small-scale test (only few iterations) repeatedly in the same cell without restarting the kernel is much less consistent, I have not been able to understand this or determine a pattern.

I hope someone with more knowledge of the internals can take it from here, as I lack the time and knowledge to dive deeper into it right now.
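The workaround above can be summarized as a runnable sketch (run here as a plain script with the agg backend standing in for the notebook; inside Jupyter, the matplotlib.interactive(False) call after %matplotlib inline is the operative line):

```python
import io

import matplotlib
matplotlib.use("agg")  # stand-in for the inline backend outside a notebook
import matplotlib.pyplot as plt

matplotlib.interactive(False)  # the workaround: disable interactive mode

def friendly_plot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig(io.BytesIO(), format="png")  # render without touching disk
    plt.close(fig)  # close this figure specifically

for _ in range(10):
    friendly_plot()

print(len(plt.get_fignums()))  # figures still registered with pyplot
```

The trade-off is that figures no longer display automatically at the end of a cell; you would call IPython's display(fig) (or plt.show()) explicitly where output is wanted.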

fedral commented Aug 18, 2018

It works.
