Memory leak with pandas plot #9003

Closed
alexisglr opened this Issue Dec 4, 2014 · 9 comments


Stack Overflow question: http://stackoverflow.com/questions/27295220/matplotlib-simple-case-memory-leak-with-pandas

This looks like a memory leak in the pandas plot method: memory grows on every call to df.test.plot(). Plotting the same data with plt.plot(df.index, df.test) does not leak (see below).

A simple reproduction:

import sys
import gc
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd

pdindex = pd.date_range(start='01/01/2013', freq='15min', end='01/01/2019')
df = pd.DataFrame({'test':np.random.normal(0,1,len(pdindex))}, index=pdindex)


def memplot_plot(df, i):
    df.test.plot()
    plt.title('graph' + str(i))
    plt.savefig(str(i) + '.png', dpi=144)
    plt.close()

for i in range(1, 100):
    print('*******************************')
    print('i : ' + str(i))
    # number of objects currently tracked by the garbage collector
    print(len(gc.get_objects()))
    # size of the list returned by gc.get_objects(), not of the objects in it
    print(sys.getsizeof(gc.get_objects()))
    memplot_plot(df, i)
    gc.collect()

The output (a MemoryError is raised at i = 6):

*******************************
i : 1
74682
325680
*******************************
i : 2
290627
1190248
*******************************
i : 3
506420
2145012
*******************************
i : 4
721993
3054204
*******************************
i : 5
937566
3865524
*******************************
i : 6
1153139
4892352
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
    execfile(filename, namespace)
  File "C:/PERSO/script_backtesting.py", line 124, in <module>
    memplot_plot(df, i)    
  File "C:/PERSO/script_backtesting.py", line 107, in memplot_plot
    plt.savefig(str(i) + '.png', dpi=144)
  File "C:\Anaconda\lib\site-packages\matplotlib\pyplot.py", line 576, in savefig
    res = fig.savefig(*args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\figure.py", line 1470, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\backend_bases.py", line 2192, in print_figure
    **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\backends\backend_agg.py", line 513, in print_png
    FigureCanvasAgg.draw(self)
  File "C:\Anaconda\lib\site-packages\matplotlib\backends\backend_agg.py", line 461, in draw
    self.figure.draw(self.renderer)
  File "C:\Anaconda\lib\site-packages\matplotlib\artist.py", line 59, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\figure.py", line 1079, in draw
    func(*args)
  File "C:\Anaconda\lib\site-packages\matplotlib\artist.py", line 59, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\axes\_base.py", line 2092, in draw
    a.draw(renderer)
  File "C:\Anaconda\lib\site-packages\matplotlib\artist.py", line 59, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\axis.py", line 1103, in draw
    ticks_to_draw = self._update_ticks(renderer)
  File "C:\Anaconda\lib\site-packages\matplotlib\axis.py", line 957, in _update_ticks
    tick_tups = [t for t in self.iter_ticks()]
  File "C:\Anaconda\lib\site-packages\matplotlib\axis.py", line 903, in iter_ticks
    self.major.formatter.set_locs(majorLocs)
  File "C:\Anaconda\lib\site-packages\pandas\tseries\converter.py", line 982, in set_locs
    self._set_default_format(vmin, vmax)
  File "C:\Anaconda\lib\site-packages\pandas\tseries\converter.py", line 966, in _set_default_format
    format = np.compress(info['maj'], info)
  File "C:\Anaconda\lib\site-packages\numpy\core\fromnumeric.py", line 1563, in compress
    return compress(condition, axis, out)
MemoryError

Graph of the machine's memory usage from launching the script, through the crash, to killing the console:
[image: memory usage graph]
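
The two counters printed in the loop above can be collected with a small helper. This is only a sketch: `object_counts` and its arguments are hypothetical names, not part of pandas or matplotlib. Note that `sys.getsizeof(gc.get_objects())` measures just the list returned by `gc.get_objects()`, not the objects it refers to, so the object count is the more telling number.

```python
import gc


def object_counts(func, repeats):
    """Call func(i) `repeats` times and record the number of
    gc-tracked objects that remain after each call.

    `func` is any callable taking the iteration index (hypothetical
    interface, chosen to match memplot_plot(df, i) once df is bound).
    """
    counts = []
    for i in range(repeats):
        func(i)
        gc.collect()  # drop anything that is merely waiting for collection
        counts.append(len(gc.get_objects()))
    return counts


# Example use with the reproduction above (assumes df and memplot_plot exist):
# counts = object_counts(lambda i: memplot_plot(df, i), 5)
# growth = [b - a for a, b in zip(counts, counts[1:])]
```

A count that climbs by roughly the same amount on every call, as in the output above (about 216,000 objects per plot), suggests that objects are being retained between iterations.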

Contributor

jreback commented Dec 4, 2014

Please post the output of pd.show_versions().

Contributor

jreback commented Dec 5, 2014

Did you try plotting the same data using matplotlib alone?

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 32
OS: Windows
OS-release: Vista
machine: x86
processor: x86 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_FR

pandas: 0.15.0
nose: 1.1.2
Cython: 0.21
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 1.5
pytz: 2014.7
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

As for your second question: plt.plot(df.index, df.test) works without a memory leak, but it can show the axis as datetime like df.test.plot() does.

Contributor

jreback commented Dec 5, 2014

@TomAugspurger can you have a look?

jreback added the Performance label Dec 5, 2014

jreback added this to the 0.16.0 milestone Dec 5, 2014

Contributor

TomAugspurger commented Dec 5, 2014

pandas:

*******************************
i : 1
60742
514568
*******************************
i : 2
276594
2380488
*******************************
i : 3
492376
4290016
*******************************
i : 4
707938
6108400
*******************************
i : 5
923500
7731040
*******************************
i : 6
1139062
9784696
*******************************
i : 7
1354624
11007832
*******************************
i : 8
1570186
13931880

matplotlib:

*******************************
i : 1
1783771
15673408
*******************************
i : 2
1786092
15673408
*******************************
i : 3
1786095
15673408
*******************************
i : 4
1786095
15673408
*******************************
i : 5
1786095
15673408
*******************************
i : 6
1786095
15673408
*******************************
i : 7
1786095
15673408
*******************************
i : 8
1786095

Might be something here. I've never debugged memory usage in Python. This might be useful.

Contributor

jreback commented Dec 5, 2014

this works nicely in IPython: https://pypi.python.org/pypi/memory_profiler
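
Besides memory_profiler, the standard library's tracemalloc module can show which source lines account for memory growth. It requires Python 3.4 or later, so it is not available on the Python 2.7 setup reported above. A minimal sketch; `top_growth` is a hypothetical helper name, not something from pandas or memory_profiler:

```python
import tracemalloc


def top_growth(func, repeats=3, limit=5):
    """Run func(i) `repeats` times and return the `limit` source lines
    whose allocated memory changed the most, as StatisticDiff objects."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for i in range(repeats):
            func(i)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to sorts by the absolute size difference, largest first
    return after.compare_to(before, 'lineno')[:limit]


# Example use with the reproduction above (assumes df and memplot_plot exist):
# for stat in top_growth(lambda i: memplot_plot(df, i)):
#     print(stat)
```

Each returned entry carries `size_diff`, `count_diff`, and a `traceback`, which is usually enough to narrow the search to a module or function.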

alexisglr referenced this issue in matplotlib/matplotlib Dec 15, 2014

Closed

Matplotlib simple case memory leak #3892

Contributor

qwhelan commented Jan 19, 2015

I think I've identified the leak - the plotted data is being passed between a few functions via ax._plot_data. Some tests are still failing, but removing the use of that attribute seems to fix the leak. The before numbers (0 is before the first df.plot()):

    gc.get_objects  sys.getsizeof
0            47526         406504
1           262999        2115960
2           478926        4290024
3           694633        6108408
4           910340        7731048
5          1126047        9784704
6          1341754       11007840
7          1557461       13931888
8          1773168       15673416
9          1988875       17632640
10         2204582       19836768

And after:

    gc.get_objects  sys.getsizeof
0            47526         406504
1            47340         406504
2            47516         406504
3            47516         406504
4            47516         406504
5            47516         406504
6            47516         406504
7            47516         406504
8            47516         406504
9            47516         406504
10           47516         406504

Contributor

qwhelan commented Jan 19, 2015

The actual cycle is Axes._plot_data -> MPLPlot non-classmethod function -> MPLPlot.axes -> Axes
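
The shape of that cycle can be illustrated with plain Python objects. The classes below are hypothetical stand-ins, not the real matplotlib Axes or pandas MPLPlot, and the sketch relies on CPython's reference counting. It shows only why dropping the `_plot_data` attribute breaks the cycle, not why the calls to `gc.collect()` in the reproduction did not stop the growth.

```python
import gc
import weakref


class FakeAxes(object):
    """Hypothetical stand-in for a matplotlib Axes."""


class FakePlot(object):
    """Hypothetical stand-in for pandas' MPLPlot."""

    def __init__(self, data, stash_on_axes):
        self.data = data
        self.axes = FakeAxes()  # FakePlot.axes -> FakeAxes
        if stash_on_axes:
            # FakeAxes._plot_data -> bound method -> FakePlot: closes the cycle
            self.axes._plot_data = self.get_data

    def get_data(self):
        return self.data


def survives_without_gc(stash_on_axes):
    """Return True if the plot object is still alive after its last
    external reference is deleted, with the cycle collector switched off."""
    was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting only
    try:
        plot = FakePlot(list(range(1000)), stash_on_axes)
        probe = weakref.ref(plot)
        del plot
        alive = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    gc.collect()  # clean up the cycle created by this demonstration
    return alive


print(survives_without_gc(True))   # the bound method keeps the plot alive
print(survives_without_gc(False))  # no cycle, so it is freed immediately
```

Without the attribute on the axes object, nothing refers back to the plot helper, so the data it holds is released as soon as the helper goes out of scope.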

Contributor

qwhelan commented Jan 20, 2015

@alexis0587 @TomAugspurger Could you guys try out my patch in #9307 when you get a chance?

jreback modified the milestone: 0.16.0, Next Major Release, 0.16.1 Mar 6, 2015

jreback modified the milestone: 0.17.0, 0.16.1 Apr 5, 2015

sinhrks closed this in #9814 Jul 29, 2015
