Memory leak with pandas plot #9003

Closed
alexisglr opened this issue Dec 4, 2014 · 9 comments · Fixed by #9814
Labels
Performance Memory or execution speed performance Visualization plotting
Comments

@alexisglr

SO : http://stackoverflow.com/questions/27295220/matplotlib-simple-case-memory-leak-with-pandas

There seems to be a memory leak in pandas' plot. Plotting the same data with plt.plot(df.index, df.test) works fine (see below).

A simple test case:

import sys
import gc
import numpy as np
import matplotlib
matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import pandas as pd

# Six years of 15-minute timestamps (roughly 210,000 points)
pdindex = pd.date_range(start='01/01/2013', freq='15min', end='01/01/2019')
df = pd.DataFrame({'test': np.random.normal(0, 1, len(pdindex))}, index=pdindex)


def memplot_plot(df, i):
    # Plot through pandas, save the figure to disk, then close it
    df.test.plot()
    plt.title('graph' + str(i))
    plt.savefig(str(i) + '.png', dpi=144)
    plt.close()

for i in range(1, 100):
    print('*******************************')
    print('i : ' + str(i))
    # Number of gc-tracked objects, then the size of the list that holds them
    print(len(gc.get_objects()))
    print(sys.getsizeof(gc.get_objects()))
    memplot_plot(df, i)
    gc.collect()

The output is below (a MemoryError is raised at i = 6):

*******************************
i : 1
74682
325680
*******************************
i : 2
290627
1190248
*******************************
i : 3
506420
2145012
*******************************
i : 4
721993
3054204
*******************************
i : 5
937566
3865524
*******************************
i : 6
1153139
4892352
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
    execfile(filename, namespace)
  File "C:/PERSO/script_backtesting.py", line 124, in <module>
    memplot_plot(df, i)    
  File "C:/PERSO/script_backtesting.py", line 107, in memplot_plot
    plt.savefig(str(i) + '.png', dpi=144)
  File "C:\Anaconda\lib\site-packages\matplotlib\pyplot.py", line 576, in savefig
    res = fig.savefig(*args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\figure.py", line 1470, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\backend_bases.py", line 2192, in print_figure
    **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\backends\backend_agg.py", line 513, in print_png
    FigureCanvasAgg.draw(self)
  File "C:\Anaconda\lib\site-packages\matplotlib\backends\backend_agg.py", line 461, in draw
    self.figure.draw(self.renderer)
  File "C:\Anaconda\lib\site-packages\matplotlib\artist.py", line 59, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\figure.py", line 1079, in draw
    func(*args)
  File "C:\Anaconda\lib\site-packages\matplotlib\artist.py", line 59, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\axes\_base.py", line 2092, in draw
    a.draw(renderer)
  File "C:\Anaconda\lib\site-packages\matplotlib\artist.py", line 59, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\matplotlib\axis.py", line 1103, in draw
    ticks_to_draw = self._update_ticks(renderer)
  File "C:\Anaconda\lib\site-packages\matplotlib\axis.py", line 957, in _update_ticks
    tick_tups = [t for t in self.iter_ticks()]
  File "C:\Anaconda\lib\site-packages\matplotlib\axis.py", line 903, in iter_ticks
    self.major.formatter.set_locs(majorLocs)
  File "C:\Anaconda\lib\site-packages\pandas\tseries\converter.py", line 982, in set_locs
    self._set_default_format(vmin, vmax)
  File "C:\Anaconda\lib\site-packages\pandas\tseries\converter.py", line 966, in _set_default_format
    format = np.compress(info['maj'], info)
  File "C:\Anaconda\lib\site-packages\numpy\core\fromnumeric.py", line 1563, in compress
    return compress(condition, axis, out)
MemoryError

[Image: graph of the computer's memory usage from launching the script, through the crash, until the console is killed.]

@jreback
Contributor

jreback commented Dec 4, 2014

Please post the output of pd.show_versions().

@jreback jreback added the Visualization plotting label Dec 4, 2014
@jreback
Contributor

jreback commented Dec 5, 2014

Did you try plotting the same data using matplotlib directly?

@alexisglr
Author

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 32
OS: Windows
OS-release: Vista
machine: x86
processor: x86 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_FR

pandas: 0.15.0
nose: 1.1.2
Cython: 0.21
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 1.5
pytz: 2014.7
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

As for your second question: with plt.plot(df.index, df.test) there is no memory leak, but it can show the axis as datetime like df.test.plot() does.
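
For comparison, the matplotlib-only variant mentioned above could look roughly like the sketch below. The function name memplot_mpl and the outdir parameter are made up for illustration and do not appear in the original script; only the plt.plot(df.index, df.test) call comes from this thread.

```python
import os

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, as in the original script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def memplot_mpl(df, i, outdir='.'):
    """Plot df.test against its index with matplotlib only (hypothetical helper)."""
    fig, ax = plt.subplots()
    # Bypass Series.plot() and pass the index and values straight to matplotlib
    ax.plot(df.index, df['test'])
    ax.set_title('graph' + str(i))
    path = os.path.join(outdir, str(i) + '.png')
    fig.savefig(path, dpi=144)
    # Close this specific figure so pyplot drops its reference to it
    plt.close(fig)
    return path
```

Swapping memplot_plot(df, i) for memplot_mpl(df, i) in the loop from the first post should give the matplotlib-only comparison.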

@jreback
Contributor

jreback commented Dec 5, 2014

@TomAugspurger can you have a look?

@jreback jreback added the Performance Memory or execution speed performance label Dec 5, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 5, 2014
@TomAugspurger
Contributor

pandas:

*******************************
i : 1
60742
514568
*******************************
i : 2
276594
2380488
*******************************
i : 3
492376
4290016
*******************************
i : 4
707938
6108400
*******************************
i : 5
923500
7731040
*******************************
i : 6
1139062
9784696
*******************************
i : 7
1354624
11007832
*******************************
i : 8
1570186
13931880

matplotlib:

*******************************
i : 1
1783771
15673408
*******************************
i : 2
1786092
15673408
*******************************
i : 3
1786095
15673408
*******************************
i : 4
1786095
15673408
*******************************
i : 5
1786095
15673408
*******************************
i : 6
1786095
15673408
*******************************
i : 7
1786095
15673408
*******************************
i : 8
1786095

Might be something here. I've never debugged memory usage in python. This might be useful.

@jreback
Contributor

jreback commented Dec 5, 2014

this works nicely in IPython: https://pypi.python.org/pypi/memory_profiler

@qwhelan
Contributor

qwhelan commented Jan 19, 2015

I think I've identified the leak - the plotted data is being passed between a few functions via ax._plot_data. Some tests are still failing, but removing the use of that attribute seems to fix the leak. The before numbers (0 is before the first df.plot()):

    gc.get_objects  sys.getsizeof
0            47526         406504
1           262999        2115960
2           478926        4290024
3           694633        6108408
4           910340        7731048
5          1126047        9784704
6          1341754       11007840
7          1557461       13931888
8          1773168       15673416
9          1988875       17632640
10         2204582       19836768

And after:

    gc.get_objects  sys.getsizeof
0            47526         406504
1            47340         406504
2            47516         406504
3            47516         406504
4            47516         406504
5            47516         406504
6            47516         406504
7            47516         406504
8            47516         406504
9            47516         406504
10           47516         406504
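
A table like the ones above could be collected with a small helper along these lines. This is only a sketch using the standard library: track_gc_growth is a hypothetical name, and it is an assumption that the numbers in this thread were gathered in a similar way (row 0 taken before the first call).

```python
import gc
import sys


def track_gc_growth(func, iterations):
    """Call func(i) for i = 1..iterations and record gc statistics.

    Returns a list of (i, object_count, list_size) tuples; row 0 is
    measured before the first call.
    """
    rows = []
    for i in range(iterations + 1):
        if i > 0:
            func(i)
            gc.collect()  # give the cycle collector a chance first
        objs = gc.get_objects()
        rows.append((i, len(objs), sys.getsizeof(objs)))
        del objs  # do not keep the snapshot list alive between iterations
    return rows


def print_rows(rows):
    """Print the rows in the same two-column layout used in this thread."""
    print('    gc.get_objects  sys.getsizeof')
    for i, count, size in rows:
        print('%-4d %13d %14d' % (i, count, size))
```

Note that sys.getsizeof(gc.get_objects()) only measures the list that holds the object references, not the objects themselves, so the second column is a proxy for the object count rather than for total memory use.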

@qwhelan
Contributor

qwhelan commented Jan 19, 2015

The actual reference cycle is Axes._plot_data -> MPLPlot bound method (a non-classmethod function) -> MPLPlot.axes -> Axes
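
The shape of that cycle can be reproduced with stand-in classes. The sketch below is not pandas' actual code: FakeAxes and FakePlot are hypothetical names that only mimic the attribute layout described above. It shows that such a cycle is not freed by reference counting alone and has to wait for the cyclic garbage collector; it does not attempt to explain why the objects in the original script were still accumulating after gc.collect().

```python
import gc
import weakref


class FakeAxes(object):
    """Stand-in for a matplotlib Axes (hypothetical, for illustration)."""


class FakePlot(object):
    """Stand-in for pandas' MPLPlot (hypothetical, for illustration)."""

    def __init__(self, ax, data):
        self.axes = [ax]               # plot -> axes
        self.data = data
        # axes -> bound method -> plot: the bound method holds a
        # reference to self, which closes the cycle
        ax._plot_data = self.get_data

    def get_data(self):
        return self.data


def cycle_survives_refcounting():
    """Return (alive_before_collect, alive_after_collect) for the plot object."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep the collector from running on its own mid-demo
    try:
        ax = FakeAxes()
        plot = FakePlot(ax, list(range(1000)))
        probe = weakref.ref(plot)
        del ax, plot                   # drop every outside reference
        alive_before = probe() is not None
        gc.collect()                   # the cycle collector breaks the loop
        alive_after = probe() is not None
    finally:
        if gc_was_enabled:
            gc.enable()
    return alive_before, alive_after
```

Storing the data itself on the axes, or not storing anything on the axes at all, would avoid creating the loop in the first place.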

@qwhelan
Contributor

qwhelan commented Jan 20, 2015

@alexis0587 @TomAugspurger Could you guys try out my patch in #9307 when you get a chance?

@jreback jreback modified the milestones: 0.16.0, Next Major Release, 0.16.1 Mar 6, 2015
@jreback jreback modified the milestones: 0.17.0, 0.16.1 Apr 5, 2015