Memory leak with .rolling().max() in pandas 0.24.2 #25893

Closed
labodyn opened this issue Mar 27, 2019 · 8 comments · Fixed by #25926
Labels
Bug, Window (rolling, ewma, expanding)

@labodyn

labodyn commented Mar 27, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.read_csv('file.csv', index_col=0, parse_dates=True)
while True:
    df['close'].rolling(4000).max()

Problem description

This code leaks memory until my application shuts down. The leak occurs in pandas 0.24.2 but not in pandas 0.23.4. Running this code fills my 16 GB of memory within a few hours. file.csv is attached as a zip; the memory leak might only occur with certain data.
file.csv.zip

Expected Output

No memory leaks.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 38.4.0
Cython: None
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.7.1
html5lib: None
sqlalchemy: 1.2.14
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@WillAyd
Member

WillAyd commented Mar 27, 2019

Can you profile and try to isolate the issue?

@WillAyd added the Needs Info (clarification about behavior needed to assess issue) label Mar 27, 2019
@labodyn
Author

labodyn commented Mar 27, 2019

import os
import psutil
import numpy as np
import pandas as pd

process = psutil.Process(os.getpid())
i = 0

def sink():
    global i
    i += 1
    if i % 100 == 0:
        mem = process.memory_info().rss / 1e6
        print("mem %fMB" % mem)

while True:
    pd.Series(data=np.random.rand(5000)).rolling(4000).max()
    sink()

I was wrong, it has nothing to do with the dataset I provided. This is a pretty huge bug in pandas imo.
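
A bounded variant of the loop above may make the growth easier to compare between pandas versions. This is only a sketch: the helper names `rss_samples`, `psutil_rss_mb`, and `rolling_max_once` are hypothetical, and the iteration counts are arbitrary. The pandas, numpy, and psutil imports are kept inside the helpers so the sampling function itself has no third-party dependency.

```python
def rss_samples(fn, iterations, every, read_rss_mb):
    """Call fn() `iterations` times and record memory after every `every` calls."""
    samples = []
    for i in range(1, iterations + 1):
        fn()
        if i % every == 0:
            samples.append(read_rss_mb())
    return samples


def psutil_rss_mb():
    """Resident set size of this process in MB (requires the psutil package)."""
    import os
    import psutil
    return psutil.Process(os.getpid()).memory_info().rss / 1e6


def rolling_max_once():
    """One iteration of the reproduction above (requires numpy and pandas)."""
    import numpy as np
    import pandas as pd
    pd.Series(data=np.random.rand(5000)).rolling(4000).max()
```

Calling `rss_samples(rolling_max_once, 10000, 100, psutil_rss_mb)` and comparing the first and last samples gives a rough growth figure; swapping `.max()` for another window function in `rolling_max_once` allows the same comparison for that function.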

@WillAyd
Member

WillAyd commented Mar 27, 2019

Thanks for the code sample. Investigation and PRs are always welcome!

@WillAyd added the Bug and Window (rolling, ewma, expanding) labels and removed the Needs Info (clarification about behavior needed to assess issue) label Mar 27, 2019
@WillAyd added this to the Contributions Welcome milestone Mar 27, 2019
@labodyn
Author

labodyn commented Mar 27, 2019

The bug does not occur with rolling().mean(), but it does with rolling().min().

@WillAyd
Member

WillAyd commented Mar 27, 2019

It does look like the min/max implementation is the only window func with calls to malloc/free:

ring = <numeric *>malloc(win * sizeof(numeric))

Per Cython docs it may be preferable to use the C-API functions for better memory management and reporting back to the Python layer:

https://cython.readthedocs.io/en/latest/src/tutorial/memory_allocation.html#memory-allocation

Would take a PR trying that or other ideas for sure
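
For readers unfamiliar with why a rolling min/max needs a scratch buffer at all, here is a pure-Python sketch of the usual sliding-window maximum technique. It is not the pandas Cython code: `rolling_max` is a hypothetical name, a `deque` stands in for the manually allocated ring, and NaN inputs are assumed not to occur. The buffer never holds more than `window` entries, which is consistent with the `win`-sized allocation quoted above.

```python
import math
from collections import deque


def rolling_max(values, window):
    """Rolling maximum; positions without a full window are NaN."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    candidates = deque()  # indices whose values are strictly decreasing
    for i, v in enumerate(values):
        # Older entries that are not larger than v can never be the maximum again.
        while candidates and values[candidates[-1]] <= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front entry once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        # The front entry is the maximum of the current window.
        out.append(values[candidates[0]] if i >= window - 1 else math.nan)
    return out
```

Because Python manages the deque's memory, this version has nothing to free by hand; the Cython version has to pair every allocation with a matching release on every exit path.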

@WillAyd
Member

WillAyd commented Mar 27, 2019

It may also be that the GIL is released when free is called in the current implementation.

@ArtificialQualia
Contributor

After some testing, it doesn't seem to have anything to do with the GIL or using the C-API vs direct malloc/free calls.

It looks like this bug was introduced when the variable and fixed window cases were split into separate functions. For some reason, passing some of the cdef variables to those separate functions (likely starti and endi in this case) is what causes the memory leak.

A simple fix is to combine the functions into one large function again, but it may be better to work out an alternative that doesn't cause the leak, such as using pointers or perhaps a memoryview.

I can take a shot at a proper fix, but it may be a couple of days until I have a PR.

@WillAyd
Member

WillAyd commented Mar 29, 2019

Sounds good, thanks for investigating @ArtificialQualia!

4 participants