Memory usage increasing with df.dropna(inplace=True) and df.head() #11050

markns · 2015-09-10T12:15:57Z

There appears to be a memory leak in the DataFrame.dropna(inplace=True) function. Please see the following IPython session:

In [1]:
import ipython_memory_usage.ipython_memory_usage as imu
imu.start_watching_memory()

import pandas as pd
import numpy as np
pd.__version__

Out[1]: '0.16.2'
In [1] used 23.6875 MiB RAM in 0.70s, peaked 0.00 MiB above current, total RAM usage 59.76 MiB

In [2]:
df = pd.DataFrame(np.ones(int(1e7)))
df.info(memory_usage=True)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000000 entries, 0 to 9999999
Data columns (total 1 columns):
0    float64
dtypes: float64(1)
memory usage: 152.6 MB
In [2] used 153.3477 MiB RAM in 0.14s, peaked 0.00 MiB above current, total RAM usage 213.11 MiB

In [3]:
df.dropna(inplace=True)
df.head(2)

Out[3]:
0
0   1
1   1
In [3] used 0.1758 MiB RAM in 0.60s, peaked 385.92 MiB above current, total RAM usage 213.29 MiB

In [4]:
df.dropna(inplace=True)
df.head(2)

Out[4]:
0
0   1
1   1
In [4] used 152.9375 MiB RAM in 0.61s, peaked 182.86 MiB above current, total RAM usage 366.22 MiB

In [5]:
df.dropna(inplace=True)
df.head(2)

Out[5]:
0
0   1
1   1
In [5] used 152.9297 MiB RAM in 0.58s, peaked 272.79 MiB above current, total RAM usage 519.15 MiB

Each further run of cells 3, 4 and 5 adds around 150 MB to the total memory usage. The behaviour only appears when df.head() is run after the in-place dropna.

The behaviour can also be seen in a cell which is not dropping na in place:

df = df.dropna()
df.head(2)

but only if df.dropna(inplace=True) has already been run within the same cell.
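To check whether the growth also shows up outside an IPython session, a script along the following lines could be used. It is only a sketch: the function name `measure_repeated_dropna`, the frame size and the repeat count are illustrative choices, and `tracemalloc` reports traced allocations rather than the process-level RAM figures printed by ipython_memory_usage above, so the numbers are not directly comparable.

```python
import gc
import tracemalloc

import numpy as np
import pandas as pd


def measure_repeated_dropna(n_rows=100_000, repeats=3):
    """Return traced memory (in bytes) after each dropna + head round.

    Hypothetical helper for illustration; not part of pandas.
    """
    tracemalloc.start()
    df = pd.DataFrame(np.ones(n_rows))
    readings = []
    for _ in range(repeats):
        df.dropna(inplace=True)
        df.head(2)  # the result is discarded here
        gc.collect()  # collect garbage before taking a reading
        current, _peak = tracemalloc.get_traced_memory()
        readings.append(current)
    tracemalloc.stop()
    return readings, len(df)


readings, n_rows_left = measure_repeated_dropna()
print(readings, n_rows_left)
```

If the readings stay roughly flat across rounds, that would suggest the growth seen in the session depends on how the cells were run rather than on dropna alone.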


jreback · 2015-09-10T12:24:04Z

I don't think so; you need to garbage collect.

Further, don't be fooled by inplace=True: it doesn't actually do anything except assign the reference internally, so it doesn't save any memory.

In [13]: def f(df):
   ....:     x = df.copy()
   ....:     x = x.dropna()
   ....:     x.head()
   ....:     

In [14]: %memit f(df)
peak memory: 631.70 MiB, increment: 0.02 MiB

In [15]: %memit f(df)
peak memory: 631.70 MiB, increment: 0.00 MiB

In [16]: %memit f(df)
peak memory: 631.70 MiB, increment: 0.00 MiB

In [17]: %memit f(df)
peak memory: 631.70 MiB, increment: 0.00 MiB

In [8]: def g(df):
   ...:     x = df.copy()
   ...:     x.dropna(inplace=True)
   ...:     x.head()
   ...:     

In [9]: %memit g(df)
peak memory: 631.64 MiB, increment: 76.30 MiB

In [10]: %memit g(df)
peak memory: 631.65 MiB, increment: 0.00 MiB

In [11]: %memit g(df)
peak memory: 631.65 MiB, increment: 0.00 MiB

In [12]: %memit g(df)
peak memory: 631.67 MiB, increment: 0.00 MiB
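The point about inplace=True can be illustrated with a small comparison. This is a minimal sketch on a made-up frame (the size and the NaN positions are arbitrary assumptions): it only shows that rebinding the name and dropping in place give the same rows, and it makes no claim about pandas internals beyond what is said above.

```python
import gc

import numpy as np
import pandas as pd

# Small hypothetical frame with a NaN in every tenth row.
values = np.ones(1000)
values[::10] = np.nan
df = pd.DataFrame(values)

# Variant 1: rebind the name to the new, filtered frame.
rebound = df.copy()
rebound = rebound.dropna()

# Variant 2: ask pandas to update the existing object.
in_place = df.copy()
in_place.dropna(inplace=True)

# Both variants should hold the same rows and index.
same_result = rebound.equals(in_place)

# Collect garbage explicitly before reading any memory figures.
gc.collect()

print(same_result, len(rebound), len(in_place))
```

Since the two variants are interchangeable in their result, the choice between them can be made on readability rather than on an expected memory saving.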

K11K11 · 2015-10-08T19:23:35Z

Looks like dropna() is quite memory inefficient for my case too.

jreback closed this as completed Sep 10, 2015

jreback added the Performance (memory or execution speed performance) label Sep 10, 2015

ianozsvald mentioned this issue Mar 6, 2018

Upload to PyPI ianozsvald/ipython_memory_usage#4

Closed

ianozsvald mentioned this issue May 14, 2021

Collect pandas (and other) weird memory usage cases ianozsvald/ipython_memory_usage#30

Open
