
BUG: MemoryError: Unable to allocate #39629

Closed
1 of 3 tasks
impredicative opened this issue Feb 7, 2021 · 4 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments


  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

Unfortunately this is not available, since the errors occur exclusively on proprietary datasets which are very complex and large. It is not feasible to distill an example.

Problem description

I am seeing MemoryError exceptions all over the place, on somewhat random lines. Measuring memory usage with psutil confirms that there are vast amounts of free memory on the node. These exceptions make pandas completely unusable for me: it is struggling to allocate 22 MiB when over 2 TiB of free memory is available.
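A minimal sketch of the psutil check described above (psutil is a third-party package; the ImportError fallback below just keeps the snippet importable without it):

```python
# Report currently available system memory, as the issue author describes
# doing at the moment each MemoryError is raised.
try:
    import psutil
except ImportError:  # psutil is not part of the standard library
    psutil = None

def free_memory_gib():
    """Return available system memory in GiB, or None if psutil is absent."""
    if psutil is None:
        return None
    return psutil.virtual_memory().available / 2**30
```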

  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/frame.py", line 7950, in merge
    return merge(
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 74, in merge
    op = _MergeOperation(
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 652, in __init__
    ) = self._get_merge_keys()
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1063, in _get_merge_keys
    self.right = self.right._drop_labels_or_levels(right_drop)
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/generic.py", line 1637, in _drop_labels_or_levels
    dropped = self.copy()
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/generic.py", line 5665, in copy
    data = self._mgr.copy(deep=deep)
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 811, in copy
    res = self.apply("copy", deep=deep)
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 409, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 679, in copy
    values = values.copy()
MemoryError: Unable to allocate 22.1 MiB for an array with shape (1, 2896551) and data type object
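For reference, the reported allocation size is consistent with the pointer array backing an object-dtype block: on a 64-bit build each element of an object array is an 8-byte pointer, so the (1, 2896551) shape in the traceback comes to about 22.1 MiB.

```python
# One 8-byte pointer per element of the (1, 2896551) object array
# from the traceback above (64-bit Python, per the show_versions output).
n_elements = 2_896_551
size_mib = n_elements * 8 / 2**20
print(round(size_mib, 1))  # → 22.1
```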

Expected Output

These exceptions are not supposed to happen given that there is sufficient free memory. There are many prior issues noting this exception, which suggests severe, continuing bugs in pandas.

Output of pd.show_versions()

This is not entirely available since it's running exclusively in the cloud, but the salient versions are:

python           : 3.8.6.final.0
python-bits      : 64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : en_US.UTF-8
pandas           : 1.2.1
numpy            : 1.19.4
pytz             : 2020.4
dateutil         : 2.8.1
pip              : 20.3.3
setuptools       : 50.3.1.post20201107
Cython           : None
pandas_datareader: None
numexpr          : None
pyarrow          : 3.0.0
impredicative added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Feb 7, 2021
impredicative (Author) commented Feb 7, 2021

If your response is going to be that this issue is unfixable because code is unavailable, then I guess it's the end of the road for me using pandas. In that case, I will also let other Python developers know about this issue so they too can avoid pandas. To reiterate, there have been many instances of this issue reported here on GitHub and also on StackOverflow. It's a significant problem, and it won't go away by pretending it doesn't exist.

jreback (Contributor) commented Feb 7, 2021

To reiterate, there have been many instances of this issue reported here on GitHub and also on StackOverflow. It's a significant problem and it won't go away by pretending it doesn't exist.

and these are exactly where?

impredicative (Author) commented Feb 7, 2021

and these are exactly where?

Issues: #35499, #31355, #29596, #28487

StackOverflow: search results

More importantly, whenever pandas reports a MemoryError, it should automatically produce an audit report, perhaps similar to how pd.show_versions() produces one for versions, but covering its detailed memory usage breakdown, total system memory, free system memory, the top 10 memory-consuming processes and their usage, etc. This would help in ascertaining whether the issue is with pandas or with there actually being insufficient memory.
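A rough sketch of the audit report proposed above, built on psutil. These function names are hypothetical (not a pandas API), and psutil is assumed available, with a graceful fallback when it is not:

```python
try:
    import psutil
except ImportError:  # psutil is a third-party package
    psutil = None

def memory_audit():
    """Collect a small snapshot: totals, free memory, top 10 memory consumers."""
    if psutil is None:
        return {}
    vm = psutil.virtual_memory()
    procs = sorted(
        psutil.process_iter(["pid", "name", "memory_info"]),
        key=lambda p: p.info["memory_info"].rss if p.info["memory_info"] else 0,
        reverse=True,
    )[:10]
    return {
        "total_gib": vm.total / 2**30,
        "available_gib": vm.available / 2**30,
        "top_processes": [
            (p.info["pid"], p.info["name"], p.info["memory_info"].rss)
            for p in procs
            if p.info["memory_info"]
        ],
    }

def run_with_audit(fn, *args, **kwargs):
    """Run fn; on MemoryError, print an audit snapshot before re-raising."""
    try:
        return fn(*args, **kwargs)
    except MemoryError:
        print("Memory audit at failure:", memory_audit())
        raise
```

Usage would look like `run_with_audit(left.merge, right, on="key")`, so a failing merge leaves behind enough context to tell a genuine out-of-memory condition from a spurious one.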

@jreback jreback added this to the No action milestone Feb 7, 2021
jreback (Contributor) commented Feb 7, 2021

@impredicative without a reproducible example this is not a useful report.

jreback closed this as completed Feb 7, 2021
pandas-dev locked and limited conversation to collaborators Feb 7, 2021