
Python crashes when executing memory_usage(deep=True) on a sparse series #19368

Closed
quale1 opened this Issue Jan 24, 2018 · 2 comments


quale1 commented Jan 24, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

s = pd.Series([None])
s.to_sparse().memory_usage(deep=True)

# crashes - Kernel died, restarting

Problem description

Executing the memory_usage(deep=True) method on a sparse series crashes Python. (With deep=False the method works as expected.)
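On pandas 0.23 and later the crash is gone; a minimal sanity check, assuming a modern pandas where the Sparse extension dtype replaces to_sparse() (which was removed in pandas 1.0):

```python
import numpy as np
import pandas as pd

# Sparse extension dtype: the modern equivalent of s.to_sparse()
s = pd.Series([np.nan], dtype="Sparse[float]")

print(s.memory_usage(deep=False))  # worked even in 0.22
print(s.memory_usage(deep=True))   # crashed in 0.22; fixed for 0.23
```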

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback (Contributor) commented Jan 24, 2018

Sparse is not fully covered by tests. A pull request to fix this is welcome!

jreback added this to the Next Major Release milestone Jan 24, 2018

hexgnu referenced this issue Jan 29, 2018

Closed

BUG: don't assume series is length > 0 #19438

4 of 4 tasks complete
hexgnu (Contributor) commented Jan 29, 2018

I hooked up gdb and tracked the issue down to lib.pyx, which assumes that series are of length > 0. I made a PR that should fix it, though we'll see how the tests chooch.
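The failure mode described above can be sketched in plain Python (hypothetical helper names; the actual code is Cython in lib.pyx, where an unchecked read segfaults instead of raising):

```python
import numpy as np

def first_value_unsafe(values: np.ndarray):
    # The pattern behind the bug: reading element 0 without a length check.
    # In Cython with bounds checking disabled, this dereferences invalid
    # memory on a length-0 array and kills the interpreter.
    return values[0]

def first_value_safe(values: np.ndarray, default=None):
    # The guarded version: handle the empty case explicitly.
    if len(values) == 0:
        return default
    return values[0]
```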

jreback modified the milestones: Next Major Release, 0.23.0 Feb 6, 2018

jreback closed this in a01f74c Feb 6, 2018

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018

BUG: don't assume series is length > 0
closes pandas-dev#19368

Author: Matthew Kirk <matt@matthewkirk.com>

Closes pandas-dev#19438 from hexgnu/segfault_memory_usage and squashes the following commits:

f9433d8 [Matthew Kirk] Use shared docstring and get rid of if condition
4ead141 [Matthew Kirk] Move whatsnew doc to Sparse
ae9f74d [Matthew Kirk] Revert base.py
cdd4141 [Matthew Kirk] Fix linting error
93a0c3d [Matthew Kirk] Merge remote-tracking branch 'upstream/master' into segfault_memory_usage
207bc74 [Matthew Kirk] Define memory_usage on SparseArray
21ae147 [Matthew Kirk] FIX: revert change to lib.pyx
3f52a44 [Matthew Kirk] Ah ha I think I got it
5e59e9c [Matthew Kirk] Use range over 0 <= for loops
e251587 [Matthew Kirk] Fix failing test with indexing
27df317 [Matthew Kirk] Merge remote-tracking branch 'upstream/master' into segfault_memory_usage
7fdd03e [Matthew Kirk] Take out comment and use product
6bd6ddd [Matthew Kirk] BUG: don't assume series is length > 0
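Per the commit list, the eventual fix defines memory_usage on SparseArray itself (207bc74) rather than patching lib.pyx. A toy sketch of that idea, with hypothetical names: sum the bytes of the stored values and their positional index, which handles length-0 input naturally.

```python
import numpy as np

class TinySparseArray:
    """Toy sparse container (illustrative only, not pandas code): stores
    only the non-fill values plus their integer positions."""

    def __init__(self, values):
        dense = np.asarray(values, dtype=float)  # None becomes NaN
        mask = ~np.isnan(dense)
        self.sp_values = dense[mask]                        # stored values
        self.sp_index = np.flatnonzero(mask).astype(np.int32)  # positions

    def memory_usage(self, deep: bool = False) -> int:
        # Bytes of stored values plus bytes of the index. An all-NaN or
        # empty input yields two empty arrays and simply returns 0 --
        # no assumption that the array has length > 0.
        # (deep makes no difference here since the data is numeric.)
        return self.sp_values.nbytes + self.sp_index.nbytes
```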