take_1d yields surprising results when working with SparseArray #19506

Closed
hexgnu opened this issue Feb 2, 2018 · 7 comments

@hexgnu
Contributor

commented Feb 2, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
import pandas.core.algorithms as algos

algos.take_1d(pd.SparseArray([0,0,1], fill_value=0), [0,1,2]) #=> array([       1, 23618416,       32])

# VS

algos.take_1d(np.array([0,0,1]), [0,1,2]) #=> array([0,0,1])

Problem description

This smells to me like SparseArray handing its sparse representation to the C code behind take_1d. I would expect both calls to return the same values.

Expected Output

I would expect them to be the same.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: be09289
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.16-202.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+205.gbe0928903
pytest: 3.3.1
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 0.19.1
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.1.0
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0b10
sqlalchemy: 1.1.15
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.2
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None

@jorisvandenbossche

Member

commented Feb 2, 2018

Those algos are not meant to be used by users, so please don't use them directly. They also assume numpy arrays as input. Although a sparse array is an ndarray subclass, np.array(sparse_array) gives you only the actual stored values, without any fills. That is why you see those strange values: they are leftovers from allocating an 'empty' result array for the take operation to fill, and only the first item got filled.
There is an issue for that np.array(..) behaviour: #14167
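To make the explanation above concrete, here is a minimal sketch in plain NumPy. It does not use pandas internals: `sp_values`, `sp_index`, and `to_dense` are hypothetical stand-ins for a sparse layout that stores only the non-fill values. It shows why an indexer built for the dense length does not fit the stored values alone.

```python
import numpy as np

# Hypothetical stand-in for a sparse layout: only non-fill values are stored.
dense = np.array([0, 0, 1])
fill_value = 0
sp_index = np.flatnonzero(dense != fill_value)  # positions of the stored values
sp_values = dense[sp_index]                     # the stored values themselves


def to_dense(sp_values, sp_index, fill_value, length):
    """Rebuild the dense array by writing stored values over the fill value."""
    out = np.full(length, fill_value, dtype=sp_values.dtype)
    out[sp_index] = sp_values
    return out


indexer = np.array([0, 1, 2])

# Taking from the densified array gives the expected answer.
expected = to_dense(sp_values, sp_index, fill_value, len(dense)).take(indexer)

# Taking from the stored values alone cannot work: there is a single stored
# value, so positions 1 and 2 do not exist. NumPy's bounds-checked take
# raises here instead of returning a partially filled buffer.
try:
    sp_values.take(indexer)
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
```

NumPy's `take` checks bounds and raises, whereas the thread describes the internal routine leaving leftover memory in the unfilled slots; the mismatch between the indexer and the stored values is the same in both cases.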

@jorisvandenbossche

Member

commented Feb 2, 2018

Ah, I see now you were actually working on that issue I linked to :-)

@hexgnu

Contributor Author

commented Feb 2, 2018

Yeah, I had a feeling it was related. That problem is a lot harder than I expected on first take... I have a PR that's somewhat done, but there are a lot of weird edge cases.

@hexgnu

Contributor Author

commented Feb 2, 2018

Also I would never use algos as a user. I was just looking for advice on fixing something else related to merging sparse frames.

@jorisvandenbossche

Member

commented Feb 2, 2018

> Also I would never use algos as a user.

Yes, sure! That's just not always clear from the issue post :)

@TomAugspurger

Contributor

commented Feb 20, 2018

Ideally algos.take_1d would dispatch to arr.take for SparseArray input. But we have some places in the library that seem to rely on take_1d(sparse_array) returning an ndarray.
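A rough sketch of what such a dispatch could look like, using a toy container rather than pandas itself. `ToySparse` and this `take_1d` are hypothetical illustrations, not the pandas implementation or the change made in the eventual fix.

```python
import numpy as np


class ToySparse:
    """Hypothetical sparse container (not pandas' SparseArray)."""

    def __init__(self, values, fill_value=0):
        values = np.asarray(values)
        self.fill_value = fill_value
        self.length = len(values)
        # Store only the positions and values that differ from the fill value.
        self.sp_index = np.flatnonzero(values != fill_value)
        self.sp_values = values[self.sp_index]

    def to_dense(self):
        out = np.full(self.length, self.fill_value, dtype=self.sp_values.dtype)
        out[self.sp_index] = self.sp_values
        return out

    def take(self, indexer):
        # Simplest correct approach for a sketch: densify, take, re-sparsify.
        return ToySparse(self.to_dense().take(indexer), self.fill_value)


def take_1d(arr, indexer):
    """Toy dispatcher: defer to the container's own take when it is sparse."""
    indexer = np.asarray(indexer)
    if isinstance(arr, ToySparse):
        return arr.take(indexer)
    return np.asarray(arr).take(indexer)


sparse_result = take_1d(ToySparse([0, 0, 1]), [0, 1, 2])
dense_result = take_1d(np.array([0, 0, 1]), [0, 1, 2])
```

Note that the sparse branch returns a `ToySparse`, not an ndarray. That is the compatibility concern raised above: callers that expect an ndarray back from `take_1d(sparse_array)` would have to be adjusted.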

@TomAugspurger

Contributor

commented Sep 4, 2018

This is fixed by #22325.

TomAugspurger added a commit that referenced this issue Oct 13, 2018

[API/REF]: SparseArray is an ExtensionArray (#22325)
Makes SparseArray an ExtensionArray.

* Fixed DataFrame.__setitem__ for updating to sparse.

Closes #22367

* Fixed Series[sparse].to_sparse

Closes #22389

Closes #21978
Closes #19506
Closes #22835

brute4s99 added a commit to brute4s99/pandas that referenced this issue Nov 19, 2018

[API/REF]: SparseArray is an ExtensionArray (pandas-dev#22325)
Makes SparseArray an ExtensionArray.

* Fixed DataFrame.__setitem__ for updating to sparse.

Closes pandas-dev#22367

* Fixed Series[sparse].to_sparse

Closes pandas-dev#22389

Closes pandas-dev#21978
Closes pandas-dev#19506
Closes pandas-dev#22835