pd.unique doesn't actually return unique elements of SparseArray #19595

Closed
hexgnu opened this issue Feb 8, 2018 · 3 comments

@hexgnu
Contributor

commented Feb 8, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

pd.unique(pd.SparseArray([0, 1, 2, 3], fill_value=3))  # => array([0, 1, 2]); the fill_value 3 is missing

pd.unique(pd.Series([0, 1, 2, 3]))  # => array([0, 1, 2, 3])

Problem description

While digging into #5078, I found that pd.unique on a SparseArray leaves out the fill_value, so the result does not contain every unique element of the array.

Expected Output

I would expect both calls to return the same result, array([0, 1, 2, 3]).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 36f9052
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.16-202.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+253.g36f905285
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@hexgnu

Contributor Author

commented Feb 8, 2018

Another smell of sorts is that {{dtype}}HashTable.uniques and {{dtype}}HashTable.get_labels are really similar. It seems the code could be DRYed up a bit... but I don't want to get too caught up in the weeds ;)
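
To illustrate the kind of overlap described above, here is a minimal plain-Python sketch in which one hash-table pass produces both the uniques and the labels. This is only a model: the names `factorize_pass`, `unique_values`, and `get_labels` are hypothetical, and the real Cython templates have different signatures and NA handling that are not reproduced here.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def factorize_pass(values: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Single pass that records each value's label and the uniques in first-seen order.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    table: Dict[Hashable, int] = {}   # value -> position in `uniques`
    uniques: List[Hashable] = []
    labels: List[int] = []
    for v in values:
        idx = table.get(v)
        if idx is None:
            # First time this value is seen: assign the next label.
            idx = len(uniques)
            table[v] = idx
            uniques.append(v)
        labels.append(idx)
    return labels, uniques


def unique_values(values: Sequence[Hashable]) -> List[Hashable]:
    # Reuses the shared pass and keeps only the uniques.
    return factorize_pass(values)[1]


def get_labels(values: Sequence[Hashable]) -> List[int]:
    # Reuses the shared pass and keeps only the labels.
    return factorize_pass(values)[0]
```

Both wrappers delegate to the same loop, which is the sort of deduplication the comment hints at; whether that is practical in the templated Cython code is a separate question.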

@jreback jreback added this to the Next Major Release milestone Feb 10, 2018

@jreback

Contributor

commented Feb 10, 2018

Yeah, this is pretty trivial to make work on Sparse (as you don't need to materialize the dense array); would take a PR!
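
The "no need to materialize" point can be sketched in plain Python: take the unique stored values, then add the fill value only if at least one position is not explicitly stored. This is a sketch under stated assumptions, not the eventual fix: `sparse_unique` and its parameters (`sp_values`, `fill_value`, `length`) are hypothetical stand-ins for a sparse array's stored values, fill value, and logical length, and NaN fill values are not handled.

```python
from typing import Hashable, List, Sequence


def sparse_unique(sp_values: Sequence[Hashable],
                  fill_value: Hashable,
                  length: int) -> List[Hashable]:
    """Unique values of a sparse array without building the dense array.

    Hypothetical model: `sp_values` holds the explicitly stored values,
    every other position (up to `length`) implicitly holds `fill_value`.
    """
    seen = set()
    uniques: List[Hashable] = []
    for v in sp_values:
        if v not in seen:
            seen.add(v)
            uniques.append(v)
    # The fill value is present iff some position is not explicitly stored.
    if len(sp_values) < length and fill_value not in seen:
        uniques.append(fill_value)
    return uniques
```

With the values from the report, `sparse_unique([0, 1, 2], fill_value=3, length=4)` gives `[0, 1, 2, 3]`. Note that this sketch always places the fill value last, which can differ from the order of first appearance in the dense array.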

@jreback jreback added the Bug label Feb 10, 2018

@hexgnu

Contributor Author

commented Feb 12, 2018

Got one already in the works, thanks @jreback
