BUG: Index slice an all-zero float sparse series by indexer does not preserve dtypes #34540

Closed
HYChou0515 opened this issue Jun 3, 2020 · 3 comments · Fixed by #34908
Assignees
Labels
Bug Sparse Sparse Data Type
Milestone

Comments


HYChou0515 commented Jun 3, 2020

Code Sample, a copy-pastable example

import pandas as pd
import scipy as sp
import scipy.sparse

spmatrix = sp.sparse.csr_matrix((2, 2))
spmatrix[0, 0] = 1

df = pd.DataFrame.sparse.from_spmatrix(spmatrix)

# Selecting a single label works as expected, cool
df.loc[1]
# 0    0.0
# 1    0.0
# Name: 0, dtype: Sparse[float64, 0.0]

# Indexing by a list of labels. I expect 0.0, 0.0, but column 1 comes back as 0, not cool
df.loc[[1]]
#      0   1
# 1  0.0 0

# Whenever an all-zero column is indexed this way, it comes back as int type.
df.loc[[1]].loc[[1]]
#    0  1
# 1  0  0

# Checking the dtypes: column 1 has been turned into int64 instead of float64
df.loc[[1]].dtypes
# 0    Sparse[float64, 0]
# 1      Sparse[int64, 0]
# dtype: object

# Indexing by a bool list is also not cool
df.loc[[True, True]]
#      0   1
# 0  1.0 0
# 1  0.0 0

# Slicing is cool
df.loc[0:2]
#      0    1
# 0  1.0  0.0
# 1  0.0  0.0

# Even if we explicitly specify the dtype, it is still not preserved.
import numpy as np
df.astype(np.float32).dtypes
# 0    Sparse[float32, 0.0]
# 1    Sparse[float32, 0.0]
df.astype(np.float32).loc[[1]].dtypes
# 0    Sparse[float64, 0.0]
# 1    Sparse[float64, 0.0]

Problem description

Indexing an all-zero float sparse column with a list-like indexer (either a bool array or an array of index labels) returns that column as Sparse[int64, 0]. The original dtype should be preserved.
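The problem can also be phrased as a small check that does not need scipy. This is only a sketch: the helper name `dtypes_preserved` is made up for illustration, and building the frame directly from `pd.arrays.SparseArray` is assumed to exercise the same indexing path as `from_spmatrix` above.

```python
import pandas as pd


def dtypes_preserved(df, indexer):
    """Return True if df.loc[indexer] keeps every column's dtype."""
    result = df.loc[indexer]
    return list(result.dtypes) == list(df.dtypes)


# Column 0 has one non-zero entry; column 1 is all zeros (the problematic case).
df = pd.DataFrame(
    {
        0: pd.arrays.SparseArray([1.0, 0.0], fill_value=0.0),
        1: pd.arrays.SparseArray([0.0, 0.0], fill_value=0.0),
    }
)

# Hypothetical set of indexers mirroring the ones used in the code sample.
indexers = {
    "label list": [1],
    "bool list": [True, True],
    "slice": slice(0, 2),
}

for name, indexer in indexers.items():
    print(name, dtypes_preserved(df, indexer))
```

On a pandas build affected by this issue, the label-list and bool-list checks would presumably report False while the slice reports True; once the dtype is preserved, all three should report True.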

Expected Output
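Going by the problem description, the result of `df.loc[[1]]` should presumably look like the frame built below, with both columns still `Sparse[float64, 0.0]`. This is a sketch of the expectation, not output captured from pandas; the variable name `expected` is only illustrative.

```python
import pandas as pd

# Assumed expectation for df.loc[[1]]: one row labelled 1, two float columns,
# and every column keeps the sparse float64 dtype with fill value 0.0.
expected = pd.DataFrame(
    [[0.0, 0.0]],
    index=[1],
    dtype=pd.SparseDtype("float64", 0.0),
)

print(expected)
print(expected.dtypes)
```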

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 57034774bc16958b085ad6ca6d5379b9ac0289b3
python           : 3.8.1.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-101-generic
Version          : #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.0.dev0+1742.g5703477
numpy            : 1.18.4
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 41.2.0
Cython           : 0.29.19
pytest           : 5.4.3
hypothesis       : 5.16.0
sphinx           : 3.0.4
blosc            : 1.9.1
feather          : None
xlsxwriter       : 1.2.9
lxml.etree       : 4.5.1
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.15.0
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : 1.3.2
fastparquet      : 0.4.0
gcsfs            : None
matplotlib       : 3.2.1
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : 0.17.1
pytables         : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.4.1
sqlalchemy       : 1.3.17
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.15.1
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.49.1
@HYChou0515 HYChou0515 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 3, 2020
HYChou0515 (Author) commented

@rushabh-newzera

however using features.loc[[id]] does preserve the datatype, but returns a series object

IMO, indexing by an index list should return a series though.

rushabh-newzera commented

Thank you

@jorisvandenbossche jorisvandenbossche added Sparse Sparse Data Type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 3, 2020

suvayu commented Jun 20, 2020

take

5 participants