BUG: pandas.SparseDtype from pandas.CategoricalDtype fails #39874
Labels: Bug, Categorical (Categorical Data Type), ExtensionArray (Extending pandas with custom dtypes or arrays), Sparse (Sparse Data Type)
Code Sample
The following operation causes an error.
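The exact snippet from the report is not shown here. As a hypothetical illustration of the kind of operation described, the sketch below tries to wrap a CategoricalDtype in a pandas.SparseDtype and cast a Series to it. The category names, the sample data, and the helper name try_sparse_categorical are assumptions, not taken from the report. Whether the failure occurs when the dtype is constructed or during a later operation may depend on the pandas version, so the sketch records the outcome rather than assuming one.

```python
import pandas as pd


def try_sparse_categorical():
    """Return (ok, info) for an attempted cast to a sparse categorical dtype.

    Hypothetical helper: the categories and data below are illustrative only.
    """
    cat = pd.CategoricalDtype(["Zero", "a", "b"])
    s = pd.Series(["Zero", "a", "Zero", "b"])
    try:
        # Wrap the categorical dtype in a SparseDtype, using "Zero" as fill_value.
        sparse_dtype = pd.SparseDtype(cat, fill_value="Zero")
        result = s.astype(cat).astype(sparse_dtype)
        return True, str(result.dtype)
    except Exception as exc:
        # Report whichever exception this pandas version raises, if any.
        return False, f"{type(exc).__name__}: {exc}"


ok, info = try_sparse_categorical()
print(ok, info)
```

If ok is False, info holds the exception type and message, which can be compared with the TypeError described in this report.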
In addition, the following does not raise an error, but changes the "Zero"-only column in an unexpected way when groupby is applied.
If the dense version of the data frame is used, the outcome is as expected.
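A minimal sketch of such a dense-versus-sparse comparison might look as follows. It is not the snippet from the report: the column names, the data, the use of groupby(...).first(), and the helper name compare_groupby are all assumptions made for illustration. The sparse columns use an object subtype with "Zero" as the fill_value, and one column contains only that value.

```python
import numpy as np
import pandas as pd

FILL = "Zero"  # assumed fill_value, matching the "Zero"-only column described above


def compare_groupby():
    """Run the same groupby on dense and sparse data and compare the results."""
    dense = pd.DataFrame({
        "key": [1, 1, 2],
        "mixed": np.array(["a", FILL, "b"], dtype=object),
        "only_fill": np.array([FILL, FILL, FILL], dtype=object),
    })
    expected = dense.groupby("key").first()

    try:
        sparse = dense.copy()
        for col in ["mixed", "only_fill"]:
            values = dense[col].to_numpy(dtype=object)
            sparse[col] = pd.arrays.SparseArray(values, fill_value=FILL)
        got = sparse.groupby("key").first()
        # Compare column contents element by element as plain Python lists.
        same = all(
            list(got[col]) == list(expected[col])
            for col in ["mixed", "only_fill"]
        )
    except Exception as exc:
        return f"raised {type(exc).__name__}"
    return "match" if same else "mismatch"


status = compare_groupby()
print(status)
```

A status of "mismatch" would correspond to the behaviour described here, where the sparse result differs from the dense one; the outcome may vary between pandas versions.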
Problem description
From the description of pandas.SparseDtype, my understanding is that the dtype argument can be of type ExtensionDtype, which is consistent with CategoricalDtype. However, performing certain operations (example above) on a sparse data frame of such a type causes a TypeError.
In addition, replacing the CategoricalDtype with a str type seems to partially fix the problem. However, it still causes issues with groupby when a column consists of only the fill_value.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 7d32926
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-22-generic
Version : #23~18.04.1-Ubuntu SMP Thu Jun 6 08:37:25 UTC 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.2.2
numpy : 1.18.1
pytz : 2020.4
dateutil : 2.8.1
pip : 21.0.1
setuptools : 46.4.0.post20200518
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.2.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : 2.7.1
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.3.5
tables : 3.5.2
tabulate : None
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.44.1