BUG: pandas 1.0 dropna error with categorical data if pd.options.mode.use_inf_as_na = True #33594

thebucc · 2020-04-16T18:06:38Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
from pandas.api.types import CategoricalDtype

# with categorical column and use_inf_as_na = True -> ERROR
pd.options.mode.use_inf_as_na = True
df1 = pd.DataFrame([['a1', 'good'], ['b1', 'good'], ['c1', 'good'], ['d1', 'bad']], columns=['C1', 'C2'])
df2 = pd.DataFrame([['a1', 'good'], ['b1', np.inf], ['c1', np.nan], ['d1', 'bad']], columns=['C1', 'C2'])
categories = CategoricalDtype(categories=['good', 'bad'], ordered=True)
df1.loc[:, 'C2'] = df1['C2'].astype(categories)
df2.loc[:, 'C2'] = df2['C2'].astype(categories)
df1.dropna(axis=0)  # ERROR
df2.dropna(axis=0)  # ERROR

Problem description

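With the latest version of pandas (1.0.3, installed via pip on python 3.6.8), DataFrame.dropna raises a TypeError if any column has a CategoricalDtype and pd.options.mode.use_inf_as_na = True. The failure does not depend on the data: df1 above contains no missing or infinite values and still fails.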

Exception with traceback:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/frame.py", line 4751, in dropna
    count = agg_obj.count(axis=agg_axis)
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/frame.py", line 7800, in count
    result = notna(frame).sum(axis=axis)
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 376, in notna
    res = isna(obj)
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 126, in isna
    return _isna(obj)
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 185, in _isna_old
    return obj._constructor(obj._data.isna(func=_isna_old))
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 555, in isna
    return self.apply("apply", func=func)
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 442, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 390, in apply
    result = func(self.values, **kwargs)
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 183, in _isna_old
    return _isna_ndarraylike_old(obj)
  File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 283, in _isna_ndarraylike_old
    vec = libmissing.isnaobj_old(values.ravel())
TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got Categorical)

This doesn't happen with pandas 0.24.0 or if pd.options.mode.use_inf_as_na = False (default).
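
On an affected version, one possible workaround is to leave use_inf_as_na at its default and filter out infinite values explicitly. The following is only a minimal sketch under that assumption, not the fix for this bug; the helper name dropna_treating_inf_as_na and the extra numeric column C3 are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype


def dropna_treating_inf_as_na(df):
    """Hypothetical helper: drop rows containing NaN/None or +/-inf.

    It does not touch pd.options.mode.use_inf_as_na, so the code path
    shown in the traceback is never reached.
    """
    # Only numeric columns can hold inf; categorical/object columns are skipped.
    numeric = df.select_dtypes(include="number")
    has_inf = numeric.isin([np.inf, -np.inf]).any(axis=1)
    has_na = df.isna().any(axis=1)
    return df[~(has_inf | has_na)]


dtype = CategoricalDtype(categories=["good", "bad"], ordered=True)
df = pd.DataFrame(
    {
        "C1": ["a1", "b1", "c1", "d1"],
        "C2": pd.Categorical(["good", None, "good", "bad"], dtype=dtype),
        "C3": [1.0, 2.0, np.inf, 4.0],  # hypothetical numeric column
    }
)

# Rows b1 (missing category) and c1 (inf in C3) are dropped.
result = dropna_treating_inf_as_na(df)
print(result)
```

The trade-off is that the inf handling is local to this one call instead of being a global setting, so other code that relies on use_inf_as_na = True is not covered.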

Expected Output

no error

Output of `pd.show_versions()`

pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-96-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

TomAugspurger · 2020-04-16T21:06:39Z

Can you edit your post to include the full traceback and remove all the unnecessary examples (everything that works)? Just keep the DataFrame creation and the code that fails.

dsaxton · 2020-04-18T14:19:56Z

Smaller example:

import pandas as pd

cat = pd.Categorical([1, 2])
pd.options.mode.use_inf_as_na = True
cat.isna()  # Works
pd.Series(cat).isna()  # Raises
pd.DataFrame(cat).isna()  # Raises
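
The three calls above can be folded into a small consistency check, sketched here as an illustration only. The function names isna_paths_agree and check_with_option are hypothetical. The sketch assumes that on an affected version (1.0.0 to 1.0.3) the TypeError from the report simply propagates, and that on newer pandas releases the option may emit a deprecation warning or no longer exist, in which case the check falls back to the default settings.

```python
import warnings

import pandas as pd


def isna_paths_agree(values):
    """Hypothetical check: the array, Series and DataFrame isna paths
    should all produce the same boolean mask for a Categorical."""
    cat = pd.Categorical(values)
    expected = cat.isna()
    from_series = pd.Series(cat).isna().to_numpy()
    from_frame = pd.DataFrame({"x": cat}).isna()["x"].to_numpy()
    return bool((from_series == expected).all() and (from_frame == expected).all())


def check_with_option(values):
    """Run the check with mode.use_inf_as_na enabled when the option exists."""
    try:
        with warnings.catch_warnings():
            # Assumption: newer releases may warn that the option is deprecated.
            warnings.simplefilter("ignore", FutureWarning)
            with pd.option_context("mode.use_inf_as_na", True):
                return isna_paths_agree(values)
    except (KeyError, AttributeError):
        # Assumption: the option is not registered in this pandas version,
        # so only the default behaviour can be checked.
        return isna_paths_agree(values)


print(check_with_option([1, 2]))
print(check_with_option([1, 2, None]))
```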

Also interesting is that use_inf_as_na disables NA checking for string dtype:

[ins] In [5]: pd.options.mode.use_inf_as_na = True

[ins] In [6]: arr = pd.array(["a", "b", None])

[ins] In [7]: arr.isna()
Out[7]: array([False, False, False])

simonjayhawkins · 2020-04-19T09:52:18Z

> With the latest version of pandas (1.0.3, installed via pip on python 3.6.8)

This works in 0.25.3, so it is a regression, introduced by #29900 (i.e. in 1.0.0).

9333e3d is the first bad commit
commit 9333e3d
Author: jbrockmendel <jbrockmendel@gmail.com>
Date: Sun Dec 1 10:11:52 2019 -0800

DEPR: Categorical.ravel, get_dtype_counts, dtype_str, to_dense (#29900)

cc @jbrockmendel

thebucc added the Bug and Needs Triage labels Apr 16, 2020

dsaxton added the Categorical and Missing-data labels and removed the Needs Triage label Apr 18, 2020

dsaxton mentioned this issue Apr 18, 2020

BUG: Fix Categorical use_inf_as_na bug #33629

Merged

simonjayhawkins added the Regression label and removed the Bug label Apr 19, 2020

simonjayhawkins added this to the 1.1 milestone Apr 19, 2020

jreback closed this as completed in #33629 Apr 27, 2020

simonjayhawkins mentioned this issue May 5, 2020

Backport PR #33629 on branch 1.0.x (BUG: Fix Categorical use_inf_as_n… #34001

Merged

dsaxton mentioned this issue May 9, 2020

DISC: Deprecate use_inf_as_na #34093

Closed

simonjayhawkins modified the milestones: 1.1, 1.0.4 May 26, 2020
