Series.map on a categorical does not process missing values #22527

batterseapower · 2018-08-28T09:16:31Z

Code Sample

>>> import numpy as np
>>> import pandas as pd
>>> pd.Series(['Pandas', 'is', np.nan], dtype='category').map(lambda x: len(x) if x == x else -1)
0    6.0
1    2.0
2    NaN
dtype: category
Categories (2, int64): [6, 2]
>>> pd.Series(['Pandas', 'is', np.nan], dtype='category').astype(object).map(lambda x: len(x) if x == x else -1)
0    6
1    2
2   -1
dtype: int64

Problem description

Series.map calls its function argument once for each category in the categorical, but never calls it on NaN, even when NaN is present in the series. This is inconsistent with how Series.map behaves for other dtypes, and is very surprising!

I'm raising this issue even though #15706 already exists because that issue is asking for something different (they want the argument to .map to be called once per value in the series, rather than once per unique value).

Another related issue is #20714.
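
The difference in how often the function gets called can be observed by recording each call. The following is only a rough sketch: `count_calls` and `record` are hypothetical helper names made up for illustration, and the exact count on the categorical path is assumed to depend on the installed pandas version (categories only on versions showing this bug, possibly one extra call for NaN on later versions).

```python
import numpy as np
import pandas as pd


def count_calls(series):
    """Return how many times Series.map invokes the mapped function."""
    calls = []

    def record(x):
        # Remember every argument the function is called with, then
        # return it unchanged so the mapping itself is an identity.
        calls.append(x)
        return x

    series.map(record)
    return len(calls)


values = ['a', 'b', 'a', 'a', np.nan]

# Object dtype: the function is called once per element, NaN included.
object_calls = count_calls(pd.Series(values, dtype=object))

# Categorical dtype: the function is called per category rather than
# per element; whether NaN is also passed depends on the pandas version.
categorical_calls = count_calls(pd.Series(values, dtype='category'))

print(object_calls)
print(categorical_calls)
```

On an object series every element reaches the function, so a lambda that handles NaN explicitly works as expected. On a categorical the repeated 'a' values collapse into a single call, which is the behaviour #15706 discusses, and the missing value may never reach the function at all, which is what this issue is about.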

Expected Output

Categorical map should give values equal to those obtained by first converting to object. For any series s and function f we should have the invariant that:

s.map(f).astype(object).equals(s.astype(object).map(f).astype(object))
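
For anyone who wants to check this invariant on their own pandas version, here is a minimal sketch. `map_matches_object_path` and `map_via_object` are hypothetical helper names, not part of pandas; the second one is just the workaround from the code sample above (convert to object before mapping), wrapped in a function.

```python
import numpy as np
import pandas as pd


def map_matches_object_path(s, f):
    """Check the invariant: mapping s directly should give the same
    values as mapping after converting s to object dtype."""
    direct = s.map(f).astype(object)
    via_object = s.astype(object).map(f).astype(object)
    return bool(direct.equals(via_object))


def map_via_object(s, f):
    """Workaround: convert to object first so that f is called on
    every element, including NaN."""
    return s.astype(object).map(f)


s = pd.Series(['Pandas', 'is', np.nan], dtype='category')
f = lambda x: len(x) if x == x else -1  # NaN != NaN, so NaN maps to -1

print(map_via_object(s, f).tolist())  # [6, 2, -1]

# False on versions that show the behaviour reported here; the result
# on other versions is not assumed.
print(map_matches_object_path(s, f))
```

The workaround gives up the once-per-category shortcut, so it may be slower on large series with few categories.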

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.1.2
pip: 18.0
setuptools: 39.0.1
Cython: 0.27.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

WillAyd added the Missing-data and Categorical labels Aug 29, 2018

simonjayhawkins mentioned this issue Apr 23, 2020

.map on category dtype does not respect defaultdict when encountering np.nan values #29162

Closed

mroeschke added the Bug label Jun 28, 2020

jbrockmendel mentioned this issue Dec 18, 2021

BUG/API: Categorical.map & CategoricalIndex.map have no 'na_action' kwarg #44279

Closed

topper-123 mentioned this issue Feb 26, 2023

ENH: add na_action to Categorical.map & CategoricalIndex.map #51645

Merged

mroeschke closed this as completed in #51645 Mar 30, 2023
