Series.map on a categorical does not process missing values #22527

Closed
batterseapower opened this issue Aug 28, 2018 · 0 comments · Fixed by #51645
Labels
Bug, Categorical (Categorical Data Type), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Code Sample

>>> import numpy as np
>>> import pandas as pd
>>> pd.Series(['Pandas', 'is', np.nan], dtype='category').map(lambda x: len(x) if x == x else -1)
0    6.0
1    2.0
2    NaN
dtype: category
Categories (2, int64): [6, 2]
>>> pd.Series(['Pandas', 'is', np.nan], dtype='category').astype(object).map(lambda x: len(x) if x == x else -1)
0    6
1    2
2   -1
dtype: int64

Problem description

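On a categorical series, Series.map calls its function argument once for each category, but never calls it on NaN, even when the series contains missing values. This is inconsistent with how Series.map usually works (on an object-dtype series, as the second example shows, NaN is passed to the function like any other value), and it is very surprising!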

I'm raising this issue even though #15706 already exists because that issue is asking for something different (they want the argument to .map to be called once per value in the series, rather than once per unique value).

Another related issue is #20714.
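
The behaviour described above can be modelled with a simplified sketch. It assumes, consistent with the output in the code sample, that mapping a categorical applies the function once per category and then rebuilds the result from the integer codes, where pandas stores missing values as code -1. The helper `categorical_map_sketch` is hypothetical and is not pandas' actual implementation; it only illustrates why the function never sees NaN.

```python
import numpy as np
import pandas as pd


def categorical_map_sketch(cat: pd.Categorical, func):
    """Hypothetical, simplified model of per-category mapping (not pandas' code).

    func is called once per category. Missing values are stored as code -1,
    so they never reach func and simply come back as NaN.
    """
    mapped = [func(c) for c in cat.categories]  # one call per category
    out = np.empty(len(cat.codes), dtype=object)
    for i, code in enumerate(cat.codes):
        # code -1 marks a missing value: func was never called for it
        out[i] = np.nan if code == -1 else mapped[code]
    return out


calls = []


def f(x):
    calls.append(x)  # record every value func is actually called with
    return len(x) if x == x else -1


cat = pd.Categorical(['Pandas', 'is', np.nan])
result = categorical_map_sketch(cat, f)
print(list(result))  # [6, 2, nan]
print(calls)  # ['Pandas', 'is'] (NaN never reaches f)
```

Because `func` only ever sees the categories, the `-1` branch of the lambda in the code sample can never run, which matches the NaN in the first output.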

Expected Output

Categorical map should give values equal to those obtained by first converting to object. For any series s and function f we should have the invariant that:

s.map(f).astype(object).equals(s.astype(object).map(f).astype(object))
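
The invariant can be wrapped in a small check, together with the object-dtype route that already produces the expected values. This is a sketch under assumptions: `satisfies_map_invariant` is a hypothetical helper name, and whether it returns True for the categorical example depends on the installed pandas version, so only the object-dtype results are stated below.

```python
import numpy as np
import pandas as pd


def f(x):
    # NaN is the only value not equal to itself, so x == x is False for NaN
    return len(x) if x == x else -1


def satisfies_map_invariant(s: pd.Series, func) -> bool:
    """Hypothetical helper: True if s.map(func) agrees with mapping after
    converting to object dtype (the invariant proposed in this issue)."""
    via_dtype = s.map(func).astype(object)
    via_object = s.astype(object).map(func).astype(object)
    return via_dtype.equals(via_object)


s_cat = pd.Series(['Pandas', 'is', np.nan], dtype='category')

# Workaround: convert to object first so func also sees the missing value.
workaround = s_cat.astype(object).map(f)
print(workaround.tolist())  # [6, 2, -1]

# The result of this check depends on the pandas version in use.
print(satisfies_map_invariant(s_cat, f))
```

For an object-dtype series the invariant holds trivially, since both sides perform the same mapping.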

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.1.2
pip: 18.0
setuptools: 39.0.1
Cython: 0.27.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
