BUG: Series.fillna() crashes on Categorical series if value is a series #17033

Closed
capelastegui opened this Issue Jul 20, 2017 · 3 comments

capelastegui commented Jul 20, 2017

Code Sample

import numpy as np
import pandas as pd

s_str = pd.Series(['hello', np.nan])
print(s_str.fillna(s_str))   # This works
s_cat = s_str.astype('category')
print(s_cat.fillna(s_str))   # This crashes

Problem description

pandas.Series.fillna() accepts a scalar, dict, Series or DataFrame as value. The fillna() method for Categorical accepts only scalars, but it does not give a clear error message when it receives an unsupported input type such as a Series.

Calling Categorical.fillna() with value=series crashes with a cryptic error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Expected Output

If Series cannot be supported as input, the function should check for input type and provide a proper ValueError message (e.g. "value must be a scalar").
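A minimal sketch of the kind of type check described above. The helper name `check_fill_value` and the message wording are hypothetical and are not taken from the pandas source; only `pandas.api.types.is_scalar` is an existing pandas function.

```python
import pandas as pd
from pandas.api.types import is_scalar


def check_fill_value(value):
    """Hypothetical guard: reject non-scalar fill values with a clear message."""
    if not is_scalar(value):
        # Fail early with a readable error instead of the ambiguous
        # "truth value of a Series" message.
        raise ValueError(
            "value must be a scalar, got {}".format(type(value).__name__)
        )
    return value


# A scalar passes through unchanged; a Series is rejected up front.
check_fill_value("hello")
try:
    check_fill_value(pd.Series(["hello"]))
except ValueError as err:
    print(err)
```

A check like this would sit at the top of the categorical fillna path, before any comparison on `value` is attempted.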

The ideal solution, however, would be to have Categorical.fillna() support the same value types as other fillna() methods.
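Until that is supported, a possible user-side workaround is to fill on the object-dtype values and convert back to category. This is only a sketch under that assumption, not how pandas implements it; the function name `fillna_categorical_with_series` is made up for illustration.

```python
import numpy as np
import pandas as pd


def fillna_categorical_with_series(s_cat, fill):
    """Hypothetical workaround: fill NaNs in a categorical Series from another Series."""
    # Align the fill values to the target index so positions match by label.
    fill = fill.reindex(s_cat.index)
    # Work on plain object values, keeping existing entries and taking
    # the fill value only where the original is missing.
    filled = s_cat.astype(object).where(s_cat.notnull(), fill)
    # Convert back; any new values become categories automatically.
    return filled.astype("category")


s_cat = pd.Series(["hello", np.nan]).astype("category")
fill = pd.Series(["ignored", "world"])
print(fillna_categorical_with_series(s_cat, fill))
```

Note that converting back with `astype("category")` rebuilds the categories from the values present, so unused categories and any custom ordering of the original are not preserved.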

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 30.2.0
Cython: 0.23.4
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.1.0
sphinx: 1.5.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: 2.5
feather: None
matplotlib: 1.5.1
openpyxl: 2.4.1
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: 3.4.4
bs4: None
html5lib: None
sqlalchemy: 1.1.2
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None

@gfyoung

Member

gfyoung commented Jul 20, 2017

@capelastegui : I agree that at the very least the error message can be improved. I also don't see why we shouldn't support Series or DataFrame as inputs.

Thus, a PR to improve the error message is welcome! However, feel free to also dive into how we could support those two classes as inputs and submit a PR to add that functionality.

@jreback

Contributor

jreback commented Jul 21, 2017

I suppose this could work.

jreback added this to the Next Major Release milestone Jul 21, 2017

@gfyoung

Member

gfyoung commented Jul 22, 2017

@jreback : I think at the very least we could improve the error message. We can address the actual behavior in a subsequent PR if necessary. How does that sound?

reidy-p added a commit to reidy-p/pandas that referenced this issue Nov 14, 2017

reidy-p added a commit to reidy-p/pandas that referenced this issue Nov 14, 2017

jreback modified the milestones: Next Major Release, 0.22.0 Nov 19, 2017
