Series/DataFrame.replace crashes without exception in some cases (concerning lists) #19266

Closed
h-vetinari opened this issue Jan 16, 2018 · 2 comments · Fixed by #22083
Labels: Compat (pandas objects compatibility with NumPy or Python functions), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Contributor

h-vetinari commented Jan 16, 2018

I have a Series of lists that I need to work with, and discovered that some calls crash without raising an exception. Here's an overview of the behaviour, which seems inconsistent to me.

import pandas as pd
import numpy as np
ser = pd.Series([['a', 'b'], np.nan, [1]])

ser.replace({np.nan : []}) # crashes w/o exception
ser.replace({np.nan : 'dummy'}) # works
# 0    [a, b]
# 1     dummy
# 2       [1]
# dtype: object

ser.replace({np.nan : ['dummy']}) # why does this unwrap?
# 0    [a, b]
# 1     dummy
# 2       [1]
# dtype: object

ser.replace({np.nan : ['dummy', 'alt']}) # crashes w/o exception

ser.fillna([]) # raises
ser.fillna({1 : []}) # this works!
# 0    [a, b]
# 1        []
# 2       [1]
# dtype: object

ser.fillna({1 : ['dummy', 'alt']}) # works as well!
# 0          [a, b]
# 1    [dummy, alt]
# 2             [1]
# dtype: object

# DataFrame has the exact same behaviour as the Series above
df = pd.DataFrame({'col' : ser})
df.replace({np.nan : []}) # crashes w/o exception
...

I agree that interpreting a list as the argument to .replace makes no sense. But I don't understand why it's not possible to fillna a list (cf. other people asking this question https://stackoverflow.com/q/33199193/2965879).

In my opinion, there's no reason why .replace({np.nan : ['dummy', 'alt']}) or .replace({np.nan : []}) couldn't work in principle; the intent is very clear. Furthermore, it already works like that in fillna (with a different interpretation of the dict key, of course). But even if forbidding lists is a design decision, the call should at least raise an exception rather than just crash.

In my case, I have to design an API that does replacements (as pre-/post-processing around actual work) with pandas in the background, and I'd like to be able to just pass through legal dicts to Series.replace / DataFrame.replace - e.g. {r'\s*' : np.nan} or {np.nan : []}. Otherwise I have to inspect every passed replacement parameter (with all the overhead that comes with allowing both {search : replace} and {column : {search : replace}}), extract special cases, and build complicated wrappers like in the answers of the above SO question.
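For illustration, here is a minimal sketch of the kind of wrapper I mean. It sidesteps .replace and .fillna entirely and rebuilds the Series element by element. The helper name fill_missing_with_list is hypothetical (it is not part of pandas), and the sketch assumes that only scalar missing values (np.nan, None) should be replaced.

```python
import numpy as np
import pandas as pd


def fill_missing_with_list(ser, fill):
    """Return a new Series where each missing scalar is replaced by a copy of `fill`.

    Hypothetical helper for illustration only; not a pandas API.
    """
    def is_missing(x):
        # pd.isna on a list returns an array, so only test scalars.
        return pd.api.types.is_scalar(x) and pd.isna(x)

    # list(fill) gives every row its own list, so later in-place
    # mutation of one row does not leak into the others.
    values = [list(fill) if is_missing(x) else x for x in ser]
    return pd.Series(values, index=ser.index, dtype=object, name=ser.name)


ser = pd.Series([['a', 'b'], np.nan, [1]])
filled_empty = fill_missing_with_list(ser, [])
filled_pair = fill_missing_with_list(ser, ['dummy', 'alt'])
```

This works for the cases above, but it is a Python-level loop, and having to special-case list values like this is exactly the overhead I would like to avoid in the API layer.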

Versions are the most recent on conda, details below.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Compat (pandas objects compatibility with NumPy or Python functions) labels on Jan 19, 2018
Contributor

jreback commented Jan 19, 2018

@h-vetinari these are pretty non-idiomatic cases. If you want to have a look and submit a PR, that would be great.

Contributor

minggli commented Jul 26, 2018

Happy to look at this issue and try to fix it. Reverting.
