BUG: fillna with inplace does not work with multiple columns selection by loc #14858

Closed
hiiwave opened this issue Dec 11, 2016 · 3 comments
Labels
Indexing, Missing-data, Usage Question

Comments

hiiwave commented Dec 11, 2016

Code Sample, a copy-pastable example if possible

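import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.loc[:, ['C', 'D']].fillna(-1, inplace=True)  # expected to replace the NaNs in C and D with -1
display(df)  # IPython/Jupyter display; use print(df) in a plain script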

Output:

A	B	C	D
0	1.387547	-1.299578	0.360015	1.290783
1	-0.395182	-0.112581	NaN	NaN
2	-0.649372	-1.831869	-0.103746	0.533153

Problem description

The NaNs are expected to be replaced with -1, but they are NOT.

Please see the following comparisons.

Comparison (1)

By contrast, the following code behaves as expected.
(The only difference is selection by iloc instead of by loc.)

df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.iloc[:, 2:4].fillna(-1, inplace=True)
display(df)

Output:

	A	B	C	D
0	-0.522821	-1.600520	-1.468871	0.715790
1	0.493071	0.722474	-1.000000	-1.000000
2	0.545852	-0.877946	0.993169	-0.582661

Comparison (2)

When only one column is selected with loc, it behaves properly.

df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
df.iloc[1, 2:4] = np.nan
df.loc[:, 'C'].fillna(-1, inplace=True)
display(df)

Output:

A	B	C	D
0	-0.549106	0.261093	-1.278554	2.017178
1	-1.424498	0.439482	-1.000000	NaN
2	-1.281520	1.190736	0.356319	0.416363

Expected Output of the first code sample

A	B	C	D
0	1.181106	1.101231	-0.198445	0.295238
1	-0.654265	-1.129840	-1.000000	-1.000000
2	-1.070404	0.096556	0.499020	-1.835347

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-358.14.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: zh_TW.big5
LOCALE: zh_TW.big5

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

jreback (Contributor) commented Dec 11, 2016

You are filling a copy. Using inplace is an anti-pattern. Most operations like this will show a SettingWithCopyWarning, but in this case it is not easily detectable.

Use

In [11]: df[['C', 'D']] = df[['C', 'D']].fillna(-1)

In [12]: df
Out[12]: 
          A         B         C         D
0  0.236782  1.408896 -0.199882  0.803165
1 -1.763881  0.232414 -1.000000 -1.000000
2  0.878515 -0.394800  0.429696 -1.829569
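A minimal sketch of what "filling a copy" means, assuming a reasonably recent pandas and a small hand-written frame in place of the random data above: the list selection is bound to a name, filled in place, and checked against the original, after which the assign-back form from the answer is applied.

```python
import numpy as np
import pandas as pd

# Illustrative data (not the reporter's random values): row 1 has NaN in C and D.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 5.0, 6.0],
    "C": [0.5, np.nan, 0.1],
    "D": [1.2, np.nan, 0.3],
})

# Selecting a list of columns with .loc returns a new DataFrame.
sub = df.loc[:, ["C", "D"]]
sub.fillna(-1, inplace=True)  # modifies `sub` only

still_nan = df[["C", "D"]].isna().any().any()
print(sub.isna().any().any())  # False: the selection was filled
print(still_nan)               # True: df itself was not touched

# Assigning the filled result back is what actually updates df.
df[["C", "D"]] = df[["C", "D"]].fillna(-1)
print(df.loc[1, ["C", "D"]].tolist())  # [-1.0, -1.0]
```

The in-place call succeeds, but only on `sub`; depending on the pandas version it may also emit a SettingWithCopyWarning. Whether a given selection happens to be a view (as the `iloc` slice and single-column cases apparently were in 0.19.1) is an implementation detail that has changed across releases, so assigning the result back is the form to rely on.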

jreback closed this as completed Dec 11, 2016
jreback added the Indexing, Missing-data, and Usage Question labels Dec 11, 2016
jreback added this to the No action milestone Dec 11, 2016
matheushrd commented

Try this:
df.loc[:, ['C', 'D']] = df.loc[:, ['C', 'D']].fillna(-1)
I was having the same difficulty with a .replace call in my code. This worked.
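The same trap applies to any method called with inplace=True on a selection, which is presumably why the .replace call needed the assign-back form too. When the goal is only to fill specific columns, a variant that avoids the selection entirely is to pass a dict to DataFrame.fillna, mapping column labels to fill values; columns not named in the dict are left as they are. A sketch with an illustrative frame:

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 has NaN in A, C and D.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [4.0, 5.0, 6.0],
    "C": [0.5, np.nan, 0.1],
    "D": [1.2, np.nan, 0.3],
})

# Fill only C and D on the frame itself; A keeps its NaN because it is
# not a key in the dict. No sub-selection, so no copy is involved.
df = df.fillna({"C": -1, "D": -1})

print(df.loc[1].tolist())  # [nan, 5.0, -1.0, -1.0]
```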

shuiyuejihua commented

This happens not only with multiple columns, but also with a single column:
df.loc[df.id==123, 'num'].fillna(0, inplace=True)
does not work, but
df.loc[df.id==123, 'num'] = 123
works.

Why not change the fillna function to support this in the future?
It seems like a bug.
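A sketch of why the single-column case behaves the same way, assuming the `id` and `num` columns from the comment (the data is made up): `df.loc[mask, 'num']` on its own is a read, and boolean-mask indexing builds a new Series, so an in-place fill changes only that temporary. `df.loc[mask, 'num'] = ...` is a write through `.loc`, which is why the scalar assignment works. Combining the two gives the masked fill:

```python
import numpy as np
import pandas as pd

# Illustrative frame using the column names from the comment above.
df = pd.DataFrame({
    "id": [123, 123, 456],
    "num": [np.nan, 7.0, np.nan],
})

mask = df["id"] == 123

# Right-hand side: a new, filled Series for the masked rows.
# Left-hand side: .loc assignment writes those values back into df.
df.loc[mask, "num"] = df.loc[mask, "num"].fillna(0)

print(df["num"].tolist())  # [0.0, 7.0, nan]
```

Rows outside the mask (here `id == 456`) keep their NaN, which matches the intent of filling only the selected rows.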

4 participants