indexed assignment does not work for dataframe #5938

Closed
RayVR opened this Issue Jan 14, 2014 · 5 comments

@RayVR

RayVR commented Jan 14, 2014

In [82]: d = {'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', nan, 'd']}
In [83]: df = pd.DataFrame(d)
In [84]: df
Out[84]:
   a  b    c
0  0  a    a
1  1  b    b
2  2  .  NaN
3  3  .    d
In [85]: df[['c']][pd.isnull(df.c)] = df[['b']][pd.isnull(df.c)]
In [86]: df
Out[86]:
   a  b    c
0  0  a    a
1  1  b    b
2  2  .  NaN
3  3  .    d

In [87]: df['c'][pd.isnull(df.c)] = df[['b']][pd.isnull(df.c)]

In [88]: df
Out[88]:
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

@jreback

Contributor

jreback commented Jan 14, 2014

You are doing a chained assignment, so the write lands on a copy rather than on df. See here:

http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy

Use loc/ix/iloc instead:

In [8]: df.loc[pd.isnull(df.c),'c'] = df.loc[pd.isnull(df.c),'b']

In [9]: df
Out[9]: 
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

[4 rows x 3 columns]
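
As a rough illustration of why the chained form leaves df unchanged, here is a minimal sketch. It assumes a recent pandas with numpy imported as np; the names mask, sub, still_missing and filled are made up for this example and do not appear in the thread. Exact warning behaviour differs between pandas versions.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']})
mask = df['c'].isnull()

# Selecting with a list of columns builds a new DataFrame, so a write to it
# never reaches df. Some pandas versions warn about this; the warning is
# silenced here only to keep the output quiet.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    sub = df[['c']]
    sub.loc[mask, 'c'] = '.'

still_missing = int(df['c'].isnull().sum())  # counted on df, not on sub

# A single .loc call selects rows and column in one step and writes into df.
df.loc[mask, 'c'] = df.loc[mask, 'b']
filled = df['c'].tolist()
```

The first write goes to sub, a separate object, which is the same situation as `df[['c']][mask] = ...` in the report. The second write goes through one indexing call on df, so there is no intermediate object to get lost.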

@jreback jreback closed this Jan 14, 2014

@RayVR

RayVR commented Jan 14, 2014

This does not work, for the same reason.

In [96]: df.loc[pd.isnull(df.c), ['c']] = df.loc[pd.isnull(df.c), ['b']]

In [97]: df
Out[97]:
   a  b    c
0  0  a    a
1  1  b    b
2  2  .  NaN
3  3  .    d

This operation works on Series objects just fine, but it does not work if the column selection is a list such as ['c']. I've read the docs, but this still doesn't work in the non-trivial case of assigning to a subset of n columns. What have I missed?

@jreback

Contributor

jreback commented Jan 14, 2014

Are you using 0.13? This works on master; I think this was just fixed in 0.13.

As a workaround, you can do:

In [6]: df.loc[pd.isnull(df.c), 'c'] = df.loc[pd.isnull(df.c), 'b']

In [7]: df
Out[7]: 
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

[4 rows x 3 columns]

@jreback

Contributor

jreback commented Jan 14, 2014

You can also do these. The rhs is aligned to what is selected on the lhs. As long as it's a frame/series it should work (if it's an ndarray/list it's trickier, as there are no labels to explicitly align on).

In [8]: df = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', nan, 'd']})

In [9]: df.loc[pd.isnull(df.c), 'c'] = df['b']

In [10]: df
Out[10]: 
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

[4 rows x 3 columns]

In [11]: df = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', nan, 'd']})

In [12]: df.loc[pd.isnull(df.c), ['c']] = df['b']

In [13]: df
Out[13]: 
   a  b  c
0  0  a  a
1  1  b  b
2  2  .  .
3  3  .  d

[4 rows x 3 columns]
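
To make the alignment point concrete, here is a small sketch under the same assumptions (recent pandas, numpy as np). The names rhs, df2 and mask are invented for the illustration. It shows a labelled rhs being matched by index label regardless of its row order, and an unlabelled ndarray rhs that has to be filtered to the selected rows by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']})
mask = df['c'].isnull()  # only the row labelled 2 is selected

# Labelled rhs: the Series is deliberately in reverse row order. pandas
# matches on the index label, so row 2 receives the value stored at label 2.
rhs = pd.Series(['w', 'x', 'y', 'z'], index=[3, 2, 1, 0])
df.loc[mask, 'c'] = rhs

# Unlabelled rhs: an ndarray carries no index, so it is placed by position
# and must already have one value per selected row.
df2 = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']})
df2.loc[mask, 'c'] = df2.loc[mask, 'b'].to_numpy()
```

With the Series, the full-length rhs is fine because only the matching label is used. With the ndarray, passing the unfiltered column would not line up with the single selected row, which is why it is filtered with the same mask first.
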
@RayVR

RayVR commented Jan 14, 2014

I'm using the latest version I can get from Anaconda distribution, which is 0.12. I'll see if I can get 0.13. This solution works for this example but I'll have to see if it generalizes to the more complex use case. Thanks for the help.
