Possible spurious SettingWithCopyWarning #6757

Closed
jorisvandenbossche opened this issue Apr 1, 2014 · 26 comments

Comments

@jorisvandenbossche
Member

commented Apr 1, 2014

I found some SettingWithCopyWarning warnings when running older code with a newer pandas. It's quite possible that these are just side effects of the warning (false positives) that cannot easily be detected, but I am no expert in that area.

Simplified dummy example:

In [1]: df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('ABCD'))
In [2]: df
Out[2]:
    A   B   C   D
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19

# reassign a selection to df
In [3]: df = df[(df['A']%5)!=0]

Setting a value with .loc[] (this output is from master; in 0.13.1 I even got the advice to "Try using .loc[row_index,col_indexer] = value instead" while I was already using it; in master I get just the warning):

In [4]: df.loc[df['B']==17, 'C'] = 1000
C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  #!C:\Anaconda\envs\devel\python.exe

And then replacing a value with replace (also here the advice to use .loc is a little bit strange I think):

In [5]: df['D'] = df['D'].replace({7:2000})
C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!C:\Anaconda\envs\devel\python.exe

In [6]: df
Out[6]:
    A   B     C     D
1   4   5     6  2000
2   8   9    10    11
3  12  13    14    15
4  16  17  1000    19
@jreback

Contributor

commented Apr 1, 2014

master looks ok on these (no warnings) .......

IIRC 0.13.1 did have some false positives

I don't get them on 0.13.1

I am on numpy 1.7.1....that could be a factor....

@jorisvandenbossche

Member Author

commented Apr 1, 2014

Strange, I get these on

In [7]: pd.__version__
Out[7]: '0.13.1-496-gd0aebea'

In [8]: np.__version__
Out[8]: '1.7.1'
@jreback

Contributor

commented Apr 1, 2014

In [1]: pd.__version__
Out[1]: '0.13.1-533-g2ada054'

In [2]: np.__version__
Out[2]: '1.7.1'

hmm...do you get them ALWAYS, even on a fresh ipython?

@jorisvandenbossche

Member Author

commented Apr 1, 2014

Strange, first it seemed that I couldn't replicate this with a fresh ipython, but now it seems that I only see it if I first print the frame:

If I run this on a fresh ipython, I get the warnings:

df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))
df
df = df[df['A']>0]
df.loc[df['B']==17, 'C'] = 1000
df['D'] = df['D'].replace({7:2000})

and with this, I don't:

df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))
df = df[df['A']>0]
df.loc[df['B']==17, 'C'] = 1000
df['D'] = df['D'].replace({7:2000})

Although in the original code, the frame was certainly not printed first.

The above was run with the latest master ('0.13.1-543-g4bd1e6a').

@jreback

Contributor

commented Apr 1, 2014

hmm...ok I can replicate that

It is legit because of: df = df[df['A']>0]

which is a sliced version of the original. When you then set with loc, you could misinterpret this as setting the original frame (that's the intent of the warning).

The problem is that you are reassigning df to a version of itself; if you use a different variable name then you don't get this. This might be a case where this is not detectable. Let me see.

In [10]: df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))

In [11]: x = df[df['A']>0]

In [12]: df.is_copy

In [13]: x.is_copy
Out[13]: <weakref at 0x4746d08; to 'DataFrame' at 0x4759610>

In [14]: x.loc[df['B']==17, 'C'] = 1000

In [15]: x['D'] = df['D'].replace({7:2000})
@jorisvandenbossche

Member Author

commented Apr 1, 2014

I also get the warning when assigning to a different variable (and then also without printing the frame first!):

In [1]: df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list('ABCD'))

In [2]: x = df[df['A']>0]

In [3]: x.loc[df['B']==17, 'C'] = 1000
C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  #!C:\Anaconda\envs\devel\python.exe

But I suppose here the warning is legitimate, as it is indeed not changed in the original.

@jreback

Contributor

commented Apr 1, 2014

that is legit

@jorisvandenbossche

Member Author

commented Apr 1, 2014

ah, yes, was just updating my comment saying that.

@jreback

Contributor

commented Apr 1, 2014

of course with the caveat that it's not wrong per se, just a warning (mainly for new users).

@bluefir

commented Apr 2, 2014

I get this when I do inplace sort_index and fillna:

portfolio_analytics\attribution\Hierarchies.py:212: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
data_frame.sort_index(inplace=True)

C:\Python27\lib\site-packages\pandas\core\generic.py:2174: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
obj.fillna(v, inplace=True)
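
One way to sidestep the warning in a situation like this is to rebind the results of the non-inplace methods instead of mutating the slice. This is only a sketch, assuming that data_frame was itself produced by slicing another frame. The frame called source and its columns are invented for illustration, since the real code in Hierarchies.py is not shown in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the real frame is not shown in the thread.
source = pd.DataFrame(
    {"weight": [0.2, np.nan, 0.5, 0.3], "beta": [1.1, 0.9, np.nan, 1.3]},
    index=[3, 1, 2, 0],
)

# A slice of the source frame, analogous to the object that triggered the warning.
data_frame = source[source.index > 0]

# Rebind the returned objects rather than passing inplace=True on the slice.
data_frame = data_frame.sort_index()
data_frame = data_frame.fillna(0.0)

print(data_frame)
```

Each call returns a new object, so nothing is written into the slice and the source frame keeps its missing values.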

@bluefir

commented Apr 2, 2014

Another one:

C:\Python27\lib\site-packages\pandas\core\indexing.py:346: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
self.obj[item] = s

@bluefir

commented Apr 2, 2014

Wow! This thing is everywhere:

# Recalculate portfolio betas
portfolio_data_returns[field_beta_by_portfolio_weight] = portfolio_data_returns[field_beta] * portfolio_data_returns[field_portfolio_weight]
portfolio_betas = portfolio_data_returns[field_beta_by_portfolio_weight].groupby(level=field_date).sum()

-c:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead

@jreback

This comment has been minimized.

@bluefir

commented Apr 2, 2014

Ok. In my last example, what am I doing inplace?

@jreback

Contributor

commented Apr 2, 2014

it prob was already not a copy
the assignment does the check

you need to look at the first time it happens
and can always check is_copy property
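
To find the first statement that triggers the warning, one option is to record warnings around each suspect statement with the standard library warnings module. The sketch below makes some assumptions: collect_warnings and set_on_subset are hypothetical helper names, the frame is the dummy one from earlier in the thread, and whether any warning is recorded at all depends on the installed pandas version.

```python
import warnings

import numpy as np
import pandas as pd


def collect_warnings(func):
    """Run func() and return (result, [(category name, message), ...])."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide repeats
        result = func()
    return result, [(w.category.__name__, str(w.message)) for w in caught]


df = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list("ABCD"))
subset = df[df["A"] > 0]


def set_on_subset():
    # The kind of statement under suspicion in the thread.
    subset.loc[subset["B"] == 17, "C"] = 1000
    return subset


_, seen = collect_warnings(set_on_subset)
for name, message in seen:
    print(name, "->", message.strip())
```

Wrapping statements one at a time, starting from where the frame is first created, narrows down where the slice originates.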

@bluefir

commented Apr 2, 2014

It's a copy, yes. I wanted it explicitly just to make sure the original DataFrame is left intact.

portfolio_data_returns = portfolio_data_all.loc[return_date_first:return_date_last].copy()
@jreback

Contributor

commented Apr 2, 2014

and is that a series?

@bluefir

commented Apr 2, 2014

DataFrame

@jreback

Contributor

commented Apr 2, 2014

you need to use loc just as it says then

@fonnesbeck

commented Sep 22, 2014

I'm getting similar warnings for an operation like this, where I am just trying to truncate a variable at a particular value:

lab_subset.YEAR_AGE[lab_subset.YEAR_AGE > 75] = 75

So, I go in and try to use .loc:

lab_subset.YEAR_AGE.loc[lab_subset.YEAR_AGE > 75] = 75

But I get the same warning. The object lab_subset is also the result of an indexing operation:

lab_subset = measles_data[(CONFIRMED | DISCARDED) & measles_data.YEAR_AGE.notnull() & measles_data.COUNTY.notnull()]

So, I tried to use .loc on that as well, but the warning persists.

It's not clear to me what is going on here. Running '0.14.1-486-g1d65bc8' on Python 2.7.6 and OS X 10.9.5.

@jreback

Contributor

commented Sep 22, 2014

no, you are still chaining, you need to use .loc on the DataFrame

lab_subset.loc[lab_subset.YEAR_AGE > 75,'YEAR_AGE'] = 75
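
As a self-contained illustration of the difference, the chained form does two indexing steps (first selecting the column, then setting on the result), while the form above does everything in a single .loc call on the DataFrame. The data below is a made-up stand-in for lab_subset. Only the YEAR_AGE column name comes from the thread.

```python
import pandas as pd

# Hypothetical stand-in for lab_subset; values are invented for illustration.
lab_subset = pd.DataFrame(
    {"YEAR_AGE": [12.0, 80.0, 75.0, 91.5], "COUNTY": ["a", "b", "c", "d"]}
)

# Chained form from the thread: column selection first, then a set on the result.
#   lab_subset.YEAR_AGE[lab_subset.YEAR_AGE > 75] = 75

# Single-step form: one .loc call with a row mask and a column label.
lab_subset.loc[lab_subset["YEAR_AGE"] > 75, "YEAR_AGE"] = 75

print(lab_subset)
```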

@fonnesbeck

commented Sep 22, 2014

OK, I see now. Man, that's going to be a tough one for new users to digest.

@jreback

Contributor

commented Sep 22, 2014

that's why it's a warning! That's the reason for it though. It sometimes does work (and has been around since 0.13.0), just getting better / less spurious through the versions.

@fonnesbeck

commented Sep 22, 2014

So, I still get the warning despite the syntax change. lab_subset isn't a view on measles_data, is it?

loc

@jreback

Contributor

commented Sep 22, 2014

you should put .copy() at the end of the first expression. Otherwise you end up changing data in measles_data (if it's a view, which it will be if it's a single dtype; otherwise it would make a copy).

But that's the rub: you don't want to have to care/know whether it's a view (and if it's a view, you certainly don't want to propagate back to the original, except explicitly).
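
A minimal sketch of that advice, assuming a small invented stand-in for measles_data (the CONFIRMED and DISCARDED masks from the thread are left out because their definitions are not shown):

```python
import pandas as pd

# Hypothetical stand-in for measles_data; values are invented for illustration.
measles_data = pd.DataFrame(
    {"YEAR_AGE": [10.0, 80.0, None, 90.0], "COUNTY": ["a", "b", "c", None]}
)

# Filter, then take an explicit copy so the subset is an independent frame.
keep = measles_data["YEAR_AGE"].notnull() & measles_data["COUNTY"].notnull()
lab_subset = measles_data[keep].copy()

# Setting on the copy does not touch the original frame.
lab_subset.loc[lab_subset["YEAR_AGE"] > 75, "YEAR_AGE"] = 75

print(lab_subset)
print(measles_data)
```

With the explicit copy, the intent is stated in the code and the question of view versus copy no longer matters for the later assignment.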

@jreback

Contributor

commented Oct 20, 2015

closing as stale. pls reopen if still an issue.

@jreback jreback closed this Oct 20, 2015