spurious SettingWithCopyWarning #5597

jseabold · 2013-11-27T10:57:51Z

I'm getting spurious warnings on some old code that I'm running with a new pandas. You can replicate it with something like this (the key is that you have to take a subset of the data first):

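import warnings
from string import ascii_letters as letters  # string.letters exists only in Python 2

import numpy as np  # needed by random_text
import pandas as pd
from pandas.core.common import SettingWithCopyWarning  # pandas 0.13 location

# Escalate the warning to an error so it cannot go unnoticed
warnings.simplefilter('error', SettingWithCopyWarning)

def random_text(nobs=100):
    """Return a one-column DataFrame of random substrings of `letters`."""
    df = []
    for i in range(nobs):
        idx = np.random.randint(len(letters), size=2)
        idx.sort()
        df.append([letters[idx[0]:idx[1]]])

    return pd.DataFrame(df, columns=['letters'])

df = random_text(100000)

# Taking a boolean subset first is the key step (.ix was removed in pandas 1.0)
df = df.ix[df.letters.apply(lambda x: len(x) > 10)]
df['letters'] = df['letters'].apply(str.lower)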
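For readers on Python 3 with a current pandas, where `.ix` no longer exists, a roughly equivalent sketch of the reproduction follows. It assumes `.loc` with a boolean mask as the stand-in for `.ix`, a seeded `np.random.default_rng` (added for repeatability), and `.str` methods in place of `apply`. The warning is recorded rather than raised, because whether it fires depends on the pandas version and its copy-on-write settings.

```python
import warnings
from string import ascii_letters

import numpy as np
import pandas as pd


def random_text(nobs=100, seed=0):
    """One-column frame of random substrings of ascii_letters (seed is an assumption)."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(nobs):
        lo, hi = np.sort(rng.integers(len(ascii_letters), size=2))
        rows.append([ascii_letters[lo:hi]])
    return pd.DataFrame(rows, columns=["letters"])


df = random_text(10_000)
mask = df["letters"].str.len() > 10

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sub = df.loc[mask]                           # boolean selection: a new object
    sub["letters"] = sub["letters"].str.lower()  # set on that new object

# Names of any warnings raised; may be empty on copy-on-write pandas
print([w.category.__name__ for w in caught])
```

Either way, `df` itself keeps its mixed-case strings; only `sub` is lowercased.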


jreback · 2013-11-27T14:14:28Z

This is not spurious, but exactly as intended, though you may view it as a false positive (the whole point of this warning is to try to avoid false negatives).

The ix is implicitly doing a take, which returns a copy of the data. Then you are assigning new values to a column of that copy; the 'original' frame is unchanged (which I believe is your intent). The warning indicates that you are in effect assigning to a cross-section of the original, which MAY or MAY NOT be a copy (in the single-dtype case it is 'usually' not a copy). In this case, though, it is.

The warning can be turned off by doing any of the following, or simply by setting pd.set_option('chained_assignment',None):
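The copy-versus-view distinction comes from NumPy: basic slicing returns a view, while integer ("fancy") and boolean indexing, which is what a take does, always return a copy. A small NumPy-only sketch, using `np.shares_memory` to check whether two arrays overlap in memory:

```python
import numpy as np

arr = np.arange(10)

view = arr[2:5]          # basic slicing: a view onto arr's buffer
taken = arr[[2, 3, 4]]   # integer-array ("fancy") indexing: always a copy
masked = arr[arr > 6]    # boolean indexing: also a copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, taken))   # False
print(np.shares_memory(arr, masked))  # False

view[0] = -1     # writes through to arr
taken[0] = -99   # does not touch arr
print(arr[2], arr[3])  # -1 3
```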

# 1. Take an explicit copy of the subset
df = random_text(100000)
indexer = df.letters.apply(lambda x: len(x) > 10)
df = df.ix[indexer].copy()
df['letters'] = df['letters'].apply(str.lower)

# 2. Set through the .loc indexer instead of plain []
df = random_text(100000)
indexer = df.letters.apply(lambda x: len(x) > 10)
df = df.ix[indexer]
df.loc[:, 'letters'] = df['letters'].apply(str.lower)

# 3. Clear the copy flag with a private method (pandas 0.13 internals)
df = random_text(100000)
indexer = df.letters.apply(lambda x: len(x) > 10)
df = df.ix[indexer]._setitem_copy(False)
df['letters'] = df['letters'].apply(str.lower)
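On current pandas, where `.ix` and the private `_setitem_copy` are gone, the first workaround translates directly to `.loc[...]` followed by `.copy()`. Another route, not mentioned in this thread and assuming pandas ≥ 0.16 where `DataFrame.assign` exists, is to build the new column with `assign`, which returns a new frame instead of setting into the subset. A sketch with a tiny hand-made frame in place of `random_text`:

```python
import pandas as pd

df = pd.DataFrame({"letters": ["abcdefghijKLM", "xy", "ABCDEFGHIJKLMNOP"]})
mask = df["letters"].str.len() > 10

# Workaround 1, modern spelling: make the subset's independence explicit
sub1 = df.loc[mask].copy()
sub1["letters"] = sub1["letters"].str.lower()

# Alternative: assign() returns a new DataFrame, so nothing is set into the slice
sub2 = df.loc[mask].assign(letters=lambda d: d["letters"].str.lower())

print(sub1["letters"].tolist())  # ['abcdefghijklm', 'abcdefghijklmnop']
print(sub1.equals(sub2))         # True
print(df["letters"].tolist())    # the original frame is untouched
```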

jseabold · 2013-11-27T14:20:03Z

I see. Hmm, well, I know I've copied the data... Same thing, but more explicitly, perhaps:

new_df = df.ix[df.letters.apply(lambda x : len(x) > 10)]
new_df.reset_index(drop=True, inplace=True)
del df
new_df['letters'] = new_df['letters'].apply(str.lower)

> The warning can be turned off by doing any of the following, or simply setting pd.set_option('chained_assignment',None)

Ok. I guess I'll set this, because I don't care for this level of hand-holding. In numpy, you just know whether you're getting a copy or a view, or you explicitly ask for a copy when you want to be sure.

What else does 'chained_assignment' handle? It sounds delightfully mysterious.

jseabold · 2013-11-27T14:26:53Z

Hmm, maybe I do need it, if the pandas view vs. copy is not as readily determined as numpy's. I am very wary of your 'it's usually a copy' (or 'you're usually not producing garbage results' elsewhere, possibly silently). Sounds like I have to live with this noise for quite common operations. Plus, the warning only occurs the first time by default. This is a pretty ugly state of affairs, IMO. Meh.

jreback · 2013-11-27T14:32:59Z

agreed...the warning is really meant for new users for the most part

see the docs

the problem is people try to do this:

df[column][row] = ..., which doesn't ALWAYS work (in the single-dtype case it does, but not with multiple dtypes)

The problem is that detecting this is quite difficult: the above, for example, yields a __getitem__ followed by a separate (and unrelated) __setitem__ call, with no way of connecting the two. The example you give here is unfortunately a side effect.
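Python itself splits `df[column][row] = value` into two independent calls: a `__getitem__` on the outer object, then a `__setitem__` on whatever that returned. The object receiving the `__setitem__` never learns how it was obtained, which is why pandas can only guess. A pure-Python sketch with a hypothetical logging dict (`Logged` is illustrative only, not part of pandas):

```python
class Logged(dict):
    """A dict that records every __getitem__/__setitem__ call (illustration only)."""

    def __init__(self, name, log, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append(f"{self.name}.__getitem__({key!r})")
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.log.append(f"{self.name}.__setitem__({key!r}, {value!r})")
        super().__setitem__(key, value)


log = []
inner = Logged("column", log)
frame = Logged("frame", log, a=inner)

frame["a"]["row0"] = 5  # chained assignment
print(log)
# ["frame.__getitem__('a')", "column.__setitem__('row0', 5)"]
```

Nothing in the second call refers back to `frame`; whether the write reaches it depends entirely on whether `frame["a"]` handed back shared storage.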

another possibility (though still not pretty) is to allow

df.ix(copy=False)[indexer] = ....

jreback · 2013-11-27T14:33:41Z

The thing is, it has always been there; this warning just makes it known.

jseabold · 2013-11-27T14:37:13Z

Makes sense. I had to get out of the df[][] = ... habit early on.

aldanor · 2013-11-27T15:10:13Z

@jseabold @jreback Could the issue be reopened please?

I'm getting millions of warnings too, in code which I know works as intended. I mean, come on, isn't it a little too much?

Something as trivial as

>>> df = pd.DataFrame({'a': [1]}).dropna()
>>> df['a'] += 1

results in

.../pandas/core/generic.py:1029: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  warnings.warn(t, SettingWithCopyWarning)

Do you think this kind of code could be misinterpreted in any possible way? Every snippet like this would effectively yield a false positive.

I consider the snippet above legit pandas code; what would you expect noobs to do here to avoid the warning? Dig into pd.set_option? Not too user-friendly. Do a .copy() after each operation, or use the ix(copy=False) form mentioned above? That's just extra boilerplate and not pretty at all :(

TL;DR I understand the need to warn the noobs, but at the same time this would yield a thousand times more false positives than true ones.

Edit: I understand this has been partially addressed by #5584 for setting entire columns. I believe, though, that there are many more particular cases where the warning can be shown to be a false positive :/
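Part of why `df['a'] += 1` goes through the check at all: for a subscripted target, Python evaluates augmented assignment as a read, an add, and then a write back, so a `__setitem__` on `df` happens even though the line reads as a simple increment. A hypothetical tracing class (`Trace` is illustrative only) makes the order visible:

```python
class Trace:
    """Hypothetical container that records item access (illustration only)."""

    def __init__(self, data):
        self.data = dict(data)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return self.data[key]

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))
        self.data[key] = value


t = Trace({"a": 1})
t["a"] += 1
print(t.calls)  # [('get', 'a'), ('set', 'a', 2)]
```

With a pandas frame in place of `Trace`, the final `__setitem__` on the frame still runs, and that write is where a copy check has its chance to fire.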

jreback · 2013-11-27T15:39:53Z

@aldanor I updated #5584; there WAS a spurious indication when the operation does nothing except create a new object.

pls give that PR a try and lmk.

I think this warning is useful (you can always turn it off, of course), but there are some false positives; I'm just trying to remove the ones I can (as you can see above, it's not always possible).

If you have more cases, pls post here

dsm054 · 2013-11-27T16:22:07Z

Count me among the pandas users who find the current state of affairs very frustrating.

We can't have a warning fire as standard when you follow recommended practice, which means I can no longer recommend people do the obvious thing, even when it's clear it's going to work. As a result, I now have to write .loc everywhere, even where it's manifestly not necessary.

Admittedly, the only way I can think of offhand to get around this is to have df["a"] return a different Series view object each time, so that it could grow a _chained_count value that can be incremented.
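A counter is one way to flag derived objects. A rough sketch of a related idea follows, assuming a weak reference to the parent is used as the flag. Every name in it (`Frame`, `select`, `_parent`, `CopyWarning`) is hypothetical, and it is not pandas' actual implementation. In this sketch, deleting the parent kills the flag, which mirrors the `del df` attempt earlier in the thread.

```python
import warnings
import weakref


class CopyWarning(UserWarning):
    """Hypothetical stand-in for pandas' SettingWithCopyWarning."""


class Frame:
    """Toy column store; every name here is hypothetical, not pandas API."""

    def __init__(self, columns, parent=None):
        self.columns = {k: list(v) for k, v in columns.items()}
        # Weak reference to the Frame this one was selected from, if any
        self._parent = weakref.ref(parent) if parent is not None else None

    def select(self, keep):
        """Row subset from a list of booleans; the result remembers its parent."""
        cols = {k: [v for v, flag in zip(vals, keep) if flag]
                for k, vals in self.columns.items()}
        return Frame(cols, parent=self)

    def copy(self):
        """Independent copy with no parent link, so it never warns."""
        return Frame(self.columns)

    def __setitem__(self, key, values):
        # Warn only while the parent still exists; a dead weakref returns None
        if self._parent is not None and self._parent() is not None:
            warnings.warn("setting on a selection of another Frame",
                          CopyWarning, stacklevel=2)
        self.columns[key] = list(values)


base = Frame({"a": [1, 2, 3]})
sub = base.select([True, False, True])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sub["a"] = [10, 30]         # parent alive: warns
    sub.copy()["a"] = [10, 30]  # explicit copy: no warning
print(len(caught))  # 1

del base  # in CPython the parent is freed at once, so the weakref goes dead
with warnings.catch_warnings(record=True) as caught_after:
    warnings.simplefilter("always")
    sub["a"] = [0, 0]
print(len(caught_after))  # 0
```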

jreback · 2013-11-27T16:26:03Z

@dsm054 can you give an example of where 'standard practice' actually triggers this? This shouldn't be triggering very often (that's the intent anyhow).

In the entire test suite it DOES NOT trigger (except on purpose), so I must be missing some cases (after #5584).

jreback · 2013-11-29T17:41:32Z

@dsm054 any other warnings you are getting that look spurious?

jseabold · 2014-02-13T21:30:24Z

I just noticed that the statsmodels test suite is littered with these warnings, but no failures. I've updated our code to ignore them, but surely this is spurious? If it's setting on a copy I'd assume the original to be unchanged.

[~/statsmodels/statsmodels-skipper/statsmodels/stats/tests/]
[1]: pd.version.version
[1]: '0.13.1-105-g8119991'

[~/statsmodels/statsmodels-skipper/statsmodels/stats/tests/]
[2]: table = pd.DataFrame(np.zeros((3,2)), columns=['A','B'])

[~/statsmodels/statsmodels-skipper/statsmodels/stats/tests/]
[3]: table.ix[2]['A'] = 3
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
#!/usr/bin/python

[~/statsmodels/statsmodels-skipper/statsmodels/stats/tests/]
[4]: table
[4]: 
   A  B
0  0  0
1  0  0
2  3  0

[3 rows x 2 columns]

jreback · 2014-02-13T21:40:03Z

@jseabold

This is THE specific case that SettingWithCopy is addressing.

you should NOT do

table.ix[2]['A'] = 3

and instead do

table.loc[2,'A'] = 3

It WILL work in a single-dtyped case, as numpy almost always (and that's the rub) gives you a view, but it will not work in a mixed-dtype case.

You can turn them off, but the point is to use the correct semantics. It is not a bug but a warning to catch a potential error.

The setting will proceed, but you may be setting on a copy. So if it succeeds, you had a view and the warning was spurious, but you cannot be sure; it's better to set using the loc/ix indexer (in its full form).
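A sketch of the single-call form on a mixed-dtype frame (the frame is made up; the float/string mix is what makes a row cross-section a copy rather than a view). The chained form is left commented out because, depending on the pandas version, it typically warns and leaves `table` unchanged; the single `.loc` call is the one that reliably writes into `table`:

```python
import numpy as np
import pandas as pd

# Mixed dtypes: float column A, string column B
table = pd.DataFrame({"A": np.zeros(3), "B": ["x", "y", "z"]})

# Chained form: table.loc[2] builds a new row Series, then ['A'] = 3 sets on it
# table.loc[2]["A"] = 3

# One indexing call with row and column labels together: no intermediate object
table.loc[2, "A"] = 3
print(table)
```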

jseabold · 2014-02-13T21:46:32Z

What version was loc introduced in?

If I read what you're saying correctly, then I could use this, right?

table.ix[2, 'A'] = 3

I don't get a warning here, and I know what the dtype of table will be always.

jreback · 2014-02-13T22:00:23Z

you can use .ix too (I just always use .loc)

that is the 'correct' expression to ensure it will always work

jreback · 2014-02-13T22:00:38Z

IIRC .loc/.iloc in 0.11

jseabold · 2014-02-13T22:03:15Z

Ok thanks. That's what I thought. We are still supporting old pandas unfortunately, because we are still supporting old(er) numpy.

apiszcz · 2015-03-07T22:25:16Z

I am getting the warning even when using .loc?
df.loc[:,'f1']=f2['tf']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

FYI:
(Pdb) pd.version
'0.15.2'

jreback · 2015-03-07T22:51:45Z

That can generate a warning as well. Simply use df['f1'] = f2['tf'].
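One plausible reason `.loc[:, 'f1'] = ...` still warns, assuming that is what happened here, is that `df` was itself produced by selecting rows from another frame; taking an explicit `.copy()` where the subset is created settles that regardless of which setter is used afterwards. A sketch with made-up frames (`base` and the `x` column are assumptions; `f1`, `f2`, and `tf` follow the question). Note that plain column assignment from a Series aligns on index labels, not position:

```python
import pandas as pd

base = pd.DataFrame({"x": [1, 2, 3, 4]}, index=[10, 11, 12, 13])
f2 = pd.DataFrame({"tf": [0.3, 0.1, 0.2]}, index=[12, 10, 11])

# Make the subset an independent frame at the point it is created
df = base.loc[base["x"] < 4].copy()

# Column assignment from a Series aligns on the index labels
df["f1"] = f2["tf"]
print(df)
```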

jreback mentioned this issue Nov 27, 2013: BUG/TST: reset setitem_copy on object enlargement #5584 (Merged)

jseabold closed this as completed Nov 27, 2013

jreback reopened this Nov 27, 2013

jreback closed this as completed in #5584 Nov 29, 2013

jreback mentioned this issue Dec 11, 2013: API/ENH: Detect trying to set inplace on copies in a nicer way, related (GH5597) #5679 (Merged)

jtratner mentioned this issue Dec 19, 2013: Add support for a pandasrc #4907 (Closed)

AndreRicoPSU mentioned this issue Feb 7, 2023: SettingWithCopyWarning on Regression runs HallLab/clarite-python#120 (Closed)
