# Explaining the SettingWithCopyWarning in pandas

## Setup

```python
import pandas as pd
print(f'pandas version: {pd.__version__}')
```

```
pandas version: 1.0.3
```


## Preparing the dataset

```python
def get_data():
    df = pd.DataFrame({'A': range(0, 5),
                       'B': range(10, 15),
                       'C': range(100, 105)})
    return df

X = get_data()
X
```

|   | A | B | C |
|---|---|---|---|
| 0 | 0 | 10 | 100 |
| 1 | 1 | 11 | 101 |
| 2 | 2 | 12 | 102 |
| 3 | 3 | 13 | 103 |
| 4 | 4 | 14 | 104 |


## Common occurrences of `SettingWithCopyWarning`

### Chained assignment

```python
# load the data
X = get_data()

# display filtered rows
X[X['B'] > 12]
```

|   | A | B | C |
|---|---|---|---|
| 3 | 3 | 13 | 103 |
| 4 | 4 | 14 | 104 |


```python
# 1st try
X[X['B'] > 12]['C'] = 999
X[X['B'] > 12]['C']
```

```
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```

```
3    103
4    104
Name: C, dtype: int64
```

```python
# 2nd try
X.loc[X['B'] > 12]['C'] = 999
X[X['B'] > 12]['C']
```

```
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```

```
3    103
4    104
Name: C, dtype: int64
```

```python
# solution to the warning
X.loc[X['B'] > 12, 'C'] = 999
X[X['B'] > 12]['C']
```

```
3    999
4    999
Name: C, dtype: int64
```
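Both failed tries are two separate indexing calls on one line: the first (`X[...]` or `X.loc[...]`) selects rows with a boolean mask and returns a new DataFrame, and the second (`['C'] = 999`) writes into that temporary object rather than into `X`. The sketch below spells the chain out step by step and contrasts it with the single `.loc[rows, cols]` call. The names `mask`, `tmp` and `before_loc` are only illustrative, and warnings are silenced purely to keep the demo output readable.

```python
import warnings

import pandas as pd


def get_data():
    return pd.DataFrame({'A': range(0, 5),
                         'B': range(10, 15),
                         'C': range(100, 105)})


X = get_data()
mask = X['B'] > 12  # boolean Series: True for rows 3 and 4

with warnings.catch_warnings():
    # Hide the copy warning for this demo only.
    warnings.simplefilter('ignore')
    # Step 1: boolean-mask selection builds a new DataFrame.
    tmp = X[mask]
    # Step 2: this assignment modifies tmp, not X.
    tmp['C'] = 999

before_loc = X.loc[mask, 'C'].tolist()
print(before_loc)          # [103, 104]: X was not modified
print(tmp['C'].tolist())   # [999, 999]: only the temporary changed

# One .loc call selects rows and column and assigns on X itself.
X.loc[mask, 'C'] = 999
print(X.loc[mask, 'C'].tolist())  # [999, 999]
```

Writing the selection and the assignment as one `.loc` call is what makes the write land on `X`.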

### Hidden chaining

```python
# load the data
X = get_data()

# create a new DataFrame based on the filtered original
temp = X.loc[X['C'] > 101]
temp
```

|   | A | B | C |
|---|---|---|---|
| 2 | 2 | 12 | 102 |
| 3 | 3 | 13 | 103 |
| 4 | 4 | 14 | 104 |


```python
# replace the first value in the C column
temp.loc[2, 'C'] = 999
```

```
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```


```python
# print the values
print(f"New DataFrame: {temp.loc[2, 'C']}")
print(f"Original DataFrame: {X.loc[2, 'C']}")
```

```
New DataFrame: 999
Original DataFrame: 102
```


```python
# how to avoid the warning: take an explicit copy of the filtered rows
X = get_data()
temp = X.loc[X['C'] > 101].copy()
temp.loc[2, 'C'] = 999
print(f"New DataFrame: {temp.loc[2, 'C']}")
print(f"Original DataFrame: {X.loc[2, 'C']}")
```

```
New DataFrame: 999
Original DataFrame: 102
```
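The `.copy()` fix can also be checked programmatically. The sketch below records every warning raised while assigning to the copied subset and keeps only those whose class is named `SettingWithCopyWarning`. Matching on the class name is an assumption made here so the snippet does not depend on which module a given pandas version defines that class in.

```python
import warnings

import pandas as pd


def get_data():
    return pd.DataFrame({'A': range(0, 5),
                         'B': range(10, 15),
                         'C': range(100, 105)})


X = get_data()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record every warning, even repeats
    temp = X.loc[X['C'] > 101].copy()  # explicit, independent copy
    temp.loc[2, 'C'] = 999

copy_warnings = [w for w in caught
                 if w.category.__name__ == 'SettingWithCopyWarning']
print(len(copy_warnings))               # 0
print(temp.loc[2, 'C'], X.loc[2, 'C'])  # 999 102
```

Because `.copy()` produces an object with no link back to `X`, pandas has nothing to warn about, and the original keeps its value.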


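### An example of a false negative

In the first two cells below, the chained assignment leaves `X` unchanged, yet no warning is shown. The third cell, which selects a single column through a list (`X[['C']]`), does trigger the warning. In all three cases `X` keeps its original values.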

```python
X = get_data()
X.loc[X['A'] > 2, ['A', 'B']]['A'] = 999
X
```

|   | A | B | C |
|---|---|---|---|
| 0 | 0 | 10 | 100 |
| 1 | 1 | 11 | 101 |
| 2 | 2 | 12 | 102 |
| 3 | 3 | 13 | 103 |
| 4 | 4 | 14 | 104 |


```python
X = get_data()
X[['A', 'B', 'C']]['A'] = 999
X
```

|   | A | B | C |
|---|---|---|---|
| 0 | 0 | 10 | 100 |
| 1 | 1 | 11 | 101 |
| 2 | 2 | 12 | 102 |
| 3 | 3 | 13 | 103 |
| 4 | 4 | 14 | 104 |


```python
X = get_data()
X[['C']]['C'] = 999
X
```

```
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```

|   | A | B | C |
|---|---|---|---|
| 0 | 0 | 10 | 100 |
| 1 | 1 | 11 | 101 |
| 2 | 2 | 12 | 102 |
| 3 | 3 | 13 | 103 |
| 4 | 4 | 14 | 104 |
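Because the warning can stay silent, a safer habit is to never chain indexing on the left-hand side of an assignment. The sketch below rewrites each of the three cells above as a single indexing call on the DataFrame; these are ordinary `.loc` and `[]` assignments, and the names `X1`, `X2` and `X3` are only illustrative.

```python
import pandas as pd


def get_data():
    return pd.DataFrame({'A': range(0, 5),
                         'B': range(10, 15),
                         'C': range(100, 105)})


# 1) Instead of X.loc[X['A'] > 2, ['A', 'B']]['A'] = 999
X1 = get_data()
X1.loc[X1['A'] > 2, 'A'] = 999
print(X1['A'].tolist())  # [0, 1, 2, 999, 999]

# 2) Instead of X[['A', 'B', 'C']]['A'] = 999
X2 = get_data()
X2['A'] = 999  # broadcast the scalar to the whole column
print(X2['A'].tolist())  # [999, 999, 999, 999, 999]

# 3) Instead of X[['C']]['C'] = 999
X3 = get_data()
X3['C'] = 999
print(X3['C'].tolist())  # [999, 999, 999, 999, 999]
```

Each assignment now targets the DataFrame that is printed afterwards, so whether or not pandas would have warned no longer matters.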


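## Determining if an object is a copy or a view

The cells below inspect two private attributes, `_is_view` and `_is_copy`. The leading underscore marks them as internal, so treat them as a debugging aid whose behavior may change between pandas versions. In the last cell, `_is_copy` holds a weak reference to the parent DataFrame, which is what pandas consults when deciding whether to warn about a later assignment.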

```python
X = get_data()
X._is_view, X._is_copy
```

```
(False, None)
```

```python
Y = X.loc[0:1, 'C']
Y._is_view, Y._is_copy
```

```
(True, None)
```

```python
Z = X.loc[0:1, 'C'].copy()
Z._is_view, Z._is_copy
```

```
(False, None)
```

```python
Z = X.loc[X['A'] > 1, :]
Z._is_view, Z._is_copy
```

```
(False, <weakref at 0x11c778d18; to 'DataFrame' at 0x11c7a7d30>)
```
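Since those attributes are internal, a public alternative is NumPy's `np.shares_memory`, which reports whether two arrays overlap in memory. This is a swapped-in technique, not what `_is_view` does internally. The sketch compares the array behind column `C` of `X` with three derived objects; `base`, `Z_copy` and `Z_mask` are illustrative names, and `Z_mask` selects only column `C` rather than all columns as in the cell above. On pandas 1.0.3 one would expect `True` for the label slice `Y` (consistent with `Y._is_view` above) and `False` for the explicit copy and the boolean-mask selection. Treat the result as a hint: `to_numpy()` can itself return a copy for some dtypes.

```python
import numpy as np
import pandas as pd


def get_data():
    return pd.DataFrame({'A': range(0, 5),
                         'B': range(10, 15),
                         'C': range(100, 105)})


X = get_data()
base = X['C'].to_numpy()  # array behind the original C column

Y = X.loc[0:1, 'C']               # label slice
Z_copy = X.loc[0:1, 'C'].copy()   # explicit copy
Z_mask = X.loc[X['A'] > 1, 'C']   # boolean-mask selection

for name, obj in [('Y', Y), ('Z_copy', Z_copy), ('Z_mask', Z_mask)]:
    print(name, np.shares_memory(base, obj.to_numpy()))
```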