## SettingWithCopyWarning
Here is a nice explanation (with examples and history):
 
   https://www.dataquest.io/blog/settingwithcopywarning/


<b>SettingWithCopyWarning</b> should never be ignored.
<br>It tells you that you should check the result,
<br>because your operation of setting values in a pandas DataFrame (DF)
<br>might not have worked as expected.
<br>Ideally, you should rewrite your code so that the warning no longer appears.
<br>To make sure it cannot be overlooked, you can turn the warning into an exception:
``` python
pd.set_option('mode.chained_assignment', 'raise')
```

There is a difference between:
 - getting data from a DF
 - setting data in a DF

Getting data - you specify rows and columns.
<br> Under the hood pandas may do extraction in
<br> two steps (for speed optimization - especially when 
<br> columns have different data types):
 - step1 - select certain columns or rows into a temporary DF
 - step2 - select values from this temporary DF

For example, "chained indexing" syntax works to get data:
``` python
df2 = df[cols][rows]
```
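As a minimal sketch of the read case, the snippet below compares the two-step (chained) selection with a single `.loc` call. The small frame and the names `cols`, `rows`, `two_step`, and `one_step` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3], 'c2': ['a', 'b', 'c']})
cols = ['c2', 'c1']      # column labels to keep (illustrative)
rows = df['c1'] >= 2     # boolean row mask (illustrative)

two_step = df[cols][rows]      # step 1: pick columns, step 2: filter rows
one_step = df.loc[rows, cols]  # the same selection in a single call

print(two_step.equals(one_step))  # True
```

For reading, both forms return the same data; the difference only matters once you assign.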
But similar syntax may fail to set data:
``` python
df[cols][rows] = 1  # may trigger the warning (or an error) and leave df unchanged
```
When setting data, you should always use .loc syntax:
``` python
df.loc[rows, cols] = 1
```
to set data explicitly in one step.
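The contrast can be sketched on a toy frame as follows. The data and the names `mask`, `after_chained`, and `after_loc` are illustrative, and the warning is silenced only so that the sketch runs quietly on any pandas version; the exact warning you would see depends on the version.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3], 'c5': [1, 1, 1]})
mask = df['c1'] >= 2

# Chained form: df[mask] builds a temporary DataFrame, and the
# assignment writes into that temporary object, not into df.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # for this demo only; do not silence it in real code
    df[mask]['c5'] = 3
after_chained = df['c5'].tolist()

# One-step form: .loc addresses rows and column in a single call on df.
df.loc[mask, 'c5'] = 4
after_loc = df['c5'].tolist()

print(after_chained)  # [1, 1, 1]  (df was not modified)
print(after_loc)      # [1, 4, 4]
```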

Note that chained indexing can occur across two lines as well as within one.
<br>This "hidden" chaining may be difficult to debug.
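A small sketch of such a two-line chain is shown below, assuming a toy frame and an illustrative variable `subset`. Any warnings are recorded and printed rather than asserted on, because which warning is issued (if any) depends on the pandas version.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3], 'c5': [1, 1, 1]})

subset = df[df['c1'] >= 2]   # line 1: row selection returns a new object

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    subset['c5'] = 3         # line 2: the write lands in subset only

print([w.category.__name__ for w in caught])  # varies by pandas version
print(df['c5'].tolist())      # [1, 1, 1]  (original is unchanged)
print(subset['c5'].tolist())  # [3, 3]
```

The two lines can be far apart in real code, which is why the link between `subset` and `df` is easy to miss.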

Here is a picture illustrating the problem of assigning to a copy:<br>
<img src = "images/view-copy-assign.png" style="width:300px;" align='left'>

In [1]:
import sys, os
import numpy as np
import pandas as pd
!python --version

Python 3.9.12


In [2]:
# set option to always fail with exception 
# instead of just a warning
pd.set_option('mode.chained_assignment', 'raise')

In [3]:
df = pd.DataFrame(
    data = {'c1':[1,2,3], 'c2':['a','b','c'] },
    columns = ['c1','c2'])
df.head()

| | c1 | c2 |
|---|---|---|
| 0 | 1 | a |
| 1 | 2 | b |
| 2 | 3 | c |


In [4]:
# select a subset of columns in a given order using .loc syntax
df = df.loc[:,['c2','c1']]  # this is the recommended way to select a subset of columns

# note: avoid the syntax without .loc:
#     df = df[['c2','c1']]
# It may work, but in some pandas versions the result is tracked
# as a copy of the original DF, which may cause a warning or an error
# many lines of code further down that is difficult to debug

df

| | c2 | c1 |
|---|---|---|
| 0 | a | 1 |
| 1 | b | 2 |
| 2 | c | 3 |


In [5]:
df['c5'] = 1

In [6]:
df.head()

| | c2 | c1 | c5 |
|---|---|---|---|
| 0 | a | 1 | 1 |
| 1 | b | 2 | 1 |
| 2 | c | 3 | 1 |


In [7]:
mask = df.c1 >= 2
mask
# df['c5'][mask] = 2 # this causes errors (uncomment and run to see it)

0    False
1     True
2     True
Name: c1, dtype: bool

In [8]:
df.head()

| | c2 | c1 | c5 |
|---|---|---|---|
| 0 | a | 1 | 1 |
| 1 | b | 2 | 1 |
| 2 | c | 3 | 1 |


In [9]:
# df[mask]['c5'] = 3 # this causes error (uncomment and run to see it)

In [10]:
df.head()

| | c2 | c1 | c5 |
|---|---|---|---|
| 0 | a | 1 | 1 |
| 1 | b | 2 | 1 |
| 2 | c | 3 | 1 |


In [11]:
df.loc[mask,'c5'] = 4 # this is the correct way to set values

In [12]:
df.head()

| | c2 | c1 | c5 |
|---|---|---|---|
| 0 | a | 1 | 1 |
| 1 | b | 2 | 4 |
| 2 | c | 3 | 4 |


In [13]:
# Hidden chaining example:
# winners = data.loc[data.bid == data.price] # is it a copy or a view?
# winners.loc[304, 'bidder'] = 'therealname' # this will give a warning
#
# one way to solve it is to take an explicit copy:
# winners = data.loc[data.bid == data.price].copy()
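The explicit-copy fix can be tried in a runnable form. The `data` frame below is invented for illustration; only the column names `bid`, `price`, and `bidder` and the row label `304` come from the comments above.

```python
import pandas as pd

# Hypothetical auction data, made up for this sketch.
data = pd.DataFrame(
    {'bid': [10, 20, 30], 'price': [10, 25, 30], 'bidder': ['x', 'y', 'z']},
    index=[301, 302, 304],
)

# .copy() makes winners an independent DataFrame,
# so later writes to it are unambiguous.
winners = data.loc[data['bid'] == data['price']].copy()
winners.loc[304, 'bidder'] = 'therealname'

print(winners.loc[304, 'bidder'])  # therealname
print(data.loc[304, 'bidder'])     # z  (the original is untouched)
```

If the intent is instead to update `data` itself, the assignment should go through `data.loc[...]` directly rather than through `winners`.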