## SettingWithCopyWarning

**Note**:
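- This is a warning, not an error: the code still runs.
- It is usually raised by chained indexing in an assignment, e.g. `df[mask]["col"] = value`.
- It signals that you may be modifying a copy of a slice rather than the original dataframe, so the change may never reach the original.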

_Let's dive deeper with an example_

In [45]:
```python
import pandas as pd
import numpy as np

# creating a sample dataframe (np.nan marks missing values)
df = pd.DataFrame({
    'age' :     [ 20, 32, 23, 22, 32, 21, 37],
    'section' : [ 'I', 'II', 'III', 'IV', 'V', 'VI', 'VII'],
    'city' :    [ 'Kolkata', np.nan, 'Mumbai', np.nan, 'Mumbai', 'Delhi', np.nan],
    'gender' :  [ 'M', 'M', 'F', 'F', 'M', 'F', 'F'],
    'favourite_color' : [ np.nan, np.nan, 'black', np.nan, 'white', 'red', 'orange']
}, index=list("ABCDEFG"))
```

In [46]:
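```python
# View the data
df
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| A | 20 | I | Kolkata | M | NaN |
| B | 32 | II | NaN | M | NaN |
| C | 23 | III | Mumbai | F | black |
| D | 22 | IV | NaN | F | NaN |
| E | 32 | V | Mumbai | M | white |
| F | 21 | VI | Delhi | F | red |
| G | 37 | VII | NaN | F | orange |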


### Example 1: Updating a filtered dataframe directly

Let's try to set `favourite_color` to `cyan` for every row where `age > 30`.

In [47]:
```python
# Listing out all rows for age > 30
df[df["age"] > 30]
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| B | 32 | II | NaN | M | NaN |
| E | 32 | V | Mumbai | M | white |
| G | 37 | VII | NaN | F | orange |


In [48]:
```python
# Trying to set the color column by chaining two indexing operations (this triggers the warning)
df[df["age"] > 30]["favourite_color"] = "cyan"
```

```text
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```


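- As observed above, the warning is raised
- Additionally, the data was never updated in the original dataframe
- Reason for the warning: the assignment _chains two indexing operations_, so the value is written to an intermediate object rather than to the parent dataframe
> `df[df["age"] > 30]` returns a new DataFrame (boolean indexing always produces a copy, not a view); `["favourite_color"] = "cyan"` then modifies that temporary object, which is discarded right away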
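To see why the original is untouched, here is a minimal sketch that spells out the two steps Python performs for a chained assignment, on a smaller hypothetical frame. The names `mask` and `temp` are ours, for illustration only; in the original one-line statement the intermediate object is never named.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical smaller frame with the same columns of interest
df = pd.DataFrame(
    {"age": [20, 32, 23], "favourite_color": [np.nan, np.nan, "black"]},
    index=list("ABC"),
)
mask = df["age"] > 30

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the SettingWithCopyWarning for this demo
    # df[mask]["favourite_color"] = "cyan" is evaluated as two separate calls:
    temp = df[mask]                    # 1. df.__getitem__(mask) builds a new DataFrame
    temp["favourite_color"] = "cyan"   # 2. temp.__setitem__(...) writes into that new object

print(temp["favourite_color"].tolist())         # ['cyan']
print((df["favourite_color"] == "cyan").any())  # False: df was never touched
```

In the one-line version, `temp` is an unnamed temporary that is garbage-collected immediately, so the update is lost.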

In [49]:
# Data never got updated in the original dataframe
df[df["age"] > 30]

Unnamed: 0,age,section,city,gender,favourite_color
B,32,II,,M,
E,32,V,Mumbai,M,white
G,37,VII,,F,orange


### How to fix the above warning
- The fix is simple: as the warning suggests, perform the assignment on the original dataframe in a single `.loc` call, passing the row selector and the column selector together.

`Try using .loc[row_indexer,col_indexer] = value instead`

In [50]:
```python
df.loc[(df["age"] > 30), ["favourite_color"]] = "cyan"
```

In [51]:
```python
# Verify that the change has been applied to the original dataframe
df
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| A | 20 | I | Kolkata | M | NaN |
| B | 32 | II | NaN | M | cyan |
| C | 23 | III | Mumbai | F | black |
| D | 22 | IV | NaN | F | NaN |
| E | 32 | V | Mumbai | M | cyan |
| F | 21 | VI | Delhi | F | red |
| G | 37 | VII | NaN | F | cyan |
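Why the single `.loc` call works: Python turns `df.loc[mask, col] = value` into one `df.loc.__setitem__((mask, col), value)` call, so the value is written straight into `df` with no intermediate object in between. A minimal sketch on a smaller hypothetical frame that also records any warnings raised during the assignment (`mask` and `caught` are illustrative names):

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical smaller frame with the same columns of interest
df = pd.DataFrame(
    {"age": [20, 32, 23, 37], "favourite_color": [np.nan, np.nan, "black", "orange"]},
    index=list("ABCD"),
)
mask = df["age"] > 30

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    # One indexing call on df itself: rows and column are selected together
    df.loc[mask, "favourite_color"] = "cyan"

print(df.loc[mask, "favourite_color"].tolist())  # ['cyan', 'cyan']
# Names of any warnings recorded; SettingWithCopyWarning should not be among them
print([w.category.__name__ for w in caught])
```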


**Note**: The above assignment can also be extended to multiple columns.

For example:
- Update `favourite_color` to `cyan` for all `age > 30`
- Update `section` to `V` for all `age > 30`

In [52]:
```python
df.loc[(df["age"] > 30), ["favourite_color", "section"]] = ["cyan", "V"]
```

In [54]:
```python
df.loc[["B", "E", "G"]]
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| B | 32 | V | NaN | M | cyan |
| E | 32 | V | Mumbai | M | cyan |
| G | 37 | V | NaN | F | cyan |


### Example 2: Updating a filtered dataframe indirectly

Let's replace all the NaN values in `city` with `unknown`.

In [55]:
```python
# List all rows where city is NaN
df[df["city"].isna()]
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| B | 32 | V | NaN | M | cyan |
| D | 22 | IV | NaN | F | NaN |
| G | 37 | V | NaN | F | cyan |


**Let's assign the NaN city rows to a new dataframe**

In [56]:
```python
no_city_df = df[df["city"].isna()]
no_city_df
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| B | 32 | V | NaN | M | cyan |
| D | 22 | IV | NaN | F | NaN |
| G | 37 | V | NaN | F | cyan |


In [57]:
```python
type(no_city_df)
```

pandas.core.frame.DataFrame

**Let's try to update this new dataframe:**

In [58]:
```python
no_city_df["city"] = "unknown"
```

```text
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```


In [59]:
```python
no_city_df
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| B | 32 | V | unknown | M | cyan |
| D | 22 | IV | unknown | F | NaN |
| G | 37 | V | unknown | F | cyan |


In [60]:
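```python
df
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| A | 20 | I | Kolkata | M | NaN |
| B | 32 | V | NaN | M | cyan |
| C | 23 | III | Mumbai | F | black |
| D | 22 | IV | NaN | F | NaN |
| E | 32 | V | Mumbai | M | cyan |
| F | 21 | VI | Delhi | F | red |
| G | 37 | V | NaN | F | cyan |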


**Two important observations from the example above:**
- The warning was raised even though `no_city_df` itself was updated
- The original dataframe `df` was not updated

### How to fix the above warning
- `no_city_df` was created by indexing `df`, not by an explicit `.copy()`. pandas remembers that it was derived from `df` and, when you later assign to it, cannot tell whether you meant to modify `df` as well, hence the warning.
- Knowing this, we can explicitly copy the filtered rows with `.copy()` and run the same assignment on that copy, which no longer raises the warning.
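Strictly speaking, boolean indexing already copies the underlying data, with or without `.copy()`; what the explicit call changes is that pandas stops treating the result as a possible slice of `df`. A quick check of the first part, as a sketch on a hypothetical smaller frame using `np.shares_memory` (`no_city_copy` is an illustrative name):

```python
import numpy as np
import pandas as pd

# Hypothetical smaller frame with the same idea as df above
df = pd.DataFrame(
    {"age": [20, 32, 22], "city": ["Kolkata", np.nan, np.nan]},
    index=list("ABC"),
)

no_city_df = df[df["city"].isna()]            # boolean indexing, no explicit .copy()
no_city_copy = df[df["city"].isna()].copy()   # boolean indexing plus an explicit copy

# Neither result shares its "age" buffer with df: the data was copied in both cases
print(np.shares_memory(df["age"].to_numpy(), no_city_df["age"].to_numpy()))    # False
print(np.shares_memory(df["age"].to_numpy(), no_city_copy["age"].to_numpy()))  # False
```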

In [61]:
```python
no_city_df1 = df[df["city"].isna()].copy()
```

In [62]:
```python
no_city_df1
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| B | 32 | V | NaN | M | cyan |
| D | 22 | IV | NaN | F | NaN |
| G | 37 | V | NaN | F | cyan |


In [63]:
```python
no_city_df1["city"] = "unknown"
```

In [64]:
```python
no_city_df1
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| B | 32 | V | unknown | M | cyan |
| D | 22 | IV | unknown | F | NaN |
| G | 37 | V | unknown | F | cyan |


In [65]:
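```python
df
```

|   | age | section | city | gender | favourite_color |
|---|---|---|---|---|---|
| A | 20 | I | Kolkata | M | NaN |
| B | 32 | V | NaN | M | cyan |
| C | 23 | III | Mumbai | F | black |
| D | 22 | IV | NaN | F | NaN |
| E | 32 | V | Mumbai | M | cyan |
| F | 21 | VI | Delhi | F | red |
| G | 37 | V | NaN | F | cyan |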
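Note that the `.copy()` approach leaves `df` itself untouched: its `city` column still contains NaN. If the goal is to fill the missing cities in `df`, here is a minimal sketch (on a smaller hypothetical frame; `df_loc` and `df_fill` are illustrative names) of two ways to do that without chaining:

```python
import numpy as np
import pandas as pd

# Hypothetical smaller frame with missing cities
df = pd.DataFrame(
    {"age": [20, 32, 23, 22], "city": ["Kolkata", np.nan, "Mumbai", np.nan]},
    index=list("ABCD"),
)

# Option 1: a single .loc call with a boolean row mask and a column label
df_loc = df.copy()
df_loc.loc[df_loc["city"].isna(), "city"] = "unknown"

# Option 2: rebuild the whole column with fillna and assign it back
df_fill = df.copy()
df_fill["city"] = df_fill["city"].fillna("unknown")

print(df_loc["city"].tolist())  # ['Kolkata', 'unknown', 'Mumbai', 'unknown']
print(df_loc.equals(df_fill))   # True
```

Both options assign directly to the dataframe being modified; the copies here only keep the two variants separate for comparison.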


### Takeaways
- Avoid chained indexing in assignments, as it can cause unintended results
- Be `explicit` wherever possible:
    * If the original dataframe needs to be modified, assign through a single `df.loc[row_indexer, col_indexer] = value` call
    * If a copy of the original dataframe is needed, call `.copy()` explicitly