## Views vs copies

In [12]:
import pandas as pd
import numpy as np

In [13]:
df = pd.DataFrame(
    {
    "user": [1, 2, 3],
    "age": [24, 54, 17],
    "sex": ["F", "F", "M"],
    "occupation": ["technician", "musician", "student"],
    }
)
df

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student


In [14]:
## Part 1: Warning after a failed attempt at setting values

df[df.sex == "F"]

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician


In [15]:
# Then select the sex column

df[df.sex == "F"].sex

0    F
1    F
Name: sex, dtype: object

## Take-away message #1
### When setting values in a pd.DataFrame, avoid chained assignment!
### Use .iloc[] or .loc[] instead.

In [16]:
# Finally set the value

df[df.sex == "F"].sex = "Female"

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


In [17]:
df  # Unchanged: the assignment went to a temporary copy (see take-away message #1 above)

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student
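
For comparison, here is a minimal sketch of the `.loc` form that take-away message #1 recommends. It rebuilds the same `df` so that it runs on its own. The variable name `updated_sex` is only there for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": [1, 2, 3],
        "age": [24, 54, 17],
        "sex": ["F", "F", "M"],
        "occupation": ["technician", "musician", "student"],
    }
)

# A single .loc call: the boolean mask selects the rows and "sex" selects
# the column, so pandas writes into df itself rather than into a temporary.
df.loc[df["sex"] == "F", "sex"] = "Female"

updated_sex = df["sex"].tolist()
print(updated_sex)
```

Because the rows and the column are selected in one indexing operation, there is no intermediate object for the assignment to get lost in.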


## Part 2: Warning after a successful attempt at setting values

In [18]:
df = pd.DataFrame(
    {
    "user": [1, 2, 3],
    "age": [24, 54, 17],
    "sex": ["F", "F", "M"],
    "occupation": ["technician", "musician", "student"],
    }
)
df

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student


### Using the .loc indexer

In [29]:
df.loc[df.sex == "F"]

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician


In [30]:
# Assign the selection to df2

df2 = df.loc[df.sex == "F"]

In [31]:
df2  # pandas flags df2 as derived from df: it may be a view or a copy

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician


In [32]:
# Set the values

df2.loc[0:1, 'sex'] = "Female"

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)


In [33]:
df2  # Warning raised, but the values changed: pandas warns because it is unsure whether df2 is a view or a copy

Unnamed: 0,user,age,sex,occupation
0,1,24,Female,technician
1,2,54,Female,musician


In [34]:
df  # df is unchanged: df2 turned out to be a copy

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student
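
One way to check that `df2` does not share data with `df` is to compare the memory behind a column in both frames. The sketch below uses `np.shares_memory` on the integer `age` column. The variable name `shares` is illustrative. For a boolean-mask selection like this one, the expected answer is `False`, but the check only reflects this particular selection and is not a general rule for every kind of indexing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "user": [1, 2, 3],
        "age": [24, 54, 17],
        "sex": ["F", "F", "M"],
        "occupation": ["technician", "musician", "student"],
    }
)

# Same selection as above; nothing is assigned, so no warning is involved.
df2 = df.loc[df["sex"] == "F"]

# Do the two "age" columns point at overlapping memory?
shares = np.shares_memory(df["age"].to_numpy(), df2["age"].to_numpy())
print(shares)
```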


## Make an explicit copy with the copy() method

In [28]:
# Make df3 an explicit copy

df3 = df.loc[df.sex == "F"].copy()  # Now pandas knows df3 is an independent copy

In [35]:
df3

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician


In [36]:
# Set the values

df3.loc[0:1, 'sex'] = "Female"

In [37]:
df3  # No warning is raised

Unnamed: 0,user,age,sex,occupation
0,1,24,Female,technician
1,2,54,Female,musician


## Take-away message #2
### When defining a new copy of a pd.DataFrame …
### do so explicitly using .copy()!
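
As a rough self-check of take-away message #2, the sketch below records any warnings raised while assigning into an explicit copy. It then confirms that the copy changed while the original did not. Matching the warning by the substring `"SettingWithCopy"` in its class name is an assumption made here so the check does not depend on where a given pandas version defines that class. The names `caught`, `copy_warnings`, `df3_sex` and `df_sex` are illustrative.

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    {
        "user": [1, 2, 3],
        "age": [24, 54, 17],
        "sex": ["F", "F", "M"],
        "occupation": ["technician", "musician", "student"],
    }
)

# Explicit copy: df3 owns its data and is not tied to df.
df3 = df.loc[df["sex"] == "F"].copy()

# Record every warning raised during the assignment.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df3.loc[0:1, "sex"] = "Female"

# Keep only warnings whose class name mentions SettingWithCopy.
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]

df3_sex = df3["sex"].tolist()
df_sex = df["sex"].tolist()
print(len(copy_warnings), df3_sex, df_sex)
```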