## Pandas: Copy DataFrames with `copy()`

`df.copy()` returns a copy of the DataFrame `df`. The `deep` parameter controls what is copied:
- `deep=True` (default): the data and index are copied, so modifications to the copy's data are not reflected in the original DataFrame. In other words, **a new, independent object** is created.
- `deep=False`: a **shallow copy** is created. Only references to the original data and index are copied, so modifications to the copy's data can be reflected in the original DataFrame.
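
The difference can be reproduced on a small, made-up frame without loading the dataset. This is a minimal sketch: the frame and variable names are illustrative and not part of the notebook. Whether the shallow-copy write reaches the original depends on the pandas version and settings (with Copy-on-Write enabled, which is the default from pandas 3.0, it does not), so the script reports the outcome instead of assuming it.

```python
import pandas as pd

# Hypothetical toy data standing in for the players dataset.
original = pd.DataFrame({"height_cm": [170, 187, 175]})

deep = original.copy(deep=True)      # data is duplicated
shallow = original.copy(deep=False)  # data is shared, at least until a write

deep.loc[0, "height_cm"] = 1
shallow.loc[1, "height_cm"] = 2

# Did either write show up in the original?
deep_write_leaked = bool(original.loc[0, "height_cm"] == 1)
shallow_write_leaked = bool(original.loc[1, "height_cm"] == 2)

print("deep copy write reached original:   ", deep_write_leaked)
print("shallow copy write reached original:", shallow_write_leaked)
```

The first line should always print `False`. The second is the one that varies between pandas configurations.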

In [90]:
import pandas as pd

df = pd.read_csv("datasets/players_20.csv")

### Deep copy

In [91]:
# deep = True

df_copy = df.copy(deep=True)
df.height_cm

0        170
1        187
2        175
3        188
4        175
        ... 
18273    186
18274    177
18275    186
18276    185
18277    182
Name: height_cm, Length: 18278, dtype: int64

In [92]:
# Overwrite the whole height column of df_copy...

df_copy.height_cm = 67
df_copy.height_cm

0        67
1        67
2        67
3        67
4        67
         ..
18273    67
18274    67
18275    67
18276    67
18277    67
Name: height_cm, Length: 18278, dtype: int64

In [93]:
# This will stay unchanged.
df.height_cm

0        170
1        187
2        175
3        188
4        175
        ... 
18273    186
18274    177
18275    186
18276    185
18277    182
Name: height_cm, Length: 18278, dtype: int64

### Shallow copy

1. Try changing a cell in df_shallow_copy

In [94]:
df_shallow_copy = df.copy(deep=False)

In [95]:
df.iloc[1, 5]

'1985-02-05'

In [96]:
# When we update this...
df_shallow_copy.iloc[1, 5] = "2000-01-01"

In [97]:
# ...the same cell in the original DataFrame changes as well
df.iloc[1, 5]

'2000-01-01'

2. Try changing a row in df_shallow_copy

In [98]:
df = pd.read_csv("datasets/players_20.csv")
df.iloc[1, :]

sofifa_id                                                 20801
player_url    https://sofifa.com/player/20801/c-ronaldo-dos-...
short_name                                    Cristiano Ronaldo
long_name                   Cristiano Ronaldo dos Santos Aveiro
age                                                          34
                                    ...                        
lb                                                         61+3
lcb                                                        53+3
cb                                                         53+3
rcb                                                        53+3
rb                                                         61+3
Name: 1, Length: 104, dtype: object

In [99]:
import numpy as np
df_shallow_copy = df.copy(deep=False)
df_shallow_copy.iloc[1, :] = np.nan

In [100]:
df.iloc[1]

sofifa_id     20801
player_url      NaN
short_name      NaN
long_name       NaN
age              34
              ...  
lb              NaN
lcb             NaN
cb              NaN
rcb             NaN
rb              NaN
Name: 1, Length: 104, dtype: object

In [101]:
df_shallow_copy.iloc[1, :] = None
df.iloc[1]

sofifa_id     20801
player_url     None
short_name     None
long_name      None
age              34
              ...  
lb             None
lcb            None
cb             None
rcb            None
rb             None
Name: 1, Length: 104, dtype: object

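From what I observed, a shallow copy behaves differently depending on the column's dtype:
- For columns of dtype `object`, the write reaches the original DataFrame: the cell in the copy still refers to the same data as the cell in the original.
- For columns of dtype `int64`, the original is left untouched. The likely reason is that an `int64` column cannot store `NaN` or `None`, so pandas has to convert the copy's column to a new array first, and that new array is no longer shared with the original.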
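
The `int64`-versus-`object` difference observed above can be illustrated with plain NumPy arrays. This is only a sketch under the assumption that pandas handles the column buffers the same way; the notebook output is consistent with that, but the source does not confirm it. The array contents are made up.

```python
import numpy as np

# An int64 buffer cannot hold NaN, so a float64 copy has to be made first.
ints = np.array([32, 34, 27], dtype="int64")
upcast = ints.astype("float64")   # new array, no longer shared with `ints`
upcast[1] = np.nan

# An object buffer can hold NaN directly, so a view writes into the shared data.
names = np.array(["Messi", "Ronaldo", "Neymar"], dtype=object)
view = names[:]                   # slicing returns a view on the same buffer
view[1] = np.nan

print(ints)                              # the integers are untouched
print(np.shares_memory(ints, upcast))    # the upcast array is independent
print(names)                             # the NaN is visible through `names`
print(np.shares_memory(names, view))     # the view shares the buffer
```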


3. Try replicating this behavior on a column


In [102]:
df.long_name

0        Lionel Andrés Messi Cuccittini
1                                  None
2         Neymar da Silva Santos Junior
3                             Jan Oblak
4                           Eden Hazard
                      ...              
18273                                邵帅
18274                      Mingjie Xiao
18275                                张威
18276                               汪海健
18277                               潘喜明
Name: long_name, Length: 18278, dtype: object

In [103]:
df_shallow_copy = df.copy(deep=False)

# Try changing all cells in a column of df_shallow_copy to a string
df_shallow_copy.long_name = "??? ??? ???"

In [104]:
# Unchanged: assigning to the attribute replaces the copy's whole column with a new
# object (just as df_shallow_copy["long_name"] = ... would), so nothing is written
# into the data shared with df.
df.long_name

0        Lionel Andrés Messi Cuccittini
1                                  None
2         Neymar da Silva Santos Junior
3                             Jan Oblak
4                           Eden Hazard
                      ...              
18273                                邵帅
18274                      Mingjie Xiao
18275                                张威
18276                               汪海健
18277                               潘喜明
Name: long_name, Length: 18278, dtype: object
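
The same effect can be checked on a two-row toy frame. This is a minimal sketch with made-up data: attribute assignment to an existing column behaves like `shallow["long_name"] = ...`, which swaps in a new column on the copy instead of writing into the existing one.

```python
import pandas as pd

# Hypothetical toy data, not the players dataset.
original = pd.DataFrame({"long_name": ["Jan Oblak", "Eden Hazard"]})
shallow = original.copy(deep=False)

# Replaces the column object held by `shallow`; `original` keeps its own column.
shallow.long_name = "???"

print(original["long_name"].tolist())
print(shallow["long_name"].tolist())
```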

In [105]:
# Try with loc

df_shallow_copy = df.copy(deep=False)

# Inspect the column of df_shallow_copy before changing it
df_shallow_copy.loc[:, "long_name"]

0        Lionel Andrés Messi Cuccittini
1                                  None
2         Neymar da Silva Santos Junior
3                             Jan Oblak
4                           Eden Hazard
                      ...              
18273                                邵帅
18274                      Mingjie Xiao
18275                                张威
18276                               汪海健
18277                               潘喜明
Name: long_name, Length: 18278, dtype: object

In [106]:
df_shallow_copy.loc[:, "long_name"] = "???"


In [107]:
df.loc[:, "long_name"]  # Writing through .loc modified the shared data, so the original changed too.

0        ???
1        ???
2        ???
3        ???
4        ???
        ... 
18273    ???
18274    ???
18275    ???
18276    ???
18277    ???
Name: long_name, Length: 18278, dtype: object

In [108]:
# Try the same on a column of dtype int64
df.age

0        32
1        34
2        27
3        26
4        28
         ..
18273    22
18274    22
18275    19
18276    18
18277    26
Name: age, Length: 18278, dtype: int64

In [109]:
df_shallow_copy.age = 67

In [110]:
df.age  # Unchanged: attribute assignment again replaced the copy's column instead of writing into the shared data.

0        32
1        34
2        27
3        26
4        28
         ..
18273    22
18274    22
18275    19
18276    18
18277    26
Name: age, Length: 18278, dtype: int64

In [111]:
df_shallow_copy = df.copy(deep=False)
df.loc[:, "age"]

0        32
1        34
2        27
3        26
4        28
         ..
18273    22
18274    22
18275    19
18276    18
18277    26
Name: age, Length: 18278, dtype: int64

In [112]:
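df_shallow_copy.loc[:, "age"] = 67
# Caution: df_copy is the deep copy made in In [91], not df, so the output below
# does not show whether this write reached the original. Inspect df.age to test that.
df_copy.age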

0        32
1        34
2        27
3        26
4        28
         ..
18273    22
18274    22
18275    19
18276    18
18277    26
Name: age, Length: 18278, dtype: int64

In [113]:
df_new_copy = df  # This is not a copy: both names refer to the same DataFrame object.

In [114]:
df.loc[0, 'height_cm'] = 210

In [115]:
df.loc[0, 'height_cm']

np.int64(210)

In [116]:
df_new_copy.loc[0, 'height_cm']

np.int64(210)
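
For contrast, here is a minimal sketch of plain assignment next to `copy()`, again on a made-up frame with illustrative names. `is` checks object identity, which makes the aliasing explicit without having to compare values.

```python
import pandas as pd

# Hypothetical toy data, not the players dataset.
df = pd.DataFrame({"height_cm": [170, 187]})
alias = df          # no copy: a second name for the same object
copied = df.copy()  # an independent deep copy

df.loc[0, "height_cm"] = 210

print(alias is df)                       # True
print(int(alias.loc[0, "height_cm"]))    # 210
print(int(copied.loc[0, "height_cm"]))   # 170
```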

In [118]:
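# Takeaway:
# - Writing cells, columns or rows through .loc or .iloc on a shallow copy changed the
#   original for columns of dtype 'object'. The int64 columns were left untouched when
#   they were assigned NaN or None.
# - Assigning a whole column directly (`df.col = new_val`) gives the copy an entirely
#   new column and leaves the original untouched.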
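
One way to check these observations without writing to any frame is to ask NumPy whether two columns are backed by the same memory. The helper below is hypothetical (`shares_buffer` is not a pandas function) and is a sketch for numeric columns only. The shallow-copy result is printed instead of assumed, since it may depend on the pandas version.

```python
import numpy as np
import pandas as pd


def shares_buffer(a: pd.Series, b: pd.Series) -> bool:
    """Return True if the two Series are backed by overlapping memory."""
    return bool(np.shares_memory(a.to_numpy(), b.to_numpy()))


# Hypothetical toy data, not the players dataset.
original = pd.DataFrame({"age": [32, 34, 27]})
alias = original
deep = original.copy(deep=True)
shallow = original.copy(deep=False)

print("alias  :", shares_buffer(original["age"], alias["age"]))
print("deep   :", shares_buffer(original["age"], deep["age"]))
print("shallow:", shares_buffer(original["age"], shallow["age"]))
```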