![alt text](pandas.png "Title")

In [0]:
import pandas as pd

# DataFrame copies

Remember the discussion of copying mutable objects? Assigning a mutable object to a new name does not create a new object: the new name is only another reference to it, a kind of view.

DataFrames are mutable! Therefore df1 = df gives you a view, not a copy.

__A change on a view changes the original!__

In [0]:
a = ['hello', 'bye']
b = a   # b is just another name for the same list object
b is a  # True: no copy was made

In [0]:
# Let's create a small dataframe:
df = pd.DataFrame( {'col1': [1, 2, 3]})
df

In [0]:
# Taking a so-called 'copy'
df2 = df

# Changing the new df
df2['col1'] = df2['col1'] * 2
df2 

In [0]:
# As expected, the original dataframe has been changed too:
df
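
A quick way to see why the original changed is to compare object identities. The sketch below is illustrative only; the name `alias` is not part of the notebook.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3]})
alias = df  # plain assignment: no new object is created

# Both names refer to the very same DataFrame object
print(alias is df)          # True
print(id(alias) == id(df))  # True
```

Since there is only one object, any change made through one name is visible through the other.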

In [0]:
# Ok, let's recreate the df:
df = pd.DataFrame( {'col1': [1, 2, 3]})

# And this time we take a real copy. It will create a new independent object:
df2 = df.copy()

df2['col1'] = df2['col1'] * 2

# As expected, the original dataframe is not modified
df

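# Here's an alternative way to create a copy. Copy the array explicitly, because
# df.values may share memory with df, and pass the index, otherwise it is reset:
df2 = pd.DataFrame(data=df.to_numpy(copy=True), columns=df.columns, index=df.index)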
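
To check that copy() really returns independent data, one option is to ask NumPy whether the underlying arrays overlap in memory. This is a minimal sketch, assuming a single numeric column as in the cells above; the variable `shared` is introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3]})
df2 = df.copy()  # deep copy by default

# A different object...
print(df2 is df)  # False

# ...whose column data lives in a different memory buffer
shared = np.shares_memory(df['col1'].to_numpy(), df2['col1'].to_numpy())
print(shared)  # False
```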

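## Using mutable objects in a DataFrame is an anti-pattern

In some scenarios, even copy() does not make a completely independent copy. This happens when a DataFrame holds mutable objects (e.g. lists or dictionaries): copy() duplicates the DataFrame's data, but for Python objects stored in cells it copies only the references, so both DataFrames still point to the same objects!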

In [0]:
# The value is not a single value but a list (a mutable object):
df = pd.DataFrame( {'col1': [['a', 'b', 'c']] } )
df

In [0]:
# Let's copy df
df2 = df.copy()

# Say that I want to extract that list and append items.

# I could convert the Series 'col1' to a Python list of lists and take the first item:
MyCol = list(df.col1)
print('MyCol', MyCol)
MyList = MyCol[0]
print('MyList:', MyList)

# Now let's append an item to that list
MyList.append('d')
print('My new list', MyList)

In [0]:
# df2 is a copy, yet it shows the appended item too, that's... interesting (?)
df2

In [0]:
# The original shows the change as well: the list we extracted from it was not a copy :-O
df

# Wow!
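
The explanation is that both DataFrames hold a reference to the same list object. The sketch below makes this visible with an identity check; the names `original_cell` and `copied_cell` are illustrative and not part of the notebook.

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c']]})
df2 = df.copy()

original_cell = df.loc[0, 'col1']
copied_cell = df2.loc[0, 'col1']

# copy() duplicated the column, but the cell still refers to the same list
print(copied_cell is original_cell)  # True

# So a change to the list is visible from both DataFrames
original_cell.append('d')
print(df2.loc[0, 'col1'])  # ['a', 'b', 'c', 'd']
```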

In [0]:
# Here's a way to cut the link between the list and the df. First let's create the df:
df = pd.DataFrame( {'col1': [['a', 'b', 'c']] } ) 

# Let's build a new list from scratch using a list comprehension:
mylist = [ item for item in df.col1[0] ]

# Adding one item to the list
mylist.append('d')

# The original df wasn't modified. Phew!
df
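
If the whole DataFrame has to be duplicated, objects included, one possible approach is to deep-copy every cell of the affected column after calling copy(). This is a sketch under the assumption that only 'col1' holds mutable objects; it relies on the standard library's copy.deepcopy and is not the only way to do it.

```python
import copy
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c']]})

# copy() duplicates the frame, then each list is replaced by its own deep copy
df2 = df.copy()
df2['col1'] = df2['col1'].map(copy.deepcopy)

# Changing the list in df2 no longer affects df
df2.loc[0, 'col1'].append('d')
print(df.loc[0, 'col1'])   # ['a', 'b', 'c']
print(df2.loc[0, 'col1'])  # ['a', 'b', 'c', 'd']
```

Copying every cell costs time and memory on large frames, which is one more reason to keep mutable objects out of DataFrames.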

## Conclusion

1) df2 = df1 doesn't create a copy but a view: a second name for the same object. That is far more efficient... if that's what you want :-) Use df1.copy() when you need an independent DataFrame.

2) __Avoid__ using mutable objects inside dataframes, unless you know what you're doing! 



__________________________________________________
Nicolas Dupuis, Methodology and Innovation (IDAR C&SP), 2020+