# <div align="center">Python Shallow Copy and Deep Copy</div>
---------------------------------------------------------------------

You can find me on GitHub:
> ###### [GitHub](https://github.com/lev1khachatryan)

In Python, the = operator does not create a copy of an object. You may think that an assignment creates a new object; it doesn't. It only creates a new variable that refers to the same object as the original.

Let's take an example where we create a list named old_list and assign it to new_list using the = operator.

In [1]:
old_list = [[1, 2, 3], [4, 5, 6], [7, 8, 'a']]
new_list = old_list

new_list[2][2] = 9

print('Old List:', old_list)
print('ID of Old List:', id(old_list))

print('New List:', new_list)
print('ID of New List:', id(new_list))

Old List: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ID of Old List: 3018170792584
New List: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ID of New List: 3018170792584


As you can see from the output, both variables old_list and new_list share the same id, i.e. 3018170792584.

So if you modify a value through either new_list or old_list, the change is visible through both.
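Comparing printed ids works, but the `is` operator checks the same thing directly. The following is a minimal sketch that reuses the variable names from the example above; the helper names `same_object` and `seen_through_old` are illustrative only.

```python
old_list = [[1, 2, 3], [4, 5, 6], [7, 8, 'a']]
new_list = old_list          # binds a second name to the same list object

# `is` compares object identity, whereas `==` compares values.
same_object = new_list is old_list

new_list[2][2] = 9                   # mutate through one name...
seen_through_old = old_list[2][2]    # ...and read the change through the other

print(same_object)        # True
print(seen_through_old)   # 9
```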

Sometimes you want to keep the original values unchanged and modify only the new values, or vice versa. In Python, there are two ways to create copies:

* Shallow Copy
* Deep Copy

To make these copies, we use the copy module.

# <div align="center">Copy Module</div>
---------------------------------------------------------------------

We use Python's copy module for shallow and deep copy operations. Suppose you need to copy a list, say x. For example:

In [3]:
import copy
x = [1, 2, 3]
copy.copy(x)       # shallow copy of x
copy.deepcopy(x)   # deep copy of x (the notebook displays this last value)

[1, 2, 3]

Here, copy() returns a shallow copy of x. Similarly, deepcopy() returns a deep copy of x.
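For lists specifically, a few built-in spellings also produce a shallow copy. The sketch below compares them with `copy.copy()`; the dictionary name `copies` is only an illustrative container, and the nested list `x` is chosen so that sharing of inner objects is visible.

```python
import copy

x = [[1, 2], [3, 4]]

# Four ways to make a shallow copy of a list.
copies = {
    "copy.copy": copy.copy(x),
    "list.copy": x.copy(),
    "slice": x[:],
    "list()": list(x),
}

for name, c in copies.items():
    # Each copy is a new outer list, but its inner lists are shared with x.
    print(name, c is not x, c[0] is x[0])   # prints: <name> True True
```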

# <div align="center">Shallow Copy</div>
---------------------------------------------------------------------

A shallow copy creates a new object that stores references to the original elements.

So a shallow copy doesn't create copies of nested objects; it just copies the references to them. In other words, the copy process does not recurse into nested objects.

In [5]:
import copy

old_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
new_list = copy.copy(old_list)

print("Old list:", old_list)
print("New list:", new_list)
print(id(old_list))
print(id(new_list))

Old list: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
New list: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
3018174728008
3018174728264


To confirm that new_list is a different object from old_list, we append a new nested object to the original and check both lists.

In [6]:
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.copy(old_list)

old_list.append([4, 4, 4])

print("Old list:", old_list)
print("New list:", new_list)

Old list: [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
New list: [[1, 1, 1], [2, 2, 2], [3, 3, 3]]


In the above program, we created a shallow copy of old_list. new_list contains references to the original nested objects stored in old_list. Then we appended a new list, [4, 4, 4], to old_list. This new sublist does not appear in new_list, because the two outer lists are separate objects.

However, when you change any nested object in old_list, the change appears in new_list, because both outer lists refer to the same nested lists.

In [7]:
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.copy(old_list)

old_list[1][1] = 'AA'

print("Old list:", old_list)
print("New list:", new_list)

Old list: [[1, 1, 1], [2, 'AA', 2], [3, 3, 3]]
New list: [[1, 1, 1], [2, 'AA', 2], [3, 3, 3]]
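The two behaviours above follow from which objects are shared. The sketch below checks identity at both levels with `is`, and adds one further case as an illustration: rebinding a slot of the copy (as opposed to mutating a nested list) does not affect the original. The names `outer_is_shared` and `inner_is_shared` are hypothetical helpers.

```python
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.copy(old_list)

# The outer lists are distinct objects...
outer_is_shared = new_list is old_list
# ...but each nested list is the very same object in both.
inner_is_shared = [a is b for a, b in zip(old_list, new_list)]

# Rebinding a slot replaces a reference in new_list only.
new_list[0] = ['x']

print(outer_is_shared)   # False
print(inner_is_shared)   # [True, True, True]
print(old_list[0])       # [1, 1, 1]
print(new_list[0])       # ['x']
```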


# <div align="center">Deep Copy</div>
---------------------------------------------------------------------

A deep copy creates a new object and recursively inserts copies of the nested objects found in the original.

Let’s continue with the previous example. This time, however, we create a deep copy using the deepcopy() function from the copy module. A deep copy is independent of the original object and of all its nested objects.

In [8]:
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.deepcopy(old_list)

print("Old list:", old_list)
print("New list:", new_list)

Old list: [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
New list: [[1, 1, 1], [2, 2, 2], [3, 3, 3]]


Now, if you change any nested object in the original old_list, the copy new_list does not change.

In [9]:
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.deepcopy(old_list)

old_list[1][0] = 'BB'

print("Old list:", old_list)
print("New list:", new_list)

Old list: [[1, 1, 1], ['BB', 2, 2], [3, 3, 3]]
New list: [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
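The independence shown above can also be checked with `is`. The sketch below does that and, as an additional illustration not taken from the examples above, deep-copies a list that contains itself; `deepcopy()` keeps track of objects it has already copied, so the copy refers to itself rather than to the original. The names `inner_is_shared`, `cyclic` and `cyclic_copy` are illustrative.

```python
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.deepcopy(old_list)

# No nested list is shared between the original and the deep copy.
inner_is_shared = [a is b for a, b in zip(old_list, new_list)]
print(inner_is_shared)   # [False, False, False]

# A list that contains a reference to itself.
cyclic = [1, 2]
cyclic.append(cyclic)
cyclic_copy = copy.deepcopy(cyclic)

print(cyclic_copy is cyclic)           # False
print(cyclic_copy[2] is cyclic_copy)   # True: the cycle is reproduced in the copy
```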


# <div align="center">Pandas DataFrame copy(deep=False) vs copy(deep=True) vs '='</div>
---------------------------------------------------------------------

When you write df2 = df1, you create a variable named df2 and bind it to the same object as df1 (id 3018174778896 in the output below). df1.copy() and df1.copy(deep=False) each return a new DataFrame object with its own id. When you write df1 = pd.DataFrame([9, 9, 9]), you create a new object (id 3018120962624) and bind it to the variable df1, but the object with id 3018174778896 that was previously bound to df1 continues to live, because df2 still refers to it. If no variables were bound to that object, Python would garbage-collect it.

In [10]:
import pandas as pd
df1 = pd.DataFrame([1,2,3,4,5])
print(id(df1))

df2 = df1
print(id(df2))

df3 = df1.copy()
print(id(df3))

df4 = df1.copy(deep=False)
print(id(df4))

df1 = pd.DataFrame([9, 9, 9])
print(id(df1))

print(id(df2))


3018174778896
3018174778896
3018209359968
3018214993088
3018120962624
3018174778896


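copy(deep=True) is not a recursive deep copy in pandas: it copies the DataFrame's data and index, but Python objects stored inside cells (such as lists) are still shared by reference, unlike with copy.deepcopy(). The pandas developers consider putting mutable objects inside a DataFrame an antipattern. Consider the following: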

In [37]:
arr1 = [1, 2, 3]
arr2 = [1, 2, 3, 4]
df1 = pd.DataFrame([[arr1], [arr2]], columns=['A'])
# print(df1.applymap(id))

df2 = df1.copy(deep=True)
# print(df2.applymap(id))

print(df1)

df2.loc[0, 'A'].append(55)
print(df1)
print(df2)


              A
0     [1, 2, 3]
1  [1, 2, 3, 4]
               A
0  [1, 2, 3, 55]
1   [1, 2, 3, 4]
               A
0  [1, 2, 3, 55]
1   [1, 2, 3, 4]
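If independent cell objects are really needed, one possible workaround is to copy each cell explicitly. The sketch below is an assumption-laden illustration rather than a recommended pattern: it uses `Series.map` with `copy.deepcopy` on a single object column named 'A', and the name `cell_is_shared` is a hypothetical helper.

```python
import copy
import pandas as pd

df1 = pd.DataFrame({'A': [[1, 2, 3], [1, 2, 3, 4]]})

# copy(deep=True) gives a new DataFrame, but each cell still refers to the same list.
df2 = df1.copy(deep=True)
cell_is_shared = df2.loc[0, 'A'] is df1.loc[0, 'A']

# Deep-copy every cell of the object column to get independent lists.
df3 = df1.copy(deep=True)
df3['A'] = df1['A'].map(copy.deepcopy)
df3.loc[0, 'A'].append(55)

print(cell_is_shared)
print(df1.loc[0, 'A'])
print(df3.loc[0, 'A'])
```

With this approach, appending to a list in df3 leaves the corresponding list in df1 unchanged, at the cost of copying every object in the column.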


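With plain scalar values, copy(deep=True) behaves as expected: changing a cell of df2 leaves df1 untouched.

In [38]:
arr1 = [1, 2, 3, 5]
arr2 = [1, 2, 3, 4]
df1 = pd.DataFrame([arr1, arr2], columns=['A', 'B', 'C', 'D'])

df2 = df1.copy(deep=True)

print('df1:', df1)
print('df2:', df2)

# Modify the copy only
df2.loc[0, 'A'] = 222

print('df1:', df1)
print('df2:', df2)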


df1:    A  B  C  D
0  1  2  3  5
1  1  2  3  4
df2:    A  B  C  D
0  1  2  3  5
1  1  2  3  4
df1:    A  B  C  D
0  1  2  3  5
1  1  2  3  4
df2:      A  B  C  D
0  222  2  3  5
1    1  2  3  4


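The array returned by df.values can be a view onto the DataFrame's underlying data (here all columns share one integer dtype), so writing into it also changes df. This behaviour depends on the dtypes and the pandas version; newer versions with Copy-on-Write may return a read-only array instead.

In [50]:
arr1 = [1, 2, 3, 5]
arr2 = [1, 2, 3, 4]
df = pd.DataFrame([arr1, arr2], columns=['A', 'B', 'C', 'D'])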

In [51]:
k = df.values

In [52]:
k[0][2] = 222

In [56]:
df

   A  B    C  D
0  1  2  222  5
1  1  2    3  4
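When an array that is safe to modify is needed, one option is to ask for a copy explicitly instead of relying on what `df.values` happens to return. The sketch below assumes `DataFrame.to_numpy(copy=True)` is available in the installed pandas version and reuses the data from the cells above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 5], [1, 2, 3, 4]], columns=['A', 'B', 'C', 'D'])

# copy=True requests an array that does not share memory with df.
k = df.to_numpy(copy=True)
k[0, 2] = 222

print(k[0, 2])          # 222
print(df.loc[0, 'C'])   # 3, the DataFrame is unchanged
```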


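In the other direction, building a DataFrame from Python lists copies the values, so a later change to arr1 is not reflected in df.

In [57]:
arr1 = [1, 2, 3, 5]
arr2 = [1, 2, 3, 4]
df = pd.DataFrame([arr1, arr2], columns=['A', 'B', 'C', 'D'])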

In [58]:
arr1[2] = 22

In [59]:
df

   A  B  C  D
0  1  2  3  5
1  1  2  3  4
