# Shallow and Deep Copies

In this notebook we will briefly touch on how python handles data storage.

By the end of this notebook you will know about:
- How python handles variable storage,
- The difference between assignment, shallow copies, and deep copies,
- Why we want to perform deep copies,
- The `del` statement,
- The `id` function.

Let's start by <i>assigning</i> the list `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` to a variable `a`.

In [1]:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

When you execute that code, python creates a new list, `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`, and stores that list in memory. It then points `a` to that list, as illustrated in this image.

<img src="a_list.png" style="width:50%"></img>

Now let's run the following code assigning the variable `a` to a new variable, `b`.

In [2]:
b = a

Python does not create a new list here; it simply points `b` to the same list object that `a` points to, as illustrated in this image.

<img src="a_b_list.png" style="width:60%"></img>

So what happens when we run something like `b[4] = 11`?

In [3]:
b[4] = 11

In [4]:
a

[1, 2, 3, 4, 11, 6, 7, 8, 9, 10]

Because `a` and `b` point to the same list object in your computer's memory, changing the list through `b` also changes what you see through `a`.
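It helps to separate two different operations here: <i>mutating</i> an object (changing its contents) and <i>rebinding</i> a name (pointing it at a different object). The minimal sketch below uses small illustrative lists; it shows that a mutation through `b` is visible through `a`, while assigning a brand new list to `b` only moves `b`'s pointer and leaves `a` alone.

```python
a = [1, 2, 3]
b = a            # b now points to the same list as a

b[0] = 99        # mutation: changes the one shared list
print(a)         # [99, 2, 3]

b = [7, 8, 9]    # rebinding: b now points to a brand new list
print(a)         # [99, 2, 3]  (a still points to the original list)
print(b)         # [7, 8, 9]
```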

We can check this with python's built-in `id` function, <a href="https://docs.python.org/3/library/functions.html#id">https://docs.python.org/3/library/functions.html#id</a>. This function takes an object (for example, the one a variable points to) and returns that object's "identity": an integer that is guaranteed to be unique among objects that exist at the same time and to stay constant over the lifetime of that object. Two objects whose lifetimes do not overlap may end up with the same `id`.

In [9]:
## Note that these numbers will likely be different from
## what you see on your computer screen
print(id(a))
print(id(b))
print(id([0,1]))

2078653503104
2078653503104
2078653342464


Because `a` and `b` point to the same object in your computer's memory, `id(a)` equals `id(b)`. The freshly created list `[0,1]` is a different object, so its `id` is different.
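Python also has an operator for this check: `x is y` compares identities, so it is equivalent to `id(x) == id(y)` for two objects that both exist, while `==` compares contents. A short sketch, with `c` as an illustrative list that has the same contents as `a` but is a separate object:

```python
a = [1, 2, 3]
b = a
c = [1, 2, 3]    # a separate list with equal contents

# `is` compares identities (the same check as comparing id values)
print(a is b)           # True: same object
print(a is c)           # False: different objects

# `==` compares values (the contents of the lists)
print(a == c)           # True: equal contents
print(id(a) == id(c))   # False
```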

## The `copy` module

To make a genuine copy of an object in python, rather than a second name for the same object, we can use the `copy` module (note that we will look at modules and packages more closely in a later `jupyter notebook`). We can make two types of copies with the `copy` module:
1. Shallow copies, and
2. Deep copies.

In order to use the `copy` module, we first need to `import` it; again, more on this in a later `jupyter notebook`.

In [10]:
import copy

### Shallow copies with `copy.copy()`

A shallow copy creates a new object and then fills it with <i>references</i> to the objects contained in the original, rather than with copies of them. Shallow copies can be made with the `copy.copy()` function, <a href="https://docs.python.org/3/library/copy.html#copy.copy">https://docs.python.org/3/library/copy.html#copy.copy</a>.

For example:

In [11]:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

## making a shallow copy of a
b = copy.copy(a)

In [12]:
b[4] = 11

In [13]:
a

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [14]:
b

[1, 2, 3, 4, 11, 6, 7, 8, 9, 10]

In [15]:
print("id(a) =", id(a))
print("id(b) =", id(b))

id(a) = 2078653567872
id(b) = 2078653671168


We have now made a new object that is a shallow copy of `a`: `a` and `b` have different `id`s, so changing `b[4]` left `a` unchanged.
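For lists specifically, `copy.copy` is not the only way to get a shallow copy. The sketch below, assuming a plain `list`, compares it with three built-in idioms that also produce shallow copies: the `list.copy()` method, a full slice `a[:]`, and the `list()` constructor. The name `copies` is just for illustration. Each result is a new list object whose elements are the very same objects as in the original.

```python
import copy

a = [1, 2, 3, 4, 5]

# Four ways to make a shallow copy of a list
copies = [
    copy.copy(a),   # the copy module
    a.copy(),       # the list.copy() method
    a[:],           # a full slice
    list(a),        # the list constructor
]

for c in copies:
    # equal contents, but a different list object from a
    print(c == a, c is a)   # True False
```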

#### An issue with shallow copies

Shallow copies run into trouble with nested data structures, that is, objects that contain other mutable objects, for example `list`s of `list`s.

In [32]:
a = [[1], [2], [3], [4], [5]]
b = copy.copy(a)

## now we will set the first entry (index 0) of
## the first inner list of b (index 0) to be 100
b[0][0] = 100

In [33]:
a

[[100], [2], [3], [4], [5]]

In [34]:
b

[[100], [2], [3], [4], [5]]

The issue is that `copy.copy` does not copy the inner lists: the new outer list `b` holds references to the very same inner lists as `a`. We can see this by examining the `id` of the `0` entries of `a` and `b`.

In [35]:
id(a[0])

2078653596544

In [36]:
id(b[0])

2078653596544
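Sharing the inner lists only causes surprises when one of them is <i>mutated</i>. The sketch below, using small illustrative lists, checks that every inner list is shared, then contrasts replacing a whole entry of `b` (which only changes `b`'s own outer list) with appending to a shared inner list (which shows up in both).

```python
import copy

a = [[1], [2], [3]]
b = copy.copy(a)

# The outer lists are different objects...
print(a is b)                              # False
# ...but every inner list is shared between them
print(all(x is y for x, y in zip(a, b)))   # True

# Replacing an entire entry of b only changes b's outer list
b[1] = [200]
print(a)             # [[1], [2], [3]]

# Mutating a shared inner list shows up through both a and b
b[0].append(100)
print(a)             # [[1, 100], [2], [3]]
print(b)             # [[1, 100], [200], [3]]
```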

Luckily, there is a way to make true copies of more complex data structures as well.

### Deep copies with `copy.deepcopy()`

Unlike a shallow copy, a deep copy makes a new object and then recursively fills it with copies of the objects found in the original, so nested objects such as inner lists are copied too. Deep copies can be made with the `copy.deepcopy()` function, <a href="https://docs.python.org/3/library/copy.html#copy.deepcopy">https://docs.python.org/3/library/copy.html#copy.deepcopy</a>.
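One detail described on the same documentation page: `copy.deepcopy` keeps a memo of the objects it has already copied during a single call. The sketch below (the names `inner`, `c`, and `d` are illustrative) shows two consequences. If the original refers to the same inner list twice, the copy refers to one new inner list twice, and a list that contains itself can be deep-copied without infinite recursion.

```python
import copy

inner = [1, 2]
a = [inner, inner]        # the same inner list appears twice
b = copy.deepcopy(a)

print(b[0] is a[0])       # False: the inner list was copied
print(b[0] is b[1])       # True: the copy keeps the shared reference

# A list that contains itself
c = [1, 2]
c.append(c)
d = copy.deepcopy(c)
print(d[2] is d)          # True: the copy refers to itself...
print(d[2] is c)          # False: ...not to the original
```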

Let's return to our `list` of `list`s example.

In [38]:
a = [[1], [2], [3], [4], [5]]
b = copy.deepcopy(a)

b[0][0] = 100

In [39]:
a

[[1], [2], [3], [4], [5]]

In [40]:
b

[[100], [2], [3], [4], [5]]

Now we can alter the `list`s inside of the main `list` of `b` without altering the content of `a`.
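To mirror the `id` check we ran for the shallow copy, the sketch below repeats the deep copy and confirms that neither the outer list nor any of the inner lists are shared between `a` and `b`.

```python
import copy

a = [[1], [2], [3], [4], [5]]
b = copy.deepcopy(a)

# Neither the outer list nor any inner list is shared
print(a is b)                              # False
print(any(x is y for x, y in zip(a, b)))   # False
print(id(a[0]) == id(b[0]))                # False

b[0][0] = 100
print(a[0], b[0])                          # [1] [100]
```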


The concepts of shallow and deep copies will come up a lot when we shift to data science topics, particularly data splits.


## The `del` Statement

When you think you are done with a particular variable, you may want to delete it. This can be done in python with a `del` statement, <a href="https://docs.python.org/3/reference/simple_stmts.html#del">https://docs.python.org/3/reference/simple_stmts.html#del</a>.

In [41]:
## To use a del statement, type del, a space, and then
## the variable name(s) you want deleted;
## separate multiple names with commas
del a, b

Deleting a variable in this way removes the variable name and its pointer to the data object stored in your memory. <i>Note: this does not directly delete the object from your computer's memory. Once no variables (or other references) point to an object, python will delete it; in CPython, the standard implementation, this usually happens as soon as the last reference disappears, for example right after the `del` statement is executed.</i>
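The sketch below illustrates the note above with small illustrative names (`name_still_exists` is just a flag for the demonstration): after `del a`, the name `a` is gone and using it raises a `NameError`, but the list itself survives because `b` still points to it.

```python
a = [1, 2, 3]
b = a            # a second name for the same list

del a            # removes the name a; the list itself is untouched

try:
    a
    name_still_exists = True
except NameError:
    name_still_exists = False   # referring to a now raises NameError

print(name_still_exists)  # False
print(b)                  # [1, 2, 3]: still alive, because b refers to it
```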

--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph. D., 2023.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md)