# Views and Copies

In our previous reading, we talked about how we could not just look at subsets of vectors, but also store those subsets in a new variable. For example, we could pull the middle entry of one vector and assign it to a new vector:

In [1]:
import numpy as np
a = np.array([42, 47, -1])
a

array([42, 47, -1])

In [2]:
new = a[1]
new

47

At the time, we illustrated what was happening with this set of pictures:

![vector_subsetting1](../week_2/img/C2W2-30_subsetting_part1.png)
![vector_subsetting2](../week_2/img/C2W2-30_subsetting_part2.png)
![vector_subsetting3](../week_2/img/C2W2-30_subsetting_part3.png)

And that was *close* to the truth about what was going on, but it wasn't quite the *full* truth. 

The reality is that when we create a subset in numpy and assign it to a new variable, the variable is usually not assigned a *copy* of the values in the subset. Instead, it is assigned a *reference* to the subset, something that looks more like this:

[new image with referring arrow]

When numpy creates a reference to a subset of an existing array, that reference is called a *view*, because it's not a copy of the data in the original array, but a way of referring back to it: it provides a *view* onto a subset of the original array.

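Why is this distinction important? Because it means that both variables, the original array and its subset, are referencing the same data, and so changes made through one variable will propagate to the other. (One caveat: pulling a single entry out of a one-dimensional vector, as in `new = a[1]` above, gives you a standalone number rather than a view, so the sharing described here applies when the subset is itself an array, as in the examples that follow.)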

To illustrate in more detail, let's create two new vectors: `my_vector` and `my_subset`, where `my_subset` (as the name implies) is just a subset of `my_vector`:

In [3]:
my_vector = np.array([1, 2, 3, 4])
my_vector

array([1, 2, 3, 4])

In [4]:
my_subset = my_vector[1:3]
my_subset

array([2, 3])

Now suppose we change the first entry of `my_subset` to be `-99`:

In [5]:
my_subset[0] = -99

Since the first entry in `my_subset` is just a reference to the second entry in `my_vector`, the change we made to `my_subset` will also propagate to `my_vector`:

In [6]:
my_vector

array([  1, -99,   3,   4])

And just as edits to `my_subset` will propagate to `my_vector`, so too will edits to `my_vector` propagate forward to `my_subset`:

In [29]:
my_vector[2] = 42
my_subset

array([-99,  42])
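
The same logic extends to more than one view of the same array. The following is a minimal sketch (the names `first_three` and `last_three` are just illustrative) showing that an edit made through one view shows up in the original vector and in any other view that covers the same entry:

```python
import numpy as np

my_vector = np.array([1, 2, 3, 4])
first_three = my_vector[0:3]  # view onto entries 0, 1, 2
last_three = my_vector[1:4]   # view onto entries 1, 2, 3

# Edit the second entry of my_vector through the first view.
first_three[1] = -99

print(my_vector)   # [  1 -99   3   4]
print(last_three)  # [-99   3   4]
```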

## Why? Why Would Numpy Do This?

It is not uncommon, when they are first introduced to this behavior, for students to feel a little betrayed by numpy. "*Why*," they ask, "why would numpy do something that makes it so much harder to keep track of the consequences of changes I make to my data?"

The short answer, as with most things in numpy, is that it's all about speed. Creating a new copy of the data contained in the subset of a vector takes time and memory, and so creating *views* instead of copies makes numpy faster.
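
If you'd like to see the difference for yourself, here is a rough sketch using Python's built-in `timeit` module. The array size and number of repetitions are arbitrary choices for illustration, and the exact timings will vary from machine to machine, so treat the numbers as indicative only:

```python
import timeit

import numpy as np

big = np.arange(1_000_000)

# Slicing only builds a small array object that points at big's data.
view_seconds = timeit.timeit(lambda: big[100:900_000], number=100)

# .copy() also has to allocate new memory and duplicate every element.
copy_seconds = timeit.timeit(lambda: big[100:900_000].copy(), number=100)

print(f"view: {view_seconds:.6f}s  copy: {copy_seconds:.6f}s")
```

On a typical machine the copy version should take noticeably longer, and the gap grows with the size of the subset.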

## When do I get a view, and when do I get a copy?

Because numpy will *usually* create views when you subset a vector, and changes to views will propagate to the vectors associated with other variables, it's really important to keep track of whether the object you're working with is a view or a copy.

Which brings us to the next slightly frustrating thing about numpy: the way that you ask for a subset will determine whether you get a view or a copy.


### Views and Copies from Subsetting

Generally speaking, **numpy will give you a *view* if you ask for a simple slice of an array,** but it will provide a *copy* if you use any other method. A "simple slice" is a range of indices separated by a `:`, so `my_vector[2:4]` and `my_vector[:3]` are both simple slices. (A single index into a vector, as in `my_vector[2]`, returns a standalone number rather than a view.)

So, for example, this slice returns a view:

In [18]:
my_array = np.array([1, 2, 3])
my_slice = my_array[1:3]
my_slice

array([2, 3])

In [19]:
my_slice[0] = -1
my_array

array([ 1, -1,  3])

But if you ask for a subset any other way—such as with "fancy indexing" (where you pass a list when making your slice) or Boolean subsetting—you will NOT get a view, you will get a copy. As a result, changes made to your subset will not propagate back to `my_array`:

In [23]:
my_array = np.array([1, 2, 3])
my_slice = my_array[[1,2]]
my_slice[0] = -1
my_array

array([1, 2, 3])

In [24]:
my_array = np.array([1, 2, 3])
my_slice = my_array[my_array >= 2]
my_slice[0] = -1
my_array

array([1, 2, 3])
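
If you'd rather not edit anything just to find out what you have, one option is numpy's `np.shares_memory` function, which reports whether two arrays overlap in memory. The sketch below applies it to the three subsetting styles from this section (the variable names are just illustrative):

```python
import numpy as np

my_array = np.array([1, 2, 3])

from_slice = my_array[1:3]           # simple slice
from_list = my_array[[1, 2]]         # fancy indexing
from_mask = my_array[my_array >= 2]  # Boolean subsetting

print(np.shares_memory(my_array, from_slice))  # True
print(np.shares_memory(my_array, from_list))   # False
print(np.shares_memory(my_array, from_mask))   # False
```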

### Views and Copies When Editing

We established above that numpy will only return a view when you subset with simple slices, but not when you use fancy indexing or Boolean subsetting. 

But it's also important to understand what types of modifications of a view will result in changes that propagate back to the original array. 

If you modify a view with a simple slice on the left-hand side of the assignment operator (e.g., `my_slice[0] = ...` or `my_slice[0:2] = ...`), that change will propagate back to the original array (`my_array`).

But if we modify our vector and assign it to `my_slice` *without* that simple slice on the left-hand side of the assignment operator, numpy will actually just create a new vector and assign it to the variable, not modify entries in our current vector. So in the following example, when numpy sees `my_slice * 2` it just creates a new vector with values equal to double the values in `my_slice`, then assigns that vector to the variable `my_slice`—it doesn't modify the data originally associated with `my_slice` (which is the same data underlying `my_array`):

In [25]:
my_array = np.array([1, 2, 3])
my_slice = my_array[1:3]
my_slice = my_slice * 2
my_slice

array([4, 6])

In [26]:
my_array

array([1, 2, 3])

If you ever do want to do a full-array manipulation and preserve your view, you can use square brackets containing just `:` on the left side of the assignment operator:

In [27]:
my_array = np.array([1, 2, 3])
my_slice = my_array[1:3]
my_slice[:] = my_slice * 2
my_slice

array([4, 6])

In [28]:
my_array

array([1, 4, 6])
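
A related case worth knowing about is augmented assignment (`*=`, `+=`, and so on). For numpy arrays these operators write their result into the existing data rather than creating a new array, so they also preserve the view. A minimal sketch, reusing the same setup as above:

```python
import numpy as np

my_array = np.array([1, 2, 3])
my_slice = my_array[1:3]

# In-place multiply: the result is written into the data shared with my_array.
my_slice *= 2

print(my_slice)  # [4 6]
print(my_array)  # [1 4 6]
```

So `my_slice *= 2` behaves like `my_slice[:] = my_slice * 2`, not like `my_slice = my_slice * 2`.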

## Making a Copy

Of course, this type of propagating behavior is not always desirable, and so if one wishes to pull a subset of a vector (or array) that is a full copy and not a view, one can just use the `.copy()` method:

In [14]:
my_vector = np.array([1, 2, 3, 4])
my_subset = my_vector[1:3].copy()
my_subset

array([2, 3])

In [15]:
my_subset[0] = -99
my_subset

array([-99,   3])

In [16]:
my_vector

array([1, 2, 3, 4])
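
One place this comes in handy is inside functions, where it's easy to forget that an argument may be shared with the caller. The sketch below uses a hypothetical helper, `doubled_middle` (a name made up for this illustration), which copies its subset before editing so that the caller's vector is left alone:

```python
import numpy as np


def doubled_middle(vector):
    """Hypothetical helper: return the middle entries doubled, leaving `vector` untouched."""
    middle = vector[1:-1].copy()  # copy first, so the edit below stays local
    middle[:] = middle * 2
    return middle


my_vector = np.array([1, 2, 3, 4])
result = doubled_middle(my_vector)

print(result)     # [4 6]
print(my_vector)  # [1 2 3 4]
```

Without the `.copy()`, the same function would have changed `my_vector` to `[1, 4, 6, 4]`.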

## Checking If An Array is a View or Copy

In all the examples above, we showed whether we were working with a view or a copy by making an edit to the array in question and then looking at whether the original data had changed. But surely there's a more direct way to check if an array is a view or a copy, right?!

Well... kind of. But... well, it gets really complicated. In fact, it gets so complicated that we don't actually think it's worth learning. So if you're just curious, please read on, but **don't feel like you need to wrestle with what follows if you don't want to!**

Every numpy array has an attribute called `.base`, which points back to the array whose data it references. If you ask for `.base` (e.g., `my_array.base`) and you get back `None`, then you can be confident that what you're working with is not a view of a different array:

In [31]:
my_vector = np.array([1, 2, 3, 4])
my_vector.base is None

True

*But* that doesn't mean that there aren't other arrays that are views of `my_vector` which, if edited, would change `my_vector`, or that edits to `my_vector` wouldn't impact other arrays that are views of `my_vector`!

In [32]:
my_slice = my_vector[0:2]
my_slice[0] = -99
my_vector

array([-99,   2,   3,   4])

So yes, if you create a new array and immediately find that its `.base` is `None`, you know you just created a copy.

But it isn't necessarily the case that if `.base` is *not* `None` you need to worry. While finding that `.base` is not `None` does mean you're working with a view, it's possible that nothing else points to the original underlying data, and so changes to your array can't impact anything else (and no changes elsewhere in your code can impact your array).

To illustrate, consider the following example:

In [34]:
my_vector = np.array([1, 2, 3, 4])
my_vector.base is None

True

In [35]:
my_vector = my_vector[2:4]
my_vector.base is None

False

Here we created an array with values 1, 2, 3, and 4, then took a subset containing the third and fourth entries. This subset is a view of the original data, and so `.base` is not `None`. But because we assigned the subset back to the variable `my_vector`, the original array of 1, 2, 3, and 4 still exists in memory, yet nothing can reach it except through `my_vector`. So you don't have to worry about changes elsewhere in your code propagating to `my_vector`, even though it is technically a view.
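
When you do still have a variable pointing at the original array, `.base` can also tell you *which* array a view refers back to. The following is a small sketch under that assumption (the names `original`, `a_view`, and `a_copy` are just illustrative), using Python's `is` operator to compare identities:

```python
import numpy as np

original = np.array([1, 2, 3, 4])
a_view = original[1:3]         # simple slice, so a view
a_copy = original[1:3].copy()  # explicit copy

# The view's base is the very array it was sliced from.
print(a_view.base is original)  # True

# The copy owns its own data, so it has no base.
print(a_copy.base is None)      # True
```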