# NumPy indexing and slicing

Often we only need to work with part of an array. Imagine, for example, that we have a large image of 1000x1000 pixels (i.e. an array of shape 1000x1000) but are only interested in the central part of the image, where there is a cat. We want to be able to crop the image by specifying which rows and columns to keep. Or we might only be interested in the values of the array above a certain threshold. Cropping and extracting parts of arrays is exactly what indexing and slicing allow us to do. This is a large topic, so if you want to know more, read the full reference: [NumPy reference](https://numpy.org/doc/stable/reference/arrays.indexing.html#indexing).

In [1]:
import numpy as np

We first create an array:

In [2]:
my_array = np.random.normal(size=10)
my_array

array([-0.54724939, -0.19383541, -1.65671735, -1.44714229, -1.36232639,
        1.52816418,  1.23520561, -0.96858971,  0.10066735, -2.34165528])

## Extracting and setting elements

The standard way to extract information from an array is to use the square bracket notation. For example, if we want to extract the second element of the array, we write:



In [3]:
my_array[1]

-0.19383541440161517

Remember that **we start counting from 0** in Python, which is why the *second* element has index 1.

We can extend the notation and extract a range of elements by using the ```from_index:to_index (excluded)``` notation. Here ```excluded``` means that the **last index** specified is **not included**. For example if we want to recover elements with indices from 1 to 3 we write:

In [4]:
my_array[1:4]

array([-0.19383541, -1.65671735, -1.44714229])

We can also set values in the array in the same manner. For example, let's set the above elements to 10:

In [5]:
my_array[1:4] = 10

In [6]:
my_array

array([-0.54724939, 10.        , 10.        , 10.        , -1.36232639,
        1.52816418,  1.23520561, -0.96858971,  0.10066735, -2.34165528])
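As a small illustrative sketch of slice assignment (the array ```demo``` is hypothetical example data, not part of the notebook above): a single value is repeated over the whole slice, while a sequence is accepted when its length matches the length of the slice.

```python
import numpy as np

# Hypothetical example data: six zeros, so the changes are easy to see
demo = np.zeros(6)

# A single value is written into every position of the slice (indices 1, 2, 3)
demo[1:4] = 10

# A sequence works too, as long as its length matches the slice (indices 4, 5)
demo[4:6] = [7, 8]

print(demo)
```

After these two assignments the array holds 0, 10, 10, 10, 7, 8.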

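Note that you can sometimes simplify the notation. For example, if you want to extract all elements from index 4 (the fifth element) **to the last one**, you don't have to specify the last index; you can simply leave it out. The second ```:``` in the example below introduces an optional step, which is also left empty, so ```my_array[4::]``` is equivalent to ```my_array[4:]```: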


In [7]:
my_array[4::]

array([-1.36232639,  1.52816418,  1.23520561, -0.96858971,  0.10066735,
       -2.34165528])
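The shorthand forms can be compared side by side on a small deterministic array. This is only a sketch; ```demo``` and the other variable names are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical example data: the integers 0..9, so each value equals its index
demo = np.arange(10)

from_four = demo[4:]       # end omitted: index 4 up to the last element
same_result = demo[4::]    # end and step both omitted: same elements as above
first_three = demo[:3]     # start omitted: indices 0, 1, 2
every_second = demo[::2]   # start and end omitted, step of 2: indices 0, 2, 4, 6, 8

print(from_four, same_result, first_three, every_second)
```

Because each value equals its index here, the printed values directly show which positions each slice selects.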

## Higher dimensions

We have seen before that we can create arrays with more than one dimension (think e.g. of the pixels of an image). For example:

In [8]:
array2D = np.random.normal(size=(3,5))
array2D

array([[-0.11452065,  0.18395542,  1.33973537,  0.81286545, -0.65965307],
       [-0.71195185, -0.85884832,  1.33369661, -0.15143385,  0.40138647],
       [-1.03786857, -1.48841381, -0.52172333, -0.1005757 ,  0.10502931]])

The indexing system works in the same way here. We just have to specify now for each dimension which rows/columns we want to extract with ```my_array[start_row:end_row, start_column:end_column]```:

In [9]:
array2D[1:3, 0:2]

array([[-0.71195185, -0.85884832],
       [-1.03786857, -1.48841381]])

Here again, we can simplify the notation. If we want to select a few rows but **want to keep all columns**, we can again use the ```:``` notation like this:

In [10]:
array2D[1:3, :]

array([[-0.71195185, -0.85884832,  1.33369661, -0.15143385,  0.40138647],
       [-1.03786857, -1.48841381, -0.52172333, -0.1005757 ,  0.10502931]])
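To make the row/column selection easier to follow than with random numbers, here is a sketch on a small grid whose values are known in advance. The array ```grid``` is a hypothetical stand-in for an image; the last line mimics cropping the central part, as in the cat example from the introduction.

```python
import numpy as np

# Hypothetical stand-in for an image: 3 rows x 5 columns holding 0..14
grid = np.arange(15).reshape(3, 5)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]]

block = grid[1:3, 0:2]    # rows 1-2, columns 0-1 -> [[5, 6], [10, 11]]
all_cols = grid[1:3, :]   # rows 1-2, every column
one_col = grid[:, 2]      # a plain integer index drops that dimension -> [2, 7, 12]
center = grid[1:2, 1:4]   # "crop" of the middle row, columns 1-3 -> [[6, 7, 8]]

print(block.shape, all_cols.shape, one_col.shape, center.shape)
```

Note the difference between ```grid[:, 2]```, which returns a 1D array, and a slice such as ```grid[:, 2:3]```, which keeps both dimensions.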

## Working with sub-parts

Using indexing, we can also create a smaller array that we want to work on specifically. For example, let's say we are only interested in the 8th to 10th elements (indices 7 to 9). We can **extract** them and **assign** them to a new array:

In [11]:
sub_array = my_array[7:10]

In [12]:
my_array

array([-0.54724939, 10.        , 10.        , 10.        , -1.36232639,
        1.52816418,  1.23520561, -0.96858971,  0.10066735, -2.34165528])

In [13]:
sub_array

array([-0.96858971,  0.10066735, -2.34165528])

Let's now modify an element of this subarray:

In [14]:
sub_array[0] = 100

Let's check that ```sub_array``` has indeed changed:

In [15]:
sub_array

array([100.        ,   0.10066735,  -2.34165528])

Let's now also have a look at the original array:

In [16]:
my_array

array([ -0.54724939,  10.        ,  10.        ,  10.        ,
        -1.36232639,   1.52816418,   1.23520561, 100.        ,
         0.10066735,  -2.34165528])

**The value in the original array has changed too!** The reason is that slicing an array **does not create an independent sub-array**: the slice is a *view* that shares its data with the original. Modifying elements of the slice in place, as above, therefore also modifies the original array, whereas assigning an entirely new array to the name ```sub_array``` does not. To be on the safe side, explicitly create a **copy** when creating a sub-array. That way it will be independent from the original one:

In [17]:
sub_array = my_array[7:10].copy()
sub_array[0] = 200

In [18]:
sub_array

array([ 2.00000000e+02,  1.00667355e-01, -2.34165528e+00])

In [19]:
my_array

array([ -0.54724939,  10.        ,  10.        ,  10.        ,
        -1.36232639,   1.52816418,   1.23520561, 100.        ,
         0.10066735,  -2.34165528])
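One way to check whether a sub-array is still linked to the original is ```np.shares_memory```. The following is a minimal sketch on hypothetical example data (```original```, ```view``` and ```independent``` are illustrative names), showing which kinds of modification reach the original array and which do not.

```python
import numpy as np

# Hypothetical example data: floats 0.0 .. 9.0
original = np.arange(10.0)

view = original[7:10]                # plain slice: shares data with `original`
independent = original[7:10].copy()  # explicit copy: has its own data

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False

view[0] = 100        # in-place write through the slice: `original[7]` changes
independent[1] = -1  # write into the copy: `original` is untouched

# This creates a brand-new array and rebinds the name `view`;
# it does not write into `original`
view = view * 2

print(original[7:10])  # values are now 100.0, 8.0, 9.0
```

The in-place assignment is the only operation here that alters ```original```; the arithmetic on the last lines produces a new array instead.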

## Boolean indexing

Instead of using numerical indices to extract values from the array, we can also select them by some criteria. Let's create a new random array:

In [20]:
my_array2 = np.random.normal(size=10)
my_array2

array([-1.7161294 ,  0.05616971,  0.48104343,  0.64086374,  0.21503675,
       -0.26408875,  1.53445755, -0.79749931,  1.12437233,  1.46047048])

How do we proceed if, for example, we only want to recover the elements that are larger than 0?

Let's see what happens when we just write it down as we would in regular mathematics:

In [21]:
my_array2 > 0

array([False,  True,  True,  True,  True, False,  True, False,  True,
        True])

We see that the output is again an array, but instead of being filled with numbers, it contains only ```False``` and ```True```. Those values also exist in plain Python and are called booleans. For example:

In [22]:
a = 3
a > 10

False

We can now create an actual boolean array:

In [23]:
bool_array = my_array2 > 0
bool_array

array([False,  True,  True,  True,  True, False,  True, False,  True,
        True])

We can now use this **boolean array** ```bool_array``` to extract values from any array of the same shape. Imagine that you superimpose ```bool_array``` on another array ```value_array``` and only select those values in ```value_array``` whose position is ```True``` in ```bool_array```. Naturally, we can do this with the original array itself. Instead of passing an index as in ```my_array[i]```, we pass the entire ```bool_array```:

In [24]:
from IPython.display import Image
Image(url='https://github.com/guiwitz/ISDAwPython_day2/raw/master/images/logical_indexing.jpeg',width=700)

In [25]:
my_array2[bool_array] 

array([0.05616971, 0.48104343, 0.64086374, 0.21503675, 1.53445755,
       1.12437233, 1.46047048])

Naturally, this output array is smaller than the original one, as it only contains the values larger than 0 (7 of the 10 elements here).
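The same idea can be sketched on a small array with known values. Everything here (```values```, ```labels```, ```mask```, ```clipped```) is hypothetical example data; the sketch shows that the comparison can be written directly inside the brackets, that a mask can be applied to a different array of the same shape, and that a mask can also be used to set values.

```python
import numpy as np

# Hypothetical example data: two arrays with the same shape
values = np.array([-1.5, 0.2, 3.0, -0.7, 4.1])
labels = np.array(["a", "b", "c", "d", "e"])

mask = values > 0                # boolean array: [False, True, True, False, True]

positives = values[mask]         # keeps 0.2, 3.0, 4.1
inline = values[values > 0]      # same selection without a named mask
positive_labels = labels[mask]   # mask applied to another array: 'b', 'c', 'e'

# Masks also work for setting values; work on a copy to keep `values` intact
clipped = values.copy()
clipped[clipped < 0] = 0         # negative entries become 0

print(mask.sum())                # number of True entries: 3
```

Unlike a plain slice, indexing with a boolean array returns a new array rather than a view, so modifying ```positives``` would not change ```values```.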

## Exercise

1. Create a numpy array with values from 0 to 10 in steps of 0.5

2. Extract the last three elements of the array (without manually setting the array length).

3. Apply a cosine function to the array and store the output in a new array.

4. Create a boolean array telling which values in the array from (3) are smaller than 0.

5. Recover only those values in a new array via indexing.