<a href='http://www.scienceacademy.ca'> <img style="float: left;height:70px" src="Logo_SA.png"></a>

# NumPy Essentials (Part: 2)
## Indexing, slicing, broadcasting & boolean masking

Hi Guys,<br>
Welcome to the NumPy Essentials lecture part 2.<br>

**In this lecture, we will learn:**
* How to reference elements within an array and how to assign values to them. *(NumPy indexes are zero-based: the first element in a row is referenced using zero.)* 
* How to select an element or a group of elements from a NumPy array. 
* Broadcasting 
* Boolean mask arrays and the masking operation

**Let's move on and create a new notebook to explore more about these concepts in NumPy.**

&#9989; *Please note that this notebook can be considered a reference for the video lecture. You can always come back and explore this notebook.*  


In [1]:
# first things first, import the library
import numpy as np

### Indexing & slicing of 1-D arrays (vectors)

In [2]:
# Let's create a simple 1-D NumPy array.
# (we could use arange() as well.) 
array_1d = np.array([-10, -2, 0, 2, 17, 106,200])

In [3]:
array_1d

array([-10,  -2,   0,   2,  17, 106, 200])

In the simplest case, selecting one or more elements of a NumPy array looks very similar to Python lists.

In [4]:
# Getting value at certain index
array_1d[0]

-10

In [5]:
# Getting a range of values
array_1d[0:3], array_1d
# array_1d is included in the output for comparison

(array([-10,  -2,   0]), array([-10,  -2,   0,   2,  17, 106, 200]))

In [6]:
# Using a negative index
array_1d[-2], array_1d
# array_1d is included in the output for comparison

(106, array([-10,  -2,   0,   2,  17, 106, 200]))

In [7]:
# Using a negative index in a range
array_1d[1:-2], array_1d # 1 is inclusive and -2 is exclusive in this case

(array([-2,  0,  2, 17]), array([-10,  -2,   0,   2,  17, 106, 200]))

In [8]:
# Getting values up to and from a certain index -- remember that the index starts from '0'
# (no need to give both start and stop indexes)
array_1d[:2], array_1d[2:]

(array([-10,  -2]), array([  0,   2,  17, 106, 200]))

In [9]:
# Assigning a new value to a certain index in the array 
array_1d[0] = -102

In [10]:
array_1d
# The first element is changed to -102

array([-102,   -2,    0,    2,   17,  106,  200])

## Good to know!
**If the index does not exist, we get an IndexError.**
        
        array_1d[305]        
>***IndexError: index 305 is out of bounds for axis 0 with size 7***

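**There is a way to avoid such errors.**
We can take the index modulo the size of the array and pass the result to array_1d. The index then wraps around instead of raising an error: 305 % 7 is 4, so the expression below returns the element at index 4. <br>
*We may not use this trick in this course, but it's useful to know.*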

        array_1d[305 % array_1d.size] 
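
As a minimal sketch of the two options above (wrapping the index, or catching the error), assuming the array still holds the values shown in Out [10]; the names `wrapped_index`, `wrapped_value` and `value` are only illustrative:

```python
import numpy as np

array_1d = np.array([-102, -2, 0, 2, 17, 106, 200])

# 305 % 7 == 4, so the lookup wraps around to index 4
wrapped_index = 305 % array_1d.size
wrapped_value = array_1d[wrapped_index]
print(wrapped_index, wrapped_value)

# Alternative: keep the original index and handle the error explicitly
try:
    value = array_1d[305]
except IndexError as err:
    value = None
    print("Caught:", err)
```

Keep in mind that wrapping silently returns a different element than the one asked for, so catching the `IndexError` is usually the safer choice when an out-of-range index signals a mistake.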

###  Indexing & slicing 2-D arrays (matrices)

Let's create an array with 24 elements using arange() and convert it to a 2-D matrix by setting its `shape` attribute.<br>
*Note: 6 x 4 = 24.*

In [11]:
array_2d= np.arange(24)
array_2d.shape = (6,4)
array_2d

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

To access any element, the general format is: <br>
* **`array_2d[row][col]`** <br>or<br> 
* **`array_2d[row,col]`**. 

We will use `[row,col]`; the comma notation is clearer.

In [12]:
# To get a complete row
array_2d[2]

array([ 8,  9, 10, 11])

In [13]:
array_2d[-4] # negative index: with 6 rows, -4 refers to the same row as index 2

array([ 8,  9, 10, 11])

In [14]:
# To get an individual element value at row = 5 and column = 2
array_2d[5,2]

22

In [15]:
# another way 
row = 5
column = 2
array_2d[row, column]

22

In [16]:
# Just to make sure, using [row][col] :)
array_2d[5][2]

22

In [17]:
# 2D array slicing
array_2d[:2,:2] # array_2d[:2,:2].shape gives (2,2): the 4 elements in the top-left corner
# array_2d[0:2,0:2] is the same as array_2d[:2,:2]

array([[0, 1],
       [4, 5]])

In [18]:
array_2d[2:4,2:4] # inner slice

array([[10, 11],
       [14, 15]])
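
A bare `:` selects everything along an axis, so the same slicing notation can also take a whole column or the last few rows. A short sketch, rebuilding the 6 x 4 matrix with `reshape()` (used here only as a shorthand for setting `shape`); the variable names are illustrative:

```python
import numpy as np

array_2d = np.arange(24).reshape(6, 4)

second_column = array_2d[:, 1]     # every row, column at index 1
last_two_rows = array_2d[-2:, :]   # rows at index 4 and 5, every column

print(second_column)
print(last_two_rows)
```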

### Broadcasting
NumPy arrays differ from normal Python lists in their ability to broadcast. We will only cover the basics; for further details on the broadcasting rules, click [here](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) <br>
Another good read on [broadcasting](https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html)!<br>

**Let's start with some simple examples:**

In [19]:
# Let's create an array using arange()
array_1d = np.arange(0,10)
array_1d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Take a slice of the array and set it equal to some number, say 500.<br>

        array_1d[0:5] = 500 
This will **broadcast the value of 500 to the first 5 elements** of array_1d.

In [20]:
array_1d[0:5] = 500 
array_1d

array([500, 500, 500, 500, 500,   5,   6,   7,   8,   9])
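
To see how this differs from a plain Python list, here is a small comparison sketch. The list `values` is a hypothetical stand-in introduced only for this illustration:

```python
import numpy as np

values = list(range(10))
try:
    values[0:5] = 500          # a list slice needs an iterable on the right-hand side
except TypeError as err:
    print("List:", err)

array_1d = np.arange(0, 10)
array_1d[0:5] = 500            # NumPy stretches the scalar over the slice
print("Array:", array_1d)
```

The list assignment raises a `TypeError` and leaves the list unchanged, while the NumPy array accepts the scalar and repeats it across the slice.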

In [21]:
# Let's create a 2-D matrix of ones
array_2d = np.ones((4,4))
array_2d

array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

In [22]:
# Let's broadcast 300 to the first row of array_2d
array_2d[0] = 300
array_2d

array([[ 300.,  300.,  300.,  300.],
       [   1.,    1.,    1.,    1.],
       [   1.,    1.,    1.,    1.],
       [   1.,    1.,    1.,    1.]])

In [23]:
# Let's create a simple 1-D array and broadcast it to array_2d
array_2d + np.arange(0,4)
# Try array_2d + np.arange(0,3). Did it work? If not, why?

array([[ 300.,  301.,  302.,  303.],
       [   1.,    2.,    3.,    4.],
       [   1.,    2.,    3.,    4.],
       [   1.,    2.,    3.,    4.]])

In [24]:
array_2d + 300
# Try array_2d + [300,2]. Did it work? If not, why?

array([[ 600.,  600.,  600.,  600.],
       [ 301.,  301.,  301.,  301.],
       [ 301.,  301.,  301.,  301.],
       [ 301.,  301.,  301.,  301.]])
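
The two "try it" questions above can be checked with a short sketch. Broadcasting compares shapes from the trailing dimension backwards, and two dimensions are compatible only if they are equal or one of them is 1. The helper list `failed_shapes` is an illustrative name, not part of the lecture:

```python
import numpy as np

array_2d = np.ones((4, 4))

# Shapes (4, 4) and (4,) are compatible: the trailing dimensions match
ok = array_2d + np.arange(0, 4)
print(ok.shape)

# Shapes (3,) and (2,) do not match the trailing dimension 4,
# and neither of them is 1, so NumPy raises a ValueError
failed_shapes = []
for other in (np.arange(0, 3), [300, 2]):
    try:
        array_2d + other
    except ValueError as err:
        failed_shapes.append(np.shape(other))
        print("Broadcast failed:", err)
```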

### Another broadcasting example  

In [25]:
array_1 = np.arange(1,4)
array_2 = np.arange(1,4)[:, np.newaxis]

In [26]:
# Printing with print(); format() and len() are used here for revision
print(array_1) 
print("Shape of the array is: {}, this is {}-D array".format(array_1.shape,len(array_1.shape)))
# (3,) indicates that this is a one dimensional array (vector) 

[1 2 3]
Shape of the array is: (3,), this is 1-D array


In [27]:
# Printing with print(); format() and len() are used here for revision
print(array_2)
print("Shape of the array is: {}, this is {}-D array".format(array_2.shape,len(array_2.shape)))
# (3, 1) indicates that this is a 2-D array (matrix)

[[1]
 [2]
 [3]]
Shape of the array is: (3, 1), this is 2-D array


In [28]:
# Broadcasting arrays
array_1 + array_2

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])
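
One way to picture what happens in this addition is to stretch each operand to the common shape by hand. The sketch below uses `np.broadcast_to` for that; the names `stretched_1` and `stretched_2` are illustrative, and NumPy does not actually need to materialise these copies when it broadcasts:

```python
import numpy as np

array_1 = np.arange(1, 4)                    # shape (3,)
array_2 = np.arange(1, 4)[:, np.newaxis]     # shape (3, 1)

# What each operand looks like once stretched to the common shape (3, 3)
stretched_1 = np.broadcast_to(array_1, (3, 3))   # the row repeated downwards
stretched_2 = np.broadcast_to(array_2, (3, 3))   # the column repeated across

print(stretched_1)
print(stretched_2)
print(stretched_1 + stretched_2)   # same result as array_1 + array_2
```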

This [image](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png) can be very helpful for understanding broadcasting. The code to generate it is available [here](https://jakevdp.github.io/PythonDataScienceHandbook/06.00-figure-code.html#Broadcasting).  
<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png" >
*Please note that the image is linked directly from the source website.*

# Good to know 
## Fancy Indexing
Fancy indexing allows us to select entire rows or columns in any order. <br>
**Let's create a NumPy array_2d to see how it works!** <br>

***Do you remember zeros(), range(), shape and broadcasting? Let's revise these concepts :)***<br>

    array_2d = np.zeros((5,5))
    array_2d[1] = 1 # broadcasting 1 to the 2nd row (index 1)
    array_2d[2] = 2 # broadcasting 2 to the 3rd row (index 2)
    array_2d[3] = 3 # broadcasting 3 to the 4th row (index 3)
    array_2d[4] = 4 # broadcasting 4 to the 5th row (index 4)
    array_2d # see what the matrix looks like!

The above process is **tedious!** Do you think we can use a **for loop?**<br>
*The comments are provided for revision!*

In [29]:
array_2d = np.zeros((5,5))          # Create a zero matrix
n_rows = array_2d.shape[0]          # shape[0] is the number of rows, used to run the loop
for i in range(n_rows):             # using range() in the loop
    array_2d[i] = i                 # broadcast i to row i
array_2d                            # display the matrix

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.]])
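
Since this section revises broadcasting, it is worth noting that the same matrix can be built without a loop. The sketch below is one possible alternative, not the lecture's method; `looped`, `row_numbers` and `broadcasted` are illustrative names:

```python
import numpy as np

# Loop version, as in the cell above
looped = np.zeros((5, 5))
for i in range(looped.shape[0]):
    looped[i] = i

# Loop-free version: add a (5, 1) column of row numbers to a (5, 5) zero matrix
row_numbers = np.arange(5)[:, np.newaxis]
broadcasted = np.zeros((5, 5)) + row_numbers

print(np.array_equal(looped, broadcasted))
```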

In [30]:
array_2d[[1,2,3]]

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.]])

In [31]:
# We can use any order
array_2d[[3,0,4]]

array([[ 3.,  3.,  3.,  3.,  3.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 4.,  4.,  4.,  4.,  4.]])

In [32]:
# Let's try another matrix
array_2d = np.arange(24)
array_2d.shape = (6,4)
array_2d

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [33]:
# grabbing rows
array_2d[[2,4]]

array([[ 8,  9, 10, 11],
       [16, 17, 18, 19]])

In [34]:
# grabbing columns
array_2d[:,[3,2]]

array([[ 3,  2],
       [ 7,  6],
       [11, 10],
       [15, 14],
       [19, 18],
       [23, 22]])
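
Row and column lists can also be combined. A brief sketch of two common patterns, with `paired` and `block` as illustrative names: passing two lists at once pairs them up element by element, while chaining a row selection and a column selection keeps a 2-D block.

```python
import numpy as np

array_2d = np.arange(24).reshape(6, 4)

# Two index lists are paired element by element:
# this picks (row 0, col 2) and (row 1, col 3)
paired = array_2d[[0, 1], [2, 3]]

# Selecting rows first and then columns keeps a 2-D block
block = array_2d[[2, 4]][:, [3, 2]]

print(paired)
print(block)
```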

## Boolean mask arrays
A Boolean mask is very handy when we want to count, modify, extract or manipulate values in an array based on a certain condition or criterion, e.g. <br>
* We want to count all the values greater than a certain value. <br>
* We set a threshold and want to get rid of outliers in our data.<br>

In NumPy, Boolean masking is often the most efficient way to accomplish these types of tasks.<br>
Let's start with a very simple example.

In [35]:
# Let's create a simple array using arange()
array_1d = np.arange(1,11)
array_1d

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

We can apply conditions such as >, <, == etc.

In [36]:
# Let's create a bool_array for some condition, say array_1d > 3
bool_array = array_1d > 3
bool_array

array([False, False, False,  True,  True,  True,  True,  True,  True,  True], dtype=bool)
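
A Boolean array like this one already answers the counting use case. The following sketch counts the matches with `np.count_nonzero` and combines two conditions with `&`; the names `count_above_3` and `between` are illustrative:

```python
import numpy as np

array_1d = np.arange(1, 11)
bool_array = array_1d > 3

# Counting the True entries counts the values that meet the condition
count_above_3 = np.count_nonzero(bool_array)
print(count_above_3)

# Conditions can be combined element-wise with & (and) and | (or);
# the parentheses around each condition are required
between = (array_1d > 3) & (array_1d < 8)
print(array_1d[between])
```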

Let's create a mask to **pick out the even numbers in "array_1d"**

In [37]:
# A number is even if number % 2 is 0
mod_2_mask_1d = 0 == array_1d % 2
mod_2_mask_1d

array([False,  True, False,  True, False,  True, False,  True, False,  True], dtype=bool)

### Masking operation
**In a masking operation,** we simply index the array with the Boolean mask "**`mod_2_mask_1d`**". This returns a 1-D array filled with all the values that meet the condition, i.e. all the values at positions where the mask array (mod_2_mask_1d) is "**`True`**". 

In [38]:
# keeping the even values (dropping the odds) in the masking operation
even_values = array_1d[mod_2_mask_1d]
print(even_values)

[ 2  4  6  8 10]


In [39]:
# Let's check with our array_2d
array_2d= np.arange(24)
array_2d.shape = (6,4)
mask_mod_2_2d = 0 == array_2d %2
print(array_2d)
print(mask_mod_2_2d)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
[[ True False  True False]
 [ True False  True False]
 [ True False  True False]
 [ True False  True False]
 [ True False  True False]
 [ True False  True False]]


In [40]:
# keeping the even values (dropping the odds) in the masking operation
print(array_2d[mask_mod_2_2d])

[ 0  2  4  6  8 10 12 14 16 18 20 22]
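
Note that extracting with a 2-D mask returns a flat 1-D array. A mask can also be used on the left-hand side of an assignment, which modifies the matching elements in place and keeps the original shape. The sketch below caps large values at a cut-off; the `threshold` of 20 is a hypothetical value chosen only for illustration:

```python
import numpy as np

array_2d = np.arange(24).reshape(6, 4)
threshold = 20  # hypothetical cut-off, for illustration only

# Work on a copy so the original matrix stays untouched
clipped = array_2d.copy()
clipped[clipped > threshold] = threshold   # only the matching elements change

print(clipped[-1])      # last row after capping
print(clipped.shape)    # still (6, 4)
```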


# Excellent Job!
### A quick review, and we will move on to the next topic