# Introduction to numpy with exercises

NumPy is short for Numerical Python. It provides a multidimensional array type, the *ndarray*, which is comparable to Python's built-in *list*. Unlike a list, an *ndarray* stores elements of a single datatype only, and its operations are optimized for fast scientific computing. We will look at some basic functionality before we move on to pandas, which is built on top of numpy.
I won't explain everything in detail, but I encourage you to look at the documentation of all the functions we use.

First we import numpy. The convention is to import numpy as np. 

In [2]:
import numpy as np

There are several ways to create a numpy array. We will look at some of them in this notebook. First we create an *ndarray* from a list:

In [2]:
my_list = [2,5,7,2,123,56,22,36,122,4,346,251,546]
my_array = np.array(my_list)
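A minimal sketch of a few other ways to create arrays, and of the single-datatype rule from the introduction. The constructors np.zeros, np.ones and np.full are standard numpy functions that this notebook does not otherwise use; the exact integer dtype numpy picks for the list (for example int64) depends on your platform.

```python
import numpy as np

# Same list as above; numpy infers an integer dtype for it
my_array = np.array([2, 5, 7, 2, 123, 56, 22, 36, 122, 4, 346, 251, 546])
print(my_array.dtype)          # e.g. int64 (platform-dependent)

# An ndarray holds one datatype only: mixing ints and a float upcasts everything to float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)             # float64

# Other constructors
zeros = np.zeros(5)            # five 0.0 values
ones = np.ones(3, dtype=int)   # three integer 1s
sevens = np.full(4, 7)         # four 7s
```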

Now we can calculate some statistics on it, using numpy's functions:

In [3]:
#the mean
np.mean(my_array)

117.07692307692308

In [4]:
#the sum
np.sum(my_array)

1522

In [5]:
#the standard deviation
np.std(my_array)

161.16616439060246
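One detail worth knowing about np.std: by default it divides by n, which gives the population standard deviation. The sketch below shows the ddof parameter for the sample version (dividing by n - 1), plus np.median and np.min, which are standard numpy functions not used elsewhere in this notebook.

```python
import numpy as np

my_array = np.array([2, 5, 7, 2, 123, 56, 22, 36, 122, 4, 346, 251, 546])

# By default np.std computes the population standard deviation (ddof=0, divide by n)
population_std = np.std(my_array)

# ddof=1 gives the sample standard deviation (divide by n - 1)
sample_std = np.std(my_array, ddof=1)

# Other summary statistics follow the same pattern
median = np.median(my_array)   # 36.0, the middle value of the 13 sorted elements
smallest = np.min(my_array)    # 2
```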

We can also call methods on a numpy array. The syntax is different, but the result is the same:

In [68]:
my_array.mean()

117.07692307692308

In [67]:
#get maximum of the array
my_array.max()

546
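Every function used above has a method counterpart. A short sketch confirming that both forms agree, and showing argmax, a related method that returns the position of the maximum instead of its value:

```python
import numpy as np

my_array = np.array([2, 5, 7, 2, 123, 56, 22, 36, 122, 4, 346, 251, 546])

# Functions and methods give identical results
assert np.sum(my_array) == my_array.sum()
assert np.max(my_array) == my_array.max()

# argmax returns the position of the maximum, not the value itself
position_of_max = my_array.argmax()
print(position_of_max, my_array[position_of_max])   # 12 546
```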

## Fast computation

Next we create a large array of random integers with numpy's random.randint function. We use this large array to show that numpy's functions are indeed faster than Python's built-in functions.
Look up numpy's random.randint in the documentation to understand how we create the array.

In [13]:
large_array = np.random.randint(1, 5000, 100000)

Let's look at our array:

In [17]:
large_array

array([2081, 1449, 2417, ..., 1868, 1836, 2393])

In [14]:
%%timeit
np.sum(large_array)

42.7 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [15]:
%%timeit
sum(large_array)

7.22 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Here we used the %%timeit cell magic to measure the average running time of the code in the cell. This is only for demonstration purposes; the use of timeit is not part of this course.
We see that numpy's sum function takes about 43 microseconds on average to compute the sum, while Python's built-in sum function takes about 7 milliseconds. In this run numpy is roughly 170 times faster; the exact timings will differ on your computer.
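Outside Jupyter, where %%timeit is not available, the standard-library timeit module can make a similar comparison. A rough sketch; the number of calls (20) is an arbitrary choice, and the timings you get will differ from the ones above.

```python
import timeit
import numpy as np

# Same setup as above: 100,000 random integers from 1 up to (not including) 5000
large_array = np.random.randint(1, 5000, 100000)

# Both functions return the same total...
assert np.sum(large_array) == sum(large_array)

# ...but np.sum loops in compiled code, while sum() iterates element by element in Python.
# timeit.timeit calls the function `number` times and returns the total time in seconds.
n_calls = 20
numpy_seconds = timeit.timeit(lambda: np.sum(large_array), number=n_calls)
python_seconds = timeit.timeit(lambda: sum(large_array), number=n_calls)

print(f"np.sum:       {numpy_seconds / n_calls * 1e6:.1f} µs per call")
print(f"built-in sum: {python_seconds / n_calls * 1e6:.1f} µs per call")
```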

## Slicing, indexing and filtering

As with Python lists, we can use slicing to look at parts of the array. To show this, we first create a new array with *np.arange*. We urge you to take a look at the documentation of np.arange to see what's happening.

In [3]:
my_range = np.arange(1,100)

In [33]:
print(my_range)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
 97 98 99]
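In short, np.arange behaves like Python's range: the stop value is excluded, which is why my_range ends at 99. A small sketch, including the optional step argument:

```python
import numpy as np

# Like Python's range, np.arange(start, stop) excludes the stop value
my_range = np.arange(1, 100)
print(my_range[0], my_range[-1], len(my_range))   # 1 99 99

# An optional third argument sets the step size
evens = np.arange(0, 11, 2)
print(evens)                                      # [ 0  2  4  6  8 10]
```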


In [41]:
#look at the 10th element of the array (note that indexing starts at zero)
print(my_range[9])

10


In [34]:
#print the first 6 elements of the array
print(my_range[:6])

[1 2 3 4 5 6]


In [35]:
#print the last 7 elements of the array
print(my_range[-7:])

[93 94 95 96 97 98 99]


In [36]:
#create a new array with every other element
new_array = my_range[::2]
print(new_array)

[ 1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95
 97 99]


In [38]:
#create a new array with all elements backwards
backwards_array = my_range[::-1]
print(backwards_array)

[99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76
 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52
 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28
 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4
  3  2  1]
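One difference from list slicing often surprises people: slicing a Python list creates a copy, but slicing an ndarray returns a view on the same data. A short sketch of the effect, and of .copy() as the way to get an independent array:

```python
import numpy as np

original = np.arange(1, 100)

# Slicing returns a view: changing the slice changes the original array as well
view = original[:5]
view[0] = -1
print(original[0])          # -1

# .copy() gives an independent array, so the original stays untouched
fresh = np.arange(1, 100)
independent = fresh[:5].copy()
independent[0] = -1
print(fresh[0])             # 1
```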


Unlike Python lists, numpy arrays also support boolean indexing: we put a condition inside the square brackets to select only the elements for which it is True.

In [73]:
#get all elements bigger than 50
my_range[my_range > 50]

array([51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [5]:
#get all even elements
my_range[my_range % 2 == 0]

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
       36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68,
       70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])
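Conditions can also be combined. A short sketch using numpy's element-wise operators & (and), | (or) and ~ (not); Python's and/or keywords do not work element-wise on arrays, and each condition needs its own parentheses because & binds more tightly than the comparisons.

```python
import numpy as np

my_range = np.arange(1, 100)

# The condition itself produces an array of booleans, one per element
mask = my_range > 50
print(mask[:3])      # [False False False]

# Combine conditions with &; each condition is wrapped in parentheses
big_and_even = my_range[(my_range > 50) & (my_range % 2 == 0)]
print(big_and_even)  # the even numbers from 52 to 98

# Summing a boolean array counts the True values
print(mask.sum())    # 49
```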

## Multidimensional arrays

We can also create multidimensional arrays. You can add as many dimensions as you want, but we'll only look at two-dimensional arrays. You can think of a two-dimensional array as a matrix.
For this example, we use numpy's random.randn function to create a matrix, but there are lots of ways to do this. Again, look at the documentation to see what's happening.

In [6]:
my_matrix = np.random.randn(4,5)
print(my_matrix)

[[ 0.33104357 -2.36250819  1.38901189 -0.28846464 -0.29097257]
 [-1.99468553 -0.14413605 -1.00274485  1.74461533  1.16883954]
 [ 1.30902461  0.70832768 -0.13963162  0.30532602 -0.21095704]
 [-0.303024   -0.89279909  1.19600517 -0.16456374  0.95757163]]
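Besides random numbers, a two-dimensional array can be built from a nested list. Because np.random.randn gives different numbers on every run, the sketch also shows a seeded generator via np.random.default_rng (available since NumPy 1.17); the seed 42 is an arbitrary choice.

```python
import numpy as np

# Each inner list becomes one row of the matrix
small_matrix = np.array([[1, 2, 3],
                         [4, 5, 6]])
print(small_matrix.shape)   # (2, 3)

# A generator created with a fixed seed produces the same numbers on every run
rng = np.random.default_rng(seed=42)
reproducible = rng.standard_normal((4, 5))   # 4 rows, 5 columns, like randn(4, 5)
```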


We can look at the shape of the array with its *shape* attribute:

In [7]:
my_matrix.shape

(4, 5)

You can think of this as a matrix with 4 rows and 5 columns. 
We can also extract the number of rows and columns separately:

In [10]:
number_of_rows = my_matrix.shape[0]
number_of_cols = my_matrix.shape[1]
print(f"number of rows of my_matrix: {number_of_rows}")
print(f"number of cols of my_matrix: {number_of_cols}")

number of rows of my_matrix: 4
number of cols of my_matrix: 5
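Because shape is a tuple, both values can also be unpacked in one line. The sketch also shows two related attributes, size (total number of elements) and ndim (number of dimensions):

```python
import numpy as np

my_matrix = np.random.randn(4, 5)

# shape is a tuple, so both values can be unpacked in one line
number_of_rows, number_of_cols = my_matrix.shape

# size is the total number of elements, ndim the number of dimensions
print(my_matrix.size)   # 20
print(my_matrix.ndim)   # 2
```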


We can also use numpy's reshape method to transform a one-dimensional array into a two-dimensional one:

In [54]:
another_matrix = my_range.reshape(9,11)
print(another_matrix)

[[ 1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22]
 [23 24 25 26 27 28 29 30 31 32 33]
 [34 35 36 37 38 39 40 41 42 43 44]
 [45 46 47 48 49 50 51 52 53 54 55]
 [56 57 58 59 60 61 62 63 64 65 66]
 [67 68 69 70 71 72 73 74 75 76 77]
 [78 79 80 81 82 83 84 85 86 87 88]
 [89 90 91 92 93 94 95 96 97 98 99]]


Note that my_range has 99 elements, so we can only reshape it into a shape with the same number of elements (here 9 × 11 = 99). Otherwise numpy raises a ValueError:

In [55]:
my_range.reshape(10,10)

ValueError: cannot reshape array of size 99 into shape (10,10)
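A sketch of two related tricks: passing -1 for one dimension lets numpy infer it from the total size, and a try/except block catches the ValueError instead of stopping the program.

```python
import numpy as np

my_range = np.arange(1, 100)   # 99 elements

# Passing -1 for one dimension lets numpy work out that dimension from the size
inferred = my_range.reshape(9, -1)
print(inferred.shape)          # (9, 11)

# Any shape works as long as the element count matches: 3 * 33 == 99
print(my_range.reshape(3, 33).shape)   # (3, 33)

# The failing reshape from above, caught instead of crashing the program
try:
    my_range.reshape(10, 10)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)
```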

Now we can compute statistics on this matrix. 

In [58]:
#mean of all elements in the matrix
np.mean(another_matrix)

50.0

We can use the axis parameter to compute a statistic per column (axis = 0) or per row (axis = 1):

In [60]:
#compute mean of each column
np.mean(another_matrix, axis = 0)

array([45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55.])

In [62]:
#compute mean of each row
np.mean(another_matrix, axis = 1)

array([ 6., 17., 28., 39., 50., 61., 72., 83., 94.])

To learn more about the axis parameter, see for example https://www.sharpsightlabs.com/blog/numpy-axes-explained/
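A small matrix makes the axis rule easy to check by hand: the axis you pass is the one that gets collapsed, so the result has one entry per element of the remaining dimension.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0 collapses the rows: one result per column
print(np.sum(m, axis=0))   # [5 7 9]

# axis=1 collapses the columns: one result per row
print(np.sum(m, axis=1))   # [ 6 15]

# For the 9 x 11 matrix above: 11 column means, 9 row means
another_matrix = np.arange(1, 100).reshape(9, 11)
print(np.mean(another_matrix, axis=0).shape)   # (11,)
print(np.mean(another_matrix, axis=1).shape)   # (9,)
```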

We can also do indexing, slicing and filtering on a multidimensional array. To illustrate this, we create a new matrix using the reshape() method.

In [11]:
my_array = np.arange(0,15).reshape(3,5)
print(my_array)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


We can select a single element with [row, column] (both indices start at zero):

In [14]:
my_array[1,4]

9

Or we can use slicing. The next cell outputs the intersection of the first two rows and the first three columns.

In [19]:
my_array[:2, :3]

array([[0, 1, 2],
       [5, 6, 7]])

In [20]:
#return all rows, 3rd and 5th columns
my_array[:,[2,4]]

array([[ 2,  4],
       [ 7,  9],
       [12, 14]])
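The introduction to this part mentions filtering, but only indexing and slicing are shown. A short sketch of selecting whole rows and columns and of boolean filtering on the same 3 × 5 matrix; the mask assignment is done on a copy so my_array itself is not modified.

```python
import numpy as np

my_array = np.arange(0, 15).reshape(3, 5)

# A single index selects a whole row; ':' on the first axis selects a whole column
second_row = my_array[1]        # [5 6 7 8 9]
last_column = my_array[:, -1]   # [ 4  9 14]

# Boolean filtering on a 2-D array returns a flat 1-D array of the matching values
print(my_array[my_array > 10])  # [11 12 13 14]

# A mask can also be used to assign values; here on a copy, so my_array is unchanged
zeroed = my_array.copy()
zeroed[zeroed % 2 == 1] = 0     # set every odd value to zero
```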

## Exercises

In [72]:
#TODO

## References and more info
- NumPy's axis parameter explained: https://www.sharpsightlabs.com/blog/numpy-axes-explained/
- Creating NumPy arrays with random numbers: https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html
- Indexing and slicing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

