# Hands-on NumPy
---
Welcome to the second session of the DSC ML Talks event!

In this hands-on workshop, we will cover some of the most important concepts of NumPy, one of the core libraries of the PyData ecosystem.

This notebook also contains several exercises based on the topics covered in this session. Please attempt them after the session ends and submit your answers through the Google Form that will be shared at the end of the session.

### Import NumPy
---

To use NumPy in your Python projects, the first step is to import the NumPy module. Here is how you do that.

In [1]:
# importing numpy
import numpy as np

Next up, let's take a look at lists and arrays.

## Arrays in NumPy
---

* Arrays are the most distinguishing feature of the NumPy module.
* They can be thought of as a substitute for Python lists.
* For numerical operations on large amounts of data, NumPy arrays are typically much faster than Python lists.

In mathematical terms, a 1-D array represents a vector and a 2-D array represents a matrix.
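
As a rough illustration of why arrays are convenient for numerical work, the sketch below squares the same values once with a list comprehension and once with a single array expression. The names `values`, `squares_list` and `squares_arr` are made up for this example, and no timings are shown because the actual speed difference depends on the data size and the machine.

```python
import numpy as np

values = list(range(5))

# With a plain list we loop over the elements explicitly
squares_list = [v ** 2 for v in values]

# With an array the operation is applied to every element at once
squares_arr = np.array(values) ** 2

print(squares_list)
print(squares_arr)
```

Both approaches produce the same numbers; the array version simply avoids the explicit Python loop.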

In [2]:
x = list(range(10)) # this creates a list of length 10 with elements from 0 to 9
x

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We already know what Python lists are and how they work. So let's take the list __x__ that we declared above and convert it into a NumPy array.



In [3]:
# creating a numpy array using a python list
x_arr = np.array(x)
x_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

### EXERCISE: Create any Python list of your choice and then print a NumPy array created from that list

In [4]:
# enter code here
y = [i for i in range(0, 10, 2)]
y = np.array(y)
y

array([0, 2, 4, 6, 8])

One very important thing to note is that lists and NumPy arrays are different: a list can hold values of any type (an int, a string, a float, etc.) at the same time. __But a NumPy array can only contain elements of a single data type__, and NumPy will upcast the values if their types don't match.

Let us see how this works.

In [5]:
z = np.array([10, 100, 1000.0, 113]) # 3rd element, 1000.0 is a floating point number
z

array([  10.,  100., 1000.,  113.])

As you can see, we passed in a float value along with several integer values, and the resulting array contains only floats. The integer values were cast to floats as well.
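
To make the upcasting rule a little more concrete, here is a small sketch with two mixed inputs. The variable names are only illustrative; the point is that NumPy picks one data type that can represent every element.

```python
import numpy as np

ints_and_float = np.array([1, 2, 3.5])     # integers are upcast to float
ints_and_str = np.array([1, 2, "three"])   # everything becomes a string

print(ints_and_float.dtype)      # float64
print(ints_and_str.dtype.kind)   # 'U' means a unicode string type
print(ints_and_str)
```

Note that in the second array the numbers are stored as text, so arithmetic on them no longer works as it would on numeric arrays.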


Here are some more important numpy functions that might come in handy.

* __np.zeros(shape, dtype)__ = This creates an array of zeros. The shape is either a single integer (the length of a 1-D array) or a tuple with the dimensions of the array you wish to have, and dtype is the data type of the elements in the array.

* __np.ones(shape, dtype)__ = This creates an array of ones. The shape and dtype arguments work the same way as in np.zeros.

* __np.arange(start, end, step)__ = This creates an array of elements starting from 'start' and going up to, but not including, 'end', where 'step' is the difference between consecutive elements.

* __np.linspace(start, end)__ = This creates an array of evenly spaced elements from 'start' to 'end', both included. An optional third argument sets how many elements to generate (50 by default).

* __np.random.randint(start, end, shape)__ = This creates an array of the dimensions that we provide, and the elements are random integers from 'start' (included) up to 'end' (excluded).

In [6]:
np.zeros(5, dtype = int)

array([0, 0, 0, 0, 0])

In [7]:
np.ones((3, 4), dtype=int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [8]:
np.arange(0, 5, 2)

array([0, 2, 4])

In [9]:
np.linspace(0, 10, 5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [10]:
np.random.randint(0, 20, (3, 3)) # Create a 3x3 array of random integers in the half-open interval [0, 20), i.e. from 0 to 19

array([[ 7, 14, 13],
       [ 0, 17,  4],
       [ 2,  3, 11]])
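
One detail that is easy to mix up is how np.arange and np.linspace treat the end value. The following sketch, with made-up names `stepped` and `spaced`, puts the two side by side over the same range.

```python
import numpy as np

# Third argument is the step size; the end value 10 is excluded
stepped = np.arange(0, 10, 2)     # values 0, 2, 4, 6, 8

# Third argument is the number of values; the end value 10 is included
spaced = np.linspace(0, 10, 6)    # values 0, 2, 4, 6, 8, 10 (as floats)

print(stepped)
print(spaced)
```

A handy way to remember it: with arange you choose the spacing, with linspace you choose how many points you get.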

### EXERCISE: Create 10 evenly spaced values between 30 and 90.
Hint: Which NumPy function can be used to divide a range into evenly spaced points?

In [11]:
# enter code here
np.linspace(30, 90, 10)

array([30.        , 36.66666667, 43.33333333, 50.        , 56.66666667,
       63.33333333, 70.        , 76.66666667, 83.33333333, 90.        ])

### EXERCISE: Create a NumPy array of random integers between 40 and 200, containing 6 rows and 2 columns.

In [12]:
# enter code here
arr = np.random.randint(40, 200, (6, 2))
arr

array([[ 74, 199],
       [ 82, 121],
       [134,  57],
       [126,  92],
       [ 77, 158],
       [121, 167]])

Let's go ahead and do some more operations using NumPy.

First let us have a look at the random.randn function.

The __randn__ function returns an array containing values (or "samples") from a standard normal distribution.

* __np.random.randn(shape)__

__NOTE:__ The shape is not passed as a tuple here (unlike in some other NumPy functions) but simply as comma-separated integer values.
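
To see the difference in how the shape is passed, the sketch below compares randn with np.random.standard_normal, which draws from the same distribution but expects the shape as one tuple. The names `a_args` and `a_tuple` are just for this illustration.

```python
import numpy as np

a_args = np.random.randn(3, 4)                  # dimensions as separate arguments
a_tuple = np.random.standard_normal((3, 4))     # dimensions as a single tuple

print(a_args.shape, a_tuple.shape)
```

Both calls give an array with 3 rows and 4 columns; only the way the dimensions are written differs.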

In [13]:
a = np.random.randn(3,4)
a

array([[ 1.8432076 ,  0.07061936, -1.16431943, -1.85956952],
       [-1.12864626,  2.00265813, -0.98996977,  0.59976545],
       [ 0.38845348, -0.44987186, -1.091848  ,  2.17784094]])

### EXERCISE: Create a vector ex_mat1 of 10 elements with random samples from the normal distribution.

In [14]:
# write code here
ex_mat1 = np.random.randn(10)
ex_mat1

array([-0.34106478, -0.97428149,  2.23551528,  0.30649053, -0.15943794,
       -0.40673369, -1.47206916, -0.84231794, -0.60127786, -0.98229354])

### EXERCISE: Create a matrix ex_mat2 (2 rows, 3 cols) with random values from the normal distribution.



In [15]:
# write code here
ex_mat2 = np.random.randn(2, 3) 
ex_mat2

array([[-0.06233964,  0.61194652,  1.627512  ],
       [-0.33357015, -0.50385881, -0.8440918 ]])

Now, let us define two more arrays: a matrix x1 and a vector x2.

In [16]:
x1 = np.random.randint(0, 100, (4,5))
x2 = np.random.rand(10)
x1, x2

(array([[67, 52,  8,  7, 85],
        [24, 62, 37, 51, 50],
        [91, 61, 51, 43, 32],
        [77,  1, 84, 37,  0]]),
 array([0.14849106, 0.67659162, 0.35954891, 0.57190777, 0.97506521,
        0.77076408, 0.03141305, 0.35354714, 0.94116173, 0.34906649]))

We will use x1 and x2 defined above to understand attributes like 'ndim', 'shape', 'size' and 'dtype'.

In [17]:
print("Dimensions of x2 are:",x2.ndim) # number of dimensions

print("Shape of x2 is:",x2.shape) # shape of the array

print("Size of x2 is:",x2.size) # number of elements in the array

print("The data type of x2 is:",x2.dtype) # data-type of elements in the array

Dimensions of x2 are: 1
Shape of x2 is: (10,)
Size of x2 is: 10
The data type of x2 is: float64
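
The same attributes are available on arrays with more than one dimension. As a small sketch, here they are for a hypothetical 4x5 array named `grid` (a stand-in for a matrix like x1).

```python
import numpy as np

grid = np.zeros((4, 5), dtype=int)  # hypothetical 4x5 array of zeros

print(grid.ndim)    # 2, because there are rows and columns
print(grid.shape)   # (4, 5)
print(grid.size)    # 20, i.e. 4 * 5 elements in total
print(grid.dtype)   # the exact integer type depends on the platform
```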


Now that we know some basic operations, let's take a look at a very important topic - Indexing!

Let's use the arrays x1 and x2 for this as well.

In [18]:
x1 

array([[67, 52,  8,  7, 85],
       [24, 62, 37, 51, 50],
       [91, 61, 51, 43, 32],
       [77,  1, 84, 37,  0]])

In [19]:
x2 

array([0.14849106, 0.67659162, 0.35954891, 0.57190777, 0.97506521,
       0.77076408, 0.03141305, 0.35354714, 0.94116173, 0.34906649])

Indexing starts from 0, so when we ask for the element at position 0, we are accessing the very first element in the array.

In [20]:
print("The first element of x2 is:", x2[0])

print("The second element of x2 is:", x2[1])

print("The element at the position (0,0) of x1 is:", x1[0,0])

print("The element at the position (2,2) of x1 is:", x1[2,2])

The first element of x2 is: 0.1484910589935765
The second element of x2 is: 0.6765916164267335
The element at the position (0,0) of x1 is: 67
The element at the position (2,2) of x1 is: 51


Now, let us assume that we want to print the last element of an array, but the catch is that we don't know the length of the array.

In NumPy arrays, just like Python lists, indexing from the end starts at index __-1__. That is, _arr[-1]_ is the last element of the NumPy array _arr_, _arr[-2]_ is the second-to-last element, and so on.

In [21]:
# printing last element of array x2
x2[-1]

0.34906649107338417

### EXERCISE: Print the second last element in the last row of x1.

In [22]:
# enter code here
x1[-1, -2]

37

We can also use these indices to change the values of elements in our array. We will change the first element in the first row of the array x1 to a value over 9000.

In [23]:
x1[0, 0] = 9001
x1

array([[9001,   52,    8,    7,   85],
       [  24,   62,   37,   51,   50],
       [  91,   61,   51,   43,   32],
       [  77,    1,   84,   37,    0]])
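
One thing to keep in mind when assigning values is that the array keeps its data type. The sketch below uses a made-up integer array `counts` and assigns a float to it; NumPy converts the value to an integer instead of changing the type of the array.

```python
import numpy as np

counts = np.array([1, 2, 3])   # integer array
counts[0] = 7.9                # the fractional part is dropped on assignment

print(counts)                  # [7 2 3]
```

If you need to store fractional values, create the array with a float dtype in the first place.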

### EXERCISE: Set the 4th element in x2 to 351 and print the result.

In [24]:
# enter code here
x2[3] = 351
x2

array([1.48491059e-01, 6.76591616e-01, 3.59548908e-01, 3.51000000e+02,
       9.75065215e-01, 7.70764082e-01, 3.14130486e-02, 3.53547135e-01,
       9.41161729e-01, 3.49066491e-01])

Next up, we will see how array slicing works. We can use our x1 and x2 to learn about array slicing as well.

In [25]:
x2[0:5] # this gives us the first 5 elements in our array 

array([1.48491059e-01, 6.76591616e-01, 3.59548908e-01, 3.51000000e+02,
       9.75065215e-01])

This can also be written as:

In [26]:
x2[:5]

array([1.48491059e-01, 6.76591616e-01, 3.59548908e-01, 3.51000000e+02,
       9.75065215e-01])

In [27]:
x2[::2] # gives us every other element in our array

array([0.14849106, 0.35954891, 0.97506521, 0.03141305, 0.94116173])

In [28]:
x2[::-1] # reverses the whole array for us

array([3.49066491e-01, 9.41161729e-01, 3.53547135e-01, 3.14130486e-02,
       7.70764082e-01, 9.75065215e-01, 3.51000000e+02, 3.59548908e-01,
       6.76591616e-01, 1.48491059e-01])

Let's try the same with multi-dimensional arrays.

In [29]:
x1[:, 0]  # retrieve the first column

array([9001,   24,   91,   77])

In [30]:
print(x1[:, :2]) # retrieving the first 2 columns

print(x2[:]) # gives us all the elements in the array x2

print(x1[-1, :]) # retrieving the last row

[[9001   52]
 [  24   62]
 [  91   61]
 [  77    1]]
[1.48491059e-01 6.76591616e-01 3.59548908e-01 3.51000000e+02
 9.75065215e-01 7.70764082e-01 3.14130486e-02 3.53547135e-01
 9.41161729e-01 3.49066491e-01]
[77  1 84 37  0]
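
A point worth knowing about slices is that they are views of the original array rather than copies, so writing to a slice changes the original. Here is a minimal sketch with illustrative names `data`, `first_half` and `safe_copy`.

```python
import numpy as np

data = np.arange(10)
first_half = data[:5]     # a view of the first five elements, not a copy
first_half[0] = 99        # this also changes data[0]

safe_copy = data[:5].copy()
safe_copy[1] = -1         # data is not affected by this

print(data)
```

Use `.copy()` whenever you want to modify a slice without touching the array it came from.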


### EXERCISE: Make two new NumPy arrays y1 and y2 of length 10 with random integer values. Then add the reverse of y1 to the reverse of y2.

In [31]:
# enter code here
y1 = np.random.randint(0, 100, 10)
y2 = np.random.randint(0, 100, 10)
y1 = y1[::-1]
y2 = y2[::-1]
y1 + y2

array([ 75,  60, 104,  42,  91, 143,  95, 145, 116, 115])

Now we will take a look at __reshape__. The reshape method returns the same values arranged in a new shape. It does not change the original array, and where possible the result is a view of the original data rather than a copy.

In [32]:
arr = np.array([10,20,30,40,50,60,70,80,90])
arr.shape

(9,)

In [33]:
arr.reshape((3, 3))

array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

So we defined a one-dimensional array first and then converted it into a 2-D array using reshape. A new axis can also be added by indexing with __np.newaxis__, but reshape works just fine and is more self-explanatory.
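
As a short sketch of both approaches, the example below reshapes a small array with an inferred dimension and adds an axis with np.newaxis. The names `two_rows` and `column` are only for illustration.

```python
import numpy as np

arr = np.arange(1, 7)               # six values: 1 to 6

two_rows = arr.reshape((2, -1))     # -1 lets NumPy work out the 3 columns
column = arr[:, np.newaxis]         # turns the vector into a single column

print(two_rows.shape)   # (2, 3)
print(column.shape)     # (6, 1)
print(arr.shape)        # (6,) since the original array keeps its shape
```

Passing -1 for one dimension is convenient when you know the number of rows or columns you want but not the other dimension.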

Now, let us have a look at how to concatenate two arrays.

In [34]:
arr1 = np.array([1, 2, 3, 4, 5])

arr2 = np.array([4, 5, 6, 7, 8])

np.concatenate([arr1, arr2])

array([1, 2, 3, 4, 5, 4, 5, 6, 7, 8])

You can do the same for multi-dimensional arrays.

__NOTE:__ The default axis for concatenation is __axis = 0__, i.e., the arrays are stacked on top of each other as additional rows. To place them side by side as additional columns, you have to pass __axis = 1__.

In [35]:
arr3 = np.array([[1, 2, 3],
                 [4, 5, 6]])

arr4 = np.array([[6, 7, 8],
                 [7, 8, 9]])

print(np.concatenate([arr3, arr4])) # default axis = 0

print(np.concatenate([arr3, arr4], axis=1)) # concatenation along column axis


[[1 2 3]
 [4 5 6]
 [6 7 8]
 [7 8 9]]
[[1 2 3 6 7 8]
 [4 5 6 7 8 9]]
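
For 2-D arrays, NumPy also offers np.vstack and np.hstack as shorthands for these two cases. The sketch below uses two small made-up arrays, `top` and `bottom`, to show that they match concatenate along axis 0 and axis 1.

```python
import numpy as np

top = np.array([[1, 2, 3], [4, 5, 6]])
bottom = np.array([[6, 7, 8], [7, 8, 9]])

stacked_rows = np.vstack([top, bottom])   # same as concatenate with axis=0
stacked_cols = np.hstack([top, bottom])   # same as concatenate with axis=1

print(stacked_rows.shape)   # (4, 3)
print(stacked_cols.shape)   # (2, 6)
```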


### EXERCISE: Make two multidimensional arrays, a1 of shape (5, 2) and a2 of shape (5, 2), and concatenate them along the column axis.

In [36]:
# enter code here
a1 = np.random.randn(5, 2)
print(a1)
a2 = np.random.randn(5, 2)
print(a2)
np.concatenate([a1, a2], axis=1)

[[ 0.51685472 -0.83115289]
 [-0.29009033 -0.9597636 ]
 [ 0.17133934  0.39655519]
 [ 1.10333507  1.66704951]
 [ 1.54009626 -0.02397053]]
[[ 0.33559679  1.58514371]
 [-0.57283218 -0.01014035]
 [-0.13652998  0.92207799]
 [ 0.26146463 -0.51571265]
 [ 0.49921506  1.37966675]]


array([[ 0.51685472, -0.83115289,  0.33559679,  1.58514371],
       [-0.29009033, -0.9597636 , -0.57283218, -0.01014035],
       [ 0.17133934,  0.39655519, -0.13652998,  0.92207799],
       [ 1.10333507,  1.66704951,  0.26146463, -0.51571265],
       [ 1.54009626, -0.02397053,  0.49921506,  1.37966675]])

Now let us have a look at some aggregation functions in NumPy.

In [37]:
print("The smallest element in arr1 is:", np.min(arr1))

print("The largest element in arr1 is:", np.max(arr1))

print("The sum of all elements in arr1 is:", np.sum(arr1))

The smallest element in arr1 is: 1
The largest element in arr1 is: 5
The sum of all elements in arr1 is: 15


We can do the same for multi-dimensional arrays as well.

In [38]:
print("The smallest element in arr3 is:", np.min(arr3))

print("The largest element in arr3 is:", np.max(arr3))

print("The sum of all elements in arr3 is:", np.sum(arr3))

The smallest element in arr3 is: 1
The largest element in arr3 is: 6
The sum of all elements in arr3 is: 21
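
By default these functions reduce over the whole array, but they also accept an axis argument. As a sketch, the example below uses a small hypothetical array `table` to aggregate per column and per row.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(np.sum(table, axis=0))   # one sum per column: [5 7 9]
print(np.sum(table, axis=1))   # one sum per row: [ 6 15]
print(np.min(table, axis=0))   # smallest value in each column: [1 2 3]
```

The axis you pass is the one that gets collapsed: axis=0 collapses the rows and leaves one value per column.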


### EXERCISE: Find the maximum element in the 1st row of arr3.
Hint: Recall how to retrieve the rows in a multidimensional array using indexing.

In [39]:
# enter code here
np.max(arr3[0])

3

Next up, we will see how broadcasting works in NumPy. Broadcasting lets NumPy carry out element-wise operations on arrays of different but compatible shapes.

In [40]:
a = np.array([1, 2, 3])

b = np.array([4, 5, 6])

a + b

array([5, 7, 9])

Adding a scalar to an array:

In [41]:
a = a + 5
a

array([6, 7, 8])

Adding arrays of different shapes:

In [42]:
M = np.ones((2, 3))
A = np.arange(3)

print(M)
print(A)

[[1. 1. 1.]
 [1. 1. 1.]]
[0 1 2]


In [43]:
M + A

array([[1., 2., 3.],
       [1., 2., 3.]])
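
Broadcasting can also stretch both operands at once. In this sketch, a column of shape (3, 1) and a row of shape (3,) are combined into a 3x3 result; the names `col`, `row` and `result` are made up for the example.

```python
import numpy as np

col = np.arange(3).reshape((3, 1))   # shape (3, 1): values 0, 1, 2 going down
row = np.arange(3)                   # shape (3,):   values 0, 1, 2 going across

result = col + row                   # both are stretched to shape (3, 3)
print(result)
```

Each entry of the result is the sum of its row value and its column value, which is a compact way to build tables such as addition or distance grids.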

Now, let's see how comparison operators work in NumPy.

In [44]:
a

array([6, 7, 8])

In [45]:
a < 12 # element-wise comparison: True where the element is smaller than 12, False otherwise

array([ True,  True,  True])
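
The boolean array returned by a comparison is often used as a mask to select or count elements. A minimal sketch, reusing the values 6, 7 and 8 with an illustrative name `mask`:

```python
import numpy as np

a = np.array([6, 7, 8])
mask = a > 6            # False for 6, True for 7 and 8

print(a[mask])          # keeps only the elements where mask is True
print(np.sum(mask))     # counts how many elements satisfy the condition
```

Counting works because True is treated as 1 and False as 0 when summing.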

Next up, let's explore fancy indexing, which means indexing an array with a list or array of indices instead of a single index.

In [46]:
X = np.array([10, 20, 30, 45, 65, 67, 64, 3, 56, 74, 29])

In [47]:
[X[3], X[4], X[10]] # grab the elements in X at positions 3, 4 and 10

[45, 65, 29]

In [48]:
j = [3, 4, 10] # another way to do the above operation
X[j]

array([45, 65, 29])

In [49]:
y1 = np.arange(25).reshape((5, 5))
y1

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [50]:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
y1[row, col]

array([ 2,  6, 13])

Here we defined a 5x5 matrix and then used the row and col arrays together to perform fancy indexing. The indices are paired up, so the result contains the elements at positions (0, 2), (1, 1) and (2, 3).
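
To check how the row and column indices are matched, the sketch below writes the same lookup out one pair at a time. The names `grid`, `picked` and `paired` are only for this illustration.

```python
import numpy as np

grid = np.arange(25).reshape((5, 5))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

picked = grid[row, col]
# Same result, written out one (row, column) pair at a time
paired = np.array([grid[r, c] for r, c in zip(row, col)])

print(picked)
print(paired)
```

Both lines print the same three values, which confirms that fancy indexing with two index arrays walks through them in parallel.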

We can also use fancy indexing to manipulate the values in our array.

In [51]:
x = np.arange(20)
i = np.array([3, 4, 7, 9])
x[i] = 100
print(x)

[  0   1   2 100 100   5   6 100   8 100  10  11  12  13  14  15  16  17
  18  19]


Next let's take a look at sorting using NumPy.

In [52]:
x = np.array([4, 5, 6, 32, 5, 7, 8, 1])
np.sort(x)

array([ 1,  4,  5,  5,  6,  7,  8, 32])

np.sort returned the sorted elements themselves. We can use argsort instead to get the indices of the sorted elements.

In [53]:
np.argsort(x)

array([7, 0, 1, 4, 2, 5, 6, 3], dtype=int64)
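
The indices from argsort become useful when combined with fancy indexing. As a sketch (the name `order` is made up), the example below uses them to rebuild the sorted array and to get the values from largest to smallest.

```python
import numpy as np

x = np.array([4, 5, 6, 32, 5, 7, 8, 1])
order = np.argsort(x)

print(x[order])          # same values as np.sort(x)
print(x[order[::-1]])    # largest to smallest
```

The same `order` array could also be used to reorder a second array that is aligned with x, for example names that belong to the scores in x.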

We can also sort a matrix row-wise or column-wise.

In [54]:
x = np.random.randint(0, 25, (5, 5))
print(x)

[[ 3  4  8 10  1]
 [ 1 12 20 17  9]
 [ 8  3 12 10 23]
 [24 23 24 15 22]
 [23 24  1 20 16]]


In [55]:
np.sort(x, axis=0) # sorts the elements within each column

array([[ 1,  3,  1, 10,  1],
       [ 3,  4,  8, 10,  9],
       [ 8, 12, 12, 15, 16],
       [23, 23, 20, 17, 22],
       [24, 24, 24, 20, 23]])

In [56]:
np.sort(x, axis=1) # sorts the elements within each row

array([[ 1,  3,  4,  8, 10],
       [ 1,  9, 12, 17, 20],
       [ 3,  8, 10, 12, 23],
       [15, 22, 23, 24, 24],
       [ 1, 16, 20, 23, 24]])
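
Note that np.sort returns a sorted copy and leaves the input untouched, while the array's own sort method works in place. A small sketch with a hypothetical matrix `m`:

```python
import numpy as np

m = np.array([[3, 1, 2],
              [9, 7, 8]])

sorted_copy = np.sort(m, axis=1)   # m itself is unchanged here
print(m)

m.sort(axis=1)                     # sorts m in place and returns None
print(m)
```

The first print still shows the original order, and the second shows each row sorted.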

### EXERCISE: Sort all elements of matrix m1, first column-wise and then row-wise

In [57]:
m1 = np.random.randint(0, 100, (7,5))
m1

array([[17, 72, 49, 20, 45],
       [85, 85, 64, 87, 54],
       [45,  0, 61, 35, 19],
       [33, 82, 40, 19, 98],
       [53, 30, 77, 64, 38],
       [60, 20, 11, 57, 81],
       [36, 92, 12, 96,  0]])

In [58]:
# enter code here
temp = np.sort(m1, axis=0)
np.sort(temp, axis=1)
# m1 is first sorted column-wise, and that result is then sorted row-wise

array([[ 0,  0, 11, 17, 19],
       [12, 19, 20, 20, 33],
       [30, 35, 36, 38, 40],
       [45, 45, 49, 57, 72],
       [53, 54, 61, 64, 82],
       [60, 64, 81, 85, 87],
       [77, 85, 92, 96, 98]])

With this we come to the end of the NumPy section! Congrats on finishing this tutorial!