<h1>Project -1: Introduction to NumPy</h1>

<h2>Introduction</h2>

<p>This activity introduces NumPy, a package for scientific computing with Python that we will use extensively in this class. It is by no means a complete tutorial on NumPy, but it should be enough for you to do most of the projects and activities in this class. For more information, please see NumPy's <a href="https://docs.scipy.org/doc/numpy/user/quickstart.html">official tutorial</a> and <a href='https://docs.scipy.org/doc/numpy/reference/index.html'>API</a> reference. To use NumPy, first import the package, as we do in the following cell:</p>

In [1]:
import numpy as np

<h2>Creating Vectors and Matrices</h2>

<p> NumPy's main object is a multidimensional array, in other words, a table of elements that all have the same data type. Let's see an example of how to create a NumPy array:  </p>

In [2]:
X = np.array([[1,2,3], [4,5,6]])
X

array([[1, 2, 3],
       [4, 5, 6]])

In the cell above, we created a two-dimensional table, a.k.a. a matrix of size $2 \times 3$. To create an array, pass a list (or a nested list) of values to the function [np.array()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html). Now that we have shown you how to create a matrix, you might wonder how to represent a vector in NumPy. There are three ways: a 1-D array (a numpy vector), a column vector (a 2-D array with one column), and a row vector (a 2-D array with one row). In the cell below we use the function [.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html) to specify the length of the 2-D arrays in each dimension.

In [3]:
v1=np.array([3,4,5])
print("This is a numpy vector:{}. Its shape is {}".format(v1, v1.shape))
v2=v1.reshape((3,1))
print("This is a column vector (matrix):\n{}. Its shape is {}".format(v2, v2.shape))
v3=v1.reshape((1,3))
print("This is a row vector (matrix):{}. Its shape is {}".format(v3,v3.shape))

This is a numpy vector:[3 4 5]. Its shape is (3,)
This is a column vector (matrix):
[[3]
 [4]
 [5]]. Its shape is (3, 1)
This is a row vector (matrix):[[3 4 5]]. Its shape is (1, 3)


<p>These three representations are usually not compatible. Some operations will still work, but not in the way we expect. We will always prefer the vector notation. You can transform any (matrix) vector into a numpy vector with [.flatten()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html).</p>

In [4]:
#We add v1 and v2 but the output is not as expected
v4=v2+v1
print('The sum of a column vector and numpy vector:\n{}'.format(v4))
v5 = v2.flatten() + v1 
print('The expected result of summing two numpy vectors: {}'.format(v5))

The sum of a column vector and numpy vector:
[[ 6  7  8]
 [ 7  8  9]
 [ 8  9 10]]
The expected result of summing two numpy vectors: [ 6  8 10]


NumPy arrays, like other Python objects, are passed by reference.  This means that when you assign or return an array, you are handing over a reference to the same array in memory, and in-place operations on the array will change its values for every variable that refers to it.  Because of this, numpy provides a [.copy()](https://numpy.org/doc/stable/reference/generated/numpy.copy.html) function, which creates a completely new array with the same values so that you can safely edit variables independently.


In [5]:
a = np.array([1,2,3])
b = a
c = a.copy()
print('Array a: {}'.format(a))
print('Array b (passed by reference): {}'.format(b))
print('Array c (passed by value): {}'.format(c))
a[0] = 5
print('Array a after editing a: {}'.format(a))
print('Array b after editing a: {}'.format(b))
print('Array c after editing a: {}'.format(c))

Array a: [1 2 3]
Array b (passed by reference): [1 2 3]
Array c (passed by value): [1 2 3]
Array a after editing a: [5 2 3]
Array b after editing a: [5 2 3]
Array c after editing a: [1 2 3]
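
The same behaviour shows up when arrays are passed to functions. The following is a minimal sketch; the helper names `zero_first_inplace` and `zero_first_copy` are made up for this illustration and are not part of NumPy.

```python
import numpy as np

def zero_first_inplace(arr):
    # Hypothetical helper: writes into the caller's array directly
    arr[0] = 0
    return arr

def zero_first_copy(arr):
    # Hypothetical helper: works on a copy, so the caller's array is untouched
    out = arr.copy()
    out[0] = 0
    return out

orig = np.array([1, 2, 3])

safe = zero_first_copy(orig)
print('After zero_first_copy, orig is: {}'.format(orig))      # orig is unchanged

aliased = zero_first_inplace(orig)
print('After zero_first_inplace, orig is: {}'.format(orig))   # orig was modified

# np.shares_memory reports whether two arrays overlap in memory
print('orig and aliased share memory: {}'.format(np.shares_memory(orig, aliased)))
print('orig and safe share memory: {}'.format(np.shares_memory(orig, safe)))
```

If a function should not change its input, copying at the top of the function (as in the second helper) is one simple way to guarantee that.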


### Axes

Numpy defines axes for arrays by giving them a number, so that functions can act along a certain dimension on the array. 

For a 1 dimensional array the axis is defined as <br>
$\left[\begin{array}{ccc}
axis0 → & 0 & 0 \\
\end{array}\right]$

For a 2 dimensional array the axes are defined as <br>
$\left[\begin{array}{ccc}
0 & axis 1 → & 0 \\
axis 0  & 0  & 0 \\
↓ & 0  & 0 \\
0 & 0  & 0 \\
\end{array}\right]$

The same pattern continues when more dimensions are added.

When you want to apply a function along a particular axis, you pass the axis number into the numpy function as the optional argument "axis=...". If you set axis=-1, it uses the last axis. For example, if there are three axes then axis=-1 is equivalent to axis=2. 

An example of using the axis argument with the sum function is below. This function sums up the matrix elements along the given axis. 

In [6]:
A = np.array([[1,-1],[1,-1],[1,-1]])
print('Original matrix: \n{}'.format(A))
print('Summed along axis = 0: \n{}'.format(np.sum(A, axis=0)))
print('Summed along axis = 1: \n{}'.format(np.sum(A, axis=1)))
print('Summed along axis = -1, which is the same as the last axis=1: \n{}'.format(np.sum(A, axis = -1)))

Original matrix: 
[[ 1 -1]
 [ 1 -1]
 [ 1 -1]]
Summed along axis = 0: 
[ 3 -3]
Summed along axis = 1: 
[0 0 0]
Summed along axis = -1, which is the same as the last axis=1: 
[0 0 0]
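
To see how the pattern extends to three axes, here is a small sketch with an array of shape (2, 3, 4). The array name `T` is chosen only for this example. Summing along an axis removes that axis from the shape of the result.

```python
import numpy as np

T = np.arange(24).reshape(2, 3, 4)   # three axes with lengths 2, 3 and 4

print('Shape of T: {}'.format(T.shape))
print('Shape after summing along axis 0: {}'.format(np.sum(T, axis=0).shape))
print('Shape after summing along axis 1: {}'.format(np.sum(T, axis=1).shape))
print('Shape after summing along axis -1: {}'.format(np.sum(T, axis=-1).shape))

# With three axes, axis=-1 and axis=2 refer to the same axis
same = np.array_equal(np.sum(T, axis=-1), np.sum(T, axis=2))
print('axis=-1 gives the same result as axis=2: {}'.format(same))
```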


### Broadcasting

Numpy can automatically change the shapes of arrays if you try to complete an operation where the shapes of input arrays do not match up. Numpy will "broadcast" the smaller array axes along the larger array so they have compatible shapes.

The way this works is that numpy compares the lengths of the dimensions from right to left (last axis to first). Two dimensions are compatible if they are equal or if one of them is 1; a dimension of length 1, or a dimension that is missing altogether, is stretched to match the other array. If two dimensions differ and neither is 1, the operation raises an error.

This is confusing at first, but it can be more easily understood by [reading the numpy documentation and looking at examples](https://numpy.org/doc/stable/user/basics.broadcasting.html)!

Broadcasting can be very useful in simplifying your code, but it is a double-edged sword. It can lead to mistakes when your code runs without errors because broadcasting was used under the hood, but in reality your dimensions were not matching up correctly! 



In [7]:
# If one of the arrays has an axis length 1, then the length 1 axis is broadcast to match the shape of the larger axis.
print('Original matrix X: \n{}'.format(X))
Y = np.array([[0, 5, 10]])
print('Matrix Y with mismatched dimension 0: \n{}'.format(Y))
print('X+Y using broadcasting: \n{}'.format(X + Y))
Z = np.array([[-10],[10]])
print('Matrix Z with mismatched dimension 1: \n{}'.format(Z))
print('X+Z using broadcasting: \n{}'.format(X + Z))
# If one of the array dimensions does not exist, then the operation applies elementwise along the axis that exists. 
V = np.array([0, 5, 10])
print('Matrix V with missing dimension 0: \n{}'.format(V))
print('Adding X+V using broadcasting: \n{}'.format(X + V))

# If both axes exist but have different lengths (and neither has length 1), the operation fails!

Original matrix X: 
[[1 2 3]
 [4 5 6]]
Matrix Y with mismatched dimension 0: 
[[ 0  5 10]]
X+Y using broadcasting: 
[[ 1  7 13]
 [ 4 10 16]]
Matrix Z with mismatched dimension 1: 
[[-10]
 [ 10]]
X+Z using broadcasting: 
[[-9 -8 -7]
 [14 15 16]]
Matrix V with missing dimension 0: 
[ 0  5 10]
Adding X+V using broadcasting: 
[[ 1  7 13]
 [ 4 10 16]]
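
The failing case is not shown above, so here is a small sketch of it. The names `M` and `w` are chosen for this example only. `np.broadcast` is used to ask what shape two arrays would broadcast to without doing any arithmetic.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
w = np.array([10, 20])                  # shape (2,)

# Compatible shapes: (2, 3) and (3,) broadcast to (2, 3)
print('Broadcast shape: {}'.format(np.broadcast(M, np.array([0, 5, 10])).shape))

# Incompatible shapes: the last axes have lengths 3 and 2, and neither is 1
try:
    M + w
except ValueError as err:
    print('Broadcasting failed: {}'.format(err))

# Reshaping w into a column of shape (2, 1) states the intent explicitly
col_sum = M + w.reshape(2, 1)
print('M plus w as a column: \n{}'.format(col_sum))
```

Checking `.shape` before an operation, or reshaping explicitly as in the last step, is a simple guard against broadcasting doing something you did not intend.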


### Concatenation

Numpy arrays can also be concatenated in various ways.  This particular feature often requires practice to feel natural; however, the [concatenate](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html) function can concatenate any number of arrays along any axis.

In [8]:
a = np.array([[1,2,3],[4,5,6]])
b = np.array([[7,8,9],[10,11,12]])

print('First matrix: \n{}'.format(a))
print('Second matrix: \n{}'.format(b))
print('Concatenated matrices (first axis is default): \n{}'.format(np.concatenate((a, b))))
print('Concatenated along the second axis: \n{}'.format(np.concatenate((a, b), axis=-1)))

First matrix: 
[[1 2 3]
 [4 5 6]]
Second matrix: 
[[ 7  8  9]
 [10 11 12]]
Concatenated matrices (first axis is default): 
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Concatenated along the second axis: 
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
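
As a further sketch, the example below joins three arrays at once. The names `top`, `mid` and `bot` are made up for this illustration. The arrays may have different lengths along the axis being joined, but all other axes must match.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])     # shape (2, 2)
mid = np.array([[5, 6]])             # shape (1, 2)
bot = np.array([[7, 8], [9, 10]])    # shape (2, 2)

# All three arrays have 2 columns, so they can be joined along axis 0
stacked = np.concatenate((top, mid, bot), axis=0)
print('Three arrays joined along axis 0: \n{}'.format(stacked))
print('Resulting shape: {}'.format(stacked.shape))

# Joining along axis 1 fails here because the row counts (2, 1, 2) differ
try:
    np.concatenate((top, mid, bot), axis=1)
except ValueError as err:
    print('Concatenation failed: {}'.format(err))
```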


<h3> Attributes of Numpy Arrays </h3>

Three important attributes of a NumPy array are [.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html), [.ndim](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ndim.html), and [.size](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.size.html): 

In [9]:
# the shape of an array
print("The shape of X: {}".format(X.shape))

# the dimension of an array
print("The dimension of X: {}".format(X.ndim))

#total number of elements of an array
print("The size of X: {}".format(X.size))

The shape of X: (2, 3)
The dimension of X: 2
The size of X: 6


<h3>Common Arrays</h3>

There are also functions for creating common arrays such as [eye()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.eye.html), [zeros()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html), and [arange()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html). The function eye() creates an identity matrix, zeros() creates a matrix of all zeros, and arange() creates a vector of equally spaced values.

In [10]:
print('Identity matrix of length 3:\n {}'.format(np.eye(3)))
print('Zero matrix of shape 2 by 3:\n {}'.format(np.zeros([2,3])))
print('Vector incrementing by 2 each element:\n {}'.format(np.arange(1,11,2)))

Identity matrix of length 3:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Zero matrix of shape 2 by 3:
 [[0. 0. 0.]
 [0. 0. 0.]]
Vector incrementing by 2 each element:
 [1 3 5 7 9]
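
The outputs above print `1.` and `0.` because eye() and zeros() produce floating-point arrays by default, while arange() with integer arguments produces integers. The sketch below, with variable names chosen for this example, shows the optional `dtype` argument and a non-integer step for arange().

```python
import numpy as np

float_zeros = np.zeros((2, 3))            # default dtype is float64
int_zeros = np.zeros((2, 3), dtype=int)   # request integers instead
int_eye = np.eye(3, dtype=int)

print('Default dtype of zeros: {}'.format(float_zeros.dtype))
print('Integer zeros:\n{}'.format(int_zeros))
print('Integer identity matrix:\n{}'.format(int_eye))

# arange also accepts a non-integer step; the stop value is excluded
steps = np.arange(0, 1, 0.25)
print('Values from 0 up to (not including) 1 in steps of 0.25: {}'.format(steps))
```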


<h2>Matrix Operations</h2>
<p>In this section we will cover matrix functions that you will find useful in future projects. In many instances, rather than writing a function from scratch, you can use a preexisting numpy function that is faster and will save you time.</p> 

In [11]:
# Create some matrices
X = np.array([[1,2], [3,4]])
Y = np.array([[5,6], [7,8]])
A = np.array([[1,2,3,4], [7,8,9,10]])
B = np.array([[4,5,6,7], [1,2,3,4]])

To perform matrix multiplication use [np.dot()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html) or the @ symbol. Using the traditional Python multiplication operator (*) will multiply the values of the matrices element-wise.

In [12]:
print('X: \n{}'.format(X))
print('Y: \n{}'.format(Y))
print('Element-wise multiplication:\n{}'.format(X*Y))
print('Matrix multiplication:\n{}'.format(np.dot(X,Y)))
print('Alternative syntax for matrix multiplication:\n{}'.format(X @ Y))

X: 
[[1 2]
 [3 4]]
Y: 
[[5 6]
 [7 8]]
Element-wise multiplication:
[[ 5 12]
 [21 32]]
Matrix multiplication:
[[19 22]
 [43 50]]
Alternative syntax for matrix multiplication:
[[19 22]
 [43 50]]
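
The @ operator also works between a matrix and a numpy vector, and between two numpy vectors. A short sketch, using the made-up names `P` and `u`:

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
u = np.array([1, 1])

mat_vec = P @ u      # matrix times vector gives a vector: [3 7]
inner = u @ u        # two vectors give their inner product: 2

print('Matrix-vector product: {}'.format(mat_vec))
print('Inner product of u with itself: {}'.format(inner))
print('@ and np.dot agree for 2-D arrays: {}'.format(np.array_equal(P @ P, np.dot(P, P))))
```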


The [outer](https://numpy.org/doc/stable/reference/generated/numpy.outer.html) function can be applied to perform the outer product of two vectors.

In [13]:
a = np.arange(3)
b = np.arange(3,6)
print('a: {}'.format(a))
print('b: {}'.format(b))
print('Outer product: \n{}'.format(np.outer(a,b)))

a: [0 1 2]
b: [3 4 5]
Outer product: 
[[ 0  0  0]
 [ 3  4  5]
 [ 6  8 10]]


There are more niche element-wise functions as well, such as [np.maximum](https://numpy.org/doc/stable/reference/generated/numpy.maximum.html), which returns an array where every index holds the maximum of the values at that index in the two input arrays.

In [14]:
print('A: \n{}'.format(A))
print('B: \n{}'.format(B))
print('Element-wise maximum:\n{}'.format(np.maximum(A, B)))

A: 
[[ 1  2  3  4]
 [ 7  8  9 10]]
B: 
[[4 5 6 7]
 [1 2 3 4]]
Element-wise maximum:
[[ 4  5  6  7]
 [ 7  8  9 10]]


To transpose a matrix use [.transpose()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html). An alternative and more concise syntax is .T. 

In [15]:
print('B transposed:\n', B.transpose())
print('Alternative syntax for transpose:\n', B.T)

B transposed:
 [[4 1]
 [5 2]
 [6 3]
 [7 4]]
Alternative syntax for transpose:
 [[4 1]
 [5 2]
 [6 3]
 [7 4]]


### Exercise 1

<p>Use NumPy to calculate $AB^T-Y$. The output should be: $[[ 55,  24],
       [185,  82]]$</p>

In [17]:
###INSERT CODE HERE###
np.dot(A,B.T) - Y

array([[ 55,  24],
       [185,  82]])

<h3>Other matrix operations </h3>

Other useful functions to use in your projects are [np.amax()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.amax.html#numpy.amax), [np.amin()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html), [np.argmax()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html), and [np.argmin()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmin.html). Note that with all of these functions we can specify the axis along which we want to perform the operation. If no axis is specified, the functions operate over the whole array; in that case np.argmax and np.argmin return an index into the flattened array.

In [18]:
print('X:\n{}'.format(X))

#Using amax and amin
print('Largest element in X: {}'.format(np.amax(X)))
print('Largest elements along the first axis: {}'.format(np.amax(X, axis = 0)))
print('Smallest element in X: {}'.format(np.amin(X)))
print('Smallest elements along the second axis: {}'.format(np.amin(X, axis = 1)))

X:
[[1 2]
 [3 4]]
Largest element in X: 4
Largest elements along the first axis: [3 4]
Smallest element in X: 1
Smallest elements along the second axis: [1 3]


In [19]:
#Using argmax and argmin
print('Index of the smallest element in X: {}'.format(np.argmin(X)))
print('Indices of the smallest elements along the first axis: {}'.format(np.argmin(X, axis = 0)))

Index of the smallest element in X: 0
Indices of the smallest elements along the first axis: [0 0]


Another function you will find helpful is [np.sum()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html). Like with np.amax, np.amin, etc. we can specify the axis along which to sum. 

In [20]:
print('Sum of all elements in X: {}'.format(np.sum(X)))
print('Sum of elements along the first axis: {}'.format(np.sum(X, axis = 0)))

Sum of all elements in X: 10
Sum of elements along the first axis: [4 6]


It can also be useful to check the optional parameters of a function if you get stuck, as many helpful details are often found there.  For example, [np.sum](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html) has a parameter keepdims which keeps the axes that were summed over as dimensions of length 1, so the result has the same number of dimensions as the input.

In [21]:
print('Sum of all elements in X: {}'.format(np.sum(X, keepdims=True)))
print('Sum of elements along the first axis: {}'.format(np.sum(X, axis=0, keepdims=True)))

Sum of all elements in X: [[10]]
Sum of elements along the first axis: [[4 6]]
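
One place where keepdims helps is in combination with broadcasting. The sketch below, with names invented for this example, divides each row of a matrix by its row sum. With keepdims=True the row sums have shape (2, 1), so they broadcast across the columns as intended.

```python
import numpy as np

counts = np.array([[1, 2], [3, 4]])

row_sums = np.sum(counts, axis=1, keepdims=True)   # shape (2, 1)
normalized = counts / row_sums                      # every row now sums to 1

print('Row sums with keepdims:\n{}'.format(row_sums))
print('Row-normalized matrix:\n{}'.format(normalized))

# Without keepdims the sums have shape (2,) and would line up with the
# columns instead of the rows, which runs without error but is not what we want
flat_sums = np.sum(counts, axis=1)
print('Shape without keepdims: {}'.format(flat_sums.shape))
```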


In many cases the norm of a vector or matrix might be called for, in which case [np.linalg.norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) offers a set of norms that can be selected with the 'ord' parameter.

In [22]:
v = np.arange(10)
print('Sample vector: {}'.format(v))
print('Norm of vector: {}'.format(np.linalg.norm(v)))

Sample vector: [0 1 2 3 4 5 6 7 8 9]
Norm of vector: 16.881943016134134
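
As a brief sketch of the 'ord' parameter, the example below computes a few common norms. The variable names are chosen for this illustration. By default, np.linalg.norm returns the 2-norm for a vector and the Frobenius norm for a matrix.

```python
import numpy as np

vec = np.arange(10)
l2 = np.linalg.norm(vec)                 # square root of the sum of squares
l1 = np.linalg.norm(vec, ord=1)          # sum of absolute values
linf = np.linalg.norm(vec, ord=np.inf)   # largest absolute value

print('2-norm: {}'.format(l2))
print('1-norm: {}'.format(l1))
print('Infinity norm: {}'.format(linf))

mat = np.array([[1, 2], [3, 4]])
fro = np.linalg.norm(mat)                # Frobenius norm of the matrix
row_norms = np.linalg.norm(mat, axis=1)  # 2-norm of each row
print('Frobenius norm: {}'.format(fro))
print('Norm of each row: {}'.format(row_norms))
```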


[Vstack](https://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html) and [hstack](https://docs.scipy.org/doc/numpy/reference/generated/numpy.hstack.html) are useful for adding columns or rows to existing matrices.

In [23]:
print('A: \n{}'.format(A))
print('B: \n{}'.format(B))

print('Vertical stacking concatenates vectors along the first axis:\n{}'.format(np.vstack((A,B))))
print('Horizontal stacking concatenates vectors along the second axis:\n{}'.format(np.hstack((A,B))))

A: 
[[ 1  2  3  4]
 [ 7  8  9 10]]
B: 
[[4 5 6 7]
 [1 2 3 4]]
Vertical stacking concatenates vectors along the first axis:
[[ 1  2  3  4]
 [ 7  8  9 10]
 [ 4  5  6  7]
 [ 1  2  3  4]]
Horizontal stacking concatenates vectors along the second axis:
[[ 1  2  3  4  4  5  6  7]
 [ 7  8  9 10  1  2  3  4]]


### Exercise 2

<p> Find the maximum elements in A along the first axis (axis = 0) and add them to the sum of elements in B along the first axis. Your output should be:</p>
**array([12, 15, 18, 21])**

In [24]:
###INSERT CODE HERE###
np.amax(A, axis = 0) + np.sum(B, axis = 0)

array([12, 15, 18, 21])

<h3>Element-wise Matrix Operations</h3>

There are many operations that you might want to perform such as taking the square root or exponent on each element of a numpy array. Some examples of these functions are [np.exp()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp.html), [np.sqrt()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sqrt.html), and [np.square()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.square.html). There are many more numpy functions that will perform element-wise operations on numpy arrays so make sure to look them up in the official documentation.

In [None]:
X = np.arange(1,6,1)
print('X: {}'.format(X))
print('The exponential applied element-wise to X: \n{}'.format(np.exp(X)))
print('The square root applied element-wise to X: \n{}'.format(np.sqrt(X)))
print('The square applied element-wise to X: \n{}'.format(np.square(X)))

### Sorting Functions

Numpy also has many operations for sorting its arrays, either for the whole matrix or for individual axes.  Some examples of these are [np.sort](https://numpy.org/doc/stable/reference/generated/numpy.sort.html) and [np.argsort](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html).  These generally do what their names imply: sort returns a sorted copy of the array and argsort returns the indexes that would sort the array.  Note that the object method [.sort()](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.sort.html) does an in-place sort of the elements: be sure to copy the array first if you want to keep the original values somewhere!

In [None]:
S = np.array([[1,4,3],[2,5,0]])
print('Original matrix: \n{}'.format(S))
print('Flattened matrix: \n{}'.format(S.flatten()))
print('Sorting the whole matrix: \n{}'.format(np.sort(S, axis=None)))
print('Sorting by a specified axis (the first one): \n{}'.format(np.sort(S, axis=0)))
print('Sorting by the last axis: \n{}'.format(np.sort(S)))
print('Argsort on the whole matrix: \n{}'.format(np.argsort(S, axis=None)))
print('Argsort by a specified axis (the first one): \n{}'.format(np.argsort(S, axis=0)))
print('Argsort by the last axis: \n{}'.format(np.argsort(S)))
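
The difference between np.sort and the in-place .sort() method can be seen in the following sketch. The array `scores` is made up for this example. The last lines show one common way to get a descending order, by reversing the result of argsort.

```python
import numpy as np

scores = np.array([3, 1, 2])

sorted_copy = np.sort(scores)      # returns a new array; scores is untouched
print('After np.sort, scores is still: {}'.format(scores))

scores_backup = scores.copy()      # keep the original order somewhere
scores.sort()                      # sorts scores itself and returns None
print('After scores.sort(): {}'.format(scores))
print('Backup of the original: {}'.format(scores_backup))

# Reverse the argsort result to get indexes from largest to smallest
order_desc = np.argsort(scores_backup)[::-1]
print('Original values in descending order: {}'.format(scores_backup[order_desc]))
```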

### Unique Elements

Similar to how you can make a list into a set in Python to pull unique values, numpy has [np.unique](https://numpy.org/doc/stable/reference/generated/numpy.unique.html), except it returns an object of the same type (numpy array) and has the extra options of specifying axis and returning counts.

In [None]:
U = np.array([3,4,4,2,4,5,6,6,2])
print('Original matrix: {}'.format(U))
print('Unique values: {}'.format(np.unique(U)))
print('Unique values and counts: {}'.format(np.unique(U, return_counts=True)))
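
With return_counts=True, np.unique returns a tuple of two arrays, so it is usually more convenient to unpack it. A small sketch with names chosen for this example:

```python
import numpy as np

data = np.array([3, 4, 4, 2, 4, 5, 6, 6, 2])

# The first array holds the sorted unique values, the second their counts
values, counts = np.unique(data, return_counts=True)
for value, count in zip(values, counts):
    print('{} appears {} time(s)'.format(value, count))

# argmax on the counts gives the position of the most frequent value
most_common = values[np.argmax(counts)]
print('Most common value: {}'.format(most_common))
```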

### Finding Values

Sometimes you know a value is in an array, but don't know exactly where.  Instead of iterating through, you could use [np.argwhere](https://numpy.org/doc/stable/reference/generated/numpy.argwhere.html) to get all indexes that satisfy some condition.

In [None]:
print('Original matrix: \n{}'.format(S))
print('All indexes of values <= 2: \n{}'.format(np.argwhere(S <= 2)))
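
The condition passed to np.argwhere is itself a boolean array, which can also be used directly. The sketch below uses a copy of the matrix under the made-up name `grid` and compares three related ways to use such a mask.

```python
import numpy as np

grid = np.array([[1, 4, 3], [2, 5, 0]])
mask = grid <= 2                      # boolean array with the same shape as grid

print('Boolean mask:\n{}'.format(mask))
print('Values selected by the mask: {}'.format(grid[mask]))

# np.where returns one index array per axis
rows, cols = np.where(mask)
print('Row indexes: {}, column indexes: {}'.format(rows, cols))

# np.argwhere returns the same positions as (row, column) pairs
pairs = np.argwhere(mask)
print('Positions as pairs:\n{}'.format(pairs))
```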

<h2>Indexing and Slicing</h2>
<p>NumPy arrays can be indexed and sliced, just like Python lists. For one-dimensional arrays: </p>

In [None]:
x1 = np.arange(5)

print("x1: {}".format(x1))
print("Indexing a single element: {}".format(x1[3]))
print("Slicing: {}".format(x1[1:3]))  
print("Slicing last two numbers: {}".format(x1[-2:]))


Another important feature is that you can index a numpy array using another numpy array of integer indexes, which selects the elements at those positions in the given order.  It's worth playing around with this to see what's possible and what isn't!

In [None]:
y1 = np.array([30,40,50,20,10])
y1_slice = np.array([2,1,0])
print("y1: {}".format(y1))
print('Indexes to slice: {}'.format(y1_slice))
print('Returned values: {}'.format(y1[y1_slice]))
print('Argsort on y1: {}'.format(np.argsort(y1)))
print('y1 indexed by argsort: {}'.format(y1[np.argsort(y1)]))
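
Two details of indexing with an array are worth trying out, shown in the sketch below with names invented for this example: indexes may repeat, and the selected values are a copy, so editing them does not touch the original array. Assigning through an index array, on the other hand, does modify the original.

```python
import numpy as np

vals = np.array([30, 40, 50, 20, 10])
idx = np.array([0, 0, 4])          # indexes may repeat

picked = vals[idx]                 # values at positions 0, 0 and 4
print('Picked values: {}'.format(picked))

picked[0] = -1                     # picked is a copy, so vals is unchanged
print('vals after editing picked: {}'.format(vals))

vals[idx] = 0                      # assignment through idx changes vals itself
print('vals after assigning through idx: {}'.format(vals))
```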

<h3> Multidimensional arrays:</h3>

In [None]:
x2 = np.array([[1,2,3], [4,5,6], [7,8,9]])
print("x2:\n{}".format(x2))
print("Full slice:\n{}".format(x2[::])) 
print("Indexing an element at the first row, third column: {}".format(x2[0,2]))
print("Slicing on both axes:\n{}".format(x2[1:,:2]))
print("Indexing by row:{}".format(x2[0])) # Selecting the first row 
print("Indexing by column: {}".format(x2[:,0])) # Selecting the first column

print("Iterating through each row in a matrix:")
for row in x2:
    print(row)
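
Unlike Python lists, slicing a numpy array does not copy the data: a slice is a view into the original array. This ties back to the earlier point about references. A minimal sketch, with names chosen for this example:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

block = grid[1:, :2]               # a view of the lower-left 2x2 block
block[0, 0] = -4                   # writing to the view changes grid as well
print('grid after editing the slice:\n{}'.format(grid))
print('Slice shares memory with grid: {}'.format(np.shares_memory(grid, block)))

independent = grid[1:, :2].copy()  # copy the slice to edit it safely
independent[0, 0] = 100
print('grid[1, 0] after editing the copy: {}'.format(grid[1, 0]))
```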

### Exercise 3

<p>Print the first and second row of A then the second and third column of A. </p>

In [None]:
###INSERT CODE HERE###

## Random Functions

Numpy also has a wide array of functions utilizing randomization.  
As a preliminary note, there are two ways to make results replicable when using numpy randomization: on one hand, there's [np.random.seed](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html) which can be run once to set the seed for the entire project; however, re-running the command will reset the seed.  On the other hand, a random generator can be directly instantiated using [np.random.default_rng](https://numpy.org/doc/stable/reference/random/generator.html).  More information can be found [here](https://towardsdatascience.com/stop-using-numpy-random-seed-581a9972805f); however, we'll be using the seed function for simplicity.

Some functions that might be useful in this course specifically are [np.random.shuffle](https://numpy.org/doc/stable/reference/random/generated/numpy.random.shuffle.html) which shuffles an array in-place, [np.random.permutation](https://numpy.org/doc/stable/reference/random/generated/numpy.random.permutation.html) which returns a shuffled copy of an input array or shuffles a range if given an integer, and [np.random.rand](https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html) which generates a random array of a certain shape.

In [None]:
np.random.seed(0)

r = np.arange(10)
print('Range 0-9: {}'.format(r))
np.random.shuffle(r)
print('Shuffled range: {}'.format(r))
print('Randomly permuted range: {}'.format(np.random.permutation(10)))
print('Randomly generated matrix: \n{}'.format(np.random.rand(3,3)))
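
For comparison, here is a sketch of the same three operations using a generator created with np.random.default_rng, the alternative mentioned above. The variable names are chosen for this example, and the course code itself keeps using the seed function. The generator carries its own state, so two generators created with the same seed produce the same sequence.

```python
import numpy as np

rng = np.random.default_rng(0)     # generator with its own seed

deck = np.arange(10)
rng.shuffle(deck)                  # shuffles deck in place
perm = rng.permutation(10)         # shuffled copy of the range 0-9
sample = rng.random((3, 3))        # uniform values in [0, 1)

print('Shuffled range: {}'.format(deck))
print('Randomly permuted range: {}'.format(perm))
print('Randomly generated matrix: \n{}'.format(sample))

# A second generator with the same seed repeats the same shuffle
rng_again = np.random.default_rng(0)
deck_again = np.arange(10)
rng_again.shuffle(deck_again)
print('Same seed gives the same shuffle: {}'.format(np.array_equal(deck, deck_again)))
```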