<a href="https://colab.research.google.com/github/antfolk/BMEN35/blob/main/Session1/BMEN35_Ex2_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Numpy

In this exercise we will look into some of the "basic" Python libraries that you will come across. We will start with NumPy.

NumPy (https://numpy.org/) is "the fundamental package for scientific computing with Python". If you are running this code on your own machine (outside Google Colab), you will need to install the NumPy package (e.g. using pip or conda). We will start with the "standard" way of importing NumPy (given that it is installed).

In [1]:
import numpy as np

NumPy has its "own" datatypes, and the one you will use a lot is the array (a NumPy array, that is, which is different from Python's built-in lists and other Python array types).

You can define a variable (such as an array) in the following way.

In [2]:
b = np.zeros(5)
print(type(b))
print(b)

<class 'numpy.ndarray'>
[0. 0. 0. 0. 0.]


The above command creates an array of zeros (great for initializing).

Similarly, you can create a matrix/2D array as shown below. Note that the argument is (4,3), with parentheses, and this denotes the shape of the array. This argument is a tuple of ints (as stated in the documentation).

In [3]:
c = np.zeros((4,3)) # Four rows, three columns
print(c)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
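
A small sketch of why the tuple matters. It relies on the fact that the second positional argument of `np.zeros` is the dtype, so two separate integers are not read as "rows, columns". The names `ok` and `raised` are only illustrative.

```python
import numpy as np

# The shape is passed as ONE argument: a tuple of ints.
ok = np.zeros((4, 3))
print(ok.shape)  # (4, 3)

# Passing the sizes as two separate arguments fails, because the
# second positional argument is interpreted as the dtype.
raised = False
try:
    np.zeros(4, 3)
except TypeError as err:
    raised = True
    print("TypeError:", err)
```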


## Numpy vocabulary

NumPy comes with its own "vocabulary", and some of the commonly used words you will encounter are the following.

Each dimension of the NumPy array is called an **axis**. <br>
The number of axes is called the **rank**.<br>
The length of (each of) the axes is called the **shape**.<br>



The astute student will ask: why not use one of the good old datatypes from the previous lecture (such as dictionaries)? Part of the answer is that in a NumPy array all elements have the same datatype (typically numerical, e.g. not a string).
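
To make the "same datatype" point concrete, here is a minimal sketch comparing a plain Python list with a NumPy array. The names `mixed_list` and `arr` are made up for this illustration.

```python
import numpy as np

# A Python list can mix types freely (here ints and a float).
mixed_list = [1, 2.5, 3]

# Converting to an array forces one common dtype for all elements.
arr = np.array(mixed_list)
print(arr.dtype)  # float64
print(arr)        # [1.  2.5 3. ]

# Arithmetic on the array acts on every element at once...
doubled = arr * 2
print(doubled)    # [2. 5. 6.]

# ...whereas multiplying a list just repeats it.
print(mixed_list * 2)  # [1, 2.5, 3, 1, 2.5, 3]
```

A single datatype is part of what lets NumPy apply one operation to all elements without a Python loop.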

In [4]:
d = np.ones((5,3)) # Five rows, three columns
print(d)
d.shape

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


(5, 3)

In [5]:
d.ndim # How many dimensions in array d

2

In [6]:
d.size # Number of elements in the array.

15
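
The vocabulary above becomes useful once a routine takes an `axis` argument. The sketch below reuses the same (5,3) array of ones and sums along each axis; `col_sums` and `row_sums` are names chosen just for this example.

```python
import numpy as np

d = np.ones((5, 3))  # Same array as above: five rows, three columns

# Axis 0 runs down the rows, axis 1 runs across the columns.
col_sums = d.sum(axis=0)  # collapse the 5 rows: one value per column
row_sums = d.sum(axis=1)  # collapse the 3 columns: one value per row

print(col_sums)  # [5. 5. 5.]
print(row_sums)  # [3. 3. 3. 3. 3.]

# The rank is the length of the shape, and the size is the product of the shape.
print(d.ndim == len(d.shape))             # True
print(d.size == d.shape[0] * d.shape[1])  # True
```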

You can create arrays of arbitrary rank ("dimension"). Below is a 3D array with the shape (5,4,3) and a 4D array with the shape (5,4,3,2).

In [7]:
e = np.ones((5,4,3))
print(e)

[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]


In [8]:
f = np.ones((5,4,3,2))
print(f)

[[[[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]]


 [[[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]]


 [[[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]]


 [[[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]]


 [[[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]

  [[1. 1.]
   [1. 1.]
   [1. 1.]]]]


Arrays in NumPy are stored in row-major order by default (see Wikipedia if of interest). Let's generate a 2D array with 3 rows and 2 columns.

In [9]:
g = np.array([[1,2],[3,4],[5,6]])
print(g)
g1 = g[1,:] # Select 2nd row (index 1) and all columns
print(g1)

[[1 2]
 [3 4]
 [5 6]]
[3 4]
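
As a complement, the sketch below selects a column instead of a row and shows what row-major order means when the array is flattened. The variable names are illustrative only.

```python
import numpy as np

g = np.array([[1, 2], [3, 4], [5, 6]])

g_col = g[:, 0]  # All rows, first column (index 0)
print(g_col)     # [1 3 5]

# Flattening in the default row-major ("C") order walks along each row in turn.
row_major = g.ravel()
print(row_major)  # [1 2 3 4 5 6]

# Column-major ("F", Fortran-style) order walks down each column instead.
col_major = g.ravel(order="F")
print(col_major)  # [1 3 5 2 4 6]
```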


We can use the function `arange()` (which is similar to the `range()` function we used in the previous exercise) to create an array. Here we create an array with numbers starting at 1 and going up to (but not including) 10 in increments of 2.

In [10]:
f = np.arange(1,10,2)
print(f)

[1 3 5 7 9]
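
A short sketch of how the stop value behaves in `arange()`, with `np.linspace()` shown as an alternative when you would rather give the number of points than the step. The variable names are only for illustration.

```python
import numpy as np

odd = np.arange(1, 10, 2)
print(odd)      # [1 3 5 7 9]

shorter = np.arange(1, 9, 2)
print(shorter)  # [1 3 5 7]  (the stop value 9 is excluded)

default = np.arange(5)
print(default)  # [0 1 2 3 4]  (start defaults to 0, step to 1)

# linspace takes the number of points and includes the end point by default.
evenly = np.linspace(1, 9, 5)
print(evenly)   # [1. 3. 5. 7. 9.]
```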


# Common numpy routines

There are several numpy routines you will come across and we will briefly mention a few. More details can be found in the API reference for numpy.

We will start with `numpy.random` . Here you can find routines to create (pseudo) random numbers.

The `rand()` method will create an array with random samples from a uniform distribution over [0, 1), i.e. 0 can occur but 1 cannot. Pay attention to how this method is called.

In [11]:
my_rand = np.random.rand(5) # Create an array with 5 random numbers
print(my_rand)

[0.40111037 0.05295888 0.97166848 0.09303534 0.25769262]


The `randn()` method will create an array with samples from the standard normal distribution.

In [12]:
my_randn = np.random.randn(10)
print(my_randn)

[-0.86424633 -0.91470183  0.6093441   0.65240508  0.63797835  0.22969819
 -0.02747473  0.14436115 -0.02235886 -0.03111783]
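
The numbers above will differ every time the cells are run. If you need reproducible results, one option (not used elsewhere in this exercise) is NumPy's `Generator` interface with a fixed seed. The seed value 42 below is an arbitrary choice.

```python
import numpy as np

# A seeded generator: an alternative to np.random.rand / np.random.randn.
rng = np.random.default_rng(seed=42)      # 42 is an arbitrary seed
uniform_samples = rng.random(5)           # uniform on [0, 1)
normal_samples = rng.standard_normal(10)  # standard normal distribution

# A second generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(uniform_samples, rng_again.random(5))
print(same)  # True
```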


# Mathematical and statistical methods on numpy arrays
There are several mathematical and statistical methods available for the `ndarray` . Some examples below:

In [13]:
np.mean(my_randn) # Mean value of the my_randn vector

0.04138872716626472

In [14]:
np.sum(my_randn) # Summation over all the elements of my_randn

0.41388727166264716

In [15]:
np.square(my_randn) # Elementwise "squaring" of my_randn

array([7.46921724e-01, 8.36679436e-01, 3.71300231e-01, 4.25632387e-01,
       4.07016369e-01, 5.27612583e-02, 7.54860800e-04, 2.08401402e-02,
       4.99918767e-04, 9.68319429e-04])
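
Because `my_randn` is random, the values above are hard to check by hand. The sketch below repeats the same routines on a small fixed array (`values`, made up for this example) where the results are easy to verify.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])  # fixed values, chosen for illustration

total = np.sum(values)
average = np.mean(values)
squares = np.square(values)

print(total)    # 10.0
print(average)  # 2.5
print(squares)  # [ 1.  4.  9. 16.]

# The mean is simply the sum divided by the number of elements.
print(total / values.size == average)  # True

# Many routines also exist as methods on the array itself.
print(values.mean())  # 2.5
```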

# Linear algebra

Linear algebra is fundamental to machine learning, and NumPy implements several routines for it.

In [16]:
a2 = np.array([2,3])
b2 = np.array([4,1])
c2 = a2*b2 # Elementwise multiplication
print(c2)

[8 3]


Writing `a2*b2` performs an *elementwise multiplication* of the vectors `a2` and `b2`. To take the dot product (a.k.a. the inner product or scalar product), you need to use the `dot()` function.

In [17]:
d2 = np.dot(a2,b2) # (2*4)+(3*1)
print(d2)

11
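
For 1-D arrays the same dot product can be written in a few equivalent ways. The sketch below compares `np.dot()`, the `@` operator, and summing the elementwise product.

```python
import numpy as np

a2 = np.array([2, 3])
b2 = np.array([4, 1])

via_dot = np.dot(a2, b2)       # the function used above
via_operator = a2 @ b2         # the @ (matrix multiplication) operator
via_sum = np.sum(a2 * b2)      # sum of the elementwise products

print(via_dot, via_operator, via_sum)  # 11 11 11
```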


As an example from the book, a trained linear regression model in matrix form could look like
\begin{equation}
\bf{ \hat{y} = X\theta}
\end{equation}
Suppose we have our input data **X** (a 3-by-3 matrix) as below. Each row is one measurement, for example.
\begin{align}
\bf{X} = \begin{bmatrix} 1 & 2.4 & 3.2 \\ 1 & 3.4 & 0.9 \\ 1 & 2.3 & 4.1 \end{bmatrix}
\end{align}
And our $\bf{θ}$ (3-by-1 matrix) as
\begin{align}
\bf{θ} = \begin{bmatrix} 7.4 \\ 3.2 \\ 4.1 \end{bmatrix}
\end{align}
We can compute $\bf{\hat{y}}$ as follows

In [18]:
# Let's define X and print it out
X = np.array([[1,2.4,3.2],[1,3.4,0.9],[1,2.3,4.1]])
print(X)

[[1.  2.4 3.2]
 [1.  3.4 0.9]
 [1.  2.3 4.1]]


In [19]:
# Let's define theta and print it out
theta = np.array([7.4, 3.2, 4.1])
print(theta)

[7.4 3.2 4.1]


In [20]:
# Now let's compute y_hat
y_hat = X*theta
print(y_hat)

[[ 7.4   7.68 13.12]
 [ 7.4  10.88  3.69]
 [ 7.4   7.36 16.81]]


The above doesn't look right. And it isn't: `*` performs an *elementwise* multiplication. In this case we want the matrix multiplication version (dot product), and we can do that in the following way.

In [21]:
y_hat = [] # Get rid of old result
y_hat = np.dot(X,theta)
print(y_hat)

[28.2  21.97 31.57]
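
To see how the two results are related, here is a small sketch. With `*`, NumPy multiplies every row of `X` elementwise by `theta` (this is called broadcasting), and summing each row of that result gives the matrix product. The names `elementwise` and `row_sums` are made up for this example.

```python
import numpy as np

X = np.array([[1, 2.4, 3.2], [1, 3.4, 0.9], [1, 2.3, 4.1]])
theta = np.array([7.4, 3.2, 4.1])

elementwise = X * theta              # theta is multiplied into every row of X
row_sums = elementwise.sum(axis=1)   # add up each row
y_hat = np.dot(X, theta)             # matrix-vector product

print(elementwise.shape)             # (3, 3)
print(np.allclose(row_sums, y_hat))  # True
```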


That looks more like it: one prediction per row of $\bf{X}$. One thing to keep in mind here is that our $\bf{θ}$ is not a (3-by-1) matrix. It is a 1-D array. It is not equivalent to a "row" array or a "column" array (as would be the case in Matlab). We can see that by looking at the shape of the array.

In [22]:
np.shape(theta)

(3,)

It just says the shape is (3,): a single axis of length 3, with nothing to mark it as a row or a column. So let us define another array, $\bf{θ}_1$, as a **row vector**.
\begin{align}
\bf{θ}_1 = \begin{bmatrix} 7.4 & 3.2 & 4.1\end{bmatrix}
\end{align}

In [23]:
theta1 = np.array([7.4, 3.2, 4.1])
theta1 = theta1.reshape(1, 3) # One row, three columns
print(theta1.shape)
print(theta1)

(1, 3)
[[7.4 3.2 4.1]]
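
There are several ways to turn a 1-D array into a row vector. The sketch below shows three that give the same (1,3) shape; the names `row_a`, `row_b` and `row_c` are only illustrative.

```python
import numpy as np

theta = np.array([7.4, 3.2, 4.1])  # 1-D array, shape (3,)

row_a = theta.reshape(1, 3)    # spell out the full shape
row_b = theta.reshape(1, -1)   # -1 lets NumPy infer the length
row_c = theta[np.newaxis, :]   # insert a new axis in front

print(row_a.shape, row_b.shape, row_c.shape)  # (1, 3) (1, 3) (1, 3)

# None of these change the original array.
print(theta.shape)  # (3,)
```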


If we try to compute
\begin{equation}
\bf{ \hat{y} = X\theta_1}
\end{equation}
we will get an error in this case.

In [24]:
y_hat = np.dot(X,theta1)

ValueError: shapes (3,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)

This is because $\bf{θ}_1$ is a **row vector** with shape (1,3), so its number of rows (1) does not match the number of columns of $\bf{X}$ (3). We can get the result we want by flipping $\bf{θ}_1$ into a column vector by taking the transpose.



In [25]:
y_hat1 = np.dot(X,theta1.T)
print(y_hat1)

[[28.2 ]
 [21.97]
 [31.57]]


You can now also see that the shape of y_hat1 is (3,1) as opposed to (3,).
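
The two results hold the same numbers and differ only in shape. A minimal sketch of that comparison is given below; it builds the column vector with `reshape(-1, 1)` instead of a transpose, and `y_1d` and `y_2d` are names chosen for this example.

```python
import numpy as np

X = np.array([[1, 2.4, 3.2], [1, 3.4, 0.9], [1, 2.3, 4.1]])
theta = np.array([7.4, 3.2, 4.1])
theta_col = theta.reshape(-1, 1)  # column vector, shape (3, 1)

y_1d = np.dot(X, theta)       # 1-D result
y_2d = np.dot(X, theta_col)   # 2-D result with a single column

print(y_1d.shape, y_2d.shape)  # (3,) (3, 1)

# Flattening the (3, 1) result gives back the 1-D values.
print(np.allclose(y_1d, y_2d.ravel()))  # True
```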

That is it for this (very) short tutorial on NumPy. You can find more tutorials online, like this one https://numpy.org/doc/stable/user/absolute_beginners.html or this one https://numpy.org/doc/stable/user/basics.html

### The end