# Math 1 - Introduction to Numpy

In [1]:
import numpy as np
from matplotlib import pyplot as pt

This notebook contains some examples of working with **Numpy**, an n-dimensional array package for `Python` and part of the **SciPy Ecosystem**. My knowledge of math is very limited, but I have used Numpy several times for machine learning and image processing. The focus of this notebook is therefore not on hardcore math but on giving a simple introduction to Numpy.

## 1.  Let's start
Create Numpy arrays from Python lists to represent vectors and matrices. The array (`ndarray`) is the basic data type in `numpy`.

In [2]:
v = np.array([[1], [1], [2], [3], [5], [8], [13]]) # vector representation
m = np.array([[1, 5], [2, 4]]) # matrix representation

In [3]:
print('shape of v', v.shape)
print('shape of m', m.shape)

shape of v (7, 1)
shape of m (2, 2)


When you look at the arrays you'll see that the data is arranged differently.

Vector:

In [4]:
v

array([[ 1],
       [ 1],
       [ 2],
       [ 3],
       [ 5],
       [ 8],
       [13]])

Matrix:

In [5]:
m

array([[1, 5],
       [2, 4]])
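A flat Python list, by contrast, yields a one-dimensional array of shape `(N,)` with no row or column orientation. The short sketch below shows how such an array relates to the column vector above; the names `flat`, `column` and `row` are only illustrative.

```python
import numpy as np

flat = np.array([1, 1, 2, 3, 5, 8, 13])  # 1-D array, no row/column orientation
column = flat.reshape(-1, 1)              # 7 rows, 1 column (same layout as `v`)
row = flat.reshape(1, -1)                 # 1 row, 7 columns

print(flat.shape, column.shape, row.shape)
```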

### 1.1 Create numpy arrays from functions
Numpy has several functions that create arrays automatically. The function `arange(N)` returns evenly spaced integer values in the half-open interval `[0, N)`, so `N` itself is not included.

In [6]:
np.arange(5)

array([0, 1, 2, 3, 4])

With two arguments, the first is the start value and the second is the stop value (again excluded).

In [7]:
np.arange(1, 10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

A third argument sets the step size.

In [8]:
np.arange(1, 10, 0.5)

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. ,
       7.5, 8. , 8.5, 9. , 9.5])
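Note that the stop value is excluded, so `9.5` is the last value here. As a small check, the same values can be produced with `linspace`, which is usually the safer choice for non-integer steps because the number of samples is stated explicitly. The variable names are assumptions for this example.

```python
import numpy as np

stepped = np.arange(1, 10, 0.5)     # start=1, stop=10 (excluded), step=0.5
sampled = np.linspace(1, 9.5, 18)   # 18 samples from 1 to 9.5 (included)

print(len(stepped))
print(np.allclose(stepped, sampled))
```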

Another useful function is `linspace(start, stop)`. It returns an array of evenly spaced samples over the given interval, including `stop` by default. The optional argument `num` sets the number of samples and defaults to 50.

In [9]:
np.linspace(1, 10)

array([ 1.        ,  1.18367347,  1.36734694,  1.55102041,  1.73469388,
        1.91836735,  2.10204082,  2.28571429,  2.46938776,  2.65306122,
        2.83673469,  3.02040816,  3.20408163,  3.3877551 ,  3.57142857,
        3.75510204,  3.93877551,  4.12244898,  4.30612245,  4.48979592,
        4.67346939,  4.85714286,  5.04081633,  5.2244898 ,  5.40816327,
        5.59183673,  5.7755102 ,  5.95918367,  6.14285714,  6.32653061,
        6.51020408,  6.69387755,  6.87755102,  7.06122449,  7.24489796,
        7.42857143,  7.6122449 ,  7.79591837,  7.97959184,  8.16326531,
        8.34693878,  8.53061224,  8.71428571,  8.89795918,  9.08163265,
        9.26530612,  9.44897959,  9.63265306,  9.81632653, 10.        ])

In [10]:
np.linspace(1, 10, num=5)

array([ 1.  ,  3.25,  5.5 ,  7.75, 10.  ])
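Two further optional arguments are sometimes handy: `endpoint=False` excludes `stop`, and `retstep=True` additionally returns the spacing between samples. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

# retstep=True returns a tuple: (samples, spacing between neighbouring samples)
samples, step = np.linspace(1, 10, num=5, retstep=True)
print(step)

# endpoint=False leaves out the stop value, so the spacing becomes (10 - 1) / 5
open_samples = np.linspace(1, 10, num=5, endpoint=False)
print(open_samples)
```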

The arguments `start` and `stop` can also be array-like, which influences the shape of the result.

In [11]:
np.linspace(1, [10], num=5)

array([[ 1.  ],
       [ 3.25],
       [ 5.5 ],
       [ 7.75],
       [10.  ]])

In [12]:
np.linspace([1,1], [10, 20], num=5)

array([[ 1.  ,  1.  ],
       [ 3.25,  5.75],
       [ 5.5 , 10.5 ],
       [ 7.75, 15.25],
       [10.  , 20.  ]])

The `axis` argument can be used to specify along which axis the samples should be stored. 

In [13]:
np.linspace([1], [10], num=5, axis=1)

array([[ 1.  ,  3.25,  5.5 ,  7.75, 10.  ]])

In [14]:
np.linspace([1, 1], [10, 10], num=5, axis=1)

array([[ 1.  ,  3.25,  5.5 ,  7.75, 10.  ],
       [ 1.  ,  3.25,  5.5 ,  7.75, 10.  ]])

**Only ones or zeros**

Vectors and matrices with only zeros or ones can be generated easily with the functions `zeros()` and `ones()`.

In [15]:
np.zeros([5, 1])

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.]])

In [16]:
np.ones([1, 5])

array([[1., 1., 1., 1., 1.]])

In [17]:
np.ones([5, 5])

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

### 1.2 Working with the shape
As seen before, the shape determines how the data is interpreted. A row-vector has a shape of the form `(1, N)`.

In [18]:
v = np.ones([1, 5])
v.shape

(1, 5)

The vector `v` looks like this:

In [19]:
v

array([[1., 1., 1., 1., 1.]])

Numpy has a function called `reshape()`. This function changes the shape and therefore the interpretation of the underlying data. As an example, the row-vector `v` can be interpreted as column-vector.

In [20]:
v.reshape((5, 1))

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.]])

For a vector, this is the same as the **transposition**.

In [21]:
v.T

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.]])

The same thing can be done with a matrix.

In [22]:
m = np.linspace(1, [5, 5, 5, 5], num=5)
m.shape

(5, 4)

The matrix `m` as defined above:

In [23]:
m

array([[1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.]])

and with a new shape:

In [24]:
m.reshape((4, 5))

array([[1., 1., 1., 1., 2.],
       [2., 2., 2., 3., 3.],
       [3., 3., 4., 4., 4.],
       [4., 5., 5., 5., 5.]])

**But wait**, this is not the same as the transposition of the matrix `m`.

In [25]:
m.T

array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])

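The `reshape()` function accepts any shape with the same total number of elements (here 20) and keeps the elements in their original row-by-row order, whereas the transposition swaps rows and columns. So the two are not the same in every case. Below are two further examples.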

In [26]:
m.reshape((2, 10))

array([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3.],
       [3., 3., 4., 4., 4., 4., 5., 5., 5., 5.]])

In [27]:
m.reshape(1, 20)

array([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3., 4., 4., 4., 4.,
        5., 5., 5., 5.]])
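To make the difference concrete, here is a small sketch with an illustrative matrix `x` (not the `m` from above). It assumes Numpy's default row-major order: `reshape()` keeps the flattened element order, while `.T` changes it. Passing `-1` lets Numpy infer one dimension from the others.

```python
import numpy as np

x = np.arange(20).reshape(5, 4)   # values 0..19 arranged in 5 rows, 4 columns

reshaped = x.reshape(4, 5)        # same element order, new row length
transposed = x.T                  # rows and columns swapped

same_order = np.array_equal(reshaped.ravel(), x.ravel())
same_as_transpose = np.array_equal(reshaped, transposed)
print(same_order, same_as_transpose)

inferred = x.reshape(2, -1)       # -1 is inferred from the 20 elements
print(inferred.shape)
```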

## 2. Calculations
Now it's time to do some calculations with the vectors and matrices. Numpy provides several functions in the field of **Linear Algebra**. In this chapter we take a look at the very basic part.

Numpy reference for linear algebra: https://docs.scipy.org/doc/numpy/reference/routines.linalg.html

### 2.1 Basic arithmetic
The arithmetic operations (+, -, *, /) can be used with the standard python operators.

In [28]:
a = np.linspace(1, 10, num=5)
b = np.linspace(1, 30, num=5)

In [29]:
a + b

array([ 2. , 11.5, 21. , 30.5, 40. ])

In [30]:
a - b

array([  0.,  -5., -10., -15., -20.])

In [31]:
a * b

array([  1.    ,  26.8125,  85.25  , 176.3125, 300.    ])

In [32]:
a / b

array([1.        , 0.39393939, 0.35483871, 0.34065934, 0.33333333])

`a` and `b` are one-dimensional arrays that represent vectors. In linear algebra, a plain multiplication or division of two vectors is not defined (on the other hand, it's possible to multiply a vector by a scalar). So, what happens here? The examples above are **element-wise** operations on the underlying arrays: each element of `a` is combined with the element of `b` at the same position. Therefore, the multiply and divide calculations work in Numpy but do not correspond to a vector operation.
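As a rough illustration of what element-wise means, the sketch below compares `a * b` with an explicit Python loop over pairs of elements and also shows multiplication by a scalar. The names `looped` and `scaled` exist only for this example.

```python
import numpy as np

a = np.linspace(1, 10, num=5)
b = np.linspace(1, 30, num=5)

# Multiply the elements pair by pair, the slow and explicit way
looped = np.array([x * y for x, y in zip(a, b)])
print(np.allclose(a * b, looped))

# Multiplying by a scalar scales every component
scaled = 2 * a
print(scaled)
```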

### 2.2 Dot (or scalar) product
There are other strategies for multiplying vectors. One of them is the dot (or scalar, or inner) product. It is defined as follows, where $a$ and $b$ are two vectors with $n$ components each.

$$a \bullet b = \sum_{i=1}^n a_i \cdot b_i = a_1 \cdot b_1 + a_2 \cdot b_2 + \ldots + a_n \cdot b_n$$

The result is a scalar (a single number). Numpy has a function called `dot()` to calculate the dot product.

In [33]:
np.dot(a, b)

589.375
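The definition can be checked directly: summing the element-wise products gives the same number. The `@` operator is an equivalent spelling for 1-D arrays (it requires Python 3.5 or later). The variable names are illustrative.

```python
import numpy as np

a = np.linspace(1, 10, num=5)
b = np.linspace(1, 30, num=5)

by_formula = np.sum(a * b)   # sum of the element-wise products
by_operator = a @ b          # matrix-multiplication operator on 1-D arrays
print(by_formula, by_operator)
```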

The dot product has several applications. For example, it can be used to calculate the length (Euclidean norm) of a vector $v$: $||v|| = \sqrt{v \bullet v}$

In [35]:
v = np.arange(1, 5)  # 1-D array [1, 2, 3, 4]
np.sqrt(np.dot(v, v))  # square root of the dot product of v with itself

5.477225575051661

The `linalg` module of Numpy already provides a function `norm()` to calculate the length.

In [36]:
np.linalg.norm(v)

5.477225575051661

With the length we are able to normalize a vector, which means that each component of the vector is divided by the length of the vector. The result is a **unit vector** with the same direction as the original vector but a length of `1`.

In [37]:
def normalize_vector(v):
    v_length = np.linalg.norm(v)
    return v * (1 / v_length)

The vector `v` from above but normalized.

In [38]:
normalize_vector(v)

array([0.18257419, 0.36514837, 0.54772256, 0.73029674])

When we now calculate the length of the normalized vector, we get almost exactly `1`. The tiny difference is floating-point rounding error, so the result is correct.

In [39]:
np.linalg.norm(normalize_vector(v))

0.9999999999999999

Numpy has a function `allclose()` to check whether two arrays are equal within a tolerance. It can also be used to verify the result.

In [41]:
np.allclose(np.linalg.norm(normalize_vector(v)), 1)

True
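One caveat: `normalize_vector()` divides by the length, so a zero vector has no unit vector and would produce `nan` values. A possible guard is sketched below. The function name `normalize_vector_safe` and the `eps` threshold are assumptions for illustration, not part of Numpy.

```python
import numpy as np

def normalize_vector_safe(v, eps=1e-12):
    """Return v scaled to length 1; raise if v is (almost) the zero vector."""
    v = np.asarray(v, dtype=float)
    v_length = np.linalg.norm(v)
    if v_length < eps:  # hypothetical threshold, tune as needed
        raise ValueError("cannot normalize a zero-length vector")
    return v / v_length

unit = normalize_vector_safe(np.arange(1, 5))
print(np.linalg.norm(unit))
```

Raising an error is only one option; returning the zero vector unchanged would be another reasonable design, depending on the application.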

Numpy has no function that directly normalizes a vector to a unit vector, but **scikit-learn** does. That's another useful Python library that provides many functions for machine learning. Numpy arrays are compatible with sklearn, but the `normalize()` function expects a two-dimensional array, so we have to reshape the vector first.

In [42]:
from sklearn.preprocessing import normalize
normalize(v.reshape((1,-1)))

array([[0.18257419, 0.36514837, 0.54772256, 0.73029674]])
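If scikit-learn is not available, the same row-wise normalization can be approximated with plain Numpy. This is a sketch that assumes the default behaviour of `normalize()` (L2 norm, applied per row); the array `rows` is made up for the example.

```python
import numpy as np

rows = np.array([[1.0, 2.0, 3.0, 4.0],
                 [3.0, 4.0, 0.0, 0.0]])

# keepdims=True keeps the shape (2, 1), so each row is divided by its own length
lengths = np.linalg.norm(rows, axis=1, keepdims=True)
unit_rows = rows / lengths
print(unit_rows)
```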

## 3. Conclusion
Numpy is very useful and widely used in Python libraries like scikit-learn. This notebook shows just a small part of it. I therefore really recommend taking a look at the official documentation: https://docs.scipy.org/doc/numpy/reference/. It contains a great deal of high-quality information and examples!