<a id="toc"></a>

## Theory NumPy

- [Why not native Python lists](#why)
  - Benchmarks
<br>

- [ARRAYS](#arrays)
  - np.array
  - types
  - np.arange
  - np.linspace
<br>

- [IO](#io)
  - loadtxt, savetxt
<br>     

- [OPERATIONS](#ops)
  - Additions / Multiplications
  - Comparisons
  - Broadcasting
<br>     

- [WHERE](#indexing)
  - Indexing & slicing
  - Where & Boolean indexing
  - argmax, argmin
  - copy *vs.* view
<br>  

- [AGG & FUNCTIONS](#agg)
  - sum, mean,...
  - np.nan
  - nansum, nanmean
  - ufunc
<br>  

- [CONCATENATE](#concat)
    - concatenate
<br>   

- [Matrix and linear algebra](#la)
    - Vectors
    - Matrices
    - Rotation
    - Inner product
    - Einsum

<a id = "why"></a>

# Why not native?

[toc](#toc)

Basic syntax / design differences between Python and C

```python
# Python
result = 0  
for i in range(100):
    result += 1 
```

```C
/* C */
int result = 0; 
for (int i = 0; i < 100; i++) 
    {
        result += 1; 
    }

```
--- 

```python
# Python is implicit / built in
result = 0  # Could be anything, probably integer good enough, 
            # less code, more readable
for i in range(100):
    result +=1 # "what kind/type result here?" , 
               # "what is addition for this type?"
               # much slower
```
```C
/* C  is very explicit */
int result = 0; // Result can only be an integer, 
                // more code, less readable 
for (int i=0; i<100; i++) // how to do a for loop here
    {
        result += 1; // "I know what kind/type result is",
                     // "I already know addition for integers"
                     // much faster
    }

```
---
Design choices: "dynamic" types vs "static" types

```python
# Python is dynamic
x = 4
x = 'four' ## "ok, now x is 'four', got it"

```

```C
/* C is static */
int x = 4;
x = "four"; // error, "you told me x is an integer"
```

<br>

## A Python integer vs a C integer

- Much more memory is required to account for the dynamic aspects

![](static/py_vs_C_integer.png)
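
The overhead can be estimated from Python itself. This is a minimal sketch, assuming CPython on a 64-bit machine; exact byte counts vary with the Python version and platform.

```python
import sys
import numpy as np

# A Python int is a full object: besides the value it carries
# bookkeeping such as a reference count and a pointer to its type
py_int_bytes = sys.getsizeof(1)

# A NumPy int64 element is only the raw 8-byte value in the array buffer
np_int_bytes = np.dtype(np.int64).itemsize

print(f"Python int : {py_int_bytes} bytes")
print(f"NumPy int64: {np_int_bytes} bytes")
```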

---

## Python list

- Information like types is stored in each individual cell of the list
    - This is what makes it possible to store anything in a list (= very dynamic)

![](static/py_list.png)

---

## Numpy array

- In data, we often (always?) have the same data type within a single column:
    - We can therefore hope to "factor out" those types
    - We are willing to sacrifice some "dynamism" in exchange for speed, without too much effort

![](static/numpy_list.png)
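
A small sketch of this difference (the variable names are illustrative only): a list keeps each element's own type, while an array converts all values to one shared dtype.

```python
import numpy as np

# A list stores references to independent objects, each with its own type
mixed_list = [1, "two", 3.0]
types_in_list = [type(el).__name__ for el in mixed_list]
print(types_in_list)

# An array stores one dtype for the whole buffer: values are converted to fit
arr = np.array([1, 2, 3.0])
print(arr.dtype)
```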

## What is NumPy?

- **Num**eric **Py**thon
- A library / module with functionalities for numeric computation
    - Simple usage:
        - Faster Python lists
        - Math functions
        - Random number generators
    - Advanced:
        - Tools for linear algebra
        - Fourier transforms  

- Basically a toolbox:
  - Objects and functions for Python users (= easy/readable)
  - But "written" in C / C++ / Fortran (= fast/performance)

- https://numpy.org/

### Benchmarks: aggregations

In [1]:
import numpy as np
import array ## native python arrays
import sys

do_benchmarks = True

In [2]:
if do_benchmarks:

    ### Native Python

    big_list = list(range(1_000_000))
    print(sys.getsizeof(big_list)) ## (Flawed) memory footprint estimate: counts the list itself, not the int objects it references
    %timeit sum(big_list)

8000056
42.1 ms ± 4.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [3]:
if do_benchmarks:

    ### Native Python tuple
    
    big_list = tuple(range(1_000_000))
    print(sys.getsizeof(big_list))
    %timeit sum(big_list)

8000040
41.9 ms ± 3.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [4]:
if do_benchmarks:

    ### Native Python array module: optimised memory usage: same type for all elements

    big_list = array.array("i", range(1_000_000))
    print(sys.getsizeof(big_list))
    %timeit sum(big_list)

4091948
60.2 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [6]:
if do_benchmarks:

    ### Numpy: C-under the hood

    big_array = np.arange(1_000_000)
    print(sys.getsizeof(big_array))
    %timeit np.sum(big_array)

4000112
735 µs ± 76.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [7]:
40 / 0.4

100.0

In [8]:
### Using generators: not as relevant in data:
### In data: we use already existing values, we don't "generate" new ones

if do_benchmarks:

    generator_big_list = range(1_000_000) ## Lazy sequence: stores how to "produce" the next value, not the values themselves
    print(sys.getsizeof(generator_big_list))
    %timeit sum(generator_big_list)

48
56.4 ms ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


### Benchmarks: simple addition

In [9]:
if do_benchmarks:
    
    big_list = list(range(1_000_000))
    
    def f0(lst):
        for i, el in enumerate(lst):
            lst[i] = el + 1  ## One Python-level addition per element

    %timeit f0(big_list)

127 ms ± 9.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [11]:
if do_benchmarks:
    
    big_array = np.arange(1_000_000)

    def f0(arr):
        for i, el in enumerate(arr):
            arr[i] = el + 1  ## Python-level loop: the array is not used in a vectorised way

    %timeit f0(big_array)

196 ms ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [12]:
if do_benchmarks:

    big_array = np.arange(1_000_000)
    def f0(big_array):
        big_array = big_array + 1 ## Vectorised computation: same operation "at once"

    %timeit f0(big_array)

2.8 ms ± 842 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


#### Conclusion

- NumPy **can** speed things up dramatically, but this requires some **rewrites / refactoring** of native Python code

In [13]:
print(156 / 3.96)
print(254 / 3.96)

39.39393939393939
64.14141414141415
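
`%timeit` is an IPython/Jupyter magic and is not available in a plain Python script. The comparison can be approximated there with the standard `timeit` module. This is a sketch only: the function names and the smaller array size are choices made for this example, the loop builds a new list instead of modifying in place, and timings depend on the machine.

```python
import timeit
import numpy as np

n = 100_000  # smaller than in the cells above, to keep the run short

def add_one_loop(values):
    """Element-by-element addition in a Python loop."""
    return [el + 1 for el in values]

def add_one_vectorised(values):
    """Same operation applied to the whole array at once."""
    return values + 1

data_list = list(range(n))
data_array = np.arange(n)

t_loop = timeit.timeit(lambda: add_one_loop(data_list), number=10)
t_vec = timeit.timeit(lambda: add_one_vectorised(data_array), number=10)
print(f"loop: {t_loop:.4f} s, vectorised: {t_vec:.4f} s")

# Both approaches give the same values
same_result = add_one_loop(data_list) == add_one_vectorised(data_array).tolist()
print(same_result)
```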


## What are the most important things to remember from NumPy?

0. NumPy arrays are faster lists (if used correctly)
1. Creation of np.array
2. Indexing
3. Boolean indexing
4. Aggregation functions along axes
5. np.nan
6. Datatypes

## Python as a data science ecosystem

source: jupytearth.org
![](static/python_stack.png)

<a id = "arrays"></a>

# Arrays

[toc](#toc)

In [14]:
import numpy as np ## np is a convention

Quick help

```py 
import numpy as np
np? 
```
  
documentation  
https://numpy.org/doc/stable/reference/index.html

In [15]:
### py
lst_a = list("abcde")
lst_1 = list(range(5))

### np
arr_a = np.array(lst_a)
arr_1 = np.array(lst_1)

In [16]:
print(arr_a)
display(arr_a) ## Jupyter Notebooks
arr_a          ## Jupyter Notebooks

['a' 'b' 'c' 'd' 'e']


array(['a', 'b', 'c', 'd', 'e'], dtype='<U1')

array(['a', 'b', 'c', 'd', 'e'], dtype='<U1')

In [17]:
print(np.arange(0,5,1))   ## Similar to list(range(0,5,1)): start (included) = default 0, stop (excluded), step = default 1  

print(np.linspace(0,1,5)) ## 0 (included) to 1 (included), 5 values in total
print("-"*50)
print(np.zeros((3,4)))
print("-"*50)
print(np.ones((3,4)))

[0 1 2 3 4]
[0.   0.25 0.5  0.75 1.  ]
--------------------------------------------------
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
--------------------------------------------------
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


### Random sampling / generation

https://numpy.org/doc/stable/reference/random/index.html

In [18]:
display(np.random.randint(-1,5,(2,3))) ## min(included), max(excluded), shape

display(np.random.normal(5,1,(2,3)))   ## mean, std, shape

array([[0, 1, 2],
       [0, 1, 3]])

array([[6.36184299, 3.699352  , 5.51855504],
       [5.56458491, 5.95763472, 4.63653286]])
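
The draws above change on every run. For reproducible results, one option is the `Generator` interface created with `np.random.default_rng`. The sketch below assumes an arbitrary seed value of 42.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is an arbitrary choice

ints = rng.integers(-1, 5, size=(2, 3))            # low (included), high (excluded), shape
normals = rng.normal(loc=5, scale=1, size=(2, 3))  # mean, std, shape

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
ints_again = rng_again.integers(-1, 5, size=(2, 3))
print(np.array_equal(ints, ints_again))
```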

### Numpy data types

https://numpy.org/doc/stable/user/basics.types.html

- np.int32
- np.float64
- np.uint8
- ...

In [19]:
display(np.array([1, 2 ,3.14])) ## float because 3.14 is float and float > int

arr_int = np.array([1,2,3], dtype = np.int64) ## default int is platform-dependent (int32 on this machine)
display(arr_int)

arr_float = arr_int.astype(np.float16)
display(arr_float)

arr_float = np.array(arr_int, dtype = np.float32)
display(arr_float)

array([1.  , 2.  , 3.14])

array([1, 2, 3], dtype=int64)

array([1., 2., 3.], dtype=float16)

array([1., 2., 3.], dtype=float32)
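
Choosing a small dtype saves memory but limits the range of values. The sketch below (illustrative values only) shows the range of `uint8` and what happens when a result does not fit.

```python
import numpy as np

info = np.iinfo(np.uint8)  # integer limits of the dtype
print(info.min, info.max)

a = np.array([250, 251], dtype=np.uint8)
b = np.array([10, 10], dtype=np.uint8)

wrapped = a + b  # stays uint8: 260 and 261 do not fit, values wrap around modulo 256
print(wrapped)

safe = a.astype(np.int64) + b  # widen first to get the arithmetic result
print(safe)
```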

### Numpy array attributes

In [20]:
arr_2D = np.arange(6).reshape(2,3)
print(arr_2D)
print("-"*50)
print("ndim", arr_2D.ndim)   ## Number of levels
print("size", arr_2D.size)   ## Number of cells
print("shape", arr_2D.shape) ## Shape
print("dtype", arr_2D.dtype) ## Datatype
print("itemsize", arr_2D.itemsize) ## Memory footprint of one cell-values
print("nbytes", arr_2D.nbytes)     ## Memory footprint of whole array-values

[[0 1 2]
 [3 4 5]]
--------------------------------------------------
ndim 2
size 6
shape (2, 3)
dtype int32
itemsize 4
nbytes 24


In [21]:
arr_2D = np.arange(6, dtype = np.uint8).reshape(2,-1) ## Changed dtype
arr_2D = np.array([[0,1,2],[3,4,5]]).astype(np.uint8) ### Alternative
print(arr_2D)
print("-"*50)
print("ndim", arr_2D.ndim)
print("size", arr_2D.size) 
print("shape", arr_2D.shape)
print("dtype", arr_2D.dtype)
print("itemsize", arr_2D.itemsize)
print("nbytes", arr_2D.nbytes)

[[0 1 2]
 [3 4 5]]
--------------------------------------------------
ndim 2
size 6
shape (2, 3)
dtype uint8
itemsize 1
nbytes 6


In [22]:
arr_2D[0,0] = np.nan
arr_2D

ValueError: cannot convert float NaN to integer
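
The error arises because `np.nan` is a float and has no integer representation. A possible workaround, sketched here, is to convert the array to a float dtype before assigning the missing value.

```python
import numpy as np

arr_int = np.arange(6, dtype=np.uint8).reshape(2, 3)

arr_float = arr_int.astype(np.float64)  # astype returns a converted copy
arr_float[0, 0] = np.nan                # allowed: float arrays can hold NaN

print(arr_float)
print(arr_int)  # the original integer array is unchanged
```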

In [23]:
for el in [np.array([]), np.nan, np.inf, np.uint8, np.int32, np.float16, np.float64]:
    print(el,":",type(el))

[] : <class 'numpy.ndarray'>
nan : <class 'float'>
inf : <class 'float'>
<class 'numpy.uint8'> : <class 'type'>
<class 'numpy.int32'> : <class 'type'>
<class 'numpy.float16'> : <class 'type'>
<class 'numpy.float64'> : <class 'type'>


In [24]:
print(np.inf + 1)
print(1 / np.inf)
print(np.inf * np.inf)
print(np.inf - np.inf)

inf
0.0
inf
nan


<a id = "io"></a>

# IO

[toc](#toc)  

Note: 
- We **can** import / read / write data via Numpy,
- But we will (usually) **use Pandas** for import / read / write instead (among other things)

In [25]:
print(arr_2D)

### Store into a csv
np.savetxt("./saves/mycsv.csv",
           arr_2D,
           delimiter = ";",
           fmt = "%d"  ## Keep same format as data
           )

[[0 1 2]
 [3 4 5]]


In [26]:
### Get from csv
from_csv = np.loadtxt("./saves/mycsv.csv",
                      delimiter = ";",
                      dtype = np.uint8, ## Must be the same datatype for all columns 
                                        ## (= limitation of Numpy, we will use Pandas later instead)
                      usecols = [0,2],  ## Use columns 0,2
                      skiprows = 0,     ## Don't skip any rows 
                      unpack = False    ## If True: return the transposed array, to unpack 
                                        ## into one variable per column
                      ) 
print(from_csv)

[[0 2]
 [3 5]]


## Save and load Numpy objects

- https://numpy.org/doc/stable/reference/generated/numpy.save.html
- https://numpy.org/doc/stable/reference/generated/numpy.load.html
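
A minimal round trip with `np.save` and `np.load` could look as follows. The file name `my_array.npy` and the temporary directory are assumptions made for this sketch. Unlike a text file, the binary `.npy` format keeps the dtype and shape.

```python
import os
import tempfile
import numpy as np

arr = np.arange(6, dtype=np.uint8).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_array.npy")  # hypothetical file name
    np.save(path, arr)      # binary .npy file
    loaded = np.load(path)  # dtype and shape are restored

print(loaded)
print(loaded.dtype, loaded.shape)
```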

<a id = "ops"></a>

# Operations

[toc](#toc)

In [27]:
arr_2 = 2 * np.arange(5,0,-1) + 2
display(arr_2)

array([12, 10,  8,  6,  4])

In [28]:
arr_2%3

array([0, 1, 2, 0, 1], dtype=int32)

In [29]:
arr_2 < 9

array([False, False,  True,  True,  True])

In [30]:
arr_2 + arr_1

array([12, 11, 10,  9,  8])

In [31]:
arr_3 = np.arange(6)

print(f"Shape of arr_1: {arr_1.shape}")
print(f"Shape of arr_3: {arr_3.shape}")

try:
    arr_1 + arr_3
except Exception as err:
    print(err)

Shape of arr_1: (5,)
Shape of arr_3: (6,)
operands could not be broadcast together with shapes (5,) (6,) 


### Broadcasting

- Normally: shapes must match to apply operations
- If they don't match, the rules of broadcasting apply:
    https://numpy.org/doc/stable/user/basics.broadcasting.html
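
Whether two shapes are compatible can be checked without building any array. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available.

```python
import numpy as np

# Shapes are compared from the last axis backwards;
# two sizes are compatible when they are equal or one of them is 1
print(np.broadcast_shapes((2, 3), (3,)))    # row vector against a 2x3 array
print(np.broadcast_shapes((2, 3), (2, 1)))  # column vector against a 2x3 array
print(np.broadcast_shapes((1, 3), (3, 1)))  # both operands are stretched

try:
    np.broadcast_shapes((5,), (6,))  # 5 vs 6: neither equal nor 1
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```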

In [32]:
### Broadcasting
print("original\n", arr_2D)
print("-"*50)
print("-"*50)

print(arr_2D + 10) ### Add fixed value to array: add to ALL cells 
print("-"*50)

print(arr_2D + np.array([0,0,10])) ### Matching/compatible level 1
print(arr_2D + np.array([0,0,10]).reshape(1,-1))

print("-"*50)
print(arr_2D + np.array([0,10]).reshape(-1,1)) ### Matching/compatible level 0
print("-"*50)
print(np.array([[0,1,2]]) + np.array([[0],[10], [100]]))

original
 [[0 1 2]
 [3 4 5]]
--------------------------------------------------
--------------------------------------------------
[[10 11 12]
 [13 14 15]]
--------------------------------------------------
[[ 0  1 12]
 [ 3  4 15]]
[[ 0  1 12]
 [ 3  4 15]]
--------------------------------------------------
[[ 0  1  2]
 [13 14 15]]
--------------------------------------------------
[[  0   1   2]
 [ 10  11  12]
 [100 101 102]]


In [33]:
np.array([0,0,10])

array([ 0,  0, 10])

In [34]:
np.array([0,0,10]).reshape(1,-1)

array([[ 0,  0, 10]])

In [35]:
np.array([0,10]).reshape(-1,1)

array([[ 0],
       [10]])

<a id = "indexing"></a>

# Where

[toc](#toc)

### Indexing

In [36]:
print("original")
display(arr_2D)
print("-"*50)
print("-"*50)

display(arr_2D[0])       ## At rows (level/axis = 0), get position 0 (all "columns")
display(arr_2D[0,:])     ## Same thing as above, with "all columns" written explicitly
print("-"*50)
display(arr_2D[:,0])     ## At columns (level/axis = 1), get position 0 (all "rows")
print("-"*50)
display(arr_2D[-1,-1])   ## Get element from last row, last column

original


array([[0, 1, 2],
       [3, 4, 5]], dtype=uint8)

--------------------------------------------------
--------------------------------------------------


array([0, 1, 2], dtype=uint8)

array([0, 1, 2], dtype=uint8)

--------------------------------------------------


array([0, 3], dtype=uint8)

--------------------------------------------------


5

In [37]:
my_lst = [list(range(3)) for _ in range(2)]
print(my_lst)

[[0, 1, 2], [0, 1, 2]]


In [38]:
my_lst[:][-1]

[0, 1, 2]

In [39]:
display(arr_2D.reshape(2 ,-1, 1))   ## Array with 3 levels/dimensions/axes 
                                    ## Arbitrary number of levels possible
display(arr_2D.reshape(2 ,-1, 1)[0,-1, 0])  ## First element of level/axis = 0,(rows)
                                            ## Last  element of level/axis = 1 (columns)
                                            ## First element of level/axis = 2 (depth)

array([[[0],
        [1],
        [2]],

       [[3],
        [4],
        [5]]], dtype=uint8)

2

### Slicing

In [40]:
display(arr_1)
print("-"*50)
display(arr_1[1::2])    ## From index 1 to end, step of 2
display(arr_1[1::-1])   ## From index 1 back to the start, step of -1 
                        ## => reverse order: select [1, 0]

array([0, 1, 2, 3, 4])

--------------------------------------------------


array([1, 3])

array([1, 0])

In [41]:
print("original")
display(arr_2D)
print("-"*50)
print("-"*50)

display(arr_2D[:,:2])   ## All rows, up to (excluded) columns position = 2
print("-"*50)
display(arr_2D[1:2,:2]) ## For rows: from row pos = 1 up to pos 2 (excluded)
                        ## For columns : from start (= 0) up to pos 2 (excluded)

original


array([[0, 1, 2],
       [3, 4, 5]], dtype=uint8)

--------------------------------------------------
--------------------------------------------------


array([[0, 1],
       [3, 4]], dtype=uint8)

--------------------------------------------------


array([[3, 4]], dtype=uint8)

### Slicing != Indexing

In [42]:
display(arr_2D[0:1])
display(arr_2D[0])
print(arr_2D[0:1].shape == arr_2D[0].shape)
print(arr_2D[0:1] == arr_2D[0])

array([[0, 1, 2]], dtype=uint8)

array([0, 1, 2], dtype=uint8)

False
[[ True  True  True]]


### Boolean indexing

In [43]:
display(arr_2D)
print("-"*50)

cond = arr_2D <= 3
display(cond)
display(arr_2D[cond])

array([[0, 1, 2],
       [3, 4, 5]], dtype=uint8)

--------------------------------------------------


array([[ True,  True,  True],
       [ True, False, False]])

array([0, 1, 2, 3], dtype=uint8)

#### And / Or / Not

In [44]:
display(arr_2D)
print("-"*50)
cond = arr_2D < 2
cond2 = arr_2D > 4

cond3 = ~cond                 ## Not

display(arr_2D[cond | cond2]) ## Or
display(arr_2D[cond & cond2]) ## And

display(arr_2D[cond3])

array([[0, 1, 2],
       [3, 4, 5]], dtype=uint8)

--------------------------------------------------


array([0, 1, 5], dtype=uint8)

array([], dtype=uint8)

array([2, 3, 4, 5], dtype=uint8)
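
Two common pitfalls when combining conditions inline are sketched below (example values are illustrative): comparisons need parentheses, and the Python keywords `and` / `or` / `not` do not work element-wise.

```python
import numpy as np

arr = np.arange(6)

# Parentheses are needed: & binds more tightly than the comparison operators
mask = (arr > 1) & (arr < 4)
selected = arr[mask]
print(selected)

# The keyword `and` asks each operand for a single True/False value,
# which is ambiguous for an array with several elements
try:
    (arr > 1) and (arr < 4)
    keyword_worked = True
except ValueError:
    keyword_worked = False
print(keyword_worked)
```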

#### In

In [45]:
cond_ac = np.isin(arr_a,["a", "c"])
display(arr_a)
print("-"*50)
display(cond_ac)
display(arr_a[cond_ac]) ## Using position where true 
                        ## to select values from SAME array

display(arr_1[cond_ac]) ## Using position where true 
                        ## to select values from OTHER array

array(['a', 'b', 'c', 'd', 'e'], dtype='<U1')

--------------------------------------------------


array([ True, False,  True, False, False])

array(['a', 'c'], dtype='<U1')

array([0, 2])

#### Mind the shape / size

In [46]:
try:
    arr_2D[cond_ac]
except Exception as err:
    print(err)

boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 5


In [47]:
cond_3 = cond_ac[:3]
display(cond_3)
print("-"*50)
display(arr_2D)
display(arr_2D[:,cond_3])

array([ True, False,  True])

--------------------------------------------------


array([[0, 1, 2],
       [3, 4, 5]], dtype=uint8)

array([[0, 2],
       [3, 5]], dtype=uint8)

### Where function

In [48]:
np.where(arr_2D < 3, arr_2D, 10 * arr_2D) ## Condition, result if True, result if False
                                          ## Faster than loop + if-else

array([[ 0,  1,  2],
       [30, 40, 50]], dtype=uint8)
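
When called with the condition only, `np.where` returns the positions where the condition is true, as one index array per axis. A short sketch on a small example array:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

rows, cols = np.where(arr > 3)  # one index array per axis
print(rows, cols)

# The index arrays can be used to select the matching cells
matches = arr[rows, cols]
print(matches)
```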

### Argmax & Argmin

In [49]:
arr_rdm = np.random.randint(0,10,6).reshape(2,3)
display(arr_rdm)
print("-"*50)

display(np.argmax(arr_rdm))           ## Return position of max (in the flattened array)
display(np.argmin(arr_rdm, axis = 1)) ## Search along level/axis = 1

display(np.argmax(arr_rdm, axis = 0, keepdims = True)) 

array([[5, 6, 5],
       [1, 5, 5]])

--------------------------------------------------


1

array([0, 0], dtype=int64)

array([[0, 0, 0]], dtype=int64)
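
Without `axis`, `argmax` gives a position in the flattened array. One way to turn it back into a (row, column) pair is `np.unravel_index`, sketched here with the same values as the random draw shown above.

```python
import numpy as np

arr = np.array([[5, 6, 5],
                [1, 5, 5]])

flat_pos = np.argmax(arr)                         # position in the flattened array
row, col = np.unravel_index(flat_pos, arr.shape)  # matching (row, column)
print(int(flat_pos), (int(row), int(col)), int(arr[row, col]))
```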

### Copy *vs.* View

In [50]:
print("original")
arr_rdm = np.random.randint(0,10,6).reshape(2,3)
display(arr_rdm)
print("-"*50)

arr_rdm_view = arr_rdm[:2,:2]
arr_rdm_view[0,0] = 99
display(arr_rdm_view)
display(arr_rdm)

original


array([[6, 1, 1],
       [9, 1, 1]])

--------------------------------------------------


array([[99,  1],
       [ 9,  1]])

array([[99,  1,  1],
       [ 9,  1,  1]])

In [51]:
arr_rdm_copy = arr_rdm[:2,:2].copy()
arr_rdm_copy[0,0] = -99
display(arr_rdm_copy)
display(arr_rdm)

array([[-99,   1],
       [  9,   1]])

array([[99,  1,  1],
       [ 9,  1,  1]])
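
To check whether two arrays use the same underlying data, `np.shares_memory` can help. The sketch below also illustrates that boolean indexing, unlike basic slicing, returns a copy.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

sliced = arr[:2, :2]         # basic slicing: a view on the same data
masked = arr[arr > 2]        # boolean indexing: a new array
copied = arr[:2, :2].copy()  # explicit copy

shares = [np.shares_memory(arr, other) for other in (sliced, masked, copied)]
print(shares)
```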

<a id = "agg"></a>

# Aggregations

[toc](#toc)

https://numpy.org/doc/stable/reference/routines.math.html

### sum, mean, std, cumsum,...

In [52]:
arr_arg = np.arange(0,12).reshape(3,4)
display(arr_arg)

print("-"*50)

print(np.sum(arr_arg))
display(np.mean(arr_arg, axis = 0, keepdims = True)) ## Mean along specific axis
display(arr_arg.mean(axis = 1, keepdims = True))     ## Mean along specific axis

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

--------------------------------------------------
66


array([[4., 5., 6., 7.]])

array([[1.5],
       [5.5],
       [9.5]])

### NaN

In [53]:
print(np.nan)
print(type(np.nan))

arr_nan = np.array([[0,1,2,np.nan],[3,4,5,6]])

display(arr_nan)

nan
<class 'float'>


array([[ 0.,  1.,  2., nan],
       [ 3.,  4.,  5.,  6.]])

In [54]:
print(np.sum(arr_nan))  ## Sum not defined with nan
print(np.mean(arr_nan)) ## Mean not defined with nan

nan
nan


In [55]:
print(np.sum(arr_nan, axis = 1))  ## Where nan encountered -> nan value
print(np.mean(arr_nan, axis = 0))

[nan 18.]
[1.5 2.5 3.5 nan]


### nansum, nanmean

In [56]:
print(np.nansum(arr_nan))  ## Ignoring nans
print(np.nanmean(arr_nan)) ## Ignoring nans 

21.0
3.0


In [57]:
print(np.nansum(arr_nan, axis = 1))  
print(np.nanmean(arr_nan, axis = 0))

[ 3. 18.]
[1.5 2.5 3.5 6. ]
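
To locate or count missing values, `np.isnan` gives a boolean mask that works with the boolean indexing seen earlier. A minimal sketch on the same values as `arr_nan`:

```python
import numpy as np

arr_nan = np.array([[0, 1, 2, np.nan], [3, 4, 5, 6]])

print(np.nan == np.nan)  # NaN never compares equal, not even to itself

is_missing = np.isnan(arr_nan)  # so np.isnan is used to locate NaNs
n_missing = is_missing.sum()    # True counts as 1
mean_without_nan = arr_nan[~is_missing].mean()
print(n_missing, mean_without_nan)
```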


## Ufunc: universal functions

https://numpy.org/doc/stable/reference/ufuncs.html#ufuncs

In [58]:
print(np.exp(0)) ## Exponential e**(x)
print("-"*50)
print(np.exp(arr_2D))

1.0
--------------------------------------------------
[[  1.      2.719   7.39 ]
 [ 20.08   54.6   148.4  ]]
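
A ufunc applies the same function to every element in compiled code. The sketch below compares it with an explicit Python loop over `math.exp` (the example values are illustrative).

```python
import math
import numpy as np

values = np.arange(6)

loop_result = [math.exp(v) for v in values]  # one Python call per element
ufunc_result = np.exp(values)                # one call for the whole array
same_values = np.allclose(loop_result, ufunc_result)
print(same_values)

# Arithmetic operators are shorthand for ufuncs as well
same_addition = np.array_equal(np.add(values, 1), values + 1)
print(same_addition)
```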


<a id = "concat"></a>

# Concatenate & restructure

[toc](#toc)

## concatenate

In [59]:
display(arr_1)
display(arr_2)

print("-"*50)
display(np.concatenate([arr_1, arr_2, arr_1]))

arr_concat = np.concatenate([arr_1.reshape(1,-1),
                             arr_2.reshape(1,-1),
                             arr_1.reshape(1,-1)
                             ],
                            axis = 0
                            )

display(arr_concat)

display(np.concatenate([arr_1.reshape(-1,1),
                        arr_2.reshape(-1,1),
                        arr_1.reshape(-1,1)
                        ],
                       axis = 1))

array([0, 1, 2, 3, 4])

array([12, 10,  8,  6,  4])

--------------------------------------------------


array([ 0,  1,  2,  3,  4, 12, 10,  8,  6,  4,  0,  1,  2,  3,  4])

array([[ 0,  1,  2,  3,  4],
       [12, 10,  8,  6,  4],
       [ 0,  1,  2,  3,  4]])

array([[ 0, 12,  0],
       [ 1, 10,  1],
       [ 2,  8,  2],
       [ 3,  6,  3],
       [ 4,  4,  4]])
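
For 1D inputs, the reshape-then-concatenate pattern above can also be written with the helpers `np.vstack` and `np.column_stack`. The sketch below rebuilds `arr_1` and `arr_2` so that it runs on its own.

```python
import numpy as np

arr_1 = np.arange(5)
arr_2 = 2 * np.arange(5, 0, -1) + 2

as_rows = np.vstack([arr_1, arr_2, arr_1])        # each array becomes a row
as_cols = np.column_stack([arr_1, arr_2, arr_1])  # each array becomes a column

# Reference results built with concatenate, as in the cell above
rows_ref = np.concatenate([a.reshape(1, -1) for a in (arr_1, arr_2, arr_1)], axis=0)
cols_ref = np.concatenate([a.reshape(-1, 1) for a in (arr_1, arr_2, arr_1)], axis=1)

print(as_rows.shape, as_cols.shape)
print(np.array_equal(as_rows, rows_ref), np.array_equal(as_cols, cols_ref))
```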

## transpose

In [60]:
display(arr_2D)
display(arr_2D.T)
np.transpose(arr_2D, axes = [1,0])

array([[0, 1, 2],
       [3, 4, 5]], dtype=uint8)

array([[0, 3],
       [1, 4],
       [2, 5]], dtype=uint8)

array([[0, 3],
       [1, 4],
       [2, 5]], dtype=uint8)

# Other functions

In [61]:
display(np.unique([0,1,1,2,2]))

array([0, 1, 2])
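
`np.unique` can also report how often each value occurs, via the `return_counts` parameter. A short sketch on the same input:

```python
import numpy as np

values, counts = np.unique([0, 1, 1, 2, 2], return_counts=True)
print(values)  # sorted distinct values
print(counts)  # number of occurrences of each value
```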

<a id = "la"></a>

# Matrices & linear algebra

[toc](#toc)

- Vectors
- Matrices
- Rotation
- Inner product
- Einsum

https://numpy.org/doc/stable/reference/routines.linalg.html

## Vectors

If A and B are (2D) vectors

$
A = 
\begin{bmatrix}
    a_0 \\
    a_1 
\end{bmatrix},
\quad
B = 
\begin{bmatrix}
    b_0 \\
    b_1 
\end{bmatrix}
$ 

$
C = \alpha \cdot A + \beta \cdot B 
\\ = \alpha \cdot \begin{bmatrix}
    a_0 \\
    a_1 
\end{bmatrix} + 
\beta \cdot \begin{bmatrix}
    b_0 \\
    b_1 
\end{bmatrix}
\\ = \begin{bmatrix}
    \alpha \cdot a_0 \\
    \alpha \cdot a_1 
\end{bmatrix} + 
\begin{bmatrix}
    \beta \cdot b_0 \\
    \beta \cdot b_1 
\end{bmatrix}
\\ = \begin{bmatrix}
    \alpha \cdot a_0 + \beta \cdot b_0 \\
    \alpha \cdot a_1 + \beta \cdot b_1 
\end{bmatrix}
\\ = \begin{bmatrix}
    c_0 \\
    c_1 
\end{bmatrix} 
$ 

Then C is also a vector.

Example:

$
E = 
\begin{bmatrix}
    1 \\
    0 
\end{bmatrix},
\quad
F = 
\begin{bmatrix}
    0 \\
    1 
\end{bmatrix}
$

$ G = 0.5 \cdot E + 3 \cdot F = \begin{bmatrix}
    0.5 \\
    3 
\end{bmatrix}
$

In [62]:
E = np.array([1,0]) 
F = np.array([0,1])
G = 0.5 * E + 3 * F ## Numpy arrays are like vectors (but also more: see broadcasting)
G

array([0.5, 3. ])

## Matrices

A matrix is an $N_1 \times N_2$-dimensional vector.

Here M and N are $2 \times 2$ matrices:

$
M = \begin{bmatrix}
    m_{00} & m_{01} \\
    m_{10} & m_{11}
\end{bmatrix},
N = \begin{bmatrix}
    n_{00} & n_{01} \\
    n_{10} & n_{11}
\end{bmatrix}
$

$
M+\gamma \cdot N = 
\begin{bmatrix}
    m_{00} & m_{01}\\
    m_{10} & m_{11}
\end{bmatrix} + \gamma \cdot \begin{bmatrix}
    n_{00} & n_{01}\\
    n_{10} & n_{11}
\end{bmatrix} = 
\begin{bmatrix}
    m_{00} + \gamma \cdot n_{00} & m_{01} + \gamma \cdot n_{01}\\
    m_{10} + \gamma \cdot n_{10} & m_{11} + \gamma \cdot n_{11}
\end{bmatrix} 
$

In [63]:
H = np.array([[0.5, 5 ],[2,4]])
I = np.array([[1,0],[0,1]])

H-0.1 * I

array([[0.4, 5. ],
       [2. , 3.9]])

### Matrix as transformation for vectors and dot product

One can look at matrices as translators of vectors: they can map one vector to another.

![](static/mapping_A_B.png)

We need 4 values to be able to express a (linear) relation between all ${a_{i}}$ and all ${b_{j}}$ 

This transformation is expressed by the **"dot-product"** between matrices (or vectors)

$ B = W \cdot A = \begin{bmatrix}
    w_{00} & w_{01} \\
    w_{10} & w_{11}
\end{bmatrix} \cdot
\begin{bmatrix}
    a_0 \\
    a_1 
\end{bmatrix} =
\begin{bmatrix}
    w_{00} \cdot a_0 + w_{01} \cdot a_1 \\
    w_{10} \cdot a_0 + w_{11} \cdot a_1 
\end{bmatrix} =
\begin{bmatrix}
    b_0 \\
    b_1 
\end{bmatrix}
$ 

$w_{00}$ = how much of $a_0$ is transferred to $b_{0}$  
$w_{01}$ = how much of $a_1$ is transferred to $b_{0}$  
...

#### Example

$R = \begin{bmatrix}
    1 \\
    0 
\end{bmatrix} = [\rightarrow]$, 
$U  = \begin{bmatrix}
    0 \\
    1 
\end{bmatrix} = [\uparrow] $,
$L = \begin{bmatrix}
    -1 \\
    0 
\end{bmatrix} = [\leftarrow] $, 
$ D = \begin{bmatrix}
    0 \\
    -1 
\end{bmatrix} = [\downarrow] $

Suppose we want a transformation that cycles through R, U, L, D, R, ...

This can be done with the (counter-clockwise) rotation matrix W:  
$ W = \begin{bmatrix}
    0 & -1 \\
    1 & 0 
\end{bmatrix}$

In [64]:
R = np.array([1,0])
U = np.array([0,1])
L = -R
D = -U

W = np.array([[0,-1],[1,0]])

print(np.dot(W,R)) ## dot product
print("-"*50)
print(U == np.dot(W,R))
print(np.all(U == np.dot(W,R)))

[0 1]
--------------------------------------------------
[ True  True]
True
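
The full cycle can be checked by applying W repeatedly. This sketch uses the `@` operator, which gives the same result as `np.dot` for these arrays, and `np.linalg.matrix_power` to confirm that four quarter turns bring every vector back to its start.

```python
import numpy as np

W = np.array([[0, -1],
              [1,  0]])  # counter-clockwise rotation by 90 degrees
R = np.array([1, 0])

# Apply W repeatedly: R -> U -> L -> D -> R
vec = R
cycle = [vec.tolist()]
for _ in range(4):
    vec = W @ vec  # matrix-vector product
    cycle.append(vec.tolist())
print(cycle)

# Four quarter turns give the identity matrix
full_turn = np.linalg.matrix_power(W, 4)
print(np.array_equal(full_turn, np.eye(2)))
```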


### Euclidean distance with dot product

It is possible to calculate the (Euclidean) distance between 2 vectors with dot product:

distance_square = $||A - B||^2 = (a_0 - b_0)^2 + (a_1 - b_1)^2$ = $u_0^2 + u_1^2$  

(set $u_i= (a_i - b_i)$ for convenience)

$(A - B)^T \cdot (A - B) = \begin{bmatrix}
    (a_0 - b_0) & (a_1 - b_1)
\end{bmatrix} \cdot \begin{bmatrix}
    (a_0 - b_0) \\
    (a_1 - b_1) 
\end{bmatrix} = \begin{bmatrix}
    u_0 & u_1 \\
\end{bmatrix} \cdot \begin{bmatrix}
    u_0 \\
    u_1 
\end{bmatrix} = \begin{bmatrix}
    u_0 \cdot u_0 + u_1 \cdot u_1
\end{bmatrix} = u_0 ^2 + u_1^2 = ||A - B||^2$

with $A^T = \begin{bmatrix}
    a_0 & a_1
\end{bmatrix} 
$ the transposed (flipped) matrix/vector of $A = \begin{bmatrix}
    a_0 \\
    a_1 
\end{bmatrix} $

In [65]:
print((E - F))
print((E - F).T)  ## .T leaves a 1D array unchanged (there is only one axis)
print("-"*50)

dist_E_F_square = np.dot((E - F).T, (E - F))
dist_E_F = np.sqrt(dist_E_F_square)

print(dist_E_F_square)
print(dist_E_F)

[ 1 -1]
[ 1 -1]
--------------------------------------------------
2
1.4142135623730951
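
The same distance can be obtained in other ways. The sketch below compares the dot-product version with `np.linalg.norm` and with `np.einsum`, where the subscripts `"i,i->"` mean "multiply matching elements and sum over i".

```python
import numpy as np

E = np.array([1, 0])
F = np.array([0, 1])
u = E - F

dist_dot = np.sqrt(np.dot(u, u))                 # same computation as above
dist_norm = np.linalg.norm(u)                    # Euclidean (L2) norm by default
dist_einsum = np.sqrt(np.einsum("i,i->", u, u))  # sum over the shared index i

print(dist_dot, dist_norm, dist_einsum)
```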
