# Working with numpy and matrices

## Math

For a matrix product $C=AB$, where $A$ has shape $n\times m$ and $B$ has shape $m\times p$, each entry is defined as $C(i,j)=\sum_{k=1}^{m} A(i,k)B(k,j)$
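The definition can be written out directly as three nested loops. This is only a sketch to make the formula concrete: the name `matmul_naive` is made up for illustration, and the indices run from 0 instead of 1 as in the formula.

```python
import numpy as np

def matmul_naive(A, B):
    """Matrix product straight from the definition C(i,j) = sum_k A(i,k) * B(k,j)."""
    n, m = len(A), len(A[0])
    m_b, p = len(B), len(B[0])
    if m != m_b:
        raise ValueError("inner dimensions must match")
    C = [[0] * p for _ in range(n)]
    for i in range(n):          # rows of A
        for j in range(p):      # columns of B
            for k in range(m):  # shared inner dimension
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[1, 2, 3], [4, 5, 6]]
C = matmul_naive(A, B)
print(C)  # [[9, 12, 15], [19, 26, 33]]
```

In practice `np.matmul` does the same job much faster; the loop version is only useful for checking the definition.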

## Numpy

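NumPy applies arithmetic operations element-wise. For example, adding or multiplying an array with a scalar applies the operation to every element.

The same operations work between two arrays of the same shape, combining them element by element.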

In [2]:
import numpy as np
values = [1,2,3,4,5]
np_values = np.array(values) + 5
t_values = np.multiply(values, 5)
print(np_values)
print(t_values)

[ 6  7  8  9 10]
[ 5 10 15 20 25]
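The same element-wise behaviour applies to two arrays of the same shape. A small sketch, with the arrays `x` and `y` made up for illustration:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[10, 20], [30, 40]])

# Each element of x is combined with the element of y at the same position
elementwise_sum = x + y        # [[11, 22], [33, 44]]
elementwise_product = x * y    # [[10, 40], [90, 160]], same as np.multiply(x, y)

print(elementwise_sum)
print(elementwise_product)
```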


In [1]:
%matplotlib inline

Getting the shape of an `ndarray`

In [9]:
np_values.shape

(5,)

Matrix multiplication (the matrix product) of an $n\times m$ matrix with an $m\times p$ matrix results in a matrix of shape $n\times p$.

Matrix multiplication is a repeated vector dot product: each row of the left matrix is multiplied with each column of the right matrix.
Matrix multiplication is **not commutative**: in general $A\times B\ne B\times A$.

For data this implies that the data on the left side has to be laid out as rows and the data on the right side as columns.
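Both points can be checked on a small example. The matrices below are made up for illustration; `B` simply swaps columns when applied from the right and rows when applied from the left.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

AB = np.matmul(A, B)   # columns of A swapped: [[2, 1], [4, 3]]
BA = np.matmul(B, A)   # rows of A swapped:    [[3, 4], [1, 2]]
print(np.array_equal(AB, BA))  # False

# A single entry of the product is the dot product of a row of A and a column of B
row_times_column = np.dot(A[0, :], B[:, 1])
print(row_times_column == AB[0, 1])  # True
```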

## Matrix multiplication in numpy

Multiplication can be element-wise, with `np.multiply` or `*`, or it can be the matrix product, with `np.matmul`. For 2-dimensional arrays `np.dot` gives the same result as `np.matmul`.

In [11]:
a = np.array([[1,2],[3,4]])
b = np.array([[1,2,3],[4,5,6]])

print(a.shape)
print(b.shape)
c = np.matmul(a,b)
print(c)
print(c.shape)

(2, 2)
(2, 3)
[[ 9 12 15]
 [19 26 33]]
(2, 3)
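To see the difference between the element-wise product and the matrix product side by side, here is a short sketch on two square arrays (the second array `s` is made up for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
s = np.array([[5, 6], [7, 8]])

elementwise = a * s               # [[5, 12], [21, 32]]
matrix_product = np.matmul(a, s)  # [[19, 22], [43, 50]]

# For 2-dimensional arrays, np.dot and the @ operator give the same matrix product
same_as_dot = np.array_equal(matrix_product, np.dot(a, s))
same_as_at = np.array_equal(matrix_product, a @ s)
print(elementwise)
print(matrix_product)
print(same_as_dot, same_as_at)  # True True
```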


## Transposing in numpy

Be careful: transposing only changes the indexing. `b.T` is a view on the same underlying data, so changing the transpose also changes the original array.

$A\times B^\intercal=(B\times A^\intercal)^\intercal$

In [12]:
d = b.T
print(d)
print(d.shape)

[[1 4]
 [2 5]
 [3 6]]
(3, 2)
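The shared data and the transpose identity can both be checked with a few lines. This is a sketch; the arrays `A` and `B2` are made up for illustration.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
d = b.T

print(np.shares_memory(b, d))  # True: d is a view on b's data

d[0, 1] = 40   # d[0, 1] addresses the same element as b[1, 0]
print(b)       # b[1, 0] is now 40

# An explicit copy is independent of the original
independent = b.T.copy()
independent[0, 0] = -1
print(b[0, 0])  # still 1

# Check A x B^T == (B x A^T)^T on a small example
A = np.array([[1, 2], [3, 4]])
B2 = np.array([[5, 6], [7, 8], [9, 10]])
identity_holds = np.array_equal(A @ B2.T, (B2 @ A.T).T)
print(identity_holds)  # True
```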


## Neural network learning

First the perceptron is discussed. After that we need a more general approach that can fit not only lines but also other shapes.

### Log Loss function

The error function needs to be continuous and differentiable for gradient descent to work. The farther a data point lies from the boundary on the wrong side, the higher its error.

### Activation functions

To move from discrete to continuous values we use an activation function, for instance the sigmoid.

sigmoid(x) $S(x)=\frac{1}{1+e^{-x}}$

Applied to Wx+b this becomes $\hat{y}=S(Wx+b)$
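As a sketch of $\hat{y}=S(Wx+b)$ for a single point with two features. The weights, bias and input below are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    # np.exp works on scalars and arrays alike
    return 1 / (1 + np.exp(-x))

W = np.array([[2.0, -1.0]])   # hypothetical weights, shape (1, 2)
b = np.array([0.5])           # hypothetical bias
x = np.array([1.0, 3.0])      # one data point with two features

score = np.matmul(W, x) + b   # 2*1 - 1*3 + 0.5 = -0.5
y_hat = sigmoid(score)        # a value between 0 and 1
print(score, y_hat)
```

A negative score gives a prediction below 0.5, a positive score one above 0.5, and a score of 0 gives exactly 0.5.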

### Softmax

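If you want to assign a continuous value (a probability) to each class in a multi-class classification, you can use softmax. Because the scores are exponentiated, this works even if a score is < 0. For 2 classes softmax is equivalent to the sigmoid, applied to the difference between the two scores.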

In [25]:
import math

def softmax(L):
    return np.exp(L) / (np.sum(np.exp(L)))

def sigmoid(x):
    return 1 / (1+math.e**-x)

print(softmax([0,1]))
sigmoid(1)

[0.26894142 0.73105858]


0.7310585786300049
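For large scores `np.exp` can overflow. A common workaround, not part of the original notes, is to subtract the maximum score before exponentiating, which leaves the result unchanged. The sketch below uses the hypothetical name `softmax_stable` and also checks the two-class relation to the sigmoid.

```python
import numpy as np

def softmax_stable(scores):
    scores = np.asarray(scores, dtype=float)
    shifted = scores - np.max(scores)   # largest exponent becomes exp(0) = 1
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

probs = softmax_stable([-2.0, 0.5, 1.0])   # negative scores are fine
two_class = softmax_stable([0.0, 1.0])     # second entry equals sigmoid(1 - 0)
large = softmax_stable([1000.0, 1001.0])   # no overflow thanks to the shift

print(probs, probs.sum())
print(two_class[1], sigmoid(1.0))
print(large)
```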

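### One-hot encoding

Transform a multi-class label into one column per class, holding 1 for the point's class and 0 elsewhere. Unlike numbering the classes 0, 1, 2, this introduces no dependencies between the classes.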
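One way to build such columns in NumPy is to index the rows of an identity matrix with the class labels. This is a sketch assuming the classes are already numbered 0, 1, 2; the labels are made up for illustration.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])   # class index per data point
num_classes = 3

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes, dtype=int)[labels]
print(one_hot)
# rows: [1 0 0], [0 0 1], [0 1 0], [0 0 1]
```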

### Maximum likelihood

Pick the model that gives the highest probability to the correct labels. The probability for the whole data set is obtained by multiplying the probabilities that each point has its correct label.
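A small sketch of the comparison. The probabilities are hypothetical: each entry is the probability a model assigns to the correct label of one of four points.

```python
import numpy as np

# Hypothetical probabilities of the correct label for four points
model_a = np.array([0.6, 0.2, 0.1, 0.7])
model_b = np.array([0.7, 0.9, 0.8, 0.6])

likelihood_a = np.prod(model_a)   # 0.6 * 0.2 * 0.1 * 0.7 = 0.0084
likelihood_b = np.prod(model_b)   # 0.7 * 0.9 * 0.8 * 0.6 = 0.3024

best = "model_b" if likelihood_b > likelihood_a else "model_a"
print(likelihood_a, likelihood_b, best)
```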

### Products vs Sums

When multiplying lots of numbers < 1 the product becomes very, very small, and a change in a single factor has a great impact on the result. That is undesirable, so we turn products into sums using $\log(a) + \log(b) = \log(a\cdot b)$.

We will use the natural log, ln. With four points a, b, c, d the product a*b*c*d becomes ln(a)+ln(b)+ln(c)+ln(d). Because the probabilities are all < 1, the logs are all negative, so it is more convenient to take the negative: -ln(a)-ln(b)-ln(c)-ln(d). This is the cross-entropy. Lower numbers mean a better model. It can be used as an error function because the negative log of a small number is a big number (so a big error), and a small probability for the correct label means the point is badly misclassified. Our goal is to minimize the cross-entropy.
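The numerical side of this can be seen directly: with enough factors the product underflows to zero in floating point, while the sum of logs stays well-behaved. The probabilities below are made up for illustration.

```python
import numpy as np

# Four points: the negative log of the product equals the sum of negative logs
p = np.array([0.7, 0.9, 0.8, 0.6])
neg_log_of_product = -np.log(np.prod(p))
sum_of_neg_logs = -np.sum(np.log(p))
print(neg_log_of_product, sum_of_neg_logs)

# Many points: 0.01 ** 1000 is too small for a 64-bit float
many = np.full(1000, 0.01)
product = np.prod(many)                # underflows to 0.0
cross_entropy = -np.sum(np.log(many))  # 1000 * ln(100), about 4605.17
print(product, cross_entropy)
```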


## Cross-Entropy

Formula: $-\sum_{i=1}^{m} \left[ y_i\ln(p_i) + (1-y_i)\ln(1-p_i) \right]$

This formula holds when there are 2 classes, $y_i \in \{0,1\}$.

Cross-entropy for multiple classes:

$-\sum_{i=1}^{n} \sum_{j=1}^{m} y_{ij}\ln(p_{ij})$

Here $n$ is the number of classes and $m$ the number of points. We can see from this formula that both are the same for $n=2$.
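The multi-class formula and its agreement with the two-class case can be sketched as follows. The function name `cross_entropy_multiclass` and the data are hypothetical, and the arrays are laid out with one row per point and one column per class, which does not affect the double sum.

```python
import numpy as np

def cross_entropy_multiclass(Y, P):
    # Y: one-hot labels, P: predicted probabilities, both shaped (points, classes)
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P))

# Two-class data written the binary way ...
y = np.array([1, 0, 1])
p = np.array([0.8, 0.3, 0.6])
binary = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# ... and the same data with one column per class
Y2 = np.stack([1 - y, y], axis=1)
P2 = np.stack([1 - p, p], axis=1)
multi = cross_entropy_multiclass(Y2, P2)

print(binary, multi)  # the two values agree
```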

In [1]:
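import numpy as np

def cross_entropy(Y, P):
    # Loop version: add up the log-probability of the correct label for every point
    total = 0
    for yi, pi in zip(Y, P):
        total += yi*np.log(pi) + (1-yi)*np.log(1-pi)
    return -total

def cross_entropy_np(Y, P):
    # Vectorized version; np.asarray replaces np.float_, which was removed in NumPy 2.0
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))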

## Error Function

The error function is the average of the cross-entropy over the $m$ points, i.e. the cross-entropy multiplied by $1/m$.
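In NumPy the averaging amounts to replacing the sum with a mean. A minimal sketch for the two-class case, with the hypothetical name `mean_cross_entropy` and made-up data:

```python
import numpy as np

def mean_cross_entropy(Y, P):
    # Average of the per-point cross-entropy terms, i.e. (1/m) * cross-entropy
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P))

Y = [1, 0, 1, 1]
P = [0.4, 0.6, 0.1, 0.5]

error = mean_cross_entropy(Y, P)
print(error)
```

Dividing by $m$ makes the error comparable between data sets of different size.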

## Gradient descent step

