## Linear Algebra for Machine Learning


## Multiplying matrices and vectors

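**Matrix product:** The matrix product of matrices $\textbf{A}$ and $\textbf{B}$ is a third matrix $\textbf{C} = \textbf{A}\textbf{B}$. It is defined only when $\textbf{A}$ has as many columns as $\textbf{B}$ has rows, and its entries are $C_{i,j} = \sum_k A_{i,k}B_{k,j}$.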

**Properties of matrix products:**
* *Distributive:* $A(B + C) = AB + AC$
* *Associative:* $A(BC) = (AB)C$
* *Transpose:* The transpose of a matrix product has a simple form: $(AB)^T = B^TA^T$

**Note**: Matrix multiplication is not commutative, i.e. in general $AB \neq BA$.
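
The properties above can be checked numerically. The following is a minimal sketch, assuming three random $3 \times 3$ matrices drawn from a seeded generator (the seed and sizes are arbitrary choices made only for this illustration):

```python
import numpy as np

# Seeded generator, used here only to make the illustration reproducible
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

# Compare both sides of each identity up to floating-point error
distributive = np.allclose(A @ (B + C), A @ B + A @ C)
associative = np.allclose(A @ (B @ C), (A @ B) @ C)
transpose_rule = np.allclose((A @ B).T, B.T @ A.T)

# For generic matrices the two orders of multiplication differ
commutative = np.allclose(A @ B, B @ A)

print(distributive, associative, transpose_rule, commutative)
```

The first three checks should hold for any matrices of compatible shapes, while the last one is expected to fail for generic matrices.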


The **dot product** between two vectors $x$ and $y$ of the same dimensionality is the matrix product $\mathbf{x}^T\mathbf{y}$. The dot product between two vectors is commutative, i.e.
$$\mathbf{x}^T\mathbf{y} = \mathbf{y}^T\mathbf{x}$$
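
As a small worked example of the dot product, the sketch below uses two illustrative vectors (the values are chosen arbitrarily) and compares NumPy's result with the sum of elementwise products written out by hand:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -5.0, 6.0])

xy = np.dot(x, y)  # 1*4 + 2*(-5) + 3*6
yx = np.dot(y, x)  # same terms, arguments swapped

# The same quantity computed term by term
manual = sum(xi * yi for xi, yi in zip(x, y))

print(xy, yx, manual)
```

All three values agree, which illustrates that swapping the arguments does not change the result.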



**There are two ways** to compute these products in NumPy.
We can either use the **np.dot** function, which performs a matrix-matrix, matrix-vector, or inner (vector-vector) multiplication on its two arguments, or
call the equivalent **dot** method of an array, as in `A.dot(B)`.

In [None]:
import numpy as np # naming import convention


In [None]:
# Let us create the arrays A and B
A = np.array([[ 1.,  0., 1.],[ -1.,  1., 0.],[1.,  0.,  -1.]])

B = np.array([[2,1,-2], [-2,2,1], [1,-2,2]])

In [None]:
C = np.dot(A,B)
C

In [None]:
# Or you can write
C = A.dot(B)
C

In [None]:
# create a vector v
v = np.array([5, 2.5, 0.5])
np.dot(A, v)
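
One point worth illustrating: for NumPy arrays the `*` operator multiplies element by element, which is not the matrix product. The sketch below reuses the values of `A` and `B` from above and contrasts the two operations (the `@` operator is NumPy's shorthand for matrix multiplication):

```python
import numpy as np

A = np.array([[1., 0., 1.], [-1., 1., 0.], [1., 0., -1.]])
B = np.array([[2, 1, -2], [-2, 2, 1], [1, -2, 2]])

matrix_product = A @ B   # rows of A combined with columns of B
elementwise = A * B      # entry (i, j) is A[i, j] * B[i, j]

print(matrix_product)
print(elementwise)
print(np.allclose(matrix_product, np.dot(A, B)))
```

The two results differ, while `A @ B` and `np.dot(A, B)` agree for these 2-D arrays.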

## Exercise

Create two vectors of random values between -10 and 10 of size 100. Compute:

* elementwise product between them
* dot (scalar) product between them.

Compare your results.

## Identity and Inverse Matrices

An *identity matrix* is a matrix that does not change any vector when we multiply that vector by it. The $n \times n$ identity matrix is denoted $I_n \in \mathbb{R}^{n \times n}$, so that $I_n x = x$ for every $x \in \mathbb{R}^n$.

The matrix inverse of $A$, denoted $A^{-1}$, is defined as the matrix such that:
$$ \mathbf{A A^{-1} = I_n} $$

**Inverse: np.linalg.inv**


In [None]:
np.linalg.inv(A)
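
To confirm that the computed inverse satisfies the definition, a quick check can be sketched as follows. It reuses the values of `A` defined earlier and compares against `np.eye(3)` with `np.allclose`, since floating-point results are rarely exact:

```python
import numpy as np

A = np.array([[1., 0., 1.], [-1., 1., 0.], [1., 0., -1.]])
A_inv = np.linalg.inv(A)

identity = np.eye(3)

# Both products should reproduce the identity up to rounding error
right_ok = np.allclose(A @ A_inv, identity)
left_ok = np.allclose(A_inv @ A, identity)

print(right_ok, left_ok)
```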

## Finding Determinant

The determinant of a square matrix $\mathbf{A}$ is often denoted $\mid\mathbf{A}\mid$ (or $\det(\mathbf{A})$). It is a scalar that is nonzero exactly when $\mathbf{A}$ is invertible.

In [None]:
np.linalg.det(A)
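
The link between the determinant and invertibility can be illustrated with a short sketch. The matrix `S` below is a hypothetical example introduced only for this illustration: its second row is twice the first, so it is singular.

```python
import numpy as np

A = np.array([[1., 0., 1.], [-1., 1., 0.], [1., 0., -1.]])
S = np.array([[1., 2.], [2., 4.]])  # illustrative singular matrix

det_A = np.linalg.det(A)  # nonzero, so A is invertible
det_S = np.linalg.det(S)  # zero, so S has no inverse

# Attempting to invert a singular matrix raises LinAlgError
try:
    np.linalg.inv(S)
    inverse_failed = False
except np.linalg.LinAlgError:
    inverse_failed = True

print(det_A, det_S, inverse_failed)
```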

## Linear Equations

A system of linear equations is given as
$$ \mathbf{Ax = b}$$
where $A \in \mathbb{R}^{m \times n}$ is a known matrix, $b \in \mathbb{R}^{m}$ is a known vector, and $x \in \mathbb{R}^n$ is a vector of unknown variables. When $A$ is square ($m = n$) and invertible, we can solve for $\mathbf{x}$ with the following steps:

\begin{align*}
Ax &=b \\
A^{-1} Ax &= A^{-1} b \\
I_n x &=A^{-1} b \\
x &=A^{-1} b
\end{align*}


**For example**, let us solve these equations:

\begin{eqnarray*} x + 3y + 5z & = & 10 \\
                   2x + 5y + z & = & 8  \\
                   2x + 3y + 8z & = & 3
 \end{eqnarray*}

In [None]:
# In matrix notation, the system is A x = b
A = np.array([[1., 3., 5.],
              [2., 5., 1.],
              [2., 3., 8.]])

b = np.array([10, 8, 3])

# Solve using the matrix inverse: x = A^{-1} b
x = np.dot(np.linalg.inv(A), b)
print(x)


In [None]:
# Or use np.linalg.solve, which solves the system without forming the inverse explicitly
x = np.linalg.solve(A, b)
print(x)
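
A simple way to gain confidence in the solution is to substitute it back into the system. The sketch below repeats the example's `A` and `b`, solves it both ways, and checks that the two answers agree and satisfy $Ax = b$ up to rounding error (the variable names `x_inv` and `x_solve` are introduced here for illustration):

```python
import numpy as np

A = np.array([[1., 3., 5.],
              [2., 5., 1.],
              [2., 3., 8.]])
b = np.array([10., 8., 3.])

x_inv = np.linalg.inv(A) @ b     # via the explicit inverse
x_solve = np.linalg.solve(A, b)  # via a direct solver

methods_agree = np.allclose(x_inv, x_solve)
satisfies_system = np.allclose(A @ x_solve, b)

print(methods_agree, satisfies_system)
```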


### Numpy Exercise

* Generate a matrix with 10 rows and 50 columns, with elements drawn from the normal distribution $\mathcal{N}(1, 10)$. Specify a random seed to make the result reproducible.
* Normalize the matrix: subtract from each column its mean and divide by its standard deviation. Hint: use np.mean and np.std with the axis parameter.
* Define a function `scale` which takes a vector of numbers and brings them to the range from 0 to 1:

$$ \text{scale}(x)_i=\frac{x_i - \min(x)}{\max(x) - \min(x)}$$

## Linear Regression with Numpy

In regression, we are interested in predicting a scalar-valued target, such as the price of a stock. By linear, we mean that the target must be predicted as a linear function of the inputs. 

In order to formulate a linear regression, we need to define two things: a model (hypothesis) and a loss function. 

The model is a function that computes predictions from the inputs, given by


$$
y = \sum_j w_jx_j + b
$$


where $w_j$ are the weights and $b$ is an intercept term, which we'll call the bias.

The **loss function** measures how well the model fits the data: it shows how far off the prediction $y$ is from the target $t$, and is given as:

$$
\mathcal{L}(y,t) = \frac{1}{2}(y - t)^2
$$


When we combine our model and loss function, we get an optimization problem, where we are trying to minimize a cost function with respect to the model parameters (i.e. the weights and bias). The cost function is simply the loss, averaged over all the training examples.

\begin{align}
\varepsilon (w_1\ldots w_D,b) & = \frac{1}{N}\sum_{i=1}^N \mathcal{L}(y^{(i)},t^{(i)})\\
& = \frac{1}{2N}\sum_{i=1}^N (y^{(i)} - t^{(i)})^2\\
&=\frac{1}{2N}\sum_{i=1}^N \left(\sum_j w_jx_j^{(i)} + b -t^{(i)} \right)^2
\end{align}
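
To make the cost function concrete, here is a minimal sketch that evaluates it on a tiny hand-made dataset. The function name `mse_cost` and all data values are hypothetical, chosen so that the arithmetic is easy to follow by hand:

```python
import numpy as np

def mse_cost(X, w, b, t):
    """Average over examples of the loss 0.5 * (y - t)**2."""
    y = X @ w + b  # predictions, one per row of X
    return np.sum((y - t) ** 2) / (2.0 * X.shape[0])

# Three examples with two features each (illustrative values)
X = np.array([[1., 2.], [3., 4.], [5., 6.]])
w = np.array([1., -1.])
b = 0.5
t = np.array([0.5, -0.5, -1.5])

# Predictions are [-0.5, -0.5, -0.5], so the residuals y - t are [-1, 0, 1]
cost_value = mse_cost(X, w, b, t)
print(cost_value)
```

With residuals of $-1$, $0$ and $1$, the sum of squares is $2$ and the cost is $2 / (2 \cdot 3) = 1/3$.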


## Solving the optimization problem


We now want to find the choice of model parameters $w_1\ldots w_D,b$ that minimizes $\varepsilon (w_1\ldots w_D,b)$ as given above. There are two methods which we can use: direct solution and gradient descent.

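Using the direct solution, it can be shown that:
$$
\mathbf{w} = (\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{t}
$$
where $\mathbf{x}$ is the $N \times D$ matrix of inputs and $\mathbf{t}$ is the vector of targets. Here the bias is dropped for simplicity; it can be absorbed into $\mathbf{w}$ by appending a constant 1 to every input.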
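
One way to see where this formula comes from is to write the cost in matrix form and set its gradient to zero. This is a sketch under two assumptions: the bias is dropped, and $\mathbf{x}^T\mathbf{x}$ is invertible, with $\mathbf{x}$ the $N \times D$ matrix of inputs.

\begin{align*}
\varepsilon(\mathbf{w}) &= \frac{1}{2N}\lVert \mathbf{x}\mathbf{w} - \mathbf{t} \rVert^2 \\
\nabla_{\mathbf{w}}\, \varepsilon(\mathbf{w}) &= \frac{1}{N}\mathbf{x}^T(\mathbf{x}\mathbf{w} - \mathbf{t}) = 0 \\
\mathbf{x}^T\mathbf{x}\,\mathbf{w} &= \mathbf{x}^T\mathbf{t} \\
\mathbf{w} &= (\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{t}
\end{align*}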

Let us implement this in Python.

### Define the target function

In [None]:
import matplotlib.pyplot as plt  # Plotting library
%matplotlib inline
np.random.seed(seed=42)

In [None]:
# Define the vector of input samples as x, with 100  values sampled from a uniform distribution between 0 and 1
x = np.random.uniform(0, 1, 100)

In [None]:
# Create the targets t with some Gaussian noise
noise_std = 0.2  # Standard deviation of the Gaussian noise
# Gaussian noise for each sample in x
noise = np.random.randn(100) * noise_std
# Create targets t
t = x*2 + noise

In [None]:
# Plot the target t versus the input x
plt.plot(x, t, 'o', label='t')
plt.xlabel('$x$', fontsize=15)
plt.ylabel('$t$', fontsize=15)
plt.ylim([0,2])
plt.title('inputs (x) vs targets (t)')
plt.legend(loc=2);

In [None]:
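# Residual helper used by the cost function below
def loss(x, w, t):
    '''
    Return the residuals y - t of the linear model y = x.dot(w).
    The cost function squares and averages these values.
    '''
    y = x.dot(w)
    return y - t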

In [None]:
w = np.array([2])

In [None]:
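# This call raises a ValueError: x still has shape (100,),
# which cannot be multiplied with w of shape (1,)
loss(x,w,t)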

In [None]:
#check x shape
x.shape

In [None]:
# Let us change the shape of x to be an NxD matrix
x = x.reshape(len(x), 1)
x.shape

In [None]:
# Run the loss function again
loss(x,w,t)

In [None]:
def cost(x, w, t):
    '''
    Evaluate the cost function in a vectorized manner for
    inputs `x` and targets `t`, at weights `w`.
    '''
    N = x.shape[0]
    return (loss(x, w, t) ** 2).sum() / (2.0 * N)

In [None]:
def get_parameter(x, t):
    '''
    Solve linear regression exactly. (fully vectorized)
    
    Given `x` - NxD matrix of inputs
          `t` - target outputs
    Returns the optimal weights as a D-dimensional vector
    '''
    A = np.matmul(x.T, x)  # x^T x, a DxD matrix
    c = np.dot(x.T, t)     # x^T t, a D-dimensional vector
    return np.matmul(np.linalg.inv(A), c)


In [None]:
# Let us try to run the cost function
cost(x,w, t)

In [None]:
w_grad = get_parameter(x,t)

In [None]:
plt.plot(x, t, 'o', label='t')
plt.plot(x, np.dot(x, w_grad), 'g-', label='fitted line')
plt.xlabel('$x$', fontsize=15)
plt.ylabel('$t$', fontsize=15)
plt.ylim([0,2])
plt.title('inputs (x) vs targets (t)')
plt.legend(loc=2)
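
As a sanity check on the closed-form solution, the result can be compared with NumPy's built-in least-squares routine. The sketch below is self-contained and regenerates similar synthetic data with its own seeded generator, so the exact numbers differ from the cells above. The names `w_closed` and `w_lstsq` are introduced here for illustration.

```python
import numpy as np

# Synthetic data in the same spirit as above: t = 2x + Gaussian noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 1, size=(100, 1))  # N x D matrix with D = 1
t = 2 * x[:, 0] + 0.2 * rng.standard_normal(100)

# Closed-form solution of the normal equations x^T x w = x^T t
w_closed = np.linalg.solve(x.T @ x, x.T @ t)

# Reference solution from NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(x, t, rcond=None)

print(w_closed, w_lstsq)
```

Both approaches should return a single weight close to the true slope of 2, and they should agree with each other up to rounding error.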