# My personal notes for Andrew Ng's Machine Learning course

link:
https://www.youtube.com/playlist?list=PLZ9qNFMHZ-A4rycgrgOYma6zxF4BZGGPW

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import math

#### Transforming a for loop into a matrix multiplication

Note 1: to convert a row vector (X) into a column vector, use **X.reshape()**, for example `X.reshape(len(X),1)`

Note 2: to convert a vector into a matrix by adding one or more columns of ones in front of it, use **np.c_[np.ones((len(X),1)),X]**

See the example below.

In [5]:
#Example
X=np.array([1,2,3,4])
print(f"{X}\n")
print(f"{X.reshape(4,1)}\n")
print(f"{np.c_[np.ones((len(X),2)),X]}")

del X

[1 2 3 4]

[[1]
 [2]
 [3]
 [4]]

[[1. 1. 1.]
 [1. 1. 2.]
 [1. 1. 3.]
 [1. 1. 4.]]


## Example

Imagine that you have a dataset Y with only one feature and you want to evaluate a function **f(x)=-40+0.25*x** on every element.

We can do it as a for loop, but better with a matrix multiplication as follows:

$$Y=
\begin{pmatrix}
 2104\\
 1416\\
 1534\\
 852\\
\end{pmatrix}
$$

and
$$f(x)=-40+0.25\cdot~x$$

Solving through matrix multiplication we have:

$$f(Y)=
\begin{pmatrix}
 1 && 2104\\
 1 &&1416\\
 1 &&1534\\
 1 && 852\\
\end{pmatrix}
\times
\begin{pmatrix}
 -40\\
 0.25\\
\end{pmatrix}
=
\begin{pmatrix}
 486\\
 314\\
 343.5\\
 173\\
\end{pmatrix}
$$

In [3]:
Y = np.array([2104,1416,1534,852]) #dataset
#Transform Y into a 4x2 matrix (rows x columns): a column of ones, then the data
Y = np.c_[np.ones((len(Y),1)),Y]

fx = np.array([-40,0.25])

Answer = Y@fx #or Y.dot(fx)
print(Answer)

[486.  314.  343.5 173. ]
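To make the "for loop vs. matrix multiplication" claim concrete, here is a minimal sketch that computes the same values both ways and checks that they agree. The names `sizes`, `design` and `params` are only illustrative and are not part of the cells above.

```python
import numpy as np

sizes = np.array([2104, 1416, 1534, 852])  # same dataset as above

# Loop version: evaluate f(x) = -40 + 0.25*x one element at a time
loop_result = []
for x in sizes:
    loop_result.append(-40 + 0.25 * x)
loop_result = np.array(loop_result)

# Vectorized version: prepend a column of ones, then a single matrix product
design = np.c_[np.ones((len(sizes), 1)), sizes]  # shape (4, 2)
params = np.array([-40, 0.25])                   # intercept, slope
vector_result = design @ params

print(loop_result)
print(vector_result)
print(np.allclose(loop_result, vector_result))
```

The column of ones is what lets the intercept (-40) be applied by the same product as the slope (0.25).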


The same example can be solved for multiple competing hypotheses f(x) at once. Example:
$$Y=
\begin{pmatrix}
 2104\\
 1416\\
 1534\\
 852\\
\end{pmatrix}
$$

and
1. $$f(x)=-40+0.25\cdot~x$$
2. $$f(x)=200+0.1\cdot~x$$
3. $$f(x)=-150+0.4\cdot~x$$

Solving through matrix multiplication we have:

$$f(Y)=
\begin{pmatrix}
 1 && 2104\\
 1 &&1416\\
 1 &&1534\\
 1 && 852\\
\end{pmatrix}
\times
\begin{pmatrix}
 -40 && 200 && -150\\
 0.25 && 0.1 && 0.4\\
\end{pmatrix}
=
\begin{pmatrix}
 486 && 410.4 && 691.6\\
 314 && 341.6 && 416.4\\
 343.5 && 353.4 && 463.6\\
 173 && 285.2 && 190.8\\
\end{pmatrix}
$$

In [6]:
Y = np.array([2104,1416,1534,852]) #dataset
#Transform Y into a 4x2 matrix (rows x columns): a column of ones, then the data
Y = np.c_[np.ones((len(Y),1)),Y]

fx = np.array([[-40,200,-150],[0.25,0.1,0.4]])

Answer = Y@fx #or Y.dot(fx)
print(Answer)

[[486.  410.4 691.6]
 [314.  341.6 416.4]
 [343.5 353.4 463.6]
 [173.  285.2 190.8]]


**Linear Regression with multiple features** 

Again imagine that we want to predict house prices.


\begin{pmatrix}
 \text{Size }(x_1) && \text{# of bedrooms }(x_2) && \text{# of floors }(x_3) && \text{Age of home }(x_4) && \text{Price }(Y)\\
 2104 && 5 && 1 && 45 && 460 \\
 1416&& 3 && 2 && 40 && 232 \\
 1534 && 3 && 2 && 30 && 315 \\
\end{pmatrix}

Notation:

$n=$ number of features

$x^{(i)}=$ input (features) of the $i^{th}$ training example. Ex: $x^{(2)}=[1416,3,2,40]$

$x_j^{(i)}=$ input of feature j in $i^{th}$ training example. Ex: $x_3^{(2)}=2$
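The notation above maps directly onto NumPy indexing. This is a small sketch using the table's values; the variable names (`X`, `y`, `x_2`, `x_3_2`) are my own choice, and the only thing to watch is that the notes count from 1 while NumPy counts from 0.

```python
import numpy as np

# Feature columns only: size, bedrooms, floors, age (price is kept separately)
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
])
y = np.array([460, 232, 315])  # prices

m, n = X.shape  # m training examples, n features

# The notes index from 1, NumPy indexes from 0
x_2 = X[2 - 1]            # x^(2): all features of the 2nd training example
x_3_2 = X[2 - 1, 3 - 1]   # x_3^(2): feature 3 of the 2nd training example

print(n)
print(x_2)
print(x_3_2)
```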

**Hypothesis with multiple features**

$h_{\theta}(x)=\theta^{T}X=\theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$ 

Because

$X=
\begin{pmatrix}
 x_0\\
 x_1\\
 \vdots \\
 x_n\\
\end{pmatrix}
$
and
$\theta=
\begin{pmatrix}
 \theta_0\\
 \theta_1\\
 \vdots\\
 \theta_n\\
\end{pmatrix}
$

which turns out to be

$\theta^{T}X=
\begin{pmatrix}
 \theta_0 & \theta_1 & \cdots & \theta_n\\
\end{pmatrix}
\begin{pmatrix}
 x_0\\
 x_1\\
 \vdots\\
 x_n\\
\end{pmatrix}
=\theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$
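As a quick check that the inner product and the written-out sum are the same thing, here is a sketch for a single training example. The values in `theta` are hypothetical, picked only for illustration (they are not fitted parameters), and $x_0=1$ is prepended to the first row of the house table.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
theta = np.array([80.0, 0.1, 10.0, 3.0, -2.0])  # theta_0 ... theta_4

# First training example from the table, with x_0 = 1 prepended
x = np.array([1.0, 2104.0, 5.0, 1.0, 45.0])

# Explicit sum: theta_0*x_0 + theta_1*x_1 + ... + theta_n*x_n
h_sum = sum(theta[j] * x[j] for j in range(len(theta)))

# Same value computed as the inner product theta^T x
h_dot = theta @ x

print(h_sum, h_dot)
```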

**Compute transpose matrix in python**

Let A be an **mxn** matrix and let $B=A^T$. 
Then B is an **nxm** matrix and **$B_{ij}=A_{ji}$**

In [14]:
A = np.array([[1, 2, 3], [4, 5, 6]])

print(f'Original Array A:\n{A}\n')

B = A.transpose() #or A.T

print(f'Transpose B:\n{B}')


Original Array A:
[[1 2 3]
 [4 5 6]]

Transpose B:
[[1 4]
 [2 5]
 [3 6]]


<span style="color:red">ATTENTION</span>:
Scale your features so that they are on a similar scale. Gradient descent (GD) then works more efficiently.

Example:

$x_1$ = size (ranging from 0 to 2000)

$x_2$ = number of bedrooms (ranging from 1 to 5)

**Scaling** to approx. $-1\leq x \leq 1$.

$x_1 = \frac{x_1}{max({x_1})}$


$x_2 = \frac{x_2}{max({x_2})}$

Another option is to subtract the mean, or to subtract the mean and divide by the standard deviation (std).
Feature ranges that are much too large are bad, and so are ranges that are much too small.
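The three scaling options can be sketched in NumPy as follows. The data are the size and bedroom columns from the house table; the variable names are illustrative, and `axis=0` means every statistic is computed per feature (per column).

```python
import numpy as np

# Size and number of bedrooms from the house table above
X = np.array([
    [2104.0, 5.0],
    [1416.0, 3.0],
    [1534.0, 3.0],
])

# Option 1: divide each feature (column) by its maximum
X_max = X / X.max(axis=0)

# Option 2: subtract the column mean
X_centered = X - X.mean(axis=0)

# Option 3: subtract the mean and divide by the standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_max)
print(X_centered)
print(X_std)
```

Note that option 2 alone only centers the features around zero; it does not bring them to a similar range, so it is usually combined with a division as in option 3.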

**Make sure GD is working correctly**

Always plot **$J(\theta)$ vs. number of iterations**

It should **always decrease**. If bumps occur, or if it increases instead, you are probably using a learning rate $\alpha$ that is too large.
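A minimal sketch of this check, assuming the usual squared-error cost $J(\theta)=\frac{1}{2m}\sum_i (h_\theta(x^{(i)})-y^{(i)})^2$ and plain batch gradient descent. The function names and the two learning rates (0.1 and 2.5) are my own choices for illustration; the data are the sizes and prices from the house table, with size scaled by its maximum.

```python
import numpy as np

def cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent; returns the final theta and J(theta) after each step."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / len(y)
        theta = theta - alpha * gradient
        history.append(cost(X, y, theta))
    return theta, np.array(history)

# Sizes scaled by their maximum, plus a column of ones; prices as targets
sizes = np.array([2104.0, 1416.0, 1534.0])
X = np.c_[np.ones((len(sizes), 1)), sizes / sizes.max()]
y = np.array([460.0, 232.0, 315.0])

_, good = gradient_descent(X, y, alpha=0.1, iterations=50)
_, bad = gradient_descent(X, y, alpha=2.5, iterations=50)

print(np.all(np.diff(good) <= 0))  # does J decrease at every step?
print(bad[-1] > bad[0])            # does J grow with the larger alpha?
```

To get the actual plot, pass either history to `plt.plot(...)` with the iteration number on the x axis.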