# Working with Matrices

**Def**: _rectangular array of numbers_

**Dimension**: _Number of Rows_ __x__ _Number of Columns_   **(nxm)**

$$
A = 
\begin{pmatrix}
0.8944272 & 0.4472136 & 0.234223\\
-0.4472136 & -0.8944272 & 0.234223
\end{pmatrix}
$$
where $A \in \mathbb{R}^{2 \times 3}$ and $A_{ij}$ refers to the element in the $i^{th}$ row and $j^{th}$ column.
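
As a rough sketch, the matrix above can be stored in plain Python as a list of rows. The helper names `shape` and `element` are hypothetical and exist only for this illustration. Note that Python indexes from 0, while $A_{ij}$ is treated as 1-based here.

```python
# The 2x3 matrix A from above, stored as a list of rows.
A = [
    [0.8944272, 0.4472136, 0.234223],
    [-0.4472136, -0.8944272, 0.234223],
]

def shape(matrix):
    """Return (number of rows, number of columns) of a rectangular list of rows."""
    return len(matrix), len(matrix[0])

def element(matrix, i, j):
    """Return the entry in row i and column j, using 1-based indices."""
    return matrix[i - 1][j - 1]

print(shape(A))          # (2, 3)
print(element(A, 2, 1))  # entry in the 2nd row, 1st column
```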



# Vectors
**Def**: _an __nx1__ matrix_

$$
y = 
\begin{pmatrix}
y_0 \\
y_1 \\
y_2 \\
y_3
\end{pmatrix}
$$

where $y_i$ refers to the $i^{th}$ element in the vector $y$.

# Matrix Operations
### Addition & Subtraction ###
$
\begin{pmatrix}
1 & 1 & 3 \\
1 & 5 & -2 \\
\end{pmatrix}
$
+
$
\begin{pmatrix}
3 & 4 & -2 \\
1 & 3 & -2 \\
\end{pmatrix}
$
= 
$
\begin{pmatrix}
4 & 5 & 1 \\
2 & 8 & -4 \\
\end{pmatrix}
$

__Note:__ _The dimensions of both matrices must be the same._
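
A minimal sketch of elementwise addition and subtraction in plain Python, assuming matrices are stored as lists of rows. The function names `add` and `subtract` are illustrative, not part of any library.

```python
def _check_same_shape(a, b):
    """Raise ValueError unless both matrices have identical dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")

def add(a, b):
    """Elementwise sum of two matrices."""
    _check_same_shape(a, b)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def subtract(a, b):
    """Elementwise difference of two matrices."""
    _check_same_shape(a, b)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

P = [[1, 1, 3], [1, 5, -2]]
Q = [[3, 4, -2], [1, 3, -2]]
print(add(P, Q))  # [[4, 5, 1], [2, 8, -4]]
```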


### Scalar Multiplication and Division ###
$
3 * 
\begin{pmatrix}
1 & 1 & 3 \\
1 & 5 & -2 \\
\end{pmatrix}
=
\begin{pmatrix}
3 & 3 & 9 \\
3 & 15 & -6 \\
\end{pmatrix}
$

__Note:__ _Done elementwise._
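
The same elementwise idea can be sketched in plain Python. The names `scale` and `divide` are hypothetical helpers for this example.

```python
def scale(c, matrix):
    """Multiply every entry of the matrix by the scalar c."""
    return [[c * x for x in row] for row in matrix]

def divide(matrix, c):
    """Divide every entry of the matrix by the scalar c."""
    return [[x / c for x in row] for row in matrix]

M = [[1, 1, 3], [1, 5, -2]]
print(scale(3, M))  # [[3, 3, 9], [3, 15, -6]]
```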


### Matrix Multiplication ###
$
\begin{pmatrix}
1 & 1 & 3 \\
1 & 5 & -2 \\
\end{pmatrix}
* 
\begin{pmatrix}
3 & 3 \\
3 & 2 \\
4 & 4 
\end{pmatrix}
= 
\begin{pmatrix}
(1*3)+(1*3)+(3*4) & (1*3)+(1*2)+(3*4) \\
(1*3)+(5*3)+(-2*4) & (1*3)+(5*2)+(-2*4) \\
\end{pmatrix}
= 
\begin{pmatrix}
18 & 17 \\
10 & 5 \\
\end{pmatrix}
$

__Note:__ _The number of columns of the first matrix must equal the number of rows of the second._
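
A minimal sketch of the row-by-column product in plain Python, assuming matrices are lists of rows. `matmul` is an illustrative name, and each output entry is the sum of products of a row of the first matrix with a column of the second.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, returning an (n x m) matrix."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("columns of the first matrix must equal rows of the second")
    n_cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(n_cols)]
        for row in a
    ]

left = [[1, 1, 3], [1, 5, -2]]
right = [[3, 3], [3, 2], [4, 4]]
print(matmul(left, right))
```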




# How to get started with ML? 

###### Let the algorithm make a hypothesis ($ h_\theta $).  
> Linear Regression Model Hypothesis function:  
>    > $ h_\theta (x) = \theta_0 + \theta_1 x $  
>
> where $ \theta_i $, for $i = 0 \dots j $, are the parameters we are trying to optimize

###### The difference between the hypothesis and the observed value $ y $ is commonly known as the error ($ \epsilon $). 
> > $ \epsilon  = h_\theta (x) - y$   
> >
> where $ y^T = [y_0 \space y_1 \space y_2 \cdots  y_n] $
> and $ x^T = [x_0 \space x_1 \space x_2  \cdots  x_n] $
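
As a small sketch, the hypothesis and the per-example error can be written in plain Python. The sample data `xs` and `ys` are made up for illustration (they follow $y = 1 + 2x$ exactly), and the function names are hypothetical.

```python
def hypothesis(theta0, theta1, x):
    """Linear model prediction: theta0 + theta1 * x."""
    return theta0 + theta1 * x

def errors(theta0, theta1, xs, ys):
    """Error for each example: prediction minus observed value."""
    return [hypothesis(theta0, theta1, x) - y for x, y in zip(xs, ys)]

# Illustrative data only, generated from y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

print(errors(0, 1, xs, ys))  # [-1, -2, -3, -4]
print(errors(1, 2, xs, ys))  # [0, 0, 0, 0]
```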

###### The systematic way of grading the overall performance of the hypothesis is through the use of a *Cost Function*.
> Linear Regression Cost Function: 
> > $ J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
> 
> where $ m $ is the number of training examples and $ (x^{(i)}, y^{(i)}) $ is the $ i^{th} $ example.
>
> _Note_: **Cost Functions** vary with the application. 
>
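
A minimal sketch of this cost function in plain Python, assuming the two-parameter linear model above. The name `cost` and the sample data are illustrative only.

```python
def cost(theta0, theta1, xs, ys):
    """Half the mean squared error of the linear hypothesis over all examples."""
    m = len(xs)
    squared = [(theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared) / (2 * m)

# Illustrative data only, generated from y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

print(cost(0, 1, xs, ys))  # 3.75, a poor fit
print(cost(1, 2, xs, ys))  # 0.0, a perfect fit
```

A lower value of `cost` means the hypothesis fits the data more closely.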

![Linear regression cost function](visuals/cost_func_lin_reg.png)

###### The goal is to minimize our *Cost Function*.

# Minimizing the Cost Function: Gradient Descent

###### The basic concept is to imagine a ball at some point on a hill and let it roll downhill in the direction of the steepest slope.
![gradient_descent](visuals/gradient_descent.png)

###### Algorithm:   

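`repeat until convergence:`
      $ \theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j} J(\theta_0, \theta_1) $

where $ \alpha $ is the learning rate (step size) and the update is applied to every $ \theta_j $ simultaneously.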
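
For the linear regression cost function defined earlier, the partial derivatives in this update can be worked out explicitly. This is a standard calculus step that the notes do not spell out, with $ h_\theta(x) = \theta_0 + \theta_1 x $ assumed:

$$ \frac{\partial}{\partial\theta_0} J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) $$
$$ \frac{\partial}{\partial\theta_1} J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)} $$

The factor $ \frac{1}{2} $ in the cost function cancels the 2 produced by differentiating the square.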

###### Implementation: 
$$ temp_0:=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1) $$
$$ \cdots  $$
$$ temp_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1) $$
$$ \theta_0 := temp_0 $$
$$ \cdots $$
$$ \theta_j := temp_j $$
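
The simultaneous update can be sketched in plain Python for the two-parameter linear model. This is one possible sketch under stated assumptions, not a definitive implementation: it runs a fixed number of iterations instead of testing for convergence, and the learning rate, starting point, and sample data are all illustrative choices.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit theta0 + theta1 * x by batch gradient descent on the squared-error cost."""
    theta0, theta1 = 0.0, 0.0  # assumed starting point
    m = len(xs)
    for _ in range(iterations):
        # Errors are computed once, using the current (old) parameters.
        errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m
        # Compute both temporaries before assigning either parameter,
        # so that theta1's update does not see the new theta0.
        temp0 = theta0 - alpha * grad0
        temp1 = theta1 - alpha * grad1
        theta0, theta1 = temp0, temp1
    return theta0, theta1

# Illustrative data only, generated from y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 4), round(theta1, 4))
```

If $ \alpha $ is too large the iterates can diverge, and if it is too small convergence is slow, so the value usually needs tuning for each dataset.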

