# Linear Regression
## 1. <u>Basic equation</u>
$\hat{y} = \theta_0 + \theta_1x_1 + ... + \theta_nx_n$    
     
In this equation:
   * $\hat{y}$ is the predicted value.
   * $n$ is the number of features.
   * $x_i$ is the $i^{th}$ feature.
   * $\theta_j$ is the $j^{th}$ model parameter.
   * $\theta_0$ is the bias term.
   * $\theta_1, \theta_2, \dots, \theta_n$ are the feature weights.   
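
As a concrete check of the notation, the basic equation can be evaluated with a plain loop. This is a minimal sketch: the function name `predict` and all numbers are hypothetical, chosen only for illustration rather than learned from data.

```python
def predict(theta, x):
    """Evaluate y_hat = theta_0 + theta_1*x_1 + ... + theta_n*x_n.

    theta: list of n + 1 parameters, theta[0] being the bias term.
    x: list of n feature values (x_1 to x_n).
    """
    if len(theta) != len(x) + 1:
        raise ValueError("theta must contain exactly one more value than x")
    y_hat = theta[0]  # start from the bias term
    for theta_j, x_j in zip(theta[1:], x):
        y_hat += theta_j * x_j  # add each weighted feature
    return y_hat


# Hypothetical parameters and features (n = 3)
theta = [2.0, 0.5, -1.0, 3.0]
x = [4.0, 1.0, 2.0]
print(predict(theta, x))  # 2 + 0.5*4 - 1*1 + 3*2 = 9.0
```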
    
For people familiar with statistics, the **bias term** is the **intercept** and the **feature weights** are the **betas**. It is just a matter of nomenclature.

### <u>Vectorized form</u>
The equation can also be written in vectorized form:
$$\hat{y} = h_\theta(x) = \theta.x$$  
  

In this equation:
   * $\theta$ is the model's *parameter vector*, containing the bias term $\theta_0$ and the feature weights $\theta_1$ to $\theta_n$.
   * $x$ is the instance's *feature vector*, containing $x_0$ to $x_n$, with $x_0$ always equal to 1.
   * $\theta.x$ is the dot product of the vectors $\theta$ and $x$, which is equal to $\theta_0x_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n$.
   * $h_\theta$ is the hypothesis function using the model parameters $\theta$.
   <br/>
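
The sketch below, assuming NumPy and made-up numbers (not fitted parameters), prepends $x_0 = 1$ to the feature values and checks that $\theta.x$ gives the same result as the expanded equation.

```python
import numpy as np

# Hypothetical parameter vector: theta_0 (bias) followed by theta_1..theta_3
theta = np.array([2.0, 0.5, -1.0, 3.0])

# Raw feature values x_1..x_3 for one instance
features = np.array([4.0, 1.0, 2.0])

# Prepend x_0 = 1 so that theta_0 * x_0 reproduces the bias term
x = np.concatenate(([1.0], features))

# Vectorized form: y_hat = theta . x
y_hat_vectorized = np.dot(theta, x)

# Expanded form: theta_0 + theta_1*x_1 + ... + theta_n*x_n
y_hat_expanded = theta[0] + np.sum(theta[1:] * features)

print(y_hat_vectorized, y_hat_expanded)  # 9.0 9.0
```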
   
## 2. <u>Linear Algebra </u>  
* In machine learning, vectors are often represented as *column vectors*, which are **2D arrays with a single column**.
<br/>
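
In NumPy, a plain 1D array has no row/column orientation, so a column vector has to be built explicitly as a 2D array. A small sketch with arbitrary values; `reshape(-1, 1)` is one common way to do it, not the only one.

```python
import numpy as np

# A 1D NumPy array: shape (3,), neither a row nor a column
v = np.array([3, 2, 5])
print(v.shape)  # (3,)

# The same values as a column vector: a 2D array with 3 rows and 1 column
col = v.reshape(-1, 1)
print(col.shape)  # (3, 1)

# Transposing the column vector gives a row vector (1 row, 3 columns)
row = col.T
print(row.shape)  # (1, 3)
```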

### <u>Dot product</u>   
   
 The vectorized form of the equation tells us that $\hat{y}$ is really the dot product between $\theta$ and $x$.  
 
In linear algebra, the **dot product** is a way of multiplying two vectors (*vector-vector* multiplication). Recall that $\theta$ is the vector containing all the model's parameters and that $x$ is the vector containing the values of a specific subject on all the variables/features.
   
 **<u>EX:</u>**  
   
   
   - If we are measuring the effects of age, weight (kg) and height (cm) on IQ, the *feature vector* for a particular individual, Jack, would be:  
   
   $$x_{\textit{Jack}} = \begin{bmatrix}1\\\textit{Age}_{\textit{Jack}}\\\textit{Weight}_{\textit{Jack}}\\\textit{Height}_{\textit{Jack}}\end{bmatrix} = \begin{bmatrix}1\\22\\72\\178\end{bmatrix}$$   
<br/>
   - The *parameter vector*, on the other hand, stays the same across all participants (we will see later how to get the parameters):
   <br/>
   $$\theta = \begin{bmatrix}\theta_0\\ \theta_1 \\ \theta_2\\\theta_3\end{bmatrix}$$

<br/>

The result of a *dot product* is always a scalar, i.e. a single number (which, in our case, is $\hat{y}$).
### <u>Compute the dot product</u>


- We write the *dot product* as $a.b$.
- Both vectors must have the same dimensionality.
- The calculation is simple: multiply **element 1 of the first vector** by **element 1 of the second vector**, add **element 2 of the first vector** $\times$ **element 2 of the second vector**, and so on for every element. In symbols: $a.b = \sum_{i=1}^{n} a_i b_i$.
- **<u>EX:</u>** $$\begin{bmatrix}3&2&5\end{bmatrix}\bullet\begin{bmatrix}2&1&1\end{bmatrix} = 3\cdot2+2\cdot1+5\cdot1 = 6+2+5 = 13 $$
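
The rule above amounts to summing the products of matching elements. A minimal pure-Python sketch (the helper name `dot` is a hypothetical choice) that also enforces the same-dimensionality requirement:

```python
def dot(a, b):
    """Dot product of two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same dimensionality")
    # Multiply matching elements and sum the products
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


print(dot([3, 2, 5], [2, 1, 1]))  # 13
```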


### <u>Compute the dot product in Python</u>

```python
import numpy as np

v1 = np.array([3, 2, 5])
v2 = np.array([2, 1, 1])

print(np.dot(v1, v2))  # 13
```
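
NumPy offers other ways to obtain the same number, and it refuses vectors of different lengths. A short sketch, assuming a reasonably recent NumPy (the `@` operator requires Python 3.5+):

```python
import numpy as np

v1 = np.array([3, 2, 5])
v2 = np.array([2, 1, 1])

# Three equivalent ways to get the dot product of two 1D arrays
print(np.dot(v1, v2))   # 13
print(v1 @ v2)          # 13, the @ (matrix multiplication) operator
print(np.sum(v1 * v2))  # 13, element-wise products, then summed

# Vectors of different dimensionality cannot be dotted
try:
    np.dot(v1, np.array([2, 1]))
except ValueError as err:
    print("ValueError:", err)
```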

### <u>Dot product in machine learning</u>
- In machine learning, we usually treat *vectors* as 2D arrays (matrices) with $n$ rows and 1 column. Why? Because we rarely work with a single vector. We are not interested in calculating only one $\hat{y}$ but a whole group of them: we don't want to predict only Jack's score, but also Jane's, Alice's and John's, all at the same time. Therefore, Jack's feature vector (transposed) becomes one row of a *feature matrix* with one row per subject and one column per feature (plus the column of 1s for $x_0$).
- That's why we will be using matrix-matrix or matrix-vector multiplication most of the time.
- It is therefore convenient to write the **dot product** formula as a matrix multiplication.
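- If $a$ and $b$ are two column vectors (matrices with $n$ rows and 1 column), their dot product can be written as the matrix multiplication: $$a^{T}\cdot b$$
- $a^T$ is the *transpose* of $a$: a row vector with 1 row and $n$ columns. Multiplying it by the column vector $b$ gives a $1 \times 1$ matrix whose single entry is $a.b$.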
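
With NumPy column vectors (2D arrays of shape $(n, 1)$), transposing the first vector and matrix-multiplying it with the second gives the dot product wrapped in a $1 \times 1$ matrix. A minimal sketch reusing the vectors from the earlier example:

```python
import numpy as np

# Column vectors: 2D arrays with 3 rows and 1 column
a = np.array([[3], [2], [5]])
b = np.array([[2], [1], [1]])

# a.T has shape (1, 3); (1, 3) @ (3, 1) gives a (1, 1) matrix
result = a.T @ b
print(result.shape)  # (1, 1)
print(result)        # [[13]]

# Extract the scalar from the 1x1 matrix
print(result.item())  # 13
```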
#### In sum:
- In machine learning, we use matrices (2D arrays) most of the time.
- Therefore, we define a vector as a matrix with $n$ rows and 1 column.
- So the *dot product* is now conceptualized as a matrix multiplication.
<br/>
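
Stacking each subject's feature vector as a row gives a feature matrix $X$, and a single matrix product with the parameter column vector yields all predictions at once. In this sketch, only Jack's row comes from the example above; the other subjects and every parameter value are hypothetical, since the real parameters would come from training.

```python
import numpy as np

# Feature matrix: one row per subject; first column is x_0 = 1,
# remaining columns are age, weight (kg) and height (cm)
X = np.array([
    [1, 22, 72, 178],  # Jack (values from the example above)
    [1, 30, 60, 165],  # hypothetical subject
    [1, 45, 85, 180],  # hypothetical subject
], dtype=float)

# Hypothetical parameter column vector theta_0..theta_3, shape (4, 1)
theta = np.array([[100.0], [0.1], [-0.05], [0.02]])

# (3, 4) @ (4, 1) -> (3, 1): one prediction per subject
y_hat = X @ theta
print(y_hat.shape)  # (3, 1)

# Each row's prediction equals the dot product of that row with theta
for i in range(X.shape[0]):
    assert np.isclose(y_hat[i, 0], np.dot(X[i], theta[:, 0]))
```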
    

    


### <u>Matrix multiplication</u>
- Matrix multiplication is a basic operation in linear algebra.
- A matrix is a 2D array whose shape is given as (rows, columns).
- **<u>EX:</u>** a matrix with 2 rows and 3 columns:

  $$\begin{bmatrix}1&3&5\\7&2&1\end{bmatrix}$$
    
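- For a matrix $A$ with $m$ rows and $n$ columns and a matrix $B$ with $n$ rows and $p$ columns (the number of columns of $A$ must equal the number of rows of $B$), the product $C = A \cdot B$ is a matrix with $m$ rows and $p$ columns, whose entries are:

  $$C_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}$$

  In other words, entry $C_{ij}$ is the dot product of row $i$ of $A$ with column $j$ of $B$.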
    
    
- **<u>EX:</u>**
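
A minimal sketch of the row-by-column rule, written with explicit loops (the helper `matmul` and the matrix `B` are hypothetical, for illustration only) and checked against NumPy's built-in `@` operator. It uses the $2 \times 3$ matrix shown earlier in this section:

```python
import numpy as np


def matmul(A, B):
    """Row-by-column matrix multiplication with explicit loops."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p), dtype=A.dtype)
    for i in range(m):
        for j in range(p):
            # Entry (i, j) is the dot product of row i of A and column j of B
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C


# The 2x3 matrix from the example above
A = np.array([[1, 3, 5],
              [7, 2, 1]])

# A hypothetical 3x2 matrix: its number of rows matches A's number of columns
B = np.array([[2, 0],
              [1, 1],
              [1, 4]])

C = matmul(A, B)
print(C)
# [[10 23]
#  [17  6]]
print(np.array_equal(C, A @ B))  # True
```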
    
    
