#Linear Regression using Least Squares Method

##Usage

Linear regression with the least squares method is used to find the line that best fits a set of data points.

![Graphic1](https://www.scribbr.com/wp-content/uploads/2021/04/explanatory-and-response-variables-768x432.png)

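In the example above, the line passes as close as possible to all of the points at once. This is what minimizing the mean squared error (MSE) achieves. MSE is defined as $\frac{1}{n}\sum(Y_1 - Y_0)^2$, where $Y_1$ and $Y_0$ are the observed value of a point and the value the line predicts for it (the order does not matter, since the difference is squared). In other words, it is the average of the squared vertical distances between the points and the line, which equals the squared Euclidean norm of the vector of differences divided by $n$.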
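
As a minimal sketch of this definition, the MSE can be computed with NumPy as shown below. The function name `mse` and the sample values are illustrative assumptions, not part of the original example.

```python
import numpy as np

def mse(y_observed, y_predicted):
    """Mean of the squared differences between observed and predicted values."""
    y_observed = np.asarray(y_observed, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean((y_observed - y_predicted) ** 2)

# Hypothetical values: only the last prediction is off, by 2
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / 3
```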

##Calculation

The coefficients are obtained with the following formula:

$\beta = (X^TX)^{-1}X^TY$

Where:

* $\beta$ is the vector of coefficients that minimize the MSE.
* $X$ is the design matrix of predictor variables. Each row is an observation, and each column is a variable. The first column represents the intercept term and is usually filled with 1's so that the Y-intercept can be estimated.
* $X^T$ is $X$ transposed.
* $Y$ is the vector of response variables.
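
The formula $\beta = (X^TX)^{-1}X^TY$, with the symbols defined above, can be sketched directly in NumPy. This is a minimal illustration that assumes $X^TX$ is invertible; the function name `least_squares_coefficients` and the sample data are hypothetical.

```python
import numpy as np

def least_squares_coefficients(X, Y):
    """Return beta = (X^T X)^-1 X^T Y for a design matrix X and responses Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.linalg.inv(X.T @ X) @ X.T @ Y

# Hypothetical data lying exactly on y = 2x + 1
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]])  # first column: intercept term
Y = np.array([1, 3, 5, 7])
beta = least_squares_coefficients(X, Y)
print(beta)  # intercept first, then slope
```

In practice, `np.linalg.solve(X.T @ X, X.T @ Y)` or `np.linalg.lstsq` is usually preferred over forming the inverse explicitly, because it is numerically more stable.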

##Making an example

The following graphic shows the points used in this example, where Points holds the x-coordinates and $Y$ the corresponding response values:

Points $= [1, 2, 3, 4, 5, 6, 7]$

$Y = [1.5, 3.8, 6.7, 9.0, 11.2, 13.6, 16]$

![graphic2](https://i.ibb.co/SnCLTd2/figure1.png)

First, build the design matrix $X$ (the first matrix below), filling the intercept column with ones. We can also calculate $X^T$.

![matrixXandTransposed](https://i.ibb.co/kq5YVXS/image.png)
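
The same construction can be sketched in NumPy. The variable names `points`, `X` and `Xt` are chosen here for illustration; the values are the seven points of the example.

```python
import numpy as np

points = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)

# Column of ones for the intercept, followed by the column of points
X = np.column_stack([np.ones_like(points), points])
Xt = X.T

print(X.shape)   # (7, 2)
print(Xt.shape)  # (2, 7)
```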

Calculating $X^TX$:

![MatrixProduct](https://i.ibb.co/kSkXC1k/image.png)

Then its inverse will be:

![MatrixInverse](https://i.ibb.co/PMwLqwp/image.png)
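
For readers who want to check these two steps by hand, here is a worked version. It assumes the design matrix has the column of ones first and the points $1, \dots, 7$ second, so the entries of $X^TX$ are the number of points ($7$), the sum of the points ($28$) and the sum of their squares ($140$).

$$
X^TX = \begin{bmatrix} 7 & 28 \\ 28 & 140 \end{bmatrix}, \qquad \det(X^TX) = 7 \cdot 140 - 28^2 = 196
$$

$$
(X^TX)^{-1} = \frac{1}{196}\begin{bmatrix} 140 & -28 \\ -28 & 7 \end{bmatrix} = \begin{bmatrix} \frac{5}{7} & -\frac{1}{7} \\ -\frac{1}{7} & \frac{1}{28} \end{bmatrix}
$$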

Now that we have $(X^TX)^{-1}$, recall that $\beta = (X^TX)^{-1}X^TY$. So, multiplying it by $X^T$:

![MatrixProductfromInversed](https://i.ibb.co/R7vNzr5/image.png)

Finally, multiplying by the vector of response variables $Y$:

![MatrixResult](https://i.ibb.co/DQbRzN9/image.png)

Those are the coefficients that minimize the MSE. Recall the equation of a line, $y = mx + b$. Here, $b = -\frac{29}{35}$ and $m = \frac{169}{70}$, which gives $y = \frac{169}{70}x - \frac{29}{35}$. Let's see how this looks on the graphic:

![FinalChart](https://i.ibb.co/1fhWVFH/figure1.png)
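
The fractions $-\frac{29}{35}$ and $\frac{169}{70}$ can be double-checked with exact arithmetic. The sketch below is not the matrix procedure itself: it assumes the closed-form inverse of the $2 \times 2$ matrix $X^TX$ and uses Python's `fractions` module, with variable names chosen for illustration.

```python
from fractions import Fraction

x = [Fraction(v) for v in range(1, 8)]
y = [Fraction(s) for s in ["1.5", "3.8", "6.7", "9.0", "11.2", "13.6", "16"]]

n = len(x)
sx = sum(x)                             # sum of the points
sxx = sum(v * v for v in x)             # sum of the squared points
sy = sum(y)                             # sum of the responses
sxy = sum(a * b for a, b in zip(x, y))  # sum of point * response

# X^T X = [[n, sx], [sx, sxx]] and X^T Y = [sy, sxy]
det = n * sxx - sx * sx
b = (sxx * sy - sx * sxy) / det  # intercept
m = (n * sxy - sx * sy) / det    # slope

print(b, m)  # -29/35 169/70
```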

#Using the Moore-Penrose Inverse

When the columns of $A$ are linearly independent, the Moore-Penrose inverse of $A$ is given by:

$A^{+} = (A^TA)^{-1}A^T$

* $A^{+}$ is the Moore-Penrose Inverse.
* $A$ is the original matrix.
* $A^T$ is $A$ transposed.

Comparing this with $\beta = (X^TX)^{-1}X^TY$, notice that $(X^TX)^{-1}X^T$ is exactly the Moore-Penrose inverse $X^{+}$, so it can be replaced.

With the Moore-Penrose inverse: $\beta = X^{+}Y$
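
Applied to the example from this article, $\beta = X^{+}Y$ can be sketched with `np.linalg.pinv`. The variable names are chosen for illustration, and the result should match the coefficients found by hand up to floating-point rounding.

```python
import numpy as np

points = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
Y = np.array([1.5, 3.8, 6.7, 9.0, 11.2, 13.6, 16.0])

X = np.column_stack([np.ones_like(points), points])  # design matrix

beta = np.linalg.pinv(X) @ Y  # beta = X+ Y
print(beta)  # intercept first, then slope
```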


In [None]:
import numpy as np

A = np.array([[1, 2], [3, 4]])  # example matrix with linearly independent columns
At = A.T  # transpose of A

# The math way: A+ = (A^T A)^-1 A^T
mooreA = np.linalg.inv(At @ A) @ At

print(mooreA)  # pseudoinverse from the formula
print(np.linalg.pinv(A))  # the NumPy way
# Both print the same matrix, up to floating-point rounding

##References

* https://www.youtube.com/watch?v=P8hT5nDai6A: This video explains the same linear regression model and works through the same example as this article, but in a different way. Here, the example is solved in matrix form, which involves the Moore-Penrose inverse.
* https://www.ma.imperial.ac.uk/~das01/GSACourse/Regression.pdf: This article explains the method used here in more detail.