# Linear Regression

## Articles used:
- [Introduction to Linear Regression in Python](https://towardsdatascience.com/introduction-to-linear-regression-in-python-c12a072bedf0)

Linear regression is a basic predictive analytics technique that uses historical data to predict an output variable.

The basic idea is that if we can fit a linear regression model to observed data, we can then use the model to predict future values. For example, let’s assume that we have found from historical data that the price (P) of a house, in dollars, is linearly dependent upon its size (S), in square feet. In fact, we found that a house’s price is exactly 90 times its size. The equation will look like this:
<br><br>
<b>P = 90 x S</b>
<br><br>
With this model, we can then predict the cost of any house. If we have a house that is 1,500 square feet, we can calculate its price to be:
<br><br>
<b>P = 90 x 1500 = $135,000</b>
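
The same calculation can be sketched in Python. The function name `predict_price` and its parameter names are hypothetical, chosen only for illustration, and the factor of 90 is the assumed value from the example above rather than a fitted result:

```python
def predict_price(size_sqft: float, price_per_sqft: float = 90.0) -> float:
    """Predict a house price with the toy model P = 90 x S."""
    return price_per_sqft * size_sqft


# A 1,500 square foot house, as in the example above.
price = predict_price(1500)
print(f"${price:,.0f}")  # prints $135,000
```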


## 1. Basic concepts and mathematics

There are two kinds of variables in a linear regression model:
The <b>input or predictor variable</b> helps predict the value of the output variable (a model may have more than one). It is commonly referred to as $X$.
<br>
The <b>output variable</b> is the variable that we want to predict. It is commonly referred to as ${Y}$.
<br>
To estimate Y using linear regression, we assume the equation:

<center><b>$Y_e = \alpha + \beta X$</b></center>

where <b>$Y_e$ is the estimated or predicted value of $Y$ based on our linear equation.</b>
<br>
Our goal is to find the values of the parameters $\alpha$ and $\beta$ that minimise the difference between $Y$ and $Y_e$ (shown as green lines in the figure): <br>
![Green lines show the difference between actual values Y and estimated values Yₑ](../resources/images/articles/differencebetweenactualvaluesYandestimatevaluesY.png)
<br>
If we are able to determine the optimum values of these two parameters, then we will have the line of best fit that we can use to predict the values of $Y$, given the value of $X$.
<b>To estimate $\alpha$ and $\beta$, we can use a method called ordinary least squares.</b>
<br>
The objective of the least squares method is to find values of $\alpha$ and $\beta$ that minimise the sum of the squared differences between $Y$ and $Y_e$.
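
As a sketch of where the closed-form estimates come from (standard calculus, not taken from the article; the symbol $S$ for the sum of squared errors is introduced here as an assumption, and $n$ denotes the number of observations), write the objective as:

<center>
$S(\alpha, \beta) = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2$
</center>
<br>
Setting both partial derivatives to zero gives two equations:

<center>
$\frac{\partial S}{\partial \alpha} = -2 \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i) = 0$
</center>
<br>
<center>
$\frac{\partial S}{\partial \beta} = -2 \sum_{i=1}^{n} X_i (Y_i - \alpha - \beta X_i) = 0$
</center>
<br>
The first equation rearranges to $\alpha = \overline{Y} - \beta \overline{X}$. Substituting that into the second equation and solving for $\beta$ yields the closed-form estimates.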

<center>
<b>
$\beta = \frac{
    \sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})
}{
    \sum_{i=1}^{n} (X_i - \overline{X})^2
}$
</b>
</center>
<br>
<center>
<b>
$\alpha = \overline{Y} - \beta * \overline{X}$
</b>
</center>
<br>
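Equivalently, because the normalising factors in the covariance and the variance cancel:
<center>
<b>
$\beta = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)$
</b>
</center>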


where $\overline{X}$ is the mean of the $X$ values, $\overline{Y}$ is the mean of the $Y$ values, and $n$ is the number of observations.
<br>
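
As a quick numerical check of these formulas, here is a minimal NumPy sketch. The data values are made up for illustration (they are not from the article), and the variable names are assumptions. It computes the slope both from the summation form and from the covariance form, then compares the result with NumPy's own least-squares line fit:

```python
import numpy as np

# Made-up illustrative data, roughly Y = 2 + 3X with some noise.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([5.1, 7.9, 11.2, 13.8, 17.0])

x_mean, y_mean = X.mean(), Y.mean()

# Slope from the summation formula.
beta = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)

# Intercept from the two means.
alpha = y_mean - beta * x_mean

# Same slope via Cov(X, Y) / Var(X); ddof=1 matches np.cov's default normalisation.
beta_cov = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# Cross-check against NumPy's degree-1 polynomial fit (returns slope, then intercept).
slope, intercept = np.polyfit(X, Y, 1)

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
print("covariance form agrees:", np.isclose(beta, beta_cov))
print("polyfit agrees:", np.isclose(beta, slope) and np.isclose(alpha, intercept))
```

The two slope calculations agree as long as the covariance and the variance use the same normalisation (both divided by $n - 1$ here), since that factor cancels in the ratio.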


## 2. Linear Regression From Scratch