# Linear Regression

A type of supervised learning to predict a value in a continuous dataset with the following hypothesis function

\begin{equation*}
h(x) =  \theta_1x + \theta_0
\end{equation*}

where h(x) is the output value,
x is the input value, ${\theta_0}$ and ${\theta_1}$ is the parameter that we need to find in order draw a line that best fit the datasets


## Cost function
Cost function is used as a measurement to determine how good the line fits the datasets, if we are given any arbitrary parameters a and b

The cost function that we will be using is called Mean Square Error (MSE). This function basically will get the average of the total sum of the square difference between the predicted value and the actual value. 

Basically, what we want to achieve is to get the smallest MSE value possible because that will give us the most accurate prediction

### Calculating the loss for one variable linear regression

Let's say we have a set of 12 data points:    
  
*taken from Andrew Ng's Machine Learning course on Coursera

| ${x}$ | ${y}$  |
|-------|----|
| 1     |-890|
| 2     |-1411|
| 2     |-1560|
| 3     |-2220|
| 3     |-2091|
| 4     |-2878|
| 5     |-3537|
| 6     |-3268|
| 6     |-3920|
| 6     |-4163|
| 8     |-5471|
| 10    |-5157|

and we have 4 possible values of our parameters ${\theta_0}$ and ${\theta_1}$:

| ${\theta_0}$ | ${\theta_1}$ | Model |
|-------|----|----|
| -1780     |530.9| ${y}$ = -1780${x}$ + 530.9
| -1780     |-530.9| ${y}$ = -1780${x}$ - 530.9
| -569.6     |530.9| ${y}$ = -569.6${x}$ + 530.9
| -569.6     |-530.9| ${y}$ = -569.6${x}$ - 530.9

So, the objective here is to determine which ${\theta_0}$ and ${\theta_1}$ will be the optimal value for our linear
regression model to predict the y value, given x

### Python script to calculate MSE (Mean Square Error)

In [1]:
def calc_mse(data, theta0, theta1):
    
    # Get the SSE by adding up the square of the difference between prediction value
    # and the actual value, this will be fed into MSE calculation
    sse = 0
    for i in range(len(data)):
        pred_val = data[i][0] * theta0 + theta1
        sq_diff = (data[i][1] - pred_val) ** 2
        sse = sse + sq_diff
    
    # Get the MSE (Mean Square Error) or simply the loss 
    mse = sse / (2 * len(data)) 
    return mse

### Now let's calculate the loss for different values of ${\theta_0}$ and ${\theta_1}$

In [2]:
data = [[1,-890],
        [2, -1411], 
        [2, -1560],
        [3, -2220],
        [3, -2091],
        [4, -2878],
        [5,-3537],
        [6, -3268],
        [6, -3920],
        [6, -4163],
        [8,-5471],
        [10,-5157]]

theta_arr = [[-1780,530.9],
             [-1780,-530.9],
             [-569.6,530.9],
             [-569.6,-530.9]]

for x in range(len(theta_arr)):
    theta0 = theta_arr[x][0]
    theta1 = theta_arr[x][1]
    print("MSE for theta0=",theta0," and theta1=",theta1,"is ", calc_mse(data, theta0, theta1))


MSE for theta0= -1780  and theta1= 530.9 is  2356033743.12
MSE for theta0= -1780  and theta1= -530.9 is  3160207085.52
MSE for theta0= -569.6  and theta1= 530.9 is  71346612.23999998
MSE for theta0= -569.6  and theta1= -530.9 is  11863726.799999999


### Conclusion

To recap on the MSE explanation in the beginning of this document, we want to achive the smallest MSE value in order to have a function that can give us the most accurate prediction of the next unknown y, given a new x value.

The minimal loss (or the smallest MSE value) that we can get with the 4 different ${\theta_0}$ and ${\theta_1}$ values is 11863726.799999999

So in other words, the suitable linear regression function would be:

\begin{equation*} y = -569.6x -530.9
\end{equation*}

## Moving Forward

You may be asking: what does this have to do with machine learning? Finding the loss value (in this case, MSE) is one of the steps in a machine learning optimization process to minimize the error in the prediction function.

A machine learning task will perform the following 3 steps

1. PREDICT the value with a hypothesis function using initial parameters ${\theta_0}$ and ${\theta_1}$
2. EVALUATE using the cost function to determine how inaccurate the prediction by comparing prediction and actual value
3. TRAIN (gradually fine-tune the ${\theta}$) parameters with the guide of the cost fucntion until you eventually get the smallest loss value
4. Rinse and repeat.

In the next section, we will look into what's called Parameterized Learning. It is the step of training the machine to repetitively find out what is the best values of ${\theta_0}$ and ${\theta_1}$ which will give the smallest loss value, and the training algorithm used is called the Gradient Descent.