# Linear Regression

## Definition
- The essence of linear regression is fitting a straight line to a set of data points.
- Use the chi-squared statistic (the sum of squared residuals) to measure how well the line fits the dataset.

## Math
- Our linear function: $y = mx + c$
- Chi-squared statistic (sum of squared residuals):
    * $\chi^2 = \sum_{i}(y_{i} - mx_{i} - c)^2$
- Minimize $\chi^2$ by differentiating it with respect to $m$ and $c$ and setting each derivative to zero, which gives two equations:
    * $m = \frac{\sum(x_{i} - \bar{x})y_{i}}{\sum(x_{i} - \bar{x})^2}$
    * $c = \bar{y} - m\bar{x}$
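
As a quick numerical check of the formulas above, the sketch below computes $m$ and $c$ from the closed form and confirms that both partial derivatives of the sum of squared residuals vanish there. The helper name `chi_squared` and the variables `d_dm` and `d_dc` are illustrative choices, not part of the notebook, and the data are the same sample points used in the cells that follow.

```python
import numpy as np

# Sample points (same values as the notebook's data cell)
x = np.array([0.43, 0.5, 0.6, 0.7, 0.8, 0.75, 0.62])
y = np.array([0.12, 0.25, 0.55, 0.78, 0.8, 0.82, 0.57])


def chi_squared(m, c):
    """Sum of squared residuals for the line y = m*x + c (hypothetical helper)."""
    return np.sum((y - m * x - c) ** 2)


# Closed-form minimiser from the two equations above
x_bar, y_bar = x.mean(), y.mean()
m = np.sum((x - x_bar) * y) / np.sum((x - x_bar) ** 2)
c = y_bar - m * x_bar

# Analytic partial derivatives of the sum of squared residuals at (m, c);
# both should be zero up to floating-point rounding
d_dm = -2 * np.sum(x * (y - m * x - c))
d_dc = -2 * np.sum(y - m * x - c)

print(f'm = {m:.4f}, c = {c:.4f}')
print(f'd/dm = {d_dm:.2e}, d/dc = {d_dc:.2e}')
print(f'value at (m, c)       = {chi_squared(m, c):.4f}')
print(f'value at (m + 0.1, c) = {chi_squared(m + 0.1, c):.4f}')
```

Moving the slope away from the closed-form value should only increase the sum of squared residuals, which is what the last two printed lines compare.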

In [None]:
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt

In [None]:
# Define the x and y coordinates of the data points
x_coordinates = [0.43,0.5,0.6,0.7,0.8,0.75,0.62]
y_coordinates = [0.12,0.25,0.55,0.78,0.8,0.82,0.57]

In [None]:
# Linear Regression Implementation
def linfit(x_coordinates, y_coordinates):
    """Return the least-squares slope m and intercept c for y = m*x + c."""
    x = np.asarray(x_coordinates, dtype=float)
    y = np.asarray(y_coordinates, dtype=float)

    # Calculate mean-x and mean-y
    x_bar = x.mean()
    y_bar = y.mean()

    # Closed-form solution from the Math section
    m = np.sum((x - x_bar) * y) / np.sum((x - x_bar) ** 2)
    c = y_bar - m * x_bar

    return m, c


def graph(x_coordinates, y_coordinates, m, c):
    # Evaluate the fitted line on a fixed grid over [0, 1]
    x = np.array([0, 0.25, 0.5, 0.75, 1.0])
    y = m * x + c
    plt.plot(x, y)

    # Overlay the data points in red
    plt.scatter(x_coordinates, y_coordinates, c='r')
    plt.show()

In [None]:
# Applying Linear Regression
m, c = linfit(x_coordinates, y_coordinates)
graph(x_coordinates, y_coordinates, m, c)
print(f'm = {m}, c = {c}')

In [None]:
# Alternative approach: use SciPy's stats.linregress.
linr = stats.linregress(x_coordinates, y_coordinates)
m, c = linr.slope, linr.intercept
graph(x_coordinates, y_coordinates, m, c)
print(f'm = {m}, c = {c}')
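
The two approaches should give the same line. A minimal sketch of that comparison, assuming the same sample data, is shown here; the names `m_manual`, `c_manual`, `agree`, and `r_squared` are illustrative and do not appear in the notebook. It also reads the `rvalue` attribute of the `linregress` result, whose square is the coefficient of determination.

```python
import numpy as np
from scipy import stats

# Sample points (same values as the notebook's data cell)
x = np.array([0.43, 0.5, 0.6, 0.7, 0.8, 0.75, 0.62])
y = np.array([0.12, 0.25, 0.55, 0.78, 0.8, 0.82, 0.57])

# Closed-form least-squares fit, as in linfit
x_bar = x.mean()
m_manual = np.sum((x - x_bar) * y) / np.sum((x - x_bar) ** 2)
c_manual = y.mean() - m_manual * x_bar

# SciPy fit
result = stats.linregress(x, y)

# Compare with a floating-point tolerance rather than exact equality
agree = bool(np.isclose(m_manual, result.slope)
             and np.isclose(c_manual, result.intercept))

# Squared correlation coefficient: fraction of variance explained by the line
r_squared = result.rvalue ** 2

print(f'manual: m = {m_manual:.6f}, c = {c_manual:.6f}')
print(f'scipy:  m = {result.slope:.6f}, c = {result.intercept:.6f}')
print(f'agree within tolerance: {agree}')
print(f'r^2 = {r_squared:.4f}')
```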