# Least Squares Regression in Python
This notebook contains Inle Bush's notes and an example of least squares regression.

Note: The following code is written for education, not for production use. For real work, a library routine such as scipy.optimize.least_squares() is a better choice.

## Explanation
The line of best fit will take the form:

$Y=mX+c$

The following quadratic loss function measures how poorly a candidate line fits the data: it sums the squared vertical distances between each observed $y_i$ and the line's prediction $mx_i+c$. The line of best fit is the pair $(m, c)$ that minimizes this loss.

$L(m,c)=\sum_{i=1}^{n} (y_i-(mx_i+c))^2$
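As a minimal sketch of the loss described above, the function below evaluates the sum of squared residuals for a given slope and intercept. The name `squared_loss` and the sample data are illustrative assumptions, not part of the original notes.

```python
def squared_loss(xarr: list, yarr: list, m: float, c: float) -> float:
    '''Sum of squared residuals for the line y = m*x + c.'''
    loss = 0
    for x, y in zip(xarr, yarr):
        residual = y - (m * x + c)  # vertical distance from the point to the line
        loss += residual ** 2
    return loss

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # illustrative points lying exactly on y = 2x + 1

print(squared_loss(xs, ys, 2, 1))  # 0: the line passes through every point
print(squared_loss(xs, ys, 2, 0))  # 4: every prediction is off by 1
```

A lower value means a better fit, so comparing the loss of two candidate lines shows which one is closer to the data.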

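Taking the partial derivative of the loss function with respect to $c$ and setting it to 0 gives:

$\frac{\partial L}{\partial c}=-2\sum_{i=1}^{n} (y_i-mx_i-c)=0$

Rearranging, one ends up with the following equation.

$nc=\sum_{i=1}^{n} y_i-m\sum_{i=1}^{n} x_i$

Dividing both sides by $n$ expresses the intercept in terms of the means $\bar{x}$ and $\bar{y}$:

$c=\frac{1}{n}\left(\sum_{i=1}^{n} y_i-m\sum_{i=1}^{n} x_i\right)=\bar{y}-m\bar{x}$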
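
The slope can be obtained with the same procedure applied to $m$. The steps below are a sketch of that standard derivation rather than part of the original notes; $\bar{x}$ and $\bar{y}$ denote the means of the $x_i$ and $y_i$, and the intercept is taken as $c=\bar{y}-m\bar{x}$ (the intercept equation divided through by $n$).

$\frac{\partial L}{\partial m}=-2\sum_{i=1}^{n} x_i(y_i-mx_i-c)=0$

Substituting $c=\bar{y}-m\bar{x}$:

$\sum_{i=1}^{n} x_i(y_i-\bar{y})=m\sum_{i=1}^{n} x_i(x_i-\bar{x})$

Because $\sum_{i=1}^{n} (y_i-\bar{y})=0$ and $\sum_{i=1}^{n} (x_i-\bar{x})=0$, each $x_i$ outside the brackets can be replaced by $(x_i-\bar{x})$ without changing either sum, which gives:

$m=\frac{\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n} (x_i-\bar{x})^2}$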







## Functions
No external libraries are used in this notebook. For my own practice, I implemented the following basic statistical functions:
* mean
* variance
* covariance

In [None]:
def mean(xarr: list) -> float:
    '''returns the average of a list'''

    mean = 0
    for x in xarr:
        mean += x
    mean /= len(xarr)

    return mean

def variance(xarr: list) -> float:
    '''Returns population variance:
     The average squared distance from the mean.
     '''
    variance = 0
    meanx = mean(xarr)
    for x in xarr:
        variance += (meanx - x) ** 2
    variance /= len(xarr) #Population size = len(xarr)

    return variance

def covariance(xarr: list, yarr: list) -> float:
    '''Returns population covariance:
    The average product of the difference of x to the mean and the difference of y to the mean
    '''
    meanx = mean(xarr)
    meany = mean(yarr)

    covar = 0
    for i in range(len(xarr)):
        covar += (xarr[i] - meanx) * (yarr[i] - meany)
    covar /= len(xarr)

    return covar
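
One way to sanity-check hand-written statistics like these is to compare them against Python's standard `statistics` module on a small data set. The sketch below uses illustrative data chosen for this example. Note that `statistics.pvariance` is the population variance (dividing by n), whereas `statistics.variance` is the sample variance (dividing by n - 1).

```python
import statistics

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

mean_x = statistics.fmean(xs)     # 2.5
mean_y = statistics.fmean(ys)     # 5.0
var_x = statistics.pvariance(xs)  # population variance: 1.25

# Population covariance written out directly: the average product of deviations.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

print(mean_x, var_x, cov_xy)  # 2.5 1.25 2.5
```

The hand-written mean, variance, and covariance functions should return the same values for the same inputs.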

## Least Squares

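The least squares slope and y-intercept, where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$, are:

$m=\frac{\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n} (x_i-\bar{x})^2}=\frac{\mathrm{cov}(x,y)}{\mathrm{var}(x)}$

$c=\bar{y}-m\bar{x}$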

In [None]:
def leastsquares(xarr: list, yarr: list) -> tuple:
    '''Returns slope and y intercept of the line of best fit.
    In the form slope, y_int
    '''
    meanx = mean(xarr)
    meany = mean(yarr)

    # Accumulate over paired points (x_i, y_i), not every combination of x and y
    numerator = 0    # sum of (x_i - meanx) * (y_i - meany)
    denominator = 0  # sum of (x_i - meanx) ** 2
    for x, y in zip(xarr, yarr):
        numerator += (x - meanx) * (y - meany)
        denominator += (x - meanx) ** 2

    # Equivalent to covariance(xarr, yarr) / variance(xarr).
    # Raises ZeroDivisionError if all x values are equal (vertical line).
    slope = numerator / denominator

    # The line of best fit goes through the point with mean x and mean y
    y_int = meany - slope * meanx
    return slope, y_int
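
The following is a small, self-contained usage sketch. It assumes the covariance-over-variance form of the slope, restated compactly under the hypothetical name `fit_line` so the block runs on its own, and uses made-up data. It fits a line and then checks that nudging the slope or intercept in any direction increases the squared loss, which is what minimizing the loss means.

```python
def fit_line(xarr: list, yarr: list) -> tuple:
    '''Hypothetical compact least squares fit: returns (slope, y_int).'''
    n = len(xarr)
    meanx = sum(xarr) / n
    meany = sum(yarr) / n
    num = sum((x - meanx) * (y - meany) for x, y in zip(xarr, yarr))
    den = sum((x - meanx) ** 2 for x in xarr)
    slope = num / den
    return slope, meany - slope * meanx

def squared_loss(xarr: list, yarr: list, m: float, c: float) -> float:
    '''Sum of squared residuals for the line y = m*x + c.'''
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xarr, yarr))

# Illustrative data, not taken from the original notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, y_int = fit_line(xs, ys)
print(slope, y_int)  # approximately 0.6 and 2.2

# Any small change to the fitted parameters should make the loss larger.
best = squared_loss(xs, ys, slope, y_int)
for dm, dc in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert squared_loss(xs, ys, slope + dm, y_int + dc) > best
```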