# Linear Regression

- Response variable: $Y$
- Predictor variable: $X$
- Use the observed (training) pairs $(X_i, Y_i)$ to estimate the coefficients $\alpha$ (intercept) and $\beta$ (slope)

$Y_i = \beta X_i + \alpha + \epsilon_i$, where $\epsilon_i$ is the error term for observation $i$.
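To make the model concrete, here is a minimal sketch that draws data from it. The function name `simulate`, the true values $\alpha = 1$ and $\beta = 2$, and the Gaussian noise are illustrative assumptions; the model itself only says that each observation carries an error term $\epsilon_i$.

```python
import random

def simulate(alpha, beta, xs, noise_sd, seed=0):
    """Hypothetical helper: draw y_i = beta * x_i + alpha + eps_i,
    assuming Gaussian noise eps_i with standard deviation noise_sd."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return [beta * x_i + alpha + rng.gauss(0.0, noise_sd) for x_i in xs]

xs = [float(i) for i in range(20)]
ys = simulate(alpha=1.0, beta=2.0, xs=xs, noise_sd=0.5)
```

With `noise_sd=0.0` every point lies exactly on the line $y = 2x + 1$; with noise, the task of regression is to recover $\alpha$ and $\beta$ from `xs` and `ys` alone.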

```python
# Predict y_i from x_i using the current alpha and beta
def predict(alpha, beta, x_i):
    return beta * x_i + alpha
```

```python
# Error (residual) of the prediction beta * x_i + alpha when the actual value is y_i
def error(alpha, beta, x_i, y_i):
    return y_i - predict(alpha, beta, x_i)
```

- We want a single number for the total error over the entire data set. Simply adding up the errors is misleading: positive and negative errors cancel each other out, so a poorly fitting line can still have a total error close to zero.
- To avoid this, we square each error before adding them up.
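A small made-up example shows the cancellation. The line below always predicts 4 ($\alpha = 4$, $\beta = 0$), which clearly misses the data; `predict` and `error` are repeated so the block runs on its own.

```python
def predict(alpha, beta, x_i):
    return beta * x_i + alpha

def error(alpha, beta, x_i, y_i):
    return y_i - predict(alpha, beta, x_i)

x = [1, 2, 3]
y = [2, 4, 6]
# A deliberately poor "line": always predict 4 (alpha = 4, beta = 0)
errors = [error(4, 0, x_i, y_i) for x_i, y_i in zip(x, y)]
print(errors)                        # [-2, 0, 2]
print(sum(errors))                   # 0: the raw errors cancel out
print(sum(e ** 2 for e in errors))   # 8: the squared errors expose the misfit
```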

```python
# Residual sum of squares (RSS)
def sum_of_squared_errors(alpha, beta, x, y):
    return sum(error(alpha, beta, x_i, y_i) ** 2 for x_i, y_i in zip(x, y))
```

- After measuring the total error, the next step is to minimize it, so that the regression line fits the data as well as possible.
- This is done by choosing the values of alpha and beta that make the sum of squared errors as small as possible (the least-squares fit).
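The minimizing values can be found in closed form by setting both partial derivatives of the sum of squared errors to zero. The sketch below is the standard derivation; it assumes the $x_i$ are not all equal, and writes $r_{xy}$ for the Pearson correlation and $s_x$, $s_y$ for the standard deviations (the choice of $n$ or $n-1$ in their denominators cancels in the ratio). The last line is the form used by `least_squares_fit`.

$$
\begin{aligned}
S(\alpha, \beta) &= \sum_{i=1}^{n} (y_i - \beta x_i - \alpha)^2 \\
\frac{\partial S}{\partial \alpha} &= -2 \sum_{i=1}^{n} (y_i - \beta x_i - \alpha) = 0
  \;\Rightarrow\; \alpha = \bar{y} - \beta \bar{x} \\
\frac{\partial S}{\partial \beta} &= -2 \sum_{i=1}^{n} x_i (y_i - \beta x_i - \alpha) = 0
  \;\Rightarrow\; \beta = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2} \\
  &\phantom{= -2 \sum_{i=1}^{n} x_i (y_i - \beta x_i - \alpha) = 0 \;\Rightarrow\; \beta} = r_{xy} \, \frac{s_y}{s_x}
\end{aligned}
$$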

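```python
# mean, standard_deviation and correlation are assumed helpers; one option is the
# standard-library statistics module (correlation needs Python 3.10+).
from statistics import correlation, mean
from statistics import stdev as standard_deviation

def least_squares_fit(x, y):
    """Given training values of x and y,
    find the least-squares values of alpha and beta."""
    beta = correlation(x, y) * standard_deviation(y) / standard_deviation(x)
    alpha = mean(y) - beta * mean(x)
    return alpha, beta
```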

- The choice of alpha means that when we see the average value of x, we predict the average value of y: the fitted line passes through the point (mean(x), mean(y)).
- The choice of beta means that when the input increases by standard_deviation(x), the prediction increases by correlation(x, y) * standard_deviation(y).
- When x and y are perfectly correlated, a one-standard-deviation increase in x results in a one-standard-deviation increase in the predicted y.
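A quick check of these interpretations on perfectly correlated, made-up data. The helpers `mean`, `de_mean`, `standard_deviation` (sample version, $n-1$ denominator) and `correlation` are hand-rolled stand-ins written for this sketch, not the notebook's own definitions.

```python
import math

def mean(v):
    return sum(v) / len(v)

def de_mean(v):
    """Subtract the mean from every element."""
    v_bar = mean(v)
    return [v_i - v_bar for v_i in v]

def standard_deviation(v):
    """Sample standard deviation (n - 1 denominator)."""
    d = de_mean(v)
    return math.sqrt(sum(d_i ** 2 for d_i in d) / (len(v) - 1))

def correlation(x, y):
    """Pearson correlation coefficient."""
    dx, dy = de_mean(x), de_mean(y)
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def least_squares_fit(x, y):
    beta = correlation(x, y) * standard_deviation(y) / standard_deviation(x)
    alpha = mean(y) - beta * mean(x)
    return alpha, beta

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]        # exactly y = 2x + 1, so correlation(x, y) is 1
alpha, beta = least_squares_fit(x, y)
print(alpha, beta)          # close to 1 and 2, up to float rounding
print(alpha + beta * mean(x), mean(y))  # the line passes through the means
```

Because the data lie exactly on a line, the fit recovers the intercept and slope; with noisy data, the same code returns the line that minimizes the sum of squared errors.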

## RMSE

- RMSE (root mean squared error) is the square root of the average squared residual, i.e. $\sqrt{RSS / n}$ for $n$ data points. It is measured in the same units as $y$.

```python
import numpy as np

def RMSE(alpha, beta, x, y):
    # Square root of the *mean* squared residual, i.e. sqrt(RSS / n)
    return np.sqrt(sum_of_squared_errors(alpha, beta, x, y) / len(x))
```
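A standalone sketch of RMSE on a tiny made-up data set. `sum_of_squared_errors` is redefined inline here (without the `error` helper) so the block runs on its own, and `math.sqrt` stands in for `np.sqrt`.

```python
import math

def sum_of_squared_errors(alpha, beta, x, y):
    return sum((y_i - (beta * x_i + alpha)) ** 2 for x_i, y_i in zip(x, y))

def RMSE(alpha, beta, x, y):
    # Average the squared residuals, then take the square root (units of y)
    return math.sqrt(sum_of_squared_errors(alpha, beta, x, y) / len(x))

x = [1, 2, 3]
y = [2, 4, 6]
print(RMSE(4, 0, x, y))   # sqrt(8 / 3), about 1.63
print(RMSE(0, 2, x, y))   # 0.0, since y = 2x fits exactly
```

Dividing by `len(x)` before the square root is what makes RMSE comparable across data sets of different sizes; the square root of the raw RSS keeps growing as more points are added.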

## $R^2$ statistic

- Another way to check the fit is to calculate the coefficient of determination ($R^2$), which measures the fraction of the total variation in the dependent variable that is captured by the model. Unlike RMSE, it does not depend on the units of $y$.

$R^2 = \dfrac{\text{TSS} - \text{RSS}}{\text{TSS}} = 1 - \dfrac{\text{RSS}}{\text{TSS}}$

- where TSS = total sum of squares, $\sum_i (y_i - \bar{y})^2$, and RSS = residual sum of squares, $\sum_i (y_i - \hat{y}_i)^2$

- TSS measures the total variability in the response $Y$, and can be thought of as the amount of variability inherent in the response before the regression is performed.
- In contrast, RSS measures the amount of variability that is left unexplained after performing the regression.
- Hence, TSS − RSS measures the amount of variability in the response that is explained (or removed) by performing the regression.
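For a least-squares fit with an intercept, evaluated on the data it was fitted to, the explained part TSS − RSS equals $\sum_i (\hat{y}_i - \bar{y})^2$, the variability of the fitted values around the mean. A minimal check on made-up data; the slope is computed in the equivalent form $\sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$.

```python
def mean(v):
    return sum(v) / len(v)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)

# Least-squares coefficients
beta = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
alpha = y_bar - beta * x_bar
y_hat = [beta * a + alpha for a in x]

tss = sum((b - y_bar) ** 2 for b in y)              # variability before regression
rss = sum((b - p) ** 2 for b, p in zip(y, y_hat))   # variability left unexplained
explained = sum((p - y_bar) ** 2 for p in y_hat)    # variability captured by the line

print(tss, rss, explained)   # approximately 6, 2.4 and 3.6
```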

```python
from statistics import mean

def de_mean(y):
    """Translate y by subtracting its mean, so the result has mean 0."""
    y_bar = mean(y)
    return [y_i - y_bar for y_i in y]

def total_sum_of_squares(y):
    """Total squared variation of the y_i's from their mean."""
    return sum(v ** 2 for v in de_mean(y))

def r_squared(alpha, beta, x, y):
    """Fraction of variation captured by the model = 1 - fraction of variation not captured by the model."""
    return 1.0 - (sum_of_squared_errors(alpha, beta, x, y) / total_sum_of_squares(y))
```

- For the least-squares fit, evaluated on the data it was fitted to, $R^2$ lies between 0 and 1 and equals the squared correlation between the predicted and actual values of the target variable (in simple linear regression, the square of correlation(x, y)). For other choices of alpha and beta, or on new data, $R^2$ can be negative.
- A model with an $R^2$ of 0 is no better than a model that always predicts the mean of the target variable, whereas a model with an $R^2$ of 1 perfectly predicts the target variable.
- A value between 0 and 1 gives the fraction of the variation in the target variable that the model explains using the features.
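A small sketch comparing three models on the same made-up data: the least-squares line, the "always predict the mean" model, and a line that always predicts 0. The helpers are redefined here so the block is self-contained.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sum_of_squared_errors(alpha, beta, x, y):
    return sum((y_i - (beta * x_i + alpha)) ** 2 for x_i, y_i in zip(x, y))

def total_sum_of_squares(y):
    y_bar = mean(y)
    return sum((y_i - y_bar) ** 2 for y_i in y)

def r_squared(alpha, beta, x, y):
    return 1.0 - sum_of_squared_errors(alpha, beta, x, y) / total_sum_of_squares(y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)

beta = s_xy / s_xx                      # least-squares slope
alpha = y_bar - beta * x_bar            # least-squares intercept
corr = s_xy / math.sqrt(s_xx * s_yy)    # Pearson correlation of x and y

r2_fit = r_squared(alpha, beta, x, y)   # about 0.6, equal to corr ** 2
r2_mean = r_squared(y_bar, 0.0, x, y)   # always predict mean(y): exactly 0
r2_zero = r_squared(0.0, 0.0, x, y)     # always predict 0: negative
print(r2_fit, corr ** 2, r2_mean, r2_zero)
```

The last case shows why the 0 to 1 range only holds for the least-squares fit: a line that does worse than predicting the mean gets a negative $R^2$.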