# Simple Linear Regression
## Contents
- Description  
- Errors
- Cost Function
- Convergence Algorithm

## Description  
Simple linear regression is a statistical method used to model the relationship between two variables: one independent variable (predictor) and one dependent variable (outcome). The goal is to find a linear equation, $y = mx + b$, where $y$ is the dependent variable, $x$ is the independent variable, $m$ is the slope (rate of change), and $b$ is the intercept (value of $y$ when $x = 0$).

## Errors
Simple linear regression tries to find the best-fit line, that is, the line with the minimum error. **Errors** (also called residuals) represent the difference between the actual observed values
and the values predicted by the regression model. The error for each data point is calculated as:
$$ E_i = y_i - \hat{y}_i $$

Where:
- $E_i$ is the error for the $i$-th observation,
- $y_i$ is the actual observed value of the dependent variable,
- $\hat{y}_i$ is the predicted value from the regression model.
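
The error formula above can be sketched in a few lines of NumPy. The data points and the line parameters `m` and `b` below are made up for illustration, and the helper name `residuals` is hypothetical rather than part of any library.

```python
import numpy as np

# Hypothetical data and line parameters, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.5, 9.0])
m, b = 2.0, 1.0


def residuals(x, y, m, b):
    """Return E_i = y_i - y_hat_i for the line y_hat = m * x + b."""
    y_hat = m * x + b  # predicted values
    return y - y_hat   # one error per observation


errors = residuals(x, y, m, b)
print(errors)
```

Only the third point lies off the line here, so it is the only observation with a non-zero error.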

## Cost Function
The cost function measures the difference between the predicted values of the model and the actual target values. By minimizing this cost function, we can determine the optimal values for the model’s parameters and improve its performance.

The Mean Squared Error (MSE) is given by:

$$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

Where:
- $y_i$ is the actual value
- $\hat{y}_i$ is the predicted value
- $n$ is the total number of data points
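
As a minimal sketch, the MSE formula translates directly into NumPy. The function name `mean_squared_error` and the sample values are assumptions made for this example.

```python
import numpy as np


def mean_squared_error(y, y_hat):
    """Average of the squared differences between actual and predicted values."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.5, 9.0]
y_pred = [3.0, 5.0, 7.0, 9.0]

mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.0625
```

The single error of 0.5 is squared to 0.25 and then averaged over the four points, which gives 0.0625.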

Reference: https://medium.com/@yennhi95zz/3-understanding-the-cost-function-in-linear-regression-for-machine-learning-beginners-ec9edeecbdde

## Convergence Algorithm
Gradient descent is an iterative optimization algorithm that searches for a minimum of an objective function by repeatedly stepping in the direction opposite to the gradient.
For linear regression, the objective is the cost function, and the weight update formula is:

$$ w_{t+1} = w_t - \eta \nabla L(w_t) $$

Where:
- $w_t$ is the weight at iteration $t$
- $w_{t+1}$ is the updated weight after iteration $t$
- $\eta$ is the learning rate
- $\nabla L(w_t)$ is the gradient of the loss function with respect to $w_t$
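
The update rule can be illustrated for simple linear regression, where the weights are the slope $m$ and the intercept $b$ and the loss is the MSE. This is a sketch under stated assumptions, not a tuned implementation: the learning rate, iteration count, zero initialization, and noise-free sample data are all chosen for the example.

```python
import numpy as np


def gradient_descent(x, y, eta=0.05, n_iters=5000):
    """Fit y_hat = m * x + b by gradient descent on the MSE.

    eta and n_iters are illustrative defaults, not recommended values.
    """
    m, b = 0.0, 0.0  # assumed starting point
    n = len(x)
    for _ in range(n_iters):
        error = y - (m * x + b)                 # E_i = y_i - y_hat_i
        grad_m = -2.0 / n * np.sum(x * error)   # dMSE/dm
        grad_b = -2.0 / n * np.sum(error)       # dMSE/db
        m -= eta * grad_m                       # w_{t+1} = w_t - eta * gradient
        b -= eta * grad_b
    return m, b


# Hypothetical noise-free data generated from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

m, b = gradient_descent(x, y)
print(m, b)
```

On this data the estimates approach the generating slope 2 and intercept 1. A learning rate that is too large would make the updates diverge instead of converge.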

Reference: https://www.geeksforgeeks.org/gradient-descent-in-linear-regression/

# Multiple Linear Regression

Multiple linear regression is an extension of simple linear regression that models the relationship between two or more independent variables (predictors) and a single dependent variable (outcome). The goal is to find the best-fit equation that describes how the dependent variable changes with variations in the independent variables.

The formula for multiple linear regression is:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon $$

Where:
- $y$ is the dependent variable (the value we are trying to predict)
- $\beta_0$ is the intercept
- $\beta_1, \beta_2, \dots, \beta_k$ are the coefficients for each independent variable
- $x_1, x_2, \dots, x_k$ are the independent variables (the predictors)
- $k$ is the number of independent variables
- $\epsilon$ is the error term (estimated by the residuals)
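
One common way to estimate the intercept and coefficients is ordinary least squares. The text above does not prescribe a solver, so the use of `numpy.linalg.lstsq` here is an assumption, as are the function name `fit_multiple_linear_regression` and the noise-free sample data.

```python
import numpy as np


def fit_multiple_linear_regression(X, y):
    """Return [beta_0, beta_1, ...] estimated by ordinary least squares."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Hypothetical data with two predictors, generated from y = 1 + 2*x1 + 3*x2.
X = np.array([
    [1.0, 2.0],
    [2.0, 0.0],
    [3.0, 1.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

beta = fit_multiple_linear_regression(X, y)
print(beta)
```

Because the sample data has no noise, the estimates match the generating values 1, 2, and 3 up to floating-point error.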


# Performance
- R-squared
- Adjusted R-squared

## R-squared
R-squared ($R^2$), also called the coefficient of determination, measures how much of the variation in the dependent variable is explained by the independent variable(s) in a statistical model. For a least-squares fit with an intercept, evaluated on the data used for fitting, it ranges from 0 to 1, where 1 indicates a perfect fit of the model to the data.

The formula for R-squared ($R^2$) is:

$$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$

Where:
- $R^2$ is the coefficient of determination
- $y_i$ is the actual value for data point $i$
- $\hat{y}_i$ is the predicted value for data point $i$
- $\bar{y}$ is the mean of the actual values
- $n$ is the total number of data points
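
A minimal sketch of the $R^2$ formula in NumPy follows. The function name `r_squared` and the sample values are assumptions made for illustration.

```python
import numpy as np


def r_squared(y, y_hat):
    """Compute 1 - SS_res / SS_tot for actual values y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Hypothetical actual and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]

r2 = r_squared(y_true, y_pred)
print(r2)
```

Here the residual sum of squares is 0.5 and the total sum of squares is 5.0, so $R^2$ is approximately 0.9.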


## Adjusted R-squared
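The adjusted R-squared is a modified version of R-squared that penalizes the number of predictors in a regression model. R-squared never decreases when a predictor is added to a least-squares model, even if that predictor is not useful, whereas the adjusted R-squared increases only when the new predictor improves the fit enough to outweigh the penalty. In other words, the adjusted R-squared shows whether adding predictors improves a regression model or not.

The formula for Adjusted R-squared ($R^2_{adj}$) is: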

$$ R^2_{adj} = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k - 1} \right) $$

Where:
- $R^2_{adj}$ is the Adjusted R-squared
- $R^2$ is the R-squared (coefficient of determination)
- $n$ is the total number of data points
- $k$ is the number of independent variables (predictors)
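
The adjustment can be sketched as a small Python function. The name `adjusted_r_squared` and the input values are hypothetical, and the guard against a non-positive denominator is an assumption added for this example.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for n data points and k predictors."""
    if n - k - 1 <= 0:
        # The formula is undefined unless n > k + 1.
        raise ValueError("need more data points than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model: R^2 = 0.9 with 11 data points and 2 predictors.
adj = adjusted_r_squared(0.9, n=11, k=2)
print(adj)
```

With these inputs the penalty term is $(0.1 \times 10) / 8 = 0.125$, so the adjusted value is approximately 0.875, slightly below the unadjusted 0.9.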


Reference for R-squared vs. Adjusted R-squared: https://corporatefinanceinstitute.com/resources/data-science/adjusted-r-squared/#:~:text=Summary,adding%20value%20to%20the%20model.