# Linear Regression

    A supervised machine learning algorithm that predicts a continuous value by modelling the linear relationship between the predictor and response variables.
    
    Linear Regression is a parametric model: it assumes a fixed functional form and relies on the assumptions listed below.
    
    Two types of LR :
        Simple Linear Regression :
        
$$y = α + β x$$

        Multiple Linear Regression :
        
$$y = α + β_1 x_1 + β_2 x_2 + β_3 x_3 +...+β_q x_q$$      

    Parameters of Linear Regression :
        α : intercept | β : coefficient of regression (slope)
    
    y --> variable(Dependent | Response | Outcome)
    x --> variable(Independent | Predictor | Explanatory) 
    
    These parameters are estimated by
        Statistical estimation : 
            OLS (Ordinary Least Squares) : aims to minimize the sum of squared errors of the fitted model.
        
        Optimization-based estimation :
            Gradient Descent
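
    As a rough illustration of the optimization route, the sketch below runs plain gradient descent on the mean squared error of a simple linear regression and compares the result with the closed-form least squares fit from `np.polyfit`. The synthetic data, the learning rate and the iteration count are assumptions chosen only for this example.

```python
import numpy as np

# Synthetic data (hypothetical): y = 2 + 3x plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, size=200)

alpha, beta = 0.0, 0.0   # initial guesses for intercept and slope
learning_rate = 0.1      # assumed step size; needs tuning in practice

for _ in range(5000):
    residual = y - (alpha + beta * x)
    # Gradients of the mean squared error with respect to alpha and beta
    grad_alpha = -2.0 * residual.mean()
    grad_beta = -2.0 * (x * residual).mean()
    alpha -= learning_rate * grad_alpha
    beta -= learning_rate * grad_beta

# Closed-form least squares fit for comparison (polyfit returns slope first)
beta_ols, alpha_ols = np.polyfit(x, y, deg=1)
print("Gradient descent:", alpha, beta)
print("Closed form     :", alpha_ols, beta_ols)
```

    With enough iterations both approaches should agree closely; gradient descent mainly becomes attractive when the dataset is too large for a direct solution.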
        
---        

    OLS (Ordinary Least Squares)
         A technique for estimating the coefficients of linear regression equations.
         
         "Least squares" refers to minimizing the sum of squared errors (SSE).
         
         The OLS method aims to minimize the sum of squared differences between the observed and predicted values. 
         
    Understanding the Mathematics behind the OLS Algorithm
    
$$Y_i = \alpha + \beta X_i + \varepsilon_i $$

    Where εi is the error term, and α, β are the true (but unobserved) parameters of the regression.

$$\text{Residual }(\hat{\varepsilon}_i) = y_i - \hat{y}_i$$

    The goal of simple linear regression is to find the estimates of α and β that minimize the sum of squared errors.
    
$$(\hat{\alpha}, \hat{\beta}) = \underset{\alpha,\, \beta}{\arg\min} \sum_{i=1}^{n} \left( y_i - (\alpha + \beta x_i) \right)^2 = \underset{\alpha,\, \beta}{\arg\min} \sum_{i=1}^{n} \left(\varepsilon_i\right)^2$$

    The equations for α and β that minimize the error term can be derived through calculus by taking partial derivatives with respect to α and β and setting them to zero. 

    This procedure is called ordinary least squares (OLS).
        
    Partial Derivative with respect to α:        
$$ \frac{\partial\, \text{SSE}}{\partial \alpha} = -2 \sum_{i=1}^{n} (y_i - (\alpha + \beta x_i)) $$  

    Setting partial derivative for α to zero:
$$ \sum_{i=1}^{n} y_i - n \hat{\alpha} - \hat{\beta} \sum_{i=1}^{n} x_i = 0 $$    

    Solving for the estimate α̂:
$$\hat{\alpha} = \frac{1}{n} \left( \sum_{i=1}^{n} y_i - \hat{\beta} \sum_{i=1}^{n} x_i \right) = \bar{y} - \hat{\beta} \bar{x}$$   

    Partial Derivative with respect to β:
$$\frac{\partial\, \text{SSE}}{\partial \beta} = -2 \sum_{i=1}^{n} x_i(y_i - (\alpha + \beta x_i)) $$   

    Setting partial derivative for β to zero:
$$\sum_{i=1}^{n} x_i y_i - \hat{\alpha} \sum_{i=1}^{n} x_i - \hat{\beta} \sum_{i=1}^{n} x_i^2 = 0$$    

    Substituting α̂ = ȳ − β̂ x̄ into this equation and solving for the estimate β̂:
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$    
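
    The closed-form estimates above can be checked numerically. The sketch below, on synthetic data invented for the illustration, computes the slope and intercept from the formulas and verifies that the two normal equations hold: the residuals sum to zero and are orthogonal to x.

```python
import numpy as np

# Synthetic data (hypothetical): y = 1.5 + 0.8x plus noise
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=100)

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by sum of squared deviations of x
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: follows from the first normal equation
alpha_hat = y_bar - beta_hat * x_bar

# Both normal equations should hold up to floating point error
residuals = y - (alpha_hat + beta_hat * x)
print("alpha_hat:", alpha_hat, "beta_hat:", beta_hat)
print("sum of residuals:", residuals.sum())
print("sum of x * residuals:", (x * residuals).sum())
```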

---

    Assumptions of Linear Regression
    
        Linearity
        
        Normality
        
        No Multicollinearity
        
        Homoscedasticity
        
        No Autocorrelation

---

    Linearity : 
        Each predictor should have a linear association with the response variable.
        
        Pearson's correlation coefficient (r) measures how strongly or weakly each predictor is linearly related to the response.
        
        Graphically, we can check this with a scatter plot.
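
    A minimal sketch of the correlation check, using `np.corrcoef` on synthetic data (the data and variable names are assumptions made for the example):

```python
import numpy as np

# Synthetic predictor and response with a strong linear relationship
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 4.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is Pearson's r
r = np.corrcoef(x, y)[0, 1]
print("Pearson's r:", r)
```

    Values of r near +1 or -1 indicate a strong linear association, while values near 0 indicate a weak one. Note that r only captures linear association, so a scatter plot is still worth inspecting.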

---

    Normality : 
        The response variable, conditional on the predictors, should be normally distributed with mean μ and variance σ².
        
        The residuals (errors) should be normally distributed with mean 0 and variance σ².
        
        Graphically, we can check this with kdeplot or displot.
        
        Measure :
            Anderson Darling Test
            
            Shapiro-Wilk Test
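
    Both tests can be sketched as follows, assuming SciPy's `shapiro` and statsmodels' `normal_ad` (an Anderson-Darling test for normality). The two samples are synthetic stand-ins for model residuals; in both tests the null hypothesis is that the sample is normally distributed.

```python
import numpy as np
from scipy.stats import shapiro
from statsmodels.stats.diagnostic import normal_ad

rng = np.random.default_rng(7)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=300)   # stand-in for well-behaved residuals
skewed_sample = rng.exponential(scale=1.0, size=300)       # clearly non-normal sample

for label, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    sw_stat, sw_p = shapiro(sample)      # Shapiro-Wilk statistic and p-value
    ad_stat, ad_p = normal_ad(sample)    # Anderson-Darling statistic and p-value
    print(label, "| Shapiro-Wilk p:", sw_p, "| Anderson-Darling p:", ad_p)

# Keep the results for the skewed sample for inspection
sw_stat_skewed, sw_p_skewed = shapiro(skewed_sample)
ad_stat_skewed, ad_p_skewed = normal_ad(skewed_sample)
```

    A p-value below the chosen significance level (commonly 0.05) is evidence against normality.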
            
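        When the tests above reject normality, we have to transform the data to bring it closer to a normal distribution.
        
        Transformation :
            Box-Cox transformation (requires strictly positive data)
            
            Yeo-Johnson transformation (also handles zero and negative data points)
            
            Log transformation (requires positive data)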
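
    The three transformations can be tried with NumPy and SciPy, as in the sketch below. The right-skewed synthetic data is an assumption made for the example; skewness close to 0 after the transformation suggests a more symmetric, more normal-looking distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # strictly positive, right-skewed

log_t = np.log(skewed)                        # log transform: positive values only
boxcox_t, lam_bc = stats.boxcox(skewed)       # Box-Cox: strictly positive values only

shifted = skewed - 2.0                        # shifted copy that contains negative values
yj_t, lam_yj = stats.yeojohnson(shifted)      # Yeo-Johnson: zero and negative values allowed

for name, data in [("original", skewed), ("log", log_t), ("box-cox", boxcox_t),
                   ("shifted", shifted), ("yeo-johnson", yj_t)]:
    print(f"{name:12s} skewness: {stats.skew(data):.3f}")
```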

---

    No Multicollinearity :
        There should be no strong correlation among the independent variables. Multicollinearity inflates the standard errors of the coefficients, so their significance tests become unreliable.
        
        A correlation matrix of the predictors shows whether they are strongly or weakly correlated with each other.
        
        Measure :
            Correlation 
            
            Variance Inflation Factor

---

    Homoscedasticity : 
        Checks that the residuals have the same spread
        
        Meaning the variance of the errors (residuals) should be constant across all values of the predictors
        
        Measure : 
            Breusch Pagan Test
                Ho: Homoscedasticity exists (constant variance of residuals).
                Ha: Heteroscedasticity exists (non-constant variance of residuals)

            
            Goldfeld Quandt Test
                Ho: Homoscedasticity (equal variance)
                Ha: Heteroscedasticity (unequal variance)
                
```python
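import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

# y (response) and X (predictors) are assumed to be defined already
X = sm.add_constant(X)  # OLS does not add an intercept automatically
model = sm.OLS(y, X).fit()

# Perform the Goldfeld-Quandt test; it takes the original y and X, not the residuals
gq_test = het_goldfeldquandt(y, X)
print("F-statistic:", gq_test[0])
print("p-value:", gq_test[1])
print("The null hypothesis of homoscedasticity is rejected if p-value < 0.05")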

```
            
            White Test
                Ho: Homoscedasticity
                Ha: Heteroscedasticity
```python
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

# y (response) and X (predictors) are assumed to be defined already
X = sm.add_constant(X)  # het_white needs a constant column in exog

# Fit your regression model
model = sm.OLS(y, X).fit()

# Calculate residuals
residuals = model.resid

# Get independent variables for White test
exog = model.model.exog

# Perform White test; it returns (LM statistic, LM p-value, F statistic, F p-value)
white_test = het_white(residuals, exog)
print("White Test Results:")
print("LM Statistic:", white_test[0])
print("LM-Test p-value:", white_test[1])
print("F-Statistic:", white_test[2])
print("F-Test p-value:", white_test[3])
```
            
            NCV test(non-constant variance)
                Ho: Homoscedasticity(variance of the errors (residuals) is constant)
                Ha: Heteroscedasticity
                
                
```python
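import statsmodels.api as sm
from statsmodels.compat import lzip
from statsmodels.stats.diagnostic import het_goldfeldquandt

# y (response) and X (predictors) are assumed to be defined already
X = sm.add_constant(X)  # OLS does not add an intercept automatically
model = sm.OLS(y, X)
results = model.fit()

# Goldfeld-Quandt is used here as a stand-in check for non-constant variance;
# the function takes the original y and X, not the residuals
name = ['F statistic', 'p-value']
test = het_goldfeldquandt(y, X)
print(lzip(name, test))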
```

---

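    No Autocorrelation : 
        The residuals should not be correlated with each other across observations (for example, across consecutive time points). 
        
        
        Measure :
            Durbin Watson Test
                Ho: No first-order autocorrelation in the residuals (linear regression is appropriate)
                Ha: First-order autocorrelation exists (consider a time series model)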
                
```python
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

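# y (response) and X (predictors) are assumed to be defined already
model = sm.OLS(y, sm.add_constant(X))  # add an intercept column
results = model.fit()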

# Perform Durbin-Watson test for autocorrelation in residuals
dw_statistic = durbin_watson(results.resid)
print("Durbin-Watson statistic:", dw_statistic)
```

    Interpretation of the Durbin-Watson statistic (it ranges from 0 to 4) :
        Statistic ≈ 2 => no significant autocorrelation.
        Statistic well below 2 (approaching 0) => positive autocorrelation.
        Statistic well above 2 (approaching 4) => negative autocorrelation.
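
    To make the interpretation concrete, the sketch below compares the statistic for two synthetic residual series: independent white noise and a positively autocorrelated AR(1) series. The AR coefficient of 0.8 is an assumption chosen for illustration; the statistic is roughly 2(1 − r), where r is the lag-1 autocorrelation of the residuals.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(21)
n = 500
white = rng.normal(size=n)          # independent residuals

# AR(1) residuals: each value carries over 80% of the previous one
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]

dw_white = durbin_watson(white)
dw_ar1 = durbin_watson(ar1)
print("Durbin-Watson, white noise:", dw_white)
print("Durbin-Watson, AR(1) with coefficient 0.8:", dw_ar1)
```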