# Model Assumptions 

In statistical modeling, especially linear regression, model assumptions are the conditions that the underlying data and model must satisfy for the results to be valid and reliable. These assumptions are critical: if they are violated, the estimates obtained from the model may be biased, inefficient, or misleading.

In [144]:
# imports
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.api as sm

In [145]:
# Read data

df_auto = pd.read_csv("./data/auto-mpg.csv")
df_auto.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name
0,18.0,8,307.0,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140,3449,10.5,70,1,ford torino


In [146]:
# Read data 
df_ads = pd.read_csv("./data/advertising.csv")
df_ads.head()

Unnamed: 0.1,Unnamed: 0,TV,radio,newspaper,sales
0,1,230.1,37.8,69.2,22.1
1,2,44.5,39.3,45.1,10.4
2,3,17.2,45.9,69.3,9.3
3,4,151.5,41.3,58.5,18.5
4,5,180.8,10.8,58.4,12.9



## 1. Linearity

    The linearity assumption requires a linear relationship between the response variable (y) and each predictor (X). Linear means that a 1-unit change in X produces the same change in y, regardless of the value of X.



![image.png](attachment:image.png)

Checking the linearity assumption means determining whether the relationship between each independent variable (predictor) and the dependent variable (target) is linear. Common methods include:

### Scatter Plots

The simplest and most direct method to check for linearity is by visually inspecting scatter plots.

Steps:

    Plot each independent variable (X) against the dependent variable (Y).
    Look for a straight-line pattern in the scatter plot.

In [147]:
# An Example of a Scatter plot 


In [148]:
# pairplots (scatter matrix)


### Statistical Testing for Linearity
#### Rainbow Test

The Rainbow test is a diagnostic test used to check whether a linear regression model is misspecified. Specifically, it checks for non-linear relationships between the independent variables and the dependent variable that a linear model would fail to capture.

#### Rainbow Test Hypotheses

    The null hypothesis of the Rainbow test is that the linear model is correctly specified (i.e., the relationship between the predictors and the target variable is linear).

    The alternative hypothesis is that the linear model is misspecified, and there is a non-linear relationship between the predictors and the dependent variable.

*If the p-value from the test is small (typically less than 0.05), you reject the null hypothesis and conclude that the linear specification is inadequate.*

In [149]:
# build a model


In [150]:
# Rainbow test example
# returns the F-statistic and the p-value


## 2. Independence

The independence assumption has two parts: independence of features and independence of errors.

### Independence of Features

    Independence of features means that we want to avoid collinearity between features in a multiple regression.

Collinearity means that the features can be used to predict each other, which causes numerical problems for the regression algorithm and leads to unstable coefficient estimates.
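One simple way to look for collinearity is the pairwise correlation matrix of the features. The sketch below uses synthetic features named `x1`, `x2`, and `x3` (hypothetical names), and the 0.8 cutoff is an arbitrary illustrative threshold, not a rule from this notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500

# Synthetic features (assumption): x3 is almost a rescaled copy of x1.
x1 = rng.normal(150, 50, size=n)
x2 = rng.normal(25, 10, size=n)
x3 = 0.5 * x1 + rng.normal(0, 5, size=n)
features = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Pairwise Pearson correlations between the features.
corr = features.corr()

# Flag feature pairs whose absolute correlation exceeds the cutoff.
threshold = 0.8  # illustrative cutoff (assumption)
high_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]

print(corr.round(2))
print("Highly correlated pairs:", high_pairs)
```

Passing `corr` to `sns.heatmap(corr, annot=True)` gives the same information as a colored grid, which is easier to scan when there are many features.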

In [151]:
# How to check for independence of features

In [152]:
# getting the corr


In [153]:
# Heat map for better visualization


### Independence of Errors

    Independence of errors means we want to avoid autocorrelation of errors. Autocorrelation means that a variable is correlated with itself, so that later values can be predicted based on previous values.


## 3. Normality

    The normality assumption states that the model residuals should follow a normal distribution.


![image.png](attachment:image.png)

The figure above shows an example of a normal distribution. A visual check for normality compares the distribution of the model residuals against this shape.

In [154]:
# visual check on the residuals


### Statistical Test for Normality
#### Jarque-Bera Test

The Jarque-Bera (JB) test is a statistical test used to check whether a dataset follows a normal distribution. Specifically, it tests the null hypothesis that the data comes from a normal distribution by examining two key characteristics:

    Skewness: A measure of the asymmetry of the distribution (0 for a normal distribution).
    Kurtosis: A measure of the "tailedness" of the distribution, i.e. how heavy its tails are (3 for a normal distribution).

Hypotheses:

    Null Hypothesis (H₀): The data follows a normal distribution.
    Alternative Hypothesis (H₁): The data does not follow a normal distribution.

*If the p-value from the test is small (typically less than 0.05 or 0.01), you reject the null hypothesis and conclude that the data significantly deviates from a normal distribution.*
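As an illustration, the sketch below applies `jarque_bera` from `statsmodels.stats.stattools` to two synthetic samples (an assumption for illustration): one drawn from a normal distribution and one from a skewed exponential distribution.

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(11)

# Synthetic samples (assumption): one normal, one strongly right-skewed.
normal_sample = rng.normal(0, 1, size=2000)
skewed_sample = rng.exponential(1.0, size=2000)

# jarque_bera returns: JB statistic, p-value, skew, kurtosis.
jb_n, p_n, skew_n, kurt_n = jarque_bera(normal_sample)
jb_s, p_s, skew_s, kurt_s = jarque_bera(skewed_sample)

print(f"normal sample: JB={jb_n:.2f}, p={p_n:.3f}, skew={skew_n:.2f}, kurtosis={kurt_n:.2f}")
print(f"skewed sample: JB={jb_s:.2f}, p={p_s:.3g}, skew={skew_s:.2f}, kurtosis={kurt_s:.2f}")
```

In a regression setting the input would be the residuals of the fitted model, for example `results.resid` for a fitted statsmodels OLS result (the name `results` is hypothetical here).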

In [155]:
# Example statistical test 
# Values returned are 

    # JB test statistic
    # The p-value for JB
    # Skew
    # Kurtosis



In [156]:
# print summary


In [157]:
# Other test examples Todo

from statsmodels.stats.diagnostic import kstest_normal
from statsmodels.stats.diagnostic import normal_ad
from statsmodels.stats.stattools import omni_normtest
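A possible usage sketch for the three functions imported above, applied to a synthetic right-skewed sample (an assumption for illustration). Each call returns a test statistic and a p-value, and in each case the null hypothesis is that the data are normally distributed.

```python
import numpy as np
from statsmodels.stats.diagnostic import kstest_normal, normal_ad
from statsmodels.stats.stattools import omni_normtest

# Synthetic, clearly non-normal sample (assumption).
rng = np.random.default_rng(5)
sample = rng.exponential(1.0, size=1000)

# Kolmogorov-Smirnov (Lilliefors) test with estimated mean and variance.
ks_stat, ks_p = kstest_normal(sample)

# Anderson-Darling test for normality.
ad_stat, ad_p = normal_ad(sample)

# Omnibus test based on skewness and kurtosis.
omni_stat, omni_p = omni_normtest(sample)

print(f"Lilliefors:       stat={ks_stat:.3f}, p={ks_p:.3g}")
print(f"Anderson-Darling: stat={ad_stat:.3f}, p={ad_p:.3g}")
print(f"Omnibus:          stat={omni_stat:.3f}, p={omni_p:.3g}")
```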


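## 4. Equal Variance: Homoscedasticity

    The equal variance assumption states that the model residuals should have constant variance across all values of the predictors.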

![image.png](attachment:image.png)

### Statistical test
#### Goldfeld-Quandt test

One popular statistical test for homoscedasticity is the Goldfeld-Quandt test. It divides the dataset into two groups and computes the mean squared error (MSE) of the residuals for each group. The ratio of the second group's mse_resid to the first group's mse_resid is a test statistic that is compared to the F-distribution to obtain a p-value.

Null Hypothesis (H₀):

    The variance of the residuals is constant (i.e., the error terms are homoscedastic).
    
Alternative Hypothesis (H₁):

    The variance of the residuals is not constant (i.e., the error terms are heteroscedastic).

*If the p-value from the test is small (typically less than 0.05), reject the null hypothesis and conclude that the residuals are heteroscedastic.*

In [158]:
# Outputs:
# Goldfeld-Quandt test statistic
# Goldfeld-Quandt test p-value
# Ordering



## Other tests
 

In [159]:
# Todo

from statsmodels.stats.diagnostic import het_breuschpagan