# Linear Regression using Statsmodels

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 


In [2]:
df = pd.read_csv("USA_Housing.csv")

In [4]:
# Drop the free-text address column; it is not used as a feature
df.drop("Address", axis=1, inplace=True)

In [5]:
df.head()

|   | Avg. Area Income | Avg. Area House Age | Avg. Area Number of Rooms | Avg. Area Number of Bedrooms | Area Population | Price |
|---|---|---|---|---|---|---|
| 0 | 79545.458574 | 5.682861 | 7.009188 | 4.09 | 23086.800503 | 1059034.0 |
| 1 | 79248.642455 | 6.0029 | 6.730821 | 3.09 | 40173.072174 | 1505891.0 |
| 2 | 61287.067179 | 5.86589 | 8.512727 | 5.13 | 36882.1594 | 1058988.0 |
| 3 | 63345.240046 | 7.188236 | 5.586729 | 3.26 | 34310.242831 | 1260617.0 |
| 4 | 59982.197226 | 5.040555 | 7.839388 | 4.23 | 26354.109472 | 630943.5 |


In [9]:
from sklearn.model_selection import train_test_split

# Hold out 33% of the rows for testing; fix the seed so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('Price', axis=1), df['Price'], test_size=0.33, random_state=42
)

In [7]:
import statsmodels.api as sm

In [10]:
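# sm.OLS does not add an intercept on its own, so add a constant column explicitly
X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)

# Note the argument order: target first, then the feature matrix
model = sm.OLS(y_train, X_train)
results = model.fit()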



In [12]:
results.params

const                          -2.638142e+06
Avg. Area Income                2.158989e+01
Avg. Area House Age             1.661025e+05
Avg. Area Number of Rooms       1.198959e+05
Avg. Area Number of Bedrooms    1.901071e+03
Area Population                 1.523150e+01
dtype: float64

In [13]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                  Price   R-squared:                       0.919
Model:                            OLS   Adj. R-squared:                  0.919
Method:                 Least Squares   F-statistic:                     7557.
Date:                Wed, 04 Dec 2019   Prob (F-statistic):               0.00
Time:                        12:37:50   Log-Likelihood:                -43375.
No. Observations:                3350   AIC:                         8.676e+04
Df Residuals:                    3344   BIC:                         8.680e+04
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                                   coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------
const           

# F-Test

* If your linear regression model fits well, the R-squared value will be close to 1.
* Adjusted R-squared penalizes the R-squared value as you keep adding unnecessary features to the model.
* If Adjusted R-squared is much lower than R-squared, it is a sign that the model includes features that have very little impact on the target.

* The F-statistic, or F-test, is used to assess the significance of the overall regression model.

**It compares the existing model with multiple features against an intercept-only model (one without features). The null hypothesis is that these two models fit equally well, that is, all feature coefficients are zero.**
The alternative hypothesis is that the intercept-only model is worse than our model.
The summary reports an F-statistic and a p-value (Prob (F-statistic)), which we use to decide whether to reject or fail to reject the null hypothesis.


### If the p-value (Prob (F-statistic)) is < 0.05 and the F-statistic is well above 1, it indicates a significant relationship between the target and the features.
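As a rough check, Adjusted R-squared and the F-statistic can be recomputed from three numbers printed in the summary above. This is a minimal sketch using the standard textbook formulas; the variable names are my own, and because the summary rounds R-squared to three decimals, the recomputed F-statistic only lands near the reported 7557 rather than matching it exactly.

```python
# Values read from the summary above (R-squared is rounded to 3 decimals there).
n_obs = 3350        # No. Observations
n_features = 5      # Df Model (features, excluding the intercept)
r_squared = 0.919   # R-squared

# Residual degrees of freedom: observations minus features minus the intercept.
df_resid = n_obs - n_features - 1

# Adjusted R-squared shrinks R-squared for every extra feature in the model.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# F-statistic: explained variance per feature divided by
# unexplained variance per residual degree of freedom.
f_stat = (r_squared / n_features) / ((1 - r_squared) / df_resid)

print(f"Df Residuals:   {df_resid}")
print(f"Adj. R-squared: {adj_r_squared:.3f}")
print(f"F-statistic:    {f_stat:.0f}")
```

On a fitted statsmodels result, the unrounded values should be available as `results.rsquared`, `results.rsquared_adj` and `results.fvalue`.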

# T-Test

* The t-test considers one feature at a time.
* The null hypothesis in this case is that the feature's coefficient is equal to 0; the alternative hypothesis is that it is not equal to 0.

* If the P>|t| value is 0 or close to 0 (typically below 0.05), you reject the null hypothesis in favor of the alternative hypothesis.
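The decision rule above can be sketched with plain Python. The t-statistic is the coefficient divided by its standard error. The coefficient and standard-error pairs below are hypothetical, not taken from the summary, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only when the residual degrees of freedom are large (3344 here).

```python
import math

def t_statistic(coef, std_err):
    """t-statistic for the null hypothesis that the coefficient is 0."""
    return coef / std_err

def two_sided_p_normal(t):
    """Two-sided p-value from the normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical (coefficient, standard error) pairs, made up for illustration.
examples = {
    "strong_feature": (120.0, 2.0),
    "weak_feature": (50.0, 40.0),
}

for name, (coef, std_err) in examples.items():
    t = t_statistic(coef, std_err)
    p = two_sided_p_normal(t)
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: t = {t:.2f}, p ~ {p:.4f} -> {verdict}")
```

On a fitted statsmodels result, the exact values should be available as `results.tvalues` and `results.pvalues`, so the approximation is only needed for hand calculations.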


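**Omnibus tests** are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. In the statsmodels summary, the Omnibus row specifically reports a test of whether the residuals are normally distributed, based on their skewness and kurtosis.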


**AIC and BIC** differ in how they penalize the number of parameters in a model. More precisely, BIC penalizes heavily parameterized models more strongly than AIC: AIC adds a penalty of 2 per parameter, while BIC adds ln(n) per parameter, where n is the number of observations. For both criteria, lower values are better.
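Both criteria can be recomputed from the log-likelihood printed in the summary above. This is a small sketch using the standard definitions, AIC = 2k - 2·logL and BIC = k·ln(n) - 2·logL; the variable names are my own, and the log-likelihood is the rounded value from the summary, so the results are approximate.

```python
import math

# Values read from the summary above.
log_likelihood = -43375.0   # Log-Likelihood (rounded in the summary)
n_obs = 3350                # No. Observations
n_params = 6                # 5 feature coefficients + the constant

# AIC charges a flat 2 per parameter.
aic = 2 * n_params - 2 * log_likelihood

# BIC charges ln(n) per parameter, which exceeds 2 once n is 8 or more.
bic = n_params * math.log(n_obs) - 2 * log_likelihood

print(f"AIC: {aic:.3e}")
print(f"BIC: {bic:.3e}")
print(f"Per-parameter penalty: AIC = 2, BIC = {math.log(n_obs):.2f}")
```

The two values agree with the AIC and BIC rows of the summary to the precision shown there. On a fitted result they should also be available directly as `results.aic`, `results.bic` and `results.llf`.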

**Log-Likelihood**
* The coefficients of a linear regression model can be estimated by minimizing a negative log-likelihood function, which is maximum likelihood estimation.
* Assuming normally distributed errors, minimizing the negative log-likelihood leads to the same solution as least squares.
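The link between the two can be illustrated on a toy problem. The sketch below assumes Gaussian errors with a fixed standard deviation; the data points and function names are made up for illustration. Under that assumption the negative log-likelihood is a constant plus a scaled sum of squared errors, so the closed-form least squares line is also the line with the lowest negative log-likelihood.

```python
import math

# Toy data, made up for illustration: y is roughly 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

def sse(slope, intercept):
    """Sum of squared errors of the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def gaussian_nll(slope, intercept, sigma=1.0):
    """Negative log-likelihood assuming independent N(0, sigma^2) errors."""
    n = len(xs)
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse(slope, intercept) / (2 * sigma ** 2)

# Closed-form least squares solution for a single feature.
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
slope_hat = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum((x - x_mean) ** 2 for x in xs)
intercept_hat = y_mean - slope_hat * x_mean

print(f"least squares fit: slope = {slope_hat:.2f}, intercept = {intercept_hat:.2f}")

# Any nudge away from the least squares fit increases the negative log-likelihood.
best_nll = gaussian_nll(slope_hat, intercept_hat)
for d_slope, d_intercept in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    nudged = gaussian_nll(slope_hat + d_slope, intercept_hat + d_intercept)
    print(f"nudge ({d_slope:+.1f}, {d_intercept:+.1f}): NLL rises by {nudged - best_nll:.4f}")
```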

## Durbin-Watson Test

The Durbin-Watson test is a measure of **autocorrelation (also called serial correlation)** in the residuals from a regression analysis. Autocorrelation is the similarity of a time series to itself over successive time intervals.

* Autocorrelation can lead to underestimates of the standard error and can cause you to think predictors are significant when they are not. The Durbin-Watson test looks for a specific type of serial correlation, the AR(1) process.

The hypotheses for the Durbin-Watson test are:
* H0: no first-order autocorrelation.
* H1: first-order autocorrelation exists.
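The statistic behind these hypotheses is simple enough to compute by hand: the sum of squared differences between consecutive residuals, divided by the sum of squared residuals. It ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation, values well below 2 suggesting positive autocorrelation, and values well above 2 suggesting negative autocorrelation. The sketch below uses two made-up residual sequences; the function name is my own.

```python
def durbin_watson_stat(residuals):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residuals: neighbours tend to share a sign (positive autocorrelation).
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

# Made-up residuals: the sign flips every step (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(f"persistent residuals:  DW = {durbin_watson_stat(persistent):.2f}")
print(f"alternating residuals: DW = {durbin_watson_stat(alternating):.2f}")
```

statsmodels ships the same calculation as `durbin_watson` in `statsmodels.stats.stattools`, which can be applied to `results.resid`; the value is also printed in the lower part of the summary table.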

## Jarque-Bera Test

The Jarque-Bera test is a test for normality.
Specifically, it checks whether the skewness and kurtosis of the data match those of a normal distribution. The data could take many forms, including:

* Time Series Data.
* Errors in a regression model.
* Data in a Vector.


**A normal distribution has a skew of zero (i.e. it’s perfectly symmetrical around the mean) and a kurtosis of three; kurtosis tells you how much data is in the tails and gives you an idea about how “peaked” the distribution is. It’s not necessary to know the mean or the standard deviation for the data in order to run the test.**
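The skewness and kurtosis targets described above translate into a short calculation. This is a minimal sketch of the usual Jarque-Bera statistic, JB = n/6 · (S² + (K − 3)²/4), with the p-value taken from a chi-square distribution with 2 degrees of freedom (whose survival function is exp(−JB/2)). The function name and sample data are made up, and the chi-square approximation is an assumption that only holds well for large samples.

```python
import math

def jarque_bera_stat(data):
    """Return (JB statistic, approximate p-value, skewness, kurtosis)."""
    n = len(data)
    mean = sum(data) / n
    # Central moments (population form, dividing by n).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2          # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # chi-square survival function with 2 df
    return jb, p_value, skew, kurt

# Made-up samples: one symmetric, one with a single large outlier.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [0.0, 0.0, 0.0, 0.0, 10.0]

for name, sample in [("symmetric", symmetric), ("skewed", skewed)]:
    jb, p, skew, kurt = jarque_bera_stat(sample)
    print(f"{name}: skew = {skew:.2f}, kurtosis = {kurt:.2f}, JB = {jb:.3f}, p ~ {p:.3f}")
```

For regression residuals, statsmodels provides `jarque_bera` in `statsmodels.stats.stattools`, and the summary table reports the statistic alongside the skew and kurtosis of the residuals.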



* Jarque-Bera Test: https://www.statisticshowto.datasciencecentral.com/jarque-bera-test/
* Durbin-Watson Test: https://www.statisticshowto.datasciencecentral.com/durbin-watson-test-coefficient/
* Maximum likelihood: https://machinelearningmastery.com/linear-regression-with-maximum-likelihood-estimation/