
# VIF Test

In [None]:
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

In [None]:
# Read the data from the CSV file and drop duplicate rows
data = pd.read_csv('/content/drive/MyDrive/bank-additional-full.csv', sep=';')
data = data.drop_duplicates()

# Select variables of interest
selected_vars = ['euribor3m', 'emp.var.rate', 'nr.employed', 'cons.price.idx', 'campaign']
X = data[selected_vars]

In [None]:
# Add a constant column to the data for computing VIF
X_with_const = add_constant(X)

# Compute VIF for each independent variable
vif_data = pd.DataFrame()
vif_data["Variable"] = X_with_const.columns
vif_data["VIF"] = [variance_inflation_factor(X_with_const.values, i) for i in range(X_with_const.shape[1])]

print(vif_data)

         Variable            VIF
0           const  303001.939708
1       euribor3m      31.341625
2    emp.var.rate      32.614143
3     nr.employed      14.474171
4  cons.price.idx       4.683652
5        campaign       1.031704


Based on the results of the Variance Inflation Factor (VIF) test, and using the common rule of thumb that a VIF above 10 signals problematic multicollinearity:

- The VIF values for 'euribor3m' (31.34), 'emp.var.rate' (32.61), and 'nr.employed' (14.47) all exceed 10, indicating high multicollinearity among these variables.
- The VIF value for 'cons.price.idx' (4.68) is below 10, suggesting at most moderate multicollinearity.
- The VIF value for 'campaign' (1.03) is close to 1, indicating little to no multicollinearity.
- The very large value reported for 'const' belongs to the intercept column and is not interpreted as evidence of multicollinearity among the regressors.

Therefore, we can conclude that there is high multicollinearity among 'euribor3m', 'emp.var.rate', and 'nr.employed'. This inflates the standard errors of their estimated coefficients, so those estimates may be unstable and unreliable.
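To make the VIF values above more concrete, the sketch below computes VIF by hand from its definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing variable j on all the other variables plus an intercept. The function name `vif_by_hand` and the synthetic columns `a`, `b`, `c` are illustrative assumptions, not part of the notebook's data.

```python
import numpy as np

def vif_by_hand(X):
    """Return the VIF of each column of X (X should not contain a constant column)."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression: column j on an intercept and the remaining columns
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        # R^2 = 1 - SSR / SST (residuals have zero mean because of the intercept)
        r2 = 1.0 - resid.var() / target.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.1 * rng.normal(size=500)  # nearly collinear with a
c = rng.normal(size=500)            # unrelated to a and b

vifs = vif_by_hand(np.column_stack([a, b, c]))
print(vifs)
```

In this sketch the two nearly collinear columns should receive large VIFs while the unrelated column stays close to 1, which mirrors the pattern seen for the macroeconomic variables versus 'campaign'.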

# Heteroskedasticity Test

In [None]:
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

In [None]:
# Read the data from the CSV file and drop duplicate rows
data = pd.read_csv('/content/drive/MyDrive/bank-additional-full.csv', sep=';')
data = data.drop_duplicates()

# Select independent variables and the dependent variable
X = data[['euribor3m', 'emp.var.rate', 'nr.employed', 'cons.price.idx', 'campaign']]
y = data['y'].map({'yes': 1, 'no': 0})

# Add a constant column to the data
X = sm.add_constant(X)

# Fit the regression model
model = sm.OLS(y, X).fit()

In [None]:
# Perform the Breusch-Pagan test
lm, lm_p_value, fvalue, f_p_value = het_breuschpagan(model.resid, X)

# Print the results
print("Breusch-Pagan LM Statistic:", lm)
print("Breusch-Pagan LM p-value:", lm_p_value)
print("Breusch-Pagan F-Statistic:", fvalue)
print("Breusch-Pagan F p-value:", f_p_value)

Breusch-Pagan LM Statistic: 3568.621173171877
Breusch-Pagan LM p-value: 0.0
Breusch-Pagan F-Statistic: 781.336739133104
Breusch-Pagan F p-value: 0.0


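The Breusch-Pagan test reports a p-value of 0.0 for both the LM statistic (3568.62) and the F-statistic (781.34), meaning the p-values are numerically indistinguishable from zero. We therefore reject the null hypothesis of homoskedasticity (constant error variance) and conclude that there is heteroskedasticity in the regression model. This is expected here: the dependent variable 'y' is binary, so the OLS model is a linear probability model, whose error variance depends on the regressors by construction.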
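As a rough illustration of what the Breusch-Pagan LM statistic measures, the sketch below computes it by hand in its n·R² form: regress the squared OLS residuals on the regressors and multiply the resulting R² by the sample size. The helper names `ols_residuals` and `breusch_pagan_lm` and the synthetic data are assumptions made for illustration; they are not the notebook's data or the statsmodels implementation.

```python
import numpy as np

def ols_residuals(y, exog):
    """Residuals from an OLS fit of y on exog (exog already includes a constant)."""
    coef, *_ = np.linalg.lstsq(exog, y, rcond=None)
    return y - exog @ coef

def breusch_pagan_lm(resid, exog):
    """LM = n * R^2 from regressing squared residuals on exog."""
    u2 = resid ** 2
    coef, *_ = np.linalg.lstsq(exog, u2, rcond=None)
    fitted = exog @ coef
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(u2) * r2

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 5.0, size=n)
exog = np.column_stack([np.ones(n), x])

y_het = 1.0 + 2.0 * x + rng.normal(size=n) * x  # error spread grows with x
y_hom = 1.0 + 2.0 * x + rng.normal(size=n)      # constant error spread

lm_het = breusch_pagan_lm(ols_residuals(y_het, exog), exog)
lm_hom = breusch_pagan_lm(ols_residuals(y_hom, exog), exog)
print("LM (heteroskedastic errors):", lm_het)
print("LM (homoskedastic errors):  ", lm_hom)
```

With one regressor the statistic is compared against a chi-square distribution with 1 degree of freedom, whose 5% critical value is about 3.841. The heteroskedastic series should produce a statistic far above that threshold, which is the same kind of evidence the test above provides for the bank marketing model.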