# Econometrics

# 8th Session

# Simulating Instrumental Variable Data and Performing 2SLS Estimation

# The goal of econometrics is to understand the relationships between variables accurately.

### Importing data from the Excel file

### I created this Excel file with the simulation in the appendix. Y was generated with an intercept of 0.7 and a coefficient of 1.2 on X, where X is endogenous and Z is its instrument.
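The appendix simulation itself is not reproduced here. As a hedged illustration of the kind of data-generating process that produces this setup, the sketch below assumes one: the intercept 0.7 and slope 1.2 come from the text, while the instrument scale, the noise terms and the 1.35 first-stage coefficient are made-up values. X is made endogenous by letting its noise `v` also enter the structural error `u`, and Z affects Y only through X.

```python
import numpy as np

# Hypothetical data-generating process: NOT the appendix's actual simulation.
rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily
n = 10_000
beta0, beta1 = 0.7, 1.2               # intercept and slope stated in the text

z = rng.normal(0, 4, n)               # instrument (scale is an assumption)
v = rng.normal(0, 6, n)               # first-stage noise
u = 2.0 * v + rng.normal(0, 5, n)     # structural error, correlated with v, so X is endogenous
x = 1.35 * z + v                      # X depends on Z and on v (1.35 is an assumption)
y = beta0 + beta1 * x + u             # Z enters Y only through X (exclusion restriction)

# OLS slope Cov(X, Y) / Var(X): biased here because Cov(X, u) != 0
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# IV (Wald) slope Cov(Z, Y) / Cov(Z, X): consistent because Cov(Z, u) = 0
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(f"OLS slope: {ols_slope:.3f}, IV slope: {iv_slope:.3f}")
```

With these assumed parameters the OLS slope lands well above 1.2 while the IV slope stays close to it, which is the same pattern the Excel data shows below.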

```python
import pandas as pd
import numpy as np
```

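```python
# Load the simulated data set (reading .xlsx files requires the openpyxl package)
data = pd.read_excel("IV_data_session_8_example.xlsx")

Y = data["Y"]  # dependent variable
X = data["X"]  # endogenous regressor
Z = data["Z"]  # instrument
```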

### Checking the correlation between Z and X

```python
np.corrcoef(Z, X)
```

```
array([[1.        , 0.67194391],
       [0.67194391, 1.        ]])
```
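This correlation speaks to instrument relevance. As a quick arithmetic check, it can be converted into what the first-stage regression should report: with one instrument plus a constant, the first-stage R² equals corr(Z, X)² and the overall F statistic is R²(n − 2)/(1 − R²). The sketch hard-codes the printed correlation and n = 10,000, the sample size shown in the regression summaries.

```python
# Relevance check from the printed correlation (values copied from the output above).
r_zx = 0.67194391   # corr(Z, X) reported by np.corrcoef
n = 10_000          # number of observations, from the regression summaries

# In a simple regression of X on a constant and Z, R^2 equals the squared
# correlation, and the overall F statistic is R^2 (n - 2) / (1 - R^2).
r2_first_stage = r_zx ** 2
f_first_stage = r2_first_stage * (n - 2) / (1 - r2_first_stage)

print(f"Implied first-stage R^2: {r2_first_stage:.3f}")  # 0.452
print(f"Implied first-stage F:   {f_first_stage:.0f}")   # 8230
```

Both numbers agree with the first-stage summary. A common rule of thumb (Staiger and Stock) treats a first-stage F below 10 as a sign of a weak instrument, so Z is comfortably relevant here.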

### Checking the covariance between X and Y

```python
np.cov(X, Y)
```

```
array([[   65.44454431,   220.50413431],
       [  220.50413431, 18969.22434043]])
```
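With a single regressor, the OLS slope is just Cov(X, Y) / Var(X), so the covariance matrix already tells us what OLS will estimate. The sketch copies the printed matrix:

```python
import numpy as np

# Covariance matrix of (X, Y) as printed above (np.cov uses ddof=1).
cov_xy = np.array([[65.44454431, 220.50413431],
                   [220.50413431, 18969.22434043]])

# OLS slope in a simple regression: Cov(X, Y) / Var(X)
ols_slope = cov_xy[0, 1] / cov_xy[0, 0]
print(f"OLS slope implied by the covariances: {ols_slope:.3f}")  # 3.369
```

This is far from the true 1.2 before any regression is run.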

### Checking the covariance between Z and Y

```python
np.cov(Z, Y)
```

```
array([[1.60768174e+01, 2.61547776e+01],
       [2.61547776e+01, 1.89692243e+04]])
```
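With one instrument and one endogenous regressor, the IV (Wald) slope is Cov(Z, Y) / Cov(Z, X). Cov(Z, X) was not printed, but it can be recovered from the correlation and the two variances shown above. The sketch below does that arithmetic:

```python
import numpy as np

# Moments copied from the outputs above.
corr_zx = 0.67194391    # corr(Z, X)
var_z = 16.0768174      # Var(Z), from np.cov(Z, Y)
var_x = 65.44454431     # Var(X), from np.cov(X, Y)
cov_zy = 26.1547776     # Cov(Z, Y)

# Cov(Z, X) = corr(Z, X) * sd(Z) * sd(X); same ddof=1 scaling as cov_zy
cov_zx = corr_zx * np.sqrt(var_z * var_x)

# Just-identified IV (Wald) estimator of the slope
iv_slope = cov_zy / cov_zx
print(f"IV slope implied by the moments: {iv_slope:.3f}")  # 1.200
```

The moments alone already point to the 1.2 that the two-stage regressions recover below.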

### Implementing OLS of Y on X

```python
import statsmodels.api as sm

# Naive OLS of Y on a constant and X (ignores the endogeneity of X)
X_ols = sm.add_constant(X)
ols_model = sm.OLS(Y, X_ols).fit()

print("OLS result: ")
print(ols_model.summary())
```

```
OLS result: 
                            OLS Regression Results                            
Dep. Variable:                      Y   R-squared:                       0.039
Model:                            OLS   Adj. R-squared:                  0.039
Method:                 Least Squares   F-statistic:                     407.5
Date:                Fri, 06 Jun 2025   Prob (F-statistic):           7.30e-89
Time:                        00:57:39   Log-Likelihood:                -63242.
No. Observations:               10000   AIC:                         1.265e+05
Df Residuals:                    9998   BIC:                         1.265e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -28.5733      2.626    -10
```

### The estimated slope on X is about 3.37 rather than the true 1.2. Because X is endogenous (correlated with the error term), OLS is biased and inconsistent here.
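The size of this gap can be made explicit. Writing the model as $Y = \beta_0 + \beta_1 X + u$ (the error $u$ is notation introduced here), the probability limits of the two slope estimators are:

```latex
\hat\beta_1^{\mathrm{OLS}} = \frac{\widehat{\operatorname{Cov}}(X, Y)}{\widehat{\operatorname{Var}}(X)}
  \xrightarrow{p} \beta_1 + \frac{\operatorname{Cov}(X, u)}{\operatorname{Var}(X)},
\qquad
\hat\beta_1^{\mathrm{IV}} = \frac{\widehat{\operatorname{Cov}}(Z, Y)}{\widehat{\operatorname{Cov}}(Z, X)}
  \xrightarrow{p} \beta_1 + \frac{\operatorname{Cov}(Z, u)}{\operatorname{Cov}(Z, X)} = \beta_1
```

The last equality uses instrument exogeneity, $\operatorname{Cov}(Z, u) = 0$, and needs relevance, $\operatorname{Cov}(Z, X) \neq 0$. With the covariances printed earlier, the OLS slope is $220.504 / 65.445 \approx 3.369$, so the bias is about $2.17$ and the implied $\operatorname{Cov}(X, u)$ is roughly $2.17 \times 65.445 \approx 142$.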

### First stage: using the instrumental variable Z, I regressed the endogenous variable X on Z and kept the fitted values X_hat.

```python
# First stage: regress the endogenous X on a constant and the instrument Z
Z_intercept = sm.add_constant(Z)
first_stage_model = sm.OLS(X, Z_intercept).fit()
X_hat = first_stage_model.fittedvalues  # part of X explained by Z alone

print("First OLS result: ")
print(first_stage_model.summary())
```

```
First OLS result: 
                            OLS Regression Results                            
Dep. Variable:                      X   R-squared:                       0.452
Model:                            OLS   Adj. R-squared:                  0.451
Method:                 Least Squares   F-statistic:                     8230.
Date:                Fri, 06 Jun 2025   Prob (F-statistic):               0.00
Time:                        01:02:32   Log-Likelihood:                -32092.
No. Observations:               10000   AIC:                         6.419e+04
Df Residuals:                    9998   BIC:                         6.420e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0691      0.161 
```

### Second stage: I regressed Y on the fitted values X_hat from the first stage.

```python
# Second stage: regress Y on a constant and the first-stage fitted values
X_hat_intercept = sm.add_constant(X_hat)
second_stage_model = sm.OLS(Y, X_hat_intercept).fit()

print("Second OLS result: ")
print(second_stage_model.summary())
```

```
Second OLS result: 
                            OLS Regression Results                            
Dep. Variable:                      Y   R-squared:                       0.002
Model:                            OLS   Adj. R-squared:                  0.002
Method:                 Least Squares   F-statistic:                     22.48
Date:                Fri, 06 Jun 2025   Prob (F-statistic):           2.16e-06
Time:                        01:05:01   Log-Likelihood:                -63431.
No. Observations:               10000   AIC:                         1.269e+05
Df Residuals:                    9998   BIC:                         1.269e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.7000      3.682
```

### The estimated slope and intercept are now 1.2 and 0.7, matching the values used in the data simulation.
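The manual two-step procedure gives the correct 2SLS point estimates, but the standard errors in the second-stage summary are not valid 2SLS standard errors: that regression computes its residuals as Y − b0 − b1·X_hat, whereas 2SLS needs Y − b0 − b1·X with the original X. The sketch below illustrates both points with plain NumPy on hypothetical simulated data (the parameters are assumptions, not the appendix's). It checks that the two-step estimate equals the one-shot just-identified IV formula (Z'X)⁻¹Z'Y, and compares naive and corrected homoskedastic standard errors.

```python
import numpy as np

# Hypothetical simulated data (not the Excel file): X endogenous, Z a valid instrument.
rng = np.random.default_rng(seed=1)
n = 10_000
z = rng.normal(0, 4, n)
v = rng.normal(0, 6, n)
u = 2.0 * v + rng.normal(0, 5, n)   # error correlated with X through v
x = 1.35 * z + v
y = 0.7 + 1.2 * x + u

Zm = np.column_stack([np.ones(n), z])   # instruments, including the constant
Xm = np.column_stack([np.ones(n), x])   # original regressors, including the constant

# Manual two-step 2SLS, mirroring the notebook cells
pi, *_ = np.linalg.lstsq(Zm, x, rcond=None)       # first stage: X on [1, Z]
x_hat = Zm @ pi
Xh = np.column_stack([np.ones(n), x_hat])
b_2sls, *_ = np.linalg.lstsq(Xh, y, rcond=None)   # second stage: Y on [1, X_hat]

# One-shot IV formula for the just-identified case: (Z'X)^{-1} Z'y
b_iv = np.linalg.solve(Zm.T @ Xm, Zm.T @ y)

# Homoskedastic variance: sigma^2 * (Xh'Xh)^{-1}; only the residuals differ
XhtXh_inv = np.linalg.inv(Xh.T @ Xh)
k = Xh.shape[1]
resid_naive = y - Xh @ b_2sls     # what the second-stage OLS uses (wrong)
resid_correct = y - Xm @ b_2sls   # 2SLS residuals with the original X
se_naive = np.sqrt(resid_naive @ resid_naive / (n - k) * np.diag(XhtXh_inv))
se_correct = np.sqrt(resid_correct @ resid_correct / (n - k) * np.diag(XhtXh_inv))

print("two-step:", b_2sls, " one-shot IV:", b_iv)
print("naive SE:", se_naive, " correct SE:", se_correct)
```

In this particular simulation the naive standard errors come out too large, but in general they can be off in either direction. Dedicated IV routines, for example `IV2SLS` in the linearmodels package, report the corrected standard errors directly.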