### 4. Weak Instruments

(1) Construct a data-generating process dgp which takes as arguments (n, β, π) and returns a triple (y, x, Z) of n observations.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import t

def dgp(n, beta, pi):
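    # The number of instruments l is given by the length of pi
    l = len(pi)
    Z = np.random.randn(n, l)

    # Generate v (first-stage error from regressing x on Z)
    v = np.random.randn(n)

    # Generate x
    x = Z.dot(pi) + v

    # Generate u (error term for y); u is drawn independently of v,
    # so x is exogenous in this DGP
    u = np.random.randn(n)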
    
    # Generate y
    y = beta * x + u
    
    return y, x, Z

(2) Use the dgp function you’ve constructed to explore IV (2SLS) estimates of β as a function of π when l = 1 using a Monte Carlo approach, assuming homoskedastic errors.

In [2]:
# Run dgp function
y, x, Z = dgp(1000, 2, np.array([0.75]))

# Print the shapes of generated arrays
print("Shape of y:", y.shape)
print("Shape of x:", x.shape)
print("Shape of Z:", Z.shape)

Shape of y: (1000,)
Shape of x: (1000,)
Shape of Z: (1000, 1)


(a) Write a function two_sls which takes as arguments (y, x, Z) and returns two-stage least squares estimates of β and the standard error of the estimate.

In [3]:
def two_sls(y, x, Z):
    # First stage regression
    first_stage_model = sm.OLS(x, Z).fit()
    x_hat = first_stage_model.predict()

    # Second stage regression
    second_stage_model = sm.OLS(y, sm.add_constant(x_hat)).fit()

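    # Pull the coefficient estimate of beta and its standard error.
    # Note: this standard error is based on the second-stage OLS residuals
    # (computed with x_hat), not on the 2SLS residuals y - beta_hat * x,
    # so it is not the usual 2SLS standard error.
    beta_hat = second_stage_model.params[1]
    se = second_stage_model.bse[1]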

    return beta_hat, se

# Example usage
beta_hat, se = two_sls(y, x, Z)

# Print estimated coefficient and standard error
print("Estimated coefficient (beta_hat):", beta_hat)
print("Standard error of the estimate (SE):", se)

Estimated coefficient (beta_hat): 1.972241455311981
Standard error of the estimate (SE): 0.08955323010380124
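
The usual homoskedastic 2SLS standard error is based on the residuals y - beta_hat * x, computed with the actual regressor rather than the first-stage fitted values. Below is a minimal sketch using NumPy only, assuming a single regressor and no intercept (the variables in the DGP above all have mean zero). The name `two_sls_numpy` and the example parameters are illustrative and not part of the assignment.

```python
import numpy as np

def two_sls_numpy(y, x, Z):
    """2SLS for one regressor, no intercept; homoskedastic standard error."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)

    # First stage: least-squares projection of x on Z
    pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ pi_hat

    # Second stage: regress y on x_hat (no intercept)
    beta_hat = (x_hat @ y) / (x_hat @ x_hat)

    # Residuals use the actual x, not x_hat
    resid = y - beta_hat * x
    sigma2 = resid @ resid / (len(y) - 1)
    se = np.sqrt(sigma2 / (x_hat @ x_hat))
    return beta_hat, se

# Example with the same parameters as the dgp call above (beta = 2, pi = 0.75)
rng = np.random.default_rng(0)
n = 1000
Z = rng.standard_normal((n, 1))
x = Z @ np.array([0.75]) + rng.standard_normal(n)
y = 2 * x + rng.standard_normal(n)

beta_hat, se = two_sls_numpy(y, x, Z)
print("beta_hat:", beta_hat)
print("se:", se)
```

Because the example uses its own random generator, its numbers will not match the output shown above.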


(b) Taking β = π = 1, use repeated draws from dgp to check the bias and precision of the two_sls estimator, as well as the size and power of a t-test of the hypothesis that β = 0. Discuss. Does a 95% confidence interval (based on your 2SLS estimator) correctly cover 95% of your Monte Carlo draws?

In [4]:
def monte_carlo_simulation(n_simulations, n, beta, pi):
    results = []
    for i in range(n_simulations):
        y, x, Z = dgp(n, beta, pi)
        beta_hat, se = two_sls(y, x, Z)
        t_stat = (beta_hat - 0) / se
        p_value = 2 * (1 - t.cdf(abs(t_stat), df=n-2))
        reject_null = p_value < 0.05
        results.append({'beta_hat': beta_hat, 'se': se, 't-stat': t_stat,
                        'p-value': p_value, 'reject_null': reject_null})
    return pd.DataFrame(results)

# Define Parameters
# Re-run dgp function assuming β = π = 1
n_simulations = 1000
n = 1000
beta = 1
pi = np.array([1])

# Set the seed for reproducibility
np.random.seed(1)

# Run Monte Carlo simulation
simulation_results = monte_carlo_simulation(n_simulations, n, beta, pi)

In [5]:
# Run Monte Carlo simulation
simulation_results = monte_carlo_simulation(n_simulations, n, beta, pi)

simulation_results

     beta_hat        se     t-stat  p-value  reject_null
0    1.026443  0.044461  23.086315      0.0         True
1    1.000057  0.043718  22.875281      0.0         True
2    0.991407  0.043028  23.040745      0.0         True
3    1.027853  0.043937  23.394009      0.0         True
4    1.035288  0.043327  23.894684      0.0         True
...       ...       ...        ...      ...          ...
995  0.957924  0.049534  19.338579      0.0         True
996  1.036001  0.045830  22.605256      0.0         True
997  0.988438  0.045057  21.937321      0.0         True
998  0.949725  0.043970  21.599196      0.0         True


In [6]:
# Calculate bias and precision of 2SLS estimator
bias = simulation_results['beta_hat'].mean() - beta
precision = simulation_results['beta_hat'].std()

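# Power of the t-test: rejection rate of H0: β = 0 when the true β = 1
power = simulation_results['reject_null'].mean()

In [7]:
print("Bias of 2SLS estimator:", bias)
print("Precision of 2SLS estimator:", precision)
print("Power of t-test:", power)

Bias of 2SLS estimator: -0.0004221876004478764
Precision of 2SLS estimator: 0.03239630990874789
Power of t-test: 1.0


In [ ]:
# Size of the t-test: rejection rate of H0: β = 0 when the null is true,
# so the draws must come from a DGP with beta = 0
null_results = monte_carlo_simulation(n_simulations, n, 0, pi)
size = null_results['reject_null'].mean()

# Coverage of the 95% confidence interval: share of draws whose interval
# contains the true beta
lower = simulation_results['beta_hat'] - 1.96 * simulation_results['se']
upper = simulation_results['beta_hat'] + 1.96 * simulation_results['se']
ci_coverage = ((lower < beta) & (beta < upper)).mean()

print("Size of t-test:", size)
print("Coverage probability of 95% CI:", ci_coverage)

With π = 1 and n = 1000 the instrument is strong. The estimator is essentially unbiased (bias of about -0.0004) and precise (Monte Carlo standard deviation of about 0.032), and the t-test rejects β = 0 in every draw, so its power is 1. The size of the test has to be measured on draws generated with β = 0, where it should be close to the nominal 0.05. The standard errors returned by two_sls (about 0.044 in the draws shown above) are larger than the Monte Carlo standard deviation of the estimates, so the 95% confidence intervals are conservative and should cover the true β in more than 95% of my Monte Carlo draws.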
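
A short derivation suggests why intervals built from the second-stage OLS standard error can cover more often than 95%. It assumes the DGP above: x = Zπ + v and y = βx + u, with Z, u and v independent and of unit variance. Writing x = x̂ + v̂, the second stage regresses y on x̂, so its error term absorbs βv̂:

$$
y = \beta \hat{x} + (\beta \hat{v} + u), \qquad \operatorname{Var}(\beta v + u) = \beta^2 \sigma_v^2 + \sigma_u^2 .
$$

The 2SLS variance uses only σ_u². With β = π = 1, n = 1000 and unit variances, the two standard errors are approximately

$$
\operatorname{se}_{\text{2SLS}} \approx \sqrt{\frac{\sigma_u^2}{n \pi^2 \sigma_z^2}} = \sqrt{\frac{1}{1000}} \approx 0.032,
\qquad
\operatorname{se}_{\text{second stage}} \approx \sqrt{\frac{\beta^2 \sigma_v^2 + \sigma_u^2}{n \pi^2 \sigma_z^2}} = \sqrt{\frac{2}{1000}} \approx 0.045 .
$$

These values are in line with the Monte Carlo standard deviation (about 0.032) and the standard errors reported in the simulation (about 0.044).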

(c) Taking β = 1, but allowing π ∈ [0, 1], again evaluate the bias and precision of the estimator, and the size and power of a t-test. The Z instrument is “weak” when π is “close” to zero. Comment on how a weak instrument affects two-stage least squares estimators.

In [18]:
# Re-define parameters
# Re-run dgp function assuming β = 1 and allowing π ∈ [0, 1]
# Note: a single value of pi is drawn here and held fixed across all Monte Carlo draws
n_simulations = 1000
n = 1000
beta = 1
# Draw a random value between 0 and 1
pi = np.array([np.random.uniform(0, 1)])

# Set the seed for reproducibility
# (this comes after pi is drawn, so it fixes the simulation draws but not pi)
np.random.seed(1)

print("Random value of pi between 0 and 1 (as array):", pi)

Random value of pi between 0 and 1 (as array): [0.27400217]


In [19]:
# Run Monte Carlo simulation
simulation_results = monte_carlo_simulation(n_simulations, n, beta, pi)

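When π is close to zero, Z is a weak instrument: the first stage explains very little of the variation in x, so the 2SLS estimate is driven mostly by noise. The estimates become very imprecise, with inflated standard errors and a sampling distribution that is poorly approximated by the normal, so t-tests and confidence intervals are unreliable. When x is endogenous, weak instruments also bias 2SLS toward the OLS estimate.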
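
To see how the estimator behaves across π ∈ [0, 1] rather than at a single value, the simulation can be repeated over a grid. The sketch below is self-contained and makes several assumptions that are not in the assignment: the function name `simulate_pi`, the grid, n = 500 and 500 replications are all illustrative. It reports the median and interquartile range instead of the mean and standard deviation, because the just-identified 2SLS estimator can have very heavy tails when the instrument is weak. The rejection rate is for H0: β = 0 with true β = 1, so it measures power.

```python
import numpy as np

def simulate_pi(pi, n=500, beta=1.0, n_sims=500, seed=0):
    """Monte Carlo summary of just-identified 2SLS for a first-stage coefficient pi."""
    rng = np.random.default_rng(seed)
    beta_hats = np.empty(n_sims)
    rejects = np.empty(n_sims, dtype=bool)
    f_stats = np.empty(n_sims)
    for s in range(n_sims):
        # Same structure as dgp: independent standard normal z, v and u
        z = rng.standard_normal(n)
        x = pi * z + rng.standard_normal(n)
        y = beta * x + rng.standard_normal(n)

        # First stage (no intercept), fitted values and first-stage F statistic
        pi_hat = (z @ x) / (z @ z)
        x_hat = pi_hat * z
        first_resid = x - x_hat
        f_stats[s] = pi_hat**2 * (z @ z) / (first_resid @ first_resid / (n - 1))

        # 2SLS estimate; standard error from residuals using the actual x
        b = (x_hat @ y) / (x_hat @ x_hat)
        resid = y - b * x
        se = np.sqrt((resid @ resid / (n - 1)) / (x_hat @ x_hat))

        beta_hats[s] = b
        rejects[s] = abs(b / se) > 1.96  # two-sided test of H0: beta = 0

    q25, q50, q75 = np.percentile(beta_hats, [25, 50, 75])
    return {"pi": pi, "median_bias": q50 - beta, "iqr": q75 - q25,
            "rejection_rate": rejects.mean(), "median_F": np.median(f_stats)}

# Hypothetical grid of first-stage coefficients, from no instrument to strong
pi_grid = [0.0, 0.05, 0.1, 0.25, 0.5, 1.0]
results = [simulate_pi(p) for p in pi_grid]
for r in results:
    print(f"pi={r['pi']:.2f}  median bias={r['median_bias']:+.3f}  "
          f"IQR={r['iqr']:.3f}  rejection rate={r['rejection_rate']:.3f}  "
          f"median F={r['median_F']:.1f}")
```

Each value of π reuses the same seed, so differences across rows reflect the change in π rather than different random draws.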