## 2. Breusch-Pagan Extended

#### Consider a linear regression of the form y = α + βx + u, with (y, x) both scalar random variables, where it is assumed that (a.i) E(ux) = E(u) = 0 and (a.ii) E(u^2|x) = σ^2.
#### (1) The condition a.i is essentially untestable; explain why.

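The error term u is never observed; we only observe the residuals û from the fitted regression. OLS chooses the estimates of α and β precisely so that the sample analogues of a.i hold: the normal equations force the residuals to average to zero and to be uncorrelated with x in every sample. Condition a.i is therefore what identifies α and β (two moment conditions for two parameters), which leaves no overidentifying restriction to test. Even if the true u were correlated with x, the residuals would still satisfy the sample version of a.i exactly.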
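
A minimal simulation sketch of this point, under an assumed data generating process in which x is deliberately correlated with u, so that a.i is false in the population. The names `w` and `rho` and all parameter values are illustrative choices, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
rho = 0.8                      # assumed strength of the link between x and u

u = rng.normal(0, 1, n)        # true error (never observed in practice)
w = rng.normal(0, 1, n)
x = w + rho * u                # x depends on u, so E(ux) = rho != 0
y = 1.0 + 2.0 * x + u

# OLS of y on a constant and x
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ coef

true_moment = np.mean(u * x)        # close to rho: a.i fails for the true error
resid_mean = np.mean(u_hat)         # zero up to rounding error
resid_moment = np.mean(u_hat * x)   # zero up to rounding error

print(true_moment, resid_mean, resid_moment, coef[1])
```

The residual moments are zero to machine precision even though the true moment is not, and the only symptom of the violation is a slope estimate that drifts away from the assumed value of 2, which we could not know outside a simulation.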

#### (2) Breusch and Pagan (1979) argue that one can test a.ii via an auxiliary regression û^2 = c + dx + e, where the û^2 are the squared residuals from the first regression, and the test of a.ii then becomes a test of H0: d = 0. Describe the logic of the test of a.ii.

Condition a.ii specifies that there is no heteroskedasticity: the conditional variance of u does not depend on x. The squared residuals û^2 serve as a proxy for u^2, so the auxiliary regression shows whether the error variance moves with x (heteroskedasticity) or is the same across all values of x (no heteroskedasticity). Under a.ii, E(u^2|x) = σ^2 for every x, so the slope d should be zero and the intercept c should equal σ^2. If d is non-zero, there is heteroskedasticity, since the expected value of u^2 changes with x. A hypothesis test with a null hypothesis of d = 0 therefore tests whether x helps predict the squared residuals from the first regression.
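
The two regressions can be sketched as follows. The data generating process (standard deviation of u growing linearly in x) is an assumption made for illustration, and the statistic shown is the studentized n R^2 form of the test, compared with the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

x = rng.uniform(0, 2, n)
u = (0.5 + x) * rng.normal(0, 1, n)   # assumed form: sd of u grows with x
y = 1.0 + 2.0 * x + u

# First regression: y on a constant and x
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat_sq = (y - X @ beta_hat) ** 2

# Auxiliary regression: squared residuals on a constant and x
(c_hat, d_hat), *_ = np.linalg.lstsq(X, u_hat_sq, rcond=None)

# LM statistic: n times the R^2 of the auxiliary regression
fitted = X @ np.array([c_hat, d_hat])
r2 = 1 - np.sum((u_hat_sq - fitted) ** 2) / np.sum((u_hat_sq - u_hat_sq.mean()) ** 2)
lm = n * r2
reject = lm > 3.841   # 5% critical value of chi-squared(1)

print(d_hat, lm, reject)
```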

#### (3) Use the two conditions a.i and a.ii to construct a GMM version of the Breusch-Pagan test.

In [64]:
import numpy as np
import statsmodels.api as sm
np.random.seed(12345)

# First, generate some data
# Data generating process (DGP) function (N, sigma_v, and beta_true are chosen arbitrarily)
N = 1000000
sigma_v = 2
beta_true = 1

def linear_dgp(N, sigma_v, beta_true):
    X = np.random.normal(0, 1, N) 
    v = np.random.normal(0, sigma_v, N)  # Error term (u in the model), homoskedastic with sd sigma_v
    y = X * beta_true + v  # Outcome variable y
    return y, X, v

y, X, v = linear_dgp(N, sigma_v, beta_true)

# Add constant term to X
X = sm.add_constant(X)

# obtain u values (residuals from OLS)
model = sm.OLS(y, X)
results = model.fit()
u_hat = results.resid  # Residuals

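# Now, write moment conditions:
# First, from a.i, E(u) = 0 and E(ux) = 0
# Second, from a.ii, E(u^2 - σ^2) = 0 and E((u^2 - σ^2) x) = 0
# The last condition is the testable one: it fails when u^2 is correlated with x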

# Evaluate sample moments
# First, from a.i, E(ux) = 0
moment1 = np.mean(u_hat * X[:,1])  # Zero up to rounding error, because OLS imposes this in sample
print(moment1)

# Second, from a.ii, E(u^2 - σ^2) = 0
moment2 = np.mean(u_hat**2) - sigma_v**2  # Should be close to 0; uses the true sigma_v, known only because the data are simulated
print(moment2)


-5.075264652987244e-16
0.00022056601456732494
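
A fuller sketch of how these moments could be turned into a test statistic, offered as one possible construction rather than a definitive implementation. It stacks four moment conditions, E(u) = 0, E(ux) = 0, E(u^2 - σ^2) = 0, and E((u^2 - σ^2)x) = 0, in three parameters (α, β, σ^2), so there is one overidentifying restriction and the GMM J statistic can be compared with a chi-squared distribution with one degree of freedom. The function names, the Gauss-Newton iteration, and both simulated designs are assumptions made for illustration.

```python
import numpy as np

def moments(theta, y, x):
    """n x 4 matrix of moment contributions at theta = (alpha, beta, sigma2)."""
    a, b, s = theta
    u = y - a - b * x
    return np.column_stack([u, u * x, u**2 - s, (u**2 - s) * x])

def jacobian(theta, y, x):
    """4 x 3 derivative of the averaged moments with respect to theta."""
    a, b, s = theta
    u = y - a - b * x
    return np.array([
        [-1.0, -x.mean(), 0.0],
        [-x.mean(), -(x**2).mean(), 0.0],
        [-2 * u.mean(), -2 * (u * x).mean(), -1.0],
        [-2 * (u * x).mean(), -2 * (u * x**2).mean(), -x.mean()],
    ])

def gmm_bp_test(y, x, max_iter=50, tol=1e-10):
    n = len(y)
    # Step 1: OLS start, which sets the first three sample moments to zero
    X = np.column_stack([np.ones(n), x])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    theta = np.array([a, b, np.mean((y - a - b * x) ** 2)])
    # Weight matrix: inverse of the centered covariance of the moments
    W = np.linalg.inv(np.cov(moments(theta, y, x), rowvar=False, bias=True))
    # Step 2: Gauss-Newton iterations on the GMM objective
    for _ in range(max_iter):
        g_bar = moments(theta, y, x).mean(axis=0)
        G = jacobian(theta, y, x)
        step = np.linalg.solve(G.T @ W @ G, G.T @ W @ g_bar)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    g_bar = moments(theta, y, x).mean(axis=0)
    return n * g_bar @ W @ g_bar, theta

rng = np.random.default_rng(2)
n = 5_000
x = rng.uniform(0, 2, n)
e = rng.normal(0, 1, n)
j_hom, _ = gmm_bp_test(1.0 + 2.0 * x + e, x)              # a.ii holds
j_het, _ = gmm_bp_test(1.0 + 2.0 * x + (0.5 + x) * e, x)  # a.ii fails
print(j_hom, j_het)   # compare each with 3.841, the 5% chi-squared(1) value
```

Unlike the check above, this version estimates σ^2 from the data instead of plugging in the true value, which is what a test on real data would have to do.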


#### (4) What can you say about the performance or relative merits of the Breusch-Pagan test versus your GMM alternative?

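The GMM alternative offers flexibility in situations where condition a.ii is violated in a way that a linear auxiliary regression captures poorly, such as in part (5), because the moment conditions can be changed to match the suspected form of the variance. However, the Breusch-Pagan test is convenient: it requires only two OLS regressions and gives a simple decision rule. Either we detect a heteroskedasticity problem or we do not, based on whether we reject or fail to reject the null hypothesis.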

??????????????????????????????????????????????

#### (5) Suppose that in fact x is distributed uniformly over the interval [0, 2π], and E(u^2|x) = σ^2(x) = σ^2 sin(2x), thus violating a.ii. What can you say about the performance of the Breusch-Pagan test in this circumstance? Can you modify your GMM test to provide a superior alternative?

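In this case, there is heteroskedasticity. However, the relationship between E(u^2|x) and x is nonlinear: sin(2x) completes two full cycles over [0, 2π], the interval on which x is uniformly distributed. The Breusch-Pagan auxiliary regression only picks up the best linear approximation to σ^2(x). Here that slope is Cov(x, σ^2 sin(2x)) / Var(x) = (-σ^2/2) / (π^2/3) = -3σ^2/(2π^2) ≈ -0.15σ^2, and a straight line explains only about 15% of the variation in σ^2(x). The Breusch-Pagan test therefore has little power against this violation of a.ii, although its power is not literally zero.
The GMM test, on the other hand, is more flexible, and we can modify the moment conditions to match the suspected form of the variance. Specifically, under a.ii we have E((u^2 - σ^2) g(x)) = 0 for any function g, so we can choose g(x) = sin(2x) and test E((u^2 - σ^2) sin(2x)) = 0, which fails when E(u^2|x) varies with sin(2x).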
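
A sketch comparing the two choices of instrument. Because σ^2 sin(2x) is negative on part of [0, 2π] and so cannot be simulated literally as a variance, the sketch assumes the shifted form 1 + 0.9 sin(2x) instead; that choice, the helper name `moment_stat`, and the sample size are all illustrative. The statistic is the squared, scaled sample mean of (û^2 - σ̂^2)(z - z̄), treated as approximately chi-squared with one degree of freedom under a.ii.

```python
import numpy as np

def moment_stat(u_hat, z):
    """Statistic for E((u^2 - sigma^2) z) = 0 built from sample moments."""
    m = (u_hat**2 - np.mean(u_hat**2)) * (z - z.mean())
    return len(m) * m.mean() ** 2 / np.mean(m**2)

rng = np.random.default_rng(3)
n = 2_000
x = rng.uniform(0, 2 * np.pi, n)
# Assumed variance function 1 + 0.9*sin(2x), shifted so that it stays positive
u = np.sqrt(1 + 0.9 * np.sin(2 * x)) * rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + u

# Residuals from the first regression of y on a constant and x
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ coef

stat_linear = moment_stat(u_hat, x)              # Breusch-Pagan style instrument
stat_sine = moment_stat(u_hat, np.sin(2 * x))    # instrument matched to sigma^2(x)
print(stat_linear, stat_sine)   # compare each with 3.841
```

With the instrument matched to the shape of the variance, the statistic should be several times larger than with x alone, which is the sense in which the modified test is more powerful.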

#### (6) In the above, we’ve considered a test of a specific functional form for the variance of u. Suppose instead that we don’t have any prior information regarding the form of E(u^2|x) = f(x). Discuss how you might go about constructing an extended version of the Breusch-Pagan test which tests for f(x) non-constant.

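We could add additional terms to the auxiliary regression û^2 = c + dx + e to extend the Breusch-Pagan test. For example, we could make the auxiliary regression û^2 = c + dx + j sin(x) + kx^2 + m exp(x) + e and then test the joint null hypothesis d = j = k = m = 0, for instance with an F test or with the LM statistic n R^2, which is asymptotically chi-squared with 4 degrees of freedom under the null. Rejecting the joint null indicates that at least one of these coefficients is non-zero, so this version can detect heteroskedasticity in more cases than the linear one. With no prior information about f(x), the added terms could be a flexible basis, such as polynomials or sines and cosines, with more terms included as the sample size grows.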
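
A sketch of the joint test with a small polynomial basis. The U-shaped variance function is an assumption chosen so that u^2 is uncorrelated with x itself, which makes the linear version uninformative while the extended version picks up the pattern; `lm_stat` is a hypothetical helper name.

```python
import numpy as np

def lm_stat(u_hat_sq, basis):
    """n * R^2 from regressing squared residuals on a constant and the basis columns."""
    n = len(u_hat_sq)
    Z = np.column_stack([np.ones(n), basis])
    coef, *_ = np.linalg.lstsq(Z, u_hat_sq, rcond=None)
    resid = u_hat_sq - Z @ coef
    r2 = 1 - resid @ resid / np.sum((u_hat_sq - u_hat_sq.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(4)
n = 5_000
x = rng.uniform(0, 2, n)
# Assumed U-shaped variance, symmetric around the mean of x
u = np.sqrt(0.2 + 3.0 * (x - 1.0) ** 2) * rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + u

# Squared residuals from the first regression
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat_sq = (y - X @ coef) ** 2

lm_linear = lm_stat(u_hat_sq, x)                               # compare with 3.841 (1 df)
lm_quadratic = lm_stat(u_hat_sq, np.column_stack([x, x**2]))   # compare with 5.991 (2 df)
print(lm_linear, lm_quadratic)
```

The degrees of freedom equal the number of basis terms excluding the constant, so each added term raises the critical value; a richer basis catches more shapes of f(x) at some cost in power against any single one.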

#### (7) Show that you can use your ideas about estimating f(x) to construct a more efficient estimator of β if f(x) isn’t constant. Relate your estimator to the optimal generalized least squares (GLS) estimator.

???????????????????????????????????????????
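
One possible approach, sketched here as a suggestion rather than a worked answer: if f(x) were known, the GLS estimator would be (X'Ω^{-1}X)^{-1} X'Ω^{-1}y with Ω = diag(f(x_1), ..., f(x_n)), which is the same as running OLS after dividing each observation by sqrt(f(x_i)). A feasible version replaces f with an estimate from an auxiliary regression like the one in part (6). The sketch below assumes a log-linear form for f, so that the fitted variances are always positive, and compares the sampling variance of the slope across repeated simulated samples. The helper names, the variance function exp(2x), and the simulation sizes are all assumptions.

```python
import numpy as np

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fgls_slope(y, x):
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    u_hat = y - X @ ols(X, y)
    # Estimate log f(x) by regressing log(u_hat^2) on a constant and x
    # (assumed log-linear form; a richer basis could be swapped in).
    # The fit is off by a constant factor, which does not affect the weights.
    f_hat = np.exp(X @ ols(X, np.log(u_hat**2)))
    # Weighted least squares: divide each row by sqrt(f_hat), then run OLS
    w = 1.0 / np.sqrt(f_hat)
    return ols(X * w[:, None], y * w)[1]

rng = np.random.default_rng(5)
n, reps = 300, 200
ols_draws, fgls_draws = [], []
for _ in range(reps):
    x = rng.uniform(0, 2, n)
    u = np.exp(x) * rng.normal(0, 1, n)   # assumed f(x) = exp(2x)
    y = 1.0 + 2.0 * x + u
    X = np.column_stack([np.ones(n), x])
    ols_draws.append(ols(X, y)[1])
    fgls_draws.append(fgls_slope(y, x))

var_ols, var_fgls = np.var(ols_draws), np.var(fgls_draws)
print(var_ols, var_fgls)   # sampling variance of the slope under each estimator
```

When the estimate of f(x) is close to the truth, the feasible estimator approaches the optimal GLS estimator; when f(x) is constant, the weights are all equal and it reduces to OLS.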