###### <h2> Exercise 8 - Heteroskedasticity</h2>

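We now consider a linear specification $y=\alpha + \beta_1x_1 + \beta_2x_2 +\epsilon$ in which the error term $\epsilon$ is heteroskedastic: its standard deviation scales as the square of $x_1$, so its variance scales as $x_1^4$.  We set $\beta_1=0$, so that the null hypothesis $\beta_1=0$ is true, and run $M=1000$ simulated $t$ tests of it, counting how often the test rejects (a Type I error).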

In [21]:
using Distributions
using PyPlot
using LinearAlgebra
using Printf
using Random

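TypeI = 0;            # number of rejections of the true null beta1 = 0

d = Normal(0,1);      # distribution of the regressors
N = 500;              # observations per simulated sample

alpha = 0.78;
beta1 = 0.0;          # true value, so every rejection is a Type I error
beta2 = 5.61;

M = 1000;             # number of simulated samples
betas = zeros(M);     # estimate of beta1 from each sample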

for ii=1:M

    x1 = rand(d,N);
    x2 = rand(d,N);

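    # Heteroskedastic errors. The second argument of Normal is the standard
    # deviation, so sd(eps) = x1^2 and var(eps) = x1^4.
    eps = zeros(N);
    for jj=1:N   # jj rather than ii, to avoid shadowing the outer loop index
        eps[jj] = rand(Normal(0.0,x1[jj]^2))
    end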

    y = alpha .+ beta1*x1 + beta2*x2 + eps;

    X = [ones(N) x1 x2];

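    # OLS estimate and residuals
    betaHat = (X'*X)\(X'*y);
    epsHat = y - X*betaHat
    # inv(X'X) times the residual sum of squares; the division by N-3 below
    # turns this into the conventional (homoskedastic) covariance estimate
    SigHat = inv(X'*X)*(epsHat'*epsHat)
    
    betaHat1 = betaHat[2]
    betas[ii] = betaHat1
    stdErr1 = sqrt(SigHat[2,2])/sqrt(N-3)
    tHat1 = betaHat1/stdErr1;
    # Two-sided p-value. Note that the reference distribution is a t with
    # 3 degrees of freedom, whereas the residual degrees of freedom are N-3.
    p1 = cdf(TDist(3),-abs(tHat1)) + (1-cdf(TDist(3),abs(tHat1)))
    
    # Count a rejection of the (true) null at the 6% cutoff
    if(p1<0.06)
        TypeI = TypeI + 1;
    end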
    
end

println("Average coefficient estimate: ", mean(betas))
println("Number of Type I errors: ", TypeI)

Average coefficient estimate: 0.0012974731722132631
Number of Type I errors: 176


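The estimates of $\beta_1$ are reasonable (their average is close to the true value of zero), which is to be expected since heteroskedastic errors affect neither the unbiasedness nor the consistency of the OLS estimator. However, 176 of the 1000 tests (17.6%) reject the true null hypothesis, far more than the 6% cutoff used in the code would suggest. The conventional standard error assumes a constant error variance, so it is no longer valid here.  To confirm that heteroskedasticity is the issue, we repeat the exercise with a constant error variance.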
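Before doing so, a rough large-sample calculation suggests why over-rejection is to be expected. It is only a sketch, and it relies on the design of this simulation: $x_1$ and $x_2$ are independent standard normal draws, and the standard deviation of $\epsilon$ is $x_1^2$ (the second argument of `Normal` in the code is a standard deviation). Under those assumptions, and using $E[x_1^2]=1$, $E[x_1^4]=3$ and $E[x_1^6]=15$ for a standard normal variable, the variance of $\hat\beta_1$ and the variance that the conventional formula estimates are approximately

$$
\begin{aligned}
N\cdot\mathrm{Var}(\hat\beta_1) &\approx \frac{E[x_1^2\epsilon^2]}{\left(E[x_1^2]\right)^2} = E[x_1^6] = 15,\\
N\cdot\widehat{\mathrm{Var}}_{\mathrm{conv}}(\hat\beta_1) &\approx \frac{E[\epsilon^2]}{E[x_1^2]} = E[x_1^4] = 3.
\end{aligned}
$$

The conventional standard error is therefore too small by a factor of roughly $\sqrt{15/3}=\sqrt{5}\approx 2.24$, so the $t$ statistic is far more dispersed than its reference distribution assumes.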

In [23]:
TypeI = 0

for ii=1:M

    x1 = rand(d,N);
    x2 = rand(d,N);

    # Homoskedastic errors: standard normal, independent of x1
    eps = rand(Normal(),N);

    y = alpha .+ beta1*x1 + beta2*x2 + eps;

    X = [ones(N) x1 x2];

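    # OLS estimate and residuals
    betaHat = (X'*X)\(X'*y);
    epsHat = y - X*betaHat
    # inv(X'X) times the residual sum of squares (scaled by 1/(N-3) below)
    SigHat = inv(X'*X)*(epsHat'*epsHat)
    
    betaHat1 = betaHat[2]
    betas[ii] = betaHat1
    stdErr1 = sqrt(SigHat[2,2])/sqrt(N-3)
    tHat1 = betaHat1/stdErr1;
    # Two-sided p-value. Note that the reference distribution is a t with
    # 3 degrees of freedom, whereas the residual degrees of freedom are N-3.
    p1 = cdf(TDist(3),-abs(tHat1)) + (1-cdf(TDist(3),abs(tHat1)))
    
    # Count a rejection of the (true) null at the 6% cutoff
    if(p1<0.06)
        TypeI = TypeI + 1;
    end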
    
end

println("Average coefficient estimate: ", mean(betas))
println("Number of Type I errors: ", TypeI)

Average coefficient estimate: -0.001380853599643182
Number of Type I errors: 5


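With homoskedastic errors, only 5 of the 1000 tests reject the true null, compared with 176 under heteroskedasticity, which confirms that the heteroskedasticity drives the over-rejection. The rate of 0.5% falls well below the 6% cutoff because the $p$-values are computed from a $t$ distribution with 3 degrees of freedom, whose heavy tails make the test conservative; using the residual degrees of freedom $N-3$ would give a rejection rate close to the cutoff.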
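The two experiments can also be condensed into a single helper, which makes the comparison easier to rerun. The following is a minimal sketch rather than the code used above: the function name `simulate_tstats`, its `hetero` keyword and the seed are hypothetical choices, it draws normals with `randn` instead of using Distributions, and it rejects when $|t|>1.96$ (the large-sample 5% critical value) instead of using the $p$-value cutoff from the cells above. If the conventional standard error were valid, the $t$ statistics would have a standard deviation close to one.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper (not part of the exercise): simulate M regressions with
# beta1 = 0 and return the conventional t statistic for beta1 from each one.
function simulate_tstats(rng; hetero::Bool, N=500, M=1000, alpha=0.78, beta2=5.61)
    tstats = zeros(M)
    for m in 1:M
        x1 = randn(rng, N)
        x2 = randn(rng, N)
        # sd(eps) = x1^2 when hetero is true, otherwise sd(eps) = 1
        scale = hetero ? x1 .^ 2 : ones(N)
        # beta1 = 0, so x1 does not enter the mean of y
        y = alpha .+ beta2 .* x2 .+ scale .* randn(rng, N)
        X = [ones(N) x1 x2]
        betaHat = (X' * X) \ (X' * y)
        epsHat = y - X * betaHat
        s2 = dot(epsHat, epsHat) / (N - 3)        # conventional error-variance estimate
        stdErr1 = sqrt(s2 * inv(X' * X)[2, 2])    # conventional standard error of betaHat[2]
        tstats[m] = betaHat[2] / stdErr1
    end
    return tstats
end

rng = MersenneTwister(1234)   # arbitrary seed, for reproducibility only
tHomo = simulate_tstats(rng; hetero=false)
tHet = simulate_tstats(rng; hetero=true)

# Share of samples in which the true null is rejected at |t| > 1.96
rejHomo = mean(abs.(tHomo) .> 1.96)
rejHet = mean(abs.(tHet) .> 1.96)

println("sd of t statistic, homoskedastic:   ", std(tHomo))
println("sd of t statistic, heteroskedastic: ", std(tHet))
println("rejection rate, homoskedastic:   ", rejHomo)
println("rejection rate, heteroskedastic: ", rejHet)
```

Returning the $t$ statistics themselves, rather than only a count of rejections, lets the same draws be reused with a different critical value or reference distribution.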