CovarianceMatrices.jl


Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation for Julia.

Installation

The package is registered on METADATA, so it can be installed from the Pkg REPL:

pkg> add CovarianceMatrices

Introduction

This package provides types and methods for obtaining consistent estimates of the long-run covariance matrix of a random process.

Three classes of estimators are considered:

  1. HAC - heteroskedasticity and autocorrelation consistent (Andrews, 1991; Newey and West, 1994)
  2. HC - heteroskedasticity consistent (White, 1980)
  3. CRVE - cluster robust (Arellano, 1987; Bell and McCaffrey, 2002)

The typical application of these estimators is to conduct robust inference about the parameters of generalized linear models.

Quick tour

HAC (Heteroskedasticity and Autocorrelation Consistent)

Available kernel types are:

  • TruncatedKernel
  • BartlettKernel
  • ParzenKernel
  • TukeyHanningKernel
  • QuadraticSpectralKernel

These types are subtypes of the abstract type HAC.

For example, ParzenKernel(NeweyWest) returns an instance of ParzenKernel parametrized by NeweyWest, the type that corresponds to the optimal bandwidth calculated following Newey and West (1994). Similarly, ParzenKernel(Andrews) corresponds to the optimal bandwidth obtained in Andrews (1991). If the bandwidth is known, it can be passed directly, e.g. TruncatedKernel(2).
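As a quick sketch, the three ways of constructing a kernel described above look like this (variable names are illustrative):

```julia
using CovarianceMatrices

k1 = ParzenKernel(NeweyWest)  ## optimal bandwidth selected à la Newey-West (1994)
k2 = ParzenKernel(Andrews)    ## optimal bandwidth selected à la Andrews (1991)
k3 = TruncatedKernel(2)       ## fixed, user-supplied bandwidth of 2
```

Any of the kernel types listed above can be parametrized in the same three ways.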

Long-run variance of random vector

Consider estimating the long-run variance of a (p x 1) random vector X based on T observations, so that X is stored as a (T x p) matrix.

## X is (Txp)
CovarianceMatrices.variance(X, ParzenKernel())           ## Parzen Kernel with Optimal Bandwidth à la Andrews
CovarianceMatrices.variance(X, ParzenKernel(NeweyWest))  ## Parzen Kernel with Optimal Bandwidth à la Newey-West
CovarianceMatrices.variance(X, ParzenKernel(2))          ## Parzen Kernel with Bandwidth = 2

Before calculating the variance, the data can be prewhitened:

## X is (Txp)
CovarianceMatrices.variance(X, ParzenKernel(prewhiten=true))             ## Parzen Kernel with Optimal Bandwidth à la Andrews
CovarianceMatrices.variance(X, ParzenKernel(NeweyWest, prewhiten=true))  ## Parzen Kernel with Optimal Bandwidth à la Newey-West
CovarianceMatrices.variance(X, ParzenKernel(2, prewhiten=true))          ## Parzen Kernel with Bandwidth = 2

Long-run variance of the regression coefficient

In the regression context, the function vcov does all the work:

vcov(::DataFrameRegressionModel, ::HAC)

Consider the following artificial data (a regression with an autoregressive error component):

using CovarianceMatrices
using DataFrames
using GLM
using Random
Random.seed!(1)
n = 500
x = randn(n, 5)
u = Array{Float64}(undef, 2*n)
u[1] = rand()
for j in 2:2*n
    u[j] = 0.78*u[j-1] + randn()   ## AR(1) error with autoregressive coefficient 0.78
end
## Construct the dependent variable, discarding the first n draws of u as burn-in
## (the regression coefficients are an arbitrary choice for this artificial example)
y = x*ones(5) + u[n+1:2*n]


df = DataFrame()
df[!, :y] = y
for (i, name) in enumerate([:x1, :x2, :x3, :x4, :x5])
    df[!, name] = x[:, i]
end

Using the data in df, the coefficients of the regression can be estimated using GLM:

lm1 = glm(@formula(y~x1+x2+x3+x4+x5), df, Normal(), IdentityLink())

To get a consistent estimate of the long-run variance of the estimated coefficients using a Quadratic Spectral kernel with automatic bandwidth selection à la Andrews:

vcov(lm1, QuadraticSpectralKernel(Andrews))

To estimate the long-run variance using the same kernel, but with a bandwidth selected à la Newey-West:

vcov(lm1, QuadraticSpectralKernel(NeweyWest))

The standard errors can be obtained with the stderror method:

stderror(::DataFrameRegressionModel, ::HAC)
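For example, continuing with the regression fitted above, the HAC standard errors for both bandwidth-selection rules would be obtained as follows (a sketch; `lm1` is the model estimated earlier):

```julia
stderror(lm1, QuadraticSpectralKernel(Andrews))    ## HAC standard errors, Andrews bandwidth
stderror(lm1, QuadraticSpectralKernel(NeweyWest))  ## HAC standard errors, Newey-West bandwidth
```

Each call returns one standard error per coefficient (here, the intercept plus the five regressors), the square roots of the diagonal of the corresponding vcov estimate.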