Learning the R {sandwich} package

We use this package as a way to learn about the various kinds of standard errors.

# Classical standard errors

Start with the classical standard errors. The estimated covariance matrix of the coefficients is $$s^2(X'X)^{-1}$$

In [1]:
# Fit an ordinary regression first and look at the standard errors R reports
fit <- lm(mpg ~ wt + hp, data = mtcars)
library(broom)
tidy(fit)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| (Intercept) | 37.22727012 | 1.59878754 | 23.284689 | 2.565459e-20 |
| wt | -3.87783074 | 0.63273349 | -6.128695 | 1.119647e-06 |
| hp | -0.03177295 | 0.00902971 | -3.518712 | 0.001451229 |


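To reproduce these standard errors, first compute the residual variance estimate. Note the denominator: it is the residual degrees of freedom $n - K$ (observations minus estimated coefficients), not $n$:
$$s^2 = \frac{e'e}{n - K}$$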

In [3]:
# Residual variance estimate: residual sum of squares / (n - K)
s2 <- sum(fit$residuals^2) / (length(fit$residuals) - length(fit$coefficients))
s2

Compute the covariance matrix:

In [5]:
# Compute the covariance matrix s^2 (X'X)^{-1}
X <- mtcars[, c("wt", "hp")]
X <- cbind(1, X)  # add the intercept column
X <- as.matrix(X)
cov_matrix <- s2 * solve(t(X) %*% X)
cov_matrix

| | 1 | wt | hp |
|---|---|---|---|
| 1 | 2.5561215917 | -0.73594515 | 0.0001484701 |
| wt | -0.7359451464 | 0.40035167 | -0.00376369 |
| hp | 0.0001484701 | -0.00376369 | 8.153566e-05 |


Compare with the result of the built-in R function:

In [6]:
vcov(fit)

| | (Intercept) | wt | hp |
|---|---|---|---|
| (Intercept) | 2.5561215917 | -0.73594515 | 0.0001484701 |
| wt | -0.7359451464 | 0.40035167 | -0.00376369 |
| hp | 0.0001484701 | -0.00376369 | 8.153566e-05 |


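The standard errors are the square roots of the diagonal elements of the covariance matrix:

In [18]:
# Square roots of the diagonal elements
sqrt(diag(cov_matrix))

The result matches the standard errors reported for the `lm` fit.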
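The comparison above is done by eye. A minimal sketch of a programmatic check follows; it rebuilds the same quantities with `model.matrix()` and `df.residual()` instead of the hand-built `X`, and the name `manual_vcov` is my own, not from the original notebook.

```r
# Sketch: check the hand-computed classical covariance against vcov()
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)                            # design matrix incl. intercept
s2 <- sum(residuals(fit)^2) / df.residual(fit)    # e'e / (n - K)
manual_vcov <- s2 * solve(crossprod(X))           # s^2 (X'X)^{-1}

# Numerical comparison with the matrix returned by vcov()
all.equal(manual_vcov, vcov(fit), check.attributes = FALSE)
```

Using `all.equal()` rather than `==` allows for floating-point rounding differences between the two computations.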

# Heteroskedasticity-robust standard errors

In [19]:
library(sandwich)
library(lmtest)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Compute heteroskedasticity-robust (HC1) standard errors
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))


t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 37.2272701  2.0367350 18.2779 < 2.2e-16 ***
wt          -3.8778307  0.6512038 -5.9549 1.803e-06 ***
hp          -0.0317729  0.0069814 -4.5511 8.815e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
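To see what the `"HC1"` estimator is doing, here is a rough sketch that builds the sandwich by hand, assuming the usual definition: the HC0 sandwich $(X'X)^{-1} X' \mathrm{diag}(e_i^2) X (X'X)^{-1}$ scaled by $n/(n-K)$. The variable names (`bread`, `meat`, `vcov_hc1`, `se_hc1`) are illustrative and not part of {sandwich}.

```r
# Sketch: HC1 covariance by hand, under the assumptions stated above
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)
e <- residuals(fit)
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))      # (X'X)^{-1}
meat <- crossprod(X * e)          # X' diag(e^2) X: each row of X is scaled by e_i
vcov_hc1 <- n / (n - k) * bread %*% meat %*% bread

se_hc1 <- sqrt(diag(vcov_hc1))
se_hc1
```

If the assumed definition is right, these values should line up with the `Std. Error` column printed by `coeftest()` above.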


# Jackknife

In [None]:
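jackknife_lm <- function(formula, data) {
  n <- nrow(data)
  full_coef <- coef(lm(formula, data = data))
  coef_i <- matrix(NA_real_, n, length(full_coef),
                   dimnames = list(NULL, names(full_coef)))

  # Refit the model n times, leaving out one observation each time
  for (i in seq_len(n)) {
    fit_i <- lm(formula, data = data[-i, ])
    coef_i[i, ] <- coef(fit_i)
  }

  # Subtract each column's own mean; sweep() works column by column,
  # whereas `coef_i - coef_bar` would recycle coef_bar down the rows
  coef_bar <- colMeans(coef_i)
  centered <- sweep(coef_i, 2, coef_bar)
  se_jack <- sqrt((n - 1) / n * colSums(centered^2))

  return(se_jack)
}

# Example
data(mtcars)
jackknife_lm(mpg ~ wt + hp, data = mtcars)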
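For reference, the quantity `jackknife_lm()` is meant to compute can be written as follows. The notation is my own: $\hat\beta_{j,(-i)}$ is the $j$-th coefficient estimated without observation $i$ (row $i$ of `coef_i`), and $\bar\beta_j$ is the mean of these leave-one-out estimates (`coef_bar`).

$$\widehat{se}_{jack}(\hat\beta_j) = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(\hat\beta_{j,(-i)} - \bar\beta_j\right)^2}, \qquad \bar\beta_j = \frac{1}{n}\sum_{i=1}^{n}\hat\beta_{j,(-i)}$$

The factor $(n-1)/n$ inflates the spread of the leave-one-out estimates, which on their own vary much less than estimates from independent samples would.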

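# Heteroskedasticity- and autocorrelation-robust (HAC) standard errors

In [3]:
# Try out HAC standard errors (mtcars is cross-sectional, so this only demonstrates the call)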
library(sandwich)
library(lmtest)
fit <- lm(mpg ~ wt + hp, data = mtcars)
coeftest(fit, vcov = vcovHAC(fit))


t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 37.2272701  1.8705542 19.9017 < 2.2e-16 ***
wt          -3.8778307  0.6092956 -6.3644 5.890e-07 ***
hp          -0.0317729  0.0069295 -4.5852 8.022e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
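To make the idea behind HAC estimation concrete, here is a rough sketch of a Newey-West style covariance with Bartlett weights, written in base R and applied to simulated time-series data. `newey_west_manual()` is a hypothetical helper of mine, not a {sandwich} function, and the fixed lag of 4 is an arbitrary choice. It is not a reimplementation of `vcovHAC()`, whose defaults (kernel, bandwidth selection, prewhitening) differ.

```r
# Hypothetical helper (not part of sandwich): Newey-West covariance
# with Bartlett weights w_l = 1 - l / (lag + 1)
newey_west_manual <- function(fit, lag) {
  X <- model.matrix(fit)
  u <- X * residuals(fit)              # score contributions x_t * e_t
  n <- nrow(X)
  meat <- crossprod(u)                 # lag-0 term, same as the HC0 meat
  for (l in seq_len(lag)) {
    w <- 1 - l / (lag + 1)
    # sum over t of u_t u_{t-l}'
    gamma_l <- crossprod(u[(l + 1):n, , drop = FALSE],
                         u[1:(n - l), , drop = FALSE])
    meat <- meat + w * (gamma_l + t(gamma_l))
  }
  bread <- solve(crossprod(X))
  bread %*% meat %*% bread
}

# Simulated regression with AR(1) errors (illustrative data only)
set.seed(1)
n <- 200
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y <- 1 + 2 * x + e
fit_ts <- lm(y ~ x)

sqrt(diag(newey_west_manual(fit_ts, lag = 4)))
```

With `lag = 0` the loop is skipped and the function reduces to the plain HC0 sandwich, which is a convenient sanity check.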


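# Clustered data

## Cluster-robust standard errors

In [4]:
# Load the required packages
library(sandwich)
library(lmtest)

# Fit the linear model
model <- lm(mpg ~ hp + wt, data = mtcars)

# Cluster on cyl (only three distinct values, so this only illustrates the syntax)
cl_vcov <- vcovCL(model, cluster = ~ cyl)

# Print the regression table with cluster-robust standard errors
coeftest(model, vcov = cl_vcov)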


t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 37.2272701  3.0612294 12.1609 6.552e-13 ***
hp          -0.0317729  0.0052248 -6.0812 1.275e-06 ***
wt          -3.8778307  0.6998809 -5.5407 5.652e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


## Panel cluster-robust standard errors

In [6]:
# Load the required packages
library(sandwich)
library(lmtest)

# Simulate panel data
set.seed(42)
n_firms <- 100
n_years <- 10
n <- n_firms * n_years

firm_id <- rep(1:n_firms, each = n_years)
year <- rep(2001:2010, times = n_firms)

# Generate the covariates
sales <- rnorm(n, mean = 100, sd = 20)
capital <- rnorm(n, mean = 50, sd = 10)

# Simulate a firm effect plus serially correlated noise
# (the AR(1) noise is one long series over the stacked observations)
firm_effect <- rnorm(n_firms, 0, 5)
time_effect <- arima.sim(model = list(ar = 0.5), n = n_years * n_firms, sd = 3)
firm_random <- firm_effect[firm_id]

# Construct the response variable
investment <- 10 + 0.6 * sales + 0.4 * capital + firm_random + time_effect

# Build the data frame
panel_data <- data.frame(
  investment, sales, capital,
  firm_id = factor(firm_id),
  year = year
)

# Fit the linear model
model <- lm(investment ~ sales + capital, data = panel_data)

# Use vcovPL() to compute a panel-robust covariance (Driscoll-Kraay type)
vcov_pl <- vcovPL(
  model,
  cluster = ~ firm_id,
  order.by = ~ year,
  kernel = "Bartlett",
  lag = "NW1987",  # lag length chosen by the Newey & West (1987) rule of thumb
  adjust = TRUE
)

# Print the regression table with the panel-robust covariance
coeftest(model, vcov = vcov_pl)



t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 12.104409   0.950206  12.739 < 2.2e-16 ***
sales        0.596958   0.008412  70.965 < 2.2e-16 ***
capital      0.361147   0.014060  25.687 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


In [7]:
# Load the required packages
library(sandwich)
library(lmtest)

# Simulate panel data
set.seed(2025)
n_countries <- 60
n_years <- 5
n <- n_countries * n_years

country <- rep(1:n_countries, each = n_years)
year <- rep(2001:2005, times = n_countries)

# Generate the covariates
investment <- rnorm(n, mean = 25, sd = 5)
education <- rnorm(n, mean = 15, sd = 3)

# Simulate a country effect plus cross-sectional correlation
country_effect <- rnorm(n_countries, 0, 2)
growth_noise <- matrix(rnorm(n_years * n_countries, 0, 1), nrow = n_years)
growth_noise <- t(apply(growth_noise, 1, function(x) x + rnorm(1, 0, 1)))  # add a common shock per year

# Construct the response variable
growth <- 2 + 0.5 * investment + 0.3 * education +
  country_effect[country] + as.vector(growth_noise)

# Build the data frame
panel_data <- data.frame(
  growth, investment, education,
  country = factor(country),
  year = year
)

# Fit the linear model
model <- lm(growth ~ investment + education, data = panel_data)

# Use vcovPC() to compute the panel-corrected covariance matrix (PCSE)
vcov_pc <- vcovPC(
  model,
  cluster = ~ country + year,  # panel structure: unit (country), then time (year)
  pairwise = TRUE              # for unbalanced panels, use pairwise-balanced samples (this panel is balanced)
)

# Print the regression table with panel-corrected standard errors
coeftest(model, vcov = vcov_pc)



t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.008889   0.837810  1.2042   0.2295    
investment  0.504226   0.022562 22.3486   <2e-16 ***
education   0.347335   0.039100  8.8832   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
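For orientation, the panel-corrected covariance in the sense of Beck and Katz can be written as below. This is a sketch under stated assumptions: a balanced panel with $N$ units and $T$ periods, observations stacked unit by unit, and my own notation ($E$ is the $T \times N$ matrix of OLS residuals with one column per unit).

$$\hat\Sigma = \frac{E'E}{T}, \qquad \hat\Omega = \hat\Sigma \otimes I_T, \qquad \widehat{\mathrm{Var}}(\hat\beta) = (X'X)^{-1} X' \hat\Omega X (X'X)^{-1}$$

Here $\hat\Sigma_{ij}$ estimates the contemporaneous covariance between the errors of units $i$ and $j$, which is the kind of dependence the common yearly shock in the simulation introduces.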


## Bootstrap

In [9]:
# Load the required packages
library(sandwich)
library(lmtest)

# Simulate the data
set.seed(2025)
n_firms <- 40
n_years <- 6
n <- n_firms * n_years

firm_id <- rep(1:n_firms, each = n_years)
year <- rep(2010:2015, times = n_firms)

# Generate the covariates, firm effects and idiosyncratic errors
treatment <- rbinom(n, 1, 0.5)
capital <- rnorm(n, mean = 100, sd = 15)
firm_effect <- rnorm(n_firms, 0, 5)
error <- rnorm(n, 0, 5)

# Construct the response variable
investment <- 50 + 4 * treatment + 0.3 * capital + firm_effect[firm_id] + error

# Build the data frame
panel_data <- data.frame(
  investment, treatment, capital,
  firm_id = factor(firm_id),
  year = year
)

# Fit the linear model
model <- lm(investment ~ treatment + capital, data = panel_data)

# Use vcovBS() for a clustered bootstrap covariance estimate
set.seed(123)
vcov_bs <- vcovBS(
  model,
  cluster = ~ firm_id,  # cluster by firm
  R = 999,              # number of bootstrap replications
  fix = TRUE,           # force the covariance matrix to be positive semi-definite
  type = "wild"         # use the wild bootstrap
)

# Print the regression table with bootstrap standard errors
coeftest(model, vcov = vcov_bs)



t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 50.587520   2.941899 17.1955 < 2.2e-16 ***
treatment    3.466906   0.933071  3.7156 0.0002529 ***
capital      0.298937   0.027282 10.9574 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
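To illustrate the mechanics of a wild cluster bootstrap, here is a minimal base-R sketch. It assumes Rademacher weights (one random sign per cluster) applied to the OLS residuals; `wild_cluster_boot_se()` is a hypothetical helper of mine and is not how `vcovBS()` is implemented. The simulated data is illustrative only.

```r
# Hypothetical helper (not part of sandwich): wild cluster bootstrap SEs
wild_cluster_boot_se <- function(fit, cluster, R = 499) {
  X <- model.matrix(fit)
  yhat <- fitted(fit)
  e <- residuals(fit)
  groups <- unique(cluster)
  idx <- match(cluster, groups)        # cluster index for each observation

  coefs <- replicate(R, {
    # One Rademacher draw (+1 or -1) per cluster, shared by its observations
    w <- sample(c(-1, 1), length(groups), replace = TRUE)
    y_star <- yhat + e * w[idx]        # perturbed response
    lm.fit(X, y_star)$coefficients     # refit on the bootstrap sample
  })

  # Standard deviation of each coefficient across replications
  apply(coefs, 1, sd)
}

# Small simulated clustered data set
set.seed(1)
g <- rep(1:20, each = 5)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(20)[g] + rnorm(100)
fit_boot <- lm(y ~ x)

se_boot <- wild_cluster_boot_se(fit_boot, cluster = g, R = 199)
se_boot
```

Flipping the sign of all residuals within a cluster together preserves the within-cluster dependence, which is what distinguishes this from an observation-level wild bootstrap.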
