<a href="https://colab.research.google.com/github/khbae/trading/blob/master/Petersen_Jupyter_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Petersen - Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches (prepared by Jinkyu Kim)


Objective: provide code for standard errors
-------------
If you use the wrong standard errors, you are more likely both to reject a null hypothesis incorrectly and to have your paper rejected by referees. **:(** Let's use proper standard errors!

Provided code
----------
OLS, White standard errors, Newey-West, pooled OLS (same as OLS), clustered by firm, clustered by time (R default), clustered by time (Stata default), clustered by firm and time (Stata default), Fama-MacBeth

When to use each code?
----------
**FIRM EFFECT**: use standard errors **clustered by firm**. If you are sure the firm effect is permanent, fixed effects (FE) or random effects (RE) are also fine (not provided here; search online if you need them).

**TIME EFFECT**: use **Fama-MacBeth**. If T is large enough, standard errors clustered by time are also fine.

**FIRM & TIME EFFECT**: if both N and T are large enough, use **double clustering**. If not, consider combining **time dummies with standard errors clustered by firm**.
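The last recommendation (time dummies plus firm clustering) is not among the provided code, so here is a minimal base-R sketch of it. Everything in it is an assumption made for illustration: the simulated panel `sim`, the helper name `cluster_se`, and the choice to omit any finite-sample correction (so the numbers will not match the `HC1` or `sss` variants used in this notebook).

```r
# Hypothetical simulated panel: 50 firms x 10 years (not Petersen's test data).
set.seed(42)
n_firm <- 50
n_year <- 10
sim <- expand.grid(year = seq_len(n_year), firm = seq_len(n_firm))
x_firm  <- rnorm(n_firm)[sim$firm]   # permanent firm component of x
e_firm  <- rnorm(n_firm)[sim$firm]   # permanent firm component of the error
year_fx <- rnorm(n_year)[sim$year]   # common shock shared by all firms in a year
sim$x <- x_firm + rnorm(nrow(sim))
sim$y <- sim$x + e_firm + year_fx + rnorm(nrow(sim))

# Cluster-robust standard errors, no finite-sample correction:
# (X'X)^-1 [ sum over clusters of (X_c' u_c)(X_c' u_c)' ] (X'X)^-1
cluster_se <- function(fit, cluster) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  bread <- solve(crossprod(X))
  scores <- rowsum(X * u, group = cluster)  # one summed score row per cluster
  V <- bread %*% crossprod(scores) %*% bread
  setNames(sqrt(diag(V)), colnames(X))
}

# Year dummies absorb the time effect; clustering by firm handles the firm effect.
fit <- lm(y ~ x + factor(year), data = sim)
se_ols  <- sqrt(diag(vcov(fit)))["x"]
se_firm <- cluster_se(fit, sim$firm)["x"]
c(estimate = unname(coef(fit)["x"]),
  se_ols   = unname(se_ols),
  se_firm  = unname(se_firm))
```

In practice the `plm` and `coeftest` calls used in this notebook are the more convenient route; the hand-rolled version is only meant to show what "clustered by firm" computes.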




In [0]:
rm(list=ls())

#LIBRARY
library(sandwich); library(plm); library(lmtest)

#DATA Reading from PETERSEN website
mydat<-read.table(
  "http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt",
  col.names=c("firm", "year","x", "y"))

In [0]:

#OLS
  ols = lm(y~x, data=mydat)
  result = t(as.data.frame(summary(ols)$coefficients[2,1:3]))
  row.names(result) = c("ols")

#OLS with White (heteroskedasticity-robust, HC1 small-sample correction)
  white = coeftest(ols, vcov = function(x) vcovHC(x, type="HC1"))
  result = rbind(result, white[2,c("Estimate", "Std. Error", "t value")])
  row.names(result)[2] = c("white")
  
#OLS with Newey-West 
  newey = coeftest(ols, vcov = NeweyWest(ols))
  result = rbind(result, newey[2,c("Estimate", "Std. Error", "t value")])
  row.names(result)[3] = c("newey")
  
#OLS clustered by Firm or Year 
  p.ols = plm(y~x, model="pooling", index=c("firm", "year"), data=mydat)
  result=rbind(result, summary(p.ols)$coefficients[2 ,c("Estimate", "Std. Error","t-value")])
  row.names(result)[4] = c("p.ols")
  
  cluster.firm = coeftest(p.ols, vcov = function(x) vcovHC(x, cluster="group", type="HC1"))
  result = rbind(result, cluster.firm[2,c("Estimate", "Std. Error", "t value")])
  row.names(result)[5] = c("C.Firm")
  
  #Cluster by Time - R Default (HC1 correction)
  cluster.time = coeftest(p.ols, vcov = function(x) vcovHC(x, cluster="time", type="HC1")) # differs slightly from the Stata-default result below
  result = rbind(result, cluster.time[2,c("Estimate", "Std. Error", "t value")])
  row.names(result)[6] = c("C.Time.R")

  #Cluster by Time - STATA Default ("sss" small-sample correction)
  #Note: this overwrites cluster.time, so the variable ends up holding the Stata-default version
  cluster.time = coeftest(p.ols, vcov = function(x) vcovHC(x, method=c("arellano"), type=c("sss"),cluster = c("time")))
  result = rbind(result, cluster.time[2,c("Estimate", "Std. Error", "t value")])
  row.names(result)[7] = c("C.Time.Stata")
  
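  #OLS clustered by Firm and Year - STATA Default
  #Two-way clustering: firm-clustered + time-clustered - White covariance matrix
  #Note: plm also exports a function named vcovDC; this definition masks it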
  vcovDC = function(x, ...){
    vcovHC(x, cluster="group", ...) + 
      vcovHC(x, method=c("arellano"), type=c("sss"),cluster = c("time"), ...) - 
      vcovHC(x, method="white1", ...)
  }
  
  cluster.double = coeftest(p.ols, vcov = function(x) vcovDC(x)) 
  result = rbind(result, cluster.double[2,c("Estimate", "Std. Error", "t value")])
  row.names(result)[8] = c("C.Double")
  
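  #Fama-MacBeth
  #The index is swapped on purpose (year is the grouping variable),
  #so pmg averages the coefficients of the year-by-year cross-sectional regressions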
  fmb = pmg(y~x, mydat, index=c("year","firm"))
  FMB = coeftest(fmb)
  result = rbind(result, FMB[2,c("Estimate", "Std. Error", "t value")])
  row.names(result)[9] = c("FMB") 
  
  round(result, 4)

| | Estimate | Std. Error | t value |
|---|---|---|---|
| ols | 1.0348 | 0.0286 | 36.2041 |
| white | 1.0348 | 0.0284 | 36.444 |
| newey | 1.0348 | 0.0482 | 21.4696 |
| p.ols | 1.0348 | 0.0286 | 36.2041 |
| C.Firm | 1.0348 | 0.0506 | 20.4714 |
| C.Time.R | 1.0348 | 0.0317 | 32.6666 |
| C.Time.Stata | 1.0348 | 0.0334 | 30.9933 |
| C.Double | 1.0348 | 0.0535 | 19.3396 |
| FMB | 1.0356 | 0.0333 | 31.0599 |


You can compare these results with those on Petersen's website:
http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.htm

The standard errors match to at least 3 to 4 decimal places.

To see the full results, just enter the variable name, such as
**ols, white, newey, p.ols, cluster.firm, cluster.time, cluster.double, FMB**
(`cluster.time` holds the Stata-default version, because the second assignment overwrites the first).
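For readers wondering what the `vcovDC` helper defined in the code above is doing, the two-way clustered covariance matrix it builds can be written as follows. The notation is my own and not taken from the notebook: $X_c$ and $\hat u_c$ are the regressors and OLS residuals of the observations in cluster $c$, and $g$ is the clustering dimension (firm or time).

$$
\hat V_{g} = (X'X)^{-1}\Big(\sum_{c \in g} X_c'\,\hat u_c\,\hat u_c'\,X_c\Big)(X'X)^{-1},
\qquad
\hat V_{\text{double}} = \hat V_{\text{firm}} + \hat V_{\text{time}} - \hat V_{\text{white}}
$$

The White matrix $\hat V_{\text{white}}$ is the same expression with every observation treated as its own cluster. It is subtracted because each observation's own contribution is counted once in the firm term and once in the time term. Note that in the helper as written, only the time term uses the `sss` small-sample correction, so the three pieces are not scaled identically.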


In [0]:
summary(ols)


Call:
lm(formula = y ~ x, data = mydat)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.7611 -1.3680 -0.0166  1.3387  8.6779 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.02968    0.02836   1.047    0.295    
x            1.03483    0.02858  36.204   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.005 on 4998 degrees of freedom
Multiple R-squared:  0.2078,	Adjusted R-squared:  0.2076 
F-statistic:  1311 on 1 and 4998 DF,  p-value: < 2.2e-16


In [0]:
white


t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.028361  1.0465   0.2954    
x           1.034833   0.028395 36.4440   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


In [0]:
newey


t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.02968    0.06602  0.4496    0.653    
x            1.03483    0.04820 21.4696   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


In [0]:
cluster.firm


t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


In [0]:
cluster.time


t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.023387  1.2691   0.2045    
x           1.034833   0.033389 30.9933   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


In [0]:
cluster.double


t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.064990  0.4567   0.6479    
x           1.034833   0.053508 19.3396   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


In [0]:
FMB


t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.031278   0.023356  1.3392   0.1806    
x           1.035586   0.033342 31.0599   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
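The Fama-MacBeth procedure behind the last row can also be sketched by hand in base R: run one cross-sectional regression per year, then average the yearly coefficients and use their dispersion across years for the standard error. The simulated panel `sim` and the object names below are assumptions for illustration only, and the sketch is not claimed to reproduce `pmg` internals.

```r
# Hypothetical simulated panel: 100 firms x 10 years with a common year shock.
set.seed(7)
n_firm <- 100
n_year <- 10
sim <- expand.grid(firm = seq_len(n_firm), year = seq_len(n_year))
year_fx <- rnorm(n_year)[sim$year]
sim$x <- rnorm(nrow(sim))
sim$y <- sim$x + year_fx + rnorm(nrow(sim))

# Step 1: one cross-sectional regression per year.
# by_year is a 2 x T matrix with rows "(Intercept)" and "x".
by_year <- sapply(split(sim, sim$year), function(d) coef(lm(y ~ x, data = d)))

# Step 2: average the yearly coefficients; the standard error is the
# standard deviation of the yearly estimates divided by sqrt(T).
fm_est <- rowMeans(by_year)
fm_se  <- apply(by_year, 1, sd) / sqrt(ncol(by_year))
fm <- cbind(Estimate = fm_est, "Std. Error" = fm_se, "t value" = fm_est / fm_se)
fm
```

This standard error treats the yearly estimates as uncorrelated over time, which is why Fama-MacBeth is recommended above for a time effect rather than a firm effect.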


In [0]:
Contact Info: Jinkyu Kim, Business School, Hanyang Univ. email:jkyu126@gmail.com