
Estimation results - Comparison with R #13

Open
paulofelipe opened this issue May 5, 2021 · 2 comments
paulofelipe commented May 5, 2021

Hi, Peter!

First of all, I would like to thank you for making this package available.

Some time ago, I wrote R code to replicate the examples from the book on structural gravity. I recently discovered the library you wrote, so I have been trying to replicate your example in R.

However, I am finding some differences in the first step (the estimation). I tested three different functions in R, and all three produce results slightly different from those generated by your package.

One detail I noticed is that your estimate uses 870 observations, while the dataset has 900. It was not clear to me at what point those 30 observations are dropped.

Below is the code I am using. Thanks!

library(readr)
#> Warning: package 'readr' was built under R version 4.0.3
library(fixest)
#> Warning: package 'fixest' was built under R version 4.0.5
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(estimatr)
#> Warning: package 'estimatr' was built under R version 4.0.3

data <- read_csv("https://gist.githubusercontent.com/peter-herman/13b056e52105008c53faa482db67ed4a/raw/83898713b8c695fc4c293eaa78eaf44f8e880a85/sample_gravity_data.csv")
#> 
#> -- Column specification --------------------------------------------------------
#> cols(
#>   exporter = col_character(),
#>   importer = col_character(),
#>   year = col_double(),
#>   trade = col_double(),
#>   Y = col_double(),
#>   E = col_double(),
#>   pta = col_double(),
#>   contiguity = col_double(),
#>   common_language = col_double(),
#>   lndist = col_double(),
#>   international = col_double()
#> )

head(data)
#> # A tibble: 6 x 11
#>   exporter importer  year trade      Y      E   pta contiguity common_language
#>   <chr>    <chr>    <dbl> <dbl>  <dbl>  <dbl> <dbl>      <dbl>           <dbl>
#> 1 GBR      AUS       2006  4310 9.26e5 362227     0          0               1
#> 2 FIN      AUS       2006   514 1.43e5 362227     0          0               0
#> 3 USA      AUS       2006 16619 5.02e6 362227     1          0               1
#> 4 IND      AUS       2006   763 5.49e5 362227     0          0               1
#> 5 SGP      AUS       2006  8756 3.30e5 362227     1          0               1
#> 6 CHE      AUS       2006  1318 3.81e5 362227     0          0               1
#> # ... with 2 more variables: lndist <dbl>, international <dbl>

fit <- fixest::fepois(
  trade ~ pta + contiguity + common_language + lndist + international |
    exporter + importer,
  data = data
)

summary(fit, se = "hetero")
#> Poisson estimation, Dep. Var.: trade
#> Observations: 900 
#> Fixed-effects: exporter: 30,  importer: 30
#> Standard-errors: Heteroskedasticity-robust 
#>                  Estimate Std. Error    z value  Pr(>|z|)    
#> pta              0.471138   0.111639   4.220200   2.4e-05 ***
#> contiguity       0.891577   0.137645   6.477400  9.33e-11 ***
#> common_language  0.032625   0.087180   0.374225  0.708237    
#> lndist          -0.389862   0.075690  -5.150800  2.59e-07 ***
#> international   -3.412600   0.223079 -15.298000 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1,648,634.3   Adj. Pseudo R2: 0.975414
#>            BIC:  3,297,703.1     Squared Cor.: 0.994273

fit_alpaca <- alpaca::feglm(
  trade ~ pta + contiguity + common_language + lndist + international |
    exporter + importer,
  data = data,
  family = poisson(),
  control = list(dev.tol = 1e-10)
)

summary(fit_alpaca, type = "sandwich") 
#> poisson - log link
#> 
#> trade ~ pta + contiguity + common_language + lndist + international | 
#>     exporter + importer
#> 
#> Estimates:
#>                 Estimate Std. error z value Pr(> |z|)    
#> pta              0.47114    0.10760   4.379  1.19e-05 ***
#> contiguity       0.89158    0.13266   6.721  1.81e-11 ***
#> common_language  0.03262    0.08402   0.388     0.698    
#> lndist          -0.38986    0.07295  -5.344  9.09e-08 ***
#> international   -3.41258    0.21501 -15.872   < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> residual deviance= 3288638.61,
#> null deviance= 134106681.53,
#> n= 900, l= [30, 30]
#> 
#> Number of Fisher Scoring Iterations: 8

fit_glm <- glm(
  trade ~ -1 + pta + contiguity + common_language + lndist + international +
    exporter + importer,
  data = data,
  family = quasipoisson()
)

lmtest::coeftest(
  fit_glm,
  vcov. = sandwich::vcovHC(fit_glm, type = "HC1")
)[1:5,]
#>                    Estimate Std. Error     z value     Pr(>|z|)
#> pta              0.47113830 0.11164055   4.2201359 2.441551e-05
#> contiguity       0.89157690 0.13764612   6.4773123 9.337080e-11
#> common_language  0.03262497 0.08718028   0.3742242 7.082376e-01
#> lndist          -0.38986231 0.07569223  -5.1506252 2.596195e-07
#> international   -3.41258441 0.22308201 -15.2974433 7.951958e-53

Created on 2021-05-05 by the reprex package (v2.0.0)

@peter-herman (Owner)

Hi Paulo,

Thanks for reaching out about this. I spent some time looking into this issue and have identified the problem.

It arises from the pre-estimation diagnostics that the gme package runs as part of the estimate() method. Part of that routine is not as robust as intended, and it is failing here. After the routine flags one fixed effect as collinear ('importer_ZAF' in this case, because there is a full set of 2*N fixed effects and it is alphabetically the last one), it incorrectly determines that there is insufficient variation in ZAF imports. That second check erroneously triggers when there are NO zero-valued trade flows, when it should only trigger when they are ALL zero-valued. This was simply an oversight in how it was coded. The result is that the 30 observations with ZAF as the importer get dropped, leaving the 870 observations you saw.
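The flipped condition can be sketched in Python (a minimal illustration of the logic described above, not the actual gme source; the function names and the toy data are hypothetical):

```python
def insufficient_variation_buggy(flows):
    # Buggy check: triggers when there are NO zero-valued flows,
    # i.e. when the count of zeros is zero.
    return sum(1 for f in flows if f == 0) == 0

def insufficient_variation_fixed(flows):
    # Correct check: triggers only when ALL flows are zero-valued.
    return all(f == 0 for f in flows)

# 30 strictly positive import flows, as for ZAF in the sample data.
zaf_imports = [4310.0, 514.0, 16619.0] * 10

print(insufficient_variation_buggy(zaf_imports))  # True: 30 rows wrongly dropped
print(insufficient_variation_fixed(zaf_imports))  # False: rows correctly kept
```

With strictly positive flows the buggy check fires and the 30 ZAF-importer rows are dropped, while the corrected check leaves them in, matching the 870 vs. 900 observation counts above.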

I touched base with a colleague of mine who wrote that routine in gme, and it has apparently already been fixed, but we have not yet released the update. I'm going to try to see that the update gets released soon, which should resolve the issue.

@paulofelipe (Author)

Hi Peter,

Thanks for your reply. Based on this information, I can continue my tests.
