
Estimation results - Comparison with R #13

Open
paulofelipe opened this issue May 5, 2021 · 2 comments
paulofelipe commented May 5, 2021

Hi, Peter!

First of all, I would like to thank you for making this package available.

Some time ago, I wrote R code to replicate the examples from the book on structural gravity. I recently discovered the library you wrote, so I have been trying to replicate your example in R.

However, I am finding some differences in the first step (the estimation). I tested three different functions in R, and all three produce results slightly different from those generated by your package.

One detail I noticed is that your estimate uses 870 observations, while the dataset has 900. It was not clear to me at what point those 30 observations are dropped.

Below is the code I am using. Thanks!

library(readr)
#> Warning: package 'readr' was built under R version 4.0.3
library(fixest)
#> Warning: package 'fixest' was built under R version 4.0.5
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(estimatr)
#> Warning: package 'estimatr' was built under R version 4.0.3

data <- read_csv("https://gist.githubusercontent.com/peter-herman/13b056e52105008c53faa482db67ed4a/raw/83898713b8c695fc4c293eaa78eaf44f8e880a85/sample_gravity_data.csv")
#> 
#> -- Column specification --------------------------------------------------------
#> cols(
#>   exporter = col_character(),
#>   importer = col_character(),
#>   year = col_double(),
#>   trade = col_double(),
#>   Y = col_double(),
#>   E = col_double(),
#>   pta = col_double(),
#>   contiguity = col_double(),
#>   common_language = col_double(),
#>   lndist = col_double(),
#>   international = col_double()
#> )

head(data)
#> # A tibble: 6 x 11
#>   exporter importer  year trade      Y      E   pta contiguity common_language
#>   <chr>    <chr>    <dbl> <dbl>  <dbl>  <dbl> <dbl>      <dbl>           <dbl>
#> 1 GBR      AUS       2006  4310 9.26e5 362227     0          0               1
#> 2 FIN      AUS       2006   514 1.43e5 362227     0          0               0
#> 3 USA      AUS       2006 16619 5.02e6 362227     1          0               1
#> 4 IND      AUS       2006   763 5.49e5 362227     0          0               1
#> 5 SGP      AUS       2006  8756 3.30e5 362227     1          0               1
#> 6 CHE      AUS       2006  1318 3.81e5 362227     0          0               1
#> # ... with 2 more variables: lndist <dbl>, international <dbl>

fit <- fixest::fepois(
  trade ~ pta + contiguity + common_language + lndist + international |
    exporter + importer,
  data = data
)

summary(fit, se = "hetero")
#> Poisson estimation, Dep. Var.: trade
#> Observations: 900 
#> Fixed-effects: exporter: 30,  importer: 30
#> Standard-errors: Heteroskedasticity-robust 
#>                  Estimate Std. Error    z value  Pr(>|z|)    
#> pta              0.471138   0.111639   4.220200   2.4e-05 ***
#> contiguity       0.891577   0.137645   6.477400  9.33e-11 ***
#> common_language  0.032625   0.087180   0.374225  0.708237    
#> lndist          -0.389862   0.075690  -5.150800  2.59e-07 ***
#> international   -3.412600   0.223079 -15.298000 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1,648,634.3   Adj. Pseudo R2: 0.975414
#>            BIC:  3,297,703.1     Squared Cor.: 0.994273

fit_alpaca <- alpaca::feglm(
  trade ~ pta + contiguity + common_language + lndist + international |
    exporter + importer,
  data = data,
  family = poisson(),
  control = list(dev.tol = 1e-10)
)

summary(fit_alpaca, type = "sandwich") 
#> poisson - log link
#> 
#> trade ~ pta + contiguity + common_language + lndist + international | 
#>     exporter + importer
#> 
#> Estimates:
#>                 Estimate Std. error z value Pr(> |z|)    
#> pta              0.47114    0.10760   4.379  1.19e-05 ***
#> contiguity       0.89158    0.13266   6.721  1.81e-11 ***
#> common_language  0.03262    0.08402   0.388     0.698    
#> lndist          -0.38986    0.07295  -5.344  9.09e-08 ***
#> international   -3.41258    0.21501 -15.872   < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> residual deviance= 3288638.61,
#> null deviance= 134106681.53,
#> n= 900, l= [30, 30]
#> 
#> Number of Fisher Scoring Iterations: 8

fit_glm <- glm(
  trade ~ -1 + pta + contiguity + common_language + lndist + international +
    exporter + importer,
  data = data,
  family = quasipoisson()
)

lmtest::coeftest(
  fit_glm,
  vcov. = sandwich::vcovHC(fit_glm, type = "HC1")
)[1:5,]
#>                    Estimate Std. Error     z value     Pr(>|z|)
#> pta              0.47113830 0.11164055   4.2201359 2.441551e-05
#> contiguity       0.89157690 0.13764612   6.4773123 9.337080e-11
#> common_language  0.03262497 0.08718028   0.3742242 7.082376e-01
#> lndist          -0.38986231 0.07569223  -5.1506252 2.596195e-07
#> international   -3.41258441 0.22308201 -15.2974433 7.951958e-53

Created on 2021-05-05 by the reprex package (v2.0.0)

@peter-herman (Owner)

Hi Paulo,

Thanks for reaching out about this. I spent some time looking into this issue and have identified the problem.

It arises from the pre-estimation diagnostics that the gme package runs as part of the estimate() method. Part of that routine is not as robust as intended, and it is failing here. After the routine flags one fixed effect as collinear ('importer_ZAF' in this case, because there is a full set of 2*N fixed effects and it is alphabetically the last one), it incorrectly determines that there is insufficient variation in ZAF imports. That second check erroneously triggers when there are NO zero-valued trade flows, when it should only trigger when they are ALL zero-valued. This was simply an oversight in how it was coded. The result is that the 30 observations with ZAF as the importer get dropped, leaving the 870 observations you saw.
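The flipped condition can be sketched in Python (a minimal illustration of the logic described above, not the actual gme source; the function names and the toy data are hypothetical):

```python
def insufficient_variation_buggy(flows):
    # Buggy check: triggers when there are NO zero-valued flows,
    # i.e. when the count of zeros is zero.
    return sum(1 for f in flows if f == 0) == 0

def insufficient_variation_fixed(flows):
    # Correct check: triggers only when ALL flows are zero-valued.
    return all(f == 0 for f in flows)

# 30 strictly positive import flows, as for ZAF in the sample data.
zaf_imports = [4310.0, 514.0, 16619.0] * 10

print(insufficient_variation_buggy(zaf_imports))  # True: 30 rows wrongly dropped
print(insufficient_variation_fixed(zaf_imports))  # False: rows correctly kept
```

With strictly positive flows the buggy check fires and the 30 ZAF-importer rows are dropped, while the corrected check leaves them in, matching the 870 vs. 900 observation counts above.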

I touched base with a colleague of mine who wrote that routine in gme, and it has apparently already been fixed, but we have not yet released the update. I'm going to try to see that the update gets released soon, which should resolve the issue.

@paulofelipe (Author)

Hi Peter,

Thanks for your reply. Based on this information, I can continue my tests.
