Unexpected and inconsistent reference level in felm #32

Open
Oravishayrizi opened this issue Jul 29, 2020 · 1 comment

@Oravishayrizi

felm drops the reference level in an unexpected way; specifically, it does not drop the same level that lm does.

I made a reprex to illustrate it.
Is this a bug, or is it the expected behavior and there is something wrong with my intuition?



# Data from http://www.princeton.edu/~otorres/DID101R.pdf

library(foreign)
library(lfe)
#> Loading required package: Matrix
library(forcats)

mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta")

mydata$time = ifelse(mydata$year >= 1994, 1, 0)
mydata$treated = ifelse(mydata$country %in% c("E","F","G"), 1, 0)
mydata$year.f<-as.factor(mydata$year)
#leads-and-lags
summary(felm(y~country+year.f:factor(treated),data=mydata)) # Different years are being omitted
#> Warning in chol.default(mat, pivot = TRUE, tol = tol): the matrix is either
#> rank-deficient or indefinite

#> Warning in chol.default(mat, pivot = TRUE, tol = tol): the matrix is either
#> rank-deficient or indefinite
#> 
#> Call:
#>    felm(formula = y ~ country + year.f:factor(treated), data = mydata) 
#> 
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -5.797e+09 -1.374e+09  1.027e+08  1.313e+09  4.675e+09 
#> 
#> Coefficients:
#>                               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)                  3.191e+09  1.446e+09   2.206   0.0325 *
#> countryB                    -1.514e+09  1.135e+09  -1.334   0.1888  
#> countryC                    -3.835e+08  1.135e+09  -0.338   0.7370  
#> countryD                     1.912e+09  1.135e+09   1.685   0.0988 .
#> countryE                    -1.167e+09  2.160e+09  -0.540   0.5916  
#> countryF                     1.669e+09  2.160e+09   0.773   0.4439  
#> countryG                    -1.845e+08  2.160e+09  -0.085   0.9323  
#> year.f1990:factor(treated)0 -4.195e+09  1.794e+09  -2.338   0.0239 *
#> year.f1991:factor(treated)0 -3.192e+09  1.794e+09  -1.779   0.0820 .
#> year.f1992:factor(treated)0 -3.767e+09  1.794e+09  -2.100   0.0414 *
#> year.f1993:factor(treated)0 -1.918e+08  1.794e+09  -0.107   0.9153  
#> year.f1994:factor(treated)0  4.208e+08  1.794e+09   0.235   0.8156  
#> year.f1995:factor(treated)0  1.019e+09  1.794e+09   0.568   0.5729  
#> year.f1996:factor(treated)0 -7.829e+08  1.794e+09  -0.436   0.6646  
#> year.f1997:factor(treated)0         NA         NA      NA       NA  
#> year.f1998:factor(treated)0 -1.120e+09  1.794e+09  -0.624   0.5358  
#> year.f1999:factor(treated)0 -2.819e+09  1.794e+09  -1.571   0.1231  
#> year.f1990:factor(treated)1 -1.954e+09  2.072e+09  -0.943   0.3507  
#> year.f1991:factor(treated)1 -1.851e+09  2.072e+09  -0.893   0.3764  
#> year.f1992:factor(treated)1 -1.016e+09  2.072e+09  -0.490   0.6263  
#> year.f1993:factor(treated)1  1.705e+08  2.072e+09   0.082   0.9348  
#> year.f1994:factor(treated)1  4.295e+08  2.072e+09   0.207   0.8367  
#> year.f1995:factor(treated)1 -5.313e+09  2.072e+09  -2.564   0.0137 *
#> year.f1996:factor(treated)1 -8.724e+08  2.072e+09  -0.421   0.6756  
#> year.f1997:factor(treated)1  1.136e+09  2.072e+09   0.548   0.5863  
#> year.f1998:factor(treated)1 -3.735e+09  2.072e+09  -1.803   0.0781 .
#> year.f1999:factor(treated)1         NA         NA      NA       NA  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.537e+09 on 45 degrees of freedom
#> Multiple R-squared(full model): 0.5382   Adjusted R-squared: 0.2919 
#> Multiple R-squared(proj model): 0.5382   Adjusted R-squared: 0.2919 
#> F-statistic(full model):2.185 on 24 and 45 DF, p-value: 0.01165 
#> F-statistic(proj model): 2.017 on 26 and 45 DF, p-value: 0.01901
summary(lm(y~country+year.f:factor(treated),data=mydata)) # 1999, the last level, is being omitted
#> 
#> Call:
#> lm(formula = y ~ country + year.f:factor(treated), data = mydata)
#> 
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -5.797e+09 -1.374e+09  1.027e+08  1.313e+09  4.675e+09 
#> 
#> Coefficients: (2 not defined because of singularities)
#>                               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)                  3.720e+08  1.446e+09   0.257   0.7982  
#> countryB                    -1.514e+09  1.135e+09  -1.334   0.1888  
#> countryC                    -3.835e+08  1.135e+09  -0.338   0.7370  
#> countryD                     1.912e+09  1.135e+09   1.685   0.0988 .
#> countryE                     1.651e+09  2.160e+09   0.764   0.4486  
#> countryF                     4.488e+09  2.160e+09   2.077   0.0435 *
#> countryG                     2.634e+09  2.160e+09   1.219   0.2290  
#> year.f1990:factor(treated)0 -1.376e+09  1.794e+09  -0.767   0.4471  
#> year.f1991:factor(treated)0 -3.731e+08  1.794e+09  -0.208   0.8362  
#> year.f1992:factor(treated)0 -9.482e+08  1.794e+09  -0.529   0.5997  
#> year.f1993:factor(treated)0  2.627e+09  1.794e+09   1.464   0.1501  
#> year.f1994:factor(treated)0  3.240e+09  1.794e+09   1.806   0.0776 .
#> year.f1995:factor(treated)0  3.838e+09  1.794e+09   2.139   0.0379 *
#> year.f1996:factor(treated)0  2.036e+09  1.794e+09   1.135   0.2625  
#> year.f1997:factor(treated)0  2.819e+09  1.794e+09   1.571   0.1231  
#> year.f1998:factor(treated)0  1.699e+09  1.794e+09   0.947   0.3486  
#> year.f1999:factor(treated)0         NA         NA      NA       NA  
#> year.f1990:factor(treated)1 -1.954e+09  2.072e+09  -0.943   0.3507  
#> year.f1991:factor(treated)1 -1.851e+09  2.072e+09  -0.893   0.3764  
#> year.f1992:factor(treated)1 -1.016e+09  2.072e+09  -0.490   0.6263  
#> year.f1993:factor(treated)1  1.705e+08  2.072e+09   0.082   0.9348  
#> year.f1994:factor(treated)1  4.295e+08  2.072e+09   0.207   0.8367  
#> year.f1995:factor(treated)1 -5.313e+09  2.072e+09  -2.564   0.0137 *
#> year.f1996:factor(treated)1 -8.724e+08  2.072e+09  -0.421   0.6756  
#> year.f1997:factor(treated)1  1.136e+09  2.072e+09   0.548   0.5863  
#> year.f1998:factor(treated)1 -3.735e+09  2.072e+09  -1.803   0.0781 .
#> year.f1999:factor(treated)1         NA         NA      NA       NA  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.537e+09 on 45 degrees of freedom
#> Multiple R-squared:  0.5382, Adjusted R-squared:  0.2919 
#> F-statistic: 2.185 on 24 and 45 DF,  p-value: 0.01165

mydata$year.f<-fct_relevel(mydata$year.f,"1993",after = Inf)
summary(lm(y~country+year.f:factor(treated),data=mydata)) # 1993, now the last level, is being omitted
#> 
#> Call:
#> lm(formula = y ~ country + year.f:factor(treated), data = mydata)
#> 
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -5.797e+09 -1.374e+09  1.027e+08  1.313e+09  4.675e+09 
#> 
#> Coefficients: (2 not defined because of singularities)
#>                               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)                  2.999e+09  1.446e+09   2.073   0.0439 *
#> countryB                    -1.514e+09  1.135e+09  -1.334   0.1888  
#> countryC                    -3.835e+08  1.135e+09  -0.338   0.7370  
#> countryD                     1.912e+09  1.135e+09   1.685   0.0988 .
#> countryE                    -8.051e+08  2.160e+09  -0.373   0.7111  
#> countryF                     2.031e+09  2.160e+09   0.940   0.3521  
#> countryG                     1.778e+08  2.160e+09   0.082   0.9348  
#> year.f1990:factor(treated)0 -4.003e+09  1.794e+09  -2.231   0.0307 *
#> year.f1991:factor(treated)0 -3.000e+09  1.794e+09  -1.672   0.1014  
#> year.f1992:factor(treated)0 -3.575e+09  1.794e+09  -1.993   0.0524 .
#> year.f1994:factor(treated)0  6.126e+08  1.794e+09   0.341   0.7344  
#> year.f1995:factor(treated)0  1.211e+09  1.794e+09   0.675   0.5032  
#> year.f1996:factor(treated)0 -5.911e+08  1.794e+09  -0.329   0.7433  
#> year.f1997:factor(treated)0  1.918e+08  1.794e+09   0.107   0.9153  
#> year.f1998:factor(treated)0 -9.278e+08  1.794e+09  -0.517   0.6076  
#> year.f1999:factor(treated)0 -2.627e+09  1.794e+09  -1.464   0.1501  
#> year.f1993:factor(treated)0         NA         NA      NA       NA  
#> year.f1990:factor(treated)1 -2.124e+09  2.072e+09  -1.025   0.3107  
#> year.f1991:factor(treated)1 -2.021e+09  2.072e+09  -0.976   0.3344  
#> year.f1992:factor(treated)1 -1.186e+09  2.072e+09  -0.573   0.5698  
#> year.f1994:factor(treated)1  2.591e+08  2.072e+09   0.125   0.9010  
#> year.f1995:factor(treated)1 -5.483e+09  2.072e+09  -2.647   0.0112 *
#> year.f1996:factor(treated)1 -1.043e+09  2.072e+09  -0.503   0.6171  
#> year.f1997:factor(treated)1  9.652e+08  2.072e+09   0.466   0.6435  
#> year.f1998:factor(treated)1 -3.905e+09  2.072e+09  -1.885   0.0659 .
#> year.f1999:factor(treated)1 -1.705e+08  2.072e+09  -0.082   0.9348  
#> year.f1993:factor(treated)1         NA         NA      NA       NA  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.537e+09 on 45 degrees of freedom
#> Multiple R-squared:  0.5382, Adjusted R-squared:  0.2919 
#> F-statistic: 2.185 on 24 and 45 DF,  p-value: 0.01165


summary(felm(y~factor(year):factor(treated)|country,data=mydata)) # The same year in both groups, but not the one I want
#> Warning in chol.default(mat, pivot = TRUE, tol = tol): the matrix is either
#> rank-deficient or indefinite

#> Warning in chol.default(mat, pivot = TRUE, tol = tol): the matrix is either
#> rank-deficient or indefinite
#> 
#> Call:
#>    felm(formula = y ~ factor(year):factor(treated) | country, data = mydata) 
#> 
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -5.797e+09 -1.374e+09  1.027e+08  1.313e+09  4.675e+09 
#> 
#> Coefficients:
#>                                     Estimate Std. Error t value Pr(>|t|)  
#> factor(year)1990:factor(treated)0 -1.003e+09  1.794e+09  -0.559   0.5789  
#> factor(year)1991:factor(treated)0         NA         NA      NA       NA  
#> factor(year)1992:factor(treated)0 -5.751e+08  1.794e+09  -0.321   0.7500  
#> factor(year)1993:factor(treated)0  3.000e+09  1.794e+09   1.672   0.1014  
#> factor(year)1994:factor(treated)0  3.613e+09  1.794e+09   2.014   0.0500 .
#> factor(year)1995:factor(treated)0  4.211e+09  1.794e+09   2.347   0.0234 *
#> factor(year)1996:factor(treated)0  2.409e+09  1.794e+09   1.343   0.1861  
#> factor(year)1997:factor(treated)0  3.192e+09  1.794e+09   1.779   0.0820 .
#> factor(year)1998:factor(treated)0  2.072e+09  1.794e+09   1.155   0.2541  
#> factor(year)1999:factor(treated)0  3.731e+08  1.794e+09   0.208   0.8362  
#> factor(year)1990:factor(treated)1 -1.030e+08  2.072e+09  -0.050   0.9606  
#> factor(year)1991:factor(treated)1         NA         NA      NA       NA  
#> factor(year)1992:factor(treated)1  8.351e+08  2.072e+09   0.403   0.6888  
#> factor(year)1993:factor(treated)1  2.021e+09  2.072e+09   0.976   0.3344  
#> factor(year)1994:factor(treated)1  2.280e+09  2.072e+09   1.101   0.2769  
#> factor(year)1995:factor(treated)1 -3.462e+09  2.072e+09  -1.671   0.1016  
#> factor(year)1996:factor(treated)1  9.783e+08  2.072e+09   0.472   0.6390  
#> factor(year)1997:factor(treated)1  2.986e+09  2.072e+09   1.442   0.1563  
#> factor(year)1998:factor(treated)1 -1.884e+09  2.072e+09  -0.910   0.3679  
#> factor(year)1999:factor(treated)1  1.851e+09  2.072e+09   0.893   0.3764  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.537e+09 on 45 degrees of freedom
#> Multiple R-squared(full model): 0.5382   Adjusted R-squared: 0.2919 
#> Multiple R-squared(proj model): 0.4468   Adjusted R-squared: 0.1518 
#> F-statistic(full model):2.185 on 24 and 45 DF, p-value: 0.01165 
#> F-statistic(proj model): 1.817 on 20 and 45 DF, p-value: 0.0486

Created on 2020-07-29 by the reprex package (v0.3.0)

@lrberge

lrberge commented Dec 8, 2020

Hi, this is actually valid behavior. There is no unique way to resolve a collinear system of variables: which columns get dropped depends on numerical precision and algorithmic choices. It should not, however, affect your variables of interest (provided they are not part of the collinear system, of course).
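
To make that point concrete, here is a minimal sketch using only base R `lm` on simulated data. The data set, the variable names (`id`, `x`, `year_b`) and the choice of 2003 are assumptions made up for this illustration; they are not from the reprex above. It fits the same model twice with the year levels in a different order, so a different year is dropped, and then compares the parts that should not change.

```r
# Simulated panel (hypothetical data): 4 units x 5 years, units 3 and 4 treated
set.seed(1)
d <- expand.grid(id = factor(1:4), year = factor(2001:2005))
d$treated <- factor(as.integer(d$id %in% c("3", "4")))
d$x <- rnorm(nrow(d))   # a regressor outside the collinear system
d$y <- rnorm(nrow(d))

# Same factor, with "2003" moved to the last position
d$year_b <- factor(d$year, levels = c("2001", "2002", "2004", "2005", "2003"))

fit_a <- lm(y ~ x + id + year:treated,   data = d)
fit_b <- lm(y ~ x + id + year_b:treated, data = d)

# Which coefficients were dropped (reported as NA) in each fit
names(coef(fit_a))[is.na(coef(fit_a))]
names(coef(fit_b))[is.na(coef(fit_b))]

# The fitted values and the coefficient on x can be compared across fits
all.equal(unname(fitted(fit_a)), unname(fitted(fit_b)))
all.equal(unname(coef(fit_a)["x"]), unname(coef(fit_b)["x"]))
```

The two fits span the same column space, so only the labelling of the collinear block changes. Differences between interaction coefficients within a treatment group are also unchanged, which can be checked in the `lm` outputs of the reprex.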

This post may help clarify the issue: lrberge/fixest#48
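
If a particular reference year is wanted, one option is to avoid relying on the estimator's choice and remove the collinearity by hand. The sketch below is an assumption-laden illustration on simulated data with base R `lm`, not something taken from the linked post: `cells`, `cells_kept` and `ref_year` are hypothetical names. It builds the year-by-treated indicator columns with `model.matrix`, drops the columns for the chosen year, and fits on what remains. The same idea could presumably be carried over to `felm` by adding the kept columns to the data frame, but that is not tested here.

```r
# Simulated panel (hypothetical data): 4 units x 5 years, units 3 and 4 treated
set.seed(1)
d <- expand.grid(id = factor(1:4), year = factor(2001:2005))
d$treated <- factor(as.integer(d$id %in% c("3", "4")))
d$y <- rnorm(nrow(d))

# All year-by-treated indicator columns, without an intercept
cells <- model.matrix(~ 0 + year:treated, data = d)

# Drop both columns of the chosen reference year (hypothetical choice)
ref_year <- "2003"
keep <- !grepl(paste0("year", ref_year, ":"), colnames(cells), fixed = TRUE)
cells_kept <- cells[, keep]

# Fit with the hand-picked reference, and with the full factor for comparison
fit_ref  <- lm(y ~ id + cells_kept, data = d)
fit_full <- lm(y ~ id + year:treated, data = d)

# No coefficient is left for lm to drop, and the fit itself is unchanged
anyNA(coef(fit_ref))
all.equal(unname(fitted(fit_ref)), unname(fitted(fit_full)))
```

With the reference columns removed up front, each remaining interaction coefficient reads as a difference from the reference year within its treatment group.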
