# Group Fixed Effects

Suppose that you want to include fixed effects for:

- Each author in a study of article (or patent) citations
- Each board member in a study of performance of firms
- Each team member in a measure of performance of a team

To fix ideas, consider a simple dataset of publications and their citations:

In [1]:
clear all
use citations_raw, clear

In [2]:
%browse

| | art_id | journal_id | citations | x | authors |
|---|---|---|---|---|---|
| 1 | 1 | 4 | 38 | -1.0965978 | AutB;AutC;AutD;AutE;AutF |
| 2 | 2 | 12 | 34 | 3.1820426 | AutA;AutB;AutC;AutD |
| 3 | 3 | 4 | 22 | 2.3190854 | AutA; AutD |
| 4 | 4 | 4 | 37 | 0.5314979 | AutA;AutB |
| 5 | 5 | 3 | 29 | -0.12634525 | AutA;AutB |
| 6 | 6 | 3 | 25 | -1.0660225 | AutA;AutB |
| 7 | 7 | 4 | 34 | 2.3141234 | AutB |
| 8 | 8 | 6 | 35 | 0.77014482 | AutA;AutB |
| 9 | 9 | 4 | 32 | 0.073131695 | AutA;AutB |
| 10 | 10 | 10 | 31 | 0.71518707 | AutA;AutB |


We can easily run a regression of citations on our explanatory variable *x*:

In [3]:
reghdfe citations x
estimates store reg_wide_yx


(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =        100
Absorbing 1 HDFE group                            F(   1,     98) =       1.82
                                                  Prob > F        =     0.1805
                                                  R-squared       =     0.0182
                                                  Adj R-squared   =     0.0082
                                                  Within R-sq.    =     0.0182
                                                  Root MSE        =     5.7103

------------------------------------------------------------------------------
   citations | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |  -.7321277   .5428297    -1.35   0.181    -1.809356    .3451002
       _cons |   30.50843   .5721567    53.32   0.000       29.373    31.64386
------

In [4]:
reghdfe citations x, a(journal)
estimates store reg_wide_yxj


(dropped 3 singleton observations)
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =         97
Absorbing 1 HDFE group                            F(   1,     87) =       3.76
                                                  Prob > F        =     0.0556
                                                  R-squared       =     0.1136
                                                  Adj R-squared   =     0.0219
                                                  Within R-sq.    =     0.0415
                                                  Root MSE        =     5.6921

------------------------------------------------------------------------------
   citations | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |  -1.184256   .6104138    -1.94   0.056     -2.39752    .0290073
       _cons |   30.49536   .5780516    52.76   0.

But what if we want to add a fixed effect for each author?
Solution: create a dummy per author. With 6 authors this is easy to do.

In [5]:
* create one 0/1 indicator per author: 1 when the name appears in the authors string
local i=1
foreach au in AutA AutB AutC AutD AutE AutF {
    gen aut`i'=regexm(authors,"`au'")
    local ++i
}
save citations_wide, replace




file citations_wide.dta saved


In [6]:
%browse

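| | art_id | journal_id | citations | x | authors | `_est_reg_wide_yx` | `_est_reg_wide_yxj` | aut1 | aut2 | aut3 | aut4 | aut5 | aut6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 4 | 38 | -1.0965978 | AutB;AutC;AutD;AutE;AutF | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 12 | 34 | 3.1820426 | AutA;AutB;AutC;AutD | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| 3 | 3 | 4 | 22 | 2.3190854 | AutA; AutD | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 4 | 4 | 4 | 37 | 0.5314979 | AutA;AutB | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 5 | 5 | 3 | 29 | -0.12634525 | AutA;AutB | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 6 | 6 | 3 | 25 | -1.0660225 | AutA;AutB | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 7 | 7 | 4 | 34 | 2.3141234 | AutB | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 8 | 8 | 6 | 35 | 0.77014482 | AutA;AutB | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 9 | 9 | 4 | 32 | 0.073131695 | AutA;AutB | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 10 | 10 | 10 | 31 | 0.71518707 | AutA;AutB | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |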


We are now able to run the regression with a fixed effect per author:

In [7]:
reghdfe citations x aut1-aut6, a(journal)
estimates store reg_wide_yxja


(dropped 3 singleton observations)
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =         97
Absorbing 1 HDFE group                            F(   7,     81) =       0.85
                                                  Prob > F        =     0.5477
                                                  R-squared       =     0.1387
                                                  Adj R-squared   =    -0.0208
                                                  Within R-sq.    =     0.0686
                                                  Root MSE        =     5.8151

------------------------------------------------------------------------------
   citations | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |  -1.339499   .6417399    -2.09   0.040     -2.61636    -.062638
        aut1 |  -.9999574   3.024211    -0.33   0.

Note: I need to explicitly introduce a dummy per author!

This works because we have a small dataset. But what if my dataset is large?

Solution: use **reghdfe** with the **group** option 

First, we need to reshape the data into long format, with one line per article-author pair:

In [8]:
reshape long aut, i(art_id) j(j)
drop if aut==0
drop aut
rename j author_id
drop authors _est*
save citations_long, replace


(j = 1 2 3 4 5 6)

Data                               Wide   ->   Long
-----------------------------------------------------------------------------
Number of observations              100   ->   600         
Number of variables                  14   ->   10          
j variable (6 values)                     ->   j
xij variables:
                     aut1 aut2 ... aut6   ->   aut
-----------------------------------------------------------------------------

(374 observations deleted)




(file citations_long.dta not found)
file citations_long.dta saved
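
The route above builds one dummy per named author and then reshapes, which becomes awkward when there are many authors. As a rough sketch of an alternative, the long file could be built directly from the `authors` string. This assumes that `art_id` uniquely identifies articles and that authors are always separated by semicolons; the variable names `au` and `pos` are made up for illustration and do not appear in the original workflow.

```stata
* Sketch only: build the article-author (long) data straight from the authors string
use citations_raw, clear
* remove stray blanks such as the one in "AutA; AutD"
replace authors = subinstr(authors, " ", "", .)
* au1, au2, ... hold one listed author each
split authors, parse(";") generate(au)
* one line per article-author slot
reshape long au, i(art_id) j(pos)
* discard the empty slots of articles with fewer authors
drop if missing(au)
* numeric author identifier, assigned in alphabetical order of the names
egen author_id = group(au)
drop authors au pos
```

Because `egen ... group()` numbers the names alphabetically, with the six authors used here the resulting `author_id` should line up with the numbering of `aut1`-`aut6`, provided every author appears in at least one article.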


In [9]:
%browse

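| | art_id | author_id | journal_id | citations | x |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 4 | 38 | -1.0965978 |
| 2 | 1 | 3 | 4 | 38 | -1.0965978 |
| 3 | 1 | 4 | 4 | 38 | -1.0965978 |
| 4 | 1 | 5 | 4 | 38 | -1.0965978 |
| 5 | 1 | 6 | 4 | 38 | -1.0965978 |
| 6 | 2 | 1 | 12 | 34 | 3.1820426 |
| 7 | 2 | 2 | 12 | 34 | 3.1820426 |
| 8 | 2 | 3 | 12 | 34 | 3.1820426 |
| 9 | 2 | 4 | 12 | 34 | 3.1820426 |
| 10 | 3 | 1 | 4 | 22 | 2.3190854 |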


Using the long dataset, we can still obtain the same results as with the wide dataset. For example, if we want to obtain the results of
```
regress citations x
```
as if it were run on the wide data, we run:

In [10]:
reghdfe citations x, group(art_id)
estimates store reg_long_yx


(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =        100
Absorbing 1 HDFE group                            F(   1,     98) =       1.82
                                                  Prob > F        =     0.1805
                                                  R-squared       =     0.0182
                                                  Adj R-squared   =     0.0082
                                                  Within R-sq.    =     0.0182
                                                  Root MSE        =     5.7103

------------------------------------------------------------------------------
   citations | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |  -.7321277   .5428297    -1.35   0.181    -1.809356    .3451002
       _cons |   30.50843   .5721567    53.32   0.000       29.373    31.64386
------

In [11]:
estimates table reg_wide_yx reg_long_yx, keep(x) b(%7.4f) se(%7.4f) stats(N r2_a)


----------------------------------
    Variable | reg_w~x   reg_l~x  
-------------+--------------------
           x | -0.7321   -0.7321  
             |  0.5428    0.5428  
-------------+--------------------
           N |     100       100  
        r2_a |  0.0082    0.0082  
----------------------------------
                      Legend: b/se


To obtain the same results as
```
reghdfe citations x, a(journal)
```
in the wide dataset, we run:

In [12]:
reghdfe citations x, group(art_id) a(journal)
estimates store reg_long_yxj


(dropped 3 singleton observations)
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =         97
Absorbing 1 HDFE group                            F(   1,     87) =       3.76
                                                  Prob > F        =     0.0556
                                                  R-squared       =     0.1136
                                                  Adj R-squared   =     0.0219
                                                  Within R-sq.    =     0.0415
                                                  Root MSE        =     5.6921

------------------------------------------------------------------------------
   citations | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |  -1.184256   .6104138    -1.94   0.056     -2.39752    .0290073
       _cons |   30.49536   .5780516    52.76   0.

In [13]:
estimates table reg_wide_yxj reg_long_yxj, keep(x) b(%7.4f) se(%7.4f) stats(N r2_a)


----------------------------------
    Variable | reg_w~j   reg_l~j  
-------------+--------------------
           x | -1.1843   -1.1843  
             |  0.6104    0.6104  
-------------+--------------------
           N |      97        97  
        r2_a |  0.0219    0.0219  
----------------------------------
                      Legend: b/se


And finally, if we want to run the equivalent of
```
reghdfe citations x aut1-aut6, a(journal)
```
in the wide dataset, we can run:

In [14]:
reghdfe citations x, group(art_id) individual(author_id) a(journal_id author_id) aggregation(sum)
estimates store reg_long_yxja


(dropped 3 singleton observations)
(MWFE estimator converged in 14 iterations)

HDFE Linear regression                            Number of obs   =         97
Absorbing 2 HDFE groups                           F(   1,     81) =       4.36
                                                  Prob > F        =     0.0400
                                                  R-squared       =     0.1387
                                                  Adj R-squared   =    -0.0208
                                                  Within R-sq.    =     0.0510
                                                  Root MSE        =     5.8151

------------------------------------------------------------------------------
   citations | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |  -1.339499   .6417399    -2.09   0.040     -2.61636    -.062638
       _cons |   30.49813   .5905429    51.64   0

In [15]:
estimates table reg_wide_yxja reg_long_yxja, keep(x) b(%7.4f) se(%7.4f) stats(N r2_a)


----------------------------------
    Variable | reg_w~a   reg_l~a  
-------------+--------------------
           x | -1.3395   -1.3395  
             |  0.6417    0.6417  
-------------+--------------------
           N |      97        97  
        r2_a | -0.0208   -0.0208  
----------------------------------
                      Legend: b/se


- To obtain the same values as with the "wide" data, I had to specify the option "**aggregation(sum)**".
- By default, **reghdfe** assumes the option "**aggregation(mean)**".

- The option "**aggregation(sum)**" assumes that the contribution of the authors to **y** is the sum of the authors' fixed effects.
- An alternative is to assume that the contribution to **y** is given by the average of the authors' fixed effects.
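
To make the difference between the two aggregation rules concrete, they can be sketched as two models. The notation is an assumption introduced here, not taken from the **reghdfe** documentation: $A_i$ is the set of authors of article $i$, $n_i$ is the number of authors in that set, $\alpha_a$ is the fixed effect of author $a$, and $\gamma_{j(i)}$ is the fixed effect of the journal in which article $i$ appears.

$$
\text{aggregation(sum):}\quad y_i = \beta x_i + \gamma_{j(i)} + \sum_{a \in A_i} \alpha_a + \varepsilon_i
$$

$$
\text{aggregation(mean):}\quad y_i = \beta x_i + \gamma_{j(i)} + \frac{1}{n_i} \sum_{a \in A_i} \alpha_a + \varepsilon_i
$$

In terms of the wide data, the first model corresponds to 0/1 author dummies, and the second to the same dummies rescaled by $1/n_i$.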

- Let us run the model with the default option using two alternative specifications:

In [16]:
reghdfe citations x i.journal_id, group(art_id) individual(author_id) a(author_id) keepsingletons
estimates store reg_long_yxja2
reghdfe citations x, group(art_id) individual(author_id) a(journal_id author_id) keepsingletons
estimates store reg_long_yxja3


(MWFE estimator converged in 6 iterations)

HDFE Linear regression                            Number of obs   =        100
Absorbing 1 HDFE group                            F(  12,     82) =       1.19
                                                  Prob > F        =     0.3047
                                                  R-squared       =     0.1550
                                                  Adj R-squared   =    -0.0201
                                                  Within R-sq.    =     0.1482
                                                  Root MSE        =     5.7913

------------------------------------------------------------------------------
   citations | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |  -1.296932   .6349505    -2.04   0.044    -2.560051   -.0338135
             |
  journal_id |
          3  |   2.059242   3.093768     0.67 

And now we replicate these results with the wide dataset:

In [17]:
use citations_wide, clear

In [18]:
* number of authors of each article
egen naut=rowtotal(aut1-aut6)

In [19]:
* rescale each author dummy by the number of authors, so that it carries weight 1/naut
forval j=1/6 {
    qui replace aut`j'=aut`j'/naut
}
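
After this rescaling, the author weights of each article should add up to one. A quick sanity check along these lines could be run at this point. It is only a sketch: the helper variable `wsum` is hypothetical, the tolerance is arbitrary, and the check assumes every article has at least one author (an article with none would have missing weights and fail the assertion).

```stata
* Sketch: verify that the rescaled author weights sum to 1 within each article
egen double wsum = rowtotal(aut1-aut6)
* the tolerance allows for rounding in the float-stored weights
assert abs(wsum - 1) < 1e-6
drop wsum
```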

In [20]:
%browse

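| | art_id | journal_id | citations | x | authors | `_est_reg_wide_yx` | `_est_reg_wide_yxj` | aut1 | aut2 | aut3 | aut4 | aut5 | aut6 | naut |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 4 | 38 | -1.0965978 | AutB;AutC;AutD;AutE;AutF | 1 | 1 | 0.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 5 |
| 2 | 2 | 12 | 34 | 3.1820426 | AutA;AutB;AutC;AutD | 1 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.0 | 0.0 | 4 |
| 3 | 3 | 4 | 22 | 2.3190854 | AutA; AutD | 1 | 1 | 0.5 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 2 |
| 4 | 4 | 4 | 37 | 0.5314979 | AutA;AutB | 1 | 1 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 2 |
| 5 | 5 | 3 | 29 | -0.12634525 | AutA;AutB | 1 | 1 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 2 |
| 6 | 6 | 3 | 25 | -1.0660225 | AutA;AutB | 1 | 1 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 2 |
| 7 | 7 | 4 | 34 | 2.3141234 | AutB | 1 | 1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 8 | 8 | 6 | 35 | 0.77014482 | AutA;AutB | 1 | 1 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 2 |
| 9 | 9 | 4 | 32 | 0.073131695 | AutA;AutB | 1 | 1 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 2 |
| 10 | 10 | 10 | 31 | 0.71518707 | AutA;AutB | 1 | 1 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 2 |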


In [21]:
reghdfe citations x aut1-aut6, nocons a(journal_id) keepsingletons
estimates store reg_wide_yxja2


(MWFE estimator converged in 1 iterations)
note: aut6 omitted because of collinearity

HDFE Linear regression                            Number of obs   =        100
Absorbing 1 HDFE group                            F(   6,     82) =       0.95
                                                  Prob > F        =     0.4664
                                                  R-squared       =     0.1550
                                                  Adj R-squared   =    -0.0201
                                                  Within R-sq.    =     0.0648
                                                  Root MSE        =     5.7913

------------------------------------------------------------------------------
   citations | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |  -1.296932   .6349505    -2.04   0.044    -2.560051   -.0338135
        aut1 |  -6.598418   10.50

In [22]:
estimates table reg_wide_yxja2 reg_long_yxja2 reg_long_yxja3, keep(x) b(%7.4f) se(%7.4f) stats(N r2_a)


--------------------------------------------
    Variable | reg_w~2   reg_l~2   reg_l~3  
-------------+------------------------------
           x | -1.2969   -1.2969   -1.2969  
             |  0.6350    0.6350    0.6389  
-------------+------------------------------
           N |     100       100       100  
        r2_a | -0.0201   -0.0201   -0.0327  
--------------------------------------------
                                Legend: b/se


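The coefficient on *x* is the same in all three regressions. Only the last regression differs, and only in its standard error (0.6389 instead of 0.6350) and adjusted R-squared (-0.0327 instead of -0.0201). This is because counting degrees of freedom is no easy task with more than one absorbed variable in long format. With large datasets the difference should be negligible.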