## Regression Tables via `pf.etable()` and the `Stargazer` package

To produce regression tables, we have two options: pyfixest's internal `etable()` function and the [Stargazer](https://github.com/StatsReporting/stargazer) Python package.

To begin, we load some libraries and fit a set of regression models. 

In [1]:
import numpy as np
from stargazer.stargazer import LineLocation, Stargazer

import pyfixest as pf

%load_ext autoreload
%autoreload 2

data = pf.get_data()

fit1 = pf.feols("Y ~ X1 | f1", data = data)
fit2 = pf.feols("Y ~ X1 | f1 + f2", data = data)
fit3 = pf.feols("Y ~ X1 + X2 | f1", data = data)
fit4 = pf.feols("Y ~ X1 + X2 | f1 + f2", data = data)
fit5 = pf.feols("Y ~ X1 * X2 | f1 + f2", data = data)

## Regression Tables via `pf.etable()`

We can compare all regression models via the pyfixest-internal `pf.etable()` function: 

In [2]:
pf.etable([fit1, fit2, fit3, fit4, fit5])

| | est1 | est2 | est3 | est4 | est5 |
|---|---|---|---|---|---|
| depvar | Y | Y | Y | Y | Y |
| X1 | -0.949*** (0.069) | -0.919*** (0.065) | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) |
| X2 | | | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) |
| X1:X2 | | | | | 0.011 (0.018) |
| f1 | x | x | x | x | x |
| f2 | - | x | - | x | x |
| R2 | 0.437 | 0.609 | 0.489 | 0.659 | 0.659 |
| S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
| Observations | 997 | 997 | 997 | 997 | 997 |


`etable` supports a few customizations out of the box. For example, the `keep` argument restricts the table to the coefficients whose names match the provided regular expression. Note that the pattern `"X1"` also matches the interaction term `X1:X2`:

In [3]:
pf.etable([fit1, fit2, fit3, fit4, fit5], keep = "X1")

| | est1 | est2 | est3 | est4 | est5 |
|---|---|---|---|---|---|
| depvar | Y | Y | Y | Y | Y |
| X1 | -0.949*** (0.069) | -0.919*** (0.065) | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) |
| X1:X2 | | | | | 0.011 (0.018) |
| f1 | x | x | x | x | x |
| f2 | - | x | - | x | x |
| R2 | 0.437 | 0.609 | 0.489 | 0.659 | 0.659 |
| S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
| Observations | 997 | 997 | 997 | 997 | 997 |
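
To see why the pattern `"X1"` also retains the interaction term, the regex filtering can be mimicked in a few lines of plain Python. This is a minimal sketch rather than pyfixest's actual implementation: the helper `keep_matching` is hypothetical, and it assumes that a name is kept whenever the pattern matches anywhere in it.

```python
import re


def keep_matching(names, pattern):
    """Return the names in which `pattern` matches somewhere (regex search).

    Hypothetical helper for illustration; not part of pyfixest.
    """
    return [name for name in names if re.search(pattern, name)]


coef_names = ["X1", "X2", "X1:X2"]

# "X1" matches both the main effect and the interaction term.
print(keep_matching(coef_names, "X1"))     # ['X1', 'X1:X2']

# Anchoring the pattern restricts the match to the main effect only.
print(keep_matching(coef_names, r"^X1$"))  # ['X1']
```

Under this assumption, anchoring the pattern with `^` and `$` is one way to exclude interaction terms without switching to exact matching.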


To select variables by their exact names rather than by regex match, we can set `exact_match = True`:

In [4]:
pf.etable([fit1, fit2, fit3, fit4, fit5], keep = ["X1", "X2"], exact_match = True)

| | est1 | est2 | est3 | est4 | est5 |
|---|---|---|---|---|---|
| depvar | Y | Y | Y | Y | Y |
| X1 | -0.949*** (0.069) | -0.919*** (0.065) | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) |
| X2 | | | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) |
| f1 | x | x | x | x | x |
| f2 | - | x | - | x | x |
| R2 | 0.437 | 0.609 | 0.489 | 0.659 | 0.659 |
| S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
| Observations | 997 | 997 | 997 | 997 | 997 |


We can also easily **drop** variables via the `drop` argument. As the output shows, dropping `"X1"` also removes the interaction term `X1:X2`:

In [5]:
pf.etable([fit1, fit2, fit3, fit4, fit5], drop = ["X1"])

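| | est1 | est2 | est3 | est4 | est5 |
|---|---|---|---|---|---|
| depvar | Y | Y | Y | Y | Y |
| X2 | | | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) |
| f1 | x | x | x | x | x |
| f2 | - | x | - | x | x |
| R2 | 0.437 | 0.609 | 0.489 | 0.659 | 0.659 |
| S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
| Observations | 997 | 997 | 997 | 997 | 997 |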
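
The difference between exact matching and pattern-based dropping can be sketched in plain Python. The function `select_names` below is a hypothetical stand-in, not pyfixest code; it assumes that `keep` is applied before `drop` and that `exact_match` switches both from regex search to a simple equality check.

```python
import re


def select_names(names, keep=None, drop=None, exact_match=False):
    """Filter `names` by `keep`, then remove anything matching `drop`.

    Hypothetical helper for illustration; not part of pyfixest.
    """
    def matches(name, patterns):
        if exact_match:
            # Exact mode: the name must equal one of the listed entries.
            return name in patterns
        # Regex mode: any pattern found anywhere in the name counts.
        return any(re.search(p, name) for p in patterns)

    selected = [n for n in names if keep is None or matches(n, keep)]
    return [n for n in selected if drop is None or not matches(n, drop)]


coef_names = ["X1", "X2", "X1:X2"]

# Exact matching keeps only the listed names, not the interaction term.
print(select_names(coef_names, keep=["X1", "X2"], exact_match=True))  # ['X1', 'X2']

# Dropping by the pattern "X1" removes the interaction term as well.
print(select_names(coef_names, drop=["X1"]))  # ['X2']
```

Both results agree with the rows shown in the two tables above.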


By default, `pf.etable()` reports **standard errors** in parentheses next to each coefficient. Via the `coef_fmt` argument, we can also request other statistics, such as p-values or confidence intervals:

In [6]:
pf.etable([fit1, fit2, fit3, fit4, fit5], coef_fmt = "b (se) [p]")

| | est1 | est2 | est3 | est4 | est5 |
|---|---|---|---|---|---|
| depvar | Y | Y | Y | Y | Y |
| X1 | -0.949*** (0.069) [0.000] | -0.919*** (0.065) [0.000] | -0.950*** (0.067) [0.000] | -0.924*** (0.061) [0.000] | -0.924*** (0.061) [0.000] |
| X2 | | | -0.174*** (0.018) [0.000] | -0.174*** (0.015) [0.000] | -0.185*** (0.025) [0.000] |
| X1:X2 | | | | | 0.011 (0.018) [0.565] |
| f1 | x | x | x | x | x |
| f2 | - | x | - | x | x |
| R2 | 0.437 | 0.609 | 0.489 | 0.659 | 0.659 |
| S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
| Observations | 997 | 997 | 997 | 997 | 997 |
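
The `coef_fmt` string acts as a template in which each token stands for one statistic. The following sketch illustrates that idea only: `render_cell` is a hypothetical helper (not pyfixest's implementation), it recognizes just the tokens `b`, `se` and `p` used above, and it leaves out the significance stars.

```python
import re


def render_cell(coef_fmt, stats, digits=3):
    """Replace the tokens b, se and p in `coef_fmt` with rounded values.

    Hypothetical helper for illustration; not part of pyfixest.
    """
    def fill(match):
        # Look up the statistic named by the token and format it.
        return f"{stats[match.group(0)]:.{digits}f}"

    # Word boundaries keep "se" from being split into separate tokens.
    return re.sub(r"\b(se|b|p)\b", fill, coef_fmt)


# Values for X1:X2 in the fifth model, as reported in this section's tables.
interaction = {"b": 0.01057, "se": 0.01818, "p": 0.565}

print(render_cell("b (se) [p]", interaction))  # 0.011 (0.018) [0.565]
print(render_cell("b [p]", interaction))       # 0.011 [0.565]
```

The first line reproduces the `X1:X2` cell of the table above; rearranging the template changes the layout without touching the underlying estimates.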


We can also override the default significance levels and control the rounding of results via the `signif_code` and `digits` arguments:

In [7]:
pf.etable([fit1, fit2, fit3, fit4, fit5], signif_code=[0.01, 0.05, 0.1], digits = 5)

| | est1 | est2 | est3 | est4 | est5 |
|---|---|---|---|---|---|
| depvar | Y | Y | Y | Y | Y |
| X1 | -0.94944*** (0.06886) | -0.91925*** (0.06539) | -0.94953*** (0.06652) | -0.92405*** (0.06093) | -0.92417*** (0.06094) |
| X2 | | | -0.17423*** (0.01840) | -0.17411*** (0.01461) | -0.18550*** (0.02516) |
| X1:X2 | | | | | 0.01057 (0.01818) |
| f1 | x | x | x | x | x |
| f2 | - | x | - | x | x |
| R2 | 0.43708 | 0.60903 | 0.48899 | 0.65904 | 0.65916 |
| S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
| Observations | 997 | 997 | 997 | 997 | 997 |
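
The three entries of `signif_code` are the cutoffs for three, two and one star. A minimal sketch of that mapping follows; the function `significance_stars` is hypothetical, the p-value `0.03` is made up for illustration, and the use of a strict `<` comparison is an assumption rather than documented pyfixest behavior.

```python
def significance_stars(p_value, signif_code):
    """Map a p-value to '***', '**', '*' or '' given three ascending cutoffs.

    Hypothetical helper for illustration; not part of pyfixest.
    """
    three_star, two_star, one_star = signif_code
    if p_value < three_star:
        return "***"
    if p_value < two_star:
        return "**"
    if p_value < one_star:
        return "*"
    return ""


# The same (made-up) p-value earns a different number of stars
# depending on the chosen cutoffs.
print(significance_stars(0.03, [0.01, 0.05, 0.1]))     # **
print(significance_stars(0.03, [0.001, 0.01, 0.05]))   # *

# The X1:X2 p-value of 0.565 reported earlier gets no star either way.
print(repr(significance_stars(0.565, [0.01, 0.05, 0.1])))  # ''
```

Because the star count depends on the cutoffs, it helps to state the chosen `signif_code` in the table notes whenever it differs from the default.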


By default, `pf.etable()` returns a data frame, but it can also produce LaTeX or Markdown output via the `type` argument:

In [8]:
pf.etable([fit1, fit2, fit3, fit4, fit5], signif_code=[0.01, 0.05, 0.1], digits = 5, type = "md")

                               est1                   est2                   est3                   est4                   est5
------------  ---------------------  ---------------------  ---------------------  ---------------------  ---------------------
depvar                            Y                      Y                      Y                      Y                      Y
-------------------------------------------------------------------------------------------------------------------------------
X1            -0.94944*** (0.06886)  -0.91925*** (0.06539)  -0.94953*** (0.06652)  -0.92405*** (0.06093)  -0.92417*** (0.06094)
X2                                                          -0.17423*** (0.01840)  -0.17411*** (0.01461)  -0.18550*** (0.02516)
X1:X2                                                                                                         0.01057 (0.01818)
--------------------------------------------------------------------------------------------------------

You can also rename variables to make the output more readable by passing a dictionary to the `labels` argument. Interaction terms are relabeled automatically using the labels of the interacted variables; to label an interaction term differently, add it to the dictionary explicitly.

In [16]:
labels={
    "X1": "Age",
    "X2": "Years of Schooling",
    "Y": "Wage",
    "f1": "Industry",
    "f2": "Year"
}

pf.etable([fit1, fit2, fit3, fit4, fit5], labels=labels)

| | est1 | est2 | est3 | est4 | est5 |
|---|---|---|---|---|---|
| depvar | Wage | Wage | Wage | Wage | Wage |
| Age | -0.949*** (0.069) | -0.919*** (0.065) | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) |
| Years of Schooling | | | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) |
| Age × Years of Schooling | | | | | 0.011 (0.018) |
| Industry | x | x | x | x | x |
| Year | - | x | - | x | x |
| R2 | 0.437 | 0.609 | 0.489 | 0.659 | 0.659 |
| S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
| Observations | 997 | 997 | 997 | 997 | 997 |
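
The relabeling rule for interaction terms can be illustrated with a short sketch. The function `relabel` is hypothetical and only mirrors the behavior visible in the table above: an explicit dictionary entry wins, otherwise each component of `a:b` is relabeled separately and joined with `×`. The override label used below is made up.

```python
def relabel(term, labels):
    """Relabel a coefficient name, handling interactions written as 'a:b'.

    Hypothetical helper for illustration; not part of pyfixest.
    """
    # An explicit entry for the full term takes precedence.
    if term in labels:
        return labels[term]
    # Otherwise relabel each interacted variable on its own.
    parts = term.split(":")
    return " × ".join(labels.get(part, part) for part in parts)


labels = {"X1": "Age", "X2": "Years of Schooling"}
print(relabel("X1:X2", labels))  # Age × Years of Schooling

# Adding the interaction term itself to the dictionary overrides the default.
custom = {**labels, "X1:X2": "Age-Schooling Interaction"}
print(relabel("X1:X2", custom))  # Age-Schooling Interaction
```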


To label the fixed-effects rows with a custom label instead of the variable label, pass a separate dictionary to the `felabels` argument.

In [17]:

pf.etable([fit1, fit2, fit3, fit4, fit5], labels=labels, felabels={"f1": "Industry Fixed Effects", "f2": "Year Fixed Effects"})

| | est1 | est2 | est3 | est4 | est5 |
|---|---|---|---|---|---|
| depvar | Wage | Wage | Wage | Wage | Wage |
| Age | -0.949*** (0.069) | -0.919*** (0.065) | -0.950*** (0.067) | -0.924*** (0.061) | -0.924*** (0.061) |
| Years of Schooling | | | -0.174*** (0.018) | -0.174*** (0.015) | -0.185*** (0.025) |
| Age × Years of Schooling | | | | | 0.011 (0.018) |
| Industry Fixed Effects | x | x | x | x | x |
| Year Fixed Effects | - | x | - | x | x |
| R2 | 0.437 | 0.609 | 0.489 | 0.659 | 0.659 |
| S.E. type | by: f1 | by: f1 | by: f1 | by: f1 | by: f1 |
| Observations | 997 | 997 | 997 | 997 | 997 |


## Regression Tables via `Stargazer`

We have opened a PR that adds `pyfixest` support to the excellent [Stargazer](https://github.com/StatsReporting/stargazer/pull/105) project. Until it is merged, you can install the development version from `py-econometrics` by running

```bash
pip install git+https://github.com/py-econometrics/stargazer.git
```

`Stargazer` is particularly useful if you need highly customizable regression tables (beyond the scope of `pf.etable()`), or if you want to compare models from `statsmodels` or `linearmodels` with `pyfixest`. 

After installing `stargazer`, we can produce a summary table via the `Stargazer` class: 

In [10]:
stargazer_table = Stargazer([fit1, fit2, fit3, fit4, fit5])
stargazer_table



0,1,2,3,4,5
,,,,,
,Dependent variable: Y,Dependent variable: Y,Dependent variable: Y,Dependent variable: Y,Dependent variable: Y
,,,,,
,(1),(2),(3),(4),(5)
,,,,,
X1,-0.949***,-0.919***,-0.950***,-0.924***,-0.924***
,(0.069),(0.065),(0.067),(0.061),(0.061)
X1:X2,,,,,0.011
,,,,,(0.018)
X2,,,-0.174***,-0.174***,-0.185***


We can easily add custom statistics. For example, suppose we want to correct for multiple testing via the Romano-Wolf correction. We can do this as follows:

In [11]:
rwolf_res = pf.rwolf([fit1, fit2, fit3, fit4, fit5], param = "X1", seed = 123, reps = 9999)
rwolf_pvalues = np.round(rwolf_res.xs("RW Pr(>|t|)"), 3).to_list()

In [12]:
stargazer_table.add_line('Fixed Effects', [x._fixef for x in [fit1, fit2, fit3, fit4, fit5]], LineLocation.FOOTER_TOP)
stargazer_table.add_line('X1: Romano-Wolf P-Value', rwolf_pvalues, LineLocation.FOOTER_TOP)
stargazer_table

0,1,2,3,4,5
,,,,,
,Dependent variable: Y,Dependent variable: Y,Dependent variable: Y,Dependent variable: Y,Dependent variable: Y
,,,,,
,(1),(2),(3),(4),(5)
,,,,,
X1,-0.949***,-0.919***,-0.950***,-0.924***,-0.924***
,(0.069),(0.065),(0.067),(0.061),(0.061)
X1:X2,,,,,0.011
,,,,,(0.018)
X2,,,-0.174***,-0.174***,-0.185***
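
For intuition about what a step-down multiple-testing adjustment does to a set of p-values, here is a sketch of the simpler Holm procedure. This is deliberately a different method from Romano-Wolf: `pf.rwolf()` relies on resampling, whereas Holm only rescales the sorted p-values. The function `holm_adjust` is hypothetical and the input p-values are made up.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order.

    Illustrative only; this is NOT the Romano-Wolf correction.
    """
    m = len(pvalues)
    # Process hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is scaled by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # made-up p-values for three hypotheses
print([round(p, 3) for p in holm_adjust(raw)])  # [0.03, 0.06, 0.06]
```

A list of adjusted p-values like this one has the same shape as `rwolf_pvalues`, with one entry per model, which is what `add_line()` expects.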
