# Getting Started with PyFixest

First, we load the module and some example data:

In [29]:
%load_ext autoreload
%autoreload 2

from pyfixest.estimation import feols, fepois
from pyfixest.summarize import summary, etable
from pyfixest.visualize import coefplot, iplot
from pyfixest.utils import get_data


In [30]:
data = get_data()
data.head()

Unnamed: 0,Y,Y2,X1,X2,f1,f2,f3,group_id,Z1,Z2
0,,-9.53419,1.0,0.457858,9.0,15.0,0.0,1.0,-2.294594,0.64469
1,3.221964,-2.835142,,-4.998406,8.0,6.0,9.0,11.0,,-3.706959
2,1.449755,-3.721375,1.0,1.55848,,11.0,0.0,18.0,0.543363,2.102921
3,4.44392,13.960173,0.0,1.560402,15.0,1.0,4.0,15.0,-1.593905,2.193772
4,0.825537,-6.93662,1.0,-3.472232,20.0,19.0,9.0,5.0,1.482674,-3.239027


## OLS Estimation

We can estimate a fixed effects regression via the `feols()` function. `feols()` has three arguments: a two-sided model formula, the data, and optionally, the type of inference.

In [31]:
fit = feols(fml="Y~X1 | f1", data=data, vcov="HC1")
type(fit)

pyfixest.feols.Feols

The first part of the formula contains the dependent variable and "regular" covariates, while the second part contains fixed effects.
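
To build intuition for what the fixed-effects part after `|` does, the sketch below demeans the outcome and the covariate within each level of a single fixed effect and then computes the slope. It is a plain-Python illustration on made-up numbers, not PyFixest's internal algorithm, and the helper names `demean_by_group` and `within_slope` are hypothetical.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract the group mean from every value (the 'within' transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """Slope of y on x after sweeping out one set of group fixed effects."""
    y_dm = demean_by_group(y, groups)
    x_dm = demean_by_group(x, groups)
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(b * b for b in x_dm)


# Made-up data: y = 2 * x + a group-specific intercept (10 for "a", -5 for "b").
groups = ["a", "a", "a", "b", "b"]
x = [1.0, 2.0, 3.0, 4.0, 6.0]
effect = {"a": 10.0, "b": -5.0}
y = [2.0 * xi + effect[g] for xi, g in zip(x, groups)]

slope = within_slope(y, x, groups)
print(round(slope, 6))  # the group intercepts are absorbed, leaving the slope of 2
```

With several fixed effects the demeaning has to be repeated across dimensions until it converges, which this sketch leaves out.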

`feols()` returns an instance of the `Feols` class, as the output of `type(fit)` shows.

To inspect the results, we can use a summary function or method:

In [32]:
fit.summary()

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.271 |        0.056 |     4.812 |      0.000 |   0.161 |    0.382 |
---
RMSE: 1.42  Adj. R2: 0.022  Adj. R2 Within: 0.022


Alternatively, the `summarize` module contains a `summary` function, which can be applied to a single regression model object
or to a list of regression model objects.

In [33]:
summary(fit)

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.271 |        0.056 |     4.812 |      0.000 |   0.161 |    0.382 |
---
RMSE: 1.42  Adj. R2: 0.022  Adj. R2 Within: 0.022


You can access individual elements of the summary via dedicated methods: `.tidy()` returns a "tidy" `pd.DataFrame`,
`.coef()` returns the estimated parameters, and `.se()` the estimated standard errors. Other methods include `.pvalue()`, `.confint()`
and `.tstat()`.

In [34]:
fit.coef()

Coefficient
X1    0.271127
Name: Estimate, dtype: float64

In [35]:
fit.se()

Coefficient
X1    0.056346
Name: Std. Error, dtype: float64
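
The other columns of the summary table follow from these two numbers. The sketch below recomputes the t value and a 95% confidence interval from the printed estimate and standard error. It assumes a standard normal reference distribution, which is an approximation chosen for this illustration; the library may use a t distribution, so the last digits can differ slightly.

```python
from statistics import NormalDist

# Values copied from the `fit.coef()` and `fit.se()` output above.
estimate = 0.271127
std_error = 0.056346

t_value = estimate / std_error

# Assumption: standard normal reference distribution. A t distribution
# with many degrees of freedom gives nearly the same numbers.
normal = NormalDist()
critical = normal.inv_cdf(0.975)
p_value = 2 * (1 - normal.cdf(abs(t_value)))
ci_lower = estimate - critical * std_error
ci_upper = estimate + critical * std_error

print(round(t_value, 3), round(ci_lower, 3), round(ci_upper, 3))
```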

## Standard Errors and Inference

Supported covariance types are "iid", "HC1", "HC2", "HC3", "CRV1" and "CRV3" (with up to two-way clustering). Inference can be adjusted "on-the-fly" via the
`.vcov()` method:

In [36]:
fit.vcov({"CRV1": "group_id + f1"}).summary()
fit.vcov({"CRV3": "group_id"}).summary()

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.271 |        0.029 |     9.279 |      0.000 |   0.210 |    0.333 |
---
RMSE: 1.42  Adj. R2: 0.022  Adj. R2 Within: 0.022
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  CRV3
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.271 |        0.044 |     6.137 |      0.000 |   0.178 |    0.364 |
---
RMSE: 1.42  Adj. R2: 0.022  Adj. R2 Within: 0.022
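
To see how these covariance types differ, the following sketch computes iid, HC1 and CRV1 variance estimates for a regression with a single regressor and no intercept. It is a hand-rolled illustration on made-up data, not PyFixest code; the function names are hypothetical and the small-sample correction used for CRV1 is one common choice that may not match the library's.

```python
def slope_and_residuals(y, x):
    """OLS slope for a single regressor without intercept, plus residuals."""
    beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    resid = [b - beta * a for a, b in zip(x, y)]
    return beta, resid


def variance_estimates(y, x, clusters):
    """Return iid, HC1 and CRV1 variance estimates for the slope."""
    n, k = len(y), 1
    beta, resid = slope_and_residuals(y, x)
    sxx = sum(a * a for a in x)

    # iid: one common error variance for all observations.
    iid = sum(e * e for e in resid) / (n - k) / sxx

    # HC1: observation-specific squared residuals, scaled by n / (n - k).
    hc1 = n / (n - k) * sum((a * e) ** 2 for a, e in zip(x, resid)) / sxx**2

    # CRV1: sum the scores x * e within each cluster before squaring.
    scores = {}
    for a, e, g in zip(x, resid, clusters):
        scores[g] = scores.get(g, 0.0) + a * e
    n_clusters = len(scores)
    # Assumption: one common small-sample correction; packages differ here.
    correction = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    crv1 = correction * sum(s * s for s in scores.values()) / sxx**2
    return {"iid": iid, "HC1": hc1, "CRV1": crv1}


# Made-up data with three clusters.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]
clusters = ["a", "a", "b", "b", "c", "c"]

for name, value in variance_estimates(y, x, clusters).items():
    print(name, round(value**0.5, 4))
```

The point estimate is identical in all three cases; only the estimated variance changes, which is why inference can be swapped after estimation.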


It is also possible to run a wild (cluster) bootstrap after estimation (via the [wildboottest module](https://github.com/s3alfisc/wildboottest)):

In [37]:
fit2 = feols(fml="Y~ X1", data=data, vcov={"CRV1": "group_id"})
fit2.wildboottest(param="X1", B=999)

param                            X1
t value                    6.267915
Pr(>|t|)                        0.0
bootstrap_type                   11
inference         CRV(['group_id'])
impose_null                    True
dtype: object

Note that the wild bootstrap currently does not support fixed effects in the regression model. Supporting fixed effects is work in progress.
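
The idea behind a wild cluster bootstrap with the null imposed can be sketched in a few lines. The example below is a simplified illustration, not the `wildboottest` implementation: it uses one regressor without intercept, Rademacher weights, simulated data and hypothetical function names, and it omits small-sample corrections (they would scale the observed and bootstrap statistics equally).

```python
import random


def cluster_t_stat(y, x, clusters):
    """t statistic for the slope of y on x (no intercept), cluster-robust."""
    sxx = sum(a * a for a in x)
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    scores = {}
    for a, b, g in zip(x, y, clusters):
        scores[g] = scores.get(g, 0.0) + a * (b - beta * a)
    variance = sum(s * s for s in scores.values()) / sxx**2
    return beta / variance**0.5


def wild_cluster_bootstrap_pvalue(y, x, clusters, B=999, seed=0):
    """Bootstrap p-value for H0: slope = 0, with the null imposed."""
    rng = random.Random(seed)
    t_obs = cluster_t_stat(y, x, clusters)
    labels = sorted(set(clusters))
    exceed = 0
    for _ in range(B):
        # One Rademacher weight (+1 or -1) per cluster.
        weight = {g: rng.choice((-1.0, 1.0)) for g in labels}
        # Under H0 the fitted value is 0, so the restricted residual is y itself.
        y_star = [weight[g] * b for b, g in zip(y, clusters)]
        if abs(cluster_t_stat(y_star, x, clusters)) >= abs(t_obs):
            exceed += 1
    return exceed / B


# Simulated data: 8 clusters with 5 observations each.
data_rng = random.Random(42)
clusters = [g for g in range(8) for _ in range(5)]
x = [data_rng.gauss(0, 1) for _ in clusters]
y = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]

p_value = wild_cluster_bootstrap_pvalue(y, x, clusters)
print(p_value)
```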

## IV Estimation 

It is also possible to estimate instrumental variable models with *one* endogenous variable and (potentially multiple) instruments:

In [38]:
iv_fit = feols(fml="Y2~ 1 | f1 + f2 | X1 ~ Z1 + Z2", data=data)
iv_fit.summary()

###

Estimation:  IV
Dep. var.: Y2, Fixed effects: f1+f2
Inference:  CRV1
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.301 |        0.332 |     0.906 |      0.372 |  -0.379 |    0.981 |
---


If the model does not contain any fixed effects, just drop the second part of the formula above:

In [39]:
feols(fml="Y~ 1 | X1 ~ Z1 + Z2", data=data).summary()

###

Estimation:  IV
Dep. var.: Y
Inference:  iid
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.198 |        0.121 |    18.133 |      0.000 |   1.960 |    2.436 |
| X1            |      0.366 |        0.107 |     3.437 |      0.001 |   0.157 |    0.575 |
---


The IV syntax is `depvar ~ exog.vars | fixed effects | endog.vars ~ instruments`. Neither multiple endogenous variables nor multiple estimation syntax is currently supported for IV estimation.
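
To make the roles of the formula parts concrete, here is a small sketch that splits such a formula string into its components. The function `split_formula` is a hypothetical helper written for this illustration; it is not PyFixest's parser and only handles plain formulas without multiple estimation syntax.

```python
def split_formula(fml):
    """Split a formula string into dependent variable, covariates, FE and IV parts."""
    parts = [p.strip() for p in fml.split("|")]
    depvar, exog = (s.strip() for s in parts[0].split("~"))
    result = {
        "depvar": depvar,
        "exog": [t.strip() for t in exog.split("+")],
        "fixef": [],
        "endog": [],
        "instruments": [],
    }
    for part in parts[1:]:
        if "~" in part:
            # A part containing "~" is the IV block: endog.vars ~ instruments.
            endog, instruments = part.split("~")
            result["endog"] = [t.strip() for t in endog.split("+")]
            result["instruments"] = [t.strip() for t in instruments.split("+")]
        else:
            # Otherwise the part lists the fixed effects.
            result["fixef"] = [t.strip() for t in part.split("+")]
    return result


print(split_formula("Y2~ 1 | f1 + f2 | X1 ~ Z1 + Z2"))
print(split_formula("Y~ 1 | X1 ~ Z1 + Z2"))
```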

## Poisson Regression 

As of version `0.8.4` (not yet on PyPI), it is possible to estimate Poisson regressions:

In [40]:
from pyfixest.utils import get_data

pois_data = get_data(model="Fepois")
pois_fit = fepois(fml="Y~X1 | f1+f2", data=pois_data, vcov={"CRV1": "group_id"})
pois_fit.summary()

###

Estimation:  Poisson
Dep. var.: Y, Fixed effects: f1+f2
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |     -0.008 |        0.045 |    -0.177 |      0.859 |  -0.097 |    0.081 |
---
Deviance: 1068.008


## Multiple Estimation 

`PyFixest` supports a range of multiple estimation functionality: `sw`, `sw0`, `csw`, `csw0`, and multiple dependent variables. If multiple estimation syntax is used,
`feols()` and `fepois()` return an instance of a `FixestMulti` object, which essentially consists of a dictionary of `Fepois` or `Feols` instances.

In [41]:
multi_fit = feols(fml="Y~X1 | csw0(f1, f2)", data=data, vcov="HC1")
multi_fit

<pyfixest.FixestMulti.FixestMulti at 0x29aa98edbd0>
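
The stepwise functions can be read as shorthand for a list of formulas. The sketch below expands them by hand, following the usual `fixest` convention: `sw` enters terms one at a time, `csw` adds them cumulatively, and the `0` variants prepend a model without any of the terms. The helpers `expand_stepwise` and `fixef_formulas` are hypothetical and only mimic the naming of the fitted models; they are not part of PyFixest.

```python
def expand_stepwise(kind, terms):
    """Return the list of term combinations implied by sw, sw0, csw or csw0."""
    if kind in ("sw", "sw0"):
        # Stepwise: each term enters on its own.
        combos = [[t] for t in terms]
    elif kind in ("csw", "csw0"):
        # Cumulative stepwise: terms are added one after another.
        combos = [terms[: i + 1] for i in range(len(terms))]
    else:
        raise ValueError(f"unknown stepwise function: {kind}")
    if kind.endswith("0"):
        # The '0' variants start with a model that has none of the terms.
        combos = [[]] + combos
    return combos


def fixef_formulas(base, kind, terms):
    """Attach each combination to `base` as a fixed-effects part."""
    return [
        base + ("|" + "+".join(combo) if combo else "")
        for combo in expand_stepwise(kind, terms)
    ]


print(fixef_formulas("Y~X1", "csw0", ["f1", "f2"]))
# ['Y~X1', 'Y~X1|f1', 'Y~X1|f1+f2']
```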

In [42]:
multi_fit.summary()

###

Estimation:  OLS
Dep. var.: Y
Inference:  HC1
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.221 |        0.088 |    25.231 |      0.000 |   2.048 |    2.394 |
| X1            |      0.343 |        0.067 |     5.090 |      0.000 |   0.211 |    0.476 |
---
RMSE: 1.765  Adj. R2: 0.022  Adj. R2 Within: 0.022
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.271 |        0.056 |     4.812 |      0.000 |   0.161 |    0.382 |
---
RMSE: 1.42  Adj. R2: 0.022  Adj. R2 Within: 0.022
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  HC1
Observations:

Alternatively, you can look at the estimation results via the `etable()` method:

In [43]:
multi_fit.etable()

fml,Y~X1,Y~X1,Y~X1|f1,Y~X1|f1+f2
Coefficient,Intercept,X1,X1,X1
Estimate,2.221,0.343,0.271,0.33
Std. Error,0.088,0.067,0.056,0.048
t value,25.231,5.09,4.812,6.945
Pr(>|t|),0.0,0.0,0.0,0.0
2.5 %,2.048,0.211,0.161,0.237
97.5 %,2.394,0.476,0.382,0.424


If you are only interested in some parameters, e.g. "X1", you can use the following syntax:

In [44]:
multi_fit.etable().xs("X1", level=1, axis=1)

fml,Y~X1,Y~X1|f1,Y~X1|f1+f2
Estimate,0.343,0.271,0.33
Std. Error,0.067,0.056,0.048
t value,5.09,4.812,6.945
Pr(>|t|),0.0,0.0,0.0
2.5 %,0.211,0.161,0.237
97.5 %,0.476,0.382,0.424


You can access an individual model by its name, i.e. its formula, via the `all_fitted_models` attribute.

In [45]:
multi_fit.all_fitted_models["Y~X1"].tidy()

Coefficient,Estimate,Std. Error,t value,Pr(>|t|),2.5 %,97.5 %
Intercept,2.220916,0.088024,25.230741,0.0,2.048182,2.39365
X1,0.343246,0.067438,5.089815,4.28343e-07,0.210909,0.475582


Equivalently, you can use the `fetch_model` method:

In [46]:
multi_fit.fetch_model(0).tidy()

Model:  Y~X1


Coefficient,Estimate,Std. Error,t value,Pr(>|t|),2.5 %,97.5 %
Intercept,2.220916,0.088024,25.230741,0.0,2.048182,2.39365
X1,0.343246,0.067438,5.089815,4.28343e-07,0.210909,0.475582


Here, `0` simply fetches the first model stored in the `all_fitted_models` dictionary, `1` the second etc.
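
Positional access works because Python dictionaries preserve insertion order. The sketch below mirrors that lookup with a plain dictionary; the placeholder strings stand in for fitted model objects, and `fetch_by_position` is a hypothetical helper, not the library's `fetch_model`.

```python
# Stand-in for `all_fitted_models`: formula strings map to fitted models.
# Plain strings replace the model objects to keep the sketch self-contained.
all_fitted_models = {
    "Y~X1": "model without fixed effects",
    "Y~X1|f1": "model with f1",
    "Y~X1|f1+f2": "model with f1 and f2",
}


def fetch_by_position(models, i):
    """Return the i-th (formula, model) pair in insertion order."""
    return list(models.items())[i]


formula, model = fetch_by_position(all_fitted_models, 0)
print(formula)  # Y~X1
```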

Objects of type `FixestMulti` come with a range of additional methods, such as `tidy()`, `coef()` and `vcov()`, which
essentially loop over the equivalent methods of all fitted models. For example, `FixestMulti.vcov()` updates inference for all
models stored in the `FixestMulti` object.

In [47]:
multi_fit.vcov("iid").summary()

###

Estimation:  OLS
Dep. var.: Y
Inference:  iid
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.221 |        0.089 |    24.909 |      0.000 |   2.046 |    2.396 |
| X1            |      0.343 |        0.069 |     4.987 |      0.000 |   0.208 |    0.478 |
---
RMSE: 1.765  Adj. R2: 0.022  Adj. R2 Within: 0.022
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  iid
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.271 |        0.056 |     4.837 |      0.000 |   0.161 |    0.381 |
---
RMSE: 1.42  Adj. R2: 0.022  Adj. R2 Within: 0.022
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  iid
Observations:

If you have estimated multiple models without multiple estimation syntax and still want to compare them, you can use the `etable()` function: 

In [48]:
from pyfixest.summarize import etable
etable([fit, fit2])

| Coefficient   | est1             | est2             |
|:--------------|:-----------------|:-----------------|
| X1            | 0.271*** (0.044) | 0.343*** (0.055) |
| Intercept     |                  | 2.221*** (0.062) |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
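
Each cell in this table combines the estimate, significance stars and the standard error in parentheses. A minimal sketch of that formatting, using the thresholds from the legend above, could look as follows; `significance_stars` and `format_cell` are hypothetical helper names, not functions from PyFixest.

```python
def significance_stars(p_value):
    """Map a p-value to the star notation used in the table legend."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


def format_cell(estimate, std_error, p_value, digits=3):
    """Render 'estimate + stars (standard error)' as one table cell."""
    stars = significance_stars(p_value)
    return f"{estimate:.{digits}f}{stars} ({std_error:.{digits}f})"


print(format_cell(0.271, 0.044, 0.0))  # 0.271*** (0.044)
```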


## Visualization 

`PyFixest` provides two functions to visualize the results of a regression: `coefplot` and `iplot`.

In [49]:
from lets_plot import *
LetsPlot.setup_html()

multi_fit.coefplot().show()

## TWFE Event Study

To conclude this intro, we estimate an event study based on an example from the [LOST](https://lost-stats.github.io/Model_Estimation/Research_Design/event_study.html) library of statistical techniques.

In [50]:
import pandas as pd
import numpy as np

# Read in data
df = pd.read_csv(
    "https://raw.githubusercontent.com/LOST-STATS/LOST-STATS.github.io/master/Model_Estimation/Data/Event_Study_DiD/bacon_example.csv"
)

df["time_to_treat"] = (df["year"] - df["_nfd"]).fillna(0).astype(int)
df["time_to_treat"] = pd.Categorical(
    df.time_to_treat, np.sort(df.time_to_treat.unique())
)
df["treat"] = np.where(pd.isna(df["_nfd"]), 0, 1)

fml = "asmrs ~ i(time_to_treat, treat, ref = -1) + csw(pcinc, asmrh, cases) | stfips + year"
fit = feols(fml=fml, data=df, vcov={"CRV1": "stfips"})
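
The term `i(time_to_treat, treat, ref = -1)` can be thought of as one dummy per relative period, interacted with the treatment indicator, with the reference period left out. The sketch below builds these columns by hand on a made-up panel. It reflects the usual `fixest` reading of `i()` and is an assumption about the resulting design matrix rather than PyFixest's actual code; `event_time_dummies` is a hypothetical helper.

```python
def event_time_dummies(time_to_treat, treat, ref=-1):
    """Build one dummy column per relative period, excluding `ref`."""
    periods = sorted(set(time_to_treat) - {ref})
    return {
        k: [int(t == k and d == 1) for t, d in zip(time_to_treat, treat)]
        for k in periods
    }


# Made-up panel: one treated unit observed at relative periods -2..1,
# followed by two rows of a never-treated unit whose missing
# time_to_treat was filled with 0.
time_to_treat = [-2, -1, 0, 1, 0, 0]
treat = [1, 1, 1, 1, 0, 0]

for period, column in event_time_dummies(time_to_treat, treat).items():
    print(period, column)
```

Because every dummy is multiplied by `treat`, the never-treated rows are zero in all columns, so filling their missing `time_to_treat` with 0 does not assign them to period 0.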

In [51]:
plot = fit.iplot(
    yintercept=0,
    figsize=(2000, 600),
    coord_flip=False,
    title="Event Study Plot",
    rotate_xticks=90,
)
plot.show()