# Getting Started with PyFixest

First, we load the module and some example data:

In [5]:
%load_ext autoreload
%autoreload 2

from pyfixest.estimation import feols, fepois
from pyfixest.summarize import summary
from pyfixest.visualize import coefplot, iplot
from pyfixest.utils import get_data



In [6]:
data = get_data()
data.head()

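|    |        Y |        Y2 |   X1 |        X2 |   f1 |   f2 |   f3 |   group_id |       Z1 |        Z2 |
|---:|---------:|----------:|-----:|----------:|-----:|-----:|-----:|-----------:|---------:|----------:|
|  0 |      NaN |  -9.53419 |  1.0 |  0.457858 |  9.0 | 15.0 |  0.0 |        1.0 | 1.144217 |  0.966141 |
|  1 | 2.486016 |  -3.57109 |  NaN | -4.998406 |  8.0 |  6.0 |  9.0 |       11.0 |      NaN | -5.889014 |
|  2 | 1.817729 | -3.353401 |  2.0 |   1.55848 |  NaN | 11.0 |  0.0 |       18.0 | 2.033647 |  2.392247 |
|  3 | 4.811894 | 14.328147 |  1.0 |  1.560402 | 15.0 |  1.0 |  4.0 |       15.0 | 1.870371 |  1.083727 |
|  4 | 0.457563 | -7.304594 |  0.0 | -3.472232 | 20.0 | 19.0 |  9.0 |        5.0 |  1.87924 | -3.319665 |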


## OLS Estimation

We can estimate a fixed effects regression via the `feols()` function. `feols()` has three arguments: a two-sided model formula, the data, and optionally, the type of inference.

In [7]:
fit = feols(fml="Y~X1 | f1", data=data, vcov="HC1")
type(fit)

pyfixest.feols.Feols

The first part of the formula contains the dependent variable and "regular" covariates, while the second part contains fixed effects.
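To make the role of the fixed effects part more concrete, here is a minimal sketch of the within transformation for a single fixed effect and a single covariate: both variables are demeaned by group before the slope is computed. This is a plain-Python illustration on toy data, not PyFixest's implementation; the helper names `demean_by_group` and `within_slope` are made up for this example.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract the group mean from every value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """Slope of y on x after removing one set of group fixed effects."""
    y_dm = demean_by_group(y, groups)
    x_dm = demean_by_group(x, groups)
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


# Toy data: y = 2 * x + a group-specific intercept (the "fixed effect").
groups = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
group_effect = {"a": 10.0, "b": -5.0}
y = [2.0 * xi + group_effect[g] for xi, g in zip(x, groups)]

slope = within_slope(y, x, groups)
print(slope)
```

Because the group intercepts are removed by the demeaning step, the slope is recovered without estimating a coefficient for each group.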

`feols()` returns an instance of the `Feols` class.

To inspect the results, we can use a summary function or method:

In [8]:
fit.summary()

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.402 |        0.058 |     6.964 |      0.000 |   0.289 |    0.516 |
---
RMSE: 1.422  Adj. R2: 0.046  Adj. R2 Within: 0.046


Alternatively, the `summarize` module provides a `summary` function, which can be applied to a single regression model object
or to a list of regression model objects.

In [9]:
summary(fit)

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.402 |        0.058 |     6.964 |      0.000 |   0.289 |    0.516 |
---
RMSE: 1.422  Adj. R2: 0.046  Adj. R2 Within: 0.046


You can access individual elements of the summary via dedicated methods: `.tidy()` returns a "tidy" `pd.DataFrame`,
`.coef()` returns the estimated parameters, and `.se()` the estimated standard errors. Other methods include `.pvalue()`, `.confint()`
and `.tstat()`.

In [10]:
fit.coef()

Coefficient
X1    0.402235
Name: Estimate, dtype: float64

In [11]:
fit.se()

Coefficient
X1    0.057756
Name: Std. Error, dtype: float64
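
The quantities returned by these methods are linked in the usual way: the t statistic is the estimate divided by its standard error, and the confidence interval is the estimate plus or minus a critical value times the standard error. The sketch below recomputes them from the two numbers printed above. It uses a normal approximation from the standard library, which is an assumption made for simplicity; a t distribution would give slightly wider intervals, so the last digit can differ from the summary table.

```python
from statistics import NormalDist

# Values copied from the fit.coef() and fit.se() outputs above.
estimate = 0.402235
std_error = 0.057756

# t statistic: estimate relative to its standard error.
t_value = estimate / std_error

# Two-sided p-value and 95% interval under a normal approximation (assumption).
normal = NormalDist()
p_value = 2.0 * (1.0 - normal.cdf(abs(t_value)))
critical = normal.inv_cdf(0.975)
ci_low = estimate - critical * std_error
ci_high = estimate + critical * std_error

print(round(t_value, 3), round(ci_low, 3), round(ci_high, 3))
```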

## Standard Errors and Inference

Supported covariance types are "iid", "HC1" to "HC3", "CRV1" and "CRV3" (one-way clustering). Inference can be adjusted on the fly via the
`.vcov()` method:

In [12]:
fit.vcov({"CRV1": "group_id"}).summary()
fit.vcov({"CRV3": "group_id"}).summary()

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.402 |        0.064 |     6.315 |      0.000 |   0.268 |    0.536 |
---
RMSE: 1.422  Adj. R2: 0.046  Adj. R2 Within: 0.046
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  CRV3
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.402 |        0.068 |     5.884 |      0.000 |   0.259 |    0.546 |
---
RMSE: 1.422  Adj. R2: 0.046  Adj. R2 Within: 0.046
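
The covariance types differ only in how the variance of the estimator is computed; the point estimate stays the same. As a rough illustration, the sketch below computes "iid", "HC1" and "CRV1" style standard errors for the slope of a simple regression with an intercept. The function name `slope_standard_errors` and the toy data are made up for this example, and the small-sample corrections shown are common textbook choices that may differ from the ones PyFixest applies.

```python
from collections import defaultdict


def slope_standard_errors(y, x, clusters):
    """Slope of y on x (with intercept) and three standard error variants."""
    n, k = len(y), 2  # k counts the intercept and the slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    x_dm = [xi - x_bar for xi in x]
    y_dm = [yi - y_bar for yi in y]
    sxx = sum(v * v for v in x_dm)
    beta = sum(a * b for a, b in zip(x_dm, y_dm)) / sxx
    resid = [b - beta * a for a, b in zip(x_dm, y_dm)]

    # iid: one common error variance for all observations.
    se_iid = (sum(u * u for u in resid) / (n - k) / sxx) ** 0.5

    # HC1: observation-level squared scores, scaled by n / (n - k).
    meat_hc = sum((a * u) ** 2 for a, u in zip(x_dm, resid))
    se_hc1 = (n / (n - k) * meat_hc) ** 0.5 / sxx

    # CRV1: scores are summed within each cluster before squaring.
    scores = defaultdict(float)
    for a, u, g in zip(x_dm, resid, clusters):
        scores[g] += a * u
    n_clusters = len(scores)
    meat_cr = sum(s * s for s in scores.values())
    correction = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    se_crv1 = (correction * meat_cr) ** 0.5 / sxx

    return {"beta": beta, "iid": se_iid, "HC1": se_hc1, "CRV1": se_crv1}


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9, 7.1, 8.2]
clusters = [0, 0, 1, 1, 2, 2, 3, 3]
result = slope_standard_errors(y, x, clusters)
print(result)
```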


It is also possible to run a wild (cluster) bootstrap after estimation via the [wildboottest module](https://github.com/s3alfisc/wildboottest). This is only available for Python
versions below `3.11`:

In [13]:
fit2 = feols(fml="Y~ X1", data=data, vcov={"CRV1": "group_id"})
fit2.wildboottest(param="X1", B=999)

param                X1
t value           5.199
Pr(>|t|)            0.0
bootstrap_type       11
impose_null        True
dtype: object

Note that the wild bootstrap currently does not support fixed effects in the regression model. Supporting fixed effects is work in progress.
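
For intuition, the following is a simplified sketch of a wild cluster bootstrap with Rademacher weights and the null hypothesis imposed, for the slope of a simple regression. It is not the `wildboottest` implementation: the function names, the simulated data and the omission of small-sample corrections are all assumptions made to keep the example short.

```python
import random


def slope_and_cluster_t(y, x, clusters):
    """Slope of y on x (with intercept) and a cluster-robust t statistic."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    x_dm = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in x_dm)
    beta = sum(a * (yi - y_bar) for a, yi in zip(x_dm, y)) / sxx
    resid = [(yi - y_bar) - beta * a for a, yi in zip(x_dm, y)]
    scores = {}
    for a, u, g in zip(x_dm, resid, clusters):
        scores[g] = scores.get(g, 0.0) + a * u
    variance = sum(s * s for s in scores.values()) / sxx**2
    return beta, beta / variance**0.5


def wild_cluster_bootstrap(y, x, clusters, B=199, seed=0):
    """Bootstrap p-value for H0: slope = 0, using Rademacher cluster weights."""
    rng = random.Random(seed)
    _, t_obs = slope_and_cluster_t(y, x, clusters)
    # Under H0 the restricted model is intercept-only.
    y_bar = sum(y) / len(y)
    resid0 = [yi - y_bar for yi in y]
    groups = sorted(set(clusters))
    exceed = 0
    for _ in range(B):
        # One +1/-1 draw per cluster, shared by all of its observations.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [y_bar + w[g] * u for u, g in zip(resid0, clusters)]
        _, t_star = slope_and_cluster_t(y_star, x, clusters)
        if abs(t_star) >= abs(t_obs):
            exceed += 1
    return t_obs, exceed / B


# Simulated data: 20 clusters of 10 observations with a cluster-level shock.
rng = random.Random(42)
clusters = [g for g in range(20) for _ in range(10)]
shock = {g: rng.gauss(0, 1) for g in range(20)}
x = [rng.gauss(0, 1) for _ in clusters]
y = [1.0 + 0.5 * xi + shock[g] + rng.gauss(0, 1) for xi, g in zip(x, clusters)]

t_obs, p_value = wild_cluster_bootstrap(y, x, clusters)
print(t_obs, p_value)
```

The p-value is the share of bootstrap t statistics that are at least as large in absolute value as the observed one.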

## IV Estimation 

It is also possible to estimate instrumental variable models with *one* endogenous variable and (potentially multiple) instruments:

In [14]:
iv_fit = feols(fml="Y2~ 1 | f1 + f2 | X1 ~ Z1 + Z2", data=data)
iv_fit.summary()

###

Estimation:  IV
Dep. var.: Y2, Fixed effects: f1+f2
Inference:  CRV1
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.725 |        0.290 |     2.502 |      0.018 |   0.132 |    1.317 |
---


If the model does not contain any fixed effects, just drop the second part of the formula above:

In [15]:
feols(fml="Y~ 1 | X1 ~ Z1 + Z2", data=data).summary()

###

Estimation:  IV
Dep. var.: Y
Inference:  iid
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.228 |        0.127 |    17.543 |      0.000 |   1.979 |    2.478 |
| X1            |      0.336 |        0.114 |     2.952 |      0.003 |   0.113 |    0.559 |
---


IV estimation with multiple endogenous variables is currently not supported, nor is IV estimation combined with multiple estimation syntax. The formula syntax is `depvar ~ exog.vars | fixed effects | endog.vars ~ instruments`.
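
To illustrate why an instrument helps, the sketch below compares the OLS slope with the simple IV estimator `cov(z, y) / cov(z, x)` for one endogenous variable and one instrument. The data are simulated with a confounder that biases OLS; the true coefficient of 2.0 and all variable names are assumptions of this toy setup, and the example does not cover fixed effects or multiple instruments.

```python
import random


def cov(a, b):
    """Sample covariance (divides by n, which cancels in the ratios below)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
c = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]  # endogenous regressor
y = [2.0 * xi + 3.0 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

# OLS picks up the confounder through x and overstates the effect.
beta_ols = cov(x, y) / cov(x, x)

# IV only uses the variation in x that is driven by z.
beta_iv = cov(z, y) / cov(z, x)

print(round(beta_ols, 2), round(beta_iv, 2))
```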

## Poisson Regression 

As of version `0.8.4` (not yet on PyPI), it is possible to estimate Poisson regressions:

In [16]:
from pyfixest.utils import get_data

pois_data = get_data(model="Fepois")
pois_fit = fepois(fml="Y~X1 | f1+f2", data=pois_data, vcov={"CRV1": "group_id"})
pois_fit.summary()

###

Estimation:  Poisson
Dep. var.: Y, Fixed effects: f1+f2
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |     -0.029 |        0.041 |    -0.713 |      0.476 |  -0.109 |    0.051 |
---
Deviance: 1067.556


## Multiple Estimation 

`PyFixest` supports a range of multiple estimation functionality: `sw`, `sw0`, `csw`, `csw0`, and multiple dependent variables. If multiple estimation syntax is used,
`feols()` and `fepois()` return an instance of a `FixestMulti` object, which essentially consists of a dictionary of `Feols` or `Fepois` instances.

In [17]:
multi_fit = feols(fml="Y~X1 | csw0(f1, f2)", data=data, vcov="HC1")
multi_fit

<pyfixest.FixestMulti.FixestMulti at 0x2839e8f6350>
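
The stepwise operators can be read as shorthand for a list of formulas. The sketch below shows one way to expand them for the fixed effects part: `sw` adds one term at a time, `csw` adds terms cumulatively, and the `0` variants also include the model without any of the terms. The helpers `expand_stepwise` and `build_formulas` are hypothetical and only mimic the naming of the fitted models; they are not part of PyFixest.

```python
def expand_stepwise(kind, terms):
    """Expand sw / sw0 / csw / csw0 into a list of '+'-joined term strings."""
    if kind not in {"sw", "sw0", "csw", "csw0"}:
        raise ValueError(f"unknown stepwise operator: {kind}")
    cumulative = kind.startswith("csw")
    # The '0' variants start with the empty specification.
    combos = [[]] if kind.endswith("0") else []
    for i, term in enumerate(terms):
        combos.append(terms[: i + 1] if cumulative else [term])
    return ["+".join(combo) for combo in combos]


def build_formulas(base, kind, terms):
    """Attach each expanded fixed effects spec to the base formula."""
    return [
        base if not fixef else f"{base}|{fixef}"
        for fixef in expand_stepwise(kind, terms)
    ]


formulas = build_formulas("Y~X1", "csw0", ["f1", "f2"])
print(formulas)
```

For `csw0(f1, f2)` this yields the three specifications `Y~X1`, `Y~X1|f1` and `Y~X1|f1+f2`.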

In [18]:
multi_fit.summary()

###

Estimation:  OLS
Dep. var.: Y
Inference:  HC1
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.195 |        0.090 |    24.297 |      0.000 |   2.018 |    2.372 |
| X1            |      0.369 |        0.071 |     5.196 |      0.000 |   0.230 |    0.508 |
---
RMSE: 1.765  Adj. R2: 0.025  Adj. R2 Within: 0.025
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.402 |        0.058 |     6.964 |      0.000 |   0.289 |    0.516 |
---
RMSE: 1.422  Adj. R2: 0.046  Adj. R2 Within: 0.046
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  HC1
Observations

Alternatively, you can look at the estimation results via the `etable()` method:

In [19]:
multi_fit.etable()

fml,Y~X1,Y~X1,Y~X1|f1,Y~X1|f1+f2
Coefficient,Intercept,X1,X1,X1
Estimate,2.195,0.369,0.402,0.364
Std. Error,0.09,0.071,0.058,0.049
t value,24.297,5.196,6.964,7.4
Pr(>|t|),0.0,0.0,0.0,0.0
2.5 %,2.018,0.23,0.289,0.268
97.5 %,2.372,0.508,0.516,0.461


If you are only interested in some parameters, e.g. "X1", you can use the following syntax:

In [20]:
multi_fit.etable().xs("X1", level=1, axis=1)

fml,Y~X1,Y~X1|f1,Y~X1|f1+f2
Estimate,0.369,0.402,0.364
Std. Error,0.071,0.058,0.049
t value,5.196,6.964,7.4
Pr(>|t|),0.0,0.0,0.0
2.5 %,0.23,0.289,0.268
97.5 %,0.508,0.516,0.461


You can access an individual model by its name, i.e. its formula, via the `all_fitted_models` attribute.

In [21]:
multi_fit.all_fitted_models["Y~X1"].tidy()

| Coefficient   |   Estimate |   Std. Error |   t value |     Pr(>|t|) |    2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-------------:|---------:|---------:|
| Intercept     |   2.194854 |     0.090333 | 24.297498 |          0.0 |  2.01759 | 2.372118 |
| X1            |   0.369081 |      0.07103 |  5.196115 | 2.467969e-07 | 0.229695 | 0.508467 |


or equivalently via the `fetch_model` method:

In [22]:
multi_fit.fetch_model(0).tidy()

Model:  Y~X1


| Coefficient   |   Estimate |   Std. Error |   t value |     Pr(>|t|) |    2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-------------:|---------:|---------:|
| Intercept     |   2.194854 |     0.090333 | 24.297498 |          0.0 |  2.01759 | 2.372118 |
| X1            |   0.369081 |      0.07103 |  5.196115 | 2.467969e-07 | 0.229695 | 0.508467 |


Here, `0` simply fetches the first model stored in the `all_fitted_models` dictionary, `1` the second, and so on.
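
Positional access works because Python dictionaries preserve insertion order. The sketch below mimics both access styles with a plain dictionary; the helper `fetch_by_position` and the placeholder values are hypothetical and only stand in for the fitted model objects.

```python
# Placeholder strings stand in for fitted model objects.
fitted_models = {
    "Y~X1": "model without fixed effects",
    "Y~X1|f1": "model with f1",
    "Y~X1|f1+f2": "model with f1 and f2",
}


def fetch_by_position(models, position):
    """Return the (name, model) pair stored at the given position."""
    name = list(models.keys())[position]
    return name, models[name]


by_name = fitted_models["Y~X1"]
name, by_position = fetch_by_position(fitted_models, 0)
print(name, by_position is by_name)
```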

Objects of type `FixestMulti` come with a range of additional methods: `tidy()`, `coef()`, `vcov()` etc., which
essentially loop over the equivalent methods of all fitted models. E.g. `FixestMulti.vcov()` updates inference for all
models stored in the `FixestMulti` object.

In [23]:
multi_fit.vcov("iid").summary()

###

Estimation:  OLS
Dep. var.: Y
Inference:  iid
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.195 |        0.090 |    24.357 |      0.000 |   2.018 |    2.372 |
| X1            |      0.369 |        0.070 |     5.239 |      0.000 |   0.231 |    0.507 |
---
RMSE: 1.765  Adj. R2: 0.025  Adj. R2 Within: 0.025
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  iid
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.402 |        0.058 |     6.995 |      0.000 |   0.289 |    0.515 |
---
RMSE: 1.422  Adj. R2: 0.046  Adj. R2 Within: 0.046
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  iid
Observations

## Visualization 

`PyFixest` provides two functions to visualize the results of a regression: `coefplot` and `iplot`.

In [24]:
from lets_plot import *
LetsPlot.setup_html()

multi_fit.coefplot().show()


## TWFE Event Study

To conclude this intro, we estimate an event study based on an example from the [LOST](https://lost-stats.github.io/Model_Estimation/Research_Design/event_study.html) library of statistical techniques.

In [None]:
import pandas as pd
import numpy as np

# Read in data
df = pd.read_csv(
    "https://raw.githubusercontent.com/LOST-STATS/LOST-STATS.github.io/master/Model_Estimation/Data/Event_Study_DiD/bacon_example.csv"
)

df["time_to_treat"] = (df["year"] - df["_nfd"]).fillna(0).astype(int)
df["time_to_treat"] = pd.Categorical(
    df.time_to_treat, np.sort(df.time_to_treat.unique())
)
df["treat"] = np.where(pd.isna(df["_nfd"]), 0, 1)

fml = "asmrs ~ i(time_to_treat, treat, ref = -1) + csw(pcinc, asmrh, cases) | stfips + year"
fit = feols(fml=fml, data=df, vcov={"CRV1": "stfips"})


In [None]:
plot = fit.iplot(yintercept=0, figsize=(800, 400))
plot.show()
