# Getting Started with PyFixest

In a first step, we load the module and some example data:

In [35]:
%load_ext autoreload
%autoreload 2

from pyfixest.estimation import feols, fepois
from pyfixest.utils import get_data



In [36]:
data = get_data()
data.head()

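|   | Y | Y2 | X1 | X2 | f1 | f2 | f3 | group_id | Z1 | Z2 |
|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 0 | NaN | -9.53419 | 1.0 | 0.457858 | 9.0 | 15.0 | 0.0 | 1.0 | 1.87489 | -1.18628 |
| 1 | 2.85399 | -3.203116 | NaN | -4.998406 | 8.0 | 6.0 | 9.0 | 11.0 | NaN | -5.396992 |
| 2 | 1.817729 | -3.353401 | 2.0 | 1.55848 | NaN | 11.0 | 0.0 | 18.0 | 0.483013 | 1.899145 |
| 3 | 5.179868 | 14.696121 | 2.0 | 1.560402 | 15.0 | 1.0 | 4.0 | 15.0 | 2.01508 | 1.664915 |
| 4 | 1.193511 | -6.568647 | 2.0 | -3.472232 | 20.0 | 19.0 | 9.0 | 5.0 | 2.698055 | -4.937519 |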


## OLS Estimation

We can estimate a fixed effects regression via the `feols()` function. `feols()` takes three main arguments: a two-sided model formula, the data, and, optionally, the type of inference.

In [37]:
fit = feols(fml="Y~X1 | f1", data=data, vcov="HC1")
type(fit)


The first part of the formula contains the dependent variable and "regular" covariates, while the second part contains fixed effects.
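Conceptually, absorbing a single fixed effect such as `f1` is equivalent to running OLS after subtracting group means from both the outcome and the covariates (the "within" transformation). The pure-Python sketch below illustrates this for one regressor and one fixed effect. The helper names `demean_by_group` and `within_slope` are hypothetical, and this is not how pyfixest is implemented internally:

```python
# Illustrative sketch of the "within" transformation behind a one-way fixed
# effects regression such as "Y ~ X1 | f1". Hypothetical helpers, not pyfixest code.
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract the group-specific mean from every observation."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """OLS slope of demeaned y on demeaned x (one regressor, one fixed effect)."""
    y_dm = demean_by_group(y, groups)
    x_dm = demean_by_group(x, groups)
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


# Toy data without noise: y = 2 * x + a group-specific intercept.
groups = ["a", "a", "a", "b", "b", "c", "c", "c"]
intercepts = {"a": 10.0, "b": -5.0, "c": 0.5}
x = [1.0, 2.0, 4.0, 0.0, 3.0, 2.0, 5.0, 7.0]
y = [2.0 * xi + intercepts[g] for xi, g in zip(x, groups)]

print(round(within_slope(y, x, groups), 10))  # 2.0
```

The group intercepts drop out entirely, so the slope is recovered exactly even though each group has a very different level.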

`feols()` returns an instance of the `Fixest` class, which supports a range of methods to inspect the results. For example, we can print a regression table via the `.summary()` method:

In [29]:
fit.summary()

###

Model:  OLS
Dep. var.:  Y
Fixed effects:  f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.502 |        0.055 |     9.064 |      0.000 |   0.393 |    0.610 |
---
RMSE: 1.418  Adj. R2: 0.074  Adj. R2 Within: 0.074


Alternatively, the `pyfixest.summarize` module contains a `summary()` function, which can be applied to a single fitted model or to a list of fitted models.

In [30]:
from pyfixest.summarize import summary

summary(fit)

###

Model:  OLS
Dep. var.:  Y
Fixed effects:  f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.502 |        0.055 |     9.064 |      0.000 |   0.393 |    0.610 |
---
RMSE: 1.418  Adj. R2: 0.074  Adj. R2 Within: 0.074


You can access individual elements of the summary via dedicated methods: `.tidy()` returns a "tidy" `pd.DataFrame`, `.coef()` returns the estimated parameters, and `.se()` returns the estimated standard errors. Other methods include `.pvalue()`, `.confint()` and `.tstat()`.

In [31]:
fit.coef()

Coefficient
X1    0.501596
Name: Estimate, dtype: float64

In [32]:
fit.se()

Coefficient
X1    0.055339
Name: Std. Error, dtype: float64
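The remaining columns of the summary table follow from the estimate and its standard error. The sketch below recomputes them from the two numbers printed above. It assumes a normal approximation for the critical value and the p-value. pyfixest may use a t distribution instead, which gives nearly identical numbers with roughly 1,000 observations:

```python
# Recompute t value, p-value and 95% confidence interval from coef and se.
# Assumption: normal approximation (pyfixest may use a t distribution).
from statistics import NormalDist

estimate = 0.501596  # fit.coef()["X1"] as printed above
std_error = 0.055339  # fit.se()["X1"] as printed above

t_value = estimate / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(t_value)))  # two-sided test of H0: beta = 0
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
ci_low = estimate - z_crit * std_error
ci_high = estimate + z_crit * std_error

print(f"t = {t_value:.3f}, p = {p_value:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
# t = 9.064, p = 0.000, 95% CI = [0.393, 0.610]
```

Up to rounding, this matches the `t value`, `2.5 %` and `97.5 %` columns of the `summary()` output for `X1`.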

## Standard Errors and Inference

Supported covariance types are `"iid"`, `"HC1"`, `"HC2"`, `"HC3"`, `"CRV1"` and `"CRV3"` (the latter two for one-way clustering only). Inference can be adjusted "on-the-fly" via the `.vcov()` method:

In [34]:
fit.vcov({"CRV1": "group_id"}).summary()

###

Model:  OLS
Dep. var.:  Y
Fixed effects:  f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.502 |        0.055 |     9.064 |      0.000 |   0.393 |    0.610 |
---
RMSE: 1.418  Adj. R2: 0.074  Adj. R2 Within: 0.074
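Switching the covariance type leaves the coefficients unchanged and only replaces the estimator of their sampling variance. The numpy sketch below shows the textbook "sandwich" forms of iid, HC1 and CRV1 standard errors for plain OLS without fixed effects. `ols_vcov` is a hypothetical helper. The small-sample factors follow one common convention, and pyfixest's exact adjustments, especially with fixed effects, may differ:

```python
# Textbook iid / HC1 / CRV1 variance estimators for OLS (no fixed effects).
# Hypothetical helper; small-sample factors follow a common convention.
import numpy as np


def ols_vcov(X, y, vcov="HC1", clusters=None):
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    if vcov == "iid":
        sigma2 = resid @ resid / (n - k)
        return beta, sigma2 * bread
    if vcov == "HC1":
        # meat = sum_i e_i^2 x_i x_i'
        meat = (X * resid[:, None] ** 2).T @ X
        return beta, n / (n - k) * bread @ meat @ bread
    if vcov == "CRV1":
        labels = np.unique(clusters)
        G = len(labels)
        meat = np.zeros((k, k))
        for g in labels:
            # Sum the scores within each cluster before taking outer products.
            score = X[clusters == g].T @ resid[clusters == g]
            meat += np.outer(score, score)
        adj = G / (G - 1) * (n - 1) / (n - k)
        return beta, adj * bread @ meat @ bread
    raise ValueError(f"unsupported vcov type: {vcov}")


# Simulated data with a cluster-level shock (25 clusters).
rng = np.random.default_rng(42)
n = 500
group_id = rng.integers(0, 25, size=n)
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=25)[group_id] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])

for spec in ["iid", "HC1", "CRV1"]:
    beta, V = ols_vcov(X, y, vcov=spec, clusters=group_id)
    print(spec, np.round(beta, 3), np.round(np.sqrt(np.diag(V)), 4))
```

A useful sanity check: if every observation forms its own cluster, the CRV1 formula collapses to HC1, because both the meat and the small-sample factor `n / (n - k)` coincide.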


It is also possible to run a wild (cluster) bootstrap after estimation via the [wildboottest module](https://github.com/s3alfisc/wildboottest), which is only available for Python versions below `3.11`:

In [53]:
fit2 = feols(fml="Y~ csw(X1, X2)", data=data, vcov={"CRV1": "group_id"})
fit2.wildboottest(param="X1", B=999)

| fml | param | t value | Pr(>\|t\|) |
|:--|:--|--:|--:|
| Y~X1 | X1 | 4.554272514041933 | 0.0 |
| Y~X1+X2 | X1 | 4.6932293717628735 | 0.0 |


Note that the wild bootstrap currently does not support fixed effects in the regression model; support for fixed effects is a work in progress.
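To make the idea concrete, here is a minimal numpy sketch of a *restricted* wild cluster bootstrap for the null hypothesis that the slope is zero, using Rademacher weights drawn once per cluster. This is a textbook version written for illustration. The functions `crv1_tstat` and `wild_cluster_bootstrap_pvalue` are hypothetical and not the API of the wildboottest package, which offers more options and a faster algorithm:

```python
# Restricted wild cluster bootstrap (WCR) for H0: b = 0 in y = a + b * x.
# Illustrative sketch only; not the wildboottest package.
import numpy as np


def crv1_tstat(X, y, clusters, j):
    """t-statistic of coefficient j with CRV1 (one-way clustered) errors."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    labels = np.unique(clusters)
    G = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        score = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(score, score)
    V = G / (G - 1) * (n - 1) / (n - k) * bread @ meat @ bread
    return beta[j] / np.sqrt(V[j, j])


def wild_cluster_bootstrap_pvalue(x, y, clusters, B=999, seed=0):
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    t_obs = crv1_tstat(X, y, clusters, j=1)
    # Impose the null (b = 0): the restricted model is intercept-only.
    fitted_r = np.full_like(y, y.mean())
    resid_r = y - fitted_r
    labels, idx = np.unique(clusters, return_inverse=True)
    t_boot = np.empty(B)
    for b in range(B):
        v = rng.choice([-1.0, 1.0], size=len(labels))  # one weight per cluster
        y_star = fitted_r + v[idx] * resid_r
        t_boot[b] = crv1_tstat(X, y_star, clusters, j=1)
    # Symmetric two-sided bootstrap p-value.
    return t_obs, float(np.mean(np.abs(t_boot) >= np.abs(t_obs)))


# Simulated data: 15 clusters and a clearly non-zero slope.
rng = np.random.default_rng(1)
n = 300
clusters = rng.integers(0, 15, size=n)
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=15)[clusters] + rng.normal(size=n)

t_obs, p_value = wild_cluster_bootstrap_pvalue(x, y, clusters, B=199, seed=0)
print(round(t_obs, 3), p_value)
```

Drawing the weight per cluster rather than per observation preserves the within-cluster correlation of the residuals in every bootstrap sample.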

## IV Estimation 

It is also possible to estimate instrumental variable models with *one* endogenous variable and (potentially multiple) instruments:

In [55]:
iv_fit = feols(fml="Y2~ 1 | f1 + f2 | X1 ~ Z1 + Z2", data=data)
iv_fit.summary()

###

Model:  IV
Dep. var.:  Y2
Fixed effects:  f1+f2
Inference:  {'CRV1': 'f1'}
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.386 |        0.322 |     1.199 |      0.240 |  -0.272 |    1.045 |
---


If the model does not contain any fixed effects, just drop the second part of the formula above:

In [57]:
feols(fml="Y~ 1 | X1 ~ Z1 + Z2", data=data).summary()

###

Model:  IV
Dep. var.:  Y
Fixed effects:  X1~Z2+Z1
Inference:  iid
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.382 |        0.122 |    19.584 |      0.000 |   2.143 |    2.620 |
| X1            |      0.178 |        0.110 |     1.617 |      0.106 |  -0.038 |    0.395 |
---
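As a reminder of what an IV estimate is, the numpy sketch below implements plain two-stage least squares (2SLS) with one endogenous regressor, two instruments and no fixed effects on simulated data. It is an illustration of the estimator, not pyfixest's implementation. Note that the standard errors of a naive second-stage regression are not valid 2SLS standard errors, so the sketch only computes point estimates:

```python
# Two-stage least squares (2SLS) sketch: one endogenous regressor, two
# instruments, no fixed effects. Simulated data; point estimates only.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)  # unobserved confounder
x1 = z1 + z2 + u + rng.normal(size=n)  # endogenous: correlated with u
y = 1.0 + 0.5 * x1 + u + rng.normal(size=n)  # true effect of x1 is 0.5

const = np.ones(n)
Z = np.column_stack([const, z1, z2])  # exogenous constant plus instruments
X = np.column_stack([const, x1])  # regressors

# First stage: project the regressors on the instruments.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Second stage: regress y on the first-stage fitted values.
beta_iv = np.linalg.lstsq(X_hat, y, rcond=None)[0]
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS slope: ", round(beta_ols[1], 3))
print("2SLS slope:", round(beta_iv[1], 3))
```

Because `x1` and `y` share the confounder `u`, the OLS slope is biased upward, while the 2SLS slope should be close to the true value of 0.5.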


IV estimation with multiple endogenous variables is currently not supported, and neither is the multiple estimation syntax for IV models. The general syntax is `"depvar ~ exog.vars | fixed effects | endog.vars ~ instruments"`.
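The three parts of the formula can be made explicit with a small parser. `split_formula` below is a hypothetical helper that only sketches how the parts are separated. It is not pyfixest's formula parser and does not validate its input (for example, it would not reject several endogenous variables):

```python
# Hypothetical helper (not part of pyfixest): split
# "depvar ~ exog.vars | fixed effects | endog.vars ~ instruments" into parts.
def split_formula(fml):
    parts = [p.strip() for p in fml.split("|")]
    depvar, exog = (s.strip() for s in parts[0].split("~"))
    fixef, endog, instruments = None, None, None
    rest = parts[1:]
    # An IV part, if present, is the last part and contains its own "~".
    if rest and "~" in rest[-1]:
        endog, inst = (s.strip() for s in rest.pop().split("~"))
        instruments = [v.strip() for v in inst.split("+")]
    # Whatever remains between the bars lists the fixed effects.
    if rest:
        fixef = [f.strip() for f in rest[0].split("+")]
    return {
        "depvar": depvar,
        "exog": [v.strip() for v in exog.split("+")],
        "fixef": fixef,
        "endog": endog,
        "instruments": instruments,
    }


print(split_formula("Y2~ 1 | f1 + f2 | X1 ~ Z1 + Z2"))
print(split_formula("Y~ 1 | X1 ~ Z1 + Z2"))  # IV without fixed effects
print(split_formula("Y~X1 | f1"))  # OLS with one fixed effect
```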

## Poisson Regression 

As of version `0.8.4` (not yet on PyPI), it is possible to estimate Poisson regressions via `fepois()`:

In [63]:
from pyfixest.utils import get_poisson_data

pois_data = get_poisson_data()
pois_fit = fepois(fml="Y~X1 | X2+X3+X4", data=pois_data, vcov={"CRV1": "X4"})
pois_fit.summary()

###

Model:  Poisson
Dep. var.:  Y
Fixed effects:  X2+X3+X4
Inference:  {'CRV1': 'X4'}
Observations:  1000

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.874 |        0.037 |    23.780 |      0.000 |   0.802 |    0.946 |
---
Deviance: 481157.824
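For intuition about what `fepois()` estimates and what the reported deviance measures, the numpy sketch below fits a Poisson regression *without* fixed effects by iteratively reweighted least squares (IRLS). IRLS is equivalent to Newton-Raphson for the log link. The helpers `poisson_irls` and `poisson_deviance` are hypothetical. The sketch ignores fixed effects, separation and other issues a production implementation has to handle:

```python
# Poisson regression via IRLS (no fixed effects) plus the Poisson deviance.
# Illustrative sketch; not pyfixest's fepois() implementation.
import numpy as np


def poisson_irls(X, y, tol=1e-10, max_iter=100):
    mu = y + 0.1  # start from the data, as GLM software commonly does
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu  # working response for the log link
        w = mu  # working weights for the log link
        XtW = X.T * w  # scales observation i by w_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def poisson_deviance(y, mu):
    # 2 * sum(y * log(y / mu) - (y - mu)); the log term is 0 when y == 0.
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2 * np.sum(y * np.log(ratio) - (y - mu))


rng = np.random.default_rng(0)
n = 5_000
x1 = rng.normal(size=n)
y = rng.poisson(np.exp(0.5 + 0.8 * x1)).astype(float)
X = np.column_stack([np.ones(n), x1])

beta = poisson_irls(X, y)
mu = np.exp(X @ beta)
print("coefficients:", np.round(beta, 3))
print("deviance:", round(poisson_deviance(y, mu), 3))
```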


## Multiple Estimation 

`PyFixest` supports a range of multiple estimation syntax: `sw`, `sw0`, `csw`, `csw0`, and multiple dependent variables. Note that every new call of `.feols()` attaches new regression results to the `Fixest` object.
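The stepwise operators expand one formula into several model specifications: `sw` enters each term on its own, `csw` adds terms cumulatively, and the `0` variants start with a specification that contains none of the terms. The pure-Python functions below are a hypothetical sketch of this expansion, not pyfixest's parser. They reproduce the model names that appear in the outputs of this notebook:

```python
# Hypothetical sketch (not pyfixest's parser) of the stepwise operators.
def sw(*terms):
    """Stepwise: each term enters on its own."""
    return list(terms)


def sw0(*terms):
    """Stepwise, starting with a specification without any of the terms."""
    return [""] + list(terms)


def csw(*terms):
    """Cumulative stepwise: terms are added one at a time."""
    return ["+".join(terms[: i + 1]) for i in range(len(terms))]


def csw0(*terms):
    """Cumulative stepwise, starting with an empty specification."""
    return [""] + csw(*terms)


# "Y~X1 | csw0(f1, f2)" expands into three models:
fmls = [f"Y~X1|{fe}" if fe else "Y~X1" for fe in csw0("f1", "f2")]
print(fmls)  # ['Y~X1', 'Y~X1|f1', 'Y~X1|f1+f2']

# "Y~ csw(X1, X2)" expands into two models:
print([f"Y~{rhs}" for rhs in csw("X1", "X2")])  # ['Y~X1', 'Y~X1+X2']
```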

In [67]:
multi_fit = feols(fml="Y~X1 | csw0(f1, f2)", data=data, vcov="HC1")
multi_fit.summary()

###

Model:  OLS
Dep. var.:  Y
Inference:  HC1
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.256 |        0.087 |    25.973 |      0.000 |   2.086 |    2.427 |
| X1            |      0.306 |        0.067 |     4.552 |      0.000 |   0.174 |    0.438 |
---
RMSE: 1.764  Adj. R2: 0.017  Adj. R2 Within: 0.017
###

Model:  OLS
Dep. var.:  Y
Fixed effects:  f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.314 |        0.055 |     5.748 |      0.000 |   0.207 |    0.422 |
---
RMSE: 1.421  Adj. R2: 0.029  Adj. R2 Within: 0.029
###

Model:  OLS
Dep. var.:  Y
Fixed effects:  f1+f2
Inference:  HC1
Observations:  997

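| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.325 |        0.047 |     6.875 |      0.000 |   0.233 |    0.418 |
---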

Alternatively, you can look at the estimation results via the `etable()` method:

In [68]:
multi_fit.etable()

| fml | Y~X1 | Y~X1 | Y~X1\|f1 | Y~X1\|f1+f2 |
|:--|--:|--:|--:|--:|
| Coefficient | Intercept | X1 | X1 | X1 |
| Estimate | 2.256 | 0.306 | 0.314 | 0.325 |
| Std. Error | 0.087 | 0.067 | 0.055 | 0.047 |
| t value | 25.973 | 4.552 | 5.748 | 6.875 |
| Pr(>\|t\|) | 0.0 | 0.0 | 0.0 | 0.0 |
| 2.5 % | 2.086 | 0.174 | 0.207 | 0.233 |
| 97.5 % | 2.427 | 0.438 | 0.422 | 0.418 |


If you are only interested in some parameters, e.g. "X1", you can use the following syntax:

In [69]:
multi_fit.etable().xs("X1", level=1, axis=1)

| fml | Y~X1 | Y~X1\|f1 | Y~X1\|f1+f2 |
|:--|--:|--:|--:|
| Estimate | 0.306 | 0.314 | 0.325 |
| Std. Error | 0.067 | 0.055 | 0.047 |
| t value | 4.552 | 5.748 | 6.875 |
| Pr(>\|t\|) | 0.0 | 0.0 | 0.0 |
| 2.5 % | 0.174 | 0.207 | 0.233 |
| 97.5 % | 0.438 | 0.422 | 0.418 |
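The `.xs()` call is plain pandas. It selects columns by the value of one level of a column `MultiIndex` and drops that level from the result. The sketch below rebuilds a small part of the `etable()` layout by hand, using values copied from the output above. Naming the levels `fml` and `Coefficient` is an assumption made for this hand-built copy. With named levels, `level=1` and `level="Coefficient"` are equivalent:

```python
# Hand-built DataFrame mimicking the etable() column layout above
# (two-level column MultiIndex: formula, coefficient).
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Y~X1", "Intercept"), ("Y~X1", "X1"), ("Y~X1|f1", "X1"), ("Y~X1|f1+f2", "X1")],
    names=["fml", "Coefficient"],
)
table = pd.DataFrame(
    [[2.256, 0.306, 0.314, 0.325], [0.087, 0.067, 0.055, 0.047]],
    index=["Estimate", "Std. Error"],
    columns=columns,
)

# Keep only columns whose "Coefficient" level equals "X1"; that level is dropped.
x1_only = table.xs("X1", level="Coefficient", axis=1)
print(x1_only)
```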


You can access an individual model by its name, i.e. its formula, via the `all_fitted_models` attribute.

In [70]:
multi_fit.all_fitted_models["Y~X1"].tidy()

| Coefficient | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|:--|--:|--:|--:|--:|--:|--:|
| Intercept | 2.256468 | 0.086878 | 25.972709 | 0.0 | 2.085982 | 2.426954 |
| X1 | 0.30617 | 0.067261 | 4.55199 | 6e-06 | 0.174181 | 0.43816 |


or equivalently via the `fetch_model` method:

In [71]:
multi_fit.fetch_model(0).tidy()

Model:  Y~X1


| Coefficient | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|:--|--:|--:|--:|--:|--:|--:|
| Intercept | 2.256468 | 0.086878 | 25.972709 | 0.0 | 2.085982 | 2.426954 |
| X1 | 0.30617 | 0.067261 | 4.55199 | 6e-06 | 0.174181 | 0.43816 |


Here, `0` simply fetches the first model stored in the `all_fitted_models` dictionary, `1` the second, and so on.
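Positional access works because Python dictionaries (since 3.7) preserve insertion order, so the *i*-th key is well defined. The sketch below is a hypothetical illustration of that mapping, not pyfixest's implementation. Placeholder strings stand in for the fitted model objects:

```python
# Hypothetical illustration: map a position onto an insertion-ordered dict.
all_fitted_models = {
    "Y~X1": "model 0",  # placeholders for fitted model objects
    "Y~X1|f1": "model 1",
    "Y~X1|f1+f2": "model 2",
}


def fetch_model(models, i):
    """Return (formula, model) for the i-th model in insertion order."""
    key = list(models)[i]
    return key, models[key]


print(fetch_model(all_fitted_models, 0))  # ('Y~X1', 'model 0')
print(fetch_model(all_fitted_models, 2))  # ('Y~X1|f1+f2', 'model 2')
```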

## TWFE Event Study

To conclude this introduction, we estimate an event study, following an example from the [LOST](https://lost-stats.github.io/Model_Estimation/Research_Design/event_study.html) library of statistical techniques.

In [73]:
import pandas as pd
import numpy as np
from pyfixest.estimation import feols

# Read in data
df = pd.read_csv(
    "https://raw.githubusercontent.com/LOST-STATS/LOST-STATS.github.io/master/Model_Estimation/Data/Event_Study_DiD/bacon_example.csv"
)

df["time_to_treat"] = (df["year"] - df["_nfd"]).fillna(0).astype(int)
df["time_to_treat"] = pd.Categorical(
    df.time_to_treat, np.sort(df.time_to_treat.unique())
)
df["treat"] = np.where(pd.isna(df["_nfd"]), 0, 1)

fml = "asmrs ~ i(time_to_treat, treat, ref = -1) + csw(pcinc, asmrh, cases) | stfips + year"
fit = feols(fml=fml, data=df, vcov={"CRV1": "stfips"})
fit.iplot(yintercept=0)

KeyError: "['C(time_to_treat[T.-1]):treat'] not found in axis"
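The term `i(time_to_treat, treat, ref = -1)` in the formula above stands for a set of event-time indicators. Each relative period other than the reference period `-1` gets an indicator, and that indicator is multiplied by the treatment dummy. Never-treated units have a missing `_nfd`, so the code above sets their `time_to_treat` to 0 via `fillna(0)`, but because `treat` is 0 for them, all their interaction terms are zero. The pure-Python sketch below builds these columns for toy data. `event_study_dummies` is a hypothetical helper that illustrates the design matrix. It is not pyfixest's implementation, and its dictionary keys are not pyfixest's column names:

```python
# Hypothetical sketch of the interaction dummies behind
# i(time_to_treat, treat, ref = -1): one column per relative period except ref.
def event_study_dummies(time_to_treat, treat, ref=-1):
    periods = sorted(set(time_to_treat) - {ref})
    return {
        k: [int(t == k) * d for t, d in zip(time_to_treat, treat)]
        for k in periods
    }


# Toy data: three treated state-years and one never-treated state-year,
# whose time_to_treat was filled with 0.
time_to_treat = [-2, -1, 0, 0]
treat = [1, 1, 1, 0]
dummies = event_study_dummies(time_to_treat, treat)
print(dummies)  # {-2: [1, 0, 0, 0], 0: [0, 0, 1, 0]}
```

Omitting the reference period normalizes the event-study coefficients relative to the year before treatment.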