## Fixed Effects Estimation


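We estimate a fixed effects model, in which each county $i$ has its own time-invariant intercept $\alpha_i$ that may be correlated with the regressors $x_{it}$: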

$$
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}
$$


### Fixed Effects Model

We regress the log crime rate (`lcrmrte`, crimes committed per person) on the log probability of arrest (`lprbarr`), the log probability of conviction (`lprbconv`), the log probability of a prison sentence (`lprbpris`), the log average prison sentence length (`lavgsen`), and the log of police per capita (`lpolpc`).

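Define the time-demeaned variables $\tilde{y}_{it} = y_{it} - \bar{y}_i$ and $\tilde{X}_{it} = X_{it} - \bar{X}_i$, where $\bar{y}_i$ and $\bar{X}_i$ are county $i$'s averages over time. Demeaning eliminates $\alpha_i$, and the fixed effects (within) estimator is OLS on the demeaned data: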

$$
\hat{\beta}_{FE} = \left ( \sum^{n}_{i=1}\sum_{t} \tilde{X}_{it} \tilde{X}_{it}' \right )^{-1} \left ( \sum^{n}_{i=1}\sum_{t} \tilde{X}_{it} \tilde{y}_{it} \right )
$$
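A minimal sketch of this formula on simulated data (the panel size, the true coefficients and all variable names below are assumptions for illustration, not the Cornwell data). The regressors are deliberately correlated with $\alpha_i$, so pooled OLS should be biased while the within estimator should recover $\beta$:

```python
import numpy as np

# Hypothetical simulated panel: n units observed for T periods (not the Cornwell data)
rng = np.random.default_rng(0)
n, T, k = 500, 7, 2
beta_true = np.array([-0.4, 0.3])

alpha = rng.normal(size=n)                       # unit effects alpha_i
# Regressors correlated with alpha_i, which biases pooled OLS
X = rng.normal(size=(n, T, k)) + alpha[:, None, None]
eps = rng.normal(scale=0.5, size=(n, T))
y = X @ beta_true + alpha[:, None] + eps         # shape (n, T)

# Within transformation: subtract each unit's time average
X_tilde = X - X.mean(axis=1, keepdims=True)
y_tilde = y - y.mean(axis=1, keepdims=True)

# Stack to (n*T, k) and (n*T,) and apply the FE formula
Xs = X_tilde.reshape(-1, k)
ys = y_tilde.reshape(-1)
beta_fe = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

# Pooled OLS on the raw data (with an intercept) for comparison
Xp = np.column_stack([np.ones(n * T), X.reshape(-1, k)])
beta_pooled = np.linalg.lstsq(Xp, y.reshape(-1), rcond=None)[0][1:]

print("FE:    ", beta_fe)
print("Pooled:", beta_pooled)
```

Demeaning removes $\alpha_i$ exactly, which is why the within estimate does not depend on how $\alpha_i$ relates to the regressors.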

### Import data

```python
import pandas as pd
import numpy as np
import wooldridge as woo
import statsmodels.api as sm
from linearmodels.panel import PooledOLS
from linearmodels.panel import PanelOLS
import linearmodels as plm
```

```python
# Load the Cornwell crime panel and index it by (county, year)
df = pd.read_stata("cornwell.dta")
df = df.set_index(["county", "year"], drop=True)
df
```

| county | year | crmrte | prbarr | prbconv | prbpris | avgsen | polpc | density | taxpc | west | central | ... | lpctymle | lpctmin | clcrmrte | clprbarr | clprbcon | clprbpri | clavgsen | clpolpc | cltaxpc | clmix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 81 | 0.039885 | 0.289696 | 0.402062 | 0.472222 | 5.61 | 0.001787 | 2.307159 | 25.697630 | 0 | 1 | ... | -2.433870 | 3.006608 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 82 | 0.038345 | 0.338111 | 0.433005 | 0.506993 | 5.59 | 0.001767 | 2.330254 | 24.874252 | 0 | 1 | ... | -2.449038 | 3.006608 | -0.039376 | 0.154542 | 0.074143 | 0.071048 | -0.003571 | -0.011364 | -0.032565 | 0.030857 |
| 1 | 83 | 0.030305 | 0.330449 | 0.525703 | 0.479705 | 5.80 | 0.001836 | 2.341801 | 26.451443 | 0 | 1 | ... | -2.464036 | 3.006608 | -0.235316 | -0.022922 | 0.193987 | -0.055326 | 0.036879 | 0.038413 | 0.061477 | -0.244732 |
| 1 | 84 | 0.034726 | 0.362525 | 0.604706 | 0.520104 | 6.89 | 0.001886 | 2.346420 | 26.842348 | 0 | 1 | ... | -2.478925 | 3.006608 | 0.136180 | 0.092641 | 0.140006 | 0.080857 | 0.172213 | 0.026930 | 0.014670 | -0.027331 |
| 1 | 85 | 0.036573 | 0.325395 | 0.578723 | 0.497059 | 6.55 | 0.001924 | 2.364896 | 28.140337 | 0 | 1 | ... | -2.497306 | 3.006608 | 0.051825 | -0.108054 | -0.043918 | -0.045320 | -0.050606 | 0.020199 | 0.047223 | 0.172125 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 197 | 83 | 0.015575 | 0.226667 | 0.480392 | 0.428571 | 7.77 | 0.001073 | 0.869048 | 18.905853 | 1 | 0 | ... | -2.538060 | 1.697597 | -0.148666 | -0.010969 | -0.127018 | 0.164303 | 0.157158 | 0.149330 | 0.070461 | 0.020250 |
| 197 | 84 | 0.013662 | 0.204188 | 1.410260 | 0.372727 | 10.11 | 0.001109 | 0.872024 | 22.704754 | 1 | 0 | ... | -2.548068 | 1.697597 | -0.131037 | -0.104441 | 1.076927 | -0.139610 | 0.263255 | 0.032795 | 0.183103 | 0.026842 |
| 197 | 85 | 0.013086 | 0.180556 | 0.830769 | 0.333333 | 5.96 | 0.001054 | 0.875000 | 24.123611 | 1 | 0 | ... | -2.561072 | 1.697597 | -0.043091 | -0.123000 | -0.529178 | -0.111704 | -0.528454 | -0.050473 | 0.060617 | -0.366374 |
| 197 | 86 | 0.012874 | 0.112676 | 2.250000 | 0.244444 | 7.68 | 0.001088 | 0.880952 | 24.981979 | 1 | 0 | ... | -2.580968 | 1.697597 | -0.016311 | -0.471524 | 0.996334 | -0.310156 | 0.253549 | 0.031580 | 0.034964 | -0.067911 |


```python
# Compute the time average of every variable FOR EACH COUNTY
df_mean = df.groupby('county').mean()
df_mean  # one row per county, one column per variable (its time average)
```

| county | crmrte | prbarr | prbconv | prbpris | avgsen | polpc | density | taxpc | west | central | ... | lpctymle | lpctmin | clcrmrte | clprbarr | clprbcon | clprbpri | clavgsen | clpolpc | cltaxpc | clmix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.035741 | 0.324358 | 0.512017 | 0.478874 | 6.292857 | 0.001846 | 2.356978 | 27.534382 | 0.0 | 1.0 | ... | -2.485800 | 3.006608 | -0.018925 | 0.004861 | 0.045287 | -0.013236 | 0.029841 | 0.003788 | 0.031231 | -0.036702 |
| 3 | 0.014936 | 0.176669 | 0.997528 | 0.427240 | 7.404286 | 0.000661 | 1.014341 | 24.055300 | 0.0 | 1.0 | ... | -2.463531 | 2.068926 | -0.012002 | -0.071614 | 0.088900 | -0.005735 | -0.047619 | 0.037970 | 0.102249 | -0.103135 |
| 5 | 0.012567 | 0.537032 | 0.390403 | 0.427434 | 7.030000 | 0.001243 | 0.414590 | 26.782335 | 1.0 | 0.0 | ... | -2.611558 | 1.150740 | 0.054648 | 0.014835 | -0.001495 | 0.030387 | 0.033472 | 0.067988 | 0.104212 | 0.286573 |
| 7 | 0.023045 | 0.418395 | 0.573859 | 0.412003 | 7.812857 | 0.001467 | 0.489949 | 43.795879 | 0.0 | 1.0 | ... | -2.541019 | 3.869452 | 0.033240 | -0.027848 | -0.029470 | 0.001581 | 0.028992 | 0.010949 | 0.019329 | -0.030535 |
| 9 | 0.011378 | 0.480105 | 0.583061 | 0.408591 | 8.418571 | 0.000850 | 0.541583 | 22.113031 | 1.0 | 0.0 | ... | -2.589735 | 0.585668 | 0.057628 | -0.032971 | -0.025562 | 0.023969 | 0.025827 | 0.004890 | 0.084936 | -0.065799 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 189 | 0.028850 | 0.192302 | 0.403041 | 0.348085 | 11.118571 | 0.002026 | 1.076433 | 26.397387 | 1.0 | 0.0 | ... | -1.863639 | 0.684712 | -0.000820 | -0.019709 | -0.002077 | 0.001595 | -0.028863 | 0.007781 | 0.061081 | 0.005904 |
| 191 | 0.037461 | 0.238157 | 0.369684 | 0.438734 | 9.270000 | 0.001194 | 1.769727 | 23.442673 | 0.0 | 0.0 | ... | -2.340729 | 3.538879 | 0.035364 | -0.075250 | 0.050018 | -0.034123 | -0.039846 | 0.012371 | 0.094527 | -0.035777 |
| 193 | 0.020501 | 0.339669 | 0.502536 | 0.463439 | 6.688571 | 0.001037 | 0.801102 | 22.723814 | 1.0 | 0.0 | ... | -2.497699 | 1.780208 | 0.033785 | -0.063744 | 0.053580 | -0.050618 | 0.038906 | 0.040236 | 0.081869 | -0.053717 |
| 195 | 0.045657 | 0.220812 | 0.837039 | 0.473233 | 11.207143 | 0.002903 | 1.720397 | 33.712444 | 0.0 | 0.0 | ... | -2.470089 | 3.622502 | -0.116389 | -0.017483 | 0.290986 | 0.007456 | 0.042160 | 0.155238 | 0.129819 | 0.040086 |


```python
# Broadcast each county's time averages back to every (county, year) row,
# so they can be subtracted from the original data
df_mean_expanded = df_mean.reindex(df.index, level="county")
df_mean_expanded
```

| county | year | crmrte | prbarr | prbconv | prbpris | avgsen | polpc | density | taxpc | west | central | ... | lpctymle | lpctmin | clcrmrte | clprbarr | clprbcon | clprbpri | clavgsen | clpolpc | cltaxpc | clmix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 81 | 0.035741 | 0.324358 | 0.512017 | 0.478874 | 6.292857 | 0.001846 | 2.356978 | 27.534382 | 0.0 | 1.0 | ... | -2.485800 | 3.006608 | -0.018925 | 0.004861 | 0.045287 | -0.013236 | 0.029841 | 0.003788 | 0.031231 | -0.036702 |
| 1 | 82 | 0.035741 | 0.324358 | 0.512017 | 0.478874 | 6.292857 | 0.001846 | 2.356978 | 27.534382 | 0.0 | 1.0 | ... | -2.485800 | 3.006608 | -0.018925 | 0.004861 | 0.045287 | -0.013236 | 0.029841 | 0.003788 | 0.031231 | -0.036702 |
| 1 | 83 | 0.035741 | 0.324358 | 0.512017 | 0.478874 | 6.292857 | 0.001846 | 2.356978 | 27.534382 | 0.0 | 1.0 | ... | -2.485800 | 3.006608 | -0.018925 | 0.004861 | 0.045287 | -0.013236 | 0.029841 | 0.003788 | 0.031231 | -0.036702 |
| 1 | 84 | 0.035741 | 0.324358 | 0.512017 | 0.478874 | 6.292857 | 0.001846 | 2.356978 | 27.534382 | 0.0 | 1.0 | ... | -2.485800 | 3.006608 | -0.018925 | 0.004861 | 0.045287 | -0.013236 | 0.029841 | 0.003788 | 0.031231 | -0.036702 |
| 1 | 85 | 0.035741 | 0.324358 | 0.512017 | 0.478874 | 6.292857 | 0.001846 | 2.356978 | 27.534382 | 0.0 | 1.0 | ... | -2.485800 | 3.006608 | -0.018925 | 0.004861 | 0.045287 | -0.013236 | 0.029841 | 0.003788 | 0.031231 | -0.036702 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 197 | 83 | 0.015046 | 0.188538 | 1.108684 | 0.351408 | 8.820000 | 0.001038 | 0.872874 | 21.561247 | 1.0 | 0.0 | ... | -2.553578 | 1.697597 | -0.038324 | 0.044537 | 0.018134 | 0.002093 | 0.012446 | 0.059289 | 0.074073 | 0.015064 |
| 197 | 84 | 0.015046 | 0.188538 | 1.108684 | 0.351408 | 8.820000 | 0.001038 | 0.872874 | 21.561247 | 1.0 | 0.0 | ... | -2.553578 | 1.697597 | -0.038324 | 0.044537 | 0.018134 | 0.002093 | 0.012446 | 0.059289 | 0.074073 | 0.015064 |
| 197 | 85 | 0.015046 | 0.188538 | 1.108684 | 0.351408 | 8.820000 | 0.001038 | 0.872874 | 21.561247 | 1.0 | 0.0 | ... | -2.553578 | 1.697597 | -0.038324 | 0.044537 | 0.018134 | 0.002093 | 0.012446 | 0.059289 | 0.074073 | 0.015064 |
| 197 | 86 | 0.015046 | 0.188538 | 1.108684 | 0.351408 | 8.820000 | 0.001038 | 0.872874 | 21.561247 | 1.0 | 0.0 | ... | -2.553578 | 1.697597 | -0.038324 | 0.044537 | 0.018134 | 0.002093 | 0.012446 | 0.059289 | 0.074073 | 0.015064 |


```python
# Create the demeaned (within) variables: y_it - ȳ_i and X_it - X̄_i
vars_all = ["lcrmrte", "lprbarr", "lprbconv", "lprbpris", "lavgsen", "lpolpc"]

df_within = df[vars_all] - df_mean_expanded[vars_all]
```
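The mean, reindex and subtraction steps can likely be collapsed into one line with `groupby(...).transform("mean")`, which returns group means already aligned with the original index. A sketch on a small hypothetical panel (`toy` and its values are made up), checking that both approaches agree:

```python
import numpy as np
import pandas as pd

# Small hypothetical panel with a (county, year) MultiIndex, mimicking df
idx = pd.MultiIndex.from_product([[1, 3], [81, 82, 83]], names=["county", "year"])
toy = pd.DataFrame({"lcrmrte": [-3.2, -3.3, -3.5, -4.2, -4.1, -4.3],
                    "lprbarr": [-1.2, -1.1, -1.1, -1.7, -1.8, -1.6]}, index=idx)

# Approach used above: county means broadcast back with reindex(level=...)
means = toy.groupby("county").mean().reindex(toy.index, level="county")
within_reindex = toy - means

# One-step alternative: transform("mean") keeps toy's (county, year) index
within_transform = toy - toy.groupby(level="county").transform("mean")

print(within_transform)
```

After demeaning, every column averages to zero within each county, which is a quick sanity check on the real `df_within` as well.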

```python
# Demeaned dependent variable
y_within = df_within["lcrmrte"]

# Demeaned regressors
X_within = df_within[["lprbarr", "lprbconv", "lprbpris", "lavgsen", "lpolpc"]]
```

```python
# Compute the coefficients "by hand"
X_w = X_within.to_numpy()
y_w = y_within.to_numpy()

# OLS on the demeaned data: (X'X)^{-1} X'y
beta = np.linalg.inv(X_w.T @ X_w) @ (X_w.T @ y_w)
beta
```

```
array([-0.38353664, -0.30597535, -0.19545129,  0.03566426,  0.41377085],
      dtype=float32)
```

```python
# Check with statsmodels OLS on the demeaned data
# (no constant: the within transformation removes it)
model_manual = sm.OLS(y_within, X_within).fit()

print(model_manual.summary())
```

```
                                 OLS Regression Results                                
Dep. Variable:                lcrmrte   R-squared (uncentered):                   0.359
Model:                            OLS   Adj. R-squared (uncentered):              0.354
Method:                 Least Squares   F-statistic:                              70.01
Date:                Thu, 27 Nov 2025   Prob (F-statistic):                    4.00e-58
Time:                        15:17:45   Log-Likelihood:                          366.25
No. Observations:                 630   AIC:                                     -722.5
Df Residuals:                     625   BIC:                                     -700.3
Df Model:                           5                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------
```
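The point estimates from `sm.OLS` on demeaned data coincide with the fixed effects estimates, but the standard errors do not quite: `sm.OLS` sees $NT = 630$ observations and $K = 5$ parameters (625 residual degrees of freedom, as in the summary) and ignores the $N = 90$ county means removed by demeaning. The textbook correction uses $NT - N - K = 535$ degrees of freedom, the same count as the `F(5,535)` distribution in the `PanelOLS` summary below, so the naive standard errors should be scaled up by $\sqrt{625/535}$. A small sketch of that arithmetic (the commented usage line assumes `model_manual` from the cell above):

```python
import numpy as np

# Dimensions of this application: N counties, T years, K regressors
N, T, K = 90, 7, 5
df_ols = N * T - K        # residual df assumed by sm.OLS on the demeaned data
df_fe = N * T - N - K     # residual df once the N county means are accounted for

# Factor that rescales the naive OLS standard errors
scale = np.sqrt(df_ols / df_fe)
print(f"df_ols = {df_ols}, df_fe = {df_fe}, scale = {scale:.4f}")

# Usage with the fitted statsmodels model (not run here):
# se_fe = model_manual.bse * scale
```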

Because both the dependent variable and the regressors are in logs, every coefficient can be read as an elasticity: a 1% increase in a regressor is associated with a change of approximately $\beta$% in the crime rate.

The probabilities of arrest, conviction, and receiving a prison sentence are all negatively associated with the crime rate. For the first coefficient, a 1% increase in the probability of arrest is associated with a decrease of about 0.38% in the crime rate.

The only coefficient that is not statistically significant is the average sentence length (`lavgsen`, p = 0.17 in the `PanelOLS` table): increasing the average length of sentences has no significant effect on the crime rate.

The positive coefficient on police per capita (`lpolpc`, about 0.41) should not be read as police causing crime; a common explanation is reverse causality, since counties with more crime tend to employ more police.
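The elasticity reading is a first-order approximation. In a log-log model the exact implied change in the crime rate for a $p$% increase in a regressor is $(1 + p/100)^{\beta} - 1$. The sketch below compares it with the linear approximation $\beta \times p$ for the arrest coefficient; the approximation is close for a 1% change and drifts for larger ones:

```python
import numpy as np

beta_arrest = -0.383537   # FE coefficient on lprbarr, taken from the estimates above

implied = {}
for pct in (1, 10, 50):
    # Exact percentage change implied by the log-log model
    exact = (np.exp(beta_arrest * np.log(1 + pct / 100)) - 1) * 100
    # Linear elasticity approximation
    approx = beta_arrest * pct
    implied[pct] = (exact, approx)
    print(f"+{pct}% arrest probability: exact {exact:.3f}%, approx {approx:.3f}%")
```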

## Estimating directly with a package

```python
# Fixed effects estimator: EntityEffects absorbs the county effects alpha_i
reg = plm.PanelOLS.from_formula(
    formula="lcrmrte ~ lprbarr + lprbconv + lprbpris + lavgsen + lpolpc + EntityEffects",
    data=df,
    drop_absorbed=True,
)

results = reg.fit()
```
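`EntityEffects` absorbs the county effects instead of estimating 90 dummy coefficients. By the Frisch-Waugh-Lovell theorem, the slope from a least squares dummy variable (LSDV) regression with one dummy per unit equals the within estimate. A minimal numpy check on simulated data (all values are hypothetical):

```python
import numpy as np

# Hypothetical simulated balanced panel (not the Cornwell data)
rng = np.random.default_rng(1)
n, T = 30, 7
unit = np.repeat(np.arange(n), T)                # unit id of each row
alpha = rng.normal(size=n)
x = rng.normal(size=n * T) + alpha[unit]         # regressor correlated with alpha_i
y = 0.5 * x + alpha[unit] + rng.normal(scale=0.3, size=n * T)

# LSDV: regress y on x plus one dummy per unit (no common intercept)
D = (unit[:, None] == np.arange(n)[None, :]).astype(float)
coef_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0]
beta_lsdv = coef_lsdv[0]

# Within estimator: demean x and y by unit, then OLS without intercept
x_means = np.bincount(unit, weights=x) / T
y_means = np.bincount(unit, weights=y) / T
x_t = x - x_means[unit]
y_t = y - y_means[unit]
beta_within = (x_t @ y_t) / (x_t @ x_t)

print(beta_lsdv, beta_within)
```

The remaining LSDV coefficients, `coef_lsdv[1:]`, are the estimated unit effects $\hat{\alpha}_i$, which the within estimator never computes explicitly.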

```python
# Print the regression table
table = pd.DataFrame({"b": round(results.params, 4),
                      "se": round(results.std_errors, 4),
                      "t": round(results.tstats, 4),
                      "pval": round(results.pvalues, 4)})

print(f"table: \n{table}\n")
```

```
table: 
               b      se        t    pval
lprbarr  -0.3835  0.0335 -11.4601  0.0000
lprbconv -0.3060  0.0219 -13.9985  0.0000
lprbpris -0.1955  0.0334  -5.8582  0.0000
lavgsen   0.0357  0.0261   1.3652  0.1728
lpolpc    0.4138  0.0275  15.0633  0.0000
```
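The `t` and `pval` columns follow from `b` and `se`. A sketch reproducing them for `lavgsen`, assuming a two-sided test against a t distribution with $NT - N - K = 630 - 90 - 5 = 535$ degrees of freedom (an assumption about what `linearmodels` does here, consistent with the `F(5,535)` in its summary):

```python
from scipy import stats

# Unrounded estimates for lavgsen, as printed by results.params / results.std_errors
b, se = 0.035664, 0.026125
df_resid = 630 - 90 - 5        # NT - N - K

t = b / se
p = 2 * stats.t.sf(abs(t), df_resid)   # two-sided p-value
print(f"t = {t:.4f}, p = {p:.4f}")
```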



```python
# Full results
print(results.summary)
```

```
                          PanelOLS Estimation Summary                           
Dep. Variable:                lcrmrte   R-squared:                        0.3590
Estimator:                   PanelOLS   R-squared (Between):              0.7232
No. Observations:                 630   R-squared (Within):               0.3590
Date:                Thu, Nov 27 2025   R-squared (Overall):              0.7224
Time:                        15:06:18   Log-likelihood                    366.25
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      59.925
Entities:                          90   P-value                           0.0000
Avg Obs:                       7.0000   Distribution:                   F(5,535)
Min Obs:                       7.0000                                           
Max Obs:                       7.0000   F-statistic (robust):             59.925
```

```python
# Point estimates
print("Coefficients (FE):\n", results.params)
# Standard errors and p-values (unadjusted, as in the summary above)
print("\nStandard errors (unadjusted):\n", results.std_errors)
print("\nP-values:\n", results.pvalues)
```

```
Coefficients (FE):
 lprbarr    -0.383537
lprbconv   -0.305976
lprbpris   -0.195451
lavgsen     0.035664
lpolpc      0.413771
Name: parameter, dtype: float64

Standard errors (unadjusted):
 lprbarr     0.033467
lprbconv    0.021858
lprbpris    0.033364
lavgsen     0.026125
lpolpc      0.027469
Name: std_error, dtype: float64

P-values:
 lprbarr     0.000000e+00
lprbconv    0.000000e+00
lprbpris    8.180120e-09
lavgsen     1.727775e-01
lpolpc      0.000000e+00
Name: pvalue, dtype: float64
```
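These standard errors are the default unadjusted (homoskedastic) ones. With seven observations per county, errors are likely correlated within counties, and cluster-robust standard errors are a common alternative; in `linearmodels` they can be requested with `reg.fit(cov_type="clustered", cluster_entity=True)`. The sketch below shows the underlying sandwich formula in numpy, without any small-sample correction a package may apply, so its numbers are not expected to match `linearmodels` exactly. The function name `cluster_robust_se` and the simulated data are hypothetical:

```python
import numpy as np

def cluster_robust_se(X, resid, groups):
    """Cluster-robust (sandwich) standard errors without small-sample correction.

    X: (n_obs, k) regressor matrix (the demeaned regressors for FE)
    resid: (n_obs,) residuals
    groups: (n_obs,) cluster labels, e.g. the county of each row
    """
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]    # sum of x_it * u_it within cluster g
        meat += np.outer(score, score)
    V = bread @ meat @ bread
    return np.sqrt(np.diag(V))

# Hypothetical simulated data: 90 clusters of 7 rows sharing an error component
rng = np.random.default_rng(2)
n_clusters, T = 90, 7
groups = np.repeat(np.arange(n_clusters), T)
X_sim = rng.normal(size=(n_clusters * T, 2))
u = rng.normal(size=n_clusters * T) + rng.normal(size=n_clusters)[groups]
y_sim = X_sim @ np.array([1.0, -0.5]) + u
b_hat = np.linalg.solve(X_sim.T @ X_sim, X_sim.T @ y_sim)
resid = y_sim - X_sim @ b_hat
print(cluster_robust_se(X_sim, resid, groups))
```

With one observation per cluster the formula reduces to the heteroskedasticity-robust (HC0) estimator, which is a convenient sanity check.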


## Random Effects Estimation

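The model has the same form, but $\alpha_i$ is now treated as a random effect that is assumed to be uncorrelated with the regressors in every period:

$$
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}, \qquad E[\alpha_i \mid x_{i1}, \dots, x_{iT}] = 0
$$

Stacking the $T$ observations of county $i$ into the $T \times K$ matrix $X_i$ and the $T \times 1$ vector $y_i$, the feasible GLS estimator is

$$
\hat{\beta}_{GLS} = \left ( \sum^{N}_{i=1} X_i' \hat{\Omega}^{-1} X_i \right )^{-1} \left ( \sum^{N}_{i=1} X_i' \hat{\Omega}^{-1} y_i \right ), \qquad \hat{\Omega} = \hat{\sigma}^2_{\varepsilon} I_T + \hat{\sigma}^2_{\alpha} \iota_T \iota_T'
$$

where $\iota_T$ is a $T \times 1$ vector of ones.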
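For a balanced panel, this GLS estimator can also be computed as OLS on quasi-demeaned data, $y_{it} - \theta \bar{y}_i$ and $x_{it} - \theta \bar{x}_i$, with $\theta = 1 - \sqrt{\sigma^2_\varepsilon / (\sigma^2_\varepsilon + T\sigma^2_\alpha)}$; $\theta = 0$ gives pooled OLS and $\theta = 1$ gives the fixed effects estimator. The sketch below checks this equivalence on simulated data, treating the variance components as known (in practice they are estimated; the values here are assumptions):

```python
import numpy as np

# Hypothetical simulated balanced panel with random effects (not the Cornwell data)
rng = np.random.default_rng(3)
n, T = 50, 7
sigma_a, sigma_e = 0.8, 0.5        # assumed known variance components
X = np.column_stack([np.ones(n * T), rng.normal(size=n * T)])   # intercept + one regressor
alpha = np.repeat(rng.normal(scale=sigma_a, size=n), T)
y = X @ np.array([1.0, 0.5]) + alpha + rng.normal(scale=sigma_e, size=n * T)

# (1) GLS formula, summing X_i' Omega^{-1} X_i and X_i' Omega^{-1} y_i over units
Omega = sigma_e**2 * np.eye(T) + sigma_a**2 * np.ones((T, T))
Omega_inv = np.linalg.inv(Omega)
Xi = X.reshape(n, T, -1)           # rows of unit i are contiguous
yi = y.reshape(n, T)
A = sum(Xi[i].T @ Omega_inv @ Xi[i] for i in range(n))
b = sum(Xi[i].T @ Omega_inv @ yi[i] for i in range(n))
beta_gls = np.linalg.solve(A, b)

# (2) Quasi-demeaning: subtract theta times the unit mean, then plain OLS
theta = 1 - np.sqrt(sigma_e**2 / (sigma_e**2 + T * sigma_a**2))
X_q = (Xi - theta * Xi.mean(axis=1, keepdims=True)).reshape(n * T, -1)
y_q = (yi - theta * yi.mean(axis=1, keepdims=True)).reshape(-1)
beta_quasi = np.linalg.lstsq(X_q, y_q, rcond=None)[0]

print(beta_gls, beta_quasi)
```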


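```python
from linearmodels import RandomEffects

# Random effects with the same regressors as the fixed effects model.
# linearmodels does not add an intercept automatically, so this specification
# has no constant (note the F(5,625) below); sm.add_constant(...) would add one.
re_results = RandomEffects(df["lcrmrte"], df[X_within.columns]).fit()
print(re_results.summary)
```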

```
                        RandomEffects Estimation Summary                        
Dep. Variable:                lcrmrte   R-squared:                        0.9372
Estimator:              RandomEffects   R-squared (Between):              0.9855
No. Observations:                 630   R-squared (Within):               0.2230
Date:                Thu, Nov 27 2025   R-squared (Overall):              0.9838
Time:                        15:23:16   Log-likelihood                    228.65
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      1866.1
Entities:                          90   P-value                           0.0000
Avg Obs:                       7.0000   Distribution:                   F(5,625)
Min Obs:                       7.0000                                           
Max Obs:                       7.0000   F-statistic (robust):             1866.1
```
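A common next step is a Hausman test of whether the random effects assumption $E[\alpha_i \mid x_i] = 0$ is compatible with the data, comparing the FE and RE slopes: $H = (\hat\beta_{FE} - \hat\beta_{RE})'[\hat V_{FE} - \hat V_{RE}]^{-1}(\hat\beta_{FE} - \hat\beta_{RE})$, asymptotically $\chi^2_K$ under the null. A minimal sketch; the function `hausman` is hypothetical and not part of `linearmodels`, and it uses a pseudo-inverse because the difference of covariance matrices need not be positive definite in finite samples:

```python
import numpy as np
from scipy import stats

def hausman(b_fe, b_re, V_fe, V_re):
    """Classic Hausman statistic comparing FE and RE slope estimates.

    b_fe, b_re: (K,) coefficients on the same regressors (no intercept)
    V_fe, V_re: (K, K) covariance matrices of those coefficients
    """
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    stat = float(diff @ np.linalg.pinv(V_diff) @ diff)
    dof = diff.shape[0]
    pval = stats.chi2.sf(stat, dof)
    return stat, dof, pval

# Hypothetical usage: `results` is the FE fit above, `re_res` a RandomEffects fit
# with the same regressors (and, ideally, a constant):
# cols = ["lprbarr", "lprbconv", "lprbpris", "lavgsen", "lpolpc"]
# hausman(results.params[cols], re_res.params[cols],
#         results.cov.loc[cols, cols], re_res.cov.loc[cols, cols])
```

A small p-value suggests rejecting the random effects assumption in favor of fixed effects.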
                            