In [1]:
%load_ext autoreload
%autoreload 2
import numpy as np
from numpy import linalg as la
from scipy.stats import chi2
from tabulate import tabulate

#Suppress Future Warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Import this week's LinearModels.py file
import w3_LinearModels as lm

In [2]:
y, x, T, year, label_y, label_x = lm.load_example_data()

## Part 1: Compare POLS to FE/FD
### Question 1:

Start by estimating eq. (3) by POLS. You should already have all the data and code that you need; print the results in a nice table. Are the coefficients on log labour and log capital statistically significant?

In [3]:
# First, regress y on x without any transformations. Store the resulting dictionary.
# Tip: If you want robust standard errors, you can add the argument robust_se=True to the estimate function.
pols_result = lm.estimate(y, x, T=T)

# Then, print the resulting dictionary using the provided print_table() function. The labels should have been provided to you.
lm.print_table((label_y, label_x), pols_result, title="Pooled OLS", floatfmt='.4f')

Pooled OLS
Dependent variable: Log deflated sales

               Beta      Se    t-values
-----------  ------  ------  ----------
Constant     0.0000  0.0050      0.0000
Log labour   0.6748  0.0102     66.4625
Log capital  0.3100  0.0091     33.9237
R² = 0.914
σ² = 0.131


In [4]:
pols_result_robust = lm.estimate(y, x, T=T, robust_se=True)
lm.print_table((label_y, label_x), pols_result_robust, title="Pooled OLS (Robust SE)", floatfmt='.4f')

Pooled OLS (Robust SE)
Dependent variable: Log deflated sales

               Beta      Se    t-values
-----------  ------  ------  ----------
Constant     0.0000  0.0161      0.0000
Log labour   0.6748  0.0366     18.4526
Log capital  0.3100  0.0324      9.5810
R² = 0.914
σ² = 0.131


### Short recap of fixed effects

As discussed last time, one way to control for fixed effects is to "demean" the data. We need the mean within each person, so we define  $\bar{y}_{i}=T^{-1}\sum_{t=1}^{T}y_{it}, \: \mathbf{\bar{x}}_{i}=T^{-1}\sum_{t=1}^{T}\mathbf{x}_{it}, \: \bar{u}_{i}=T^{-1}\sum_{t=1}^{T}u_{it}$, and $\bar{c}_{i} = T^{-1}\sum_{t=1}^{T}c_{i} = c_i$.

Subtracting these means from eq. (1) removes the fixed effects,

$$
\begin{align}
y_{it}-\bar{y}_{i} & =\left(\mathbf{x}_{it}-\mathbf{\bar{x}}_{i}\right)\mathbf{\beta}+(\textcolor{red}{c_{i}-c_{i}} )+\left(u_{it}-\bar{u}_{i}\right) \notag \\
\Leftrightarrow\ddot{y}_{it} & =\ddot{\mathbf{x}}_{it}\mathbf{\beta} + \ddot{u}_{it}. \tag{4}
\end{align}
$$
Subtracting the mean within each person is not entirely straightforward, but you are provided with a `perm` function that takes a "transformation matrix" Q and uses it to transform some vector or matrix A.

In order to demean the data, we need to give this `perm` function the following transformation matrix:

$$
\mathbf{Q}_{T}:=\mathbf{I}_{T}-\left(\begin{array}{ccc}
1/T & \ldots & 1/T\\
\vdots & \ddots & \vdots\\
1/T & \ldots & 1/T
\end{array}\right)_{T\times T}.
$$
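
To see what this matrix does, here is a minimal sketch on toy data. It assumes the data are stacked person by person with $T$ rows each, and it uses a hypothetical helper `apply_blockwise` as a stand-in for `perm`, whose actual implementation in `w3_LinearModels` is not shown here and may differ.

```python
import numpy as np

def apply_blockwise(Q, A, T):
    """Hypothetical stand-in for `perm`: apply Q to each person's block of T rows."""
    N = A.shape[0] // T
    return np.vstack([Q @ A[i * T:(i + 1) * T] for i in range(N)])

T_toy = 3
Q_toy = np.eye(T_toy) - np.ones((T_toy, T_toy)) / T_toy

# Two persons, three periods each, stacked person by person.
y_toy = np.array([[1.0], [2.0], [3.0], [10.0], [20.0], [30.0]])
y_toy_demeaned = apply_blockwise(Q_toy, y_toy, T_toy)

# Each block is now measured as a deviation from that person's own mean.
print(y_toy_demeaned.ravel())
```

Each person's block sums to zero after the transformation, which is also why a constant column turns into a column of zeros.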

### Question 2:
Estimate eq. (3) by fixed effects. You need to perform the following steps:
* Create the demeaning matrix Q.
* Demean x and y using the `perm` function and Q.
* Remove the columns in the demeaned x that are only zeroes and shorten the `label_x`. A function that does this is provided.
* Estimate y on x using the demeaned arrays.
* Print it out in a nice table.

In [5]:
def remove_zero_columns(x, label_x, tol=1e-10):
    """
    Drop columns that are (numerically) all zeros and keep their labels aligned.

    Args:
        x: regressor matrix.
        label_x: list of column labels.
        tol: tolerance for treating entries as zero.

    Returns:
        Filtered matrix and matching labels.
    """
    mask = ~np.all(np.isclose(x, 0.0, atol=tol), axis=0)
    x_nonzero = x[:, mask]
    label_nonzero = [lbl for lbl, keep in zip(label_x, mask) if keep]
    return x_nonzero, label_nonzero


In [6]:
# Transform the data
Q_T = np.eye(T) - 1/T * np.ones((T, T))
y_dot = lm.perm(Q_T, y)
x_dot = lm.perm(Q_T, x)

# Remove the columns that are only zeroes
x_dot, label_x_dot = remove_zero_columns(x_dot, label_x)

# Estimate 
fe_result = lm.estimate(y_dot, x_dot, transform='fe', T=T, robust_se=True)
lm.print_table((label_y, label_x_dot), fe_result, title="Fixed Effects", floatfmt='.4f')

Fixed Effects
Dependent variable: Log deflated sales

               Beta      Se    t-values
-----------  ------  ------  ----------
Log labour   0.6942  0.0417     16.6674
Log capital  0.1546  0.0299      5.1630
R² = 0.477
σ² = 0.018


### Question 3:
Estimate eq. (3) by first differences. You need to perform the following steps:
* Create the first difference matrix D.
* First difference x and y using the `perm` function and D.
* Remove the columns in the first differenced x that are only zeroes and shorten the `label_x`.
* Estimate y on x using the first differenced arrays.
* Print it out in a nice table.
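
As a small illustration of the first step, the sketch below builds a first-difference matrix for a toy panel length and applies it to a single person's series. The names with the `_toy` suffix are illustrative only and are not part of the exercise code.

```python
import numpy as np

T_toy = 4
# Each row has -1 in column t and +1 in column t+1, so row t computes z_{t+1} - z_t.
D_toy = -np.eye(T_toy - 1, T_toy) + np.eye(T_toy - 1, T_toy, k=1)

z_toy = np.array([1.0, 4.0, 9.0, 16.0])  # one person observed over four periods
dz_toy = D_toy @ z_toy                    # three differences remain from four periods

const_toy = np.ones(T_toy)
dconst_toy = D_toy @ const_toy            # a time-invariant column differences to zero

print(D_toy)
print(dz_toy)
print(dconst_toy)
```

The matrix has $T-1$ rows, so one period per person is lost, and any time-invariant regressor such as the constant becomes a zero column that has to be dropped.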

In [7]:
# Transform the data
D_T = - np.eye(T-1, T) + np.eye(T-1, T, k=1)
y_diff = lm.perm(D_T, y)
x_diff = lm.perm(D_T, x)

# Remove the columns that are only zeroes
x_diff, label_x_diff = remove_zero_columns(x_diff, label_x)

# Estimate 
fd_result = lm.estimate(y_diff, x_diff, transform='fd', T=T-1, robust_se=True)
lm.print_table((label_y, label_x_diff), fd_result, title="First Difference", floatfmt='.4f')

First Difference
Dependent variable: Log deflated sales

               Beta      Se    t-values
-----------  ------  ------  ----------
Log labour   0.5487  0.0292     18.8191
Log capital  0.0630  0.0232      2.7097
R² = 0.165
σ² = 0.014


# Part 2: The random effects (RE) estimator.
In part 1 we used two methods to remove unobserved heterogeneity from each person. Now, what if the regressors are uncorrelated with the unobserved effect, $E[\mathbf{x}_{it}' c_i] = \mathbf{0}$? Then POLS is consistent but not efficient, since it does not exploit the panel structure of the data. We can therefore do better with the RE estimator.

## A short introduction to the RE estimator
The FE and FD estimators are both OLS applied to data that have first been transformed in a specific way. RE works the same way, but the aim is no longer to transform away the fixed effects. Instead, we estimate the following model,

$$
\check{y}_{it} = \check{\mathbf{x}}_{it}\boldsymbol{\beta} + \check{v}_{it}
$$

Here $\check{y}_{it} = y_{it} - \hat{\lambda}\bar{y}_{i}$, $\check{\mathbf{x}}_{it} = \mathbf{x}_{it} - \hat{\lambda}\overline{\mathbf{x}}_{i}$, and $\check{v}_{it} = v_{it} - \hat{\lambda}\bar{v}_{i}$, where we have gathered the errors in $v_{it} = c_i + u_{it}$. We are *"quasi-demeaning"* the variables: the means are multiplied by $\hat{\lambda}$ before being subtracted (see Wooldridge p. 326-328).

Our challenge is thus to estimate this $\lambda$, which we can construct in the following way:

$$
\hat{\lambda} = 1 - \sqrt{\frac{\widehat{\sigma}_{u}^{2}}{\widehat{\sigma}_{u}^{2} + T\widehat{\sigma}_{c}^{2}}}
$$

where $\widehat{\sigma}_{u}^{2}$ can be estimated from the fixed effects regression, and $\hat{\sigma}_{c}^{2}$ can be constructed as  $\hat{\sigma}_{c}^{2} = \hat{\sigma}_{w}^{2} - \frac{1}{T}\hat{\sigma}_{u}^{2}$. Here $\hat{\sigma}_{w}^{2}$ is the error variance from the between estimator, 

$$
\hat{\sigma}_{w}^{2} = \frac{1}{N-K}\left(\overline{\mathbf{y}} - \overline{\mathbf{X}}\hat{\boldsymbol{\beta}}_{BE}\right)^{\prime}\left(\overline{\mathbf{y}} - \overline{\mathbf{X}}\hat{\boldsymbol{\beta}}_{BE}\right),
$$

where $\hat{\boldsymbol{\beta}}_{BE}$ are the between estimator coefficients. We have not introduced the between-groups estimator before, but it is obtained by regressing the time-averaged outcomes $\overline{y}_i$ on the time-averaged regressors $\overline{\mathbf{x}}_i$, $i=1,2,\dotsc,N$.

*Note:* There are other procedures for estimating the variances. See Wooldridge p. 294-296 for more details.
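
The two formulas above can be wrapped in a small function. This is only a sketch: `re_lambda` is a hypothetical helper, not part of `w3_LinearModels`, and the inputs below are toy numbers chosen so the result is easy to check by hand. Raising an error when the variance difference is negative is one possible design choice; a common alternative is to set $\hat{\sigma}_c^2$ to zero, which gives $\hat{\lambda}=0$ (that is, POLS).

```python
import numpy as np

def re_lambda(sigma2_u, sigma2_w, T):
    """Hypothetical helper: quasi-demeaning weight from the FE and between error variances."""
    sigma2_c = sigma2_w - sigma2_u / T
    if sigma2_c < 0:
        # The difference estimator is not guaranteed to be non-negative in finite samples.
        raise ValueError("Estimated sigma2_c is negative; consider another variance estimator.")
    return 1.0 - np.sqrt(sigma2_u / (sigma2_u + T * sigma2_c))

# Toy inputs: sigma2_c = 1 - 1/4 = 0.75, so the ratio under the root is 1 / (1 + 3) = 0.25.
lambda_toy = re_lambda(sigma2_u=1.0, sigma2_w=1.0, T=4)
print(lambda_toy)  # 0.5
```

The weight moves towards 1 as $T\hat{\sigma}_c^2$ grows relative to $\hat{\sigma}_u^2$, and towards 0 as it shrinks.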


### Question 1: The Between Estimator
Estimate the between groups model, which is simply the average within each individual,

$$
\bar{y}_{i} = \boldsymbol{\bar{x}}_{i}\boldsymbol{\beta} + c_i + \bar{u}_{i}.
$$

So instead of demeaning, like we did in FE, we just calculate the mean with the following transformation *vector* $\mathbf{P}_T$,

\begin{equation} 
\mathbf{P}_T \equiv \left( \frac{1}{T}, \frac{1}{T}, ..., \frac{1}{T} \right)_{1 \times T}  \notag
\end{equation}

To estimate eq. (3) with the between estimator, you need to perform the following steps:
* Create the mean vector `P`.
* Average `x` and `y` within each individual using the `perm` function and `P`.
* Regress `y_mean` on `x_mean`. Note that there are $N$ rows in each, not $NT$. 
* Print it out in a nice table.
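
The sketch below shows on toy data why the averaged arrays have $N$ rows. It assumes the data are stacked individual by individual and uses a Kronecker product to apply the averaging vector to each block; whether `perm` works this way internally is an assumption, and the `_toy` names are illustrative only.

```python
import numpy as np

T_toy = 3
P_toy = np.ones((1, T_toy)) / T_toy  # 1 x T row vector of weights 1/T

# Two individuals, three periods each, stacked individual by individual.
x_toy = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0],
                  [10.0, 0.0], [20.0, 0.0], [30.0, 3.0]])
N_toy = x_toy.shape[0] // T_toy

# kron(I_N, P) applies P to each individual's block, leaving one row per individual.
x_toy_mean = np.kron(np.eye(N_toy), P_toy) @ x_toy
print(x_toy_mean.shape)  # (2, 2)
print(x_toy_mean)
```

Because $\mathbf{P}_T$ has a single row, each block of $T$ observations collapses to one row of time averages.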

In [8]:
# Transform the data
P_T = np.ones((1,T)) * 1/T
y_mean = lm.perm(P_T, y)
x_mean = lm.perm(P_T, x)

# Estimate 
be_result = lm.estimate(y_mean, x_mean, transform='be', T=T)
lm.print_table((label_y, label_x), be_result, title="Between Estimator", floatfmt='.4f')

Between Estimator
Dependent variable: Log deflated sales

               Beta      Se    t-values
-----------  ------  ------  ----------
Constant     0.0000  0.0161      0.0000
Log labour   0.6672  0.0343     19.4572
Log capital  0.3188  0.0309     10.3230
R² = 0.923
σ² = 0.115


### Question 2
You should now have all the error variances that you need to calculate

$$\hat{\lambda} = 1 - \sqrt{\frac{\widehat{\sigma}_{u}^{2}}{(\widehat{\sigma}_{u}^{2} + T\widehat{\sigma}_{c}^{2})}}. $$

In [9]:
# Calculate lambda (note lambda is a reserved keyword in Python, so we use _lambda instead)
sigma2_u = fe_result['sigma2']
sigma2_w = be_result['sigma2']
sigma2_c = sigma2_w - 1/T * sigma2_u
_lambda = 1 - np.sqrt(sigma2_u / (sigma2_u + T*sigma2_c))

# Print lambda 
print(f'Lambda is approximately equal to {_lambda.item():.4f}.')

Lambda is approximately equal to 0.8873.


### Question 3
Now we are finally ready to estimate eq. (3) with random effects. Since we have to use $\hat{\lambda}$ to quasi-demean within each individual, we again use the `perm` function. This time, we pass it the following transformation matrix,

$$
\mathbf{C}_{T}:=\mathbf{I}_{T} - \hat{\lambda}\mathbf{P}_{T},
$$

where $\mathbf{P}_{T}$ here denotes the $T \times T$ matrix with every element equal to $1/T$, i.e. the $1 \times T$ averaging vector from earlier stacked $T$ times, so that $\mathbf{C}_{T}$ is $T \times T$.
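
A quick way to build intuition for this transformation is to look at its limiting cases. The sketch below assumes the averaging matrix is the $T \times T$ matrix with all entries $1/T$; `quasi_demeaning_matrix` is a hypothetical helper and the data are toy values for a single individual.

```python
import numpy as np

def quasi_demeaning_matrix(lam, T):
    """Hypothetical helper: I_T minus lam times the T x T averaging matrix."""
    return np.eye(T) - lam * np.ones((T, T)) / T

T_toy = 3
z_toy = np.array([1.0, 2.0, 6.0])  # one individual, mean 3

z_pols = quasi_demeaning_matrix(0.0, T_toy) @ z_toy  # lam = 0: data unchanged (POLS)
z_half = quasi_demeaning_matrix(0.5, T_toy) @ z_toy  # lam = 0.5: subtract half the mean
z_fe = quasi_demeaning_matrix(1.0, T_toy) @ z_toy    # lam = 1: full demeaning (FE)
print(z_pols, z_half, z_fe)
```

With $\hat{\lambda}$ between 0 and 1, RE therefore sits between POLS and FE, and a value close to 1 makes it behave much like FE.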

In [10]:
# Transform the data
P_T_full = np.ones((T, T)) / T
C_T = np.eye(T) - _lambda * P_T_full
y_re = lm.perm(C_T, y)
x_re = lm.perm(C_T, x)

# Estimate 
re_result = lm.estimate(y_re, x_re, transform='re', T=T, robust_se=True)
lm.print_table((label_y, label_x), re_result, title="Random Effects", floatfmt='.4f')

Random Effects
Dependent variable: Log deflated sales

               Beta      Se    t-values
-----------  ------  ------  ----------
Constant     0.0000  0.0168      0.0000
Log labour   0.7197  0.0335     21.4637
Log capital  0.1989  0.0261      7.6174
R² = 0.642
σ² = 0.018


In [11]:
# Constant-returns-to-scale Wald tests for FE, FD, and RE

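def crs_wald(result, skip=0):
    """Wald test of H0: beta_labour + beta_capital = 1 (constant returns to scale).

    `skip` is the number of leading coefficients (e.g. a constant) to drop.
    """
    R = np.array([[1.0, 1.0]])
    q = np.array([[1.0]])
    b = result['b_hat'][skip:, :]
    cov = result['cov'][skip:, skip:]
    diff = R @ b - q
    var_rb = R @ cov @ R.T
    # .item() extracts the scalar from the 1x1 array; calling float() on it is deprecated in recent NumPy.
    stat = (diff.T @ la.inv(var_rb) @ diff).item()
    crit = chi2.ppf(0.95, 1)
    pval = 1 - chi2.cdf(stat, 1)
    return stat, crit, pval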

W_fe, crit_fe, p_fe = crs_wald(fe_result, skip=0)
print(f'CRS Wald test (FE): {W_fe:.4f}')
print(f'Critical value (5%): {crit_fe:.4f}')
print(f'p-value: {p_fe:.4f}')

W_fd, crit_fd, p_fd = crs_wald(fd_result, skip=0)
print(f'CRS Wald test (FD): {W_fd:.4f}')
print(f'Critical value (5%): {crit_fd:.4f}')
print(f'p-value: {p_fd:.4f}')

W_re, crit_re, p_re = crs_wald(re_result, skip=1)
print(f'CRS Wald test (RE): {W_re:.4f}')
print(f'Critical value (5%): {crit_re:.4f}')
print(f'p-value: {p_re:.4f}')


CRS Wald test (FE): 19.4029
Critical value (5%): 3.8415
p-value: 0.0000
CRS Wald test (FD): 150.0280
Critical value (5%): 3.8415
p-value: 0.0000
CRS Wald test (RE): 18.6793
Critical value (5%): 3.8415
p-value: 0.0000



In [12]:
# Strict exogeneity test for FE using a lead of log labour
F_T = np.eye(T, k=1)[:-1]
labour_lead = lm.perm(F_T, x[:, 1].reshape(-1, 1))

I_T = np.eye(T)[:-1]
x_exo = lm.perm(I_T, x)
y_exo = lm.perm(I_T, y)

x_exo = np.hstack((x_exo, labour_lead))

Q_T_exo = np.eye(T - 1) - 1/(T - 1) * np.ones((T - 1, T - 1))
y_exo_w = lm.perm(Q_T_exo, y_exo)
x_exo_w = lm.perm(Q_T_exo, x_exo)

labels_exo = label_x + ['Lead log labour']
x_exo_w, labels_exo = remove_zero_columns(x_exo_w, labels_exo)

fe_exo_result = lm.estimate(y_exo_w, x_exo_w, transform='fe', T=T-1, robust_se=True)
lm.print_table((label_y, labels_exo), fe_exo_result, title='Strict Exogeneity Test (FE)', floatfmt='.4f')

lead_beta = fe_exo_result['b_hat'][-1, 0]
lead_se = fe_exo_result['se'][-1, 0]
wald_lead = (lead_beta / lead_se) ** 2
crit_lead = chi2.ppf(0.95, 1)
p_lead = 1 - chi2.cdf(wald_lead, 1)
print(f'Wald test H0: lead coefficient = 0 -> {wald_lead:.4f} (crit 5% = {crit_lead:.4f}, p = {p_lead:.4f})')

if p_lead < 0.05:
    print('-> Reject H0: lead term is zero (evidence against strict exogeneity).')
else:
    print('-> Do NOT reject H0: no evidence against strict exogeneity in FE panel.')


Strict Exogeneity Test (FE)
Dependent variable: Log deflated sales

                   Beta      Se    t-values
---------------  ------  ------  ----------
Log labour       0.5681  0.0397     14.3113
Log capital      0.1495  0.0291      5.1287
Lead log labour  0.1532  0.0281      5.4442
R² = 0.473
σ² = 0.016
Wald test H0: lead coefficient = 0 -> 29.6395 (crit 5% = 3.8415, p = 0.0000)
-> Reject H0: lead term is zero (evidence against strict exogeneity).


In [13]:
# Sequential exogeneity test in first differences (lead labour)
import pandas as pd
raw = pd.read_csv('firms.csv').sort_values(['firmid', 'year'])
N_seq = raw['firmid'].nunique()
T_seq = raw['year'].nunique()

y_levels = np.exp(raw['ldsa'].values).reshape(-1, 1)
x_levels = np.column_stack([
    np.ones(raw.shape[0]),
    np.exp(raw['lcap'].values),  # capital
    np.exp(raw['lemp'].values),  # labour
])

lm.fd_exogeneity_lead_test(
    y_levels,
    x_levels,
    N_seq,
    T_seq,
    cap_col=1,
    emp_col=2,
    logs=True,
    drop_zeros=True,
)


FD Exogeneity Test (lead variable)
Model: Delta log y_it = b1 Delta log K_it + b2 Delta log L_it + b3 Delta log L_{i,t+1} + Delta u_it

              Variable        Beta          SE         t
                 const     -0.0000      0.0012     -0.00
     Delta log capital      0.0577      0.0236      2.45
      Delta log labour      0.5304      0.0287     18.49
 Lead Delta log labour      0.1289      0.0236      5.47

Test H0: Lead term = 0 -> t = 5.47, p = 0.0000 (cluster df = 440)
-> Reject H0: lead term differs from zero (sequential exogeneity fails).


### Question 4: Comparing FE and RE
Use the results from the FE and RE estimations to compute the Hausman test statistic in eq. (7).

* Start by calculating the difference between the FE and RE coefficients, $\hat{\boldsymbol{\beta}}_{FE} - \hat{\boldsymbol{\beta}}_{RE}$ (remember to remove the time-invariant variables from RE).
* Then calculate the difference between the covariance matrices, $\widehat{\mathrm{avar}}(\hat{\boldsymbol{\beta}}_{FE}) - \widehat{\mathrm{avar}}(\hat{\boldsymbol{\beta}}_{RE})$ (again, remember to remove the time-invariant variables from the RE estimates).
* You now have all the components needed to compute the Hausman test statistic in eq. (7).
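
The steps above can be sketched as a small function. This is a hedged illustration: `hausman_stat` is a hypothetical helper, the inputs are toy numbers rather than estimates from the data, and the positive-definiteness check is an added safeguard, since the quadratic form is only a meaningful chi-square statistic when the covariance difference is positive definite.

```python
import numpy as np
from numpy import linalg as la

def hausman_stat(b_fe, b_re, cov_fe, cov_re):
    """Hypothetical helper: Hausman statistic and its degrees of freedom."""
    b_diff = b_fe - b_re
    cov_diff = cov_fe - cov_re
    # Guard against a covariance difference that is not positive definite.
    if np.any(la.eigvalsh(cov_diff) <= 0):
        raise ValueError("cov_fe - cov_re is not positive definite.")
    H = (b_diff.T @ la.solve(cov_diff, b_diff)).item()
    M = b_diff.shape[0]  # number of time-varying coefficients compared
    return H, M

# Toy numbers only: differences of 0.5 with variance difference 0.25 on each coefficient.
b_fe_toy = np.array([[1.0], [2.0]])
b_re_toy = np.array([[0.5], [1.5]])
H_toy, M_toy = hausman_stat(b_fe_toy, b_re_toy, 0.5 * np.eye(2), 0.25 * np.eye(2))
print(H_toy, M_toy)  # 2.0 2
```

The statistic is then compared with the chi-square critical value with `M` degrees of freedom, for instance via `chi2.ppf(0.95, M)` from `scipy.stats`.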

In [14]:
# Hausman test using homoskedastic covariance matrices
fe_nr = lm.estimate(y_dot, x_dot, transform='fe', T=T, robust_se=False)
re_nr = lm.estimate(y_re, x_re, transform='re', T=T, robust_se=False)

# Use only the time-varying regressors
b_fe = fe_nr['b_hat']
b_re = re_nr['b_hat'][1:, :]
cov_fe = fe_nr['cov']
cov_re = re_nr['cov'][1:, 1:]

# Calculate the test statistic
b_diff = b_fe - b_re
cov_diff = cov_fe - cov_re
H = b_diff.T @ la.inv(cov_diff) @ b_diff

# 5% chi-square critical value with M degrees of freedom
M = len(b_diff)
crit_val = chi2.ppf(0.95, M)
p_val = 1 - chi2.cdf(H.item(), M)

print(f"Hausman test statistic: {H.item():.2f}")
print(f"Critical value (5%): {crit_val:.2f}")
print(f"p-value: {p_val:.8f}")
if p_val < 0.05:
    print('-> Reject H0: FE and RE differ (Hausman favors FE).')
else:
    print('-> Do NOT reject H0: no evidence against RE consistency (Hausman).')


Hausman test statistic: 73.54
Critical value (5%): 5.99
p-value: 0.00000000
-> Reject H0: FE and RE differ (Hausman favors FE).


Which assumption is tested by the Hausman test? What is the null hypothesis? Does the Hausman test you have conducted rely on any other assumptions (See Wooldridge, p. 328-331)? Based on your test result, which estimator would you use to estimate eq. (3)? Why?