In [1]:
%load_ext autoreload
%autoreload 2
import numpy as np
from numpy import linalg as la
from scipy.stats import chi2
from tabulate import tabulate

#Suppress Future Warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Import this week's LinearModels.py file
import w3_LinearModels as lm

In [2]:
y, x, T, year, label_y, label_x = lm.load_example_data()

## Part 1: Compare POLS to FE/FD
### Question 1:

Start by estimating eq. (3) by POLS. You should already have all the data and code that you need. Print the results in a nice table. Is the unionization coefficient statistically significant?

In [3]:
# First, regress y on x without any transformations. Store the resulting dictionary.
# Tip: If you want robust standard errors, you can add the argument robust_se=True to the estimate function.
pols_result = lm.estimate(y, x, T=T)

# Then, print the resulting dictionary using the provided print_table() function. The labels should have been provided to you.
lm.print_table((label_y, label_x), pols_result, title="Pooled OLS", floatfmt='.4f')

Pooled OLS
Dependent variable: lcap

             Beta      Se    t-values
--------  -------  ------  ----------
Constant  -0.0000  0.0068     -0.0000
lemp       0.4411  0.0177     24.8605
ldsa       0.5764  0.0170     33.9237
R² = 0.859
σ² = 0.243


You should get a table that looks like this:
Pooled OLS <br>
Dependent variable: Log wage <br>

|                | Beta    | SE             | t-val           |
| -------------- | ------- | ------------------- | -------------------- |
| Constant       | -0.0347 | 0.0646 <br>(0.1199) | -0.5375 <br>(-0.2895)|
| Experience     | 0.0892  | 0.0101 <br>(0.0124) | 8.8200 <br>(7.1793)  |
| Experience sqr | -0.0028 | 0.0007 <br>(0.0009) | -4.0272 <br>(-3.2777)|
| Union          | 0.1801  | 0.0171 <br>(0.0275) | 10.5179 <br>(6.5403) |
| Married        | 0.1077  | 0.0157 <br>(0.0260) | 6.8592 <br>(4.1352)  |
| Education      | 0.0994  | 0.0047 <br>(0.0092) | 21.2476 <br>(10.8119)|
| Hispanic       | 0.0157  | 0.0208 <br>(0.0391) | 0.7543 <br>(0.4012)  |
| Black          | -0.1438 | 0.0236 <br>(0.0500) | -6.1055 <br>(-2.8754)|


R² = 0.187 <br>
σ² = 0.231

**Note:** Standard errors and robust standard errors are calculated in separate function calls depending on the value of `robust_se`.  
- If you call the estimate function with `robust_se=False` (default), you get standard errors.
- If you call the estimate function with `robust_se=True`, you get robust standard errors (shown in parentheses above).
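
To make the difference between the two kinds of standard errors concrete, here is a minimal sketch. How `lm.estimate` computes them is not shown in this notebook, so the sketch assumes the usual panel-robust "sandwich" formula, and it assumes the data are stacked so that each individual occupies `T` consecutive rows. The function name `pooled_ols_with_covariances` and the simulated data are hypothetical.

```python
import numpy as np

def pooled_ols_with_covariances(y, x, N, T):
    """Pooled OLS with conventional and panel-robust covariance (illustrative only)."""
    K = x.shape[1]
    xtx_inv = np.linalg.inv(x.T @ x)
    b_hat = xtx_inv @ (x.T @ y)
    resid = y - x @ b_hat

    # Conventional covariance: sigma^2 * (X'X)^{-1}
    sigma2 = (resid.T @ resid).item() / (N * T - K)
    cov_conventional = sigma2 * xtx_inv

    # Panel-robust covariance: (X'X)^{-1} [sum_i X_i' u_i u_i' X_i] (X'X)^{-1}
    meat = np.zeros((K, K))
    for i in range(N):
        rows = slice(i * T, (i + 1) * T)    # rows belonging to individual i
        score_i = x[rows].T @ resid[rows]   # K x 1
        meat += score_i @ score_i.T
    cov_robust = xtx_inv @ meat @ xtx_inv
    return b_hat, cov_conventional, cov_robust

# Simulated panel with an individual effect c_i (made-up data, not the exercise data)
rng = np.random.default_rng(0)
N, T = 50, 4
c = np.repeat(rng.normal(size=(N, 1)), T, axis=0)
x = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
y = x @ np.array([[1.0], [0.5]]) + c + rng.normal(size=(N * T, 1))

b_hat, cov_conv, cov_rob = pooled_ols_with_covariances(y, x, N, T)
se_conv = np.sqrt(np.diag(cov_conv))
se_rob = np.sqrt(np.diag(cov_rob))
```

Both versions share the same coefficient estimates; only the covariance matrix, and hence the standard errors and t-values, differ.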

In [4]:
pols_result_robust = lm.estimate(y, x, T=T, robust_se=True)
lm.print_table((label_y, label_x), pols_result_robust, title="Pooled OLS (Robust SE)", floatfmt='.4f')

Pooled OLS (Robust SE)
Dependent variable: lcap

             Beta      Se    t-values
--------  -------  ------  ----------
Constant  -0.0000  0.0223     -0.0000
lemp       0.4411  0.0700      6.3041
ldsa       0.5764  0.0665      8.6649
R² = 0.859
σ² = 0.243


### Short recap of fixed effects

As discussed last time, one way to control for fixed effects is to "demean" the data. We need to calculate the mean within each person, so we define $\bar{y}_{i}=T^{-1}\sum_{t=1}^{T}y_{it}, \: \mathbf{\bar{x}}_{i}=T^{-1}\sum_{t=1}^{T}\mathbf{x}_{it}, \: \bar{u}_{i}=T^{-1}\sum_{t=1}^{T}u_{it}$, and $\bar{c}_{i} = T^{-1}\sum_{t=1}^{T}c_{i} = c_i$.

Subtracting these means from eq. (1) we are able to demean away the fixed effects,

$$
\begin{align}
y_{it}-\bar{y}_{i} & =\left(\mathbf{x}_{it}-\mathbf{\bar{x}}_{i}\right)\mathbf{\beta}+(\textcolor{red}{c_{i}-c_{i}} )+\left(u_{it}-\bar{u}_{i}\right) \notag \\
\Leftrightarrow\ddot{y}_{it} & =\ddot{\mathbf{x}}_{it}\mathbf{\beta} + \ddot{u}_{it}. \tag{4}
\end{align}
$$
Subtracting the mean within each person is not entirely straightforward, but you are provided with a `perm` function that takes a "transformation matrix" Q and uses it to transform some vector or matrix A.

In order to demean the data, we need to give this `perm` function the following transformation matrix:

$$
\mathbf{Q}_{T}:=\mathbf{I}_{T}-\left(\begin{array}{ccc}
1/T & \ldots & 1/T\\
\vdots & \ddots & \vdots\\
1/T & \ldots & 1/T
\end{array}\right)_{T\times T}.
$$
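
As a small illustration of what this matrix does, the sketch below applies $\mathbf{Q}_T$ to a toy panel with two individuals. The helper `apply_transform` is hypothetical: it only mimics what a function like `perm` presumably does, under the assumption that each individual's `T` observations are stored in consecutive rows.

```python
import numpy as np

def apply_transform(Q, A, T):
    """Apply Q (M x T) to each individual's block of T consecutive rows of A (N*T x K)."""
    N = A.shape[0] // T
    blocks = A.reshape(N, T, -1)      # shape (N, T, K)
    transformed = Q @ blocks          # matmul broadcasts over individuals: (N, M, K)
    return transformed.reshape(N * Q.shape[0], -1)

T = 3
Q_T = np.eye(T) - np.ones((T, T)) / T

# Two individuals, three periods each; the within-person means are 3 and 11
A = np.array([[1.0], [2.0], [6.0], [10.0], [10.0], [13.0]])
A_dot = apply_transform(Q_T, A, T)
print(A_dot.ravel())
```

Each block of three rows now sums to zero, which is exactly the within transformation $\ddot{y}_{it} = y_{it} - \bar{y}_i$.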

### Question 2:
Estimate eq. (3) by fixed effects. You need to perform the following steps:
* Create the demeaning matrix Q.
* Demean x and y using the `perm` function and Q.
* Remove the columns in the demeaned x that are only zeroes and shorten the `label_x`. A function that does this is provided.
* Estimate y on x using the demeaned arrays.
* Print it out in a nice table.

In [5]:
def remove_zero_columns(x, label_x):
    """
    The function removes columns from a matrix that are all zeros and returns the updated matrix and
    corresponding labels.
    
    Args:
      x: The parameter `x` is a numpy array representing a matrix with columns that may contain zeros.
      label_x: The parameter `label_x` is a list that contains the labels for each column in the input
    array `x`.
    
    Returns:
      x_nonzero: numpy array of x with columns that are all zeros removed.
      label_nonzero: list of labels for each column in x_nonzero.
    """
    
    # Find the columns that are not all zeros
    nonzero_cols = ~np.all(x == 0, axis=0)
    
    # Remove the columns that are all zeros
    x_nonzero = x[:, nonzero_cols]
    
    # Get the labels for the columns that are not all zeros
    label_nonzero = [label_x[i] for i in range(len(label_x)) if nonzero_cols[i]]
    return x_nonzero, label_nonzero

In [6]:
# Transform the data
Q_T = np.eye(T) - 1/T * np.ones((T, T))
y_dot = lm.perm(Q_T, y)
x_dot = lm.perm(Q_T, x)

# Remove the columns that are only zeroes
x_dot, label_x_dot = remove_zero_columns(x_dot, label_x)

# Estimate
fe_result = lm.estimate(y_dot, x_dot, transform='fe', T=T)
lm.print_table((label_y, label_x_dot), fe_result, title="Fixed Effects", floatfmt='.4f')

Fixed Effects
Dependent variable: lcap

                    Beta                   Se    t-values
--------  --------------  -------------------  ----------
Constant  -12462863.7549  29169265762313.5664     -0.0000
lemp              0.4934               0.0181     27.3258
ldsa              0.1845               0.0155     11.9299
R² = 0.338
σ² = 0.021


You should get a table that looks like this.

FE regression<br> Dependent variable: Log wage

|                | Beta    | SE                  | t                    |
| -------------- | ------- | ------------------- | -------------------- |
| Experience     | 0.1168  | 0.0084 <br>(0.0107) | 13.8778 <br>(10.9221)|
| Experience sqr | -0.0043 | 0.0006 <br>(0.0007) | -7.1057 <br>(-6.2773)|
| Union          | 0.0821  | 0.0193 <br>(0.0228) | 4.2553 <br>(3.6011)  |
| Married        | 0.0453  | 0.0183 <br>(0.0210) | 2.4743 <br>(2.1598)  |

R² = 0.178 <br>
σ² = 0.123
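
Exact comparison with zero, as in `remove_zero_columns`, can miss a column that is zero only up to floating-point round-off, which may happen for a demeaned constant. A `Constant` row with an enormous standard error in an FE table is one possible symptom. The sketch below shows a tolerance-based variant; the name `remove_near_zero_columns` and the `atol` default are assumptions for illustration and are not part of `w3_LinearModels`.

```python
import numpy as np

def remove_near_zero_columns(x, label_x, atol=1e-12):
    """Drop columns whose entries are all within atol of zero (hypothetical helper)."""
    keep = ~np.all(np.isclose(x, 0.0, atol=atol), axis=0)
    labels = [label for label, k in zip(label_x, keep) if k]
    return x[:, keep], labels

# Toy "demeaned" data: the first column is pure round-off noise, not exactly zero
x = np.column_stack([np.full(4, 1e-17), np.array([-1.5, -0.5, 0.5, 1.5])])
labels = ["Constant", "lemp"]

exact_keep = ~np.all(x == 0, axis=0)   # the exact test keeps both columns
x_clean, labels_clean = remove_near_zero_columns(x, labels)
print(exact_keep, labels_clean)
```

If the constant is dropped this way, any later code that skips the first FE coefficient by position would need to be adjusted accordingly.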

### Short recap of first differences

The within transformation is one particular transformation that enables us to get rid of $c_{i}$. An alternative is the first-difference transformation. To see how it works, lag equation (1) one period and subtract it from (1) such that

\begin{equation}
\Delta y_{it}=\Delta\mathbf{x}_{it}\mathbf{\beta}+\Delta u_{it},\quad t=\color{red}{2},\dotsc,T, \tag{5}
\end{equation}

where $\Delta y_{it}:=y_{it}-y_{it-1}$, $\Delta\mathbf{x}_{it}:=\mathbf{x}_{it}-\mathbf{x}_{it-1}$ and $\Delta u_{it}:=u_{it}-u_{it-1}$. As was the case for the within transformation, first differencing eliminates the time invariant component $c_{i}$. Note, however, that one time period is lost when differencing.

In order to first difference the data, we can pass the following transformation matrix to the `perm` function,

$$
\mathbf{D}:=\left(\begin{array}{cccccc}
-1 & 1 & 0 & \ldots & 0 & 0\\
0 & -1 & 1 &  & 0 & 0\\
\vdots &  &  & \ddots &  & \vdots\\
0 & 0 & 0 & \ldots & -1 & 1
\end{array}\right)_{(T-1)\times T}.
$$
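
A quick way to convince yourself that $\mathbf{D}$ does what it should is to apply it to a single individual's series. The numbers below are made up for illustration.

```python
import numpy as np

T = 4
# -1 on the main diagonal, +1 on the first superdiagonal, shape (T-1) x T
D = -np.eye(T - 1, T) + np.eye(T - 1, T, k=1)

y_i = np.array([2.0, 3.0, 5.0, 4.0])   # one individual's outcomes over T periods
dy = D @ y_i                            # first differences, one period shorter

# Adding a time-invariant effect c_i = 7 leaves the differences unchanged
dy_with_effect = D @ (y_i + 7.0)
print(dy)
```

The result matches `np.diff(y_i)`, and the second computation shows why the time-invariant component drops out.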

### Question 3:
Estimate eq. (3) by first differences. You need to perform the following steps:
* Create the first difference matrix D.
* First difference x and y using the `perm` function and D.
* Remove the columns in the first differenced x that are only zeroes and shorten the `label_x`.
* Estimate y on x using the first differenced arrays.
* Print it out in a nice table.

In [7]:
# Transform the data
D_T = - np.eye(T-1, T) + np.eye(T-1, T, k=1)
y_diff = lm.perm(D_T, y)
x_diff = lm.perm(D_T, x)

# Remove the columns that are only zeroes
x_diff, label_x_diff = remove_zero_columns(x_diff, label_x)

# Estimate 
fd_result = lm.estimate(y_diff, x_diff, transform='fd', T=T-1)
lm.print_table((label_y, label_x_diff), fd_result, title="First Difference", floatfmt='.4f')

First Difference
Dependent variable: lcap

        Beta      Se    t-values
----  ------  ------  ----------
lemp  0.1167  0.0149      7.8236
ldsa  0.0357  0.0108      3.3043
R² = 0.022
σ² = 0.008


You should get a table that looks like this:

FD regression <br>
Dependent variable: Log wage

|                | Beta    | SE             | t-val           |
| -------------- | ------- | ------------------- | -------------------- |
| Experience     | 0.1158  | 0.0196 <br>(0.0144) | 5.9096 <br>(8.0492)  |
| Experience sqr | -0.0039 | 0.0014 <br>(0.0009) | -2.8005 <br>(-4.1233)|
| Union          | 0.0428  | 0.0197 <br>(0.0220) | 2.1767 <br>(1.9469)  |
| Married        | 0.0381  | 0.0229 <br>(0.0242) | 1.6633 <br>(1.5755)  |

R² = 0.004 <br>
σ² = 0.196

### Summing up Part 1: Questions 1, 2, and 3
Compare the results from your POLS, FE and FD estimations. We were mainly interested in the effect of $\textit{union}$ on wages. Did the POLS estimation lead to the correct conclusion? Is the effect larger or smaller than we first thought? Is it still statistically significant?

## Part 2: The random effects (RE) estimator
In Part 1 we used two methods to remove unobserved heterogeneity from each person. Now, what if $E[\text{union}_{it} c_i] = 0$? Then POLS is consistent, but it is not efficient, since it does not exploit the panel structure of the data. We can therefore do better with the RE estimator.

### A short introduction to the RE estimator
The FE and FD estimators are both OLS applied to data that have first been transformed in a specific way. We can do the same for RE, but the goal is no longer to transform away the fixed effects. Instead, we estimate the following model,

$$
\check{y}_{it} = \check{\mathbf{x}}_{it}\boldsymbol{\beta} + \check{v}_{it}
$$

Here $\check{y}_{it} = y_{it} - \hat{\lambda}\bar{y}_{i}$, $\check{\mathbf{x}}_{it} = \mathbf{x}_{it} - \hat{\lambda}\overline{\mathbf{x}}_{i}$, and $\check{v}_{it} = v_{it} - \hat{\lambda}\bar{v}_{i}$, where we have gathered the errors in $v_{it} = c_i + u_{it}$. We are *"quasi-demeaning"* the variables by multiplying the individual means by $\hat{\lambda}$ before subtracting them (see Wooldridge p. 326-328).

Our challenge is thus to estimate this $\lambda$, which we can construct in the following way:

$$
\hat{\lambda} = 1 - \sqrt{\frac{\widehat{\sigma}_{u}^{2}}{\widehat{\sigma}_{u}^{2} + T\widehat{\sigma}_{c}^{2}}}
$$

where $\widehat{\sigma}_{u}^{2}$ can be estimated from the fixed effects regression, and $\hat{\sigma}_{c}^{2}$ can be constructed as  $\hat{\sigma}_{c}^{2} = \hat{\sigma}_{w}^{2} - \frac{1}{T}\hat{\sigma}_{u}^{2}$. Here $\hat{\sigma}_{w}^{2}$ is the error variance from the between estimator, 

$$
\hat{\sigma}_{w}^{2} = \frac{1}{N-K}\left(\overline{\mathbf{y}} - \overline{\mathbf{X}}\hat{\boldsymbol{\beta}}_{BE}\right)^{\prime}\left(\overline{\mathbf{y}} - \overline{\mathbf{X}}\hat{\boldsymbol{\beta}}_{BE}\right),
$$

where $\hat{\boldsymbol{\beta}}_{BE}$ are the between estimator coefficients. The between-groups estimator is not something we have introduced before, but it is obtained by regressing the time-averaged outcomes $\overline{y}_i$ on the time-averaged regressors $\overline{\mathbf{x}}_i,i=1,2,\dotsc,N$.

*Note:* There are other procedures for estimating the variances. See Wooldridge p. 294-296 for more details.
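
The two variance formulas above combine into a simpler expression, which can serve as a check on the computed value. Substituting $\hat{\sigma}_{c}^{2} = \hat{\sigma}_{w}^{2} - \frac{1}{T}\hat{\sigma}_{u}^{2}$ into the denominator of $\hat{\lambda}$ gives

$$
\widehat{\sigma}_{u}^{2} + T\hat{\sigma}_{c}^{2} = \widehat{\sigma}_{u}^{2} + T\hat{\sigma}_{w}^{2} - \widehat{\sigma}_{u}^{2} = T\hat{\sigma}_{w}^{2}
\quad\Rightarrow\quad
\hat{\lambda} = 1 - \sqrt{\frac{\widehat{\sigma}_{u}^{2}}{T\hat{\sigma}_{w}^{2}}}.
$$

Under this particular construction, $\hat{\lambda}$ moves toward 1 (so RE approaches FE) when $T\hat{\sigma}_{w}^{2}$ is large relative to $\widehat{\sigma}_{u}^{2}$, and it turns negative exactly when $\hat{\sigma}_{c}^{2} < 0$, which can happen in finite samples.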


### Question 1: The Between Estimator
Estimate the between-groups model, which is simply eq. (1) averaged over time within each individual,

$$
\bar{y}_{i} = \boldsymbol{\bar{x}}_{i}\boldsymbol{\beta} + c_i + \bar{u}_{i}.
$$

So instead of demeaning, like we did in FE, we just calculate the mean with the following transformation *vector* $\mathbf{P}_T$,

\begin{equation} 
\mathbf{P}_T \equiv \left( \frac{1}{T}, \frac{1}{T}, \ldots, \frac{1}{T} \right)_{1 \times T}  \notag
\end{equation}

In order to estimate eq. (3) with the between estimator, you need to perform the following steps:
* Create the mean vector `P`.
* Average `x` and `y` within each individual using the `perm` function and `P`.
* Regress `y_mean` on `x_mean`. Note that there are $N$ rows in each, not $NT$.
* Print it out in a nice table.
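
The averaging step can be sketched on a toy panel as follows. The reshape-based application of `P_T` is an assumption about how a function like `perm` might work (each individual's `T` rows stored consecutively); the data are made up.

```python
import numpy as np

N, T = 2, 3
P_T = np.ones((1, T)) / T

# Column 0 varies over time, column 1 is time-invariant (e.g. a dummy)
x = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [6.0, 0.0],
              [10.0, 1.0],
              [10.0, 1.0],
              [13.0, 1.0]])

# Apply P_T to each individual's T x K block; the result has N rows, not N*T
x_mean = (P_T @ x.reshape(N, T, -1)).reshape(N, -1)
print(x_mean)
```

Unlike the within transformation, averaging keeps time-invariant regressors: the second column still varies across individuals after the transformation.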

In [8]:
# Transform the data
P_T = np.ones((1,T)) * 1/T
y_mean = lm.perm(P_T, y)
x_mean = lm.perm(P_T, x)

# Estimate 
be_result = lm.estimate(y_mean, x_mean, transform='be', T=T)
lm.print_table((label_y, label_x), be_result, title="Between Estimator", floatfmt='.4f')

Between Estimator
Dependent variable: lcap

             Beta      Se    t-values
--------  -------  ------  ----------
Constant  -0.0000  0.0224     -0.0000
lemp       0.4082  0.0620      6.5862
ldsa       0.6139  0.0595     10.3230
R² = 0.870
σ² = 0.221


You should get a table that looks like this:

BE <br>
Dependent variable: Log wage

|                |   Beta |     Se |   t-values |
|----------------|--------|--------|------------|
| Constant        |  0.4923 | 0.2210 |  2.23 | 
| Experience      | -0.0504 | 0.0503 | -1.00 | 
| Experience sqr  |  0.0051 | 0.0032 |  1.60 | 
| Union           |  0.2707 | 0.0466 |  5.81 | 
| Married         |  0.1437 | 0.0412 |  3.49 | 
| Education       |  0.0946 | 0.0109 |  8.68 | 
| Hispanic        |  0.0048 | 0.0427 |  0.11 | 
| Black           | -0.1388 | 0.0489 | -2.84 | 

R² = 0.219 <br>
σ² = 0.121

### Question 2
You should now have all the error variances that you need to calculate

$$\hat{\lambda} = 1 - \sqrt{\frac{\widehat{\sigma}_{u}^{2}}{(\widehat{\sigma}_{u}^{2} + T\widehat{\sigma}_{c}^{2})}}. $$

In [9]:
# Calculate lambda (note lambda is a reserved keyword in Python, so we use _lambda instead)
sigma2_u = fe_result['sigma2']
sigma2_w = be_result['sigma2']
sigma2_c = sigma2_w - 1/T * sigma2_u
_lambda = 1 - np.sqrt(sigma2_u / (sigma2_u + T*sigma2_c))

# Print lambda 
print(f'Lambda is approximately equal to {_lambda.item():.4f}.')

Lambda is approximately equal to 0.9113.


### Question 3
Now we are finally ready to estimate eq. (3) with random effects. Since we have to use $\hat{\lambda}$ to quasi-demean within each individual, we again use the `perm` function. This time, we pass it the following transformation matrix,

$$
\mathbf{C}_{T}:=\mathbf{I}_{T} - \hat{\lambda}\mathbf{P}_{T},
$$

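where $\mathbf{P}_{T}$ is here understood as the $T \times T$ matrix whose every entry is $1/T$, i.e. the $1 \times T$ averaging vector we used earlier, repeated in each of the $T$ rows. Each row of $\mathbf{C}_{T}$ therefore subtracts $\hat{\lambda}$ times the individual's mean from that period's observation.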
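
The quasi-demeaning matrix can be checked on a single toy individual. In NumPy, subtracting a `1 x T` row from the `T x T` identity broadcasts the row to every row of the matrix. The value `lam = 0.5` and the data are made up for illustration.

```python
import numpy as np

T = 3
lam = 0.5                      # illustrative value, not an estimate
P_T = np.ones((1, T)) / T      # 1 x T averaging vector

# Broadcasting repeats P_T in every row, giving a T x T matrix
C_T = np.eye(T) - lam * P_T

y_i = np.array([1.0, 2.0, 6.0])   # one individual's outcomes, mean 3
y_check = C_T @ y_i               # equals y_i - lam * mean(y_i)
print(y_check)

# Limiting cases: lam = 0 leaves the data untouched, lam = 1 is full demeaning
C_pols = np.eye(T) - 0.0 * P_T
C_fe = np.eye(T) - 1.0 * P_T
```

The two limiting cases show how RE sits between POLS ($\hat{\lambda}=0$) and FE ($\hat{\lambda}=1$).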

In [10]:
# Transform the data (P_T is 1 x T and broadcasts to T x T against the identity)
C_T = np.eye(T) - _lambda * P_T
y_re = lm.perm(C_T, y)
x_re = lm.perm(C_T, x)

# Estimate 
re_result = lm.estimate(y_re, x_re, transform='re', T=T)
lm.print_table((label_y, label_x), re_result, title="Random Effects", floatfmt='.4f')

Random Effects
Dependent variable: lcap

             Beta      Se    t-values
--------  -------  ------  ----------
Constant  -0.0000  0.0229     -0.0000
lemp       0.5573  0.0171     32.5074
ldsa       0.2387  0.0152     15.7111
R² = 0.482
σ² = 0.022


The table should look like this:

RE <br>
Dependent variable: Log wage

|                | Beta    | SE             | t-val           |
| -------------- | ------- | ------------------- | -------------------- |
| Constant       | -0.1075 | 0.1107 <br>(0.1150) | -0.9707 <br>(-0.9348)|
| Experience     | 0.1121  | 0.0083 <br>(0.0105) | 13.5724 <br>(10.6483)|
| Experience sqr | -0.0041 | 0.0006 <br>(0.0007) | -6.8751 <br>(-6.0410)|
| Union          | 0.1074  | 0.0178 <br>(0.0209) | 6.0224 <br>(5.1462)  |
| Married        | 0.0628  | 0.0168 <br>(0.0190) | 3.7439 <br>(3.3112)  |
| Education      | 0.1012  | 0.0089 <br>(0.0089) | 11.3666 <br>(11.4057)|
| Hispanic       | 0.0202  | 0.0426 <br>(0.0399) | 0.4730 <br>(0.5055)  |
| Black          | -0.1441 | 0.0476 <br>(0.0503) | -3.0270 <br>(-2.8679)|

R² = 0.178 <br>
σ² = 0.124 <br>
λ = 0.6426

### Short introduction to the Hausman test

It is evident from the previous question that RE has the advantage over FE that time-invariant variables are not demeaned away. But if $E[c_{i}\boldsymbol{x}_{it}] \neq \boldsymbol{0}$, then the RE estimator is inconsistent, whereas the FE estimator is consistent (but inefficient), assuming strict exogeneity.

We can use the results from the FE and RE estimations to test whether RE is consistent, by calculating the following test statistic,

$$
H := (\hat{\boldsymbol{\beta}}_{FE} - \hat{\boldsymbol{\beta}}_{RE})'[\widehat{\mathrm{avar}}(\hat{\boldsymbol{\beta}}_{FE}) - \widehat{\mathrm{avar}}(\hat{\boldsymbol{\beta}}_{RE})]^{-1}(\hat{\boldsymbol{\beta}}_{FE}-\hat{\boldsymbol{\beta}}_{RE})\overset{d}{\to}\chi_{M}^{2}, \tag{7}
$$
where $M$ is the number of time-variant variables included in the test.

*Note 1:* The vector $\hat{\boldsymbol{\beta}}_{RE}$ excludes time-invariant variables, as these are not present in $\hat{\boldsymbol{\beta}}_{FE}$. <br>
*Note 2:* $\widehat{\mathrm{avar}}(\hat{\boldsymbol{\beta}}_{RE})$ is the RE covariance matrix (again, we only keep the rows and columns for time-variant variables).

### Question 4: Comparing FE and RE
Use the results from the FE and RE estimations to compute the Hausman test statistic in eq. (7).

* Start by calculating the difference between the FE and RE coefficients, $\hat{\boldsymbol{\beta}}_{FE} - \hat{\boldsymbol{\beta}}_{RE}$ (remember to remove the time-invariant variables from RE).
* Then calculate the difference between the covariances, $\widehat{\mathrm{avar}}(\hat{\boldsymbol{\beta}}_{FE}) - \widehat{\mathrm{avar}}(\hat{\boldsymbol{\beta}}_{RE})$ (again, remember to remove the time-invariant variables from the RE estimates).
* You now have all the components to compute the Hausman test statistic in eq. (7).
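
The steps above can be sketched as a small function. The name `hausman`, the positive-definiteness check, and the toy inputs are assumptions for illustration; the inputs are assumed to already be restricted to the time-variant variables.

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, cov_fe, cov_re):
    """Hausman statistic, degrees of freedom and p-value (illustrative sketch)."""
    b_diff = (np.asarray(b_fe, dtype=float).reshape(-1, 1)
              - np.asarray(b_re, dtype=float).reshape(-1, 1))
    cov_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)

    # In finite samples the difference need not be positive definite,
    # in which case the statistic is not meaningful
    if np.any(np.linalg.eigvalsh(cov_diff) <= 0):
        raise ValueError("cov_fe - cov_re is not positive definite")

    H = (b_diff.T @ np.linalg.solve(cov_diff, b_diff)).item()
    M = b_diff.shape[0]
    p_val = chi2.sf(H, M)   # survival function, i.e. 1 - cdf
    return H, M, p_val

# Toy inputs (made up): two time-variant coefficients
H, M, p_val = hausman(
    b_fe=[1.0, 2.0],
    b_re=[0.5, 1.5],
    cov_fe=np.diag([0.50, 0.50]),
    cov_re=np.diag([0.25, 0.25]),
)
print(f"H = {H:.2f}, M = {M}, p-value = {p_val:.4f}")
```

Using `chi2.sf` instead of `1 - chi2.cdf` is a design choice that tends to be more accurate for very small p-values.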

In [23]:
# Unpack only time-varying coefficients (exclude intercept)
b_fe = fe_result['b_hat'][1:,:]  # shape (2,1)
b_re = re_result['b_hat'][1:,:]  # shape (2,1)
cov_fe = fe_result['cov'][1:,1:] # shape (2,2)
cov_re = re_result['cov'][1:,1:] # shape (2,2)

# Calculate the test statistic
b_diff = b_fe - b_re
cov_diff = cov_fe - cov_re
H = b_diff.T @ la.inv(cov_diff) @ b_diff

# Find critical value and p-value at 5% significance level of chi^2 with M degrees of freedom
M = len(b_diff)
crit_val = chi2.ppf(0.95, M)
p_val = 1 - chi2.cdf(H.item(), M)

# Print the results
print(f'The test statistic is {H.item():.2f}.')
print(f'The critical value at a 5% significance level is {crit_val:.2f}.')
print(f'The p-value is {p_val:.8f}.')

The test statistic is 358.15.
The critical value at a 5% significance level is 5.99.
The p-value is 0.00000000.


Which assumption is tested by the Hausman test? What is the null hypothesis? Does the Hausman test you have conducted rely on any other assumptions (see Wooldridge, p. 328-331)? Based on your test result, which estimator would you use to estimate eq. (3)? Why?