# **Project 1: Linear Panel Data and Production Technology**

By Emma Knippel, Anna Abildskov and Tobias Rønn

**Table of contents**

* [Setup](#toc0_)

* [Read and clean data](#toc1_)

* [FE estimation of $\beta_K$ and $\beta_L$](#toc2_)    

* [FD estimation of $\beta_K$ and $\beta_L$](#toc3_)

* [RE estimation of $\beta_K$ and $\beta_L$](#toc4_)

* [Hausman test](#toc5_)

* [Test for strict exogeneity](#toc6_)

* [Test for constant returns to scale](#toc7_)    

## <a id='toc0_'></a>[Setup](#toc0_)

In [2]:
import numpy as np
from numpy import linalg as la
import pandas as pd
from scipy.stats import chi2

#Suppress Future Warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

#Import our py-file
import Project_1 as pf

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

## <a id='toc1_'></a>[Read and clean data](#toc1_)

The dataset contains `N = 441` firms observed over `T = 12` years, 1968-1979. The variables are: 
* `lcap`: log of capital stock, $k_{it}$ 
* `lemp`: log of employment, $\ell_{it}$ 
* `ldsa`: log of deflated sales, $y_{it}$
* `year`: the calendar year of the observation, `year` $ = 1968, ..., 1979$, 
* `firmid`: anonymized indicator variable for the firm, $i = 1, ..., N$, with $N=441$. 

Loading the data and dropping all the even years, leaving us with only odd year-observations:

In [3]:
df_all_years = pd.read_csv('firms.csv')
data = df_all_years[df_all_years.year % 2 != 0]

Counting the number of firms and years in the reduced sample:

In [4]:
N = data.firmid.unique().size
T = data.year.unique().size
print(f'Data contains data for {N} firms over {T} odd years')

Data contains data for 441 firms over 6 odd years


Extracting data to numpy arrays:

In [5]:
y = data.ldsa.values.reshape((N*T,1))
l = data.lemp.values.reshape((N*T,1))
k = data.lcap.values.reshape((N*T,1))
ones = np.ones((N*T, 1))
X = np.hstack([ones, l, k])
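
The reshape to `(N*T, 1)` implicitly assumes that the panel is balanced and that rows are ordered by firm and then by year. A minimal sketch of how this could be checked is shown here on a small made-up frame that reuses the column names `firmid`, `year` and `ldsa`. The values are hypothetical and not taken from `firms.csv`.

```python
import pandas as pd

# Hypothetical toy panel: 2 firms, 3 odd years, deliberately unsorted
toy = pd.DataFrame({
    'firmid': [2, 1, 1, 2, 1, 2],
    'year':   [1969, 1971, 1969, 1973, 1973, 1971],
    'ldsa':   [0.4, 0.2, 0.1, 0.6, 0.3, 0.5],
})

# Sort by firm, then year, so each firm's T rows are contiguous
toy = toy.sort_values(['firmid', 'year']).reset_index(drop=True)

# A balanced panel has the same number of observations for every firm
obs_per_firm = toy.groupby('firmid').size()
is_balanced = obs_per_firm.nunique() == 1

N_toy = toy.firmid.nunique()
T_toy = toy.year.nunique()
y_toy = toy.ldsa.values.reshape((N_toy * T_toy, 1))
```

If the check fails or the rows are out of order, the block-wise transformations used later would mix observations from different firms.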

In [6]:
print(X)

[[ 1.        -0.241278   0.9252139]
 [ 1.        -0.317875   0.8430977]
 [ 1.        -0.30135    0.7943461]
 ...
 [ 1.        -0.956662  -1.00608  ]
 [ 1.        -0.672649  -0.719267 ]
 [ 1.        -0.567195  -0.522616 ]]


Creating label-vectors:

In [7]:
label_y = 'Log of deflated sales'
label_x = ['Constant', 'Log of employment', 'Log of capital stock']
label_x_fe = ['Log of employment', 'Log of capital stock']

## <a id='toc2_'></a>[FE estimation of $\beta_K$ and $\beta_L$](#toc2_)

For the purpose of our project, we have decided to use the Fixed-Effects (FE) estimator to find estimates of $\beta_K$ and $\beta_L$. 

For the FE estimation, we need to demean our dependent variable and our regressors, i.e. perform a "within transformation". We do this by creating a demeaning matrix and using it as input in the transformation function from Project_1.py. Here, we create the $Q_T$ demeaning matrix of the form:

$$\mathbf{Q}_T = \mathbf{I}_T - \frac{1}{T}{\mathbf{j}_T}{\mathbf{j}_T}'
    $$
$$ = 
    \begin{bmatrix}
    1-\frac{1}{T} & -\frac{1}{T} & \dots & -\frac{1}{T}\\
    -\frac{1}{T} & 1-\frac{1}{T} & \dots & -\frac{1}{T}\\
    \vdots & \vdots & \ddots & \vdots\\
    -\frac{1}{T} & -\frac{1}{T} & \dots & 1-\frac{1}{T}
    \end{bmatrix}_{T \times T}
$$

We use this as input in our transformation function to obtain $(\ddot{y}_{it}, \ddot{\mathbf{x}}_{it})$

In [8]:
Q_T = np.eye(T) - np.tile(1/T, (T, T)) 

#only include the log of employment and log of capital stock, no constant
X_fe = X[:, 1:]

y_demean = pf.perm(Q_T, y)
x_demean = pf.perm(Q_T, X_fe)
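
As a rough illustration of what the within transformation does, the sketch below applies a demeaning matrix firm by firm and checks that the result equals subtracting each firm's time mean. The helper `apply_per_firm` is a hypothetical stand-in written for this example. It is not the `perm` function from Project_1.py, whose implementation is not shown here, and all data values are made up.

```python
import numpy as np

def apply_per_firm(M, A, T):
    """Hypothetical helper: premultiply each firm's T-row block of A by M."""
    N = A.shape[0] // T
    blocks = A.reshape(N, T, -1)   # (N, T, K); assumes rows sorted by firm, then year
    return (M @ blocks).reshape(N * M.shape[0], -1)

T_toy, N_toy = 3, 2
Q_toy = np.eye(T_toy) - np.full((T_toy, T_toy), 1 / T_toy)

# Made-up data for 2 firms over 3 periods
y_toy = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 60.0]).reshape(-1, 1)

y_within = apply_per_firm(Q_toy, y_toy, T_toy)

# Same result by subtracting firm means directly
firm_means = y_toy.reshape(N_toy, T_toy).mean(axis=1, keepdims=True)
y_manual = (y_toy.reshape(N_toy, T_toy) - firm_means).reshape(-1, 1)
```

Both routes give deviations from the firm mean, so any time-constant firm effect is wiped out.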

The FE estimator is essentially the pooled OLS estimator on our within-transformed variables. It is defined as such:

$$
\hat{\beta}_{FE} = 
\left(
    \sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{\mathbf{x}}'_{it}\ddot{\mathbf{x}}_{it}
    \right)^{-1}
\left(
    \sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{\mathbf{x}}'_{it}\ddot{y}_{it}
    \right)
$$

Stacked over $t$ and $i$, in matrix form it becomes:

$$
\hat{\beta}_{FE} = (\ddot{\mathbf{X}}'\ddot{\mathbf{X}})^{-1}\ddot{\mathbf{X}}'\ddot{\mathbf{y}}
$$
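
To make the matrix formula concrete, here is a small sketch on simulated data where the true coefficients are known. The data-generating process and all names (`beta_true`, `N_sim`, `T_sim`) are assumptions made for this example, not part of the project data.

```python
import numpy as np

rng = np.random.default_rng(0)
N_sim, T_sim = 200, 6
beta_true = np.array([[0.7], [0.2]])

# Firm effect c_i is correlated with the regressors, the case FE is designed for
c = rng.normal(size=(N_sim, 1, 1))
x = rng.normal(size=(N_sim, T_sim, 2)) + c
u = 0.1 * rng.normal(size=(N_sim, T_sim, 1))
y = x @ beta_true + c + u

# Within transformation: subtract each firm's time mean
x_dd = (x - x.mean(axis=1, keepdims=True)).reshape(-1, 2)
y_dd = (y - y.mean(axis=1, keepdims=True)).reshape(-1, 1)

# beta_FE = (X'X)^{-1} X'y, solved without forming the inverse explicitly
beta_fe = np.linalg.solve(x_dd.T @ x_dd, x_dd.T @ y_dd)
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical design choice. It yields the same estimator as the formula.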

For the FE estimator to be consistent, the demeaned regressor matrix must have full column rank (the FE.2 condition). We check this with our rank-checking function.

In [9]:
pf.check_rank(x_demean)

Rank of transformed x: 2
Eigenvalues of transformed x: [28. 99.]
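
A sketch of how a rank check of this kind can be done with NumPy alone. This is an assumption about what such a check typically computes, not the code of `pf.check_rank`. The helper name and the data are hypothetical.

```python
import numpy as np

def rank_and_eigs(x):
    """Hypothetical helper: rank of x and eigenvalues of x'x."""
    rank = np.linalg.matrix_rank(x)
    eigs = np.linalg.eigvalsh(x.T @ x)   # x'x is symmetric, so eigvalsh applies
    return rank, eigs

T_toy = 3
Q_toy = np.eye(T_toy) - np.full((T_toy, T_toy), 1 / T_toy)

# One firm, with a constant and one time-varying regressor (made-up values)
x_toy = np.array([[1.0, 0.5],
                  [1.0, 1.5],
                  [1.0, 4.0]])

rank_raw, _ = rank_and_eigs(x_toy)
rank_within, eigs_within = rank_and_eigs(Q_toy @ x_toy)
```

In this toy case the demeaned constant column is identically zero, so the rank drops from 2 to 1 and one eigenvalue is zero. This is why the constant is left out of the FE regressors.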


Having confirmed that the demeaned matrix has full rank, we can run the estimation without further adjustments.

In [10]:
fe_result = pf.estimate(y_demean, x_demean, transform='fe', T=T)
pf.print_table((label_y, label_x_fe), fe_result, title="Fixed Effects", floatfmt='.4f')

Fixed Effects
Dependent variable: Log of deflated sales

                        Beta      Se    t-values
--------------------  ------  ------  ----------
Log of employment     0.7069  0.0221     32.0114
Log of capital stock  0.1424  0.0197      7.2402
R² = 0.468
σ² = 0.019


## <a id='toc3_'></a>[FD estimation of $\beta_K$ and $\beta_L$](#toc3_)

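For the FD estimation, we remove the firm-specific effect by taking first differences, $\Delta y_{it} = y_{it} - y_{i,t-1}$, which costs us the first observation for each firm. Since we only keep odd years, each difference spans two calendar years. We do this with the $(T-1) \times T$ first-difference matrix

$$\mathbf{D}_T = 
    \begin{bmatrix}
    -1 & 1 & 0 & \dots & 0\\
    0 & -1 & 1 & \dots & 0\\
    \vdots & & \ddots & \ddots & \vdots\\
    0 & \dots & 0 & -1 & 1
    \end{bmatrix}_{(T-1) \times T}
$$

The FD estimator is pooled OLS on the differenced data. As with FE, consistency requires that the transformed regressor matrix has full rank (the FD.2 condition), which we check with our rank-checking function.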

In [11]:
D_T = np.eye(T) - np.eye(T, k=-1)
D_T = D_T[1:]

#only include the log of employment and log of capital stock, no constant
X_fd = X[:, 1:]

y_diff = pf.perm(D_T, y)
x_diff = pf.perm(D_T, X_fd)
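
As a quick sanity check of the differencing matrix, the sketch below builds it the same way for a toy `T` and compares the result with `np.diff` on a single made-up series. The names `T_toy`, `D_toy` and `y_one_firm` are hypothetical.

```python
import numpy as np

T_toy = 4
D_toy = (np.eye(T_toy) - np.eye(T_toy, k=-1))[1:]   # shape (T-1, T)

y_one_firm = np.array([1.0, 3.0, 6.0, 10.0])        # made-up series for one firm
dy = D_toy @ y_one_firm                             # y_t - y_{t-1} for t = 2, ..., T
```

Dropping the first row matters: without it, the first "difference" would just be the level of the first observation.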

In [12]:
pf.check_rank(x_diff)

Rank of transformed x: 2
Eigenvalues of transformed x: [29. 56.]


In [13]:
fd_result = pf.estimate(y_diff, x_diff, transform='fd', T=T)

pf.print_table((label_y, label_x_fe), fd_result, title="First Differences", floatfmt='.4f')

First Differences
Dependent variable: Log of deflated sales

                        Beta      Se    t-values
--------------------  ------  ------  ----------
Log of employment     0.7253  0.0248     29.2665
Log of capital stock  0.0547  0.0235      2.3307
R² = 0.313
σ² = 0.022


## <a id='toc6_'></a>[Test for strict exogeneity](#toc6_)

Both FE and FD rely on strict exogeneity of the regressors, so we test this assumption directly.

First, we add one-period leads of both capital and employment. Since the lead is not observed in the final period, we drop the last year for every firm. We then estimate the model by FE, demeaning the shortened panel with a $(T-1) \times (T-1)$ version of the within-transformation matrix above. Under strict exogeneity, the coefficients on the leads should be zero.

In [14]:
#Transformation matrix to obtain the leads
F_T = np.eye(T, k=1)
F_T = F_T[:-1]

employment_lead = pf.perm(F_T, X[:, 1].reshape(-1, 1))
capital_lead = pf.perm(F_T, X[:, 2].reshape(-1, 1))

# Remove the last observed year for every individual
I_T = np.eye(T, k=0)
I_T = I_T[:-1]

x_exo = pf.perm(I_T, X)
y_exo = pf.perm(I_T, y)
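
To see what the two selection matrices do, the sketch below applies toy versions of them to a single made-up series. The names `F_toy`, `I_toy` and `x_one_firm` are hypothetical and only serve this illustration.

```python
import numpy as np

T_toy = 4
F_toy = np.eye(T_toy, k=1)[:-1]    # picks x_{t+1} for t = 1, ..., T-1
I_toy = np.eye(T_toy)[:-1]         # keeps x_t     for t = 1, ..., T-1

x_one_firm = np.array([10.0, 20.0, 30.0, 40.0])   # made-up series for one firm
x_lead = F_toy @ x_one_firm
x_curr = I_toy @ x_one_firm
```

Row by row, `x_curr` holds the current value and `x_lead` holds next period's value, so the two line up for the regression with leads.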

In [15]:
#Add the leads to x_exo
x_exo = np.hstack([x_exo, employment_lead, capital_lead])

# Within transformation of the data
Q_T_exo = np.eye(T-1) - np.tile(1/(T-1), (T-1, T-1)) 
y_exo_demean = pf.perm(Q_T_exo, y_exo)
x_exo_demean = pf.perm(Q_T_exo, x_exo)

# Select variables: drop the constant, keep both regressors and both leads
x_exo_demean = np.hstack((x_exo_demean[:, 1:4], x_exo_demean[:, -1].reshape(-1, 1)))

In [23]:
#Estimate the model
exo_test = pf.estimate(y_exo_demean, x_exo_demean, transform='fe', T=T-1)

label_exo = label_x_fe + ['Lead of log of employment', 'Lead of log of capital stock']
pf.print_table((label_y, label_exo), exo_test, title="Exogeneity test", floatfmt='.4f')

Exogeneity test
Dependent variable: Log of deflated sales

                                Beta      Se    t-values
----------------------------  ------  ------  ----------
Log of employment             0.6247  0.0290     21.5751
Log of capital stock          0.0576  0.0253      2.2759
Lead of log of employment     0.0418  0.0279      1.5014
Lead of log of capital stock  0.1571  0.0296      5.3140
R² = 0.462
σ² = 0.016


In [26]:
# Joint test that both lead coefficients are zero.
# Coefficient order: [employment, capital, lead of employment, lead of capital]
b_hat = exo_test['b_hat']
R = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1]])
r = np.zeros((2, 1))
cov = exo_test['cov']

pf.wald_test(b_hat, r, R, cov)

In [None]:
# Manual Wald test that both lead coefficients are jointly zero
R_matrix_test = np.array([[0, 0, 1, 0],
                          [0, 0, 0, 1]])
r_target_test = np.zeros((2, 1))
Avar_beta_test = exo_test['cov']
beta_hat_test = exo_test['b_hat']

# W = (Rb - r)' [R Avar(b) R']^{-1} (Rb - r)
diff_test = R_matrix_test @ beta_hat_test - r_target_test
W_statistic = diff_test.T @ np.linalg.inv(R_matrix_test @ Avar_beta_test @ R_matrix_test.T) @ diff_test
df_test = R_matrix_test.shape[0]
critical_value = chi2.ppf(1 - 0.05, df_test)

print('This is the test statistic: ', W_statistic)
print('This is the critical value: ', critical_value)
p_val = 1 - chi2.cdf(W_statistic.item(), df_test)

print('The p-value is:',p_val)

In [None]:
pf.export_to_latex([exo_test], ['Exogeneity test'], [label_exo], filename='exogeneity_test.tex')

LaTeX table saved to exogeneity_test.tex


## <a id='toc4_'></a>[RE estimation of $\beta_K$ and $\beta_L$](#toc4_)

The RE estimator is based on a "quasi-demeaning" of the variables: a fraction $\hat{\lambda}$ of each firm's time mean is subtracted from every observation. $\hat{\lambda}$ can be estimated by: 

$$\hat{\lambda} \equiv 1 - \sqrt{\frac{\widehat{\sigma}_{u}^{2}}{(\widehat{\sigma}_{u}^{2} + T\widehat{\sigma}_{c}^{2})}}, $$

where $\widehat{\sigma}_{u}^{2}$ can be estimated from the fixed effects regression, and $\hat{\sigma}_{c}^{2}$ can be constructed as  $\hat{\sigma}_{c}^{2} = \hat{\sigma}_{w}^{2} - \frac{1}{T}\hat{\sigma}_{u}^{2}$. Here $\hat{\sigma}_{w}^{2}$ is the error variance from the between estimator (BE), 

$$
\hat{\sigma}_{w}^{2} = \frac{1}{N-K}\left(\bar{\mathbf{y}} - \mathbf{\bar{X}}\hat{\mathbf{\beta}}_{BE}\right)^{\prime}\left(\bar{\mathbf{y}} - \mathbf{\bar{X}}\hat{\mathbf{\beta}}_{BE}\right),
$$

where $\hat{\boldsymbol{\beta}}_{BE}$ are the between estimator coefficients.

Hence, three steps are required before we can estimate our model with the RE estimator:
1) Estimate the model by FE (done above)
2) Estimate the model by BE.
3) Estimate $\hat{\lambda}$.

**2. Estimating the model by BE**

This is based on the transformation matrix $P_T$:

$$ \mathbf{P}_T=\frac{1}{T}{\mathbf{j}_T}{\mathbf{j}_T}'
    =
    \begin{bmatrix}
    \frac{1}{T} & \frac{1}{T} & \dots & \frac{1}{T}\\
    \frac{1}{T} & \frac{1}{T} & \dots & \frac{1}{T}\\
    \vdots & \vdots & \ddots & \vdots\\
    \frac{1}{T} & \frac{1}{T} & \dots & \frac{1}{T}
    \end{bmatrix}_{T \times T}
$$



In [122]:
# Transform the data
P_T = np.tile(1/T, (T, T)) 

y_mean = pf.perm(P_T, y)
x_mean = pf.perm(P_T, X)

# Estimate 
be_result = pf.estimate(y_mean, x_mean, transform='be', T=T)
pf.print_table((label_y, label_x), be_result, title="Between Estimator", floatfmt='.4f')

Between Estimator
Dependent variable: Log of deflated sales

                         Beta      Se    t-values
--------------------  -------  ------  ----------
Constant              -0.0000  0.0066     -0.0000
Log of employment      0.6717  0.0141     47.7252
Log of capital stock   0.3160  0.0127     24.9179
R² = 0.922
σ² = 0.116


**3. Estimating $\hat{\lambda}$.**

In [123]:
# Calculate lambda (note lambda is a reserved keyword in Python, so we use _lambda instead)
sigma2_u = fe_result['sigma2']
sigma2_w = be_result['sigma2']
sigma2_c = sigma2_w - 1/T * sigma2_u
_lambda = 1 - np.sqrt(sigma2_u / (sigma2_u + T*sigma2_c))

# Print lambda 
print(f'Lambda is approximately equal to {_lambda.item():.4f}.')

Lambda is approximately equal to 0.8356.
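
As a rough cross-check, $\hat{\lambda}$ can be recomputed from the rounded variances printed in the FE and BE tables above. The sketch below is only an approximation, since the inputs are rounded to three decimals, and the variable names are hypothetical.

```python
import numpy as np

T_odd = 6            # number of odd years in the sample
sigma2_u_r = 0.019   # rounded FE error variance from the FE table
sigma2_w_r = 0.116   # rounded BE error variance from the BE table

sigma2_c_r = sigma2_w_r - sigma2_u_r / T_odd
lambda_r = 1 - np.sqrt(sigma2_u_r / (sigma2_u_r + T_odd * sigma2_c_r))
```

With these rounded inputs the result is about 0.835, close to the 0.8356 obtained from the unrounded variances. Note that $\hat{\sigma}_u^2 + T\hat{\sigma}_c^2 = T\hat{\sigma}_w^2$, so the expression reduces to $\hat{\lambda} = 1 - \sqrt{\hat{\sigma}_u^2 / (T\hat{\sigma}_w^2)}$.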


Now that we have an estimate of $\hat{\lambda}$, we can estimate the model with the RE estimator. This requires us to quasi-demean our dependent variable and our regressors. We do this by creating a quasi-demeaning matrix and using it as input in the transformation function from Project_1.py. Here, we create the $\mathbf{\hat{C}}_T$ quasi-demeaning matrix of the form:

$$\mathbf{\hat{C}}_T = \mathbf{I}_T - \hat{\lambda}\mathbf{P}_T
    $$

We use this as input in our transformation function to obtain $(\check{y}_{it}, \check{\mathbf{x}}_{it})$, and then estimate the equation:

$$\check{y}_{it} = \mathbf{\check{x}}_{it}\boldsymbol{\beta} + \check{v}_{it}$$ 
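
A minimal sketch of the quasi-demeaning on one made-up series, assuming an illustrative value for $\hat{\lambda}$. The names `lam`, `P_toy`, `C_toy` and `y_one_firm` are hypothetical and the numbers are not from the project data.

```python
import numpy as np

T_toy = 3
lam = 0.8   # hypothetical value of lambda-hat, chosen for illustration
P_toy = np.full((T_toy, T_toy), 1 / T_toy)
C_toy = np.eye(T_toy) - lam * P_toy

y_one_firm = np.array([1.0, 2.0, 6.0])              # made-up series with mean 3
y_check = C_toy @ y_one_firm                        # quasi-demeaned via the matrix
y_manual = y_one_firm - lam * y_one_firm.mean()     # same thing written out
```

The two limiting cases are informative: `lam = 0` leaves the data untouched (pooled OLS), while `lam = 1` reproduces the full within transformation used for FE.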


In [124]:
# Transform the data
C_T = np.eye(T, T) - _lambda * P_T
y_re = pf.perm(C_T, y)
x_re = pf.perm(C_T, X)

# Estimate 
re_result = pf.estimate(y_re, x_re, transform='re', T=T)
pf.print_table((label_y, label_x), re_result, title="Random Effects Estimation", floatfmt='.4f')

Random Effects Estimation
Dependent variable: Log of deflated sales

                         Beta      Se    t-values
--------------------  -------  ------  ----------
Constant              -0.0000  0.0164     -0.0000
Log of employment      0.7331  0.0180     40.7422
Log of capital stock   0.2150  0.0161     13.3153
R² = 0.725
σ² = 0.019


In [128]:
col_headers = ['FE', 'FD', 'RE']
results_list = [fe_result, fd_result, re_result]
var_names_list = [
    label_x_fe, # FE variables
    label_x_fe, # FD variables
    label_x    # RE variables
]

pf.export_to_latex(results_list, col_headers, var_names_list, filename='regression_table.tex', label_x = label_x)

LaTeX table saved to regression_table.tex


## <a id='toc5_'></a>[Hausman test](#toc5_)

The Hausman test helps us make our final decision on whether to select the FE estimator or the RE estimator. The null hypothesis is that RE.1-3 as well as FE.2 hold, and if this is the case, the RE estimator is asymptotically efficient.

Since we already confirmed our rank condition, the null hypothesis will align with RE.1-3, suggesting the asymptotic efficiency of RE. If the null fails, we interpret that as a suggestion of RE.1(b) failing, and thus, RE being inconsistent and not usable for us.

So, if we fail to reject the null, we select RE. If we reject it, we select FE.
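
The statistic behind this decision rule can be sketched as $H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'[\widehat{Avar}(\hat{\beta}_{FE}) - \widehat{Avar}(\hat{\beta}_{RE})]^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{RE})$, compared against a chi-square distribution. The helper `hausman` below is a hypothetical illustration with made-up inputs. It is not the code of `pf.hausman_test`, and it assumes the RE constant has already been dropped so that only the coefficients on time-varying regressors are compared.

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, cov_fe, cov_re):
    """Hypothetical helper: Hausman statistic, p-value and degrees of freedom."""
    diff = b_fe - b_re
    H = (diff.T @ np.linalg.inv(cov_fe - cov_re) @ diff).item()
    df = diff.shape[0]
    return H, chi2.sf(H, df), df

# Made-up inputs for two slope coefficients (not the estimates from this project)
b_fe_toy = np.array([[0.70], [0.15]])
b_re_toy = np.array([[0.72], [0.20]])
cov_fe_toy = np.diag([0.0005, 0.0004])
cov_re_toy = np.diag([0.0003, 0.0002])

H_toy, p_toy, df_toy = hausman(b_fe_toy, b_re_toy, cov_fe_toy, cov_re_toy)
```

In this toy case the statistic is 14.5 with 2 degrees of freedom, so the null would be rejected at the 1% level.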

In [18]:
# Unpack
b_fe = fe_result['b_hat']
b_re = re_result['b_hat']
cov_fe = fe_result['cov']
cov_re = re_result['cov']

# Calculate the test statistic
pf.hausman_test(b_fe, b_re, cov_fe, cov_re)

Hausman test statistic: 78.83
The p-value is: 0.00000000 (df=2)
Critical value at the 5% level: 5.9915
Critical value at the 1% level: 9.2103
Critical value at the 0.001% level: 23.0259


Hence, we reject $H_0$ at every reported significance level (the statistic of 78.83 exceeds even the 0.001% critical value of 23.03), and select the FE estimator as our final model.

## <a id='toc7_'></a>[Test for constant returns to scale](#toc7_)

A Cobb-Douglas production technology yields constant returns to scale if the parameters of the production inputs (i.e. the betas) sum to unity. Specifically, the production technology specified in this assignment yields constant returns to scale if $\beta_K + \beta_L = 1$. Using a Wald test, we can test the hypothesis that this is true against the alternative that it is not.

$$H_0: \beta_K + \beta_L = 1$$

$$H_1: \beta_K + \beta_L \neq 1$$

The general Wald statistic is: 

$$W \equiv (R\widehat{\beta}-r)'[R\widehat{Avar(\widehat{\beta})}R']^{-1}   (R\widehat{\beta}-r)$$

where, given the above $H_0$, 

$$R=[1 \quad 1]$$

$$r=1$$

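Under $H_0$, this statistic can be shown to be asymptotically chi-square distributed with $Q$ degrees of freedom, where $Q$ is the number of restrictions (here $Q = 1$). Hence, $H_0$ can be rejected at level $\alpha$ if $W$ exceeds the $(1-\alpha)$-quantile of $\chi_Q^2$.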
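
The Wald statistic can be sketched in a few lines. The helper `wald` and the inputs below are hypothetical and chosen so the arithmetic is easy to follow. They are not the FE results from this project, and the helper is not the code of `pf.wald_test`.

```python
import numpy as np
from scipy.stats import chi2

def wald(b_hat, R, r, cov):
    """Hypothetical helper: Wald statistic, p-value and df for H0: R b = r."""
    R = np.atleast_2d(R)
    gap = R @ b_hat - r
    W = (gap.T @ np.linalg.inv(R @ cov @ R.T) @ gap).item()
    Q = R.shape[0]                      # number of restrictions
    return W, chi2.sf(W, Q), Q

# Made-up estimates and covariance matrix
b_toy = np.array([[0.70], [0.20]])
cov_toy = np.array([[0.0004, -0.0001],
                    [-0.0001, 0.0003]])

W_toy, p_toy, Q_toy = wald(b_toy, np.array([[1.0, 1.0]]), 1.0, cov_toy)
crit_5pct = chi2.ppf(0.95, Q_toy)       # 5% critical value of chi2 with Q df
```

Here the coefficients sum to 0.9 and the variance of the sum is 0.0005, so $W = 0.1^2 / 0.0005 = 20$, well above the 5% critical value of about 3.84.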


In [17]:
b_hat = fe_result['b_hat']
r = 1
R = np.array([[1,1]])
cov = fe_result['cov']

pf.wald_test(b_hat, r, R, cov)

Wald test statistic: 69.89
The p-value is: 0.00000000 (df=1)
Critical value at the 5% level: 3.8415
Critical value at the 1% level: 6.6349
Critical value at the 0.001% level: 19.5114


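Hence, we reject $H_0$ even at the 0.001% significance level, and conclude that the production technology does not exhibit constant returns to scale. The FE point estimates sum to $0.7069 + 0.1424 \approx 0.85 < 1$, which points towards decreasing returns to scale.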