# **Project 1: Linear Panel Data and Production Technology**

By Emma Knippel, Anna Abildskov and Tobias Rønn

**Table of contents**

* [Setup](#toc0_)

* [Read and clean data](#toc1_)

* [FE estimation of $\beta_K$ and $\beta_L$](#toc2_)    

* [RE estimation of $\beta_K$ and $\beta_L$](#toc3_)

* [Hausman test](#toc4_)

* [Test for constant returns to scale](#toc5_)    

* [Conclusion](#toc6_)   

## <a id='toc0_'></a>[Setup](#toc0_)

In [1]:
import numpy as np
from numpy import linalg as la
import pandas as pd
from scipy.stats import chi2

# Suppress FutureWarnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

#Import our py-file
import Project_1 as pf

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

## <a id='toc1_'></a>[Read and clean data](#toc1_)


The dataset contains `N = 441` firms observed over `T = 12` years, 1968-1979. The variables are: 
* `lcap`: log of capital stock, $k_{it}$ 
* `lemp`: log of employment, $\ell_{it}$ 
* `ldsa`: log of deflated sales, $y_{it}$
* `year`: the calendar year of the observation, `year` $ = 1968, ..., 1979$, 
* `firmid`: anonymized indicator variable for the firm, $i = 1, ..., N$, with $N=441$. 

We load the data and drop all even years, keeping only the observations from odd years:

In [2]:
df_all_years = pd.read_csv('firms.csv')
data = df_all_years[df_all_years.year % 2 != 0]

Counting the firms and time periods that remain in the sample:

In [3]:
N = data.firmid.unique().size
T = data.year.unique().size
print(f'Data contains data for {N} firms over {T} odd years')

Data contains data for 441 firms over 6 odd years


Extracting data to numpy arrays:

In [4]:
y = data.ldsa.values.reshape((N*T,1))
l = data.lemp.values.reshape((N*T,1))
k = data.lcap.values.reshape((N*T,1))
X = np.hstack([l, k])
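
The reshapes above implicitly rely on the panel being balanced and sorted by `firmid` and then `year`, so that each consecutive block of `T` rows belongs to one firm. A minimal sketch of how this could be checked is shown below. The toy frame and its values are made up for illustration and are not taken from `firms.csv`.

```python
import pandas as pd

# Toy panel, purely illustrative: 2 firms observed in 3 odd years.
toy = pd.DataFrame({
    'firmid': [1, 1, 1, 2, 2, 2],
    'year':   [1969, 1971, 1973, 1969, 1971, 1973],
    'ldsa':   [0.1, 0.2, 0.3, 1.0, 1.1, 1.2],
})

N_toy = toy.firmid.nunique()
T_toy = toy.year.nunique()

# Balanced: every firm appears exactly T times.
is_balanced = bool((toy.groupby('firmid').size() == T_toy).all())

# Sorted: rows are ordered by firm first, then by year within each firm.
sorted_toy = toy.sort_values(['firmid', 'year']).reset_index(drop=True)
is_sorted = toy.reset_index(drop=True).equals(sorted_toy)

print(f'balanced: {is_balanced}, sorted: {is_sorted}')
```

If either check fails on the real data, sorting with `sort_values(['firmid', 'year'])` before extracting the arrays would be a natural first step.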

Creating label-vectors:

In [5]:
label_y = 'Log of deflated sales'
label_x = ['Log of employment', 'Log of capital stock']

## <a id='toc2_'></a>[FE estimation of $\beta_K$ and $\beta_L$](#toc2_)

For the purpose of our project, we have decided to use the Fixed-Effects (FE) estimator to find estimates of $\beta_K$ and $\beta_L$. 

For the FE estimation, we need to demean our dependent variable and our regressors, i.e. perform a "within transformation". We do this by creating a demeaning matrix and passing it to the transformation function from Project_1.py. Here, we create the $Q_T$ demeaning matrix of the form below, where $I_T$ is the $T \times T$ identity matrix and $j_T$ is a $T \times 1$ vector of ones:

$$Q_T = I_T - \frac{1}{T}{j_T}{j_T}'
    $$

$$=\begin{bmatrix}
    1 & 0 & \dots & 0\\
    0 & 1 & \dots & 0\\
    \vdots & \vdots & \ddots & \vdots\\
    0 & 0 & \dots & 1
    \end{bmatrix}_{T \times T}
    -
    \left(\mathbf{\frac{1}{T}}\right)_{T \times T}
    $$
    
$$ = 
    \begin{bmatrix}
    1-\frac{1}{T} & 0-\frac{1}{T} & \dots & 0-\frac{1}{T}\\
    0-\frac{1}{T} & 1-\frac{1}{T} & \dots & 0-\frac{1}{T}\\
    \vdots & \vdots & \ddots & \vdots\\
    0-\frac{1}{T} & 0-\frac{1}{T} & \dots & 1-\frac{1}{T}
    \end{bmatrix}_{T \times T}
$$

We use this as input in our transformation function to obtain $(\ddot{y}_{it}, \ddot{\mathbf{x}}_{it})$

In [6]:
Q_T = np.eye(T) - np.tile(1/T, (T, T)) 

y_demean = pf.perm(Q_T, y)
x_demean = pf.perm(Q_T, X)
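
To make the within transformation concrete, here is a small self-contained sketch. The helper `transform_per_firm` is a hypothetical stand-in for what `pf.perm` presumably does (its source is not shown here), and the data are made up. It illustrates that applying $Q_T$ firm by firm is the same as subtracting each firm's own time average.

```python
import numpy as np

T = 6                                     # six odd years, as in the data above
Q_T = np.eye(T) - np.full((T, T), 1 / T)  # demeaning matrix


def transform_per_firm(M, A):
    """Apply the T x T matrix M to each firm's block of T rows in A.

    Hypothetical helper: assumes rows are sorted firm by firm.
    """
    n_firms = A.shape[0] // M.shape[1]
    blocks = A.reshape(n_firms, M.shape[1], A.shape[1])  # shape (N, T, K)
    return (M @ blocks).reshape(-1, A.shape[1])


# Two made-up firms with a single variable.
a = np.arange(2 * T, dtype=float).reshape(-1, 1)
a_demeaned = transform_per_firm(Q_T, a)

# The same thing done by hand: subtract each firm's time average.
firm_means = a.reshape(2, T).mean(axis=1)
by_hand = a - np.repeat(firm_means, T).reshape(-1, 1)

print(np.allclose(a_demeaned, by_hand))
print(np.allclose(Q_T @ Q_T, Q_T))        # Q_T is idempotent
```

Because $Q_T$ is idempotent and its rows sum to zero, any time-invariant term, including the firm effect, is removed by the transformation.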

The FE estimator is the pooled OLS estimator applied to our within-transformed variables. It is defined as:

$$
\hat{\beta}_{FE} = 
\left(
    \sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{\mathbf{x}}'_{it}\ddot{\mathbf{x}}_{it}
    \right)^{-1}
\left(
    \sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{\mathbf{x}}'_{it}\ddot{y}_{it}
    \right)
$$

Stacked over $t$ and $i$, in matrix form it becomes:

$$
\hat{\beta}_{FE} = (\ddot{\mathbf{X}}'\ddot{\mathbf{X}})^{-1}\ddot{\mathbf{X}}'\ddot{\mathbf{y}}
$$
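
The matrix formula can be sketched in a few lines of NumPy. The example below uses simulated data with made-up dimensions and coefficients (none of it comes from the firm data), and the helpers `within` and `fe_ols` are illustrative names rather than functions from Project_1.py. It also shows the key property of the estimator: changing the firm effects leaves the estimate untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 6, 2                         # made-up panel dimensions

X = rng.normal(size=(N * T, K))
c = np.repeat(rng.normal(size=N), T).reshape(-1, 1)  # firm effects c_i
beta_true = np.array([[0.7], [0.2]])                 # illustrative values
y = X @ beta_true + c + 0.1 * rng.normal(size=(N * T, 1))


def within(A):
    """Subtract each firm's time average (rows sorted firm by firm)."""
    blocks = A.reshape(N, T, -1)
    return (blocks - blocks.mean(axis=1, keepdims=True)).reshape(N * T, -1)


def fe_ols(y, X):
    """OLS on within-transformed data: (X'X)^{-1} X'y."""
    Xd, yd = within(X), within(y)
    return np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)


b_fe = fe_ols(y, X)
b_fe_shifted = fe_ols(y + 5 * c, X)        # inflate the firm effects

print(b_fe.ravel())
print(np.allclose(b_fe, b_fe_shifted))     # estimate does not depend on c_i
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical-stability choice; the result is the same estimator.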

For the FE estimator to be consistent, the within-transformed regressor matrix must have full column rank (the FE.2 condition). We check this with our rank-checking function.

In [7]:
pf.check_rank(x_demean)

Rank of demeaned x: 2
Eigenvalues of within-transformed x: [28. 99.]
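
The source of `pf.check_rank` is not shown, so the following is only a guess at what such a check could look like, using `np.linalg.matrix_rank` and the eigenvalues of $\ddot{\mathbf{X}}'\ddot{\mathbf{X}}$. The made-up example shows the typical way FE.2 fails: a regressor that is constant within each firm is wiped out by the within transformation.

```python
import numpy as np

N, T = 4, 3                                    # tiny made-up panel
rng = np.random.default_rng(1)

varying = rng.normal(size=(N * T, 1))          # changes over time
constant = np.repeat([1.0, 2.0, 3.0, 4.0], T).reshape(-1, 1)  # fixed per firm
X = np.hstack([varying, constant])

# Within transformation: subtract firm-specific time averages.
blocks = X.reshape(N, T, 2)
X_dd = (blocks - blocks.mean(axis=1, keepdims=True)).reshape(N * T, 2)

rank = np.linalg.matrix_rank(X_dd)
eigvals = np.linalg.eigvalsh(X_dd.T @ X_dd)    # returned in ascending order

print(f'Rank of demeaned X: {rank}')
print(f'Eigenvalues of X\'X: {eigvals}')
```

Here the rank drops to 1 and the smallest eigenvalue is zero, so the FE estimator would not be identified. In the notebook both eigenvalues are well away from zero, which is what full rank looks like.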


Having confirmed that the demeaned regressor matrix has full rank, we can run the estimation without further adjustments.

In [8]:
fe_result = pf.estimate(y_demean, x_demean, T=T)
pf.print_table((label_y, label_x), fe_result, title="Fixed Effects", floatfmt='.4f')

Fixed Effects
Dependent variable: Log of deflated sales

                        Beta      Se    t-values
--------------------  ------  ------  ----------
Log of employment     0.7069  0.0202     35.0694
Log of capital stock  0.1424  0.0180      7.9319
R² = 0.468
σ² = 0.016


## <a id='toc3_'></a>[RE estimation of $\beta_K$ and $\beta_L$](#toc3_)

The RE estimator is based on a "quasi-demeaning" of the variables, where only a fraction $\hat{\lambda}$ of each variable's firm-specific time average is subtracted. $\hat{\lambda}$ can be estimated by: 

$$\hat{\lambda} = 1 - \sqrt{\frac{\widehat{\sigma}_{u}^{2}}{(\widehat{\sigma}_{u}^{2} + T\widehat{\sigma}_{c}^{2})}}, $$

where $\widehat{\sigma}_{u}^{2}$ can be estimated from the fixed effects regression, and $\hat{\sigma}_{c}^{2}$ can be constructed as  $\hat{\sigma}_{c}^{2} = \hat{\sigma}_{w}^{2} - \frac{1}{T}\hat{\sigma}_{u}^{2}$. Here $\hat{\sigma}_{w}^{2}$ is the error variance from the between estimator (BE), 

$$
\hat{\sigma}_{w}^{2} = \frac{1}{N-K}\left(\bar{\mathbf{y}} - \mathbf{\bar{X}}\hat{\mathbf{\beta}}_{BE}\right)^{\prime}\left(\bar{\mathbf{y}} - \mathbf{\bar{X}}\hat{\mathbf{\beta}}_{BE}\right),
$$

where $\boldsymbol{\beta}_{BE}$ are the between estimator coefficients.

Hence, three steps are required before we can estimate our model with the RE estimator:
1) Estimate the model by FE (done above)
2) Estimate the model by BE.
3) Estimate $\hat{\lambda}$.

2. ESTIMATE MODEL BY BE

This is based on the transformation matrix $P_T$:

$$ P_T=\frac{1}{T}{j_T}{j_T}' $$



In [9]:
# Transform the data
P_T = np.tile(1/T, (T, T)) 

y_mean = pf.perm(P_T, y)
x_mean = pf.perm(P_T, X)

# Estimate 
be_result = pf.estimate(y_mean, x_mean, transform='be', T=T)
pf.print_table((label_y, label_x), be_result, title="Between Estimator", floatfmt='.4f')

Between Estimator
Dependent variable: Log of deflated sales

                        Beta      Se    t-values
--------------------  ------  ------  ----------
Log of employment     0.6717  0.0141     47.7342
Log of capital stock  0.3160  0.0127     24.9226
R² = 0.922
σ² = 0.116
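
As a small illustration of how $P_T$ relates to the demeaning matrix $Q_T$ used for FE, the sketch below applies both to one made-up firm. It shows that $P_T$ replaces every observation with the firm average, and that the between and within parts add back up to the original series.

```python
import numpy as np

T = 6
P_T = np.full((T, T), 1 / T)         # averaging matrix
Q_T = np.eye(T) - P_T                # demeaning matrix

y_i = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])   # one made-up firm

between_part = P_T @ y_i             # the firm average, repeated T times
within_part = Q_T @ y_i              # deviations from the firm average

print(between_part)
print(np.allclose(between_part + within_part, y_i))  # parts sum to y_i
print(np.allclose(P_T @ Q_T, 0))                     # parts are orthogonal
```

Since the averaged rows are identical within a firm, the between regression effectively has only $N$ distinct observations, which is consistent with the $\frac{1}{N-K}$ scaling in $\hat{\sigma}_{w}^{2}$.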


3. ESTIMATE $\hat{\lambda}$.

In [10]:
# Calculate lambda (note lambda is a reserved keyword in Python, so we use _lambda instead)
sigma2_u = fe_result['sigma2']
sigma2_w = be_result['sigma2']
sigma2_c = sigma2_w - 1/T * sigma2_u
_lambda = 1 - np.sqrt(sigma2_u / (sigma2_u + T*sigma2_c))

# Print lambda 
print(f'Lambda is approximately equal to {_lambda.item():.4f}.')

Lambda is approximately equal to 0.8499.
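
To see how $\hat{\lambda}$ responds to its inputs, the formula can be wrapped in a small function. The sketch below (with `re_lambda` as an illustrative name) plugs in the rounded variances printed in the FE and BE tables, so the result is only roughly equal to the value above, which is based on unrounded variances.

```python
import numpy as np


def re_lambda(sigma2_u, sigma2_w, T):
    """Quasi-demeaning weight from the FE and BE error variances."""
    sigma2_c = sigma2_w - sigma2_u / T
    return 1 - np.sqrt(sigma2_u / (sigma2_u + T * sigma2_c))


# Rounded variances as printed in the FE and BE tables.
lam = re_lambda(sigma2_u=0.016, sigma2_w=0.116, T=6)
print(f'lambda from rounded inputs: {lam:.3f}')

# Limiting case: if the firm-level variance sigma2_c is zero, lambda is zero
# and no demeaning takes place (RE collapses to pooled OLS).
lam_pooled = re_lambda(sigma2_u=0.016, sigma2_w=0.016 / 6, T=6)
print(f'lambda without firm-level variance: {lam_pooled:.3f}')
```

The larger $T\hat{\sigma}_{c}^{2}$ is relative to $\hat{\sigma}_{u}^{2}$, the closer $\hat{\lambda}$ gets to 1, and the closer RE gets to FE.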


Now that we have an estimate of $\hat{\lambda}$, we can estimate the model with the RE estimator. This requires that we quasi-demean our dependent variable and our regressors. We do this by creating a quasi-demeaning matrix and passing it to the transformation function from Project_1.py. Here, we create the $C_T$ matrix of the form:

$$C_T = I_T - \hat{\lambda}{P_T}
    $$

We use this as input in our transformation function to obtain $(\check{y}_{it}, \check{\mathbf{x}}_{it})$, and then estimate the equation:

$$\check{y}_{it} = \mathbf{\check{x}}_{it}\boldsymbol{\beta} + \check{v}_{it},\tag{6}$$ 


In [32]:
# Transform the data
C_T = np.eye(T, T) - _lambda * P_T
y_re = pf.perm(C_T, y)
x_re = pf.perm(C_T, X)

# Estimate 
re_result = pf.estimate(y_re, x_re, transform='re', T=T)
pf.print_table((label_y, label_x), re_result, title="Random Effects Estimation", floatfmt='.4f')

Random Effects
Dependent variable: Log of deflated sales

                        Beta      Se    t-values
--------------------  ------  ------  ----------
Log of employment     0.7335  0.0182     40.3204
Log of capital stock  0.2088  0.0163     12.7955
R² = 0.704
σ² = 0.019
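
One way to read the RE results is to note that the quasi-demeaning matrix sits between two familiar cases. The sketch below (with `quasi_demean_matrix` as an illustrative helper and a made-up firm) shows that $\lambda = 0$ gives the identity matrix, i.e. pooled OLS, and $\lambda = 1$ gives the FE demeaning matrix.

```python
import numpy as np

T = 6
P_T = np.full((T, T), 1 / T)
Q_T = np.eye(T) - P_T


def quasi_demean_matrix(lam, T):
    """C_T = I_T - lam * P_T."""
    return np.eye(T) - lam * np.full((T, T), 1 / T)


C_pooled = quasi_demean_matrix(0.0, T)   # no demeaning
C_fe = quasi_demean_matrix(1.0, T)       # full demeaning
C_re = quasi_demean_matrix(0.8499, T)    # lambda as reported above

y_i = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # one made-up firm
y_check = C_re @ y_i                     # y_it - lambda * firm average

print(np.allclose(C_pooled, np.eye(T)), np.allclose(C_fe, Q_T))
print(y_check)
```

With $\hat{\lambda}$ around 0.85, most of the firm average is removed, so the RE transformation is closer to the within transformation than to no transformation.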


## <a id='toc4_'></a>[Hausman test](#toc4_)

The Hausman test helps us make our final decision on whether to go with our FE estimator or our RE estimator. The null hypothesis is that RE.1-3 as well as FE.2 hold, in which case the RE estimator is efficient.
Since we already know from our rank check that FE.2 holds, a rejection of the null points to a violation of the RE assumptions.

In [23]:
# Unpack
b_fe = fe_result['b_hat']
b_re = re_result['b_hat']
cov_fe = fe_result['cov']
cov_re = re_result['cov']

# Calculate the test statistic
pf.hausman_test(b_fe, b_re, cov_fe, cov_re)

Hausman test statistic: 78.83
The p-value is 0.00000000 (df=2)
Critical value at the 5% level: 5.9915
Critical value at the 1% level: 9.2103
Critical value at the 0.001% level: 23.0259
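
The source of `pf.hausman_test` is not shown, so the sketch below is only an assumption about its general shape: the textbook statistic $(\hat{\beta}_{FE}-\hat{\beta}_{RE})'[\widehat{Avar}(\hat{\beta}_{FE})-\widehat{Avar}(\hat{\beta}_{RE})]^{-1}(\hat{\beta}_{FE}-\hat{\beta}_{RE})$, compared with a $\chi^2_K$ distribution. The inputs are hypothetical numbers chosen for readability, not the estimates from this notebook.

```python
import numpy as np
from scipy.stats import chi2


def hausman(b_fe, b_re, cov_fe, cov_re):
    """Hausman statistic, p-value and degrees of freedom."""
    diff = b_fe - b_re
    H = (diff.T @ np.linalg.inv(cov_fe - cov_re) @ diff).item()
    df = diff.size                      # number of coefficients compared
    return H, chi2.sf(H, df), df


# Hypothetical inputs, NOT the estimates from the notebook.
b_fe = np.array([[0.70], [0.15]])
b_re = np.array([[0.72], [0.20]])
cov_fe = np.diag([4.0e-4, 3.5e-4])
cov_re = np.diag([3.0e-4, 2.5e-4])

H, p, df = hausman(b_fe, b_re, cov_fe, cov_re)
crit_5 = chi2.ppf(0.95, df)             # 5% critical value

print(f'H = {H:.2f}, p = {p:.2e}, df = {df}, 5% critical value = {crit_5:.4f}')
```

The decision rule is to reject the null when the statistic exceeds the critical value, which is how the statistic and critical values printed above can be read.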


## <a id='toc5_'></a>[Test for constant returns to scale](#toc5_)

A Cobb-Douglas production technology exhibits constant returns to scale if the parameters of the production inputs (i.e. the betas) sum to unity. Specifically, the production technology specified for this assignment exhibits constant returns to scale if $\beta_K + \beta_L = 1$. Using a Wald test, we test the hypothesis that this holds against the alternative that it does not.

$$H_0: \beta_K + \beta_L = 1$$

$$H_1: \beta_K + \beta_L \neq 1$$

The general Wald statistic is: 

$$W \equiv (R\widehat{\beta}-r)'[R\widehat{Avar(\widehat{\beta})}R']^{-1}   (R\widehat{\beta}-r)$$

where, given the above $H_0$, 

$$R=[1 \quad 1]$$

$$r=1$$

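Under $H_0$, this statistic is asymptotically chi-square distributed with $Q$ degrees of freedom, where $Q$ is the number of restrictions (the number of rows in $R$, here $Q = 1$). Hence, $H_0$ can be rejected at level $\alpha$ if $W$ exceeds the $(1-\alpha)$-quantile of $\chi_Q^2$.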
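
As a rough illustration of the formula, the sketch below evaluates the Wald statistic with the FE point estimates and standard errors from the FE table. Note the loud assumption: the table does not report the covariance between the two estimates, so it is set to zero here. The resulting number therefore will not match the statistic computed from the full covariance matrix and only serves to show the mechanics. The function `wald` is an illustrative name, not the `pf.wald_test` implementation.

```python
import numpy as np
from scipy.stats import chi2


def wald(b_hat, R, r, cov):
    """Wald statistic for H0: R b = r, with Q = number of rows in R."""
    gap = R @ b_hat - r
    W = (gap.T @ np.linalg.inv(R @ cov @ R.T) @ gap).item()
    Q = R.shape[0]
    return W, chi2.sf(W, Q), Q


# FE point estimates (employment, capital) from the FE table.
b_hat = np.array([[0.7069], [0.1424]])

# ASSUMPTION: zero covariance between the estimates (not reported in the
# table), so only the squared standard errors enter on the diagonal.
cov = np.diag([0.0202**2, 0.0180**2])

R = np.array([[1.0, 1.0]])
r = 1.0

W, p, Q = wald(b_hat, R, r, cov)
print(f'W = {W:.2f}, p = {p:.2e}, df = {Q}')
print(f'5% critical value: {chi2.ppf(0.95, Q):.4f}')
```

Even under this simplifying assumption the statistic lies far above the 5% critical value, because the estimates sum to about 0.85 with small standard errors.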


In [64]:
b_hat = fe_result['b_hat']
r = 1
R = np.array([[1,1]])
cov = fe_result['cov']

# One restriction (R has one row), so the test has 1 degree of freedom
pf.wald_test(b_hat, r, R, cov)

Wald test statistic: 69.89
The p-value is: 0.00000000 (df=1)
Critical value at the 5% level: 3.8415
Critical value at the 1% level: 6.6349
Critical value at the 0.001% level: 19.5114


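Hence, we reject $H_0$ even at the 0.001% significance level and conclude that the production technology does not exhibit constant returns to scale. The FE point estimates sum to $0.7069 + 0.1424 \approx 0.85$, which points to decreasing returns to scale.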

## <a id='toc6_'></a>[Conclusion](#toc6_)