# **Project 1: Linear Panel Data and Production Technology**

By Emma Knippel, Anna Abildskov and Oscar Nyholm

**Table of contents**
* [Setup](#toc0_)

* [Read and clean data](#toc2_) 

* [FE estimation of $\beta_K$ and $\beta_L$](#toc3_)    

* [Test for constant returns to scale](#toc4_)    

* [Conclusion](#toc5_)   

## <a id='toc0_'></a>[Setup](#toc0_)

In [22]:
import numpy as np
from numpy import linalg as la
import pandas as pd
from scipy.stats import chi2

# Suppress FutureWarnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Import our helper module (Project_1.py)
import Project_1 as pf

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2



## <a id='toc2_'></a>[Read and clean data](#toc2_)

The dataset contains `N = 441` firms observed over `T = 12` years, 1968-1979. The variables are:
* `lcap`: log of capital stock, $k_{it}$
* `lemp`: log of employment, $\ell_{it}$
* `ldsa`: log of deflated sales, $y_{it}$
* `year`: the calendar year of the observation, `year` $= 1968, ..., 1979$
* `firmid`: anonymized firm identifier, $i = 1, ..., N$, with $N = 441$

We load the data and drop all even years, keeping only the observations from the odd years 1969, 1971, ..., 1979:

In [23]:
df_all_years = pd.read_csv('firms.csv')
data = df_all_years[df_all_years.year % 2 != 0]

Counting the number of firms and the number of (odd) years in the sample:

In [24]:
N = data.firmid.unique().size
T = data.year.unique().size
print(f'Data contains data for {N} firms over {T} odd years')

Data contains data for 441 firms over 6 odd years


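Extracting the variables as stacked $NT \times 1$ numpy arrays and collecting a constant, employment and capital in the regressor matrix $X$: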

In [25]:
y = data.ldsa.values.reshape((N*T,1))
const = np.ones((N*T,1))
l = data.lemp.values.reshape((N*T,1))
k = data.lcap.values.reshape((N*T,1))
X = np.hstack([const, l, k])
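The `reshape((N*T, 1))` calls above, and the block-wise demeaning that follows, take for granted that `data` is a balanced panel ordered by firm and then by year. The notebook does not check this explicitly. A minimal sketch of such a check is shown here; `to_stacked` is a hypothetical helper, not part of Project_1.py, and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd

def to_stacked(df, col, id_col="firmid", time_col="year"):
    """Return df[col] as an (N*T, 1) array stacked firm by firm, years in order.

    Hypothetical helper: sorts by (firm, year) and refuses unbalanced panels,
    because the block-wise within transformation relies on both.
    """
    df = df.sort_values([id_col, time_col])
    n = df[id_col].nunique()
    t = df[time_col].nunique()
    years_per_firm = df.groupby(id_col)[time_col].nunique()
    if len(df) != n * t or (years_per_firm != t).any():
        raise ValueError("panel is not balanced: expected T rows per firm")
    return df[col].to_numpy().reshape((n * t, 1))

# Toy panel with the rows deliberately out of order
toy = pd.DataFrame({
    "firmid": [2, 1, 2, 1],
    "year":   [1969, 1971, 1971, 1969],
    "lemp":   [0.4, 0.2, 0.3, 0.1],
})
l_toy = to_stacked(toy, "lemp")
print(l_toy.ravel())  # [0.1 0.2 0.4 0.3]: firm 1 (1969, 1971), then firm 2
```

Sorting inside the helper makes the stacking order explicit instead of relying on the order of rows in `firms.csv`.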

Creating labels for the regression output table:

In [48]:
label_y = 'Log of deflated sales'
label_x = ['Log of employment', 'Log of capital stock']

## <a id='toc3_'></a>[FE estimation of $\beta_K$ and $\beta_L$](#toc3_)

We use the fixed effects (FE) estimator to estimate $\beta_K$ and $\beta_L$.

For FE estimation, we first demean the dependent variable and the regressors within each firm, i.e. we perform a "within transformation". We do this by creating a demeaning matrix and passing it to the transformation function `perm` from Project_1.py. The $Q_T$ demeaning matrix has the form:

$$Q_T = I_T - \frac{1}{T}\boldsymbol{\iota}_T\boldsymbol{\iota}_T'
    =
    \begin{bmatrix}
    1-\frac{1}{T} & -\frac{1}{T} & \dots & -\frac{1}{T}\\
    -\frac{1}{T} & 1-\frac{1}{T} & \dots & -\frac{1}{T}\\
    \vdots & \vdots & \ddots & \vdots\\
    -\frac{1}{T} & -\frac{1}{T} & \dots & 1-\frac{1}{T}
    \end{bmatrix}_{T \times T}
$$

where $I_T$ is the $T \times T$ identity matrix and $\boldsymbol{\iota}_T$ is a $T \times 1$ vector of ones, so that every entry of $\frac{1}{T}\boldsymbol{\iota}_T\boldsymbol{\iota}_T'$ equals $\frac{1}{T}$.
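The properties that make $Q_T$ a demeaning matrix can be checked numerically. This is a small illustrative sketch with $T = 6$ (the number of odd years in our sample) and a made-up time series `y_i`; it builds $Q_T$ exactly as the notebook does.

```python
import numpy as np

T = 6  # odd years 1969, 1971, ..., 1979

# Same construction as in the notebook: identity minus a T x T matrix of 1/T
Q_T = np.eye(T) - np.tile(1 / T, (T, T))

# Q_T is symmetric and idempotent, i.e. an orthogonal projection matrix
assert np.allclose(Q_T, Q_T.T)
assert np.allclose(Q_T @ Q_T, Q_T)

# It maps any time-constant vector (a constant term, a firm effect c_i) to zero,
# which is also why Q_T has rank T - 1 rather than T
assert np.allclose(Q_T @ np.ones(T), 0.0)
assert np.linalg.matrix_rank(Q_T) == T - 1

# Applied to one firm's time series it subtracts that firm's time average
y_i = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_i_dd = Q_T @ y_i
print(y_i_dd)  # each entry minus the mean 3.5, i.e. -2.5, -1.5, ..., 2.5
```

The zero mapping of time-constant vectors is what removes the firm fixed effect, and it also removes any constant column in the regressor matrix.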

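Applying $Q_T$ to each firm's $T$ observations subtracts the firm's time average, which gives the within-transformed variables $\ddot{y}_{it} = y_{it} - \bar{y}_i$ and $\ddot{\mathbf{x}}_{it} = \mathbf{x}_{it} - \bar{\mathbf{x}}_i$, where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ and $\bar{\mathbf{x}}_i$ is defined analogously: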

In [42]:
Q_T = np.eye(T) - np.tile(1/T, (T, T)) 

y_demean = pf.perm(Q_T, y)
x_demean = pf.perm(Q_T, X)
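The source of `pf.perm` is not shown here. Assuming it applies $Q_T$ to each firm's block of $T$ consecutive rows (equivalently, premultiplies the stacked data by $I_N \otimes Q_T$), a minimal vectorized sketch could look as follows. `within_transform` is a hypothetical name, and the data are random numbers used only to check the result against the explicit Kronecker-product formula.

```python
import numpy as np

def within_transform(Q_T, A):
    """Apply Q_T (M x T) to every firm's block of T consecutive rows of A.

    A is (N*T, K), stacked firm by firm with years in order. The result equals
    np.kron(np.eye(N), Q_T) @ A but never builds the (N*T x N*T) matrix.
    """
    M, T = Q_T.shape
    NT, K = A.shape
    if NT % T != 0:
        raise ValueError("number of rows must be a multiple of T")
    N = NT // T
    blocks = A.reshape(N, T, K)   # one (T, K) block per firm
    out = Q_T @ blocks            # matmul broadcasts Q_T over the N blocks -> (N, M, K)
    return out.reshape(N * M, K)

# Small check against the explicit Kronecker-product formula
rng = np.random.default_rng(0)
N, T, K = 4, 6, 3
Q_T = np.eye(T) - np.tile(1 / T, (T, T))
A = rng.normal(size=(N * T, K))
A[:, 0] = 1.0                     # a constant column, as in X
A_dd = within_transform(Q_T, A)

print(np.allclose(A_dd, np.kron(np.eye(N), Q_T) @ A))  # True
print(np.abs(A_dd[:, 0]).max())   # ~0: the constant is wiped out
```

Reshaping to `(N, T, K)` and relying on `matmul` broadcasting avoids both a Python loop over firms and the memory cost of the $NT \times NT$ Kronecker product.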

The FE estimator is the pooled OLS estimator applied to the within-transformed variables:

$$
\hat{\beta}_{FE} =
\left(
    \sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{\mathbf{x}}'_{it}\ddot{\mathbf{x}}_{it}
    \right)^{-1}
\left(
    \sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{\mathbf{x}}'_{it}\ddot{y}_{it}
    \right)
$$

Stacked over $t$ and $i$, in matrix form it becomes:

$$
\hat{\beta}_{FE} = (\ddot{\mathbf{X}}'\ddot{\mathbf{X}})^{-1}\ddot{\mathbf{X}}'\ddot{\mathbf{y}}
$$
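The matrix formula above maps directly to NumPy. The sketch below is not `pf.estimate` (whose source is not shown); `fe_ols` is a hypothetical name, and the panel is simulated with true coefficients 0.7 and 0.15 chosen only for illustration. For the covariance it assumes the conventional non-robust FE estimator $\hat{\sigma}^2(\ddot{\mathbf{X}}'\ddot{\mathbf{X}})^{-1}$ with $\hat{\sigma}^2 = \text{SSR}/(N(T-1)-K)$; that `pf.estimate` takes `T` for this degrees-of-freedom correction is a guess.

```python
import numpy as np

def fe_ols(y_dd, X_dd, T):
    """Pooled OLS on within-transformed data plus the usual (non-robust) FE covariance.

    y_dd: (N*T, 1) demeaned outcome; X_dd: (N*T, K) demeaned regressors with
    time-invariant columns (such as the constant) already removed.
    """
    NT, K = X_dd.shape
    N = NT // T
    XtX = X_dd.T @ X_dd
    b_hat = np.linalg.solve(XtX, X_dd.T @ y_dd)   # (X'X)^{-1} X'y without an explicit inverse
    resid = y_dd - X_dd @ b_hat
    # Demeaning uses up one degree of freedom per firm, hence N(T-1) - K
    sigma2 = float(np.sum(resid ** 2)) / (N * (T - 1) - K)
    cov = sigma2 * np.linalg.inv(XtX)
    return b_hat, cov

# Simulated panel in which the firm effect c_i is correlated with both inputs
rng = np.random.default_rng(0)
N, T = 200, 6
c = rng.normal(size=(N, 1))
l = c + rng.normal(size=(N, T))
k = 0.5 * c + rng.normal(size=(N, T))
y = 0.7 * l + 0.15 * k + c + 0.1 * rng.normal(size=(N, T))

def demean(a):
    """Subtract each firm's time average and stack to (N*T, 1), firm by firm."""
    return (a - a.mean(axis=1, keepdims=True)).reshape(-1, 1)

X_dd = np.hstack([demean(l), demean(k)])
b_hat, cov = fe_ols(demean(y), X_dd, T)
print(b_hat.ravel())              # close to the true values [0.7, 0.15]
print(np.sqrt(np.diag(cov)))      # standard errors
```

Because the fixed effect is removed by demeaning, the estimates recover the true coefficients even though `c` is correlated with `l` and `k`.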

For the FE estimator to be consistent, the within-transformed regressor matrix must (among other conditions) have full column rank; this is the rank condition FE.2. We check this with our rank-checking function:

In [45]:
pf.check_rank(x_demean)

Rank of demeaned x: 2
Eigenvalues of within-transformed x: [ 0. 99. 28.]
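`pf.check_rank` is not shown here. As a sketch (not its actual implementation), the same two diagnostics can be computed with `np.linalg.matrix_rank` and `np.linalg.eigvalsh` on simulated data whose columns are ordered like `X` (constant, $\ell$, $k$). It illustrates where a zero eigenvalue comes from: demeaning a constant column leaves a column of zeros.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 6

def demean(a):
    """Subtract each firm's time average and stack to (N*T, 1), firm by firm."""
    return (a - a.mean(axis=1, keepdims=True)).reshape(-1, 1)

const = np.ones((N, T))
l = rng.normal(size=(N, T))
k = rng.normal(size=(N, T))
X_dd = np.hstack([demean(const), demean(l), demean(k)])   # (N*T, 3), like x_demean

rank_all = np.linalg.matrix_rank(X_dd)
eigvals = np.linalg.eigvalsh(X_dd.T @ X_dd)   # X'X is symmetric; eigenvalues in ascending order
rank_regressors = np.linalg.matrix_rank(X_dd[:, 1:])

print(rank_all)         # 2: three columns, but the demeaned constant is all zeros
print(eigvals[0])       # (numerically) zero, the direction of the constant
print(rank_regressors)  # 2 = number of columns, i.e. full column rank
```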


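The demeaned matrix has three columns but only rank 2, and one of its eigenvalues is zero. This is because the constant is time-invariant, so the within transformation turns it into a column of zeros (just like the fixed effect itself). We therefore drop the constant and estimate the model on the two demeaned regressors, $\ddot{\ell}_{it}$ and $\ddot{k}_{it}$, which have rank 2 and hence full column rank: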

In [49]:
fe_result = pf.estimate(y_demean, x_demean[:,1:3], T=T)  # Exclude the constant (column 0): it is time-invariant, so the within transformation turns it into a column of zeros
pf.print_table((label_y, label_x), fe_result, title="Fixed Effects", floatfmt='.4f')

Fixed Effects
Dependent variable: Log of deflated sales

                        Beta      Se    t-values
--------------------  ------  ------  ----------
Log of employment     0.7069  0.0202     35.0694
Log of capital stock  0.1424  0.0180      7.9319
R² = 0.468
σ² = 0.016


## <a id='toc4_'></a>[Test for constant returns to scale](#toc4_)

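A Cobb-Douglas production technology exhibits constant returns to scale if and only if the coefficients on the production inputs (i.e. the betas) sum to one. To see this, write the technology as $F(K, L) = A K^{\beta_K} L^{\beta_L}$, where $A > 0$ does not depend on $K$ and $L$. Scaling both inputs by a factor $\lambda > 0$ gives

$$F(\lambda K, \lambda L) = A(\lambda K)^{\beta_K}(\lambda L)^{\beta_L} = \lambda^{\beta_K + \beta_L} A K^{\beta_K} L^{\beta_L} = \lambda^{\beta_K + \beta_L} F(K, L),$$

so output scales by exactly $\lambda$ (constant returns to scale) if and only if $\beta_K + \beta_L = 1$; returns to scale are decreasing if the sum is below one and increasing if it is above one. For the production technology in this assignment, constant returns to scale therefore means $\beta_K + \beta_L = 1$. We test this hypothesis against the two-sided alternative using a Wald test: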

$$H_0: \beta_K + \beta_L = 1$$

$$H_1: \beta_K + \beta_L \neq 1$$

The general Wald statistic is: 

$$W \equiv (R\widehat{\beta}-r)'\left[R\,\widehat{\mathrm{Avar}}(\widehat{\beta})\,R'\right]^{-1}(R\widehat{\beta}-r)$$

where, given the above $H_0$, 

$$R=[1 \quad 1]$$

$$r=1$$

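Under $H_0$, $W$ is asymptotically $\chi^2_Q$ distributed, where $Q$ is the number of restrictions (the number of rows of $R$; here $Q = 1$). Hence, we reject $H_0$ at significance level $\alpha$ if $W$ exceeds the $(1-\alpha)$-quantile of the $\chi^2_Q$ distribution.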
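`pf.wald_test` is not shown here. A minimal sketch of the test described above, assuming `cov` is the estimated covariance matrix of $\hat{\beta}$, uses `scipy.stats.chi2` for the p-value and the critical values. `wald_test_sketch` is a hypothetical name, and the inputs below are made up so that the statistic is easy to verify by hand.

```python
import numpy as np
from scipy.stats import chi2

def wald_test_sketch(b_hat, r, R, cov):
    """Wald test of H0: R b = r against H1: R b != r.

    Returns the statistic W, its asymptotic p-value and Q, the number of restrictions.
    """
    b_hat = np.asarray(b_hat, dtype=float).reshape(-1, 1)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.asarray(r, dtype=float).reshape(-1, 1)
    diff = R @ b_hat - r                                   # (Q, 1)
    W = (diff.T @ np.linalg.solve(R @ cov @ R.T, diff)).item()
    Q = R.shape[0]
    return W, chi2.sf(W, df=Q), Q

# Made-up inputs: b = (0.6, 0.3), independent estimates with variance 0.0025 each,
# so W = (0.9 - 1)^2 / 0.005 = 2
W, p_value, Q = wald_test_sketch([0.6, 0.3], 1, [[1, 1]], np.diag([0.0025, 0.0025]))
print(f"W = {W:.4f}, p-value = {p_value:.4f}, Q = {Q}")

# Reject H0 at level alpha when W exceeds the (1 - alpha)-quantile of chi2(Q)
for alpha in (0.05, 0.01, 0.00001):
    crit = chi2.ppf(1 - alpha, df=Q)
    print(f"alpha = {alpha}: critical value {crit:.4f}, reject: {W > crit}")
```

Using `np.linalg.solve` instead of inverting $R\,\widehat{\mathrm{Avar}}(\widehat{\beta})\,R'$ keeps the same code valid for $Q > 1$ restrictions.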


In [64]:
b_hat = fe_result['b_hat']
r = 1
R = np.array([[1,1]])
cov = fe_result['cov']

pf.wald_test(b_hat, r, R, cov)

Wald test statistic: 69.8893
Critical value at the 5% level: 3.8415
Critical value at the 1% level: 6.6349
Critical value at the 0.001% level: 19.5114


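Since $W = 69.89$ exceeds even the critical value at the 0.001% level (19.51), we reject $H_0$ at that level and conclude that the production technology does not exhibit constant returns to scale. The point estimates sum to $\hat{\beta}_L + \hat{\beta}_K = 0.7069 + 0.1424 = 0.8493 < 1$, which points to decreasing returns to scale.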

## <a id='toc5_'></a>[Conclusion](#toc5_)