# **Project 1: Linear Panel Data and Production Technology**

By Emma Knippel, Anna Abildskov and Oscar Nyholm

**Table of contents**
* [Setup](#toc0_)

* [Introduction](#toc1_)   

* [Read and clean data](#toc2_) 

* [FE estimation of $\beta_K$ and $\beta_L$](#toc3_)    

* [Test for constant returns to scale](#toc4_)    

* [Conclusion](#toc5_)   

## <a id='toc0_'></a>[Setup](#toc0_)

```python
import numpy as np
from numpy import linalg as la
import pandas as pd
from io import StringIO
from tabulate import tabulate
from matplotlib import pyplot as plt
from scipy.stats import chi2

# Suppress FutureWarnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Import our own module with the estimation helpers
import Project_1 as pf

# Autoreload modules when code is run
%load_ext autoreload
%autoreload 2
```



## <a id='toc1_'></a>[Introduction](#toc1_)

- intro
- intro
- intro


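We estimate the parameters of a Cobb-Douglas production function, $Y_{it} = A_{it}K_{it}^{\beta_K}L_{it}^{\beta_L}$, which in logs (denoted by lowercase letters) takes the form:
$$\begin{aligned}
y_{it} = \beta_K k_{it}+\beta_L \ell_{it}+v_{it},
\end{aligned}$$
where $v_{it} = \ln A_{it}$.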
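The log-linearization can be checked numerically. With arbitrary positive inputs and illustrative parameter values (hypothetical, not the estimates from this notebook), the log of Cobb-Douglas output equals $\beta_K k + \beta_L \ell + \ln A$ exactly:

```python
import numpy as np

# Illustrative (hypothetical) parameter values, not estimates from the data
beta_K, beta_L = 0.3, 0.6

rng = np.random.default_rng(0)
K = rng.uniform(1.0, 100.0, size=5)   # capital, in levels
L = rng.uniform(1.0, 100.0, size=5)   # labour, in levels
A = rng.uniform(0.5, 2.0, size=5)     # total factor productivity

# Cobb-Douglas output in levels
Y = A * K**beta_K * L**beta_L

# Log-linear form: y = beta_K * k + beta_L * l + v, with v = ln A
y = np.log(Y)
rhs = beta_K * np.log(K) + beta_L * np.log(L) + np.log(A)

print(np.allclose(y, rhs))
```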

## <a id='toc2_'></a>[Read and clean data](#toc2_)

The dataset contains `N = 441` firms observed over `T = 12` years, from 1968 to 1979. The variables are:
* `lcap`: log of capital stock, $k_{it}$
* `lemp`: log of employment, $\ell_{it}$
* `ldsa`: log of deflated sales, $y_{it}$
* `year`: the calendar year of the observation, `year` $= 1968, \dots, 1979$
* `firmid`: anonymized identifier for the firm, $i = 1, \dots, N$, with $N=441$

Loading the data and dropping all even years, leaving only the odd-year observations:

```python
df_all_years = pd.read_csv('firms.csv')
# Keep only odd years (1969, 1971, ..., 1979)
data = df_all_years[df_all_years.year % 2 != 0]
```

Counting the number of firms and odd years in the sample:

```python
N = data.firmid.unique().size
T = data.year.unique().size
print(f'Data contains data for {N} firms over {T} odd years')
```

```
Data contains data for 441 firms over 6 odd years
```


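Extracting the data to NumPy arrays:

```python
# Assumes the rows of `data` are sorted by firmid and then year and that the panel
# is balanced, so each firm's T observations occupy consecutive rows
y = data.ldsa.values.reshape((N*T, 1))
const = np.ones((N*T, 1))
l = data.lemp.values.reshape((N*T, 1))
k = data.lcap.values.reshape((N*T, 1))
X = np.hstack([const, l, k])   # columns: constant, lemp, lcap
```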
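The reshape above relies on firm-major ordering and a balanced panel; whether `firms.csv` is already stored that way is not shown here. A minimal sketch of how to enforce and check this, on a made-up toy panel that reuses the column names `firmid`, `year` and `ldsa` (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy long-format panel with made-up values; rows are deliberately shuffled
toy = pd.DataFrame({
    'firmid': [2, 1, 2, 1, 1, 2],
    'year':   [1969, 1971, 1971, 1969, 1973, 1973],
    'ldsa':   [2.0, 1.1, 2.1, 1.0, 1.2, 2.2],
})

# Sort so each firm's observations are stacked consecutively, in year order
toy = toy.sort_values(['firmid', 'year']).reset_index(drop=True)

N = toy.firmid.nunique()
T = toy.year.nunique()

# Balanced panel: every firm must appear exactly T times
balanced = (toy.groupby('firmid').size() == T).all()
assert balanced, 'reshape to (N*T, 1) requires a balanced panel'

y = toy.ldsa.to_numpy().reshape((N * T, 1))

# Viewing y as an (N, T) matrix: row i holds firm i's outcomes over time
print(y.reshape((N, T)))
```

If the assertion fails, a reshape to `(N*T, 1)` would silently mix observations from different firms, so checking first is cheap insurance.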

Creating label vectors:

```python
label_y = 'ldsa'
label_x = ['lemp', 'lcap']
```

## <a id='toc3_'></a>[FE estimation of $\beta_K$ and $\beta_L$](#toc3_)

When deciding which estimator to use for analysis of panel data (POLS, RE, FE, FD), the following aspects are relevant to consider:

- xxx

Based on this, it seems reasonable to proceed with the FE estimator.

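The FE estimator assumes that the unobserved term can be decomposed as $v_{it} = c_i + u_{it}$, where $c_i$ is a time-invariant firm effect that may be correlated with $k_{it}$ and $\ell_{it}$. It removes $c_i$ by demeaning each variable within firm. For firm $i$, the $T$ observations are stacked into $y_i$ and $X_i$ and pre-multiplied by the demeaning matrix
$$Q_T = I_T - \frac{1}{T}\iota_T\iota_T',$$
where $\iota_T$ is a $T\times 1$ vector of ones. This gives $\ddot{y}_i = Q_T y_i$ and $\ddot{X}_i = Q_T X_i$, and since $Q_T\iota_T = 0$, the firm effect drops out. The FE estimator is then POLS on the demeaned data:
$$\hat{\beta}_{FE} = \left(\sum_{i=1}^N \ddot{X}_i'\ddot{X}_i\right)^{-1}\sum_{i=1}^N \ddot{X}_i'\ddot{y}_i.$$
Consistency requires strict exogeneity of the inputs conditional on the firm effect, $E(u_{it}\mid x_{i1},\dots,x_{iT}, c_i) = 0$.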
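As a small check of the within transformation, the sketch below builds $Q_T = I_T - \frac{1}{T}\iota_T\iota_T'$ for a toy panel with $N=2$ firms and $T=3$ periods (made-up numbers) and applies it block by block with `np.kron`. The helper `pf.perm` from `Project_1` is not shown in this notebook, so the Kronecker-product version is an assumption about what it does, not its actual implementation:

```python
import numpy as np

N, T = 2, 3
iota = np.ones((T, 1))

# Within (demeaning) matrix: Q_T = I_T - (1/T) * iota iota'
Q_T = np.eye(T) - iota @ iota.T / T

# Q_T is symmetric and idempotent, and it maps a constant to zero
assert np.allclose(Q_T, Q_T.T)
assert np.allclose(Q_T @ Q_T, Q_T)
assert np.allclose(Q_T @ iota, 0.0)

# Stacked toy data (N*T x 1): firm 1's three periods first, then firm 2's
x = np.array([[1.0], [2.0], [6.0], [10.0], [11.0], [15.0]])

# Apply Q_T to every firm's T-block at once via a block-diagonal Kronecker product
x_within = np.kron(np.eye(N), Q_T) @ x

# Same result as subtracting each firm's own time average
x_blocks = x.reshape((N, T))
x_manual = (x_blocks - x_blocks.mean(axis=1, keepdims=True)).reshape((N * T, 1))
assert np.allclose(x_within, x_manual)

print(x_within.ravel())
```

The `Q_T @ iota` check is also why the constant column cannot be kept in the FE regression: after demeaning it is identically zero. For large $N$, the reshape-and-subtract route avoids building the $NT \times NT$ Kronecker matrix.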

```python
# Within (demeaning) matrix: Q_T = I_T - (1/T) * ones((T, T))
Q_T = np.eye(T) - np.tile(1/T, (T, T))

# Apply Q_T to each firm's block of T observations
y_demean = pf.perm(Q_T, y)
x_demean = pf.perm(Q_T, X)

# The constant is excluded: demeaning turns it into a column of zeros (Q_T maps a vector
# of ones to zero), and any constant level is absorbed by the firm fixed effects
fe_result = pf.estimate(y_demean, x_demean[:, 1:3], T=T)
pf.print_table((label_y, label_x), fe_result, title="Fixed Effects", floatfmt='.4f')
```

```
Fixed Effects
Dependent variable: ldsa

        Beta      Se    t-values
----  ------  ------  ----------
lemp  0.7069  0.0202     35.0694
lcap  0.1424  0.0180      7.9319
R² = 0.468
σ² = 0.016
```
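`pf.estimate` lives in `Project_1`, which is not reproduced here. The following is a minimal, hypothetical stand-in (`fe_ols`) for POLS on demeaned data, run on simulated data where the firm effects are correlated with both inputs. It assumes the usual FE degrees-of-freedom correction, $N(T-1)-K$, for $\hat\sigma^2$; whether `pf.estimate` applies the same correction cannot be verified from this notebook.

```python
import numpy as np

def fe_ols(y_dot, x_dot, N, T):
    """Hypothetical helper: POLS on within-transformed (demeaned) data.

    Uses N*(T-1) - K degrees of freedom for sigma^2, because demeaning
    uses up one observation per firm.
    """
    K = x_dot.shape[1]
    xx_inv = np.linalg.inv(x_dot.T @ x_dot)
    b_hat = xx_inv @ x_dot.T @ y_dot
    resid = y_dot - x_dot @ b_hat
    sigma2 = (resid.T @ resid).item() / (N * (T - 1) - K)
    cov = sigma2 * xx_inv
    return b_hat, sigma2, cov

# Simulated balanced panel (illustrative values), stacked firm by firm
rng = np.random.default_rng(1)
N, T = 500, 6
beta_true = np.array([[0.7], [0.15]])
c = rng.normal(size=(N, 1))                            # firm effects
l = (c + rng.normal(size=(N, T))).reshape((N * T, 1))  # correlated with c
k = (c + rng.normal(size=(N, T))).reshape((N * T, 1))  # correlated with c
X = np.hstack([l, k])
u = rng.normal(scale=0.1, size=(N * T, 1))
y = X @ beta_true + np.repeat(c, T, axis=0) + u

def demean(a):
    """Subtract each firm's time average from its T rows."""
    blocks = a.reshape((N, T, -1))
    return (blocks - blocks.mean(axis=1, keepdims=True)).reshape((N * T, -1))

b_hat, sigma2, cov = fe_ols(demean(y), demean(X), N, T)
print(b_hat.ravel(), np.sqrt(np.diag(cov)))
```

Because the firm effects are removed before estimation, the estimates should land close to the simulated values 0.7 and 0.15 even though `c` is correlated with both regressors.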


## <a id='toc4_'></a>[Test for constant returns to scale](#toc4_)

A Cobb-Douglas production technology $Y_{it} = A_{it}K_{it}^{\beta_K}L_{it}^{\beta_L}$ exhibits constant returns to scale if and only if the input elasticities (the betas) sum to one. Scaling both inputs by a factor $\lambda > 0$ gives
$$A_{it}(\lambda K_{it})^{\beta_K}(\lambda L_{it})^{\beta_L} = \lambda^{\beta_K+\beta_L}A_{it}K_{it}^{\beta_K}L_{it}^{\beta_L},$$
so output scales by exactly $\lambda$ only when $\beta_K + \beta_L = 1$. Using a Wald test, we therefore test the null hypothesis of constant returns to scale against the alternative that returns to scale are not constant:

$H_0: \beta_K + \beta_L = 1$

$H_1: \beta_K + \beta_L \neq 1$
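The scaling property behind $H_0$ can also be checked numerically. The helper `cobb_douglas` below is a hypothetical illustration: doubling both inputs doubles output when the betas sum to one (here 0.3 + 0.7), but raises output only by the factor $2^{0.8493}$ at the FE point estimates reported above.

```python
import numpy as np

def cobb_douglas(K, L, beta_K, beta_L, A=1.0):
    """Hypothetical helper: Cobb-Douglas output A * K^beta_K * L^beta_L."""
    return A * K**beta_K * L**beta_L

K, L, lam = 40.0, 25.0, 2.0   # arbitrary inputs and scaling factor

results = {}
for beta_K, beta_L in [(0.3, 0.7), (0.1424, 0.7069)]:
    ratio = cobb_douglas(lam * K, lam * L, beta_K, beta_L) / cobb_douglas(K, L, beta_K, beta_L)
    results[(beta_K, beta_L)] = ratio
    # ratio equals lam ** (beta_K + beta_L); it equals lam only if the betas sum to one
    print(f'sum = {beta_K + beta_L:.4f}, ratio = {ratio:.4f}, lam**sum = {lam ** (beta_K + beta_L):.4f}')
```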

The general Wald statistic is:

$$W := (R\hat{\beta}-r)'\left[R\,\widehat{\operatorname{Avar}}(\hat{\beta})\,R'\right]^{-1}(R\hat{\beta}-r),$$

where, under the above $H_0$,

$R=[1 \quad 1]$

$r=1$

Under $H_0$, $W$ is asymptotically $\chi^2_Q$ distributed, where $Q$ is the number of restrictions (the number of rows of $R$), so here $Q=1$. Hence, $H_0$ is rejected at significance level $\alpha$ if $W$ exceeds the $(1-\alpha)$-quantile of the $\chi^2_Q$ distribution.
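A generic version of this test for any number of linear restrictions can be sketched as follows. `wald_test` is a hypothetical helper, not part of `Project_1`; it uses `np.linalg.inv` for the bracketed matrix and `scipy.stats.chi2.sf` for the p-value, and it is exercised on toy numbers rather than on the estimates from this notebook.

```python
import numpy as np
from scipy.stats import chi2

def wald_test(b_hat, cov, R, r):
    """Hypothetical helper: Wald test of H0: R b = r. Returns (W, Q, p-value)."""
    R = np.atleast_2d(R)
    b_hat = np.reshape(b_hat, (-1, 1))
    r = np.reshape(r, (-1, 1))
    diff = R @ b_hat - r
    W = (diff.T @ np.linalg.inv(R @ cov @ R.T) @ diff).item()
    Q = R.shape[0]                 # number of restrictions
    p_value = chi2.sf(W, df=Q)     # P(chi2_Q > W)
    return W, Q, p_value

# Toy inputs: estimates (0.7, 0.1) with a diagonal covariance matrix
b = np.array([0.7, 0.1])
V = np.diag([0.01, 0.01])
W, Q, p = wald_test(b, V, R=np.array([[1.0, 1.0]]), r=1.0)
print(W, Q, p)
```

For the toy inputs, $R\hat\beta - r = -0.2$ and $R\,\hat V R' = 0.02$, so $W = 0.04/0.02 = 2$, which is below the 5% critical value of 3.8415.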


```python
R = np.array([[1, 1]])   # restriction: beta_L + beta_K
r = 1                    # value under H0
b_hat = fe_result['b_hat']
cov = fe_result['cov']

# W = (R b - r)' [R cov R']^{-1} (R b - r); matrix products keep this valid for several restrictions
W = (R @ b_hat - r).T @ la.inv(R @ cov @ R.T) @ (R @ b_hat - r)

print(f'Wald test statistic: {W[0,0]:.4f}')

# Critical values from the chi-square distribution with Q = 1 degree of freedom
chi_2_05 = chi2.ppf(1 - 0.05, df=1)
chi_2_01 = chi2.ppf(1 - 0.01, df=1)
chi_2_00001 = chi2.ppf(1 - 0.00001, df=1)
print(f'Critical value at the 5% level: {chi_2_05:.4f}')
print(f'Critical value at the 1% level: {chi_2_01:.4f}')
print(f'Critical value at the 0.001% level: {chi_2_00001:.4f}')
```


```
Wald test statistic: 69.8893
Critical value at the 5% level: 3.8415
Critical value at the 1% level: 6.6349
Critical value at the 0.001% level: 19.5114
```


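Since $W = 69.89$ exceeds even the critical value at the 0.001% level (19.51), we reject $H_0$ at a significance level of 0.001% and conclude that the production technology does not exhibit constant returns to scale. The point estimate $\hat\beta_K + \hat\beta_L = 0.1424 + 0.7069 = 0.8493 < 1$ points towards decreasing returns to scale.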
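As a complement to comparing $W$ with critical values, the sketch below converts the reported statistic into a p-value with `scipy.stats.chi2.sf` and adds up the two FE point estimates. The numbers are copied from the rounded output above, so the results are approximate.

```python
from scipy.stats import chi2

# Values copied from the notebook output above (rounded to four decimals)
W_reported = 69.8893
beta_L_hat, beta_K_hat = 0.7069, 0.1424

# Upper-tail probability of a chi-square with Q = 1 degree of freedom
p_value = chi2.sf(W_reported, df=1)
returns_to_scale = beta_L_hat + beta_K_hat

print(f'p-value: {p_value:.2e}')
print(f'beta_K + beta_L = {returns_to_scale:.4f}')
```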

## <a id='toc5_'></a>[Conclusion](#toc5_)