# Production Technology

The dataset contains `N = 441` firms observed over `T = 12` years, 1968-1979. The variables are:
* `lcap`: log of capital stock, $k_{it}$
* `lemp`: log of employment, $\ell_{it}$
* `ldsa`: log of deflated sales, $y_{it}$
* `year`: the calendar year of the observation, `year` $= 1968, \dots, 1979$
* `firmid`: anonymized firm identifier, $i = 1, \dots, N$, with $N = 441$

In [14]:
%load_ext autoreload
%autoreload 2
import pandas as pd 
import numpy as np
import seaborn as sns

import linear_panel_class as lm 
# content in lm is approximately the same as in the exercises
# only major change is that 'robust' is added as argument in estimate()
# to compute robust standard errors.



# Converting data to numpy format 

In [15]:
# load data and extract odd years 
dat = pd.read_csv('firms.csv')
df = dat[dat['year'] % 2 != 0]
df.year.unique()


array([1969, 1971, 1973, 1975, 1977, 1979])

In [16]:
# define N & T in data 
N = df.firmid.unique().size
T = df.year.unique().size
assert df.shape[0] == N*T, 'Error: data is not a balanced panel'
print(f'Data has N={N} and T={T}')

Data has N=441 and T=6


Extract data from `pandas` to `numpy` arrays. 

In [17]:
y = df.ldsa.values.reshape((N*T,1))
y_label = ['ldsa']

ones = np.ones((N*T,1))
l = df.lemp.values.reshape((N*T,1))
k = df.lcap.values.reshape((N*T,1))
x = np.hstack([ones, l, k])
x_label = ['intercept', 'lemp', 'lcap']

# Estimate FE and FD

In [18]:
### print FE estimation
# create a demeaning matrix
Q_T = lm.demeaning_matrix(T)
# transform the data
y_dot = lm.perm(Q_T, y)
x_dot = lm.perm(Q_T, x)

# remove columns with only zeros
x_dot, x_dot_label = lm.remove_zero_columns(x_dot, x_label)

# estimate 
fe_result = lm.estimate(y_dot, x_dot, transform='fe', T=T, robust=True)

# print 
lm.print_table((y_label, x_dot_label), fe_result, title="Fixed Effects", floatfmt='.4f')

---------------------------------------------
Fixed Effects
Dependent variable: ['ldsa']

        Beta      Se    t-values    p-values
----  ------  ------  ----------  ----------
lemp  0.7069  0.0282     25.0420      0.0000
lcap  0.1424  0.0224      6.3621      0.0000
R² = 0.468
σ² = 0.019
Robust standard errors: True
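
The within transformation used above can be illustrated with a minimal sketch. The source of `linear_panel_class` is not shown here, so `demeaning_matrix_sketch` and `transform_by_unit` are hypothetical stand-ins for `lm.demeaning_matrix` and `lm.perm`, written under the assumption that the data are stacked firm by firm with `T` consecutive rows per firm.

```python
import numpy as np

def demeaning_matrix_sketch(T):
    """Return Q_T = I_T - (1/T) * ones(T, T), the within-transformation matrix."""
    return np.eye(T) - np.ones((T, T)) / T

def transform_by_unit(A, z):
    """Apply the (M x T) matrix A to each unit's block of T rows in z.

    z is (N*T x K), stacked unit by unit; the result is (N*M x K).
    """
    M, T = A.shape
    N = z.shape[0] // T
    blocks = z.reshape(N, T, -1)                # (N, T, K)
    out = np.einsum('mt,ntk->nmk', A, blocks)   # A @ block for every unit
    return out.reshape(N * M, -1)

# Toy panel (hypothetical numbers): N = 2 units, T = 3 periods, one variable
z = np.array([[1.0], [2.0], [3.0], [10.0], [20.0], [60.0]])
z_dot = transform_by_unit(demeaning_matrix_sketch(3), z)
print(z_dot.ravel())  # each unit's deviations from its own time mean
```

Because every unit's mean is subtracted, any column that is constant within a unit (such as the intercept) becomes zero, which is why the zero columns are removed before estimation.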


In [19]:
### print FD estimation
# Create transformation matrix
D_T = lm.fd_matrix(T) # (T-1)xT matrix

# transform the data
y_diff = lm.perm(D_T,y)
x_diff = lm.perm(D_T,x)

# remove columns with only zeros
x_diff, x_diff_label = lm.remove_zero_columns(x_diff, x_label)

# estimate 
fd_result = lm.estimate(y_diff, x_diff, transform = 'fd', robust=True)

# print
lm.print_table((y_label, x_diff_label), fd_result, title="First Difference", floatfmt='.4f')

---------------------------------------------
First Difference
Dependent variable: ['ldsa']

        Beta      Se    t-values    p-values
----  ------  ------  ----------  ----------
lemp  0.7253  0.0338     21.4418      0.0000
lcap  0.0547  0.0268      2.0403      0.0413
R² = 0.313
σ² = 0.022
Robust standard errors: True
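
For comparison, the first-difference transformation can be sketched in the same way. `fd_matrix_sketch` is a hypothetical stand-in for `lm.fd_matrix`, assuming it returns the `(T-1) x T` matrix that maps a unit's `T` observations into `T-1` consecutive differences; the toy numbers are made up.

```python
import numpy as np

def fd_matrix_sketch(T):
    """Return the (T-1) x T matrix D_T with rows e_{t+1} - e_t."""
    eye = np.eye(T)
    return eye[1:] - eye[:-1]

def difference_by_unit(z, T):
    """First-difference z (N*T x K, stacked unit by unit) within each unit."""
    N = z.shape[0] // T
    blocks = z.reshape(N, T, -1)            # (N, T, K)
    diffs = fd_matrix_sketch(T) @ blocks    # matmul broadcasts over units
    return diffs.reshape(N * (T - 1), -1)

# Toy panel: N = 2 units, T = 3 periods
z = np.array([[1.0], [2.0], [4.0], [10.0], [20.0], [60.0]])
z_diff = difference_by_unit(z, T=3)
ones_diff = difference_by_unit(np.ones((6, 1)), T=3)
print(z_diff.ravel())     # differences never cross the boundary between units
print(ones_diff.ravel())  # the intercept column differences to zero
```

Each unit loses one observation, so the differenced sample has `N*(T-1)` rows.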


# Check for autocorrelation in FE- and FD-residuals 
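If the idiosyncratic errors $u_{it}$ are serially uncorrelated (the null hypothesis), the time-demeaned errors are still negatively correlated, and the coefficient in a regression of $\hat{u}_{it}$ on $\hat{u}_{i,t-1}$ equals $-\frac{1}{T-1}$ (Wooldridge, p. 310-311). With $T = 6$ this is $-0.2$. For the first-differenced errors, serially uncorrelated $u_{it}$ imply a first-order autocorrelation of $-0.5$, whereas a coefficient of zero corresponds to serially uncorrelated $\Delta u_{it}$ (a random walk in $u_{it}$).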
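
The benchmark values can be checked numerically. The sketch below assumes homoskedastic, serially uncorrelated errors with variance $\sigma^2$, so that the transformed errors have covariance $\sigma^2 A A'$ for a transformation matrix $A$; the matrices are built directly with `numpy` rather than taken from `lm`.

```python
import numpy as np

T = 6  # number of periods in the odd-year subsample

# Within transformation: Var(Q_T u) = sigma^2 * Q_T Q_T'
Q = np.eye(T) - np.ones((T, T)) / T
V_fe = Q @ Q.T
rho_fe = V_fe[0, 1] / V_fe[0, 0]   # correlation between adjacent demeaned errors

# First differences: Var(D_T u) = sigma^2 * D_T D_T'
D = np.eye(T)[1:] - np.eye(T)[:-1]
V_fd = D @ D.T
rho_fd = V_fd[0, 1] / V_fd[0, 0]   # correlation between adjacent differenced errors

print(f'FE benchmark: {rho_fe:.4f}, -1/(T-1) = {-1 / (T - 1):.4f}')
print(f'FD benchmark: {rho_fd:.4f}')
```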

In [20]:
### check for autocorrelation in FE-residuals (Wooldridge p. 275)
FE_u_hat = fe_result.get('u_hat') 
# serial correlation regression
corr_result = lm.serial_corr(FE_u_hat, T, robust=True)
# print - H0: the idiosyncratic errors are serially uncorrelated,
# i.e. the coefficient on the lagged time-demeaned residual equals -1/(T-1)
lm.print_table(([r'$\hat{u}$', r'$\hat{u}_{t-1}$'], ['']), corr_result, title="Serial Correlation in FE", floatfmt='.4f')

---------------------------------------------
Serial Correlation in FE
Dependent variable: ['$\\hat{u}$', '$\\hat{u}_{t-1}$']

      Beta      Se    t-values    p-values
--  ------  ------  ----------  ----------
    0.2117  0.0318      6.6623      0.0000
R² = 0.044
σ² = 0.014
Robust standard errors: True


In [21]:
### check for autocorrelation in FD-residuals (Wooldridge p. 275)
fd_u_hat = fd_result.get('u_hat')
# serial correlation regression
corr_result = lm.serial_corr(fd_u_hat, T, robust=True)
# print - H0: the first-differenced errors are serially uncorrelated (coefficient = 0)
lm.print_table(([r'$\hat{u}$', r'$\hat{u}_{t-1}$'], ['']), corr_result, title="Serial Correlation in FD", floatfmt='.4f')

# NULL: the first-differenced error term is serially uncorrelated
# A significant coefficient warrants computing the robust variance matrix for the FD estimator
# -> we reject the null, which suggests that the differenced errors are serially correlated.
#    Thus, we should use robust standard errors (robust=True)


---------------------------------------------
Serial Correlation in FD
Dependent variable: ['$\\hat{u}$', '$\\hat{u}_{t-1}$']

       Beta      Se    t-values    p-values
--  -------  ------  ----------  ----------
    -0.2208  0.0309     -7.1452      0.0000
R² = 0.047
σ² = 0.022
Robust standard errors: True
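
The t-values printed in the two serial correlation tables test each coefficient against zero. A rough sketch of the t-statistics against the non-zero benchmarks, $-\frac{1}{T-1}$ for FE and $-0.5$ for FD, is given below. It simply plugs in the estimates and robust standard errors reported above; `t_stat` is an illustrative helper, not part of `lm`.

```python
T = 6  # number of periods in the odd-year subsample

# Estimates and robust standard errors copied from the printed tables
rho_fe, se_fe = 0.2117, 0.0318
rho_fd, se_fd = -0.2208, 0.0309

def t_stat(estimate, se, null_value):
    """t-statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / se

t_fe = t_stat(rho_fe, se_fe, -1 / (T - 1))   # H0: u_it serially uncorrelated (FE)
t_fd = t_stat(rho_fd, se_fd, -0.5)           # H0: u_it serially uncorrelated (FD)

print(f't (FE, H0: rho = -1/(T-1)) = {t_fe:.2f}')
print(f't (FD, H0: rho = -0.5)     = {t_fd:.2f}')
```

Both statistics are far above 1.96 in absolute value, so serially uncorrelated $u_{it}$ is rejected under either transformation, which supports reporting robust standard errors.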


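# Exogeneity test for FE
It is unclear whether the data should be transformed before this test: the call below passes the untransformed `y` together with the demeaned `x_dot` and sets `with_in_transformation=False`. The test checks whether the coefficient on the lead of a regressor is different from zero.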

In [22]:
# Step 7: Estimate FE adding different leads of x to the regressors
fe_x_label = x_label[1:]
for i, var in enumerate(fe_x_label):
    test_ = lm.strict_exogeneity_test(y, x_dot, i, N=N, T=T, with_in_transformation=False, robust=True)
    x_test_labels = fe_x_label + [f'{var}_lead']
    lm.print_table((y_label,x_test_labels), test_, title='Exogeneity test', floatfmt='.4f')
    

---------------------------------------------
Exogeneity test
Dependent variable: ['ldsa']

              Beta      Se    t-values    p-values
---------  -------  ------  ----------  ----------
lemp        1.3479  0.2506      5.3782      0.0000
lcap       -0.0279  0.1890     -0.1477      0.8826
lemp_lead  -0.4595  0.2203     -2.0855      0.0370
R² = 0.017
σ² = 1.869
Robust standard errors: True
---------------------------------------------
Exogeneity test
Dependent variable: ['ldsa']

              Beta      Se    t-values    p-values
---------  -------  ------  ----------  ----------
lemp        1.2637  0.2307      5.4786      0.0000
lcap        0.1445  0.2044      0.7069      0.4797
lcap_lead  -0.4754  0.2161     -2.2003      0.0278
R² = 0.017
σ² = 1.868
Robust standard errors: True
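
How the lead regressor is built is not visible in the notebook, since `lm.strict_exogeneity_test` is not shown. The following is a minimal sketch of one way to construct it, assuming data stacked firm by firm: the lead $x_{i,t+1}$ only exists for $t = 1, \dots, T-1$, so each firm's last period is dropped. `add_lead` is a hypothetical helper and the toy numbers are made up.

```python
import numpy as np

def add_lead(x, N, T, col):
    """Append the one-period lead of column `col`, dropping each unit's last period.

    x is (N*T x K), stacked unit by unit. Returns an (N*(T-1) x (K+1)) array.
    """
    blocks = x.reshape(N, T, -1)               # (N, T, K)
    current = blocks[:, :-1, :]                # periods 1..T-1
    lead = blocks[:, 1:, col:col + 1]          # periods 2..T of the chosen column
    stacked = np.concatenate([current, lead], axis=2)
    return stacked.reshape(N * (T - 1), -1)

# Toy panel: N = 2 units, T = 3 periods, K = 2 regressors
x = np.array([[1., 10.], [2., 20.], [3., 30.],
              [4., 40.], [5., 50.], [6., 60.]])
x_lead = add_lead(x, N=2, T=3, col=0)
print(x_lead)
```

The dependent variable has to be trimmed to the same `T-1` periods per firm, and any within transformation would then use `T-1` rather than `T` periods.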


Both lead coefficients are significantly different from zero at the 5% level (p-values 0.0370 and 0.0278), so strict exogeneity (FE.1) is rejected. Let's state this in the text and then proceed as if it holds.

## Hypothesis testing

We test the null hypothesis that the coefficients sum to 1, i.e. constant returns to scale:
$$
H_0 : \:\:\: \: R\beta = r
$$
where $R = [1, 1]$ and $r = 1$, which corresponds to $\beta_K+\beta_L = 1$.

The Wald statistic is given by:
$$
\begin{align*}
W = (R\hat{\beta} - r)'\left[R\,\widehat{\mathrm{Avar}}(\hat{\beta})\, R'\right]^{-1}(R\hat{\beta} - r)
\end{align*}
$$
Under $H_0$, $W$ is asymptotically $\chi^2_Q$ distributed, where $Q$ is the number of restrictions (here $Q = 1$).
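
The formula can be sketched for the single-restriction case. This is not the implementation of `lm.wald_test`, which is not shown; `wald_single_restriction` is a hypothetical helper, and the coefficient vector and covariance matrix below are made-up numbers, since the estimated covariance matrix is not printed in the notebook. For $Q = 1$ the $\chi^2_1$ tail probability can be written with the complementary error function, $P(\chi^2_1 > W) = \mathrm{erfc}(\sqrt{W/2})$.

```python
import math
import numpy as np

def wald_single_restriction(b_hat, cov, R, r):
    """Wald statistic and p-value for one linear restriction R b = r (Q = 1)."""
    b_hat = np.asarray(b_hat, dtype=float).reshape(-1, 1)
    cov = np.asarray(cov, dtype=float)
    R = np.asarray(R, dtype=float).reshape(1, -1)
    diff = R @ b_hat - r                         # (1 x 1): R b - r
    var = R @ cov @ R.T                          # (1 x 1): R Avar(b) R'
    W = (diff.T @ np.linalg.inv(var) @ diff).item()
    p_value = math.erfc(math.sqrt(W / 2))        # chi-square(1) upper tail
    return W, p_value

# Hypothetical inputs for illustration only
b_toy = [0.6, 0.3]
cov_toy = np.diag([0.01, 0.01])
W, p = wald_single_restriction(b_toy, cov_toy, R=[1, 1], r=1.0)
print(f'W = {W:.3f}, p-value = {p:.4f}')
```

With more than one restriction the p-value would instead come from a general $\chi^2_Q$ distribution function.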

In [23]:
# Define the parameters for the hypothesis test
b_hat = fe_result.get('b_hat')
cov_mat = fe_result.get('cov')

R = np.array(([1,1])).reshape(1,-1)
r = np.array([1]).reshape(1,-1)

# Perform the Wald test
wald_stat, p_value = lm.wald_test(b_hat=b_hat, cov_mat=cov_mat, 
                                  R=R, r=r, # H0: Rb = r
                                  verbose=1 # if False, it will not print the results
                                  )
p_value

On a 5% significance level, we reject the null hypothesis.


9.871962136642765e-10