# Production Technology

The dataset contains `N = 441` firms observed over `T = 12` years, 1968-1979. The variables are:
* `lcap`: log of capital stock, $k_{it}$
* `lemp`: log of employment, $\ell_{it}$
* `ldsa`: log of deflated sales, $y_{it}$
* `year`: the calendar year of the observation, $1968, \dots, 1979$
* `firmid`: anonymized firm identifier, $i = 1, \dots, N$, with $N = 441$
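
Given the variable definitions above and the constant-returns test later in the notebook, the regressions presumably estimate a log-linear (Cobb-Douglas) production function with a firm-specific effect. The symbols $c_i$ and $u_{it}$ are notation assumed here for reference, not taken from the source:

$$
y_{it} = \beta_K k_{it} + \beta_L \ell_{it} + c_i + u_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T.
$$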

In [1]:
import pandas as pd 
import numpy as np
import seaborn as sns
from scipy.stats import chi2
from numpy import linalg as la
import scipy.stats as st
import importlib.util, sys


In [2]:
# --- loader ---
def load_firm_data(path="firms.csv"):
    dat = pd.read_csv(path)
    y = dat["ldsa"].values.reshape(-1, 1)
    x = np.column_stack([np.ones(dat.shape[0]), dat["lcap"].values, dat["lemp"].values])
    T = dat["year"].nunique()  # 12
    label_y = "Log deflated sales"
    label_x = ["Constant", "Capital", "Labor"]
    return y, x, T, dat["year"].values, label_y, label_x



In [3]:
import w3_LinearModels as lm

# Load firm data
y, x, T, year, label_y, label_x = load_firm_data()

# Pooled OLS estimation
pols_result = lm.estimate(y, x, T=T)

# Print results in a nice table
lm.print_table((label_y, label_x), pols_result, title="Pooled OLS", floatfmt='.4f')


Pooled OLS
Dependent variable: Log deflated sales

            Beta      Se    t-values
--------  ------  ------  ----------
Constant  0.0000  0.0050      0.0000
Capital   0.3100  0.0091     33.9237
Labor     0.6748  0.0102     66.4625
R² = 0.914
σ² = 0.131


In [4]:
pols_result_robust = lm.estimate(y, x, T=T, robust_se=True)
lm.print_table((label_y, label_x), pols_result_robust, title="Pooled OLS (Robust SE)", floatfmt='.4f')

Pooled OLS (Robust SE)
Dependent variable: Log deflated sales

            Beta      Se    t-values
--------  ------  ------  ----------
Constant  0.0000  0.0161      0.0000
Capital   0.3100  0.0324      9.5810
Labor     0.6748  0.0366     18.4526
R² = 0.914
σ² = 0.131


In [5]:
def remove_zero_columns(x, label_x):
    """
    The function removes columns from a matrix that are all zeros and returns the updated matrix and
    corresponding labels.
    
    Args:
      x: The parameter `x` is a numpy array representing a matrix with columns that may contain zeros.
      label_x: The parameter `label_x` is a list that contains the labels for each column in the input
    array `x`.
    
    Returns:
      x_nonzero: numpy array of x with columns that are all zeros removed.
      label_nonzero: list of labels for each column in x_nonzero.
    """
    
    # Find the columns that are not all zeros
    nonzero_cols = ~np.all(x == 0, axis=0)
    
    # Remove the columns that are all zeros
    x_nonzero = x[:, nonzero_cols]
    
    # Get the labels for the columns that are not all zeros
    label_nonzero = [label_x[i] for i in range(len(label_x)) if nonzero_cols[i]]
    return x_nonzero, label_nonzero

In [6]:
# Transform the data
Q_T = np.eye(T) - 1/T * np.ones((T, T))
y_dot = lm.perm(Q_T, y)
x_dot = lm.perm(Q_T, x)

# Remove the columns that are only zeroes
x_dot, label_x_dot = remove_zero_columns(x_dot, label_x)

# Estimate 
fe_result = lm.estimate(y_dot, x_dot, transform='fe', T=T)
lm.print_table((label_y, label_x_dot), fe_result, title="Fixed Effects", floatfmt='.4f')

Fixed Effects
Dependent variable: Log deflated sales

                    Beta                   Se    t-values
--------  --------------  -------------------  ----------
Constant  167566786.2620  26706560837548.8555      0.0000
Capital           0.1546               0.0130     11.9299
Labor             0.6942               0.0147     47.2398
R² = 0.477
σ² = 0.018
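
The `Constant` row in this table appears to survive because the demeaned constant column is zero only up to floating-point rounding, so an exact `x == 0` comparison does not flag it. A minimal sketch of a tolerance-based variant follows. The name `remove_near_zero_columns` and the `atol` default are illustrative assumptions, not part of `w3_LinearModels`.

```python
import numpy as np

def remove_near_zero_columns(x, label_x, atol=1e-10):
    """Drop columns whose entries are all within atol of zero (hypothetical helper)."""
    keep = ~np.all(np.isclose(x, 0.0, atol=atol), axis=0)
    labels = [lab for lab, k in zip(label_x, keep) if k]
    return x[:, keep], labels

# Toy check: a column of rounding noise is dropped, real regressors are kept.
x_toy = np.column_stack([
    np.full(4, 1e-17),          # "constant" after demeaning: zero up to rounding
    [0.5, -0.5, 0.2, -0.2],
    [1.0, -1.0, 0.0, 0.0],
])
x_clean, labels_clean = remove_near_zero_columns(x_toy, ["Constant", "Capital", "Labor"])
print(x_clean.shape, labels_clean)
```

Dropping the column changes the coefficient indexing, so any later slice that skips a leading constant (such as `[1:3]`) would need to be adjusted accordingly.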


In [7]:
# Transform the data
D_T = - np.eye(T-1, T) + np.eye(T-1, T, k=1)
y_diff = lm.perm(D_T, y)
x_diff = lm.perm(D_T, x)

# Remove the columns that are only zeroes
x_diff, label_x_diff = remove_zero_columns(x_diff, label_x)

# Estimate 
fd_result = lm.estimate(y_diff, x_diff, transform='fd', T=T-1)
lm.print_table((label_y, label_x_diff), fd_result, title="First Difference", floatfmt='.4f')

First Difference
Dependent variable: Log deflated sales

           Beta      Se    t-values
-------  ------  ------  ----------
Capital  0.0630  0.0191      3.3043
Labor    0.5487  0.0183     29.9635
R² = 0.165
σ² = 0.014


In [8]:
# Transform the data
P_T = np.ones((1,T)) * 1/T
y_mean = lm.perm(P_T, y)
x_mean = lm.perm(P_T, x)

# Estimate 
be_result = lm.estimate(y_mean, x_mean, transform='be', T=T)
lm.print_table((label_y, label_x), be_result, title="Between Estimator", floatfmt='.4f')

Between Estimator
Dependent variable: Log deflated sales

            Beta      Se    t-values
--------  ------  ------  ----------
Constant  0.0000  0.0161      0.0000
Capital   0.3188  0.0309     10.3230
Labor     0.6672  0.0343     19.4572
R² = 0.923
σ² = 0.115


In [9]:
# Calculate lambda (note lambda is a reserved keyword in Python, so we use _lambda instead)
sigma2_u = fe_result['sigma2']
sigma2_w = be_result['sigma2']
sigma2_c = sigma2_w - 1/T * sigma2_u
_lambda = 1 - np.sqrt(sigma2_u / (sigma2_u + T*sigma2_c))

# Print lambda 
print(f'Lambda is approximately equal to {_lambda.item():.4f}.')

Lambda is approximately equal to 0.8873.
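
The calculation above can be wrapped in a small helper. This is a sketch that assumes `sigma2_u` is the within (FE) error variance and `sigma2_w` is the between-regression error variance, as in the cell above. The function name `re_lambda` and the input values are hypothetical. Because $\sigma_u^2 + T\sigma_c^2 = T\sigma_w^2$ under this definition of $\sigma_c^2$, the two expressions in the code coincide.

```python
import numpy as np

def re_lambda(sigma2_u, sigma2_w, T):
    """Quasi-demeaning weight: 1 - sqrt(sigma2_u / (sigma2_u + T * sigma2_c))."""
    sigma2_c = sigma2_w - sigma2_u / T
    if sigma2_c < 0:
        raise ValueError("Estimated sigma2_c is negative.")
    return 1.0 - np.sqrt(sigma2_u / (sigma2_u + T * sigma2_c))

# Illustrative inputs only (not the notebook's exact estimates)
lam = re_lambda(sigma2_u=0.02, sigma2_w=0.12, T=12)
lam_alt = 1.0 - np.sqrt(0.02 / (12 * 0.12))  # equivalent closed form
print(f"{lam:.4f}")
```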


In [10]:
# Transform the data (quasi-demeaning matrix: I_T - lambda * P_T)
C_T = np.eye(T) - _lambda * P_T
y_re = lm.perm(C_T, y)
x_re = lm.perm(C_T, x)

# Estimate 
re_result = lm.estimate(y_re, x_re, transform='re', T=T)
lm.print_table((label_y, label_x), re_result, title="Random Effects", floatfmt='.4f')

Random Effects
Dependent variable: Log deflated sales

            Beta      Se    t-values
--------  ------  ------  ----------
Constant  0.0000  0.0162      0.0000
Capital   0.1989  0.0117     17.0246
Labor     0.7197  0.0131     54.8250
R² = 0.642
σ² = 0.018


In [11]:

# Unpack
b_fe = fe_result['b_hat'][1:3,:]      # Select only Capital and Labor coefficients
b_re = re_result['b_hat'][1:3,:]      # Select only Capital and Labor coefficients
cov_fe = fe_result['cov'][1:3,1:3]    # Select only Capital and Labor covariance
cov_re = re_result['cov'][1:3,1:3]    # Select only Capital and Labor covariance

# Calculate the test statistic
b_diff = b_fe - b_re
cov_diff = cov_fe - cov_re
H = b_diff.T @ la.inv(cov_diff) @ b_diff

# Find critical value and p-value at 5% significance level of chi^2 with M degrees of freedom
M = len(b_diff)
crit_val = chi2.ppf(0.95, M)
p_val = chi2.sf(H.item(), M)  # survival function; avoids precision loss from 1 - cdf

# Print the results
print(f'The test statistic is {H.item():.2f}.')
print(f'The critical value at a 5% significance level is {crit_val:.2f}.')
print(f'The p-value is {p_val:.16f}.')


The test statistic is 73.51.
The critical value at a 5% significance level is 5.99.
The p-value is 0.0000000000000001.


This section should probably be moved into the .py file.

In [12]:
def crs_test(results):
    """
    Wald test of H0: beta_K + beta_L = 1
    where beta_K = coefficient on 'Capital'
          beta_L = coefficient on 'Labor'
    """
    # Extract coefficients (skip constant if included)
    b = results['b_hat'][1:, :]       # Capital, Labor
    cov = results['cov'][1:, 1:]      # covariance of (Capital, Labor)

    # Restriction: R*b = q
    R = np.array([[1, 1]])            # [1,1] * [betaK, betaL]
    q = np.array([[1]])               # H0: = 1

    # Wald statistic
    Rb = R @ b
    diff = Rb - q
    var_Rb = R @ cov @ R.T
    W = (diff.T @ la.inv(var_Rb) @ diff).item()

    # Chi-squared(1) distribution
    p_value = 1 - st.chi2.cdf(W, 1)
    return W, p_value

# Example: test CRS under Fixed Effects
W, pval = crs_test(fe_result)
print("CRS Wald test statistic:", W)
print("p-value:", pval)


CRS Wald test statistic: 135.16206666242363
p-value: 0.0


In [13]:
import numpy as np
from numpy import linalg as la
import pandas as pd
from io import StringIO
from tabulate import tabulate
from matplotlib import pyplot as plt

# Suppress FutureWarnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Import this week's LinearModels .py file
import w2_LinearModels_post as lm
%load_ext autoreload
%autoreload 2

In [14]:
# First, load the firm data into numpy arrays.
data = pd.read_csv('firms.csv')
id_array = np.array(data.iloc[:, 0])

# Count how many firms we have. This returns a tuple with the unique IDs
# and the number of times each firm is observed.
unique_id = np.unique(id_array, return_counts=True)
N = unique_id[0].size
T = int(unique_id[1].mean())
year = np.array(data.iloc[:, 1], dtype=int)

In [15]:
# Load the rest of the data into arrays.
y = data['lcap'].to_numpy().reshape(-1, 1)

# x needs a constant in its first column; np.ones(N * T) provides it.
# Note that the order is set to match the order of variables in the model.
x = np.column_stack([
    np.ones(N * T),
    data['lemp'].to_numpy(),
    data['ldsa'].to_numpy()
])

# Let's also make some variable names
label_y = 'Log capital'
label_x = [
    'Constant',
    'Log employment',
    'Log DSA'
]


In [16]:
# Estimate coefficients
b_hat = lm.est_ols(y,x)


In [17]:
# Calculate the residuals
resid = y - x @ b_hat

# Estimate the error variance (sigma squared)
SSR = resid.T @ resid
K = x.shape[1]
sigma2 = SSR / (N*T - K)

# Calculate the variance-covariance matrix
cov = sigma2 * la.inv(x.T @ x)

# Calculate the standard errors 
# Make sure to output the result in a vector
se = np.sqrt(np.diag(cov)).reshape(-1,1)

#Print results
for label, b_k, se_k in zip(label_x, b_hat, se):
    print(f'{label:16}: {b_k[0]:7.4f}    ({se_k[0]:6.4f})')

Constant        : -0.0000    (0.0068)
Log employment  :  0.4411    (0.0177)
Log DSA         :  0.5764    (0.0170)


In [18]:
# Estimate model using OLS
ols_result = lm.estimate(y,x, N=N, T=T)

In [19]:
# Create transformation matrix
def demeaning_matrix(T):
    Q_T = np.eye(T) - np.tile(1/T, (T, T))
    return Q_T

In [20]:
# Build the demeaning matrix and transform the data
Q_T = demeaning_matrix(T)
y_demean = lm.perm(Q_T, y)
x_demean = lm.perm(Q_T, x)


In [21]:
# Create function to check rank of demeaned matrix, and return its eigenvalues.
def check_rank(x):
    print(f'Rank of demeaned x: {la.matrix_rank(x)}')
    lambdas, V = la.eig(x.T@x)
    np.set_printoptions(suppress=True)  # This is just to print nicely.
    print(f'Eigenvalues of within-transformed x: {lambdas.round(decimals=0)}')
    print(V)
    # Use eigen vectors to identify which variables are dropped.

# Check rank of demeaned x
check_rank(x_demean)

Rank of demeaned x: 2
Eigenvalues of within-transformed x: [  0.  44. 237.]
[[ 1.         -0.          0.        ]
 [ 0.          0.78215475 -0.62308422]
 [ 0.         -0.62308422 -0.78215475]]


In [22]:
# Drop the constant (column 0), which the within transformation sets to zero
x_demean = x_demean[:, 1:]
label_x_fe = label_x[1:]

In [23]:
# Estimate FE OLS using the demeaned variables.
fe_result = lm.estimate(y_demean, x_demean, transform='fe', N=N, T=T)


In [24]:
# Create transformation matrix
def fd_matrix(T):
    D_T = np.eye(T) - np.eye(T, k=-1)
    D_T = D_T[1:]
    return D_T

# Print the matrix
D_T = fd_matrix(T)
print(f'First differencing matrix for T={T} \n', D_T)

First differencing matrix for T=12 
 [[-1.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0. -1.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0. -1.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0. -1.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0. -1.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0. -1.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0. -1.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0. -1.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0. -1.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0. -1.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0. -1.  1.]]
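
To see what this matrix does to the data, the sketch below applies it firm by firm to a toy panel. It assumes that `lm.perm` left-multiplies each firm's block of `T` consecutive rows by the transformation matrix. `perm_sketch` is a hypothetical stand-in written for illustration, not the `lm` implementation.

```python
import numpy as np

def perm_sketch(Q, A):
    """Apply Q (M x T) to each unit's block of T stacked rows in A (N*T x K)."""
    M, T = Q.shape
    N = A.shape[0] // T
    blocks = A.reshape(N, T, -1)             # shape (N, T, K)
    return (Q @ blocks).reshape(N * M, -1)   # shape (N*M, K)

T_toy = 4
D_toy = (np.eye(T_toy) - np.eye(T_toy, k=-1))[1:]   # same construction as fd_matrix

# Two units observed for four periods each
a = np.array([1.0, 3.0, 6.0, 10.0, 2.0, 2.0, 5.0, 4.0]).reshape(-1, 1)
a_diff = perm_sketch(D_toy, a)
print(a_diff.ravel())
```

Each unit loses its first period, so the toy panel shrinks from 8 rows to 6, and no difference is taken across the boundary between the two units.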


In [25]:
# Transform the data.
y_diff = lm.perm(D_T, y)
x_diff = lm.perm(D_T, x)

# Print x_diff
print(x_diff)

[[ 0.         0.000907  -0.0365607]
 [ 0.        -0.023856   0.0350736]
 [ 0.        -0.052741  -0.112822 ]
 ...
 [ 0.         0.102943   0.2348499]
 [ 0.         0.048671  -0.0499039]
 [ 0.         0.056783  -0.119419 ]]


In [26]:
# Check rank condition.
check_rank(x_diff)

Rank of demeaned x: 2
Eigenvalues of within-transformed x: [93. 31.  0.]
[[ 0.          0.          1.        ]
 [ 0.42938025  0.9031238   0.        ]
 [ 0.9031238  -0.42938025  0.        ]]


In [27]:
# Drop the constant (column 0), which first differencing sets to zero
x_diff = x_diff[:, 1:]
label_x_fd = label_x[1:]

In [28]:
# Estimate the FD model by OLS on the differenced variables.
fd_result = lm.estimate(y_diff, x_diff, transform='fd', N=N, T=T-1)

# Print results
lm.print_table((label_y, label_x_fd), fd_result, title='FD regression', floatfmt='.4f')

FD regression
Dependent variable: Log capital

                  Beta      Se    t-values
--------------  ------  ------  ----------
Log employment  0.1167  0.0149      7.8236
Log DSA         0.0357  0.0108      3.3043
R² = 0.022
σ² = 0.008


In [29]:
# Make function to calculate the serial correlation
def serial_corr(y, x, T):
    # Calculate the residuals
    b_hat = lm.est_ols(y, x)
    e = y - x@b_hat
    
    # Create a lag transformation matrix
    L_T = np.eye(T, k=-1)
    L_T = L_T[1:]

    # Lag residuals
    e_l = lm.perm(L_T, e)

    # Create a transformation matrix that removes the first observation of each individual
    I_T = np.eye(T, k=0)
    I_T = I_T[1:]
    
    # Remove first observation of each individual
    e = lm.perm(I_T, e)
    
    # Regress the residuals on their lag (N is read from the notebook's global scope)
    return lm.estimate(e, e_l, N=N, T=T-1)

In [30]:
# Estimate serial correlation
corr_result = serial_corr(y_diff, x_diff, T-1)

# Print results
label_ye = 'OLS residual, e\u1d62\u209c'
label_e = ['e\u1d62\u209c\u208B\u2081']
lm.print_table(
    (label_ye, label_e), corr_result, 
    title='Serial Correlation', floatfmt='.4f'
)

Serial Correlation
Dependent variable: OLS residual, eᵢₜ

         Beta      Se    t-values
-----  ------  ------  ----------
eᵢₜ₋₁  0.2028  0.0144     14.0426
R² = 0.043
σ² = 0.008
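
As a point of reference for reading this coefficient: if the level errors $u_{it}$ were serially uncorrelated, the differenced errors $\Delta u_{it}$ would have a first-order autocorrelation of $-0.5$, whereas a random walk in $u_{it}$ would give $0$. The simulation below is a sketch on synthetic data (the panel size and seed are arbitrary assumptions) that illustrates the first case.

```python
import numpy as np

rng = np.random.default_rng(0)
N_sim, T_sim = 2000, 12

u = rng.standard_normal((N_sim, T_sim))   # i.i.d. level errors
du = np.diff(u, axis=1)                   # first differences, shape (N_sim, T_sim - 1)

e = du[:, 1:].ravel()                     # differenced error in period t
e_lag = du[:, :-1].ravel()                # differenced error in period t-1
rho_hat = (e_lag @ e) / (e_lag @ e_lag)   # OLS slope without a constant
print(f"{rho_hat:.3f}")
```

The estimate should land close to $-0.5$, up to sampling noise.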


In [31]:
# Lead employment
F_T = np.eye(T, k=1)
F_T = F_T[:-1]

employment_lead = lm.perm(F_T, x[:, 1].reshape(-1, 1))

In [32]:
# Remove the last observed year for every individual
I_T = np.eye(T, k=0)
I_T = I_T[:-1]

x_exo = lm.perm(I_T, x)
y_exo = lm.perm(I_T, y)

In [33]:
# Add employment_lead to x_exo
x_exo = np.hstack((x_exo, employment_lead))

# Within transform the data
Q_T = demeaning_matrix(T - 1)
yw_exo = lm.perm(Q_T, y_exo)
xw_exo = lm.perm(Q_T, x_exo)

# Select variables
xw_exo = np.hstack((xw_exo[:, 1:5], xw_exo[:, -1].reshape(-1, 1)))

In [34]:
# Estimate model
exo_test = lm.estimate(yw_exo, xw_exo, N=N, T=T - 1, transform='fe')

# Print results
label_exo = label_x_fe + ['Employment lead']
lm.print_table((label_y, label_exo), exo_test, title='Exogeneity test', floatfmt='.4f')

Exogeneity test
Dependent variable: Log capital

                    Beta        Se    t-values
---------------  -------  --------  ----------
Log employment    1.6418    0.0424     38.7010
Log DSA           0.2935    0.0265     11.0949
Employment lead  -2.0000  nan         nan
R² = -0.748
σ² = 0.051


  se = np.sqrt(cov.diagonal()).reshape(-1, 1)


In [35]:
# Create lead of employment
# (assumes y, x were reloaded with load_firm_data(): y = ldsa, x = [constant, lcap, lemp])
F_T = np.eye(T, k=1)[:-1]  # shifts forward, drops last row
employment_lead = lm.perm(F_T, x[:, 2].reshape(-1, 1))  # employment is column 2

# Drop last year for everyone (so dimensions match)
I_T = np.eye(T)[:-1]
x_exo = lm.perm(I_T, x)
y_exo = lm.perm(I_T, y)  # <-- now y is ldsa

# Add employment_lead to regressors
x_exo = np.hstack((x_exo, employment_lead))

# Within transform (demean)
Q_T = demeaning_matrix(T - 1)
yw_exo = lm.perm(Q_T, y_exo)
xw_exo = lm.perm(Q_T, x_exo)

# Keep the usual regressors + the lead
xw_exo = np.hstack((xw_exo[:, 1:3], xw_exo[:, -1].reshape(-1, 1)))  
# columns: capital, employment, lead employment

# Estimate FE model with the lead
exo_test = lm.estimate(yw_exo, xw_exo, N=N, T=T-1, transform='fe')

# Print results
label_exo = ["Capital", "Employment", "Lead employment"]
lm.print_table(("Output (ldsa)", label_exo), exo_test, title="Strict exogeneity test", floatfmt=".4f")


Strict exogeneity test
Dependent variable: Output (ldsa)

                   Beta      Se    t-values
---------------  ------  ------  ----------
Capital          0.4752  0.0193     24.6305
Employment       0.1586  0.0197      8.0618
Lead employment  0.0421  0.0180      2.3371
R² = 0.325
σ² = 0.020


In [36]:
# Create FD transformation matrix
D_T = np.eye(T-1, T, k=1) - np.eye(T-1, T)

# Transform y and x with FD
y_fd = lm.perm(D_T, y)
x_fd = lm.perm(D_T, x)

# Create lead of Δemployment (column 2 of x_fd is Δemployment)
F_T_fd = np.eye(T-2, T-1, k=1)   # shifts forward in differences
employment_lead_fd = lm.perm(F_T_fd, x_fd[:, 2].reshape(-1, 1))

# Drop last FD observation for alignment
I_T_fd = np.eye(T-1)[:-1]
x_fd_exo = lm.perm(I_T_fd, x_fd)
y_fd_exo = lm.perm(I_T_fd, y_fd)

# Add lead of Δemployment
x_fd_exo = np.hstack((x_fd_exo, employment_lead_fd))

# Keep same regressors: Δcapital, Δemployment, Δemployment_lead
x_fd_exo = np.hstack((x_fd_exo[:, 1:3], x_fd_exo[:, -1].reshape(-1, 1)))

# Estimate FD model
exo_test_fd = lm.estimate(y_fd_exo, x_fd_exo, N=N, T=T-2, transform="fd")

# Print results
labels_fd = ["ΔCapital", "ΔEmployment", "Lead ΔEmployment"]
lm.print_table(("ΔOutput (ldsa)", labels_fd), exo_test_fd,
               title="Strict Exogeneity Test (FD)", floatfmt=".4f")


Strict Exogeneity Test (FD)
Dependent variable: ΔOutput (ldsa)

                    Beta      Se    t-values
----------------  ------  ------  ----------
ΔCapital          0.0986  0.0159      6.1859
ΔEmployment       0.0473  0.0116      4.0699
Lead ΔEmployment  0.0532  0.0108      4.9361
R² = 0.028
σ² = 0.008


In [37]:
from w3_LinearModels import fd_exogeneity_lead_test


# FD test in log differences (assignment style)
fd_exogeneity_lead_test(y, x, N, T, cap_col=1, emp_col=2, logs=True, drop_zeros=True)


FD Exogeneity Test (lead-variable)
Model: Δ log y_it = b1 Δ log K_it + b2 Δ log L_it + b3 Δ log L_{i,t+1} + Δu_it

              Variable        Beta          SE         t
                 const     -0.0000      0.0035     -0.01
          Δlog Capital      0.0753      0.0345      2.18
       Δlog Employment     -0.0015      0.0241     -0.06
  Lead Δlog Employment      0.0338      0.0192      1.76

Test H0: b3 (Lead term) = 0 → t=1.76, p=0.08036 (df clustered=141)
→ Do NOT reject exogeneity in FD.
