# Data Analysis

Objectives:

- [x] Determine if there has been a slowdown in productivity growth.
- [ ] Determine if there is a relationship between productivity growth and creative destruction.

## Load in the data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
data = pd.read_csv('../data/merged.csv')

## Slowdown in creative destruction

## Has there been a slowdown in productivity growth?

In [3]:
# Keep only the columns needed to track TFP growth by NAICS industry over time
df = data[["NAICS", "Industry", "Year", "TFP_growth"]].copy()

To reduce volatility, we use a 5-year moving average to analyse trends in productivity growth. Annual data are subject to idiosyncratic shocks and business-cycle swings; the 5-year moving average smooths these out, making the underlying trend easier to see. Because the window requires five observations and `TFP_growth` starts in 1988, 1992 is the first year with a smoothed value.
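As a minimal sketch of how the smoothing behaves, the toy panel below applies the same grouped rolling mean to two made-up industries. The industry labels `A` and `B`, the years, and the growth values are all hypothetical and are not taken from `merged.csv`. The explicit sort is an assumption added here so that each window covers consecutive years within an industry.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: two made-up industries, six years each.
toy = pd.DataFrame({
    "NAICS": ["A"] * 6 + ["B"] * 6,
    "Year": list(range(2000, 2006)) * 2,
    "TFP_growth": [0.01, 0.03, -0.02, 0.04, 0.00, 0.02,
                   0.05, -0.05, 0.05, -0.05, 0.05, -0.05],
})

# Sort so each rolling window spans consecutive years within an industry.
toy = toy.sort_values(["NAICS", "Year"])

# Same pattern as the notebook: a 5-year window computed separately per industry.
# min_periods=5 leaves the first four years of each industry as NaN.
toy["TFP_growth_5yr"] = toy.groupby("NAICS")["TFP_growth"].transform(
    lambda x: x.rolling(5, min_periods=5).mean()
)

print(toy)
```

In this toy data, industry `B` swings between 0.05 and -0.05 every year, while its smoothed series only moves between 0.01 and -0.01. The first four rows of each industry are `NaN` because a full five-year window is not yet available.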

In [4]:
# calculate five year moving average for TFP_growth, grouped by NAICS
df["TFP_growth_5yr"] = df.groupby("NAICS")["TFP_growth"].transform(
    lambda x: x.rolling(5, min_periods=5).mean()
)

# Create a pivot table with years 1992 and 2022
tfp_change_df = df[df["Year"].isin([1992, 2022])].pivot_table(
    index=["NAICS", "Industry"], columns="Year", values="TFP_growth_5yr"
)

# Calculate change from 1992 to 2022
tfp_change_df["change"] = tfp_change_df[2022] - tfp_change_df[1992]

# Display the results
tfp_change_df.sort_values("change", ascending=False, inplace=True)
tfp_change_df.rename_axis(None, axis=1, inplace=True)
tfp_change_df.reset_index(inplace=True)

tfp_change_df

| | NAICS | Industry | 1992 | 2022 | change |
|---|---|---|---|---|---|
| 0 | 55 | Management of companies and enterprises | -0.020471 | 0.034691 | 0.055161 |
| 1 | 62 | Health care and social assistance | -0.021248 | 0.003732 | 0.02498 |
| 2 | 56 | Administrative and waste management services | -0.003985 | 0.015336 | 0.019321 |
| 3 | 51 | Information | 0.000679 | 0.017603 | 0.016925 |
| 4 | 53 | Real estate and rental and leasing | -0.002028 | 0.01416 | 0.016188 |
| 5 | 54 | Professional, scientific, and technical services | 0.014054 | 0.027785 | 0.013732 |
| 6 | 72 | Accommodation and food services | 0.001343 | 0.007154 | 0.00581 |
| 7 | MN | Manufacturing sector | 0.005146 | 0.002308 | -0.002838 |
| 8 | 61 | Educational services | -0.001599 | -0.006349 | -0.00475 |
| 9 | 81 | Other services, except government | 0.001327 | -0.004621 | -0.005947 |


In [None]:
# TODO: define df["biggest_movers"] (the assignment was left unfinished)
df

| | NAICS | Industry | Year | TFP_growth | TFP_growth_5yr |
|---|---|---|---|---|---|
| 0 | 11 | Agriculture, forestry, fishing, and hunting | 1987 | NaN | NaN |
| 1 | 11 | Agriculture, forestry, fishing, and hunting | 1988 | -0.078618 | NaN |
| 2 | 11 | Agriculture, forestry, fishing, and hunting | 1989 | 0.054822 | NaN |
| 3 | 11 | Agriculture, forestry, fishing, and hunting | 1990 | 0.054591 | NaN |
| 4 | 11 | Agriculture, forestry, fishing, and hunting | 1991 | 0.007561 | NaN |
| ... | ... | ... | ... | ... | ... |
| 607 | MN | Manufacturing sector | 2018 | 0.013710 | 0.000849 |
| 608 | MN | Manufacturing sector | 2019 | -0.016731 | -0.004727 |
| 609 | MN | Manufacturing sector | 2020 | -0.009090 | -0.004432 |
| 610 | MN | Manufacturing sector | 2021 | 0.035791 | 0.006640 |


## Is there a relationship between productivity growth and creative destruction?

We use a panel regression with industry and year fixed effects. Industry fixed effects absorb time-invariant characteristics of each industry that may be correlated with productivity growth, and year fixed effects absorb shocks common to all industries in a given year. Controlling for these sources of unobserved heterogeneity helps isolate the relationship between creative destruction (measured here by firm death, firm birth, and job reallocation rates) and productivity growth.
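To make the role of the fixed effects concrete, here is a minimal sketch on synthetic data. It is not the notebook's dataset, and every name in it (`ind`, `year`, `x`, `y`, `two_way_demean`) is hypothetical. It assumes a balanced panel, where subtracting industry and year means from each variable (the "within" transformation) gives the same slope as including a dummy variable for every industry and year.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical balanced panel: 6 industries observed over 10 years.
n_ind, n_years = 6, 10
ind = np.repeat(np.arange(n_ind), n_years)
year = np.tile(np.arange(n_years), n_ind)

ind_effect = rng.normal(size=n_ind)[ind]      # time-invariant industry trait
year_effect = rng.normal(size=n_years)[year]  # shock common to all industries

# The regressor is correlated with both effects, so pooled OLS can be biased.
x = 0.8 * ind_effect + 0.5 * year_effect + rng.normal(size=ind.size)
true_beta = 0.3
y = true_beta * x + ind_effect + year_effect + 0.1 * rng.normal(size=ind.size)

panel = pd.DataFrame({"ind": ind, "year": year, "x": x, "y": y})


def two_way_demean(s, frame):
    """Subtract industry and year means, then add back the grand mean."""
    return (
        s
        - s.groupby(frame["ind"]).transform("mean")
        - s.groupby(frame["year"]).transform("mean")
        + s.mean()
    )


# Within estimator: regress demeaned y on demeaned x.
x_w = two_way_demean(panel["x"], panel)
y_w = two_way_demean(panel["y"], panel)
beta_within = float((x_w * y_w).sum() / (x_w ** 2).sum())

# Equivalent dummy-variable regression (one dummy per industry and per year).
dummies = pd.get_dummies(
    panel[["ind", "year"]].astype(str), drop_first=True
).to_numpy(dtype=float)
X = np.column_stack([np.ones(len(panel)), panel["x"].to_numpy(), dummies])
beta_lsdv = float(np.linalg.lstsq(X, panel["y"].to_numpy(), rcond=None)[0][1])

# Pooled OLS ignores both sets of effects.
beta_pooled = float(np.polyfit(panel["x"], panel["y"], 1)[0])

print(f"within: {beta_within:.4f}  dummies: {beta_lsdv:.4f}  pooled: {beta_pooled:.4f}")
```

The within and dummy-variable slopes agree up to floating-point error, and both are identified only from variation inside an industry over time, net of year-wide movements. The pooled slope is printed for comparison, since it also picks up the correlation between the regressor and the omitted effects.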

In [6]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from linearmodels.panel import PanelOLS

# Work on a copy and drop rows with a missing value in any column
df = data.copy()
df.dropna(inplace=True)

# PanelOLS expects a MultiIndex of entity (NAICS) and time (Year)
df = df.set_index(['NAICS', 'Year'])

# Create dependent and independent variables
exog_vars = ['firm_death_rate', 'firm_birth_rate', 'job_reallocation_rate']
exog = df[exog_vars]
exog = sm.add_constant(exog)  # Add constant term
dependent = df['TFP_growth']

# Estimate the model with entity (NAICS) and time (Year) fixed effects
model = PanelOLS(dependent, 
                exog, 
                entity_effects=True,  # NAICS fixed effects
                time_effects=True)    # Year fixed effects

# Fit the model
results = model.fit()

# Print only the parameter estimates table; the absorbed fixed effects are not reported
print(results.summary.tables[1])

                                   Parameter Estimates                                   
                       Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-----------------------------------------------------------------------------------------
const                    -0.0016     0.0146    -0.1093     0.9130     -0.0304      0.0272
firm_death_rate           0.3323     0.1600     2.0768     0.0383      0.0180      0.6467
firm_birth_rate          -0.0003     0.0005    -0.6883     0.4916     -0.0013      0.0006
job_reallocation_rate    -0.0748     0.0447    -1.6722     0.0951     -0.1626      0.0131
