## **Exploratory analysis of dynamism from the BSD**

**Research objectives**
- RQ1: How has the composition of UK firms evolved over the past decade according to the BSD?
- RQ2: To what extent has the rate of creative destruction in the UK declined between 1997 and 2023? 
- RQ3: How have gaps between the most productive ‘frontier’ firms and ‘laggard’ firms evolved? 
- RQ4: How are changes in business dynamism and productivity dispersion related?

**Data source**: Business Structure Database (1998-2023). Aggregated data tables have been exported from the UK Data Service SecureLab.

### Executive Summary
- The business population has grown.
- Entry and exit rates have been broadly stable, although entry was particularly strong between 2012 and 2018.

In [None]:
# Import packages and set filepaths
import pandas as pd
import numpy as np
import altair as alt
from pandas.api.types import CategoricalDtype
import os

import_path =
export_path =

#### **Summary of data tables**
##### *Table 1 - Population and job flows*
This table provides information on the business population and job flows in each year. The index is the year. The following dimensions are provided:
- Total
- Firm size (employment)
- Firm age
- Sector
- Region
- Within-industry productivity decile

|Year|Dimension|Category|Number of firms|Employment|Turnover|Entrants|Exits|JC|JD|Multi-site firms|Multi-site emp|Site expansion|Site contraction|
|----|---------|--------|---------------|----------|--------|--------|-----|--|--|----------------|--------------|--------------|----------------|
|2000|Total|All|
|2000|Size|Micro|
|2000|Size|Small|
|2000|Size|Medium|
|2000|Size|Large|

##### *Table 2 - Cohort analysis*
This table follows the cohort of firms starting in each year and tracks the entire cohort by age. The following dimensions are provided:
- Total
- Sector
- Region
- Firm size (employment)

|Cohort|Age|Dimension|Category|Number of firms|Avg size|Survival rate|KM rate|Share of employment|Share of turnover|High growth firms|Stagnant firms|
|------|---|---------|--------|---------------|--------|-------------|-------|-------------------|-----------------|-----------------|--------------|
|2000|0|Total|All|
|2000|1|Total|All|
|2000|2|Total|All|
|2000|3|Total|All|
|2000|4|Total|All|
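
As a rough illustration of how the survival columns in Table 2 could be derived from cohort counts, the sketch below uses made-up numbers and hypothetical column names (`n_firms`, `survival_rate`, `cond_survival`, `km`). It assumes that "KM rate" refers to a Kaplan-Meier style estimate, built as the cumulative product of age-to-age survival; it is not the code used to produce the SecureLab export.

```python
import pandas as pd

# Toy cohort counts (made-up numbers, hypothetical column names)
toy_cohort = pd.DataFrame({
    "cohort": [2000] * 4 + [2001] * 4,
    "age": [0, 1, 2, 3] * 2,
    "n_firms": [1000, 900, 720, 648, 2000, 1700, 1530, 1224],
}).sort_values(["cohort", "age"])

by_cohort = toy_cohort.groupby("cohort")["n_firms"]

# Share of the original cohort still active at each age
toy_cohort["survival_rate"] = toy_cohort["n_firms"] / by_cohort.transform("first")

# Conditional survival from one age to the next (set to 1.0 at age 0)
toy_cohort["cond_survival"] = (toy_cohort["n_firms"] / by_cohort.shift(1)).fillna(1.0)

# Cumulative product of conditional survival within each cohort
toy_cohort["km"] = toy_cohort.groupby("cohort")["cond_survival"].cumprod()

print(toy_cohort)
```

With no censoring, as in this toy example, the cumulative product coincides with the simple survival share; the two would only differ if some firms drop out of observation without exiting.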

##### *Table 3 - Growth rates*

|Year|Dimension|Category|Number of firms|Employment|Turnover|Entrants|Exits|JC|JD|Multi-site firms|Multi-site emp|Site expansion|Site contraction|
|----|---------|--------|---------------|----------|--------|--------|-----|--|--|----------------|--------------|--------------|----------------|
|2000|Total|All|
|2000|Size|Micro|
|2000|Size|Small|
|2000|Size|Medium|
|2000|Size|Large|

##### *Table 4 - Productivity dispersion*

|Year|Dimension|Category|Number of firms|P10_Prod|P25_Prod|P50_Prod|Mean_Prod|P75_Prod|P90_Prod|SD_Prod|
|----|---------|--------|---------------|--------|--------|--------|---------|--------|--------|-------|
|2000|Total|All|
|2000|Size|Micro|
|2000|Size|Small|
|2000|Size|Medium|
|2000|Size|Large|
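
RQ3 concerns the gap between frontier and laggard firms. One common way to summarise such a gap from percentile columns like those in Table 4 is a percentile ratio. The sketch below uses made-up values and assumes lower-case column names (`p10_prod`, `p50_prod`, `p90_prod`) that may differ from the exported table; treating P90 as the "frontier" and P10 as the "laggards" is also an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Toy percentile table (made-up values, hypothetical column names)
toy_prod = pd.DataFrame({
    "year": [2000, 2001, 2002],
    "p10_prod": [20.0, 20.0, 25.0],
    "p50_prod": [60.0, 66.0, 75.0],
    "p90_prod": [160.0, 200.0, 250.0],
})

# Overall dispersion: top decile threshold relative to bottom decile threshold
toy_prod["p90_p10_ratio"] = toy_prod["p90_prod"] / toy_prod["p10_prod"]

# Log difference, which can be read approximately as a proportional gap
toy_prod["log_p90_p10"] = np.log(toy_prod["p90_prod"]) - np.log(toy_prod["p10_prod"])

# Split the gap into upper-tail and lower-tail components
toy_prod["p90_p50_ratio"] = toy_prod["p90_prod"] / toy_prod["p50_prod"]
toy_prod["p50_p10_ratio"] = toy_prod["p50_prod"] / toy_prod["p10_prod"]

print(toy_prod)
```

Splitting the 90/10 ratio into 90/50 and 50/10 components indicates whether a widening gap comes from the frontier pulling away or from laggards falling behind.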

In [None]:
#-------------------
#  Load data tables
#--------------------
population_df = 
cohort_df = 
growth_df =
prod_df = 

In [None]:
#-------------------------
# Create rate columns
#---------------------------

def calculate_dynamism_rates(df, group_by_cols=None):
    """Add entry, exit and job-flow rate columns; pass group_by_cols to lag employment within groups."""
    # Make a copy to avoid modifying the original
    df = df.copy()

    # Sort so that shift(1) picks up the previous year's value
    sort_cols = ['category','year']
    df = df.sort_values(sort_cols)

    # Create lagged employment (with or without grouping)
    if group_by_cols is None:
        df['total_employment_lagged'] = df['employment'].shift(1)
    else:
        df['total_employment_lagged'] = df.groupby(group_by_cols)['employment'].shift(1)
    
    # Calculate rates (same regardless of grouping)
    df['Entry rate'] = (df['n_entrants'] + df['n_entry_and_exit']) / df['n_firms']
    df['Exit rate'] = (df['n_exiters'] + df['n_entry_and_exit']) / df['n_firms']
    df['Job creation rate'] = (df['jc_incumbents'] + df['jc_entrants']) / df['total_employment_lagged']
    df['Job destruction rate'] = (df['jd_incumbents'] + df['jd_exiters']) / df['total_employment_lagged']
    df['Entry job creation rate'] = (df['jc_entrants']) / df['total_employment_lagged']
    df['Incumbent job creation rate'] = (df['jc_incumbents']) / df['total_employment_lagged']
    df['Exit job destruction rate'] = (df['jd_exiters']) / df['total_employment_lagged']
    df['Incumbent job destruction rate'] = (df['jd_incumbents']) / df['total_employment_lagged']


    # We can't use the first/last year for dynamic variables because there are no backward/forward-looking observations
    years = df['year'].unique()
    df = df[~df['year'].isin([years.min(), years.max()])]

    return df



<details>
<summary> View data preprocessing code</summary>

hi

</details>

### **1. The composition of the UK business population**

First, we assess which types of firms make up the business population in 2023: large or small, young or old. Which types of firms contribute the most to economic activity?

How has this changed over the last 20 years? Can we learn anything about structural change in the economy?

**Overall section findings**

In [None]:
# BSD facts - how has the total number of firms, employment and turnover changed over time?

total_population_df = population_df[population_df['dimension']=='Total']

n_firm_chart = alt.Chart(total_population_df).mark_line().encode(
    x=alt.X('year:O', axis=alt.Axis(
                labelExpr="datum.value % 2 == 0 ? datum.label : ''",  # Show every 2nd year
            labelAngle=0)),
    y=alt.Y('n_firms:Q',title='Total number of firms in BSD', scale=alt.Scale(domainMin=1500000, domainMax=2500000),axis=alt.Axis(format=".2s"))
)

emp_chart = alt.Chart(total_population_df).mark_line().encode(
    x=alt.X('year:O', axis=alt.Axis(
                labelExpr="datum.value % 2 == 0 ? datum.label : ''",  # Show every 2nd year
            labelAngle=0)),
    y=alt.Y('employment:Q',title='Total employment in BSD', scale=alt.Scale(domainMin=15000000, domainMax=22000000), axis=alt.Axis(format=".2s"))
)

turnover_chart = alt.Chart(total_population_df).mark_line().encode(
    x=alt.X('year:O', axis=alt.Axis(
                labelExpr="datum.value % 2 == 0 ? datum.label : ''",  # Show every 2nd year
            labelAngle=0)),
    y=alt.Y('turnover:Q',title='Total turnover in BSD', scale=alt.Scale(domainMin=, domainMax=), axis=alt.Axis(format=".2s"))
)

productivity_chart = alt.Chart(total_population_df).mark_line().encode(
    x=alt.X('year:O', axis=alt.Axis(
                labelExpr="datum.value % 2 == 0 ? datum.label : ''",  # Show every 2nd year
            labelAngle=0)),
    y=alt.Y('turnover_per_employee:Q',title='Average turnover per employee in BSD', scale=alt.Scale(domainMin=0, domainMax=), axis=alt.Axis(format=".2s"))
)

basic_facts_chart = n_firm_chart | emp_chart | turnover_chart | productivity_chart

**Key Findings**
- The business population has expanded over the last 20 years, with substantial growth taking place between 2011 and 2018.

**Questions to explore**

In [None]:
# Write helper functions to create formatted tables and charts to explore subsequently

# Additional variables to calculate:
#   - Average employees per firm
#   - Average turnover per employee

### **2. Assessing the decline in business dynamism**
- Entry and exit rates
- Survival rates
- Growth rates
- Job reallocation rates



#### **Entry and exit rates**

This section examines firm entry and exit dynamics over time. Entry rates measure the flow of new firms into the market relative to the total population, while exit rates capture firms leaving the market. These metrics reveal the intensity of business turnover and provide insights into entrepreneurial activity, market competitiveness, and structural changes in the business environment.
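
As a minimal sketch of the definitions above, the example below computes entry and exit rates from made-up counts. The column names (`n_firms`, `n_entrants`, `n_exiters`) are hypothetical, and the example leaves out firms that enter and exit within the same year; the net entry and churn columns are extra summary measures added for illustration.

```python
import pandas as pd

# Toy yearly counts (made-up numbers, hypothetical column names)
toy_population = pd.DataFrame({
    "year": [2001, 2002, 2003],
    "n_firms": [2000, 2100, 2200],
    "n_entrants": [240, 252, 220],
    "n_exiters": [200, 189, 198],
})

# Entry and exit flows relative to the total business population
toy_population["entry_rate"] = toy_population["n_entrants"] / toy_population["n_firms"]
toy_population["exit_rate"] = toy_population["n_exiters"] / toy_population["n_firms"]

# Net entry (population growth from turnover) and churn (total turnover intensity)
toy_population["net_entry_rate"] = toy_population["entry_rate"] - toy_population["exit_rate"]
toy_population["churn_rate"] = toy_population["entry_rate"] + toy_population["exit_rate"]

print(toy_population.round(3))
```

A stable net entry rate can hide a fall in churn, so it may be worth tracking both when assessing whether dynamism has declined.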

**Headline findings**
- Entry and exit rates have remained relatively stable; unlike in the US, there is no prominent decline.

In [None]:
#----------------------------------
#  HEADLINE: entry and exit rates
#----------------------------------

# Process entry and exit rates from dataframe
total_population_df = population_df[population_df['dimension'] == 'Total']

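# Wide table for display; long table for plotting
# (assumes the rate columns from calculate_dynamism_rates are already present)
entry_exit_df = total_population_df[['year', 'Entry rate', 'Exit rate', 'n_entrants', 'n_exiters']].set_index('year')
total_entry_exit_df = total_population_df.melt(id_vars='year',value_vars=['Entry rate','Exit rate'])

# Display table of entry and exit rates in notebook

entry_exit_df.style.format({
    'Entry rate': '{:.2%}',
    'Exit rate': '{:.2%}',
    'n_entrants': '{:,.0f}',
    'n_exiters': '{:,.0f}'
}).background_gradient(cmap='YlOrRd', subset=['Entry rate', 'Exit rate'])

# Display chart of entry and exit rates in notebook
chart = alt.Chart(total_entry_exit_df).mark_line().encode(
    x=alt.X('year:O'),
    y=alt.Y('value:Q', title='Rate', axis=alt.Axis(format='%')),
    color=alt.Color('variable:N', title=None)
)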

In [None]:
#----------------------------------
#  SECTOR: entry and exit rates
#----------------------------------

sectoral_df = population_df[population_df['dimension'] == 'Sector']

sectoral_entry_exit_df = sectoral_df.melt(id_vars=['year','category'],value_vars=['Entry rate','Exit rate'])

# Display chart of sectoral entry and exit rates

In [None]:
#----------------------------------
#  REGION: entry and exit rates
#----------------------------------

In [None]:
#----------------------------------
#  FIRM SIZE: entry and exit rates
#----------------------------------

In [None]:
#----------------------------------
#  FIRM AGE: exit rates
#----------------------------------

#### **Survival and growth (cohort analysis)**
- Are firms surviving at the same rate over time?
- Are firms growing at the same rate over time?
- Cross-sectional differences across industries and regions: which sectors/regions perform better?


##### Survival rates (cohorts)

In [None]:
#----------------------------------------
# 2.1 HEADLINE: Survival by Cohort
#---------------------------------------

print("\n>>> HEADLINE: Overall Survival by Cohort\n")
total_cohort_df = cohort_df[cohort_df['dimension']=='Total']


cohort_survival = total_cohort_df[['cohort', 'age', 'km']]

# Plot the survival rates of each cohort
chart = alt.Chart(total_cohort_df).mark_line().encode(
    x=alt.X('age:O'),
    y=alt.Y('km:Q'),
    color=alt.Color('cohort')
)
chart

# Probability that a firm survives to ages three and five, by cohort.
threeyr_survival = cohort_survival[cohort_survival['age']==3]
fiveyr_survival = cohort_survival[cohort_survival['age']==5]


# Does the average survival probability change before/after GFC?


>>> HEADLINE: Overall Survival by Cohort



In [None]:
#----------------------------
# Survival across sectors
#--------------------------

In [None]:
#----------------------------
# Survival across regions
#--------------------------

##### Growth rates (cohorts)

In [None]:
#----------------------------------------
# HEADLINE: Growth rates by cohort
#--------------------------------------

#### **Annual growth rates**

Here we are interested in how existing firms grow or shrink each year, not just entering cohorts. This analysis focuses exclusively on incumbent firms.

At the firm level, we calculate Davis-Haltiwanger-Schuh (DHS) growth rates.
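
The DHS growth rate divides the change in size by the average of current and previous size, $g_{it} = \frac{x_{it} - x_{i,t-1}}{\frac{1}{2}(x_{it} + x_{i,t-1})}$, so it is bounded between -2 (exit) and +2 (entry) and treats expansions and contractions symmetrically. The sketch below applies it to made-up firm-level data; the column names (`firm_id`, `employment`) and the helper `dhs_growth` are hypothetical.

```python
import pandas as pd

def dhs_growth(current, previous):
    """DHS growth rate: change in size over the average of the two periods."""
    return (current - previous) / (0.5 * (current + previous))

# Toy firm-level panel (made-up numbers, hypothetical column names)
toy_firms = pd.DataFrame({
    "firm_id": ["a", "a", "b", "b", "c", "c"],
    "year": [2001, 2002] * 3,
    "employment": [10, 30, 40, 20, 0, 5],  # firm c has zero employment in 2001
}).sort_values(["firm_id", "year"])

# Previous-year employment within each firm (NaN in each firm's first year)
toy_firms["employment_lagged"] = toy_firms.groupby("firm_id")["employment"].shift(1)

toy_firms["dhs_growth"] = dhs_growth(toy_firms["employment"], toy_firms["employment_lagged"])

print(toy_firms)
```

Firm c is included only to show the upper bound of +2; an incumbent-only analysis would drop rows where lagged employment is zero or missing.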

#### **Job reallocation rates**

#### **Firm-level productivity dispersion**

#### **Within-industry productivity quartile analysis**

To distinguish between 'good' and 'bad' dynamism and to explore the link with productivity, we have classified firms into four categories with respect to within-industry productivity performance.
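
One way such a classification could be built is to rank firms by productivity within each industry and cut the ranking into four equally sized groups. The sketch below uses made-up data; the column names (`industry`, `turnover_per_employee`, `prod_quartile`) and the use of `pd.qcut` are assumptions for illustration, not necessarily how the exported tables were produced.

```python
import pandas as pd

# Toy firm-level data (made-up numbers, hypothetical column names)
toy_firms = pd.DataFrame({
    "firm_id": range(8),
    "industry": ["manufacturing"] * 4 + ["retail"] * 4,
    "turnover_per_employee": [50, 80, 120, 200, 20, 30, 45, 90],
})

# Quartile of each firm within its own industry: 1 = least productive, 4 = most productive
toy_firms["prod_quartile"] = (
    toy_firms.groupby("industry")["turnover_per_employee"]
    .transform(lambda s: pd.qcut(s, 4, labels=False) + 1)
)

print(toy_firms)
```

Ranking within industry means a retail firm with turnover per employee of 90 lands in the top quartile even though the same value would sit in the second quartile in manufacturing, which is the point of a within-industry comparison.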