# Skill Mismatch and Labor Market Tightness in Local Markets

This notebook outlines the steps to measure the degree of skill mismatch between employer demands and worker supply across U.S. metropolitan areas. The analysis uses data from O*NET, BLS JOLTS, and the Census Bureau.

The goals are to:
1. Compute a skill mismatch index (SMI) for each region.
2. Compute labor market tightness for each region.
3. Visualize these metrics and offer policy recommendations.


## Step 1: Data Acquisition

In this step, we load the necessary datasets from O*NET, BLS JOLTS, and the Census Bureau. We assume the datasets are saved as CSV files; if they arrive in another format (e.g., Excel, JSON), swap in the matching pandas reader. Replace the placeholder file names with actual file paths.

### O*NET Data
O*NET provides detailed information about the skills required for each occupation.

### BLS JOLTS Data
The BLS Job Openings and Labor Turnover Survey (JOLTS) reports job openings, hires, and separations by region.

### Census Bureau Data
The Census Bureau provides workforce characteristics by region, including education, occupation distribution, and unemployment data.


In [None]:
import pandas as pd

# Load O*NET data
onet_data = pd.read_csv('onet_skills.csv')  # Replace with actual file path

# Load BLS JOLTS data
jolts_data = pd.read_csv('jolts_vacancies.csv')  # Replace with actual file path

# Load Census Bureau data
census_data = pd.read_csv('census_workforce.csv')  # Replace with actual file path
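
Region and occupation codes are identifiers rather than quantities, and pandas parses all-digit codes as integers by default, which drops leading zeros and can break later merges. The sketch below shows one way to keep them as strings. The column names `region_code` and `occupation_code` match the merge keys used in Step 2; the inline CSV text is invented data standing in for a real file.

```python
import io
import pandas as pd

# Invented CSV content standing in for a real file such as 'census_workforce.csv'
sample_csv = io.StringIO(
    "region_code,occupation_code,unemployed\n"
    "01234,11-1011,1200\n"
    "56789,15-1252,800\n"
)

# Read the key columns as strings so leading zeros survive the load
sample = pd.read_csv(
    sample_csv,
    dtype={"region_code": str, "occupation_code": str},
)

print(sample["region_code"].tolist())
print(sample.dtypes)
```

The same `dtype` argument can be passed to each `pd.read_csv` call above once the real key column names are known.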


## Step 2: Data Cleaning and Preprocessing

In this step, we will clean and merge the datasets on common keys, such as occupation codes and region codes. We will also handle missing values and ensure the data is in a usable format for further analysis.

In [None]:
# Merge datasets on common keys (occupation code and region code)
merged_data = onet_data.merge(jolts_data, on='occupation_code', how='inner')
merged_data = merged_data.merge(census_data, on='region_code', how='inner')

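# Check for missing values (print so the counts are shown even mid-cell)
print(merged_data.isnull().sum())

# Fill missing values in numeric columns with each column's median
# (modify as needed; non-numeric columns are left untouched)
numeric_cols = merged_data.select_dtypes(include='number').columns
merged_data[numeric_cols] = merged_data[numeric_cols].fillna(merged_data[numeric_cols].median())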
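
Inner merges silently drop rows whose keys have no match, so it can help to count how many rows would be lost before committing to `how='inner'`. The sketch below uses the `indicator=True` option of `DataFrame.merge` on two toy frames. The column `skill_level` and all values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for two of the datasets; values are invented
left = pd.DataFrame({
    "occupation_code": ["11-1011", "15-1252", "29-1141"],
    "skill_level": [4.1, 4.5, 3.9],
})
right = pd.DataFrame({
    "occupation_code": ["11-1011", "15-1252", "47-2061"],
    "vacancies": [50, 120, 30],
})

# An outer merge with indicator=True adds a '_merge' column that records
# whether each key appeared in the left frame, the right frame, or both
check = left.merge(right, on="occupation_code", how="outer", indicator=True)
match_counts = check["_merge"].astype(str).value_counts().to_dict()

print(match_counts)
```

Rows counted under `left_only` or `right_only` are the ones an inner merge would discard.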


## Step 3: Initial Calculations

In this step, we will calculate the Skill Mismatch Index (SMI) and Labor Market Tightness.

### Skill Mismatch Index (SMI)
The Skill Mismatch Index measures the absolute difference between the share of workers possessing a skill and the share of job vacancies requiring that skill. Summing these gaps over skill categories gives a single index for each region.

### Labor Market Tightness
Labor market tightness is defined as the ratio of vacancies to unemployed workers. A higher ratio means more open positions per job seeker, that is, a tighter labor market.
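
The two definitions above can be written compactly. The notation is introduced here for illustration: $w_{r,s}$ is the share of workers in region $r$ possessing skill $s$, $v_{r,s}$ is the share of vacancies in region $r$ requiring skill $s$, and $V_r$ and $U_r$ are the numbers of vacancies and unemployed workers in region $r$.

$$
\mathrm{SMI}_r = \sum_{s} \left| w_{r,s} - v_{r,s} \right|,
\qquad
\theta_r = \frac{V_r}{U_r}
$$

If both sets of shares sum to one within a region, $\mathrm{SMI}_r$ ranges from 0 (supply matches demand in every skill) to 2 (no overlap at all). Tightness $\theta_r$ is undefined when $U_r = 0$.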


In [None]:
import numpy as np

# Compute the Skill Mismatch Index (SMI) as the sum of absolute gaps between
# worker skill shares and job vacancy shares. np.abs and np.sum accept scalars
# as well as arrays, so the function also works on a single row.
def compute_smi(worker_skills, job_vacancies):
    mismatch = np.abs(worker_skills - job_vacancies)
    return np.sum(mismatch)

# Apply row by row: each row is one region and skill category
# (assuming 'worker_skill_share' and 'job_vacancy_share' columns)
merged_data['skill_mismatch'] = merged_data.apply(
    lambda x: compute_smi(x['worker_skill_share'], x['job_vacancy_share']), axis=1
)

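# Calculate labor market tightness (vacancies / unemployed);
# treat zero unemployed as missing so the ratio is NaN rather than infinite
merged_data['tightness'] = merged_data['vacancies'] / merged_data['unemployed'].replace(0, float('nan'))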
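
The columns computed in this step hold one value per region and skill category. To obtain a single index per region, one option is to sum the gaps within each region. The sketch below assumes the column names used in this notebook (`region_name`, `skill_category`, `worker_skill_share`, `job_vacancy_share`, `vacancies`, `unemployed`), assumes that vacancy and unemployment counts are repeated on every row of a region, and uses invented numbers.

```python
import pandas as pd

# Invented data: two regions, two skill categories each
toy = pd.DataFrame({
    "region_name": ["A", "A", "B", "B"],
    "skill_category": ["analytical", "manual", "analytical", "manual"],
    "worker_skill_share": [0.5, 0.5, 0.25, 0.75],
    "job_vacancy_share": [0.75, 0.25, 0.25, 0.75],
    "vacancies": [100, 100, 40, 40],
    "unemployed": [50, 50, 80, 80],
})

# Absolute gap for each region and skill category
toy["gap"] = (toy["worker_skill_share"] - toy["job_vacancy_share"]).abs()

# Sum gaps within each region; take the region-level counts from the first row
region_summary = toy.groupby("region_name").agg(
    smi=("gap", "sum"),
    vacancies=("vacancies", "first"),
    unemployed=("unemployed", "first"),
)
region_summary["tightness"] = region_summary["vacancies"] / region_summary["unemployed"]

print(region_summary)
```

In this toy data, region A has mismatched shares and more vacancies than unemployed workers, while region B has matching shares and a slack market.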


## Step 4: Visualization and Analysis

Now that we have calculated the Skill Mismatch Index and labor market tightness, we can visualize the results. We will create:
1. A heatmap showing skill mismatch by region.
2. A scatter plot displaying the relationship between labor market tightness and skill mismatch.

These visualizations will help identify regions with high mismatch and tight labor markets.


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

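# Heatmap of Skill Mismatch by Region
# pivot_table takes keyword arguments and averages any duplicate region/skill rows
heatmap_data = merged_data.pivot_table(
    index='region_name', columns='skill_category', values='skill_mismatch', aggfunc='mean'
)
plt.figure(figsize=(12, 8))
sns.heatmap(heatmap_data, cmap="YlGnBu", annot=True)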
plt.title("Skill Mismatch by Region")
plt.xlabel("Skill Category")
plt.ylabel("Region")
plt.show()


In [None]:

# Scatter plot of Labor Market Tightness vs. Skill Mismatch
plt.figure(figsize=(10, 6))
plt.scatter(merged_data['tightness'], merged_data['skill_mismatch'], alpha=0.6)
plt.title("Labor Market Tightness vs. Skill Mismatch")
plt.xlabel("Labor Market Tightness")
plt.ylabel("Skill Mismatch")
plt.show()
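
To turn the scatter plot into a shortlist of regions with both high mismatch and a tight labor market, one simple approach is to flag regions that sit above the median on both measures. The median cutoff, the column name `smi`, and the numbers below are illustrative assumptions rather than part of the analysis above.

```python
import pandas as pd

# Invented region-level results
regions = pd.DataFrame({
    "region_name": ["A", "B", "C", "D"],
    "smi": [0.10, 0.40, 0.30, 0.05],
    "tightness": [0.5, 1.5, 0.8, 1.2],
})

# Flag regions above the median on each measure
high_mismatch = regions["smi"] > regions["smi"].median()
tight_market = regions["tightness"] > regions["tightness"].median()

# Regions that are both high-mismatch and tight
flagged = regions.loc[high_mismatch & tight_market, "region_name"].tolist()

print(flagged)
```

A different cutoff (for example, the top quartile) may suit a larger set of regions better; the median is used here only because it needs no tuning.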