### Summary of Wage Data Series IDs (Technology Industry)
| Series ID         | Category                                 | Area              | Adjustment              | Description |
|-------------------|------------------------------------------|-------------------|-------------------------|-------------|
| CMU2025100100000D | Total Compensation (Technology Industry) | U.S. city average | Not seasonally adjusted | Tracks total compensation (wages, salaries, and benefits) for workers in the technology industry, in dollars per hour worked. |
| CMU2025100100000P | Total Compensation (Technology Industry) | U.S. city average | Not seasonally adjusted | Tracks total compensation for workers in the technology industry, expressed as a percent of total compensation. |
| CMU2025100120000D | Wages and Salaries (Technology Industry) | U.S. city average | Not seasonally adjusted | Tracks wages and salaries (direct pay, excluding benefits) for workers in the technology industry, in dollars per hour worked. |
| CMU2025100120000P | Wages and Salaries (Technology Industry) | U.S. city average | Not seasonally adjusted | Tracks wages and salaries for workers in the technology industry, expressed as a percent of total compensation. |
| CMU2025100200000D | Benefits (Technology Industry)           | U.S. city average | Not seasonally adjusted | Tracks the cost of benefits (e.g., health insurance, retirement plans) for workers in the technology industry, in dollars per hour worked. |
| CMU2025100200000P | Benefits (Technology Industry)           | U.S. city average | Not seasonally adjusted | Tracks the cost of benefits for workers in the technology industry, expressed as a percent of total compensation. |

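Later cells filter each category by a hard-coded Series ID. One way to keep the table above and the code in sync is a small lookup dictionary. This is a minimal sketch, assuming only the three IDs ending in 'D' are needed; `SERIES_CATEGORIES` and `label_categories` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical lookup built from the Series ID table: the three IDs ending in 'D'
# that the notebook keeps, mapped to the category labels used later.
SERIES_CATEGORIES = {
    'CMU2025100100000D': 'Total Compensation',
    'CMU2025100120000D': 'Wages and Salaries',
    'CMU2025100200000D': 'Benefits',
}


def label_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'Category' column looked up from 'Series ID'.

    IDs missing from SERIES_CATEGORIES (such as the 'P' series) get NaN.
    """
    out = df.copy()
    out['Category'] = out['Series ID'].map(SERIES_CATEGORIES)
    return out


sample = pd.DataFrame({'Series ID': ['CMU2025100100000D', 'CMU2025100200000D', 'CMU2025100100000P']})
print(label_categories(sample))
```

Filtering with `df[df['Series ID'].isin(list(SERIES_CATEGORIES))]` would then keep exactly the mapped series, so adding or removing a category only means editing the dictionary.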
### Wage Data Explanation

The **wage data** in this dataset tracks changes in compensation for workers in the technology industry over time. The dataset includes data on **total compensation**, **wages and salaries**, and **benefits** for workers in the private sector.

- **Wage Value**: The average compensation for workers in the technology industry during a given quarter, expressed in dollars per hour worked.
  - For example, a **Total Compensation** value of **$50** means that average compensation (wages, salaries, and benefits combined) for technology-industry workers was $50 per hour worked in that quarter.

- **Uses of Wage Data**:
  - **Tracking Wage Inflation**: The dataset supports calculating wage inflation (the percentage change in compensation) over time; comparing it with a price index such as the CPI shows whether compensation is keeping up with the cost of living.
  - **Labor Market Trends**: Wage data helps identify labor-market trends, such as whether pay in the technology industry is growing or stagnating.
  - **Economic Policy**: Policymakers monitor wage growth as an indicator of labor-market health when making decisions about employment and economic growth.

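Wage inflation is the percentage change between two periods, (new / old - 1) × 100. On quarterly data, comparing adjacent quarters gives a quarter-over-quarter rate and comparing quarters four apart gives a year-over-year rate. The sketch below uses Total Compensation values printed later in the notebook. The 2.0% price-inflation figure is hypothetical and only illustrates how nominal wage growth is turned into growth net of the cost of living.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new: (new / old - 1) * 100."""
    return (new / old - 1) * 100


def real_growth(nominal_pct: float, price_pct: float) -> float:
    """Wage growth net of price inflation: ((1 + w) / (1 + p) - 1) * 100."""
    return ((1 + nominal_pct / 100) / (1 + price_pct / 100) - 1) * 100


# Total Compensation values from the notebook output: 2018 Q1, 2018 Q2, 2019 Q1
q1_2018, q2_2018, q1_2019 = 48.44, 49.08, 49.87

qoq = pct_change(q1_2018, q2_2018)  # quarter-over-quarter
yoy = pct_change(q1_2018, q1_2019)  # year-over-year

# Hypothetical 2.0% price inflation over the same year (not from this dataset)
print(round(qoq, 2), round(yoy, 2), round(real_growth(yoy, 2.0), 2))  # → 1.32 2.95 0.93
```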
In this dataset, the wage data is broken down into specific categories:
- **Total Compensation**: Includes wages, salaries, and benefits.
- **Wages and Salaries**: Excludes benefits and focuses solely on direct pay.
- **Benefits**: Tracks the value of non-wage compensation, such as health insurance and retirement benefits.

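Because Total Compensation is defined as wages and salaries plus benefits, the three categories can be cross-checked quarter by quarter. This is a minimal sketch, assuming a long table shaped like the combined DataFrame built later (one row per quarter and category). The numbers are made up for illustration. On real data, small nonzero residuals would be expected from rounding, while large ones would suggest a mislabeled series.

```python
import pandas as pd

# Made-up values for illustration only (not BLS data), in long format:
# one row per quarter and category, like the combined DataFrame built later.
long_df = pd.DataFrame({
    'Date': ['2018-01'] * 3 + ['2018-04'] * 3,
    'Category': ['Total Compensation', 'Wages and Salaries', 'Benefits'] * 2,
    'Value': [50.00, 35.00, 15.00, 51.00, 35.60, 15.40],
})

# Reshape to one row per quarter and one column per category
wide = long_df.pivot(index='Date', columns='Category', values='Value')

# If Total = Wages and Salaries + Benefits, the residual should be (close to) zero
wide['Residual'] = wide['Total Compensation'] - (wide['Wages and Salaries'] + wide['Benefits'])
print(wide.round(2))
```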

In [1]:
import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('../Resources/BLS_Wages.csv')

# Check the first few rows of the data to understand its structure
print(df.head())

           Series ID  Year Period      Label  Value
0  CMU2025100100000D  2018    Q01  2018 Qtr1  48.44
1  CMU2025100100000D  2018    Q02  2018 Qtr2  49.08
2  CMU2025100100000D  2018    Q03  2018 Qtr3  49.92
3  CMU2025100100000D  2018    Q04  2018 Qtr4  49.00
4  CMU2025100100000D  2019    Q01  2019 Qtr1  49.87

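Before reshaping, it can help to confirm that the file has the columns the later cells depend on and that `Value` was parsed as a number. This is a minimal sketch. The CSV header line is an assumption reconstructed from the printed columns, and `check_wage_columns` is a hypothetical helper.

```python
import io

import pandas as pd

# Columns the later cells rely on (taken from the printed header above)
REQUIRED_COLUMNS = {'Series ID', 'Year', 'Period', 'Label', 'Value'}


def check_wage_columns(df: pd.DataFrame) -> None:
    """Raise ValueError if any column used by later cells is missing."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f'Missing columns: {sorted(missing)}')


# Two of the rows printed above, recreated inline so the sketch runs without the CSV
sample_csv = io.StringIO(
    'Series ID,Year,Period,Label,Value\n'
    'CMU2025100100000D,2018,Q01,2018 Qtr1,48.44\n'
    'CMU2025100100000D,2018,Q02,2018 Qtr2,49.08\n'
)
sample = pd.read_csv(sample_csv)
check_wage_columns(sample)   # passes silently
print(sample.dtypes)         # 'Year' should be an integer type and 'Value' a float type
```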

In [3]:
# Extract the quarter number from the 'Period' column (removing the 'Q')
df['Quarter'] = df['Period'].str.extract(r'(\d+)', expand=False).astype(int)  # e.g. 'Q01' -> 1

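# Build quarterly periods such as '2018Q1' from 'Year' and 'Quarter', then convert each to its quarter-start date
# (string input avoids the PeriodIndex(year=..., quarter=...) keywords, which pandas 2.2 deprecated)
df['Date'] = pd.PeriodIndex(df['Year'].astype(str) + 'Q' + df['Quarter'].astype(str), freq='Q').to_timestamp()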

# Drop the now redundant 'Year', 'Period', 'Label', and 'Quarter' columns
df = df.drop(columns=['Year', 'Period', 'Label', 'Quarter'])

# Set 'Date' as the index; later cells rely on it (e.g. index.to_period)
df.set_index('Date', inplace=True)

# Check the first few rows to verify changes
print(df.head())

                    Series ID  Value
Date                                
2018-01-01  CMU2025100100000D  48.44
2018-04-01  CMU2025100100000D  49.08
2018-07-01  CMU2025100100000D  49.92
2018-10-01  CMU2025100100000D  49.00
2019-01-01  CMU2025100100000D  49.87

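The conversion above assumes every `Period` value is a quarter code. The sample rows only show codes from Q01 to Q04; if the file ever contained another code, extracting the digits would silently produce a wrong or failing date. This is a minimal sketch of a stricter variant, with `quarter_start_dates` as a hypothetical helper. It uses `pd.to_datetime` on year/month/day columns instead of `PeriodIndex`.

```python
import pandas as pd


def quarter_start_dates(year: pd.Series, period: pd.Series) -> pd.Series:
    """Map 'Q01'..'Q04' period codes to quarter-start timestamps.

    Any other code raises ValueError instead of silently producing a wrong date.
    """
    quarter = period.str.extract(r'^Q0([1-4])$', expand=False)
    if quarter.isna().any():
        bad = sorted(period[quarter.isna()].unique())
        raise ValueError(f'Unexpected period codes: {bad}')
    month = (quarter.astype(int) - 1) * 3 + 1  # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
    return pd.to_datetime(pd.DataFrame({'year': year, 'month': month, 'day': 1}))


years = pd.Series([2018, 2018, 2019])
periods = pd.Series(['Q01', 'Q04', 'Q01'])
print(quarter_start_dates(years, periods).dt.strftime('%Y-%m-%d').tolist())
# → ['2018-01-01', '2018-10-01', '2019-01-01']
```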

In [4]:
# List the distinct Series IDs present in the file
unique_values_list = df['Series ID'].unique().tolist()
print(unique_values_list)

['CMU2025100100000D', 'CMU2025100100000P', 'CMU2025100120000D', 'CMU2025100120000P', 'CMU2025100200000D', 'CMU2025100200000P']

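Listing the IDs confirms which series are present. A per-series summary also shows whether each one covers the same quarters, which matters before comparing categories side by side. This is a minimal sketch on a few rows copied from the outputs in this notebook, assuming `df` has the `Date` index set earlier.

```python
import pandas as pd

# A few rows from the outputs above, in the shape of df after 'Date' became the index
sample_df = pd.DataFrame(
    {
        'Series ID': ['CMU2025100100000D'] * 3 + ['CMU2025100200000D'],
        'Value': [48.44, 49.08, 49.92, 25.87],
    },
    index=pd.DatetimeIndex(['2018-01-01', '2018-04-01', '2018-07-01', '2021-10-01'], name='Date'),
)

# One row per series: number of quarters, date range, and value range
summary = (
    sample_df.reset_index()
    .groupby('Series ID')
    .agg(
        quarters=('Value', 'size'),
        start=('Date', 'min'),
        end=('Date', 'max'),
        min_value=('Value', 'min'),
        max_value=('Value', 'max'),
    )
)
print(summary)
```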

In [7]:
# Keep only the Series IDs ending in 'D' (dollars per hour worked); the 'P' series
# report each category as a percent of total compensation rather than a dollar amount
df = df[df['Series ID'].str.endswith('D')].copy()

# Split out each compensation category by its Series ID
total_compensation_df = df[df['Series ID'] == 'CMU2025100100000D'].copy()
wages_salaries_df = df[df['Series ID'] == 'CMU2025100120000D'].copy()
benefits_df = df[df['Series ID'] == 'CMU2025100200000D'].copy()

# Calculate the quarter-over-quarter inflation rate for each compensation category
total_compensation_df.loc[:, 'Inflation Rate'] = total_compensation_df['Value'].pct_change() * 100
wages_salaries_df.loc[:, 'Inflation Rate'] = wages_salaries_df['Value'].pct_change() * 100
benefits_df.loc[:, 'Inflation Rate'] = benefits_df['Value'].pct_change() * 100

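Splitting the series into separate DataFrames avoids a subtle problem: calling `pct_change()` once on all rows would compare the first Benefits quarter with the last quarter of the preceding series. A `groupby` on `Series ID` performs the same per-series calculation in one step. This is a swapped-in alternative, not the notebook's own method. The two Benefits values below are made up for illustration.

```python
import pandas as pd

# First three Total Compensation rows from the output above, plus two made-up
# Benefits rows (illustrative values only, not BLS data).
sample_df = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-01', '2018-04-01', '2018-07-01', '2018-01-01', '2018-04-01']),
    'Series ID': ['CMU2025100100000D'] * 3 + ['CMU2025100200000D'] * 2,
    'Value': [48.44, 49.08, 49.92, 15.00, 15.30],
})

# Order by date within each series so each change is measured against the previous quarter
sample_df = sample_df.sort_values(['Series ID', 'Date'])

# pct_change is computed separately per Series ID, so the first quarter of each series is NaN
sample_df['Inflation Rate'] = sample_df.groupby('Series ID')['Value'].pct_change() * 100
print(sample_df)
```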



In [10]:
# Add a column to identify each compensation category
total_compensation_df['Category'] = 'Total Compensation'
wages_salaries_df['Category'] = 'Wages and Salaries'
benefits_df['Category'] = 'Benefits'

# Combine all the DataFrames into one
combined_wage_df = pd.concat([total_compensation_df, wages_salaries_df, benefits_df])

# Convert the Date index to YYYY-MM format using PeriodIndex
combined_wage_df.index = combined_wage_df.index.to_period('M')

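# Keep only the needed columns (.copy() so later assignments modify this frame, not a view)
combined_wage_df = combined_wage_df[['Series ID', 'Value', 'Inflation Rate', 'Category']].copy()

# pct_change() leaves the first quarter of each category as NaN (no prior quarter); fill it with 0
combined_wage_df['Inflation Rate'] = combined_wage_df['Inflation Rate'].fillna(0)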

# Check the first few rows of the combined DataFrame
print(combined_wage_df.head(10))
print(combined_wage_df.tail(10))

                 Series ID  Value  Inflation Rate            Category
Date                                                                 
2018-01  CMU2025100100000D  48.44        0.000000  Total Compensation
2018-04  CMU2025100100000D  49.08        1.321222  Total Compensation
2018-07  CMU2025100100000D  49.92        1.711491  Total Compensation
2018-10  CMU2025100100000D  49.00       -1.842949  Total Compensation
2019-01  CMU2025100100000D  49.87        1.775510  Total Compensation
2019-04  CMU2025100100000D  50.21        0.681773  Total Compensation
2019-07  CMU2025100100000D  50.68        0.936069  Total Compensation
2019-10  CMU2025100100000D  50.94        0.513023  Total Compensation
2020-01  CMU2025100100000D  50.82       -0.235571  Total Compensation
2020-04  CMU2025100100000D  50.73       -0.177096  Total Compensation
                 Series ID  Value  Inflation Rate  Category
Date                                                       
2021-10  CMU2025100200000D  25.87       

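The quarter-over-quarter rates in the combined DataFrame can move with within-year patterns, because the 'D' series kept here are not seasonally adjusted. Comparing each quarter with the same quarter a year earlier sidesteps recurring seasonal patterns. This is a minimal sketch on the Total Compensation values printed above, using `pct_change(periods=4)` (four rows back in quarterly data). It is offered as an addition, not something the notebook computes.

```python
import pandas as pd

# Total Compensation values printed above, 2018 Q1 through 2020 Q2
values = [48.44, 49.08, 49.92, 49.00, 49.87, 50.21, 50.68, 50.94, 50.82, 50.73]
total = pd.Series(values, index=pd.period_range('2018Q1', periods=len(values), freq='Q'))

# Compare each quarter with the same quarter one year earlier (4 rows back in quarterly data)
yoy = total.pct_change(periods=4) * 100
print(yoy.dropna().round(2))
```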
In [9]:
import os

# Step 1: Create the 'Dataframes' folder if it doesn't exist
folder_path = 'Dataframes'
if not os.path.exists(folder_path):
    os.makedirs(folder_path)
    print(f"Folder created: {folder_path}")
else:
    print(f"Folder already exists: {folder_path}")

# Step 2: Define the CSV file path for wage data
csv_path = os.path.join(folder_path, 'combined_wage_data.csv')

# Step 3: Save the combined wage DataFrame as a CSV file
combined_wage_df.to_csv(csv_path, index=True)
print(f"DataFrame saved as CSV at {csv_path}")

Folder created: Dataframes
DataFrame saved as CSV at Dataframes\combined_wage_data.csv

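The saved CSV stores the `Date` index as text such as `2018-01`, so reading it back gives a plain string index rather than the monthly `PeriodIndex`. This is a minimal sketch of a round trip that rebuilds the index on load. It writes to a temporary folder instead of `Dataframes` so it does not touch the real file, and it uses a two-row stand-in for `combined_wage_df`.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for combined_wage_df (values from the output above)
frame = pd.DataFrame(
    {
        'Series ID': ['CMU2025100100000D'] * 2,
        'Value': [48.44, 49.08],
        'Inflation Rate': [0.0, 1.321222],
        'Category': ['Total Compensation'] * 2,
    },
    index=pd.PeriodIndex(['2018-01', '2018-04'], freq='M', name='Date'),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'combined_wage_data.csv')
    frame.to_csv(path, index=True)

    # The index is written as text such as '2018-01'; rebuild the monthly PeriodIndex on load
    loaded = pd.read_csv(path, index_col='Date')
    loaded.index = pd.PeriodIndex(loaded.index, freq='M')

print(loaded.head())
```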