---
date: "Monday June 16th, 2025"
---

# Introduction to the 2024-2025 AI Job Market Dataset

## Dataset Overview and Context

This analysis explores the United States AI Job Market dataset for 2024-2025, which contains detailed information on artificial intelligence and machine learning job postings in the United States. The dataset was sourced from [Kaggle](https://www.kaggle.com/datasets/bismasajjad/global-ai-job-market-and-salary-trends-2025) and covers AI-related positions, including details on compensation, required skills, experience levels, and employment characteristics.

## Connection to STEAMe's Mission

STEAMe operates as "The Engine for Workforce Transformation", providing a platform that connects job seekers, training providers, and employers. Our analysis of this AI job market dataset directly supports STEAMe's mission by:

1. Helping **learners** identify in-demand AI skills and promising career pathways.
2. Enabling **employers** to benchmark their compensation offerings and skill requirements.
3. Guiding **learning providers** in developing curricula aligned with market demands.
4. Supporting **workforce intermediaries** in connecting system partners through data-driven insights.

## Problem Statement

The primary questions we aim to answer through this analysis are:

1. What is the current state of the AI job market in terms of compensation, required skills, and experience levels?
2. How do factors such as industry, company size, and job characteristics influence salary offerings?
3. Which AI and ML skills are most valuable in today's job market?
4. What are the optimal pathways for job seekers looking to enter or advance in AI careers?

By answering these questions, we will generate actionable insights that can inform STEAMe's platform development and help stakeholders make more informed decisions about training, hiring, and career development in the AI field.

## Exploratory Data Analysis

In [31]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Set visualization styles
plt.style.use("seaborn-v0_8-whitegrid")
sns.set_palette("colorblind")

# Load the dataset
full_df = pd.read_csv("./data/ai_job_dataset_united_states.csv", header=0)

# Display the count of the rows and columns
print(f"Dataset shape:\nRows: {full_df.shape[0]}\nColumns: {full_df.shape[1]}")
print("\nFirst 5 rows of the dataset:")
display(full_df.head(5))

Dataset shape:
Rows: 724
Columns: 19

First 5 rows of the dataset:


,job_id,job_title,salary_usd,salary_currency,experience_level,employment_type,company_location,company_size,employee_residence,remote_ratio,required_skills,education_required,years_experience,industry,posting_date,application_deadline,job_description_length,benefits_score,company_name
0,AI00022,Autonomous Systems Engineer,102550,USD,MI,PT,United States,M,United States,0,"Tableau, Spark, NLP, TensorFlow, PyTorch",Bachelor,2,Automotive,4/23/2024,6/23/2024,625,10.0,Cognitive Computing
1,AI00041,Data Scientist,96956,USD,MI,FT,United States,M,China,0,"Data Visualization, Azure, Spark, MLOps",Bachelor,2,Telecommunications,2/24/2024,3/9/2024,761,5.3,DataVision Ltd
2,AI00042,AI Architect,196954,USD,SE,FL,United States,L,United States,50,"Java, Mathematics, SQL",Bachelor,8,Finance,3/17/2025,4/16/2025,2290,7.5,DataVision Ltd
3,AI00046,AI Research Scientist,174663,USD,SE,CT,United States,M,Singapore,50,"Data Visualization, Statistics, R",Associate,7,Media,1/10/2024,3/22/2024,1151,5.4,DeepTech Ventures
4,AI00053,Research Scientist,106579,USD,MI,PT,United States,S,United States,100,"Docker, Tableau, Mathematics",Associate,3,Gaming,2/24/2024,5/2/2024,2028,5.4,Cognitive Computing


In [32]:
# Summary of the data types
print("Data Types:")
display(full_df.dtypes)

Data Types:


job_id                     object
job_title                  object
salary_usd                  int64
salary_currency            object
experience_level           object
employment_type            object
company_location           object
company_size               object
employee_residence         object
remote_ratio                int64
required_skills            object
education_required         object
years_experience            int64
industry                   object
posting_date               object
application_deadline       object
job_description_length      int64
benefits_score            float64
company_name               object
dtype: object

In [33]:
# Summary statistics for numerical columns
print("Summary statistics:")
display(full_df.describe())

Summary statistics:


,salary_usd,remote_ratio,years_experience,job_description_length,benefits_score
count,724.0,724.0,724.0,724.0,724.0
mean,146833.04558,48.895028,6.367403,1493.426796,7.568508
std,66654.990254,41.648533,5.54926,571.540994,1.469855
min,54512.0,0.0,0.0,500.0,5.0
25%,93625.75,0.0,2.0,998.0,6.275
50%,128606.0,50.0,5.0,1493.5,7.7
75%,187721.5,100.0,10.0,2002.25,8.9
max,344471.0,100.0,19.0,2493.0,10.0


In [34]:
# Check for missing values
print("Missing values:")
missing_values = full_df.isnull().sum()
missing_percentage = (missing_values / len(full_df)) * 100
missing_info = pd.DataFrame({
    "Missing_Values": missing_values,
    "Percentage": missing_percentage
})
if missing_info[missing_info["Missing_Values"] > 0].empty:
    print("No missing values found.")
else:
    display(missing_info[missing_info["Missing_Values"] > 0])

Missing values:
No missing values found.


In [35]:
# Count of unique values for categorical columns
print("Unique values in categorical columns:")
categorical_columns = full_df.select_dtypes(include=["object"]).columns
for col in categorical_columns:
    print(f"{col}: {full_df[col].nunique()} unique values")

Unique values in categorical columns:
job_id: 724 unique values
job_title: 20 unique values
salary_currency: 1 unique values
experience_level: 4 unique values
employment_type: 4 unique values
company_location: 1 unique values
company_size: 3 unique values
employee_residence: 20 unique values
required_skills: 720 unique values
education_required: 4 unique values
industry: 15 unique values
posting_date: 382 unique values
application_deadline: 380 unique values
company_name: 16 unique values


In [36]:
# Visualize the distribution of job titles
job_title_counts = full_df["job_title"].value_counts().sort_values(ascending=False).head(10)
fig1 = px.bar(
    x=job_title_counts.values,
    y=job_title_counts.index,
    orientation="h",
    title="Top 10 AI Job Titles",
    labels={"x": "Count", "y": "Job Title"},
    color=job_title_counts.values,
    color_continuous_scale="Viridis"
)
fig1.update_layout(
    height=500,
    width=800,
    yaxis={"categoryorder": "total ascending"}
)
fig1.show()

In [37]:
# Visualize salary distribution
fig2 = px.histogram(
    full_df,
    x="salary_usd",
    marginal="box",
    title="Distribution of AI Job Salaries 2024-2025",
    labels={"salary_usd": "Salary (USD)"},
    opacity=0.7,
    color_discrete_sequence=["#636EFA"]
)
fig2.add_vline(
    x=full_df["salary_usd"].mean(),
    line_dash="dash",
    line_color="red",
    annotation_text=f"Mean: ${full_df['salary_usd'].mean():,.0f}",
    annotation_position="top right"
)
fig2.add_vline(
    x=full_df["salary_usd"].median(),
    line_dash="dash",
    line_color="blue",
    annotation_text=f"Median: ${full_df['salary_usd'].median():,.0f}",
    annotation_position="top left"
)
fig2.update_layout(
    height=500,
    width=800
)
fig2.show()

In [67]:
# Export salary summary by job title to CSV
salary_by_job = full_df.groupby("job_title")["salary_usd"].agg(["mean", "median", "std", "count"]).reset_index()
salary_by_job.to_csv("exports/s1-descriptive/salary_by_job_title.csv", index=False)

## Summary of Descriptive Statistics and Visualizations

After examining the dataset containing 724 AI job postings across the United States, we have a solid foundation for deeper analysis. The dataset is complete with no missing values and includes 20 distinct job titles across 15 industries, with salaries ranging from $54,512 to $344,471 annually.

Key observations from our initial exploration:

- The median salary for AI positions is $128,606, with a mean of $146,833, indicating a right-skewed distribution where higher-paying positions pull the average upward.
- There is significant salary variation (standard deviation of $66,655), suggesting that compensation depends heavily on factors such as role, experience, and industry.
- Experience requirements span from entry-level (0 years) to highly experienced (19 years), with a median of 5 years.
- Remote work options vary widely, with the remote_ratio spread roughly evenly across fully in-office (0%), hybrid (50%), and fully remote (100%) positions.
- The dataset includes roles across various employment types: full-time (FT), part-time (PT), contract (CT), and freelance (FL).
- Job postings come from companies of different sizes, with varying benefit scores (ranging from 5.0 to 10.0).
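
The right skew noted in the first observation can be quantified with a quick back-of-the-envelope check. The sketch below uses Pearson's second skewness coefficient, which is a choice made here for illustration rather than a calculation from the original notebook; the three inputs are copied from the `describe()` output above.

```python
# Summary statistics for salary_usd, copied from the describe() output above.
mean_salary = 146833.04558
median_salary = 128606.0
std_salary = 66654.990254

# Pearson's second skewness coefficient: 3 * (mean - median) / std.
# Positive values indicate a right-skewed distribution.
pearson_skew = 3 * (mean_salary - median_salary) / std_salary
print(f"Pearson's second skewness coefficient: {pearson_skew:.2f}")  # prints 0.82
```

A value well above zero is consistent with the long right tail visible in the salary histogram.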

## Salary Distribution

Next, we examine the distribution of salary by job title, experience level, and industry, along with the relationship between salary and years of experience. We then create a heatmap of salary by experience level and company size. Finally, we look at the salary distribution by employment type.

## 1. Salary Distribution by Job Title

In this section, we examine how compensation varies across different AI job titles to identify which roles command the highest salaries in the market. The boxplot visualization below is particularly valuable for this analysis, as it simultaneously displays median salaries, interquartile ranges, minimum and maximum values, and outliers for each job title. This multi-dimensional view allows us to compare not just typical compensation levels but also the variability and extreme values within each role category. Understanding these patterns helps learners target lucrative career paths and provides employers with benchmarking data for competitive compensation offerings. The visualization reveals significant salary differences between AI specializations and highlights the financial incentives for pursuing specific roles.
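
As a rough illustration of what a boxplot encodes, the sketch below computes the interquartile range and outlier fences for the overall salary distribution. It assumes the common 1.5 × IQR whisker rule, and it uses the overall quartiles from the summary statistics because per-title quartiles are not printed in this notebook.

```python
# Quartiles and extremes for salary_usd, copied from the describe() output.
q1, q3 = 93625.75, 187721.5
min_salary, max_salary = 54512, 344471

# Interquartile range: the height of the box.
iqr = q3 - q1

# Fences under the 1.5 * IQR rule (an assumed convention); points beyond
# them are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR: {iqr:,.2f}")
print(f"Fences: {lower_fence:,.2f} to {upper_fence:,.2f}")
print(f"High outliers present: {max_salary > upper_fence}")
print(f"Low outliers present: {min_salary < lower_fence}")
```

Under this rule, the maximum salary of $344,471 lies above the upper fence, while no salary falls below the lower fence.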


In [40]:
# 1. Salary analysis by job title
# Calculate statistics for top job titles by count
job_titles = full_df["job_title"].value_counts().head(19).index.tolist()
job_salary_df = full_df[full_df["job_title"].isin(job_titles)]

fig1 = px.box(
    job_salary_df,
    x="job_title",
    y="salary_usd",
    color="job_title",
    title="Salary Distribution by AI Job Titles",
    labels={"salary_usd": "Salary (USD)", "job_title": "Job Title"},
    height=600,
    width=800
)
fig1.update_layout(
    xaxis={"categoryorder": "median descending"},
    showlegend=False
)
fig1.show()

In [41]:
# 1. Salary Statistics by Job Title
job_title_stats = full_df.groupby('job_title')['salary_usd'].agg(['mean', 'median', 'min', 'max', 'count']).sort_values('median', ascending=False)
print("Job Title Salary Statistics:")
display(job_title_stats.head(8))

Job Title Salary Statistics:


job_title,mean,median,min,max,count
Machine Learning Researcher,164528.967742,162756.0,79146,284282,31
Computer Vision Engineer,175080.791667,161474.0,75675,319984,24
AI Architect,170238.756098,160162.0,70238,295466,41
Machine Learning Engineer,146827.0,143496.0,57234,344427,30
NLP Engineer,143023.485714,138709.0,61055,290462,35
Autonomous Systems Engineer,151197.35,138453.0,54803,273986,40
Research Scientist,152662.8,135405.0,62960,333499,35
Head of AI,159591.289474,134609.5,70667,343803,38


Machine Learning Researchers command the highest median salary at $162,756, closely followed by Computer Vision Engineers at $161,474 and AI Architects at $160,162. Despite having similar median values, the mean salaries reveal interesting variations: among the roles shown, Computer Vision Engineers have the highest average salary at $175,081, suggesting some exceptionally high-paying positions in this specialization. The salary ranges within each role are substantial; for example, Machine Learning Engineers' compensation spans from $57,234 to $344,427, a difference of over $287,000. This wide range indicates that factors beyond job title, such as experience, industry, and specific technical expertise, significantly influence compensation in AI careers. For training providers and job seekers using STEAMe's platform, specialized roles like Computer Vision and Machine Learning Research appear to offer the highest financial returns, although the substantial variation within each category should be considered when planning career pathways.

## 2. Salary by Experience Level

Experience level is a fundamental determinant of compensation in the AI field. This analysis explores the salary progression from Entry to Executive positions, quantifying the financial benefits of career advancement. For STEAMe's platform users, this information helps set realistic salary expectations at different career stages and demonstrates the tangible rewards of professional growth and skill development.

In [44]:
# 2. Salary by Experience Level
fig2 = px.box(
    full_df,
    x="experience_level",
    y="salary_usd",
    color="experience_level",
    title="Salary Distribution by Experience Level",
    labels={"salary_usd": "Salary (USD)", "experience_level": "Experience Level"},
    category_orders={"experience_level": ["EN", "MI", "SE", "EX"]},
    height=500,
    width=800
)
fig2.update_layout(showlegend=False)
fig2.add_annotation(
    x=0, y=full_df["salary_usd"].max()*0.95,
    text="EN=Entry, MI=Mid, SE=Senior, EX=Executive",
    showarrow=False,
    bgcolor="rgba(255,255,255,0.8)"
)
fig2.show()

In [45]:
exp_level_stats = full_df.groupby('experience_level')['salary_usd'].agg(['mean', 'median', 'min', 'max', 'count']).reset_index()
exp_level_map = {'EN': 'Entry', 'MI': 'Mid-level', 'SE': 'Senior', 'EX': 'Executive'}
exp_level_stats['experience_level'] = exp_level_stats['experience_level'].map(exp_level_map)
exp_level_stats = exp_level_stats.set_index('experience_level')
print("Experience Level Salary Statistics:")
display(exp_level_stats)

Experience Level Salary Statistics:


experience_level,mean,median,min,max,count
Entry,80344.103448,78786.5,54512,108843,174
Executive,238768.723404,239025.0,144454,344471,188
Mid-level,109548.907609,108860.0,77185,149281,184
Senior,153268.157303,151228.5,110342,206671,178


The data reveals a clear progression in AI salaries as professionals advance through experience levels, with Executive level positions holding a median salary of $239,025, more than three times that of Entry-level roles at $78,787. This substantial difference highlights the significant financial rewards for career advancement in the AI field. Mid-level professionals earn a median of $108,860, representing a 38% increase over entry-level positions, while Senior roles at $151,229 mark another substantial 39% jump from mid-level compensation. Notably, the transition from Senior to Executive positions yields the largest percentage increase of 58%, suggesting this final career advancement delivers the greatest financial returns. The relatively narrow salary ranges within each experience level (for example, Entry-level positions ranging from $54,512 to $108,843) indicate that experience is a consistent and reliable predictor of compensation across the AI industry. For STEAMe's learners and workforce development initiatives, these statistics emphasize the importance of structured career progression, with clear financial incentives at each advancement stage.
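
The step-by-step increases quoted above can be reproduced from the median column of the statistics table. This is a minimal sketch: the medians are copied from that table, while the variable names are introduced here for illustration.

```python
import pandas as pd

# Median salaries by experience level, ordered by career stage.
medians = pd.Series(
    {
        "Entry": 78786.5,
        "Mid-level": 108860.0,
        "Senior": 151228.5,
        "Executive": 239025.0,
    },
    name="median_salary_usd",
)

# Percentage increase relative to the previous level (NaN for Entry).
step_increase_pct = (medians.pct_change() * 100).round(1)

# How many times larger the Executive median is than the Entry median.
total_multiple = medians["Executive"] / medians["Entry"]

print(step_increase_pct)
print(f"Executive vs. Entry: {total_multiple:.2f}x")
```

The result shows increases of roughly 38%, 39%, and 58% for the three transitions, and an Executive median just over three times the Entry median.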

## 3. Salary by Industry

AI talent is in demand across diverse industries, each offering different compensation structures. This section explores which sectors provide the most lucrative opportunities for AI professionals and how traditional industries compare to technology companies in the race for talent. These insights help guide both career decisions for individuals and talent acquisition strategies for organizations across different sectors.


In [46]:
# 3. Salary by industry (top industries by count)
top_industries = full_df["industry"].value_counts().head(10).index.tolist()
industry_salary_df = full_df[full_df["industry"].isin(top_industries)]

fig3 = px.box(
    industry_salary_df,
    x="industry",
    y="salary_usd",
    color="industry",
    title="Salary Distribution by Top Industries",
    labels={"salary_usd": "Salary (USD)", "industry": "Industry"},
    height=500,
    width=950
)
fig3.update_layout(
    xaxis={"categoryorder": "median descending"},
    showlegend=False
)
fig3.show()

In [47]:
industry_stats = full_df.groupby('industry')['salary_usd'].agg(['mean', 'median', 'min', 'max', 'count']).sort_values('median', ascending=False)
print("Industry Salary Statistics:")
display(industry_stats.head(10))

Industry Salary Statistics:


industry,mean,median,min,max,count
Finance,169952.384615,146023.0,60631,338443,39
Gaming,154774.575,145436.5,69279,288318,40
Retail,165816.42623,143399.0,59600,344471,61
Technology,148872.962264,140497.0,70238,314917,53
Government,158318.306122,140234.0,63794,343803,49
Consulting,154073.298246,139187.0,61684,293738,57
Real Estate,153015.222222,134914.0,73105,342272,45
Education,142331.196429,129481.0,54803,275914,56
Transportation,136345.559322,125276.0,61447,325119,59
Telecommunications,139420.44186,124367.0,64058,337306,43


In [54]:
# Create detailed industry analysis with additional context
industry_analysis = industry_stats.reset_index().copy()  # Create an explicit copy

# Add top jobs per industry
top_jobs_by_industry = {}
for industry in industry_analysis["industry"]:
    industry_df = full_df[full_df["industry"] == industry]
    top_jobs = industry_df["job_title"].value_counts().head(3).index.tolist()
    top_jobs_by_industry[industry] = ", ".join(top_jobs)

# Use .loc to set values properly
industry_analysis.loc[:, "top_job_titles"] = industry_analysis["industry"].map(top_jobs_by_industry)
industry_analysis.to_csv("exports/s1-descriptive/industry_salary_analysis.csv", index=False)

Among the industries employing AI professionals, Finance leads with the highest median salary at $146,023, closely followed by Gaming at $145,437 and Retail at $143,399. Surprisingly, traditional sectors like Retail and Government ($140,234) are competing effectively with Technology companies ($140,497) for AI talent, offering comparable compensation packages. The Finance industry not only tops the median salary rankings but also shows the highest mean salary at $169,952, indicating some exceptionally high-paying positions in this sector. Retail demonstrates the widest salary range, from $59,600 to $344,471, suggesting more variable compensation practices or a broader spectrum of AI roles within this industry. Telecommunications and Transportation industries, while still offering six-figure median salaries, fall toward the lower end of the compensation spectrum at $124,367 and $125,276 respectively. For STEAMe's platform users, these findings highlight that lucrative AI opportunities exist well beyond the traditional technology sector, with financial services, gaming, and retail emerging as particularly rewarding industries for AI professionals.

## 4. Salary vs Years of Experience

This analysis examines the relationship between years of professional experience and salary levels across different experience categories. The scatter plot visualization below is particularly revealing, as it maps each position in our dataset according to years of experience and salary, with points colored by experience level classification. This approach allows us to observe both the overall correlation pattern and the distinct clusters that form at different experience thresholds. The clear separation between experience level groups highlights career transition points, while the variation within each cluster demonstrates how other factors influence compensation beyond raw years of experience. By quantifying these relationships, we gain insights into how experience translates to compensation at different career stages, revealing where experience has the strongest impact on salary and where other factors become more influential determinants of earning potential.


In [48]:
# 4. Salary vs Years of Experience Scatter Plot with Trend Line
fig4 = px.scatter(
    full_df,
    x="years_experience",
    y="salary_usd",
    color="experience_level",
    size="benefits_score",
    hover_data=["job_title", "industry", "company_size"],
    trendline="ols",
    title="Salary vs Years of Experience",
    labels={"salary_usd": "Salary (USD)", "years_experience": "Years of Experience"},
    height=600,
    width=900
)
fig4.show()

In [49]:
# Overall correlation between salary and years of experience
overall_corr = full_df["salary_usd"].corr(full_df["years_experience"])
print(f"Overall correlation between salary and years of experience: {overall_corr:.3f}")

# Correlation within each experience level
experience_levels = ["EN", "MI", "SE", "EX"]  # Entry, Mid-level, Senior, Executive
exp_level_names = {"EN": "Entry", "MI": "Mid-level", "SE": "Senior", "EX": "Executive"}

for level in experience_levels:
    level_df = full_df[full_df["experience_level"] == level]
    level_corr = level_df["salary_usd"].corr(level_df["years_experience"])
    print(f"Correlation for {exp_level_names[level]} level: {level_corr:.3f}")

Overall correlation between salary and years of experience: 0.870
Correlation for Entry level: 0.125
Correlation for Mid-level level: 0.013
Correlation for Senior level: -0.076
Correlation for Executive level: 0.109


In [61]:
# Create detailed dataset for experience-salary scatter plot
experience_salary_df = full_df[["years_experience", "salary_usd", "experience_level", 
                               "job_title", "industry", "company_size", "benefits_score"]].copy()
# Add correlation as a reference field
experience_salary_df["overall_correlation"] = overall_corr
# Map experience level codes to full names
experience_salary_df["experience_level_name"] = experience_salary_df["experience_level"].map(exp_level_names)
experience_salary_df.to_csv("exports/s1-descriptive/experience_salary_relationship.csv", index=False)

The overall correlation between years of experience and salary is remarkably strong at 0.870, confirming that experience is a powerful determinant of compensation across the entire AI job market. However, when examining this relationship within specific experience levels, a far more nuanced picture emerges. Within the Entry level category, there is only a weak positive correlation of 0.125, suggesting that while additional experience provides some benefit, other factors likely play more significant roles in salary differentiation among newcomers. For Mid-level professionals, the correlation is nearly non-existent at 0.013, indicating that at this career stage, factors beyond raw years of experience (such as specific technical skills or domain expertise) likely drive salary variations. Senior-level positions show a slight negative correlation of -0.076, potentially reflecting that specialized expertise or leadership capabilities outweigh incremental years of experience at this stage. Executive positions return to a weak positive correlation of 0.109, suggesting that while extensive experience is expected at this level, compensation differences are more likely determined by strategic impact and organizational value. This pattern is consistent with experience levels that are largely defined by years of experience: most of the variation in experience lies between levels, so little remains within each level to correlate with salary. For STEAMe's platform users, these findings highlight that while accumulating years of experience is crucial for advancing between career levels, other qualitative factors become increasingly important for maximizing compensation within each level.
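
A strong overall correlation combined with near-zero correlations inside each level can arise whenever the levels themselves partition the experience range. The sketch below illustrates this with purely synthetic data: the years-of-experience ranges per level are assumptions, the salary ranges are loosely based on the minimum and maximum values reported earlier, and salary is drawn independently of years within each level.

```python
import numpy as np

rng = np.random.default_rng(42)
n_per_level = 2000  # synthetic sample size per level (assumption)

# Hypothetical (years range, salary range) per experience level.
levels = {
    "EN": ((0, 1), (54_000, 109_000)),
    "MI": ((2, 4), (77_000, 149_000)),
    "SE": ((5, 9), (110_000, 207_000)),
    "EX": ((10, 19), (144_000, 344_000)),
}

years_parts, salary_parts, within_corr = [], [], {}
for level, ((y_lo, y_hi), (s_lo, s_hi)) in levels.items():
    # Within a level, salary is generated independently of years.
    years = rng.integers(y_lo, y_hi + 1, size=n_per_level)
    salary = rng.uniform(s_lo, s_hi, size=n_per_level)
    within_corr[level] = np.corrcoef(years, salary)[0, 1]
    years_parts.append(years)
    salary_parts.append(salary)

# Pooling the levels reintroduces the between-level trend.
synthetic_overall_corr = np.corrcoef(
    np.concatenate(years_parts), np.concatenate(salary_parts)
)[0, 1]

print(f"Overall correlation: {synthetic_overall_corr:.3f}")
for level, corr in within_corr.items():
    print(f"Within {level}: {corr:.3f}")
```

Even though years of experience has no effect on salary inside any synthetic level, the pooled correlation is strong because both variables rise together from one level to the next.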

## 5. Salary by Experience Level and Company Size

Company size can significantly influence compensation structures in the AI field. This section explores the intersection of experience level and organizational scale to identify optimal combinations for maximizing earning potential. The heatmap below provides an effective solution for displaying this two-dimensional relationship, using color intensity to represent median salary values across the experience-company size matrix. This format makes it easy to identify patterns that might not be apparent when examining each factor in isolation, revealing how the impact of company size varies across different career stages. The numeric values embedded in each cell further enhance the clarity of the analysis, allowing for exact comparisons. Understanding these patterns provides strategic insights for career planning, helping professionals understand how targeting specific company sizes might optimize their earning potential at each career stage.

In [50]:
# 5. Salary heatmap by experience level and company size
salary_pivot = full_df.pivot_table(
    values="salary_usd",
    index="experience_level",
    columns="company_size",
    aggfunc="median"
)
fig5 = px.imshow(
    salary_pivot,
    text_auto=True,
    color_continuous_scale="Viridis",
    title="Median Salary Heatmap: Experience Level vs Company Size",
    labels={"color": "Median Salary (USD)"},
    height=500,
    width=800
)
fig5.update_layout(
    xaxis_title="Company Size (S=Small, M=Medium, L=Large)",
    yaxis_title="Experience Level (EN=Entry, MI=Mid, SE=Senior, EX=Executive)"
)
fig5.show()

In [62]:
# Pivot the data on experience_level by company size 
# and get the median salary as the resulting values
exp_company_stats = full_df.pivot_table(
    values="salary_usd", 
    index="experience_level", 
    columns="company_size", 
    aggfunc="median"
)
print("Median Salary by Experience Level and Company Size:")
display(exp_company_stats)

Median Salary by Experience Level and Company Size:


experience_level,L,M,S
EN,88628.5,76889.0,71683.0
EX,260953.0,241076.0,209017.0
MI,123906.5,108587.5,99866.0
SE,172035.5,146085.0,133950.0


In [63]:
# Export the median salary by experience level and company size
# Create company size matrix with additional context
company_size_matrix = exp_company_stats.reset_index()

# Add count information for reference
count_matrix = full_df.pivot_table(
    values='salary_usd', 
    index='experience_level', 
    columns='company_size', 
    aggfunc='count'
).reset_index()

# Rename columns to avoid confusion
count_matrix.columns = ['experience_level'] + [f'{col}_count' for col in count_matrix.columns if col != 'experience_level']

# Merge the datasets
matrix_combined = pd.merge(company_size_matrix, count_matrix, on='experience_level')

# Map experience level codes to names
matrix_combined['experience_level_name'] = matrix_combined['experience_level'].map(exp_level_names)
matrix_combined.to_csv("exports/s1-descriptive/company_size_matrix.csv", index=False)

The analysis reveals a consistent pattern: larger companies offer higher median compensation at every experience level. The gap is largest in absolute terms at the Executive level, where large companies pay a median of $260,953, compared with $241,076 at medium and $209,017 at small organizations, a premium of about 25% ($51,936) over small companies. In percentage terms, the large-company premium over small companies is about 24% for Entry-level positions ($88,629 vs. $71,683), 24% for Mid-level roles ($123,907 vs. $99,866), and 28% for Senior positions ($172,036 vs. $133,950), so the relative advantage peaks at the Senior level. The salary progression from Entry to Executive level is steepest at medium companies, where it represents a 214% increase, compared with 194% at large companies and 192% at small firms. For STEAMe's platform users, these statistics suggest that while career advancement provides substantial salary growth across all company sizes, targeting larger organizations consistently yields higher compensation at every career stage. This advantage must be weighed against other factors like work-life balance, job security, and growth opportunities that may vary by company size.
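
The premiums and growth rates for each company size can be recomputed directly from the pivot table. The following is a small sketch with the median values copied from the output above; the variable names are introduced here for illustration.

```python
import pandas as pd

# Median salary by experience level (rows) and company size (columns).
median_salary = pd.DataFrame(
    {
        "L": [88628.5, 123906.5, 172035.5, 260953.0],
        "M": [76889.0, 108587.5, 146085.0, 241076.0],
        "S": [71683.0, 99866.0, 133950.0, 209017.0],
    },
    index=["EN", "MI", "SE", "EX"],
)

# Premium of large over small companies at each experience level, in percent.
large_vs_small_pct = ((median_salary["L"] / median_salary["S"] - 1) * 100).round(1)

# Growth from Entry to Executive within each company size, in percent.
entry_to_exec_pct = (
    (median_salary.loc["EX"] / median_salary.loc["EN"] - 1) * 100
).round(1)

print("Large vs. small premium (%):")
print(large_vs_small_pct)
print("\nEntry-to-Executive growth (%):")
print(entry_to_exec_pct)
```

Computing both views side by side makes it easier to separate the relative premium at each level from the overall growth across a career.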

## 6. Salary by Employment Type

The modern workforce includes various employment arrangements beyond traditional full-time roles. This analysis compares compensation across contract, freelance, full-time, and part-time positions in the AI field. The violin plot visualization below is particularly effective for this comparison, as it combines aspects of a box plot with a density plot, showing both the distribution shape and key statistics for each employment type. This approach reveals not just median values but also where salaries are concentrated within each category and how the distributions differ in shape. The width of each "violin" at different salary levels indicates the probability density of observations at those values, providing insights into common salary bands within each employment arrangement. Understanding these differences helps STEAMe's users evaluate the financial implications of different work arrangements and identify potentially unexpected opportunities for competitive compensation outside conventional employment structures.

In [34]:
# 6. Salary Distribution by Employment Type
fig6 = px.violin(
    full_df,
    x="employment_type",
    y="salary_usd",
    color="employment_type",
    title="Salary Distribution by Employment Type",
    labels={"salary_usd": "Salary (USD)", "employment_type": "Employment Type"},
    category_orders={"employment_type": ["FT", "PT", "CT", "FL"]},
    height=500,
    width=800
)
fig6.update_layout(showlegend=False)
fig6.add_annotation(
    x=0,
    y=full_df["salary_usd"].max()*0.95,
    text="FT=Full-Time, PT=Part-time, CT=Contract, FL=Freelance",
    showarrow=False,
    bgcolor="rgba(255,255,255,0.8)"
)
fig6.show()

In [64]:
# Create employment type analysis
employment_type_stats = full_df.groupby("employment_type")["salary_usd"].agg(["mean", "median", "min", "max", "count"]).reset_index()
# Add readable employment type names
emp_type_map = {"FT": "Full-Time", "PT": "Part-Time", "CT": "Contract", "FL": "Freelance"}
employment_type_stats["employment_type_name"] = employment_type_stats["employment_type"].map(emp_type_map)
employment_type_stats.to_csv("exports/s1-descriptive/employment_type_analysis.csv", index=False)

Contrary to conventional expectations, contract positions offer the highest median salary among AI employment arrangements at $134,678, exceeding full-time roles at $127,966. This suggests that organizations may be willing to pay a premium for specialized contract talent in the AI field. Part-time positions show a surprisingly competitive median salary of $128,234, slightly higher than full-time roles, while freelance arrangements trail with a median of $120,916. The mean values follow the same ordering at higher levels, reflecting the right skew of the salary distribution: contract roles average $154,303, followed by part-time at $147,351, full-time at $145,331, and freelance at $140,486. All employment types demonstrate wide salary ranges, with contract positions showing the broadest span from $54,512 to $344,471, a difference of nearly $290,000. This substantial variability indicates that factors like experience level, job title, and industry significantly influence compensation across all employment arrangements. For STEAMe's workforce development initiatives, these findings challenge traditional assumptions about employment stability and compensation, suggesting that flexible work arrangements in AI can be financially rewarding and may offer alternative pathways for career advancement beyond conventional full-time employment.

## Enhanced Salary Summary

Finally, we export a salary summary across job title, experience level, and industry for further analysis and visualization in Tableau.

In [35]:
# Export enhanced salary summary to CSV
salary_summary = full_df.groupby(["job_title", "experience_level", "industry"])["salary_usd"].agg(["mean", "median", "std", "count"]).reset_index()
salary_summary.to_csv("ai_salary_detailed_summary.csv", index=False)

## AWS Pipeline for AI Job Market Statistics to Tableau

The statistics calculated in this notebook can be deployed in an AWS-based pipeline to make them accessible for Tableau visualizations. We would start by converting the current notebook processing into AWS Lambda functions that handle the raw job data, calculate descriptive statistics, and generate aggregated datasets similar to our CSV exports.

Data storage would involve Amazon S3 for both raw data (the original CSV file) and processed statistics in JSON or CSV formats. AWS Glue would serve as the ETL tool, running scheduled jobs to transform the data, standardize skills terminology, create statistical summaries, and maintain consistent data schemas.
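
As a rough sketch of the first step, the function below shows how one of the notebook's aggregations might look as a Lambda handler triggered by an S3 upload. The output key, the handler's optional `s3_client` parameter, and the helper name `summarize_salaries` are all hypothetical, and pandas would need to be packaged with the function (for example as a layer).

```python
import io
from urllib.parse import unquote_plus

import pandas as pd


def summarize_salaries(csv_bytes: bytes) -> bytes:
    """Aggregate salary statistics by job title and return them as CSV bytes."""
    df = pd.read_csv(io.BytesIO(csv_bytes))
    summary = (
        df.groupby("job_title")["salary_usd"]
        .agg(["mean", "median", "std", "count"])
        .reset_index()
    )
    return summary.to_csv(index=False).encode("utf-8")


def lambda_handler(event, context, s3_client=None):
    """Read the uploaded CSV from S3 and write the summary back to S3."""
    if s3_client is None:
        # Imported lazily so the summary logic can be exercised without AWS.
        import boto3

        s3_client = boto3.client("s3")

    # Bucket and key come from the S3 event notification; keys are URL-encoded.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])

    raw = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Hypothetical destination key for the processed statistics.
    output_key = "processed/salary_by_job_title.csv"
    s3_client.put_object(Bucket=bucket, Key=output_key, Body=summarize_salaries(raw))
    return {"statusCode": 200, "output_key": output_key}
```

Keeping the aggregation in a plain function separates the pandas logic from the AWS calls, so it can be tested locally. In a real deployment the trigger would need a prefix filter so that writing the processed file does not invoke the function again.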

For larger datasets, we'd implement Amazon RDS or Redshift with optimized tables and a star schema containing salary facts and dimension tables for job titles, experience levels, and industries.
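
To make the star schema idea concrete, the sketch below splits a tiny sample (three rows taken from the first-five-rows output) into dimension tables and a salary fact table using pandas. The surrogate key naming (`<column>_id`) and the helper `build_dimension` are illustrative assumptions, not a prescribed design.

```python
import pandas as pd

# Three sample rows from the dataset preview, limited to the relevant columns.
jobs = pd.DataFrame(
    {
        "job_title": ["Autonomous Systems Engineer", "Data Scientist", "AI Architect"],
        "experience_level": ["MI", "MI", "SE"],
        "industry": ["Automotive", "Telecommunications", "Finance"],
        "salary_usd": [102550, 96956, 196954],
    }
)


def build_dimension(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a dimension table with a surrogate integer key for one column."""
    values = sorted(df[column].unique())
    return pd.DataFrame({f"{column}_id": range(1, len(values) + 1), column: values})


# One dimension table per descriptive attribute.
dimension_columns = ["job_title", "experience_level", "industry"]
dims = {col: build_dimension(jobs, col) for col in dimension_columns}

# The fact table keeps the measure and replaces each attribute with its key.
fact_salary = jobs.copy()
for col, dim in dims.items():
    fact_salary = fact_salary.merge(dim, on=col).drop(columns=col)

print(fact_salary)
```

The resulting tables could then be loaded into RDS or Redshift, where the fact table joins to each dimension on its key.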

Tableau can connect directly to these data sources using either the S3 connector for flat files or database connectors for RDS/Redshift. We'd set up extract refresh schedules to automatically update the data without manual intervention. The Tableau data sources would include predefined calculations for metrics like median salaries and experience level groupings, along with consistent field naming and formatting.

For implementation, Amazon EventBridge (formerly CloudWatch Events) would trigger the pipeline when new job data becomes available or on regular refresh cycles. Security would be handled through IAM roles for access between AWS services. To control costs, we'd use S3 Intelligent-Tiering for storage, while Lambda's pay-per-use pricing would keep compute expenses low.

This approach creates an automated pipeline that refreshes AI job market statistics and makes them immediately available in Tableau, providing STEAMe's users with current insights for career decision-making.