"Sectors with higher AI adoption show lower employment growth"
Objective: Analyze whether sectors with higher AI adoption rates are experiencing lower employment growth.
Assumptions:
* country_ai_adoption table contains AI adoption metrics (obs_value) by sector and country.
* sector table maps sectors to occupation codes.
* occupation_growth table contains employment growth data (employment_percent_change_2023-2033).
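
A minimal sketch of how the three tables described above could be joined into one sector-level view. The key columns (`sector`, `occupation_code`) and all values are illustrative assumptions; the real layout of the sector table is not shown in this notebook.

```python
import pandas as pd

# Toy stand-ins for the three tables; values and key names are assumptions.
country_ai_adoption = pd.DataFrame({
    "sector": ["A", "A", "B"],
    "country": ["X", "Y", "X"],
    "obs_value": [10.0, 20.0, 0.0],
})
sector = pd.DataFrame({
    "sector": ["A", "B", "B"],
    "occupation_code": ["11-1", "43-1", "43-2"],
})
occupation_growth = pd.DataFrame({
    "occupation_code": ["11-1", "43-1", "43-2"],
    "employment_percent_change_2023-2033": [5.0, -30.0, -10.0],
})

# Average adoption across countries so each sector has a single value.
adoption_by_sector = country_ai_adoption.groupby("sector", as_index=False)["obs_value"].mean()

# sector -> occupation codes -> growth, then attach the sector-level adoption.
joined = (
    sector.merge(occupation_growth, on="occupation_code", how="inner")
          .merge(adoption_by_sector, on="sector", how="left")
)

# One row per sector: mean adoption next to mean projected growth.
sector_view = joined.groupby("sector", as_index=False)[
    ["obs_value", "employment_percent_change_2023-2033"]
].mean()
print(sector_view)
```

Averaging `obs_value` across countries is only one possible aggregation; a median or a single reference country would work the same way.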

In [None]:
import pandas as pd  # needed by the read_csv calls in the next cell
import spacy
from collections import defaultdict
from wordcloud import WordCloud
import matplotlib.pyplot as plt



In [None]:
# Example: Load data from CSV files
save_path = '../data/clean/cleaned _for_sql/'
df_ai_adoption = pd.read_csv(save_path + 'country_ai_adoption.csv', sep=';', encoding='utf-8', on_bad_lines='skip')
df_employment_growth = pd.read_csv(save_path + 'occupation_growth.csv', sep=';', encoding='utf-8', on_bad_lines='skip')
df_ai_sent = pd.read_csv(save_path + 'ai_job_sentiments.csv', sep=';', encoding='utf-8', on_bad_lines='skip')
df_wage = pd.read_csv(save_path + 'occupation_wages.csv', sep=';', encoding='utf-8', on_bad_lines='skip')


In [None]:
df_employment_growth.head()
#df_ai_adoption.head()

In [None]:
# 10 occupation titles with the lowest projected growth
# (minimum value per title, since a title can appear more than once)

df_employment_growth_min = df_employment_growth.groupby('occupation_title')['employment_percent_change_2023-2033'].min()

top_10_min_growth= df_employment_growth_min.sort_values().head(10)
print(top_10_min_growth)


In [None]:
# Get the top 10 occupations with the most negative employment growth
top_10_decline = df_employment_growth.sort_values(by='employment_percent_change_2023-2033').head(10)
print("Top 10 Occupations with Employment Decline:")
print(top_10_decline[['occupation_title', 'employment_percent_change_2023-2033']])


In [None]:
df_ai_adoption.head()

In [None]:
# Sectors ordered by AI adoption value, lowest first
df_ai_adoption.sort_values('obs_value')[['sector', 'obs_value']]


In [None]:
# Group by sector and find the minimum value in each sector
min_values_by_sector = df_ai_adoption.groupby('sector')['obs_value'].min()

# Sort the minimum values and keep the 10 sectors with the smallest minimums
top_10_min_sectors = min_values_by_sector.sort_values().head(10)
print(top_10_min_sectors)



In [None]:
# Group by sector and find the max value in each sector
max_values_by_sector = df_ai_adoption.groupby('sector')['obs_value'].max()

# Sort ascending and keep the 10 sectors with the smallest maximums,
# i.e. the sectors whose peak AI adoption is lowest
top_10_max_sectors = max_values_by_sector.sort_values().head(10)

print(top_10_max_sectors)

### Are AI adoption levels in Retail, Food Services, or Manufacturing low while their related occupations are shrinking?


### Key observations

Administrative and support services dominate
Most of the ten fastest-declining roles fall under the Administrative and Support Service Activities sector, which shows an AI adoption value of 0.0 in this dataset.

Decline in information-related roles
Switchboard operators fall under Information and Communication, another sector with an AI adoption value of 0.0.

No AI, but jobs are still disappearing
These roles are projected to shrink by roughly 28% to 40% between 2023 and 2033 despite no recorded AI adoption in their sectors. This suggests that other forces are at play:

* Digitization: many of these tasks are being automated with conventional software rather than AI, such as Excel, CRMs, VoIP, and automated phone trees.
* Business model shifts: outsourcing, centralization, and self-service platforms reduce the need for these roles.
* Obsolescence: roles such as typists and data entry clerks are becoming outdated.

AI is not the villain here
For these occupations, AI adoption does not appear to be driving job loss. Simpler technology, process improvement, and organizational change are more likely explanations.

The steepest declines sit in sectors with low AI adoption, not high AI adoption.

They point instead to a gradual phasing out of certain clerical jobs through baseline automation and digital tools.

If AI adoption rises in these sectors later, the remaining roles may transform (for example, from data entry to data analysis support), but the core clerical work is already fading, with or without AI.
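
One way to check whether declines track AI adoption is a rank correlation on a sector-level table. The sketch below uses invented numbers, and `adoption` and `growth` are hypothetical column names standing in for sector-level `obs_value` and the projected employment change.

```python
import pandas as pd

# Hypothetical sector-level table; the numbers are invented for illustration only.
sector_level = pd.DataFrame({
    "sector": ["S1", "S2", "S3", "S4", "S5"],
    "adoption": [0.0, 5.0, 12.0, 20.0, 35.0],
    "growth": [-30.0, 2.0, 4.0, 1.0, 6.0],
})

# Spearman correlation computed as Pearson on ranks (avoids a SciPy dependency).
rho = sector_level["adoption"].rank().corr(sector_level["growth"].rank())
print(f"Spearman rho = {rho:.2f}")
```

A clearly negative value would support the headline hypothesis that higher adoption goes with lower growth; a value near zero or above would not. With only a handful of sectors, either result is weak evidence.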

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Data preparation
occupations = [
    "Word processors and typists",
    "Word processors and typists",
    "Word processors and typists",
    "Data entry & info processing workers",
    "Switchboard operators",
    "Data entry & info processing workers",
    "Data entry keyers",
    "Data entry keyers",
    "Switchboard operators",
    "Data entry & info processing workers"
]

employment_change = [-40.1, -39.1, -36.7, -35.6, -30.3, -29.7, -29.6, -29.5, -28.4, -28.0]

sectors = [
    "Administrative Services",
    "Administrative Services",
    "Administrative Services",
    "Administrative Services",
    "Information & Communication",
    "Administrative Services",
    "Administrative Services",
    "Administrative Services",
    "Information & Communication",
    "Administrative Services"
]

# Create DataFrame
df_plot = pd.DataFrame({
    "Occupation": occupations,
    "Employment Change (%)": employment_change,
    "Sector": sectors
})

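# Plotting
plt.figure(figsize=(10, 6), facecolor='none')  # Transparent background
ax = plt.gca()

# Use numeric positions: repeated occupation names would otherwise share one
# categorical slot and be drawn on top of each other
y_pos = list(range(len(df_plot)))
bars = ax.barh(y_pos, df_plot["Employment Change (%)"], color="gray")
ax.set_yticks(y_pos)
ax.set_yticklabels(df_plot["Occupation"])

# Set bar colors based on sector
for bar, sector_name in zip(bars, df_plot["Sector"]):
    if sector_name == "Information & Communication":
        bar.set_color("#9ec9ff")  # Light blue
    else:
        bar.set_color("#add8e6")  # Paler light blue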
# White labels and title
ax.set_xlabel("Employment Percent Change (2023–2033)", color='white')
ax.set_title("Top 10 Declining Occupations and Their Sectors (Low AI Adoption)", color='white')
ax.tick_params(colors='white')

# Style
ax.invert_yaxis()  # Highest decline at top
plt.grid(axis='x', linestyle='--', alpha=0.3, color='white')
plt.tight_layout()

# Save 
plt.savefig("../slides/visuals/aiadoptions.png", transparent=True, format="png")

plt.show()


In [None]:
# Show unique sector names to inspect if any match creative fields
unique_sectors = df_ai_adoption["sector"].dropna().unique()
unique_sectors[:30]  # Show first 30 for inspection


In [None]:
# Ensure all occupation titles are strings
df_employment_growth["occupation_title"] = df_employment_growth["occupation_title"].fillna("").astype(str)

# Keywords used to flag creative roles (matched as substrings of the title)
creative_keywords = ["writer", "artist", "designer", "editor", "musician", "composer", "architect", "illustrator",
                     "photographer", "copywriter", "creative", "actor", "entertainer", "performer", "producer", "animator"]

# Flag creative roles
df_employment_growth["is_creative"] = df_employment_growth["occupation_title"].str.lower().apply(
    lambda title: any(keyword in title for keyword in creative_keywords)
)

# Calculate average (unweighted) employment growth for creative vs non-creative
growth_col = "employment_percent_change_2023-2033"
creative_growth = df_employment_growth.loc[df_employment_growth["is_creative"], growth_col].mean()
non_creative_growth = df_employment_growth.loc[~df_employment_growth["is_creative"], growth_col].mean()

creative_growth, non_creative_growth
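
The creative flag is built from substring matches, so a keyword such as "actor" also matches "contractors". The sketch below compares the substring test with a word-boundary regex on a few made-up titles; it is an alternative worth testing, not the method used for the figures in this notebook.

```python
import re
import pandas as pd

# Subset of the notebook's keywords; the titles are made-up examples.
creative_keywords = ["actor", "editor", "artist"]
titles = pd.Series(["Actors", "General contractors", "Film editors", "Creditors", "Fine artists"])

# Substring test, as in the cell above: "actor" also matches "contractors".
substring_flag = titles.str.lower().apply(lambda t: any(k in t for k in creative_keywords))

# Word-boundary test: the keyword must be a whole word, with an optional plural "s".
pattern = r"\b(?:" + "|".join(map(re.escape, creative_keywords)) + r")s?\b"
boundary_flag = titles.str.contains(pattern, case=False, regex=True)

print(pd.DataFrame({"title": titles, "substring": substring_flag, "word_boundary": boundary_flag}))
```

Switching to word boundaries would change which titles count as creative, so the averages would need to be recomputed.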


### Creative occupations are projected to grow by +3.76%

Non-creative occupations are projected to grow by +3.67%.
The two averages are nearly identical (a gap of 0.09 percentage points). This is consistent with the hypothesis that creative jobs are not declining despite rapid tech and AI evolution, but the gap is too small to say that creative jobs are growing faster.

Paired with the earlier finding that creative sectors have lower AI adoption, this points toward creative job resilience in the AI era. Both figures are unweighted means over keyword-flagged titles, so they should be read as indicative.
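
Because the averages above treat every occupation equally, a small occupation counts as much as a large one. A sketch of an employment-weighted comparison follows; `employment_2023` is a hypothetical column name for base-year employment, and all rows are invented.

```python
import pandas as pd

# Invented rows; `employment_2023` is a hypothetical base-year employment column.
df = pd.DataFrame({
    "group": ["creative", "creative", "other", "other"],
    "employment_2023": [100.0, 900.0, 500.0, 500.0],
    "growth": [10.0, 0.0, 2.0, 4.0],
})

def weighted_mean(group):
    """Employment-weighted mean of projected growth for one group."""
    return (group["growth"] * group["employment_2023"]).sum() / group["employment_2023"].sum()

# Plain mean per group versus the employment-weighted mean per group.
unweighted = df.groupby("group")["growth"].mean().to_dict()
weighted = {name: weighted_mean(g) for name, g in df.groupby("group")}

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

In this toy data the two measures diverge for the creative group because one small occupation carries the high growth figure.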



In [None]:
# Data for plotting
categories = ['Creative Occupations', 'Non-Creative Occupations']
values = [creative_growth, non_creative_growth]

# Set the dark purple background
plt.figure(figsize=(8, 5), facecolor='#2c003e')  # Dark purple background

# Bar plot with high-contrast colors
bars = plt.bar(categories, values, color=['#b8a9f4', '#d3d3d3'])  # Light purple and light gray

# Axes styling
ax = plt.gca()
ax.set_facecolor('#2c003e')  # Match background
ax.tick_params(colors='white')  # White ticks
plt.ylabel('Average Employment Percent Change (2023–2033)', color='white')
plt.title('Employment Growth: Creative vs. Non-Creative Occupations', color='white')
plt.ylim(min(values) - 1, max(values) + 1)

# Add value labels on bars
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval + 0.1, f'{yval:.2f}%', 
             ha='center', va='bottom', color='white')

# Grid lines and layout
plt.grid(axis='y', linestyle='--', alpha=0.3, color='white')
plt.tight_layout()

# Save with a transparent background: transparent=True drops the dark purple
# facecolor from the PNG, so the image takes on the slide's own background
plt.savefig("../slides/visuals/creative.png", transparent=True, format="png")
plt.show()


In [None]:
df_wage['occ_code'].unique()

In [None]:
# Step 1: Clean and align code columns
df_employment_growth["occupation_code"] = df_employment_growth["occupation_code"].str.replace('="', '').str.replace('"', '').str.strip()
df_wage["occ_code"] = df_wage["occ_code"].str.strip()

# Step 2: Merge the datasets on occupation code
merged_df = pd.merge(df_employment_growth, df_wage, left_on="occupation_code", right_on="occ_code", how="inner")

# Show a preview of the merged data
merged_df[["occupation_title", "occupation_code", "a_mean", "employment_percent_change_2023-2033"]].tail()
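
An inner merge silently drops occupation codes that appear in only one table. A quick way to see how many rows are lost is an outer merge with `indicator=True`, sketched here on toy frames that reuse the notebook's key names; the codes themselves are invented.

```python
import pandas as pd

# Toy frames with the notebook's key names; codes are invented.
growth = pd.DataFrame({"occupation_code": ['="11-1011"', "43-9021", "99-0000"]})
wages = pd.DataFrame({"occ_code": ["11-1011", "43-9021 ", "15-1252"]})

# Same cleanup idea as the cell above: drop the ="..." wrapper and stray whitespace.
growth["occupation_code"] = growth["occupation_code"].str.replace(r'="|"', "", regex=True).str.strip()
wages["occ_code"] = wages["occ_code"].astype(str).str.strip()

# An outer merge with indicator=True labels each row by where its key was found.
check = growth.merge(wages, left_on="occupation_code", right_on="occ_code",
                     how="outer", indicator=True)
counts = check["_merge"].value_counts()
print(counts)
```

Rows marked `left_only` or `right_only` are the ones an inner merge would discard.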


In [None]:
# Preview the first 20 rows of merged_df (built in the previous cell)
merged_df[["occupation_title", "occupation_code", "a_mean", "employment_percent_change_2023-2033"]].head(20)

In [None]:

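# Load spaCy English model
nlp = spacy.load("en_core_web_sm")

# The loops below read from sentiment_df. Assumption: this is the sentiment CSV
# loaded earlier as df_ai_sent, with comment_body and sentiment columns.
sentiment_df = df_ai_sent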

# Prepare a dictionary to collect adverbs by sentiment
adverbs_by_sentiment = defaultdict(list)

# Analyze each comment body, grouped by sentiment
for _, row in sentiment_df.iterrows():
    text = str(row["comment_body"]).replace("_", " ")  # fix underscore formatting
    sentiment = row["sentiment"]
    doc = nlp(text)
    for token in doc:
        if token.pos_ == "ADV" and not token.is_stop and token.is_alpha:
            adverbs_by_sentiment[sentiment].append(token.lemma_.lower())

# Create wordclouds for each sentiment
plt.figure(figsize=(15, 5))
for i, (sentiment, adverbs) in enumerate(adverbs_by_sentiment.items(), 1):
    wordcloud = WordCloud(width=600, height=400, background_color="white").generate(" ".join(adverbs))
    plt.subplot(1, 3, i)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"{sentiment.capitalize()} Adverbs")

plt.tight_layout()
plt.savefig("../slides/visuals/adverbs.png", format="png")
plt.show()


In [None]:
# Prepare dictionaries for adverbs and adjectives by sentiment
adverbs_by_sentiment = defaultdict(list)
adjectives_by_sentiment = defaultdict(list)

# Analyze comments
for _, row in sentiment_df.iterrows():
    text = str(row["comment_body"]).replace("_", " ")
    sentiment = row["sentiment"]
    doc = nlp(text)
    for token in doc:
        if token.is_alpha and not token.is_stop:
            if token.pos_ == "ADV":
                adverbs_by_sentiment[sentiment].append(token.lemma_.lower())
            elif token.pos_ == "ADJ":
                adjectives_by_sentiment[sentiment].append(token.lemma_.lower())

# Create wordclouds
plt.figure(figsize=(15, 10))

for i, (sentiment, adverbs) in enumerate(adverbs_by_sentiment.items(), 1):
    wordcloud = WordCloud(width=600, height=400, background_color="white").generate(" ".join(adverbs))
    plt.subplot(2, 3, i)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"{sentiment.capitalize()} Adverbs")

for i, (sentiment, adjectives) in enumerate(adjectives_by_sentiment.items(), 1):
    wordcloud = WordCloud(width=400, height=400, background_color="white").generate(" ".join(adjectives))
    plt.subplot(2, 3, i + 3)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"{sentiment.capitalize()} Adjectives")

plt.tight_layout()
plt.savefig("../slides/visuals/adj2.png", format="png")
plt.show()

Adverbs such as definitely, easily, and clearly suggest that commenters are confident in their own skills or believe in AI as a tool.

Adjectives such as useful, exciting, and efficient suggest that commenters see AI as enhancing their work rather than replacing it.
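
Word clouds show relative size but not counts. A small sketch of how the per-sentiment word lists could be turned into frequency tables, so that the words named above can be checked against numbers. The `(sentiment, lemma)` pairs are hypothetical stand-ins for the spaCy output.

```python
from collections import Counter, defaultdict

# Hypothetical (sentiment, lemma) pairs standing in for the spaCy output above.
tagged = [
    ("positive", "useful"), ("positive", "useful"), ("positive", "exciting"),
    ("negative", "scary"), ("negative", "useful"), ("negative", "scary"),
]

# Collect lemmas per sentiment, mirroring adjectives_by_sentiment in the notebook.
by_sentiment = defaultdict(list)
for sentiment, lemma in tagged:
    by_sentiment[sentiment].append(lemma)

# The most frequent lemmas per sentiment give numbers to set beside the word clouds.
top_words = {s: Counter(words).most_common(2) for s, words in by_sentiment.items()}
print(top_words)
```

The same `Counter(...).most_common(n)` call can be applied directly to `adverbs_by_sentiment[...]` and `adjectives_by_sentiment[...]`.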

In [None]:

df_employment_growth["occupation_code"] = df_employment_growth["occupation_code"].str.replace('="|"', '', regex=True)

df_wage["occ_code"] = df_wage["occ_code"].astype(str).str.replace('="|"', '', regex=True)


# Merge occupation_growth with wages on the shared occupation code
merged_growth_wages = pd.merge(
    df_employment_growth,
    df_wage,
    left_on="occupation_code",
    right_on="occ_code",
    how="inner"
)

# Simplify: keep only title, code, projected employment growth, and mean annual wage
simplified_analysis = merged_growth_wages[[
    "occupation_title",
    "occupation_code",
    "employment_percent_change_2023-2033",
    "a_mean"
]].rename(columns={
    "employment_percent_change_2023-2033": "employment_growth_percent",
    "a_mean": "average_annual_wage"
})

print(simplified_analysis.head(10))  # view first 10 rows



In [None]:
# Keyword -> sector. Keys must be lowercase because titles are lowercased
# before matching, and the first matching keyword (in this order) wins.
sector_map = {
    "software": "Tech",
    "computer": "Tech",
    "engineer": "Tech",
    "developer": "Tech",
    "scientist": "Tech",
    "nurse": "Healthcare",
    "therapist": "Healthcare",
    "teacher": "Education",
    "professor": "Education",
    "manager": "Management",
    "mechanic": "Manufacturing",
    "machinist": "Manufacturing",
    "construction": "Construction",
    "architecture": "Architecture",
    "sales": "Retail",
    "cashier": "Retail",
    "accountant": "Finance",
    "clerk": "Administration",
    "lawyer": "Legal",
    "judge": "Legal",
    "food": "Food",
    "database": "Tech",
}


def map_sector_from_title(title, mapping):
    if not isinstance(title, str):
        return "Other"
    title_lower = title.lower()
    for keyword, sector in mapping.items():
        if keyword in title_lower:
            return sector
    return "Other"

# Apply to DataFrame
simplified_analysis["mapped_sector"] = simplified_analysis["occupation_title"].apply(
    lambda x: map_sector_from_title(x, sector_map)
)

# Aggregate: mean wage and employment growth per sector
sector_summary = simplified_analysis.groupby("mapped_sector")[["average_annual_wage", "employment_growth_percent"]].mean().reset_index()

# Minimal bar plot: Employment growth per sector
plt.figure(figsize=(8, 5))
plt.bar(sector_summary["mapped_sector"], sector_summary["employment_growth_percent"], color='gray')
plt.ylabel("Avg Employment Growth (%)")
plt.title("Average Employment Growth by Sector")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
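
The keyword mapping above returns the first keyword that matches, so titles containing several keywords are assigned according to dictionary order. A sketch of a small audit helper that lists titles matching more than one sector; `all_sector_matches` is a hypothetical helper, and the titles are made-up examples.

```python
# Small subset of the notebook's keyword map; the titles are made-up examples.
sector_map = {"computer": "Tech", "teacher": "Education", "sales": "Retail", "manager": "Management"}

def all_sector_matches(title, mapping):
    """Return every distinct sector whose keyword appears in the title, in map order."""
    title_lower = title.lower()
    matches = []
    for keyword, sector in mapping.items():
        if keyword in title_lower and sector not in matches:
            matches.append(sector)
    return matches

titles = ["Computer science teachers", "Sales managers", "Cashiers"]
ambiguous = {t: all_sector_matches(t, sector_map) for t in titles}

# Keep only titles that match more than one sector.
ambiguous = {t: m for t, m in ambiguous.items() if len(m) > 1}
print(ambiguous)
```

Titles flagged this way are candidates for manual assignment or for reordering the keywords so that the intended sector comes first.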

