## Age Group Analysis
#### Author: liufeic

#### Conclusion: 
GDP growth is recorded at the country level, so the values for youth (15-24) and adults (25+) are identical by construction and cannot reveal age-based differences in growth. Unemployment, by contrast, shows a stark disparity: the average youth rate (17.5%) is nearly three times the adult rate (6.3%), and youth rates are higher in every year and on every continent examined. The correlations between GDP growth and unemployment are weakly negative for both groups (about -0.10 for youth and -0.11 for adults), which suggests that GDP growth alone explains little of the variation in unemployment and that the youth-adult gap needs to be addressed in its own right.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from seaborn import objects as so
import plotly.express as px
import plotly.graph_objects as go
pd.set_option('display.max_columns', None)

gdp_unemp = pd.read_csv('../Data/gdp_unemp_final.csv')
gdp_unemp.head()

# NOTE: GDP growth is identical across sex and age groups within each country, because GDP is
#       reported at the country level and is not broken down by those demographics.
# The GDP comparisons below are kept only to visualize and confirm this uniformity.


# FUTURE WORK: Future studies could examine GDP contribution measures (e.g., labor force participation,
#              productivity, or sector-specific contributions) by age group to provide more meaningful insights.

Unnamed: 0,Country,Continent,Sex,Age_Group,Age_Categories,GDP Growth Rate % [2014],GDP Growth Rate % [2015],GDP Growth Rate % [2016],GDP Growth Rate % [2017],GDP Growth Rate % [2018],GDP Growth Rate % [2019],GDP Growth Rate % [2020],GDP Growth Rate % [2021],GDP Growth Rate % [2022],GDP Growth Rate % [2023],Unemployment Rate [2014],Unemployment Rate [2015],Unemployment Rate [2016],Unemployment Rate [2017],Unemployment Rate [2018],Unemployment Rate [2019],Unemployment Rate [2020],Unemployment Rate [2021],Unemployment Rate [2022],Unemployment Rate [2023]
0,AFGHANISTAN,AS,Female,15-24,Youth,2.7,1.0,2.2,2.6,1.2,3.9,-2.4,-14.5,-6.2,2.7,13.34,15.974,18.57,21.137,20.649,20.154,21.228,21.64,30.561,32.2
1,AFGHANISTAN,AS,Female,25+,Adults,2.7,1.0,2.2,2.6,1.2,3.9,-2.4,-14.5,-6.2,2.7,8.576,9.014,9.463,9.92,11.223,12.587,14.079,14.415,23.818,26.192
2,AFGHANISTAN,AS,Male,15-24,Youth,2.7,1.0,2.2,2.6,1.2,3.9,-2.4,-14.5,-6.2,2.7,9.206,11.502,13.772,16.027,15.199,14.361,14.452,15.099,16.655,18.512
3,AFGHANISTAN,AS,Male,25+,Adults,2.7,1.0,2.2,2.6,1.2,3.9,-2.4,-14.5,-6.2,2.7,6.463,6.879,7.301,7.728,7.833,7.961,8.732,9.199,11.357,12.327
4,ALBANIA,EU,Female,15-24,Youth,1.8,2.2,3.3,3.8,4.0,2.1,-3.3,8.9,4.9,3.5,32.59,40.274,34.102,27.429,25.765,26.005,29.766,28.687,27.004,25.758
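
The note above states that GDP growth is identical across sex and age groups within each country. A minimal sketch of how that could be confirmed directly, using only the five rows previewed above and a single year's column (the `sample` frame and the `distinct_gdp` name are illustrative and not part of the notebook):

```python
import pandas as pd

# Five rows copied from the preview above (2014 GDP growth only).
sample = pd.DataFrame({
    'Country': ['AFGHANISTAN'] * 4 + ['ALBANIA'],
    'Sex': ['Female', 'Female', 'Male', 'Male', 'Female'],
    'Age_Group': ['15-24', '25+', '15-24', '25+', '15-24'],
    'GDP Growth Rate % [2014]': [2.7, 2.7, 2.7, 2.7, 1.8],
})

# Count distinct GDP values per country; 1 means no demographic variation.
distinct_gdp = sample.groupby('Country')['GDP Growth Rate % [2014]'].nunique()
print(distinct_gdp)

# True when GDP is a country-level figure repeated on every demographic row.
print((distinct_gdp == 1).all())
```

On the full data, the same check could presumably be run against `gdp_unemp` for each GDP column.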


### Step 1: Reshaping the Data
##### This structure allows for easier filtering, grouping, and analysis across years, regions, and demographic categories.

In [2]:
years = range(2014, 2024)
gdp_col = [f'GDP Growth Rate % [{year}]' for year in years]
unemp_col = [f'Unemployment Rate [{year}]' for year in years]
col = ['Country', 'Continent', 'Sex', 'Age_Group', 'Age_Categories']

# Step 1: Reshape GDP Growth Rate
reshaped_gdp = pd.melt(
    gdp_unemp[col + gdp_col], 
    id_vars = col, 
    value_vars = gdp_col, 
    var_name = 'Year', 
    value_name = 'GDP Growth Rate'
)
reshaped_gdp['Year'] = reshaped_gdp['Year'].str.extract(r'\[(\d+)\]').astype(int)  # Extract numeric year

# Step 2: Reshape Unemployment Rate
reshaped_unemp = pd.melt(
    gdp_unemp[col + unemp_col], 
    id_vars = col,
    value_vars = unemp_col, 
    var_name = 'Year', 
    value_name = 'Unemployment Rate' 
)
reshaped_unemp['Year'] = reshaped_unemp['Year'].str.extract(r'\[(\d+)\]').astype(int)  # Extract numeric year

# Step 3: Merge the two reshaped datasets
reshaped_df = pd.merge(
    reshaped_gdp, reshaped_unemp, 
    on=['Country', 'Continent', 'Year', 'Sex', 'Age_Group', 'Age_Categories']  # Shared columns
)

reshaped_df.head()

Unnamed: 0,Country,Continent,Sex,Age_Group,Age_Categories,Year,GDP Growth Rate,Unemployment Rate
0,AFGHANISTAN,AS,Female,15-24,Youth,2014,2.7,13.34
1,AFGHANISTAN,AS,Female,25+,Adults,2014,2.7,8.576
2,AFGHANISTAN,AS,Male,15-24,Youth,2014,2.7,9.206
3,AFGHANISTAN,AS,Male,25+,Adults,2014,2.7,6.463
4,ALBANIA,EU,Female,15-24,Youth,2014,1.8,32.59
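
Because the merge joins on every identifier column, each GDP row should match exactly one unemployment row. A hedged sketch of guarding that assumption with the `validate` argument of `pd.merge`, on a two-row toy frame built from the values above (`gdp_long`, `unemp_long`, and `keys` are illustrative names):

```python
import pandas as pd

keys = ['Country', 'Sex', 'Age_Group', 'Year']

# Two long-format rows taken from the output above.
gdp_long = pd.DataFrame({
    'Country': ['AFGHANISTAN', 'AFGHANISTAN'],
    'Sex': ['Female', 'Female'],
    'Age_Group': ['15-24', '25+'],
    'Year': [2014, 2014],
    'GDP Growth Rate': [2.7, 2.7],
})
unemp_long = gdp_long[keys].assign(**{'Unemployment Rate': [13.34, 8.576]})

# validate='one_to_one' raises a MergeError if the keys are
# duplicated on either side of the join.
merged = pd.merge(gdp_long, unemp_long, on=keys, validate='one_to_one')

# The merge should neither drop nor duplicate rows.
assert len(merged) == len(gdp_long)
print(merged)
```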


### Step 2: Validate and Explore Data

In [3]:
# Check Unique Values in Age_Group Column
reshaped_df['Age_Group'].unique()

array(['15-24', '25+'], dtype=object)

In [4]:
# Verify no Missing Values in Age_Group Column
reshaped_df['Age_Group'].isnull().sum()

0
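
The two checks above cover only `Age_Group`. A small sketch of how the validation might be widened, assuming the same column names: count missing values in every column, and confirm that each `Age_Group` maps to a single `Age_Categories` label (which is why grouping by both columns later yields the same groups as grouping by one). The toy frame `df` is illustrative.

```python
import pandas as pd

# Toy rows mirroring the reshaped structure shown earlier.
df = pd.DataFrame({
    'Age_Group': ['15-24', '25+', '15-24', '25+'],
    'Age_Categories': ['Youth', 'Adults', 'Youth', 'Adults'],
    'Unemployment Rate': [13.34, 8.576, 9.206, 6.463],
})

# Missing values per column, not just Age_Group.
missing = df.isnull().sum()
print(missing)

# Each Age_Group should correspond to exactly one Age_Categories label.
labels_per_group = df.groupby('Age_Group')['Age_Categories'].nunique()
print(labels_per_group)
```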

### Step 3: Summary Statistics

In [5]:
# Summary Statistics by Age_Group and Age_Categories

# Summary statistics for GDP Growth Rate
summary_gdp_age = reshaped_df.groupby(['Age_Group', 'Age_Categories'])['GDP Growth Rate'].agg(['mean', 'median', 'std']).reset_index()

# Summary statistics for Unemployment Rate
summary_unemp_age = reshaped_df.groupby(['Age_Group', 'Age_Categories'])['Unemployment Rate'].agg(['mean', 'median', 'std']).reset_index()

# Display results
print('Summary Statistics for GDP Growth Rate by Age_Group and Age_Categories')
display(summary_gdp_age)

print()

print('Summary Statistics for Unemployment Rate by Age_Group and Age_Categories')
display(summary_unemp_age)

# Insights :
# - Both Youth (15-24) and Adults (25+) have the same average GDP Growth Rate of 2.68%.
# - Youth (15-24) have a much higher average unemployment rate (17.52%) compared to Adults (6.32%).

Summary Statistics for GDP Growth Rate by Age_Group and Age_Categories


Unnamed: 0,Age_Group,Age_Categories,mean,median,std
0,15-24,Youth,2.678101,3.0,5.780046
1,25+,Adults,2.678101,3.0,5.780046



Summary Statistics for Unemployment Rate by Age_Group and Age_Categories


Unnamed: 0,Age_Group,Age_Categories,mean,median,std
0,15-24,Youth,17.525436,13.204,13.466985
1,25+,Adults,6.321915,4.4685,5.466375
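
To make "much higher" concrete, the two means in the table above can be compared as both a difference and a ratio. A minimal sketch using the printed values (the variable names are illustrative):

```python
import pandas as pd

# Mean unemployment rates copied from the summary table above.
mean_unemp = pd.Series({'Youth': 17.525436, 'Adults': 6.321915})

gap_pp = mean_unemp['Youth'] - mean_unemp['Adults']  # gap in percentage points
ratio = mean_unemp['Youth'] / mean_unemp['Adults']   # relative size

print(f'Gap: {gap_pp:.2f} percentage points')
print(f'Ratio: {ratio:.2f}x')
```

With these inputs the gap is about 11.2 percentage points and the youth rate is roughly 2.8 times the adult rate.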


### Step 4: Analyze Trends in GDP Growth Rate and Unemployment Rate by Age_Group and Age_Categories Over Time -- Line Plot
##### This step allows us to examine trends across years and identify whether there are any consistent patterns or differences by age.

In [6]:
# Group data by Year, Age_Group, and Age_Categories, calculating the mean
trends_age = reshaped_df.groupby(['Year', 'Age_Group', 'Age_Categories'])[['GDP Growth Rate', 'Unemployment Rate']].mean().reset_index()

# Display the trends
print('Trends in GDP Growth Rate and Unemployment Rate by Age_Group and Age_Categories Over Time')
display(trends_age)

# Reshape the data for plotting
melted_trends_age = trends_age.melt(
    id_vars = ['Year', 'Age_Group', 'Age_Categories'],
    value_vars = ['GDP Growth Rate', 'Unemployment Rate'],
    var_name = 'Rate Type',
    value_name = 'Rate'
)

# Create a line plot
fig = px.line(
    melted_trends_age,
    x = 'Year',
    y = 'Rate',
    color = 'Age_Group',
    line_dash = 'Rate Type',  # Differentiates GDP and Unemployment
    title = 'Trends in GDP Growth Rate and Unemployment Rate by Age_Group and Age_Categories Over Time',
    labels = {'Rate': 'Average Rate (%)', 'Year': 'Year', 'Rate Type': 'Rate Type'},
    markers = True
)

overlap_data = melted_trends_age[(melted_trends_age['Age_Group'] == '15-24') & (melted_trends_age['Rate Type'] == 'GDP Growth Rate')]

fig.add_trace(
    go.Scatter(
        x = overlap_data['Year'],
        y = overlap_data['Rate'],
        mode = 'lines',
        line = dict(color = 'blue', width=1),
        name = '15-24, GDP Growth Rate (Overlap)',
    )
)

fig.show()

# 1. GDP Growth Rate:
# - The Youth (15-24) and Adults (25+) lines overlap entirely because GDP growth is a country-level figure repeated for both groups.
# - Average growth declined significantly in 2020 (likely due to COVID-19) and recovered in 2021.
# - The overlap is therefore a property of the data, not evidence that growth affects Youth and Adults equally.

# 2. Unemployment Rate:
# - Youth have higher unemployment rates than Adults in every year.
# - Youth unemployment shows more variability, which may indicate that younger workers are more sensitive to economic changes.
# - Both age groups show a gradual decline in unemployment rates from 2014 to 2023.

Trends in GDP Growth Rate and Unemployment Rate by Age_Group and Age_Categories Over Time


Unnamed: 0,Year,Age_Group,Age_Categories,GDP Growth Rate,Unemployment Rate
0,2014,15-24,Youth,3.260335,18.287626
1,2014,25+,Adults,3.260335,6.622304
2,2015,15-24,Youth,2.721229,18.07174
3,2015,25+,Adults,2.721229,6.561098
4,2016,15-24,Youth,2.674302,17.824385
5,2016,25+,Adults,2.674302,6.458897
6,2017,15-24,Youth,3.469274,17.442522
7,2017,25+,Adults,3.469274,6.275737
8,2018,15-24,Youth,3.151955,16.885497
9,2018,25+,Adults,3.151955,6.030774
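
The youth-adult gap can also be tracked year by year. A sketch under the assumption that only the 2014-2018 rows displayed above are available (the `trends` and `wide` names are illustrative):

```python
import pandas as pd

# Yearly mean unemployment rates copied from the 2014-2018 rows shown above.
trends = pd.DataFrame({
    'Year': [2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018],
    'Age_Categories': ['Youth', 'Adults'] * 5,
    'Unemployment Rate': [18.287626, 6.622304, 18.07174, 6.561098,
                          17.824385, 6.458897, 17.442522, 6.275737,
                          16.885497, 6.030774],
})

# One row per year, one column per age category.
wide = trends.pivot(index='Year', columns='Age_Categories',
                    values='Unemployment Rate')

# Youth-minus-adult gap in percentage points for each year.
wide['Gap'] = wide['Youth'] - wide['Adults']
print(wide.round(2))
```

For these five years the gap narrows from about 11.7 to about 10.9 percentage points; the later years are not shown in the output above, so they are left out here.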


### Step 5: Compare Averages in GDP Growth Rate and Unemployment Rate by Age_Group and Age_Categories -- Bar Plot
##### This step allows us to examine the patterns or differences by age directly.

In [7]:
# Calculate averages for each Age_Group and Age_Categories
averages = reshaped_df.groupby(['Age_Group', 'Age_Categories'])[['GDP Growth Rate', 'Unemployment Rate']].mean().reset_index()

# Display the averages
print('Average GDP Growth Rate and Unemployment by Age Group and Age Categories')
display(averages)

# Melt the data for plotting
melted_avg = averages.melt(id_vars = ['Age_Group', 'Age_Categories'], 
                           var_name = 'Rate Type', 
                           value_name = 'Average Rate')

# Create bar plot
fig = px.bar(melted_avg, 
             x = 'Rate Type', 
             y = 'Average Rate', 
             color = 'Age_Group', 
             barmode = 'group', 
             title = 'Average GDP Growth Rate and Unemployment Rate by Age Group and Age Categories',
             labels = {'Rate Type': 'Rate Type', 'Average Rate': 'Average (%)'})

fig.show()

# Insights:
# - Youth (15-24) have a higher average unemployment rate (17.53%) than Adults (25+) (6.32%).
# - GDP Growth Rates are identical for Youth and Adults, as expected for country-level GDP data.


Average GDP Growth Rate and Unemployment by Age Group and Age Categories


Unnamed: 0,Age_Group,Age_Categories,GDP Growth Rate,Unemployment Rate
0,15-24,Youth,2.678101,17.525436
1,25+,Adults,2.678101,6.321915


### Step 6: Country-Level Analysis -- Map
##### This step allows us to examine trends across countries.

In [8]:
# Aggregate GDP Growth Rate and Unemployment Rate by Country, Age Group, and Age Categories
country_age_analysis = reshaped_df.groupby(['Country', 'Age_Group', 'Age_Categories'])[['GDP Growth Rate', 'Unemployment Rate']].mean().reset_index()
print('Aggregated Data by Country, Age Group, and Age Categories:')
display(country_age_analysis.head())

# GDP Growth Rate map for Age Group and Age Categories
fig_gdp_age = px.choropleth(
    country_age_analysis,
    locations = 'Country',
    locationmode = 'country names',
    color = 'GDP Growth Rate',
    facet_col = 'Age_Group',  # Differentiates by Age Group
    title = 'World Map: GDP Growth Rate by Age Group and Age Categories',
    color_continuous_scale = 'Blues',
    labels = {'GDP Growth Rate': 'GDP Growth Rate (%)'}
)
fig_gdp_age.update_layout(title_x=0.5)
fig_gdp_age.show()

# Unemployment Rate map for Age Group and Age Categories
fig_unemp_age = px.choropleth(
    country_age_analysis,
    locations = 'Country',
    locationmode = 'country names',
    color = 'Unemployment Rate',
    facet_col = 'Age_Group',  # Differentiates by Age Group
    title = 'World Map: Unemployment Rate by Age Group and Age Categories',
    color_continuous_scale = 'Reds',
    labels = {'Unemployment Rate': 'Unemployment Rate (%)'}
)
fig_unemp_age.update_layout(title_x=0.5)
fig_unemp_age.show()

# Insights from World Maps (Age Group and Categories):
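# 1. GDP Growth Rate:
#    - The Youth (15-24) and Adults (25+) maps are identical because GDP growth is not broken down by age.
#    - GDP growth differences are regional rather than age-based.
#    - The continent averages in Step 7 place Africa (3.00%) alongside Asia (2.96%) and above Europe (2.53%),
#      so the maps should not be read as showing Africa lagging behind Asia and Europe.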

# 2. Unemployment Rate:
#    - Youth (15-24) have much higher unemployment rates than Adults (25+) globally.
#    - North Africa, the Middle East, and Sub-Saharan Africa show the highest youth unemployment.
#    - Adults have lower unemployment rates than Youth across all regions.

# 3. Key Takeaway:
#    - Youth unemployment is a global issue, especially in developing regions.

Aggregated Data by Country, Age Group, and Age Categories:


Unnamed: 0,Country,Age_Group,Age_Categories,GDP Growth Rate,Unemployment Rate
0,AFGHANISTAN,15-24,Youth,-0.68,18.0119
1,AFGHANISTAN,25+,Adults,-0.68,11.25335
2,ALBANIA,15-24,Youth,3.12,31.4308
3,ALBANIA,25+,Adults,3.12,11.41535
4,ALGERIA,15-24,Youth,2.15,36.8413


### Step 7: Continent-Level Analysis --- Bar Plots 
##### This step allows us to examine trends across continents.

In [9]:
# Group data by Continent, Age_Group, and Age_Categories
continent_age = reshaped_df.groupby(['Continent', 'Age_Group', 'Age_Categories'])[['GDP Growth Rate', 'Unemployment Rate']].mean().reset_index()

# Display aggregated data
print("Aggregated Data by Continent, Age_Group, and Age_Categories:")
display(continent_age)

# Create bar plot for comparison
fig = px.bar(
    continent_age,
    x = 'Continent',
    y = ['GDP Growth Rate', 'Unemployment Rate'],
    color = 'Age_Group',
    barmode = 'group',
    title = 'Average GDP Growth Rate and Unemployment Rate by Continent and Age Groups',
    labels = {'value': 'Average (%)', 'variable': 'Rate Type'},
    facet_col = 'variable',
    height = 600,
    width = 1000
)

fig.show()


# GDP Growth Rate:
# - GDP Growth Rates are identical for Youth and Adults within each continent (GDP is not age-specific).
# - Across the continents displayed, average growth ranges from about 2.0% (NAM) to about 3.1% (AS/EU).

# Unemployment Rate:
# - Youth unemployment is significantly higher than adults across all continents.
# - Oceania reports the lowest unemployment rates for both age groups.

Aggregated Data by Continent, Age_Group, and Age_Categories:


Unnamed: 0,Continent,Age_Group,Age_Categories,GDP Growth Rate,Unemployment Rate
0,AF,15-24,Youth,3.004808,19.075791
1,AF,25+,Adults,3.004808,7.878346
2,AS,15-24,Youth,2.961957,14.031218
3,AS,25+,Adults,2.961957,4.581543
4,AS/EU,15-24,Youth,3.085,20.676225
5,AS/EU,25+,Adults,3.085,7.44395
6,EU,15-24,Youth,2.533947,19.164667
7,EU,25+,Adults,2.533947,7.030217
8,NAM,15-24,Youth,1.99,19.98645
9,NAM,25+,Adults,1.99,6.353113
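
Absolute rates hide how large the youth disadvantage is relative to adults on each continent. A minimal sketch computing the youth-to-adult ratio from the rows displayed above (the `continent` frame and `Ratio` column are illustrative):

```python
import pandas as pd

# Continent-level mean unemployment rates copied from the rows shown above.
continent = pd.DataFrame({
    'Continent': ['AF', 'AS', 'AS/EU', 'EU', 'NAM'],
    'Youth': [19.075791, 14.031218, 20.676225, 19.164667, 19.98645],
    'Adults': [7.878346, 4.581543, 7.44395, 7.030217, 6.353113],
}).set_index('Continent')

# Relative disadvantage: how many times higher youth unemployment is.
continent['Ratio'] = continent['Youth'] / continent['Adults']
print(continent.sort_values('Ratio', ascending=False).round(2))
```

Among the continents displayed, the ratio is lowest for AF (about 2.4) and highest for NAM (about 3.1), so youth unemployment is more than double the adult rate in every case.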


### Step 8: Correlation Analysis

In [10]:
# Examine the relationship between GDP Growth Rate and Unemployment Rate for each Age Group
# Compute correlations for each Age Group separately
correlation_youth = reshaped_df[reshaped_df['Age_Group'] == '15-24'][['GDP Growth Rate', 'Unemployment Rate']].corr()
correlation_adults = reshaped_df[reshaped_df['Age_Group'] == '25+'][['GDP Growth Rate', 'Unemployment Rate']].corr()

# Print correlation results
print('Correlation between GDP Growth Rate and Unemployment Rate for Youth (15-24):')
print(correlation_youth)

print('Correlation between GDP Growth Rate and Unemployment Rate for Adults (25+):')
print(correlation_adults)

# Scatter plots with regression lines for Youth and Adults
fig_youth = px.scatter(
    reshaped_df[reshaped_df['Age_Group'] == '15-24'],
    x='Unemployment Rate',
    y='GDP Growth Rate',
    trendline='ols',
    title='Correlation Between GDP Growth Rate and Unemployment Rate (Youth: 15-24)',
    labels={'Unemployment Rate': 'Unemployment Rate (%)', 'GDP Growth Rate': 'GDP Growth Rate (%)'}
)

fig_adults = px.scatter(
    reshaped_df[reshaped_df['Age_Group'] == '25+'],
    x='Unemployment Rate',
    y='GDP Growth Rate',
    trendline='ols',
    title='Correlation Between GDP Growth Rate and Unemployment Rate (Adults: 25+)',
    labels={'Unemployment Rate': 'Unemployment Rate (%)', 'GDP Growth Rate': 'GDP Growth Rate (%)'}
)

# Display scatter plots
fig_youth.show()
fig_adults.show()

# Insights Based on Correlation Analysis for Age Groups:

# 1. Youth (15-24):
# - Weak negative correlation (-0.10) between GDP Growth Rate and Unemployment Rate.
# - Indicates at most a weak linear association between GDP growth and youth unemployment; correlation alone does not establish the direction of any effect.
# - Scatter plot shows a broad dispersion, reflecting weak predictability.

# 2. Adults (25+):
# - Weak negative correlation (-0.11) between GDP Growth Rate and Unemployment Rate.
# - Slightly stronger than youth but still minimal.
# - Scatter plot also shows a widely spread relationship, confirming low correlation.

# Observations:
# - Both age groups show negligible relationships between GDP growth and unemployment.

Correlation between GDP Growth Rate and Unemployment Rate for Youth (15-24):
                   GDP Growth Rate  Unemployment Rate
GDP Growth Rate           1.000000          -0.101467
Unemployment Rate        -0.101467           1.000000
Correlation between GDP Growth Rate and Unemployment Rate for Adults (25+):
                   GDP Growth Rate  Unemployment Rate
GDP Growth Rate           1.000000          -0.106604
Unemployment Rate        -0.106604           1.000000
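
One way to read these coefficients is to square them: for a simple linear fit, r squared is the share of variance in one variable accounted for by the other. A short sketch using the values printed above:

```python
# Pearson correlations copied from the output above.
r_youth = -0.101467
r_adults = -0.106604

# Squaring r gives the share of variance captured by a simple linear fit.
for label, r in [('Youth', r_youth), ('Adults', r_adults)]:
    print(f'{label}: r = {r:.3f}, r^2 = {r**2:.4f} ({r**2:.1%} of variance)')
```

Both values come out near 1%, which is consistent with the widely dispersed scatter plots.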
