## Sex and Age Group Analysis
### Author: liufeic

#### Conclusion: 
This analysis finds that GDP growth rates are identical across sexes at both global and regional levels. This is expected, because the GDP figures in the dataset are country-level and not disaggregated by sex, so they cannot reveal gender-specific growth disparities. Unemployment rates, however, do differ by sex: females generally face higher unemployment than males (13.1% versus 10.7% on average), particularly in regions such as Africa and South America. The weak correlation between GDP growth and unemployment for both sexes (about -0.09 for females and -0.08 for males) suggests that factors other than economic growth play a larger role in employment trends.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from seaborn import objects as so
import plotly.express as px
import plotly.graph_objects as go
pd.set_option('display.max_columns', None)

gdp_unemp = pd.read_csv('../Data/gdp_unemp_final.csv')

# Display a preview of the dataset
gdp_unemp.head()

# NOTICE: The GDP data is identical across sex and age groups within each country,
#         which means GDP is not differentiated by those demographics.
#         Despite this uniformity, the analysis was performed to visualize and confirm the observation.


#FUTURE WORK: Future studies could focus on GDP contribution rates (e.g., labor force participation, productivity, 
#             or sector-specific contributions) by gender to provide more meaningful insights.

Unnamed: 0,Country,Continent,Sex,Age_Group,Age_Categories,GDP Growth Rate % [2014],GDP Growth Rate % [2015],GDP Growth Rate % [2016],GDP Growth Rate % [2017],GDP Growth Rate % [2018],GDP Growth Rate % [2019],GDP Growth Rate % [2020],GDP Growth Rate % [2021],GDP Growth Rate % [2022],GDP Growth Rate % [2023],Unemployment Rate [2014],Unemployment Rate [2015],Unemployment Rate [2016],Unemployment Rate [2017],Unemployment Rate [2018],Unemployment Rate [2019],Unemployment Rate [2020],Unemployment Rate [2021],Unemployment Rate [2022],Unemployment Rate [2023]
0,AFGHANISTAN,AS,Female,15-24,Youth,2.7,1.0,2.2,2.6,1.2,3.9,-2.4,-14.5,-6.2,2.7,13.34,15.974,18.57,21.137,20.649,20.154,21.228,21.64,30.561,32.2
1,AFGHANISTAN,AS,Female,25+,Adults,2.7,1.0,2.2,2.6,1.2,3.9,-2.4,-14.5,-6.2,2.7,8.576,9.014,9.463,9.92,11.223,12.587,14.079,14.415,23.818,26.192
2,AFGHANISTAN,AS,Male,15-24,Youth,2.7,1.0,2.2,2.6,1.2,3.9,-2.4,-14.5,-6.2,2.7,9.206,11.502,13.772,16.027,15.199,14.361,14.452,15.099,16.655,18.512
3,AFGHANISTAN,AS,Male,25+,Adults,2.7,1.0,2.2,2.6,1.2,3.9,-2.4,-14.5,-6.2,2.7,6.463,6.879,7.301,7.728,7.833,7.961,8.732,9.199,11.357,12.327
4,ALBANIA,EU,Female,15-24,Youth,1.8,2.2,3.3,3.8,4.0,2.1,-3.3,8.9,4.9,3.5,32.59,40.274,34.102,27.429,25.765,26.005,29.766,28.687,27.004,25.758
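
The note about GDP being uniform within each country can be checked programmatically rather than by eye. The sketch below is one possible check, not part of the original notebook: it rebuilds a few rows from the preview above (two GDP years only) and the helper name `gdp_varies_within_country` is hypothetical.

```python
import pandas as pd

# Toy rows copied from the preview above (two GDP years only, for brevity).
sample = pd.DataFrame({
    'Country': ['AFGHANISTAN'] * 4 + ['ALBANIA'],
    'Sex': ['Female', 'Female', 'Male', 'Male', 'Female'],
    'Age_Group': ['15-24', '25+', '15-24', '25+', '15-24'],
    'GDP Growth Rate % [2014]': [2.7, 2.7, 2.7, 2.7, 1.8],
    'GDP Growth Rate % [2015]': [1.0, 1.0, 1.0, 1.0, 2.2],
})

def gdp_varies_within_country(df):
    """Return the countries whose GDP columns take more than one value."""
    gdp_cols = [c for c in df.columns if c.startswith('GDP Growth Rate')]
    distinct = df.groupby('Country')[gdp_cols].nunique()
    return distinct[(distinct > 1).any(axis=1)].index.tolist()

varying = gdp_varies_within_country(sample)
print(varying)  # an empty list means GDP is uniform within every country
```

Running the same helper on the full `gdp_unemp` frame would confirm (or refute) the uniformity for every country at once.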


### Step 1: Reshaping the Data
##### This structure allows for easier filtering, grouping, and analysis across years, regions, and demographic categories.

In [2]:
years = range(2014, 2024)
gdp_col = [f'GDP Growth Rate % [{year}]' for year in years]
unemp_col = [f'Unemployment Rate [{year}]' for year in years]
col = ['Country', 'Continent', 'Sex', 'Age_Group', 'Age_Categories']

# Step 1: Reshape GDP Growth Rate
reshaped_gdp = pd.melt(
    gdp_unemp[col + gdp_col], 
    id_vars = col, 
    value_vars = gdp_col, 
    var_name = 'Year', 
    value_name = 'GDP Growth Rate'
)
reshaped_gdp['Year'] = reshaped_gdp['Year'].str.extract(r'\[(\d+)\]').astype(int)  # Extract numeric year

# Step 2: Reshape Unemployment Rate
reshaped_unemp = pd.melt(
    gdp_unemp[col + unemp_col], 
    id_vars = col,
    value_vars = unemp_col, 
    var_name = 'Year', 
    value_name = 'Unemployment Rate' 
)
reshaped_unemp['Year'] = reshaped_unemp['Year'].str.extract(r'\[(\d+)\]').astype(int)  # Extract numeric year

# Step 3: Merge the two reshaped datasets
reshaped_df = pd.merge(
    reshaped_gdp, reshaped_unemp, 
    on=['Country', 'Continent', 'Year', 'Sex', 'Age_Group', 'Age_Categories']  # Shared columns
)

reshaped_df.head()


Unnamed: 0,Country,Continent,Sex,Age_Group,Age_Categories,Year,GDP Growth Rate,Unemployment Rate
0,AFGHANISTAN,AS,Female,15-24,Youth,2014,2.7,13.34
1,AFGHANISTAN,AS,Female,25+,Adults,2014,2.7,8.576
2,AFGHANISTAN,AS,Male,15-24,Youth,2014,2.7,9.206
3,AFGHANISTAN,AS,Male,25+,Adults,2014,2.7,6.463
4,ALBANIA,EU,Female,15-24,Youth,2014,1.8,32.59
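
Because the merge above uses the default inner join, duplicated or unmatched keys would change the row count without any warning. A minimal sketch of a safer merge, assuming each (country, sex, age group, year) combination should appear exactly once on each side; the toy frames reuse values from the output above and the names `gdp_long` and `unemp_long` are illustrative:

```python
import pandas as pd

keys = ['Country', 'Sex', 'Age_Group', 'Year']

# Toy long-format frames using values from the output above.
gdp_long = pd.DataFrame({
    'Country': ['AFGHANISTAN', 'AFGHANISTAN'],
    'Sex': ['Female', 'Male'],
    'Age_Group': ['15-24', '15-24'],
    'Year': [2014, 2014],
    'GDP Growth Rate': [2.7, 2.7],
})
unemp_long = pd.DataFrame({
    'Country': ['AFGHANISTAN', 'AFGHANISTAN'],
    'Sex': ['Female', 'Male'],
    'Age_Group': ['15-24', '15-24'],
    'Year': [2014, 2014],
    'Unemployment Rate': [13.34, 9.206],
})

# validate='one_to_one' raises pandas.errors.MergeError if a key repeats on either side.
merged = pd.merge(gdp_long, unemp_long, on=keys, how='inner', validate='one_to_one')

# With an inner join, unmatched keys would silently shrink the result, so compare lengths.
assert len(merged) == len(gdp_long) == len(unemp_long)
print(merged)
```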


### Step 2: Validate and Explore Data

In [3]:
# Check Unique Values in Sex Column
reshaped_df['Sex'].unique()

array(['Female', 'Male'], dtype=object)

In [4]:
# Verify no Missing Values in Sex Column
reshaped_df['Sex'].isnull().sum()

0
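
The checks above cover only the `Sex` column. A broader validation could also look for missing values in every column and for duplicated key combinations. The sketch below uses hypothetical toy data and a hypothetical helper name, `validation_report`, purely to illustrate the idea:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing unemployment value.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'Sex': ['Female', 'Male', 'Female', 'Male'],
    'Year': [2014, 2014, 2014, 2014],
    'Unemployment Rate': [13.3, 9.2, np.nan, 7.1],
})

def validation_report(df, key_cols):
    """Count missing values per column and duplicated key combinations."""
    return {
        'missing_per_column': df.isnull().sum().to_dict(),
        'duplicate_keys': int(df.duplicated(subset=key_cols).sum()),
    }

report = validation_report(toy, ['Country', 'Sex', 'Year'])
print(report)
```

On the real data the key columns would presumably be `['Country', 'Sex', 'Age_Group', 'Year']`.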

### Step 3: Summary Statistics

In [5]:
# Compute summary statistics for GDP Growth Rate and Unemployment Rate by Sex

# Summary for GDP Growth Rate by Sex
summary_gdp_sex = reshaped_df.groupby('Sex')['GDP Growth Rate'].agg(['mean', 'median', 'std']).reset_index()

# Summary for Unemployment Rate by Sex
summary_unemp_sex = reshaped_df.groupby('Sex')['Unemployment Rate'].agg(['mean', 'median', 'std']).reset_index()

# Display both summaries
print('Summary Statistics for GDP Growth Rate by Sex')
display(summary_gdp_sex)

print()

print('Summary Statistics for Unemployment Rate by Sex')
display(summary_unemp_sex)

# Insights:
# - Unemployment rates differ by sex, with females experiencing higher unemployment on average.
# - GDP growth shows no variation by sex.

Summary Statistics for GDP Growth Rate by Sex


Unnamed: 0,Sex,mean,median,std
0,Female,2.678101,3.0,5.780046
1,Male,2.678101,3.0,5.780046



Summary Statistics for Unemployment Rate by Sex


Unnamed: 0,Sex,mean,median,std
0,Female,13.13092,8.589,12.818532
1,Male,10.716431,7.371,10.333801
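
In the unemployment summary the mean sits well above the median for both sexes, which points to a right-skewed distribution (a few country-year observations with very high unemployment). The sketch below only re-uses the figures printed above to make that gap explicit; the column names `mean_minus_median` and `cv` are illustrative.

```python
import pandas as pd

# Unemployment summary statistics as reported in the output above.
summary = pd.DataFrame({
    'Sex': ['Female', 'Male'],
    'mean': [13.130920, 10.716431],
    'median': [8.589, 7.371],
    'std': [12.818532, 10.333801],
}).set_index('Sex')

# A positive mean-median gap is a simple indicator of right skew.
summary['mean_minus_median'] = summary['mean'] - summary['median']
# Coefficient of variation: spread relative to the mean.
summary['cv'] = summary['std'] / summary['mean']

print(summary.round(3))
```

Given this skew, the median may be the more representative figure when comparing typical unemployment by sex.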


### Step 4: Analyze Trends in GDP Growth Rate and Unemployment Rate by Sex Over Time -- Line Plot
##### This step allows us to examine trends across years and identify whether there are any consistent patterns or differences by sex.

In [6]:
# Group data by Year and Sex, calculating mean for GDP Growth Rate and Unemployment Rate
trends_sex = reshaped_df.groupby(['Year', 'Sex'])[['GDP Growth Rate', 'Unemployment Rate']].mean().reset_index()

# Display the trends
print('Trends in GDP Growth Rate and Unemployment Rate by Sex Over Time')
display(trends_sex)

Trends in GDP Growth Rate and Unemployment Rate by Sex Over Time


Unnamed: 0,Year,Sex,GDP Growth Rate,Unemployment Rate
0,2014,Female,3.260335,13.575559
1,2014,Male,3.260335,11.334372
2,2015,Female,2.721229,13.441302
3,2015,Male,2.721229,11.191536
4,2016,Female,2.674302,13.29252
5,2016,Male,2.674302,10.990763
6,2017,Female,3.469274,13.090265
7,2017,Male,3.469274,10.627994
8,2018,Female,3.151955,12.64286
9,2018,Male,3.151955,10.273411
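
To see whether the female-male difference is stable over time, the yearly means can be pivoted so that each sex becomes a column. This is a small sketch using only the 2014 to 2018 rows displayed above; the column name `Gap (pp)` is an illustrative choice.

```python
import pandas as pd

# Yearly mean unemployment rates copied from the table above (2014-2018 only).
trends = pd.DataFrame({
    'Year': [2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018],
    'Sex': ['Female', 'Male'] * 5,
    'Unemployment Rate': [13.575559, 11.334372, 13.441302, 11.191536,
                          13.292520, 10.990763, 13.090265, 10.627994,
                          12.642860, 10.273411],
})

# One row per year, one column per sex.
wide = trends.pivot(index='Year', columns='Sex', values='Unemployment Rate')

# Female minus male, in percentage points.
wide['Gap (pp)'] = wide['Female'] - wide['Male']

print(wide.round(3))
```

In these five years the gap stays positive, at roughly 2.2 to 2.5 percentage points.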


In [7]:
# Line Plot for Trends in GDP Growth Rate and Unemployment Rate by Sex Group Over Time
melted_trends = trends_sex.melt(
    id_vars = ['Year', 'Sex'], 
    value_vars = ['GDP Growth Rate', 'Unemployment Rate'],
    var_name = 'Rate Type', 
    value_name = 'Rate'
)

fig = px.line(
    melted_trends,
    x = 'Year',
    y = 'Rate',
    color = 'Sex',
    line_dash = 'Rate Type',  # Differentiates GDP and Unemployment
    title = 'Trends in GDP Growth Rate and Unemployment Rate by Sex Over Time',
    labels = {'Rate': 'Average Rate (%)', 'Year': 'Year', 'Rate Type': 'Rate Type'},
    markers = True
)

overlap_data = melted_trends[(melted_trends['Sex'] == 'Female') & (melted_trends['Rate Type'] == 'GDP Growth Rate')]

fig.add_trace(
    go.Scatter(
        x = overlap_data['Year'],
        y = overlap_data['Rate'],
        mode = 'lines',
        line = dict(color = 'blue', width=1),
        name = 'Female GDP Growth Rate (Overlap)',
    )
)

fig.show()

# 1. GDP Growth Rate:
#    - Male and female GDP growth rates overlap entirely, showing no variation by sex.
#    - Significant drop in 2020 (COVID-19), followed by recovery in 2021.

# 2. Unemployment Rate:
#    - Female unemployment consistently higher than male unemployment across years.
#    - Both rates show a general decline from 2014 to 2023.

# 3. Highlight:
#    - The extra thin blue trace redraws the female GDP growth line on top of the existing lines, making the complete overlap with the male line visible and directing attention to the unemployment differences.

### Step 5: Compare Averages in GDP Growth Rate and Unemployment Rate by Sex -- Bar Plot
##### This step allows us to examine the patterns or differences by sex directly.

In [8]:
# Calculate averages for each sex
averages = reshaped_df.groupby('Sex')[['GDP Growth Rate', 'Unemployment Rate']].mean().reset_index()

# Display the averages
print('Average GDP Growth Rate and Unemployment')
display(averages)
      
# Melt the data for plotting
melted_avg = averages.melt(id_vars = 'Sex', var_name = 'Rate Type', value_name = 'Average Rate')

# Create bar plot
fig = px.bar(melted_avg, x = 'Rate Type', y = 'Average Rate', color = 'Sex',
             barmode = 'group',
             title = 'Average GDP Growth Rate and Unemployment Rate by Sex',
             labels = {'Rate Type': 'Rate Type', 'Average Rate': 'Average (%)'})
fig.show()

# The average GDP growth rate is identical for males and females at 2.68%. This reinforces that GDP growth 
# in the dataset does not depend on sex, but higher unemployment for females highlights potential gender-based employment challenges.

Average GDP Growth Rate and Unemployment


Unnamed: 0,Sex,GDP Growth Rate,Unemployment Rate
0,Female,2.678101,13.13092
1,Male,2.678101,10.716431
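
The size of the unemployment difference can be stated both in absolute terms (percentage points) and in relative terms. The short calculation below uses only the two averages printed above; the variable names are illustrative.

```python
# Average unemployment rates by sex, as reported in the output above.
female_mean = 13.130920
male_mean = 10.716431

# Absolute gap in percentage points and relative gap as a ratio.
gap_pp = female_mean - male_mean
ratio = female_mean / male_mean

print(f'Gap: {gap_pp:.2f} percentage points')
print(f'Female rate is {100 * (ratio - 1):.1f}% higher than the male rate')
```

Reporting both figures avoids the common confusion between "percentage points" and "percent".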


### Step 6: Country-Level Analysis -- Map
##### This step allows us to examine trends across countries.

In [9]:
# Aggregate GDP Growth Rate and Unemployment Rate by Country and Sex
country_sex = reshaped_df.groupby(['Country', 'Sex'])[['GDP Growth Rate', 'Unemployment Rate']].mean().reset_index()
print('Aggregated Data by Country and Sex:')
display(country_sex.head())

Aggregated Data by Country and Sex:


Unnamed: 0,Country,Sex,GDP Growth Rate,Unemployment Rate
0,AFGHANISTAN,Female,-0.68,17.737
1,AFGHANISTAN,Male,-0.68,11.52825
2,ALBANIA,Female,3.12,20.4841
3,ALBANIA,Male,3.12,22.36205
4,ALGERIA,Female,2.15,31.728
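
The direction of the gap is not the same in every country: in the rows above, Albania's male rate exceeds its female rate. A sketch of a per-country gap table, using only the two countries for which both sexes are shown above (the `gap` column name is illustrative):

```python
import pandas as pd

# Country-level averages from the output above (complete female/male pairs only).
country_rows = pd.DataFrame({
    'Country': ['AFGHANISTAN', 'AFGHANISTAN', 'ALBANIA', 'ALBANIA'],
    'Sex': ['Female', 'Male', 'Female', 'Male'],
    'Unemployment Rate': [17.737, 11.52825, 20.4841, 22.36205],
})

gap = (
    country_rows
    .pivot(index='Country', columns='Sex', values='Unemployment Rate')
    .assign(gap=lambda d: d['Female'] - d['Male'])  # positive: females higher
    .sort_values('gap', ascending=False)
)

print(gap.round(3))
```

Applied to the full `country_sex` frame, the same table could be mapped directly to show where the disparity is largest or reversed.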


In [10]:
# GDP Growth Rate map for males and females
fig_gdp = px.choropleth(
    country_sex,
    locations = 'Country', 
    locationmode = 'country names', 
    color = 'GDP Growth Rate', 
    facet_col = 'Sex', 
    title = 'World Map: GDP Growth Rate by Sex',
    color_continuous_scale = 'Blues',
    labels={'GDP Growth Rate': 'GDP Growth Rate (%)'}
)

fig_gdp.update_layout(title_x = 0.5) 
fig_gdp.show()

# Unemployment Rate map for males and females
fig_unemp = px.choropleth(
    country_sex,
    locations = 'Country',
    locationmode = 'country names',
    color = 'Unemployment Rate',
    facet_col = 'Sex',
    title = 'World Map: Unemployment Rate by Sex',
    color_continuous_scale = 'Reds',
    labels = {'Unemployment Rate': 'Unemployment Rate (%)'}
)

fig_unemp.update_layout(title_x=0.5)
fig_unemp.show()

# Comments based on the visualized world maps:
# 1. GDP Growth Rate:
#    - The male and female GDP growth maps are identical, because the GDP figures are not disaggregated by sex.
#    - The maps suggest lower GDP growth in parts of Africa and South America than in Asia or Europe, although the continent
#      averages in Step 7 place Africa (3.00%) above Europe (2.53%), so this reading should be treated with caution.

# 2. Unemployment Rate:
#    - There is a notable disparity in unemployment rates between males and females in many regions.
#    - Female unemployment rates are consistently higher in regions like the Middle East, North Africa, and South Asia, indicating significant employment challenges for females.
#    - Male unemployment rates are generally lower but follow a similar regional distribution, with higher rates in economically struggling regions.

# Insights:
# - These maps highlight global unemployment disparities by sex, despite GDP growth being consistent across sexes.

### Step 7: Continent-Level Analysis -- Bar Plots
##### This step allows us to examine trends across continents.

In [11]:
# Group data by Continent and Sex, and calculate average GDP Growth Rate and Unemployment Rate
continent_sex = reshaped_df.groupby(['Continent', 'Sex'])[['GDP Growth Rate', 'Unemployment Rate']].mean().reset_index()

# Display the aggregated data
print('Aggregated Data by Continent and Sex:')
display(continent_sex)

# Create a bar plot to compare the averages by Continent and Sex
fig = px.bar(
    continent_sex,
    x = 'Continent',
    y = ['GDP Growth Rate', 'Unemployment Rate'],
    color = 'Sex',
    barmode = 'group',
    title = 'Average GDP Growth Rate and Unemployment Rate by Continent and Sex',
    labels = {'value': 'Average (%)', 'variable': 'Rate Type'},
    facet_col = 'variable', 
    height = 600,
    width = 1000
)

fig.show()

# Observations Based on the Bar Plot:

# GDP Growth Rate:
# - Across all continents, the GDP growth rates for females and males are identical,
#   confirming that the GDP figures carry no sex-specific information.

# Unemployment Rate:
# - Female unemployment rates are consistently higher than male unemployment rates across most continents,
#   highlighting a potential gender disparity in employment opportunities.
# - Africa (AF) and South America (SAM) show the highest unemployment rates for females,
#   while Oceania (OC) has the lowest for both sexes.

# Insights:
# - GDP growth figures are identical across sexes, while gender disparities
#   in unemployment persist globally and vary by region.


Aggregated Data by Continent and Sex:


Unnamed: 0,Continent,Sex,GDP Growth Rate,Unemployment Rate
0,AF,Female,3.004808,15.063198
1,AF,Male,3.004808,11.890939
2,AS,Female,2.961957,10.726218
3,AS,Male,2.961957,7.886543
4,AS/EU,Female,3.085,15.048863
5,AS/EU,Male,3.085,13.071312
6,EU,Female,2.533947,13.148464
7,EU,Male,2.533947,13.04642
8,NAM,Female,1.99,14.587871
9,NAM,Male,1.99,11.751692
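
Ranking continents by the female-male unemployment gap makes the regional variation easier to compare than reading the grouped bars. The sketch below covers only the five continent codes displayed above and uses `unstack` to put the sexes side by side; the `gap` column name is illustrative.

```python
import pandas as pd

# Continent-level unemployment averages from the rows displayed above.
continent_rows = pd.DataFrame({
    'Continent': ['AF', 'AF', 'AS', 'AS', 'AS/EU', 'AS/EU', 'EU', 'EU', 'NAM', 'NAM'],
    'Sex': ['Female', 'Male'] * 5,
    'Unemployment Rate': [15.063198, 11.890939, 10.726218, 7.886543,
                          15.048863, 13.071312, 13.148464, 13.046420,
                          14.587871, 11.751692],
})

# Move Sex from the row index to the columns, then take the difference.
by_sex = continent_rows.set_index(['Continent', 'Sex'])['Unemployment Rate'].unstack('Sex')
by_sex['gap'] = by_sex['Female'] - by_sex['Male']
ranked = by_sex.sort_values('gap', ascending=False)

print(ranked.round(3))
```

Among these rows the gap is largest for AF (about 3.2 percentage points) and nearly zero for EU (about 0.1).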


### Step 8: Correlation Analysis

In [12]:
# Examine the relationship between GDP Growth Rate and Unemployment Rate for each sex.
# Compute correlations for males and females separately
correlation_female = reshaped_df[reshaped_df['Sex'] == 'Female'][['GDP Growth Rate', 'Unemployment Rate']].corr()
correlation_male = reshaped_df[reshaped_df['Sex'] == 'Male'][['GDP Growth Rate', 'Unemployment Rate']].corr()

# Print correlation results
print('Correlation between GDP Growth Rate and Unemployment Rate for Females:')
print(correlation_female)

print('Correlation between GDP Growth Rate and Unemployment Rate for Males:')
print(correlation_male)

# Scatter plots with regression lines for males and females
fig_female = px.scatter(
    reshaped_df[reshaped_df['Sex'] == 'Female'],
    x = 'Unemployment Rate',
    y = 'GDP Growth Rate',
    trendline = 'ols',
    title = 'Correlation Between GDP Growth Rate and Unemployment Rate (Females)',
    labels = {'Unemployment Rate': 'Unemployment Rate (%)', 'GDP Growth Rate': 'GDP Growth Rate (%)'}  # keys must be column names
)

fig_male = px.scatter(
    reshaped_df[reshaped_df['Sex'] == 'Male'],
    x = 'Unemployment Rate',
    y = 'GDP Growth Rate',
    trendline = 'ols',
    title = 'Correlation Between GDP Growth Rate and Unemployment Rate (Males)',
    labels = {'Unemployment Rate': 'Unemployment Rate (%)', 'GDP Growth Rate': 'GDP Growth Rate (%)'}  # keys must be column names
)

# Display scatter plots
fig_female.show()
fig_male.show()

# Insights Based on the Results:
#    - There is a very weak negative relationship between unemployment rate 
#      and GDP growth rate for both males and females. The correlation values 
#      are close to zero (-0.09 for females, -0.08 for males).
#    - The R² values are very small (0.008 for females, 0.006 for males), meaning 
#      unemployment rate explains almost nothing about GDP growth rate.
#    - The results for males and females are almost the same, showing that sex 
#      doesn’t affect the relationship between unemployment and GDP growth.

Correlation between GDP Growth Rate and Unemployment Rate for Females:
                   GDP Growth Rate  Unemployment Rate
GDP Growth Rate           1.000000          -0.090514
Unemployment Rate        -0.090514           1.000000
Correlation between GDP Growth Rate and Unemployment Rate for Males:
                   GDP Growth Rate  Unemployment Rate
GDP Growth Rate           1.000000          -0.076346
Unemployment Rate        -0.076346           1.000000
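
For a simple linear regression with one predictor, R² equals the square of the Pearson correlation, which is how the R² values quoted in the insights relate to the matrices above. The sketch below squares the reported coefficients and then checks the identity on synthetic data; the synthetic numbers and the seed are assumptions made purely for illustration.

```python
import numpy as np

# Pearson coefficients reported in the output above.
r = {'Female': -0.090514, 'Male': -0.076346}
r_squared = {sex: value ** 2 for sex, value in r.items()}
print({sex: round(value, 3) for sex, value in r_squared.items()})

# Check the identity R² = r² on synthetic data (hypothetical values, fixed seed).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = -0.1 * x + rng.normal(size=200)

# Ordinary least squares fit of y on x with an intercept.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

r2_ols = 1 - residuals.var() / y.var()   # share of variance explained by the fit
r2_corr = np.corrcoef(x, y)[0, 1] ** 2   # squared Pearson correlation

print(round(r2_ols, 6), round(r2_corr, 6))
```

The two quantities agree up to floating-point error, so either can be reported.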
