### Introduction
In this notebook, we explore and visualize the Gapminder dataset, focusing on **Gross Domestic Product (GDP) per capita**. The data is sourced from [Kaggle](https://www.kaggle.com/datasets/albertovidalrod/gapminder-dataset?select=gapminder_data_graphs.csv).

Using **Python** and **Altair**, we will create interactive visualizations to uncover trends in this dataset, which pairs economic indicators with life expectancy.

In [22]:
import pandas as pd
import altair as alt

#### Read the dataset

In this section, we read the dataset and inspect its summary statistics, columns, and first rows.

In [24]:
df = pd.read_csv("./gapminder_data_graphs.csv")

print(df.describe())
print(df.columns)
print(df.head())


              year     life_exp    hdi_index  co2_consump            gdp  \
count  3675.000000  3675.000000  3563.000000  3671.000000    3633.000000   
mean   2008.000000    69.849306     0.674864     4.712731   11966.053675   
std       6.056125     8.886563     0.164834     6.567435   17105.787953   
min    1998.000000    32.500000     0.255000     0.015900     238.000000   
25%    2003.000000    63.900000     0.537500     0.560500    1470.000000   
50%    2008.000000    71.700000     0.699000     2.250000    4280.000000   
75%    2013.000000    76.400000     0.805000     6.615000   13600.000000   
max    2018.000000    84.800000     0.956000    67.100000  105000.000000   

          services  
count  3675.000000  
mean     51.248705  
std      18.312501  
min       5.590000  
25%      37.600000  
50%      52.900000  
75%      65.700000  
max      88.500000  
Index(['country', 'continent', 'year', 'life_exp', 'hdi_index', 'co2_consump',
       'gdp', 'services'],
      dtype='object')
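
The `count` row in the summary above is lower for `hdi_index`, `co2_consump`, and `gdp` than for `year`, which suggests that those columns contain missing values. A minimal sketch of how one might count them per column, using a small hypothetical frame (`sample`, with made-up values) in place of the real CSV:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; values are made up for illustration.
sample = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2017, 2018, 2017, 2018],
    "gdp": [1000.0, None, 2500.0, 2600.0],
    "hdi_index": [0.50, 0.51, None, None],
})

# isna() flags missing cells; sum() then counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

On the real data, the same call on `df` would show how many rows each column is missing.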

The output of `df.head()` shows that the `gdp` column actually holds GDP per capita, so we rename the column in the DataFrame to make that explicit.

In [26]:
df.rename(columns={"gdp": "gdp_per_capita"}, inplace=True)


### Data Visualization
First, let's visualize which countries have the highest GDP per capita in the most recent year.

In [None]:
most_recent_year = df['year'].max()
print(most_recent_year)

chart = alt.Chart(df).mark_bar().transform_filter(alt.datum.year == most_recent_year).encode(
    x=alt.X('country', sort='-y'),
    y='gdp_per_capita')

chart.display()


2018




Now, let's take the top 20 countries by GDP per capita in 2018 and analyze them further.

In [45]:
top_20_countries_2018 = df[df['year'] == 2018].sort_values(by='gdp_per_capita', ascending=False).head(20)['country']

filtered_df = df[df['country'].isin(top_20_countries_2018)]
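
The selection above can be tried out on a toy example. The sketch below is not the notebook's own code: it uses a hypothetical three-country frame (`sample`) and `nlargest` as an alternative to `sort_values(...).head(...)`, and it derives the year from the data instead of hardcoding it.

```python
import pandas as pd

# Hypothetical data; values are made up for illustration.
sample = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year": [2017, 2017, 2017, 2018, 2018, 2018],
    "gdp_per_capita": [900.0, 3000.0, 2000.0, 1000.0, 2500.0, 2600.0],
})

# Rank countries using only the most recent year.
latest_year = sample["year"].max()
latest = sample[sample["year"] == latest_year]
top_countries = latest.nlargest(2, "gdp_per_capita")["country"]

# Keep every year of data for the selected countries.
top_df = sample[sample["country"].isin(top_countries)]
print(top_df)
```

Note that the ranking uses a single year while the filtered frame keeps all years, so a country that ranked highly earlier but not in the latest year is excluded.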

In [47]:
chart = alt.Chart(filtered_df).mark_bar().encode(
    x='year:O',
    y='gdp_per_capita:Q',
    color='country:N',
    tooltip=['country', 'year', 'gdp_per_capita']
).properties(
    title='GDP Per Capita Over Time'
)

chart.display()



Let's visualize year-over-year growth in GDP per capita by country.

In [56]:
filtered_df = filtered_df.sort_values(by=['country', 'year'])

filtered_df['gdp_growth'] = filtered_df.groupby('country')['gdp_per_capita'].pct_change() * 100

filtered_df = filtered_df.dropna(subset=['gdp_growth'])

selection = alt.selection_single(fields=['country'], bind='legend', name="Country", empty='all')

# Create the chart with interactive selection and opacity for non-selected countries
chart = alt.Chart(filtered_df).mark_line().encode(
    x='year:O',  # Ordinal scale for the year
    y='gdp_growth:Q',  # Quantitative scale for GDP growth rate
    color='country:N',  # Different colors for each country
    opacity=alt.condition(
        selection,
        alt.value(1),
        alt.value(0.2)
    ),
    tooltip=['country', 'year', 'gdp_growth'] 
).add_selection(
    selection
).properties(
    title='GDP per Capita Growth Over Time by Country (Highlight Selected Country)'
)

chart
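
To make the growth computation in the cell above concrete, here is a minimal sketch on a hypothetical two-country frame (`sample`, with made-up values). It applies the same `groupby(...).pct_change()` pattern, so each country's growth is computed only against its own previous year.

```python
import pandas as pd

# Hypothetical data; values are made up for illustration.
sample = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2016, 2017, 2018, 2016, 2017, 2018],
    "gdp_per_capita": [100.0, 110.0, 99.0, 200.0, 200.0, 250.0],
})

# Sort so that pct_change compares consecutive years within each country.
sample = sample.sort_values(by=["country", "year"])

# Percentage change relative to the previous row of the same country.
sample["gdp_growth"] = sample.groupby("country")["gdp_per_capita"].pct_change() * 100
print(sample)
```

The first year of each country has no previous value, so its growth is `NaN`. This is why the notebook drops rows with a missing `gdp_growth` before plotting.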




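Finally, let's visualize the relationship between GDP per capita and life expectancy. Selecting a country in the legend of the first chart filters the life expectancy chart to that country.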

In [None]:
selection_gdp = alt.selection_single(fields=['country'], bind='legend', name="Country (GDP)", empty='all')

# Create the GDP per capita chart (Chart 1)
gdp_chart = alt.Chart(filtered_df).mark_bar().encode(
    x='gdp_per_capita:Q',
    y='country:N', 
    color='country:N',
    tooltip=['country', 'gdp_per_capita'],
    opacity=alt.condition(
        selection_gdp, alt.value(1), alt.value(0.2)  
    )
).add_selection(
    selection_gdp
).properties(
    title='GDP per Capita by Country'
)

# Create the life expectancy chart (Chart 2)
life_exp_chart = alt.Chart(filtered_df).mark_line().encode(
    x='year:O', 
    y=alt.Y('life_exp:Q', scale=alt.Scale(domain=[65, 85])),  # Set y-axis scale from 65 to 85 years
    color='country:N', 
    tooltip=['country', 'year', 'life_exp'],
).transform_filter(
    selection_gdp
).properties(
    title='Life Expectancy Over Time for Selected Country'
)

# Combine the charts side by side
final_chart = alt.hconcat(
    gdp_chart,
    life_exp_chart 
)

final_chart
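
The linked charts show the two variables side by side but do not quantify how strongly they move together. As a complement, a Pearson correlation coefficient could be computed with pandas. The sketch below uses a hypothetical frame (`sample`, with made-up values), not the notebook's data.

```python
import pandas as pd

# Hypothetical data; values are made up for illustration.
sample = pd.DataFrame({
    "gdp_per_capita": [1000.0, 2000.0, 3000.0, 4000.0],
    "life_exp": [60.0, 66.0, 69.0, 75.0],
})

# Series.corr defaults to the Pearson correlation coefficient.
pearson_r = sample["gdp_per_capita"].corr(sample["life_exp"])
print(round(pearson_r, 3))
```

On the real data, the same call could be applied to `filtered_df`, or per country via `groupby`. A linear coefficient may understate the relationship if life expectancy levels off at high incomes.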



### Conclusion

In this project, we analyzed the Gapminder data to explore GDP per capita, its year-over-year growth, and its relationship to life expectancy over time. We used Altair to build interactive charts that let users explore and compare the data for different countries.