# How does climate change feel around the globe?
# The final project from Spiced Academy
# Notebook for creating visualizations

Check data_exploration.ipynb for the description of data sets, cleaning, wrangling and analyses.

In this notebook, we generate visualizations from the data frames that we saved as CSV files in the data_exploration.ipynb notebook. We build interactive plots with plotly.express but save most of them as static images for the graduation presentation.

## Contents

[Importing libraries and packages](#import)

[Maps and animation of the population exposure in different years](#maps_exp)

[Linecharts for countries (time vs population exposure)](#lines_countries)

[Explore different durations for a selected country (Pakistan)](#lines_pak)

[Correlation between exposure and time](#corr_exp_time)

[Exposure change for the whole world](#exp_world)

[Temperature vs exposure worldwide](#temp_exp)

[The effects in rich and poor countries (GDP per capita)](#rich_poor)

## Importing libraries and packages <a id='import'></a>

In [1]:
import pandas as pd

import plotly.express as px   # interactive and geospatial images
import plotly.graph_objects as go   # to enhance visualizations
import plotly.io as pio   # saving interactive images as static

from tqdm import tqdm   # to see real-time progress of execution

## Maps and animation of the population exposure in different years <a id='maps_exp'></a>

We will create a world map that shows the population exposure as color brightness. We export maps for several selected years for the graduation presentation and also merge the maps for every year into an animation.

Let us start with loading the data for population exposures, summed over all durations.

In [205]:
df = pd.read_csv('../exported_dfs/exposures_summed.csv')

In many plots, we will use rolling averages of the exposure over a 5-year period to smooth out random fluctuations.

In [172]:
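window_size = 5   # 5-year period

# groupby().rolling() prepends the two group keys to the original row index;
# drop only those two levels so that the result aligns with df by its original index
df['rolling_avg'] = df.groupby(['ref_area', 'measure'])['exposure'].rolling(window_size, center=True).mean()\
.reset_index(level=[0, 1], drop=True)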
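
To see what the centered window does, here is a minimal sketch on made-up numbers (the toy frame and its values are assumptions for illustration, not project data). It uses `transform`, so each group's result keeps the original row index. The first two and last two rows of every group come out as NaN because a full 5-year window is not available there.

```python
import pandas as pd

# toy data: two hypothetical areas with six consecutive years each
toy = pd.DataFrame({
    'ref_area': ['AAA'] * 6 + ['BBB'] * 6,
    'time_period': list(range(2000, 2006)) * 2,
    'exposure': [10, 20, 30, 40, 50, 60, 5, 5, 5, 5, 5, 65],
})

# centered 5-year rolling mean, computed separately for every area
toy['rolling_avg'] = (
    toy.groupby('ref_area')['exposure']
       .transform(lambda s: s.rolling(5, center=True).mean())
)

print(toy)
```

Because the window never crosses a group boundary, the jump to 65 in the second area does not leak into the first one.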

Select the measure and time period.

In [173]:
mask = (df['measure'] == 'HD_TN_POP_IND') & (df['time_period'].between(1981,2019))   # boolean mask

df_plot = df[mask].copy()   # filtered copy to be plotted / animated (copying avoids SettingWithCopyWarning)

We want black and four shades of red on the map. Let us define them and assign them in a new column of the data frame.

In [174]:
# new empty column
df_plot['color']=''

# fill the column with codes assigned to different color shades
for index, row in df_plot.iterrows():
    if row['rolling_avg']<=20:
        df_plot.at[index, 'color']='1'
    elif row['rolling_avg']<=40:
        df_plot.at[index, 'color']='2'
    elif row['rolling_avg']<=60:
        df_plot.at[index, 'color']='3'
    elif row['rolling_avg']<=80:
        df_plot.at[index, 'color']='4'
    else:
        df_plot.at[index, 'color']='5'

# define colors for all codes
worldmap_colors = {
    '1': 'rgb(0, 0, 0)',   
    '2': 'rgb(63, 0, 0)',
    '3': 'rgb(127, 0, 0)',
    '4': 'rgb(191, 0, 0)',
    '5': 'rgb(255, 0, 0)'
}
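
The same binning can be expressed without a row-by-row loop. The sketch below uses `pd.cut` on a made-up series (the values are assumptions for illustration). Note one difference from the loop above: a missing rolling average falls into the `else` branch there and gets code '5', whereas `pd.cut` leaves it missing.

```python
import pandas as pd

# hypothetical rolling averages in percent
values = pd.Series([0.0, 20.0, 20.1, 55.0, 80.0, 99.9])

# right-inclusive bins: (-inf, 20], (20, 40], (40, 60], (60, 80], (80, inf)
bins = [float('-inf'), 20, 40, 60, 80, float('inf')]
labels = ['1', '2', '3', '4', '5']

codes = pd.cut(values, bins=bins, labels=labels, right=True).astype(str)
print(list(codes))
```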






Make maps for selected years.

In [175]:
# List of years for the maps
years = [1981, 1993, 2006, 2019]

# styling constants
title_font_size = 24
annotation_font_size = 80
legend_font_size = 60
caption_bgcolor = '#FFFAE5'
caption_opacity = 1.0

# loop for making maps for all selected years
for year in tqdm(years, desc='Creating Choropleth Maps'):
    # filter the data for the current year
    df_subset = df_plot[df_plot['time_period'] == year]
    
    # create the choropleth map
    fig = px.choropleth(
        data_frame=df_subset,
        locations='ref_area',
        projection='natural earth',
        color='color',
        locationmode='ISO-3',
        color_discrete_map=worldmap_colors,
    )
    
    # add the year caption as an annotation
    caption = f'{year}'   # caption showing the respective year with settings below
    fig.add_annotation(
        text=caption,
        x=0.5,
        y=0.15,
        xref='paper',
        yref='paper',
        showarrow=False,
        font=dict(size=annotation_font_size, color='black'),
        bgcolor=caption_bgcolor,
        opacity=caption_opacity
    )
    
    # update layout for better appearance
    fig.update_layout(
        margin=dict(l=10, r=10, t=10, b=10),   # spaces around map
        title_font=dict(size=title_font_size),   # title
        font=dict(family='Arial', size=12, color='black'),   # general font
        paper_bgcolor='#FFFAE5',   # background color outside of plot
        geo=dict(bgcolor='#FFFAE5'),   # background color inside of plot
        showlegend=False   # can be set to True to display legend
    )
    
    # save as an image
    fig.write_image(f'../vizzes/plot_{year}.svg', width=2400, height=1200)   # vector format
    fig.write_image(f'../vizzes/plot_{year}.png', width=2400, height=1200)   # raster format


Creating Choropleth Maps: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.51it/s]


Make an animation with rolling average for every year.

In [176]:
# choropleth map
fig = px.choropleth(data_frame=df_plot, 
                    locations='ref_area', 
                    projection='natural earth',
                    color='color', 
                    locationmode='ISO-3',
                    animation_frame='time_period',   # animating with years
                    animation_group='ref_area',
                    height=600,
                    color_discrete_map=worldmap_colors,
                    title="Rolling Average Across Time Periods",
                    labels={'obs_value': 'Exposure', 'time_period':'Year', 'ref_area':'Country'}   # understandable labels
                   )

# caption showing current year
for frame in fig.frames:
    year = frame['name']
    caption = f'Year: {year}'
    frame['layout'].update(
        annotations=[
            go.layout.Annotation(
                text=caption,
                showarrow=False,
                x=0.5,
                y=0.05,
                xanchor='center',
                yanchor='bottom',
                font=dict(size=18, color="black"),
                bgcolor='white',
                opacity=0.8
            )
        ]
    )

# background color (a preceding update_geos(bgcolor='white') call was overridden by this line anyway)
fig.update_layout(geo=dict(bgcolor='#FFFAE5'))

# improve overall layout
fig.update_layout(
    margin=dict(l=0, r=0, t=100, b=0),   # spaces around map
    title_font=dict(size=24),   # title
    font=dict(family='Arial', size=12, color='black'),   # general font
    showlegend=False   # can be set to True to display legend
)

# save animation as html and show
fig.write_html('../vizzes/animation.html')
fig.show()

## Linecharts for countries (time vs population exposure) <a id='lines_countries'></a>

Make linecharts showing the change in population exposure (for the previously selected measure) over time for selected countries.

Start with selecting countries.

In [177]:
country_choice = ['DEU', 'ESP', 'USA', 'CHN', 'PAK']
df_plot = df_plot[df_plot['ref_area'].isin(country_choice)]

Make a linechart by creating a scatterplot and setting trace.mode = 'lines'.

In [178]:
fig = px.scatter(
    data_frame=df_plot,
    x='time_period',
    y='rolling_avg',
    color = 'country',
    hover_name='country',
    trendline = 'ols'
)

for trace in fig.data:
    trace.mode = 'lines'   # make it a linechart
    trace.line.width = 6   # line thickness

# trend lines dashed and thinner
for i in [1,3,5,7,9]:
    fig.data[i].line.dash='dash'
    fig.data[i].line.width=4

# customize the layout
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Exposure (%)',
    legend_title='',
    #showlegend=False,
    title_font=dict(size=24),
    font=dict(family='Arial', size=24),
    plot_bgcolor='#FFFAE5',
    paper_bgcolor='#FFFAE5'
)


# customize grid appearance
fig.update_xaxes(
    showgrid=True,
    gridcolor='#d0d0d0',  # light grey grid lines
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black'
)

fig.update_yaxes(
    showgrid=True,
    gridcolor='#d0d0d0',  # light grey grid lines
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black',
    range=[0,100.3]   # range of y axis
)

# save as an image and show
pio.write_image(fig, '../vizzes/lineplot_countries.svg', width=1000, height=600)
pio.write_image(fig, '../vizzes/lineplot_countries.png', width=1000, height=600)
fig.show()
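
The `trendline='ols'` option adds a straight line fitted by ordinary least squares (plotly.express relies on statsmodels for this). As a rough sketch of the underlying idea, the same kind of fit can be reproduced with `numpy.polyfit`; the numbers below are made up for illustration.

```python
import numpy as np

# hypothetical yearly exposure values that grow by exactly 2 points per year
years = np.array([2000, 2001, 2002, 2003, 2004])
exposure = np.array([11.0, 13.0, 15.0, 17.0, 19.0])

# shift the years so that the intercept refers to the first year
x = years - years[0]

# degree-1 polynomial fit = least-squares straight line
slope, intercept = np.polyfit(x, exposure, deg=1)
trend = slope * x + intercept

print(f'slope per year: {slope:.2f}, value in {years[0]}: {intercept:.2f}')
```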

## Explore different durations for a selected country (Pakistan) <a id='lines_pak'></a>

Load the data frame.

In [179]:
df_all_durations = pd.read_csv('../exported_dfs/exposures_all_durations.csv')

Keep only the rows with population exposures.

In [180]:
df_all_durations = df_all_durations.loc[~(df_all_durations['measure'].str.contains('TEMP') |\
                       df_all_durations['measure'].str.contains('UTCI_POP_IND'))]
df_all_durations.reset_index(drop=True, inplace=True)

If we want to display all durations separately, uncomment the following cell.

In [5]:
# IF WE WANT TO DISPLAY ALL DURATIONS SEPARATELY

# window_size = 5

# df_all_durations['rolling_avg'] = df_all_durations.groupby(['ref_area','measure','duration'])['obs_value'].\
# rolling(window=window_size, center=True)\
# .mean().reset_index(level=[0,1,2], drop=True)
# df_all_durations = df_all_durations[df_all_durations['time_period'].between(1981,2019)]
# df_all_durations['country_name'] = df_all_durations['ref_area'].map(country_alpha3)

# df_all_durations

Otherwise, let us sum up all the durations between 0 and 8 weeks. In the case of Pakistan, their individual contributions are small, and plotting them separately makes the plot hard to read.

First, we create a column 'group' to distinguish between rows with a duration of over 8 weeks and rows with shorter durations.

In [181]:
df_all_durations['group']=''
df_all_durations.loc[df_all_durations['duration'] == 'W_GT_8','group']='W_GT_8'
df_all_durations.loc[df_all_durations['duration'] != 'W_GT_8','group']='shorter'

Now sum up the shorter durations.

In [182]:
# sum 'obs_value' within each group; selecting the column makes transform return a single Series
df_all_durations['group_sum'] = df_all_durations.groupby(['ref_area','measure', 'time_period', 'group'])['obs_value'].\
transform('sum')

This process creates duplicate rows that differ only in the 'duration' and 'obs_value' columns. These columns no longer carry any meaning, because the values we want to investigate are in the 'group_sum' column. To drop the duplicates, we first need to drop the 'duration' and 'obs_value' columns.

In [183]:
df_all_durations = df_all_durations.drop(['duration', 'obs_value'], axis='columns')

Now drop the duplicate rows.

In [184]:
df_all_durations = df_all_durations.drop_duplicates()
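
The three steps above (sum within groups, drop the detail columns, drop duplicates) collapse each group to one row. As a sketch under stated assumptions, the same result can be obtained with a single aggregating `groupby`. The toy values and the duration codes 'W_1' and 'W_2' are hypothetical; only 'W_GT_8' appears in the real data as used above.

```python
import pandas as pd

toy = pd.DataFrame({
    'ref_area': ['PAK'] * 6,
    'measure': ['HD_TN_POP_IND'] * 6,
    'time_period': [2000, 2000, 2000, 2001, 2001, 2001],
    'duration': ['W_1', 'W_2', 'W_GT_8', 'W_1', 'W_2', 'W_GT_8'],  # 'W_1', 'W_2' are made-up codes
    'obs_value': [2.0, 3.0, 40.0, 1.0, 4.0, 45.0],
})

# keep 'W_GT_8' as its own group, label everything else 'shorter'
toy['group'] = toy['duration'].where(toy['duration'] == 'W_GT_8', 'shorter')

# one row per (area, measure, year, group) with the summed exposure
summed = (
    toy.groupby(['ref_area', 'measure', 'time_period', 'group'], as_index=False)['obs_value']
       .sum()
       .rename(columns={'obs_value': 'group_sum'})
)

print(summed)
```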

**Choose the measure to explore on the country level**

In [185]:
measure = 'HD_TN_POP_IND'

df_choice = df_all_durations[df_all_durations['measure']==measure]

**Choose the country**

In [186]:
country_choice = ['PAK']
df_plot = df_choice[df_choice['ref_area'].isin(country_choice)].copy()   # copy avoids SettingWithCopyWarning below
df_plot.reset_index(drop=True, inplace=True)

Calculate the rolling average of 'group_sum' to plot.

In [187]:
window_size = 5

df_plot['rolling_avg'] = df_plot.groupby(['ref_area', 'measure', 'group'])['group_sum']\
.rolling(window_size, center=True).mean().reset_index(level=[0, 1, 2], drop=True)






Calculating the rolling average creates NaN values at the beginning and end of each series. Let us drop those rows.

In [188]:
df_plot = df_plot[df_plot['rolling_avg'].notna()]   # keep only rows that have a rolling average
df_plot.reset_index(drop=True, inplace=True)
df_plot = df_plot[['time_period', 'group', 'rolling_avg']]

Part of the population is likely not exposed to extreme heat. We calculate this share as the complement to 100 % and add it to the plot.

We will create a new group 'zero' and concatenate these rows to the data frame. First, we need to know the length of the current data frame without the new group.

In [189]:
original_length = len(df_plot)

Concatenate new rows.

In [190]:
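# one 'zero' row per year present in the data
# (range(1981, 2019) stops at 2018 and could leave the last year without a 'zero' row)
years = sorted(df_plot['time_period'].unique())
new_records = pd.DataFrame({'time_period': years, 'group': 'zero', 'rolling_avg': 0.0})
df_plot = pd.concat([df_plot, new_records], ignore_index=True)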

Calculate the values (addition to 100 %).

In [191]:
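# assumes the rows are ordered group by group (all years of one group, then all years of the other),
# so that rows i and half + i refer to the same year
half = original_length // 2   # number of years per group
for i in range(half):
    df_plot.loc[original_length + i, 'rolling_avg'] =\
    100 - df_plot.loc[i, 'rolling_avg'] - df_plot.loc[half + i, 'rolling_avg']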
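
The index arithmetic above depends on the row order. A sketch of an order-independent alternative is to sum the exposed shares per year and subtract them from 100. The toy values below are made up for illustration; the column and group names follow the cells above.

```python
import pandas as pd

toy = pd.DataFrame({
    'time_period': [2000, 2001, 2000, 2001],
    'group': ['shorter', 'shorter', 'W_GT_8', 'W_GT_8'],
    'rolling_avg': [10.0, 12.0, 30.0, 35.0],
})

# total exposed share per year, regardless of how the rows are ordered
exposed = toy.groupby('time_period')['rolling_avg'].sum()

# complement to 100 % becomes the new 'zero' group
zero = (100 - exposed).reset_index()
zero['group'] = 'zero'

result = pd.concat(
    [toy, zero[['time_period', 'group', 'rolling_avg']]],
    ignore_index=True,
)

print(result)
```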

Make a scatter plot for the selected country and different durations of heat exposure.

In [192]:
fig = px.scatter(
    data_frame=df_plot,
    x='time_period',
    y='rolling_avg',
    symbol='group',
    hover_name='group', # column to add to hover information
    trendline = 'ols'
)

# modify symbol color to black
fig.update_traces(marker=dict(color='black', size=15))

# trend lines dashed and thinner
for i in [1,3]:
    fig.data[i].line.dash='dash'
    fig.data[i].line.width=4

# customize the layout
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Exposure (%)',
    legend_title='',
    #showlegend=False,
    title_font=dict(size=24),
    font=dict(family='Arial', size=24),
    plot_bgcolor='#FFFAE5',
    paper_bgcolor='#FFFAE5'
)

# customize grid appearance
fig.update_xaxes(
    showgrid=True,
    gridcolor='#d0d0d0',  # light grey grid lines
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black',
    range=[1980.1,2019.9]
)

fig.update_yaxes(
    showgrid=True,
    gridcolor='#d0d0d0',  # light grey grid lines
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black',
    range=[0,90.3]
)

pio.write_image(fig, '../vizzes/plot_pakistan.svg', width=1000, height=600)
pio.write_image(fig, '../vizzes/plot_pakistan.png', width=1000, height=600)
fig.show()

## Correlation between exposure and time <a id='corr_exp_time'></a>

Make a map, showing the correlation between population exposure and time for individual countries.

Load the relevant data frame.

In [193]:
df_for_corr = pd.read_csv('../exported_dfs/correlations.csv')
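
The coefficients in correlations.csv were computed in data_exploration.ipynb, and the exact method is not shown in this notebook. Assuming a Pearson correlation between year and exposure per country, a minimal sketch on made-up data could look like this (the column names 'countries' and 'corr_coeff' follow the map cell below; everything else is hypothetical).

```python
import pandas as pd

# toy data: exposure rises steadily in 'AAA' and falls steadily in 'BBB'
toy = pd.DataFrame({
    'countries': ['AAA'] * 4 + ['BBB'] * 4,
    'time_period': [2000, 2001, 2002, 2003] * 2,
    'exposure': [10.0, 20.0, 30.0, 40.0, 8.0, 6.0, 4.0, 2.0],
})

# Pearson correlation between year and exposure, one value per country
corr = (
    toy.groupby('countries')[['time_period', 'exposure']]
       .apply(lambda g: g['time_period'].corr(g['exposure']))
       .rename('corr_coeff')
       .reset_index()
)

print(corr)
```

A coefficient close to 1 means the exposure grew almost linearly over time, and a coefficient close to -1 means it fell.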

Make the map.

In [194]:
# styling constants
title_font_size = 24
annotation_font_size = 80
caption_bgcolor = '#FFFAE5'
caption_opacity = 1.0

custom_color_scale = ["#89CDFF", '#ffffff', '#e06666']   # define custom colorscale

fig = px.choropleth(data_frame=df_for_corr, 
                    locations='countries', 
                    projection='natural earth',
                    color='corr_coeff', 
                    locationmode='ISO-3',
                    height=600,
                    color_continuous_scale=custom_color_scale,
                    range_color=[-1, 1],
#                     title="Corr coefficients for countries",
                    labels={'corr_coeff': ''}   # empty colorbar title; the key must match the 'color' column
                   )

fig.update_layout(
    margin=dict(l=10, r=10, t=10, b=10),
    paper_bgcolor='#FFFAE5',
    geo=dict(bgcolor='#FFFAE5')
    #coloraxis_showscale=False
)

fig.write_image('../vizzes/map_corr.svg', width=2400, height=1200)
fig.write_image('../vizzes/map_corr.png', width=2400, height=1200)
fig.show()

## Exposure change for the whole world <a id='exp_world'></a>

How did the worldwide population exposure change over time?

Start with loading the relevant dataset.

In [195]:
df_world_exp = pd.read_csv('../exported_dfs/world_exp.csv')

In [196]:
fig = px.scatter(
    data_frame=df_world_exp,
    x='year',
    y='exposure',
    trendline = 'ols'
)

# Customize the layout
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Exposure (%)',
    title_font=dict(size=24),
    font=dict(family='Arial', size=24),
    plot_bgcolor='#FFFAE5',
    paper_bgcolor='#FFFAE5'
    #showlegend=False
)

# Modify symbols
fig.update_traces(marker=dict(color='black', size=15))

# trend line
fig.data[1].line.dash='dash'
fig.data[1].line.width=4
fig.data[1].line.color='black'

# Customize grid appearance
fig.update_xaxes(
    showgrid=True,
    gridcolor='#d0d0d0',  # light grey grid lines
    gridwidth=1,      # Set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black'
)

y_range = [0, 80]  # Set your desired y range here
fig.update_yaxes(
    showgrid=True,
    gridcolor='#d0d0d0',  # light grey grid lines
    gridwidth=1,      # Set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black',
    range=y_range
)

pio.write_image(fig, '../vizzes/lineplot_world.svg', width=1200, height=600)
pio.write_image(fig, '../vizzes/lineplot_world.png', width=1200, height=600)
fig.show()

## Temperature vs exposure worldwide <a id='temp_exp'></a>

We will merge the data on worldwide population exposure and temperature anomalies to check for correlation and to visualize any trends.

First, load the relevant files.

In [197]:
df_temp_anomaly = pd.read_csv('../exported_dfs/temp_anomaly_clean.csv')
df_world_exp = pd.read_csv('../exported_dfs/world_exp.csv')

Merge temperature anomaly and worldwide population exposure data frames.

In [198]:
df_temp_anomaly_world_exp = pd.merge(df_temp_anomaly, df_world_exp)
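
Without further arguments, `pd.merge` joins on all columns the two frames share and keeps only matching rows. The sketch below spells this out on made-up data, assuming 'year' is the only shared column (the values are hypothetical). The `validate` argument raises an error if a year appears more than once on either side.

```python
import pandas as pd

temp = pd.DataFrame({'year': [2000, 2001, 2002], 'avg_anomaly': [0.4, 0.5, 0.6]})
exp = pd.DataFrame({'year': [2001, 2002, 2003], 'exposure': [30.0, 31.0, 32.0]})

# explicit join key, inner join, and a check that every year is unique on both sides
merged = pd.merge(temp, exp, on='year', how='inner', validate='one_to_one')

print(merged)
```

Years that are missing from either frame (2000 and 2003 here) drop out of the result, so it is worth comparing the row counts before and after merging.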

Make the scatter plot. Visualize time with color.

In [199]:
custom_color_scale = ['#0000FF', '#ff9c00']

fig = px.scatter(df_temp_anomaly_world_exp,
                    x='avg_anomaly', y='exposure',
                    color='year',
                    color_continuous_scale=custom_color_scale
                )

# customize the layout
fig.update_layout(
    xaxis_title='Yearly temperature anomaly (°C)',
    yaxis_title='Exposure (%)',
    title_font=dict(size=24),
    font=dict(family='Arial', size=24),
    plot_bgcolor='#FFFAE5',
    paper_bgcolor='#FFFAE5',
    showlegend=False
)

# modify symbols
fig.update_traces(marker=dict(size=15))

# customize grid appearance
fig.update_xaxes(
    showgrid=False,
    gridcolor='#d0d0d0',  # light grey grid lines (only used if showgrid=True)
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black'
)

fig.update_yaxes(
    showgrid=False,
    gridcolor='#d0d0d0',  # light grey grid lines (only used if showgrid=True)
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black'
)

# update the color bar settings
fig.update_coloraxes(colorbar_title='Year')

pio.write_image(fig, '../vizzes/exp_vs_temp_anomaly.svg', width=1200, height=600)
pio.write_image(fig, '../vizzes/exp_vs_temp_anomaly.png', width=1200, height=600)
fig.show()

## The effects in rich and poor countries (GDP per capita) <a id='rich_poor'></a>

Let us visualize the clusters prepared in the last part of the analysis.

In [200]:
df_clusters = pd.read_csv('../exported_dfs/clusters.csv')
df_clusters['cluster'] = df_clusters['cluster'].astype(str) # labels as string, useful for visualizations

Create a dictionary to map clusters to colors we will be using in plots.

In [201]:
cluster_colors_2 = {
    '0': 'rgba(242, 142, 43, 0.8)',
    '1': 'rgba(89, 161, 79, 0.8)'
}

## Uncomment if working with 4 clusters instead of 2.
# cluster_colors_4 = {
#     '0': 'rgba(237, 201, 72, 0.8)',
#     '1': 'rgba(225, 87, 89, 0.8)',
#     '2': 'rgba(78, 121, 167, 0.8)',
#     '3': 'rgba(118, 183, 178, 0.8)'
# }

In [202]:
fig_new = px.choropleth(data_frame=df_clusters, 
                    locations='ref_area', 
                    projection='natural earth',
                    color='cluster', 
                    locationmode='ISO-3',
                    height=600,
                    color_discrete_map=cluster_colors_2
                    # color_discrete_map=cluster_colors_4   # uncomment if working with 4 clusters instead of 2
                       )

fig_new.update_layout(
    #showlegend=False,
    paper_bgcolor='#FFFAE5',
    geo=dict(bgcolor='#FFFAE5')
    )

fig_new.write_image('../vizzes/clusters_2.svg', width=2400, height=1200)
fig_new.write_image('../vizzes/clusters_2.png', width=2400, height=1200)
fig_new.show()

Visualize the countries in a scatterplot featuring GDP per capita and population exposure.

In [203]:
fig = px.scatter(df_clusters,
                    x='gdp', y='exposure',
                    hover_name='ref_area',
                    color='cluster',
                    color_discrete_map={'0':'#999999', '1':'#999999'})

# customize the layout
fig.update_layout(
    xaxis_title='GDP per capita',
    yaxis_title='Exposure (%)',
    title_font=dict(size=24),
    font=dict(family='Arial', size=24),
    plot_bgcolor='#FFFAE5',
    paper_bgcolor='#FFFAE5',
    showlegend=False
)

# modify symbols
fig.update_traces(marker=dict(size=15))

# customize grid appearance
fig.update_xaxes(
    showgrid=False,
    gridcolor='#d0d0d0',  # light grey grid lines (only used if showgrid=True)
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black',
    range=[-100,90000]
)

fig.update_yaxes(
    showgrid=False,
    gridcolor='#d0d0d0',  # light grey grid lines (only used if showgrid=True)
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black',
    range=[-1,105]
)

pio.write_image(fig, '../vizzes/injustice_grey.svg', width=1200, height=600)
pio.write_image(fig, '../vizzes/injustice_grey.png', width=1200, height=600)
fig.show()

Visualize the two clusters in a scatterplot featuring GDP per capita and population exposure.

In [204]:
fig = px.scatter(df_clusters,
                    x='gdp', y='exposure',
                    hover_name='ref_area',
                    color='cluster',
                    color_discrete_map=cluster_colors_2)

# customize the layout
fig.update_layout(
    xaxis_title='GDP per capita',
    yaxis_title='Exposure (%)',
    title_font=dict(size=24),
    font=dict(family='Arial', size=24),
    plot_bgcolor='#FFFAE5',
    paper_bgcolor='#FFFAE5'
    #showlegend=False
)

# modify symbols
fig.update_traces(marker=dict(size=15))

# customize grid appearance
fig.update_xaxes(
    showgrid=False,
    gridcolor='#d0d0d0',  # light grey grid lines (only used if showgrid=True)
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black',
    range=[-100,90000]
)

fig.update_yaxes(
    showgrid=False,
    gridcolor='#d0d0d0',  # light grey grid lines (only used if showgrid=True)
    gridwidth=1,      # set grid line width
    showticksuffix='all',  # show the tick suffix (if one is set) on every tick label
    linecolor='black',
    range=[-1,105]
)

pio.write_image(fig, '../vizzes/injustice.svg', width=1200, height=600)
pio.write_image(fig, '../vizzes/injustice.png', width=1200, height=600)
fig.show()