<h1 style="font-family:verdana;"> <center>Custom Plotly Charts to Visualise <br>Changes in WHO Indicators from 1999-2019</center> </h1>

<div style="font-size:15px; font-family:verdana">
    
This dataset contains data on a number of indicators from the World Health Organisation. Thanks to <a href="https://www.kaggle.com/utkarshxy">Zeus</a> for making the information available in this form. A full description of the dataset contents is available <a href="https://www.kaggle.com/utkarshxy/who-worldhealth-statistics-2020-complete">here</a>. <br><br>

The aim of this notebook is to give an example of what can be done with Plotly if you're happy to play around with the underlying structure. In this case we create a chart that can be used to visualise time series data for two variables. In doing so we'll reveal some trends in world health.<br><br>

All code cells and explanations of what we're doing with Plotly are currently visible.
    
<p style="font-size:18px">To streamline the notebook by hiding all code cells and markdown cells explaining code, click the button below! </p><br> 

Please spare an upvote if you like or learn something useful from this kernel 😄
</div>

In [None]:
from IPython.display import HTML
htmlCodeHide="""
<style>
    .button {
        background-color: #008CBA;
        border: none;
        color: white;
        padding: 8px 22px;
        text-align: center;
        text-decoration: none;
        display: inline-block;
        font-size: 16px;
        margin: 4px 2px;
        cursor: pointer;
    }
</style>
<script>
    function toggleInput() {
        var i;
        var textCellsToHide = [1,2,3,4,5,6,11];
        var outputCellsToHide = [1];
        for (i = 0; i < document.getElementsByClassName("input").length; i++) {
            var divTag = document.getElementsByClassName("input")[i]
            var displaySetting = divTag.style.display;
            if (displaySetting == 'none') { 
                divTag.style.display = '';
            }
            else { 
                divTag.style.display = 'none';
            }
        }
        for (i of textCellsToHide) {
            var divTag = document.getElementsByClassName("text_cell")[i]
            var displaySetting = divTag.style.display;
            if (displaySetting == 'none') { 
                divTag.style.display = '';
            }
            else { 
                divTag.style.display = 'none';
            }
        }
        for (i of outputCellsToHide) {
            var divTag = document.getElementsByClassName("output")[i]
            var displaySetting = divTag.style.display;
            if (displaySetting == 'none') { 
                divTag.style.display = '';
            }
            else { 
                divTag.style.display = 'none';
            }
        }
    }  
</script>
<button onclick="javascript:toggleInput(0)" class="button">Show/Hide Explanatory Cell Visibility</button>
"""
HTML(htmlCodeHide)

<h1 style="font-family:verdana;"> <center>Setup</center> </h1>


In [None]:
#General
import os
import numpy as np 
import pandas as pd
pd.set_option('display.max_rows', 200)
pd.set_option('display.max_columns', 150)
pd.set_option('display.max_colwidth', 200)
from itertools import product
from scipy.optimize import curve_fit
import warnings
warnings.filterwarnings("ignore")

import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
import plotly.io as pio

<div style="font-size:15px; font-family:verdana">
    
We will be using the plotly_dark template throughout this notebook. <br><br>

If you are not on Windows, the country graphs will display flags to identify each datapoint. Windows 10 does not support country flag emojis, so on Windows these characters will display as alpha-2 country codes. We want a colour palette against which white text is more legible than the plotly_dark default, so we define our own. <br><br>

Plotly figures and templates are trees of attributes. If you want to change a behaviour and even googling isn't yielding a solution, you can look through this structure for an attribute which might control this behaviour. <br><br>

A portion of the figure structure is shown below.
</div>

In [None]:
def short_print(obj):
    print(str(obj)[:1500])

short_print(pio.templates['plotly_dark'])

<div style="font-size:15px; font-family:verdana">
    
We replace the appropriate attribute with our own colour scheme.
    
</div>

In [None]:
pio.templates['plotly_dark'].layout.colorway = ('#0840D9','#882775','#00A34C','#B86200','#AA250E','#1D8682')
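
<div style="font-size:15px; font-family:verdana">

The movement paths drawn later in the notebook use semi-transparent versions of these colours. As a minimal sketch of that conversion, the helper hex_to_rgba below is hypothetical (it is not part of the original notebook) and turns a hex colour into an rgba string.

</div>

```python
def hex_to_rgba(hex_colour, opacity):
    """Convert '#RRGGBB' to an 'rgba(r, g, b, a)' string (illustrative helper)."""
    # slice out the red, green and blue hex pairs and parse each as base 16
    red, green, blue = (int(hex_colour[i:i + 2], 16) for i in (1, 3, 5))
    return f'rgba({red}, {green}, {blue}, {opacity})'

# first colour of the custom colourway, at the opacity used for country paths
print(hex_to_rgba('#0840D9', 0.15))
```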

<div style="font-size:15px; font-family:verdana">

When text is being used to mark points, 'Aa' covers the coloured circles in the legend. This behaviour seems unavoidable without removing text but the piece of code below is a workaround that will hide this text.
    
</div>

In [None]:
HTML("""
<style>
g.pointtext {display: none;}
</style>
""")

<div style="font-size:15px; font-family:verdana">
    
Each indicator variable is stored in a separate csv, so we combine these csv's into one dataframe. There will be a lot of null values, but it will be easier to slice and query this dataframe than to work with so many separate objects.
    
</div>
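
<div style="font-size:15px; font-family:verdana">

To see where the null values come from, here is a minimal sketch of an outer merge on two made-up indicator tables. The values and the short column names are invented for illustration; only the Location and Period keys mirror the real files.

</div>

```python
import pandas as pd

# two made-up indicator tables keyed by country and year
life_exp = pd.DataFrame({
    'Location': ['A', 'A', 'B'],
    'Period': [2000, 2010, 2000],
    'life_exp': [70.0, 72.0, 65.0],
})
doctors = pd.DataFrame({
    'Location': ['A', 'B', 'B'],
    'Period': [2000, 2000, 2010],
    'doctors': [12.0, 3.0, 4.0],
})

# an outer merge keeps every (Location, Period) pair seen in either table,
# leaving NaN wherever one table has no value for that pair
combined = pd.merge(life_exp, doctors, on=['Location', 'Period'], how='outer')
print(combined.sort_values(['Location', 'Period']))
```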

In [None]:
def unify_data(interpolate=True):
    regions = pd.read_csv('../input/who-countryregion-key/countryInfo.csv')
    regions = regions.replace({'region':{
        'Eastern Mediterranean':'East Mediterranean',
        'Western Pacific':'West Pacific',
        'South-East Asia':'SE Asia'
    }})
    count = 0
    for dirname, _, filenames in os.walk('../input/who-worldhealth-statistics-2020-complete'):
        for filename in filenames:
            if 'region' not in filename and 'of' not in filename: 
                new = pd.read_csv(os.path.join(dirname, filename))
                pivot_cols = ['Indicator']
                if 'Dim2' in new.columns:
                    pivot_cols += ['Dim1','Dim2']
                elif 'Dim1' in new.columns:
                    new = new.query('Dim1 in ("Both sexes","Total")')
                    del new['Dim1']
                #take the last year of multi-year periods for ease of use
                new['Period'] = new['Period'].astype('str').str[-4:].astype('int32')
                new = new.pivot(index=['Location','Period'], columns=pivot_cols, values='First Tooltip').reset_index()
                for col in new.columns:
                    if col not in ['Location','Period',('Location', '', ''),('Period', '', '')] and new[col].dtype=='object':
                        new[col] = new[col].replace({'No data':np.nan})
                        new[col] = new[col].str.extract(r'^<?([\d\.]*)')
                        new[col] = new[col].astype('float64')
                if count == 0:
                    df = new
                else:
                    df = pd.merge(df,new,on=['Location','Period'], how='outer')
                count += 1
    df = df.rename(columns={
        'Location':'country',
        'Period':'year'
    })
    #fold the 'Sudan (until 2011)' entries into Sudan
    df = df.replace({'country':{'Sudan (until 2011)':'Sudan'}})
    df = df.join(regions.set_index('country'), on='country')
    df = df.sort_values(['region','country','year'])
    df.loc[df['Nursing and midwifery personnel (per 10,000)']>1000,'Nursing and midwifery personnel (per 10,000)'] = np.nan
    df.loc[(df['country']=='Haiti') & (df['year']==2010), ['Healthy life expectancy (HALE) at birth (years)','Life expectancy at birth (years)']] = np.nan
    if interpolate:
        df = df.groupby('country').apply(pd.DataFrame.interpolate)
        df = df.groupby('country').apply(pd.DataFrame.bfill)
    df['hale_percent'] = df['Healthy life expectancy (HALE) at birth (years)']/df['Life expectancy at birth (years)']*100
    df = df[df['year']>=1999]
    #rescale personnel (per 10,000) and mortality (per 1000) to rates per 100,000
    df[['Medical doctors (per 10,000)','Nursing and midwifery personnel (per 10,000)']] *= 10
    df[['Neonatal mortality rate (per 1000 live births)','Under-five mortality rate (probability of dying by age 5 per 1000 live births)']] *= 100
    df = pd.merge(df, pd.read_csv('../input/who-countryregion-key/population.csv'), on=['country','year'], how='left')
    df = df[~df['region'].isna()]
    return df

complete_df = unify_data()

In [None]:
def country_graphing_data(df, col_dict, years):
    """
    input df with all indicators
    returns df's needed to create movement graphs with desired variables
    """
    relevant_indicators = list(col_dict.keys())
    new_cols = list(col_dict.values())
    df = df[['country','year','region','flag','population']+relevant_indicators]
    df = df.rename(columns=col_dict)
    df = df.dropna()
    df = df[df['year'].isin(years)]
    countries_with_all_data = df.groupby('country').agg({'year':'nunique'}).query(f'year=={len(years)}').reset_index().country.unique()
    df = df[df['country'].isin(countries_with_all_data)]
    df_vectors = (
        df.pivot(
            index=['country','region'], 
            columns='year', 
            values=new_cols + ['population'])
        .sort_values(['region','country'])
        .reset_index()
    )
    return df, df_vectors
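
<div style="font-size:15px; font-family:verdana">

The second dataframe returned above is a wide table with one row per country and one column per (variable, year) pair, which is the shape the movement paths are read from. A minimal sketch of that pivot on made-up data follows; the countries and values are invented.

</div>

```python
import pandas as pd

# made-up long-format data: one row per country and year
long_df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'region': ['Europe', 'Europe', 'Africa', 'Africa'],
    'year': [2000, 2019, 2000, 2019],
    'life_exp': [75.0, 80.0, 55.0, 64.0],
})

# one row per country, one (variable, year) column per observation
wide_df = (
    long_df.pivot(index=['country', 'region'], columns='year', values=['life_exp'])
    .reset_index()
)

# start and end points of country A's path along this variable
start, end = wide_df[('life_exp', 2000)][0], wide_df[('life_exp', 2019)][0]
print(start, end)
```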

In [None]:
def group_data_by_region(df, x, y):
    """ 
    groups country data into regional data using population weighted means
    returns the dataframes needed by func movement_graphs
    """
    df = pd.merge(
    df,
    df.groupby(['region','year']).agg({'population':'sum'}).rename(columns={'population':'region_population'}).reset_index(),
    on=['region','year'],
    how='left'
    )
    df['population_prop'] = df['population']/df['region_population']
    #weight each country's values by its share of the regional population
    df[x] = df[x]*df['population_prop']
    df[y] = df[y]*df['population_prop']
    df = df.groupby(['region','year']).agg({x:'sum',y:'sum','population_prop':'sum'}).reset_index()

    df_vectors = (
        df.pivot(
            index=['region'], 
            columns='year', 
            values=[x,y])
        .sort_values(['region'])
        .reset_index()
    )
    return df, df_vectors
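
<div style="font-size:15px; font-family:verdana">

As a worked example of a population-weighted mean, the sketch below uses two made-up countries in one region. The numbers are invented and only illustrate why the weighted figure differs from a simple average.

</div>

```python
# made-up countries in one region: (population, indicator value)
countries = {
    'A': (9_000_000, 50.0),
    'B': (1_000_000, 100.0),
}

total_population = sum(pop for pop, _ in countries.values())

# weight each country's value by its share of the regional population
weighted_mean = sum(value * pop / total_population for pop, value in countries.values())
# the unweighted mean treats both countries as equally large
simple_mean = sum(value for _, value in countries.values()) / len(countries)

print(weighted_mean, simple_mean)
```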

<div style="font-size:15px; font-family:verdana">

We will be making multiple graphs of the same type, so we define a function that will save time in the future. <br><br>

Most attributes have self-explanatory names. <br><br>

fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] sets the time between movements when the play button is pressed. If anyone knows of a way to set the wait time before the initial movement to a separate value please let me know! 
    
</div>
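
<div style="font-size:15px; font-family:verdana">

To make that chain of lookups easier to read, here is a sketch that uses plain dictionaries as a stand-in for the figure. The starting values are placeholders, not what Plotly necessarily generates; only the nesting mirrors the attribute path quoted above.

</div>

```python
# plain-dict stand-in for fig.layout.updatemenus (placeholder values, not Plotly output)
updatemenus = [{
    'buttons': [{
        'label': 'play',
        'args': [None, {
            'frame': {'duration': 500},
            'transition': {'duration': 500, 'easing': 'linear'},
        }],
    }],
}]

# args[1] of the first button holds the animation settings
play_args = updatemenus[0]['buttons'][0]['args'][1]
play_args['frame']['duration'] = 2200       # duration of each frame, in ms
play_args['transition']['duration'] = 1500  # duration of the transition between frames, in ms

print(play_args)
```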

In [None]:
def movement_graph(df,df2,x,y,years,group,labels,title):
    """
    Outputs an animated scatterplot with paths indicating how
    each datapoint has moved over time.
    
    df: a dataframe formatted to make a regular scatterplot
    df2: a dataframe movement paths will be generated from
    x: x-axis variable
    y: y-axis variable
    years: a list containing the years forming the paths
    group: how the data is grouped, by country or region
    labels: axis labels
    title: plot title
    """
    num_periods = len(years)
    #generate figure with scatter plot points
    x_true_range = df[x].max()-df[x].min()
    y_true_range = df[y].max()-df[y].min()
    x_display_range = [max(0,df[x].min()-x_true_range/20),df[x].max()+x_true_range/20]
    y_display_range = [max(0,df[y].min()-y_true_range/20),df[y].max()+y_true_range/20]
    if group == 'country':
        text = 'flag'
        opacity = 0.15
    else:
        text = 'region'
        opacity = 0.6    
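    #map each region to its colourway colour as an rgba string with the chosen opacity
    colorway = pio.templates['plotly_dark'].layout.colorway
    region_rgba = {
        region: f'rgba{tuple(int(h[k:k+2], 16) for k in (1, 3, 5))+(opacity,)}'
        for region, h in zip(hale_region_vectors['region'], colorway)
    }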
    fig = px.scatter(
        df, x=x, y=y,
        animation_frame='year',
        text=text, color='region', hover_name=group,
        height=700, template='plotly_dark',
        range_x=x_display_range, range_y=y_display_range,
        labels=labels, title=title
    )
    for i in range(len(df2)):
        for j in range(num_periods-1):
            fig.add_trace(go.Scatter(
                x=[df2[(x,years[j])][i],df2[(x,years[j+1])][i]],
                y=[df2[(y,years[j])][i],df2[(y,years[j+1])][i]],
                mode='lines', legendgroup=df2['region'][i],showlegend=False, hoverinfo='skip',
                line={
                    "color":region_rgba[df2['region'][i]],
                    'width':(j/(num_periods-1)+0.2)*15              
                }
            ))
    
    #update figure attributes
    fig.update_traces(marker={'size':20}, textfont_size=15)
    fig.update_layout(legend={
        'orientation':"h",
        'yanchor':'bottom','y':1.02,
        'xanchor':'left', 'x':0.01,
        },
        margin={'l':40,'r':20}                 
    )
    #draw the scatter points on top
    #the final line alone would accomplish this if there were no animations
    #the loop is needed to prevent messing up animation frames after reordering data
    for i in range(num_periods):
        fig['frames'][i]['data'] = fig['data'][6:] + fig['frames'][i]['data']
    fig['data'] = fig['data'][6:] + fig['data'][:6]
    fig.update_xaxes(showline=True,linewidth=2,linecolor='white',hoverformat='.2f')
    fig.update_yaxes(showline=True,linewidth=2,linecolor='white',hoverformat='.2f')
    #attributes controlling the play button
    fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 1500
    fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 2200
    fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["easing"] = 'cubic-in-out'
    fig.layout.updatemenus[0].pad.r = 20
    fig.layout.updatemenus[0].pad.t = 50
    #attributes controlling the slider
    fig.layout.sliders[0].pad.t = 30
    fig.layout.sliders[0].pad.r = 35
    #display figure without the floating menu
    fig.show(config={'displayModeBar':False})

<h1 style="font-family:verdana;"> <center>Life Expectancy</center> </h1>

<div style="font-size:15px; font-family:verdana">
    
Life expectancy at birth for a given year is the number of years a person born in that year is expected to live. <br><br>

Health-adjusted life expectancy, HALE, is a measure that accounts for quality of life. Each year is weighted according to the expected distribution of health states in which it will be lived. A year in full health is worth 1 year; a year in a state of less than full health is worth less than 1 year. The weighting given to a health state is based on how severely it limits a person's ability to perform their usual activities. <br><br>

The following graphs show both HALE and life expectancy. Life expectancy is on the x-axis and HALE/life expectancy, % of healthy years in life expectancy, is on the y-axis.
</div>

<div style="font-size:15px; font-family:verdana">

Life expectancies universally increased between 2000 and 2019. Africa saw the largest increases but this could be part of the general pattern that a lower life expectancy in 2000 was associated with a larger increase in life expectancy between 2000 and 2019. <br><br>

Moving vertically downwards on the graph means that a lower proportion of life will be lived at full health. Between 2000 and 2010 movement was primarily horizontal, but between 2010 and 2019 a distinct downward tilt is introduced to the majority of the paths. This downward movement is particularly pronounced at higher life expectancies.
    
</div>

In [None]:
col_dict = {
    'hale_percent':'hale_percent',
    'Life expectancy at birth (years)':'life_exp',
    'Healthy life expectancy (HALE) at birth (years)':'hale'
}
years = [2000,2010,2019]
hale, hale_vectors = country_graphing_data(complete_df,col_dict,years)

In [None]:
#dataframe for life expectancy by region scatterplot
hale_region = (pd
    .read_csv('../input/who-worldhealth-statistics-2020-complete/HALeWHOregionLifeExpectancyAtBirth.csv')
    .query('Dim1=="Both sexes" and Period in (2000,2010,2019)')
    .rename(columns={
        'Location':'region',
        'Hale Expectency':'hale',
        'Life expectany':'life_exp',
        'Unnamed: 6':'hale_percent',
        'Period':'year'
    })
    .replace({'region':{
        'Eastern Mediterranean':'East Mediterranean',
        'Western Pacific':'West Pacific',
        'South-East Asia':'SE Asia'
    }})
    [['region','year','hale','life_exp','hale_percent']]
    .sort_values(['region','year'])
)
#dataframe for life expectancy by region movement paths
hale_region_vectors = (
    hale_region.pivot(
        index=['region'], 
        columns='year', 
        values=['life_exp','hale_percent','hale'])
    .reset_index()
    .sort_values(['region'])
)
#map regions to colours
region_colors = {hale_region_vectors['region'][i]:pio.templates['plotly_dark'].layout.colorway[i] for i in range(6)}

In [None]:
labels= {
    'life_exp': 'Life Expectancy (years)',
    'hale_percent': '% of healthy years in life expectancy',
    'region': 'Region',
    'year': 'Year'
}
title = 'National changes in life expectancy and % of healthy years'
movement_graph(
    hale, hale_vectors, 'life_exp', 'hale_percent',
    years, 'country', labels, title
)

<div style="font-size:15px; font-family:verdana">

Using regional data makes the general movement patterns clearer.
    
</div>

In [None]:
labels= {
    'life_exp': 'Life Expectancy (years)',
    'hale_percent': '% of healthy years in life expectancy',
    'region': 'Region',
    'year': 'Year'
}
title = 'Regional changes in life expectancy and % of healthy years'
years = [2000,2010,2019]
movement_graph(
    hale_region,hale_region_vectors,'life_exp',
    'hale_percent', years,
    'region', labels, title
)

In [None]:
hale_changes = hale_vectors[[
    ('country',''),('region',''),
    ('life_exp',2000),('life_exp',2010),('life_exp',2019),
    ('hale',2000),('hale',2010),('hale',2019)
]].copy()
hale_changes.columns = [
    'country','region', 
    'le_2000','le_2010','le_2019',
    'hale_2000','hale_2010','hale_2019']
hale_changes['le_change'] = hale_changes['le_2019'] - hale_changes['le_2000']
hale_changes['hale_change'] = hale_changes['hale_2019'] - hale_changes['hale_2000']

In [None]:
#exponential model used for the lines of best fit
def exp_func(x, a, b, c, d):
    return a * np.exp(-b * (x - c)) + d

#add the exp_func line of best fit to the graph
#relies on the globals fig and x_plot defined in the plotting cell
def add_exp_func_line(x, y, name, new_axis=False, init_vals=(30, 0, 100, 1)):
    #drop infinite and missing values before fitting
    y = y.replace([np.inf,-np.inf], np.nan).dropna()
    x = x.loc[y.index]
    popt, pcov = curve_fit(exp_func, x, y, p0=init_vals)
    y_plot = exp_func(x_plot, *popt)    
    fig.add_trace(go.Scatter(x=x_plot, y=y_plot,line_shape='spline',name=name), secondary_y=new_axis)

<div style="font-size:15px; font-family:verdana">

Let's look at the trends we've observed based on the life expectancy countries had in 2000.

<p><div style="color:#073AC5">&#9632;</div>Life expectancy in 2000 had little relation to healthy year % in 2000</p>

<p><div style="color:#AE3295">&#9632;</div>As we thought when looking at the graph showing country movement, a higher starting life expectancy is associated with a lower ratio of HALE change to life expectancy change. This indicates that later years added to life expectancy are more likely to be lived in a more limiting state of health. The overall '% years healthy' value will be decreasing in the life expectancy range where the purple line lies beneath the blue line.</p>

<p><div style="color:#00A34C">&#9632;</div>Countries that started with lower life expectancies saw substantially greater increases in life expectancy.</p>

</div>
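
<div style="font-size:15px; font-family:verdana">

A short worked example shows why the overall share of healthy years falls whenever the ratio of HALE change to life expectancy change is below the starting share. The figures below are made up for a single hypothetical country.

</div>

```python
# made-up values for one hypothetical country
le_2000, hale_2000 = 70.0, 63.0    # 90% of years healthy in 2000
le_change, hale_change = 5.0, 3.5  # 70% of the added years are healthy

share_2000 = hale_2000 / le_2000
marginal_share = hale_change / le_change
share_2019 = (hale_2000 + hale_change) / (le_2000 + le_change)

# the overall share falls because the marginal share is below the starting share
print(round(share_2000, 3), round(marginal_share, 3), round(share_2019, 3))
```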

In [None]:
fig = make_subplots(specs=[[{"secondary_y": True}]])

x = hale_changes['le_2000']
x_plot = np.linspace(hale_changes['le_2000'].min(),hale_changes['le_2000'].max(),100)

add_exp_func_line(x,hale_changes['hale_2000']/hale_changes['le_2000'],'% of years healthy in 2000')
add_exp_func_line(x,hale_changes['hale_change']/hale_changes['le_change'],'Change in HALE/Change in Life Exp')
add_exp_func_line(x,hale_changes['le_change'],'Change in Life Exp', True)

fig.update_layout(
    template='plotly_dark',
    title='Trendlines based on Life Expectancy and HALE Values from 2000 to 2019',
    legend={
        'orientation':"h",
        'yanchor':'bottom','y':1.02,
        'xanchor':'left', 'x':0.01,
        },
        margin={'l':40,'r':20}
)
fig.update_xaxes(showline=True,showgrid=False, linewidth=2, linecolor='white', title='Life Expectancy in 2000',hoverformat='.3g')
fig.update_yaxes(showline=True,showgrid=False, linewidth=2, linecolor='white', title='% of healthy years in life expectancy', secondary_y=False,hoverformat='.3g')
fig.update_yaxes(showline=True,showgrid=False, linewidth=2, linecolor='white',title='Life Expectancy Increase', secondary_y=True, hoverformat='.3g')
fig.show(config={'displayModeBar':False})

<div style="font-size:15px; font-family:verdana">

Note: Missing values in the segments that follow were imputed using linear interpolation combined with padding and backfilling to create the most complete graphs possible.

</div>
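
<div style="font-size:15px; font-family:verdana">

As a minimal sketch of what this imputation does, the example below applies pandas interpolation and backfilling to a made-up series for a single country. The values are invented; the notebook applies the same two steps per country.

</div>

```python
import numpy as np
import pandas as pd

# made-up indicator values for one country, with gaps
values = pd.Series(
    [np.nan, 2.0, np.nan, 4.0, np.nan],
    index=[1999, 2000, 2001, 2002, 2003],
)

# linear interpolation fills interior gaps and pads trailing ones with the last value
filled = values.interpolate()
# backfilling then covers any gap before the first observation
filled = filled.bfill()

print(filled.tolist())
```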

<h1 style="font-family:verdana;"> <center>Medical Personnel</center> </h1>

In [None]:
short_names = {
    'Medical doctors (per 10,000)':'doctors',
    'Nursing and midwifery personnel (per 10,000)':'nursing',
    'Births attended by skilled health personnel (%)':'birth_attended',
    'Maternal mortality ratio (per 100 000 live births)':'maternal_mort',
    'Neonatal mortality rate (per 1000 live births)':'neonatal_mort',
    'Under-five mortality rate (probability of dying by age 5 per 1000 live births)':'u5_mort',
}
years = [1999,2005,2011,2016]
personnel_birth_mort, personnel_birth_mort_vectors = country_graphing_data(complete_df, short_names, years)

<div style="font-size:15px; font-family:verdana">

There are exponential relationships between medical personnel and early-life/maternal mortality. Reaching the first 15 doctors/50 nursing personnel per 10,000 is associated with hugely decreased rates of maternal and early-life mortality. Returns diminish greatly beyond this point, but more personnel are still associated with lower mortality.
    
</div>

In [None]:
fig = px.scatter_matrix(
    personnel_birth_mort.query('year==2016'), 
    dimensions=personnel_birth_mort.columns[5:],
    template='plotly_dark')
fig.update_layout(height=800)
fig.show()

<div style="font-size:15px; font-family:verdana">

Some countries move back and forth, but overall there is a clear trend towards moderately increasing numbers of doctors per capita and a substantial reduction in neonatal mortality.
    
</div>

In [None]:
#dataframe for country level doctors and neonatal mortality scatterplot
#copy so the log columns are added to a new dataframe rather than a slice
docs_mort = personnel_birth_mort[['country','region','year','flag','population','doctors','neonatal_mort']].copy()
docs_mort['log_doctors'] = np.log(docs_mort['doctors'])
docs_mort['log_neonatal_mort'] = np.log(docs_mort['neonatal_mort'])
#dataframe for country level doctors and neonatal mortality movement paths
docs_mort_vectors = (
    docs_mort.pivot(
        index=['country','region'], 
        columns='year', 
        values=['doctors','log_doctors','neonatal_mort','log_neonatal_mort','population'])
    .sort_values(['region','country'])
    .reset_index()
)

labels= {
    'doctors': 'Doctors per 100,000 Persons',
    'neonatal_mort': 'Neonatal Mortality per 100,000 Live Births',
    'region': 'Region',
    'year': 'Year'
}
title = 'National changes in doctors per person and neonatal mortality rates'
movement_graph(
    docs_mort, docs_mort_vectors, 'doctors', 'neonatal_mort',
    [1999,2005,2011,2016], 'country', labels, title
)

<div style="font-size:15px; font-family:verdana">

To get a more spread-out view, here is the same graph using the log of both variables.
    
</div>

In [None]:
labels= {
    'log_doctors': 'log(Doctors per 100,000 Persons)',
    'log_neonatal_mort': 'log(Neonatal Mortality per 100,000 Live Births)',
    'region': 'Region',
    'year': 'Year'
}
title = 'National changes in doctors per person and neonatal mortality rates'
movement_graph(
    docs_mort, docs_mort_vectors, 'log_doctors', 'log_neonatal_mort',
    [1999,2005,2011,2016], 'country', labels, title
)

In [None]:
docs_mort_region, docs_mort_region_vectors = group_data_by_region(docs_mort, 'doctors', 'neonatal_mort')

labels= {
    'doctors': 'Doctors per 100,000 Persons',
    'neonatal_mort': 'Neonatal Mortality per 100,000 Live Births',
    'region': 'Region',
    'year': 'Year'
}
title = 'Regional changes in doctors per person and neonatal mortality rates'
movement_graph(
    docs_mort_region, docs_mort_region_vectors, 'doctors', 'neonatal_mort',
    years, 'region', labels, title
)

<h1 style="font-family:verdana;"> <center>Drinking Water and Sanitation Services</center> </h1>

<div style="font-size:15px; font-family:verdana">

Many countries move fairly chaotically, but there is a general trend towards the top right that becomes very apparent in the region-level chart.
    
</div>

In [None]:
col_dict = {
    'Population using at least basic drinking-water services (%)':'drinking_water',
    'Population using at least basic sanitation services (%)':'sanitation',
}
years = [2000,2006,2012,2017]
air_wash, air_wash_vectors = country_graphing_data(complete_df, col_dict, years)

labels= {
    'drinking_water': 'Population using at least basic drinking-water services (%)',
    'sanitation': 'Population using at least basic sanitation services (%)',
    'region': 'Region',
    'year': 'Year'
}
title = 'National changes in percentage of people with access to basic drinking water and sanitation services'
movement_graph(
    air_wash, air_wash_vectors, 'drinking_water', 'sanitation',
    years, 'country', labels, title
)

In [None]:
air_wash_region, air_wash_region_vectors = group_data_by_region(air_wash, 'drinking_water', 'sanitation')

labels= {
    'drinking_water': 'Population using at least basic drinking-water services (%)',
    'sanitation': 'Population using at least basic sanitation services (%)',
    'region': 'Region',
    'year': 'Year'
}
title = 'Regional changes in percentage of people with access to basic drinking water and sanitation services'
movement_graph(
    air_wash_region, air_wash_region_vectors, 'drinking_water', 'sanitation',
    years, 'region', labels, title
)

<h1 style="font-family:verdana;"> <center>Tobacco and Alcohol Consumption</center> </h1>

<div style="font-size:15px; font-family:verdana">

Alcohol usage has remained fairly constant aside from a notable decline among European countries. Tobacco usage has almost universally declined.    
</div>

In [None]:
col_dict = {
        'Total (recorded+unrecorded) alcohol per capita (15+) consumption':'alcohol',
        'Age-standardized prevalence of current tobacco smoking among persons aged 15 years and older':'tobacco',
    }
years = [2000,2006,2012,2017]
substances, substances_vectors = country_graphing_data(complete_df, col_dict, years)

labels= {
    'alcohol': 'Alcohol consumption per capita ages 15+',
    'tobacco': 'Tobacco use rate ages 15+ (age-standardised rate)',
    'region': 'Region',
    'year': 'Year'
}
title = 'National changes in alcohol and tobacco usage'
movement_graph(
    substances, substances_vectors, 'alcohol', 'tobacco',
    years, 'country', labels, title
)

In [None]:
substances_region, substances_region_vectors = group_data_by_region(substances, 'alcohol', 'tobacco')

labels= {
    'alcohol': 'Alcohol consumption per capita ages 15+',
    'tobacco': 'Tobacco use rate ages 15+ (age-standardised rate)',
    'region': 'Region',
    'year': 'Year'
}
title = 'Regional changes in alcohol and tobacco usage'
movement_graph(
    substances_region, substances_region_vectors, 'alcohol', 'tobacco',
    years, 'region', labels, title
)

In [None]:
complete_df

In [None]:
col_dict = {
        'Crude suicide rates (per 100 000 population)':'tb',
        'Probability (%) of dying between age 30 and exact age 70 from any of cardiovascular disease, cancer, diabetes, or chronic respiratory disease':'malaria',
    }
years = [2000,2005,2010,2015]
substances, substances_vectors = country_graphing_data(complete_df, col_dict, years)

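#note: 'tb' holds the suicide rate and 'malaria' the noncommunicable disease mortality probability (see col_dict)
labels= {
    'tb': 'Crude suicide rate (per 100,000 population)',
    'malaria': 'Probability (%) of dying between ages 30 and 70 from cardiovascular disease, cancer, diabetes, or chronic respiratory disease',
    'region': 'Region',
    'year': 'Year'
}
title = 'National changes in suicide rates and noncommunicable disease mortality'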
movement_graph(
    substances, substances_vectors, 'malaria', 'tb',
    years, 'country', labels, title
)

In [None]:
substances_region, substances_region_vectors = group_data_by_region(substances, 'tb', 'malaria')

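#note: 'tb' holds the suicide rate and 'malaria' the noncommunicable disease mortality probability
labels= {
    'tb': 'Crude suicide rate (per 100,000 population)',
    'malaria': 'Probability (%) of dying between ages 30 and 70 from cardiovascular disease, cancer, diabetes, or chronic respiratory disease',
    'region': 'Region',
    'year': 'Year'
}
title = 'Regional changes in suicide rates and noncommunicable disease mortality'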
movement_graph(
    substances_region, substances_region_vectors, 'malaria', 'tb',
    years, 'region', labels, title
)