<br>
<h1 style = "font-size:30px; font-weight : bold; color : blue; text-align: center; border-radius: 10px 15px;"> Road Traffic Deaths by Country from 2000 to 2019 - Interactive Maps with Plotly </h1>
<br>

## Overview

This dataset contains the estimated number of road traffic deaths by country per year, from 2000 to 2019, according to the World Health Organization. In this short notebook, the number of deaths by country is presented with Choropleth Maps to highlight the differences among regions/continents.

The initial intent of this notebook was to help a friend work with this dataset. Thus, the first section shows how to manipulate the dataset and prepare it for analysis. In the last section, interactive maps (using Plotly) show the death rate by country (and by gender) in 2019 and its evolution from 2000 to 2019.

## <center> If you find this notebook useful, support with an upvote! </center>

## Importing Libraries and Dataset + Data Preprocessing

To plot the interactive maps in the last section, we will use the plotly.express package.

In [None]:
import pandas as pd 
import matplotlib as mat
import matplotlib.pyplot as plt    
import numpy as np
import seaborn as sns
%matplotlib inline

import plotly.express as px

In [None]:
df = pd.read_csv ('../input/world-health-organization-road-traffic-deaths/who_road_deaths.csv')

Let's take a first look at the dataset.

In [None]:
df

The first row doesn’t contain the type of data we’re expecting, but it does contain complementary information about the columns. We can deal with this by renaming the columns to include this information and then removing the first row.

In [None]:
new_col_names = ['Country', 'Year', 'Estimated road traffic deaths - All', 
                 'Estimated road traffic deaths - Male', 'Estimated road traffic deaths - Female',
                 'Estimated road traffic death rate - All', 'Estimated road traffic death rate - Male', 
                 'Estimated road traffic death rate - Female']

df.columns = new_col_names

df

Now, we can remove the first row.

In [None]:
df = df.drop(index=df.index[0], axis=0)
df
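
The renaming above is written by hand. As a hedged alternative, the sketch below builds the new names by joining each header with the text found in the first row, then drops that row. The toy frame, its labels, and the helper name `merge_header_row` are assumptions for illustration; the real file's headers may differ.

```python
import pandas as pd

# Toy frame (hypothetical values): row 0 carries sub-labels, not data
raw = pd.DataFrame({
    "Country": [None, "A", "B"],
    "Deaths": ["Both sexes", "10 [8-12]", "20 [15-25]"],
    "Deaths.1": ["Male", "7 [5-9]", "14 [10-18]"],
})

def merge_header_row(frame):
    """Append the first-row text to each column name, then drop that row."""
    sub = frame.iloc[0]
    names = [
        # Keep the name as-is when the first row has nothing to add;
        # otherwise strip pandas' '.1' duplicate suffix and append the label
        col if pd.isna(sub[col]) else f"{col.split('.')[0]} - {sub[col]}"
        for col in frame.columns
    ]
    out = frame.iloc[1:].copy()
    out.columns = names
    return out.reset_index(drop=True)

clean = merge_header_row(raw)
print(clean.columns.tolist())
```

Writing the names by hand, as done in this notebook, is more explicit and is perfectly reasonable for a small, fixed set of columns.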

All columns in this dataset have the dtype object. Except for the ‘Country’ column, we need to convert all the data to a numerical format. The ‘Year’ column is the easiest to deal with, since we can apply the pd.to_numeric function directly.

In [None]:
df.info()

In [None]:
df['Year'] = pd.to_numeric(df['Year'], errors='coerce') 
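
As a small illustration of what `errors='coerce'` does, the sketch below uses made-up values (not taken from the dataset): entries that cannot be parsed become `NaN` instead of raising an error, so it is worth counting them afterwards.

```python
import pandas as pd

# Made-up values: two valid years and one entry that cannot be parsed
years = pd.Series(["2019", "2018", "n/a"])

parsed = pd.to_numeric(years, errors="coerce")

# Unparseable entries are silently turned into NaN, so count them explicitly
n_missing = parsed.isna().sum()
print(parsed)
print("values coerced to NaN:", n_missing)
```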

The columns containing the estimated number of deaths (or death rate) need more steps to reach a ‘workable’ format. The data is presented as ‘xxxx [yyyy-zzzz]’, where the first number (x) is the estimated value, while the second (y) and third (z) numbers are (presumably) the lower and upper bounds of a confidence interval. We can split each column into three, containing: 1) the estimated value; 2) the minimum value; and 3) the maximum value. The function below performs the required steps.

In [None]:
def split_function (df, col_name):

    '''
    Transform columns with values like '4339 [3751-4930]'
    into three separate numeric columns (4339, 3751, 4930),
    converting each of them from 'object' to a numeric dtype.

    Input: DataFrame, list of column names
    Output: the same DataFrame, modified in place

    '''

    #Splitting first number from the rest
    for col in col_name:
        df[[col,col + ' (Min Value)']] = df[col].str.split(expand=True)

    #Creating the list of names for the lower bounds. Ex: Male -> Male (Min Value)
    col_name_min = [(x +' (Min Value)') for x in col_name]

    #Creating the list of names for the upper bounds. Ex: Male -> Male (Max Value)
    col_name_max = [(x +' (Max Value)') for x in col_name]

    #Stripping the numbers from inside the brackets
    for col in col_name_min:
        df[col] = df[col].str.strip('[]')

    #Splitting the remaining string into two numbers
    for i, col_min in enumerate (col_name_min):
        df[[col_min, col_name_max[i]]] = df[col_min].str.split('-',expand=True)

    #Creating a new list with columns names
    all_columns = [x for x in col_name]
    all_columns.extend (col_name_min)
    all_columns.extend (col_name_max)

    #Converting the columns from object to numeric
    for col in all_columns:
        df[col] = pd.to_numeric(df[col], errors='coerce')    

    return df
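
For comparison, the same parsing can be sketched with a single regular expression and `Series.str.extract`. This is an alternative under stated assumptions, not the method used in this notebook: the pattern assumes non-negative numbers with optional decimals, and the helper name `split_interval`, the group names, and the second sample value are hypothetical.

```python
import pandas as pd

# Pattern for 'value [low-high]'; assumes non-negative numbers, optional decimals
PATTERN = r"^\s*(?P<est>[\d.]+)\s*\[(?P<low>[\d.]+)-(?P<high>[\d.]+)\]\s*$"

def split_interval(series):
    """Return a float DataFrame with the columns est, low and high."""
    # Each named group becomes one column; rows that do not match become NaN
    parts = series.str.extract(PATTERN)
    return parts.astype(float)

# First value is the example from the docstring above; the second is made up
sample = pd.Series(["4339 [3751-4930]", "15.9 [13.5-18.4]"])
result = split_interval(sample)
print(result)
```

The step-by-step version in the function above is easier to follow; the regular expression mainly helps when malformed rows should be flagged as missing rather than split incorrectly.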

After defining our function, we specify the columns we want to transform and then call the function to do it.

In [None]:
columns_to_clean = ['Estimated road traffic deaths - All', 'Estimated road traffic deaths - Male',
                    'Estimated road traffic deaths - Female','Estimated road traffic death rate - All',
                    'Estimated road traffic death rate - Male', 'Estimated road traffic death rate - Female']

df = split_function(df, columns_to_clean)
df

In [None]:
df.info()

Now, all columns are in the desired format. The dataset is ready for the analysis stage.

Before moving on to the next section, let’s see some descriptive statistics about the data.

In [None]:
df.describe().T

One thing that draws attention is that the estimated number of deaths among men is noticeably higher than among women. The death rate (deaths per 100,000 inhabitants) reaches 110.6 for men, against 29.5 for women.

Note: The color scale used in the maps will follow a range from 0 to the maximum number of the analyzed metric. It’s important to keep this in mind to avoid mistaken comparisons among different plots.
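
When two maps do need to be compared directly, one option is to give them the same scale. The sketch below computes a common `(0, max)` range across several columns; the helper name `shared_color_range` and the toy values are assumptions for illustration.

```python
import pandas as pd

def shared_color_range(frame, columns):
    """Return (0, max) across the given columns, for use as a common scale."""
    upper = frame[columns].max().max()
    return (0, float(upper))

# Hypothetical rates, only to show the mechanics
toy = pd.DataFrame({"rate_m": [10.0, 40.0], "rate_f": [3.0, 12.0]})
rng = shared_color_range(toy, ["rate_m", "rate_f"])
print(rng)

# rng could then be passed as range_color=rng to each px.choropleth call
```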

## Estimated Road Traffic Deaths by Country - Interactive Maps with Plotly

Let’s start by plotting the map with estimated road traffic deaths per country in 2019.

In [None]:
fig = px.choropleth(df[df['Year'] == 2019], locations="Country", locationmode='country names',
                    color="Estimated road traffic deaths - All",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Jet,
                    labels= {'Estimated road traffic deaths - All': '# of deaths'},
                    title = 'Estimated road traffic deaths in 2019')

fig.update(layout=dict(title=dict(x=0.5)))

fig.show()

The total number of deaths doesn’t seem to be the most insightful metric, since more populated countries (mainly China and India) have, as expected, significantly higher numbers. From now on, we explore the number of deaths per 100,000 inhabitants (death rate).
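
The dataset already provides the death rate, so nothing needs to be computed here. As a reminder of what the metric means, the sketch below shows the arithmetic with made-up numbers; the function name `death_rate_per_100k` is hypothetical.

```python
def death_rate_per_100k(deaths, population):
    """Deaths per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 100_000 / population

# Made-up example: 1,500 deaths in a country of 10 million inhabitants
rate = death_rate_per_100k(1_500, 10_000_000)
print(rate)
```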

In [None]:
fig = px.choropleth(df[df['Year'] == 2019], locations="Country", locationmode='country names',
                    color="Estimated road traffic death rate - All",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Jet,
                    labels= {'Estimated road traffic death rate - All': 'Death Rate'},
                    title = 'Estimated road traffic deaths per 100,000 inhabitants in 2019')

fig.update(layout=dict(title=dict(x=0.5)))

fig.show()

Observations:
* The Dominican Republic has the highest traffic death rate (64.6). 
* Venezuela also has a considerably high rate (39).
* Analyzing by continent, Africa has a clearly higher death rate than the rest, with several of its countries above 30 deaths per 100,000 inhabitants.
* Outside the African continent and the countries already mentioned, Saudi Arabia, Thailand and Vietnam stand out for also having a death rate over 30.
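
Observations like these can be cross-checked in a table rather than read off the map. The sketch below ranks countries by a rate column for a given year using `DataFrame.nlargest`; the helper name `top_rates`, the short column name `rate`, and all values are hypothetical.

```python
import pandas as pd

def top_rates(frame, year, column, n=5):
    """Return the n rows with the highest value of `column` in `year`."""
    rows = frame[frame["Year"] == year]
    return rows.nlargest(n, column)[["Country", column]]

# Hypothetical data, only to show the mechanics
toy = pd.DataFrame({
    "Country": ["X", "Y", "Z", "X"],
    "Year": [2019, 2019, 2019, 2018],
    "rate": [30.0, 12.5, 41.0, 50.0],
})

top = top_rates(toy, 2019, "rate", n=2)
print(top)
```

With the notebook's DataFrame, the equivalent call would use `df` and the column 'Estimated road traffic death rate - All'.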

Let’s look into the rate by gender.

In [None]:
fig1 = px.choropleth(df[df['Year'] == 2019], locations="Country", locationmode='country names',
                    color="Estimated road traffic death rate - Male",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Jet,
                    labels= {'Estimated road traffic death rate - Male': 'Death Rate'},
                    title = 'Road traffic death rate in 2019 - Men (deaths per 100,000 male inhabitants)')

fig1.update(layout=dict(title=dict(x=0.5)))                    
                   
fig1.show()


fig2 = px.choropleth(df[df['Year'] == 2019], locations="Country", locationmode='country names',
                    color="Estimated road traffic death rate - Female",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Jet,
                    labels= {'Estimated road traffic death rate - Female': 'Death Rate'},
                    title = 'Road traffic death rate in 2019 - Women (deaths per 100,000 female inhabitants)')

fig2.update(layout=dict(title=dict(x=0.5)))  

fig2.show()

The first plot (men) resembles the map for the total traffic death rate. The most noticeable difference is a greater presence of light green and blue instead of green/yellow in the African continent, indicating a proportionally lower death rate relative to the highest value on the scale (Dominican Republic).

In contrast, the second plot (women) shows more red/orange in the African continent, indicating that several of its countries are close to the upper limit of the scale for the death rate among women. Liberia has the highest traffic death rate among women in 2019 (24.9).

We end this brief notebook with animated plots to show the evolution in the estimated traffic death rate by country from 2000 to 2019.

In [None]:
#In the original dataset, the years are presented starting at 2019 and ending at 2000
#To set the animation in the right order (2000 to 2019), we need to sort the dataset
sorted_df = df.sort_values(by=['Year'])

fig = px.choropleth(sorted_df, locations="Country", locationmode='country names',
                    color="Estimated road traffic death rate - All",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Jet,
                    animation_frame= "Year", range_color = [0,70],
                    labels= {'Estimated road traffic death rate - All': 'Death Rate'},
                    title = 'Estimated road traffic deaths per 100,000 inhabitants from 2000 to 2019')

fig.update(layout=dict(title=dict(x=0.5)))
           
fig.show()
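
Plotly Express appears to build the animation frames in the order in which the `animation_frame` values first occur in the data, which is why the sort matters. The sketch below, on a hypothetical toy frame, checks that order with pandas alone before any plotting is done.

```python
import pandas as pd

# Hypothetical data listed from the most recent year to the oldest
toy = pd.DataFrame({
    "Country": ["A", "B", "A", "B"],
    "Year": [2019, 2019, 2018, 2018],
    "rate": [10.0, 20.0, 11.0, 22.0],
})

# Order of first appearance before sorting: most recent year first
frames_before = toy["Year"].drop_duplicates().tolist()

# After sorting by year, the first appearances are in chronological order
ordered = toy.sort_values(by=["Year"])
frames_after = ordered["Year"].drop_duplicates().tolist()

print(frames_before, frames_after)
```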

In [None]:
fig1 = px.choropleth(sorted_df, locations="Country", locationmode='country names',
                    color="Estimated road traffic death rate - Male",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Jet,
                    animation_frame= "Year", range_color = [0,110],
                    labels= {'Estimated road traffic death rate - Male': 'Death Rate'},
                    title = 'Road traffic death rate from 2000 to 2019 - Men (deaths per 100,000 male inhabitants)')

fig1.update(layout=dict(title=dict(x=0.5)))

fig1.show()


fig2 = px.choropleth(sorted_df, locations="Country", locationmode='country names',
                    color="Estimated road traffic death rate - Female",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Jet,
                    animation_frame= "Year", range_color = [0,30],
                    labels= {'Estimated road traffic death rate - Female': 'Death Rate'},
                    title = 'Road traffic death rate from 2000 to 2019 - Women (deaths per 100,000 female inhabitants)')

fig2.update(layout=dict(title=dict(x=0.5)))

fig2.show()
