# Informative Visualization about Malaria

In this blog post, we present information about malaria through interactive visualizations. The malaria datasets were retrieved from the [rfordatascience](https://github.com/rfordatascience/tidytuesday/tree/master/data/2018/2018-11-13) TidyTuesday repository on GitHub.

We use Plotly (through its `plotly.express` interface) to build the interactive plots.

## Preparing the Dataset for Plotting

```python
import pandas as pd


def combine_malaria_dataset(url_deaths, url_deaths_age, url_inc):
    """Combine the three malaria CSV files from the TidyTuesday repo into one DataFrame for plotting."""
    malaria_deaths = pd.read_csv(url_deaths)
    malaria_inc = pd.read_csv(url_inc)
    # Use the first column of the age-group file as the index.
    malaria_deaths_age = pd.read_csv(url_deaths_age, index_col=0)

    # Capitalize the first letter of each column name (e.g. "entity" -> "Entity")
    # so the join keys match the other two files.
    malaria_deaths_age.columns = malaria_deaths_age.columns.str.title()

    # Join on the columns shared by all three files (Entity, Code, Year).
    common_cols = list(
        set(malaria_deaths.columns)
        & set(malaria_deaths_age.columns)
        & set(malaria_inc.columns)
    )

    # Left joins keep every country-year in the deaths file; the age-group
    # join then repeats each country-year once per age group.
    malaria_df = (
        malaria_deaths
        .merge(malaria_inc, how="left", on=common_cols)
        .merge(malaria_deaths_age, how="left", on=common_cols)
    )

    # Replace the long source column names (relies on the column order above).
    malaria_df.columns = ["Entity", "Code", "Year",
                          "Total_death_per_million", "Incidence_per_thousand",
                          "Age_group", "Deaths_per_age_group"]
    malaria_df = malaria_df.reset_index(drop=True)

    return malaria_df
```
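
The two left joins in `combine_malaria_dataset` are easiest to see on a tiny example. The sketch below uses made-up toy tables with simplified column names (the entity "A", the code "AAA", and every number are hypothetical, not values from the malaria files). It shows that a country-year with no incidence row gets `NaN`, and that joining the age-group table repeats each country-year once per age group. The `validate` arguments are an optional addition that the function above does not use.

```python
import pandas as pd

# Toy stand-ins for the three CSV files (illustrative values only).
deaths = pd.DataFrame({
    "Entity": ["A", "A"],
    "Code": ["AAA", "AAA"],
    "Year": [2000, 2001],
    "Death_rate": [10.0, 12.0],
})
incidence = pd.DataFrame({
    "Entity": ["A"],
    "Code": ["AAA"],
    "Year": [2000],  # no incidence row for 2001
    "Incidence": [150.0],
})
deaths_age = pd.DataFrame({
    "Entity": ["A"] * 4,
    "Code": ["AAA"] * 4,
    "Year": [2000, 2000, 2001, 2001],
    "Age_group": ["Under 5", "5-14", "Under 5", "5-14"],
    "Deaths": [30.0, 5.0, 28.0, 6.0],
})

keys = ["Entity", "Code", "Year"]

merged = (
    deaths
    # Left join keeps every country-year from the deaths table;
    # the 2001 row has no incidence match, so it gets NaN.
    .merge(incidence, how="left", on=keys, validate="one_to_one")
    # Each country-year matches one row per age group, so rows multiply.
    .merge(deaths_age, how="left", on=keys, validate="one_to_many")
)

print(merged)
```

If a key were ever duplicated unexpectedly in the deaths or incidence file, `validate` would make pandas raise a `MergeError` instead of silently multiplying rows.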

```python
url_deaths = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-11-13/malaria_deaths.csv"
url_inc = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-11-13/malaria_inc.csv"
url_deaths_age = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-11-13/malaria_deaths_age.csv"

malaria_df = combine_malaria_dataset(url_deaths, url_deaths_age, url_inc)
malaria_df.head()
```

|   | Entity | Code | Year | Total_death_per_million | Incidence_per_thousand | Age_group | Deaths_per_age_group |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1990 | 6.80293 | NaN | Under 5 | 184.606435 |
| 1 | Afghanistan | AFG | 1990 | 6.80293 | NaN | 70 or older | 10.72885 |
| 2 | Afghanistan | AFG | 1990 | 6.80293 | NaN | 5-14 | 53.352844 |
| 3 | Afghanistan | AFG | 1990 | 6.80293 | NaN | 15-49 | 414.709676 |
| 4 | Afghanistan | AFG | 1990 | 6.80293 | NaN | 50-69 | 60.541746 |


```python
malaria_df.Deaths_per_age_group
```

```
0        184.606435
1         10.728850
2         53.352844
3        414.709676
4         60.541746
            ...    
30775    745.340029
30776     66.572213
30777    177.953936
30778    453.902190
30779     97.402058
Name: Deaths_per_age_group, Length: 30780, dtype: float64
```

## Interactive Visualization for Malaria Data

### Malaria Death and Incidence Rates at the Country Level, 1990 to 2016

One interesting way to explore this dataset is to track how malaria deaths and malaria incidence change over time in each country. We therefore generated two time series plots with one line per country: the malaria death rate and the malaria incidence rate. Incidence is missing for many country-years (note the `NaN` values in the table above), so the incidence plot only shows the years that have data. Both plots are interactive, so you can choose which countries to compare.

To interact with the plots, you can:
1. Double-click a country in the legend to isolate it and hide all other countries.
2. Scroll through the legend and single-click other countries to add them to the selection.
3. Double-click in the legend again to exit the selection mode and show all countries.
4. Hover over a line to see the value at the cursor location.

```python
import plotly.express as px

# malaria_df repeats each country-year once per age group, so keep one row
# per (Entity, Year) to draw a single point per year on each line.
deaths_by_country = malaria_df.drop_duplicates(subset=["Entity", "Year"])

fig = px.line(deaths_by_country,
              x="Year",
              y="Total_death_per_million",
              color="Entity",
              labels={"Entity": "Country",
                      "Year": "Year",
                      "Total_death_per_million": "Total Number of Deaths per Million"})
fig.show()
```
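
The `head()` output earlier shows that the age-group join leaves five rows per country and year, each carrying the same `Total_death_per_million` value. The sketch below, on hypothetical toy rows that reuse the `malaria_df` column names, shows how `drop_duplicates(subset=["Entity", "Year"])` collapses those copies to one row per country-year, which is all a per-country line needs.

```python
import pandas as pd

# Toy rows shaped like malaria_df: one row per (country, year, age group).
# Entities "A"/"B" and all values are made up for illustration.
toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "A", "B", "B"],
    "Year": [1990, 1990, 1991, 1991, 1990, 1990],
    "Total_death_per_million": [6.8, 6.8, 7.1, 7.1, 3.2, 3.2],
    "Age_group": ["Under 5", "5-14", "Under 5", "5-14", "Under 5", "5-14"],
})

# Keep the first row of each country-year; the rate is identical on every copy.
per_country_year = (
    toy.drop_duplicates(subset=["Entity", "Year"])
    .loc[:, ["Entity", "Year", "Total_death_per_million"]]
    .sort_values(["Entity", "Year"])
    .reset_index(drop=True)
)

# Sanity check: each (Entity, Year) pair now appears exactly once.
assert not per_country_year.duplicated(subset=["Entity", "Year"]).any()
print(per_country_year)
```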

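```python
# Keep only rows that have an incidence value, then one row per country-year.
incidence_by_country = (
    malaria_df
    .dropna(subset=["Incidence_per_thousand"])
    .drop_duplicates(subset=["Entity", "Year"])
)

fig = px.line(incidence_by_country,
              x="Year",
              y="Incidence_per_thousand",
              color="Entity",
              labels={"Entity": "Country",
                      "Year": "Year",
                      "Incidence_per_thousand": "Malaria Incidence (per 1,000 population at risk)"})
fig.show()
```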
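
Called without arguments, `DataFrame.dropna()` drops a row when *any* column is missing, not only the column being plotted. The sketch below uses made-up rows (the entity "Region X" with a missing `Code` is hypothetical) to compare it with `dropna(subset=[...])`, which only checks the columns passed in.

```python
import numpy as np
import pandas as pd

# Toy rows with the malaria_df column names; values are illustrative only.
toy = pd.DataFrame({
    "Entity": ["A", "A", "Region X"],
    "Code": ["AAA", "AAA", np.nan],      # hypothetical row without a country code
    "Year": [2000, 2005, 2000],
    "Incidence_per_thousand": [150.0, np.nan, 90.0],
})

# Drops every row with a NaN anywhere: loses the missing-incidence row
# and also "Region X", even though its incidence value is present.
all_cols = toy.dropna()

# Drops only rows where the plotted column itself is missing.
incidence_only = toy.dropna(subset=["Incidence_per_thousand"])

print(len(all_cols), len(incidence_only))
```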

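### Malaria Deaths by Age Group at the Country Level

Another way to look at malaria mortality is to compare the number of deaths across age groups. The box plot below shows, for each age group, the distribution of deaths across countries, with one box per year.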

```python
fig = px.box(malaria_df,
             x="Age_group",
             y="Deaths_per_age_group",
             color="Year")

# Show y-axis ticks as whole numbers with thousands separators (e.g. 12,000).
fig.update_layout(yaxis=dict(tickformat=",.0f"))

fig.show()
```
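
Each box summarizes deaths across countries for one age group and year. If you want the central part of that summary as numbers, a `groupby` with quantiles is one way to get it. The sketch below uses a few made-up rows with the same column names as `malaria_df` (entities "A" to "C" and all values are hypothetical) and computes the median and quartiles, which are what the boxes display; Plotly's own quartile method may differ slightly from pandas' default linear interpolation.

```python
import pandas as pd

# Made-up rows with the same columns as malaria_df (values are illustrative).
toy = pd.DataFrame({
    "Entity": ["A", "B", "C", "A", "B", "C"],
    "Year": [2016] * 6,
    "Age_group": ["Under 5"] * 3 + ["15-49"] * 3,
    "Deaths_per_age_group": [100.0, 300.0, 500.0, 20.0, 40.0, 60.0],
})

# Quartiles and median of deaths across countries, per year and age group.
summary = (
    toy.groupby(["Year", "Age_group"])["Deaths_per_age_group"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()  # one column per quantile level
)
print(summary)
```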