# "Malaria Data Visualizations"
> "Plotly plots summarizing information on incidence of malaria, mortality, and bednet usage."

- toc: false
- branch: master
- badges: true
- comments: true
- categories: [fastpages, jupyter]
- image: images/some_folder/your_image.png
- hide: false
- search_exclude: true
- metadata_key1: metadata_value1
- metadata_key2: metadata_value2
- use_plotly: true

## Assignment Instructions

Create 3 informative visualizations about malaria using Python in a Jupyter notebook, starting with the data sets at https://github.com/rfordatascience/tidytuesday/tree/master/data/2018/2018-11-13. Where appropriate, make the visualizations interactive.

Note: There are many libraries you can use for each task. Choose one library and explain why you chose it in your blog.

These plots use the incidence data set from the World Health Organization and the bednet and death rate data sets from the Institute for Health Metrics and Evaluation (United States). All can be downloaded at the link provided above.

## Library Choice

Plotly Express is used to create all of the graphs in this post. This library was selected because it is a high-level API, which makes creating interactive plots relatively simple. Its documentation (https://plotly.com/python/plotly-express/) has clear, easy-to-follow instructions. It can create a wide range of plots, from simple line graphs to data overlaid on a geographic map.

## Part 1: Incidence vs Bednet Use

According to the World Health Organization, 

"Six countries accounted for more than half of all malaria cases worldwide: Nigeria (25%), the Democratic Republic of the Congo (12%), Uganda (5%), and Côte d’Ivoire, Mozambique and Niger (4% each)" (https://www.who.int/news-room/feature-stories/detail/world-malaria-report-2019)

For this reason, the data for those six countries will be illustrated. The data in this section come from the 'incidence-of-malaria.csv' and 'children-sleeping-under-treated-bednet.csv' files from the website listed in the assignment instructions. The data range from 2000 to 2018.

In [87]:
#hide
import pandas as pd
import plotly.express as px

In [88]:
#hide
incidence = pd.read_csv('incidence-of-malaria.csv')
bednet = pd.read_csv('children-sleeping-under-treated-bednet.csv')

In [89]:
#hide
incidence = incidence.rename(columns = {"Incidence of malaria (per 1,000 population at risk)":"Inc"})
bednet = bednet.rename(columns = {"Use of insecticide-treated bed nets (% of under-5 population)":"Nets"})

In [90]:
#hide
top6_countries = [
    "Nigeria",
    "Democratic Republic of Congo",
    "Uganda",
    "Cote d'Ivoire",
    "Mozambique",
    "Niger",
]

# Stack the rows for each country in the order listed above.
top6 = pd.concat([incidence[incidence.Entity == c] for c in top6_countries])

In [91]:
#hide
top6_nets = pd.concat([bednet[bednet.Entity == c] for c in top6_countries])

top6_nets = top6_nets[top6_nets.Year < 2019]

In [92]:
#hide
merged = pd.merge(top6, top6_nets, how = 'inner', on = ['Entity', 'Year'])
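
The filtering and merging steps above can be sketched on a small example. The values below are made up purely to show the mechanics, and `isin` is used here as a compact alternative way to select several countries at once; it is a sketch, not the exact code behind the plots.

```python
import pandas as pd

# Made-up values, only to show the mechanics of the filter and merge.
incidence = pd.DataFrame({
    "Entity": ["Nigeria", "Nigeria", "Uganda", "Kenya"],
    "Year": [2000, 2005, 2000, 2000],
    "Inc": [400.0, 380.0, 450.0, 100.0],
})
bednet = pd.DataFrame({
    "Entity": ["Nigeria", "Uganda", "Uganda", "Kenya"],
    "Year": [2005, 2000, 2010, 2000],
    "Nets": [5.0, 1.0, 30.0, 3.0],
})

countries = ["Nigeria", "Uganda"]

# isin keeps every row whose Entity appears in the list.
inc_sel = incidence[incidence.Entity.isin(countries)]
nets_sel = bednet[bednet.Entity.isin(countries)]

# An inner merge keeps only country-year pairs present in both tables.
merged = pd.merge(inc_sel, nets_sel, how="inner", on=["Entity", "Year"])
print(merged)
```

Because the merge is an inner join, any country-year pair that appears in only one of the two tables is dropped, so the merged table can be much smaller than either input.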

In [93]:
#hide
deaths = pd.read_csv('malaria-deaths-by-region.csv')
deaths = deaths.rename(columns = {"Deaths - Malaria - Sex: Both - Age: All Ages (Number)": "Deaths"})
deaths = deaths.drop(deaths.index[deaths.Entity == "World"])


The first plot shows that the incidence of malaria in the six countries of interest has generally been decreasing since 2000. Niger is the exception: its best-fit line has a positive slope, although the data points show a clear nonlinear pattern in which incidence begins to decrease after 2012.

In [106]:
#hide_input
fig = px.scatter(
    top6,
    x="Year",
    y="Inc",
    color="Entity",
    trendline = "ols",
    title = "Incidence of Malaria Per 1,000 Population At-Risk",
)
fig.show()
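
A straight best-fit line can have a positive slope even when the most recent values are falling, which is the situation described for Niger. The sketch below uses illustrative numbers (not values from the data set) and a hand-written least-squares slope to show how this happens.

```python
# Illustrative numbers only (not taken from the data set): values rise
# for most of the period and fall in the last two observations.
years = [2000, 2003, 2006, 2009, 2012, 2015, 2018]
values = [300.0, 340.0, 380.0, 420.0, 450.0, 400.0, 360.0]


def ols_slope(x, y):
    """Slope of the least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


slope = ols_slope(years, values)
late_change = values[-1] - values[-3]  # change since the peak

print(f"overall slope: {slope:.2f} per year")
print(f"change over the last two observations: {late_change:.1f}")
```

The overall slope is positive because the early increase outweighs the late decline, so a linear trend line alone can hide a recent downturn.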

The next plot shows that the percentage of children sleeping under insecticide-treated bednets has been steadily increasing since 2000.

In [95]:
#hide_input
fig2 = px.scatter(
    top6_nets,
    x="Year",
    y="Nets",
    color="Entity",
    trendline = "ols",
    title = "Percent of Children Under 5 Sleeping Under an Insecticide-Treated Bednet",
)

fig2.show()

The final plot in this section graphs incidence against bednet use. Every country except Niger, whose incidence follows a nonlinear trend, has a decreasing trend line: as bednet use increases, the incidence of malaria decreases. This is consistent with the common knowledge that bednets are protective against malaria.

In [107]:
#hide_input
fig3 = px.scatter(
    merged,
    x="Nets",
    y="Inc",
    color="Entity",
    trendline = "ols",
    title = "Incidence of Malaria vs. Bednet Use",
)
fig3.show()
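
One way to put a number on the direction of each trend line is the correlation between bednet use and incidence within each country. The sketch below uses made-up values for two hypothetical countries, "A" and "B"; only the column names `Entity`, `Nets`, and `Inc` follow the merged table used above.

```python
import pandas as pd

# Made-up values for two hypothetical countries "A" and "B".
merged = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "B"],
    "Nets": [5.0, 20.0, 40.0, 2.0, 15.0, 35.0],
    "Inc": [400.0, 350.0, 300.0, 500.0, 460.0, 380.0],
})

# Pearson correlation between Nets and Inc within each country.
corr = {
    entity: group["Nets"].corr(group["Inc"])
    for entity, group in merged.groupby("Entity")
}
print(corr)
```

A negative correlation corresponds to a downward-sloping trend line in the scatter plot.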

## Part 2: SDI and Malaria Deaths

Socio-demographic Index, or SDI, is a measurement of the development level of a country. The plot below shows that countries with a lower SDI account for a much higher share of malaria deaths than countries with a higher SDI.

This plot uses the 'malaria-deaths-by-region.csv' from the website listed in the assignment instructions.  

In [108]:
#hide
SDIs = ["Low SDI", "Low-middle SDI", "Middle SDI", "High-middle SDI", "High SDI"]

# Stack the rows for each SDI group, from lowest to highest.
SDI = pd.concat([deaths[deaths.Entity == s] for s in SDIs])

               Entity Code  Year       Deaths
2576  High-middle SDI  NaN  1990   891.897812
2577  High-middle SDI  NaN  1991   896.270236
2578  High-middle SDI  NaN  1992   919.348459
2579  High-middle SDI  NaN  1993   932.652519
2580  High-middle SDI  NaN  1994   960.700283
2581  High-middle SDI  NaN  1995   973.506940
2582  High-middle SDI  NaN  1996   993.896280
2583  High-middle SDI  NaN  1997  1019.451708
2584  High-middle SDI  NaN  1998  1039.771643
2585  High-middle SDI  NaN  1999  1012.920998
2586  High-middle SDI  NaN  2000   979.673976
2587  High-middle SDI  NaN  2001   950.758676
2588  High-middle SDI  NaN  2002   924.450050
2589  High-middle SDI  NaN  2003   909.331911
2590  High-middle SDI  NaN  2004   890.040216
2591  High-middle SDI  NaN  2005   878.209011
2592  High-middle SDI  NaN  2006   848.515531
2593  High-middle SDI  NaN  2007   820.933554
2594  High-middle SDI  NaN  2008   793.274573
2595  High-middle SDI  NaN  2009   772.313083
2596  High-middle SDI  NaN  2010  

In [97]:
#hide_input
fig4 = px.area(SDI, x = "Year", y = "Deaths", color = "Entity", title = "Malaria Deaths by Socio-Demographic Index")
fig4.show()
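
The share of deaths contributed by each SDI group can also be computed directly. The sketch below uses made-up death counts and a hypothetical `Share` column to show the calculation: each group's deaths are divided by the total for the same year.

```python
import pandas as pd

# Made-up death counts, only to show the calculation.
sdi = pd.DataFrame({
    "Entity": ["Low SDI", "High SDI", "Low SDI", "High SDI"],
    "Year": [1990, 1990, 1991, 1991],
    "Deaths": [900.0, 100.0, 750.0, 250.0],
})

# Divide each group's deaths by the total for the same year.
yearly_total = sdi.groupby("Year")["Deaths"].transform("sum")
sdi["Share"] = 100 * sdi["Deaths"] / yearly_total
print(sdi)
```

Plotly Express's `px.area` also accepts a `groupnorm="percent"` argument, which normalizes the stacked areas within the figure itself.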

## Part 3: Geography of Malaria

The plot below illustrates where malaria was present throughout the world from 1990 to 2017, using the same data set as Part 2.

In [98]:
#hide
from geopy.geocoders import Nominatim
import time
from pprint import pprint

app = Nominatim(user_agent="tutorial")

In [99]:
#hide
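# Keep only country rows (those with a country code) that report malaria deaths.
countries = deaths[deaths.Code.notna() & (deaths.Deaths != 0)]

# Geocode each country once rather than once per row.
coords = {}
for entity in countries.Entity.unique():
    location = app.geocode(entity)
    if location is not None:
        coords[entity] = (location.latitude, location.longitude)
    time.sleep(1)  # pause between requests to the geocoding service

# Attach the coordinates to every row of the countries that were found.
coor = countries[countries.Entity.isin(list(coords))][["Entity", "Deaths", "Year"]].copy()
coor["latitude"] = coor.Entity.map(lambda e: coords[e][0])
coor["longitude"] = coor.Entity.map(lambda e: coords[e][1])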



In [104]:
#hide_input
fig5 = px.scatter_geo(coor, lat = "latitude", lon = "longitude",
                      hover_name="Entity", 
                      animation_frame="Year", projection="natural earth")
fig5.show()
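
Because the death data repeats each country once per year, the geocoding step in this part benefits from looking up each distinct name only once. The sketch below shows that idea with a hypothetical stand-in, `fake_geocode`, in place of a real network call. The helper `geocode_unique` and the placeholder coordinates are assumptions for illustration only.

```python
import time


def fake_geocode(name):
    """Hypothetical stand-in for a network geocoder; returns (lat, lon)."""
    fake_geocode.calls += 1
    # Placeholder coordinates, not authoritative values.
    table = {"Nigeria": (9.0, 8.0), "Uganda": (1.0, 32.0)}
    return table.get(name)


fake_geocode.calls = 0


def geocode_unique(names, geocode, delay=0.0):
    """Look up each distinct name once and return a name -> coords dict."""
    cache = {}
    for name in names:
        if name not in cache:
            cache[name] = geocode(name)
            time.sleep(delay)  # pause between requests; raise for a real service
    return cache


rows = ["Nigeria", "Nigeria", "Uganda", "Nigeria", "Uganda"]
coords = geocode_unique(rows, fake_geocode)
print(coords, fake_geocode.calls)
```

With a real geocoding service, the `delay` argument would be set to a positive value so that requests stay within the service's rate limit.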