# "Malaria Data Visualizations"
> "Three plots summarizing information on incidence of malaria, mortality, and bednet usage."

- toc: false
- branch: master
- badges: true
- comments: true
- categories: [fastpages, jupyter]
- image: images/some_folder/your_image.png
- hide: false
- search_exclude: true
- metadata_key1: metadata_value1
- metadata_key2: metadata_value2
- use_plotly: true

## Assignment Instructions

Create 3 informative visualizations about malaria using Python in a Jupyter notebook, starting with the data sets at https://github.com/rfordatascience/tidytuesday/tree/master/data/2018/2018-11-13. Where appropriate, make the visualizations interactive.

Note There are many libraries you can use for each task. Choose one library and explain why you chose it in your blog.

The incidence data set from the World Health Organization and the bednet data set and death rate data set from the United States: Institute for Health Metrics and Evaluation are used in these plots.  All can be downloaded at the link provided above.

In [80]:
import pandas as pd
import plotly.express as px

In [33]:
incidence = pd.read_csv('incidence-of-malaria.csv')
bednet = pd.read_csv('children-sleeping-under-treated-bednet.csv')

In [34]:
incidence = incidence.rename(columns = {"Incidence of malaria (per 1,000 population at risk)":"Inc"})
bednet = bednet.rename(columns = {"Use of insecticide-treated bed nets (% of under-5 population)":"Nets"})

According to the World Health Organization, 

"Six countries accounted for more than half of all malaria cases worldwide: Nigeria (25%), the Democratic Republic of the Congo (12%), Uganda (5%), and Côte d’Ivoire, Mozambique and Niger (4% each)" (https://www.who.int/news-room/feature-stories/detail/world-malaria-report-2019)

The incidence of malaria and percentage of children sleeping under insecticide-treated bednets in these six countries will be explored below.


In [35]:
top6 = (
    pd.concat(
        [incidence[incidence.Entity == "Nigeria"], 
        incidence[incidence.Entity == "Democratic Republic of Congo"],
        incidence[incidence.Entity == "Uganda"],
        incidence[incidence.Entity == "Cote d'Ivoire"],
        incidence[incidence.Entity == "Mozambique"],
        incidence[incidence.Entity == "Niger"]]
    )
)

In [36]:
top6_nets = (
    pd.concat(
        [bednet[bednet.Entity == "Nigeria"], 
        bednet[bednet.Entity == "Democratic Republic of Congo"],
        bednet[bednet.Entity == "Uganda"],
        bednet[bednet.Entity == "Cote d'Ivoire"],
        bednet[bednet.Entity == "Mozambique"],
        bednet[bednet.Entity == "Niger"]]
    )
)

top6_nets = top6_nets[top6_nets.Year < 2019]

In [49]:
merged = pd.merge(top6, top6_nets, how = 'inner', on = ['Entity', 'Year'])

Unnamed: 0,Entity,Code_x,Year,Inc,Code_y,Nets
0,Nigeria,NGA,2003,409.157078,NGA,1.2
1,Nigeria,NGA,2008,424.655344,NGA,5.5
2,Nigeria,NGA,2010,398.90262,NGA,28.9
3,Nigeria,NGA,2011,372.557183,NGA,16.4
4,Nigeria,NGA,2013,328.654579,NGA,16.6
5,Nigeria,NGA,2014,314.404862,NGA,25.4
6,Nigeria,NGA,2015,296.0814,NGA,43.6
7,Nigeria,NGA,2017,283.064074,NGA,49.1
8,Nigeria,NGA,2018,291.942514,NGA,52.2
9,Democratic Republic of Congo,COD,2001,473.607811,COD,1.0


In [38]:
deaths = pd.read_csv('malaria-deaths-by-region.csv')
deaths = deaths.rename(columns = {"Deaths - Malaria - Sex: Both - Age: All Ages (Number)": "Deaths"})

Unnamed: 0,Entity,Code,Year,Deaths
5621,Syria,SYR,2011,0.0
5746,Timor,TLS,1996,2.618739
4055,Nicaragua,NIC,2013,2.434741
652,Bhutan,BTN,1998,46.853076
1778,Ecuador,ECU,2004,15.172327


In [54]:
fig = px.scatter(
    top6,
    x="Year",
    y="Inc",
    color="Entity",
    trendline = "ols",
    title = "Incidence of Malaria Per 1,000 At-Risk Individuals",
)
fig.show()

In [81]:
fig2 = px.scatter(
    top6_nets,
    x="Year",
    y="Nets",
    color="Entity",
    trendline = "ols",
    title = "Percent of Children Under 5 Sleeping Under an Insectiside-Treated Bednet")

fig2.show()

Socio-demographic Index, or SDI, is a measurement of the development level of a country.  

In [74]:

SDIs = ["Low SDI", "Low-middle SDI", "Middle SDI", "High-middle SDI", "High SDI"]
s1 = deaths[deaths.Entity == "Low SDI"]
s2 = deaths[deaths.Entity == "Low-middle SDI"]
s3 = deaths[deaths.Entity == "Middle SDI"]
s4 = deaths[deaths.Entity == "High-middle SDI"]
s5 = deaths[deaths.Entity == "High SDI"]

SDI = pd.concat([s1,s2,s3,s4,s5])

       Entity Code  Year         Deaths
3332  Low SDI  NaN  1990  415915.923114
3333  Low SDI  NaN  1991  425368.967512
3334  Low SDI  NaN  1992  433697.331953
3335  Low SDI  NaN  1993  442349.120749
3336  Low SDI  NaN  1994  446932.265400
3337  Low SDI  NaN  1995  454380.305001
3338  Low SDI  NaN  1996  461154.753114
3339  Low SDI  NaN  1997  470316.268613
3340  Low SDI  NaN  1998  479664.245254
3341  Low SDI  NaN  1999  488137.907061
3342  Low SDI  NaN  2000  496577.847772
3343  Low SDI  NaN  2001  512711.015493
3344  Low SDI  NaN  2002  523647.402003
3345  Low SDI  NaN  2003  531024.716945
3346  Low SDI  NaN  2004  530887.195049
3347  Low SDI  NaN  2005  525639.704115
3348  Low SDI  NaN  2006  511358.446652
3349  Low SDI  NaN  2007  501561.261653
3350  Low SDI  NaN  2008  484558.903494
3351  Low SDI  NaN  2009  466484.030464
3352  Low SDI  NaN  2010  447267.978823
3353  Low SDI  NaN  2011  423547.980525
3354  Low SDI  NaN  2012  398521.798110
3355  Low SDI  NaN  2013  382357.891503


Unnamed: 0,Entity,Code,Year,Deaths
3332,Low SDI,,1990,415915.923114
3333,Low SDI,,1991,425368.967512
3334,Low SDI,,1992,433697.331953
3335,Low SDI,,1993,442349.120749
3336,Low SDI,,1994,446932.2654


In [75]:
fig2 = px.area(SDI, x = "Year", y = "Deaths", color = "Entity", title = "Malaria Deaths by Socio-Demographic Index")
fig2.show()

In [60]:
from geopy.geocoders import Nominatim
import time
from pprint import pprint

app = Nominatim(user_agent="tutorial")

In [78]:
coor = pd.DataFrame()

for i in range(len(deaths)):
    if type(deaths.iloc[i]['Deaths']) != 0:
        if type(deaths.iloc[i]['Code']) == str:
        
            location = app.geocode(deaths.iloc[i]['Entity']).raw
            coor = pd.concat(
                [coor, 
                 pd.DataFrame(
                     {"latitude" : location['lat'],
                      "longitude" : location['lon'],
                      "Entity" : deaths.iloc[i]['Entity'],
                      "Deaths" : deaths.iloc[i]['Deaths'],
                      "Year" : deaths.iloc[i]['Year']
                     },
                     index = [i]
                 )
         
                ]
            )


In [79]:
df = px.data.gapminder()
fig3 = px.scatter_geo(coor, lat = "latitude", lon = "longitude",hover_name="Entity", size = "Deaths",
               animation_frame="Year", projection="natural earth")
fig3.show()

In [66]:
coor

Unnamed: 0,latitude,longitude,Entity,Deaths
0,33.7680065,66.2385139,Afghanistan,463.612423
1,33.7680065,66.2385139,Afghanistan,487.191614
2,33.7680065,66.2385139,Afghanistan,521.714216
3,33.7680065,66.2385139,Afghanistan,675.657696
4,33.7680065,66.2385139,Afghanistan,782.445339
...,...,...,...,...
1566,-2.9814344,23.8222636,Democratic Republic of Congo,81622.267994
1567,-2.9814344,23.8222636,Democratic Republic of Congo,81226.476656
1568,55.670249,10.3333283,Denmark,0.000000
1569,55.670249,10.3333283,Denmark,0.000000
