# Exploring the WHO's Malaria data (complete)
> A quick look at the WHO malaria database. 

- toc: false 
- badges: false
- comments: false
- categories: [jupyter, malaria, WHO, data, quicklook]
- image: images/malaria.jpg

The World Health Organization (WHO) publishes several datasets on malaria. We have linked [some of them](https://ckan.africadatahub.org/dataset/who-malaria) in the ADH [CKAN data repository](https://ckan.africadatahub.org/). This blog post provides a quick-look tool that showcases the information available in this data. The post is written as a Jupyter notebook; to view the code behind any step, click `Show Code`.

In [None]:
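# Install the notebook extensions into the environment of the current Jupyter kernel
import sys
!{sys.executable} -m pip install jupyter_contrib_nbextensions
# Run the jupyter subcommands as a module (-m) so they use the same interpreter as pip above
!{sys.executable} -m jupyter contrib nbextension install --user
!{sys.executable} -m jupyter nbextension enable python-markdown/main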

In [None]:
#collapse
import altair as alt
import pandas as pd
import requests     
import json         
import matplotlib.pyplot as plt
import math

In [None]:
#hide
# Package list of ADH CKAN
packages = 'https://ckan.africadatahub.org/api/3/action/package_list'

# Make the HTTP request
response = requests.get(packages)

# Use the json module to load CKAN's response into a dictionary
response_dict = json.loads(response.content)

# Check the contents of the response
assert response_dict['success'] is True  # make sure the response is OK

datasets = response_dict['result']         # extract all the packages from the response
#print(len(datasets))                       # print the total number of datasets

#print(datasets)

# Specify the package you are interested in:
package = 'who-malaria'

# Base url for package information. This is always the same.
base_url = 'https://ckan.africadatahub.org/api/3/action/package_show?id='

# Construct the url for the package of interest
package_information_url = base_url + package

# Make the HTTP request
package_information = requests.get(package_information_url)

# Use the json module to load CKAN's response into a dictionary
package_dict = json.loads(package_information.content)

# Check the contents of the response.
assert package_dict['success'] is True  # again, make sure the response is OK
package_dict = package_dict['result']   # we only need the 'result' part from the dictionary
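# NOTE: the resources are picked by position, which assumes their order in the package does not change
data_id_deaths = package_dict['resources'][1]['id']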
data_id_cases = package_dict['resources'][-1]['id']
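
Picking resources by position, as the cell above does, depends on the order in which the resources happen to be listed in the package. A minimal sketch of a more defensive lookup follows. It assumes each entry of `package_dict['resources']` carries a `name` field; the helper `find_resource_id` and the sample names and ids are hypothetical and not part of the original notebook.

```python
def find_resource_id(resources, keyword):
    """Return the id of the single resource whose name contains `keyword`.

    `resources` is a list of dicts such as package_dict['resources'].
    The match is case-insensitive; anything other than exactly one match raises.
    """
    keyword = keyword.lower()
    matches = [r for r in resources if keyword in (r.get('name') or '').lower()]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one resource matching %r, found %d" % (keyword, len(matches))
        )
    return matches[0]['id']


# Illustrative data only: these names and ids are made up for the example.
sample_resources = [
    {'id': 'aaa', 'name': 'Metadata'},
    {'id': 'bbb', 'name': 'Estimated Malaria Deaths'},
    {'id': 'ccc', 'name': 'Estimated Malaria Cases'},
]
print(find_resource_id(sample_resources, 'deaths'))
print(find_resource_id(sample_resources, 'cases'))
```

Failing loudly when zero or several resources match is a deliberate choice here: a silent fallback could load the wrong table without anyone noticing.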

### Get the data
The following datasets have been used in this post:
* [Estimated Malaria Deaths](https://ckan.africadatahub.org/dataset/who-malaria/resource/3fb5a88a-c48b-432d-a9b4-d76b26363705) 
* [Estimated Malaria Cases](https://ckan.africadatahub.org/dataset/a747f73e-6009-43f6-8b55-486409ca92e4/resource/c4bc5b8d-80f6-4b4e-a193-0e8c50bc4d51/download/data.csv) 

In [None]:
#collapse
# Fetch up to 5000 records of a CKAN datastore resource into a DataFrame
def get_data(data_id):
    url = 'https://ckan.africadatahub.org/api/3/action/datastore_search'
    # limit=5000 caps the number of rows returned by this single request
    r = requests.get(url, params={'resource_id': data_id, 'limit': 5000})
    r.raise_for_status()  # fail early on HTTP errors
    c = r.json()
    df = pd.json_normalize(c['result']['records'])
    return df
df_deaths = get_data(data_id_deaths)
df_cases = get_data(data_id_cases)
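
`get_data` asks for at most 5000 rows in a single request, so a larger resource would be cut off without warning. CKAN's `datastore_search` action accepts an `offset` parameter and reports a `total` count in its result, which makes paging possible. The sketch below is one hedged way to do it: `collect_records` and `fetch_page_from_ckan` are hypothetical names, the page size is arbitrary, and the network call uses the standard library rather than `requests` only to keep the example dependency-free.

```python
import json
import urllib.parse
import urllib.request

API_URL = 'https://ckan.africadatahub.org/api/3/action/datastore_search'


def fetch_page_from_ckan(resource_id, offset, limit):
    """Fetch one page of records; returns the 'result' part of CKAN's response."""
    query = urllib.parse.urlencode(
        {'resource_id': resource_id, 'offset': offset, 'limit': limit}
    )
    with urllib.request.urlopen(API_URL + '?' + query) as response:
        payload = json.load(response)
    if not payload.get('success'):
        raise RuntimeError('CKAN reported a failed request')
    return payload['result']


def collect_records(fetch_page, page_size=1000):
    """Call fetch_page(offset, limit) repeatedly until every record is collected."""
    records = []
    while True:
        result = fetch_page(len(records), page_size)
        page = result['records']
        records.extend(page)
        total = result.get('total')
        # Stop on a short (or empty) page, or once the reported total is reached.
        if len(page) < page_size or (total is not None and len(records) >= total):
            return records
```

With the notebook's variables this could be called as `collect_records(lambda offset, limit: fetch_page_from_ckan(data_id_deaths, offset, limit))`, and the resulting list passed to `pd.json_normalize`. Passing the page fetcher in as an argument keeps the paging logic testable without network access.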

The deaths dataset has {{df_deaths.shape[0]}} rows and {{df_deaths.shape[1]}} columns. We are only interested in a few of these columns, and only in African countries, so we select the relevant rows and columns as follows.

In [None]:
#collapse_show
def cut_data(df):
    # Keep only the rows for African countries
    df = df[df.ParentLocation == 'Africa']
    # Keep only the columns we need
    cols = ['SpatialDimValueCode', 'Location', 'Indicator', 'Period',
            'FactValueNumeric', 'FactValueNumericLow', 'FactValueNumericHigh', 'DateModified']
    df = df.loc[:, cols]
    # Use shorter names for the estimate and its lower and upper bounds
    df = df.rename(columns={'FactValueNumeric': 'value', 'FactValueNumericLow': 'low_bound', 'FactValueNumericHigh': 'up_bound'})
    print("New shape: {}".format(df.shape))
    return df

df_deaths = cut_data(df_deaths)
df_cases = cut_data(df_cases)


Let's combine these datasets and explore the data.

In [None]:
#collapse_show
df = pd.merge(df_deaths,df_cases,on=['SpatialDimValueCode','Location','Period'],suffixes=("_deaths","_cases"))
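
The merge above uses the default inner join on country code, country name and year, so any country-year present in only one table is dropped, and duplicated keys would multiply rows. The sketch below, on made-up numbers, shows how the `validate` and `indicator` arguments of `pd.merge` can make both issues visible. The toy frames and their values are purely illustrative and are not WHO figures.

```python
import pandas as pd

# Toy data only: values are invented to illustrate the merge behaviour.
toy_deaths = pd.DataFrame({
    'SpatialDimValueCode': ['KEN', 'KEN', 'UGA'],
    'Location': ['Kenya', 'Kenya', 'Uganda'],
    'Period': [2019, 2020, 2020],
    'value': [10.0, 12.0, 20.0],
})
toy_cases = pd.DataFrame({
    'SpatialDimValueCode': ['KEN', 'KEN', 'UGA'],
    'Location': ['Kenya', 'Kenya', 'Uganda'],
    'Period': [2019, 2020, 2019],
    'value': [100.0, 110.0, 200.0],
})

keys = ['SpatialDimValueCode', 'Location', 'Period']
checked = pd.merge(
    toy_deaths, toy_cases,
    on=keys,
    how='outer',             # keep unmatched rows so they can be inspected
    suffixes=('_deaths', '_cases'),
    validate='one_to_one',   # raise MergeError if a key is duplicated on either side
    indicator=True,          # adds a '_merge' column: both / left_only / right_only
)

# Rows that exist in only one of the two tables
unmatched = checked[checked['_merge'] != 'both']
print(unmatched[keys + ['_merge']])
```

If `unmatched` is empty on the real data, the inner join loses nothing; otherwise it lists exactly which country-years are missing from one side.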

In [None]:
#collapse
#%% create filters

# Unique country names, with None values removed, sorted alphabetically
locations = sorted(loc for loc in df.Location.unique() if loc is not None)
demo_labels = locations.copy()

input_dropdown = alt.binding_select(options=locations, name='Select country',labels=demo_labels)
selection = alt.selection_single(fields=['Location'], bind=input_dropdown,init={'Location':'Kenya'})

In [None]:
#collapse
# Deaths
w = 350 # width
h = 300 # height
title = alt.TitleParams('Estimated number of deaths due to Malaria in Selected Country', anchor='middle')
line = alt.Chart(df, title=title).mark_line().encode(
    alt.X('Period:O', title='Year'),  # :O tells altair that the data is ordinal
    alt.Y('value_deaths', title='Number of Deaths')
).properties(
    width=w,
    height=h
)

#line.show()

# Shaded band between the lower and upper bounds of the estimate
point = alt.Chart(df).mark_area(opacity=0.3).encode(
    alt.X('Period:O'),
    alt.Y('low_bound_deaths'),
    alt.Y2('up_bound_deaths'),
    tooltip=['Period', 'low_bound_deaths', 'value_deaths', 'up_bound_deaths']
).properties(
    width=w,
    height=h
).interactive()

# Cases
title = alt.TitleParams('Estimated number of cases of Malaria in Selected Country', anchor='middle')
line2 = alt.Chart(df, title=title).mark_line().encode(
    alt.X('Period:O', title='Year'),  # :O tells altair that the data is ordinal
    alt.Y('value_cases', title='Number of Cases')
).properties(
    width=w,
    height=h
)

#line.show()

# Shaded band between the lower and upper bounds of the estimate
point2 = alt.Chart(df).mark_area(opacity=0.3).encode(
    alt.X('Period:O'),
    alt.Y('low_bound_cases'),
    alt.Y2('up_bound_cases'),
    tooltip=['Period', 'low_bound_cases', 'value_cases', 'up_bound_cases']
).properties(
    width=w,
    height=h
).interactive()

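# Combine plots: layer each line with its uncertainty band (+),
# then place the deaths and cases charts side by side (|)
x = (line + point) | (line2 + point2)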
x = x.add_selection(
    selection
).transform_filter(
    selection
)
x.save('chart.html')
x