### Visualizing the history of the Ebola epidemic

**Note**: This is last year's exercise analyzing the spread of Ebola. The outbreak has since been contained; nonetheless, we will analyze the history of its spread in the three countries selected below (Guinea, Liberia and Sierra Leone). (You can also run an up-to-date analysis by changing the dates to 2015 and lowering the CSV size threshold from 1000 to 100 bytes.)


The data is available on [this site](http://apps.who.int/gho/data/view.ebola-sitrep.ebola-summary-latest?lang=en), which provides a RESTful API for downloading up-to-date data about the Ebola epidemic in CSV format. We will download the cumulative numbers of cases and deaths for the date range chosen below and generate graphs to visualize them.
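
Because the API answers with plain CSV text, a response body can be passed straight to pandas. A minimal sketch, using a small hypothetical payload (the column names mimic ones used later in this notebook; the "Cases"/"Deaths" labels and the values are made up, not WHO figures):

```python
import io

import pandas as pd

# Hypothetical response body (bytes), standing in for requests.get(url).content.
payload = (
    b"COUNTRY (DISPLAY),EBOLA_MEASURE (DISPLAY),Numeric\n"
    b"Guinea,Cases,10\n"
    b"Guinea,Deaths,4\n"
)

# read_csv accepts any file-like object, so wrap the raw bytes in BytesIO.
report = pd.read_csv(io.BytesIO(payload))
print(report.shape)  # (2, 3)
```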

In [None]:
# The URL through which the data can be retrieved is the following.
# The %s marks where the date should be inserted, in ISO format such as "2014-11-20".

base_url='http://apps.who.int/gho/athena/xmart/data-verbose.csv?'\
    'target=EBOLA_MEASURE/CASES,DEATHS&profile=verbose&'\
    'filter=COUNTRY:GIN;COUNTRY:UNSPECIFIED;COUNTRY:LBR;COUNTRY:UNSPECIFIED;'\
    'COUNTRY:SLE;COUNTRY:UNSPECIFIED;LOCATION:-;'\
    'DATAPACKAGEID:%s;INDICATOR_TYPE:SITREP_CUMULATIVE;'\
    'INDICATOR_TYPE:SITREP_CUMULATIVE_21_DAYS;'
base_url
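
The `%s` placeholder is filled with an ISO date string, and `str()` of a `datetime.date` already produces exactly that format. A sketch that builds one URL per day of a range; `template` is a shortened hypothetical stand-in for `base_url`, and `urls_for_range` is an illustrative helper, not part of the WHO API:

```python
from datetime import date, timedelta


def urls_for_range(template, start, end):
    """Return (iso_date, url) pairs for every day from start to end, inclusive."""
    n_days = (end - start).days
    pairs = []
    for i in range(n_days + 1):
        day = str(start + timedelta(days=i))  # e.g. "2014-11-20"
        pairs.append((day, template % day))
    return pairs


# Shortened hypothetical stand-in for base_url; only the %s placeholder matters.
template = "http://example.org/data.csv?filter=DATAPACKAGEID:%s;"
pairs = urls_for_range(template, date(2014, 11, 1), date(2014, 11, 3))
for day, url in pairs:
    print(day, url)
```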

In [None]:
# Use requests.get and pandas to create a dictionary of dataframes, one for each date.
# Dates for which the downloaded file is smaller than 1000 bytes are ignored
# (because they contain no data).

# Inline plots; %pylab also defines np, plt, shape and show used in later cells.
%pylab inline

import io
from datetime import date, timedelta as td

import pandas as pd
import requests

d1 = date(2014, 11, 1)
d2 = date(2014, 12, 5)
delta = d2 - d1

DF = {}
for i in range(delta.days + 1):
    D = str(d1 + td(days=i))  # ISO date string, e.g. "2014-11-01"
    url = base_url % D

    resp = requests.get(url)
    if len(resp.content) > 1000:
        # parse the CSV body of the response into DF[D]
        DF[D] = pd.read_csv(io.BytesIO(resp.content))
        print('"%s"' % D, len(resp.content), DF[D].shape)
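
The loop treats any report of 1000 bytes or less as empty (the Note above suggests lowering this threshold to 100 for 2015 data). A sketch of that filter as a reusable function, with hypothetical byte strings standing in for `requests` response bodies so that it runs offline; `nonempty_reports` and `min_bytes` are illustrative names, not from the original:

```python
def nonempty_reports(bodies, min_bytes=1000):
    """Keep only the (date, body) pairs whose body is longer than min_bytes."""
    return {day: body for day, body in bodies.items() if len(body) > min_bytes}


# Hypothetical response bodies keyed by ISO date.
bodies = {
    "2014-11-01": b"",          # no report published that day
    "2014-11-12": b"x" * 5000,  # stands in for a full CSV report
}
kept = nonempty_reports(bodies)
print(sorted(kept))  # ['2014-11-12']
```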


In [None]:
# Inspect the first rows (and the column names) of the report for "2014-11-12"
DF["2014-11-12"].head()

In [None]:
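# Some reports call the date column 'EPI_WEEK (DISPLAY)' and others
# 'EPI_DATE (DISPLAY)'; rename the former so that all reports agree.
for df in DF.values():
    df.rename(columns={'EPI_WEEK (DISPLAY)': 'EPI_DATE (DISPLAY)'}, inplace=True)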
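
Why the renaming matters: `pd.concat` aligns rows by column name, so two differently named date columns would end up side by side, each half empty. A small sketch on two hypothetical one-row reports:

```python
import pandas as pd

# Two hypothetical reports that name the same date column differently.
a = pd.DataFrame({"EPI_WEEK (DISPLAY)": ["20 November 2014"], "Numeric": [1]})
b = pd.DataFrame({"EPI_DATE (DISPLAY)": ["27 November 2014"], "Numeric": [2]})

# Without renaming, concat keeps both columns and fills the gaps with NaN.
before = pd.concat([a, b], ignore_index=True)
print(before.isna().sum().sum())  # 2

# After renaming, the rows line up under a single date column.
a = a.rename(columns={"EPI_WEEK (DISPLAY)": "EPI_DATE (DISPLAY)"})
after = pd.concat([a, b], ignore_index=True)
print(list(after.columns))  # ['EPI_DATE (DISPLAY)', 'Numeric']
```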


In [None]:
# merge the dataframes in DF into a single dataframe called DF_combined
# Find the names of the columns that are common to all of the dataframes.
# Restrict the data frames to the rows where the measurements have been 
# cumulative (rather than for the last 21 days) and that the number is confirmed 
# (rather than estimated or probable)



Cols = None
DF_Combined=pd.DataFrame()
for k in DF.keys():
    C=set(DF[k].columns)
    if Cols==None:
        Cols=C
    else:
        Cols= C & Cols # complete    
    df=DF[k]
    df = df[list(Cols)]
    df=df[df["INDICATOR_TYPE (CODE)"]=="SITREP_CUMULATIVE"] # Finish to restrict to  cumulative
    df=df[df["CASE_DEFINITION (CODE)"]=="CONFIRMED"]        # Finish to restrict to confirmed
    DF_Combined = pd.concat([DF_Combined, df]) # concatanate df to DF_Combined
    print k, shape(DF_Combined)
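
The merge cell intersects the column sets first and only then filters rows. A sketch of the same two steps on hypothetical toy reports (the `EXTRA` column and the `PROBABLE` code are made up for illustration):

```python
import pandas as pd

# Hypothetical reports: the second one carries an extra column.
reports = {
    "2014-11-12": pd.DataFrame({
        "INDICATOR_TYPE (CODE)": ["SITREP_CUMULATIVE", "SITREP_CUMULATIVE_21_DAYS"],
        "CASE_DEFINITION (CODE)": ["CONFIRMED", "CONFIRMED"],
        "Numeric": [10, 3],
    }),
    "2014-11-19": pd.DataFrame({
        "INDICATOR_TYPE (CODE)": ["SITREP_CUMULATIVE", "SITREP_CUMULATIVE"],
        "CASE_DEFINITION (CODE)": ["CONFIRMED", "PROBABLE"],
        "Numeric": [12, 5],
        "EXTRA": ["x", "y"],
    }),
}

# Columns present in every report.
common = set.intersection(*(set(df.columns) for df in reports.values()))

# Keep only cumulative, confirmed rows, restricted to the common columns.
kept = []
for df in reports.values():
    df = df[sorted(common)]
    mask = (df["INDICATOR_TYPE (CODE)"] == "SITREP_CUMULATIVE") & (
        df["CASE_DEFINITION (CODE)"] == "CONFIRMED"
    )
    kept.append(df[mask])

combined = pd.concat(kept, ignore_index=True)
print(combined)
```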
    

   

In [None]:
# As it turns out, some of the reports have a column called 'EPI_WEEK (DISPLAY)'
# and others have, for apparently the same meaning, a column called
# 'EPI_DATE (DISPLAY)'. The renaming step above already mapped the former
# onto 'EPI_DATE (DISPLAY)'.

from datetime import datetime

# Transform the dates into Python datetime objects so that they can be compared
# and plotted. This assumes each entry starts with a date such as
# "20 November 2014"; only its first three tokens are parsed.
DF_Combined['date'] = [datetime.strptime(' '.join(s.split()[:3]), "%d %B %Y")
                       for s in DF_Combined['EPI_DATE (DISPLAY)']]
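
Parsing is what makes chronological ordering work: compared as strings, "2 December 2014" sorts before "20 November 2014". A short illustration with hypothetical labels:

```python
from datetime import datetime

labels = ["27 November 2014", "2 December 2014", "20 November 2014"]

# Lexicographic order is not chronological order.
print(sorted(labels))
# ['2 December 2014', '20 November 2014', '27 November 2014']

# Parsing with "%d %B %Y" gives datetime objects that sort chronologically.
parsed = sorted(datetime.strptime(s, "%d %B %Y") for s in labels)
print([d.date().isoformat() for d in parsed])
```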

In [None]:
# cols contains all of the columns we need in DF_Combined
cols=['COUNTRY (DISPLAY)',
 'EBOLA_MEASURE (DISPLAY)',
 'date',
 'Numeric']
cols

In [None]:
# Plot cumulative confirmed cases and deaths over time, one panel per country.

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

myFmt = mdates.DateFormatter('%d %B')

dff = DF_Combined[cols]
dff = dff.sort_values(by=['COUNTRY (DISPLAY)', 'EBOLA_MEASURE (DISPLAY)', 'date'])
Countries = sorted(set(dff['COUNTRY (DISPLAY)']))
types = sorted(set(dff['EBOLA_MEASURE (DISPLAY)']))

# squeeze=False keeps axarr two-dimensional even if there is only one country
fig, axarr = plt.subplots(len(Countries), 1, sharex=True, squeeze=False)
fig.set_size_inches(10, 15)

for ax, C in zip(axarr[:, 0], Countries):
    for t in types:
        data = dff[(dff['COUNTRY (DISPLAY)'] == C) & (dff['EBOLA_MEASURE (DISPLAY)'] == t)]
        ax.plot(data['date'].values, data['Numeric'].values, label=t)

    ax.xaxis.set_major_formatter(myFmt)
    ax.set_title(C)  # use the country name as the title
    ax.legend(loc='best')
    ax.grid()

fig.autofmt_xdate()  # rotate the date labels once all axes are populated
plt.show()
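
An alternative to the nested loop over measures, swapped in here rather than taken from the exercise, is to pivot the long-format table so that each measure becomes its own column. A sketch on hypothetical rows shaped like `dff` (the "Cases"/"Deaths" labels and numbers are made up):

```python
import pandas as pd

# Hypothetical long-format rows with the same columns as `cols`.
dff = pd.DataFrame({
    "COUNTRY (DISPLAY)": ["Guinea"] * 4,
    "EBOLA_MEASURE (DISPLAY)": ["Cases", "Deaths", "Cases", "Deaths"],
    "date": pd.to_datetime(["2014-11-12", "2014-11-12", "2014-11-19", "2014-11-19"]),
    "Numeric": [10, 4, 12, 5],
})

# One row per date, one column per measure, for a single country.
guinea = dff[dff["COUNTRY (DISPLAY)"] == "Guinea"]
wide = guinea.pivot(index="date", columns="EBOLA_MEASURE (DISPLAY)", values="Numeric")
print(wide)
```

Calling `wide.plot(ax=ax)` on one of the subplot axes would then draw one line per measure, labelled by column name.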