Trying to get my head around pandas.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df = pd.read_csv('../input/ebola.csv')
df.head()

The data does not seem to be sorted in any particular order. Also, the Indicator column can have quite a few values:

In [None]:
df['Indicator'].unique()

We pick the three most important indicators: cumulative cases, cumulative deaths, and new cases in the last 21 days. The other indicators appear to be functions of these three.

In [None]:
str_total = 'Cumulative number of confirmed, probable and suspected Ebola cases'
str_death = 'Cumulative number of confirmed, probable and suspected Ebola deaths'
str_new = 'Number of confirmed, probable and suspected Ebola cases in the last 21 days'
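As a quick sketch of how these three strings can be used to narrow the frame, the snippet below filters with `isin`. The `toy` frame and its values are made up for illustration; only the column names (`Indicator`, `Country`, `Date`, `value`) are taken from the real data.

```python
import pandas as pd

# Indicator strings copied from the cell above.
str_total = 'Cumulative number of confirmed, probable and suspected Ebola cases'
str_death = 'Cumulative number of confirmed, probable and suspected Ebola deaths'
str_new = 'Number of confirmed, probable and suspected Ebola cases in the last 21 days'

# Toy stand-in for the real frame; the rows and values are invented.
toy = pd.DataFrame({
    'Indicator': [str_total, str_death, str_new, 'Some other indicator'],
    'Country': ['Guinea'] * 4,
    'Date': ['2014-09-01'] * 4,
    'value': [100.0, 50.0, 20.0, 7.0],
})

# Keep only the rows whose Indicator is one of the three chosen strings.
keep = [str_total, str_death, str_new]
toy_small = toy[toy['Indicator'].isin(keep)]
print(toy_small['Indicator'].nunique())
```

The same mask should work on `df` itself, assuming the indicator strings match the file exactly.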

Next, we parse the dates, sort the frame by date, and add an extra column 'DateCount' giving the number of days since the earliest date in the data.

In [None]:
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(by='Date')
date0 = df['Date'].min()
df['DateCount'] = (df['Date'] - date0).dt.days
df.head()
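To see what 'DateCount' ends up holding, here is a small sketch on a toy frame with three made-up dates. It uses the vectorized `.dt.days` accessor on the date difference, which should give the same numbers as computing the difference row by row.

```python
import pandas as pd

# Toy frame with invented, deliberately unsorted dates.
toy = pd.DataFrame({'Date': ['2014-09-03', '2014-08-29', '2014-09-01']})

# Parse the strings into datetimes, then sort chronologically.
toy['Date'] = pd.to_datetime(toy['Date'])
toy = toy.sort_values(by='Date')

# Days elapsed since the earliest date in the frame.
date0 = toy['Date'].min()
toy['DateCount'] = (toy['Date'] - date0).dt.days
print(toy['DateCount'].tolist())  # [0, 3, 5]
```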

We can now plot the three indicators over time for Guinea, Liberia and Sierra Leone.

In [None]:
fig, ax = plt.subplots(3, 1, figsize=(10, 20))
# One subplot per country, one line per indicator.
for i, country in enumerate(['Guinea', 'Liberia', 'Sierra Leone']):
    for ind in [str_total, str_death, str_new]:
        mask = (df['Country'] == country) & (df['Indicator'] == ind)
        df[mask].plot(x='Date', y='value', ax=ax[i])
    # Legend labels follow the order of the indicator loop above.
    ax[i].legend(['Total', 'Deaths', 'New (21 days)'])
    ax[i].set_title(country)
    ax[i].set_xlabel('')


I'm not sure I'll be able to do much mathematical analysis of this. Epidemiological models predict roughly exponential growth, at least early in an epidemic, but here it seems that healthcare got things pretty much under control within just a few months, at least in Liberia and Sierra Leone.
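For what it's worth, one simple way to test for exponential spread is to fit a straight line to the logarithm of the cumulative case count: if growth is exponential, the log is linear in time and the slope is the growth rate. The sketch below does this on synthetic data with a known rate, not on the Ebola numbers; `true_rate` and the starting count are arbitrary choices.

```python
import numpy as np

# Synthetic cumulative counts growing at a known rate; not the Ebola data.
days = np.arange(0, 60, 3)
true_rate = 0.05  # per day, chosen arbitrarily for the illustration
cases = 10.0 * np.exp(true_rate * days)

# Under exponential growth, log(cases) = log(10) + rate * days,
# so a degree-1 polynomial fit recovers the rate as the slope.
slope, intercept = np.polyfit(days, np.log(cases), 1)

# Time for the count to double at the fitted rate.
doubling_time = np.log(2) / slope
print(round(slope, 3), round(doubling_time, 1))
```

On the real frame, one could presumably pass 'DateCount' and the cumulative case values for a single country, restricted to the first few months. A log-scale curve that bends over instead of staying straight would be consistent with the spread slowing down.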