**Creating Charts that play in the browser**

Using Gapminder's data on birth rates (per 1000 population) and infant mortality rates (per 1000 births), I create an animated GIF that shows the progress India and its neighbouring countries have made in bringing down mortality rates and birth rates.

Gapminder provides socio-economic data that can be freely downloaded from their [website](https://www.gapminder.org/data/).
The original datasets are separate: one CSV file for birth rates, another for mortality rates and a third for population figures.

To create the chart, I merged all three of them together for each country and year.

**Gapminder provides the following datasets:**

- birth_rate_per1000_population.csv
- infant_mortality_per1000_births.csv
- total_population_with_projections.csv
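The merge script itself is not reproduced here. As a minimal sketch, assuming each Gapminder file is in wide format (a `country` column followed by one column per year), a file can be reshaped into the long `Country`/`Year`/value layout used in the rest of this notebook with `DataFrame.melt`. The CSV excerpt, its values and the helper name `to_long` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical excerpt of a wide Gapminder-style CSV: one row per country,
# one column per year (values made up for illustration).
wide_csv = io.StringIO(
    "country,1950,1951\n"
    "India,43.9,43.5\n"
    "Nepal,48.5,48.1\n"
)
wide = pd.read_csv(wide_csv)

def to_long(wide_df, value_name):
    """Reshape a country-by-year table into Country/Year/value rows."""
    long_df = wide_df.melt(id_vars='country', var_name='Year', value_name=value_name)
    long_df = long_df.rename(columns={'country': 'Country'})
    long_df['Year'] = long_df['Year'].astype(int)   # year headers are read as strings
    return long_df.sort_values(['Country', 'Year']).reset_index(drop=True)

births = to_long(wide, 'BirthRatePer1000Population')
print(births)
```

Once every file is in this long layout, the tables can be joined on the `Country` and `Year` columns.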
    
I merged the first two datasets into a single CSV file, "brmr_merged.csv". The Python code for the merge operation is [here](https://github.com/justinpolackal/animated-plots/merge.py).
The resulting CSV file is loaded into a DataFrame below, and the population data is then merged onto it by country and year.
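One plausible way to combine long-format tables by country and year is sketched below, using small made-up frames rather than the real data. The outer join between the two rate tables is an assumption (it keeps a row even when only one of the two rates is known); the left join for population mirrors the `how="left"` merge used in the preprocessing step.

```python
import pandas as pd

# Made-up long-format frames standing in for the three Gapminder datasets
births = pd.DataFrame({'Country': ['India', 'India', 'Nepal'],
                       'Year': [1950, 1951, 1950],
                       'BirthRatePer1000Population': [43.9, 43.5, 48.5]})
mortality = pd.DataFrame({'Country': ['India', 'India'],
                          'Year': [1950, 1951],
                          'MortalityRatePer1000Births': [164.0, 160.0]})
population = pd.DataFrame({'Country': ['India', 'Nepal'],
                           'Year': [1950, 1950],
                           'Population': [370000000, 8000000]})

# Outer join: keep a country/year row even if only one rate is present
rates = births.merge(mortality, how='outer', on=['Country', 'Year'])

# Left join: attach population to the existing country/year rows only
merged = rates.merge(population, how='left', on=['Country', 'Year'])
print(merged)
```

Rows without a match come through as NaN (here Nepal's mortality rate and India's 1951 population), which is why the preprocessing step later fills and drops missing values.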

```python
# Import libraries and load the datasets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./brmr_merged.csv')      # preprocessed data, derived from the original Gapminder datasets
popdf = pd.read_csv('./population.csv')   # population data
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56376 entries, 0 to 56375
Data columns (total 4 columns):
Country                       56160 non-null object
Year                          56160 non-null float64
BirthRatePer1000Population    43284 non-null float64
MortalityRatePer1000Births    13776 non-null object
dtypes: float64(2), object(2)
memory usage: 1.7+ MB
```


```python
df.loc[(df['Country'].isin(['Afghanistan','Bangladesh','China','India','Nepal','Pakistan','Sri Lanka'])) & (df['Year']==1950)]
```

| | Country | Year | BirthRatePer1000Population | MortalityRatePer1000Births |
|---|---|---|---|---|
| 366 | Afghanistan | 1950.0 | 50.008 | NaN |
| 4038 | Bangladesh | 1950.0 | 44.906 | NaN |
| 9654 | China | 1950.0 | 46.907 | 195.0 |
| 21966 | India | 1950.0 | 43.966 | 164.0 |
| 34062 | Nepal | 1950.0 | 48.558 | NaN |
| 37302 | Pakistan | 1950.0 | 41.399 | 279.6 |
| 46374 | Sri Lanka | 1950.0 | 35.617 | 94.8 |


Create a subset of the data for the countries we are interested in:

```python
mydf = df.loc[(df['Country'].isin(['Afghanistan','Bangladesh','China','India','Nepal','Pakistan','Sri Lanka']))].copy()
mydf.head()
```

| | Country | Year | BirthRatePer1000Population | MortalityRatePer1000Births |
|---|---|---|---|---|
| 216 | Afghanistan | 1800.0 | 48.136 | NaN |
| 217 | Afghanistan | 1801.0 | 48.136 | NaN |
| 218 | Afghanistan | 1802.0 | 48.136 | NaN |
| 219 | Afghanistan | 1803.0 | 48.136 | NaN |
| 220 | Afghanistan | 1804.0 | 48.136 | NaN |


For each country, some years may be missing a value for birth rate, mortality rate or population. The code below forward-fills each such gap with the most recent non-null value for the same country.
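Why per country? A plain forward fill over the whole column would carry the last value of one country into the first rows of the next. The toy frame below (made-up values) contrasts the two; `groupby('Country')` followed by `ffill()` is one way to keep each fill within its own country.

```python
import numpy as np
import pandas as pd

# Toy data (made up): Nepal has no mortality value in these years
toy = pd.DataFrame({
    'Country': ['India', 'India', 'India', 'Nepal', 'Nepal'],
    'Year': [1950, 1951, 1952, 1950, 1951],
    'MortalityRatePer1000Births': [164.0, np.nan, 158.0, np.nan, np.nan],
})

# A plain ffill copies India's 1952 value into Nepal's rows
plain = toy['MortalityRatePer1000Births'].ffill()

# Grouping by country keeps the fill inside each country
grouped = toy.groupby('Country')['MortalityRatePer1000Births'].ffill()

print(plain.tolist())    # [164.0, 164.0, 158.0, 158.0, 158.0]
print(grouped.tolist())  # [164.0, 164.0, 158.0, nan, nan]
```

Rows that are still NaN after the grouped fill have no earlier value to borrow, so they are dropped instead.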

```python
# Preprocess the data

# India and a few of its neighbouring countries
clist = ['Afghanistan', 'Bangladesh', 'China', 'India', 'Nepal', 'Pakistan', 'Sri Lanka']

# Select the subset of Gapminder records for these countries
mydf = df.loc[df['Country'].isin(clist)].copy()

# Merge population data by country and year; express population in millions
mydf = mydf.merge(popdf, how="left", on=["Country", "Year"])
mydf['Population'] = mydf['Population'] / 1000000

# The mortality column was read as 'object'; convert it to numbers before filling
# (any unparseable entry becomes NaN and is then filled like a blank cell)
mydf['MortalityRatePer1000Births'] = pd.to_numeric(mydf['MortalityRatePer1000Births'], errors='coerce')

# Forward fill blank cells within each country, in year order
value_cols = ['BirthRatePer1000Population', 'MortalityRatePer1000Births', 'Population']
mydf = mydf.sort_values(['Country', 'Year'])
mydf[value_cols] = mydf.groupby('Country')[value_cols].ffill()

# Delete records where birth rate or mortality rate are still NaN
newdf = mydf.dropna(subset=['BirthRatePer1000Population', 'MortalityRatePer1000Births']).copy()

# Store the result to CSV (only for verification)
newdf.to_csv('./ffill.csv')
```


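```python
# Make sure the numeric columns really are numeric
num_cols = ['Year', 'BirthRatePer1000Population', 'MortalityRatePer1000Births', 'Population']
newdf[num_cols] = newdf[num_cols].apply(pd.to_numeric)
newdf['MortalityRatePer1000Births'].max()
```

```
286.7
```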

```python
newdf.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 490 entries, 160 to 1511
Data columns (total 5 columns):
Country                       490 non-null object
Year                          490 non-null float64
BirthRatePer1000Population    490 non-null float64
MortalityRatePer1000Births    490 non-null float64
Population                    490 non-null float64
dtypes: float64(4), object(1)
memory usage: 23.0+ KB
```


**Generate PNG files**

Loop through the years 1950 to 2014 and save one PNG image per year.
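The year range follows Python's half-open convention: `np.arange(startyear, endyear, 1)`, like the built-in `range`, stops before `endyear`, so `endyear = 2015` makes 2014 the last frame. A quick check of the frame count, the file names, and the resulting GIF length at the 0.4 s per frame used for the animation:

```python
startyear = 1950
endyear = 2015          # exclusive upper bound

years = list(range(startyear, endyear))          # same values as np.arange(startyear, endyear, 1)
filenames = [str(y) + '.png' for y in years]
seconds = len(years) * 0.4                       # total playback time of the GIF

print(len(years))                   # 65
print(filenames[0], filenames[-1])  # 1950.png 2014.png
```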

```python
import os

startyear = 1950
endyear = 2015          # np.arange excludes the end value, so the last frame is 2014
outdir = './savedfigures/'
os.makedirs(outdir, exist_ok=True)   # savefig fails if the folder does not exist

# Fixed axis limits so that every frame uses the same scale
maxx = newdf['MortalityRatePer1000Births'].max() + 10
maxy = newdf['BirthRatePer1000Population'].max() + 10

# One marker colour per country
colors = {'Afghanistan': 'black', 'Bangladesh': 'orange', 'China': 'red', 'India': 'blue',
          'Nepal': 'purple', 'Pakistan': 'green', 'Sri Lanka': 'brown'}
bubble_linecolor = 'skyblue'

for year in np.arange(startyear, endyear, 1):
    gdf = newdf[newdf['Year'] == year]
    legend_labels = []
    legend_lines = []

    fig, ax = plt.subplots()
    ax.set_title("Infant Mortality vs Birth Rate: " + str(year), fontsize=16)
    ax.set_xlabel("Mortality rate (per 1000 births)", fontsize=14)
    ax.set_ylabel("Birth rate (per 1000 population)", fontsize=14)
    for key in colors:
        cdata = gdf[gdf['Country'] == key]
        # Bubble size grows with population (in millions)
        bubblesize = 12 + (cdata['Population'].max() / 100)
        cline, = ax.plot('MortalityRatePer1000Births', 'BirthRatePer1000Population', data=cdata,
                         marker='o', markerfacecolor=colors[key], markersize=bubblesize,
                         color=bubble_linecolor, linewidth=4)
        legend_labels.append(key)
        legend_lines.append(cline)

    ax.set_xlim(0, maxx)
    ax.set_ylim(0, maxy)
    ax.legend(handles=legend_lines, labels=legend_labels, bbox_to_anchor=(1.04, 0.5),
              loc="center left", borderaxespad=0)

    fig.savefig(outdir + str(year) + '.png', bbox_inches="tight")
    plt.close(fig)
```

**Convert the sequence of generated PNG files into an animated GIF**

Using the imageio package, stack the PNG files generated in the previous step into an animated GIF that plays back the change in mortality and birth rates over time.
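imageio is not the only route. As an alternative sketch (not the approach used in this notebook), matplotlib's `FuncAnimation` can redraw a single figure once per frame and write the GIF directly through `PillowWriter`, without saving intermediate PNG files. The toy data and the output name `toy_movie.gif` are made up; `fps=2.5` corresponds to 0.4 s per frame.

```python
import matplotlib
matplotlib.use('Agg')                 # render without a display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

# Toy data (made up for illustration): year -> (mortality rate, birth rate)
toy = {1950: (160.0, 44.0), 1951: (155.0, 43.5), 1952: (150.0, 43.0)}
years = sorted(toy)

fig, ax = plt.subplots()
ax.set_xlim(0, 200)
ax.set_ylim(0, 60)
ax.set_xlabel("Mortality rate (per 1000 births)")
ax.set_ylabel("Birth rate (per 1000 population)")
point, = ax.plot([], [], marker='o', markersize=12, linestyle='None')

def draw(year):
    """Move the marker to this year's values and update the title."""
    x, y = toy[year]
    point.set_data([x], [y])
    ax.set_title("Infant Mortality vs Birth Rate: " + str(year))
    return point,

anim = FuncAnimation(fig, draw, frames=years)
anim.save('toy_movie.gif', writer=PillowWriter(fps=2.5))   # one frame every 0.4 s
plt.close(fig)
```

This trades the folder of per-year PNGs (handy for inspecting individual frames) for a single pass that writes only the GIF.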

```python
import os
import imageio

def fileExists(path, fname):
    """Return True if the file `fname` exists in the directory `path`."""
    return os.path.isfile(os.path.join(path, fname))
```

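```python
# Read the yearly PNG frames in order and write them out as an animated GIF
pngfilenums = np.arange(startyear, endyear, 1)
path = './savedfigures/'
images = []
for filenum in pngfilenums:
    filename = str(filenum) + '.png'
    if fileExists(path, filename):      # skip years for which no frame was saved
        images.append(imageio.imread(os.path.join(path, filename)))
imageio.mimsave('./movie.gif', images, duration=0.4)   # display time per frame
```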