<h1>Chapter 2 | Case Study B1 | <b>Displaying Immunization Rates across Countries</b></h1>
<p>In this notebook, I'll reproduce the tables from Chapter 02 that illustrate different table formats. The author used data downloaded from the World Development Indicators as an example. We'll display the data in a <b>tidy format</b> and in a <b>wide format</b>.</p>
<h2><b>PART A</b> | Read the data</h2>


In [1]:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import os
import sys

In [6]:
# Current script folder
current_path = os.getcwd()
dirname = current_path.split("da_case_studies")[0]

#  Get location folders
data_in = f"{dirname}da_data_repo/worldbank-immunization/clean/"
data_out = f"{dirname}da_case_studies/ch02-immunization-crosscountry/"
output = f"{dirname}da_case_studies/ch02-immunization-crosscountry/output/"
func = f"{dirname}da_case_studies/ch00-tech_prep/"
sys.path.append(func)

In [7]:
# Import the prewritten helper functions 
from py_helper_functions import *

<p>Let's load the clean and tidy data and create our workfile.</p>

In [8]:
df = pd.read_csv(f"{data_in}worldbank-immunization-panel.csv")

<p>To reproduce the table in the book, let's keep only the four variables we need and restrict the sample to India and Pakistan from 2015 onward.</p>

In [9]:
df = df.filter(["countryname", "year", "imm", "gdppc"]).loc[
    (df["year"] >= 2015)
    & ((df["countryname"] == "Pakistan") | (df["countryname"] == "India"))
]
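
<p>The same selection can also be written with <code>.isin()</code>, which scales better when the list of countries grows. The following is only a sketch on a toy table: <code>subset_panel</code> is a hypothetical helper, not part of <code>py_helper_functions</code>, and the "Placeholder" rows are invented so that the filter has something to drop.</p>

```python
import pandas as pd

# India and Pakistan rows are copied from the output shown in this notebook.
# The "Placeholder" rows are invented purely for illustration.
toy = pd.DataFrame(
    {
        "countryname": [
            "India", "Pakistan", "India", "Pakistan",
            "India", "Pakistan", "Placeholder", "Placeholder",
        ],
        "year": [2015, 2015, 2016, 2016, 2017, 2017, 2014, 2016],
        "imm": [87, 75, 88, 75, 88, 76, 50, 50],
    }
)


def subset_panel(panel, countries, first_year):
    """Keep rows for the given countries from first_year onward (hypothetical helper)."""
    keep = panel["countryname"].isin(countries) & (panel["year"] >= first_year)
    return panel.loc[keep, ["countryname", "year", "imm"]]


subset = subset_panel(toy, ["India", "Pakistan"], 2015)
print(subset)
```

<p>Building the boolean mask first and passing it to <code>.loc</code> together with the column list keeps the row and column selection in one step.</p>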

In [10]:
df.describe(percentiles=[])

,year,imm,gdppc
count,6.0,6.0,6.0
mean,2016.0,81.5,5373.962113
std,0.894427,6.774954,874.322204
min,2015.0,75.0,4459.146517
50%,2016.0,81.5,5257.315866
max,2017.0,88.0,6516.17262


In [11]:
df.filter(["countryname", "year", "imm", "gdppc"])

,countryname,year,imm,gdppc
3240,India,2015,87,5743.426497
3370,Pakistan,2015,75,4459.146517
3431,India,2016,88,6145.294595
3561,Pakistan,2016,75,4608.527214
3622,India,2017,88,6516.17262
3752,Pakistan,2017,76,4771.205236


In [12]:
# Table 2.4
df.sort_values(["countryname", "year"])

,countryname,year,imm,gdppc
3240,India,2015,87,5743.426497
3431,India,2016,88,6145.294595
3622,India,2017,88,6516.17262
3370,Pakistan,2015,75,4459.146517
3561,Pakistan,2016,75,4608.527214
3752,Pakistan,2017,76,4771.205236


<hr>
<p><b>Table 2.4</b> <i>Country-year panel on immunization and GDP per capita</i> - tidy, long table</p>
<p>We can now reproduce the data in a wide format. First, we apply <code>.set_index()</code> to build a hierarchical index from <code>"countryname"</code> and <code>"year"</code>. Then we apply <code>.unstack()</code> to pivot the <code>"year"</code> level of that index into columns.</p>

In [13]:
df = df.set_index(["countryname", "year"]).unstack("year")
df

,imm,imm,imm,gdppc,gdppc,gdppc
year,2015,2016,2017,2015,2016,2017
countryname,,,,,,
India,87,88,88,5743.426497,6145.294595,6516.17262
Pakistan,75,75,76,4459.146517,4608.527214,4771.205236
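
<p>As a cross-check, the same wide shape can be obtained with <code>DataFrame.pivot()</code>. This is a self-contained sketch, not the book's method: the toy table copies the rows of Table 2.4 above, with GDP per capita rounded to one decimal for brevity.</p>

```python
import pandas as pd

# Rows copied from Table 2.4; gdppc is rounded to one decimal for readability.
tidy = pd.DataFrame(
    {
        "countryname": ["India"] * 3 + ["Pakistan"] * 3,
        "year": [2015, 2016, 2017] * 2,
        "imm": [87, 88, 88, 75, 75, 76],
        "gdppc": [5743.4, 6145.3, 6516.2, 4459.1, 4608.5, 4771.2],
    }
)

# Route used in this notebook: hierarchical index, then unstack the year level.
wide_unstack = tidy.set_index(["countryname", "year"]).unstack("year")

# Alternative: pivot with countries as rows and years as columns.
# Leaving out `values` pivots every remaining column (imm and gdppc).
wide_pivot = tidy.pivot(index="countryname", columns="year")

print(wide_pivot)
print(wide_unstack.equals(wide_pivot))
```

<p>Both routes give one row per country and a two-level column label (variable, year), so either can be followed by the renaming step in the next cell.</p>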


In [14]:
# Flatten the two-level (variable, year) column labels into single names such as "imm_2015"
df.columns = [x[0] + "_" + str(x[1]) for x in df.columns]

In [15]:
# Table 2.5
df

countryname,imm_2015,imm_2016,imm_2017,gdppc_2015,gdppc_2016,gdppc_2017
India,87,88,88,5743.426497,6145.294595,6516.17262
Pakistan,75,75,76,4459.146517,4608.527214,4771.205236


<hr>
<p><b>Table 2.5</b> <i>Country-year panel on immunization and GDP per capita</i> - wide data table</p>
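
<p>Going the other way, from the wide table back to the tidy one, is also possible. The sketch below uses <code>pd.wide_to_long()</code> and assumes the column names follow the <code>variable_year</code> pattern created above. The toy table copies Table 2.5, with GDP per capita rounded to one decimal.</p>

```python
import pandas as pd

# Wide table copied from Table 2.5; gdppc is rounded to one decimal for readability.
wide = pd.DataFrame(
    {
        "imm_2015": [87, 75],
        "imm_2016": [88, 75],
        "imm_2017": [88, 76],
        "gdppc_2015": [5743.4, 4459.1],
        "gdppc_2016": [6145.3, 4608.5],
        "gdppc_2017": [6516.2, 4771.2],
    },
    index=pd.Index(["India", "Pakistan"], name="countryname"),
)

# wide_to_long expects the identifier as a regular column, hence reset_index().
# stubnames are the variable prefixes; the part after "_" becomes the year.
tidy_again = (
    pd.wide_to_long(
        wide.reset_index(),
        stubnames=["imm", "gdppc"],
        i="countryname",
        j="year",
        sep="_",
    )
    .reset_index()
    .sort_values(["countryname", "year"])
    .reset_index(drop=True)
)

print(tidy_again)
```

<p>Sorting by country and year at the end makes the row order match Table 2.4 regardless of the order in which the reshaping returns the rows.</p>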

<p>And that's it!</p>
<hr>
