# DATA100 - Final Project: Storytelling with Data 

## Group 3 Members 
* Argonza, Antoinette Joy 
* Jamia, Gillian Nicole 
* Magsano, Niño Matthew 
* Reyes, Anton Gabriel

## Motivation

**As Lasallian students, we want to identify the possible causes of child mortality and to provide credible, consolidated information and insights that can help prevent or address this pressing social issue.**


## Libraries, Packages, or Modules

In [9]:
import os 
import csv
import time
import numpy as np
import pandas as pd
import seaborn as sns
import requests 
import datetime as dt
import geopandas as gpd
import matplotlib.pyplot as plt

from shapely.geometry import Point
from IPython.display import HTML

%matplotlib inline
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [10]:
plt.style.use('seaborn-whitegrid')
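
A note on the style name: Matplotlib 3.6 renamed the bundled seaborn styles to `seaborn-v0_8-*`, and the old names were removed in later releases, so `'seaborn-whitegrid'` may fail on a newer installation. The sketch below is one possible way to stay compatible with both. The helper `pick_style` is a hypothetical name introduced here, not part of Matplotlib.

```python
def pick_style(available, candidates=("seaborn-v0_8-whitegrid", "seaborn-whitegrid")):
    """Return the first candidate style found in `available`, else 'default'."""
    for name in candidates:
        if name in available:
            return name
    return "default"


# Intended use in the notebook (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.style.use(pick_style(plt.style.available))
```

Falling back to `'default'` keeps the notebook running even when neither seaborn style is installed, at the cost of a different look.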

## Data Collection

### Datasets

All six datasets are hosted on Kaggle:

* **Mortality among children**: [Kaggle dataset source](https://www.kaggle.com/mpwolke/cusersmarildownloadsdeathscsv)
* **WHO - Immunization coverage estimates by country**: [Kaggle dataset source](https://www.kaggle.com/lsind18/who-immunization-coverage)
* **Malnutrition across the globe**: [Kaggle dataset source](https://www.kaggle.com/ruchi798/malnutrition-across-the-globe)
* **Child Health Dataset**: [Kaggle dataset source](https://www.kaggle.com/hijest/child-health-dataset-who)
* **Out of School Rates Global Data**: [Kaggle dataset source](https://www.kaggle.com/komalkhetlani/out-of-school-rates-global-data?select=Primary.csv)
* **World Health Statistics 2020**: [Kaggle dataset source](https://www.kaggle.com/utkarshxy/who-worldhealth-statistics-2020-complete?select=adolescentBirthRate.csv)
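
The cells in the next section read local CSV copies of these datasets. As a minimal sketch, assuming the files sit relative to the notebook's working directory under the names used in those `read_csv` calls, a quick check like the following could report which downloads are still missing. The helper `missing_files` is a hypothetical name introduced for illustration.

```python
from pathlib import Path

# File names as they appear in the notebook's read_csv calls.
EXPECTED_FILES = [
    "mortality_children.csv",
    "Immunization/ROTAC.csv",
    "malnutrition_country_avg.csv",
    "Breastfeeding.csv",
    "Primary_out_of_school_rates.csv",
    "adolescentBirthRate.csv",
]


def missing_files(paths, root=Path(".")):
    """Return the entries of `paths` that do not exist as files under `root`."""
    return [p for p in paths if not (Path(root) / p).is_file()]


# Report anything that still has to be downloaded from Kaggle.
print("Missing files:", missing_files(EXPECTED_FILES) or "none")
```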

## Data Wrangling & Exploratory Data Analysis

### Mortality among children

In [27]:
mortality = pd.read_csv('mortality_children.csv', delimiter=';')
mortality.head()

| | region | Unnamed: 1 | Unnamed: 2 | median | upper | lower | median.1 | upper.1 |
|---|---|---|---|---|---|---|---|---|
| 0 | Sub-Saharan Africa | NaN | NaN | 39820157 | 42396932 | 568662 | 591774 | 626820 |
| 1 | West and Central Africa | NaN | NaN | 40214721 | 44137154 | 262349 | 279006 | 303798 |
| 2 | Eastern and Southern Africa | NaN | NaN | 39474451 | 42521735 | 295842 | 312767 | 334924 |
| 3 | Middle East and North Africa | NaN | NaN | 9516906 | 10260104 | 63007 | 66627 | 71738 |
| 4 | South Asia | NaN | NaN | 20386521 | 21239496 | 561782 | 582802 | 605420 |
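
In the preview, `Unnamed: 1` and `Unnamed: 2` are empty in every row shown. As a rough sketch, assuming they are empty throughout the file as well, columns with no values at all could be dropped before further analysis. The small frame below re-types two preview rows, and `drop_empty_columns` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Two rows copied from the preview above, with the same column names.
preview = pd.DataFrame({
    "region": ["Sub-Saharan Africa", "West and Central Africa"],
    "Unnamed: 1": [np.nan, np.nan],
    "Unnamed: 2": [np.nan, np.nan],
    "median": [39820157, 40214721],
    "upper": [42396932, 44137154],
    "lower": [568662, 262349],
    "median.1": [591774, 279006],
    "upper.1": [626820, 303798],
})


def drop_empty_columns(df):
    """Return a copy of `df` without columns whose values are all missing."""
    return df.dropna(axis="columns", how="all")


cleaned = drop_empty_columns(preview)
print(list(cleaned.columns))
```

Using `how="all"` only removes columns that are entirely missing, so a column with even one real value is kept.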


### WHO - Immunization coverage estimates by country

In [24]:
rotac = pd.read_csv('Immunization/ROTAC.csv')
rotac.head()

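| | Country | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 62.0 | 58.0 | 45.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Angola | 39.0 | 58.0 | 48.0 | 43.0 | 44.0 | 40.0 | 18.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Argentina | 72.0 | 77.0 | 80.0 | 78.0 | 75.0 | 61.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Armenia | NaN | 92.0 | 93.0 | 94.0 | 94.0 | 93.0 | 91.0 | 33.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Australia | 87.0 | 87.0 | 87.0 | 87.0 | 87.0 | 85.0 | 84.0 | 84.0 | 84.0 | 85.0 | 84.0 | 82.0 | 81.0 | NaN | NaN |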
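
The table stores one column per year, which is awkward for plotting trends or joining with year-indexed data. A possible reshaping step, sketched on a few preview values, converts it to one row per country and year. The column names `Year` and `Coverage` and the helper `to_long` are assumptions introduced here.

```python
import numpy as np
import pandas as pd

# A few values copied from the preview above (wide layout: one column per year).
wide = pd.DataFrame({
    "Country": ["Afghanistan", "Angola"],
    "2020": [62.0, 39.0],
    "2019": [58.0, 58.0],
    "2018": [45.0, 48.0],
    "2017": [np.nan, 43.0],
})


def to_long(df):
    """Reshape year columns into (Country, Year, Coverage) rows, dropping gaps."""
    long_df = df.melt(id_vars="Country", var_name="Year", value_name="Coverage")
    long_df = long_df.dropna(subset=["Coverage"])
    long_df["Year"] = long_df["Year"].astype(int)
    return long_df.sort_values(["Country", "Year"]).reset_index(drop=True)


long_rotac = to_long(wide)
print(long_rotac)
```

Dropping the missing values is a choice, not a requirement: keeping them would preserve the information that a country has no estimate for that year.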


### Malnutrition across the globe

In [12]:
malnutrition = pd.read_csv('malnutrition_country_avg.csv')
malnutrition.head()

| | Country | Income Classification | Severe Wasting | Wasting | Overweight | Stunting | Underweight | U5 Population ('000s) |
|---|---|---|---|---|---|---|---|---|
| 0 | AFGHANISTAN | 0.0 | 3.033333 | 10.35 | 5.125 | 47.775 | 30.375 | 4918.5615 |
| 1 | ALBANIA | 2.0 | 4.075 | 7.76 | 20.8 | 24.16 | 7.7 | 232.8598 |
| 2 | ALGERIA | 2.0 | 2.733333 | 5.942857 | 12.833333 | 19.571429 | 7.342857 | 3565.213143 |
| 3 | ANGOLA | 1.0 | 2.4 | 6.933333 | 2.55 | 42.633333 | 23.6 | 3980.054 |
| 4 | ARGENTINA | 2.0 | 0.2 | 2.15 | 11.125 | 10.025 | 2.6 | 3613.65175 |
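
Country names here are in upper case, while the immunization table uses title case, so a direct merge on `Country` would match nothing. One way to handle this, sketched on preview values only, is to merge on a normalized key. The `country_key` column and the `add_country_key` helper are hypothetical names, and real data may still need manual fixes for countries that are spelled differently across sources.

```python
import pandas as pd

# Values copied from the malnutrition and ROTAC previews.
malnutrition_sample = pd.DataFrame({
    "Country": ["AFGHANISTAN", "ALBANIA", "ANGOLA"],
    "Stunting": [47.775, 24.16, 42.633333],
})
rotac_sample = pd.DataFrame({
    "Country": ["Afghanistan", "Angola", "Argentina"],
    "2020": [62.0, 39.0, 72.0],
})


def add_country_key(df, column="Country"):
    """Return a copy of `df` with a case-insensitive, whitespace-trimmed join key."""
    out = df.copy()
    out["country_key"] = out[column].str.strip().str.casefold()
    return out


merged = add_country_key(malnutrition_sample).merge(
    add_country_key(rotac_sample),
    on="country_key",
    how="inner",
    suffixes=("_malnutrition", "_rotac"),
)
print(merged[["country_key", "Stunting", "2020"]])
```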


### Child Health Dataset

In [19]:
bfd = pd.read_csv('Breastfeeding.csv', encoding='latin1')
bfd.head()

| | Country | Year | Early initiation of breastfeeding (%) | Infants exclusively breastfed for the first six months of life (%) | Low-birth-weight newborns (%) |
|---|---|---|---|---|---|
| 0 | Afghanistan | 2015-2016 | 40.9 | NaN | NaN |
| 1 | Afghanistan | 2015 | NaN | 43.1 | NaN |
| 2 | Albania | 2008-2009 | 43.4 | NaN | NaN |
| 3 | Albania | 2008 | NaN | 37.1 | NaN |
| 4 | Albania | 2005 | 29.9 | 3.4 | 7.0 |
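
The `Year` column mixes single years with ranges such as `2015-2016`, so it cannot be used as a number directly. A minimal sketch, assuming we represent each range by its final year, extracts the last four-digit year from each value. The `end_year` helper and column name are hypothetical.

```python
import pandas as pd

# Country and Year values copied from the preview above.
bfd_sample = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Albania", "Albania", "Albania"],
    "Year": ["2015-2016", "2015", "2008-2009", "2008", "2005"],
})


def end_year(year_text):
    """Return the last four-digit year of each entry as a nullable integer."""
    extracted = year_text.astype(str).str.extract(r"(\d{4})\s*$", expand=False)
    return pd.to_numeric(extracted).astype("Int64")


bfd_sample["end_year"] = end_year(bfd_sample["Year"])
print(bfd_sample)
```

Taking the start year or the midpoint would be equally defensible; what matters is applying the same rule to every dataset that is later joined by year.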


### Out of School Rates Global Data

In [20]:
outofschool = pd.read_csv('Primary_out_of_school_rates.csv', encoding='latin1')
outofschool.head()

| | ISO3 | Countries and areas | Region | Sub-region | Development Regions | Total | Female | Male | Rural_Residence | Urban_Residence | Poorest_Wealth quintile | Second_Wealth quintile | Middle_Wealth quintile | Fourth_Wealth quintile | Richest_Wealth quintile | Data source | Time period |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | SA | SA | Least Developed | 37.0 | 47.0 | 28.0 | 42.0 | 19.0 | 42.0 | 47.0 | 46.0 | 32.0 | 16.0 | DHS 2015 | 2015.0 |
| 1 | ALB | Albania | ECA | EECA | More Developed | 2.0 | 2.0 | 3.0 | 4.0 | 1.0 | 4.0 | 3.0 | 2.0 | 2.0 | 1.0 | DHS 2017-18 | 2018.0 |
| 2 | DZA | Algeria | MENA | MENA | Less Developed | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 2.0 | 2.0 | 2.0 | 1.0 | MICS 2012-13 | 2013.0 |
| 3 | AND | Andorra | ECA | WE | More Developed | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | AGO | Angola | SSA | ESA | Least Developed | 22.0 | 22.0 | 21.0 | 35.0 | 14.0 | 39.0 | 33.0 | 19.0 | 12.0 | 5.0 | DHS 2015-16 | 2016.0 |
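
Some countries, such as Andorra in the preview, have no survey values at all, and the rural and urban columns invite a direct comparison. The sketch below drops incomplete rows and computes the rural minus urban difference, assuming the values are out-of-school rates in percent (so the gap is in percentage points). `rural_urban_gap` and the `Rural_Urban_Gap` column are hypothetical names.

```python
import numpy as np
import pandas as pd

# Columns and values copied from the preview above.
outofschool_sample = pd.DataFrame({
    "ISO3": ["AFG", "ALB", "DZA", "AND", "AGO"],
    "Rural_Residence": [42.0, 4.0, 2.0, np.nan, 35.0],
    "Urban_Residence": [19.0, 1.0, 2.0, np.nan, 14.0],
})


def rural_urban_gap(df):
    """Drop rows lacking either rate, then rank countries by rural minus urban."""
    out = df.dropna(subset=["Rural_Residence", "Urban_Residence"]).copy()
    out["Rural_Urban_Gap"] = out["Rural_Residence"] - out["Urban_Residence"]
    return out.sort_values("Rural_Urban_Gap", ascending=False).reset_index(drop=True)


gaps = rural_urban_gap(outofschool_sample)
print(gaps)
```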


### World Health Statistics 2020

In [18]:
adolescentBR = pd.read_csv('adolescentBirthRate.csv')
adolescentBR.head()

| | Location | Period | Indicator | First Tooltip |
|---|---|---|---|---|
| 0 | Afghanistan | 2017 | Adolescent birth rate (per 1000 women aged 15-... | 62.0 |
| 1 | Afghanistan | 2014 | Adolescent birth rate (per 1000 women aged 15-... | 77.2 |
| 2 | Afghanistan | 2013 | Adolescent birth rate (per 1000 women aged 15-... | 87.0 |
| 3 | Afghanistan | 2011 | Adolescent birth rate (per 1000 women aged 15-... | 125.7 |
| 4 | Afghanistan | 2009 | Adolescent birth rate (per 1000 women aged 15-... | 80.0 |
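
This table holds several periods per location, and the value column carries the uninformative name `First Tooltip`. A possible preparation step, sketched on the preview rows (the truncated `Indicator` text is left out), renames that column and keeps only the most recent period for each location. The new column name `adolescent_birth_rate` and the helper `latest_per_location` are assumptions introduced here.

```python
import pandas as pd

# Location, Period and First Tooltip values copied from the preview above.
adolescent_sample = pd.DataFrame({
    "Location": ["Afghanistan"] * 5,
    "Period": [2017, 2014, 2013, 2011, 2009],
    "First Tooltip": [62.0, 77.2, 87.0, 125.7, 80.0],
})


def latest_per_location(df, value_name="adolescent_birth_rate"):
    """Rename the value column and keep the newest record for each location."""
    out = df.rename(columns={"First Tooltip": value_name})
    out = out.sort_values("Period", ascending=False)
    return out.drop_duplicates(subset="Location", keep="first").reset_index(drop=True)


latest = latest_per_location(adolescent_sample)
print(latest)
```

Keeping only the latest period makes countries comparable in a single chart, although the reference years can still differ between locations.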


## Data Visualization