# DATA100 - Final Project: Storytelling with Data 

<hr>
<hr>

## Group 3 Members 
* Argonza, Antoinette Joy 
* Jamia, Gillian Nicole 
* Magsano, Niño Matthew 
* Reyes, Anton Gabriel

<hr>
<hr>

## Motivation

*As Lasallian students, we want to determine the possible causes of child mortality and provide credible, consolidated information and insights that can help prevent or address this pressing social issue.*


<hr>
<hr>

## Libraries, Packages, or Modules

In [467]:
import os 
import csv
import time
import numpy as np
import pandas as pd
import seaborn as sns
import requests 
import datetime as dt
import geopandas as gpd
import matplotlib.pyplot as plt

from shapely.geometry import Point
from IPython.display import HTML

%matplotlib inline
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [468]:
plt.style.use('seaborn-whitegrid')

<hr>
<hr>

## Data Collection

### Datasets

**Initial datasets:** <br>
After initial data collection and a first reading, the following datasets were dropped, but they are still listed here to show our initial direction for the story. Their data collection code has been removed from the notebook to keep it presentable. <br>
* (Removed) The dataset for **Mortality among children** can be found here: [`Kaggle` Dataset Source](https://www.kaggle.com/mpwolke/cusersmarildownloadsdeathscsv) <br>
* (Removed) The dataset for **WHO - Immunization coverage estimates by country** can be found here: [`Kaggle` Dataset Source](https://www.kaggle.com/lsind18/who-immunization-coverage) <br>
* (Removed) The dataset for **Child Health Dataset** can be found here: [`Kaggle` Dataset Source](https://www.kaggle.com/hijest/child-health-dataset-who) <br>
* (Removed) The dataset for **Out of School Rates Global Data** can be found here: [`Kaggle` Dataset Source](https://www.kaggle.com/komalkhetlani/out-of-school-rates-global-data?select=Primary.csv) <br>
* (Removed) The dataset for **World Health Statistics 2020** can be found here: [`Kaggle` Dataset Source](https://www.kaggle.com/utkarshxy/who-worldhealth-statistics-2020-complete?select=adolescentBirthRate.csv) <br>

**Official datasets**: <br>
* The dataset for **Mortality rate, under-5 (per 1,000 live births)** can be found here: [`The World Bank` Dataset Source](https://data.worldbank.org/indicator/SH.DYN.MORT?end=2019&fbclid=IwAR1SIiyIcig6Mwin1-szdOQoKMCC6BMJrb0NrdbS1-bnL8gd2JoalibPjYI&start=1960) <br>
* The dataset for **Causes of death in children under 5** can be found here: [`Our World in Data` Dataset Source](https://ourworldindata.org/grapher/causes-of-death-in-children-under-5) <br>
* The dataset for **Malnutrition across the globe** can be found here: [`Kaggle` Dataset Source](https://www.kaggle.com/ruchi798/malnutrition-across-the-globe) <br>

**Helpful dataset**: <br>
This dataset is included to support possible merges of the cleaned tables for data visualization and analysis. <br>
* The dataset for **Country Codes Alpha-2 & Alpha-3** can be found here: [`IBAN` Dataset Source](https://www.iban.com/country-codes) <br>

<hr>
<hr>

## Data Wrangling & Exploratory Data Analysis

<hr>

### Mortality rate, under-5 (per 1,000 live births)

In [469]:
mortality = pd.read_csv('Databases/mortality_rate.csv')
mortality.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,Aruba,ABW,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,,,,,,,...,,,,,,,,,,
1,Africa Eastern and Southern,AFE,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,,,,,,,...,82.105187,78.354228,74.991597,71.996841,69.288947,66.667614,64.347025,62.115387,60.098659,
2,Afghanistan,AFG,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,,,344.6,338.7,333.1,327.6,...,83.9,80.3,76.8,73.6,70.4,67.6,64.9,62.5,60.3,
3,Africa Western and Central,AFW,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,,,,,308.353805,302.620962,...,118.442691,115.498791,112.667923,110.264426,107.765705,105.055552,102.43068,99.598781,96.81424,
4,Angola,AGO,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,,,,,,,...,112.3,105.0,98.6,93.0,88.2,84.2,80.6,77.7,74.7,


In [470]:
mortality.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 65 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    object 
 1   Country Code    266 non-null    object 
 2   Indicator Name  266 non-null    object 
 3   Indicator Code  266 non-null    object 
 4   1960            112 non-null    float64
 5   1961            112 non-null    float64
 6   1962            116 non-null    float64
 7   1963            119 non-null    float64
 8   1964            124 non-null    float64
 9   1965            127 non-null    float64
 10  1966            129 non-null    float64
 11  1967            133 non-null    float64
 12  1968            137 non-null    float64
 13  1969            142 non-null    float64
 14  1970            144 non-null    float64
 15  1971            149 non-null    float64
 16  1972            153 non-null    float64
 17  1973            153 non-null    flo

In [471]:
mortality.describe()

Unnamed: 0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
count,112.0,112.0,116.0,119.0,124.0,127.0,129.0,133.0,137.0,142.0,...,241.0,241.0,241.0,241.0,241.0,241.0,241.0,241.0,241.0,0.0
mean,151.166071,146.613393,149.111207,146.794958,146.716563,145.343472,143.094516,139.498974,134.915993,132.530716,...,38.006505,36.644066,35.375455,34.254044,33.178938,32.107112,31.126408,30.145293,29.214801,
std,95.556097,94.282688,98.883656,99.767226,99.983448,100.351548,98.92658,97.909379,96.274734,95.129489,...,36.033354,34.745564,33.555812,32.563909,31.620253,30.478658,29.595854,28.698271,27.841743,
min,19.6,19.2,18.6,17.9,17.1,16.3,15.5,14.9,14.4,13.9,...,2.5,2.4,2.2,2.1,2.0,1.9,1.8,1.7,1.7,
25%,66.975,64.45,68.55,64.3,63.7,62.5,62.0,60.5,56.7,55.775,...,10.0,9.8,9.3,8.9,8.3,8.0,7.7,7.4,7.0,
50%,149.4,146.5,142.95,137.1,133.15,127.9,125.0,119.1,114.8,108.55,...,22.4,21.8,21.1,20.6,19.7,19.1,18.6,18.1,17.3,
75%,220.925,215.9,214.525,210.9,218.35,215.7,212.6,210.6,200.7,201.15,...,59.155108,56.8,54.6,52.6,50.8,49.0,47.3,45.6,44.2,
max,391.7,386.3,409.8,420.6,413.5,406.5,399.9,393.4,387.3,381.4,...,153.2,146.6,142.1,137.7,138.3,128.4,124.4,120.3,117.2,


In [472]:
mortality.describe(include=object)

| | Country Name | Country Code | Indicator Name | Indicator Code |
|---|---|---|---|---|
| count | 266 | 266 | 266 | 266 |
| unique | 266 | 266 | 1 | 1 |
| top | Aruba | ABW | Mortality rate, under-5 (per 1,000 live births) | SH.DYN.MORT |
| freq | 1 | 1 | 266 | 266 |


In [473]:
mortality.isnull().sum()*100/mortality.shape[0]

Country Name        0.000000
Country Code        0.000000
Indicator Name      0.000000
Indicator Code      0.000000
1960               57.894737
                     ...    
2016                9.398496
2017                9.398496
2018                9.398496
2019                9.398496
2020              100.000000
Length: 65, dtype: float64
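
The percentage-of-nulls expression above is reused for every table in this notebook. As a minimal sketch, it could be wrapped in a small helper; the name `null_percent` and the toy values below are assumptions for illustration, not part of the original notebook. Taking the mean of the boolean mask gives the same result as dividing the null count by the number of rows.

```python
import numpy as np
import pandas as pd


def null_percent(df):
    """Return the percentage of missing values in each column."""
    # The mean of a boolean mask is the fraction of True values.
    return df.isnull().mean() * 100


# Toy data standing in for the mortality table (values are made up).
toy = pd.DataFrame({
    'Country Code': ['AAA', 'BBB', 'CCC', 'DDD'],
    '2019': [10.0, np.nan, 30.0, 40.0],
    '2020': [np.nan, np.nan, np.nan, np.nan],
})

pct = null_percent(toy)
print(pct)
```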

In [474]:
mortality.duplicated().sum()

0

In [475]:
mortality.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020'],
      dtype='object')

#### Since 2020 has no observations and the years before 2000 add little to recent conclusions and recommendations beyond confirming a downward trend, these columns are dropped. This also aligns the table with the other datasets, which cover fewer years.

In [476]:
mortality.drop(['1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2020'], axis=1, inplace=True)

mortality.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,2000,2001,2002,2003,2004,2005,...,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,Aruba,ABW,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,,,,,,,...,,,,,,,,,,
1,Africa Eastern and Southern,AFE,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,136.917815,131.670422,126.139085,120.635993,115.263632,110.060055,...,86.208587,82.105187,78.354228,74.991597,71.996841,69.288947,66.667614,64.347025,62.115387,60.098659
2,Afghanistan,AFG,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,128.7,124.6,120.4,116.3,112.1,107.9,...,87.6,83.9,80.3,76.8,73.6,70.4,67.6,64.9,62.5,60.3
3,Africa Western and Central,AFW,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,169.548432,164.533824,159.25195,153.916285,148.577352,143.417384,...,121.783272,118.442691,115.498791,112.667923,110.264426,107.765705,105.055552,102.43068,99.598781,96.81424
4,Angola,AGO,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,203.9,197.8,190.9,183.2,174.6,165.6,...,120.3,112.3,105.0,98.6,93.0,88.2,84.2,80.6,77.7,74.7
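
Listing every year by hand works, but the same selection can be expressed as a range. The sketch below is one possible alternative, assuming the year columns are stored as digit-only strings as in the `mortality.columns` output; the helper name `keep_year_range` and the toy frame are hypothetical.

```python
import pandas as pd


def keep_year_range(df, first_year, last_year):
    """Drop year-named columns that fall outside [first_year, last_year]."""
    # Only columns whose name is all digits are treated as years.
    year_cols = [c for c in df.columns if str(c).isdigit()]
    outside = [c for c in year_cols if not first_year <= int(c) <= last_year]
    return df.drop(columns=outside)


# Toy data standing in for the mortality table (values are made up).
toy = pd.DataFrame({
    'Country Name': ['Country A', 'Country B'],
    'Country Code': ['AAA', 'BBB'],
    '1999': [50.0, 60.0],
    '2000': [48.0, 58.0],
    '2019': [20.0, 30.0],
    '2020': [None, None],
})

trimmed = keep_year_range(toy, 2000, 2019)
print(list(trimmed.columns))
```

Unlike `drop(..., inplace=True)`, this returns a new frame and leaves the input unchanged.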


In [477]:
mortality.isnull().sum()*100/mortality.shape[0]

Country Name      0.000000
Country Code      0.000000
Indicator Name    0.000000
Indicator Code    0.000000
2000              9.398496
2001              9.398496
2002              9.398496
2003              9.398496
2004              9.398496
2005              9.398496
2006              9.398496
2007              9.398496
2008              9.398496
2009              9.398496
2010              9.398496
2011              9.398496
2012              9.398496
2013              9.398496
2014              9.398496
2015              9.398496
2016              9.398496
2017              9.398496
2018              9.398496
2019              9.398496
dtype: float64

#### Since each remaining year column has 9.40% nulls, the rows containing them are also dropped from the table, as this has a negligible effect on the aggregated and summarized analysis.

In [478]:
cleaned_mortality = mortality.dropna(axis=0)
cleaned_mortality.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,2000,2001,2002,2003,2004,2005,...,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
1,Africa Eastern and Southern,AFE,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,136.917815,131.670422,126.139085,120.635993,115.263632,110.060055,...,86.208587,82.105187,78.354228,74.991597,71.996841,69.288947,66.667614,64.347025,62.115387,60.098659
2,Afghanistan,AFG,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,128.7,124.6,120.4,116.3,112.1,107.9,...,87.6,83.9,80.3,76.8,73.6,70.4,67.6,64.9,62.5,60.3
3,Africa Western and Central,AFW,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,169.548432,164.533824,159.25195,153.916285,148.577352,143.417384,...,121.783272,118.442691,115.498791,112.667923,110.264426,107.765705,105.055552,102.43068,99.598781,96.81424
4,Angola,AGO,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,203.9,197.8,190.9,183.2,174.6,165.6,...,120.3,112.3,105.0,98.6,93.0,88.2,84.2,80.6,77.7,74.7
5,Albania,ALB,"Mortality rate, under-5 (per 1,000 live births)",SH.DYN.MORT,27.2,25.8,24.4,22.9,21.4,20.0,...,13.2,12.1,11.2,10.4,9.9,9.6,9.4,9.4,9.5,9.7
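
An identical null percentage in every year column suggests, but does not prove, that the same rows are empty across all years. A quick check along these lines could confirm how many rows `dropna(axis=0)` removes and how many of those have no data at all. The toy frame and variable names below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the trimmed mortality table (values are made up).
toy = pd.DataFrame({
    'Country Code': ['AAA', 'BBB', 'CCC'],
    '2000': [1.0, np.nan, 3.0],
    '2001': [1.5, np.nan, np.nan],
})
year_cols = ['2000', '2001']

missing = toy[year_cols].isnull()
rows_any_null = int(missing.any(axis=1).sum())  # rows that dropna(axis=0) removes
rows_all_null = int(missing.all(axis=1).sum())  # rows with no yearly data at all

print(rows_any_null, rows_all_null)
```

If the two counts match, every dropped row was completely empty; if they differ, some partially filled rows are being discarded as well.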


In [479]:
cleaned_mortality.isnull().sum()*100/cleaned_mortality.shape[0]

Country Name      0.0
Country Code      0.0
Indicator Name    0.0
Indicator Code    0.0
2000              0.0
2001              0.0
2002              0.0
2003              0.0
2004              0.0
2005              0.0
2006              0.0
2007              0.0
2008              0.0
2009              0.0
2010              0.0
2011              0.0
2012              0.0
2013              0.0
2014              0.0
2015              0.0
2016              0.0
2017              0.0
2018              0.0
2019              0.0
dtype: float64

In [480]:
cleaned_mortality['Country Name'].unique()

array(['Africa Eastern and Southern', 'Afghanistan',
       'Africa Western and Central', 'Angola', 'Albania', 'Andorra',
       'Arab World', 'United Arab Emirates', 'Argentina', 'Armenia',
       'Antigua and Barbuda', 'Australia', 'Austria', 'Azerbaijan',
       'Burundi', 'Belgium', 'Benin', 'Burkina Faso', 'Bangladesh',
       'Bulgaria', 'Bahrain', 'Bahamas, The', 'Bosnia and Herzegovina',
       'Belarus', 'Belize', 'Bolivia', 'Brazil', 'Barbados',
       'Brunei Darussalam', 'Bhutan', 'Botswana',
       'Central African Republic', 'Canada',
       'Central Europe and the Baltics', 'Switzerland', 'Chile', 'China',
       "Cote d'Ivoire", 'Cameroon', 'Congo, Dem. Rep.', 'Congo, Rep.',
       'Colombia', 'Comoros', 'Cabo Verde', 'Costa Rica',
       'Caribbean small states', 'Cuba', 'Cyprus', 'Czech Republic',
       'Germany', 'Djibouti', 'Dominica', 'Denmark', 'Dominican Republic',
       'Algeria', 'East Asia & Pacific (excluding high income)',
       'Early-demographic dividen

In [481]:
cleaned_mortality.to_csv('Databases/cleaned_mortality.csv')
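
By default, `to_csv` also writes the DataFrame index as an unnamed first column, which `read_csv` later loads as `Unnamed: 0`. If the index carries no meaning, passing `index=False` avoids this. The sketch below uses an in-memory buffer and a made-up toy frame rather than the notebook's files.

```python
import io

import pandas as pd

# Toy frame with a non-contiguous index, as left behind by dropna().
toy = pd.DataFrame(
    {'Country Code': ['AAA', 'BBB'], '2019': [1.0, 2.0]},
    index=[1, 5],
)

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# With index=False only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)

print(list(reread_with_index.columns))
print(list(reread_without_index.columns))
```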

<hr>

### Causes of death in children under 5

In [482]:
cause = pd.read_csv('Databases/causes-of-death-in-children-under-5.csv')
cause.head(2)

Unnamed: 0,Entity,Code,Year,Invasive Non-typhoidal Salmonella (iNTS),Interpersonal violence,Nutritional deficiencies,Acute hepatitis,Neoplasms,Measles,Digestive diseases,...,Other neonatal disorders,Whooping cough,Diarrheal diseases,"Fire, heat, and hot substances",Road injuries,Tuberculosis,HIV/AIDS,Drowning,Malaria,Syphilis
0,Afghanistan,AFG,1990,48.186866,105.0,1779.0,718.0,431.0,8649.0,477.0,...,7112.0,2455.0,3968.0,131.0,802.0,808.0,10.0,776.0,21.0,123.089256
1,Afghanistan,AFG,1991,54.688521,130.0,1822.0,741.0,439.0,8669.0,495.0,...,7574.0,2385.0,4650.0,129.0,781.0,800.0,12.0,748.0,41.0,132.205263


In [483]:
cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8400 entries, 0 to 8399
Data columns (total 32 columns):
 #   Column                                                      Non-Null Count  Dtype  
---  ------                                                      --------------  -----  
 0   Entity                                                      8400 non-null   object 
 1   Code                                                        6150 non-null   object 
 2   Year                                                        8400 non-null   int64  
 3   Invasive Non-typhoidal Salmonella (iNTS)                    8220 non-null   float64
 4   Interpersonal violence                                      8220 non-null   float64
 5   Nutritional deficiencies                                    8220 non-null   float64
 6   Acute hepatitis                                             8220 non-null   float64
 7   Neoplasms                                                   8220 non-null   float64
 8 

In [484]:
cause.describe()

Unnamed: 0,Year,Invasive Non-typhoidal Salmonella (iNTS),Interpersonal violence,Nutritional deficiencies,Acute hepatitis,Neoplasms,Measles,Digestive diseases,Cirrhosis and other chronic liver diseases,Chronic kidney disease,...,Other neonatal disorders,Whooping cough,Diarrheal diseases,"Fire, heat, and hot substances",Road injuries,Tuberculosis,HIV/AIDS,Drowning,Malaria,Syphilis
count,8400.0,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0,...,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0,8220.0
mean,2004.5,2615.949104,859.594161,14379.168127,1749.517883,3173.361679,19596.436861,2709.428954,579.309489,673.26472,...,25498.213017,8699.608273,56875.76,1185.287348,3699.817762,6598.908273,8091.167032,4954.366545,29737.624088,4890.959081
std,8.655957,9855.561015,2577.264853,51287.896913,6597.215893,9702.950682,74162.090312,8412.127212,1847.469859,2073.84993,...,83323.081241,28048.043215,191045.9,3694.451853,11737.10722,22413.420663,30454.082266,17598.90855,106981.240857,15561.078593
min,1990.0,5.4e-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.6e-05
25%,1997.0,0.050169,2.0,1.0,0.0,11.0,0.0,4.0,1.0,1.0,...,34.0,0.0,5.0,3.0,7.0,0.0,1.0,5.0,0.0,0.316421
50%,2004.5,2.513477,19.0,57.0,4.0,98.0,4.0,60.0,9.0,16.0,...,395.5,49.0,271.0,36.5,90.0,31.0,23.0,73.0,0.0,35.516973
75%,2012.0,78.027837,231.25,2098.75,86.0,776.0,1554.75,567.25,98.25,169.0,...,4382.25,1291.25,8405.25,307.0,817.0,829.25,665.25,718.75,1072.0,769.880142
max,2019.0,62334.44515,21223.0,524103.0,50184.0,85197.0,704288.0,77952.0,15916.0,18047.0,...,539952.0,240021.0,1649581.0,35583.0,115624.0,209562.0,223680.0,184096.0,631523.0,99247.9203


In [485]:
cause.describe(include=object)

| | Entity | Code |
|---|---|---|
| count | 8400 | 6150 |
| unique | 280 | 205 |
| top | Afghanistan | AFG |
| freq | 30 | 30 |


In [486]:
cause.isnull().sum()*100/cause.shape[0]

Entity                                                         0.000000
Code                                                          26.785714
Year                                                           0.000000
Invasive Non-typhoidal Salmonella (iNTS)                       2.142857
Interpersonal violence                                         2.142857
Nutritional deficiencies                                       2.142857
Acute hepatitis                                                2.142857
Neoplasms                                                      2.142857
Measles                                                        2.142857
Digestive diseases                                             2.142857
Cirrhosis and other chronic liver diseases                     2.142857
Chronic kidney disease                                         2.142857
Cardiovascular diseases                                        2.142857
Congenital birth defects                                       2

In [487]:
cause.duplicated().sum()

0

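#### Since each cause-of-death column has 2.14% nulls, the rows containing them are also dropped from the table, as this has a negligible effect on the aggregated and summarized analysis. Note that `dropna(axis=0)` also removes the 26.79% of rows that have no `Code`.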

In [488]:
cause.dropna(axis=0, inplace=True)
cause.head()

Unnamed: 0,Entity,Code,Year,Invasive Non-typhoidal Salmonella (iNTS),Interpersonal violence,Nutritional deficiencies,Acute hepatitis,Neoplasms,Measles,Digestive diseases,...,Other neonatal disorders,Whooping cough,Diarrheal diseases,"Fire, heat, and hot substances",Road injuries,Tuberculosis,HIV/AIDS,Drowning,Malaria,Syphilis
0,Afghanistan,AFG,1990,48.186866,105.0,1779.0,718.0,431.0,8649.0,477.0,...,7112.0,2455.0,3968.0,131.0,802.0,808.0,10.0,776.0,21.0,123.089256
1,Afghanistan,AFG,1991,54.688521,130.0,1822.0,741.0,439.0,8669.0,495.0,...,7574.0,2385.0,4650.0,129.0,781.0,800.0,12.0,748.0,41.0,132.205263
2,Afghanistan,AFG,1992,67.520172,155.0,2069.0,836.0,486.0,8539.0,554.0,...,8614.0,2370.0,5833.0,137.0,821.0,863.0,13.0,777.0,51.0,180.232202
3,Afghanistan,AFG,1993,78.25036,178.0,2427.0,970.0,549.0,8949.0,630.0,...,9458.0,2659.0,7800.0,155.0,923.0,979.0,16.0,872.0,24.0,239.050138
4,Afghanistan,AFG,1994,82.658468,194.0,2649.0,1063.0,589.0,10642.0,681.0,...,9823.0,3187.0,7894.0,170.0,1015.0,1064.0,19.0,961.0,52.0,258.975281
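
Because `dropna(axis=0)` removes a row when any column is null, rows with a missing `Code` are dropped together with rows that lack cause-of-death values. If only the latter should go, `dropna` accepts a `subset` argument. The sketch below contrasts the two on made-up data; the entity names are hypothetical, and whether rows without a code should be kept depends on the analysis.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the cause table (values are made up).
toy = pd.DataFrame({
    'Entity': ['Country A', 'Region X', 'Country B'],
    'Code': ['AAA', np.nan, 'BBB'],
    'Year': [2000, 2000, 2000],
    'Measles': [10.0, 50.0, np.nan],
    'Malaria': [5.0, 20.0, np.nan],
})
cause_cols = ['Measles', 'Malaria']

# Any null in any column removes the row: Region X and Country B both go.
dropped_any = toy.dropna(axis=0)

# Only nulls in the cause columns count: Region X is kept.
dropped_subset = toy.dropna(subset=cause_cols)

print(list(dropped_any['Entity']))
print(list(dropped_subset['Entity']))
```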


#### Since 2020 is not available and the years before 2000 add little to recent conclusions and recommendations, rows for those earlier years are dropped. This also aligns the table with the other datasets, which cover fewer years.

In [489]:
cause.drop(cause.index[cause['Year'] < 2000], inplace=True)
cleaned_cause = cause
cleaned_cause.head()

Unnamed: 0,Entity,Code,Year,Invasive Non-typhoidal Salmonella (iNTS),Interpersonal violence,Nutritional deficiencies,Acute hepatitis,Neoplasms,Measles,Digestive diseases,...,Other neonatal disorders,Whooping cough,Diarrheal diseases,"Fire, heat, and hot substances",Road injuries,Tuberculosis,HIV/AIDS,Drowning,Malaria,Syphilis
10,Afghanistan,AFG,2000,130.89578,180.0,2526.0,1104.0,595.0,13818.0,716.0,...,10269.0,4154.0,10960.0,180.0,1049.0,1048.0,30.0,1036.0,71.0,267.957457
11,Afghanistan,AFG,2001,133.656719,179.0,2426.0,1065.0,585.0,13944.0,715.0,...,10557.0,3962.0,11073.0,173.0,994.0,1000.0,30.0,988.0,54.0,258.439617
12,Afghanistan,AFG,2002,139.812677,191.0,2268.0,1011.0,571.0,13266.0,705.0,...,11001.0,3820.0,11045.0,161.0,915.0,919.0,31.0,906.0,740.0,267.00793
13,Afghanistan,AFG,2003,181.026615,219.0,2478.0,1128.0,680.0,4295.0,821.0,...,11526.0,4241.0,12009.0,189.0,1080.0,1006.0,32.0,1078.0,589.0,283.04379
14,Afghanistan,AFG,2004,203.233435,222.0,2467.0,1129.0,714.0,1735.0,854.0,...,11800.0,4095.0,11925.0,195.0,1107.0,992.0,33.0,1117.0,209.0,280.265432
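
In the cell above, `cleaned_cause = cause` binds a second name to the same DataFrame rather than creating a copy, so later in-place changes to one are visible through the other. A boolean filter followed by `.copy()` is one way to get an independent table; the sketch below uses a made-up toy frame and hypothetical variable names.

```python
import pandas as pd

# Toy data standing in for the cause table (values are made up).
toy = pd.DataFrame({
    'Entity': ['Country A', 'Country A', 'Country A'],
    'Year': [1999, 2000, 2001],
    'Measles': [12.0, 10.0, 9.0],
})

alias = toy                                # same object under a second name
recent = toy[toy['Year'] >= 2000].copy()   # independent filtered copy

print(alias is toy)
print(list(recent['Year']))
```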


In [490]:
cleaned_cause = cleaned_cause.rename(columns={"Entity": "Country Name", "Code": "Country Code"})
cleaned_cause.head()

Unnamed: 0,Country Name,Country Code,Year,Invasive Non-typhoidal Salmonella (iNTS),Interpersonal violence,Nutritional deficiencies,Acute hepatitis,Neoplasms,Measles,Digestive diseases,...,Other neonatal disorders,Whooping cough,Diarrheal diseases,"Fire, heat, and hot substances",Road injuries,Tuberculosis,HIV/AIDS,Drowning,Malaria,Syphilis
10,Afghanistan,AFG,2000,130.89578,180.0,2526.0,1104.0,595.0,13818.0,716.0,...,10269.0,4154.0,10960.0,180.0,1049.0,1048.0,30.0,1036.0,71.0,267.957457
11,Afghanistan,AFG,2001,133.656719,179.0,2426.0,1065.0,585.0,13944.0,715.0,...,10557.0,3962.0,11073.0,173.0,994.0,1000.0,30.0,988.0,54.0,258.439617
12,Afghanistan,AFG,2002,139.812677,191.0,2268.0,1011.0,571.0,13266.0,705.0,...,11001.0,3820.0,11045.0,161.0,915.0,919.0,31.0,906.0,740.0,267.00793
13,Afghanistan,AFG,2003,181.026615,219.0,2478.0,1128.0,680.0,4295.0,821.0,...,11526.0,4241.0,12009.0,189.0,1080.0,1006.0,32.0,1078.0,589.0,283.04379
14,Afghanistan,AFG,2004,203.233435,222.0,2467.0,1129.0,714.0,1735.0,854.0,...,11800.0,4095.0,11925.0,195.0,1107.0,992.0,33.0,1117.0,209.0,280.265432
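
After the rename, the cause table uses the same key names as the mortality table, but the two are shaped differently: mortality has one column per year, while the cause table has one row per country and year. One possible way to align them for a merge is to reshape the mortality table to long form with `melt`. The toy frames and the column name `Under-5 mortality rate` below are assumptions for illustration.

```python
import pandas as pd

# Toy wide table standing in for cleaned_mortality (values are made up).
wide = pd.DataFrame({
    'Country Name': ['Country A', 'Country B'],
    'Country Code': ['AAA', 'BBB'],
    '2000': [100.0, 50.0],
    '2001': [95.0, 48.0],
})

# Toy long table standing in for cleaned_cause (values are made up).
long_cause = pd.DataFrame({
    'Country Code': ['AAA', 'AAA', 'BBB'],
    'Year': [2000, 2001, 2000],
    'Measles': [10.0, 9.0, 4.0],
})

# One row per country and year instead of one column per year.
long_mortality = wide.melt(
    id_vars=['Country Name', 'Country Code'],
    var_name='Year',
    value_name='Under-5 mortality rate',
)
# Year labels come from column names, so convert them to integers.
long_mortality['Year'] = long_mortality['Year'].astype(int)

merged = long_mortality.merge(long_cause, on=['Country Code', 'Year'], how='inner')
print(merged)
```

Merging on `Country Code` rather than `Country Name` sidesteps spelling differences between sources, such as `Czech Republic` versus `Czechia`.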


In [491]:
cleaned_cause.isnull().sum()*100/cleaned_cause.shape[0]

Country Name                                                  0.0
Country Code                                                  0.0
Year                                                          0.0
Invasive Non-typhoidal Salmonella (iNTS)                      0.0
Interpersonal violence                                        0.0
Nutritional deficiencies                                      0.0
Acute hepatitis                                               0.0
Neoplasms                                                     0.0
Measles                                                       0.0
Digestive diseases                                            0.0
Cirrhosis and other chronic liver diseases                    0.0
Chronic kidney disease                                        0.0
Cardiovascular diseases                                       0.0
Congenital birth defects                                      0.0
Lower respiratory infections                                  0.0
Neonatal p

In [492]:
cleaned_cause['Country Name'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra',
       'Angola', 'Antigua and Barbuda', 'Argentina', 'Armenia',
       'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bermuda', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina',
       'Botswana', 'Brazil', 'Brunei', 'Bulgaria', 'Burkina Faso',
       'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
       'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia',
       'Comoros', 'Congo', 'Cook Islands', 'Costa Rica', "Cote d'Ivoire",
       'Croatia', 'Cuba', 'Cyprus', 'Czechia',
       'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica',
       'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
       'Equatorial Guinea', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia',
       'Fiji', 'Finland', 'France', 'Gabon', 'Gambia', 'Georgia',
       'Germany', 'Ghana', 'Greece', 'Greenland', 'G

In [493]:
cleaned_cause.to_csv('Databases/cleaned_cause.csv')

<hr>

### Malnutrition across the globe (Data of countries from 1983-2019)

In [494]:
malnutrition = pd.read_csv('Databases/malnutrition_country_avg.csv')
malnutrition.head()

| | Country | Income Classification | Severe Wasting | Wasting | Overweight | Stunting | Underweight | U5 Population ('000s) |
|---|---|---|---|---|---|---|---|---|
| 0 | AFGHANISTAN | 0 | 3.033333 | 10.35 | 5.125 | 47.775 | 30.375 | 4918.5615 |
| 1 | ALBANIA | 2 | 4.075 | 7.76 | 20.8 | 24.16 | 7.7 | 232.8598 |
| 2 | ALGERIA | 2 | 2.733333 | 5.942857 | 12.833333 | 19.571429 | 7.342857 | 3565.213143 |
| 3 | ANGOLA | 1 | 2.4 | 6.933333 | 2.55 | 42.633333 | 23.6 | 3980.054 |
| 4 | ARGENTINA | 2 | 0.2 | 2.15 | 11.125 | 10.025 | 2.6 | 3613.65175 |


In [495]:
malnutrition.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 152 entries, 0 to 151
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country                152 non-null    object 
 1   Income Classification  152 non-null    int64  
 2   Severe Wasting         140 non-null    float64
 3   Wasting                150 non-null    float64
 4   Overweight             149 non-null    float64
 5   Stunting               151 non-null    float64
 6   Underweight            150 non-null    float64
 7   U5 Population ('000s)  152 non-null    float64
dtypes: float64(6), int64(1), object(1)
memory usage: 9.6+ KB


In [496]:
malnutrition.describe()

| | Income Classification | Severe Wasting | Wasting | Overweight | Stunting | Underweight | U5 Population ('000s) |
|---|---|---|---|---|---|---|---|
| count | 152.0 | 140.0 | 150.0 | 149.0 | 151.0 | 150.0 | 152.0 |
| mean | 1.427632 | 2.16865 | 6.599257 | 7.201638 | 25.814728 | 13.503047 | 4042.927052 |
| std | 0.967019 | 1.708939 | 4.481723 | 4.649144 | 14.686807 | 10.895839 | 13164.191928 |
| min | 0.0 | 0.0 | 0.0 | 0.9625 | 1.0 | 0.1 | 1.0 |
| 25% | 1.0 | 0.9 | 3.2625 | 3.85 | 13.485 | 4.305 | 241.765813 |
| 50% | 1.0 | 1.8725 | 5.710714 | 6.3 | 24.16 | 10.38 | 981.233486 |
| 75% | 2.0 | 2.822727 | 8.740476 | 9.08 | 36.564935 | 19.496875 | 3002.43308 |
| max | 3.0 | 11.4 | 23.65 | 26.5 | 57.6 | 46.266667 | 123014.491 |


In [497]:
malnutrition.describe(include=object)

| | Country |
|---|---|
| count | 152 |
| unique | 152 |
| top | AFGHANISTAN |
| freq | 1 |


In [498]:
malnutrition.isnull().sum()*100/malnutrition.shape[0]

Country                  0.000000
Income Classification    0.000000
Severe Wasting           7.894737
Wasting                  1.315789
Overweight               1.973684
Stunting                 0.657895
Underweight              1.315789
U5 Population ('000s)    0.000000
dtype: float64

In [499]:
malnutrition.duplicated().sum()

0

In [500]:
malnutrition.dropna(axis=0, inplace=True)
cleaned_malnutrition = malnutrition
cleaned_malnutrition = cleaned_malnutrition.rename(columns={"Country": "Country Name"})
cleaned_malnutrition['Country Name'] = cleaned_malnutrition['Country Name'].str.title()
cleaned_malnutrition.head()

| | Country Name | Income Classification | Severe Wasting | Wasting | Overweight | Stunting | Underweight | U5 Population ('000s) |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 3.033333 | 10.35 | 5.125 | 47.775 | 30.375 | 4918.5615 |
| 1 | Albania | 2 | 4.075 | 7.76 | 20.8 | 24.16 | 7.7 | 232.8598 |
| 2 | Algeria | 2 | 2.733333 | 5.942857 | 12.833333 | 19.571429 | 7.342857 | 3565.213143 |
| 3 | Angola | 1 | 2.4 | 6.933333 | 2.55 | 42.633333 | 23.6 | 3980.054 |
| 4 | Argentina | 2 | 0.2 | 2.15 | 11.125 | 10.025 | 2.6 | 3613.65175 |


In [501]:
cleaned_malnutrition.isnull().sum()*100/cleaned_malnutrition.shape[0]

Country Name             0.0
Income Classification    0.0
Severe Wasting           0.0
Wasting                  0.0
Overweight               0.0
Stunting                 0.0
Underweight              0.0
U5 Population ('000s)    0.0
dtype: float64

In [502]:
cleaned_malnutrition['Country Name'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
       'Armenia', 'Australia', 'Azerbaijan', 'Bangladesh', 'Barbados',
       'Belarus', 'Belize', 'Benin', 'Bhutan', 'Bolivia',
       'Bosnia And Herzegovina', 'Botswana', 'Brazil',
       'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi',
       'Cambodia', 'Cameroon', 'Central African Republic', 'Chad',
       'China', 'Colombia', 'Comoros', 'Congo', "Cote D'Ivoire",
       'Czechia', "Democratic People'S Rep. Of Korea",
       'Democratic Rep. Of The Congo', 'Djibouti', 'Dominican Republic',
       'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea',
       'Eswatini', 'Ethiopia', 'Fiji', 'Gabon', 'Gambia', 'Georgia',
       'Germany', 'Ghana', 'Guatemala', 'Guinea', 'Guinea-Bissau',
       'Guyana', 'Haiti', 'Honduras', 'India', 'Indonesia', 'Iran',
       'Iraq', 'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya',
       'Kuwait', 'Kyrgyzstan', "Lao People'S Democratic Rep.", 'Lebanon',
  

In [503]:
countries = pd.read_csv('Databases/countrynamesncodes.csv')
countries.head()

| | Country Name | Country Code |
|---|---|---|
| 0 | Afghanistan | AFG |
| 1 | Albania | ALB |
| 2 | Algeria | DZA |
| 3 | American Samoa | ASM |
| 4 | Andorra | AND |


#### Since the table does not contain a country code column, it is merged with a country code dataset from IBAN using Alpha-3 codes. See the Datasets section for the source reference.

In [504]:
cleaned_malnutrition = pd.merge(cleaned_malnutrition, countries, how='left', on='Country Name')
column_to_reorder = cleaned_malnutrition.pop('Country Code')
cleaned_malnutrition.insert(1, 'Country Code', column_to_reorder)
cleaned_malnutrition.head()

| | Country Name | Country Code | Income Classification | Severe Wasting | Wasting | Overweight | Stunting | Underweight | U5 Population ('000s) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 0 | 3.033333 | 10.35 | 5.125 | 47.775 | 30.375 | 4918.5615 |
| 1 | Albania | ALB | 2 | 4.075 | 7.76 | 20.8 | 24.16 | 7.7 | 232.8598 |
| 2 | Algeria | DZA | 2 | 2.733333 | 5.942857 | 12.833333 | 19.571429 | 7.342857 | 3565.213143 |
| 3 | Angola | AGO | 1 | 2.4 | 6.933333 | 2.55 | 42.633333 | 23.6 | 3980.054 |
| 4 | Argentina | ARG | 2 | 0.2 | 2.15 | 11.125 | 10.025 | 2.6 | 3613.65175 |


In [505]:
cleaned_malnutrition.isnull().sum()*100/cleaned_malnutrition.shape[0]

Country Name              0.000000
Country Code             17.142857
Income Classification     0.000000
Severe Wasting            0.000000
Wasting                   0.000000
Overweight                0.000000
Stunting                  0.000000
Underweight               0.000000
U5 Population ('000s)     0.000000
dtype: float64
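
To see which country names failed to match, the merge can be rerun with `indicator=True`, which adds a `_merge` column marking rows found only in the left table. `str.title()` capitalizes every word (for example `Bosnia And Herzegovina`), so some misses may come down to casing alone. The sketch below uses made-up toy frames; the spelling assumed for the country code file and the `name_fixes` mapping are hypothetical and would need to be checked against the actual data.

```python
import pandas as pd

# Toy data standing in for cleaned_malnutrition (values are made up).
toy_malnutrition = pd.DataFrame({
    'Country Name': ['Albania', 'Bosnia And Herzegovina'],
    'Stunting': [24.0, 10.0],
})

# Toy data standing in for the country code table (spelling is an assumption).
toy_countries = pd.DataFrame({
    'Country Name': ['Albania', 'Bosnia and Herzegovina'],
    'Country Code': ['ALB', 'BIH'],
})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left merge.
checked = toy_malnutrition.merge(
    toy_countries, how='left', on='Country Name', indicator=True
)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Country Name'].tolist()
print(unmatched)

# Hypothetical manual corrections, applied before merging again.
name_fixes = {'Bosnia And Herzegovina': 'Bosnia and Herzegovina'}
fixed = toy_malnutrition.copy()
fixed['Country Name'] = fixed['Country Name'].replace(name_fixes)
remerged = fixed.merge(toy_countries, how='left', on='Country Name')
print(remerged)
```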

#### About 17.14% of the rows have no matching country code after the merge, so these countries will be excluded from any merges that rely on the code. They are not dropped, however, as the malnutrition table will mostly be used to determine the income classification of each country.

In [506]:
cleaned_malnutrition.to_csv('Databases/cleaned_malnutrition.csv')

<hr>
<hr>

## Data Visualization