# Life Expectancy Analysis & Modeling Using WHO, UN data

## Feature Engineering

To:&nbsp;&nbsp;&nbsp;&nbsp; [Magnimind](https://magnimindacademy.com/)

From: Matt Curcio, matt.curcio.us@gmail.com

Date: 2023-02-28

Re:&nbsp;&nbsp;&nbsp; Categorize countries into regions

---

## Executive Summary

This notebook simply bins 183 countries into 8 (subjective) regions.

This notebook follows the Unix philosophy that a program should do one thing and do it well.

---

## Introduction


- This list of countries and their regions was found on the site:

    https://www.thoughtco.com/official-listing-of-countries-world-region-1435153
    
- Q. How many countries are in EACH region?


- This file utilizes:
  
  Input: `Clean_LE_Data_w_Means_2.csv`

  Output: `Clean_LE_Regions.csv`


In [17]:
# Common Python Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [18]:
path = '../data/processed/'
filename = 'Clean_LE_Data_w_Means_2.csv'

df = pd.read_csv(path+filename, header=0)

## Display 183 Countries To Check For Spelling Differences

In [19]:
lst_countries = df.Country.unique()

print('\nNumber of countries evaluated in dataset:', len(lst_countries))

lst_countries


Number of countries evaluated in dataset: 183


array(['Afghanistan', 'Albania', 'Algeria', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
       'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh',
       'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bhutan',
       'Bolivia (Plurinational State of)', 'Bosnia and Herzegovina',
       'Botswana', 'Brazil', 'Brunei Darussalam', 'Bulgaria',
       'Burkina Faso', 'Burundi', "Côte d'Ivoire", 'Cabo Verde',
       'Cambodia', 'Cameroon', 'Canada', 'Central African Republic',
       'Chad', 'Chile', 'China', 'Colombia', 'Comoros', 'Congo',
       'Costa Rica', 'Croatia', 'Cuba', 'Cyprus', 'Czechia',
       "Democratic People's Republic of Korea",
       'Democratic Republic of the Congo', 'Denmark', 'Djibouti',
       'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
       'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Fiji',
       'Finland', 'France', 'Gabon', 'Gambia', 'Georgia', 'Germany',
       'Ghana', 'Greece', 'Grenad

### NOTE 1:

- This list of countries and their **EIGHT (8) regions** was found on the site:

    https://www.thoughtco.com/official-listing-of-countries-world-region-1435153

In [20]:
# 'Brunei Darussalam' added: it is the dataset's spelling of 'Brunei'
Asia = ['Bangladesh','Bhutan','Brunei','Brunei Darussalam','Cambodia','China','India','Indonesia','Japan','Kazakhstan',"Democratic People's Republic of Korea",'South Korea','Kyrgyzstan',"Lao People's Democratic Republic",'Malaysia','Maldives','Mongolia','Myanmar','Nepal','Philippines','Singapore','Sri Lanka','Taiwan','Tajikistan','Thailand','Turkmenistan','Uzbekistan','Viet Nam']

In [21]:
M_East_N_Africa = ['Afghanistan','Algeria','Azerbaijan','Bahrain','Egypt',"Iran (Islamic Republic of)",'Iraq','Israel','Jordan','Kuwait','Lebanon','Libya','Morocco','Oman','Pakistan','Qatar','Saudi Arabia','Somalia','Syrian Arab Republic','Tunisia','Turkey','United Arab Emirates','Yemen']

In [22]:
# 'Czechia' added: it is the dataset's spelling of 'Czech Republic'
Europe = ['Albania','Andorra','Armenia','Austria','Belarus','Belgium','Bosnia and Herzegovina','Bulgaria','Croatia','Cyprus','Czech Republic','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Kosovo','Latvia','Liechtenstein','Lithuania','Luxembourg','Yugoslav republic of Macedonia','Malta','Republic of Moldova','Monaco','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','San Marino','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Ukraine','United Kingdom of Great Britain and Northern Ireland','Vatican City']

In [23]:
N_America = ['Canada','Greenland','Mexico','United States of America']

In [24]:
C_America_Caribbean = ['Antigua and Barbuda','Bahamas','Barbados','Belize','Costa Rica','Cuba','Dominica','Dominican Republic','El Salvador','Grenada','Guatemala','Haiti','Honduras','Jamaica','Nicaragua','Panama','Saint Kitts and Nevis','Saint Lucia','Saint Vincent and the Grenadines','Trinidad and Tobago']

In [25]:
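# 'Bolivia (Plurinational State of)' added: it is the dataset's spelling of 'Bolivia'
S_America = ['Argentina','Bolivia','Bolivia (Plurinational State of)','Brazil','Chile','Colombia','Ecuador','Guyana','Paraguay','Peru','Suriname','Uruguay',"Venezuela (Bolivarian Republic of)"]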

In [26]:
# Dataset spellings added: 'Cabo Verde', 'Central African Republic', 'Congo'
Sub_Saharan_Africa = ['Angola','Benin','Botswana','Burkina Faso','Burundi','Cameroon','Cape Verde','Cabo Verde','The Central African Republic','Central African Republic','Chad','Comoros','Republic of the Congo','Congo','Democratic Republic of the Congo',"Côte d'Ivoire",'Djibouti','Equatorial Guinea','Eritrea','Ethiopia','Gabon','Gambia','Ghana','Guinea','Guinea-Bissau','Kenya','Lesotho','Liberia','Madagascar','Malawi','Mali','Mauritania','Mauritius','Mozambique','Namibia','Niger','Nigeria','Rwanda','Sao Tome and Principe','Senegal','Seychelles','Sierra Leone','South Africa','South Sudan','Sudan','Swaziland','Tanzania','Togo','Uganda','Zambia','Zimbabwe']

In [27]:
Oceania = ['Australia','Timor-Leste','Fiji','Kiribati','Marshall Islands','Micronesia (Federated States of)','Nauru','Niue','New Zealand','Palau','Papua New Guinea','Samoa','Solomon Islands','Tonga','Tuvalu','Vanuatu']
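
The region lists come from a different source than the dataset, so some spellings differ (the dataset prints 'Czechia', for example, while the source list says 'Czech Republic'). A minimal sketch of a set-difference check that surfaces such names is shown here. The names `find_unmatched`, `region_lists`, and `dataset_countries` are illustrative and not part of the notebook, and the small lists are hypothetical stand-ins for the eight lists above and for `df.Country.unique()`.

```python
def find_unmatched(dataset_countries, region_lists):
    """Return dataset country names that appear in none of the region lists."""
    known = set()
    for names in region_lists.values():
        known.update(names)
    return sorted(set(dataset_countries) - known)


# Hypothetical stand-ins; in the notebook these would be the eight lists
# defined above and the array returned by df.Country.unique().
region_lists = {
    'Europe': ['Albania', 'Czech Republic'],
    'S_America': ['Argentina', 'Bolivia'],
}
dataset_countries = ['Albania', 'Argentina', 'Czechia',
                     'Bolivia (Plurinational State of)']

unmatched = find_unmatched(dataset_countries, region_lists)
print(unmatched)
```

Any name this check reports would need either a corrected spelling in the relevant list or a deliberate decision about where it belongs.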

In [28]:
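def country_2_region(country):
    """Return the region label for a country name.

    There are EIGHT regions. Returned label (integer code):
    'Asia' (1),
    'N_Africa' (2), from the M_East_N_Africa list,
    'S_America' (3),
    'N_America' (4),
    'Europe' (5),
    'Oceania' (6),
    'Sub_Sahara' (7), from the Sub_Saharan_Africa list,
    'CAmerica_Caribbean' (8).

    NOTE: any name not found in the first seven lists falls through to
    'CAmerica_Caribbean', so check spellings before relying on that label.
    """

    if country in Asia:
        region = 'Asia'
    elif country in M_East_N_Africa:
        region = 'N_Africa'
    elif country in S_America:
        region = 'S_America'
    elif country in N_America:
        region = 'N_America'
    elif country in Europe:
        region = 'Europe'
    elif country in Oceania:
        region = 'Oceania'
    elif country in Sub_Saharan_Africa:
        region = 'Sub_Sahara'
    else:
        # Fallback: C_America_Caribbean members AND any unmatched spelling land here
        region = 'CAmerica_Caribbean'

    return region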



# Test with assertions

assert 'Asia'==country_2_region('Kazakhstan')

assert 'N_Africa'==country_2_region('Algeria')

assert 'S_America'==country_2_region('Bolivia')

assert 'N_America'==country_2_region('Canada')

assert 'Europe'==country_2_region('Andorra')

assert 'Oceania'==country_2_region('Samoa')

assert 'Sub_Sahara'==country_2_region('Botswana')

assert 'CAmerica_Caribbean'==country_2_region('Bahamas')

print('Good Job!')

Good Job!
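
The `if`/`elif` chain above sends every unrecognized name to its final branch. One alternative, sketched here under the assumption that an explicit 'Unknown' label is acceptable, is to invert the region lists into a single dictionary. The names `build_lookup`, `region_of`, and the toy `region_lists` are hypothetical and not part of the notebook.

```python
def build_lookup(region_lists):
    """Invert {region: [countries]} into {country: region}."""
    lookup = {}
    for region, names in region_lists.items():
        for name in names:
            lookup[name] = region
    return lookup


def region_of(country, lookup, default='Unknown'):
    """Return the region for a country, or `default` if the name is not listed."""
    return lookup.get(country, default)


# Toy stand-in for the eight lists defined earlier in the notebook.
region_lists = {
    'Asia': ['Kazakhstan', 'Japan'],
    'N_America': ['Canada'],
    'CAmerica_Caribbean': ['Bahamas'],
}
lookup = build_lookup(region_lists)

print(region_of('Canada', lookup))
print(region_of('Atlantis', lookup))
```

With pandas, `df['Country'].map(lookup)` would apply the same dictionary and leave `NaN` for names that are not listed, which makes unmatched countries easy to spot.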


In [29]:
df['Region'] = df['Country'].apply(country_2_region)

lst_regions = df.Region.unique()
lst_regions

array([2, 5, 7, 8, 3, 6, 1, 4])
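
The array above holds integer codes, whereas `country_2_region` as written returns string labels. A minimal sketch of how the labels could be converted to codes follows, assuming the codes listed in the function's docstring paired with the labels the function returns. The `REGION_CODES` dictionary and the toy frame are illustrative, not the notebook's own code.

```python
import pandas as pd

# Assumed pairing of returned labels with the docstring's integer codes.
REGION_CODES = {
    'Asia': 1, 'N_Africa': 2, 'S_America': 3, 'N_America': 4,
    'Europe': 5, 'Oceania': 6, 'Sub_Sahara': 7, 'CAmerica_Caribbean': 8,
}

# Toy frame standing in for df after the Region column has been added.
toy = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Zimbabwe'],
    'Region': ['N_Africa', 'Europe', 'Sub_Sahara'],
})

# Series.map with a dict replaces each label by its code.
toy['RegionCode'] = toy['Region'].map(REGION_CODES)
codes = toy['RegionCode'].tolist()
print(codes)
```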

## Save engineered data for analysis

In [30]:
path = '../data/processed/'
filename = 'Clean_LE_Regions.csv'

df.to_csv(path+filename, index=False)

df.tail(5)

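| Unnamed: 0 | Country | Year | Status | LifeExpectancy | AdultMort | InfD | EtOH | PercExpen | Measles | BMI | lt5yD | Polio | TotalExpen | DTP | HIV | Thin1_19y | Thin5_9y | Income | Education | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2923 | Zimbabwe | 2004 | 0 | 44.3 | 723.0 | 27 | 4.36 | 0.0 | 31 | 27.1 | 42 | 67.0 | 7.13 | 65.0 | 33.6 | 9.4 | 9.4 | 0.407 | 9.2 | 7 |
| 2924 | Zimbabwe | 2003 | 0 | 44.5 | 715.0 | 26 | 4.06 | 0.0 | 998 | 26.7 | 41 | 7.0 | 6.52 | 68.0 | 36.7 | 9.8 | 9.9 | 0.418 | 9.5 | 7 |
| 2925 | Zimbabwe | 2002 | 0 | 44.8 | 73.0 | 25 | 4.43 | 0.0 | 304 | 26.3 | 40 | 73.0 | 6.53 | 71.0 | 39.8 | 1.2 | 1.3 | 0.427 | 10.0 | 7 |
| 2926 | Zimbabwe | 2001 | 0 | 45.3 | 686.0 | 25 | 1.72 | 0.0 | 529 | 25.9 | 39 | 76.0 | 6.16 | 75.0 | 42.1 | 1.6 | 1.7 | 0.427 | 9.8 | 7 |
| 2927 | Zimbabwe | 2000 | 0 | 46.0 | 665.0 | 24 | 1.68 | 0.0 | 1483 | 25.5 | 39 | 78.0 | 7.1 | 78.0 | 43.5 | 11.0 | 11.2 | 0.434 | 9.8 | 7 |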
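
The Introduction asks how many countries fall in each region. Because the data holds one row per country per year, counting rows would overcount, so distinct country names have to be counted instead. The sketch below does that on a small hypothetical frame; `toy` and `countries_per_region` are illustrative names, and the same `groupby` call could be applied to `df`.

```python
import pandas as pd

# Toy stand-in for df: several yearly rows per country, plus a Region code.
toy = pd.DataFrame({
    'Country': ['Zimbabwe', 'Zimbabwe', 'Zambia', 'Canada', 'Canada'],
    'Year': [2001, 2000, 2000, 2001, 2000],
    'Region': [7, 7, 7, 4, 4],
})

# nunique() counts each country once per region, regardless of how many years it has.
countries_per_region = toy.groupby('Region')['Country'].nunique()
print(countries_per_region)
```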
