## Executive Summary

**Impetus**

Life expectancy data from the World Health Organization and the United Nations ([Life Expectancy-Kaggle](https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who)) were gathered for 193 countries over the years 2000-2015. The features fall into three categories:

1. *Communicable diseases*: Hepatitis B, Measles, Polio, Diphtheria, HIV/AIDS.


2. *Country-specific monetary data*: Gross domestic product, Governmental expenditure rates, Average personal income, Developmental status, Population, Education.


3. *Health-related information*: Infant and adult mortality rates, Number of deaths under 5 years of age, Alcohol consumption rates, Child-related malnutrition, Average body mass index per capita.


**Important Features**

The 21 features were culled to the six (6) deemed most important, which were then used in the final modeling of life expectancy:

1. Income
2. Education
3. HIV
4. DTP
5. Polio
6. AdultMort

**Results**

Four algorithms were tested on the data, with accuracy alone used as the benchmark:

| Model | Average % Accuracy (cv=5) |
|:------|--------------------------:|
| Gradient Boosting Regressor | 94.7 |
| Decision Tree Regressor     | 88.5 |
| Linear Regression         | 81.0 |
| Support Vector Regressor  | 19.0 |
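
The averaging behind the table can be sketched as follows. This is a minimal, standard-library illustration of 5-fold cross-validation, not the notebook's code: the data are synthetic, the model is a one-feature least-squares line, and the helper names (`k_fold_indices`, `fit_line`, `r_squared`, `cross_val_mean`) are hypothetical. Each fold is scored with R², which is what scikit-learn regressors report by default; whether the "accuracy" figures above are R² values is an assumption.

```python
import random
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys, preds):
    """Coefficient of determination on held-out data."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def cross_val_mean(xs, ys, k=5):
    """Fit on k-1 folds, score on the held-out fold, average the k scores."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        a, b = fit_line(train_x, train_y)
        preds = [a + b * xs[i] for i in test_idx]
        scores.append(r_squared([ys[i] for i in test_idx], preds))
    return mean(scores), scores


# Synthetic data (assumption): one predictor, linear signal plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 20) for _ in range(200)]
ys = [45 + 1.5 * x + rng.gauss(0, 3) for x in xs]

avg, per_fold = cross_val_mean(xs, ys, k=5)
print(f"mean R^2 over {len(per_fold)} folds: {avg:.3f}")
```

The folds here are contiguous because the synthetic rows are already in random order. Data sorted by country and year would normally be shuffled or grouped before splitting.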

---

## Introduction

In 2006, the World Health Organization published a report entitled *Preventing disease through healthy environments*.
>[It] “confirms that approximately one-quarter of the global disease burden, and more than one-third of the burden among children, is due to modifiable environmental factors.”
>
>[WHO-Preventing Disease Through Healthy Environments](https://www.who.int/publications/i/item/9241593822)

The dataset was obtained from Kaggle.com:
https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who

### Project Planning

- [Initial Data Analysis](2_Life_Expectancy_Initial_Data_Analysis.ipynb)
- [Exploratory Data Analysis](3_Life_Expectancy_EDA.ipynb)
- [Exploratory Data Analysis #2](4_Life_Expectancy_Exploratory_Data_Analysis_PART2.ipynb)
- [Pandas Profiling Library](../reports/Test_Pandas_Profiling_Report.ipynb) 
- [Feature Engineering](5_Life_Expectancy_Feature_Engineering.ipynb)
- [Recursive Feature Elimination](6_Life_Expectancy_Recursive_Feature_Elimination.ipynb)
- [Modeling](7_Life_Expectancy_Modeling.ipynb) 
- [Discussion of Linear Model](8-Life_Expectancy_Linear_Coefficients.ipynb)
- [Results](9-Life_Expectancy_Groupings.ipynb)

---


| **Response Variable** |      **Description** |
|:----------------------|:---------------------|
|    **LifeExpectancy** | Life Expectancy (Yr) |

**<p style="color:green;">Green indicates feature used in final model</p>**
**<p style="color:blue;">Blue indicates feature of questionable importance</p>**
**<p style="color:orange;">Orange indicates dropped due to high collinearity</p>**
**<p style="color:red;">Red indicates dropped due to missing values greater than 15%</p>**

| **List Of Features** | **Description** |
|:---------------------|:----------------|
| **<p style="color:green;">Income</p>** | Income composition of resources, Human Development Index |
| **<p style="color:green;">Education</p>** | Years of Education |
| **<p style="color:green;">HIV</p>** | HIV/AIDS: Deaths per 1,000 |
| **<p style="color:green;">DTP</p>** | Diphtheria, tetanus toxoid & pertussis % immunization coverage among 1-year-olds |
| **<p style="color:green;">Polio</p>** | Pol3: % immunization coverage among 1-year-olds |
| **<p style="color:green;">AdultMort</p>** | Adult Mortality Rates of both sexes (probability of dying between 15 and 60 years per 1000 population) |
| **<p style="color:blue;">Year</p>** |  Year   |
| **<p style="color:blue;">Country</p>** |   Country |
| **<p style="color:blue;">Status</p>** |  Developed(1) or Developing(0) |
| **<p style="color:blue;">BMI</p>** |   Average Body Mass Index of entire population |
| **<p style="color:blue;">lt5yD</p>** |  Number of under-five deaths per 1,000 population |
| **<p style="color:blue;">Thin1_19y</p>** |  % Prevalence of thinness among children and adolescents aged 10 to 19 years |
| **<p style="color:blue;">TotalExpen</p>** |    Total Expenditure |
| **<p style="color:blue;">PercExpen</p>** |   Percent Expenditure |
| **<p style="color:blue;">EtOH</p>** |  Alcohol consumption, litres of pure alcohol per capita  |
| **<p style="color:blue;">Measles</p>** |   Number of reported cases per 1,000 population |
| **<p style="color:orange;">Thin5_9y</p>** |  % Prevalence of thinness among children aged 5 to 9 years |
| **<p style="color:orange;">InfD</p>** |   Number of Infant Deaths per 1,000 population |
| **<p style="color:red;">HepB</p>** | Hepatitis B: % immunization coverage among 1-year-olds |
| **<p style="color:red;">Population</p>** | Population of country |
| **<p style="color:red;">GDP</p>** | Gross Domestic Product per capita (in USD) |
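
The 15% missing-value rule behind the red entries can be checked from non-null counts. The sketch below is an illustration, not the notebook's code: it uses three of the counts reported by `DataFrame.info()` for this dataset (2,938 rows; `LifeExpectancy` 2,928, `EtOH` 2,744, and `HepB` 2,385 non-null), and the names `missing_fraction` and `MAX_MISSING` are hypothetical.

```python
TOTAL_ROWS = 2938    # RangeIndex length reported by DataFrame.info()
MAX_MISSING = 0.15   # threshold stated in the colour legend (hypothetical name)

# Non-null counts copied from the DataFrame.info() output for three columns.
non_null = {"LifeExpectancy": 2928, "EtOH": 2744, "HepB": 2385}


def missing_fraction(count, total=TOTAL_ROWS):
    """Fraction of rows in which a column has no value."""
    return 1 - count / total


fractions = {col: missing_fraction(n) for col, n in non_null.items()}
to_drop = [col for col, frac in fractions.items() if frac > MAX_MISSING]

for col, frac in fractions.items():
    print(f"{col:<15} {frac:6.1%} missing")
print("dropped:", to_drop)
```

With the full DataFrame loaded in pandas, `df.isna().mean()` gives the same per-column fractions in one call.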



In [2]:
%run 2_Life_Expectancy_Initial_Data_Analysis.ipynb

1_MAIN_Life_Expectancy_WHO_UN_Analysis_Modeling.ipynb
2_Life_Expectancy_Initial_Data_Analysis.ipynb
3_Life_Expectancy_EDA.ipynb
4_Life_Expectancy_Exploratory_Data_Analysis_PART2.ipynb
5_Life_Expectancy_Feature_Engineering.ipynb
6_Life_Expectancy_Recursive_Feature_Elimination.ipynb
7_Life_Expectancy_Modeling.ipynb
8-Life_Expectancy_Linear_Coefficients.ipynb
9-Life_Expectancy_Groupings.ipynb
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2938 entries, 0 to 2937
Data columns (total 22 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country         2938 non-null   object 
 1   Year            2938 non-null   int64  
 2   Status          2938 non-null   object 
 3   LifeExpectancy  2928 non-null   float64
 4   AdultMort       2928 non-null   float64
 5   InfD            2938 non-null   int64  
 6   EtOH            2744 non-null   float64
 7   PercExpen       2938 non-null   float64
 8   HepB            2385 non-null   float64
 9   