## Implementing a Neural Network

Kaggle WHO dataset: https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who/data

The dataset aims to answer the following key questions:

1. Do the various predicting factors that were chosen initially really affect life expectancy?
2. Which predicting variables actually affect life expectancy?
3. Should a country with a lower life expectancy (<65) increase its healthcare expenditure in order to improve its average lifespan?
4. How do infant and adult mortality rates affect life expectancy?
5. Does life expectancy have a positive or negative correlation with eating habits, lifestyle, exercise, smoking, drinking alcohol, etc.?
6. What is the impact of schooling on the lifespan of humans?
7. Does life expectancy have a positive or negative relationship with drinking alcohol?
8. Do densely populated countries tend to have lower life expectancy?
9. What is the impact of immunization coverage on life expectancy?

Dataset details:

1. Country = Country
2. Year = Year of data
3. Status = Developed or developing country status
4. Life expectancy = Life expectancy in years
5. Adult Mortality = Adult mortality rate for both sexes (probability of dying between 15 and 60 years per 1000 population)
6. infant deaths = Number of infant deaths per 1000 population
7. Alcohol = Recorded per capita (15+) alcohol consumption (in litres of pure alcohol)
8. percentage expenditure = Expenditure on health as a percentage of Gross Domestic Product per capita (%)
9. Hepatitis B = Hepatitis B (HepB) immunization coverage among 1-year-olds (%)
10. Measles = Number of reported measles cases per 1000 population
11. BMI = Average Body Mass Index of the entire population
12. under-five deaths = Number of under-five deaths per 1000 population
13. Polio = Polio (Pol3) immunization coverage among 1-year-olds (%)
14. Total expenditure = General government expenditure on health as a percentage of total government expenditure (%)
15. Diphtheria = Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%)
16. HIV/AIDS = Deaths per 1000 live births due to HIV/AIDS (0-4 years)
17. GDP = Gross Domestic Product per capita (in USD)
18. Population = Population of the country
19. thinness 1-19 years = Prevalence of thinness among children and adolescents aged 10 to 19 (%)
20. thinness 5-9 years = Prevalence of thinness among children aged 5 to 9 (%)
21. Income composition of resources = Human Development Index in terms of income composition of resources (index ranging from 0 to 1)
22. Schooling = Number of years of schooling

#### 1. Import the dataset and relevant libraries

In [1]:
import pandas as pd
import numpy as np

In [4]:
# Load the data
data = pd.read_csv('Life Expectancy Data.csv')
print(data.head())

       Country  Year      Status  Life expectancy   Adult Mortality  \
0  Afghanistan  2015  Developing              65.0            263.0   
1  Afghanistan  2014  Developing              59.9            271.0   
2  Afghanistan  2013  Developing              59.9            268.0   
3  Afghanistan  2012  Developing              59.5            272.0   
4  Afghanistan  2011  Developing              59.2            275.0   

   infant deaths  Alcohol  percentage expenditure  Hepatitis B  Measles   ...  \
0             62     0.01               71.279624         65.0      1154  ...   
1             64     0.01               73.523582         62.0       492  ...   
2             66     0.01               73.219243         64.0       430  ...   
3             69     0.01               78.184215         67.0      2787  ...   
4             71     0.01                7.097109         68.0      3013  ...   

   Polio  Total expenditure  Diphtheria    HIV/AIDS         GDP  Population  \
0    6.

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2938 entries, 0 to 2937
Data columns (total 22 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Country                          2938 non-null   object 
 1   Year                             2938 non-null   int64  
 2   Status                           2938 non-null   object 
 3   Life expectancy                  2928 non-null   float64
 4   Adult Mortality                  2928 non-null   float64
 5   infant deaths                    2938 non-null   int64  
 6   Alcohol                          2744 non-null   float64
 7   percentage expenditure           2938 non-null   float64
 8   Hepatitis B                      2385 non-null   float64
 9   Measles                          2938 non-null   int64  
 10   BMI                             2904 non-null   float64
 11  under-five deaths                2938 non-null   int64  
 12  Polio               

#### 2. Preprocessing the dataset

In [6]:
# Check for missing values
print(data.isnull().sum())

Country                              0
Year                                 0
Status                               0
Life expectancy                     10
Adult Mortality                     10
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
 BMI                                34
under-five deaths                    0
Polio                               19
Total expenditure                  226
Diphtheria                          19
 HIV/AIDS                            0
GDP                                448
Population                         652
 thinness  1-19 years               34
 thinness 5-9 years                 34
Income composition of resources    167
Schooling                          163
dtype: int64
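
Raw counts are easier to compare when they are also expressed as a share of the rows. The sketch below is one possible way to do that; the helper name `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages, largest first (hypothetical helper)."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame standing in for the real data.
toy_missing = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [np.nan, 1.0, 2.0, 3.0],
})
missing_report = missing_summary(toy_missing)
print(missing_report)
```

Calling `missing_summary(data)` on the loaded frame would rank columns such as Population and Hepatitis B by how much of the data they are missing.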


In [7]:
# Rename columns
data.rename(columns = {'Country':'country', 'Year':'year', 'Status':'status', 'Life expectancy ':'life_expectancy', 'Adult Mortality':'adult_mortality', 'infant deaths':'infant_deaths', 'Alcohol':'alcohol', 'percentage expenditure':'percentage_expenditure', 'Hepatitis B':'hepatitis_b', 'Measles ':'measles', ' BMI ':'bmi', 'under-five deaths ':'under_five_deaths', 'Polio':'polio', 'Total expenditure':'total_expenditure', 'Diphtheria ':'diphtheria', ' HIV/AIDS':'hiv_aids', 'GDP':'gdp', 'Population':'population', ' thinness  1-19 years':'thinness_1_19_years', ' thinness 5-9 years':'thinness_5_9_years', 'Income composition of resources':'income_composition_of_resources', 'Schooling':'schooling'}, inplace = True)
print(data.columns)

Index(['country', 'year', 'status', 'life_expectancy', 'adult_mortality',
       'infant_deaths', 'alcohol', 'percentage_expenditure', 'hepatitis_b',
       'measles', 'bmi', 'under_five_deaths', 'polio', 'total_expenditure',
       'diphtheria', 'hiv_aids', 'gdp', 'population', 'thinness_1_19_years',
       'thinness_5_9_years', 'income_composition_of_resources', 'schooling'],
      dtype='object')
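
Several raw headers carry stray leading or trailing spaces (for example `'Life expectancy '` and `' BMI '`), which makes a hand-written mapping easy to get wrong. As an alternative, the names could be normalized programmatically. This is a minimal sketch; the helper `to_snake_case` is a hypothetical name, and the toy frame only reuses a few of the raw headers shown above.

```python
import re
import pandas as pd

def to_snake_case(name: str) -> str:
    """Lower-case a label and collapse runs of non-alphanumerics into '_' (hypothetical helper)."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

# Empty toy frame with a handful of the raw column labels, whitespace included.
demo = pd.DataFrame(columns=[
    "Life expectancy ",
    " BMI ",
    "under-five deaths ",
    " HIV/AIDS",
    " thinness  1-19 years",
])

# DataFrame.rename accepts a function and applies it to every column label.
demo = demo.rename(columns=to_snake_case)
print(list(demo.columns))
```

For these labels the result matches the names produced by the explicit dictionary, so either approach could be used.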


In [10]:
# Check for life expectancy missing values
print(data[data['life_expectancy'].isnull()])


                    country  year      status  life_expectancy  \
624            Cook Islands  2013  Developing              NaN   
769                Dominica  2013  Developing              NaN   
1650       Marshall Islands  2013  Developing              NaN   
1715                 Monaco  2013  Developing              NaN   
1812                  Nauru  2013  Developing              NaN   
1909                   Niue  2013  Developing              NaN   
1958                  Palau  2013  Developing              NaN   
2167  Saint Kitts and Nevis  2013  Developing              NaN   
2216             San Marino  2013  Developing              NaN   
2713                 Tuvalu  2013  Developing              NaN   

      adult_mortality  infant_deaths  alcohol  percentage_expenditure  \
624               NaN              0     0.01                0.000000   
769               NaN              0     0.01               11.419555   
1650              NaN              0     0.01         

In [18]:
# Check missing adult mortality values
print(data[data['adult_mortality'].isnull()])

                    country  year      status  life_expectancy  \
624            Cook Islands  2013  Developing              NaN   
769                Dominica  2013  Developing              NaN   
1650       Marshall Islands  2013  Developing              NaN   
1715                 Monaco  2013  Developing              NaN   
1812                  Nauru  2013  Developing              NaN   
1909                   Niue  2013  Developing              NaN   
1958                  Palau  2013  Developing              NaN   
2167  Saint Kitts and Nevis  2013  Developing              NaN   
2216             San Marino  2013  Developing              NaN   
2713                 Tuvalu  2013  Developing              NaN   

      adult_mortality  infant_deaths  alcohol  percentage_expenditure  \
624               NaN              0     0.01                0.000000   
769               NaN              0     0.01               11.419555   
1650              NaN              0     0.01         

The two outputs above show that life expectancy and adult mortality are missing in the same ten rows, all of them from 2013. For these missing 2013 values, we can impute using the median of the 2012 and 2014 data.
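
The imputation idea can be sketched as follows. This is only one reading of it: the sketch assumes the median is taken per country over that country's 2012 and 2014 rows, and the function name `fill_from_neighbor_years` and the toy data are hypothetical. A country with no 2012 or 2014 rows is left as NaN and would need a different strategy, such as a median over all countries.

```python
import numpy as np
import pandas as pd

def fill_from_neighbor_years(df, column, year=2013, neighbors=(2012, 2014)):
    """Fill NaNs in `column` for `year` with the per-country median of the neighbor years."""
    out = df.copy()
    # Median of the neighboring years, computed separately for each country.
    medians = (
        out[out["year"].isin(neighbors)]
        .groupby("country")[column]
        .median()
    )
    # Only touch rows that belong to the target year and are actually missing.
    mask = (out["year"] == year) & out[column].isna()
    # Countries absent from `medians` map to NaN and therefore stay missing.
    out.loc[mask, column] = out.loc[mask, "country"].map(medians)
    return out

# Toy data: country A has neighbor years, country B has only a 2013 row.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "year": [2012, 2013, 2014, 2013],
    "life_expectancy": [70.0, np.nan, 72.0, np.nan],
})
filled = fill_from_neighbor_years(toy, "life_expectancy")
print(filled)
```

The same call could be repeated with `"adult_mortality"` on the renamed `data` frame. Checking `filled[column].isna().sum()` afterwards shows whether any rows could not be filled this way.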