# First deliverable of the final project

## Context
Many past studies have examined the factors that affect life expectancy using demographic variables, income composition, and mortality rates. However, those studies did not account for the effect of immunization or the Human Development Index. In addition, some of the earlier work fitted a multiple linear regression to a single year of data for all countries. This motivates addressing both gaps by formulating a regression model based on a mixed-effects model and a multiple linear regression, using data from 2000 to 2015 for all countries. Important immunizations such as hepatitis B, polio, and diphtheria are also considered. In short, this study focuses on immunization, mortality, economic, and social factors, as well as other health-related factors. Because the observations in this dataset are organized by country, it becomes easier for a country to identify the predictor that contributes most to a lower life expectancy. This in turn helps suggest which area a country should prioritize to improve the life expectancy of its population efficiently.
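
For reference, one common way to write the mixed-effects model mentioned above is a random-intercept model per country. The context does not state which random-effects structure is intended, so the form below is an assumption, and the symbols ($y_{it}$, $\mathbf{x}_{it}$, $u_i$) are introduced here only for illustration:

$$
y_{it} = \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_{it} + u_i + \varepsilon_{it},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
$$

Here $y_{it}$ would be the life expectancy of country $i$ in year $t$ (2000 to 2015), $\mathbf{x}_{it}$ the vector of predictors (immunization, mortality, economic, and social factors), and $u_i$ a country-specific intercept. Dropping $u_i$ gives the ordinary multiple linear regression.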

## With this data in mind, we can pose the following questions
- Is there a correlation between immunization rates (Hepatitis B, Polio, Diphtheria) and life expectancy across countries during the 2000 to 2015 period?
- How does the Human Development Index (HDI) relate to life expectancy and adult mortality over the years studied?
- Is there a connection between health spending (represented by the percentage of GDP devoted to health and by total expenditure) and the incidence of deadly diseases such as HIV/AIDS in countries at different economic levels?
- How does the economic situation (GDP per capita) influence malnutrition (thinness) in different age groups (1-19 years and 5-9 years) across countries?
- How do education levels (schooling) and income composition correlate with life expectancy, considering the distribution by country and the passage of time?
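
Several of these questions ask how a relationship behaves over the years. As a minimal sketch of one way to approach that, the snippet below computes one Pearson correlation per year on a small synthetic frame. The frame `toy`, its values, and the simplified column names are assumptions made for illustration. They are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic panel data (hypothetical values, NOT from LifeExpectancyData2015.csv).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Year": np.repeat([2000, 2001, 2002], 50),
    "Polio": rng.uniform(40, 99, size=150),
})
# Build a target that depends on Polio coverage plus random noise.
toy["Life expectancy"] = 50 + 0.25 * toy["Polio"] + rng.normal(0, 3, size=150)

# One Pearson correlation between coverage and life expectancy per year.
corr_by_year = pd.Series(
    {
        year: group["Polio"].corr(group["Life expectancy"])
        for year, group in toy.groupby("Year")
    },
    name="pearson_r",
)
print(corr_by_year)
```

The same pattern could be applied to the real frame by swapping in its column names, keeping in mind that a correlation describes association and not causation.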

In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [14]:
df = pd.read_csv('./LifeExpectancyData2015.csv')

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2938 entries, 0 to 2937
Data columns (total 22 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Country                          2938 non-null   object 
 1   Year                             2938 non-null   int64  
 2   Status                           2938 non-null   object 
 3   Life expectancy                  2928 non-null   float64
 4   Adult Mortality                  2928 non-null   float64
 5   infant deaths                    2938 non-null   int64  
 6   Alcohol                          2744 non-null   float64
 7   percentage expenditure           2938 non-null   float64
 8   Hepatitis B                      2385 non-null   float64
 9   Measles                          2938 non-null   int64  
 10   BMI                             2904 non-null   float64
 11  under-five deaths                2938 non-null   int64  
 12  Polio               

In [16]:
df.shape

(2938, 22)
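
Several column names in this CSV carry stray whitespace (for example `'Life expectancy '`, `' BMI '`, and `' thinness  1-19 years'`, as used in the imputation cell further down). A minimal sketch of how such headers could be normalized is shown below on a hypothetical one-row frame. It is not applied to `df`, because the later cells in this notebook refer to the original, padded names.

```python
import pandas as pd

# Hypothetical frame that mimics the padded headers seen in this dataset.
toy = pd.DataFrame({
    "Life expectancy ": [65.0],
    " BMI ": [19.1],
    " thinness  1-19 years": [17.2],
})

# Strip outer whitespace and collapse internal runs of spaces into one.
clean_columns = toy.columns.str.strip().str.replace(r"\s+", " ", regex=True)

# Work on a copy so the original frame keeps its names.
toy_clean = toy.copy()
toy_clean.columns = clean_columns
print(list(toy_clean.columns))
```

If this were applied to `df` itself, every later cell that indexes by the padded names would need to be updated accordingly.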

## Columns with null values

In [17]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                     10
Adult Mortality                     10
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
 BMI                                34
under-five deaths                    0
Polio                               19
Total expenditure                  226
Diphtheria                          19
 HIV/AIDS                            0
GDP                                448
Population                         652
 thinness  1-19 years               34
 thinness 5-9 years                 34
Income composition of resources    167
Schooling                          163
dtype: int64
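
Raw counts are easier to judge as a share of the 2938 rows: Population is missing in roughly 22.2% of rows (652), Hepatitis B in about 18.8% (553), and GDP in about 15.2% (448). A minimal sketch of how that share could be computed is shown below on a small hypothetical frame (`toy` and its values are assumptions for illustration only).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps.
toy = pd.DataFrame({
    "GDP": [1.0, np.nan, 3.0, np.nan],
    "Schooling": [10.0, 11.0, np.nan, 12.0],
    "Year": [2000, 2001, 2002, 2003],
})

# isnull().mean() gives the fraction of missing values per column.
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)

# Keep only the columns that actually have gaps.
missing_pct = missing_pct[missing_pct > 0]
print(missing_pct)
```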

In [18]:
# Replace the null values with the mean of each column
from sklearn.impute import SimpleImputer

# Column names are copied verbatim, including the stray whitespace in the CSV header
cols_with_nulls = [
    'Life expectancy ', 'Adult Mortality', 'Alcohol', 'Hepatitis B', ' BMI ',
    'Polio', 'Total expenditure', 'Diphtheria ', 'GDP', 'Population',
    ' thinness  1-19 years', ' thinness 5-9 years',
    'Income composition of resources', 'Schooling',
]

# SimpleImputer computes one mean per column, so all columns can be filled at once
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
df[cols_with_nulls] = imputer.fit_transform(df[cols_with_nulls])
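
The cell above fills every gap with the global column mean, pooling all countries and years. The effect is visible in the `df.describe()` output further down: the 25% quantile of Hepatitis B (80.940461) and the 75% quantiles of GDP (7483.158469) and Population equal their respective means, which suggests that many rows now hold the same filled-in value. One possible alternative, shown here only as a sketch and not as the method used in this notebook, is to fill gaps with the mean of the same country and fall back to the global mean when a country has no observed value. The frame `toy` and the column `GDP_imputed` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: country C has no observed GDP at all.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "C"],
    "GDP": [100.0, np.nan, 300.0, 1000.0, np.nan, np.nan],
})

# Step 1: fill each gap with the mean of the same country.
country_mean = toy.groupby("Country")["GDP"].transform("mean")
gdp_filled = toy["GDP"].fillna(country_mean)

# Step 2: countries without any observed value fall back to the global mean.
gdp_filled = gdp_filled.fillna(toy["GDP"].mean())

toy["GDP_imputed"] = gdp_filled
print(toy)
```

Whether per-country means are more appropriate than global means depends on the modeling goal. With a mixed-effects model, keeping country-level differences intact is usually a reasonable concern.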

In [19]:
df.isnull().sum()

Country                            0
Year                               0
Status                             0
Life expectancy                    0
Adult Mortality                    0
infant deaths                      0
Alcohol                            0
percentage expenditure             0
Hepatitis B                        0
Measles                            0
 BMI                               0
under-five deaths                  0
Polio                              0
Total expenditure                  0
Diphtheria                         0
 HIV/AIDS                          0
GDP                                0
Population                         0
 thinness  1-19 years              0
 thinness 5-9 years                0
Income composition of resources    0
Schooling                          0
dtype: int64

In [20]:
df.shape

(2938, 22)

In [21]:
df.describe()

Unnamed: 0,Year,Life expectancy,Adult Mortality,infant deaths,Alcohol,percentage expenditure,Hepatitis B,Measles,BMI,under-five deaths,Polio,Total expenditure,Diphtheria,HIV/AIDS,GDP,Population,thinness 1-19 years,thinness 5-9 years,Income composition of resources,Schooling
count,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0,2938.0
mean,2007.51872,69.224932,164.796448,30.303948,4.602861,738.251295,80.940461,2419.59224,38.321247,42.035739,82.550188,5.93819,82.324084,1.742103,7483.158469,12753380.0,4.839704,4.870317,0.627551,11.992793
std,4.613841,9.50764,124.080302,117.926501,3.916288,1987.914858,22.586855,11467.272489,19.927677,160.445548,23.352143,2.400274,23.640073,5.077785,13136.800417,53815460.0,4.394535,4.482708,0.20482,3.264381
min,2000.0,36.3,1.0,0.0,0.01,0.0,1.0,0.0,1.0,0.0,3.0,0.37,2.0,0.1,1.68135,34.0,0.1,0.1,0.0,0.0
25%,2004.0,63.2,74.0,0.0,1.0925,4.685343,80.940461,0.0,19.4,0.0,78.0,4.37,78.0,0.1,580.486996,418917.2,1.6,1.6,0.50425,10.3
50%,2008.0,72.0,144.0,3.0,4.16,64.912906,87.0,17.0,43.0,4.0,93.0,5.93819,93.0,0.1,3116.561755,3675929.0,3.4,3.4,0.662,12.1
75%,2012.0,75.6,227.0,22.0,7.39,441.534144,96.0,360.25,56.1,28.0,97.0,7.33,97.0,0.8,7483.158469,12753380.0,7.1,7.2,0.772,14.1
max,2015.0,89.0,723.0,1800.0,17.87,19479.91161,99.0,212183.0,87.3,2500.0,99.0,17.6,99.0,50.6,119172.7418,1293859000.0,27.7,28.6,0.948,20.7


## Looking for correlations

In [29]:
columns_to_drop = ['Country', 'Status']
df_filtered = df.drop(columns=columns_to_drop)
df_filtered.corr()

Unnamed: 0,Year,Life expectancy,Adult Mortality,infant deaths,Alcohol,percentage expenditure,Hepatitis B,Measles,BMI,under-five deaths,Polio,Total expenditure,Diphtheria,HIV/AIDS,GDP,Population,thinness 1-19 years,thinness 5-9 years,Income composition of resources,Schooling
Year,1.0,0.169623,-0.078861,-0.037415,-0.048168,0.0314,0.089398,-0.082493,0.108327,-0.042937,0.09382,0.08186,0.133853,-0.139741,0.093351,0.014951,-0.047592,-0.050627,0.236333,0.203471
Life expectancy,0.169623,1.0,-0.696359,-0.196535,0.391598,0.381791,0.203771,-0.157574,0.559255,-0.222503,0.461574,0.207981,0.475418,-0.556457,0.430493,-0.019638,-0.472162,-0.466629,0.692483,0.715066
Adult Mortality,-0.078861,-0.696359,1.0,0.078747,-0.190408,-0.242814,-0.138591,0.031174,-0.381449,0.094135,-0.272694,-0.110875,-0.273014,0.523727,-0.277053,-0.012501,0.299863,0.305366,-0.440062,-0.435108
infant deaths,-0.037415,-0.196535,0.078747,1.0,-0.113812,-0.085612,-0.178783,0.501128,-0.22722,0.996629,-0.170674,-0.126564,-0.175156,0.025231,-0.107109,0.548522,0.46559,0.471228,-0.143663,-0.191757
Alcohol,-0.048168,0.391598,-0.190408,-0.113812,1.0,0.339634,0.075447,-0.051055,0.31807,-0.110777,0.213744,0.294898,0.215242,-0.04865,0.318591,-0.030765,-0.416946,-0.405881,0.416099,0.497546
percentage expenditure,0.0314,0.381791,-0.242814,-0.085612,0.339634,1.0,0.011679,-0.056596,0.228537,-0.087852,0.147203,0.173414,0.14357,-0.097857,0.88814,-0.024648,-0.25119,-0.252725,0.380374,0.388105
Hepatitis B,0.089398,0.203771,-0.138591,-0.178783,0.075447,0.011679,1.0,-0.090317,0.134929,-0.184413,0.408519,0.050084,0.499958,-0.102405,0.062318,-0.109811,-0.105144,-0.108334,0.150992,0.171755
Measles,-0.082493,-0.157574,0.031174,0.501128,-0.051055,-0.056596,-0.090317,1.0,-0.175925,0.507809,-0.136146,-0.104569,-0.141861,0.030899,-0.06806,0.23625,0.224742,0.221007,-0.115764,-0.122609
BMI,0.108327,0.559255,-0.381449,-0.22722,0.31807,0.228537,0.134929,-0.175925,1.0,-0.237586,0.282156,0.231814,0.281059,-0.243548,0.276645,-0.063238,-0.532025,-0.538911,0.479837,0.508105
under-five deaths,-0.042937,-0.222503,0.094135,0.996629,-0.110777,-0.087852,-0.184413,0.507809,-0.237586,1.0,-0.188703,-0.128269,-0.195651,0.038062,-0.11064,0.535864,0.467626,0.472099,-0.161533,-0.207111


From the table above, the following points stand out first:
- The higher the "Schooling" variable, the higher the "Life expectancy" variable (r ≈ 0.72).
- The "Diphtheria" variable is related to the "Life expectancy" (r ≈ 0.48), "Hepatitis B" (r ≈ 0.50), and "Polio" variables.
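
Reading a 20-column correlation matrix by eye is error-prone. A minimal sketch of how the correlations with a single target could be ranked by absolute value is shown below. It uses synthetic data, so the frame `toy`, its coefficients, and the simplified column names are assumptions for illustration and do not reproduce the values in the table.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical, NOT from the dataset).
rng = np.random.default_rng(1)
n = 200
schooling = rng.normal(12, 3, n)
toy = pd.DataFrame({
    "Schooling": schooling,
    "Adult Mortality": 400 - 15 * schooling + rng.normal(0, 30, n),
    "Noise": rng.normal(0, 1, n),
})
toy["Life expectancy"] = (
    45 + 2 * toy["Schooling"] - 0.02 * toy["Adult Mortality"] + rng.normal(0, 3, n)
)

target = "Life expectancy"

# Correlation of every other column with the target.
corr_with_target = toy.corr()[target].drop(target)

# Order by absolute strength while keeping the sign of each coefficient.
order = corr_with_target.abs().sort_values(ascending=False).index
ranked = corr_with_target.reindex(order)
print(ranked)
```

On the real frame, the same steps could be applied to `df_filtered` with the target `'Life expectancy '` (note the trailing space in the original column name).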