# Analysis of mental disorders worldwide

## Introduction

## Rationale
This project analyzes how several mental disorders have evolved worldwide over recent decades. It places special emphasis on depression for two reasons: the data available on it is extremely well documented, and depression ranks as the leading cause of disability in the world. Studying it is therefore essential for better understanding its causes, its consequences, and its possible solutions.

The WHO has identified strong links between depression and other disorders and noncommunicable diseases. Depression increases the risk of substance use disorders and of diseases such as diabetes and heart disease; the reverse is also true, meaning that people with these conditions are at higher risk of depression.

Depression is also a major risk factor for suicide, which claims hundreds of thousands of lives every year.

For this reason, an analysis that lets us study depression is highly relevant.

## Key questions
- At what age do certain disorders begin to manifest?
- Which regions of the world have the highest prevalence of mental disorders?
- How prevalent are these disorders in developed countries? And in developing countries?
- How has their prevalence trended over the years?
- Is depression related to suicide?

### Installing the libraries

In [1]:
#!pip install openpyxl
#!pip install pandas
#!pip install numpy
#!pip install kaggle
# zipfile ships with the Python standard library, so it needs no installation

### Importing libraries
- pandas
- zipfile
- os

In [2]:
import pandas as pd
from zipfile import ZipFile 
import os

### Reading the data

In [3]:
# Pass the path directly so pandas opens and closes the workbook itself.
data_path = './data/depression.xlsx'
global_disorders = pd.read_excel(data_path, sheet_name='prevalence-by-mental-and-substa')
education_level = pd.read_excel(data_path, sheet_name='depression-by-level-of-educatio')
age_groups = pd.read_excel(data_path, sheet_name='prevalence-of-depression-by-age')
gender_groups = pd.read_excel(data_path, sheet_name='prevalence-of-depression-males-')
depression_and_suicide_rate = pd.read_excel(data_path, sheet_name='suicide-rates-vs-prevalence-of-')
affected = pd.read_excel(data_path, sheet_name='number-with-depression-by-count')

### Inspecting the table dimensions

Uniform data matters because it lets us easily combine tables that are related to each other.

The dimensions of each table can be read from its <code>shape</code> attribute.

In [4]:
print('global disorders: ', global_disorders.shape)
print('education levels', education_level.shape)
print('age groups: ', age_groups.shape)
print('gender groups: ', gender_groups.shape)
print('depression and suicide rate: ', depression_and_suicide_rate.shape)
print('affected: ', affected.shape)

global disorders:  (6468, 10)
education levels (26, 15)
age groups:  (6468, 13)
gender groups:  (47807, 6)
depression and suicide rate:  (47807, 6)
affected:  (6468, 4)
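The shape comparison above can also be done programmatically. The following is a minimal sketch on toy stand-in tables (the data and the names `tables`, `shape_summary`, and `outliers` are hypothetical, not part of the original notebook); it collects every shape in one summary frame and flags the tables whose row count differs from the most common one.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical data, not the actual dataset).
tables = {
    'global_disorders': pd.DataFrame({'Entity': ['A', 'B'], 'Year': [1990, 1991]}),
    'age_groups': pd.DataFrame({'Entity': ['A', 'B'], 'Year': [1990, 1991]}),
    'gender_groups': pd.DataFrame({'Entity': ['A', 'B', 'C'], 'Year': ['1800', '1801', '1802']}),
}

# One row per table, with its number of rows and columns.
shape_summary = pd.DataFrame.from_dict(
    {name: df.shape for name, df in tables.items()},
    orient='index',
    columns=['rows', 'columns'],
)

# Tables whose row count differs from the most common one deserve a closer look.
most_common_rows = shape_summary['rows'].mode().iloc[0]
outliers = shape_summary[shape_summary['rows'] != most_common_rows]

print(shape_summary)
print(outliers.index.tolist())
```

With the real tables, the same idea would presumably single out `gender_groups` and `depression_and_suicide_rate`, whose row counts differ from the rest.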


### Column names and data types

Consistent metadata across the tables of a dataset allows them to be integrated correctly.

The column names and types can be read from the <code>dtypes</code> attribute.

Global disorder rates table

In [5]:
global_disorders.dtypes

Entity                        object
Code                          object
Year                           int64
Schizophrenia (%)            float64
Bipolar disorder (%)         float64
Eating disorders (%)         float64
Anxiety disorders (%)        float64
Drug use disorders (%)       float64
Depression (%)               float64
Alcohol use disorders (%)    float64
dtype: object

Depression rates by education level table

In [6]:
education_level.dtypes

Entity                                                           object
Code                                                             object
Year                                                              int64
All levels (active) (%)                                         float64
All levels (employed) (%)                                       float64
All levels (total) (%)                                          float64
Below upper secondary (active) (%)                              float64
Below upper secondary (employed) (%)                            float64
Below upper secondary (total) (%)                               float64
Tertiary (active) (%)                                           float64
Tertiary (employed) (%)                                         float64
Tertiary (total) (%)                                            float64
Upper secondary & post-secondary non-tertiary (active) (%)      float64
Upper secondary & post-secondary non-tertiary (employed) (%)    float64
Upper secondary & post-secondary non-tertiary (total) (%)       float64
dtype: object

Depression rates by age table

In [7]:
age_groups.dtypes

Entity                   object
Code                     object
Year                      int64
20-24 years old (%)     float64
10-14 years old (%)     float64
All ages (%)            float64
70+ years old (%)       float64
30-34 years old (%)     float64
15-19 years old (%)     float64
25-29 years old (%)     float64
50-69 years old (%)     float64
Age-standardized (%)    float64
15-49 years old (%)     float64
dtype: object

Depression rates by gender table

In [8]:
gender_groups.dtypes

Entity                        object
Code                          object
Year                          object
Prevalence in males (%)      float64
Prevalence in females (%)    float64
Population                   float64
dtype: object

Depression and suicide rates per 100,000 individuals table

In [9]:
depression_and_suicide_rate.dtypes

Entity                                                       object
Code                                                         object
Year                                                         object
Suicide rate (deaths per 100,000 individuals)               float64
Depressive disorder rates (number suffering per 100,000)    float64
Population                                                  float64
dtype: object

Population affected by depression table

In [10]:
affected.dtypes

Entity                                                                                                        object
Code                                                                                                          object
Year                                                                                                           int64
Prevalence - Depressive disorders - Sex: Both - Age: All Ages (Number) (people suffering from depression)    float64
dtype: object
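The dtype listings above can also be compared side by side instead of by eye. This is a small sketch on toy tables (hypothetical data and variable names) that builds one table of dtypes and keeps only the shared columns whose type is not the same everywhere.

```python
import pandas as pd

# Toy stand-ins: the same column names, but 'Year' is stored as text in one table.
tables = {
    'global_disorders': pd.DataFrame({'Entity': ['A'], 'Year': [1990]}),
    'gender_groups': pd.DataFrame({'Entity': ['A'], 'Year': ['1800']}),
}

# One column per table, one row per column name; NaN where a table lacks the column.
dtype_table = pd.DataFrame({name: df.dtypes.astype(str) for name, df in tables.items()})

# Keep only the shared columns whose dtype differs between tables.
shared = dtype_table.dropna()
mismatched = shared[shared.nunique(axis=1) > 1]

print(mismatched)
```

Applied to the real tables, a check like this would be expected to surface the `Year` column, which is `object` in two tables and `int64` in the others.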

### Previewing a small sample of the data

Exploring the dataset shows the values each column can take, which lets us process the data correctly.

This can be done with the <code>head</code> method.

Global disorder rates table

In [11]:
global_disorders.head()

Unnamed: 0,Entity,Code,Year,Schizophrenia (%),Bipolar disorder (%),Eating disorders (%),Anxiety disorders (%),Drug use disorders (%),Depression (%),Alcohol use disorders (%)
0,Afghanistan,AFG,1990,0.16056,0.697779,0.101855,4.82883,1.677082,4.071831,0.672404
1,Afghanistan,AFG,1991,0.160312,0.697961,0.099313,4.82974,1.684746,4.079531,0.671768
2,Afghanistan,AFG,1992,0.160135,0.698107,0.096692,4.831108,1.694334,4.088358,0.670644
3,Afghanistan,AFG,1993,0.160037,0.698257,0.094336,4.830864,1.70532,4.09619,0.669738
4,Afghanistan,AFG,1994,0.160022,0.698469,0.092439,4.829423,1.716069,4.099582,0.66926


Depression rates by education level table

In [12]:
education_level.head()

Unnamed: 0,Entity,Code,Year,All levels (active) (%),All levels (employed) (%),All levels (total) (%),Below upper secondary (active) (%),Below upper secondary (employed) (%),Below upper secondary (total) (%),Tertiary (active) (%),Tertiary (employed) (%),Tertiary (total) (%),Upper secondary & post-secondary non-tertiary (active) (%),Upper secondary & post-secondary non-tertiary (employed) (%),Upper secondary & post-secondary non-tertiary (total) (%)
0,Austria,AUT,2014,6.5,4.7,7.7,15.5,9.0,15.2,4.3,3.5,5.5,5.5,4.2,6.7
1,Belgium,BEL,2014,5.0,4.1,7.1,7.1,4.8,11.6,3.7,3.3,4.2,5.7,5.0,7.5
2,Czech Republic,CZE,2014,3.0,2.6,4.0,2.1,2.5,6.0,1.7,1.7,2.0,3.5,3.0,4.4
3,Denmark,DNK,2014,6.7,5.7,8.3,10.4,6.5,15.5,5.7,4.7,6.7,7.4,6.9,8.8
4,Estonia,EST,2014,3.8,3.8,5.1,4.7,4.7,6.4,3.6,3.6,4.3,3.7,3.8,5.2


Depression rates by age table

In [13]:
age_groups.head()

Unnamed: 0,Entity,Code,Year,20-24 years old (%),10-14 years old (%),All ages (%),70+ years old (%),30-34 years old (%),15-19 years old (%),25-29 years old (%),50-69 years old (%),Age-standardized (%),15-49 years old (%)
0,Afghanistan,AFG,1990,4.417802,1.594676,3.218871,5.202803,5.799034,3.455708,5.175856,5.917752,4.071831,4.939766
1,Afghanistan,AFG,1991,4.433524,1.588356,3.203468,5.192849,5.814828,3.45188,5.176729,5.927093,4.079531,4.902682
2,Afghanistan,AFG,1992,4.453689,1.57798,3.156559,5.176872,5.829745,3.434982,5.160249,5.945656,4.088358,4.837097
3,Afghanistan,AFG,1993,4.464517,1.577201,3.120655,5.167355,5.85306,3.42021,5.148767,5.966915,4.09619,4.813657
4,Afghanistan,AFG,1994,4.46296,1.570846,3.082179,5.157549,5.852851,3.425222,5.148227,5.975907,4.099582,4.83934


Depression rates by gender table

In [14]:
gender_groups.head()

Unnamed: 0,Entity,Code,Year,Prevalence in males (%),Prevalence in females (%),Population
0,Afghanistan,AFG,1800,,,3280000.0
1,Afghanistan,AFG,1801,,,3280000.0
2,Afghanistan,AFG,1802,,,3280000.0
3,Afghanistan,AFG,1803,,,3280000.0
4,Afghanistan,AFG,1804,,,3280000.0


Depression and suicide rates per 100,000 individuals table

In [15]:
depression_and_suicide_rate.head()

Unnamed: 0,Entity,Code,Year,"Suicide rate (deaths per 100,000 individuals)","Depressive disorder rates (number suffering per 100,000)",Population
0,Afghanistan,AFG,1800,,,3280000.0
1,Afghanistan,AFG,1801,,,3280000.0
2,Afghanistan,AFG,1802,,,3280000.0
3,Afghanistan,AFG,1803,,,3280000.0
4,Afghanistan,AFG,1804,,,3280000.0


Population affected by depression table

In [16]:
affected.head()

Unnamed: 0,Entity,Code,Year,Prevalence - Depressive disorders - Sex: Both - Age: All Ages (Number) (people suffering from depression)
0,Afghanistan,AFG,1990,318435.81367
1,Afghanistan,AFG,1991,329044.773956
2,Afghanistan,AFG,1992,382544.572895
3,Afghanistan,AFG,1993,440381.507393
4,Afghanistan,AFG,1994,456916.645489


The <code>education_level</code> table does not provide the information needed to relate it to the other tables, so we decided to discard it for several reasons:

- It only covers the year 2014.
- It contains very few records, which breaks the uniformity of the data.
- It only covers countries from a single region.

### Renaming
As we saw, the column names do not follow the <code>snake_case</code> convention, so we rename every column of each table with the <code>rename</code> method.

In [17]:
percentages_of_global_disorders = global_disorders.copy().rename(
    columns={
        'Entity': 'entity',
        'Code': 'code',
        'Year': 'year',
        'Schizophrenia (%)': 'schizophrenia',
        'Bipolar disorder (%)': 'bipolar_disorder',
        'Eating disorders (%)': 'eating_disorder',
        'Anxiety disorders (%)': 'anxiety',
        'Drug use disorders (%)': 'drug_addiction',
        'Depression (%)': 'depression',
        'Alcohol use disorders (%)': 'alcoholism'
    }
)

depression_rates_by_age = age_groups.copy().rename(
    columns={
        'Entity': 'entity',
        'Code': 'code',
        'Year': 'year',
        'All ages (%)': 'all',
        '10-14 years old (%)': 'from_10_to_14',
        '15-19 years old (%)': 'from_15_to_19',
        '20-24 years old (%)': 'from_20_to_24',
        '25-29 years old (%)': 'from_25_to_29',
        '30-34 years old (%)': 'from_30_to_34',
        '15-49 years old (%)': 'from_15_to_49',
        '50-69 years old (%)': 'from_50_to_69',
        '70+ years old (%)': 'above_69',
        'Age-standardized (%)': 'standardized'
    }
)

depression_rates_by_gender = gender_groups.copy().rename(
    columns = {
        'Entity': 'entity',
        'Code': 'code',
        'Year': 'year',
        'Prevalence in males (%)': 'prevalence_in_males',
        'Prevalence in females (%)': 'prevalence_in_females',
        'Population': 'population'
    }
)

depression_and_suicide_rate_per_100000_individuals = depression_and_suicide_rate.copy().rename(
    columns={
        'Entity': 'entity',
        'Code': 'code',
        'Year': 'year',
        'Suicide rate (deaths per 100,000 individuals)': 'suicide_rate',
        'Depressive disorder rates (number suffering per 100,000)': 'depression_rate',
        'Population': 'population'
    }
)

depression_affected = affected.copy().rename(
    columns={
        'Entity': 'entity',
        'Code': 'code',
        'Year': 'year',
        'Prevalence - Depressive disorders - Sex: Both - Age: All Ages (Number) (people suffering from depression)': 'depression_prevalence'
    }
)
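As an alternative to writing each mapping by hand, `rename` also accepts a function. The sketch below uses a hypothetical helper, `to_snake_case`, that derives names mechanically. It will not reproduce the hand-picked names above (for example, it yields `70_years_old` rather than `above_69`), so it is only a starting point.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lower-case a column label and join its alphanumeric runs with underscores."""
    words = re.findall(r'[A-Za-z0-9]+', name)
    return '_'.join(word.lower() for word in words)


# An empty frame that only carries a few of the original column labels.
sample = pd.DataFrame(columns=['Entity', 'Bipolar disorder (%)', '70+ years old (%)'])

# rename applies the function to every column label and returns a new frame.
renamed = sample.rename(columns=to_snake_case)

print(list(renamed.columns))
```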

### Casting
The <code>year</code> column of the tables loaded from <code>gender_groups</code> and <code>depression_and_suicide_rate</code> is of type <code>object</code>, whereas in the other tables it is of type <code>int64</code>. We fix this in three steps:

1. Convert the values to numbers with the <code>to_numeric</code> function; values that cannot be parsed become <code>NaN</code>.
2. Keep only the records whose year is not <code>NaN</code>, using the <code>isna</code> method.
3. Convert the values from <code>float64</code> to <code>int64</code> with the <code>astype</code> method.

In [18]:
# Coerce non-numeric years to NaN, drop those rows, then cast to int64.
# .copy() avoids assigning into a filtered view (SettingWithCopyWarning).
depression_rates_by_gender['year'] = pd.to_numeric(depression_rates_by_gender['year'], errors='coerce')
depression_rates_by_gender = depression_rates_by_gender[~depression_rates_by_gender['year'].isna()].copy()
depression_rates_by_gender['year'] = depression_rates_by_gender['year'].astype('int64')

depression_and_suicide_rate_per_100000_individuals['year'] = pd.to_numeric(depression_and_suicide_rate_per_100000_individuals['year'], errors='coerce')
depression_and_suicide_rate_per_100000_individuals = depression_and_suicide_rate_per_100000_individuals[~depression_and_suicide_rate_per_100000_individuals['year'].isna()].copy()
depression_and_suicide_rate_per_100000_individuals['year'] = depression_and_suicide_rate_per_100000_individuals['year'].astype('int64')
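The three casting steps can be seen in isolation on a toy frame. The data below is hypothetical; in particular, the `'10000 BCE'` label is only an assumed example of a year value that cannot be parsed as a number.

```python
import pandas as pd

# A mixed 'year' column: numeric strings, an integer, and one unparsable label.
toy = pd.DataFrame({
    'entity': ['A', 'A', 'A'],
    'year': ['1990', '10000 BCE', 1991],
})

# Step 1: convert to numbers; anything unparsable becomes NaN (column is float64).
toy['year'] = pd.to_numeric(toy['year'], errors='coerce')

# Step 2: drop the rows whose year could not be parsed.
toy = toy.dropna(subset=['year']).copy()

# Step 3: with no NaN left, the column can be cast to int64.
toy['year'] = toy['year'].astype('int64')

print(toy['year'].dtype, toy['year'].tolist())
```

The cast in step 3 only works after step 2, because `NaN` cannot be represented in an `int64` column.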

### Data cleaning

To begin, we need to identify the columns that contain <code>NaN</code> values, since these prevent the data from being processed correctly.

This can be done by chaining the <code>isna</code> and <code>mean</code> methods, which gives the fraction of missing values in each column.
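To illustrate why chaining these two methods yields a fraction, here is a short sketch on made-up data (the entity `'Region X'` and all values are hypothetical): `isna` produces a boolean frame, and the mean of booleans is the share of `True` values.

```python
import numpy as np
import pandas as pd

# Four rows, one of which (a region aggregate) has no country code.
toy = pd.DataFrame({
    'entity': ['Region X', 'Afghanistan', 'Albania', 'Algeria'],
    'code': [np.nan, 'AFG', 'ALB', 'DZA'],
    'depression': [3.4, 4.1, 2.2, 3.6],
})

# True counts as 1 and False as 0, so the mean is the share of missing values.
missing_share = toy.isna().mean()

print(missing_share)
```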

Global disorder rates table

In [19]:
percentages_of_global_disorders.isna().mean()

entity              0.000000
code                0.151515
year                0.000000
schizophrenia       0.000000
bipolar_disorder    0.000000
eating_disorder     0.000000
anxiety             0.000000
drug_addiction      0.000000
depression          0.000000
alcoholism          0.000000
dtype: float64

Depression rates by age table

In [20]:
depression_rates_by_age.isna().mean()

entity           0.000000
code             0.151515
year             0.000000
from_20_to_24    0.000000
from_10_to_14    0.000000
all              0.000000
above_69         0.000000
from_30_to_34    0.000000
from_15_to_19    0.000000
from_25_to_29    0.000000
from_50_to_69    0.000000
standardized     0.000000
from_15_to_49    0.000000
dtype: float64

Depression rates by gender table

In [21]:
depression_rates_by_gender.isna().mean()

entity                   0.000000
code                     0.034900
year                     0.000000
prevalence_in_males      0.864508
prevalence_in_females    0.864508
population               0.019356
dtype: float64

Depression and suicide rates per 100,000 individuals table

In [22]:
depression_and_suicide_rate_per_100000_individuals.isna().mean()

entity             0.000000
code               0.034900
year               0.000000
suicide_rate       0.864508
depression_rate    0.864508
population         0.019356
dtype: float64

Population affected by depression table

In [23]:
depression_affected.isna().mean()


entity                   0.000000
code                     0.151515
year                     0.000000
depression_prevalence    0.000000
dtype: float64

The tables in our dataset include records for world regions, which contain only partial information and are not very useful for this analysis.

All of these records can be filtered out by applying the <code>isna</code> method to the <code>code</code> column, since that column is empty for them.

In [24]:
percentages_of_global_disorders = percentages_of_global_disorders.drop(percentages_of_global_disorders[percentages_of_global_disorders['code'].isna()].index)

depression_rates_by_age = depression_rates_by_age.drop(depression_rates_by_age[depression_rates_by_age['code'].isna()].index)

depression_rates_by_gender = depression_rates_by_gender.drop(depression_rates_by_gender[depression_rates_by_gender['code'].isna()].index)

depression_and_suicide_rate_per_100000_individuals = depression_and_suicide_rate_per_100000_individuals.drop(depression_and_suicide_rate_per_100000_individuals[depression_and_suicide_rate_per_100000_individuals['code'].isna()].index)

depression_affected = depression_affected.drop(depression_affected[depression_affected['code'].isna()].index)
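For reference, pandas also offers `dropna` with a `subset` argument, which expresses the same filter more compactly. The sketch below, on hypothetical toy data, checks that both forms give the same result.

```python
import numpy as np
import pandas as pd

# 'Region X' stands in for a region-level record without a country code.
toy = pd.DataFrame({
    'entity': ['Region X', 'Afghanistan', 'Albania'],
    'code': [np.nan, 'AFG', 'ALB'],
})

# The form used in the notebook: look up the index of the NaN rows and drop them.
via_drop = toy.drop(toy[toy['code'].isna()].index)

# Equivalent shorthand: drop every row where 'code' is missing.
via_dropna = toy.dropna(subset=['code'])

print(via_drop.equals(via_dropna))
```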

Once the records without a <code>code</code> value have been removed, the <code>code</code> column itself is no longer needed, so it can be dropped without harming the integrity of the data.

This is done with the <code>drop</code> method.

In [25]:
percentages_of_global_disorders = percentages_of_global_disorders.drop(columns=['code'])
depression_rates_by_age = depression_rates_by_age.drop(columns=['code'])
depression_rates_by_gender = depression_rates_by_gender.drop(columns=['code'])
depression_and_suicide_rate_per_100000_individuals = depression_and_suicide_rate_per_100000_individuals.drop(columns=['code'])
depression_affected = depression_affected.drop(columns=['code'])

Now that the <code>year</code> column of every table is of type <code>int64</code>, we keep only the records between the years <code>1990</code> and <code>2017</code> in the tables <code>depression_rates_by_gender</code> and <code>depression_and_suicide_rate_per_100000_individuals</code>.

In [26]:
depression_rates_by_gender = depression_rates_by_gender[depression_rates_by_gender['year'] >= 1990]
depression_rates_by_gender = depression_rates_by_gender[depression_rates_by_gender['year'] <= 2017]

depression_and_suicide_rate_per_100000_individuals = depression_and_suicide_rate_per_100000_individuals[depression_and_suicide_rate_per_100000_individuals['year'] >= 1990]
depression_and_suicide_rate_per_100000_individuals = depression_and_suicide_rate_per_100000_individuals[depression_and_suicide_rate_per_100000_individuals['year'] <= 2017]
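The two-step year filter can also be written as a single condition with `Series.between`, whose bounds are inclusive by default. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Years on both sides of the 1990-2017 window.
toy = pd.DataFrame({'entity': ['A'] * 4, 'year': [1800, 1990, 2017, 2018]})

# between(1990, 2017) keeps 1990 <= year <= 2017, matching the >= and <= filters.
in_range = toy[toy['year'].between(1990, 2017)]

print(in_range['year'].tolist())
```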

In [27]:
print('global disorders: ', percentages_of_global_disorders.shape)
print('age groups: ', depression_rates_by_age.shape)
print('gender groups: ', depression_rates_by_gender.shape)
print('depression and suicide rate: ', depression_and_suicide_rate_per_100000_individuals.shape)
print('affected: ', depression_affected.shape)

global disorders:  (5488, 9)
age groups:  (5488, 12)
gender groups:  (6580, 5)
depression and suicide rate:  (6580, 5)
affected:  (5488, 3)


As we can see, the tables <code>depression_rates_by_gender</code> and <code>depression_and_suicide_rate_per_100000_individuals</code> still have more records than the rest of the tables.

We compare the unique values of the <code>entity</code> column in these tables against the unique <code>entity</code> values shared by the other tables, in order to find out which additional entities they contain.

In [28]:
global_disorders_unique_list = pd.unique(percentages_of_global_disorders['entity'])
gender_groups_unique_list = pd.Series(pd.unique(depression_rates_by_gender['entity']))

countries_not_in_all_dfs = gender_groups_unique_list[gender_groups_unique_list.apply(lambda x:x not in global_disorders_unique_list)]
countries_not_in_all_dfs.head()

6                            Anguilla
10                              Aruba
25    Bonaire Sint Eustatius and Saba
29             British Virgin Islands
38                     Cayman Islands
dtype: object
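The same comparison can be expressed with `Series.isin`, which avoids the per-element lambda. This is a sketch on hypothetical lists of entity names, not the notebook's actual data.

```python
import pandas as pd

# Entities present in the reference tables versus the larger table.
reference = pd.Series(['Afghanistan', 'Albania'])
candidate = pd.Series(['Afghanistan', 'Albania', 'Anguilla', 'Aruba'])

# isin marks the candidates found in the reference; ~ inverts the mask.
extra = candidate[~candidate.isin(reference)]

print(extra.tolist())
```

The resulting series keeps the original positions as its index, just like the output shown above.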

We can see that there are records for small countries, territories, and other dependencies.

These records matter less than keeping the tables uniform, so we remove them by filtering on the <code>entity</code> column of each table.

In [29]:
# Keep only the entities that also appear in the other tables.
depression_rates_by_gender = depression_rates_by_gender[~depression_rates_by_gender['entity'].isin(countries_not_in_all_dfs)]
depression_and_suicide_rate_per_100000_individuals = depression_and_suicide_rate_per_100000_individuals[~depression_and_suicide_rate_per_100000_individuals['entity'].isin(countries_not_in_all_dfs)]

print('gender groups: ', depression_rates_by_gender.shape)
print('depression and suicide rate: ', depression_and_suicide_rate_per_100000_individuals.shape)

gender groups:  (5488, 5)
depression and suicide rate:  (5488, 5)


Now that the number of rows is uniform, we check that no <code>NaN</code> values remain in any of the tables.

Global disorder rates table

In [30]:
percentages_of_global_disorders.isna().mean()

entity              0.0
year                0.0
schizophrenia       0.0
bipolar_disorder    0.0
eating_disorder     0.0
anxiety             0.0
drug_addiction      0.0
depression          0.0
alcoholism          0.0
dtype: float64

Depression rates by age table

In [31]:
depression_rates_by_age.isna().mean()

entity           0.0
year             0.0
from_20_to_24    0.0
from_10_to_14    0.0
all              0.0
above_69         0.0
from_30_to_34    0.0
from_15_to_19    0.0
from_25_to_29    0.0
from_50_to_69    0.0
standardized     0.0
from_15_to_49    0.0
dtype: float64

Depression rates by gender table

In [32]:
depression_rates_by_gender.isna().mean()

entity                   0.0
year                     0.0
prevalence_in_males      0.0
prevalence_in_females    0.0
population               0.0
dtype: float64

Depression and suicide rates per 100,000 individuals table

In [33]:
depression_and_suicide_rate_per_100000_individuals.isna().mean()

entity             0.0
year               0.0
suicide_rate       0.0
depression_rate    0.0
population         0.0
dtype: float64

Population affected by depression table

In [34]:
depression_affected.isna().mean()

entity                   0.0
year                     0.0
depression_prevalence    0.0
dtype: float64

### Merge

With the dataset clean and uniform, we combine the sheets of the Excel workbook into a single CSV file that holds all the relevant information from the original file in an orderly way.

In [35]:
merge_keys = ['entity', 'year']
first_merge = percentages_of_global_disorders.merge(depression_rates_by_age, on=merge_keys)
second_merge = first_merge.merge(depression_rates_by_gender, on=merge_keys)
third_merge = second_merge.merge(depression_and_suicide_rate_per_100000_individuals, on=merge_keys)
df_merge = third_merge.merge(depression_affected, on=merge_keys)
# Two of the merged tables carry a population column; keep a single copy.
df_merge = df_merge.drop(columns=['population_y']).rename(columns={
    'population_x': 'population'
})
df_merge.head()

Unnamed: 0,entity,year,schizophrenia,bipolar_disorder,eating_disorder,anxiety,drug_addiction,depression,alcoholism,from_20_to_24,...,from_25_to_29,from_50_to_69,standardized,from_15_to_49,prevalence_in_males,prevalence_in_females,population,suicide_rate,depression_rate,depression_prevalence
0,Afghanistan,1990,0.16056,0.697779,0.101855,4.82883,1.677082,4.071831,0.672404,4.417802,...,5.175856,5.917752,4.071831,4.939766,3.499982,4.647815,12412000.0,10.318504,4039.755763,318435.81367
1,Afghanistan,1991,0.160312,0.697961,0.099313,4.82974,1.684746,4.079531,0.671768,4.433524,...,5.176729,5.927093,4.079531,4.902682,3.503947,4.655772,13299000.0,10.32701,4046.256034,329044.773956
2,Afghanistan,1992,0.160135,0.698107,0.096692,4.831108,1.694334,4.088358,0.670644,4.453689,...,5.160249,5.945656,4.088358,4.837097,3.508912,4.662066,14486000.0,10.271411,4053.709902,382544.572895
3,Afghanistan,1993,0.160037,0.698257,0.094336,4.830864,1.70532,4.09619,0.669738,4.464517,...,5.148767,5.966915,4.09619,4.813657,3.513429,4.669012,15817000.0,10.376123,4060.203474,440381.507393
4,Afghanistan,1994,0.160022,0.698469,0.092439,4.829423,1.716069,4.099582,0.66926,4.46296,...,5.148227,5.975907,4.099582,4.83934,3.515578,4.67305,17076000.0,10.575915,4062.290365,456916.645489
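The chain of merges above can be folded into one expression with `functools.reduce`. The sketch below uses hypothetical toy tables and is one possible variation, not the notebook's method: it drops the duplicated `population` column before merging, so no `_x`/`_y` suffixes appear, and it passes `validate='one_to_one'` so pandas raises an error if any `(entity, year)` pair is duplicated.

```python
from functools import reduce

import pandas as pd

keys = ['entity', 'year']

# Toy tables sharing the same keys; two of them carry a 'population' column.
by_disorder = pd.DataFrame({'entity': ['A', 'A'], 'year': [1990, 1991], 'depression': [4.0, 4.1]})
by_gender = pd.DataFrame({'entity': ['A', 'A'], 'year': [1990, 1991], 'population': [10.0, 11.0]})
by_suicide = pd.DataFrame({
    'entity': ['A', 'A'],
    'year': [1990, 1991],
    'suicide_rate': [10.3, 10.4],
    'population': [10.0, 11.0],
})

# Remove the duplicated column up front instead of cleaning up suffixes later.
frames = [by_disorder, by_gender, by_suicide.drop(columns=['population'])]

# Merge the frames pairwise, left to right, checking that the keys are unique.
merged = reduce(
    lambda left, right: left.merge(right, on=keys, validate='one_to_one'),
    frames,
)

print(list(merged.columns))
```

If the combined table should be written to disk, `merged.to_csv(...)` would do so; the output path is left to the reader.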


### Kaggle API

To group the rows of our DataFrame, we use a dataset hosted on Kaggle that contains the continents and regions of the world's countries. This lets us group our data by geographic region, from which we hope to draw useful observations.

In [42]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = "./data"
!kaggle datasets download -d andradaolteanu/country-mapping-iso-continent-region

file_name = "country-mapping-iso-continent-region.zip"

# Open the zip file in read mode (the name `archive` avoids shadowing the built-in zip)
with ZipFile(file_name, 'r') as archive:
    # Print the contents of the zip file
    archive.printdir()
    # Extract all the files into the current directory
    archive.extractall()

# Move the archive and the extracted CSV into ./data
!mv "country-mapping-iso-continent-region.zip" "./data/country-mapping-iso-continent-region.zip"
!mv "continents2.csv" "./data/continents2.csv"

country-mapping-iso-continent-region.zip: Skipping, found more recently modified local copy (use --force to force download)
File Name                                             Modified             Size
continents2.csv                                2019-12-15 15:07:36        19700


In [37]:
continents = pd.read_csv('./data/continents2.csv')
pd.merge(df_merge, continents, left_on = 'entity',right_on = 'name')

Unnamed: 0,entity,year,schizophrenia,bipolar_disorder,eating_disorder,anxiety,drug_addiction,depression,alcoholism,from_20_to_24,...,alpha-2,alpha-3,country-code,iso_3166-2,region,sub-region,intermediate-region,region-code,sub-region-code,intermediate-region-code
0,Afghanistan,1990,0.160560,0.697779,0.101855,4.828830,1.677082,4.071831,0.672404,4.417802,...,AF,AFG,4,ISO 3166-2:AF,Asia,Southern Asia,,142.0,34.0,
1,Afghanistan,1991,0.160312,0.697961,0.099313,4.829740,1.684746,4.079531,0.671768,4.433524,...,AF,AFG,4,ISO 3166-2:AF,Asia,Southern Asia,,142.0,34.0,
2,Afghanistan,1992,0.160135,0.698107,0.096692,4.831108,1.694334,4.088358,0.670644,4.453689,...,AF,AFG,4,ISO 3166-2:AF,Asia,Southern Asia,,142.0,34.0,
3,Afghanistan,1993,0.160037,0.698257,0.094336,4.830864,1.705320,4.096190,0.669738,4.464517,...,AF,AFG,4,ISO 3166-2:AF,Asia,Southern Asia,,142.0,34.0,
4,Afghanistan,1994,0.160022,0.698469,0.092439,4.829423,1.716069,4.099582,0.669260,4.462960,...,AF,AFG,4,ISO 3166-2:AF,Asia,Southern Asia,,142.0,34.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5119,Zimbabwe,2013,0.155670,0.607993,0.117248,3.090168,0.766280,3.128192,1.515641,2.849725,...,ZW,ZWE,716,ISO 3166-2:ZW,Africa,Sub-Saharan Africa,Eastern Africa,2.0,202.0,14.0
5120,Zimbabwe,2014,0.155993,0.608610,0.118073,3.093964,0.768914,3.140290,1.515470,2.856874,...,ZW,ZWE,716,ISO 3166-2:ZW,Africa,Sub-Saharan Africa,Eastern Africa,2.0,202.0,14.0
5121,Zimbabwe,2015,0.156465,0.609363,0.119470,3.098687,0.771802,3.155710,1.514751,2.868684,...,ZW,ZWE,716,ISO 3166-2:ZW,Africa,Sub-Saharan Africa,Eastern Africa,2.0,202.0,14.0
5122,Zimbabwe,2016,0.157111,0.610234,0.121456,3.104294,0.772275,3.174134,1.513269,2.893170,...,ZW,ZWE,716,ISO 3166-2:ZW,Africa,Sub-Saharan Africa,Eastern Africa,2.0,202.0,14.0
