# Deadly Visualizations!!!

![Image](../images/viz_types_portada.png)

## Setup

First, we need a basic setup, which includes:

- Importing the libraries.

- Reading the dataset file (source [Instituto Nacional de Estadística](https://www.ine.es/ss/Satellite?L=es_ES&c=Page&cid=1259942408928&p=1259942408928&pagename=ProductosYServicios%2FPYSLayout)).

- Creating a couple of columns and tables for the analysis.

__NOTE:__ some helper functions are already provided to help you through the challenge. However, feel free to write any additional code you need.

In [1]:
# imports

import sys
import re
sys.path.insert(0, "../modules")

import numpy as np
import pandas as pd

import plotly.express as px
import cufflinks as cf
cf.go_offline()

import module as mod     # helper functions are included in module.py

In [2]:
# read dataset

deaths = pd.read_csv('../data/7947.csv', sep=';', thousands='.')

deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301158 entries, 0 to 301157
Data columns (total 5 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   Causa de muerte  301158 non-null  object
 1   Sexo             301158 non-null  object
 2   Edad             301158 non-null  object
 3   Periodo          301158 non-null  int64 
 4   Total            301158 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.5+ MB


In [3]:
deaths.head()


Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total
0,001-102 I-XXII.Todas las causas,Total,Todas las edades,2018,427721
1,001-102 I-XXII.Todas las causas,Total,Todas las edades,2017,424523
2,001-102 I-XXII.Todas las causas,Total,Todas las edades,2016,410611
3,001-102 I-XXII.Todas las causas,Total,Todas las edades,2015,422568
4,001-102 I-XXII.Todas las causas,Total,Todas las edades,2014,395830


In [4]:
population = pd.read_csv('../data/31304bsc.csv', sep=';', thousands='.')

population.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103 entries, 0 to 102
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Sexo        103 non-null    object
 1   Edad        103 non-null    object
 2   Provincias  103 non-null    object
 3   Periodo     103 non-null    object
 4   Total       103 non-null    int64 
dtypes: int64(1), object(4)
memory usage: 4.1+ KB


In [5]:
population.head()


Unnamed: 0,Sexo,Edad,Provincias,Periodo,Total
0,Ambos sexos,Total,Total Nacional,1 de enero de 2022,47432805
1,Ambos sexos,Total,Total Nacional,1 de julio de 2021,47331545
2,Ambos sexos,Total,Total Nacional,1 de enero de 2021,47398695
3,Ambos sexos,Total,Total Nacional,1 de julio de 2020,47355685
4,Ambos sexos,Total,Total Nacional,1 de enero de 2020,47332614


In [6]:
print(len(deaths['Periodo'].unique()))
deaths['Periodo'].unique()

39


array([2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008,
       2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997,
       1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986,
       1985, 1984, 1983, 1982, 1981, 1980])

In [7]:
# add some columns...you'll need them later

deaths['cause_code'] = deaths['Causa de muerte'].apply(mod.cause_code)
deaths['cause_group'] = deaths['Causa de muerte'].apply(mod.cause_types)
deaths['cause_name'] = deaths['Causa de muerte'].apply(mod.cause_name)

deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301158 entries, 0 to 301157
Data columns (total 8 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   Causa de muerte  301158 non-null  object
 1   Sexo             301158 non-null  object
 2   Edad             301158 non-null  object
 3   Periodo          301158 non-null  int64 
 4   Total            301158 non-null  int64 
 5   cause_code       301158 non-null  object
 6   cause_group      301158 non-null  object
 7   cause_name       301158 non-null  object
dtypes: int64(2), object(6)
memory usage: 18.4+ MB
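The three `mod` helpers applied above live in `module.py`, which is not shown in this notebook. Judging only from the output (`001-102` is labelled `Multiple causes`, `102` is labelled `Single cause`), the split could look roughly like the sketch below. The function name `split_cause` and the regular expression are assumptions for illustration, not the module's actual code.

```python
import re

# Hypothetical helper: splits an INE label such as
# '001-102 I-XXII.Todas las causas' into (code, group, name).
# The pattern assumes a 3-digit code, optionally followed by '-NNN'.
CAUSE_PATTERN = re.compile(r"^(\d{3}(?:-\d{3})?)\s+(.*)$")

def split_cause(label):
    match = CAUSE_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unexpected cause label: {label!r}")
    code, name = match.groups()
    # A code range (with a hyphen) aggregates several single causes.
    group = "Multiple causes" if "-" in code else "Single cause"
    return code, group, name

example_all = split_cause("001-102 I-XXII.Todas las causas")
example_one = split_cause("102 Otras causas externas y sus efectos tardíos")
```

With a function like this, the three columns could be filled in one pass, for example by applying it to `deaths['Causa de muerte']` and unpacking the tuples.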


In [8]:
deaths

Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name
0,001-102 I-XXII.Todas las causas,Total,Todas las edades,2018,427721,001-102,Multiple causes,I-XXII.Todas las causas
1,001-102 I-XXII.Todas las causas,Total,Todas las edades,2017,424523,001-102,Multiple causes,I-XXII.Todas las causas
2,001-102 I-XXII.Todas las causas,Total,Todas las edades,2016,410611,001-102,Multiple causes,I-XXII.Todas las causas
3,001-102 I-XXII.Todas las causas,Total,Todas las edades,2015,422568,001-102,Multiple causes,I-XXII.Todas las causas
4,001-102 I-XXII.Todas las causas,Total,Todas las edades,2014,395830,001-102,Multiple causes,I-XXII.Todas las causas
...,...,...,...,...,...,...,...,...
301153,102 Otras causas externas y sus efectos tardíos,Mujeres,95 y más años,1984,0,102,Single cause,Otras causas externas y sus efectos tardíos
301154,102 Otras causas externas y sus efectos tardíos,Mujeres,95 y más años,1983,0,102,Single cause,Otras causas externas y sus efectos tardíos
301155,102 Otras causas externas y sus efectos tardíos,Mujeres,95 y más años,1982,0,102,Single cause,Otras causas externas y sus efectos tardíos
301156,102 Otras causas externas y sus efectos tardíos,Mujeres,95 y más años,1981,0,102,Single cause,Otras causas externas y sus efectos tardíos


In [9]:
# let's check the categorical variables

var_list = ['Sexo', 'Edad', 'Periodo', 'cause_code', 'cause_name', 'cause_group']

categories = mod.cat_var(deaths, var_list)
categories

Unnamed: 0,categorical_variable,number_of_possible_values,values
0,cause_code,117,"[001-102, 001-008, 001, 002, 003, 004, 005, 00..."
1,cause_name,117,"[I-XXII.Todas las causas, I.Enfermedades infec..."
2,Periodo,39,"[2018, 2017, 2016, 2015, 2014, 2013, 2012, 201..."
3,Edad,22,"[Todas las edades, Menos de 1 año, De 1 a 4 añ..."
4,Sexo,3,"[Total, Hombres, Mujeres]"
5,cause_group,2,"[Multiple causes, Single cause]"


In [10]:
# we also need to create a causes table for the analysis

causes_table = deaths[['cause_code', 'cause_name']].drop_duplicates().sort_values(by='cause_code').reset_index(drop=True)

causes_table

Unnamed: 0,cause_code,cause_name
0,001,Enfermedades infecciosas intestinales
1,001-008,I.Enfermedades infecciosas y parasitarias
2,001-102,I-XXII.Todas las causas
3,002,Tuberculosis y sus efectos tardíos
4,003,Enfermedad meningocócica
...,...,...
112,098,Suicidio y lesiones autoinfligidas
113,099,Agresiones (homicidio)
114,100,Eventos de intención no determinada
115,101,Complicaciones de la atención médica y quirúrgica


## Let's make some transformations

Even though the dataset is fairly clean, it is fully denormalized: totals and subtotals (all causes, both sexes, all ages) are stored as rows next to the detailed records. A collection of helper functions in `module.py` is available to build the tables you might need:

- `row_filter(df, cat_var, cat_values)` => Keep the rows whose categorical variable takes any of the given values.

- `nrow_filter(df, cat_var, cat_values)` => The opposite: drop the rows whose categorical variable takes any of the given values.

- `groupby_sum(df, group_vars, agg_var='Total', sort_var='Total')` => Sum deaths grouped by one or more variables.

- `pivot_table(df, col, x_axis, value='Total')` => Build a pivot table; you might need one later.

__NOTE:__ each filtering function applies one filter at a time, so chain the calls to filter on several variables. Feel free to filter in any other way you are comfortable with.
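Since `module.py` itself is not reproduced here, the following is only a guess at what the four helpers might do, written from their signatures and descriptions. The bodies, the toy data, and details such as resetting the index or sorting in descending order are assumptions.

```python
import pandas as pd

# Hypothetical implementations; the real ones in module.py may differ.
def row_filter(df, cat_var, cat_values):
    """Keep rows whose `cat_var` is one of `cat_values`."""
    return df[df[cat_var].isin(cat_values)].reset_index(drop=True)

def nrow_filter(df, cat_var, cat_values):
    """Drop rows whose `cat_var` is one of `cat_values`."""
    return df[~df[cat_var].isin(cat_values)].reset_index(drop=True)

def groupby_sum(df, group_vars, agg_var='Total', sort_var='Total'):
    """Sum `agg_var` per group, largest `sort_var` first (assumed order)."""
    out = df.groupby(group_vars, as_index=False)[agg_var].sum()
    return out.sort_values(sort_var, ascending=False).reset_index(drop=True)

def pivot_table(df, col, x_axis, value='Total'):
    """One row per `x_axis` value, one column per `col` value."""
    wide = df.pivot_table(index=x_axis, columns=col, values=value,
                          aggfunc='sum', fill_value=0)
    return wide.reset_index()

# Toy data with the same column names as the deaths table.
toy = pd.DataFrame({
    'Sexo': ['Total', 'Hombres', 'Mujeres', 'Total'],
    'Periodo': [2018, 2018, 2018, 2017],
    'Total': [10, 6, 4, 8],
})
only_total = row_filter(toy, 'Sexo', ['Total'])
by_sex = nrow_filter(toy, 'Sexo', ['Total'])
per_year = groupby_sum(toy, ['Periodo'])
wide = pivot_table(toy, 'Sexo', 'Periodo')
```

Because each filter takes a single variable, filtering on two variables means calling `row_filter` twice, feeding the result of the first call into the second.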

In [11]:
# Example 1

dataset = mod.row_filter(deaths, 'Sexo', ['Total'])
dataset = mod.row_filter(dataset, 'Edad', ['Todas las edades'])
dataset.head()


Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name
0,001-102 I-XXII.Todas las causas,Total,Todas las edades,2018,427721,001-102,Multiple causes,I-XXII.Todas las causas
1,001-102 I-XXII.Todas las causas,Total,Todas las edades,2017,424523,001-102,Multiple causes,I-XXII.Todas las causas
2,001-102 I-XXII.Todas las causas,Total,Todas las edades,2015,422568,001-102,Multiple causes,I-XXII.Todas las causas
3,001-102 I-XXII.Todas las causas,Total,Todas las edades,2016,410611,001-102,Multiple causes,I-XXII.Todas las causas
4,001-102 I-XXII.Todas las causas,Total,Todas las edades,2012,402950,001-102,Multiple causes,I-XXII.Todas las causas


In [12]:
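# Example 2
# NOTE: `deaths` still contains the aggregate rows (Sexo == 'Total', Edad == 'Todas las edades'),
# so every death is counted four times in these sums (1710884 = 4 x 427721).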

group = ['cause_code','Periodo']
dataset = mod.groupby_sum(deaths, group)
dataset.head()


Unnamed: 0,cause_code,Periodo,Total
0,001-102,2018,1710884
1,001-102,2017,1698092
2,001-102,2015,1690272
3,001-102,2016,1642444
4,001-102,2012,1611800
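The totals above are exactly four times the figures reported for all causes (for 2018, 1710884 = 4 × 427721). A small toy table, sketched below with made-up numbers and two made-up age groups `A` and `B`, shows why: when a table stores both the detail rows and their subtotals by sex and by age, summing everything counts each death once as detail, once in each subtotal, and once in the grand total.

```python
import pandas as pd

# Toy data (invented): 10 deaths in total, stored with every subtotal,
# mirroring how the deaths table keeps 'Total' and 'Todas las edades' rows.
rows = [
    ('Total',   'Todas las edades', 10),
    ('Total',   'A', 5), ('Total',   'B', 5),
    ('Hombres', 'Todas las edades', 6),
    ('Hombres', 'A', 3), ('Hombres', 'B', 3),
    ('Mujeres', 'Todas las edades', 4),
    ('Mujeres', 'A', 2), ('Mujeres', 'B', 2),
]
toy = pd.DataFrame(rows, columns=['Sexo', 'Edad', 'Total'])

# Summing every row counts each death four times.
naive_sum = int(toy['Total'].sum())

# Keeping only the detail rows gives the real total.
detail = toy[(toy['Sexo'] != 'Total') & (toy['Edad'] != 'Todas las edades')]
detail_sum = int(detail['Total'].sum())

print(naive_sum, detail_sum)
```

Filtering to one level of aggregation before grouping, either the detail rows or the subtotal rows, keeps the sums comparable with the published figures.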


In [13]:
# Example 3

dataset = mod.pivot_table(dataset, 'cause_code', 'Periodo')
dataset.head()


cause_code,Periodo,001,001-008,001-102,002,003,004,005,006,007,...,093,094,095,096,097,098,099,100,101,102
0,1980,1620,15768,1157376,5904,2008,3448,436,0,0,...,4956,1432,184,692,16748,6608,1496,28,968,96
1,1981,1404,15124,1173544,6332,1656,3344,348,0,0,...,4700,1200,156,1396,17472,6872,1284,336,908,208
2,1982,1308,13488,1146620,5352,1240,3104,316,0,0,...,4864,956,200,1000,18616,7404,1228,440,1132,52
3,1983,1212,13100,1210276,5152,1072,3152,336,0,0,...,4788,1464,148,884,18392,8724,1560,1276,1500,56
4,1984,1228,12928,1197636,4564,964,3704,424,0,0,...,4716,1244,164,1020,14696,9972,1812,1144,1636,76


## ...and finally, show me some insights with Plotly!!!

In [None]:
# And some space for free-style Pandas!!! (e.g.: df['column_name'].unique())




In [14]:
# clean the population dataset (year, population), keeping only the 1 January figures
# .copy() avoids pandas' SettingWithCopyWarning when the columns are reassigned below
population_year = population.loc[population['Periodo'].str.startswith('1 de enero de')].copy()
population_year['Periodo'] = population_year['Periodo'].str.replace('1 de enero de ', '', regex=False)
population_year['Periodo'] = pd.to_numeric(population_year['Periodo'])
population_year = population_year.drop(['Sexo', 'Edad', 'Provincias'], axis=1)
population_year = population_year.rename(columns={"Total": "Demo"})
print(len(population))
print(len(population_year))
population_year.head()


103
52


Unnamed: 0,Periodo,Demo
0,2022,47432805
2,2021,47398695
4,2020,47332614
6,2019,46937060
8,2018,46658447
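As an alternative sketch of the same cleaning step, the year can be pulled out of strings such as `1 de enero de 2022` with `str.extract` instead of a text replacement. The three rows below are copied from the `population.head()` output; the variable names `pop_sample` and `january` are only illustrative.

```python
import pandas as pd

# Three rows taken from the population table shown earlier.
pop_sample = pd.DataFrame({
    'Periodo': ['1 de enero de 2022', '1 de julio de 2021', '1 de enero de 2021'],
    'Total': [47432805, 47331545, 47398695],
})

# Keep the 1 January figures; .copy() makes the result an independent frame,
# so assigning to its columns does not trigger SettingWithCopyWarning.
january = pop_sample[pop_sample['Periodo'].str.startswith('1 de enero de')].copy()

# Capture the trailing four digits as the year and convert them to integers.
january['Periodo'] = (january['Periodo']
                      .str.extract(r'(\d{4})$', expand=False)
                      .astype(int))
january = january.rename(columns={'Total': 'Demo'})
```

Extracting the digits has the side benefit of not depending on the exact wording of the date prefix.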


In [15]:
# filter deaths: both sexes combined, all ages, single causes only
deaths_tot = deaths[(deaths['Sexo'] == 'Total') & 
                    (deaths['Edad'] == 'Todas las edades') &
                    (deaths['cause_group'] == 'Single cause')]
print(len(deaths))
print(len(deaths_tot))
deaths_tot.head()


301158
3978


Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name
5148,001 Enfermedades infecciosas intestinales,Total,Todas las edades,2018,1037,1,Single cause,Enfermedades infecciosas intestinales
5149,001 Enfermedades infecciosas intestinales,Total,Todas las edades,2017,955,1,Single cause,Enfermedades infecciosas intestinales
5150,001 Enfermedades infecciosas intestinales,Total,Todas las edades,2016,946,1,Single cause,Enfermedades infecciosas intestinales
5151,001 Enfermedades infecciosas intestinales,Total,Todas las edades,2015,895,1,Single cause,Enfermedades infecciosas intestinales
5152,001 Enfermedades infecciosas intestinales,Total,Todas las edades,2014,753,1,Single cause,Enfermedades infecciosas intestinales


In [16]:
population_year.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 52 entries, 0 to 102
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Periodo  52 non-null     int64
 1   Demo     52 non-null     int64
dtypes: int64(2)
memory usage: 1.2 KB


In [17]:
deaths_tot.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3978 entries, 5148 to 298622
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Causa de muerte  3978 non-null   object
 1   Sexo             3978 non-null   object
 2   Edad             3978 non-null   object
 3   Periodo          3978 non-null   int64 
 4   Total            3978 non-null   int64 
 5   cause_code       3978 non-null   object
 6   cause_group      3978 non-null   object
 7   cause_name       3978 non-null   object
dtypes: int64(2), object(6)
memory usage: 279.7+ KB


In [18]:
# join deaths with the yearly population and compute rates
deaths_tot = deaths_tot.merge(population_year, on='Periodo')
# Percent: deaths as a percentage of the population on 1 January of that year
deaths_tot['Percent'] = deaths_tot['Total'] * 100 / deaths_tot['Demo']
# PercentK: Percent x 100,000, i.e. deaths per 10 million inhabitants (not per 100,000)
deaths_tot['PercentK'] = deaths_tot['Total'] * 100 / deaths_tot['Demo'] * 100000
deaths_tot.head()


Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name,Demo,Percent,PercentK
0,001 Enfermedades infecciosas intestinales,Total,Todas las edades,2018,1037,1,Single cause,Enfermedades infecciosas intestinales,46658447,0.002223,222.253432
1,002 Tuberculosis y sus efectos tardíos,Total,Todas las edades,2018,249,2,Single cause,Tuberculosis y sus efectos tardíos,46658447,0.000534,53.366543
2,003 Enfermedad meningocócica,Total,Todas las edades,2018,15,3,Single cause,Enfermedad meningocócica,46658447,3.2e-05,3.214852
3,004 Septicemia,Total,Todas las edades,2018,3040,4,Single cause,Septicemia,46658447,0.006515,651.543331
4,005 Hepatitis vírica,Total,Todas las edades,2018,567,5,Single cause,Hepatitis vírica,46658447,0.001215,121.521404
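To make the scale of the two rate columns concrete, the first row of the table can be recomputed by hand. The snippet below is plain arithmetic on the figures shown above (1037 deaths, population 46658447); the per-100,000 rate is added only for comparison and is not a column in the notebook.

```python
# Figures from the first row of the merged table (cause 001, year 2018).
total = 1037
demo = 46658447

# Same formulas as the notebook's Percent and PercentK columns.
percent = total * 100 / demo               # share of the population, in %
percent_k = total * 100 / demo * 100000    # equals total / demo * 10**7

# For comparison only: the conventional rate per 100,000 inhabitants.
per_100k = total / demo * 100000

print(round(percent, 6), round(percent_k, 2), round(per_100k, 2))
```

So `PercentK` is numerically a rate per 10 million inhabitants; dividing it by 100 gives deaths per 100,000.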


In [19]:
# Total deaths every year
deaths_tot_year = deaths_tot.groupby(['Periodo'])[['Total', 
                                                   'Percent', 
                                                   'PercentK']].sum().reset_index()
print(len(deaths_tot_year))
deaths_tot_year.head()


39


Unnamed: 0,Periodo,Total,Percent,PercentK
0,1980,289344,0.774746,77474.620411
1,1981,293386,0.779548,77954.820661
2,1982,286655,0.756708,75670.756829
3,1983,302569,0.79435,79434.969948
4,1984,299409,0.782709,78270.930525


In [20]:
# Graph total deaths per year
deaths_tot_year.iplot(kind='line',
                      x='Periodo',
                      y='Total',
                      xTitle='Year',
                      yTitle='Deaths',
                      title='Evolution of deaths: total',
                      dimensions=(900, 300))


In [21]:

# Graph total deaths per year as a percentage of the population
deaths_tot_year.iplot(kind='line',
                      x='Periodo',
                      y='Percent',
                      xTitle='Year',
                      yTitle='Deaths (% of population)',
                      title='Evolution of deaths: percentage of population',
                      dimensions=(900, 300))


In [None]:
# Significant by population


In [22]:
deaths_tot.head()

Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name,Demo,Percent,PercentK
0,001 Enfermedades infecciosas intestinales,Total,Todas las edades,2018,1037,1,Single cause,Enfermedades infecciosas intestinales,46658447,0.002223,222.253432
1,002 Tuberculosis y sus efectos tardíos,Total,Todas las edades,2018,249,2,Single cause,Tuberculosis y sus efectos tardíos,46658447,0.000534,53.366543
2,003 Enfermedad meningocócica,Total,Todas las edades,2018,15,3,Single cause,Enfermedad meningocócica,46658447,3.2e-05,3.214852
3,004 Septicemia,Total,Todas las edades,2018,3040,4,Single cause,Septicemia,46658447,0.006515,651.543331
4,005 Hepatitis vírica,Total,Todas las edades,2018,567,5,Single cause,Hepatitis vírica,46658447,0.001215,121.521404


In [23]:
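# Total deaths per cause over all years
# (here Percent and PercentK are sums of the yearly rates, not a rate for the whole period)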
deaths_tot_cause = deaths_tot.groupby(['cause_code', 
                                       'cause_name'])[['Total', 
                                                       'Percent', 
                                                       'PercentK']].sum().reset_index()
print(len(deaths_tot_cause))
deaths_tot_cause = deaths_tot_cause.sort_values(by=['Total'], ascending=False).head(10)
print(len(deaths_tot_cause))
deaths_tot_cause


102
10


Unnamed: 0,cause_code,cause_name,Total,Percent,PercentK
58,59,Enfermedades cerebrovasculares,1467841,3.589863,358986.288705
54,55,Infarto agudo de miocardio,855162,2.081283,208128.3283
56,57,Insuficiencia cardíaca,758343,1.833875,183387.483773
17,18,"Tumor maligno de la tráquea, de los bronquios ...",656092,1.554487,155448.668038
57,58,Otras enfermedades del corazón,644917,1.529769,152976.886419
63,64,Enfermedades crónicas de las vías respiratoria...,527573,1.26073,126072.950375
55,56,Otras enfermedades isquémicas del corazón,509696,1.206033,120603.286466
66,67,Otras enfermedades del sistema respiratorio,498811,1.171766,117176.609923
45,46,"Trastornos mentales orgánicos, senil y presenil",378321,0.870775,87077.545485
43,44,Diabetes mellitus,359715,0.862838,86283.840436


In [24]:
# Graph total deaths every cause
deaths_tot_cause.iplot(kind='bar',
                       x='cause_name',
                       y='Total',
                       xTitle='Cause',
                       yTitle='Total',
                       title='Total by cause (10 most frequent causes)')


In [25]:
# Top 10 causes, broken down by sex
most_causes = deaths_tot_cause[['cause_code']]
deaths_sex = most_causes.merge(deaths)
deaths_sex = deaths_sex[(deaths_sex['Sexo'] != 'Total') & 
                        (deaths_sex['Edad'] == 'Todas las edades')]
deaths_sex

Unnamed: 0,cause_code,Causa de muerte,Sexo,Edad,Periodo,Total,cause_group,cause_name
858,059,059 Enfermedades cerebrovasculares,Hombres,Todas las edades,2018,11435,Single cause,Enfermedades cerebrovasculares
859,059,059 Enfermedades cerebrovasculares,Hombres,Todas las edades,2017,11555,Single cause,Enfermedades cerebrovasculares
860,059,059 Enfermedades cerebrovasculares,Hombres,Todas las edades,2016,11556,Single cause,Enfermedades cerebrovasculares
861,059,059 Enfermedades cerebrovasculares,Hombres,Todas las edades,2015,12077,Single cause,Enfermedades cerebrovasculares
862,059,059 Enfermedades cerebrovasculares,Hombres,Todas las edades,2014,11573,Single cause,Enfermedades cerebrovasculares
...,...,...,...,...,...,...,...,...
24916,044,044 Diabetes mellitus,Mujeres,Todas las edades,1984,5378,Single cause,Diabetes mellitus
24917,044,044 Diabetes mellitus,Mujeres,Todas las edades,1983,5326,Single cause,Diabetes mellitus
24918,044,044 Diabetes mellitus,Mujeres,Todas las edades,1982,5006,Single cause,Diabetes mellitus
24919,044,044 Diabetes mellitus,Mujeres,Todas las edades,1981,4832,Single cause,Diabetes mellitus


In [26]:
deaths_tot_sex = deaths_sex.groupby(['Sexo',
                                     'cause_code', 
                                     'cause_name'])[['Total']].sum().reset_index()
print(len(deaths_tot_sex))
deaths_tot_sex

20


Unnamed: 0,Sexo,cause_code,cause_name,Total
0,Hombres,18,"Tumor maligno de la tráquea, de los bronquios ...",565825
1,Hombres,44,Diabetes mellitus,138261
2,Hombres,46,"Trastornos mentales orgánicos, senil y presenil",123447
3,Hombres,55,Infarto agudo de miocardio,523199
4,Hombres,56,Otras enfermedades isquémicas del corazón,268537
5,Hombres,57,Insuficiencia cardíaca,273816
6,Hombres,58,Otras enfermedades del corazón,290034
7,Hombres,59,Enfermedades cerebrovasculares,612649
8,Hombres,64,Enfermedades crónicas de las vías respiratoria...,397349
9,Hombres,67,Otras enfermedades del sistema respiratorio,246037


In [27]:
# Graph total deaths for the 10 most frequent causes, by sex
graph = px.bar(deaths_tot_sex, 
               x = 'cause_name',
               y = 'Total', 
               color = 'Sexo', 
               title = 'Most frequent causes by sex',
               labels={'cause_name':'Cause', 
                       'Total':'Total', 
                       'Sexo': 'Sex'})

graph.update_layout(barmode='group', xaxis={'categoryorder': 'total descending'})

graph.show()

In [28]:
print(len(causes_table))
# keep single causes only: their codes contain no hyphen
causes_single = causes_table.loc[~causes_table['cause_code'].str.contains('-')]
print(len(causes_single))

117
102


In [29]:
display(causes_single.head(51))
display(causes_single.tail(51))

Unnamed: 0,cause_code,cause_name
0,1,Enfermedades infecciosas intestinales
3,2,Tuberculosis y sus efectos tardíos
4,3,Enfermedad meningocócica
5,4,Septicemia
6,5,Hepatitis vírica
7,6,SIDA
8,7,"VIH+ (portador, evidencias de laboratorio del ..."
9,8,Resto de enfermedades infecciosas y parasitari...
10,9,"Tumor maligno del labio, de la cavidad bucal y..."
12,10,Tumor maligno del esófago


Unnamed: 0,cause_code,cause_name
58,52,Otras enfermedades del sistema nervioso y de l...
59,53,Enfermedades cardíacas reumáticas crónicas
61,54,Enfermedades hipertensivas
62,55,Infarto agudo de miocardio
63,56,Otras enfermedades isquémicas del corazón
64,57,Insuficiencia cardíaca
65,58,Otras enfermedades del corazón
66,59,Enfermedades cerebrovasculares
67,60,Aterosclerosis
68,61,Otras enfermedades de los vasos sanguíneos


In [30]:
# Evolution most frequent causes. Selected most frequent cause, two increasing causes and other curious causes
selected_causes = pd.DataFrame(['059', '018', '046', '090', '098'], columns=['cause_code'])
deaths_causes_evo = selected_causes.merge(deaths_tot)
print(len(deaths_causes_evo))
deaths_causes_evo.head()

195


Unnamed: 0,cause_code,Causa de muerte,Sexo,Edad,Periodo,Total,cause_group,cause_name,Demo,Percent,PercentK
0,59,059 Enfermedades cerebrovasculares,Total,Todas las edades,2018,26420,Single cause,Enfermedades cerebrovasculares,46658447,0.056624,5662.425927
1,59,059 Enfermedades cerebrovasculares,Total,Todas las edades,2017,26937,Single cause,Enfermedades cerebrovasculares,46527039,0.057895,5789.536704
2,59,059 Enfermedades cerebrovasculares,Total,Todas las edades,2016,27122,Single cause,Enfermedades cerebrovasculares,46440099,0.058402,5840.211495
3,59,059 Enfermedades cerebrovasculares,Total,Todas las edades,2015,28434,Single cause,Enfermedades cerebrovasculares,46449565,0.061215,6121.47821
4,59,059 Enfermedades cerebrovasculares,Total,Todas las edades,2014,27579,Single cause,Enfermedades cerebrovasculares,46512199,0.059294,5929.412196


In [31]:
# Graph evolution selected causes
graph = px.line(deaths_causes_evo, 
                x = 'Periodo',
                y = 'Total', 
                color = 'cause_name',
                title = 'Evolution of selected causes',
                labels={'Periodo':'Year', 
                        'Total':'Total', 
                        'cause_name': ''})

graph.update_layout(legend=dict(orientation='v'))

graph.show()

In [32]:
# Cause: 098 - suicide. Total
deaths_cause_tot_098 = deaths_tot[deaths_tot['cause_code'] == '098']
print(len(deaths_cause_tot_098))
deaths_cause_tot_098.head()

39


Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name,Demo,Percent,PercentK
97,098 Suicidio y lesiones autoinfligidas,Total,Todas las edades,2018,3539,98,Single cause,Suicidio y lesiones autoinfligidas,46658447,0.007585,758.49074
199,098 Suicidio y lesiones autoinfligidas,Total,Todas las edades,2017,3679,98,Single cause,Suicidio y lesiones autoinfligidas,46527039,0.007907,790.723003
301,098 Suicidio y lesiones autoinfligidas,Total,Todas las edades,2016,3569,98,Single cause,Suicidio y lesiones autoinfligidas,46440099,0.007685,768.51688
403,098 Suicidio y lesiones autoinfligidas,Total,Todas las edades,2015,3602,98,Single cause,Suicidio y lesiones autoinfligidas,46449565,0.007755,775.464743
505,098 Suicidio y lesiones autoinfligidas,Total,Todas las edades,2014,3910,98,Single cause,Suicidio y lesiones autoinfligidas,46512199,0.008406,840.639678


In [33]:
# Cause: 098 - suicide. By sex, ages
deaths_cause_098 = deaths[deaths['cause_code'] == '098']
deaths_cause_098 = deaths_cause_098[(deaths_cause_098['Sexo'] != 'Total') &
                                    (deaths_cause_098['Edad'] != 'Todas las edades')] 
deaths_cause_098.head()

Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name
289185,098 Suicidio y lesiones autoinfligidas,Hombres,Menos de 1 año,2018,0,98,Single cause,Suicidio y lesiones autoinfligidas
289186,098 Suicidio y lesiones autoinfligidas,Hombres,Menos de 1 año,2017,0,98,Single cause,Suicidio y lesiones autoinfligidas
289187,098 Suicidio y lesiones autoinfligidas,Hombres,Menos de 1 año,2016,0,98,Single cause,Suicidio y lesiones autoinfligidas
289188,098 Suicidio y lesiones autoinfligidas,Hombres,Menos de 1 año,2015,0,98,Single cause,Suicidio y lesiones autoinfligidas
289189,098 Suicidio y lesiones autoinfligidas,Hombres,Menos de 1 año,2014,0,98,Single cause,Suicidio y lesiones autoinfligidas


In [34]:
# Graph total suicide by year
deaths_cause_tot_098.iplot(kind='line',
                           x='Periodo',
                           y='Total',
                           xTitle='Year',
                           yTitle='Total',
                           title='Total suicide')


In [36]:
# Graph total suicide by sex
graph = px.pie(deaths_cause_098, 
               values = 'Total',
               names = 'Sexo',
               title = 'Total suicide',
               color = 'Sexo',
               color_discrete_map={'Hombres': 'grey',
                                   'Mujeres': 'gold'})

graph.update_traces(textposition = 'none')

graph.show()


In [37]:
# Graph total suicide by year, sex
deaths_cause_098_sex = deaths_cause_098.groupby(['Sexo', 'Periodo'])[['Total']].sum().reset_index()

graph = px.bar(deaths_cause_098_sex, 
               x = 'Periodo',
               y = 'Total',
               color = 'Sexo',
               title = 'Total suicide',
               labels={'Periodo':'Year', 
                       'Total':'Total', 
                       'Sexo': 'Sex'})

graph.show()

In [38]:
# Graph total suicide by age
deaths_cause_098_age = deaths_cause_098.groupby(['Edad'])[['Total']].sum().reset_index()
print(len(deaths_cause_098_age))

graph = px.bar(deaths_cause_098_age, 
               x = 'Edad',
               y = 'Total',
               title = 'Total suicide',
               labels={'Edad':'Age', 
                       'Total':'Total'})

graph.update_layout(xaxis={'categoryorder': 'total descending'})

graph.show()


21


In [40]:
# Age group with the most suicides in each year
deaths_cause_098_age_year = deaths_cause_098.groupby(['Edad', 'Periodo'])[['Total']].sum().reset_index()

deaths_cause_098_age_year_max = deaths_cause_098_age_year.loc[deaths_cause_098_age_year.groupby('Periodo')['Total'].idxmax()].reset_index(drop=True)

graph = px.bar(deaths_cause_098_age_year_max, 
               x = 'Periodo',
               y = 'Total',
               color = 'Edad',
               title = 'Age group with the most suicides, by year',
               labels={'Periodo':'Year', 
                       'Total':'Total'})

graph.show()
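The cell above relies on a common pandas pattern: `groupby(...)[col].idxmax()` returns, for each group, the index label of the row holding the maximum, and passing those labels to `.loc` retrieves the full rows. A minimal sketch on invented numbers (the age labels `a` and `b` are placeholders):

```python
import pandas as pd

# Invented data: two years, two age groups each.
toy = pd.DataFrame({
    'Periodo': [1980, 1980, 1981, 1981],
    'Edad':    ['a',  'b',  'a',  'b'],
    'Total':   [5,    9,    7,    3],
})

# Index label of the largest Total within each year.
idx = toy.groupby('Periodo')['Total'].idxmax()

# Select those rows; every column of the winning row comes along.
top = toy.loc[idx].reset_index(drop=True)
print(top)
```

If two rows tie for the maximum within a group, `idxmax` returns the first one it encounters, so ties are resolved by row order.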



In [42]:
# 1982: most frequent causes (all ages, both sexes combined)
year = 1982
age = 'Todas las edades'
sex = 'Total'
cause = 'Single cause'
mask = (deaths['Periodo'] == year) & (deaths['Edad'] == age) & (deaths['Sexo'] == sex) & (deaths['cause_group'] == cause)
               
deaths_year_age_sex = deaths[mask]
deaths_sorted = deaths_year_age_sex.sort_values('Total', ascending=False).head(10)
deaths_sorted

Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name
169920,059 Enfermedades cerebrovasculares,Total,Todas las edades,1982,46138,59,Single cause,Enfermedades cerebrovasculares
159624,055 Infarto agudo de miocardio,Total,Todas las edades,1982,21658,55,Single cause,Infarto agudo de miocardio
164772,057 Insuficiencia cardíaca,Total,Todas las edades,1982,16769,57,Single cause,Insuficiencia cardíaca
167346,058 Otras enfermedades del corazón,Total,Todas las edades,1982,13052,58,Single cause,Otras enfermedades del corazón
172494,060 Aterosclerosis,Total,Todas las edades,1982,12326,60,Single cause,Aterosclerosis
51516,"018 Tumor maligno de la tráquea, de los bronq...",Total,Todas las edades,1982,9828,18,Single cause,"Tumor maligno de la tráquea, de los bronquios ..."
193086,067 Otras enfermedades del sistema respiratorio,Total,Todas las edades,1982,8432,67,Single cause,Otras enfermedades del sistema respiratorio
205956,071 Cirrosis y otras enfermedades crónicas de...,Total,Todas las edades,1982,8081,71,Single cause,Cirrosis y otras enfermedades crónicas del hígado
162198,056 Otras enfermedades isquémicas del corazón,Total,Todas las edades,1982,7871,56,Single cause,Otras enfermedades isquémicas del corazón
182790,063 Neumonía,Total,Todas las edades,1982,7644,63,Single cause,Neumonía


In [43]:
deaths_sorted.iplot(kind='barh',
                    x='cause_name',
                    y='Total',
                    title='Most frequent causes (1982)',
                    yTitle='Cause',
                    xTitle='Number')

In [46]:
# Next three graphs: most frequent cause per age group in 1982, 2000 and 2018

In [48]:
# Year 1982 most frequent causes by age
year = 1982
sex = 'Total'
age = 'Todas las edades'
cause = 'Single cause'
mask = (deaths['Periodo'] == year) & (deaths['Sexo'] == sex) & (deaths['cause_group'] == cause)  & (deaths['Edad'] != age)
deaths_year_sex = deaths[mask]
# for each age group, keep the row of the single cause with the most deaths
deaths_max = deaths_year_sex.loc[deaths_year_sex.groupby(['Edad'])['Total'].idxmax().reset_index(drop=True)]

graph = px.bar(deaths_max, 
               x = "cause_name", 
               y = "Total", 
               color = "Edad", 
               title = "Most frequent cause per age group, 1982",
               width=900, height=1000)
graph.show()


In [49]:
# Year 2000 most frequent causes by age
year = 2000
sex = 'Total'
age = 'Todas las edades'
cause = 'Single cause'
mask = (deaths['Periodo'] == year) & (deaths['Sexo'] == sex) & (deaths['cause_group'] == cause)  & (deaths['Edad'] != age)
deaths_year_sex = deaths[mask]
# for each age group, keep the row of the single cause with the most deaths
deaths_max = deaths_year_sex.loc[deaths_year_sex.groupby(['Edad'])['Total'].idxmax().reset_index(drop=True)]

graph = px.bar(deaths_max, 
               x = "cause_name", 
               y = "Total", 
               color = "Edad", 
               title = "Most frequent cause per age group, 2000",
               width=900, height=1000)
graph.show()

In [51]:
# Year 2018 most frequent causes by age
year = 2018
sex = 'Total'
age = 'Todas las edades'
cause = 'Single cause'
mask = (deaths['Periodo'] == year) & (deaths['Sexo'] == sex) & (deaths['cause_group'] == cause)  & (deaths['Edad'] != age)
deaths_year_sex = deaths[mask]
# for each age group, keep the row of the single cause with the most deaths
deaths_max = deaths_year_sex.loc[deaths_year_sex.groupby(['Edad'])['Total'].idxmax().reset_index(drop=True)]

graph = px.bar(deaths_max, 
               x = "cause_name", 
               y = "Total", 
               color = "Edad", 
               title = "Most frequent cause per age group, 2018",
               width=900, height=1000)
graph.show()
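The three cells for 1982, 2000 and 2018 differ only in the year, so the shared logic could be wrapped in one function. The sketch below is one possible refactoring, not part of the original notebook: `top_cause_by_age` is a hypothetical name, and the toy rows (causes `A` and `B`) are invented to keep the example runnable without the INE file.

```python
import pandas as pd

def top_cause_by_age(df, year, sex='Total', all_ages='Todas las edades'):
    """For one year, return the single cause with most deaths per age group."""
    mask = ((df['Periodo'] == year)
            & (df['Sexo'] == sex)
            & (df['cause_group'] == 'Single cause')
            & (df['Edad'] != all_ages))
    subset = df[mask]
    # idxmax gives the row label of the largest Total in each age group.
    best = subset.groupby('Edad')['Total'].idxmax()
    return subset.loc[best].reset_index(drop=True)

# Invented rows using the same column names as the deaths table.
columns = ['Periodo', 'Sexo', 'Edad', 'cause_group', 'cause_name', 'Total']
toy = pd.DataFrame([
    (2018, 'Total', 'Menos de 1 año',   'Single cause',    'A',   5),
    (2018, 'Total', 'Menos de 1 año',   'Single cause',    'B',   9),
    (2018, 'Total', 'De 1 a 4 años',    'Single cause',    'A',   4),
    (2018, 'Total', 'De 1 a 4 años',    'Single cause',    'B',   1),
    (2018, 'Total', 'Todas las edades', 'Single cause',    'A',   9),
    (2018, 'Total', 'Menos de 1 año',   'Multiple causes', 'A-B', 14),
    (2000, 'Total', 'Menos de 1 año',   'Single cause',    'A',   20),
], columns=columns)

top_2018 = top_cause_by_age(toy, 2018)
```

With the real data, a loop such as `for year in (1982, 2000, 2018)` could call the function on `deaths` and pass each result to `px.bar`. Note that `groupby` orders the age groups alphabetically by label, which is not the same as ordering them by age.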


In [None]:
# Cufflinks histogram




In [None]:
# Cufflinks bar plot
'''
dataset_bar.iplot(kind='bar',
                  x='VARIABLE',
                  xTitle='AXIS TITLE',
                  yTitle='AXIS TITLE',
                  title='VIZ TITLE')
'''

In [None]:
# Cufflinks line plot
'''
dataset_line.iplot(kind='line',
                   x='VARIABLE',
                   xTitle='AXIS TITLE',
                   yTitle='AXIS TITLE',
                   title='VIZ TITLE')
'''

In [None]:
# Cufflinks scatter plot
'''
dataset_scatter.iplot(x='VARIABLE', 
                      y='VARIABLE', 
                      categories='VARIABLE',
                      xTitle='AXIS TITLE', 
                      yTitle='AXIS TITLE',
                      title='VIZ TITLE')
'''