# Deadly Visualizations!!!

![Image](../images/viz_types_portada.png)

## Setup

First we need to create a basic setup which includes:

- Importing the libraries.

- Reading the dataset file (source [Instituto Nacional de Estadística](https://www.ine.es/ss/Satellite?L=es_ES&c=Page&cid=1259942408928&p=1259942408928&pagename=ProductosYServicios%2FPYSLayout)).

- Creating a couple of columns and tables for the analysis.

__NOTE:__ some helper functions are already provided to help you through the challenge. However, feel free to write any additional code you need.

In [1]:
# imports

import sys
import re
sys.path.insert(0, "../modules")

import numpy as np
import pandas as pd

import plotly.express as px
import cufflinks as cf
cf.go_offline()

import module as mod     # helper functions are included in module.py

In [179]:
# read dataset
# deaths by cause, sex, age group and year
deaths = pd.read_csv('../data/7947.csv', sep=';', thousands='.')

# population: keep the year (last four characters of 'Periodo') and average 'Total' per year
population = pd.read_csv('../data/31304bsc.csv', sep=';', thousands='.')
population['Periodo'] = population['Periodo'].str[-4:].astype(int)
population = population.groupby('Periodo')['Total'].mean().astype(int).reset_index(name='Avg_Population')

# attach the yearly average population to every deaths row
deaths = pd.merge(deaths, population, on='Periodo')
deaths

Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,Avg_Population
0,001-102 I-XXII.Todas las causas,Total,Todas las edades,2018,427721,46693630
1,001-102 I-XXII.Todas las causas,Total,Menos de 1 año,2018,1027,46693630
2,001-102 I-XXII.Todas las causas,Total,De 1 a 4 años,2018,240,46693630
3,001-102 I-XXII.Todas las causas,Total,De 5 a 9 años,2018,179,46693630
4,001-102 I-XXII.Todas las causas,Total,De 10 a 14 años,2018,198,46693630
...,...,...,...,...,...,...
301153,102 Otras causas externas y sus efectos tardíos,Mujeres,De 75 a 79 años,1980,1,37420006
301154,102 Otras causas externas y sus efectos tardíos,Mujeres,De 80 a 84 años,1980,1,37420006
301155,102 Otras causas externas y sus efectos tardíos,Mujeres,De 85 a 89 años,1980,1,37420006
301156,102 Otras causas externas y sus efectos tardíos,Mujeres,De 90 a 94 años,1980,0,37420006
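
`pd.merge` performs an inner join by default, so any year missing from the population table would silently drop the matching deaths rows. The following is a minimal sketch on made-up data (the date strings and all figures are assumptions for illustration, not values from the INE files) showing the same year extraction and averaging, plus a left join with `indicator=True` to spot unmatched years:

```python
import pandas as pd

# Toy stand-in for the population file: two readings per year.
# The date format and the numbers are hypothetical.
population = pd.DataFrame({
    "Periodo": ["1 de enero de 2018", "1 de julio de 2018",
                "1 de enero de 2017", "1 de julio de 2017"],
    "Total": [100, 110, 90, 94],
})
# Toy deaths table; 2016 has no population reading on purpose.
deaths = pd.DataFrame({"Periodo": [2018, 2017, 2016], "Total": [5, 4, 3]})

# Keep the year and average the readings within each year.
population["Periodo"] = population["Periodo"].str[-4:].astype(int)
avg_pop = (population.groupby("Periodo")["Total"].mean()
           .astype(int).reset_index(name="Avg_Population"))

# A left join keeps every deaths row; '_merge' flags years without population.
merged = pd.merge(deaths, avg_pop, on="Periodo", how="left", indicator=True)
print(merged)
```

Rows flagged `left_only` in `_merge` are the ones an inner join would have discarded.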


In [180]:
# add some columns...you'll need them later

deaths['cause_code'] = deaths['Causa de muerte'].apply(mod.cause_code)
deaths['cause_group'] = deaths['Causa de muerte'].apply(mod.cause_types)
deaths['cause_name'] = deaths['Causa de muerte'].apply(mod.cause_name)
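
The three helpers above live in `module.py`, which is not shown here. As a rough sketch of what they might do, assuming every label has the form `<code> <name>` with a code like `059` or `001-102`, the parsing could look like this (these are hypothetical stand-ins, not the actual implementations):

```python
import re

# Hypothetical stand-ins for mod.cause_code, mod.cause_name and mod.cause_types.
# Assumption: each label is "<code> <name>", where <code> is "NNN" or "NNN-NNN".
_LABEL = re.compile(r"^(\d{3}(?:-\d{3})?)\s+(.*)$")

def cause_code(label):
    """Return the leading numeric code, e.g. '059' or '001-102'."""
    return _LABEL.match(label).group(1)

def cause_name(label):
    """Return the text that follows the code."""
    return _LABEL.match(label).group(2)

def cause_types(label):
    """A code range (with a hyphen) is treated as a group of causes."""
    return "Multiple causes" if "-" in cause_code(label) else "Single cause"

label = "059 Enfermedades cerebrovasculares"
print(cause_code(label), "|", cause_name(label), "|", cause_types(label))
```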



In [181]:
# let's check the categorical variables
var_list = ['Sexo', 'Edad', 'Periodo', 'cause_code', 'cause_name', 'cause_group']
categories = mod.cat_var(deaths, var_list)
categories

Unnamed: 0,categorical_variable,number_of_possible_values,values
0,cause_code,117,"[001-102, 001-008, 001, 002, 003, 004, 005, 00..."
1,cause_name,117,"[I-XXII.Todas las causas, I.Enfermedades infec..."
2,Periodo,39,"[2018, 2017, 2016, 2015, 2014, 2013, 2012, 201..."
3,Edad,22,"[Todas las edades, Menos de 1 año, De 1 a 4 añ..."
4,Sexo,3,"[Total, Hombres, Mujeres]"
5,cause_group,2,"[Multiple causes, Single cause]"
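
`mod.cat_var` is not shown in this notebook. A plausible way to build a similar summary with plain pandas is sketched below; the function body is an assumption that only mirrors the column names and ordering of the output above:

```python
import pandas as pd

def cat_var(df, cols):
    """Hypothetical re-implementation of mod.cat_var; the real helper may differ."""
    # One row per variable: its name, number of distinct values, and the values.
    rows = [(c, df[c].nunique(), df[c].unique().tolist()) for c in cols]
    out = pd.DataFrame(rows, columns=["categorical_variable",
                                      "number_of_possible_values", "values"])
    # Most varied variables first, as in the table above.
    return (out.sort_values("number_of_possible_values", ascending=False)
               .reset_index(drop=True))

toy = pd.DataFrame({"Sexo": ["Total", "Hombres", "Mujeres", "Total"],
                    "cause_group": ["Single cause"] * 4})
summary = cat_var(toy, ["cause_group", "Sexo"])
print(summary)
```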


In [182]:
# we also need to create a causes table for the analysis

causes_table = deaths[['cause_code', 'cause_name']].drop_duplicates().sort_values(by='cause_code').reset_index(drop=True)

causes_table

Unnamed: 0,cause_code,cause_name
0,001,Enfermedades infecciosas intestinales
1,001-008,I.Enfermedades infecciosas y parasitarias
2,001-102,I-XXII.Todas las causas
3,002,Tuberculosis y sus efectos tardíos
4,003,Enfermedad meningocócica
...,...,...
112,098,Suicidio y lesiones autoinfligidas
113,099,Agresiones (homicidio)
114,100,Eventos de intención no determinada
115,101,Complicaciones de la atención médica y quirúrgica


In [183]:
# And some space for free-style Pandas!!! (e.g.: df['column_name'].unique())

A=deaths['cause_group'].unique()
A

array(['Multiple causes', 'Single cause'], dtype=object)

## Let's make some transformations

Even though the dataset is pretty clean, the information is completely denormalized, as you can see above. For that reason, a collection of helper functions is available to generate the tables you might need:

- `row_filter(df, cat_var, cat_values)` => Filter rows by any value or group of values in a categorical variable.

- `nrow_filter(df, cat_var, cat_values)` => The inverse of `row_filter`: exclude rows matching a value or group of values.

- `groupby_sum(df, group_vars, agg_var='Total', sort_var='Total')` => Sum deaths grouped by one or more variables.

- `pivot_table(df, col, x_axis, value='Total')`=> Make some pivot tables, you might need them...
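
Since `module.py` is not reproduced here, the sketch below shows one way the four helpers listed above could be written in plain pandas. The signatures follow the list; the bodies are assumptions and the real functions may behave differently (for example in how they sort or reset the index):

```python
import pandas as pd

# Hypothetical re-implementations; the real functions in module.py may differ.
def row_filter(df, cat_var, cat_values):
    """Keep rows whose `cat_var` is in `cat_values`."""
    return df[df[cat_var].isin(cat_values)].reset_index(drop=True)

def nrow_filter(df, cat_var, cat_values):
    """Drop rows whose `cat_var` is in `cat_values`."""
    return df[~df[cat_var].isin(cat_values)].reset_index(drop=True)

def groupby_sum(df, group_vars, agg_var="Total", sort_var="Total"):
    """Sum `agg_var` within each group and sort the result descending."""
    out = df.groupby(group_vars, as_index=False)[agg_var].sum()
    return out.sort_values(sort_var, ascending=False).reset_index(drop=True)

def pivot_table(df, col, x_axis, value="Total"):
    """One row per `x_axis` value, one column per `col` value."""
    out = df.pivot_table(index=x_axis, columns=col, values=value, aggfunc="sum")
    return out.reset_index()

toy = pd.DataFrame({
    "Sexo": ["Hombres", "Mujeres", "Hombres", "Mujeres"],
    "Periodo": [2017, 2017, 2018, 2018],
    "Total": [10, 12, 11, 13],   # made-up counts
})
wide = pivot_table(groupby_sum(toy, ["Sexo", "Periodo"]), "Sexo", "Periodo")
print(wide)
```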

__NOTE:__ be aware that the filtering functions apply one filter at a time. Feel free to filter in any other way you need or feel comfortable with.
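
If chaining one filter per call feels clumsy, several conditions can be combined in a single step with plain pandas. The following sketch uses made-up rows and shows two equivalent ways to do it, a boolean mask and `DataFrame.query`:

```python
import pandas as pd

toy = pd.DataFrame({
    "Sexo": ["Total", "Hombres", "Total", "Mujeres"],
    "Edad": ["Todas las edades", "Todas las edades",
             "De 1 a 4 años", "Todas las edades"],
    "Total": [30, 14, 1, 16],   # made-up counts
})

# Option 1: combine conditions with & in one boolean mask.
mask = toy["Sexo"].isin(["Total"]) & toy["Edad"].isin(["Todas las edades"])
both = toy[mask]

# Option 2: the same filter written as a query string.
same = toy.query("Sexo == 'Total' and Edad == 'Todas las edades'")
print(both)
```

Both options keep the original index, so call `reset_index(drop=True)` afterwards if a clean index is needed.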

In [202]:
# Example 1

# Keep only cerebrovascular diseases (cause code 059).
# All 'Sexo' and 'Edad' categories are kept, including the aggregate rows
# ('Total', 'Todas las edades'), so filter those out before summing across rows.
dataset1 = mod.row_filter(deaths, 'cause_code', ['059'])

# Optional further filters, one categorical variable at a time:
#dataset1 = mod.row_filter(dataset1, 'Sexo', ['Total'])
#dataset1 = mod.row_filter(dataset1, 'Edad', ['Todas las edades'])

# Deaths per inhabitant for each row
dataset1['ratio'] = dataset1['Total'] / dataset1['Avg_Population']
dataset1

Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,Avg_Population,cause_code,cause_group,cause_name,ratio
0,059 Enfermedades cerebrovasculares,Total,Todas las edades,1981,49000,37699923,059,Single cause,Enfermedades cerebrovasculares,0.001300
1,059 Enfermedades cerebrovasculares,Total,Todas las edades,1983,48331,38125207,059,Single cause,Enfermedades cerebrovasculares,0.001268
2,059 Enfermedades cerebrovasculares,Total,Todas las edades,1984,47699,38289071,059,Single cause,Enfermedades cerebrovasculares,0.001246
3,059 Enfermedades cerebrovasculares,Total,Todas las edades,1985,47684,38437427,059,Single cause,Enfermedades cerebrovasculares,0.001241
4,059 Enfermedades cerebrovasculares,Total,Todas las edades,1980,47475,37420006,059,Single cause,Enfermedades cerebrovasculares,0.001269
...,...,...,...,...,...,...,...,...,...,...
2569,059 Enfermedades cerebrovasculares,Total,Menos de 1 año,2008,0,45826053,059,Single cause,Enfermedades cerebrovasculares,0.000000
2570,059 Enfermedades cerebrovasculares,Mujeres,De 1 a 4 años,2004,0,42703313,059,Single cause,Enfermedades cerebrovasculares,0.000000
2571,059 Enfermedades cerebrovasculares,Total,De 1 a 4 años,2011,0,46701716,059,Single cause,Enfermedades cerebrovasculares,0.000000
2572,059 Enfermedades cerebrovasculares,Mujeres,Menos de 1 año,2015,0,46429857,059,Single cause,Enfermedades cerebrovasculares,0.000000
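
Ratios this small are hard to read, so mortality is often expressed per 100,000 inhabitants. The sketch below reuses the 1980 and 1981 all-ages figures shown above (the `001` row is made up) and also takes a `.copy()` of the filtered frame, which avoids pandas' `SettingWithCopyWarning` when a new column is assigned to a slice:

```python
import pandas as pd

deaths = pd.DataFrame({
    "cause_code": ["059", "059", "001"],
    "Periodo": [1980, 1981, 1980],
    "Total": [47475, 49000, 100],   # the "001" count is made up
    "Avg_Population": [37420006, 37699923, 37420006],
})

# Work on an explicit copy so that adding columns does not touch `deaths`.
subset = deaths[deaths["cause_code"] == "059"].copy()
subset["ratio"] = subset["Total"] / subset["Avg_Population"]
subset["per_100k"] = (subset["ratio"] * 100_000).round(1)
print(subset[["Periodo", "ratio", "per_100k"]])
```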


In [203]:
# Example 2
# 'ratio' is listed as a grouping key so that it is carried through the aggregation of 'Total'
group = ['Sexo', 'Periodo', 'ratio']
dataset1 = mod.groupby_sum(dataset1, group)
dataset1


Unnamed: 0,Sexo,Periodo,ratio,Total
0,Total,1981,0.001300,49000
1,Total,1983,0.001268,48331
2,Total,1984,0.001246,47699
3,Total,1985,0.001241,47684
4,Total,1980,0.001269,47475
...,...,...,...,...
2454,Total,2017,0.000000,0
2455,Hombres,2014,0.000000,0
2456,Mujeres,2017,0.000000,0
2457,Hombres,1997,0.000000,0


In [204]:
# Example 3
dataset1 = mod.pivot_table(dataset1,'Sexo','Periodo','ratio')
dataset1

Sexo,Periodo,Hombres,Mujeres,Total
0,1980,0.001063,0.001452,0.002537
1,1981,0.001105,0.001494,0.002599
2,1982,0.001029,0.001403,0.002433
3,1983,0.001065,0.001469,0.002535
4,1984,0.001051,0.00144,0.00249
5,1985,0.001049,0.001431,0.002481
6,1986,0.000981,0.001395,0.002376
7,1987,0.000951,0.001349,0.0023
8,1988,0.000958,0.001346,0.002305
9,1989,0.000939,0.001321,0.002261
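
Note that the `Total` value for 1980 in this pivot (0.002537) is about twice the all-ages ratio shown in Example 1 (0.001269). That suggests the pivot adds the `Todas las edades` rows to the individual age-group rows. The sketch below reproduces the effect on made-up data with pandas' own `pivot_table` (an assumption about how `mod.pivot_table` aggregates) and shows that filtering to the aggregate age row first removes the double counting:

```python
import pandas as pd

# Toy long-format data: one aggregate age row plus its breakdown per sex.
toy = pd.DataFrame({
    "Sexo": ["Hombres"] * 3 + ["Mujeres"] * 3,
    "Edad": ["Todas las edades", "De 1 a 4 años", "De 5 a 9 años"] * 2,
    "Periodo": [1980] * 6,
    "ratio": [0.4, 0.1, 0.3, 0.6, 0.2, 0.4],   # made-up values
})

# Summing everything counts each death twice: once in the aggregate row,
# once in its age group.
naive = toy.pivot_table(index="Periodo", columns="Sexo",
                        values="ratio", aggfunc="sum")

# Keeping only the aggregate age row gives the intended figures.
all_ages = toy[toy["Edad"] == "Todas las edades"]
clean = all_ages.pivot_table(index="Periodo", columns="Sexo",
                             values="ratio", aggfunc="sum")
print(naive, clean, sep="\n\n")
```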


## ...and finally, show me some insights with Plotly!!!

In [206]:
# Cufflinks line plot
dataset1.iplot(kind='line',
               title='Cerebrovascular disease deaths per capita',
               x='Periodo',
               y=['Total', 'Hombres', 'Mujeres'],
               yTitle='Deaths per capita',
               xTitle='Year')

In [75]:
# Cufflinks bar plot (template: define dataset_bar and replace the placeholders before running)
'''
dataset_bar.iplot(kind='bar',
                  x='VARIABLE',
                  xTitle='AXIS TITLE',
                  yTitle='AXIS TITLE',
                  title='VIZ TITLE')
'''

In [None]:
# Cufflinks line plot
'''
dataset_line.iplot(kind='line',
                   x='VARIABLE',
                   xTitle='AXIS TITLE',
                   yTitle='AXIS TITLE',
                   title='VIZ TITLE')
'''

In [None]:
# Cufflinks scatter plot
'''
dataset_scatter.iplot(x='VARIABLE', 
                      y='VARIABLE', 
                      categories='VARIABLE',
                      xTitle='AXIS TITLE', 
                      yTitle='AXIS TITLE',
                      title='VIZ TITLE')
'''