## Causes of Death Around the World

### Exploratory Analysis

*Importing the dataset and printing the first 5 rows*

In [92]:
```python
import pandas as pd

# Load the dataset
df = pd.read_csv("./cause_of_deaths.csv")

# First 5 rows of the DataFrame
df.head()
```

|   | Country/Territory | Code | Year | Meningitis | Alzheimer's Disease and Other Dementias | Parkinson's Disease | Nutritional Deficiencies | Malaria | Drowning | Interpersonal Violence | ... | Diabetes Mellitus | Chronic Kidney Disease | Poisonings | Protein-Energy Malnutrition | Road Injuries | Chronic Respiratory Diseases | Cirrhosis and Other Chronic Liver Diseases | Digestive Diseases | Fire, Heat, and Hot Substances | Acute Hepatitis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1990 | 2159 | 1116 | 371 | 2087 | 93 | 1370 | 1538 | ... | 2108 | 3709 | 338 | 2054 | 4154 | 5945 | 2673 | 5005 | 323 | 2985 |
| 1 | Afghanistan | AFG | 1991 | 2218 | 1136 | 374 | 2153 | 189 | 1391 | 2001 | ... | 2120 | 3724 | 351 | 2119 | 4472 | 6050 | 2728 | 5120 | 332 | 3092 |
| 2 | Afghanistan | AFG | 1992 | 2475 | 1162 | 378 | 2441 | 239 | 1514 | 2299 | ... | 2153 | 3776 | 386 | 2404 | 5106 | 6223 | 2830 | 5335 | 360 | 3325 |
| 3 | Afghanistan | AFG | 1993 | 2812 | 1187 | 384 | 2837 | 108 | 1687 | 2589 | ... | 2195 | 3862 | 425 | 2797 | 5681 | 6445 | 2943 | 5568 | 396 | 3601 |
| 4 | Afghanistan | AFG | 1994 | 3027 | 1211 | 391 | 3081 | 211 | 1809 | 2849 | ... | 2231 | 3932 | 451 | 3038 | 6001 | 6664 | 3027 | 5739 | 420 | 3816 |


*How many entries do we have? Spanning how many years, how many countries, and how many causes of death?*

In [93]:
```python
df.shape
```

(6120, 34)

In [94]:
```python
df['Year'].nunique()
```

30

In [95]:
```python
df['Country/Territory'].nunique()
```

204

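In [96]:
```python
# Drop the name and year columns, then reshape to long format:
# one row for each (original row, cause) combination
dfCauses = df.drop(columns=["Country/Territory", "Year"]).reset_index(drop=True)
dfCauses = dfCauses.melt(id_vars=["Code"], var_name="Cause", value_name="Deaths")
# Number of distinct causes of death
dfCauses["Cause"].nunique()
```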

31

There are 6,120 entries covering 204 countries over 30 years, with 31 distinct causes of death.
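
These figures fit together: 204 countries × 30 years = 6,120, which is what you would expect if the table holds exactly one row per country and year. The sketch below runs that check. It also shows a simpler way to count the causes: every column except the three identifier columns. It uses a tiny hypothetical DataFrame with the same layout rather than the real CSV. On the real `df`, the first three printed values should be 31, 204 and 30.

```python
import pandas as pd

# Hypothetical toy data with the same layout as cause_of_deaths.csv:
# three identifier columns followed by one column per cause of death.
df = pd.DataFrame({
    "Country/Territory": ["A", "A", "B", "B"],
    "Code": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [1990, 1991, 1990, 1991],
    "Malaria": [10, 12, 0, 1],
    "Drowning": [5, 4, 7, 6],
})

id_cols = ["Country/Territory", "Code", "Year"]
n_causes = df.shape[1] - len(id_cols)  # every non-identifier column is a cause
n_countries = df["Country/Territory"].nunique()
n_years = df["Year"].nunique()

# One row per (country, year) means rows == countries * years
# and no (country, year) pair appears twice.
one_row_per_country_year = (
    len(df) == n_countries * n_years
    and not df.duplicated(subset=["Country/Territory", "Year"]).any()
)
print(n_causes, n_countries, n_years, one_row_per_country_year)  # 2 2 2 True
```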

The causes of death we are working with are listed below:

In [97]:
```python
dfCauses["Cause"].unique()
```

```text
array(['Meningitis', "Alzheimer's Disease and Other Dementias",
       "Parkinson's Disease", 'Nutritional Deficiencies', 'Malaria',
       'Drowning', 'Interpersonal Violence', 'Maternal Disorders',
       'HIV/AIDS', 'Drug Use Disorders', 'Tuberculosis',
       'Cardiovascular Diseases', 'Lower Respiratory Infections',
       'Neonatal Disorders', 'Alcohol Use Disorders', 'Self-harm',
       'Exposure to Forces of Nature', 'Diarrheal Diseases',
       'Environmental Heat and Cold Exposure', 'Neoplasms',
       'Conflict and Terrorism', 'Diabetes Mellitus',
       'Chronic Kidney Disease', 'Poisonings',
       'Protein-Energy Malnutrition', 'Road Injuries',
       'Chronic Respiratory Diseases',
       'Cirrhosis and Other Chronic Liver Diseases', 'Digestive Diseases',
       'Fire, Heat, and Hot Substances', 'Acute Hepatitis'], dtype=object)
```

The years are listed below. The data covers the period from 1990 to 2019.

In [98]:
```python
df['Year'].unique()
```

```text
array([1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
       2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011,
       2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=int64)
```
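
Listing the unique years shows the endpoints. To confirm there are no gaps in between, you can compare the years against the full range from the minimum to the maximum. The sketch below does this on a hypothetical toy `Year` column with one deliberate gap. The real `df["Year"]` would go in its place.

```python
import pandas as pd

# Hypothetical toy "Year" column with one deliberate gap (1992 is missing).
years = pd.Series([1990, 1991, 1993, 1990, 1991, 1993], name="Year")

first, last = years.min(), years.max()
expected = set(range(first, last + 1))  # every year between the endpoints
missing = sorted(expected - set(years.unique()))
print(first, last, missing)  # 1990 1993 [1992]
```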

Is there data for every country in every year?

In [99]:
```python
# Number of entries (one per year) for each country
df['Country/Territory'].value_counts()
```

```text
Afghanistan         30
Papua New Guinea    30
Niue                30
North Korea         30
North Macedonia     30
                    ..
Greenland           30
Grenada             30
Guam                30
Guatemala           30
Zimbabwe            30
Name: Country/Territory, Length: 204, dtype: int64
```
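
`value_counts` only prints the first and last five countries, so the elided middle could in principle hide a country with fewer rows. Comparing every count with the number of distinct years settles the question in one line. Below is a sketch on hypothetical toy data in which one country is missing a year.

```python
import pandas as pd

# Hypothetical toy data: country "B" has no row for 1991.
df = pd.DataFrame({
    "Country/Territory": ["A", "A", "B"],
    "Year": [1990, 1991, 1990],
})

counts = df["Country/Territory"].value_counts()
n_years = df["Year"].nunique()

# Countries with fewer rows than there are distinct years in the data.
incomplete = counts[counts < n_years]
print(counts.eq(n_years).all())   # False
print(incomplete.index.tolist())  # ['B']
```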

In [100]:
```python
# Check for missing values
df.isnull().sum()
```

```text
Country/Territory                             0
Code                                          0
Year                                          0
Meningitis                                    0
Alzheimer's Disease and Other Dementias       0
Parkinson's Disease                           0
Nutritional Deficiencies                      0
Malaria                                       0
Drowning                                      0
Interpersonal Violence                        0
Maternal Disorders                            0
HIV/AIDS                                      0
Drug Use Disorders                            0
Tuberculosis                                  0
Cardiovascular Diseases                       0
Lower Respiratory Infections                  0
Neonatal Disorders                            0
Alcohol Use Disorders                         0
Self-harm                                     0
Exposure to Forces of Nature                  0
Diarrheal Diseases
```

The dataset is complete, with no missing values.
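
The printed `isnull().sum()` output stops partway through the columns. A single scalar check over the whole DataFrame is a more direct way to support the "no missing values" conclusion. The sketch below runs on hypothetical toy data containing one gap. On the real `df`, the expectation is a total of 0 and an empty list.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a single missing value.
df = pd.DataFrame({
    "Country/Territory": ["A", "B"],
    "Malaria": [10, np.nan],
    "Drowning": [5, 7],
})

total_missing = int(df.isnull().sum().sum())  # missing cells across all columns
cols_with_gaps = df.columns[df.isnull().any()].tolist()
print(total_missing, cols_with_gaps)  # 1 ['Malaria']
```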

So the question becomes: what is the leading cause of death on the planet?

In [101]:
```python
# New DataFrame with the worldwide total of each cause of death, per year
dfFrequence = df.groupby("Year").sum(numeric_only=True)
dfFrequence
```

| Year | Meningitis | Alzheimer's Disease and Other Dementias | Parkinson's Disease | Nutritional Deficiencies | Malaria | Drowning | Interpersonal Violence | Maternal Disorders | HIV/AIDS | Drug Use Disorders | ... | Diabetes Mellitus | Chronic Kidney Disease | Poisonings | Protein-Energy Malnutrition | Road Injuries | Chronic Respiratory Diseases | Cirrhosis and Other Chronic Liver Diseases | Digestive Diseases | Fire, Heat, and Hot Substances | Acute Hepatitis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1990 | 432253 | 560616 | 147156 | 756808 | 840297 | 460460 | 372497 | 302419 | 336059 | 56133 | ... | 661085 | 600925 | 87951 | 655975 | 1112770 | 3092759 | 1012423 | 1854392 | 123123 | 166343 |
| 1991 | 428621 | 583166 | 150875 | 729145 | 858984 | 454375 | 383689 | 298271 | 430725 | 61890 | ... | 679630 | 613589 | 87813 | 631013 | 1117024 | 3148288 | 1026870 | 1877515 | 123941 | 165276 |
| 1992 | 426440 | 605894 | 154886 | 700664 | 856415 | 447056 | 407176 | 299300 | 540070 | 66826 | ... | 702253 | 630160 | 88435 | 606015 | 1125566 | 3207816 | 1042953 | 1903759 | 124995 | 163687 |
| 1993 | 420836 | 629571 | 160249 | 674219 | 862216 | 445434 | 432858 | 293564 | 664463 | 71603 | ... | 728077 | 647255 | 90036 | 583919 | 1137444 | 3266612 | 1067730 | 1939556 | 127493 | 161899 |
| 1994 | 413799 | 652176 | 164381 | 649801 | 855671 | 443350 | 441971 | 293148 | 800169 | 76717 | ... | 751254 | 665365 | 90897 | 564046 | 1153642 | 3297292 | 1089331 | 1967669 | 129611 | 159423 |
| 1995 | 409826 | 674815 | 168882 | 723095 | 862626 | 437303 | 444246 | 290551 | 938440 | 79985 | ... | 773490 | 683701 | 90353 | 641084 | 1162799 | 3313295 | 1104380 | 1984263 | 128523 | 157173 |
| 1996 | 417259 | 696665 | 173822 | 671977 | 872476 | 423296 | 432673 | 287746 | 1061580 | 81704 | ... | 799023 | 704624 | 88861 | 593826 | 1162809 | 3342591 | 1115241 | 1995513 | 126804 | 153406 |
| 1997 | 400893 | 717342 | 179347 | 647682 | 892946 | 413405 | 427316 | 289467 | 1174154 | 82572 | ... | 827734 | 731048 | 87371 | 572372 | 1169798 | 3381872 | 1128962 | 2014659 | 126274 | 151902 |
| 1998 | 393364 | 738768 | 185097 | 620498 | 901338 | 407205 | 431984 | 289304 | 1303651 | 85087 | ... | 854415 | 758681 | 86679 | 549543 | 1177827 | 3401426 | 1141509 | 2030600 | 124746 | 149563 |
| 1999 | 390136 | 761620 | 191538 | 593417 | 893788 | 397169 | 440642 | 288836 | 1439777 | 87336 | ... | 879521 | 785380 | 87333 | 527431 | 1195303 | 3419612 | 1158552 | 2054201 | 125921 | 147017 |


In [102]:
```python
# Most frequent cause of death on the planet in each year
dfMax = dfFrequence.idxmax(axis=1)
dfMax
```

```text
Year
1990    Cardiovascular Diseases
1991    Cardiovascular Diseases
1992    Cardiovascular Diseases
1993    Cardiovascular Diseases
1994    Cardiovascular Diseases
1995    Cardiovascular Diseases
1996    Cardiovascular Diseases
1997    Cardiovascular Diseases
1998    Cardiovascular Diseases
1999    Cardiovascular Diseases
2000    Cardiovascular Diseases
2001    Cardiovascular Diseases
2002    Cardiovascular Diseases
2003    Cardiovascular Diseases
2004    Cardiovascular Diseases
2005    Cardiovascular Diseases
2006    Cardiovascular Diseases
2007    Cardiovascular Diseases
2008    Cardiovascular Diseases
2009    Cardiovascular Diseases
2010    Cardiovascular Diseases
2011    Cardiovascular Diseases
2012    Cardiovascular Diseases
2013    Cardiovascular Diseases
2014    Cardiovascular Diseases
2015    Cardiovascular Diseases
2016    Cardiovascular Diseases
2017    Cardiovascular Diseases
2018    Cardiovascular Diseases
2019    Cardiovascular Diseases
dtype: object
```

In [103]:
```python
# The cause that leads in the most years
top_cause = dfMax.mode()[0]
# Total deaths from that cause, summed over all years
cases = dfFrequence[top_cause].sum()
print(f"Leading cause of death on the planet from 1990 to 2019: {top_cause}, which claimed {cases} lives.")
```

Leading cause of death on the planet from 1990 to 2019: Cardiovascular Diseases, which claimed 447741982 lives.
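
Taking the mode of the yearly leaders works here because the same cause tops every year. Ranking causes by their total over the whole period answers the question more directly and also shows the runners-up. Below is a minimal sketch that assumes a per-year totals table shaped like `dfFrequence`, with years as rows and causes as columns. The numbers are hypothetical.

```python
import pandas as pd

# Hypothetical per-year totals shaped like dfFrequence.
dfFrequence = pd.DataFrame(
    {"Cardiovascular Diseases": [100, 110, 120],
     "Neoplasms": [60, 65, 70],
     "Malaria": [30, 25, 20]},
    index=pd.Index([2017, 2018, 2019], name="Year"),
)

# Sum each cause over all years, then sort from most to fewest deaths.
ranking = dfFrequence.sum().sort_values(ascending=False)
print(ranking.head(3))
print(ranking.index[0], int(ranking.iloc[0]))  # Cardiovascular Diseases 330
```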


In [104]:
```python
# Another interesting question: which cause of death is most frequent in each country?
# To answer it, build a DataFrame with each country's total deaths per cause, summed over all years
dfCountry = df.drop(columns=["Year"]).reset_index(drop=True)
dfCountry = dfCountry.groupby("Country/Territory").sum(numeric_only=True)
dfCountry
```


| Country/Territory | Meningitis | Alzheimer's Disease and Other Dementias | Parkinson's Disease | Nutritional Deficiencies | Malaria | Drowning | Interpersonal Violence | Maternal Disorders | HIV/AIDS | Drug Use Disorders | ... | Diabetes Mellitus | Chronic Kidney Disease | Poisonings | Protein-Energy Malnutrition | Road Injuries | Chronic Respiratory Diseases | Cirrhosis and Other Chronic Liver Diseases | Digestive Diseases | Fire, Heat, and Hot Substances | Acute Hepatitis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 78666 | 41998 | 13397 | 71453 | 13924 | 56536 | 108228 | 129621 | 4282 | 7094 | ... | 93207 | 134676 | 14530 | 70163 | 208331 | 209857 | 98419 | 186959 | 13559 | 98108 |
| Albania | 1323 | 16549 | 4491 | 569 | 0 | 2397 | 5242 | 246 | 57 | 634 | ... | 4055 | 7636 | 500 | 526 | 8522 | 22632 | 8717 | 14907 | 636 | 44 |
| Algeria | 15685 | 86914 | 22943 | 7138 | 70 | 24273 | 16702 | 29475 | 6101 | 10612 | ... | 89035 | 154666 | 12337 | 6407 | 369395 | 168453 | 91927 | 146527 | 27628 | 10492 |
| American Samoa | 30 | 143 | 69 | 60 | 0 | 120 | 101 | 30 | 15 | 0 | ... | 970 | 512 | 0 | 60 | 164 | 612 | 181 | 341 | 0 | 0 |
| Andorra | 0 | 614 | 137 | 0 | 0 | 0 | 15 | 0 | 85 | 0 | ... | 198 | 292 | 0 | 0 | 259 | 838 | 283 | 560 | 0 | 30 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Venezuela | 11615 | 108735 | 18573 | 22554 | 3726 | 20273 | 266071 | 12739 | 46090 | 1664 | ... | 175790 | 161667 | 2607 | 21347 | 175036 | 122198 | 91720 | 168365 | 4949 | 1109 |
| Vietnam | 38559 | 369363 | 83322 | 48613 | 17311 | 214356 | 47981 | 13167 | 148838 | 19959 | ... | 544222 | 396874 | 34681 | 7366 | 594980 | 911787 | 527192 | 735817 | 17380 | 30650 |
| Yemen | 21095 | 31045 | 7188 | 68939 | 143463 | 27994 | 17918 | 53611 | 6276 | 3718 | ... | 30812 | 52119 | 12561 | 66731 | 278327 | 126525 | 64136 | 111536 | 23871 | 26532 |
| Zambia | 98886 | 13473 | 4054 | 95913 | 205529 | 12809 | 30065 | 28395 | 1175563 | 933 | ... | 54098 | 41751 | 9056 | 92915 | 56976 | 59173 | 100581 | 147640 | 9476 | 8846 |
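
The question in the comment of the cell above asks for the leading cause *within* each country. That is the maximum across each row, `idxmax(axis=1)`. The column-wise `idxmax(axis=0)` answers a different question: the leading country for each cause. The sketch below shows the row-wise version on a hypothetical table shaped like `dfCountry`.

```python
import pandas as pd

# Hypothetical per-country totals shaped like dfCountry (countries as rows).
dfCountry = pd.DataFrame(
    {"Cardiovascular Diseases": [500, 80, 300],
     "HIV/AIDS": [20, 400, 10],
     "Neoplasms": [450, 60, 320]},
    index=pd.Index(["A", "B", "C"], name="Country/Territory"),
)

# Row-wise: the cause with the most deaths in each country.
top_cause_by_country = dfCountry.idxmax(axis=1)
print(top_cause_by_country)

# How many countries have each cause as their leading one.
print(top_cause_by_country.value_counts())
```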


In [105]:
```python
# Which country had the most deaths from each cause (column-wise maximum)?
dfLeaders = dfCountry.idxmax(axis=0)
dfLeaders
```

```text
Meningitis                                            India
Alzheimer's Disease and Other Dementias               China
Parkinson's Disease                                   China
Nutritional Deficiencies                              India
Malaria                                             Nigeria
Drowning                                              China
Interpersonal Violence                               Brazil
Maternal Disorders                                    India
HIV/AIDS                                       South Africa
Drug Use Disorders                            United States
Tuberculosis                                          India
Cardiovascular Diseases                               China
Lower Respiratory Infections                          India
Neonatal Disorders                                    India
Alcohol Use Disorders                                Russia
Self-harm                                             India
Exposure to Forces of Nature
```

In [106]:
```python
# Country that leads in the largest number of causes
dfLeaders.mode()[0]
```

'India'
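
`mode()` reports only the country that leads the most causes, not by how much. Counting how many causes each country leads gives the full tally. The sketch below uses a hypothetical Series shaped like `dfLeaders`.

```python
import pandas as pd

# Hypothetical cause -> leading-country mapping shaped like dfLeaders.
dfLeaders = pd.Series({
    "Meningitis": "India",
    "Tuberculosis": "India",
    "Malaria": "Nigeria",
    "Drowning": "China",
    "Neonatal Disorders": "India",
})

leads_per_country = dfLeaders.value_counts()  # causes led by each country
print(leads_per_country)
# Share of all causes led by the top country.
print(leads_per_country.iloc[0] / len(dfLeaders))  # 0.6
```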

Given India's large population, it makes sense that it leads in more causes than any other country. These are absolute death counts, not per-capita rates, so population size weighs heavily. Does Brazil lead in any cause? Let's see:

In [107]:
```python
# For which causes of death does Brazil appear in dfLeaders?
dfLeaders[dfLeaders == "Brazil"]
```

```text
Interpersonal Violence    Brazil
dtype: object
```

Brazil is the country with the most deaths from interpersonal violence between 1990 and 2019, in absolute numbers.
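
To see how large Brazil's lead is, `nlargest` lists the top countries for this cause side by side. The sketch below uses hypothetical totals. On the real data, the Series would be `dfCountry["Interpersonal Violence"]`.

```python
import pandas as pd

# Hypothetical totals of interpersonal-violence deaths per country (1990 to 2019).
violence = pd.Series(
    {"Brazil": 900, "India": 700, "Mexico": 500, "Colombia": 450, "Chile": 20},
    name="Interpersonal Violence",
)

top = violence.nlargest(3)  # the three countries with the most deaths
print(top)
# Ratio between the leader and the runner-up.
print(round(top.iloc[0] / top.iloc[1], 2))  # 1.29
```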

Now let's build some visualizations.

### Visualization

Let's start with the cause of death that Brazil leads: Interpersonal Violence.

In [108]:
```python
import plotly.express as px

# Line chart of the number of deaths from Interpersonal Violence in Brazil over time
dfBrazil = df[df["Country/Territory"] == "Brazil"]
fig = px.line(dfBrazil, x="Year", y="Interpersonal Violence", title="Interpersonal Violence in Brazil")
fig.show()
```
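
To put Brazil's curve in context without drawing all 204 countries, one option is to keep only the countries with the highest totals for this cause. You would then pass that subset to `px.line` with `color="Country/Territory"`. The sketch below prepares that subset with pandas only, on hypothetical numbers. The number of countries kept, `k`, is an arbitrary choice.

```python
import pandas as pd

# Hypothetical long-format data shaped like df (one row per country and year).
df = pd.DataFrame({
    "Country/Territory": ["Brazil", "Brazil", "India", "India", "Chile", "Chile"],
    "Year": [2018, 2019, 2018, 2019, 2018, 2019],
    "Interpersonal Violence": [60, 55, 40, 42, 2, 3],
})

cause = "Interpersonal Violence"
k = 2  # hypothetical choice; 5 might suit the real data better

# Countries with the largest totals over all years for this cause.
top_countries = df.groupby("Country/Territory")[cause].sum().nlargest(k).index
subset = df[df["Country/Territory"].isin(top_countries)]
print(sorted(top_countries))  # ['Brazil', 'India']
# subset can then be plotted with:
# px.line(subset, x="Year", y=cause, color="Country/Territory")
```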

Next, a bar chart showing the number of deaths from each cause in Brazil in 2019, the most recent year in the data.

In [109]:
```python
# Bar chart of the number of deaths from each cause in Brazil in 2019
# Filter country and year with one combined mask on df
df2019 = df[(df["Year"] == 2019) & (df["Country/Territory"] == "Brazil")]
df2019 = df2019.drop(columns=["Country/Territory", "Code", "Year"]).reset_index(drop=True)

# melt turns each cause column into a row: one (Cause, Deaths) pair per cause
df2019 = df2019.melt(var_name="Cause", value_name="Deaths")
bar = px.bar(
    df2019,
    x="Deaths",
    y="Cause",
    color="Cause",
    color_discrete_sequence=["yellow", "blue", "green"],
)
# Add a title
bar.update_layout(title="Deaths in Brazil in 2019")

# Hide the legend
bar.update_layout(showlegend=False)

bar
```
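
The same 2019 breakdown is easier to scan when the causes are ranked, and the ranking is useful as a table. The sketch below filters one country and year with a single combined mask, melts the cause columns, and sorts. It runs on a hypothetical toy table shaped like `df`.

```python
import pandas as pd

# Hypothetical toy data shaped like df.
df = pd.DataFrame({
    "Country/Territory": ["Brazil", "Brazil", "Chile"],
    "Code": ["BRA", "BRA", "CHL"],
    "Year": [2018, 2019, 2019],
    "Neoplasms": [240, 250, 30],
    "Road Injuries": [40, 38, 2],
    "Cardiovascular Diseases": [380, 400, 28],
})

mask = (df["Country/Territory"] == "Brazil") & (df["Year"] == 2019)
brazil_2019 = (
    df.loc[mask]
    .drop(columns=["Country/Territory", "Code", "Year"])
    .melt(var_name="Cause", value_name="Deaths")  # one row per cause
    .sort_values("Deaths", ascending=False, ignore_index=True)
)
print(brazil_2019)
```

The sorted frame can be passed to `px.bar` exactly as in the cell above.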



And worldwide? How were deaths distributed across causes in 2019?

In [110]:
```python
# Bar chart of the total number of deaths on the planet from each cause in 2019
totals2019 = dfFrequence.loc[2019]  # Series: cause -> total deaths in 2019
bar2 = px.bar(
    x=totals2019.values,
    y=totals2019.index,
    color=totals2019.index,
    color_discrete_sequence=px.colors.sequential.RdBu,
)
# Add a title
bar2.update_layout(title="Deaths in the world in 2019")

# Hide the legend
bar2.update_layout(showlegend=False)

# Hide the axis titles
bar2.update_xaxes(title_text="")
bar2.update_yaxes(title_text="")

bar2
```
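
Absolute counts are hard to compare at a glance. Converting the 2019 totals to percentages makes the relative weight of each cause explicit. These are percentages of the deaths covered by the causes in this dataset, not of all deaths. The sketch below uses a hypothetical row shaped like `dfFrequence.loc[2019]`.

```python
import pandas as pd

# Hypothetical 2019 totals per cause, shaped like dfFrequence.loc[2019].
totals_2019 = pd.Series(
    {"Cardiovascular Diseases": 500, "Neoplasms": 300, "Malaria": 200},
    name=2019,
)

# Share of each cause among the deaths covered by the listed causes, in percent.
share = (totals_2019 / totals_2019.sum() * 100).sort_values(ascending=False)
print(share.round(1))
```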





These are just a few of the many interesting visualizations this dataset makes possible.