## Küresel Ölüm Sebepleri Verisetinin Görselleştirilmesi


## 1. Install and import libraries

In [1]:
!pip install geopandas




In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly
import geopandas
import plotly.express as px
import plotly.graph_objects as go



### 1.1 Set options



In [3]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

## 2. Reading data

Küresel Ölüm Sebepleri, ölüm ve hastalık nedenleri üzerine The Lancet tıp dergisinde yayınlanan önemli bir çalışmadır. Küresel ölüm sebepleri verisinde 1990-2016 yılları arasında herbir ülkedeki ölümlere ait hastalık, rahatsızlık, terör, trafik kazaları gibi sebeplere ait oranlar tutulmuştur. Veride ülkelerin dışında dünya geneli, bazı topluluklar, Orta Asya, Doğu Avrupa gibi yerlere ait ölüm sebep oranları da mevcuttur. Ölüm sebepleri verisi yalnızca oran olarak verildiğinden toplama, ortalama alma gibi işlemler yapılamamaktadır, bu da veri analiz edilirken bilgi çıkarılmasında kısıtlara sebep olmaktadır.

In [4]:
data = pd.read_excel('https://github.com/rfordatascience/tidytuesday/blob/master/data/2018/2018-04-16/global_mortality.xlsx?raw=true', na_values=0)

In [5]:
data.head()

Unnamed: 0,country,country_code,year,Cardiovascular diseases (%),Cancers (%),Respiratory diseases (%),Diabetes (%),Dementia (%),Lower respiratory infections (%),Neonatal deaths (%),Diarrheal diseases (%),Road accidents (%),Liver disease (%),Tuberculosis (%),Kidney disease (%),Digestive diseases (%),HIV/AIDS (%),Suicide (%),Malaria (%),Homicide (%),Nutritional deficiencies (%),Meningitis (%),Protein-energy malnutrition (%),Drowning (%),Maternal deaths (%),Parkinson disease (%),Alcohol disorders (%),Intestinal infectious diseases (%),Drug disorders (%),Hepatitis (%),Fire (%),Heat-related (hot and cold exposure) (%),Natural disasters (%),Conflict (%),Terrorism (%)
0,Afghanistan,AFG,1990,17.610397,4.025975,2.106626,3.832555,0.531429,10.886362,9.184653,2.497141,3.715944,0.836909,5.877075,1.680611,1.058771,0.013019,0.43661,0.448886,1.28702,0.350504,3.037603,0.32976,0.983862,1.769213,0.025159,0.028998,0.18333,0.041205,0.138738,0.174157,0.137823,,0.932,0.007
1,Afghanistan,AFG,1991,17.801807,4.054145,2.134176,3.822228,0.532497,10.356968,8.938897,2.572228,3.729142,0.845516,5.891704,1.671115,1.049322,0.014515,0.44228,0.455019,1.290991,0.343212,2.903202,0.322171,0.954586,1.749264,0.025451,0.029172,0.178107,0.042033,0.135008,0.170671,0.134827,0.797603,2.044,0.04
2,Afghanistan,AFG,1992,18.386833,4.173959,2.208298,3.900125,0.540066,10.095546,8.84138,2.707743,3.816357,0.874525,6.034669,1.700988,1.062887,0.016258,0.456645,0.460838,1.326168,0.345483,2.840649,0.323465,0.951376,1.764245,0.026122,0.029973,0.176855,0.043843,0.134582,0.171171,0.139053,0.34021,2.408,0.027
3,Afghanistan,AFG,1993,18.959646,4.269233,2.283923,3.974113,0.553813,9.873841,8.676409,3.360793,3.884374,0.900513,6.160951,1.728674,1.074955,0.017721,0.464683,0.466301,1.350335,0.348639,2.779886,0.325916,0.949149,1.77212,0.027044,0.030456,0.173467,0.04535,0.133888,0.171032,0.150136,0.116263,,
4,Afghanistan,AFG,1994,19.089513,4.256034,2.307721,3.968954,0.550087,9.530242,8.384454,3.083801,3.856048,0.905218,6.142278,1.725375,1.062284,0.018506,0.463374,0.472166,1.339526,0.345193,2.678094,0.321689,0.928878,1.745684,0.02718,0.030103,0.165135,0.045744,0.130486,0.167263,0.14897,0.075506,4.296,0.01


Veride ülke adı, ülke kodu ve yıl kolonlarından sonra ölüm sebeplerinin herbiri bir kolon olarak tutulmuştur. Bu durum verinin analiz edilmesini zorlaştıracağından ilerde tüm ölüm sebepleri tek bir sütun altına çekilmiştir.

## 2.1 Dropping the rows whose country codes are not exist
Veri görselleştirilirken ve daha sonra kıta isimleri ile birleştirilirken country_code sütunu kullanıldığından buradaki null satırlar verisetinden çıkarılmıştır.

In [6]:
data.isnull().sum()

country                                        0
country_code                                 864
year                                           0
Cardiovascular diseases (%)                    0
Cancers (%)                                    0
Respiratory diseases (%)                       0
Diabetes (%)                                   0
Dementia (%)                                   0
Lower respiratory infections (%)               0
Neonatal deaths (%)                            0
Diarrheal diseases (%)                         0
Road accidents (%)                             0
Liver disease (%)                              0
Tuberculosis (%)                               0
Kidney disease (%)                             0
Digestive diseases (%)                         0
HIV/AIDS (%)                                   4
Suicide (%)                                    0
Malaria (%)                                 2540
Homicide (%)                                   0
Nutritional deficien

In [7]:
data[data['country_code'].isnull()]['country'].unique()

array(['Andean Latin America', 'Australasia', 'Caribbean', 'Central Asia',
       'Central Europe', 'Central Latin America',
       'Central Sub-Saharan Africa', 'East Asia', 'Eastern Europe',
       'Eastern Sub-Saharan Africa', 'England', 'High SDI',
       'High-income Asia Pacific', 'High-middle SDI',
       'Latin America and Caribbean', 'Low SDI', 'Low-middle SDI',
       'Middle SDI', 'North Africa and Middle East', 'North America',
       'Northern Ireland', 'Oceania', 'Scotland', 'South Asia',
       'Southeast Asia', 'Southern Latin America',
       'Southern Sub-Saharan Africa', 'Sub-Saharan Africa',
       'Tropical Latin America', 'Wales', 'Western Europe',
       'Western Sub-Saharan Africa'], dtype=object)

In [8]:
data1  = data.dropna(subset = ['country_code'])

## 2.2 Deleting the " (%)" sign from the column names


In [9]:
data1.describe()


Unnamed: 0,year,Cardiovascular diseases (%),Cancers (%),Respiratory diseases (%),Diabetes (%),Dementia (%),Lower respiratory infections (%),Neonatal deaths (%),Diarrheal diseases (%),Road accidents (%),Liver disease (%),Tuberculosis (%),Kidney disease (%),Digestive diseases (%),HIV/AIDS (%),Suicide (%),Malaria (%),Homicide (%),Nutritional deficiencies (%),Meningitis (%),Protein-energy malnutrition (%),Drowning (%),Maternal deaths (%),Parkinson disease (%),Alcohol disorders (%),Intestinal infectious diseases (%),Drug disorders (%),Hepatitis (%),Fire (%),Heat-related (hot and cold exposure) (%),Natural disasters (%),Conflict (%),Terrorism (%)
count,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5288.0,5292.0,2995.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,5292.0,1481.0,1300.0,1514.0
mean,2003.0,29.966068,14.133649,3.913752,6.488727,3.120373,5.834167,4.675771,3.208658,2.567795,2.124687,2.11812,2.135181,1.951861,3.379312,1.390872,3.183114,0.9661,1.113351,0.804677,1.018504,0.72731,0.596536,0.283308,0.313936,0.178676,0.172696,0.15989,0.341091,0.101854,0.318406,1.047965,0.115638
std,7.789617,14.187516,8.059649,2.210812,4.642916,2.67837,3.538398,3.908785,4.439285,2.30398,1.294671,2.660851,1.532203,0.700086,7.958905,1.171087,5.273192,1.458855,1.956906,0.993244,1.896526,0.528672,0.721724,0.257529,0.420485,0.299029,0.178848,0.165079,0.187186,0.136137,2.351237,4.499114,0.395299
min,1990.0,1.428594,0.582273,0.298003,0.327133,0.044752,0.684583,0.040714,0.008251,0.278404,0.192933,0.010881,0.056244,0.313646,7e-06,0.10161,0.000593,0.045202,0.003746,0.027986,0.001092,0.055781,0.00188,0.002316,0.01275,7.1e-05,0.001797,0.004848,0.056913,0.007108,0.000164,0.001,0.001
25%,1996.0,18.392261,6.889108,2.17372,3.216611,0.999013,3.112286,0.804252,0.177489,1.35173,1.317,0.243517,0.890475,1.466308,0.078444,0.665231,0.00699,0.238664,0.08827,0.120475,0.055799,0.36957,0.036922,0.072907,0.074507,0.000798,0.057159,0.0368,0.19175,0.040033,0.019407,0.029,0.003
50%,2003.0,30.5868,12.932293,3.458337,5.074182,2.444242,4.997018,4.035269,0.737304,1.907809,1.793883,0.906269,1.76027,1.922355,0.414587,1.124024,0.17917,0.504646,0.383222,0.360732,0.305886,0.611934,0.235853,0.205875,0.149517,0.017057,0.115017,0.109705,0.326486,0.067295,0.039751,0.112,0.012
75%,2010.0,38.49907,20.770595,5.157334,8.311211,4.195622,8.228956,7.841307,5.318225,2.867473,2.510589,3.241578,3.055168,2.272682,2.332775,1.806541,4.291608,0.979861,1.355008,1.097441,1.228803,0.960172,1.018433,0.41189,0.378919,0.270626,0.224844,0.240806,0.44737,0.115436,0.084008,0.5125,0.063
max,2016.0,67.387681,33.617499,16.289395,35.816187,16.672481,20.035185,17.806831,25.184488,20.900877,11.647082,16.465861,9.947692,5.15873,62.19363,15.412018,24.425962,14.229261,35.545008,6.981346,35.518698,4.510948,3.414353,1.592211,3.080525,2.276513,1.313466,1.583289,1.343686,1.165286,65.294199,82.317,5.877


Veride 5292 satır yer almaktadır. Hastalıklar yatayda ayrı kolonlarda yüzdelikler olarak tutulmaktadır. Herhangi bir ülkede herhangi bir sebeple ölüme yol açan en büyük oran % 82.317 ile çatışmalara (Conflict (%))' a aittir.
Veri tipleri uygun olup dönüştürmemiz gereken bir durum yoktur.

In [10]:
data1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5292 entries, 0 to 6155
Data columns (total 35 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   country                                   5292 non-null   object 
 1   country_code                              5292 non-null   object 
 2   year                                      5292 non-null   int64  
 3   Cardiovascular diseases (%)               5292 non-null   float64
 4   Cancers (%)                               5292 non-null   float64
 5   Respiratory diseases (%)                  5292 non-null   float64
 6   Diabetes (%)                              5292 non-null   float64
 7   Dementia (%)                              5292 non-null   float64
 8   Lower respiratory infections (%)          5292 non-null   float64
 9   Neonatal deaths (%)                       5292 non-null   float64
 10  Diarrheal diseases (%)              

Kolon isimlerindeki " (%)" işaretleri çıkarılmıştır.

In [11]:
data1.columns=data1.columns.str.replace('[(%)]','').str.strip()
data1.columns



Index(['country', 'country_code', 'year', 'Cardiovascular diseases', 'Cancers',
       'Respiratory diseases', 'Diabetes', 'Dementia',
       'Lower respiratory infections', 'Neonatal deaths', 'Diarrheal diseases',
       'Road accidents', 'Liver disease', 'Tuberculosis', 'Kidney disease',
       'Digestive diseases', 'HIV/AIDS', 'Suicide', 'Malaria', 'Homicide',
       'Nutritional deficiencies', 'Meningitis', 'Protein-energy malnutrition',
       'Drowning', 'Maternal deaths', 'Parkinson disease', 'Alcohol disorders',
       'Intestinal infectious diseases', 'Drug disorders', 'Hepatitis', 'Fire',
       'Heat-related hot and cold exposure', 'Natural disasters', 'Conflict',
       'Terrorism'],
      dtype='object')

## 3. Unpivoting the columns related to death causes
Verinin pandas fonksiyonlarıyla daha kolay analiz edilmesi için verideki ölüm sebepleri bir kolona, ölüm oranları bir kolona gelecek şekilde yeniden şekillendirilmiştir.

In [12]:
mortality_reasons = data1.columns[3:]
data1 = data1.groupby(['country', 'country_code', 'year'])[mortality_reasons].mean().reset_index()
data_melted = data1.melt(id_vars=['country', 'country_code', 'year'], var_name='mortality_reasons', value_name='percent')
data_melted.head()


Unnamed: 0,country,country_code,year,mortality_reasons,percent
0,Afghanistan,AFG,1990,Cardiovascular diseases,17.610397
1,Afghanistan,AFG,1991,Cardiovascular diseases,17.801807
2,Afghanistan,AFG,1992,Cardiovascular diseases,18.386833
3,Afghanistan,AFG,1993,Cardiovascular diseases,18.959646
4,Afghanistan,AFG,1994,Cardiovascular diseases,19.089513


## 4. Visualization of the max death causes by country and year


In the first visualization, it was desired to find the cause with the highest mortality rate in each country by year. The aim is to see the change in the most common causes of death depending on the years, and to find answers to questions such as is there a grouping in the world in the causes of death with the highest percentage.

In [20]:
data_year_and_country_based_max = data_melted.loc[data_melted.groupby(['country', 'country_code', 'year'])['percent'].idxmax()]

## Choropleth map
import plotly.express as px
fig = px.choropleth(data_year_and_country_based_max, locations="country_code", color="mortality_reasons", 
                    hover_name="country", hover_data = ['percent'], animation_frame="year", range_color=[0, 0.5], labels={"mortality_reasons": "mortality reasons"})
fig.update_layout({'title': {'text': "Ülke bazında en yüksek orana sahip ölüm sebeplerinin yıllara göre değişimi", 'x': 0.45, 'y': 0.95}})
fig.show()

Table 1: Causes of death with the highest rate by country and year

According to Table 1:
*  The predominant cause of death worldwide is cardiovascular disease.
*  In South Africa, especially in the early years, in many countries, besides cardiovascular disease, diarrhea, tuberculosis and HIV/AIDS-related diseases caused the most deaths.
*  Again, malaria in Africa and Bangladesh, lower respiratory infections diseases in African and Latin American countries are striking.
*  Since the beginning of the 2000s, cancer has been observed as the most common cause of death in Greenland and France, followed by the Americas.

## 5. Visualization of the top 5 diseases that cause the most deaths by continent and year
Continent names were needed to show the first 5 diseases on a continent basis by bar graphs.
Since we do not have continent names in our data, let's take it from another data source and combine it and draw bar graphs of the top 5 disturbances by continent and year.

In [14]:
#Verinin okunması ve kullanılacak kolonların seçilmesi
country_contitent = pd.read_csv('https://gist.githubusercontent.com/stevewithington/20a69c0b6d2ff846ea5d35e5fc47f26c/raw/13716ceb2f22b5643ce5e7039643c86a0e0c6da6/country-and-continent-codes-list-csv.csv')
country_contitent = country_contitent[['Continent_Name', 'Country_Name', 'Three_Letter_Country_Code' ]]
country_contitent.head()

Unnamed: 0,Continent_Name,Country_Name,Three_Letter_Country_Code
0,Asia,"Afghanistan, Islamic Republic of",AFG
1,Europe,"Albania, Republic of",ALB
2,Antarctica,Antarctica (the territory South of 60 deg S),ATA
3,Africa,"Algeria, People's Democratic Republic of",DZA
4,Oceania,American Samoa,ASM


In [15]:
#Verilerin ülke kodları üzerinden birleştirilmesi
data_with_contitent = pd.merge(data_melted, country_contitent, how = 'inner', left_on= 'country_code', right_on= 'Three_Letter_Country_Code')
data_with_contitent.head()

Unnamed: 0,country,country_code,year,mortality_reasons,percent,Continent_Name,Country_Name,Three_Letter_Country_Code
0,Afghanistan,AFG,1990,Cardiovascular diseases,17.610397,Asia,"Afghanistan, Islamic Republic of",AFG
1,Afghanistan,AFG,1991,Cardiovascular diseases,17.801807,Asia,"Afghanistan, Islamic Republic of",AFG
2,Afghanistan,AFG,1992,Cardiovascular diseases,18.386833,Asia,"Afghanistan, Islamic Republic of",AFG
3,Afghanistan,AFG,1993,Cardiovascular diseases,18.959646,Asia,"Afghanistan, Islamic Republic of",AFG
4,Afghanistan,AFG,1994,Cardiovascular diseases,19.089513,Asia,"Afghanistan, Islamic Republic of",AFG


In [16]:
# Kıta ve yıllar bazında  ilk 5 rahatsızlığın elde edilmesi
top_reasons = data_with_contitent.groupby(['Continent_Name', 'year' ,'mortality_reasons'])['percent'].mean().reset_index()

results = []
for name, group in top_reasons.groupby(['Continent_Name', 'year']):
    top_5 = group.sort_values(by='percent', ascending=False).head(5)
    results.append(top_5)

# Concatenate the results into a single DataFrame
top_reasons_df = pd.concat(results)
top_reasons_df = pd.DataFrame(top_reasons_df)
top_reasons_df.head(10)


Unnamed: 0,Continent_Name,year,mortality_reasons,percent
2,Africa,1990,Cardiovascular diseases,14.054308
18,Africa,1990,Lower respiratory infections,10.434937
6,Africa,1990,Diarrheal diseases,10.052056
23,Africa,1990,Neonatal deaths,8.979096
19,Africa,1990,Malaria,6.380664
34,Africa,1991,Cardiovascular diseases,14.073641
50,Africa,1991,Lower respiratory infections,10.282906
38,Africa,1991,Diarrheal diseases,9.973767
55,Africa,1991,Neonatal deaths,8.883286
51,Africa,1991,Malaria,6.486841


In [17]:
#Verinin görselleştirilmesi
fig_bar = px.bar(top_reasons_df, x="mortality_reasons", y="percent",  
                 animation_frame="year",  facet_col = 'Continent_Name', color ='mortality_reasons', hover_data = [ 'percent'],
                 width = 1500, height = 800, labels = {
        'mortality_reasons': '',
        'percent': 'Yüzde',
        'Continent_Name': 'Kıta',
        'year':'Sene'
    } ,  color_discrete_sequence=px.colors.qualitative.T10)

#fig_bar.update_yaxes(showgrid=False)
fig_bar.update_layout({'title': {'text': "Kıtalar bazında ölüme sebep olan ilk 5 hastalığın yıllara göre değişimi ", 'x': 0.45, 'y': 0.85}})
fig_bar.update_xaxes(categoryorder='total descending')

#fig_bar.update_layout({'xaxis':{'title':{'text':''}}, 'yaxis':{'title':{'text':'%'}}})
fig_bar.update_traces(hovertemplate=None)

fig_bar.update_layout(
                        hovermode="x unified",
                        
                        xaxis_title=' ', yaxis_title=" ",
                        
                        title_font=dict(size=25, color='#a5a7ab', family="Lato, sans-serif"),
                        font=dict(color='#8a8d93'),
                        legend=dict(orientation="v", yanchor="bottom", y=1.07, xanchor="right", x=1)
                          )

fig_bar['layout']['updatemenus'][0]['pad']=dict(r= 10, t= 180)
fig_bar['layout']['sliders'][0]['pad']=dict(r= 10, t= 180,)


names={'Africa':'Plot1','Asia':'Plot2', 'Europe':'Plot3','North America':'Plot4', 'Oceania':'Plot5','South America':'Plot6' }
for i, label in enumerate(names):
    fig_bar.layout.annotations[i]['text'] = label


fig_bar.show()

Table 2: Top 5 diseases that cause the most deaths by continent and years

*  Cardiovascular diseases cause the most deaths every year on all continents.
*  Deaths due to cardiovascular diseases are mostly in the European continent.

*  Diseases that are specific to the African continent but cannot be included in the top 5 in other continents in any year are HIV/AIDS, diarrhea and malaria.
*  Dementia, respiratory diseases, natural disasters, and diabetes have not been in the top 5 at any time in the African continent.
*  After cardiovascular diseases for many years in Africa, HIV/AIDS caused the most death toll.
*  Until 2005, deaths from respiratory diseases in any continent other than Europe and Oceania did not enter the top 5 . After 2005, it has been in the top 5 in Asia and South America.
*  Conflict-related death rates were only in the top 5 in the Asian continent in 2014 and 2016.



## 6. Comparison of mortality rates due to cardiovascular diseases in Turkey with the World, Central Europe and Central Asia by years
Cardiovascular diseases with line and scatter graphs
*  Turkey-World
*  Turkey-Central Europe
*  Turkey-Central Asia
The rates of deaths between years were compared.

In [21]:
data_turkey_world = data[data['country'].isin(['World', 'Turkey', 'Central Europe', 'Central Asia' ]) ]
data_turkey_world.columns=data_turkey_world.columns.str.replace('[(%)]','').str.strip()
data_turkey_world_melted = data_turkey_world.melt(id_vars=['country', 'country_code', 'year'], var_name='mortality_reasons', value_name='percent')
data_turkey_world_melted.sort_values(by = 'country')
data_turkey_world_melted_cardio = data_turkey_world_melted[data_turkey_world_melted['mortality_reasons']== 'Cardiovascular diseases']
data_turkey_world_melted_cardio.drop(['country_code'], axis = 1, inplace = True)


fig = go.Figure()
for place in ['Turkey', 'World', 'Central Europe', 'Central Asia']:
    df = data_turkey_world_melted_cardio[data_turkey_world_melted_cardio.country == place]
    fig.add_trace(go.Scatter(
                x = df['year'],
                y = df['percent'], 
                mode = 'lines', name = place))#not line plot
fig.update_layout({'title': {'text': "Kardiyovasküler rahatsızlığa bağlı ölüm oranlarının yıllara göre değişimi", 'x': 0.5, 'y': 0.9}})
fig.update_layout({'xaxis':{'title':{'text':''}}, 'yaxis':{'title':{'text':'%'}}})

my_sliders = [
    {'steps':[
        
    {'method':'update', 'label':'All Places', 
    'args':[{'visible': [True, True, True, True]}]},
        
    {'method':'update', 'label':'Turkey vs World', 
    'args':[{'visible': [True, True, False, False]}]},
        
    {'method':'update', 'label':'Turkey vs Central Europe', 
    'args':[{'visible': [True, False, True, False]}]},
         
    {'method':'update', 'label':'Turkey vs Central Asia', 
    'args':[{'visible': [True, False, False, True]}]},
    
    ]}  ]
fig.update_layout({'sliders':my_sliders})

my_button = [
   { "label":"Line plot","method": "update", "args":[{"type":"scatter", "mode":"lines"}]},
   {"label": "Scatter plot", "method":"update", "args":[{"type":"scatter", "mode":"markers"}]}
      ]

fig.update_layout({"updatemenus":[{"type":"buttons",
                                   "direction":"down",
                                   "x":1.1, "y":0.7,
                                   "showactive":True,
                                   "active":0,
                                   "buttons":my_button}]})



fig.show()


The default value of regex will change from True to False in a future version.



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Table 3: Comparison of mortality rates due to cardiovascular diseases in Turkey, the World, Central Europe and Central Asia by years

*  Mortality rates due to cardiovascular diseases increase over time in Turkey.
*  Mortality rates due to cardiovascular diseases in Turkey are above the world average, but are quite low compared to underdeveloped Central Asia and highly developed Central Europe.
*  Central Europe was only able to reduce the death rate from cardiovascular diseases to 51% over time, from the rate of 55%.
*  Contrary to Central Europe, death rates due to cardiovascular diseases have increased over time in the World and Central Asia, like Turkey.
