# Project: ANNUAL NUMBER OF DEATH BY CAUSE (1990 - 2019)
- by Ifeoma Augusta Adigwe

![image_2023-11-21_131926832.png](attachment:image_2023-11-21_131926832.png)

#### OBJECTIVE: 
"Annual Number of Death by Cause" - Identify the major causes of death, the risk factors associated with them, and the potential interventions to prevent or reduce them. Use infographics to make the visualization more appealing.

#### Introduction:

This dataset was gotten from www.fp20analytics.com monthly Analytics challenges(September 2023 version)
- The "Annual Causes of Death Analysis (1990-2019)" dataset provides a valuable resource for exploring and understanding global mortality trends over a 30-year period. 
- The analysis enhances the dataset's accessibility and analytical capabilities, making it a useful tool for researchers and stakeholders in the field of public health and beyond.
- The aim is to deliver an insightful report that explores the various factors that contribute to the deaths and provides a holistic view of the situation. 


#### Dataset (6060 rows × 34 columns):
- Country: Names of all countries available in the Dataset.
- Country_Code: Country codes representing reach country
- Year: The year under review.From 1990 to 2019
- Different illness and causes of death

In [1]:
# Import all necessary libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected=True)
import plotly.express as px

In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
# Load the dataset

data = pd.read_excel(r'C:\Users\IfeomaAugustaAdigwe\Desktop\DATASETS Pool\Annual Death dataset.xlsx')
data.head(10).T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
Country,Afghanistan,Afghanistan,Afghanistan,Afghanistan,Afghanistan,Afghanistan,Afghanistan,Afghanistan,Afghanistan,Afghanistan
Country Code,AFG,AFG,AFG,AFG,AFG,AFG,AFG,AFG,AFG,AFG
Year,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999
Meningitis,2159,2218,2475,2812,3027,3102,3193,3304,3281,3200
Alzheimer's disease and other dementias,1116,1136,1162,1187,1211,1225,1239,1253,1267,1281
Parkinson's disease,371,374,378,384,391,394,398,402,405,409
Nutritional deficiencies,2087,2153,2441,2837,3081,3131,3175,3250,3193,3115
Malaria,93,189,239,108,211,175,175,240,563,468
Drowning,1370,1391,1514,1687,1809,1881,1969,2078,2098,2084
Interpersonal violence,1538,2001,2299,2589,2849,2969,3331,3028,3098,2917


In [4]:
data.columns

Index(['Country', 'Country Code', 'Year', ' Meningitis',
       ' Alzheimer's disease and other dementias', ' Parkinson's disease',
       ' Nutritional deficiencies', ' Malaria', ' Drowning',
       ' Interpersonal violence', ' Maternal disorders', ' HIV/AIDS',
       ' Drug use disorders', ' Tuberculosis', ' Cardiovascular diseases',
       ' Lower respiratory infections', ' Neonatal disorders',
       ' Alcohol use disorders', ' Self-harm', ' Exposure to forces of nature',
       ' Diarrheal diseases', ' Environmental heat and cold exposure',
       ' Neoplasms', ' Conflict and terrorism', ' Diabetes mellitus',
       ' Chronic kidney disease', ' Poisonings',
       ' Protein-energy malnutrition', ' Road injuries',
       ' Chronic respiratory diseases',
       ' Cirrhosis and other chronic liver diseases', ' Digestive diseases',
       ' Fire, heat, and hot substances', ' Acute hepatitis'],
      dtype='object')

In [5]:
data.describe().T.style.bar(subset=["mean"],
                        color='#205ff2').background_gradient(subset=['std'],
                                                             cmap='Reds').background_gradient(subset=['50%'],
                                                                                              cmap='coolwarm')

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Year,6060.0,2004.5,8.656156,1990.0,1997.0,2004.5,2012.0,2019.0
Meningitis,6060.0,1731.54505,6703.684523,0.0,16.0,109.0,821.5,98358.0
Alzheimer's disease and other dementias,6060.0,4911.766667,18304.345146,0.0,94.75,681.5,2493.5,320715.0
Parkinson's disease,6060.0,1184.580858,4637.523327,0.0,28.0,167.0,616.0,76990.0
Nutritional deficiencies,6060.0,2269.477393,10533.981446,0.0,9.0,119.0,1158.25,268223.0
Malaria,6060.0,4159.731023,18516.240607,0.0,0.0,0.0,371.25,280604.0
Drowning,6060.0,1698.485809,8919.536829,0.0,35.0,177.0,704.25,153773.0
Interpersonal violence,6060.0,2102.503465,6948.541596,0.0,41.0,273.0,881.0,69640.0
Maternal disorders,6060.0,1270.860891,6087.168562,0.0,5.0,54.0,726.25,107929.0
HIV/AIDS,6060.0,5954.638449,21109.210296,0.0,11.0,136.0,1833.25,305491.0


In [6]:
print(data.shape)

(6060, 34)


In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6060 entries, 0 to 6059
Data columns (total 34 columns):
 #   Column                                       Non-Null Count  Dtype 
---  ------                                       --------------  ----- 
 0   Country                                      6060 non-null   object
 1   Country Code                                 6060 non-null   object
 2   Year                                         6060 non-null   int64 
 3    Meningitis                                  6060 non-null   int64 
 4    Alzheimer's disease and other dementias     6060 non-null   int64 
 5    Parkinson's disease                         6060 non-null   int64 
 6    Nutritional deficiencies                    6060 non-null   int64 
 7    Malaria                                     6060 non-null   int64 
 8    Drowning                                    6060 non-null   int64 
 9    Interpersonal violence                      6060 non-null   int64 
 10   Maternal di

In [8]:
data.isnull().sum()

Country                                        0
Country Code                                   0
Year                                           0
 Meningitis                                    0
 Alzheimer's disease and other dementias       0
 Parkinson's disease                           0
 Nutritional deficiencies                      0
 Malaria                                       0
 Drowning                                      0
 Interpersonal violence                        0
 Maternal disorders                            0
 HIV/AIDS                                      0
 Drug use disorders                            0
 Tuberculosis                                  0
 Cardiovascular diseases                       0
 Lower respiratory infections                  0
 Neonatal disorders                            0
 Alcohol use disorders                         0
 Self-harm                                     0
 Exposure to forces of nature                  0
 Diarrheal diseases 

In [9]:
# To check the unique values in a specific column
print(data['Year'].unique())

[1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003
 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
 2018 2019]


In [10]:
# To check the unique values in a specific column
print(data['Country'].unique())

['Afghanistan' 'Albania' 'Algeria' 'American Samoa' 'Andorra' 'Angola'
 'Antigua and Barbuda' 'Argentina' 'Armenia' 'Australia' 'Austria'
 'Azerbaijan' 'Bahamas' 'Bahrain' 'Bangladesh' 'Barbados' 'Belarus'
 'Belgium' 'Belize' 'Benin' 'Bermuda' 'Bhutan' 'Bolivia'
 'Bosnia and Herzegovina' 'Botswana' 'Brazil' 'Brunei' 'Bulgaria'
 'Burkina Faso' 'Burundi' 'Cambodia' 'Cameroon' 'Canada' 'Cape Verde'
 'Chad' 'Chile' 'China' 'Colombia' 'Comoros' 'Congo' 'Cook Islands'
 'Costa Rica' "Cote d'Ivoire" 'Croatia' 'Cuba' 'Cyprus' 'Czechia'
 'Democratic Republic of Congo' 'Denmark' 'Djibouti' 'Dominica'
 'Dominican Republic' 'East Timor' 'Ecuador' 'Egypt' 'El Salvador'
 'Equatorial Guinea' 'Eritrea' 'Estonia' 'Eswatini' 'Ethiopia' 'Fiji'
 'Finland' 'France' 'Gabon' 'Gambia' 'Georgia' 'Germany' 'Ghana' 'Greece'
 'Greenland' 'Grenada' 'Guam' 'Guatemala' 'Guinea' 'Guinea-Bissau'
 'Guyana' 'Haiti' 'Honduras' 'Hungary' 'Iceland' 'India' 'Indonesia'
 'Iran' 'Iraq' 'Ireland' 'Israel' 'Italy' 'Jamaica' 'Jap

# Data Cleaning!!

In [None]:
## Check for Outliers in the whole dataset

def detect_outliers(data):
    outliers = {}
    
    numeric_columns = data.select_dtypes(include=['number']).columns
    
    for column in numeric_columns:
        q1 = data[column].quantile(0.25)
        q3 = data[column].quantile(0.75)
        iqr = q3 - q1
        lower_bound = q1 - 1.5 * iqr
        upper_bound = q3 + 1.5 * iqr
        outliers[column] = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
    
    #return outliers

all_outliers = detect_outliers(data)

for column, outlier_data in all_outliers.items(): # Print the outliers for each numeric column
    #print(f"Outliers in {column}:")
    #print(outlier_data)
    #print("\n")

In [11]:
## Some column names have white-space before and after the death cause

def remove_whitespace_from_column_names(data):
    data.columns = data.columns.str.strip()
    return data

cleaned_data = remove_whitespace_from_column_names(data)

In [12]:
# Convert the 'Year' column to datetime format
#data['Years'] = pd.to_datetime(data['Year'], format='%Y')

In [13]:
# Remove irrelevant columns

#data.drop(['Year'], axis = 1, inplace = True)
data.drop(['Country Code'],axis=1, inplace = True)

In [14]:
#data

Unnamed: 0,Country,Year,Meningitis,Alzheimer's disease and other dementias,Parkinson's disease,Nutritional deficiencies,Malaria,Drowning,Interpersonal violence,Maternal disorders,...,Diabetes mellitus,Chronic kidney disease,Poisonings,Protein-energy malnutrition,Road injuries,Chronic respiratory diseases,Cirrhosis and other chronic liver diseases,Digestive diseases,"Fire, heat, and hot substances",Acute hepatitis
0,Afghanistan,1990,2159,1116,371,2087,93,1370,1538,2655,...,2108,3709,338,2054,4154,5945,2673,5005,323,2985
1,Afghanistan,1991,2218,1136,374,2153,189,1391,2001,2885,...,2120,3724,351,2119,4472,6050,2728,5120,332,3092
2,Afghanistan,1992,2475,1162,378,2441,239,1514,2299,3315,...,2153,3776,386,2404,5106,6223,2830,5335,360,3325
3,Afghanistan,1993,2812,1187,384,2837,108,1687,2589,3671,...,2195,3862,425,2797,5681,6445,2943,5568,396,3601
4,Afghanistan,1994,3027,1211,391,3081,211,1809,2849,3863,...,2231,3932,451,3038,6001,6664,3027,5739,420,3816
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6055,Zimbabwe,2015,1439,754,215,3019,2518,770,1302,1355,...,3176,2108,381,2990,2373,2751,1956,4202,632,146
6056,Zimbabwe,2016,1457,767,219,3056,2050,801,1342,1338,...,3259,2160,393,3027,2436,2788,1962,4264,648,146
6057,Zimbabwe,2017,1460,781,223,2990,2116,818,1363,1312,...,3313,2196,398,2962,2473,2818,2007,4342,654,144
6058,Zimbabwe,2018,1450,795,227,2918,2088,825,1396,1294,...,3381,2240,400,2890,2509,2849,2030,4377,657,139


# Exploratory Data Analysis!!!

In [15]:
data['Total'] = data[['Meningitis', "Alzheimer's disease and other dementias", "Parkinson's disease",
                     'Nutritional deficiencies', 'Malaria', 'Drowning', 'Interpersonal violence',
                     'Maternal disorders', 'HIV/AIDS', 'Drug use disorders', 'Tuberculosis', 'Cardiovascular diseases', 
                      'Lower respiratory infections', 'Neonatal disorders', 'Alcohol use disorders', 
                      'Self-harm', 'Exposure to forces of nature', 'Diarrheal diseases', 'Environmental heat and cold exposure',
                      'Neoplasms', 'Conflict and terrorism', 'Diabetes mellitus', 'Chronic kidney disease', 'Poisonings',
                     'Protein-energy malnutrition', 'Road injuries', 'Chronic respiratory diseases',
                     'Cirrhosis and other chronic liver diseases', 'Digestive diseases',
                     'Fire, heat, and hot substances', 'Acute hepatitis']].sum(axis=1)
data['Total'] 

0       147971
1       156844
2       169156
3       182230
4       194795
         ...  
6055    130080
6056    128274
6057    126515
6058    123506
6059    123540
Name: Total, Length: 6060, dtype: int64

In [16]:
#data

Unnamed: 0,Country,Year,Meningitis,Alzheimer's disease and other dementias,Parkinson's disease,Nutritional deficiencies,Malaria,Drowning,Interpersonal violence,Maternal disorders,...,Chronic kidney disease,Poisonings,Protein-energy malnutrition,Road injuries,Chronic respiratory diseases,Cirrhosis and other chronic liver diseases,Digestive diseases,"Fire, heat, and hot substances",Acute hepatitis,Total
0,Afghanistan,1990,2159,1116,371,2087,93,1370,1538,2655,...,3709,338,2054,4154,5945,2673,5005,323,2985,147971
1,Afghanistan,1991,2218,1136,374,2153,189,1391,2001,2885,...,3724,351,2119,4472,6050,2728,5120,332,3092,156844
2,Afghanistan,1992,2475,1162,378,2441,239,1514,2299,3315,...,3776,386,2404,5106,6223,2830,5335,360,3325,169156
3,Afghanistan,1993,2812,1187,384,2837,108,1687,2589,3671,...,3862,425,2797,5681,6445,2943,5568,396,3601,182230
4,Afghanistan,1994,3027,1211,391,3081,211,1809,2849,3863,...,3932,451,3038,6001,6664,3027,5739,420,3816,194795
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6055,Zimbabwe,2015,1439,754,215,3019,2518,770,1302,1355,...,2108,381,2990,2373,2751,1956,4202,632,146,130080
6056,Zimbabwe,2016,1457,767,219,3056,2050,801,1342,1338,...,2160,393,3027,2436,2788,1962,4264,648,146,128274
6057,Zimbabwe,2017,1460,781,223,2990,2116,818,1363,1312,...,2196,398,2962,2473,2818,2007,4342,654,144,126515
6058,Zimbabwe,2018,1450,795,227,2918,2088,825,1396,1294,...,2240,400,2890,2509,2849,2030,4377,657,139,123506


In [17]:
## Total sum of death present in the dataset 

def sum_total_deaths(data):
    total_death = data['Total'].sum()
    return total_death

total_death = sum_total_deaths(data)
print("Total Deaths:", total_death)

Total Deaths: 1466259138


In [18]:
## Total count of all countries in the Dataset

def unique_country_count(data):
    total_country = data['Country'].nunique()
    return total_country

total_country = unique_country_count(data)
print("Total Country:", total_country)

Total Country: 202


###  Year and country with Most Death in the world

In [19]:
def get_top_20_countries_by_deaths(data):
    data["Total"] = data.drop(['Year', 'Country'], axis=1).sum(axis=1) # Calculate total deaths per row
    grouped_data = data.groupby(['Year', 'Country'])['Total'].sum().reset_index() # Group year & country
    sorted_data = grouped_data.sort_values(by='Total', ascending=False)
    top_20_data = sorted_data.head(20)
    return top_20_data

top_20_countries_data = get_top_20_countries_by_deaths(data)# Call the function(def)
#top_20_countries_data

In [20]:

fig = px.line(top_20_countries_data, x="Year", y="Total", color="Country", text="Total",
              title="Most Death by Year and Country")
fig.update_traces(textposition="top center")
fig.show()

### Total number of death and their Causes

In [21]:
def total_number_of_deaths(data):
    data_new =data.sum(axis=0).reset_index()
    data_new=data_new.rename(columns={'index': 'cause', 0:'total deaths'})
    data_new.drop(data_new.index[[0,1 , -1]],inplace=True)
    data_new.convert_dtypes()
    data_new = data_new.sort_values(by=["total deaths"],ascending=False)
    return data_new

total_deaths_by_cause = total_number_of_deaths(data)# Call the function
total_deaths_by_cause

Unnamed: 0,cause,total deaths
13,Cardiovascular diseases,447575114
21,Neoplasms,229694753
28,Chronic respiratory diseases,104565756
14,Lower respiratory infections,83591057
15,Neonatal disorders,76705852
19,Diarrheal diseases,66002190
30,Digestive diseases,65584951
12,Tuberculosis,45624438
29,Cirrhosis and other chronic liver diseases,37451977
27,Road injuries,36223760


In [22]:
fig = px.bar(total_deaths_by_cause, x='cause', y='total deaths',
             hover_data=['cause', 'total deaths'], color='cause',
             labels={'deaths':'causes of deaths'},width=1200, height=1200, title="Total number of death and their Causes")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.show()

### These Countries have the highest number of death by cause in the dataset

In [23]:
def get_top_countries_by_cause(data, cause_column, num_countries=5):
    top_countries = []

    for cause in cause_column:
        sorted_data = data.sort_values(by=cause, ascending=False)
        top_countries_by_cause = sorted_data.head(num_countries)
        top_countries.extend(top_countries_by_cause['Country'].tolist())

    return list(set(top_countries))

cause_columns = total_deaths_by_cause['cause']
top_countries = get_top_countries_by_cause(data, cause_columns)

top_countries

['China',
 'United States',
 'Haiti',
 'Nigeria',
 'Myanmar',
 'India',
 'Syria',
 'Bangladesh',
 'Indonesia',
 'Russia',
 'South Africa',
 'Brazil',
 'Rwanda']

### which countries has the highest number of death and which year?

In [24]:
data2 = pd.DataFrame(columns=data.columns)

for itr in total_deaths_by_cause.head().cause:
    top_countries = data.sort_values(by=itr, ascending=False).head(5)
    data2 = data2.append(top_countries)    
data2.reset_index(drop=True, inplace=True)
data2

Unnamed: 0,Country,Year,Meningitis,Alzheimer's disease and other dementias,Parkinson's disease,Nutritional deficiencies,Malaria,Drowning,Interpersonal violence,Maternal disorders,...,Chronic kidney disease,Poisonings,Protein-energy malnutrition,Road injuries,Chronic respiratory diseases,Cirrhosis and other chronic liver diseases,Digestive diseases,"Fire, heat, and hot substances",Acute hepatitis,Total
0,China,2019,6465,320715,76990,16863,0,56524,11970,1537,...,196726,27084,13099,250025,1085273,152262,277142,11096,3726,20885122
1,China,2018,6798,306747,73789,16630,0,57898,12197,1725,...,191351,27623,12907,253799,1054610,149792,273223,10914,3843,20327886
2,China,2017,7228,291962,71490,16572,0,59354,12523,2083,...,187685,28345,12850,257068,1040384,149472,273724,10829,4080,19957306
3,China,2016,7550,275481,69364,15827,0,61049,13076,2510,...,184528,29051,12262,262951,1039047,148104,271648,10789,4234,19628426
4,China,2015,7553,259217,66761,14487,0,61489,13512,2341,...,179638,29323,11176,268987,1029418,148056,270036,10738,4171,19182444
5,China,2019,6465,320715,76990,16863,0,56524,11970,1537,...,196726,27084,13099,250025,1085273,152262,277142,11096,3726,20885122
6,China,2018,6798,306747,73789,16630,0,57898,12197,1725,...,191351,27623,12907,253799,1054610,149792,273223,10914,3843,20327886
7,China,2017,7228,291962,71490,16572,0,59354,12523,2083,...,187685,28345,12850,257068,1040384,149472,273724,10829,4080,19957306
8,China,2016,7550,275481,69364,15827,0,61049,13076,2510,...,184528,29051,12262,262951,1039047,148104,271648,10789,4234,19628426
9,China,2015,7553,259217,66761,14487,0,61489,13512,2341,...,179638,29323,11176,268987,1029418,148056,270036,10738,4171,19182444


In [25]:
i = 0
empty_data = []
while i < 5:
    cause = total_deaths_by_cause.iloc[i]["cause"]
    ls = [data2.iloc[i]["Country"], data2.iloc[i]["Year"], cause, data2.iloc[i][cause]]
    empty_data.append(ls)
    i += 1

data_rank = pd.DataFrame(empty_data, columns=['Country', 'Year', 'Cause of Death', "Deaths"])
data_rank
fig = px.scatter(data_rank, x="Year", y="Deaths",
                 size="Deaths", color="Cause of Death",
                 hover_name="Country", log_x=True, size_max=60)
fig.show()

### Top Countries with Highest Total Deaths

In [26]:
countries_with_highest_death = data.groupby('Country').sum().sort_values(by='Total', ascending=False).head(10)
countries_with_highest_death = countries_with_highest_death.sort_values(by='Total', ascending=False)
countries_with_highest_death

Unnamed: 0_level_0,Year,Meningitis,Alzheimer's disease and other dementias,Parkinson's disease,Nutritional deficiencies,Malaria,Drowning,Interpersonal violence,Maternal disorders,HIV/AIDS,...,Chronic kidney disease,Poisonings,Protein-energy malnutrition,Road injuries,Chronic respiratory diseases,Cirrhosis and other chronic liver diseases,Digestive diseases,"Fire, heat, and hot substances",Acute hepatitis,Total
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
China,60135,480899,5381846,1533092,584236,13418,2873619,776275,243257,433709,...,4195276,770140,507664,8350399,36676826,4918899,8924906,383402,318564,530816212
India,60135,2008944,1707561,756832,3290569,2439244,2110438,1237163,2292449,2454374,...,4556172,170119,2356222,5346154,25232974,6294910,11804380,730580,1672179,476316330
United States,60135,40032,3302609,661288,133044,0,114752,596818,25206,528417,...,2018497,40259,121030,1359744,4949052,1514325,3026943,126712,5851,142395604
Russia,60135,60519,972305,236367,15906,0,423044,1215179,15028,350679,...,325433,298954,8167,1067225,1518195,1233608,2398456,322573,3598,119182310
Indonesia,60135,337724,487566,145752,604467,74664,237902,81342,376966,74981,...,964478,27837,560546,1325509,2559457,2249388,3204787,51790,231909,88093882
Nigeria,60135,1520376,241713,66545,286858,6422063,103723,306846,525566,2216718,...,464656,107604,270470,487695,641714,995203,1716202,110784,119860,87340028
Pakistan,60135,752870,244109,92715,452930,213590,125812,344555,503820,36968,...,874990,67217,296019,421970,2276479,1039240,1523030,75442,293025,76303756
Brazil,60135,136709,924942,171324,333626,39970,250078,1762967,72806,447733,...,820933,7754,312031,1352192,1743233,927027,1776565,59981,8977,65348224
Japan,60135,14330,2638908,308327,38101,0,189041,28447,2698,5140,...,966881,12372,27908,327979,1231229,755502,1463405,52365,6876,63845614
Germany,60135,10086,1114977,264727,11627,0,17698,24943,1488,28389,...,616900,3303,8354,211341,1117582,654268,1335653,20982,2446,51119334


In [27]:
top_countries = data.groupby('Country').sum().sort_values(by='Total', ascending=False).head(10).reset_index()

fig = px.bar(top_countries, x='Country', y='Total', title='Top Countries with Highest Total Deaths', labels={'Country': 'Country', 'Total': 'Total Deaths'})
fig.show()


### Comparison of Related Causes of Death in Top 5 Countries

In [28]:
top_countries = data.groupby('Country').sum().sort_values(by='Total', ascending=False).head(10) # Select 5top countries
#top_countries

In [29]:
top_countries = data.groupby('Country').sum().sort_values(by='Total', ascending=False).head(10) # Select 5top countries

related_causes = ['Drug use disorders', 'Alcohol use disorders', 'Cirrhosis and other chronic liver diseases',
                  'Acute hepatitis', 'Exposure to forces of nature'] # Specify the related causes of death

filtered_data = top_countries[related_causes].reset_index()# Filter the data

melted_data = filtered_data.melt(id_vars=['Country'], value_vars=related_causes,
                                 var_name='Cause of Death', value_name='Number of Deaths')

fig = px.bar(melted_data, x='Country', y='Number of Deaths', color='Cause of Death',
             title='Comparison of Related Causes of Death in Top 10 Countries',
             labels={'Country': 'Country', 'Number of Deaths': 'Number of Deaths'},
             barmode='group')
fig.show()

### Most Common Death causes from 5 different Countries

In [30]:
top_countries = countries_with_highest_death.head(10)# Select the top 5 countries

selected_causes = ['Malaria', 'HIV/AIDS', 'Tuberculosis', 'Diarrheal diseases', 'Cardiovascular diseases']

filtered_data = top_countries[selected_causes].reset_index()# Filter the data

melted_data = filtered_data.melt(id_vars=['Country'], value_vars=selected_causes,
                                 var_name='Cause of Death', value_name='Number of Deaths') # Melt the DataFrame to have a tidy format for plotting

fig = px.histogram(melted_data, x='Cause of Death', y='Number of Deaths', color='Country',
                   title='Deaths from Different Causes in Top 10 Countries',
                   labels={'Cause of Death': 'Cause of Death', 'Number of Deaths': 'Number of Deaths'},
                   histfunc='sum', barmode='group')

fig.show()

### Death Caused by Malaria in Different Countries and Years

In [31]:
malaria_deaths = data[data['Malaria'] > 0]# Filter the data for rows where the 'Malaria' column has non-zero values
print(malaria_deaths[['Country', 'Year', 'Malaria']])

          Country  Year  Malaria
0     Afghanistan  1990       93
1     Afghanistan  1991      189
2     Afghanistan  1992      239
3     Afghanistan  1993      108
4     Afghanistan  1994      211
...           ...   ...      ...
6055     Zimbabwe  2015     2518
6056     Zimbabwe  2016     2050
6057     Zimbabwe  2017     2116
6058     Zimbabwe  2018     2088
6059     Zimbabwe  2019     2068

[2659 rows x 3 columns]


In [32]:
malaria_deaths = data[data['Malaria'] > 0]

fig = px.bar(malaria_deaths, x='Country', y='Malaria', color='Year',
             title='Deaths Caused by Malaria in Different Countries and Years',
             labels={'Country': 'Country', 'Malaria': 'Number of Deaths'})

fig.show()

### Special interest - Total Deaths by Cause in Germany

In [33]:
deu = data[data["Country"] == "Germany"]
data_cols = data.columns.drop(['Total']) 
deu = deu[data_cols]
deu

Unnamed: 0,Country,Year,Meningitis,Alzheimer's disease and other dementias,Parkinson's disease,Nutritional deficiencies,Malaria,Drowning,Interpersonal violence,Maternal disorders,...,Diabetes mellitus,Chronic kidney disease,Poisonings,Protein-energy malnutrition,Road injuries,Chronic respiratory diseases,Cirrhosis and other chronic liver diseases,Digestive diseases,"Fire, heat, and hot substances",Acute hepatitis
2010,Germany,1990,595,29984,5811,210,0,934,1040,108,...,21676,9348,147,108,12388,39888,22263,42169,908,240
2011,Germany,1991,566,30640,5837,180,0,901,1110,96,...,22156,8997,145,93,12243,39376,23277,43220,900,235
2012,Germany,1992,524,31442,5842,164,0,858,1157,82,...,22421,8764,140,87,11731,38572,23676,43420,865,222
2013,Germany,1993,500,32298,5947,153,0,818,1167,70,...,23254,8820,137,82,11215,38451,24280,44078,841,206
2014,Germany,1994,476,33246,6102,151,0,802,1179,66,...,23467,9130,133,82,10761,38081,24691,44587,811,184
2015,Germany,1995,451,33953,6345,160,0,766,1158,62,...,23504,9868,131,88,10328,37814,24919,44911,793,137
2016,Germany,1996,436,34322,6556,158,0,715,1101,70,...,23133,10421,127,87,9837,37048,24844,44831,887,125
2017,Germany,1997,410,34603,6704,159,0,701,1032,66,...,22080,11004,122,88,9318,35630,24382,44163,739,111
2018,Germany,1998,383,34797,6874,155,0,659,1040,55,...,20652,12092,116,82,8717,34386,23791,43501,706,98
2019,Germany,1999,360,35251,7078,155,0,652,1023,50,...,20019,13183,110,83,8371,33596,23523,43341,682,78


In [34]:
germany_data = data[data['Country'] == 'Germany']
germany_data = germany_data.drop(['Country', 'Year', 'Total'], axis=1)

germany_cause_totals = germany_data.sum()# Sum the data by cause

germany_cause_totals = germany_cause_totals.reset_index()# Reset index and rename columns
germany_cause_totals.columns = ['Cause', 'TotalDeaths']

germany_cause_totals = germany_cause_totals.sort_values(by='TotalDeaths', ascending=False)# Sort the data

fig = px.bar(germany_cause_totals, x='Cause', y='TotalDeaths', 
             title='Total Deaths by Cause in Germany', 
             labels={'Cause': 'Cause of Death', 'TotalDeaths': 'Total Deaths'})

fig.show()

### Special Interest - Total Deaths by Cause in Nigeria

In [35]:
nrg = data[data["Country"] == "Nigeria"]
data_cols = data.columns.drop(['Total']) 
nrg = nrg[data_cols]
nrg

Unnamed: 0,Country,Year,Meningitis,Alzheimer's disease and other dementias,Parkinson's disease,Nutritional deficiencies,Malaria,Drowning,Interpersonal violence,Maternal disorders,...,Diabetes mellitus,Chronic kidney disease,Poisonings,Protein-energy malnutrition,Road injuries,Chronic respiratory diseases,Cirrhosis and other chronic liver diseases,Digestive diseases,"Fire, heat, and hot substances",Acute hepatitis
3870,Nigeria,1990,40226,4984,1431,10236,148931,2887,6579,11879,...,12194,11660,2937,9769,11832,17749,25713,43375,2876,4292
3871,Nigeria,1991,41349,5107,1460,10517,157502,2977,7140,12398,...,12561,11928,2908,10039,12185,17994,26125,44145,2948,4475
3872,Nigeria,1992,42711,5255,1496,10624,165722,3071,7951,12918,...,13022,12257,3007,10141,12663,18313,26664,45145,3037,4628
3873,Nigeria,1993,44166,5395,1528,10693,173695,3165,8438,13396,...,13453,12536,3107,10210,13082,18594,27081,46006,3133,4666
3874,Nigeria,1994,45460,5540,1557,11053,180588,3256,7467,13962,...,13851,12756,3189,10552,13421,18837,27378,46688,3205,4825
3875,Nigeria,1995,46492,5654,1578,11401,186263,3313,7634,14061,...,14172,12887,3252,10874,13667,19031,27473,47083,3265,5022
3876,Nigeria,1996,59006,5754,1592,11568,191437,3367,7900,14618,...,14434,13000,3296,11029,13879,19112,27460,47296,3307,5067
3877,Nigeria,1997,48278,5894,1614,11913,197467,3417,8586,15300,...,14804,13176,3362,11351,14199,19311,27746,47939,3365,5178
3878,Nigeria,1998,50009,6041,1637,12155,202843,3478,8500,16126,...,15193,13336,3438,11574,14574,19542,28018,48666,3435,5246
3879,Nigeria,1999,50499,6186,1661,12206,207571,3528,9369,16891,...,15568,13492,3503,11614,14916,19824,28428,49645,3489,5219


In [36]:
def total_deaths_by_cause(country_name, data):
    country_data = data[data['Country'] == country_name]
    country_data = country_data.drop(['Country', 'Year', 'Total'], axis=1)
    
    cause_totals = country_data.sum()
    cause_totals = cause_totals.reset_index()
    cause_totals.columns = ['Cause', 'TotalDeaths']
    cause_totals = cause_totals.sort_values(by='TotalDeaths', ascending=False)
    
    return cause_totals

def plot_total_deaths_by_cause(cause_totals, country_name):
    fig = px.bar(cause_totals, x='Cause', y='TotalDeaths', 
                 title=f'Total Deaths by Cause in {country_name}', 
                 labels={'Cause': 'Cause of Death', 'TotalDeaths': 'Total Deaths'})
    fig.show()


country_name = 'Nigeria'  # Specify country name or change country name to view other country's info

nigeria_cause_totals = total_deaths_by_cause(country_name, data)# Call the functions
plot_total_deaths_by_cause(nigeria_cause_totals, country_name)

### Related Causes of Death in Top 10 Countries

In [37]:
top_countries = data.groupby('Country').sum().sort_values(by='Total', ascending=False) # Select 5top countries
top_countries

Unnamed: 0_level_0,Year,Meningitis,Alzheimer's disease and other dementias,Parkinson's disease,Nutritional deficiencies,Malaria,Drowning,Interpersonal violence,Maternal disorders,HIV/AIDS,...,Chronic kidney disease,Poisonings,Protein-energy malnutrition,Road injuries,Chronic respiratory diseases,Cirrhosis and other chronic liver diseases,Digestive diseases,"Fire, heat, and hot substances",Acute hepatitis,Total
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
China,60135,480899,5381846,1533092,584236,13418,2873619,776275,243257,433709,...,4195276,770140,507664,8350399,36676826,4918899,8924906,383402,318564,530816212
India,60135,2008944,1707561,756832,3290569,2439244,2110438,1237163,2292449,2454374,...,4556172,170119,2356222,5346154,25232974,6294910,11804380,730580,1672179,476316330
United States,60135,40032,3302609,661288,133044,0,114752,596818,25206,528417,...,2018497,40259,121030,1359744,4949052,1514325,3026943,126712,5851,142395604
Russia,60135,60519,972305,236367,15906,0,423044,1215179,15028,350679,...,325433,298954,8167,1067225,1518195,1233608,2398456,322573,3598,119182310
Indonesia,60135,337724,487566,145752,604467,74664,237902,81342,376966,74981,...,964478,27837,560546,1325509,2559457,2249388,3204787,51790,231909,88093882
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Palau,60135,0,45,30,0,0,60,66,1,18,...,274,20,0,102,439,93,151,30,0,9628
Cook Islands,60135,0,85,34,0,0,30,0,0,16,...,136,0,0,142,213,60,121,0,0,7998
Tuvalu,60135,5,35,30,14,0,30,30,5,7,...,108,0,14,80,248,60,120,18,0,5924
Nauru,60135,19,0,0,22,0,56,30,23,6,...,81,0,22,109,128,60,84,0,0,4498


In [41]:
top_countries = data.groupby('Country').sum().sort_values(by='Total', ascending=False).head(10) # Select 5top countries

related_causes = ['Drug use disorders', 'Alcohol use disorders', 'Cirrhosis and other chronic liver diseases',
                  'Acute hepatitis', 'Exposure to forces of nature'] # Specify the related causes of death

filtered_data = top_countries[related_causes].reset_index()# Filter the data

melted_data = filtered_data.melt(id_vars=['Country'], value_vars=related_causes,
                                 var_name='Cause of Death', value_name='Number of Deaths')

fig = px.bar(melted_data, x='Country', y='Number of Deaths', color='Cause of Death',
             title='Comparison of Related Causes of Death in Top 10 Countries',
             labels={'Country': 'Country', 'Number of Deaths': 'Number of Deaths'},
             barmode='group')
fig.show()

In [39]:
data

Unnamed: 0,Country,Year,Meningitis,Alzheimer's disease and other dementias,Parkinson's disease,Nutritional deficiencies,Malaria,Drowning,Interpersonal violence,Maternal disorders,...,Chronic kidney disease,Poisonings,Protein-energy malnutrition,Road injuries,Chronic respiratory diseases,Cirrhosis and other chronic liver diseases,Digestive diseases,"Fire, heat, and hot substances",Acute hepatitis,Total
0,Afghanistan,1990,2159,1116,371,2087,93,1370,1538,2655,...,3709,338,2054,4154,5945,2673,5005,323,2985,295942
1,Afghanistan,1991,2218,1136,374,2153,189,1391,2001,2885,...,3724,351,2119,4472,6050,2728,5120,332,3092,313688
2,Afghanistan,1992,2475,1162,378,2441,239,1514,2299,3315,...,3776,386,2404,5106,6223,2830,5335,360,3325,338312
3,Afghanistan,1993,2812,1187,384,2837,108,1687,2589,3671,...,3862,425,2797,5681,6445,2943,5568,396,3601,364460
4,Afghanistan,1994,3027,1211,391,3081,211,1809,2849,3863,...,3932,451,3038,6001,6664,3027,5739,420,3816,389590
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6055,Zimbabwe,2015,1439,754,215,3019,2518,770,1302,1355,...,2108,381,2990,2373,2751,1956,4202,632,146,260160
6056,Zimbabwe,2016,1457,767,219,3056,2050,801,1342,1338,...,2160,393,3027,2436,2788,1962,4264,648,146,256548
6057,Zimbabwe,2017,1460,781,223,2990,2116,818,1363,1312,...,2196,398,2962,2473,2818,2007,4342,654,144,253030
6058,Zimbabwe,2018,1450,795,227,2918,2088,825,1396,1294,...,2240,400,2890,2509,2849,2030,4377,657,139,247012


#### INSIGHTS:

- Worldwide there is 1466million number of Deaths.
- Average death per year worldwide is 48.88 million.
- China has the highest count of deaths, followed by India as second contributor, United states as third contributor, Russia as fourth, and Indonesia as fifth contributor
- Cardiovascular diseases is the major contributor towards the death count worlwide
- 2019 had the maximum count of deaths