About Dataset
Context
A straightforward way to assess the health status of a population is to focus on mortality, or on concepts such as child mortality or life expectancy that are based on mortality estimates. A focus on mortality, however, does not take into account that diseases impose a burden not only by killing people but also by causing suffering to those who live with them. Assessing health outcomes by both mortality and morbidity (the prevalent diseases) provides a more encompassing view of health outcomes; this is the topic of this entry. The sum of mortality and morbidity is referred to as the ‘burden of disease’ and can be measured by a metric called ‘Disability-Adjusted Life Years’ (DALYs). DALYs measure lost health and are a standardized metric that allows direct comparisons of the burden of different diseases across countries, between populations, and over time. Conceptually, one DALY is the equivalent of losing one year in good health because of premature death, disease, or disability; in other words, one DALY represents one lost year of healthy life. The first Global Burden of Disease (GBD) study was GBD 1990, and the DALY metric was prominently featured in the World Bank’s 1993 World Development Report. Today DALY estimates are published both by researchers at the Institute for Health Metrics and Evaluation (IHME) and by the ‘Disease Burden Unit’ at the World Health Organization (WHO), which was created in 1998. The IHME continues the work that was started in the early 1990s and publishes the Global Burden of Disease study.
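The definition above can be illustrated with a worked example. In the GBD framework, a DALY is the sum of years of life lost to premature death (YLL) and years lived with disability (YLD). This is a minimal sketch that assumes the common simplified forms: YLL = deaths × standard remaining life expectancy at the age of death, and YLD = prevalent cases × disability weight. It applies no age weighting and no discounting. The helper name `dalys` and all input numbers are hypothetical and do not come from this dataset.

```python
def dalys(deaths: float, life_expectancy_at_death: float,
          prevalent_cases: float, disability_weight: float) -> float:
    """Hypothetical helper: DALY = YLL + YLD (simplified, no age weighting or discounting)."""
    # Years of life lost: each death costs the remaining standard life expectancy
    yll = deaths * life_expectancy_at_death
    # Years lived with disability: each case-year is scaled by its disability weight
    # (0 = full health, 1 = equivalent to death)
    yld = prevalent_cases * disability_weight
    return yll + yld


# Hypothetical example: 10 deaths with 30 remaining years of life expectancy each,
# plus 200 people living one year with a condition of disability weight 0.25
print(dalys(10, 30, 200, 0.25))  # 350.0
```

In this example, 300 of the 350 DALYs come from premature death and 50 from living with the condition. This is the mortality/morbidity split the paragraph describes.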
Content
This dataset contains historical data on causes of death for all ages in countries around the world. The causes covered are: Meningitis, Alzheimer's Disease and Other Dementias, Parkinson's Disease, Nutritional Deficiencies, Malaria, Drowning, Interpersonal Violence, Maternal Disorders, HIV/AIDS, Drug Use Disorders, Tuberculosis, Cardiovascular Diseases, Lower Respiratory Infections, Neonatal Disorders, Alcohol Use Disorders, Self-harm, Exposure to Forces of Nature, Diarrheal Diseases, Environmental Heat and Cold Exposure, Neoplasms, Conflict and Terrorism, Diabetes Mellitus, Chronic Kidney Disease, Poisonings, Protein-Energy Malnutrition, Road Injuries, Chronic Respiratory Diseases, Cirrhosis and Other Chronic Liver Diseases, Digestive Diseases, Fire, Heat, and Hot Substances, Acute Hepatitis.
Dataset Glossary (Column-wise)
•	01. Country/Territory - Name of the country/territory
•	02. Code - Three-letter country/territory code (e.g. AFG, ZWE)
•	03. Year - Year of the record
•	04. Meningitis - Number of people who died from Meningitis
•	05. Alzheimer's Disease and Other Dementias - Number of people who died from Alzheimer's Disease and Other Dementias
•	06. Parkinson's Disease - Number of people who died from Parkinson's Disease
•	07. Nutritional Deficiencies - Number of people who died from Nutritional Deficiencies
•	08. Malaria - Number of people who died from Malaria
•	09. Drowning - Number of people who died from Drowning
•	10. Interpersonal Violence - Number of people who died from Interpersonal Violence
•	11. Maternal Disorders - Number of people who died from Maternal Disorders
•	12. HIV/AIDS - Number of people who died from HIV/AIDS
•	13. Drug Use Disorders - Number of people who died from Drug Use Disorders
•	14. Tuberculosis - Number of people who died from Tuberculosis
•	15. Cardiovascular Diseases - Number of people who died from Cardiovascular Diseases
•	16. Lower Respiratory Infections - Number of people who died from Lower Respiratory Infections
•	17. Neonatal Disorders - Number of people who died from Neonatal Disorders
•	18. Alcohol Use Disorders - Number of people who died from Alcohol Use Disorders
•	19. Self-harm - Number of people who died from Self-harm
•	20. Exposure to Forces of Nature - Number of people who died from Exposure to Forces of Nature
•	21. Diarrheal Diseases - Number of people who died from Diarrheal Diseases
•	22. Environmental Heat and Cold Exposure - Number of people who died from Environmental Heat and Cold Exposure
•	23. Neoplasms - Number of people who died from Neoplasms
•	24. Conflict and Terrorism - Number of people who died from Conflict and Terrorism
•	25. Diabetes Mellitus - Number of people who died from Diabetes Mellitus
•	26. Chronic Kidney Disease - Number of people who died from Chronic Kidney Disease
•	27. Poisonings - Number of people who died from Poisonings
•	28. Protein-Energy Malnutrition - Number of people who died from Protein-Energy Malnutrition
•	29. Road Injuries - Number of people who died from Road Injuries
•	30. Chronic Respiratory Diseases - Number of people who died from Chronic Respiratory Diseases
•	31. Cirrhosis and Other Chronic Liver Diseases - Number of people who died from Cirrhosis and Other Chronic Liver Diseases
•	32. Digestive Diseases - Number of people who died from Digestive Diseases
•	33. Fire, Heat, and Hot Substances - Number of people who died from fire, heat, or hot substances
•	34. Acute Hepatitis - Number of people who died from Acute Hepatitis
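Each cause of death is stored as its own column, so the table is in wide format. For cause-level comparisons it is often easier to reshape it into long format, with one row per country, year and cause. This is a minimal sketch using pandas `melt`. The two Afghanistan rows and three cause columns are copied from the `df.head()` output of this notebook; the remaining cause columns are left out for brevity. The names `wide`, `long`, `Cause` and `Deaths` are illustrative choices.

```python
import pandas as pd

# Small excerpt of the wide table (values copied from the df.head() output)
wide = pd.DataFrame({
    "Country/Territory": ["Afghanistan", "Afghanistan"],
    "Code": ["AFG", "AFG"],
    "Year": [1990, 1991],
    "Meningitis": [2159, 2218],
    "Malaria": [93, 189],
    "Drowning": [1370, 1391],
})

id_cols = ["Country/Territory", "Code", "Year"]

# One row per (country, year, cause); the count moves into a "Deaths" column
long = wide.melt(id_vars=id_cols, var_name="Cause", value_name="Deaths")
print(long)
```

On the full table, the same `melt` call with `id_vars=id_cols` turns every remaining column into a `Cause` value.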

Steps to Follow
https://www.kaggle.com/code/spscientist/a-simple-tutorial-on-exploratory-data-analysis 
https://en.wikipedia.org/wiki/Exploratory_data_analysis#:~:text=In%20statistics%2C%20exploratory%20data%20analysis,and%20other%20data%20visualization%20methods. 

Note: Data scientists should apply their analytical skills to present findings and conclusions in a detailed data analysis written in a Jupyter notebook. Only data analysis is required.
There is no need to build machine learning models, but submissions that include them are welcome.
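The note asks for findings. One simple example is to total deaths per cause and rank the causes. This is a minimal sketch on a small excerpt: the Afghanistan 1990 to 1994 rows from the `df.head()` output, with three causes only. The frame is named `excerpt` so it does not overwrite the notebook's `df`. On the full data, the same two lines apply to every cause column.

```python
import pandas as pd

# Excerpt: Afghanistan 1990-1994, three causes (values from the df.head() output)
excerpt = pd.DataFrame({
    "Country/Territory": ["Afghanistan"] * 5,
    "Code": ["AFG"] * 5,
    "Year": [1990, 1991, 1992, 1993, 1994],
    "Meningitis": [2159, 2218, 2475, 2812, 3027],
    "Malaria": [93, 189, 239, 108, 211],
    "Drowning": [1370, 1391, 1514, 1687, 1809],
})

# Every column that is not an identifier is a cause of death
cause_cols = excerpt.columns.drop(["Country/Territory", "Code", "Year"])

# Total deaths per cause across all rows, largest first
ranking = excerpt[cause_cols].sum().sort_values(ascending=False)
print(ranking)
```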



In [5]:
# importing libraries
import warnings

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# hide library warnings in the notebook output
warnings.filterwarnings('ignore')

# scikit-learn: only needed if you go on to build models (optional for this analysis)
import sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


In [25]:
# load the dataset; the raw string (r"...") keeps the backslash in the Windows path
# from being read as an escape sequence
df = pd.read_excel(r"D:\cause_of_deaths dataset.xlsx")
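The cell above hard-codes an Excel file on a local drive. If the same data is also available as a CSV file, a small loader can choose the pandas reader from the file suffix. This is a minimal sketch: the helper name `load_cause_of_deaths` is hypothetical. Reading Excel files requires an engine such as openpyxl for `.xlsx` to be installed.

```python
from pathlib import Path

import pandas as pd


def load_cause_of_deaths(path):
    """Hypothetical helper: read the dataset from .csv or .xlsx/.xls based on the file suffix."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # Excel reading needs an installed engine (e.g. openpyxl for .xlsx)
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {path.suffix}")
```

Usage in this notebook would be `df = load_cause_of_deaths(r"D:\cause_of_deaths dataset.xlsx")`. Any other suffix raises a `ValueError` instead of failing inside pandas.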

In [27]:
df.head()

Unnamed: 0,Country/Territory,Code,Year,Meningitis,Alzheimer's Disease and Other Dementias,Parkinson's Disease,Nutritional Deficiencies,Malaria,Drowning,Interpersonal Violence,...,Diabetes Mellitus,Chronic Kidney Disease,Poisonings,Protein-Energy Malnutrition,Road Injuries,Chronic Respiratory Diseases,Cirrhosis and Other Chronic Liver Diseases,Digestive Diseases,"Fire, Heat, and Hot Substances",Acute Hepatitis
0,Afghanistan,AFG,1990,2159,1116,371,2087,93,1370,1538,...,2108,3709,338,2054,4154,5945,2673,5005,323,2985
1,Afghanistan,AFG,1991,2218,1136,374,2153,189,1391,2001,...,2120,3724,351,2119,4472,6050,2728,5120,332,3092
2,Afghanistan,AFG,1992,2475,1162,378,2441,239,1514,2299,...,2153,3776,386,2404,5106,6223,2830,5335,360,3325
3,Afghanistan,AFG,1993,2812,1187,384,2837,108,1687,2589,...,2195,3862,425,2797,5681,6445,2943,5568,396,3601
4,Afghanistan,AFG,1994,3027,1211,391,3081,211,1809,2849,...,2231,3932,451,3038,6001,6664,3027,5739,420,3816


In [28]:
df

Unnamed: 0,Country/Territory,Code,Year,Meningitis,Alzheimer's Disease and Other Dementias,Parkinson's Disease,Nutritional Deficiencies,Malaria,Drowning,Interpersonal Violence,...,Diabetes Mellitus,Chronic Kidney Disease,Poisonings,Protein-Energy Malnutrition,Road Injuries,Chronic Respiratory Diseases,Cirrhosis and Other Chronic Liver Diseases,Digestive Diseases,"Fire, Heat, and Hot Substances",Acute Hepatitis
0,Afghanistan,AFG,1990,2159,1116,371,2087,93,1370,1538,...,2108,3709,338,2054,4154,5945,2673,5005,323,2985
1,Afghanistan,AFG,1991,2218,1136,374,2153,189,1391,2001,...,2120,3724,351,2119,4472,6050,2728,5120,332,3092
2,Afghanistan,AFG,1992,2475,1162,378,2441,239,1514,2299,...,2153,3776,386,2404,5106,6223,2830,5335,360,3325
3,Afghanistan,AFG,1993,2812,1187,384,2837,108,1687,2589,...,2195,3862,425,2797,5681,6445,2943,5568,396,3601
4,Afghanistan,AFG,1994,3027,1211,391,3081,211,1809,2849,...,2231,3932,451,3038,6001,6664,3027,5739,420,3816
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6115,Zimbabwe,ZWE,2015,1439,754,215,3019,2518,770,1302,...,3176,2108,381,2990,2373,2751,1956,4202,632,146
6116,Zimbabwe,ZWE,2016,1457,767,219,3056,2050,801,1342,...,3259,2160,393,3027,2436,2788,1962,4264,648,146
6117,Zimbabwe,ZWE,2017,1460,781,223,2990,2116,818,1363,...,3313,2196,398,2962,2473,2818,2007,4342,654,144
6118,Zimbabwe,ZWE,2018,1450,795,227,2918,2088,825,1396,...,3381,2240,400,2890,2509,2849,2030,4377,657,139


In [29]:
df.shape

(6120, 34)
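The row count fits a balanced panel: `df.nunique()` reports 204 countries and 30 years, and 204 × 30 = 6,120. Matching counts alone do not rule out a duplicated country-year offset by a missing one. This minimal sketch therefore also checks that every country appears exactly once per year. The helper name `is_balanced_panel` and the toy frame are hypothetical; on the real data, call `is_balanced_panel(df)`.

```python
import pandas as pd


def is_balanced_panel(frame, entity="Country/Territory", time="Year"):
    """True if every entity has exactly one row for every time period present."""
    # A duplicated (entity, time) pair means some country-year appears twice
    if frame.duplicated([entity, time]).any():
        return False
    # With no duplicates, all combinations are present iff rows == entities x periods
    return len(frame) == frame[entity].nunique() * frame[time].nunique()


toy = pd.DataFrame({
    "Country/Territory": ["A", "A", "B", "B"],
    "Year": [1990, 1991, 1990, 1991],
})
print(is_balanced_panel(toy))          # True
print(is_balanced_panel(toy.iloc[:3]))  # False: country B is missing 1991
```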

In [30]:
df.dtypes

Country/Territory                             object
Code                                          object
Year                                           int64
Meningitis                                     int64
Alzheimer's Disease and Other Dementias        int64
Parkinson's Disease                            int64
Nutritional Deficiencies                       int64
Malaria                                        int64
Drowning                                       int64
Interpersonal Violence                         int64
Maternal Disorders                             int64
HIV/AIDS                                       int64
Drug Use Disorders                             int64
Tuberculosis                                   int64
Cardiovascular Diseases                        int64
Lower Respiratory Infections                   int64
Neonatal Disorders                             int64
Alcohol Use Disorders                          int64
Self-harm                                     

In [31]:
# check for the column heading
df.columns


Index(['Country/Territory', 'Code', 'Year', 'Meningitis',
       'Alzheimer's Disease and Other Dementias', 'Parkinson's Disease',
       'Nutritional Deficiencies', 'Malaria', 'Drowning',
       'Interpersonal Violence', 'Maternal Disorders', 'HIV/AIDS',
       'Drug Use Disorders', 'Tuberculosis', 'Cardiovascular Diseases',
       'Lower Respiratory Infections', 'Neonatal Disorders',
       'Alcohol Use Disorders', 'Self-harm', 'Exposure to Forces of Nature',
       'Diarrheal Diseases', 'Environmental Heat and Cold Exposure',
       'Neoplasms', 'Conflict and Terrorism', 'Diabetes Mellitus',
       'Chronic Kidney Disease', 'Poisonings', 'Protein-Energy Malnutrition',
       'Road Injuries', 'Chronic Respiratory Diseases',
       'Cirrhosis and Other Chronic Liver Diseases', 'Digestive Diseases',
       'Fire, Heat, and Hot Substances', 'Acute Hepatitis'],
      dtype='object')
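The column list splits into three identifier columns (`Country/Territory`, `Code`, `Year`) and the cause-of-death columns. This is a minimal sketch that separates the two groups, checks that the causes hold integer counts, and adds a per-row total. The helper name `split_columns` and the output column name are hypothetical; the sample values come from the `df.head()` output.

One caution about summing across all causes: some columns appear to be nested categories. For example, Protein-Energy Malnutrition looks like a sub-cause of Nutritional Deficiencies in the GBD cause hierarchy; the head rows show 2054 vs 2087 for Afghanistan in 1990. A sum over every column may therefore double count deaths.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

ID_COLS = ["Country/Territory", "Code", "Year"]


def split_columns(frame):
    """Return the cause-of-death columns, checking that they all hold integer counts."""
    cause_cols = [c for c in frame.columns if c not in ID_COLS]
    non_integer = [c for c in cause_cols if not is_integer_dtype(frame[c])]
    if non_integer:
        raise TypeError(f"Non-integer cause columns: {non_integer}")
    return cause_cols


# Excerpt from df.head(): Afghanistan 1990 and 1991, three causes
sample = pd.DataFrame({
    "Country/Territory": ["Afghanistan", "Afghanistan"],
    "Code": ["AFG", "AFG"],
    "Year": [1990, 1991],
    "Meningitis": [2159, 2218],
    "Malaria": [93, 189],
    "Drowning": [1370, 1391],
})
causes = split_columns(sample)

# Deaths summed across the listed causes for each country-year
sample["Total (listed causes)"] = sample[causes].sum(axis=1)
print(sample[["Year", "Total (listed causes)"]])
```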

In [32]:
#check for any null values
df.isnull().sum().any()

False

There are no null values in the dataset.
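`df.isnull().sum().any()` reduces the check to a single boolean. If it ever returns `True`, per-column counts show where the gaps are. For count data it is also worth confirming that no value is negative. This is a minimal sketch with a hypothetical helper name, `data_quality_report`, run on a toy frame with one deliberately missing value.

```python
import numpy as np
import pandas as pd


def data_quality_report(frame, id_cols=("Country/Territory", "Code", "Year")):
    """Per-column count of missing values and of negative death counts."""
    counts = frame.drop(columns=list(id_cols))
    return pd.DataFrame({
        "missing": counts.isnull().sum(),
        # NaN < 0 evaluates to False, so missing values are not counted as negative
        "negative": (counts < 0).sum(),
    })


toy = pd.DataFrame({
    "Country/Territory": ["A", "B"],
    "Code": ["AAA", "BBB"],
    "Year": [1990, 1990],
    "Malaria": [93, np.nan],
    "Drowning": [1370, 1391],
})
print(data_quality_report(toy))
```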

In [33]:
df.nunique()

Country/Territory                              204
Code                                           204
Year                                            30
Meningitis                                    2020
Alzheimer's Disease and Other Dementias       3037
Parkinson's Disease                           1817
Nutritional Deficiencies                      2147
Malaria                                       1723
Drowning                                      1875
Interpersonal Violence                        2142
Maternal Disorders                            1818
HIV/AIDS                                      2412
Drug Use Disorders                             876
Tuberculosis                                  2843
Cardiovascular Diseases                       5225
Lower Respiratory Infections                  4106
Neonatal Disorders                            3553
Alcohol Use Disorders                         1287
Self-harm                                     2758
Exposure to Forces of Nature   
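`Country/Territory` and `Code` both have 204 distinct values. This suggests, but does not by itself prove, a one-to-one mapping between names and codes. This minimal sketch checks the mapping in both directions. The helper name `is_one_to_one` and the toy frames are hypothetical; on the real data, call `is_one_to_one(df)`.

```python
import pandas as pd


def is_one_to_one(frame, a="Country/Territory", b="Code"):
    """True if each value of `a` maps to exactly one value of `b` and vice versa."""
    return bool(
        (frame.groupby(a)[b].nunique() == 1).all()
        and (frame.groupby(b)[a].nunique() == 1).all()
    )


toy = pd.DataFrame({
    "Country/Territory": ["Afghanistan", "Afghanistan", "Zimbabwe"],
    "Code": ["AFG", "AFG", "ZWE"],
})
print(is_one_to_one(toy))  # True
```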