# Intro

Data taken from the WHO Mortality Database: https://platform.who.int/mortality/themes/theme-details/MDB/all-causes. It contains the number of deaths from all causes for 119 countries, with years ranging from 1950 to 2022.

## Imports

In [3]:
import pandas as pd

## Load the CSV file

In [6]:
file_path = 'data_deaths.csv'
df = pd.read_csv(file_path)

# Exploratory analysis

## Basic info

In [7]:
print("Basic Information:")
df.info()  # info() prints its report itself and returns None, so no print() is needed


Basic Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314055 entries, 0 to 314054
Data columns (total 12 columns):
 #   Column                                                       Non-Null Count   Dtype  
---  ------                                                       --------------   -----  
 0   Region Code                                                  314055 non-null  object 
 1   Region Name                                                  314055 non-null  object 
 2   Country Code                                                 314055 non-null  object 
 3   Country Name                                                 314055 non-null  object 
 4   Year                                                         314055 non-null  int64  
 5   Sex                                                          314055 non-null  object 
 6   Age group code                                               314055 non-null  object 
 7   Age Group                                     

## Statistics Summary

In [8]:
print("\nSummary Statistics:")
print(df.describe(include='all'))


Summary Statistics:
       Region Code Region Name Country Code Country Name           Year  \
count       314055      314055       314055       314055  314055.000000   
unique           6           6          119          119            NaN   
top             EU      Europe          MEX       Mexico            NaN   
freq        126105      126105         4725         4725            NaN   
mean           NaN         NaN          NaN          NaN    1993.423537   
std            NaN         NaN          NaN          NaN      18.275490   
min            NaN         NaN          NaN          NaN    1950.000000   
25%            NaN         NaN          NaN          NaN    1980.000000   
50%            NaN         NaN          NaN          NaN    1996.000000   
75%            NaN         NaN          NaN          NaN    2009.000000   
max            NaN         NaN          NaN          NaN    2022.000000   

           Sex Age group code Age Group        Number  \
count   314055       

## NaN values

In [10]:
print("\nNaN Values per Column:")
print(df.isna().sum())


NaN Values per Column:
Region Code                                                         0
Region Name                                                         0
Country Code                                                        0
Country Name                                                        0
Year                                                                0
Sex                                                                 0
Age group code                                                      0
Age Group                                                           0
Number                                                           1677
Percentage of cause-specific deaths out of total deaths             0
Age-standardized death rate per 100 000 standard population    299727
Death rate per 100 000 population                               30075
dtype: int64
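
Raw NaN counts are easier to judge relative to the number of rows. The following is a minimal sketch of expressing them as a percentage per column. It uses a small made-up frame (`toy`, with illustrative values) instead of the real file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df; the values are illustrative only, not taken from the dataset.
toy = pd.DataFrame({
    "Number": [10.0, np.nan, 7.0, 3.0],
    "Death rate per 100 000 population": [1.5, np.nan, np.nan, 2.0],
})

# isna() gives a boolean frame; the mean of booleans is the fraction of True values.
nan_share = toy.isna().mean() * 100
print(nan_share.round(1))
```

Applied to the real data this would be `df.isna().mean() * 100`. From the counts printed above, the age-standardized death rate is missing in 299727 of 314055 rows, roughly 95%.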


## Empty values 

This check assumes that empty values are represented as empty strings.

In [11]:
print("\nEmpty Values per Column:")
print((df == '').sum())



Empty Values per Column:
Region Code                                                    0
Region Name                                                    0
Country Code                                                   0
Country Name                                                   0
Year                                                           0
Sex                                                            0
Age group code                                                 0
Age Group                                                      0
Number                                                         0
Percentage of cause-specific deaths out of total deaths        0
Age-standardized death rate per 100 000 standard population    0
Death rate per 100 000 population                              0
dtype: int64
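
The comparison with `''` only matches strings of length zero. As a hedged variant, the sketch below also counts whitespace-only strings. It runs on a made-up frame (`toy`) with illustrative values; whether the real file contains such cells is not known from the output above.

```python
import pandas as pd

# Toy stand-in for df; the values are illustrative only.
toy = pd.DataFrame({
    "Sex": ["All", "", "  ", "Male"],
    "Year": [1951, 1951, 1952, 1952],
})

# Cast every column to string, strip surrounding whitespace,
# then count the cells that are left empty.
blank_counts = toy.astype(str).apply(lambda col: col.str.strip().eq("").sum())
print(blank_counts)
```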


## Unique values

In [12]:

print("\nUnique Values per Column:")
for column in df.columns:
    unique_values = df[column].unique()
    print(f"{column}: {len(unique_values)} unique values")



Unique Values per Column:
Region Code: 6 unique values
Region Name: 6 unique values
Country Code: 119 unique values
Country Name: 119 unique values
Year: 73 unique values
Sex: 4 unique values
Age group code: 21 unique values
Age Group: 21 unique values
Number: 41935 unique values
Percentage of cause-specific deaths out of total deaths: 1 unique values
Age-standardized death rate per 100 000 standard population: 14329 unique values
Death rate per 100 000 population: 267220 unique values
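
The loop above can also be written with `DataFrame.nunique`, which makes it easy to pick out columns that hold a single value, as the percentage column does here. This is a minimal sketch on a made-up frame (`toy`) with illustrative values, not the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df; the values are illustrative only.
toy = pd.DataFrame({
    "Sex": ["All", "Male", "Female", "Unknown"],
    "Percentage of cause-specific deaths out of total deaths": [100, 100, 100, 100],
    "Number": [10.0, 6.0, np.nan, 1.0],
})

# dropna=False counts NaN as a value, matching len(df[column].unique()).
unique_counts = toy.nunique(dropna=False)

# Columns with a single distinct value carry no information for the analysis.
constant_cols = unique_counts[unique_counts == 1].index.tolist()
print(unique_counts)
print(constant_cols)
```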


# Totals per country

This section groups the data by country and year and counts the non-null death-rate entries in each group.

In [26]:
df_country = df[["Country Name", "Death rate per 100 000 population", "Year"]].groupby(["Country Name", "Year"]).count()
df_country

Country Name,Year,Death rate per 100 000 population
Albania,1987,60
Albania,1988,60
Albania,1989,60
Albania,1992,60
Albania,1993,60
...,...,...
Venezuela (Bolivarian Republic of),2012,60
Venezuela (Bolivarian Republic of),2013,60
Venezuela (Bolivarian Republic of),2014,60
Venezuela (Bolivarian Republic of),2015,60
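
The counts above skip some years; Albania, for example, jumps from 1989 to 1992. The following is a minimal sketch of listing such gaps per country. It uses a small hand-built frame (`toy`) that reuses only the Albania years shown above, and the helper name `missing_years` is hypothetical.

```python
import pandas as pd

# Toy stand-in for df, built from the Albania years visible in the output above.
toy = pd.DataFrame({
    "Country Name": ["Albania"] * 5,
    "Year": [1987, 1988, 1989, 1992, 1993],
})

def missing_years(years: pd.Series) -> list:
    """Return the years between the first and last reported year that are absent."""
    reported = set(years.tolist())
    first, last = int(years.min()), int(years.max())
    return [year for year in range(first, last + 1) if year not in reported]

# One list of missing years per country.
gaps = toy.groupby("Country Name")["Year"].apply(missing_years)
print(gaps)
```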


In [28]:
df[df["Country Name"] == "Sweden"]

,Region Code,Region Name,Country Code,Country Name,Year,Sex,Age group code,Age Group,Number,Percentage of cause-specific deaths out of total deaths,Age-standardized death rate per 100 000 standard population,Death rate per 100 000 population
233457,EU,Europe,SWE,Sweden,1951,All,Age_all,[All],69799.0,100,878.907709,987.214121
233458,EU,Europe,SWE,Sweden,1951,All,Age00,[0],2378.0,100,,2152.036199
233459,EU,Europe,SWE,Sweden,1951,All,Age01_04,[1-4],600.0,100,,122.649223
233460,EU,Europe,SWE,Sweden,1951,All,Age05_09,[5-9],363.0,100,,59.861478
233461,EU,Europe,SWE,Sweden,1951,All,Age10_14,[10-14],210.0,100,,45.971979
...,...,...,...,...,...,...,...,...,...,...,...,...
313357,EU,Europe,SWE,Sweden,2019,Unknown,Age70_74,[70-74],12.0,100,,
313358,EU,Europe,SWE,Sweden,2019,Unknown,Age75_79,[75-79],7.0,100,,
313359,EU,Europe,SWE,Sweden,2019,Unknown,Age80_84,[80-84],3.0,100,,
313360,EU,Europe,SWE,Sweden,2019,Unknown,Age85_over,[85+],10.0,100,,
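
The Sweden rows mix totals with breakdowns by sex and age group. The sketch below keeps only the total rows, assuming that `Sex == "All"` and `Age group code == "Age_all"` mark the aggregate rows, as the first row of the output suggests. It uses a small frame (`toy`) built from three of the rows shown above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df, copied from three Sweden rows in the output above.
toy = pd.DataFrame({
    "Country Name": ["Sweden", "Sweden", "Sweden"],
    "Year": [1951, 1951, 2019],
    "Sex": ["All", "All", "Unknown"],
    "Age group code": ["Age_all", "Age00", "Age70_74"],
    "Number": [69799.0, 2378.0, 12.0],
    "Death rate per 100 000 population": [987.214121, 2152.036199, np.nan],
})

# Keep one row per year: all sexes combined, all ages combined (assumed markers).
totals = toy[
    (toy["Country Name"] == "Sweden")
    & (toy["Sex"] == "All")
    & (toy["Age group code"] == "Age_all")
]
print(totals[["Year", "Number", "Death rate per 100 000 population"]])
```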
