# World Health Statistics 2020 - Visual Summary

In [None]:
import os
from IPython.display import Image
Image(filename="../input/meta-data-files/WHO-vis.jpg", width= 1000, height=400)

WHO's annual World Health Statistics reports present the most recent health statistics for the WHO Member States. The statistics are used to monitor health-related progress towards the Sustainable Development Goals (SDGs), to which Member States are fully committed.

This notebook works towards recreating the visual summary of World Health Statistics 2020 published by WHO. Its goal is to visualize a summary of health data on a few important aspects of world health.

In [None]:
!pip install pycountry_convert

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import pycountry_convert as pc
import json 
from plotly.subplots import make_subplots

import warnings
warnings.filterwarnings('ignore')

## Healthy Life Expectancy (HALE)

Life expectancy gives an indication of how long a population is expected to live on average. But Healthy Life Expectancy (HALE) reveals the true health of a population. It’s about both length of life and quality of life. Not just the number of years the average person lives, but the number of years they can expect to live in good health.

In [None]:
life_exp_birth = pd.read_csv('../input/who-worldhealth-statistics-2020-complete/ofHaleInLifeExpectancy.csv')
print('HALE Data')
life_exp_birth.head()

In [None]:
HALE = life_exp_birth[life_exp_birth['Dim1']=='Both sexes'][['Period','Hale Expectency']].groupby('Period').mean()
fig = go.Figure(
    data= go.Scatter(x=HALE.index, y=HALE['Hale Expectency'].values,mode='markers+lines'))
fig.update_layout(title='Healthy life expectancy (HALE) between 2000 and 2019')
fig.show()

Globally, HALE increased by about 8% between 2000 and 2019, from 59 to 63 years.
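
As a sanity check on the figure above, here is a minimal sketch of the percentage-change arithmetic. It uses only the rounded endpoints quoted in the text (59 and 63 years); `percent_change` is a helper name introduced here for illustration, not part of the notebook.

```python
def percent_change(start: float, end: float) -> float:
    """Relative change from start to end, expressed in percent."""
    return (end - start) / start * 100.0

# Rounded HALE endpoints quoted in the text above (years)
hale_2000, hale_2019 = 59, 63

change = percent_change(hale_2000, hale_2019)
print(f"{change:.1f}%")
```

The rounded endpoints give a slightly smaller change than 8%, so the quoted figure presumably rests on unrounded values.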

**Across the world, where are people living healthier lives?**

Based on the reported life expectancy and HALE figures, we can visualize which regions have healthier populations.

In [None]:
# sort_values returns a new frame, so the filtered slice is never modified in place
life_exp_birth_all = life_exp_birth[life_exp_birth['Dim1']=='Both sexes'].sort_values(by='Period')

x=[]
year=[]
y=[]
color=[]
for r, c in life_exp_birth_all.iterrows():
    loc, per,_,_,he,le,_,perc_hale = life_exp_birth_all.loc[r].values
    x.append(loc)
    year.append(per)
    y.append(he)
    color.append('Hale Expectancy')
    x.append(loc)
    year.append(per)
    y.append(le)
    color.append('Life Expectancy')
# Representative country per WHO region, used only to position that region's bubble on the map
map_cont2count ={'Africa': 'Chad',
                 'South-East Asia': 'Malaysia',
                 'Eastern Mediterranean':'Saudi Arabia',
                 'Europe':'Austria',
                 'Americas':'Costa Rica',
                 'Western Pacific':'Cambodia'}   
df=pd.DataFrame()
df['x']=x
df['country'] =df['x'].map(map_cont2count)
df['y']=y
df['year']=year
df['color']=color

df.sort_values(by='year',inplace=True)
for color in df['color'].unique():
    fig = px.scatter_geo(df[df['color']==color], locations="country", color="x", size="y",
                         projection="natural earth", locationmode='country names',animation_frame='year')
    fig.update_layout(title=color)
    fig.show()

This visualization reveals the interplay between life expectancy and healthy life expectancy, and shows how both have changed between 2000 and 2019. HALE has increased globally during this period, yet the charts show many disparities between regions.

In countries of the African region, a large proportion of life may be spent in good health, yet the number of healthy years, as measured by HALE, is still short. The goal is equality across all regions, but there is currently a clear gap between Africa and Europe.
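
The difference between the share of life spent in good health and the absolute number of healthy years can be made concrete with a small sketch. The two regions and their values below are hypothetical and are not taken from the WHO dataset; `healthy_share` is a helper name introduced here.

```python
def healthy_share(hale_years: float, life_expectancy_years: float) -> float:
    """Percentage of the expected lifespan that is spent in good health."""
    if life_expectancy_years <= 0:
        raise ValueError("life expectancy must be positive")
    return hale_years / life_expectancy_years * 100.0

# Hypothetical regions, NOT values from the WHO dataset
regions = {
    "Region A": {"hale": 54.0, "life_expectancy": 61.0},
    "Region B": {"hale": 68.0, "life_expectancy": 78.0},
}

shares = {name: healthy_share(v["hale"], v["life_expectancy"])
          for name, v in regions.items()}

for name, share in shares.items():
    print(f"{name}: {regions[name]['hale']:.0f} healthy years, {share:.1f}% of life")
```

In this made-up example Region A has the larger healthy share but 14 fewer healthy years, which is the pattern the paragraph describes.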

## Death, Disease and Intervention

These visualizations highlight the progress made in recent decades, where we stand today and reveal the challenges we still face.


## How have mortality rates decreased over the past two decades?

* Under 5 mortality rate: The risk of a child dying before their fifth birthday.

In [None]:
map_cont={}
map_cont['AF'] ='Africa'
map_cont['AS'] ='Asia'
map_cont['EU'] ='Europe'
map_cont['NA'] = 'North America'
map_cont['OC'] ='Oceania'
map_cont['SA'] = 'South America'

In [None]:
mortality_rate =pd.read_csv('../input/who-worldhealth-statistics-2020-complete/under5MortalityRate.csv')
print('Under 5 mortality rate')
mortality_rate.head()

In [None]:
with open('../input/meta-data-files/country2code.json') as json_file: 
    country2code = json.load(json_file) 


In [None]:
mortality_rate['Rate'] = mortality_rate['First Tooltip'].str.split('[',n=2,expand=True)[0].astype('float')
mortality_rate=mortality_rate.query("Period>=2000")
mortality_rate['iso_alpha'] = mortality_rate['Location'].map(country2code)
mortality_rate.dropna(inplace=True)
# Look up each country's continent code, then expand it to a readable name
mortality_rate['continent'] = (mortality_rate['iso_alpha']
                               .map(pc.country_alpha2_to_continent_code)
                               .map(map_cont))
mortality_rate.sort_values(by='Period',inplace=True)
fig = px.scatter_geo(mortality_rate, locations="Location", color="continent", size="Rate",
                     projection="natural earth", locationmode='country names',animation_frame="Period")
fig.show()

The shrinking bubbles across the years reveal that major progress has been made in reducing under-5 mortality. This is due, in part, to gains in vaccination coverage for specific diseases. Under-5 mortality nevertheless remains a significant problem in Africa, where the rate is more than eight times higher than in the European region.
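
The cleaning cell in this section takes the point estimate from `First Tooltip` by splitting on `[`. A slightly more defensive variant is sketched below, assuming the values look like `'37.7 [30.5-46.2]'`. The sample strings are illustrative rather than copied from the CSV, and `point_estimate` is a helper name introduced here.

```python
import pandas as pd

def point_estimate(tooltip: pd.Series) -> pd.Series:
    """Extract the leading number from strings such as '37.7 [30.5-46.2]'."""
    # Capture the first integer or decimal at the start of each string
    leading = tooltip.astype(str).str.extract(r'^\s*(\d+(?:\.\d+)?)', expand=False)
    # Anything that did not match becomes NaN instead of raising
    return pd.to_numeric(leading, errors='coerce')

# Illustrative values only, not rows from the WHO file
sample = pd.Series(['37.7 [30.5-46.2]', '4 [3-5]', 'No data'])
rates = point_estimate(sample)
print(rates.tolist())
```

Unlike a bare `astype('float')`, this turns unparseable entries into `NaN`, which a later `dropna` can remove.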

* Maternal mortality ratio: The death of women as a result of complications during or following pregnancy and childbirth. 


In [None]:
maternal_mortality =pd.read_csv('../input/who-worldhealth-statistics-2020-complete/maternalMortalityRatio.csv')
print('Maternal mortality data')
maternal_mortality.head()

In [None]:
maternal_mortality['Rate'] = maternal_mortality['First Tooltip'].str.split('[',n=2,expand=True)[0].astype('float')
maternal_mortality=maternal_mortality.query("Period>=2000")
maternal_mortality['iso_alpha'] = maternal_mortality['Location'].map(country2code)
maternal_mortality.dropna(inplace=True)
# Look up each country's continent code, then expand it to a readable name
maternal_mortality['continent'] = (maternal_mortality['iso_alpha']
                                   .map(pc.country_alpha2_to_continent_code)
                                   .map(map_cont))
maternal_mortality.sort_values(by='Period',inplace=True)
fig = px.scatter_geo(maternal_mortality, locations="Location", color="continent", size="Rate",
                     projection="natural earth", locationmode='country names',animation_frame="Period")
fig.show()

This reflects the global inequalities in access to quality health care. Most maternal deaths are preventable. The chart above shows not only how much progress has been made in reducing maternal mortality but also where more effort is needed. The gap between Europe and Africa remains far too wide.
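
Both mortality cells tag each country with a continent before plotting. The sketch below shows the same idea with `Series.map` instead of a row-by-row loop. The small `ALPHA2_TO_CONTINENT` dictionary is a hypothetical stand-in for `pc.country_alpha2_to_continent_code`, so the example runs without `pycountry_convert`; `add_continent` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical stand-in for pc.country_alpha2_to_continent_code
ALPHA2_TO_CONTINENT = {'TD': 'AF', 'AT': 'EU', 'KH': 'AS', 'CR': 'NA'}

CONTINENT_NAMES = {'AF': 'Africa', 'AS': 'Asia', 'EU': 'Europe',
                   'NA': 'North America', 'OC': 'Oceania', 'SA': 'South America'}

def add_continent(frame: pd.DataFrame, code_column: str = 'iso_alpha') -> pd.DataFrame:
    """Return a copy of frame with a readable 'continent' column."""
    out = frame.copy()
    # First map: alpha-2 code -> continent code; second map: code -> name
    out['continent'] = out[code_column].map(ALPHA2_TO_CONTINENT).map(CONTINENT_NAMES)
    return out

toy = pd.DataFrame({'Location': ['Chad', 'Austria', 'Atlantis'],
                    'iso_alpha': ['TD', 'AT', 'XX']})
tagged = add_continent(toy)
print(tagged)
```

With a dictionary lookup, an unknown code such as `'XX'` yields `NaN` rather than an exception, so such rows can be inspected or dropped explicitly.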

### Malaria, tuberculosis and HIV incidence rates globally

In [None]:
malaria =pd.read_csv('../input/who-worldhealth-statistics-2020-complete/incedenceOfMalaria.csv')
malaria.columns =['Location','Indicator','Period','malaria_rate']
malaria = malaria[['Location','Period','malaria_rate']]
print('Incidence of Malaria data')
malaria.head()

In [None]:
hiv =pd.read_csv('../input/who-worldhealth-statistics-2020-complete/newHivInfections.csv')
hiv.columns =['Location','Period','Indicator','Dim1','hiv_rate']
hiv= hiv[hiv['Dim1']=='Both sexes']

hiv['hiv_rate'] = hiv['hiv_rate'].str.split('[',n=2,expand=True)[0]

hiv['hiv_rate'] = hiv['hiv_rate'].replace({'<0.01 ':'0.009', 
                                           'No data': np.nan}).astype('float')
hiv.dropna(inplace=True)
hiv = hiv[['Location','Period','hiv_rate']]
print('New HIV infections data')
hiv.head()

In [None]:
tb = pd.read_csv('../input/who-worldhealth-statistics-2020-complete/incedenceOfTuberculosis.csv')
tb.columns =['Location','Indicator','Period','tb_rate']
tb['tb_rate'] = tb['tb_rate'].str.split('[',n=2,expand=True)[0].astype('float')
tb=tb[['Location','Period','tb_rate']]
print('Incidence of Tuberculosis data')
tb.head()

In [None]:
disease = pd.merge(malaria, tb,  how='left', left_on=['Location','Period'], right_on = ['Location','Period'])
disease = pd.merge(disease, hiv,  how='left', left_on=['Location','Period'], right_on = ['Location','Period'])
disease=disease.query("Period>=2000")
disease['iso_alpha'] = disease['Location'].map(country2code)
disease.dropna(inplace=True)
# Look up each country's continent code, then expand it to a readable name
disease['continent'] = (disease['iso_alpha']
                        .map(pc.country_alpha2_to_continent_code)
                        .map(map_cont))
# numeric_only=True keeps text columns such as Location out of the average
disease=disease.groupby(['continent','Period']).mean(numeric_only=True)
disease = disease.reset_index()

anim_map={}
for i in range(len(disease['Period'].unique())):
    anim_map[i] = disease['Period'].unique()[i]
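color_map={}
color_map['Africa'] ='firebrick'
color_map['Asia'] = 'royalblue'
# Europe is included so the colour lookup cannot fail if European rows survive the dropna
color_map['Europe'] = 'darkorange'
color_map['North America'] = 'indigo'
color_map['Oceania'] ='yellow'
color_map['South America'] = 'green'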


fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=("Malaria Incidence", "TB Incidence", "New HIV Infections"))
# One subplot row per indicator; only the first row contributes legend entries
for row, column in enumerate(['malaria_rate', 'tb_rate', 'hiv_rate'], start=1):
    for cont in disease['continent'].unique():
        subset = disease[disease['continent']==cont]
        fig.add_trace(go.Scatter(x=subset['Period'].values,
                                 y=subset[column].values,
                                 mode='lines+markers',
                                 name=cont,
                                 line=dict(color=color_map[cont], width=2, dash='dot'),
                                 showlegend=(row == 1)), row=row, col=1)
# fig.update_layout(coloraxis=dict(colorbar=['maroon','green','violet','blue','indigo']), showlegend=False)
fig.update_layout(height=800, width=1000, title_text="Malaria, tuberculosis and HIV incidence rates globally")
fig.show()

Infectious diseases such as malaria, tuberculosis (TB) and HIV have long been some of the world’s biggest killers. 

The African region still lags far behind the global average on all three, although the past two decades have seen dramatic progress. The incidence of HIV, TB and malaria has decreased globally since 2000, yet these diseases still pose a major threat.
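
The charts above are built from a left merge that starts with the malaria table, followed by `dropna`. The toy sketch below, with made-up locations and rates, shows what that combination keeps: only location-year pairs that appear in the malaria table and also have a value in the other table.

```python
import pandas as pd

# Made-up rows for illustration; not values from the WHO files
malaria = pd.DataFrame({'Location': ['A', 'B'],
                        'Period': [2018, 2018],
                        'malaria_rate': [10.0, 20.0]})
tb = pd.DataFrame({'Location': ['A', 'C'],
                   'Period': [2018, 2018],
                   'tb_rate': [5.0, 7.0]})

# Left merge: every malaria row is kept, unmatched TB values become NaN
merged = malaria.merge(tb, how='left', on=['Location', 'Period'])

# dropna then removes rows that lack any of the indicators
complete = merged.dropna()

print(merged)
print(complete)
```

Location `C` never enters the result because it is missing from the left-hand table, and `B` is dropped because it has no TB value. The regional averages therefore describe only countries that report every indicator, which is worth keeping in mind when reading the trends.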

## Health workforce

A well-prepared health workforce with adequate working conditions is essential to strong health systems. Health professionals such as medical doctors and nurses are the people who respond to both emergencies and everyday needs.

### Number of people for every single Nurse/midwife

In [None]:
nursing =pd.read_csv('../input/who-worldhealth-statistics-2020-complete/nursingAndMidwife.csv')
print('Nurse/Midwife global data')
nursing.head()

In [None]:
nursing['iso_alpha'] = nursing['Location'].map(country2code)
nursing.dropna(inplace=True)
# Look up each country's continent code, then expand it to a readable name
nursing['continent'] = (nursing['iso_alpha']
                        .map(pc.country_alpha2_to_continent_code)
                        .map(map_cont))
# numeric_only=True keeps text columns such as Location out of the sum
nursing=nursing.groupby(['continent','Period']).sum(numeric_only=True)
nursing = nursing.reset_index()

df =nursing[nursing['Period']==2018]
plt.xkcd()
fig =plt.figure(figsize=(15,15))
for i in range(len(df)):
    Nmax = int(df.iloc[i,2])
    R = int(df.iloc[i,2]/200)
    r2 = R * np.sqrt(np.random.rand(Nmax, 1))
    theta2 = 2 * np.pi * np.random.rand(Nmax, 1)
    x2 = r2 * np.cos(theta2)
    y2 = r2 * np.sin(theta2)
    plt.subplot(3,2,i+1)
    plt.plot(x2,y2,'b.')
    plt.plot(0, 0, 'mo')
    plt.xlim((-1.1*R,1.1*R))
    plt.ylim((-1.1*R,1.1*R))
    plt.axis('square')
    plt.axis('off')
    plt.title(df.iloc[i,0])
fig.suptitle('Number of people for every single Nurse/midwife', fontsize=16)

Each blue dot represents one person, and the dot at the center represents a single nurse/midwife.
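
The dots in these panels are scattered with `R * np.sqrt(np.random.rand(...))` for the radius. The square root matters: the area of a disc grows with the square of its radius, so drawing the radius uniformly would crowd points near the center. The sketch below checks this numerically; `sample_disc` and the seed are choices made here for illustration.

```python
import numpy as np

def sample_disc(n_points: int, radius: float, seed: int = 0):
    """Draw points uniformly over the area of a disc centred at the origin."""
    rng = np.random.default_rng(seed)
    # sqrt compensates for area growing with r**2
    r = radius * np.sqrt(rng.random(n_points))
    theta = 2 * np.pi * rng.random(n_points)
    return r * np.cos(theta), r * np.sin(theta)

x, y = sample_disc(20000, radius=5.0)

# A disc of half the radius covers a quarter of the area,
# so roughly a quarter of the points should fall inside it
inner_fraction = np.mean(np.hypot(x, y) <= 2.5)
print(round(float(inner_fraction), 3))
```

With a uniformly drawn radius the same check would return about one half instead of one quarter.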

### Number of people for every single Medical Doctor

In [None]:
doctors =pd.read_csv('../input/who-worldhealth-statistics-2020-complete/medicalDoctors.csv')
print('Medical Doctors global data')
doctors.head()

In [None]:
doctors['iso_alpha'] = doctors['Location'].map(country2code)
doctors.dropna(inplace=True)
# Look up each country's continent code, then expand it to a readable name
doctors['continent'] = (doctors['iso_alpha']
                        .map(pc.country_alpha2_to_continent_code)
                        .map(map_cont))
# numeric_only=True keeps text columns such as Location out of the sum
doctors=doctors.groupby(['continent','Period']).sum(numeric_only=True)
doctors = doctors.reset_index()

df =doctors[doctors['Period']==2018]
plt.xkcd()
fig =plt.figure(figsize=(15,15))
for i in range(len(df)):
    Nmax = int(df.iloc[i,2])
    R = 5
    r2 = R * np.sqrt(np.random.rand(Nmax, 1))
    theta2 = 2 * np.pi * np.random.rand(Nmax, 1)
    x2 = r2 * np.cos(theta2)
    y2 = r2 * np.sin(theta2)
    plt.subplot(3,2,i+1)
    plt.plot(x2,y2,'b.')
    plt.plot(0, 0, 'mo')
    plt.xlim((-1.1*R,1.1*R))
    plt.ylim((-1.1*R,1.1*R))
    plt.axis('square')
    plt.axis('off')
    plt.title(df.iloc[i,0])

fig.suptitle('Number of people for every single Medical Doctor', fontsize=16)

The charts suggest that doctors are more widely available than nurses/midwives. They also show dramatic disparities in the number of people per health worker across world regions, revealing how uneven the distribution is and highlighting the unacceptable scarcity of health workers in some regions, such as Asia and Europe.
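
To read "number of people per health worker" off a density figure, the conversion is a simple reciprocal. The sketch below assumes the indicator is expressed as workers per 10,000 population, which is an assumption about the source files rather than something stated in this notebook. The density values are hypothetical and `people_per_worker` is a helper name introduced here.

```python
def people_per_worker(density_per_10k: float) -> float:
    """Convert health workers per 10,000 population into people per worker."""
    if density_per_10k <= 0:
        raise ValueError("density must be positive")
    return 10_000 / density_per_10k

# Hypothetical densities, NOT values from the WHO dataset
examples = [2.0, 25.0, 80.0]
ratios = [people_per_worker(d) for d in examples]

for density, ratio in zip(examples, ratios):
    print(f"{density:>5.1f} per 10,000 -> one worker for every {ratio:.0f} people")
```

Because the relationship is a reciprocal, a low density translates into a very large number of people per worker, which is what makes regional scarcity so visible.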

## Death rates due to deadly diseases vs suicide

The probability (%) of dying between age 30 and exact age 70 from cardiovascular disease, cancer, diabetes or chronic respiratory disease is treated here as the death rate due to deadly diseases.

Here, the crude suicide rate is compared visually with the death rate from these deadly diseases.

In [None]:
diseases= pd.read_csv('../input/who-worldhealth-statistics-2020-complete/30-70cancerChdEtc.csv')
print('Death rate of Cardiovascular disease, cancer, diabetes, or chronic respiratory disease')
diseases['death_rate'] = diseases['First Tooltip']
diseases = diseases[diseases['Dim1']=='Both sexes']
diseases.drop(['Indicator','Dim1','First Tooltip'],axis=1,inplace=True)
diseases =diseases[diseases['Period']!=2016]
diseases['type'] = ['Deadly diseases'] * len(diseases)
diseases.head()

In [None]:
sucide_rate=pd.read_csv('../input/who-worldhealth-statistics-2020-complete/crudeSuicideRates.csv')
sucide_rate['death_rate'] = sucide_rate['First Tooltip']
sucide_rate = sucide_rate[sucide_rate['Dim1']=='Both sexes']
sucide_rate.drop(['Indicator','Dim1','First Tooltip'],axis=1,inplace=True)
sucide_rate= sucide_rate[sucide_rate['Period']!=2016]
sucide_rate['type'] =['suicide']*len(sucide_rate)
print('Death due to crude suicide')
sucide_rate.head()

In [None]:

death_rate=pd.concat([diseases, sucide_rate])
death_rate.sort_values(by='Period',inplace=True)
fig = px.scatter_geo(death_rate, locations="Location", color="type", size="death_rate",
                     projection="natural earth", locationmode='country names',animation_frame="Period")
fig.show()

Although death rates have declined significantly over the years, the charts suggest that in a few regions, such as North America and Europe, mental illness takes a heavier toll than physical illness.

## Miscellaneous 

For now, the other causes of death considered are air pollution and road traffic. Data are available only for 2016.

In [None]:
air_pollution =pd.read_csv('../input/who-worldhealth-statistics-2020-complete/airPollutionDeathRate.csv')
print('Air pollution death rate')
air_pollution['death_rate']=air_pollution['First Tooltip'].str.split('[',n=2,expand=True)[0].str.strip().astype('float')
air_pollution = air_pollution[air_pollution['Dim1']=='Both sexes']
air_pollution.drop(['Indicator','Dim1','Dim2','First Tooltip'],axis=1,inplace=True)
air_pollution['iso_alpha'] = air_pollution['Location'].map(country2code)
air_pollution.dropna(inplace=True)
# Look up each country's continent code, then expand it to a readable name
air_pollution['continent'] = (air_pollution['iso_alpha']
                              .map(pc.country_alpha2_to_continent_code)
                              .map(map_cont))
# numeric_only=True keeps text columns such as Location out of the average
air_pollution=air_pollution.groupby(['continent','Period']).mean(numeric_only=True)
air_pollution['type']= ['Air pollution'] *len(air_pollution)
air_pollution = air_pollution.reset_index()
air_pollution.head()

In [None]:
road_traffic =pd.read_csv('../input/who-worldhealth-statistics-2020-complete/roadTrafficDeaths.csv')
print('Road traffic death rate')
road_traffic['death_rate'] = road_traffic['First Tooltip']
road_traffic.drop(['Indicator','First Tooltip'],axis=1,inplace=True)
road_traffic['iso_alpha'] = road_traffic['Location'].map(country2code)
road_traffic.dropna(inplace=True)
# Look up each country's continent code, then expand it to a readable name
road_traffic['continent'] = (road_traffic['iso_alpha']
                             .map(pc.country_alpha2_to_continent_code)
                             .map(map_cont))
# numeric_only=True keeps text columns such as Location out of the average
road_traffic=road_traffic.groupby(['continent','Period']).mean(numeric_only=True)
road_traffic['type']  = ['Road Traffic']* len(road_traffic)
road_traffic = road_traffic.reset_index()
road_traffic.head()

In [None]:
misc =pd.concat([air_pollution, road_traffic])
fig = px.bar(misc, x="continent", y="death_rate",
             color='type', barmode='group',
             height=400)
fig.show()

Air pollution accounts for the higher death rate and puts global health at risk. Countries in Africa suffer the highest losses from both road traffic and air pollution. Although Europe has a lower road traffic death rate, air pollution still claims lives there, unlike in North America.
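
The grouped bar chart can also be read as a table. The sketch below pivots a long frame shaped like `misc` (one row per continent and cause) into one row per continent, so the two causes can be compared directly. The region names and rates are hypothetical and are not taken from the WHO files.

```python
import pandas as pd

# Hypothetical long-format data shaped like `misc`; NOT values from the WHO files
misc_toy = pd.DataFrame({
    'continent': ['Region A', 'Region A', 'Region B', 'Region B'],
    'type': ['Air pollution', 'Road Traffic', 'Air pollution', 'Road Traffic'],
    'death_rate': [180.0, 27.0, 36.0, 9.0],
})

# One row per continent, one column per cause of death
table = misc_toy.pivot(index='continent', columns='type', values='death_rate')

# How many times larger the air pollution rate is than the road traffic rate
table['ratio'] = table['Air pollution'] / table['Road Traffic']
print(table)
```

A ratio column of this kind makes statements such as "air pollution takes the lead" checkable per region instead of being judged from bar heights alone.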

### Conclusion

The World Health Statistics report makes clear that global efforts in recent decades have been paying off. The most up-to-date data on these vital SDG indicators reveal health trends across Member States, regions and the entire world. Indeed, we are living through extraordinary times. The global outbreak of COVID-19 will have an unprecedented – and as yet unknown – effect on our work towards a healthier world.

---