# World Health Data & Its Determinants

## About Dataset

The dataset analyzed in this notebook was taken from https://www.kaggle.com/kumarajarshi/life-expectancy-who. It contains country-wise and year-wise health statistics such as life expectancy, alcohol consumption, infant deaths, and GDP. Although most of the data is correct, some observations are wrong; for example, the recorded population of China is significantly lower than its actual population.

For our analysis we will treat this data as the source of truth, since the majority of it is correct. The incorrect observations should not distort the overall trends much.
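One rough way to spot suspicious observations like the one mentioned above is to compare each value with the typical value for the same country. The sketch below is only an illustration: the rows are invented (they are not taken from the Kaggle file), and the function name `flag_population_outliers` and the 10% threshold are assumptions rather than part of this notebook's helper modules.

```python
from statistics import median

# Invented rows for illustration; NOT values from the Kaggle dataset.
rows = [
    {"Country": "A", "Year": 2013, "Population": 1_000_000},
    {"Country": "A", "Year": 2014, "Population": 1_010_000},
    {"Country": "A", "Year": 2015, "Population": 10_200},  # looks like a scaling error
    {"Country": "B", "Year": 2013, "Population": 50_000},
    {"Country": "B", "Year": 2014, "Population": 51_000},
    {"Country": "B", "Year": 2015, "Population": 52_000},
]


def flag_population_outliers(rows, ratio=0.1):
    """Return rows whose population is below `ratio` times the country's median."""
    by_country = {}
    for row in rows:
        by_country.setdefault(row["Country"], []).append(row["Population"])
    medians = {country: median(values) for country, values in by_country.items()}
    return [row for row in rows if row["Population"] < ratio * medians[row["Country"]]]


suspicious = flag_population_outliers(rows)
print(suspicious)
```

A check like this only catches values that are inconsistent from year to year. A population that is understated in every year would need an external reference to detect.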



## Overall world health

We might wonder whether the health of people around the world is improving over time. It is possible that there have been significant breakthroughs in healthcare recently, but that they are not accessible to the general public. In that case the overall health of citizens would decline, while quality medical facilities would remain available only to a select few. Luckily, we have access to WHO data.

Run the cell below to see aggregated healthcare data for the entire world across tens of parameters.

In [10]:
from feature_compare import select_aggregate_pair_1, select_aggregate_pair_2, drawable_cols, draw_pair_aggregate
from ipywidgets import interact, interactive, fixed, interact_manual, SelectMultiple

print("WORLD AGGREGATED DATA")

interact(select_aggregate_pair_2, column_2=drawable_cols);
interact_manual(draw_pair_aggregate)

WORLD AGGREGATED DATA



## Country wise health

It turns out that all the diseases are trending downwards and all the positive health parameters, such as life expectancy, are on the rise. We also see that total expenditure on healthcare is increasing. One might wonder what is driving the overall improvement in health: is it the increase in GDP, the total spend on healthcare, or something else? We will soon find out.
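To make the idea of a world-level trend concrete, here is a minimal sketch that averages one metric over all countries for each year and checks whether it rises. The rows are invented, `yearly_mean` is a hypothetical helper, and the unweighted mean is an assumption; the `feature_compare` module used above may aggregate differently.

```python
from collections import defaultdict

# Invented (country, year, life expectancy) rows; NOT real dataset values.
rows = [
    ("A", 2013, 70.0), ("B", 2013, 60.0),
    ("A", 2014, 71.0), ("B", 2014, 62.0),
    ("A", 2015, 72.0), ("B", 2015, 64.0),
]


def yearly_mean(rows):
    """Average the metric over all countries for each year (unweighted)."""
    values = defaultdict(list)
    for _country, year, value in rows:
        values[year].append(value)
    # Build the result in ascending year order.
    return {year: sum(v) / len(v) for year, v in sorted(values.items())}


world = yearly_mean(rows)
years = list(world)
# True when every year's average is higher than the previous year's.
is_rising = all(world[a] < world[b] for a, b in zip(years, years[1:]))
print(world, is_rising)
```

With a pandas DataFrame the same aggregation is typically a `groupby` on the year column followed by `mean()`.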

Let's now dive deeper to see how each country performs against the above parameters.

Run the cell below to view country-wise health data.

Note: To select multiple countries and compare them, hold Shift or Ctrl while selecting.

In [13]:
from country_pair_compare_util import countryWidget, select_pair_column_1, select_pair_column_2, drawable_cols, draw_pairs
from ipywidgets import interact, interactive, fixed, interact_manual, SelectMultiple

print("COUNTRY-WISE DATA")

display(countryWidget)
interact(select_pair_column_2, column_2=drawable_cols);
interact_manual(draw_pairs)

COUNTRY-WISE DATA



## Leader Board

It looks like all countries follow the same trends that we noticed in the overall world health data. Let's see which countries are leading or trailing on each of the above parameters, and find out whether our own country leads on any of them.

Run the cell below and select the parameters whose leaderboard you want to see for each year.

In [19]:
from ipywidgets import interact, interactive, fixed, interact_manual, SelectMultiple

from yearly_stats import metricWidget, show_year_wise_metrics

display(metricWidget)
interact_manual(show_year_wise_metrics)



In terms of life expectancy, Japan and some European countries are on the leaderboard, while Sierra Leone is at the bottom in most years. The maximum life expectancy, observed for Slovenia in 2015, was a whopping 88 years or so.

Play with the parameters to find interesting stats for yourself.
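For readers curious how such a leaderboard can be built, the sketch below ranks countries on one metric for a single year. It uses invented rows and a hypothetical `leaderboard` function; it is not the implementation inside `yearly_stats`.

```python
# Invented (country, year, life expectancy) rows; NOT real dataset values.
rows = [
    ("A", 2014, 80.0), ("B", 2014, 75.0), ("C", 2014, 50.0),
    ("A", 2015, 81.0), ("B", 2015, 83.0), ("C", 2015, 51.0),
]


def leaderboard(rows, year, top=1):
    """Return the `top` highest-ranked and `top` lowest-ranked countries for one year."""
    ranked = sorted(
        (row for row in rows if row[1] == year),
        key=lambda row: row[2],
        reverse=True,  # highest value first
    )
    names = [country for country, _year, _value in ranked]
    return names[:top], names[-top:]


leaders, trailers = leaderboard(rows, 2015)
print(leaders, trailers)
```

For metrics where a lower value is better, such as adult mortality, the two returned lists swap roles.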

## Health Parameters and their Inter-relations

It seems apparent that if expenditure on healthcare increases, the health of citizens will improve. However, we might wonder which parameter is associated with the largest change in health (life expectancy), and how much one parameter moves with the rise or fall of another.

For example, we might expect alcohol consumption to decrease life expectancy, but by how much? How much does GDP affect life expectancy?

To answer that, we calculate the coefficient of correlation between each pair of parameters. A coefficient of correlation takes a value between -1 and 1. A positive value between two parameters such as GDP and BMI indicates that as GDP increases, the BMI of the average citizen tends to increase as well (i.e. they are more likely to be obese). A negative value indicates that as GDP increases, BMI tends to decrease.

A value close to 1 or -1 indicates that the two parameters move together strongly, while a value close to 0 indicates that they have little linear relationship with each other.
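The behaviour described above can be checked with a small hand-written version of the coefficient. This sketch assumes the Pearson correlation coefficient, which is the default in pandas' `DataFrame.corr`; whether `correlation_heatmap` uses that method is not stated in this notebook. The sample lists are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented values for illustration only.
gdp = [1.0, 2.0, 3.0, 4.0]
rises_with_gdp = [2.0, 4.0, 6.0, 8.0]
falls_with_gdp = [8.0, 6.0, 4.0, 2.0]

r_up = pearson(gdp, rises_with_gdp)    # perfectly increasing together
r_down = pearson(gdp, falls_with_gdp)  # perfectly opposite
print(r_up, r_down)
```

The first pair moves in lockstep and gives a coefficient of 1, while the second pair moves in opposite directions and gives -1.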

Run the cell below to view the correlation among health parameters as a heatmap.

In [20]:
from correlation_heatmap import featureWidget, draw_heatmap
from ipywidgets import interact, interactive, fixed, interact_manual

display(featureWidget)
interact_manual(draw_heatmap)


## Highly Correlated Health Parameters

Now that we have the correlations between health parameters, let's find out which parameter is most strongly correlated with each other parameter.

Run the code cell below to generate the correlation stats.

In [22]:
from high_correlation_feature import df
from IPython.display import display, HTML

display(HTML(df.to_html(index=False)))

| Feature | Most correlated feature | Correlation value |
| --- | --- | --- |
| infant deaths | under-five deaths | 0.9966288820398193 |
| under-five deaths | infant deaths | 0.9966288820398193 |
| thinness 5-9 years | thinness 1-19 years | 0.9391019921914692 |
| thinness 1-19 years | thinness 5-9 years | 0.9391019921914692 |
| percentage expenditure | GDP | 0.8993726409895392 |
| GDP | percentage expenditure | 0.8993726409895392 |
| Schooling | Income composition of resources | 0.8000924203919645 |
| Income composition of resources | Schooling | 0.8000924203919645 |
| Life expectancy | Schooling | 0.7519754627366968 |
| Diphtheria | Polio | 0.6735533206902242 |


The stats above indicate that life expectancy is most strongly correlated with schooling, while schooling is in turn most strongly correlated with income composition of resources.

This might let us reason about the transitive effects of changing one parameter in the economy. For example, if we increase income composition, that may change schooling, which in turn may change life expectancy.
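A table like the one above can be produced by finding, for each parameter, the other parameter with the largest absolute correlation. The sketch below shows one way to do that. It assumes the Pearson coefficient, uses invented columns, and `strongest_partner` is a hypothetical name, not a function from `high_correlation_feature`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented columns for illustration; NOT values from the dataset.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x_doubled_noisy": [2.1, 3.9, 6.2, 7.8, 10.1],
    "unrelated": [3.0, 1.0, 4.0, 1.0, 5.0],
}


def strongest_partner(columns):
    """Map each column to (other column, correlation) with the largest |correlation|."""
    best = {}
    for name, values in columns.items():
        candidates = [
            (other, pearson(values, other_values))
            for other, other_values in columns.items()
            if other != name
        ]
        best[name] = max(candidates, key=lambda pair: abs(pair[1]))
    return best


best = strongest_partner(columns)
for name, (partner, value) in best.items():
    print(name, partner, round(value, 4))
```

Because correlation is symmetric, a strongly linked pair shows up twice, once from each side, which is why the table lists rows such as GDP and percentage expenditure in both directions. A high correlation between A and B and between B and C does not by itself guarantee a comparable correlation between A and C, so chains of correlations are best read as hypotheses to test.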