### Health, Nutrition and Population Statistics

In [44]:
health_nutr_pop = pd.read_csv('../input/health-nutrition-and-population-statistics/data.csv')

#### Population growth and birth control

I looked at metrics for population growth and population growth control for the years 2012 to 2015 (where data is available for the year) from the Health, Nutrition and Population Statistics dataset. For each country I created a custom score (a kind of rank average) to identify the countries with the greatest population growth and the poorest population growth management, and vice versa. The score is built from the following metrics:
*  'Adolescent fertility rate (births per 1,000 women ages 15-19)',
*  'Condom use, population ages 15-24, female (% of females ages 15-24)',
*  'Condom use, population ages 15-24, male (% of males ages 15-24)',
*  'Contraceptive prevalence, modern methods (% of women ages 15-49)',
*  'Fertility rate, total (births per woman)',
*  'Population growth (annual %)',
*  'Rural population growth (annual %)',
*  'Urban population growth (annual %)',
*  'Birth rate, crude (per 1,000 people)',
*  'Condom use with non regular partner, % adults(15-49), female',
*  'Condom use with non regular partner, % adults(15-49), male',
*  'Demand for family planning satisfied by modern methods (% of married women with demand for family planning)'
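
The rank-average idea described above can be sketched on toy data. This is only an illustration: the country labels, the two shortened indicator names and all values are made up, and `normalised_rank` is a hypothetical helper rather than one of the notebook's functions. Each indicator is turned into a rank divided by the number of non-missing values, and the score is the mean of whatever ranks a country has.

```python
import pandas as pd

# Toy data: two indicators for three hypothetical countries (values are made up).
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Fertility rate': [6.0, 2.0, 4.0],               # higher = faster growth
    'Contraceptive prevalence': [10.0, 70.0, None],  # higher = better control
})

def normalised_rank(series, ascending):
    # Rank divided by the number of non-missing values, so results lie in (0, 1].
    # ascending=True gives the largest value the highest rank;
    # ascending=False gives the smallest value the highest rank.
    return series.rank(ascending=ascending) / series.count()

ranks = pd.DataFrame({
    'Fertility rate': normalised_rank(toy['Fertility rate'], ascending=True),
    'Contraceptive prevalence': normalised_rank(toy['Contraceptive prevalence'], ascending=False),
})

# Average the available ranks per country; missing indicators are skipped.
toy['score'] = ranks.sum(axis=1) / ranks.count(axis=1)
print(toy[['Country', 'score']])
```

Country A has both the highest fertility and the lowest contraceptive prevalence, so it ends up with the maximum score of 1.0, while country C is scored on the single indicator it has data for.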

In [45]:
def extract_data_for_indicator(df, ind_name, years):
    return df[df['Indicator Name'] == ind_name][[*years, 'Country Name']]

def generate_new_column_names(ind_name, years):
    return ['{} - {}'.format(ind_name, year) for year in years]

def create_indicator_df(df, ind_name, years):
    new_df = extract_data_for_indicator(df, ind_name, years)
    return rename_df_columns(new_df, [*years, 'Country Name'], [*generate_new_column_names(ind_name, years), 'Country'])

def create_indicator_dfs(df, ind_names, years_arr):
    return [create_indicator_df(df, ind_name, years_arr[i]) for i, ind_name in enumerate(ind_names)]
    
def calc_rank(df, col_name, ascending):
    df[col_name] = df[col_name].rank(ascending=ascending)/df[col_name].count()

def calc_ranks(df, ind_name, years, ascending):
    col_names = generate_new_column_names(ind_name, years)
    for col_name in col_names:
        calc_rank(df, col_name, ascending)

def calc_all_ranks(df, ind_names, years_arr, sort_order_arr):
    for i, ind_name in enumerate(ind_names):
        calc_ranks(df, ind_name, years_arr[i], sort_order_arr[i])
        
def calc_final_rank(df, rank_name, total_ind_col='Total Indicators'):
    cols = list(df)
    cols.remove(total_ind_col)
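    # numeric_only=True skips the string 'Country' column, which newer pandas versions refuse to sum
    df[rank_name] = df[cols].sum(axis=1, numeric_only=True)/df[total_ind_col]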

def create_country_profile(df, ind_names, years_arr):
    # Combine all of the metrics under single profile
    profile = pd.DataFrame({'Country': df['Country Name'].unique()})
    profile = merge_dfs_on_column([profile, *create_indicator_dfs(df, ind_names, years_arr)])
    profile['Total Indicators'] = profile.count(axis=1)-1
    # Filter out countries/regions for which we have no information in any of the categories or we have information in only one of the categories
    return profile[profile['Total Indicators'] > 1]
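
The cell above calls `rename_df_columns` and `merge_dfs_on_column`, which are not defined in this section. A minimal sketch of what they might look like follows; the signatures are inferred from the call sites, while the left join on a `'Country'` column and the `column_name` parameter are assumptions.

```python
from functools import reduce
import pandas as pd

def rename_df_columns(df, old_names, new_names):
    # Assumed behaviour: map each old column name to its new name and return a copy.
    return df.rename(columns=dict(zip(old_names, new_names)))

def merge_dfs_on_column(dfs, column_name='Country'):
    # Assumed behaviour: left-join every frame onto the first one, so countries
    # without data for an indicator keep NaN in that indicator's columns.
    return reduce(lambda left, right: left.merge(right, on=column_name, how='left'), dfs)

# Tiny usage example with made-up values.
base = pd.DataFrame({'Country': ['A', 'B']})
indicator = rename_df_columns(
    pd.DataFrame({'2012': [1.5], 'Country Name': ['A']}),
    ['2012', 'Country Name'],
    ['X - 2012', 'Country'],
)
merged = merge_dfs_on_column([base, indicator])
print(merged)
```

A left join fits here because the profile starts from the full list of unique country names, so no country is lost when an indicator has no row for it.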

In [46]:
ONE_YEAR = ['2012']
TWO_YEARS = ['2012','2013']
THREE_YEARS = ['2012','2013', '2014']
FOUR_YEARS = ['2012','2013', '2014', '2015']
RANK_ORDER_ASCENDING = [True, False, False, False, True, True, True, True, True, False, False, False]
INDICATOR_NAMES = [
    'Adolescent fertility rate (births per 1,000 women ages 15-19)',
    'Condom use, population ages 15-24, female (% of females ages 15-24)',
    'Condom use, population ages 15-24, male (% of males ages 15-24)',
    'Contraceptive prevalence, modern methods (% of women ages 15-49)',
    'Fertility rate, total (births per woman)',
    'Population growth (annual %)',
    'Rural population growth (annual %)',
    'Urban population growth (annual %)',
    'Birth rate, crude (per 1,000 people)',
    'Condom use with non regular partner, % adults(15-49), female',
    'Condom use with non regular partner, % adults(15-49), male',
    'Demand for family planning satisfied by modern methods (% of married women with demand for family planning)'
]
RECENT_YEARS_WITH_DATA = [
    THREE_YEARS,
    TWO_YEARS,
    TWO_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS
]

# Create profile
country_population_rise_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_population_rise_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Poor population growth control score'
calc_final_rank(country_population_rise_profile, 'Poor population growth control score')

The number of metrics that contributed to the score is given in the 'Total Indicators' column: the higher this number, the more confident we can be in the calculated score. Countries with the highest MPI also tend to have poor population growth control. It is also interesting that the Mediterranean and the Baltics seem to be the regions with the greatest population shrinkage.
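
One way to act on the 'Total Indicators' caveat is to require a minimum number of contributing values before a score is shown. The sketch below uses a made-up miniature profile, and the cut-off `MIN_INDICATORS = 10` is an arbitrary assumption, not a value used in this notebook.

```python
import pandas as pd

# Hypothetical miniature of a scored profile (numbers are illustrative only).
profile = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Poor population growth control score': [0.95, 0.99, 0.40],
    'Total Indicators': [21, 3, 27],
})

MIN_INDICATORS = 10  # assumed cut-off, not taken from the notebook

# Keep only rows whose score rests on enough data points, then sort as before.
reliable = (
    profile[profile['Total Indicators'] >= MIN_INDICATORS]
    .sort_values('Poor population growth control score', ascending=False)
)
print(reliable)
```

Country B has the highest raw score but only three data points behind it, so it drops out of the filtered ranking.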

In [47]:
country_population_rise_profile.sort_values(by=['Poor population growth control score'], ascending=False)[['Country', 'Poor population growth control score', 'Total Indicators']]

| | Country | Poor population growth control score | Total Indicators |
|---|---|---|---|
| 46 | Angola | 0.955650 | 21 |
| 79 | Chad | 0.955322 | 21 |
| 243 | Uganda | 0.953820 | 22 |
| 85 | Congo, Dem. Rep. | 0.952005 | 25 |
| 218 | South Sudan | 0.928698 | 21 |
| 71 | Burkina Faso | 0.926389 | 22 |
| 184 | Niger | 0.926008 | 27 |
| 175 | Mozambique | 0.921677 | 21 |
| 32 | Pre-demographic dividend | 0.912392 | 22 |
| 163 | Mali | 0.909715 | 27 |


In [48]:
country_population_rise_profile_updated = update_country_profile_with_coords(country_population_rise_profile, country_mappings_a, lat_long_info)
create_coords_column(country_population_rise_profile_updated)
lowest_rank = get_coords_of_n_smallest(country_population_rise_profile_updated, metric='Poor population growth control score')
highest_rank = get_coords_of_n_largest(country_population_rise_profile_updated, metric='Poor population growth control score')

I added child markers for the 10 countries with the highest 'Poor population growth control score' (red markers) and the 10 countries with the lowest (blue markers). The markers sit on top of the Urban MPI map, which already shows markers for the top and bottom Urban MPI countries, so pan and zoom to see which countries carry both.
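
The selection of the highest- and lowest-scoring countries can be sketched with pandas alone. `top_and_bottom` is a hypothetical helper shown for illustration; it is not the notebook's `get_coords_of_n_largest` or `get_coords_of_n_smallest`, whose definitions are outside this section, and the demo values are made up.

```python
import pandas as pd

def top_and_bottom(df, metric, n=10):
    # Rows with a missing score are dropped so they cannot appear at either end.
    scored = df.dropna(subset=[metric])
    return scored.nlargest(n, metric), scored.nsmallest(n, metric)

# Made-up scores for four hypothetical countries.
demo = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'score': [0.9, 0.1, None, 0.5],
})
highest, lowest = top_and_bottom(demo, 'score', n=2)
print(highest['Country'].tolist(), lowest['Country'].tolist())
```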

In [49]:
add_markers_to_map(urban_map, lowest_rank,  metric='Poor population growth control score', color='blue', icon='child', prefix='fa')
add_markers_to_map(urban_map, highest_rank,  metric='Poor population growth control score', color='red', icon='child', prefix='fa')
urban_map

#### Healthcare access

Next, I looked at metrics for access to healthcare for the years 2012 to 2015 (where data is available for the year) from the same Health, Nutrition and Population Statistics dataset, and created the same kind of custom score for each country to identify the countries with the best and worst provisioned healthcare systems. The metrics used to produce the 'Impeded access to healthcare score':
*     'Hospital beds (per 1,000 people)',
*     'Physicians (per 1,000 people)',
*     'Specialist surgical workforce (per 100,000 population)',
*     'Number of surgical procedures (per 100,000 population)',
*     'Births attended by skilled health staff (% of total)',
*     'External resources for health (% of total expenditure on health)',
*     'Health expenditure per capita (current US$)',
*     'Health expenditure, total (% of GDP)',
*     'Lifetime risk of maternal death (%)',
*     'Nurses and midwives (per 1,000 people)',
*     'Risk of impoverishing expenditure for surgical care (% of people at risk)',
*     'Risk of catastrophic expenditure for surgical care (% of people at risk)',
*     'Out-of-pocket health expenditure (% of total expenditure on health)',
*     'Health expenditure, private (% of total health expenditure)'

In [50]:
RANK_ORDER_ASCENDING = [False, False, False, False, False, True, False, False, True, False, True, True, True, True]
INDICATOR_NAMES = [
    'Hospital beds (per 1,000 people)',
    'Physicians (per 1,000 people)',
    'Specialist surgical workforce (per 100,000 population)',
    'Number of surgical procedures (per 100,000 population)',
    'Births attended by skilled health staff (% of total)',
    'External resources for health (% of total expenditure on health)',
    'Health expenditure per capita (current US$)',
    'Health expenditure, total (% of GDP)',
    'Lifetime risk of maternal death (%)',
    'Nurses and midwives (per 1,000 people)',
    'Risk of impoverishing expenditure for surgical care (% of people at risk)',
    'Risk of catastrophic expenditure for surgical care (% of people at risk)',
    'Out-of-pocket health expenditure (% of total expenditure on health)',
    'Health expenditure, private (% of total health expenditure)'
]
RECENT_YEARS_WITH_DATA = [
    ONE_YEAR,
    TWO_YEARS,
    THREE_YEARS,
    ONE_YEAR,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    FOUR_YEARS,
    TWO_YEARS,
    ['2014'],
    ['2014'],
    THREE_YEARS,
    THREE_YEARS
]

# Create profile
country_health_access_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_health_access_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Impeded access to healthcare score'
calc_final_rank(country_health_access_profile, 'Impeded access to healthcare score')

According to this estimate, South Sudan, Eritrea and the Central African Republic are among the countries with the worst access to healthcare, while the Netherlands, Sweden and Denmark are among those with the best.

In [51]:
country_health_access_profile.sort_values(by=['Impeded access to healthcare score'], ascending=False)[['Country', 'Impeded access to healthcare score', 'Total Indicators']]

| | Country | Impeded access to healthcare score | Total Indicators |
|---|---|---|---|
| 216 | Somalia | 0.986367 | 6 |
| 218 | South Sudan | 0.933846 | 20 |
| 102 | Eritrea | 0.899816 | 22 |
| 78 | Central African Republic | 0.893477 | 23 |
| 85 | Congo, Dem. Rep. | 0.891613 | 24 |
| 56 | Bangladesh | 0.863365 | 25 |
| 75 | Cameroon | 0.855088 | 24 |
| 185 | Nigeria | 0.854857 | 24 |
| 79 | Chad | 0.854135 | 23 |
| 121 | Guinea | 0.845576 | 23 |


Once again, I added new markers for the 10 countries with the lowest and the highest scores (green and red ambulance markers, respectively) on top of the existing ones on the Urban MPI map. Zoom in to see which countries have several markers.

#### Sanitation

Next, I looked at sanitary conditions and practices and at water access in each country. The metrics used to produce the 'Poor sanitary conditions score':
*     'Improved sanitation facilities (% of population with access)',
*     'Improved sanitation facilities, rural (% of rural population with access)',
*     'Improved sanitation facilities, urban (% of urban population with access)',
*     'Improved water source (% of population with access)',
*     'Improved water source, rural (% of rural population with access)',
*     'Improved water source, urban (% of urban population with access)',
*     'People practicing open defecation (% of population)',
*     'People practicing open defecation, rural (% of rural population)',
*     'People practicing open defecation, urban (% of urban population)'

In [54]:
RANK_ORDER_ASCENDING = [False, False, False, False, False, False, True, True, True]
INDICATOR_NAMES = [
    'Improved sanitation facilities (% of population with access)',
    'Improved sanitation facilities, rural (% of rural population with access)',
    'Improved sanitation facilities, urban (% of urban population with access)',
    'Improved water source (% of population with access)',
    'Improved water source, rural (% of rural population with access)',
    'Improved water source, urban (% of urban population with access)',
    'People practicing open defecation (% of population)',
    'People practicing open defecation, rural (% of rural population)',
    'People practicing open defecation, urban (% of urban population)'
]
RECENT_YEARS_WITH_DATA = [
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS
]

# Create profile
country_sanitation_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_sanitation_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Poor sanitary conditions score'
calc_final_rank(country_sanitation_profile, 'Poor sanitary conditions score')

In [55]:
country_sanitation_profile.sort_values(by=['Poor sanitary conditions score'], ascending=False)[['Country', 'Poor sanitary conditions score', 'Total Indicators']]

| | Country | Poor sanitary conditions score | Total Indicators |
|---|---|---|---|
| 218 | South Sudan | 0.973885 | 36 |
| 79 | Chad | 0.971066 | 36 |
| 159 | Madagascar | 0.960825 | 36 |
| 235 | Togo | 0.955781 | 36 |
| 102 | Eritrea | 0.953648 | 36 |
| 175 | Mozambique | 0.937924 | 36 |
| 225 | Sudan | 0.935301 | 27 |
| 152 | Liberia | 0.926894 | 36 |
| 210 | Sierra Leone | 0.919316 | 36 |
| 61 | Benin | 0.916067 | 36 |


#### Malnourishment prevalence

Next, I looked at the prevalence of malnourishment. The metrics used to produce the 'Malnourishment score':
*     'Malnutrition prevalence, height for age (% of children under 5)',
*     'Malnutrition prevalence, weight for age (% of children under 5)',
*     'Number of people who are undernourished',
*     'Prevalence of severe wasting, weight for height (% of children under 5)',
*     'Prevalence of wasting (% of children under 5)',
*     'Prevalence of undernourishment (% of population)'

In [58]:
RANK_ORDER_ASCENDING = [True, True, True, True, True, True]
INDICATOR_NAMES = [
    'Malnutrition prevalence, height for age (% of children under 5)',
    'Malnutrition prevalence, weight for age (% of children under 5)',
    'Number of people who are undernourished',
    'Prevalence of severe wasting, weight for height (% of children under 5)',
    'Prevalence of wasting (% of children under 5)',
    'Prevalence of undernourishment (% of population)'
]
RECENT_YEARS_WITH_DATA = [
    THREE_YEARS,
    THREE_YEARS,
    FOUR_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    FOUR_YEARS
]

# Create profile
country_malnourishment_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_malnourishment_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Malnourishment score'
calc_final_rank(country_malnourishment_profile, 'Malnourishment score')

Countries in South Asia (India and Bangladesh) are among those with the highest scores, which was not observed for the other indicators.

In [59]:
country_malnourishment_profile.sort_values(by=['Malnourishment score'], ascending=False)[['Country', 'Malnourishment score', 'Total Indicators']]

| | Country | Malnourishment score | Total Indicators |
|---|---|---|---|
| 225 | Sudan | 0.971008 | 4 |
| 84 | Comoros | 0.882308 | 4 |
| 104 | Ethiopia | 0.864959 | 12 |
| 34 | South Asia | 0.863291 | 12 |
| 19 | Least developed countries: UN classification | 0.856110 | 8 |
| 13 | Heavily indebted poor countries (HIPC) | 0.849175 | 8 |
| 189 | Pakistan | 0.844690 | 12 |
| 12 | Fragile and conflict affected situations | 0.841576 | 8 |
| 232 | Tanzania | 0.829464 | 8 |
| 56 | Bangladesh | 0.829000 | 16 |
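
The table above mixes individual countries with aggregate groupings such as 'South Asia'. A minimal sketch of filtering those rows out before display follows. `KNOWN_AGGREGATES` and `drop_aggregates` are hypothetical names, and the set lists only the aggregates visible in this section's output tables, so it is deliberately incomplete.

```python
import pandas as pd

# Aggregate rows seen in the output tables of this section; the full dataset
# contains more, so this set is incomplete by design.
KNOWN_AGGREGATES = {
    'South Asia',
    'Least developed countries: UN classification',
    'Heavily indebted poor countries (HIPC)',
    'Fragile and conflict affected situations',
    'Pre-demographic dividend',
}

def drop_aggregates(df, aggregates=KNOWN_AGGREGATES, country_col='Country'):
    # Keep only rows whose name is not a known aggregate grouping.
    return df[~df[country_col].isin(aggregates)]

# Three rows copied from the malnourishment table above.
sample = pd.DataFrame({
    'Country': ['Sudan', 'South Asia', 'Pakistan'],
    'Malnourishment score': [0.971008, 0.863291, 0.844690],
})
countries_only = drop_aggregates(sample)
print(countries_only)
```

Applied after scoring, this only changes what is displayed. Applying it before `calc_all_ranks` would also change the ranks themselves, because the aggregates would no longer take part in the ranking.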


#### Literacy and access to education

Next, I looked at the metrics for literacy, school enrollment and public education spending to calculate an impeded access to education score. The metrics used to produce the score:
*     'Literacy rate, adult total (% of people ages 15 and above)',
*     'Literacy rate, youth total (% of people ages 15-24)',
*     'Primary completion rate, total (% of relevant age group)',
*     'Public spending on education, total (% of GDP)',
*     'School enrollment, primary (% net)',
*     'School enrollment, secondary (% net)',
*     'School enrollment, tertiary (% gross)'

In [62]:
RANK_ORDER_ASCENDING = [False, False, False, False, False, False, False]
INDICATOR_NAMES = [
    'Literacy rate, adult total (% of people ages 15 and above)',
    'Literacy rate, youth total (% of people ages 15-24)',
    'Primary completion rate, total (% of relevant age group)',
    'Public spending on education, total (% of GDP)',
    'School enrollment, primary (% net)',
    'School enrollment, secondary (% net)',
    'School enrollment, tertiary (% gross)'
]
RECENT_YEARS_WITH_DATA = [
    FOUR_YEARS,
    FOUR_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS
]

# Create profile
country_education_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_education_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Impeded Access to Education Score'
calc_final_rank(country_education_profile, 'Impeded Access to Education Score')

Since less data was available for creating these scores (see the 'Total Indicators' column), both the top- and the bottom-scoring countries are somewhat surprising.

In [63]:
country_education_profile.sort_values(by=['Impeded Access to Education Score'], ascending=False)[['Country', 'Impeded Access to Education Score', 'Total Indicators']]

| | Country | Impeded Access to Education Score | Total Indicators |
|---|---|---|---|
| 171 | Monaco | 0.994253 | 3 |
| 218 | South Sudan | 0.987179 | 2 |
| 78 | Central African Republic | 0.982705 | 6 |
| 95 | Djibouti | 0.975713 | 4 |
| 207 | Senegal | 0.943955 | 8 |
| 79 | Chad | 0.940214 | 9 |
| 152 | Liberia | 0.919750 | 6 |
| 159 | Madagascar | 0.915240 | 11 |
| 166 | Mauritania | 0.912931 | 15 |
| 88 | Cote d'Ivoire | 0.907282 | 12 |
