# Mapping poverty around the world

> Kiva.org is an online crowdfunding platform to extend financial services to poor and financially excluded people around the world. [...] In Kaggle Datasets' inaugural Data Science for Good challenge, Kiva is inviting the Kaggle community to help them build more localized models to estimate the poverty levels of residents in the regions where Kiva has active loans.

This notebook explores Kiva's own regional mappings of the Multidimensional Poverty Index (MPI), as well as the national Rural and Urban MPIs from an external MPI dataset. In addition, some of the metrics contributing to the state of poverty are explored individually: population growth, access to healthcare, malnutrition prevalence, sanitary conditions, access to education, the Global Peace Index (i.e. the degree of military unrest in each country) and the World Happiness Index (i.e. how citizens of each country perceive their state of wellbeing). So far I have really enjoyed working with these datasets and have learned quite a bit about the world, thanks to Kiva and Kaggle. The modelling part is still to follow.

In [None]:
import numpy as np 
import pandas as pd 
import folium
from folium.plugins import HeatMap
import seaborn as sns
from functools import reduce
import geopandas as gpd

## Kiva's Dataset

In [None]:
kiva_mpi_region_locations = pd.read_csv('../input/data-science-for-good-kiva-crowdfunding/kiva_mpi_region_locations.csv')

In [None]:
def create_heatmap_coords(row, metric, coord1, coord2):
    return [row[coord1], row[coord2], row[metric]]

def create_coords(row, coord1, coord2):
    return [row[coord1], row[coord2]]

def create_heatmap_coords_column(df, col_title='heatmap_coords', metric='MPI', coord1='longitude', coord2='latitude'):
    df[col_title] = df.apply(lambda x: create_heatmap_coords(x, metric, coord1, coord2), axis=1)
    
def create_coords_column(df, coord1='longitude', coord2='latitude'):
    df['coords'] = df.apply(lambda x: create_coords(x, coord1, coord2), axis=1)
    
def get_coords_of_n_smallest(df, name='Country', metric='MPI', n=10):
    return np.array(df.sort_values(by=[metric])[['coords', name, metric]][:n])

def get_coords_of_n_largest(df, name='Country', metric='MPI', n=10):
    return np.array(df.sort_values(by=[metric])[['coords', name, metric]][-n:])

# Regions whose coordinates in the dataset are incorrect; they are left out of the map
REGIONS_WITH_BAD_COORDS = [
    'Lac, Chad',
    'Logone Occidental, Chad',
    'Logone Oriental, Chad',
    'Kanem, Chad',
    'Hama, Syrian Arab Republic',
    'Tortous, Syrian Arab Republic',
    'Gharbia, Egypt',
    'Matroh, Egypt',
    'Port Said, Egypt',
    'Bogota, Colombia',
    'Orinoquia Y Amazonia, Colombia',
    'Central-Eastern, Uzbekistan',
    'Southern, Uzbekistan',
    'Eastern, Uzbekistan',
    'St. Ann, Jamaica'
]

kiva_mpi_region_locations = kiva_mpi_region_locations.dropna()
kiva_mpi_region_locations = kiva_mpi_region_locations[
    ~kiva_mpi_region_locations['LocationName'].isin(REGIONS_WITH_BAD_COORDS)
].copy()
create_heatmap_coords_column(kiva_mpi_region_locations, coord1='lat', coord2='lon')
create_coords_column(kiva_mpi_region_locations, coord1='lat', coord2='lon')
lowest_MPI = get_coords_of_n_smallest(kiva_mpi_region_locations, name='LocationName')
highest_MPI = get_coords_of_n_largest(kiva_mpi_region_locations, name='LocationName')

### Heatmap of Kiva's MPI regional data

The heatmap shows areas with the lowest MPI in blue-green and those with the highest in red. The red markers are positioned at the 10 regions with the highest MPI and the green markers at the 10 regions with the lowest. Clicking on a marker displays the region name and its MPI. While working with this dataset I noticed that some of the regions have incorrect coordinates. For now, I have simply removed some of the affected rows to make the visualisation.


In [None]:
def add_markers_to_map(head_map, coords, metric='MPI', num_markers=10, color='green', icon='info-sign', prefix='glyphicon'):
    for i in range(num_markers):
        folium.Marker(
            location=coords[i][0],
            icon=folium.Icon(color=color, icon=icon, prefix=prefix),
            popup='{}, {} {}'.format(coords[i][1], metric, coords[i][2])
        ).add_to(head_map)

common_map = folium.Map(location=[10, 0], zoom_start=3)
hm = HeatMap(kiva_mpi_region_locations['heatmap_coords'], radius=15, blur=5)
hm.add_to(common_map)
add_markers_to_map(common_map, lowest_MPI)
add_markers_to_map(common_map, highest_MPI, color='red')
common_map
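
As far as I understand the plugin, `HeatMap` reads each point as `[latitude, longitude, weight]`, which is why the helpers above are called with `coord1='lat'` and `coord2='lon'`. Here is a minimal sketch of building the same triples without a row-wise `apply`. The frame `toy_regions` and all of its values are invented for illustration and are not taken from the Kiva dataset.

```python
import pandas as pd

# Toy data, NOT rows from kiva_mpi_region_locations
toy_regions = pd.DataFrame({
    'LocationName': ['Region A', 'Region B', 'Region C'],
    'lat': [12.5, -3.0, 45.0],
    'lon': [18.0, 36.5, 9.0],
    'MPI': [0.60, 0.25, 0.01],
})

# Selecting the columns in this order yields [lat, lon, weight] triples
heatmap_points = toy_regions[['lat', 'lon', 'MPI']].values.tolist()

# nsmallest / nlargest pick the rows with the extreme MPI values directly
lowest = toy_regions.nsmallest(2, 'MPI')['LocationName'].tolist()
highest = toy_regions.nlargest(1, 'MPI')['LocationName'].tolist()
```

The resulting `heatmap_points` list could be passed to `HeatMap` in place of the `heatmap_coords` column.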

According to this dataset, [Chad](https://en.wikipedia.org/wiki/Chad) and [Burkina Faso](https://en.wikipedia.org/wiki/Burkina_Faso) contain several of the most impoverished regions of the world.

## External datasets

### Multidimensional poverty measures (national)

In [None]:
mpi_national = pd.read_csv('../input/mpi/MPI_national.csv')
lat_long_info = pd.read_csv('../input/world-countries-and-continents-details/Countries Longitude and Latitude.csv')
countries = gpd.read_file('../input/countries-shape-files/ne_10m_admin_0_countries.shp')

In [None]:
country_mappings_a = {
    'United Kingdom': 'UK',
    'United States': 'US',
    'Venezuela, RB': 'Venezuela',
    'Yemen, Rep.': 'Yemen',
    'West Bank and Gaza': 'Palestine',
    'Korea, Rep.': 'South Korea',
    'Korea, Dem. People’s Rep.': 'North Korea',
    'Kyrgyz Republic': 'Kyrgyzstan',
    'Lao PDR': 'Laos',
    'St. Martin (French part)': 'St. Martin',
    'Syrian Arab Republic': 'Syria',
    'Micronesia, Fed. Sts.': 'Micronesia',
    'Russian Federation': 'Russia',
    'Macedonia, FYR': 'Macedonia',
    'Macao SAR, China': 'Macau',
    'Iran, Islamic Rep.': 'Iran',
    'Hong Kong SAR, China': 'Hong Kong',
    'Egypt, Arab Rep.': 'Egypt',
    'Virgin Islands (U.S.)': 'U.S. Virgin Islands',
    'Congo, Dem. Rep.': 'Congo - Kinshasa',
    'Congo, Rep.': 'Congo - Brazzaville',
    'Brunei Darussalam': 'Brunei',
    'Bahamas, The': 'Bahamas',
    'Gambia, The': 'Gambia'
}

country_mappings_b = {
    'Macedonia, The former Yugoslav Republic of': 'Macedonia',
    'Moldova, Republic of': 'Moldova',
    'Syrian Arab Republic': 'Syria',
    'Viet Nam': 'Vietnam',
    "Lao People's Democratic Republic": 'Laos',
    'Central African Republic': 'Central African Rep.',
    'Congo, Democratic Republic of the': 'Dem. Rep. Congo',
    'Congo, Republic of': 'Congo',
    "Cote d'Ivoire": "CÃ´te d'Ivoire",
    'Tanzania, United Republic of': 'Tanzania'
}

In [None]:
def rename_df_columns(df, current_names, new_names):
    assert len(current_names) == len(new_names)
    columns = {key: new_names[i] for i, key in enumerate(current_names)}
    return df.rename(index=str, columns=columns)

def merge_dfs_on_column(dfs, column_name='Country'):
    return reduce(lambda left,right: pd.merge(left,right,on=column_name), dfs)

# Helper function to shift the lat and long coords of markers to prevent positioning on top of each other.
# Note: df is modified in place, so repeated calls on the same DataFrame accumulate the shifts.
def shift_coords(df, amount_coord1, amount_coord2):
    df['latitude'] = df['latitude'] + amount_coord1
    df['longitude'] = df['longitude'] + amount_coord2
    return df

def update_country_names(df, mappings):
    # Update the country names before merge to avoid losing data
    for key in mappings.keys():
        df.loc[df.Country == key, 'Country'] = mappings[key]
    return df
    
def update_country_profile_with_coords(profile_df, mappings, coords_df, shift=True, amount_coord1=0, amount_coord2=0.5):
    profile_df = update_country_names(profile_df, mappings)  
    coords_df = shift_coords(coords_df, amount_coord1, amount_coord2) if shift else coords_df
    profile_updated = merge_dfs_on_column([coords_df, profile_df])
    return profile_updated

mpi_national['MPI Diff'] = mpi_national['MPI Rural'] - mpi_national['MPI Urban']
lat_long_info = rename_df_columns(lat_long_info, ['name'], ['Country'])
mpi_national_updated_a = update_country_profile_with_coords(mpi_national, country_mappings_a, lat_long_info, shift=False)
create_coords_column(mpi_national_updated_a)

countries = rename_df_columns(countries, ['NAME'], ['Country'])
countries.geometry = countries.geometry.simplify(0.3)
mpi_national_updated_b = update_country_profile_with_coords(mpi_national, country_mappings_b, countries, shift=False)

lowest_MPI_urban = get_coords_of_n_smallest(mpi_national_updated_a, metric='MPI Urban')
highest_MPI_urban = get_coords_of_n_largest(mpi_national_updated_a, metric='MPI Urban')
lowest_MPI_rural = get_coords_of_n_smallest(mpi_national_updated_a, metric='MPI Rural')
highest_MPI_rural = get_coords_of_n_largest(mpi_national_updated_a, metric='MPI Rural')
lowest_MPI_diff = get_coords_of_n_smallest(mpi_national_updated_a, metric='MPI Diff')
highest_MPI_diff = get_coords_of_n_largest(mpi_national_updated_a, metric='MPI Diff')
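
The country-name mappings above exist because an inner merge silently drops any country whose name is spelled differently in the two datasets. One way to find such names, sketched here under the assumption that both frames share a `Country` column, is an outer merge with pandas' `indicator=True` option. The two toy frames and their numbers are placeholders, not real coordinates or MPI values.

```python
import pandas as pd

# Placeholder frames: the numbers are arbitrary, only the names matter here
coords = pd.DataFrame({'Country': ['Laos', 'Syria', 'Chad'],
                       'latitude': [1.0, 2.0, 3.0],
                       'longitude': [4.0, 5.0, 6.0]})
profile = pd.DataFrame({'Country': ['Lao PDR', 'Syria', 'Chad'],
                        'MPI Urban': [0.1, 0.2, 0.3]})

# The '_merge' column records whether a row came from the left frame, the right frame, or both
check = pd.merge(coords, profile, on='Country', how='outer', indicator=True)

# Names that would be lost in an inner merge, and so need an entry in the mappings
unmatched_in_profile = check.loc[check['_merge'] == 'right_only', 'Country'].tolist()
unmatched_in_coords = check.loc[check['_merge'] == 'left_only', 'Country'].tolist()
```

Here `'Lao PDR'` and `'Laos'` show up as unmatched, which is the kind of pair the mapping dictionaries resolve.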

#### Mapping Urban MPI 

The choropleth map displays the global distribution of Urban MPI. Additionally, the red markers mark the 10 countries with the highest Urban MPI and the green markers mark the 10 countries with the lowest. When a marker is clicked, the country name and Urban MPI score are displayed. Just like Kiva's dataset, the national MPI dataset places the most impoverished regions of the world in Sub-Saharan Africa.

In [None]:
base_layer = countries.geometry.to_json()
mpi_layer = mpi_national_updated_b[['Country', 'geometry']].to_json()
style_function = lambda x: {'fillColor': '#ffffff', 'color': '#000000', 'weight' : 1}
urban_map = folium.Map(location=[10, 0], zoom_start=2)
folium.GeoJson(
    base_layer,
    name='base',
    style_function=style_function
).add_to(urban_map)

folium.Choropleth(
    geo_data=mpi_layer,
    name='mpi urban choropleth',
    key_on='properties.Country',
    fill_color='YlOrRd',
    data=mpi_national_updated_b,
    columns=['Country', 'MPI Urban'],
    legend_name='MPI Urban'
).add_to(urban_map)

folium.LayerControl().add_to(urban_map)
add_markers_to_map(urban_map, lowest_MPI_urban, metric='MPI Urban')
add_markers_to_map(urban_map, highest_MPI_urban, metric='MPI Urban', color='red')
urban_map

#### Mapping Rural MPI

The choropleth map displays the global distribution of Rural MPI. Additionally, the red markers mark the 10 countries with the highest Rural MPI and the green markers mark the 10 countries with the lowest. When a marker is clicked, the country name and Rural MPI score are displayed. The map is very similar to the Urban MPI map.

In [None]:
rural_map = folium.Map(location=[10, 0], zoom_start=2)
folium.GeoJson(
    base_layer,
    name='base',
    style_function=style_function
).add_to(rural_map)
folium.Choropleth(
    geo_data=mpi_layer,
    name='mpi rural choropleth',
    key_on='properties.Country',
    fill_color='YlOrRd',
    data=mpi_national_updated_b,
    columns=['Country', 'MPI Rural'],
    legend_name='MPI Rural'
).add_to(rural_map)

folium.LayerControl().add_to(rural_map)
add_markers_to_map(rural_map, lowest_MPI_rural, metric='MPI Rural')
add_markers_to_map(rural_map, highest_MPI_rural, metric='MPI Rural', color='red')
rural_map

#### Mapping differences between rural and urban MPIs

The choropleth map displays the global distribution of the difference between rural and urban MPI (rural minus urban). As before, the red markers mark the 10 countries with the biggest differences and the green markers mark the 10 countries with the smallest. It reveals that Sub-Saharan Africa is also the region with the greatest contrasts between rural and urban living conditions. The presence of such contrasts is yet another problematic factor: it might encourage mass migration from rural to urban areas, affecting food production and creating ever-growing slums around the cities.

In [None]:
diff_map = folium.Map(location=[10, 0], zoom_start=2)
folium.GeoJson(
    base_layer,
    name='base',
    style_function=style_function
).add_to(diff_map)

folium.Choropleth(
    geo_data=mpi_layer,
    name='mpi diff choropleth',
    key_on='properties.Country',
    fill_color='YlOrRd',
    data=mpi_national_updated_b,
    columns=['Country', 'MPI Diff'],
    legend_name='MPI Diff'
).add_to(diff_map)

folium.LayerControl().add_to(diff_map)
add_markers_to_map(diff_map, lowest_MPI_diff, metric='MPI Diff')
add_markers_to_map(diff_map, highest_MPI_diff, metric='MPI Diff', color='red')
diff_map

#### Pearson correlations of poverty indicators

Looking at the Pearson correlations between the various indicators in the Multidimensional Poverty Measures dataset, we can see that they are all very highly correlated. There is a correlation of nearly 1 between MPI Urban and Headcount Ratio Urban, as well as between MPI Rural and Headcount Ratio Rural, i.e. the higher the MPI, the greater the share of people in urban and rural areas who live below the poverty line.

In [None]:
pov_metrics = mpi_national_updated_a[['MPI Urban', 'Headcount Ratio Urban', 'Intensity of Deprivation Urban', 'MPI Rural', 'Headcount Ratio Rural', 'Intensity of Deprivation Rural']]
corr = pov_metrics.corr()
sns.heatmap(corr)
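
The near-perfect correlation is less surprising once you recall how the index is built. As far as I know, the MPI is defined as the product of the headcount ratio (H, the share of people who are multidimensionally poor) and the intensity of deprivation (A, the average share of weighted indicators in which poor people are deprived). Since a person only counts as poor when deprived in at least a third of the weighted indicators, A moves within a fairly narrow band and the MPI mostly follows H. A small worked example, assuming both columns are expressed in percent (an assumption about the units) and using illustrative numbers rather than rows from the dataset:

```python
def mpi_from_components(headcount_ratio_pct, intensity_pct):
    """MPI = H x A, with both inputs given in percent (assumed units)."""
    return (headcount_ratio_pct / 100) * (intensity_pct / 100)

# Illustrative values only
low_poverty = mpi_from_components(headcount_ratio_pct=10, intensity_pct=40)
high_poverty = mpi_from_components(headcount_ratio_pct=80, intensity_pct=55)

# H grows 8-fold between the two cases, A by less than 1.4-fold,
# so most of the change in MPI comes from the headcount ratio
ratio = high_poverty / low_poverty
```

With these numbers the MPI goes from roughly 0.04 to roughly 0.44, an 11-fold increase that is driven almost entirely by the headcount ratio.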

### Health, Nutrition and Population Statistics

In [None]:
health_nutr_pop = pd.read_csv('../input/health-nutrition-and-population-statistics/data.csv')

#### Population growth and birth control

I looked at metrics for population growth and population growth control for the years 2012 to 2015 (where data is available for a given year) from the Health, Nutrition and Population statistics dataset. I created a custom score (a kind of rank average) for each country to identify the countries with the greatest population growth and the poorest population growth management, and vice versa. The names of the metrics used to produce the score:
*  'Adolescent fertility rate (births per 1,000 women ages 15-19)',
*  'Condom use, population ages 15-24, female (% of females ages 15-24)',
*  'Condom use, population ages 15-24, male (% of males ages 15-24)',
*  'Contraceptive prevalence, modern methods (% of women ages 15-49)',
*  'Fertility rate, total (births per woman)',
*  'Population growth (annual %)',
*  'Rural population growth (annual %)',
*  'Urban population growth (annual %)',
*  'Birth rate, crude (per 1,000 people)',
*  'Condom use with non regular partner, % adults(15-49), female',
*  'Condom use with non regular partner, % adults(15-49), male',
*  'Demand for family planning satisfied by modern methods (% of married women with demand for family planning)'
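
To make the "rank average" idea concrete, here is a minimal sketch on a made-up three-country table. Each indicator is turned into a normalised rank between 0 and 1, with the ranking direction flipped for indicators where a higher value is better, and the ranks are then averaged per country. The column names, the `higher_is_worse` dictionary and all values are hypothetical.

```python
import pandas as pd

# Invented values for three fictional countries
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Fertility rate': [5.0, 2.0, 3.0],                # higher is worse
    'Contraceptive prevalence': [10.0, 60.0, 40.0],   # higher is better
})

# True: rank ascending, so the largest value gets the largest (worst) rank
higher_is_worse = {'Fertility rate': True, 'Contraceptive prevalence': False}

ranks = pd.DataFrame({'Country': toy['Country']})
for col, ascending in higher_is_worse.items():
    # Dividing by the number of non-missing values scales ranks into (0, 1]
    ranks[col] = toy[col].rank(ascending=ascending) / toy[col].count()

# Average over the available indicators; missing values are skipped
ranks['score'] = ranks[list(higher_is_worse)].mean(axis=1)
ordered = ranks.sort_values('score', ascending=False)['Country'].tolist()
```

Country A has both the highest fertility rate and the lowest contraceptive prevalence, so it ends up with the highest (worst) score.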

In [None]:
def extract_data_for_indicator(df, ind_name, years):
    return df[df['Indicator Name'] == ind_name][[*years, 'Country Name']]

def generate_new_column_names(ind_name, years):
    return ['{} - {}'.format(ind_name, year) for year in years]

def create_indicator_df(df, ind_name, years):
    new_df = extract_data_for_indicator(df, ind_name, years)
    return rename_df_columns(new_df, [*years, 'Country Name'], [*generate_new_column_names(ind_name, years), 'Country'])

def create_indicator_dfs(df, ind_names, years_arr):
    return [create_indicator_df(df, ind_name, years_arr[i]) for i, ind_name in enumerate(ind_names)]
    
def calc_rank(df, col_name, ascending):
    df[col_name] = df[col_name].rank(ascending=ascending)/df[col_name].count()

def calc_ranks(df, ind_name, years, ascending):
    col_names = generate_new_column_names(ind_name, years)
    for col_name in col_names:
        calc_rank(df, col_name, ascending)

def calc_all_ranks(df, ind_names, years_arr, sort_order_arr):
    for i, ind_name in enumerate(ind_names):
        calc_ranks(df, ind_name, years_arr[i], sort_order_arr[i])
        
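def calc_final_rank(df, rank_name, total_ind_col='Total Indicators'):
    # Sum only the rank columns: 'Country' holds strings and the indicator count is the divisor
    cols = [col for col in df.columns if col not in (total_ind_col, 'Country')]
    df[rank_name] = df[cols].sum(axis=1)/df[total_ind_col]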

def create_country_profile(df, ind_names, years_arr):
    # Combine all of the metrics under single profile
    profile = pd.DataFrame({'Country': df['Country Name'].unique()})
    profile = merge_dfs_on_column([profile, *create_indicator_dfs(df, ind_names, years_arr)])
    profile['Total Indicators'] = profile.count(axis=1)-1
    # Filter out countries/regions for which we have no information in any of the categories or we have information in only one of the categories
    return profile[profile['Total Indicators'] > 1].copy()

In [None]:
ONE_YEAR = ['2012']
TWO_YEARS = ['2012','2013']
THREE_YEARS = ['2012','2013', '2014']
FOUR_YEARS = ['2012','2013', '2014', '2015']
RANK_ORDER_ASCENDING = [True, False, False, False, True, True, True, True, True, False, False, False]
INDICATOR_NAMES = [
    'Adolescent fertility rate (births per 1,000 women ages 15-19)',
    'Condom use, population ages 15-24, female (% of females ages 15-24)',
    'Condom use, population ages 15-24, male (% of males ages 15-24)',
    'Contraceptive prevalence, modern methods (% of women ages 15-49)',
    'Fertility rate, total (births per woman)',
    'Population growth (annual %)',
    'Rural population growth (annual %)',
    'Urban population growth (annual %)',
    'Birth rate, crude (per 1,000 people)',
    'Condom use with non regular partner, % adults(15-49), female',
    'Condom use with non regular partner, % adults(15-49), male',
    'Demand for family planning satisfied by modern methods (% of married women with demand for family planning)'
]
RECENT_YEARS_WITH_DATA = [
    THREE_YEARS,
    TWO_YEARS,
    TWO_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS
]

# Create profile
country_population_rise_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_population_rise_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Poor population growth control score'
calc_final_rank(country_population_rise_profile, 'Poor population growth control score')

The number of metrics used to assess the score is shown in the 'Total Indicators' column: the higher this number, the more confident we can be about the calculated score. We can see that the countries with the highest MPI are also countries with poor population growth control. It is also interesting that the Mediterranean and the Baltics seem to be the places with the greatest population shrinkage.

In [None]:
country_population_rise_profile.sort_values(by=['Poor population growth control score'], ascending=False)[['Country', 'Poor population growth control score', 'Total Indicators']]

In [None]:
country_population_rise_profile_updated = update_country_profile_with_coords(country_population_rise_profile, country_mappings_a, lat_long_info)
create_coords_column(country_population_rise_profile_updated)
lowest_rank = get_coords_of_n_smallest(country_population_rise_profile_updated, metric='Poor population growth control score')
highest_rank = get_coords_of_n_largest(country_population_rise_profile_updated, metric='Poor population growth control score')

I added child markers for the 10 countries with the highest 'Poor population growth control score' (red markers) and the 10 countries with the lowest (blue markers). The markers are placed on top of the Urban MPI map, where the markers for the top and bottom Urban MPI countries are present as well. Move around the map and zoom in to see which countries have both kinds of marker.

In [None]:
add_markers_to_map(urban_map, lowest_rank,  metric='Poor population growth control score', color='blue', icon='child', prefix='fa')
add_markers_to_map(urban_map, highest_rank,  metric='Poor population growth control score', color='red', icon='child', prefix='fa')
urban_map
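
The markers for each score are nudged by a small offset so that they do not sit exactly on top of the MPI markers. The `shift_coords` helper applies that offset in place. As an alternative sketch (not what this notebook runs), the offset can be applied to a copy, so that each layer is measured from the original coordinates rather than from the previous layer's position. The function name `shifted_copy` and the toy frame are hypothetical.

```python
import pandas as pd

def shifted_copy(df, lat_shift=0.0, lon_shift=0.0):
    """Return a new DataFrame with offset coordinates; df itself is left untouched."""
    return df.assign(latitude=df['latitude'] + lat_shift,
                     longitude=df['longitude'] + lon_shift)

# Single made-up location
base = pd.DataFrame({'Country': ['X'], 'latitude': [10.0], 'longitude': [20.0]})

child_markers = shifted_copy(base, lon_shift=0.5)
ambulance_markers = shifted_copy(base, lat_shift=-0.5, lon_shift=-0.5)
```

With this approach the offsets passed for each marker layer have to differ from one another, since they no longer accumulate between calls.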

#### Healthcare access

Next, I looked at metrics for access to healthcare for the years 2012 to 2015 (where data is available for a given year) from the same Health, Nutrition and Population statistics dataset, and created the same kind of custom score for each country to identify the countries with the best and worst provisioned healthcare systems. The names of the metrics used to produce the 'Impeded access to healthcare score':
*     'Hospital beds (per 1,000 people)',
*     'Physicians (per 1,000 people)',
*     'Specialist surgical workforce (per 100,000 population)',
*     'Number of surgical procedures (per 100,000 population)',
*     'Births attended by skilled health staff (% of total)',
*     'External resources for health (% of total expenditure on health)',
*     'Health expenditure per capita (current US$)',
*     'Health expenditure, total (% of GDP)',
*     'Lifetime risk of maternal death (%)',
*     'Nurses and midwives (per 1,000 people)',
*     'Risk of impoverishing expenditure for surgical care (% of people at risk)',
*     'Risk of catastrophic expenditure for surgical care (% of people at risk)',
*     'Out-of-pocket health expenditure (% of total expenditure on health)',
*     'Health expenditure, private (% of total health expenditure)'

In [None]:
RANK_ORDER_ASCENDING = [False, False, False, False, False, True, False, False, True, False, True, True, True, True]
INDICATOR_NAMES = [
    'Hospital beds (per 1,000 people)',
    'Physicians (per 1,000 people)',
    'Specialist surgical workforce (per 100,000 population)',
    'Number of surgical procedures (per 100,000 population)',
    'Births attended by skilled health staff (% of total)',
    'External resources for health (% of total expenditure on health)',
    'Health expenditure per capita (current US$)',
    'Health expenditure, total (% of GDP)',
    'Lifetime risk of maternal death (%)',
    'Nurses and midwives (per 1,000 people)',
    'Risk of impoverishing expenditure for surgical care (% of people at risk)',
    'Risk of catastrophic expenditure for surgical care (% of people at risk)',
    'Out-of-pocket health expenditure (% of total expenditure on health)',
    'Health expenditure, private (% of total health expenditure)'
]
RECENT_YEARS_WITH_DATA = [
    ONE_YEAR,
    TWO_YEARS,
    THREE_YEARS,
    ONE_YEAR,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    FOUR_YEARS,
    TWO_YEARS,
    ['2014'],
    ['2014'],
    THREE_YEARS,
    THREE_YEARS
]

# Create profile
country_health_access_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_health_access_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Impeded access to healthcare score'
calc_final_rank(country_health_access_profile, 'Impeded access to healthcare score')

According to this estimate, South Sudan, Eritrea and the Central African Republic are amongst the countries with the worst access to healthcare, while the Netherlands, Sweden and Denmark are amongst those with the best.

In [None]:
country_health_access_profile.sort_values(by=['Impeded access to healthcare score'], ascending=False)[['Country', 'Impeded access to healthcare score', 'Total Indicators']]

In [None]:
country_health_access_profile_updated = update_country_profile_with_coords(country_health_access_profile, country_mappings_a, lat_long_info)
create_coords_column(country_health_access_profile_updated)
lowest_rank = get_coords_of_n_smallest(country_health_access_profile_updated, metric='Impeded access to healthcare score')
highest_rank = get_coords_of_n_largest(country_health_access_profile_updated, metric='Impeded access to healthcare score')

Once again, I add new markers for the 10 countries with the lowest and the highest scores (green and red ambulance markers, respectively) on top of the existing ones on the Urban MPI map. Zoom in to see which countries have several markers.

In [None]:
add_markers_to_map(urban_map, lowest_rank, metric='Impeded access to healthcare score', color='green', icon='ambulance', prefix='fa')
add_markers_to_map(urban_map, highest_rank, metric='Impeded access to healthcare score', color='red', icon='ambulance', prefix='fa')
urban_map 

#### Sanitation

Next, I looked at sanitary conditions and practices and at access to water in each country. The names of the metrics used to produce the 'Poor sanitary conditions score':
*     'Improved sanitation facilities (% of population with access)',
*     'Improved sanitation facilities, rural (% of rural population with access)',
*     'Improved sanitation facilities, urban (% of urban population with access)',
*     'Improved water source (% of population with access)',
*     'Improved water source, rural (% of rural population with access)',
*     'Improved water source, urban (% of urban population with access)',
*     'People practicing open defecation (% of population)',
*     'People practicing open defecation, rural (% of rural population)',
*     'People practicing open defecation, urban (% of urban population)'

In [None]:
RANK_ORDER_ASCENDING = [False, False, False, False, False, False, True, True, True]
INDICATOR_NAMES = [
    'Improved sanitation facilities (% of population with access)',
    'Improved sanitation facilities, rural (% of rural population with access)',
    'Improved sanitation facilities, urban (% of urban population with access)',
    'Improved water source (% of population with access)',
    'Improved water source, rural (% of rural population with access)',
    'Improved water source, urban (% of urban population with access)',
    'People practicing open defecation (% of population)',
    'People practicing open defecation, rural (% of rural population)',
    'People practicing open defecation, urban (% of urban population)'
]
RECENT_YEARS_WITH_DATA = [
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS,
    FOUR_YEARS
]

# Create profile
country_sanitation_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_sanitation_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Poor sanitary conditions score'
calc_final_rank(country_sanitation_profile, 'Poor sanitary conditions score')

In [None]:
country_sanitation_profile.sort_values(by=['Poor sanitary conditions score'], ascending=False)[['Country', 'Poor sanitary conditions score', 'Total Indicators']]

In [None]:
country_sanitation_profile_updated = update_country_profile_with_coords(country_sanitation_profile, country_mappings_a, lat_long_info, amount_coord1=-1, amount_coord2=-0.5)
create_coords_column(country_sanitation_profile_updated)
lowest_rank = get_coords_of_n_smallest(country_sanitation_profile_updated, metric='Poor sanitary conditions score')
highest_rank = get_coords_of_n_largest(country_sanitation_profile_updated, metric='Poor sanitary conditions score')

Red drop markers are shown for the 10 countries with the worst sanitary conditions and green drop markers for the 10 countries with the best.

In [None]:
add_markers_to_map(urban_map, lowest_rank, metric='Poor sanitary conditions score', color='green', icon='tint', prefix='fa')
add_markers_to_map(urban_map, highest_rank, metric='Poor sanitary conditions score', color='red', icon='tint', prefix='fa')
urban_map

#### Malnourishment prevalence

Next, I looked at the prevalence of malnourishment. The names of the metrics used to produce the 'Malnourishment score':
*     'Malnutrition prevalence, height for age (% of children under 5)',
*     'Malnutrition prevalence, weight for age (% of children under 5)',
*     'Number of people who are undernourished',
*     'Prevalence of severe wasting, weight for height (% of children under 5)',
*     'Prevalence of wasting (% of children under 5)',
*     'Prevalence of undernourishment (% of population)'

In [None]:
RANK_ORDER_ASCENDING = [True, True, True, True, True, True]
INDICATOR_NAMES = [
    'Malnutrition prevalence, height for age (% of children under 5)',
    'Malnutrition prevalence, weight for age (% of children under 5)',
    'Number of people who are undernourished',
    'Prevalence of severe wasting, weight for height (% of children under 5)',
    'Prevalence of wasting (% of children under 5)',
    'Prevalence of undernourishment (% of population)'
]
RECENT_YEARS_WITH_DATA = [
    THREE_YEARS,
    THREE_YEARS,
    FOUR_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    FOUR_YEARS
]

# Create profile
country_malnourishment_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_malnourishment_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the 'Malnourishment score'
calc_final_rank(country_malnourishment_profile, 'Malnourishment score')

Countries in South Asia (India and Bangladesh) are amongst those with the highest score, which was not observed for the other indicators.

In [None]:
country_malnourishment_profile.sort_values(by=['Malnourishment score'], ascending=False)[['Country', 'Malnourishment score', 'Total Indicators']]

In [None]:
country_malnourishment_profile_updated = update_country_profile_with_coords(country_malnourishment_profile, country_mappings_a, lat_long_info, amount_coord1=0.5, amount_coord2=-0.5)
create_coords_column(country_malnourishment_profile_updated)
lowest_rank = get_coords_of_n_smallest(country_malnourishment_profile_updated, metric='Malnourishment score')
highest_rank = get_coords_of_n_largest(country_malnourishment_profile_updated, metric='Malnourishment score')

Green apple markers are shown for the countries with the lowest prevalence of malnourishment and red apple markers for those with the highest.

In [None]:
add_markers_to_map(urban_map, lowest_rank, metric='Malnourishment score', color='green', icon='apple', prefix='fa')
add_markers_to_map(urban_map, highest_rank, metric='Malnourishment score', color='red', icon='apple', prefix='fa')
urban_map

#### Literacy and access to education

Next, I looked at the metrics for literacy, school enrollment and public education spending to calculate the access to education score. The names of the metrics used to produce the score:
*     'Literacy rate, adult total (% of people ages 15 and above)',
*     'Literacy rate, youth total (% of people ages 15-24)',
*     'Primary completion rate, total (% of relevant age group)',
*     'Public spending on education, total (% of GDP)',
*     'School enrollment, primary (% net)',
*     'School enrollment, secondary (% net)',
*     'School enrollment, tertiary (% gross)'

In [None]:
RANK_ORDER_ASCENDING = [False, False, False, False, False, False, False]
INDICATOR_NAMES = [
    'Literacy rate, adult total (% of people ages 15 and above)',
    'Literacy rate, youth total (% of people ages 15-24)',
    'Primary completion rate, total (% of relevant age group)',
    'Public spending on education, total (% of GDP)',
    'School enrollment, primary (% net)',
    'School enrollment, secondary (% net)',
    'School enrollment, tertiary (% gross)'
]
RECENT_YEARS_WITH_DATA = [
    FOUR_YEARS,
    FOUR_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS,
    THREE_YEARS
]

# Create profile
country_education_profile = create_country_profile(health_nutr_pop, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA)

# Turn numeric values into ranks 
calc_all_ranks(country_education_profile, INDICATOR_NAMES, RECENT_YEARS_WITH_DATA, RANK_ORDER_ASCENDING)

# Calculate the final 'Impdeded Access to Education Score'
calc_final_rank(country_education_profile, 'Impdeded Access to Education Score')
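
As a rough illustration of the rank-and-average idea behind this score, here is a minimal sketch on made-up data. It is not the implementation of `calc_all_ranks` or `calc_final_rank` (both are defined earlier in this notebook); the shortened indicator names, the `Score` column and all values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up indicator values; NaN marks a country with no recent data.
profile = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Literacy rate': [95.0, 60.0, np.nan, 80.0],
    'Primary completion rate': [99.0, 55.0, 70.0, np.nan],
})
indicators = ['Literacy rate', 'Primary completion rate']

# ascending=False gives rank 1 to the highest value, so a larger rank
# means worse access to education. Missing values stay NaN.
ranks = profile[indicators].rank(ascending=False)

# Count how many indicators actually contributed to each country's score.
profile['Total Indicators'] = profile[indicators].notna().sum(axis=1)

# Average the available ranks; mean() skips NaN by default.
profile['Score'] = ranks.mean(axis=1)

print(profile[['Country', 'Score', 'Total Indicators']])
```

In this toy example countries C and D get their score from a single indicator, which is one way a sparse profile can produce unexpected positions in the final ordering.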

Less data was available for these scores than for the previous ones (see the 'Total Indicators' column), so both the top and the bottom scoring countries are somewhat surprising.

In [None]:
country_education_profile.sort_values(by=['Impdeded Access to Education Score'], ascending=False)[['Country', 'Impdeded Access to Education Score', 'Total Indicators']]

In [None]:
country_education_profile_updated = update_country_profile_with_coords(country_education_profile, country_mappings_a, lat_long_info, amount_coord1=-0.5, amount_coord2=-0.5)
create_coords_column(country_education_profile_updated)
lowest_rank = get_coords_of_n_smallest(country_education_profile_updated, metric='Impdeded Access to Education Score')
highest_rank = get_coords_of_n_largest(country_education_profile_updated, metric='Impdeded Access to Education Score')

Green laptop markers show the countries with the smallest 'Impdeded Access to Education Score' and red laptop markers show the ones with the biggest.

In [None]:
add_markers_to_map(urban_map, lowest_rank, metric='Impdeded Access to Education Score', color='green', icon='laptop', prefix='fa')
add_markers_to_map(urban_map, highest_rank, metric='Impdeded Access to Education Score', color='red', icon='laptop', prefix='fa')
urban_map

### Global Peace Index

In [None]:
gpi = pd.read_csv('../input/gpi2008-2016/gpi_2008-2016.csv')

[Global Peace Index](https://en.wikipedia.org/wiki/Global_Peace_Index) combines 23 different metrics to assess the peacefulness of a country. They include Number and duration of internal conflicts, Intensity of organised internal conflict, Level of perceived criminality in society, Political instability, Impact of terrorism etc. Poverty is often a result of long-lasting military and political instability in a region and indeed the most impoverished regions in the world also have a long history of violence.
For the visualisation, I used the averages of GPI for the years 2012-2015 since that's the time period for which all of the above metrics were calculated.

In [None]:
country_mappings = {
    'United Kingdom': 'UK',
    'United States': 'US',
    'Ivory Coast': 'Côte d’Ivoire',
    'Democratic Republic of the Congo': 'Congo - Kinshasa',
    'Republic of the Congo': 'Congo - Brazzaville'
}
gpi = rename_df_columns(gpi, ['country'], ['Country'])
gpi = update_country_profile_with_coords(gpi, country_mappings, lat_long_info)
create_coords_column(gpi)
gpi['av_2012_2015'] = gpi[['score_2012', 'score_2013', 'score_2014', 'score_2015']].mean(axis=1)
lowest_score = get_coords_of_n_smallest(gpi, metric='av_2012_2015')
highest_score = get_coords_of_n_largest(gpi, metric='av_2012_2015')
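
To make the averaging step concrete, here is a small sketch on made-up scores. It uses plain pandas `nsmallest`/`nlargest` as a stand-in for the notebook's `get_coords_of_n_smallest` and `get_coords_of_n_largest` helpers, whose internals are not shown here; the values and country labels are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up GPI-style scores; a lower score means a more peaceful country.
scores = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'score_2012': [1.2, 3.0, 2.0],
    'score_2013': [1.4, np.nan, 2.2],
    'score_2014': [1.3, 3.4, 2.1],
    'score_2015': [1.3, 3.2, 2.1],
})
year_cols = ['score_2012', 'score_2013', 'score_2014', 'score_2015']

# mean(axis=1) averages across the year columns and skips NaN by default,
# so country B is averaged over its three available years.
scores['av_2012_2015'] = scores[year_cols].mean(axis=1)

# Smallest average = most peaceful, largest average = least peaceful.
most_peaceful = scores.nsmallest(1, 'av_2012_2015')
least_peaceful = scores.nlargest(1, 'av_2012_2015')
print(most_peaceful['Country'].tolist(), least_peaceful['Country'].tolist())
```

Because missing years are skipped rather than treated as zero, a country with one missing year still gets a comparable average.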

The light green bomb markers show the 10 most peaceful countries in the world and the purple bomb markers show the 10 least peaceful. The Middle East and Sub-Saharan Africa are the least peaceful regions. [This great talk](https://www.ted.com/talks/gary_slutkin_let_s_treat_violence_like_a_contagious_disease) by Gary Slutkin draws parallels between the spread of violence and the spread of contagious diseases. The good news is that it is not impossible to stop it.

In [None]:
add_markers_to_map(urban_map, lowest_score, metric='global peace index', color='lightgreen', icon='bomb', prefix='fa')
add_markers_to_map(urban_map, highest_score, metric='global peace index', color='purple', icon='bomb', prefix='fa')
urban_map 

### World Happiness Report

[World Happiness Report](http://worldhappiness.report/) quantifies the subjective perception of people's happiness in countries across the world. When it was launched it brought some surprises, showing that great economic conditions do not always mean great happiness. However, unsurprisingly, poverty and lack of peace take away people's happiness. Countries that are the most impoverished and/or the most volatile in a military sense are also the least happy ones.

In [None]:
whr_2015 = pd.read_csv('../input/world-happiness/2015.csv')

In [None]:
country_mappings = {
    'United Kingdom': 'UK',
    'United States': 'US',
    'Ivory Coast': 'Côte d’Ivoire',
    'Congo (Kinshasa)': 'Congo - Kinshasa',
    'Congo (Brazzaville)': 'Congo - Brazzaville',
    'Palestinian Territories': 'Palestine'
}
whi = update_country_profile_with_coords(whr_2015, country_mappings, lat_long_info, amount_coord1=-0.5, amount_coord2=0)
create_coords_column(whi)
lowest_rank = get_coords_of_n_smallest(whi, metric='Happiness Rank')
highest_rank = get_coords_of_n_largest(whi, metric='Happiness Rank')
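
The name mapping matters because each dataset spells some countries differently from the coordinate lookup. Below is a minimal sketch of that harmonise-then-join idea on made-up data. It is not the code of `update_country_profile_with_coords`; the `lookup_name` and `name` columns, the fictional country and all numbers are assumptions for illustration.

```python
import pandas as pd

# Illustrative stand-ins for the happiness data and the coordinate lookup
# (ranks and coordinates are placeholders, not real data).
report = pd.DataFrame({
    'Country': ['United Kingdom', 'Chad', 'Atlantis'],
    'Happiness Rank': [1, 2, 3],
})
coords = pd.DataFrame({
    'name': ['UK', 'Chad'],
    'latitude': [54.0, 15.0],
    'longitude': [-2.0, 19.0],
})
name_map = {'United Kingdom': 'UK'}

# Rename only the countries listed in the mapping; others are unchanged.
report['lookup_name'] = report['Country'].replace(name_map)

# A left join keeps every country; indicator=True flags unmatched names.
merged = report.merge(coords, how='left', left_on='lookup_name',
                      right_on='name', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Country'].tolist()
print(unmatched)
```

Listing the unmatched names after the join is a quick way to find countries that still need an entry in the mapping before they can be placed on the map.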

Dark green thumbs up markers show the 10 happiest countries in the world and dark purple thumbs down markers show the 10 least happy countries in the world.

In [None]:
add_markers_to_map(urban_map, lowest_rank, metric='Happiness Rank', color='darkgreen', icon='thumbs-up')
add_markers_to_map(urban_map, highest_rank, metric='Happiness Rank', color='darkpurple', icon='thumbs-down')
urban_map

### Conclusions

* [Chad](https://en.wikipedia.org/wiki/Chad) and [South Sudan](https://en.wikipedia.org/wiki/South_Sudan) are countries that unfortunately have nearly every red marker on the map. Regions in Chad were also identified by Kiva as some of the most impoverished in the world. 
* At the national scale, there are large contrasts between rural and urban MPI in Sub-Saharan Africa; at the international scale, there are large contrasts across multiple indicators between Europe and Sub-Saharan Africa.
* Long-term military unrest precipitates long-lasting poverty in a region.
* The rate of population growth is a strong indicator of poverty levels. 
* Poverty is a multidimensional problem that affects people's lives on multiple scales.

### Next steps

* Have a look at changes over time - the story might be less or more grim when the time dimension is added
* Drill down from national level to regional data where possible
* Do some modelling
* Make a plan to change the world