# Mapping global indicators of spatial accessibility with regard to policy relevant thresholds for population health and wellbeing

Carl Higgs, Global Indicators project, 2020

This notebook draws on spatial estimates from the Global Indicators project to estimate the percentage of the population living below, within and above policy-relevant threshold bounds for increased physical activity. The thresholds were derived by Ester Cerin using the IPEN study populations.

Specifically, the analysis and mapping is concerned with threshold values for urban design and transport planning features associated with
  - (A) ≥80% probability of engaging in walking for transport and 
  - (B) reaching the WHO’s target of a ≥15% relative reduction in insufficient physical activity through walking
  
Thresholds are presented as 95% credible interval (CrI) bounds. Each 250 m hexagonal grid cell across the urban portion of a city is classified as below, within or exceeding these bounds, based on the estimates derived in the main project outputs, and the population living in each category is summarised.
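
As a minimal sketch of this classification, hexagons can be binned against an interval with `pandas.cut` and their population shares summed per category. The column names `value` and `pop_est` and the four toy hexagons are illustrative assumptions, not project outputs; the interval is the scenario A population density interval used later in this notebook.

```python
import pandas as pd

# Toy hexagon data (hypothetical values, for illustration only)
hexes = pd.DataFrame({
    "value":   [3000, 5000, 6000, 8000],   # e.g. population per km²
    "pop_est": [100, 200, 300, 400],       # estimated population per hexagon
})

lower, upper = 4790, 6750  # 95% CrI bounds, scenario A (population density)

# Bin edges: below the lower bound, within the interval, above the upper bound
bins = [float("-inf"), lower, upper, float("inf")]
labels = ["below", "within", "exceeds"]
hexes["category"] = pd.cut(hexes["value"], bins=bins, labels=labels)

# Percentage of the total population living in hexagons of each category
pop_pct = (
    100 * hexes.groupby("category", observed=False)["pop_est"].sum()
    / hexes["pop_est"].sum()
).round(1)
print(pop_pct)
```

With these toy values, 10% of the population falls below the interval, 50% within it and 40% above it. The open-ended bin edges are a simplification; the analysis loop further down builds its edges from the observed minimum and maximum instead.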



## Import libraries and city specific parameters

In [1]:
import os
import numpy as np
import fiona
import pandas as pd
import geopandas as gpd
import argparse
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
from mpl_toolkits.axes_grid1.anchored_artists import AnchoredSizeBar
import matplotlib.font_manager as fm
from textwrap import wrap
from matplotlib.backends.backend_pdf import PdfPages
import json
with open('../process/configuration/cities.json') as f:
  city_data = json.load(f)
# gtfs_config.py is expected to define GTFS and dissolve_cities, which are used below
exec(open('../process/data/GTFS/gtfs_config.py').read())

def valid_path(arg):
    arg = os.path.abspath(arg)
    if not os.path.exists(arg):
        msg = f"The path {arg} does not exist!"
        raise argparse.ArgumentTypeError(msg)
    else:
        return arg


import warnings
# filter out RuntimeWarnings, due to geopandas/fiona read file spam
# https://stackoverflow.com/questions/64995369/geopandas-warning-on-read-file
warnings.filterwarnings("ignore",category=RuntimeWarning)


## Import data

In [2]:
# Parse input arguments
# parser = argparse.ArgumentParser(description='Analyse processed results with regard to thresholds')
# parser.add_argument('-gpkg_cities',
#                     help='path to all cities summary results geopackage',
#                     default='./data/output/November 2020/global_indicators_city_2020-11-24.gpkg',
#                     type=valid_path)
# parser.add_argument('-gpkg_hexes',
#                     help='path to all cities hexagon grid results geopackage',
#                     default='./data/output/November 2020/global_indicators_hex_250m_2020-11-24.gpkg',
#                     type=valid_path)
# args = parser.parse_args()

# dummy parsing for interactive debugging
class Object(object):
    pass

args = Object()
args.gpkg_cities = os.path.abspath('../process/data/output/global_indicators_city_2021-06-21.gpkg')
args.gpkg_hexes = os.path.abspath('../process/data/output/global_indicators_hex_250m_2021-06-21.gpkg')

cities = gpd.read_file(args.gpkg_cities, layer='all_cities_combined')
cities.set_index('City',inplace=True)
# cities

In [3]:
hexes={}
for city in cities.index:
    hexes[city] = gpd.read_file(args.gpkg_hexes, layer=city.lower().replace(' ','_'))

In [4]:
hexes.keys()

dict_keys(['Maiduguri', 'Mexico City', 'Baltimore', 'Phoenix', 'Seattle', 'Sao Paulo', 'Hong Kong', 'Chennai', 'Bangkok', 'Hanoi', 'Adelaide', 'Melbourne', 'Sydney', 'Auckland', 'Graz', 'Ghent', 'Olomouc', 'Odense', 'Cologne', 'Lisbon', 'Barcelona', 'Valencia', 'Vic', 'Bern', 'Belfast'])

In [5]:
# Calculate public transport density for hexagons, required for one scenario
gtfs_analysis_date = '2021-06-16'
gtfs_gpkg = f'../process/data/GTFS/gtfs_frequent_transit_headway_{gtfs_analysis_date}_python.gpkg'

In [6]:
gtfs_gpkg

'../process/data/GTFS/gtfs_frequent_transit_headway_2021-06-16_python.gpkg'

In [7]:
points_in_polys = {}
point_hexes={}
for city in hexes.keys():
    _city_ = city.lower().replace(' ','_')
    if GTFS[_city_]==[]:
        # no GTFS feed configured: fall back to OSM public transport stops
        transport_data = f"../process/data/output/{city_data['gpkgNames'][_city_]}"
        osm_layer = 'destinations'
        points_in_polys[city] = gpd.read_file(transport_data, layer=osm_layer)
        points_in_polys[city] = points_in_polys[city].query('dest_name_full =="Public transport stop (any)"')
    else:
        if _city_ in dissolve_cities:
            gtfs_layer = f"{_city_}_stops_average_feeds_headway_{GTFS[_city_][-1]['start_date_mmdd']}_{GTFS[_city_][-1]['end_date_mmdd']}"
        else:
            gtfs_layer = f"{_city_}_stops_headway_{GTFS[_city_][-1]['start_date_mmdd']}_{GTFS[_city_][-1]['end_date_mmdd']}"
        points_in_polys[city] = gpd.read_file(gtfs_gpkg,layer=gtfs_layer)
        
    points_in_polys[city] = gpd.sjoin(points_in_polys[city],hexes[city],how='left', op='within')
    points_in_polys[city] = points_in_polys[city]['index_right'].dropna().astype(int)
    points_in_polys[city] = points_in_polys[city].reset_index().groupby('index_right').count().reset_index()
    points_in_polys[city].columns = ['index','pt_stops']
    point_hexes[city] = hexes[city].join(points_in_polys[city].set_index('index'),how='left').copy()
    point_hexes[city]['pt_stops_per_sqkm'] = point_hexes[city]['pt_stops']/hexes[city]['area_sqkm']
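
The counting step above can be sketched without any spatial libraries, assuming the spatial join has already produced an `index_right` column identifying the containing hexagon for each stop. The two small tables here are hypothetical, and `value_counts` is used as a shorthand for the `groupby(...).count()` chain in the cell above.

```python
import pandas as pd

# Hypothetical hexagon table, indexed by hexagon id, with area in km²
hexes = pd.DataFrame({"area_sqkm": [0.05, 0.05, 0.05]}, index=[0, 1, 2])

# Hypothetical spatial-join result: one row per stop, holding the index of
# the hexagon containing it (NaN where a stop falls outside every hexagon)
stops = pd.DataFrame({"index_right": [0.0, 0.0, 2.0, None]})

# Count stops per hexagon, dropping stops that matched no hexagon
counts = (
    stops["index_right"].dropna().astype(int)
    .value_counts()
    .rename("pt_stops")
)

# Left-join the counts so hexagons without stops are retained (as NaN)
hex_stops = hexes.join(counts, how="left")
hex_stops["pt_stops_per_sqkm"] = hex_stops["pt_stops"] / hex_stops["area_sqkm"]
print(hex_stops)
```

Hexagons without any stops keep a missing value rather than zero, as in the cell above; `fillna(0)` could be applied if a zero density is the intended reading.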
                                          

## Scientific colour mapping code

The cell below uses the batlow colour map from Fabio Crameri's [Scientific colour maps, Version 7.0.0 (02.02.2021, scm-vn7.0)](https://www.fabiocrameri.ch/colourmaps/). Specifically, the values from the file 'ScientificColourMaps7\batlow\batlow.txt' are used as text input.

Crameri, F. (2018). Scientific colour maps. Zenodo. http://doi.org/10.5281/zenodo.1243862

Crameri, F. (2018), Geodynamic diagnostics, scientific visualisation and StagLab 3.0, Geosci. Model Dev., 11, 2541-2562, doi:10.5194/gmd-11-2541-2018

Crameri, F., G.E. Shephard, and P.J. Heron (2020), The misuse of colour in science communication, Nature Communications, 11, 5444. doi:10.1038/s41467-020-19160-7

The Scientific colour map values for the batlow scale were used under MIT License terms, as below

Copyright (c) 2021 Fabio Crameri

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

In [8]:
from io import StringIO

batlow = StringIO("""0.005193 0.098238 0.349842\n0.009065 0.104487 0.350933\n0.012963 0.110779 0.351992\n0.016530 0.116913 0.353070\n0.019936 0.122985 0.354120\n0.023189 0.129035 0.355182\n0.026291 0.135044 0.356210\n0.029245 0.140964 0.357239\n0.032053 0.146774 0.358239\n0.034853 0.152558 0.359233\n0.037449 0.158313 0.360216\n0.039845 0.163978 0.361187\n0.042104 0.169557 0.362151\n0.044069 0.175053 0.363084\n0.045905 0.180460 0.364007\n0.047665 0.185844 0.364915\n0.049378 0.191076 0.365810\n0.050795 0.196274 0.366684\n0.052164 0.201323 0.367524\n0.053471 0.206357 0.368370\n0.054721 0.211234 0.369184\n0.055928 0.216046 0.369974\n0.057033 0.220754 0.370750\n0.058032 0.225340 0.371509\n0.059164 0.229842 0.372252\n0.060167 0.234299 0.372978\n0.061052 0.238625 0.373691\n0.062060 0.242888 0.374386\n0.063071 0.247085 0.375050\n0.063982 0.251213 0.375709\n0.064936 0.255264 0.376362\n0.065903 0.259257 0.376987\n0.066899 0.263188 0.377594\n0.067921 0.267056 0.378191\n0.069002 0.270922 0.378774\n0.070001 0.274713 0.379342\n0.071115 0.278497 0.379895\n0.072192 0.282249 0.380434\n0.073440 0.285942 0.380957\n0.074595 0.289653 0.381452\n0.075833 0.293321 0.381922\n0.077136 0.296996 0.382376\n0.078517 0.300622 0.382814\n0.079984 0.304252 0.383224\n0.081553 0.307858 0.383598\n0.083082 0.311461 0.383936\n0.084778 0.315043 0.384240\n0.086503 0.318615 0.384506\n0.088353 0.322167 0.384731\n0.090281 0.325685 0.384910\n0.092304 0.329220 0.385040\n0.094462 0.332712 0.385116\n0.096618 0.336161 0.385134\n0.099015 0.339621 0.385090\n0.101481 0.343036 0.384981\n0.104078 0.346410 0.384801\n0.106842 0.349774 0.384548\n0.109695 0.353098 0.384217\n0.112655 0.356391 0.383807\n0.115748 0.359638 0.383310\n0.118992 0.362849 0.382713\n0.122320 0.366030 0.382026\n0.125889 0.369160 0.381259\n0.129519 0.372238 0.380378\n0.133298 0.375282 0.379395\n0.137212 0.378282 0.378315\n0.141260 0.381240 0.377135\n0.145432 0.384130 0.375840\n0.149706 0.386975 0.374449\n0.154073 0.389777 0.372934\n0.158620 0.392531 
0.371320\n0.163246 0.395237 0.369609\n0.167952 0.397889 0.367784\n0.172788 0.400496 0.365867\n0.177752 0.403041 0.363833\n0.182732 0.405551 0.361714\n0.187886 0.408003 0.359484\n0.193050 0.410427 0.357177\n0.198310 0.412798 0.354767\n0.203676 0.415116 0.352253\n0.209075 0.417412 0.349677\n0.214555 0.419661 0.347019\n0.220112 0.421864 0.344261\n0.225707 0.424049 0.341459\n0.231362 0.426197 0.338572\n0.237075 0.428325 0.335634\n0.242795 0.430418 0.332635\n0.248617 0.432493 0.329571\n0.254452 0.434529 0.326434\n0.260320 0.436556 0.323285\n0.266241 0.438555 0.320085\n0.272168 0.440541 0.316831\n0.278171 0.442524 0.313552\n0.284175 0.444484 0.310243\n0.290214 0.446420 0.306889\n0.296294 0.448357 0.303509\n0.302379 0.450282 0.300122\n0.308517 0.452205 0.296721\n0.314648 0.454107 0.293279\n0.320834 0.456006 0.289841\n0.327007 0.457900 0.286377\n0.333235 0.459794 0.282937\n0.339469 0.461685 0.279468\n0.345703 0.463563 0.275998\n0.351976 0.465440 0.272492\n0.358277 0.467331 0.269037\n0.364589 0.469213 0.265543\n0.370922 0.471085 0.262064\n0.377291 0.472952 0.258588\n0.383675 0.474842 0.255131\n0.390070 0.476711 0.251665\n0.396505 0.478587 0.248212\n0.402968 0.480466 0.244731\n0.409455 0.482351 0.241314\n0.415967 0.484225 0.237895\n0.422507 0.486113 0.234493\n0.429094 0.488011 0.231096\n0.435714 0.489890 0.227728\n0.442365 0.491795 0.224354\n0.449052 0.493684 0.221074\n0.455774 0.495585 0.217774\n0.462539 0.497497 0.214518\n0.469368 0.499393 0.211318\n0.476221 0.501314 0.208148\n0.483123 0.503216 0.205037\n0.490081 0.505137 0.201976\n0.497089 0.507058 0.198994\n0.504153 0.508984 0.196118\n0.511253 0.510898 0.193296\n0.518425 0.512822 0.190566\n0.525637 0.514746 0.187990\n0.532907 0.516662 0.185497\n0.540225 0.518584 0.183099\n0.547599 0.520486 0.180884\n0.555024 0.522391 0.178854\n0.562506 0.524293 0.176964\n0.570016 0.526186 0.175273\n0.577582 0.528058 0.173775\n0.585199 0.529927 0.172493\n0.592846 0.531777 0.171449\n0.600520 0.533605 0.170648\n0.608240 0.535423 
0.170104\n0.615972 0.537231 0.169826\n0.623739 0.539002 0.169814\n0.631513 0.540752 0.170075\n0.639301 0.542484 0.170622\n0.647098 0.544183 0.171465\n0.654889 0.545863 0.172603\n0.662691 0.547503 0.174044\n0.670477 0.549127 0.175747\n0.678244 0.550712 0.177803\n0.685995 0.552274 0.180056\n0.693720 0.553797 0.182610\n0.701421 0.555294 0.185478\n0.709098 0.556772 0.188546\n0.716731 0.558205 0.191851\n0.724322 0.559628 0.195408\n0.731878 0.561011 0.199174\n0.739393 0.562386 0.203179\n0.746850 0.563725 0.207375\n0.754268 0.565033 0.211761\n0.761629 0.566344 0.216322\n0.768942 0.567630 0.221045\n0.776208 0.568899 0.225930\n0.783416 0.570162 0.230962\n0.790568 0.571421 0.236160\n0.797665 0.572682 0.241490\n0.804709 0.573928 0.246955\n0.811692 0.575187 0.252572\n0.818610 0.576462 0.258303\n0.825472 0.577725 0.264197\n0.832272 0.579026 0.270211\n0.838999 0.580339 0.276353\n0.845657 0.581672 0.282631\n0.852247 0.583037 0.289036\n0.858747 0.584440 0.295572\n0.865168 0.585882 0.302255\n0.871505 0.587352 0.309112\n0.877741 0.588873 0.316081\n0.883878 0.590450 0.323195\n0.889900 0.592087 0.330454\n0.895809 0.593765 0.337865\n0.901590 0.595507 0.345429\n0.907242 0.597319 0.353142\n0.912746 0.599191 0.360986\n0.918103 0.601126 0.368999\n0.923300 0.603137 0.377139\n0.928323 0.605212 0.385404\n0.933176 0.607369 0.393817\n0.937850 0.609582 0.402345\n0.942332 0.611867 0.411006\n0.946612 0.614218 0.419767\n0.950697 0.616649 0.428624\n0.954574 0.619137 0.437582\n0.958244 0.621671 0.446604\n0.961696 0.624282 0.455702\n0.964943 0.626934 0.464860\n0.967983 0.629639 0.474057\n0.970804 0.632394 0.483290\n0.973424 0.635183 0.492547\n0.975835 0.638012 0.501826\n0.978052 0.640868 0.511090\n0.980079 0.643752 0.520350\n0.981918 0.646664 0.529602\n0.983574 0.649590 0.538819\n0.985066 0.652522 0.547998\n0.986392 0.655470 0.557142\n0.987567 0.658422 0.566226\n0.988596 0.661378 0.575265\n0.989496 0.664329 0.584246\n0.990268 0.667280 0.593174\n0.990926 0.670230 0.602031\n0.991479 0.673165 
0.610835\n0.991935 0.676091 0.619575\n0.992305 0.679007 0.628251\n0.992595 0.681914 0.636869\n0.992813 0.684815 0.645423\n0.992967 0.687705 0.653934\n0.993064 0.690579 0.662398\n0.993111 0.693451 0.670810\n0.993112 0.696314 0.679177\n0.993074 0.699161 0.687519\n0.993002 0.702006 0.695831\n0.992900 0.704852 0.704114\n0.992771 0.707689 0.712380\n0.992619 0.710530 0.720639\n0.992447 0.713366 0.728892\n0.992258 0.716210 0.737146\n0.992054 0.719049 0.745403\n0.991837 0.721893 0.753673\n0.991607 0.724754 0.761959\n0.991367 0.727614 0.770270\n0.991116 0.730489 0.778606\n0.990855 0.733373 0.786976\n0.990586 0.736265 0.795371\n0.990307 0.739184 0.803810\n0.990018 0.742102 0.812285\n0.989720 0.745039 0.820804\n0.989411 0.747997 0.829372\n0.989089 0.750968 0.837979\n0.988754 0.753949 0.846627\n0.988406 0.756949 0.855332\n0.988046 0.759964 0.864078\n0.987672 0.762996 0.872864\n0.987280 0.766047 0.881699\n0.986868 0.769105 0.890573\n0.986435 0.772184 0.899493\n0.985980 0.775272 0.908448\n0.985503 0.778378 0.917444\n0.985002 0.781495 0.926468\n0.984473 0.784624 0.935531\n0.983913 0.787757 0.944626\n0.983322 0.790905 0.953748\n0.982703 0.794068 0.962895\n0.982048 0.797228 0.972070\n0.981354 0.800406 0.981267\n""")
cm_data = np.loadtxt(batlow)
from matplotlib.colors import LinearSegmentedColormap
batlow_map = LinearSegmentedColormap.from_list("batlow", cm_data[::-1])
batlow_map_4 = LinearSegmentedColormap.from_list("batlow", cm_data,4)
batlow_map_R = batlow_map.reversed()
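
For illustration, the parsing step can be sketched with a three-row stand-in for the colour table (the first, one intermediate and the last row of the values above). `np.loadtxt` expects every row to carry the same number of columns, so each RGB triplet has to sit on a single line of the text input. The resampling with `np.interp` is only a conceptual stand-in for linear interpolation between anchor colours; it is an assumption of this sketch, not matplotlib's implementation.

```python
from io import StringIO
import numpy as np

# Three rows taken from the batlow table (first, one intermediate, last)
sample = StringIO(
    "0.005193 0.098238 0.349842\n"
    "0.511253 0.510898 0.193296\n"
    "0.981354 0.800406 0.981267\n"
)
rgb = np.loadtxt(sample)      # shape (3, 3): one RGB triplet per row
rgb_reversed = rgb[::-1]      # reversed colour order, as used for batlow_map

# Resample each channel to n evenly spaced colours by linear interpolation
n = 5
positions = np.linspace(0, 1, len(rgb))
targets = np.linspace(0, 1, n)
resampled = np.column_stack(
    [np.interp(targets, positions, rgb[:, channel]) for channel in range(3)]
)
print(resampled.round(3))
```

The first, middle and last rows of `resampled` reproduce the three anchor colours, with interpolated colours in between.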

## Scenario set up for threshold analyses

In [9]:
# Analysis set up
scenarios={
  'A':'≥80% probability of engaging in walking for transport', 
  'B':'reaching the WHO’s target of a ≥15% relative reduction in insufficient physical activity through walking',
  'distances':'distances to destinations, measured up to a maximum distance target threshold of 500 metres'
}
scenario_style = {
    'A':{'colour':batlow_map(170),'line':'dashed','align':0.96},
    'B':{'colour':batlow_map(0),'line':'solid','align':0.93},
    'distances':{'colour':batlow_map(170),'line':'dashed','align':0.96},
    }
greq = '≥'
thresholds = {
'Mean 1000 m neighbourhood population per km²':{
  'data':'hexes', # the geopackage (hexes or points)
  'variable':'local_nh_population_density', # variable; a list is required if a function is specified
  'polarity':'positive', # which is better: more (positive)? or less (negative)?
  'scenarios':{
      'A':{
        'threshold':5665, # not used; we plot the interval
        'comparison':'>', # direction in which to evaluate success (e.g. is the aim to be greater than or less than the threshold?)
        'interval':(4790, 6750),
        'interval_type':'95% CrI'
        },
      'B':{
        'threshold':6491,
        'comparison':'>',
        'interval':(5677, 7823),
        'interval_type':'95% CrI' 
        }
  }
},
'Mean 1000 m neighbourhood street intersections per km²':{
  'data':'hexes',
  'variable':'local_nh_intersection_density',
  'polarity':'positive',
  'scenarios':{
      'A':{
        'threshold':98,
        'comparison':'>',
        'interval':(90, 110),
        'interval_type':'95% CrI'
        },
      'B':{
        'threshold':122,
        'comparison':'>',
        'interval':(106, 156),
        'interval_type':'95% CrI'
        }
  }
},
'Distance to nearest public transport stops (m; up to 500m)':{
  'data':'points',
  'layer':'samplePointsData',
  'point_function':np.nanmin, # take the minimum of the OSM and GTFS pt data sources, axis=1 w/ fillna w/ np.nan
  'variable':['sp_nearest_node_pt_osm_any','sp_nearest_node_pt_gtfs_any'],
  'truncate_cutoff':500, # distance measures are only formally measured up to 500m, however truncation at 500 is required 
                         # for neatness when plotting continuous distribution due to full distance measurement method
  'polarity':'negative', # shorter distance is assumed to be better, so polarity is negative
  'scenarios':{
      'distances':{
        'threshold':400,
        'comparison':'<',
        'interval':(300,500),
        'interval_type':'distance (m)',
        # 'statistic':'pop_pct_access_500m_pt_any_binary',
        }
  }
},
'Distance to nearest park (m; up to 500m)':{
  'data':'points',
  'layer':'samplePointsData',
  'variable':'sp_nearest_node_public_open_space_any',
  'truncate_cutoff':500,
  'polarity':'negative',
  'scenarios':{
      'distances':{
        'threshold':400,
        'comparison':'<',
        'interval':(300,500),
        'interval_type':'distance (m)',
        # 'statistic':'pop_pct_access_500m_public_open_space_any_binary',
        }
  }
}}
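
The distance indicators are only measured up to 500 m, so missing values and values beyond the cutoff need a common placeholder before plotting. A minimal sketch of that idea follows, using hypothetical distances from two sources. The value 650 mirrors the placeholder used in the analysis loop of this notebook, and `DataFrame.min(axis=1)` is used here in place of `np.nanmin`; both skip missing values.

```python
import numpy as np
import pandas as pd

cutoff = 500        # distances are formally measured up to 500 m
placeholder = 650   # stand-in value representing the '> 500 m' category

# Hypothetical distances (m) from two sources; NaN means no stop was found
points = pd.DataFrame({
    "osm":  [120.0, np.nan, 480.0, np.nan, 700.0],
    "gtfs": [300.0, 90.0, np.nan, np.nan, 900.0],
})

# Row-wise minimum that ignores NaN unless both sources are missing
nearest = points.min(axis=1)

# Treat missing and beyond-cutoff distances as the same '> 500 m' category
nearest = nearest.mask(nearest.isna() | (nearest > cutoff), placeholder)
print(nearest.tolist())  # [120.0, 90.0, 480.0, 650.0, 650.0]
```

Using a single placeholder beyond the cutoff keeps all "more than 500 m" cases in one histogram bar, at the cost of no longer distinguishing unmeasured from merely distant locations.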



## Analysis loop

In [10]:
fontprops = fm.FontProperties(size=8)
# for city in ['Odense']:
for city in hexes.keys():
    print(city)
    study_region = cities.query(f'index=="{city}"').to_crs(hexes[city].crs).copy()
    bounds = study_region.bounds
    width = (bounds['maxx'].values[0]-bounds['minx'].values[0])
    height = (bounds['maxy'].values[0]-bounds['miny'].values[0])
    statistics = []
    # create a PdfPages object for file output
    os.makedirs('./reports', exist_ok=True)
    with PdfPages(f'reports/{city}_threshold_summary.pdf') as pdf:
        for indicator in thresholds.keys():
            data = thresholds[indicator]['data']
            variable = thresholds[indicator]['variable']
            indicator_scenarios = list(thresholds[indicator]['scenarios'].keys())
            this_scenario = [s for s in scenarios.keys() if s in indicator_scenarios]
            polarity = thresholds[indicator]['polarity']
            # adjust colour scales for indicator polarities (more blue is better, or meeting achievements)
            if polarity == 'negative':
                cmap = batlow_map
                cmap_r = batlow_map_R
            else:
                cmap = batlow_map_R
                cmap_r = batlow_map
            # Aggregate point data (e.g. distances) to hexes
            if data == 'points':
                layer = thresholds[indicator]['layer']
                points = gpd.read_file(f"../process/data/output/{city_data['gpkgNames'][city.lower().replace(' ','_')]}",layer=layer)
                if 'point_function' in thresholds[indicator].keys():
                    points[''.join(variable)]=points[variable].apply(lambda x: thresholds[indicator]['point_function'](x.fillna(value=np.nan)),axis=1)
                    variable = ''.join(variable)
                if 'distances' in this_scenario:
                    # ensure NA values (indicative of undefined access, i.e. > 500m) are accounted for with a reasonable approximation
                    points[variable] = points[variable].mask(points[variable].isna(), 650)
                    # ensure this hex variable doesn't exist, eg as a result of debugging code
                    point_hexes[city] = point_hexes[city][[c for c in point_hexes[city].columns if c!=variable]]
                    point_hexes[city][variable] = point_hexes[city].merge(points.groupby('hex_id')[variable].mean().reset_index(),
                                                              left_on='index', 
                                                              right_on='hex_id')[variable]
                    # fix distances > 500m to 650m, to facilitate plotting of '> 500m' category, if not sorted in point function loop
                    point_hexes[city][variable] = point_hexes[city][variable].mask(point_hexes[city][variable] > 500, 650)
                data = 'hexes'
            # Process maps for indicators using the hex data
            if data == 'hexes':
                if 'hex_function' in thresholds[indicator].keys():
                    point_hexes[city][variable]=point_hexes[city][variable].apply(lambda x: thresholds[indicator]['hex_function'](x),axis=1)

                var_min = round(min(point_hexes[city][variable].dropna()),1)
                var_max = round(max(point_hexes[city][variable].dropna()),1)

                # map main indicator
                fig, ax = plt.subplots(1, 1, figsize=(11.69,8.27))
                ax.set_aspect('equal')
                study_region.plot(ax=ax, color='none', edgecolor='black',zorder=2)
                divider = make_axes_locatable(ax)
                cax = divider.append_axes("right", size="5%", pad=0.1)
                ax.set_xticks([])
                ax.set_yticks([])
                scalebar = AnchoredSizeBar(ax.transData,
                                           1000, '1000 m', 'lower right', 
                                           pad= .01,
                                           color='black',
                                           frameon=False,
                                           fontproperties=fontprops)
                ax.add_artist(scalebar)
                fig.suptitle("\n".join(wrap(indicator, 120 )))
                if 'distances' in indicator_scenarios:
                    point_hexes[city].query(f'{variable} <= 500')\
                               .plot(column=variable, ax=ax, legend=True, cax=cax, cmap=cmap, zorder=1)
                else:
                    point_hexes[city].plot(column=variable, ax=ax, legend=True, cax=cax, cmap=cmap, zorder=1)
                ax.set_rasterized(True)
                pdf.savefig(fig,dpi=200)
                plt.clf()
                # map scenarios using custom splits
                interval_splits ={}
                splits = {}
                for scenario in [s for s in scenarios.keys() if s in indicator_scenarios]:
                    attributes = list(thresholds[indicator]['scenarios'][scenario].keys())
                    # categorical distribution plots for meeting scenarios
                    if ('interval' in attributes):
                        splits[scenario] = thresholds[indicator]['scenarios'][scenario]['interval']
                        interval_type = thresholds[indicator]['scenarios'][scenario]['interval_type']
                        if var_max in splits[scenario]:
                            splits[scenario] = [x if x!=var_max else var_max for x in splits[scenario]]
                        if var_min in splits[scenario]:
                            splits[scenario] = [x if x!=var_min else min(point_hexes[city][variable]) for x in splits[scenario]]
                        interval_splits[scenario] = list(splits[scenario]).copy()
                        split_labels = [f'within {interval_type} {splits[scenario]}']
                        if var_min < splits[scenario][0]:
                            splits[scenario] = [var_min]+list(splits[scenario])
                            split_labels = [f'below {interval_type} lower bound']+split_labels
                        if var_max > splits[scenario][-1]:
                            splits[scenario] = list(splits[scenario])+[var_max]
                            split_labels = split_labels+[f'exceeds {interval_type} upper bound']
                        #print(splits)
                        point_hexes[city][f'{variable}_{scenario}'] = pd.cut(point_hexes[city][variable], bins=splits[scenario], labels=[str(x) for x in range(0,len(split_labels))])
                        point_hexes[city][f'{variable}_{scenario}']
                        fig, ax = plt.subplots(figsize=(11.69,8.27))
                        ax.set_aspect('equal')
                        study_region.plot(ax=ax, color='none', edgecolor='black', zorder=2)
                        ax.set_xticks([])
                        ax.set_yticks([])
                        scalebar = AnchoredSizeBar(ax.transData,
                                                   1000, '1000 m', 'lower right', 
                                                   pad= .01,
                                                   color='black',
                                                   frameon=False,
                                                   fontproperties=fontprops)
                        ax.add_artist(scalebar)
                        fig.suptitle("\n".join(wrap(f'{scenario}: Estimated {indicator} requirement for {scenarios[scenario]}', 120 )))
                        if 'notes' in attributes:
                            ax.set_title(f"{thresholds[indicator]['scenarios'][scenario]['notes']}")
                        point_hexes[city].plot(column = f'{variable}_{scenario}',ax=ax,legend=True,cmap=cmap, zorder=1,legend_kwds={'borderaxespad':-4-height**.001, 'loc':'lower center'})
                        legend = ax.get_legend()
                        for text, label in zip(legend.get_texts(), split_labels):
                            text.set_text(label)
                        ax.set_rasterized(True)
                        pdf.savefig(fig, dpi=200)
                        plt.clf()
                    if ('statistic' in attributes):
                        statistics.append(thresholds[indicator]['scenarios'][scenario]['statistic'])
                    elif ('interval' in attributes):
                        # Estimated percentage of population meeting indicator threshold
                        percentages = (100*point_hexes[city]\
                                    .groupby([point_hexes[city][f'{variable}_{scenario}']])['pop_est']\
                                    .sum()\
                                    /point_hexes[city]['pop_est'].sum()).round(1)
                        percentages.index = split_labels
                        for c in split_labels:
                            try:
                                statistic = f'pop_pct_{scenario} - {indicator} - {c}'
                                cities.loc[city,statistic] = percentages.loc[c]
                            except KeyError:
                                cities.loc[city,statistic] = 0
                            finally:
                                statistics.append(statistic)
                if 'distances' in indicator_scenarios:
                    # histogram of distances (including NaN as > 500m, along with other > 500m)
                    point_hexes[city][f'{variable}'].mask(point_hexes[city][variable] > 500, 650)\
                                              .mask(point_hexes[city][variable].isna(), 650)\
                                              .plot.hist(grid=False, bins = range(0, 700,50),xticks=range(0, 600,100),align='mid',color=batlow_map(255))
                    plt.text(618,0,">",verticalalignment='center')
                else:
                    point_hexes[city][f'{variable}'].hist(grid=False,color=batlow_map(255))
                plt.suptitle("\n".join(wrap(f'Histogram of {indicator}.',120)))
                              
                for scenario in [s for s in scenarios.keys() if s in indicator_scenarios]:
                    attributes = list(thresholds[indicator]['scenarios'][scenario].keys())
                    if ('interval' in attributes):
                        plt.text(0.5,scenario_style[scenario]["align"],f'\n\n{scenario}: {interval_type} {thresholds[indicator]["scenarios"][scenario]["interval"]}; {scenario_style[scenario]["line"]} range', color=scenario_style[scenario]["colour"], transform=plt.gcf().transFigure,ha='center',va='center')
                        for line in [x for x in splits[scenario][1:] if x!=var_max]:
                            plt.axvline(line, color='k', linestyle=scenario_style[scenario]["line"], linewidth=1)
                        plt.axvspan(*interval_splits[scenario], color=scenario_style[scenario]["colour"],alpha=0.6, zorder=2)
                plt.ylabel("Frequency")
                pdf.savefig(fig)
                plt.clf()
            
    plt.close('all')

cities[statistics].fillna('-').transpose().to_csv(f'./reports/Global Indicators 2020 - thresholds summary estimates.csv')
cities[statistics].fillna('-').transpose()

Maiduguri
Mexico City
Baltimore
Phoenix
Seattle
Sao Paulo
Hong Kong
Chennai
Bangkok
Hanoi
Adelaide
Melbourne
Sydney
Auckland
Graz
Ghent
Olomouc
Odense
Cologne
Lisbon
Barcelona
Valencia
Vic
Bern
Belfast


City,Maiduguri,Mexico City,Baltimore,Phoenix,Seattle,Sao Paulo,Hong Kong,Chennai,Bangkok,Hanoi,...,Ghent,Olomouc,Odense,Cologne,Lisbon,Barcelona,Valencia,Vic,Bern,Belfast
pop_pct_A - Mean 1000 m neighbourhood population per km² - below 95% CrI lower bound,2.0,1.1,60.5,69.9,89.1,0.4,1.7,0.2,1.8,4.2,...,100.0,100.0,94.0,52.5,1.9,4.5,2.1,52.8,17.8,40.3
"pop_pct_A - Mean 1000 m neighbourhood population per km² - within 95% CrI (4790, 6750)",5.6,1.9,21.5,24.4,6.5,0.5,1.3,0.4,2.9,6.1,...,0.0,0.0,6.0,47.5,3.7,5.5,3.6,43.1,70.3,42.0
pop_pct_A - Mean 1000 m neighbourhood population per km² - exceeds 95% CrI upper bound,92.4,97.0,18.1,5.7,4.4,99.1,97.0,99.4,95.3,89.6,...,-,-,-,-,94.4,90.0,94.2,4.0,11.8,17.6
pop_pct_B - Mean 1000 m neighbourhood population per km² - below 95% CrI lower bound,4.1,1.9,72.0,84.3,93.6,0.6,2.3,0.4,3.0,7.0,...,100.0,100.0,100.0,78.4,3.1,7.6,4.1,75.7,41.7,59.8
"pop_pct_B - Mean 1000 m neighbourhood population per km² - within 95% CrI (5677, 7823)",7.8,2.5,16.4,14.0,3.0,0.7,1.3,0.5,3.8,7.5,...,0.0,0.0,0.0,21.6,5.3,5.0,2.9,24.3,58.3,37.6
pop_pct_B - Mean 1000 m neighbourhood population per km² - exceeds 95% CrI upper bound,88.1,95.6,11.6,1.7,3.4,98.7,96.4,99.2,93.2,85.5,...,-,-,-,-,91.5,87.3,92.9,-,-,2.5
pop_pct_A - Mean 1000 m neighbourhood street intersections per km² - below 95% CrI lower bound,54.4,10.4,35.1,25.6,38.7,12.1,4.3,9.6,38.5,32.4,...,32.5,31.0,5.1,16.1,0.2,17.4,21.3,34.7,0.8,9.0
"pop_pct_A - Mean 1000 m neighbourhood street intersections per km² - within 95% CrI (90, 110)",21.3,14.6,16.1,29.4,22.0,23.3,5.8,14.8,27.5,15.1,...,16.6,18.6,13.4,16.6,1.9,10.5,7.7,11.9,1.4,21.1
pop_pct_A - Mean 1000 m neighbourhood street intersections per km² - exceeds 95% CrI upper bound,24.3,75.0,48.7,45.0,39.3,64.7,89.9,75.6,34.0,52.5,...,50.9,50.4,81.4,67.3,97.8,72.1,70.9,53.4,97.9,70.0
pop_pct_B - Mean 1000 m neighbourhood street intersections per km² - below 95% CrI lower bound,71.5,21.4,48.2,49.1,56.8,29.6,8.5,20.7,60.4,43.7,...,45.1,45.8,14.6,28.4,1.3,25.1,27.6,43.6,1.8,26.0
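
The categorical bins in the analysis loop depend on where the observed range sits relative to the interval: a "below" or "exceeds" category is only created when values actually fall outside the bounds, which appears to be why some cities show `-` for a category in the table above. A simplified sketch of that logic follows; the helper name `build_bins` and the toy values are hypothetical, and `include_lowest=True` is an addition in this sketch so that the minimum value is kept in the first bin.

```python
import pandas as pd

def build_bins(values, interval, interval_type="95% CrI"):
    """Return (bin edges, labels) for values relative to an interval."""
    lower, upper = interval
    vmin, vmax = float(values.min()), float(values.max())
    edges = [lower, upper]
    labels = [f"within {interval_type} {interval}"]
    if vmin < lower:   # add a 'below' bin only if some values fall below
        edges = [vmin] + edges
        labels = [f"below {interval_type} lower bound"] + labels
    if vmax > upper:   # add an 'exceeds' bin only if some values fall above
        edges = edges + [vmax]
        labels = labels + [f"exceeds {interval_type} upper bound"]
    return edges, labels

values = pd.Series([50, 95, 100, 105])   # toy intersection densities
edges, labels = build_bins(values, (90, 110))

# include_lowest=True keeps the minimum value inside the first bin
categories = pd.cut(values, bins=edges, labels=labels, include_lowest=True)
print(edges)
print(categories.tolist())
```

Because no toy value exceeds 110, only the "below" and "within" categories are created here, so an "exceeds" statistic would never be written for such a city.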


In [11]:
stats = cities[statistics].transpose().reset_index()
MultiIndex = stats['index'].apply(lambda x: tuple(x.split(' - ')))\
    .apply(lambda x: \
           (x[1],
            x[0].replace('pop_pct','') ,
            x[2].replace(' 95% CrI ','').replace('lower','').replace('upper','')))
stats.index = pd.MultiIndex.from_tuples(MultiIndex)
stats = stats[[c for c in stats.columns if c!='index']]
with open('thresholds.tex', 'w') as f:
    print(stats.transpose().to_latex(),file=f)
stats

Unnamed: 0,Unnamed: 1,City,Maiduguri,Mexico City,Baltimore,Phoenix,Seattle,Sao Paulo,Hong Kong,Chennai,Bangkok,Hanoi,...,Ghent,Olomouc,Odense,Cologne,Lisbon,Barcelona,Valencia,Vic,Bern,Belfast
Mean 1000 m neighbourhood population per km²,_A,below bound,2.0,1.1,60.5,69.9,89.1,0.4,1.7,0.2,1.8,4.2,...,100.0,100.0,94.0,52.5,1.9,4.5,2.1,52.8,17.8,40.3
Mean 1000 m neighbourhood population per km²,_A,"within(4790, 6750)",5.6,1.9,21.5,24.4,6.5,0.5,1.3,0.4,2.9,6.1,...,0.0,0.0,6.0,47.5,3.7,5.5,3.6,43.1,70.3,42.0
Mean 1000 m neighbourhood population per km²,_A,exceeds bound,92.4,97.0,18.1,5.7,4.4,99.1,97.0,99.4,95.3,89.6,...,,,,,94.4,90.0,94.2,4.0,11.8,17.6
Mean 1000 m neighbourhood population per km²,_B,below bound,4.1,1.9,72.0,84.3,93.6,0.6,2.3,0.4,3.0,7.0,...,100.0,100.0,100.0,78.4,3.1,7.6,4.1,75.7,41.7,59.8
Mean 1000 m neighbourhood population per km²,_B,"within(5677, 7823)",7.8,2.5,16.4,14.0,3.0,0.7,1.3,0.5,3.8,7.5,...,0.0,0.0,0.0,21.6,5.3,5.0,2.9,24.3,58.3,37.6
Mean 1000 m neighbourhood population per km²,_B,exceeds bound,88.1,95.6,11.6,1.7,3.4,98.7,96.4,99.2,93.2,85.5,...,,,,,91.5,87.3,92.9,,,2.5
Mean 1000 m neighbourhood street intersections per km²,_A,below bound,54.4,10.4,35.1,25.6,38.7,12.1,4.3,9.6,38.5,32.4,...,32.5,31.0,5.1,16.1,0.2,17.4,21.3,34.7,0.8,9.0
Mean 1000 m neighbourhood street intersections per km²,_A,"within(90, 110)",21.3,14.6,16.1,29.4,22.0,23.3,5.8,14.8,27.5,15.1,...,16.6,18.6,13.4,16.6,1.9,10.5,7.7,11.9,1.4,21.1
Mean 1000 m neighbourhood street intersections per km²,_A,exceeds bound,24.3,75.0,48.7,45.0,39.3,64.7,89.9,75.6,34.0,52.5,...,50.9,50.4,81.4,67.3,97.8,72.1,70.9,53.4,97.9,70.0
Mean 1000 m neighbourhood street intersections per km²,_B,below bound,71.5,21.4,48.2,49.1,56.8,29.6,8.5,20.7,60.4,43.7,...,45.1,45.8,14.6,28.4,1.3,25.1,27.6,43.6,1.8,26.0


In [12]:
# Sum the population percentages to keep for each indicator and scenario:
# 'within' + 'exceeds' for the density and intersection indicators, and
# 'below' + 'within' for the distance indicators.
scenario = stats.index.get_level_values(1)
bound = stats.index.get_level_values(2)
is_distance = scenario == "_distances"
stats_reduced = stats.loc[(~is_distance & bound.str.startswith(('within','exceeds')))
                          | (is_distance & bound.str.startswith(('below','within')))]\
            .groupby(level=[0,1]).sum()\
            .transpose()
stats_reduced = stats_reduced[list(thresholds.keys())]
stats_reduced.to_csv('./reports/Global Indicators 2020 - thresholds summary simplified.csv')
stats_reduced

Unnamed: 0_level_0,Mean 1000 m neighbourhood population per km²,Mean 1000 m neighbourhood population per km²,Mean 1000 m neighbourhood street intersections per km²,Mean 1000 m neighbourhood street intersections per km²,Distance to nearest public transport stops (m; up to 500m),Distance to nearest park (m; up to 500m)
Unnamed: 0_level_1,_A,_B,_A,_B,_distances,_distances
City,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Maiduguri,98.0,95.9,45.6,28.5,9.2,1.7
Mexico City,98.9,98.1,89.6,78.6,35.3,48.5
Baltimore,39.6,28.0,64.8,51.7,62.7,61.8
Phoenix,30.1,15.7,74.4,51.0,65.9,35.3
Seattle,10.9,6.4,61.3,43.2,60.2,58.7
Sao Paulo,99.6,99.4,88.0,70.4,96.9,71.3
Hong Kong,98.3,97.7,95.7,91.5,90.9,87.9
Chennai,99.8,99.7,90.4,79.3,38.0,40.3
Bangkok,98.2,97.0,61.5,39.7,63.5,13.7
Hanoi,95.7,93.0,67.6,56.3,65.3,25.8


### Explanation of remaining inconsistency of distance analyses between main analysis and post hoc threshold analysis
There is a small but fundamental difference between the approaches taken in the main analysis and in the threshold analysis above with regard to the evaluation of distances and thresholds; the estimates of population within threshold presented above reflect this through small differences in results for public transport and public open space access.

In the main analysis, after deriving 'full distances' for sample points we were primarily interested not in the recorded distance but rather whether this was within (<=) 500m or not.  This was evaluated, averaged to hexagon grids, then the percentage of population with access was evaluated as the population weighted average of hexagon access scores at the city level.
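
As a minimal sketch of the main analysis steps just described, using made-up sample points and hexagon populations (the column names `hex_id`, `distance_m` and `access_500m`, and all values, are illustrative assumptions rather than the project's actual schema or data):

```python
import pandas as pd

# Hypothetical sample points: hexagon id and recorded distance in metres
# (NaN means no destination was recorded within the analysed range).
points = pd.DataFrame({
    "hex_id": [1, 1, 2, 2, 3, 3],
    "distance_m": [120.0, 480.0, 300.0, float("nan"), float("nan"), float("nan")],
})
# Hypothetical hexagon populations, indexed by hexagon id.
hex_pop = pd.Series({1: 100, 2: 300, 3: 600}, name="population")

# Step 1: binary access per sample point (a comparison with NaN is False, i.e. 0).
points["access_500m"] = (points["distance_m"] <= 500).astype(int)

# Step 2: average the access scores within each hexagon.
hex_access = points.groupby("hex_id")["access_500m"].mean()

# Step 3: population-weighted average of hexagon scores, as a city-level percentage.
pct_pop_with_access = 100 * (hex_access * hex_pop).sum() / hex_pop.sum()
print(round(pct_pop_with_access, 1))  # 25.0
```

Note that the recorded distance itself never enters the aggregation here; only the 0/1 access score does.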

In the post hoc threshold analysis, the specific recorded distances at the hex level are of interest.  To estimate these, average distances need to be estimated grouped by hexagon identifier.  However, where null values exist for sample point distances (evaluated as 0 for access in the main analysis, as opposed to 1), these need to be accounted for somehow: they signify some uncertain degree of distance beyond 500m which was not recorded. To do this in the post hoc threshold analysis we used 650m as a plug-in approximation of distance, representing an optimistic view of what average access beyond this distance might be.  While some sample points' distance to the nearest destination of interest would have been less than this had it been recorded, many would be expected to be further away, and some much further.  After aggregation, to facilitate plotting of histograms, all hexagons with an average distance in excess of 500m are set equal to 650m and represented in the plot as ">", indicating that the true value in excess of 500m is uncertain.

So, the difference is that the main analysis evaluates access within 500m for sample points and then aggregates to hexagons with weights by population, while the threshold analysis estimates the average distance within each hexagon (with some assumptions) and subsequently relates these to thresholds before weighting by population.  If the assumptions had not been made (i.e. 650m as a stand-in for distances not measured due to being further than analysed), then the average distances calculated for hexagons would have been underestimated.  Hence the final results are similar, but may differ by a few percentage points.
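
The effect of the 650m assumption on hexagon averages can be illustrated with a small sketch. The data, the column names `hex_id` and `distance_m`, and the constant names are hypothetical; only the 500m limit and 650m plug-in value come from the description above, and the toy numbers exaggerate the effect for clarity.

```python
import numpy as np
import pandas as pd

LIMIT_M = 500    # analysis limit for recorded distances
PLUG_IN_M = 650  # stand-in for distances beyond the limit

# Hypothetical sample points; NaN marks a distance beyond the limit (not recorded).
points = pd.DataFrame({
    "hex_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "distance_m": [120.0, 480.0, 300.0, np.nan, np.nan, np.nan, 450.0, np.nan],
})
hex_pop = pd.Series({1: 100, 2: 300, 3: 500, 4: 100}, name="population")

# Without the assumption, NaN values are skipped and averages are too low
# (hexagon 2 averages 300 m, hexagon 4 averages 450 m, hexagon 3 is undefined).
mean_ignoring_nulls = points.groupby("hex_id")["distance_m"].mean()

# With the assumption, unrecorded distances count as 650 m before averaging
# (hexagon 2 averages 475 m, hexagon 4 averages 550 m, hexagon 3 averages 650 m).
filled = points.assign(distance_m=points["distance_m"].fillna(PLUG_IN_M))
mean_with_plug_in = filled.groupby("hex_id")["distance_m"].mean()

# Hexagons averaging beyond 500 m are set to 650 m (shown as ">" in the plots).
hex_distance = mean_with_plug_in.where(mean_with_plug_in <= LIMIT_M, PLUG_IN_M)

# Relate hexagon distances to the threshold, then weight by population.
within = hex_distance <= LIMIT_M
pct_pop_within = 100 * hex_pop[within].sum() / hex_pop.sum()
print(round(pct_pop_within, 1))  # 40.0
```

In this toy case a hexagon with one accessible and one inaccessible sample point counts fully towards the threshold when its average distance stays below 500m, whereas the binary approach of the main analysis would give it an access score of 0.5. This is the kind of discrepancy that produces the differing city-level results.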

A proposed solution is to use the main analysis results whenever presenting 'distance within 500m', for consistency; alternatively, we could note that, due to the different methods employed, the distance results from the multiple threshold analysis may vary slightly from the binary result.