In [1]:
from configure import configuration
from analysis import analysis
from generate import generate
from compare import compare

# Global Healthy and Sustainable City Indicators (GHSCI) analysis for Las Palmas de Gran Canaria, Spain

This notebook contains an example of how the GHSCI tool can be used to run an analysis for a study region of interest, for example, a city or set of neighbourhoods.

In [2]:
codename = 'example_ES_Las_Palmas_2023'

## Configuration

A helper script has been provided for initialising new study region configuration files.  Two such files have been provided in the folder `process/configuration/regions`:

1. `example_ES_Las_Palmas_2023.yml`
  - This **example** defines the Spanish city of Las Palmas de Gran Canaria with a target time point of 2023
  - The codename `example_ES_Las_Palmas_2023` describes the above using a recommended shorthand structure, starting with a two-letter country code
    - For cities with shared names in different countries, like Valencia, this can differentiate between them (e.g. `ES_Valencia_2023` refers to the city in Spain and `VE_Valencia_2023` to the city in Venezuela, both using the same time point of 2023)
  - You can use suffixes or prefixes as required to ensure your codenames clearly describe the study region configurations they represent
2. `ES_Las_Palmas_2023_test_not_urbanx.yml`
  - This study region configuration provides a **sensitivity analysis** of using the city administrative boundary without restricting it to its intersection with an urban region from the Global Human Settlements Layer Urban Centres Database (i.e. `ghsl_intersection = false` instead of `true`)
  - We can use this later on in the workflow to compare the indicator results as a sensitivity analysis and evaluate the impact of this methodological decision

Codenames should be no more than 40 characters long, to avoid errors from overly long file names when additional files are created using the codename as part of the workflow.
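The length recommendation can be checked before any files are created. The following is a minimal sketch, not part of the GHSCI tool: the helper name `check_codename` and the constant `MAX_CODENAME_LENGTH` are hypothetical, and only the 40-character limit comes from the text above.

```python
# Hypothetical helper: the GHSCI tool itself may validate codenames differently.
MAX_CODENAME_LENGTH = 40  # recommended maximum stated in this notebook


def check_codename(codename: str, max_length: int = MAX_CODENAME_LENGTH) -> str:
    """Return the codename unchanged, or raise ValueError if it is empty or too long."""
    if not codename:
        raise ValueError("codename must not be empty")
    if len(codename) > max_length:
        raise ValueError(
            f"codename {codename!r} has {len(codename)} characters; "
            f"the recommended maximum is {max_length}"
        )
    return codename


# Both example codenames used in this notebook are within the limit.
check_codename('example_ES_Las_Palmas_2023')
check_codename('ES_Las_Palmas_2023_test_not_urbanx')
```

A check like this could be run on a new codename before passing it to `configuration()`.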

In [3]:
configuration(codename)


Configuration file for the specified study region codename
'example_ES_Las_Palmas_2023' already exists:
configuration/regions/example_ES_Las_Palmas_2023.yml.

Please open and edit this file in a text editor following the provided example
directions in order to complete configuration for your study region.  A
completed example study region configuration can be viewed in the file
'configuration/regions/example_ES_Las_Palmas_2023.yml'.

To view additional guidance on configuration, run this script again without a
codename.

Once configuration has been completed, to proceed to analysis for this city,
enter:
analysis example_ES_Las_Palmas_2023


## Analysis

The `analysis()` function below runs the following series of scripts, located in the `subprocesses` folder, for the specified study region codename:

|**Subprocess step** | **Description** | 
|--------------------|-----------------|
|_00_create_database.py | Create database | 
|_01_create_study_region.py | Create study region | 
|_02_create_osm_resources.py | Create OpenStreetMap resources | 
|_03_create_network_resources.py | Create pedestrian network | 
|_04_create_population_grid.py | Align population distribution | 
|_05_compile_destinations.py | Compile destinations | 
|_06_open_space_areas_setup.py | Identify public open space | 
|_07_locate_origins_destinations.py | Analyse local neighbourhoods | 
|_08_destination_summary.py | Summarise spatial distribution | 
|_09_urban_covariates.py | Collate urban covariates | 
|_10_gtfs_analysis.py | Analyse GTFS Feeds | 
|_11_neighbourhood_analysis.py | Analyse neighbourhoods | 
|_12_aggregation.py | Aggregate region summary analyses | 

In [4]:
analysis(codename)


Las Palmas de Gran Canaria (example_ES_Las_Palmas_2023)

Output directory:
  process/data/_study_region_outputs/example_ES_Las_Palmas_2023

A dated copy of project and region parameters has been saved as
process/data/_study_region_outputs/example_ES_Las_Palmas_2023/_parameters.yml.

Analysis time zone: Australia/Melbourne (to set time zone for where you are,
edit config.yml)

Analysis start:	2023-05-30_1618


                                      0%|                              | (0/13)

Analysis end:	2023-05-30_1620 (approximately 2.6 minutes)

To generate resources (data files, documentation, maps, figures, reports) using
the processed results for this study region, enter:

generate example_ES_Las_Palmas_2023

The Postgis SQL database for this city example_es_las_palmas_2023 can also be
accessed from QGIS or other applications by specifying the server as 'localhost'
and port as '5433', with username 'postgres' and password 'ghscic'.The SQL
database can also be explored on the command line by using the above password
after entering,'psql -U postgres -h gateway.docker.internal -p 5433 -d
"example_es_las_palmas_2023"'. When using psql, you can type '\dt' to list
database tables, '\d <table_name>' to list table columns, and 'SELECT * FROM
<table_name> LIMIT 10;' to view the first 10 rows of a table.  To exit psql,
enter '\q'.
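The connection details printed above can also be assembled into a connection URL for use from Python. This is a minimal sketch using only the standard library; the function name `database_url` is hypothetical, the default values are the ones reported in the output above, and the lower-casing of the codename is inferred from the database name shown there.

```python
from urllib.parse import quote


def database_url(codename, user='postgres', password='ghscic',
                 host='localhost', port=5433):
    """Build a PostgreSQL connection URL for a study region database.

    The defaults mirror the analysis output above. The database name is
    assumed to be the lower-cased codename, as in that output.
    """
    database = codename.lower()
    # Percent-encode credentials in case they contain reserved characters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgresql://{credentials}@{host}:{port}/{database}"


url = database_url('example_ES_Las_Palmas_2023')
print(url)
# postgresql://postgres:ghscic@localhost:5433/example_es_las_palmas_2023
```

A URL of this form could then be passed to a database client, for example SQLAlchemy's `create_engine`, assuming that library is installed. When connecting from inside the Docker container rather than from the host machine, the host may need to be `gateway.docker.internal`, as in the `psql` command above.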



## Generate resources

The `generate()` function is used to generate data, metadata, maps, figures and reports, optionally in multiple languages, for processed cities.  It lists the resources as they are generated.

In [6]:
# generate resources, but suppress display of images in this Jupyter Notebook to reduce filesize
%matplotlib agg 
generate(codename)


Las Palmas de Gran Canaria (example_ES_Las_Palmas_2023)

Output directory:
  process/data/_study_region_outputs/example_ES_Las_Palmas_2023

Analysis parameter summary text file
  _parameters.yml

Analysis log text file
  __Las Palmas de Gran Canaria__example_ES_Las_Palmas_2023_processing_log.txt

Data files
  example_ES_Las_Palmas_2023_1600m_buffer.gpkg
    - example_ES_Las_Palmas_2023_region
    - example_ES_Las_Palmas_2023_grid_100m
    - example_ES_Las_Palmas_2023_sample_points
    - aos_public_osm
    - dest_type
    - destinations
    - clean_intersections_12m
    - edges
    - nodes
    - pt_stops_headway
example_ES_Las_Palmas_2023_region.csv
example_ES_Las_Palmas_2023_grid_100m.csv

Data dictionaries
  output_data_dictionary.csv
  output_data_dictionary.xlsx

Metadata
  example_ES_Las_Palmas_2023_metadata.yml
  example_ES_Las_Palmas_2023_metadata.xml

Figures and maps (English)
  figures/access_profile_English.jpg
  figures/all_cities_walkability_English.jpg
  figures/pct_acces
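The GeoPackage listed under "Data files" can be opened directly for further exploration. The sketch below assumes the file is written inside the output directory reported above and that `geopandas` is installed; the helper names `geopackage_path` and `read_layer` are hypothetical and not part of the GHSCI tool.

```python
from pathlib import Path


def geopackage_path(codename, buffer_m=1600,
                    root='process/data/_study_region_outputs'):
    """Return the expected GeoPackage path for a study region.

    Assumes the file sits in the region's output directory and follows the
    '<codename>_<buffer>m_buffer.gpkg' naming shown in the output above.
    """
    return Path(root) / codename / f"{codename}_{buffer_m}m_buffer.gpkg"


def read_layer(codename, layer):
    """Read one named layer from the region's GeoPackage as a GeoDataFrame."""
    import geopandas as gpd  # imported lazily; assumed to be installed
    return gpd.read_file(geopackage_path(codename), layer=layer)


gpkg = geopackage_path('example_ES_Las_Palmas_2023')
print(gpkg.as_posix())
```

For example, `read_layer(codename, f'{codename}_grid_100m')` would load the grid layer listed above, provided the resources have already been generated.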

## Sensitivity analyses

To evaluate the impact of the methodological decisions taken when configuring your study region, including the selection of data sources, you may conduct sensitivity analyses.  An example has been provided to explore the impact of the decision to restrict the study region to the urban area (`example_ES_Las_Palmas_2023`) or not (`ES_Las_Palmas_2023_test_not_urbanx`).  *A priori*, we would assume that restricting to an urban area would result in higher estimates for population density and street connectivity, and more proximal access to amenities.

Other comparisons are possible.  For example:

- One could vary the study region boundary supplied or the parameter used for consolidating intersections, supply additional destination data, or modify the definitions used to extract features of interest from the OpenStreetMap data.
  - When making these modifications, the resulting generated datasets can be inspected by local experts to evaluate how well they conform with their knowledge of the area under study.
- An official reference set of data could be used, for example for population, to evaluate the use of a modelled population data layer compared with the official population data.
- Population data for demographic sub-groups could be used (e.g. using strata of age and/or sex, or other characteristics as available); the resulting aggregated study region indicators would provide population-specific estimates.
- Analyses could be conducted for different time points using historical data.
- Finally, data could be modified to represent hypothetical interventions and evaluate their impact on the calculated indicators.
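Before running a sensitivity analysis, it can help to confirm exactly which settings differ between two configurations. The following sketch compares two nested dictionaries, such as those obtained by loading two region `.yml` files with a YAML parser. The function names and the key nesting in the example are hypothetical; only the `ghsl_intersection` setting is named in this notebook.

```python
_MISSING = object()  # sentinel for keys present in only one configuration


def flatten(settings, prefix=''):
    """Flatten nested dictionaries into {'outer.inner': value} form."""
    flat = {}
    for key, value in settings.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for settings that differ.

    A key missing from one side is reported with None on that side.
    """
    flat_a, flat_b = flatten(a), flatten(b)
    differences = {}
    for key in sorted(set(flat_a) | set(flat_b)):
        value_a = flat_a.get(key, _MISSING)
        value_b = flat_b.get(key, _MISSING)
        if value_a != value_b:
            differences[key] = (
                None if value_a is _MISSING else value_a,
                None if value_b is _MISSING else value_b,
            )
    return differences


# Hypothetical, heavily simplified configurations (key nesting is assumed).
urban = {'study_region_boundary': {'ghsl_intersection': True}, 'year': 2023}
not_urban = {'study_region_boundary': {'ghsl_intersection': False}, 'year': 2023}
print(diff_settings(urban, not_urban))
# {'study_region_boundary.ghsl_intersection': (True, False)}
```

If the only difference reported is the setting under investigation, any change in the resulting indicators can more confidently be attributed to that methodological choice.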

In [7]:
comparison_codename = 'ES_Las_Palmas_2023_test_not_urbanx'

In [8]:
analysis(comparison_codename)


Las Palmas de Gran Canaria (ES_Las_Palmas_2023_test_not_urbanx)

Output directory:
  process/data/_study_region_outputs/ES_Las_Palmas_2023_test_not_urbanx

A dated copy of project and region parameters has been saved as
process/data/_study_region_outputs/ES_Las_Palmas_2023_test_not_urbanx/_parameters.yml.

Analysis time zone: Australia/Melbourne (to set time zone for where you are,
edit config.yml)

Analysis start:	2023-05-30_1636


                                      0%|                              | (0/13)

Analysis end:	2023-05-30_1639 (approximately 2.9 minutes)

To generate resources (data files, documentation, maps, figures, reports) using
the processed results for this study region, enter:

generate ES_Las_Palmas_2023_test_not_urbanx

The Postgis SQL database for this city es_las_palmas_2023_test_not_urbanx can
also be accessed from QGIS or other applications by specifying the server as
'localhost' and port as '5433', with username 'postgres' and password
'ghscic'.The SQL database can also be explored on the command line by using the
above password after entering,'psql -U postgres -h gateway.docker.internal -p
5433 -d "es_las_palmas_2023_test_not_urbanx"'. When using psql, you can type
'\dt' to list database tables, '\d <table_name>' to list table columns, and
'SELECT * FROM <table_name> LIMIT 10;' to view the first 10 rows of a table.  To
exit psql, enter '\q'.



In [None]:
# generate resources, but suppress display of images in this Jupyter Notebook to reduce filesize
%matplotlib agg
generate(comparison_codename)


Las Palmas de Gran Canaria (ES_Las_Palmas_2023_test_not_urbanx)

Output directory:
  process/data/_study_region_outputs/ES_Las_Palmas_2023_test_not_urbanx

Analysis parameter summary text file
  _parameters.yml

Analysis log text file
  __Las Palmas de Gran Canaria__ES_Las_Palmas_2023_test_not_urbanx_processing_log.txt

Data files
  ES_Las_Palmas_2023_test_not_urbanx_1600m_buffer.gpkg
    - ES_Las_Palmas_2023_test_not_urbanx_region
    - ES_Las_Palmas_2023_test_not_urbanx_grid_100m
    - ES_Las_Palmas_2023_test_not_urbanx_sample_points
    - aos_public_osm
    - dest_type
    - destinations
    - clean_intersections_12m
    - edges
    - nodes
    - pt_stops_headway
ES_Las_Palmas_2023_test_not_urbanx_region.csv
ES_Las_Palmas_2023_test_not_urbanx_grid_100m.csv

Data dictionaries
  output_data_dictionary.csv
  output_data_dictionary.xlsx

Metadata
  ES_Las_Palmas_2023_test_not_urbanx_metadata.yml
  ES_Las_Palmas_2023_test_not_urbanx_metadata.xml

Figures and maps (English)
  figures/acc

  fig = mpl.pyplot.figure()


  figures/destination_count_Restaurant.png
  figures/destination_count_Cafe.png
  figures/destination_count_Food_court.png
  figures/destination_count_Fast_food.png
  figures/destination_count_Pub.png
  figures/destination_count_Bar.png
  analysis_report_2023-05-30_1618.pdf


It is important to take the time to familiarise yourself with the various
outputs generated from the configuration and analysis of your region of interest
to ensure they provide a fair and accurate representation given local knowledge.
Any issues or limitations identified should be understood and can be iteratively
addressed and/or acknowledged in documentation prior to dissemination.



## Comparisons

As suggested above, a variety of interesting comparisons can be made using generated indicator data:

- Sensitivity analyses exploring the impact of methodological choices
- Comparisons between different study regions for the same point in time
- Comparisons within a city for different points in time
- Evaluating the impact of hypothetical scenarios and/or interventions, using modified data

Below, we compare the impact of restricting the study region to the urban area:

In [None]:
compare(codename,comparison_codename)

We can see from the above comparison that, as expected, density estimates and the percentage of population with access to most kinds of amenities evaluated were higher with restriction to the empirically defined urban region.  While the most likely explanation is that urban areas are associated with higher population density, street connectivity and levels of amenity provision, the possibility of data bias should also be considered and, if possible, evaluated: data may be more complete, more detailed and more up to date for urban areas.  Hence, restriction to the empirical urban area is an important methodological choice.  It could mitigate bias in cases where data is found lacking; however, it could also exclude important sectors of the population living in urban fringe areas that may be of interest, and whose inclusion may be important for a more complete understanding of the equitable distribution of healthy and sustainable urban environments.  Decisions such as these need to be made by analysts with local area knowledge, or in consultation with local experts, to ensure the representation and analysis of the study region in question is fair, meaningful and useful for informing local decision making.
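Differences between two sets of region-level results can also be tabulated by hand from the generated `*_region.csv` files. This is a minimal standard-library sketch and is not how `compare()` is implemented; it assumes each region CSV holds a single row of indicators with a shared header, and the function names are hypothetical.

```python
import csv


def read_region_row(path):
    """Read a region summary CSV that is assumed to contain exactly one data row."""
    with open(path, newline='', encoding='utf-8') as f:
        rows = list(csv.DictReader(f))
    if len(rows) != 1:
        raise ValueError(f"expected one data row in {path}, found {len(rows)}")
    return rows[0]


def numeric_differences(reference, comparison):
    """Return {column: comparison - reference} for columns that are numeric in both."""
    differences = {}
    for column in sorted(reference.keys() & comparison.keys()):
        try:
            reference_value = float(reference[column])
            comparison_value = float(comparison[column])
        except (TypeError, ValueError):
            continue  # skip text columns and missing values
        differences[column] = comparison_value - reference_value
    return differences
```

With both regions generated, calling `numeric_differences(read_region_row(path_a), read_region_row(path_b))` on the two region CSV paths would give the change in each shared numeric indicator, where a positive value means the comparison region scored higher than the reference.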