perezjoan/PPCA-codes

PPCA-worldwide-protocol

Population Potential on Catchment Area - Worldwide Protocol

This repository is part of the emc2 research project.
ESPACE laboratory

Global Objectives

This repository contains the implementation of the PPCA (Population Potential on Catchment Areas) protocol, developed as part of the emc2 research project. The project aims to evaluate population potential within specified catchment areas using various data sources and machine learning techniques.

Sample Data

The protocol is designed to work globally. The user only needs to provide the coordinates of a bounding box for the area of interest. Coordinate examples are provided here.
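As an illustration of the only required input, the sketch below shows one way a bounding box could be represented and validated before being passed to the scripts. The `BoundingBox` class, the (west, south, east, north) ordering and the use of WGS84 degrees are assumptions made for this example, not the format the scripts necessarily expect; the coordinates are arbitrary illustrative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical container for an area of interest, in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def __post_init__(self):
        # Longitudes must be ordered and stay within [-180, 180].
        if not (-180 <= self.west < self.east <= 180):
            raise ValueError("expected -180 <= west < east <= 180")
        # Latitudes must be ordered and stay within [-90, 90].
        if not (-90 <= self.south < self.north <= 90):
            raise ValueError("expected -90 <= south < north <= 90")

    def as_tuple(self):
        """Return the box as (west, south, east, north)."""
        return (self.west, self.south, self.east, self.north)


# Arbitrary illustrative coordinates.
bbox = BoundingBox(west=7.18, south=43.64, east=7.33, north=43.76)
print(bbox.as_tuple())
```

Validating the box up front avoids launching a long download with swapped or out-of-range coordinates.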

Installation Steps

The code is written in Python, and each script requires a specific environment. The environments are detailed below, together with the commands needed to install them.

Follow these steps to run the Python algorithms:

  • Install the Anaconda distribution of Python
  • Navigate to the relevant section and create a specific environment (detailed environment settings are provided here).
  • Activate an environment and run the related Python scripts

Project sections

PPCA 1.0 : GHS and OSM automated data acquisition Link to code

Description:

This script facilitates the acquisition of spatial data using a combination of Google Earth Engine, OpenStreetMap, and QGIS tools. It starts by authenticating and initializing Earth Engine and downloading Global Human Settlement (GHS) raster data for a specified year. The GHS data is then exported as a raster image for a defined geographical area. Once downloaded, the raster data is converted to vector data using QGIS and saved in a local GeoPackage. The script proceeds to extract building data within the same geographical area from OpenStreetMap, cleans the data by removing any list-type columns, and saves the cleaned data into the GeoPackage. Additionally, the script extracts street data from OpenStreetMap, converting it into a GeoDataFrame format and filtering it to separate pedestrian streets from primary roads. Both sets of street data are visualized and saved in the GeoPackage. The result is a set of spatial data layers, including GHS population data, building footprints, and street layers.
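The cleaning step mentioned above (removing list-type columns from the building data) can be sketched as follows. Plain dictionaries stand in for the rows of the building table, and the function name `drop_list_columns` is hypothetical; the actual script works on the data it downloads and may detect such columns differently.

```python
def drop_list_columns(rows):
    """Remove every column that holds a list in at least one row.

    `rows` is a list of dictionaries, one per building (an assumption
    made for this sketch).
    """
    # Collect the names of all columns containing at least one list value.
    list_columns = {
        key
        for row in rows
        for key, value in row.items()
        if isinstance(value, list)
    }
    # Rebuild each row without those columns.
    return [
        {key: value for key, value in row.items() if key not in list_columns}
        for row in rows
    ]


buildings = [
    {"id": 1, "building": "house", "nodes": [10, 11, 12]},
    {"id": 2, "building": "yes", "nodes": [20, 21, 22]},
]
print(drop_list_columns(buildings))
```

A column is dropped as a whole as soon as one row holds a list, so that every remaining column has a single scalar value per row.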

Requirements:

Guide to run the script:

  • Fill box 0.2 within the code
  • Place the output of step 1 in your working directory before running step 2

Outputs :

  • A raster file with the GHS population data
  • A geopackage file with 4 layers :
    • 'ghs_{date}_vector' (Polygon), GHS population data at a given date
    • 'osm_all_area_categories' (Polygon), OSM land use data with non-populated areas
    • 'osm_all_buildings' (Polygon), OSM all buildings
    • 'osm_all_streets' (LineString), OSM all streets

PPCA 2.0 : Data filter / Preparation: Link to code

Description:

The script processes and filters spatial data from OSM and GHS sources for further analysis. It performs the following main tasks: (1) it reads and filters Global Human Settlement (GHS) data by rounding values and removing meshes with zero population; (2) it filters OpenStreetMap (OSM) street data to separate pedestrian and non-pedestrian streets; (3) it filters OSM land use data to identify non-populated areas.
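The first two filters can be sketched in a few lines. This is a minimal illustration under stated assumptions: rows are plain dictionaries, the field name `population`, rounding to the nearest integer, and the set of `highway` values treated as pedestrian are all hypothetical choices, not values taken from the script.

```python
# Hypothetical tag values treated as pedestrian; the script's list may differ.
PEDESTRIAN_HIGHWAY_VALUES = {"pedestrian", "footway", "path", "steps"}


def filter_populated(cells, field="population"):
    """Round the population of each GHS cell and drop cells that round to zero."""
    populated = []
    for cell in cells:
        value = round(cell[field])  # assumption: round to the nearest integer
        if value > 0:
            populated.append({**cell, field: value})
    return populated


def split_streets(streets, key="highway"):
    """Split streets into (pedestrian, non_pedestrian) based on one tag."""
    pedestrian, non_pedestrian = [], []
    for street in streets:
        if street.get(key) in PEDESTRIAN_HIGHWAY_VALUES:
            pedestrian.append(street)
        else:
            non_pedestrian.append(street)
    return pedestrian, non_pedestrian


cells = [{"id": 1, "population": 0.4}, {"id": 2, "population": 12.6}]
streets = [{"highway": "footway"}, {"highway": "primary"}]
print(filter_populated(cells))
print(split_streets(streets))
```

Rounding before filtering means that cells with a fractional population below 0.5 are removed along with the empty ones.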

Requirements:

  • A specific working environment on Python Link to environment
  • Output file from PPCA 1.0 ('ghs_{date}_vector' (Polygon), GHS population data at a given date ; 'osm_all_area_categories' (Polygon), OSM land use data with non-populated areas ; 'osm_all_streets' (LineString), OSM all streets)

Guide to run the script:

  • Fill 0.2 box within the script

Outputs :

  • A geopackage file with 4 layers :
    • 'ghs_populated_2020_vector' (Polygon), GHS population data with non null values
    • 'osm_non_populated_areas' (Polygon), OSM land use data with non-populated areas
    • 'pedestrian_streets' (LineString), OSM pedestrian streets
    • 'non_pedestrian_streets' (LineString), OSM non-pedestrian streets

PPCA 3.0 Morphometry on Buildings: Link to code

Description:

This script performs several calculations and transformations on a layer of OSM buildings. It begins by ensuring that the columns 'height' and 'building:floors' are numeric, converting any non-numeric entries to NaN. The script then fills missing 'height' values by multiplying the number of floors by 3, assuming an average floor height of 3 meters. Conversely, it fills missing 'building:floors' values by dividing 'height' by 3 and rounding the result. It calculates and prints the number and percentage of rows with NaN in both 'height' and 'building:floors'. Several new columns are computed: 'FL' for the number of floors, 'A' for the surface area, 'P' for the perimeter, 'E' for elongation, 'C' for convexity, 'FA' for floor area, 'ECA' for a product involving elongation, convexity, and area, 'EA' for another elongation-area product, and 'SW' for shared walls ratio. Finally, the script renames 'building:floors' to 'FL'.
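The height and floor completion rule described above can be illustrated with a small function. This is a sketch rather than the script's code: it uses `None` where the script uses NaN, and the names `to_number` and `complete_height_and_floors` are hypothetical.

```python
import math

FLOOR_HEIGHT_M = 3  # average floor height assumed by the protocol


def to_number(value):
    """Convert a raw attribute to float, returning None when it is not numeric."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def complete_height_and_floors(height, floors):
    """Fill a missing height from floors, or missing floors from height."""
    h, f = to_number(height), to_number(floors)
    if h is None and f is not None:
        h = f * FLOOR_HEIGHT_M
    elif f is None and h is not None:
        f = round(h / FLOOR_HEIGHT_M)
    return h, f


print(complete_height_and_floors("9", None))
print(complete_height_and_floors(None, "4"))
print(complete_height_and_floors("unknown", None))
```

When both values are missing or non-numeric, the pair stays empty; these are the rows that the script counts and reports.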

Requirements:

  • A specific working environment on Python Link to environment
  • Output file from PPCA 1.0 ('osm_all_buildings' (Polygon), OSM all buildings)

Guide to run the script:

  • Fill 0.2 box

Output :

  • A geopackage file with a single layer
    • 'osm_all_buildings_ind' (Polygon), osm buildings with height/floor values completed and with morphometric indicators

PPCA 4.0 Residential & non-residential buildings : classification based on attributes: Link to code

Description:

This script filters out buildings with a footprint area of less than 15 m² and, if the 'wall' column exists, optionally filters out buildings that have no walls. It then creates a column 'type' within the OSM building data with three possible values (0: NA; 1: residential or mixed-use; 2: non-residential). Values are filled using the OSM attribute 'building_type': 'apartments', 'barracks', 'house', 'residential', 'bungalow', 'cabin', 'detached', 'dormitory', 'farm', 'static_caravan', 'semidetached_house' and 'stilt_house' are considered residential or mixed-use buildings. Finally, the classification is refined by attributing 0 values to null values based on the spatial relationships with non-populated OSM land use areas. The final score of classified buildings versus buildings with null values is printed and mapped.
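The attribute-based part of this classification can be sketched as below. The residential tag list and the 15 m² threshold come from the description above; everything else is an assumption made for illustration: the function names, the use of the 'A' area column as the filter input, the treatment of the generic OSM value 'yes' as unknown, and the choice to label every other non-null tag as non-residential. The spatial refinement step is not covered.

```python
RESIDENTIAL_TAGS = {
    "apartments", "barracks", "house", "residential", "bungalow", "cabin",
    "detached", "dormitory", "farm", "static_caravan", "semidetached_house",
    "stilt_house",
}
UNKNOWN_TAGS = {"yes"}  # assumption: generic tag carries no usage information
MIN_AREA_M2 = 15


def classify(building_type):
    """Return 0 (NA), 1 (residential or mixed-use) or 2 (non-residential)."""
    if building_type is None or building_type in UNKNOWN_TAGS:
        return 0
    return 1 if building_type in RESIDENTIAL_TAGS else 2


def classify_buildings(buildings):
    """Drop small footprints, then add a 'type' value to each remaining building."""
    kept = [b for b in buildings if b["A"] >= MIN_AREA_M2]
    return [{**b, "type": classify(b.get("building_type"))} for b in kept]


sample = [
    {"A": 120.0, "building_type": "house"},
    {"A": 300.0, "building_type": "church"},
    {"A": 80.0, "building_type": None},
    {"A": 9.0, "building_type": "house"},
]
print(classify_buildings(sample))
```

Buildings left with type 0 are the ones handled by the later refinement and by the classifier in PPCA 6.0.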

Requirements:

  • A specific working environment on Python Link to environment
  • Output file from PPCA 3.0 ('osm_all_buildings_ind' (Polygon), OSM all buildings)
  • Output file from PPCA 2.0 ('osm_non_populated_areas' (Polygon), OSM land use data with non-populated areas)

Guide to run the script:

  • Fill 0.2 box

Output :

  • A geopackage file with a single layer
    • 'osm_all_buildings_res_type_with_null' (Polygon), osm buildings with residential classification

PPCA 5.0 Floor : Fill null values with decision tree classifier: Link to code

Description:

This script trains and evaluates a Decision Tree Classifier on OSM building data to estimate the number of floors per building ('FL'). The process begins by preparing the data and splitting it into training and testing subsets based on a specified training ratio. The classifier is then trained on the training set and its accuracy is evaluated on the test set. Next, the trained model is used to predict the number of floors for the OSM buildings whose 'FL' value is null. The output includes a new variable named 'FL_filled', which contains the original 'FL' values for non-null entries and model predictions for null entries. Additionally, the script visualizes the decision tree, maps the results, and explores how the classifier's accuracy varies with different proportions of training data, plotting accuracy as a function of the training data size.
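Two mechanical parts of this workflow, the ratio-based split and the construction of 'FL_filled', can be sketched independently of the classifier. In the sketch below, `split_by_ratio` and `fill_floors` are hypothetical names, `None` stands in for null values, and `predict` is any callable returning a floor count; a trained decision tree would be plugged in at that point.

```python
import random


def split_by_ratio(rows, train_ratio, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fill_floors(buildings, predict):
    """Add 'FL_filled': the original 'FL' when present, else the prediction."""
    filled = []
    for building in buildings:
        floors = building.get("FL")
        if floors is None:
            floors = predict(building)
        filled.append({**building, "FL_filled": floors})
    return filled


train, test = split_by_ratio(range(10), train_ratio=0.7)
print(len(train), len(test))

# Placeholder predictor used only to show the call; not a trained model.
print(fill_floors([{"FL": 2}, {"FL": None}], predict=lambda building: 3))
```

Keeping the original 'FL' column untouched and writing to a separate 'FL_filled' column makes it possible to tell observed values from predicted ones afterwards.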

Requirements:

  • A specific working environment on Python Link to environment
  • Output file from PPCA 4.0 ('osm_all_buildings_res_type_with_null' (Polygon), osm buildings with residential classification and null)

Guide to run the script:

  • Fill 0.2 box

Output :

  • A geopackage file with a single layer :
    • 'osm_all_buildings_FL_filled' (Polygon), osm buildings with number of floors filled by Decision Tree Classifier

PPCA 6.0 Residential & non-residential buildings : fill values with decision tree classifier: Link to code

Description:

This script trains and evaluates a Decision Tree Classifier on OSM building data. Initially, it splits the dataset into training and testing subsets based on a specified training ratio. It then trains the classifier using the training set and evaluates its accuracy on the test set. Subsequently, it applies the trained model to predict 'type' values for the OSM buildings where 'type' is missing. Within the output, a new variable named 'type_filled' is created with two modalities (1: residential or mixed-use; 2: non-residential). 'type_filled' takes the value of the OSM 'type' variable for non-null values, and the model prediction for null values. The script also visualizes the decision tree, maps the results, and examines how the classifier's accuracy varies with different proportions of training data, plotting the accuracy as a function of the training data size.
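The last step, measuring accuracy for several training proportions, can be sketched as a simple loop. To keep the example self-contained, the model here is a deliberately trivial majority-class stand-in, not a decision tree; the function names and the (features, label) sample layout are assumptions made for this illustration.

```python
import random
from collections import Counter


def majority_classifier(train_labels):
    """Stand-in model: always predicts the most frequent training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common_label


def accuracy_by_ratio(samples, ratios, seed=0):
    """Return {ratio: test accuracy} for a list of (features, label) samples."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    scores = {}
    for ratio in ratios:
        cut = int(len(shuffled) * ratio)
        train, test = shuffled[:cut], shuffled[cut:]
        if not train or not test:
            continue  # skip ratios that leave one subset empty
        predict = majority_classifier([label for _, label in train])
        hits = sum(predict(features) == label for features, label in test)
        scores[ratio] = hits / len(test)
    return scores


# Toy data in which every building has label 1 (residential or mixed-use).
samples = [((area,), 1) for area in range(20, 120, 10)]
print(accuracy_by_ratio(samples, ratios=[0.5, 0.8, 1.0]))
```

The resulting dictionary maps each usable training ratio to a test accuracy, which is the series a plot of accuracy against training size would display.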

Requirements:

  • A specific working environment on Python Link to environment
  • Output file from PPCA 5.0 ('osm_all_buildings_FL_filled' (Polygon), osm buildings with number of floors filled by Decision Tree Classifier)

Guide to run the script:

  • Fill 0.2 box

Output :

  • A geopackage file with a single layer :
    • 'osm_all_buildings_FL_type_filled' (Polygon), osm buildings with null residential classification values filled by Decision Tree Classifier

PPCA 7.0 : Population potential estimation per building Work in progress

PPCA 8.0 : Population potential estimation per catchment areas Work in progress

Acknowledgement

This resource was produced within the emc2 project, which is funded by ANR (France), FFG (Austria), MUR (Italy) and Vinnova (Sweden) under the Driving Urban Transition Partnership, which has been co-funded by the European Commission.

License

The emc2 project is licensed under Attribution-ShareAlike 4.0 International. See the LICENSE file for details.
