
Ubuntu 14.04 setup

Add additional software repositories

  • create a file /etc/apt/sources.list.d/nginx.list, replacing codename with your Ubuntu release codename (trusty for 14.04):
deb http://nginx.org/packages/ubuntu/ codename nginx
deb-src http://nginx.org/packages/ubuntu/ codename nginx
  • curl http://nginx.org/keys/nginx_signing.key | apt-key add -
  • apt-get update

Installing base packages

  • apt-get install git tig gdal-bin libgdal-dev python-dev python-virtualenv build-essential libyaml-dev libspatialindex-dev postgresql-9.3-postgis-2.1 nginx uwsgi uwsgi-plugin-python zip

Configure PostgreSQL

  • in file /etc/postgresql/9.3/main/postgresql.conf update:
    • shared_buffers = 512MB
    • temp_buffers = 16MB
    • work_mem = 32MB
    • maintenance_work_mem = 128MB
    • effective_cache_size = 1024MB
    • wal_buffers = 16MB
    • checkpoint_segments = 32
    • checkpoint_completion_target = 0.3
    • random_page_cost = 1.1 # AWS (SSD) specific
  • as the postgres user, create the ubuntu superuser:
    • sudo -u postgres createuser -s ubuntu

Auto-update planet.osm dataset

Prerequisites

  • create planet.osm directory: mkdir planet.osm && cd planet.osm
  • download planet.osm dataset: wget -c http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf
  • download and compile osm data utilities:
    • wget -O - http://m.m.i24.cc/osmconvert.c | cc -x c - -lz -O3 -o osmconvert
    • wget -O - http://m.m.i24.cc/osmupdate.c | cc -x c - -o osmupdate
    • wget -O - http://m.m.i24.cc/osmfilter.c | cc -x c - -O3 -o osmfilter
  • convert planet.osm to the o5m format (required by the other utilities)
    • ./osmconvert planet-latest.osm.pbf -o=planet-latest.osm.o5m
  • update planet-latest.osm.o5m
    • ./osmupdate -v planet-latest.osm.o5m new.planet-latest.osm.o5m
  • clean up
    • rm planet-latest.osm.pbf && mv new.planet-latest.osm.o5m planet-latest.osm.o5m
  • extract admin_levels
    • ./osmfilter planet-latest.osm.o5m --keep=admin_level -o=admin_levels.o5m
  • convert to the PBF format
    • ./osmconvert admin_levels.o5m -o=admin_levels.pbf
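
For convenience, the steps above can be collected into a single shell script. A minimal sketch, using exactly the commands listed above (run it from the directory that holds the compiled utilities):

#!/bin/bash
set -e
# resume-capable download of the latest planet file
wget -c http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf
# convert to o5m (required by osmupdate/osmfilter), then bring it up to date
./osmconvert planet-latest.osm.pbf -o=planet-latest.osm.o5m
./osmupdate -v planet-latest.osm.o5m new.planet-latest.osm.o5m
rm planet-latest.osm.pbf && mv new.planet-latest.osm.o5m planet-latest.osm.o5m
# keep only admin_level features and convert the result to PBF
./osmfilter planet-latest.osm.o5m --keep=admin_level -o=admin_levels.o5m
./osmconvert admin_levels.o5m -o=admin_levels.pbf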

Running auto-update script

  • in the posm/extractor directory, copy and update the configuration file: cp auto_update_osm.conf.tmpl auto_update_osm.conf
    • the configuration is straightforward: define the directory that holds the OSM data utilities and the planet.osm file
  • execute auto_update script: bash auto_update_osm.sh
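
To keep the dataset current without manual runs, the script can be scheduled with cron. A sketch, assuming the repository is checked out at /home/ubuntu/posm (the path and schedule are examples; adjust both):

# run the OSM auto-update every Sunday at 03:00
0 3 * * 0 cd /home/ubuntu/posm/extractor && bash auto_update_osm.sh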

Service setup

  • clone the git repository
  • initialize a Python virtual environment: virtualenv ~/posm_env
  • manually install packages (do not use pip install -r pip-requires.txt)
    • pip install Shapely==1.3.0 Rtree==0.7.0 PyYAML==3.11
    • Ubuntu-specific fix to install GDAL in the virtual env
      • pip install --no-install GDAL==1.10.0 && cd ~/posm_env/build/GDAL/ && python setup.py build_ext --include-dirs=/usr/include/gdal && pip install --no-download GDAL && cd -
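
Note that the pip commands above assume the virtual environment is active. A minimal sketch of the full sequence (activation uses the standard virtualenv activate script):

# activate the environment so pip installs into ~/posm_env
source ~/posm_env/bin/activate
pip install Shapely==1.3.0 Rtree==0.7.0 PyYAML==3.11
# GDAL must be pointed at the system GDAL headers on Ubuntu
pip install --no-install GDAL==1.10.0
cd ~/posm_env/build/GDAL/
python setup.py build_ext --include-dirs=/usr/include/gdal
pip install --no-download GDAL
cd -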

Extract configuration

  • create database and install extensions:

    • createdb posm
    • psql -c 'create extension postgis;' posm
    • psql -c 'create extension postgis_topology;' posm
  • create plpgsql functions:

    • psql -f extractor/postgis_sql/proc_functions.sql posm
  • in the posm/extractor directory, copy the template configuration files: cp admin_mapping.yaml.tmpl admin_mapping.yaml && cp settings.yaml.tmpl settings.yaml && cp admin_level_0.txt.tmpl admin_level_0.txt && cp admin_level_1.txt.tmpl admin_level_1.txt && cp admin_level_2.txt.tmpl admin_level_2.txt

  • in settings.yaml set:

    • osm_data_file to an admin_level OSM data source file (*.pbf) - the final result of the auto_update_osm.sh script
    • tempfile_dir to '/mnt'
      • to use the AWS SSD drive for temporary files, grant the ubuntu user write permission: sudo chown ubuntu /mnt/
    • memory_limit to '1' for AWS
      • due to the OSM data size (~500 MB), the script requires ~3 GB of memory to store temporary OSM data, plus ~1.5 GB for the actual Python processing
      • as the server has 3.7 GB, we limit memory usage to 1 MB, which forces the script to store temporary OSM data on disk
      • if you have plenty of memory, a rule of thumb is to set the limit to 3 times the size of the OSM .pbf dataset
      • also check the debug_file for messages like 'Not enough memory for temporary storage, ...' and increase the memory limit if needed
    • postgis to your postgis database identifier, i.e. "PG:dbname=posm", omitting other parameters
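
Put together, the relevant entries in settings.yaml for the AWS setup described above might look like the excerpt below (the osm_data_file path is an example; the other values are the ones discussed in this list):

# excerpt from settings.yaml
osm_data_file: /home/ubuntu/planet.osm/admin_levels.pbf  # output of auto_update_osm.sh (example path)
tempfile_dir: '/mnt'
memory_limit: '1'
postgis: 'PG:dbname=posm'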

Running the extraction

  • to extract admin_levels from the OSM dataset and import them into the PostGIS database, run:
    • python extract.py
  • after the extract finishes, the current directory will contain 6 new files, named admin_[0,1,2]_[new|missing].txt
    • these files facilitate manual change tracking: files suffixed new contain all new osm_id records that are not present in the base admin_level_[0,1,2].txt files; files suffixed missing contain osm_id records that were present in the base files but are missing from the new OSM dataset
    • if you want to track changes, the base files need to be updated manually (one way to do this is sketched below); note that base files are only useful when working with OSM data that covers the same area
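
As a hedged sketch of that manual update, the change files can be folded into a base file with standard bash tools (file names follow the convention above; shown for level 2, the same pattern applies to the other levels):

# append newly appeared osm_ids, drop the missing ones, keep the file sorted
sort -u admin_level_2.txt admin_2_new.txt \
  | comm -23 - <(sort admin_2_missing.txt) > admin_level_2.txt.new
mv admin_level_2.txt.new admin_level_2.txt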

Hierarchical Topological simplification

  • the simplification workflow consists of:
    • geometry deconstruction - a process which combines all admin_level geometries and creates an all_geom table of non-overlapping geometries, later used to create higher-level topo geometries
      • in a perfect dataset we would use admin_level_2 as a base topo geometry data and create admin_level_1 and admin_level_0
      • in the real world, we need to fill in holes in admin_level_2 by using higher level geometries to later create higher level topo geometries
    • topology creation - a process that creates base-level topo geometries from the all_geom data
    • geometry simplification - creates simple_admin_[0,1,2] tables that contain topologically simplified geometries; uses a tolerance parameter - the maximum distance between the original and simplified line in decimal degrees (https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm#Algorithm)
  • the final results are the simple_admin_[0,1,2]_view database views, which have an osm_id attribute, admin_level relationships, the feature name, and both natural and simplified geometries
  • to run the topological simplification process in the database, simply execute:
    • psql -f postgis_sql/simplify_admin_workflow.sql
    • bear in mind that the only two parameters which can be manually changed are:
      • fill_holes BOOLEAN DEFAULT 't' for the deconstruct_geometry() function - setting it to false assumes perfect data with no holes or bad geometries; OSM is not a perfect dataset, so we must fill in missing data
      • tolerance float DEFAULT 0.01 of a degree for the simplify_dissolve() function
        • if the Earth circumference is ~40000km then 1 degree ~ 111km, 0.1 of a degree ~ 11km, 0.01 of a degree ~ 1km
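
The workflow file remains the entry point for these changes; purely as a hedged sketch (assuming simplify_admin_workflow.sql invokes the two functions with literal arguments - check the actual call sites in that file first), the tunable calls would look like:

-- fill holes during deconstruction (the default), then simplify at 0.001 of a degree
SELECT deconstruct_geometry('t');
SELECT simplify_dissolve(0.001);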

GeoJSON extraction

  • to extract natural and simplified geometries from the database to a ZIP file (created in the current working directory) with GeoJSON files, run:
    • python generate_geojson.py --all --rm
  • to extract one or more specific countries, you can specify them on the command line
    • python generate_geojson.py 88210 87565 --rm

How to update .poly cut files

  • get an updated admin_levels.pbf file and copy it to ~/planet
  • extract the .poly files: python extract_poly.py --buffer 0.05 --simplify 0.01 --settings planet.yaml --geojson

Final remarks

  • the extraction and simplification process depends only on the initial OSM dataset

  • the database will be flushed/purged on every process run

  • the extraction and simplification process is CPU bound and it can take quite some time

  • on a modern i7 laptop (16 GB of RAM), using a tolerance of 0.01 of a degree for the prepared Africa dataset:

    • a prepared dataset in this context is a stripped-down OSM dataset which contains only admin_level features
    • extract.py ~ 3 min
    • hierarchical topology simplification:
      • geometry deconstruction ~ 80 sec
      • topology creation ~ 1h 50 min
      • geometry simplification ~ 30 sec
    • generate_geojson.py ~ 2min
  • on a modern i7 laptop (16 GB of RAM), using a tolerance of 0.1 of a degree for the prepared Africa dataset:

    • extract.py ~ 3 min
    • hierarchical topology simplification:
      • geometry deconstruction ~ 80 sec
      • topology creation ~ 1h 50 min
      • geometry simplification ~ 20 sec
    • generate_geojson.py ~ 2 min
  • on the AWS instance provided by Nyaruka, using a tolerance of 0.01 of a degree for the prepared World dataset:

    • extract.py ~ 50 min
    • hierarchical topology simplification:
      • geometry deconstruction ~ 1h 30m
      • topology creation ~ 78h 40 min
      • geometry simplification ~ 6 minutes
    • generate_geojson.py ~ 5 min

Management commands

Overview of command parameters

  • --verbose - show verbose process execution messages
  • --settings SETTINGS - path to the settings file, default: settings.yaml
  • run_all - update OSM data, extract admin levels, simplify admin levels
    • --tolerance TOLERANCE - tolerance parameter for the Douglas-Peucker simplification algorithm (default: 0.001)
  • update_data - update OSM data
  • extract_and_simplify - extract admin levels, simplify admin levels
    • --tolerance TOLERANCE - tolerance parameter for the Douglas-Peucker simplification algorithm (default: 0.001)
  • download_OSM data_url - downloads OSM data from the specified HTTP URI
  • create_DB - creates a new PostGIS database and loads functions
  • init_dir - initializes an empty data directory and compiles the OSM utility programs
  • cut_data planetOSM - cuts OSM data using the specified planet.osm file in O5M format

General manage.py concepts

manage.py is a utility that automates common tasks when working with the POSM utilities. Everything is designed around the settings.yaml file, which enables you to keep several settings files, each for a specific area. For example, one might have a world.yaml file that processes the planet.osm file and a nigeria.yaml that limits processing to a specific area.

The most important settings parameters are data_directory, poly_file and postgis. data_directory holds every required and intermediate file generated by the process. poly_file defines a specific area of interest; it is used to cut that area out of the planet.osm dataset and later to cut OSM changes for the same area. If poly_file is not defined, everything is applied to the whole world. postgis specifies the database used to process the dataset. It's best to have a separate database for each settings file, as processing regularly drops tables and data.
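
As a sketch, a minimal per-area settings file built around these three parameters might look like this (the data_directory path and database name are hypothetical examples; the .poly file is the Nigeria sample used later in this page):

# nigeria.yaml - example area-specific settings
data_directory: /home/ubuntu/posm_data/nigeria
poly_file: poly/NG_R192787.poly
postgis: 'PG:dbname=posm_nigeria'  # dedicated DB: processing drops tables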

If a subcommand fails, the script immediately terminates execution and outputs error information.

Some of the commands can take a long time to finish, so it's best to run management commands inside tmux or screen.
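
For example, with tmux a long run can be detached and resumed later (the session name posm is arbitrary; world.yaml is the example settings file mentioned above):

tmux new -s posm
python manage.py --settings=world.yaml run_all --tolerance=0.001
# detach with Ctrl-b d; reattach later with:
tmux attach -t posm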

Usage examples

A fresh start

  1. create a new_settings.yaml file (copy from settings.yaml.tmpl) and manually update specific settings (do not specify the poly_file parameter)
  2. initialize an empty data_directory: python manage.py --settings=new_settings.yaml init_dir
    • this command will try to create data_directory and compile the OSM management utilities (osmconvert, osmupdate and osmfilter)
  3. create and initialize the database: python manage.py --settings=new_settings.yaml create_DB
  4. download the initial OSM dataset: python manage.py --settings=new_settings.yaml download_OSM http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf
    • internally we use wget -c, which should resume broken downloads
    • after the download, the file is converted to the O5M format used by the OSM utilities
  5. execute the process: python manage.py --settings=new_settings.yaml run_all --tolerance=0.001
    • run_all will download and apply the latest updates to the original planet.osm, and create a new planet.osm
    • after the update, admin_level data is extracted and converted to the .pbf format
    • admin_levels are then extracted from the PBF file and inserted into the database
    • in the database, the geometry is deconstructed, the base topology is created, and geometries are simplified using the tolerance parameter of 0.001
    • finally, simplified geometries are exported to the exported_geojson.zip archive, and problems.geojson is written to the current directory

Once everything is set up, you only need to execute step 5 whenever you want to update the simplified admin_level dataset.
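
The whole fresh-start sequence in one copy-paste block (the settings file name matches the steps above):

python manage.py --settings=new_settings.yaml init_dir
python manage.py --settings=new_settings.yaml create_DB
python manage.py --settings=new_settings.yaml download_OSM http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf
python manage.py --settings=new_settings.yaml run_all --tolerance=0.001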

A single country, Nigeria

  1. create a nigeria_settings.yaml file (copy from settings.yaml.tmpl) and manually update specific settings; poly_file is required in this case, and we set it to poly/NG_R192787.poly
    • there are sample .poly files in the extractor/poly directory, generated by the extract_poly.py utility
  2. initialize an empty data_directory: python manage.py --settings=nigeria_settings.yaml init_dir
    • this command will try to create data_directory and compile the OSM management utilities (osmconvert, osmupdate and osmfilter)
  3. create and initialize the database: python manage.py --settings=nigeria_settings.yaml create_DB
  4. cut Nigeria out of the planet.osm dataset: python manage.py --settings=nigeria_settings.yaml cut_data /somewhere/planet-latest.o5m
    • the specified planet.osm is cut using poly_file, creating a new dataset of the specified area
  5. execute the process: python manage.py --settings=nigeria_settings.yaml run_all --tolerance=0.001
    • run_all will download and apply the latest updates to the cut dataset, and create a new cut dataset using the specified poly_file
      • poly_file limits changes for the whole world to the area specified by the poly file
    • after the update, admin_level data is extracted and converted to the .pbf format
    • admin_levels are then extracted from the PBF file and inserted into the database
    • in the database, the geometry is deconstructed, the base topology is created, and geometries are simplified using the tolerance parameter of 0.001
    • finally, simplified geometries are exported to the exported_geojson.zip archive, and problems.geojson is written to the current directory

Importing GADM datasets - Vietnam

  1. create vietnam.settings.yaml and specify the gadm_source config, with shp_package pointing to the downloaded GADM SHP package
  2. create and initialize the database: python manage.py --settings=vietnam.settings.yaml create_DB
  3. execute the process: python manage.py --settings=vietnam.settings.yaml extract_and_simplify_gadm --tolerance=0.01

Importing GeoJSON datasets

  1. Download admin boundary files from: https://wambachers-osm.website/boundaries/
  2. Create yaml file, use nepal/nepal.yaml as template, changing geojson_source to point to the appropriate files for each level
  3. create and initialize the database: python manage.py --settings=nepal.yaml create_DB
  4. execute the process: python manage.py --settings=nepal.yaml extract_and_simplify_geojson --tolerance=0.01
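
The per-level layout of geojson_source is best taken from nepal/nepal.yaml itself; purely as a hypothetical sketch of the idea (key names and paths are illustrative, not confirmed against the template):

# hypothetical layout - check nepal/nepal.yaml for the real key names
geojson_source:
  admin_level_0: data/nepal_admin_0.geojson
  admin_level_1: data/nepal_admin_1.geojson
  admin_level_2: data/nepal_admin_2.geojson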

Straight from Overpass

  1. Find the relation id for the country on OpenStreetMap.
  2. Create yaml file, use nepal/nepal.yaml as template
  3. create and initialize the database: python manage.py --settings=nepal.yaml create_DB
  4. execute the process: python manage.py --settings=nepal.yaml extract_and_simplify_overpass [country_relation_id]

Scripts and utilities

  • manage.py
    • enables automated execution of common POSM tasks
  • extract.py
    • reads admin_levels.pbf file, parses geometry and features, and writes parsed features to the database
  • extract_all.py
    • reads admin_levels.pbf file, writes SHP files - admin_level_[1..10]
  • extract_poly.py
    • reads the admin_level_0 table from the database, applies buffering and simplification, and generates a .poly file for every row in the table
  • generate_geojson.py
    • exports simplified data from the database in GeoJSON format
  • prepare_topojson.py
    • exports simplified data from the database as GeoJSON and converts it to TopoJSON (requires the node.js topojson utility to be globally available)