- create the file `/etc/apt/sources.list.d/nginx.list` (replace `codename` with your Ubuntu release codename, e.g. `trusty`):

```
deb http://nginx.org/packages/ubuntu/ codename nginx
deb-src http://nginx.org/packages/ubuntu/ codename nginx
```

- add the nginx signing key, then update and install the required packages:

```
curl http://nginx.org/keys/nginx_signing.key | apt-key add -
apt-get update
apt-get install git tig gdal-bin libgdal-dev python-dev python-virtualenv build-essential libyaml-dev libspatialindex-dev postgresql-9.3-postgis-2.1 nginx uwsgi uwsgi-plugin-python zip
```
- in `/etc/postgresql/9.3/main/postgresql.conf` update:

```
shared_buffers = 512MB
temp_buffers = 16MB
work_mem = 32MB
maintenance_work_mem = 128MB
effective_cache_size = 1024MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.3
random_page_cost = 1.1 # AWS specific
```
- as the `postgres` user, create the `ubuntu` superuser: `createuser -s ubuntu`
- create a planet.osm directory: `mkdir planet.osm && cd planet.osm`
- download the planet.osm dataset: `wget -c http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf`
- download and compile the OSM data utilities:

```
wget -O - http://m.m.i24.cc/osmconvert.c | cc -x c - -lz -O3 -o osmconvert
wget -O - http://m.m.i24.cc/osmupdate.c | cc -x c - -o osmupdate
wget -O - http://m.m.i24.cc/osmfilter.c | cc -x c - -O3 -o osmfilter
```
- convert planet.osm to the o5m format (required by the other utilities): `./osmconvert planet-latest.osm.pbf -o=planet-latest.osm.o5m`
- update planet-latest.osm.o5m: `./osmupdate -v planet-latest.osm.o5m new.planet-latest.osm.o5m`
- clean up: `rm planet-latest.osm.pbf && mv new.planet-latest.osm.o5m planet-latest.osm.o5m`
- extract admin_levels: `./osmfilter planet-latest.osm.o5m --keep=admin_level -o=admin_levels.o5m`
- convert to the PBF format: `./osmconvert admin_levels.o5m -o=admin_levels.pbf`
- in the `posm/extractor` directory, copy and update the configuration file: `cp auto_update_osm.conf.tmpl auto_update_osm.conf`
- the configuration is straightforward: just define the directory that holds the OSM data utilities and the planet.osm file, as in the sketch below
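A hypothetical sketch of such a configuration (the actual variable names come from `auto_update_osm.conf.tmpl` and may differ from the ones shown here):

```
# hypothetical variable names - check auto_update_osm.conf.tmpl for the real ones
UTILS_DIR=/home/ubuntu/planet.osm                            # osmconvert/osmupdate/osmfilter location
PLANET_FILE=/home/ubuntu/planet.osm/planet-latest.osm.o5m    # planet.osm dataset
```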
- execute the auto_update script: `bash auto_update_osm.sh`
- clone the git repository
- initialize and activate a Python virtual environment:

```
virtualenv ~/posm_env
source ~/posm_env/bin/activate
```

- manually install packages (do not use `pip install -r pip-requires.txt`): `pip install Shapely==1.3.0 Rtree==0.7.0 PyYAML==3.11`
- Ubuntu-specific fix to install GDAL in the virtualenv: `pip install --no-install GDAL==1.10.0 && cd ~/posm_env/build/GDAL/ && python setup.py build_ext --include-dirs=/usr/include/gdal && pip install --no-download GDAL && cd -`
- create the database and install the extensions:

```
createdb posm
psql -c 'create extension postgis;' posm
psql -c 'create extension postgis_topology;' posm
```

- create the plpgsql functions: `psql -f extractor/postgis_sql/proc_functions.sql posm`
- in the `posm/extractor` directory, copy the template YAML configuration files:

```
cp admin_mapping.yaml.tmpl admin_mapping.yaml
cp settings.yaml.tmpl settings.yaml
cp admin_level_0.txt.tmpl admin_level_0.txt
cp admin_level_1.txt.tmpl admin_level_1.txt
cp admin_level_2.txt.tmpl admin_level_2.txt
```

- in settings.yaml (see the sketch after this list) set:
  - `osm_data_file` to an admin_level OSM data source file (*.pbf), the final result of the `auto_update_osm.sh` script
  - `tempfile_dir` to `'/mnt'`
    - to use the AWS SSD drive to store temporary files we need to grant write permission to the ubuntu user: `sudo chown ubuntu /mnt/`
  - `memory_limit` to `'1'` for AWS
    - due to the OSM data size (~500MB), the script requires ~3GB of memory to store temporary OSM data, plus another ~1.5GB for the actual Python processing
    - as the server has 3.7GB, we limit memory usage to 1MB, which forces the script to store temporary OSM data on disk
    - if you have plenty of memory, a rule of thumb is to set it to 3 times the size of the OSM .pbf dataset
    - also check the `debug_file` for messages like 'Not enough memory for temporary storage, ...' and increase the memory limit if needed
  - `postgis` to your postgis database identifier, i.e. `"PG:dbname=posm"`, omitting other parameters
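A minimal settings.yaml sketch covering just the keys discussed above (values are the ones from this walkthrough; all other keys from `settings.yaml.tmpl` are omitted):

```yaml
osm_data_file: admin_levels.pbf   # the result of auto_update_osm.sh
tempfile_dir: '/mnt'              # AWS SSD drive
memory_limit: '1'                 # force temporary OSM data to disk
postgis: "PG:dbname=posm"
```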
- to extract admin_levels from the OSM dataset and import them into the postgis database, run: `python extract.py`
- after the extract finishes, the current directory will contain 6 new files, named `admin_[0,1,2]_[new|missing].txt`
- these files facilitate manual change tracking: files suffixed `new` contain all the new osm_id records that are not present in the base `admin_level_[0,1,2].txt` files, while files suffixed `missing` contain osm_id records that were present in the base files but are now missing from the new OSM dataset
- if you want to track changes, the base files need to be manually updated (a sketch of one way to do this follows); note that base files are only useful when working with OSM data that covers the same area
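A minimal sketch of merging newly appeared records back into a base file, assuming the files are plain osm_id lists as described above (admin level 0 shown; repeat for levels 1 and 2):

```
cat admin_0_new.txt                              # review the new osm_ids first
cat admin_0_new.txt >> admin_level_0.txt         # then fold them into the base file
sort -u admin_level_0.txt -o admin_level_0.txt   # drop any duplicates
```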
- the simplification workflow consists of:
  - geometry deconstruction - a process that combines all admin_level geometries and creates an `all_geom` table of non-overlapping geometries, which are later used to create higher-level topo geometries
    - in a perfect dataset we would use admin_level_2 as the base topo geometry data and derive admin_level_1 and admin_level_0 from it
    - in the real world, we need to fill the holes in admin_level_2 with higher-level geometries before creating the higher-level topo geometries
  - topology creation - a process that creates the base_level topo geometries, using data from `all_geom`
  - geometry simplification - creates `simple_admin_[0,1,2]` tables that contain topologically simplified geometries; it uses a tolerance parameter, the maximum distance between the original and the simplified line in decimal degrees (https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm#Algorithm)
- the final results are `simple_admin_[0,1,2]_view` database views that have an osm_id attribute, admin_level relationships, the feature name, and both natural and simplified geometries
- to run the topological simplification process in the database, simply execute: `psql -f postgis_sql/simplify_admin_workflow.sql`
- bear in mind that the only two parameters that can be manually changed are:
  - `fill_holes BOOLEAN DEFAULT 't'` for the `deconstruct_geometry()` function
    - setting it to false assumes perfect data with no holes and no bad geometries; OSM is not a perfect dataset, so we must fill in the missing data
  - `tolerance float DEFAULT 0.01` (of a degree) for the `simplify_dissolve()` function
    - since the Earth's circumference is ~40000km, 1 degree ~ 111km, 0.1 of a degree ~ 11km, and 0.01 of a degree ~ 1km
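To override these defaults, edit the function calls in `postgis_sql/simplify_admin_workflow.sql`; purely as a hedged sketch, direct invocations might look like the following (this assumes both functions can be called as ordinary SQL functions on the `posm` database - check the workflow file for the actual call sites):

```
psql -c "SELECT deconstruct_geometry('t');" posm   # fill_holes = true
psql -c "SELECT simplify_dissolve(0.001);" posm    # tolerance = 0.001 degrees
```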
- to extract natural and simplified geometries from the database into a ZIP file with geoJSON files (created in the current working directory), run: `python generate_geojson.py --all --rm`
- to extract one or more specific countries, specify them on the command line: `python generate_geojson.py 88210 87565 --rm`
- get the updated admin_levels.pbf file and copy it to `~/planet`
- extract .poly files: `python extract_poly.py --buffer 0.05 --simplify 0.01 --settings planet.yaml --geojson`
- the extraction and simplification process depends only on the initial OSM dataset
- the database will be flushed/purged on every process run
- the extraction and simplification process is CPU bound and can take quite some time
- on a modern i7 laptop (16GB of RAM), using a tolerance of 0.01 of a degree for a prepared Africa dataset:
  - a prepared dataset in this context is a stripped-down OSM dataset that contains only admin_level features
  - export.py ~ 3 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 80 sec
    - topology creation ~ 1 h 50 min
    - geometry simplification ~ 30 sec
  - generate_geojson.py ~ 2 min
- on a modern i7 laptop (16GB of RAM), using a tolerance of 0.1 of a degree for a prepared Africa dataset:
  - export.py ~ 3 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 80 sec
    - topology creation ~ 1 h 50 min
    - geometry simplification ~ 20 sec
  - generate_geojson.py ~ 2 min
- on the AWS instance provided by Nyaruka, using a tolerance of 0.01 of a degree for a prepared World dataset:
  - export.py ~ 50 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 1 h 30 min
    - topology creation ~ 78 h 40 min
    - geometry simplification ~ 6 min
  - generate_geojson.py ~ 5 min
- `--verbose` - show verbose process execution messages
- `--settings SETTINGS` - path to the settings file (default: settings.yaml)
- `run_all` - update OSM data, extract admin_levels, simplify admin_levels
  - `--tolerance TOLERANCE` - tolerance parameter for the Douglas-Peucker simplification algorithm (default: 0.001)
- `update_data` - update OSM data
- `extract_and_simplify` - extract admin_levels, simplify admin_levels
  - `--tolerance TOLERANCE` - tolerance parameter for the Douglas-Peucker simplification algorithm (default: 0.001)
- `download_OSM data_url` - downloads OSM data using the specified OSM data HTTP URI
- `create_DB` - creates a new PostGIS database and loads the functions
- `init_dir` - initializes an empty data directory and compiles the OSM utility programs
- `cut_data planetOSM` - cuts OSM data using the specified planet osm file in O5M format
`manage.py` is a utility that automates common tasks when working with the POSM utilities. Everything is designed around the settings.yaml file. This enables you to have several settings.yaml files, each for a specific area. For example, one might have a world.yaml file that processes the planet.osm file and a nigeria.yaml that limits processing to a specific area.

The most important settings parameters are `data_directory`, `poly_file` and `postgis`. `data_directory` contains every required and intermediate file generated by the process. `poly_file` defines a specific area of interest; it is used to cut that area out of the planet.osm dataset and, later, to cut OSM changes down to the specific area. If poly_file is not defined, everything is applied to the whole world. `postgis` specifies the database which will be used to process the dataset. It's best to have a separate database for each settings file, as processing regularly drops tables and data.

If a subcommand fails, the script will immediately terminate execution and output error information.

Some of the commands can take a long time to finish, so it's best to use `tmux` or `screen` to execute management commands.
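A minimal sketch of these three keys (values are illustrative; `poly/NG_R192787.poly` is the sample file used in the Nigeria example below):

```yaml
# illustrative values; copy the full key set from settings.yaml.tmpl
data_directory: /home/ubuntu/posm_data
poly_file: poly/NG_R192787.poly     # omit to process the whole world
postgis: "PG:dbname=posm_nigeria"
```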
- create a new_settings.yaml file (copy from settings.yaml.tmpl) and manually update the specific settings (do not specify the poly_file parameter)
- initialize an empty data_directory: `python manage.py --settings=new_settings.yaml init_dir`
  - this command will try to create the data_directory and compile the OSM management utilities (osmconvert, osmupdate and osmfilter)
- create and initialize the database: `python manage.py --settings=new_settings.yaml create_DB`
- download the initial OSM dataset: `python manage.py --settings=new_settings.yaml download_OSM http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf`
  - internally we use `wget -c`, which should resume broken downloads
  - after the download, the file will be converted to the O5M format used by the OSM utilities
- execute the process: `python manage.py --settings=new_settings.yaml run_all --tolerance=0.001`
  - run_all will download and apply the latest updates to the original planet.osm and create a new planet.osm
  - after the update, admin_level data will be extracted and converted to the .pbf format
  - admin_levels are then extracted from the PBF file and inserted into the database
  - in the database, the geometry is deconstructed and the base topology is created, using the tolerance parameter of 0.001
  - finally, simplified geometries are exported to the exported_geojson.zip archive, and problems.geojson is available in the current directory

Once you have set everything up, you only need to execute step 5 if you want to update the simplified admin_level dataset.
- create a nigeria_settings.yaml file (copy from settings.yaml.tmpl) and manually update the specific settings; poly_file is required in this case, and we will set it to `poly/NG_R192787.poly`
  - there are sample .poly files in the extractor/poly directory, generated by the `extract_poly.py` utility
- initialize an empty data_directory: `python manage.py --settings=nigeria_settings.yaml init_dir`
  - this command will try to create the data_directory and compile the OSM management utilities (osmconvert, osmupdate and osmfilter)
- create and initialize the database: `python manage.py --settings=nigeria_settings.yaml create_DB`
- cut Nigeria out of the planet.osm dataset: `python manage.py --settings=nigeria_settings.yaml cut_data /somewhere/planet-latest.o5m`
  - the specified planet.osm is cut using the poly_file, creating a new dataset for the specified area
- execute the process: `python manage.py --settings=nigeria_settings.yaml run_all --tolerance=0.001`
  - run_all will download and apply the latest updates to the cut dataset, and create a new cut dataset using the specified poly_file
    - the poly_file is used to limit the whole-world changes to the area specified by the poly file
  - after the update, admin_level data will be extracted and converted to the .pbf format
  - admin_levels are then extracted from the PBF file and inserted into the database
  - in the database, the geometry is deconstructed and the base topology is created, using the tolerance parameter of 0.001
  - finally, simplified geometries are exported to the exported_geojson.zip archive, and problems.geojson is available in the current directory
- create vietnam.settings.yaml and specify the `gadm_source` config with `shp_package` pointing to the downloaded GADM SHP package (see the sketch after these steps)
- create and initialize the database: `python manage.py --settings=vietnam.settings.yaml create_DB`
- execute the process: `python manage.py --settings=vietnam.settings.yaml extract_and_simplify_gadm --tolerance=0.01`
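Purely as a hypothetical sketch of the `gadm_source` config mentioned above (the key nesting and the package path are assumptions, not taken from the repository - verify against a working settings file):

```yaml
# hypothetical sketch
gadm_source:
  shp_package: /home/ubuntu/downloads/VNM_adm_shp.zip   # downloaded GADM SHP package
```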
- download admin boundary files from https://wambachers-osm.website/boundaries/
- create a yaml file, using `nepal/nepal.yaml` as a template, changing `geojson_source` to point to the appropriate files for each level
- create and initialize the database: `python manage.py --settings=nepal.yaml create_DB`
- execute the process: `python manage.py --settings=nepal.yaml extract_and_simplify_geojson --tolerance=0.01`
- find the relation id for the country on OpenStreetMap
- create a yaml file, using `nepal/nepal.yaml` as a template
- create and initialize the database: `python manage.py --settings=nepal.yaml create_DB`
- execute the process: `python manage.py --settings=nepal.yaml extract_and_simplify_overpass [country_relation_id]`
- `manage.py` - enables automated execution of common POSM tasks
- `extract.py` - reads the admin_levels.pbf file, parses geometry and features, and writes the parsed features to the database
- `extract_all.py` - reads the admin_levels.pbf file and writes SHP files for admin_level_[1..10]
- `extract_poly.py` - reads the admin_level_0 table from the database, applies buffering and simplification, and generates a .poly file for every row in the table
- `generate_geojson.py` - exports simplified data from the database in geojson format
- `prepare_topojson.py` - exports simplified data from the database as geojson and converts it to topojson (requires node.js topojson to be globally available)