- create the file `/etc/apt/sources.list.d/nginx.list` (replace `codename` with your Ubuntu release codename, e.g. `trusty`):

```
deb http://nginx.org/packages/ubuntu/ codename nginx
deb-src http://nginx.org/packages/ubuntu/ codename nginx
```

- add the nginx signing key, then update and install the required packages:

```
curl http://nginx.org/keys/nginx_signing.key | apt-key add -
apt-get update
apt-get install git tig gdal-bin libgdal-dev python-dev python-virtualenv build-essential libyaml-dev libspatialindex-dev postgresql-9.3-postgis-2.1 nginx uwsgi uwsgi-plugin-python zip
```
- in `/etc/postgresql/9.3/main/postgresql.conf` update:

```
shared_buffers = 512MB
temp_buffers = 16MB
work_mem = 32MB
maintenance_work_mem = 128MB
effective_cache_size = 1024MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.3
random_page_cost = 1.1 # AWS specific
```
- as the `postgres` user, create the `ubuntu` superuser: `createuser -s ubuntu`
- create a planet.osm directory: `mkdir planet.osm && cd planet.osm`
- download the planet.osm dataset: `wget -c http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf`
- download and compile the OSM data utilities:

```
wget -O - http://m.m.i24.cc/osmconvert.c | cc -x c - -lz -O3 -o osmconvert
wget -O - http://m.m.i24.cc/osmupdate.c | cc -x c - -o osmupdate
wget -O - http://m.m.i24.cc/osmfilter.c | cc -x c - -O3 -o osmfilter
```
- convert planet.osm to the o5m format (required by the other utilities): `./osmconvert planet-latest.osm.pbf -o=planet-latest.osm.o5m`
- update planet-latest.osm.o5m: `./osmupdate -v planet-latest.osm.o5m new.planet-latest.osm.o5m`
- clean up: `rm planet-latest.osm.pbf && mv new.planet-latest.osm.o5m planet-latest.osm.o5m`
- extract admin_levels: `./osmfilter planet-latest.osm.o5m --keep=admin_level -o=admin_levels.o5m`
- convert to the PBF format: `./osmconvert admin_levels.o5m -o=admin_levels.pbf`
- in the `posm/extractor` directory, copy and update the configuration file: `cp auto_update_osm.conf.tmpl auto_update_osm.conf`
- the configuration is straightforward: just define the directory that holds the OSM data utilities and the planet.osm file, as in the sketch below
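A hypothetical sketch of such a configuration (the actual variable names come from `auto_update_osm.conf.tmpl` and may differ from the ones shown here):

```
# hypothetical variable names - check auto_update_osm.conf.tmpl for the real ones
UTILS_DIR=/home/ubuntu/planet.osm                            # osmconvert/osmupdate/osmfilter location
PLANET_FILE=/home/ubuntu/planet.osm/planet-latest.osm.o5m    # planet.osm dataset
```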
- execute the auto_update script: `bash auto_update_osm.sh`
- clone the git repository
- initialize and activate a Python virtual environment:

```
virtualenv ~/posm_env
source ~/posm_env/bin/activate
```

- manually install packages (do not use `pip install -r pip-requires.txt`): `pip install Shapely==1.3.0 Rtree==0.7.0 PyYAML==3.11`
- Ubuntu-specific fix to install GDAL in the virtualenv: `pip install --no-install GDAL==1.10.0 && cd ~/posm_env/build/GDAL/ && python setup.py build_ext --include-dirs=/usr/include/gdal && pip install --no-download GDAL && cd -`
- create the database and install the extensions:

```
createdb posm
psql -c 'create extension postgis;' posm
psql -c 'create extension postgis_topology;' posm
```

- create the plpgsql functions: `psql -f extractor/postgis_sql/proc_functions.sql posm`
- in the `posm/extractor` directory, copy the template YAML configuration files:

```
cp admin_mapping.yaml.tmpl admin_mapping.yaml
cp settings.yaml.tmpl settings.yaml
cp admin_level_0.txt.tmpl admin_level_0.txt
cp admin_level_1.txt.tmpl admin_level_1.txt
cp admin_level_2.txt.tmpl admin_level_2.txt
```

- in settings.yaml (see the sketch after this list) set:
  - `osm_data_file` to an admin_level OSM data source file (*.pbf), the final result of the `auto_update_osm.sh` script
  - `tempfile_dir` to `'/mnt'`
    - to use the AWS SSD drive to store temporary files we need to grant write permission to the ubuntu user: `sudo chown ubuntu /mnt/`
  - `memory_limit` to `'1'` for AWS
    - due to the OSM data size (~500MB), the script requires ~3GB of memory to store temporary OSM data, plus another ~1.5GB for the actual Python processing
    - as the server has 3.7GB, we limit memory usage to 1MB, which forces the script to store temporary OSM data on disk
    - if you have plenty of memory, a rule of thumb is to set it to 3 times the size of the OSM .pbf dataset
    - also check the `debug_file` for messages like 'Not enough memory for temporary storage, ...' and increase the memory limit if needed
  - `postgis` to your postgis database identifier, i.e. `"PG:dbname=posm"`, omitting other parameters
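A minimal settings.yaml sketch covering just the keys discussed above (values are the ones from this walkthrough; all other keys from `settings.yaml.tmpl` are omitted):

```yaml
osm_data_file: admin_levels.pbf   # the result of auto_update_osm.sh
tempfile_dir: '/mnt'              # AWS SSD drive
memory_limit: '1'                 # force temporary OSM data to disk
postgis: "PG:dbname=posm"
```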
- to extract admin_levels from the OSM dataset and import them into the postgis database, run: `python extract.py`
- after the extract finishes, the current directory will contain 6 new files, named `admin_[0,1,2]_[new|missing].txt`
- these files facilitate manual change tracking: files suffixed `new` contain all the new osm_id records that are not present in the base `admin_level_[0,1,2].txt` files, while files suffixed `missing` contain osm_id records that were present in the base files but are now missing from the new OSM dataset
- if you want to track changes, the base files need to be manually updated (a sketch of one way to do this follows); note that base files are only useful when working with OSM data that covers the same area
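A minimal sketch of merging newly appeared records back into a base file, assuming the files are plain osm_id lists as described above (admin level 0 shown; repeat for levels 1 and 2):

```
cat admin_0_new.txt                              # review the new osm_ids first
cat admin_0_new.txt >> admin_level_0.txt         # then fold them into the base file
sort -u admin_level_0.txt -o admin_level_0.txt   # drop any duplicates
```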
- the simplification workflow consists of:
  - geometry deconstruction - a process that combines all admin_level geometries and creates an `all_geom` table of non-overlapping geometries, which are later used to create higher-level topo geometries
    - in a perfect dataset we would use admin_level_2 as the base topo geometry data and derive admin_level_1 and admin_level_0 from it
    - in the real world, we need to fill the holes in admin_level_2 with higher-level geometries before creating the higher-level topo geometries
  - topology creation - a process that creates the base_level topo geometries, using data from `all_geom`
  - geometry simplification - creates `simple_admin_[0,1,2]` tables that contain topologically simplified geometries; it uses a tolerance parameter, the maximum distance between the original and the simplified line in decimal degrees (https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm#Algorithm)
- the final results are `simple_admin_[0,1,2]_view` database views that have an osm_id attribute, admin_level relationships, the feature name, and both natural and simplified geometries
- to run the topological simplification process in the database, simply execute: `psql -f postgis_sql/simplify_admin_workflow.sql`
- bear in mind that the only two parameters that can be manually changed are:
  - `fill_holes BOOLEAN DEFAULT 't'` for the `deconstruct_geometry()` function
    - setting it to false assumes perfect data with no holes and no bad geometries; OSM is not a perfect dataset, so we must fill in the missing data
  - `tolerance float DEFAULT 0.01` (of a degree) for the `simplify_dissolve()` function
    - since the Earth's circumference is ~40000km, 1 degree ~ 111km, 0.1 of a degree ~ 11km, and 0.01 of a degree ~ 1km
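To override these defaults, edit the function calls in `postgis_sql/simplify_admin_workflow.sql`; purely as a hedged sketch, direct invocations might look like the following (this assumes both functions can be called as ordinary SQL functions on the `posm` database - check the workflow file for the actual call sites):

```
psql -c "SELECT deconstruct_geometry('t');" posm   # fill_holes = true
psql -c "SELECT simplify_dissolve(0.001);" posm    # tolerance = 0.001 degrees
```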
- to extract natural and simplified geometries from the database into a ZIP file with geoJSON files (created in the current working directory), run: `python generate_geojson.py --all --rm`
- to extract one or more specific countries, specify them on the command line: `python generate_geojson.py 88210 87565 --rm`
- get the updated admin_levels.pbf file and copy it to `~/planet`
- extract .poly files: `python extract_poly.py --buffer 0.05 --simplify 0.01 --settings planet.yaml --geojson`
- the extraction and simplification process depends only on the initial OSM dataset
- the database will be flushed/purged on every process run
- the extraction and simplification process is CPU bound and can take quite some time
- on a modern i7 laptop (16GB of RAM), using a tolerance of 0.01 of a degree for a prepared Africa dataset:
  - a prepared dataset in this context is a stripped-down OSM dataset that contains only admin_level features
  - export.py ~ 3 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 80 sec
    - topology creation ~ 1 h 50 min
    - geometry simplification ~ 30 sec
  - generate_geojson.py ~ 2 min
- on a modern i7 laptop (16GB of RAM), using a tolerance of 0.1 of a degree for a prepared Africa dataset:
  - export.py ~ 3 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 80 sec
    - topology creation ~ 1 h 50 min
    - geometry simplification ~ 20 sec
  - generate_geojson.py ~ 2 min
- on the AWS instance provided by Nyaruka, using a tolerance of 0.01 of a degree for a prepared World dataset:
  - export.py ~ 50 min
  - hierarchical topology simplification:
    - geometry deconstruction ~ 1 h 30 min
    - topology creation ~ 78 h 40 min
    - geometry simplification ~ 6 min
  - generate_geojson.py ~ 5 min
- `--verbose` - show verbose process execution messages
- `--settings SETTINGS` - path to the settings file (default: settings.yaml)
- `run_all` - update OSM data, extract admin_levels, simplify admin_levels
  - `--tolerance TOLERANCE` - tolerance parameter for the Douglas-Peucker simplification algorithm (default: 0.001)
- `update_data` - update OSM data
- `extract_and_simplify` - extract admin_levels, simplify admin_levels
  - `--tolerance TOLERANCE` - tolerance parameter for the Douglas-Peucker simplification algorithm (default: 0.001)
- `download_OSM data_url` - downloads OSM data using the specified OSM data HTTP URI
- `create_DB` - creates a new PostGIS database and loads the functions
- `init_dir` - initializes an empty data directory and compiles the OSM utility programs
- `cut_data planetOSM` - cuts OSM data using the specified planet osm file in O5M format
`manage.py` is a utility that automates common tasks when working with the POSM utilities. Everything is designed around the settings.yaml file. This enables you to have several settings.yaml files, each for a specific area. For example, one might have a world.yaml file that processes the planet.osm file and a nigeria.yaml that limits processing to a specific area.

The most important settings parameters are `data_directory`, `poly_file` and `postgis`. `data_directory` contains every required and intermediate file generated by the process. `poly_file` defines a specific area of interest; it is used to cut that area out of the planet.osm dataset and, later, to cut OSM changes down to the specific area. If poly_file is not defined, everything is applied to the whole world. `postgis` specifies the database which will be used to process the dataset. It's best to have a separate database for each settings file, as processing regularly drops tables and data.

If a subcommand fails, the script will immediately terminate execution and output error information.

Some of the commands can take a long time to finish, so it's best to use `tmux` or `screen` to execute management commands.
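A minimal sketch of these three keys (values are illustrative; `poly/NG_R192787.poly` is the sample file used in the Nigeria example below):

```yaml
# illustrative values; copy the full key set from settings.yaml.tmpl
data_directory: /home/ubuntu/posm_data
poly_file: poly/NG_R192787.poly     # omit to process the whole world
postgis: "PG:dbname=posm_nigeria"
```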
- create a new_settings.yaml file (copy from settings.yaml.tmpl) and manually update the specific settings (do not specify the poly_file parameter)
- initialize an empty data_directory: `python manage.py --settings=new_settings.yaml init_dir`
  - this command will try to create the data_directory and compile the OSM management utilities (osmconvert, osmupdate and osmfilter)
- create and initialize the database: `python manage.py --settings=new_settings.yaml create_DB`
- download the initial OSM dataset: `python manage.py --settings=new_settings.yaml download_OSM http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf`
  - internally we use `wget -c`, which should resume broken downloads
  - after the download, the file will be converted to the O5M format used by the OSM utilities
- execute the process: `python manage.py --settings=new_settings.yaml run_all --tolerance=0.001`
  - run_all will download and apply the latest updates to the original planet.osm and create a new planet.osm
  - after the update, admin_level data will be extracted and converted to the .pbf format
  - admin_levels are then extracted from the PBF file and inserted into the database
  - in the database, the geometry is deconstructed and the base topology is created, using the tolerance parameter of 0.001
  - finally, simplified geometries are exported to the exported_geojson.zip archive, and problems.geojson is available in the current directory

Once you have set everything up, you only need to execute step 5 if you want to update the simplified admin_level dataset.
- create a nigeria_settings.yaml file (copy from settings.yaml.tmpl) and manually update the specific settings; poly_file is required in this case, and we will set it to `poly/NG_R192787.poly`
  - there are sample .poly files in the extractor/poly directory, generated by the `extract_poly.py` utility
- initialize an empty data_directory: `python manage.py --settings=nigeria_settings.yaml init_dir`
  - this command will try to create the data_directory and compile the OSM management utilities (osmconvert, osmupdate and osmfilter)
- create and initialize the database: `python manage.py --settings=nigeria_settings.yaml create_DB`
- cut Nigeria out of the planet.osm dataset: `python manage.py --settings=nigeria_settings.yaml cut_data /somewhere/planet-latest.o5m`
  - the specified planet.osm is cut using the poly_file, creating a new dataset for the specified area
- execute the process: `python manage.py --settings=nigeria_settings.yaml run_all --tolerance=0.001`
  - run_all will download and apply the latest updates to the cut dataset, and create a new cut dataset using the specified poly_file
    - the poly_file is used to limit the whole-world changes to the area specified by the poly file
  - after the update, admin_level data will be extracted and converted to the .pbf format
  - admin_levels are then extracted from the PBF file and inserted into the database
  - in the database, the geometry is deconstructed and the base topology is created, using the tolerance parameter of 0.001
  - finally, simplified geometries are exported to the exported_geojson.zip archive, and problems.geojson is available in the current directory
- create vietnam.settings.yaml and specify the `gadm_source` config with `shp_package` pointing to the downloaded GADM SHP package (see the sketch after these steps)
- create and initialize the database: `python manage.py --settings=vietnam.settings.yaml create_DB`
- execute the process: `python manage.py --settings=vietnam.settings.yaml extract_and_simplify_gadm --tolerance=0.01`
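Purely as a hypothetical sketch of the `gadm_source` config mentioned above (the key nesting and the package path are assumptions, not taken from the repository - verify against a working settings file):

```yaml
# hypothetical sketch
gadm_source:
  shp_package: /home/ubuntu/downloads/VNM_adm_shp.zip   # downloaded GADM SHP package
```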
- download admin boundary files from https://wambachers-osm.website/boundaries/
- create a yaml file, using `nepal/nepal.yaml` as a template, changing `geojson_source` to point to the appropriate files for each level
- create and initialize the database: `python manage.py --settings=nepal.yaml create_DB`
- execute the process: `python manage.py --settings=nepal.yaml extract_and_simplify_geojson --tolerance=0.01`
- find the relation id for the country on OpenStreetMap
- create a yaml file, using `nepal/nepal.yaml` as a template
- create and initialize the database: `python manage.py --settings=nepal.yaml create_DB`
- execute the process: `python manage.py --settings=nepal.yaml extract_and_simplify_overpass [country_relation_id]`
- `manage.py` - enables automated execution of common POSM tasks
- `extract.py` - reads the admin_levels.pbf file, parses geometry and features, and writes the parsed features to the database
- `extract_all.py` - reads the admin_levels.pbf file and writes SHP files for admin_level_[1..10]
- `extract_poly.py` - reads the admin_level_0 table from the database, applies buffering and simplification, and generates a .poly file for every row in the table
- `generate_geojson.py` - exports simplified data from the database in geojson format
- `prepare_topojson.py` - exports simplified data from the database as geojson and converts it to topojson (requires node.js topojson to be globally available)