Skip to content

Export .osm from a PostgreSQL openstreetmap database

License

Notifications You must be signed in to change notification settings

feludwig/pgsql2osm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pgsql2osm

Export .osm from a PostgreSQL openstreetmap database

Motivation

You have access to a PostgreSQL+PostGIS database which contains openstreetmap data imported by osm2pgsql for rendering maps. How can you generate an extract to use with other cartography tools ?

pgsql2osm is a python script that can generate the XML in .osm format for all entities within given boundaries. It attempts to preserve all data attributes from original osm data: nodes, ways and relations and all their osm_ids, all tags, and interreferences (references to the outside of the extract you choose will be broken, unless you use --bbox='-180,-89.99,180,89.99' to select all available data).

The main usecase I have for this is to generate an .obf file using osm2obf for the Android OsmAnd app offline maps.

Requirements

Database

  • was imported with osm2pgsql

    • in --slim mode (this allows incremental updates and is also leveraged by pgsql2osm.py)
    • Recommended : with --hstore containing all remaining tags not saved as columns in the tags::hstore. If not, the produced .osm will have incomplete tags (but will be geometrically sound)
    • with --flat-nodes, and access to the --flat-nodes <FILE> cache binary file (for now, must be readwrite)
  • The geometry tables are the first with names that end in _point, _line, _polygon listed in the following query:

SELECT f_table_name,f_geometry_column,srid,pg_size_pretty(pg_table_size(f_table_name::text))
FROM geometry_columns WHERE srid!=0 ORDER BY pg_table_size(f_table_name::text) DESC LIMIT 10;
  • Same for _ways and _rels tables:
SELECT relname,relkind,pg_size_pretty(pg_table_size(relname::text)),
    (SELECT nspname FROM pg_namespace WHERE oid=relnamespace) AS namespace
FROM pg_class WHERE relkind IN ('r','m','v') AND relnamespace NOT IN (SELECT oid FROM pg_namespace
        WHERE nspname IN ('pg_toast','pg_catalog','information_schema'))
    AND (relname~'_rels' OR relname~'_ways')
ORDER BY pg_table_size(relname::text) DESC;

Features

  • XML streaming output, allows to only store compressed .osm.bz2 on disk and not waste RAM. Example:
pgsql2osm --dsn 'dbname=gis' --iso fr --output -|bzip2 > France.osm.bz2
  • Attempt to lower RAM footprint with generators for database queries: streaming all the way from database to XML

  • Automatic detection of table names, referred to here as planet_osm_* , but they can also be named anything with the correct suffixes added by osm2pgsql: _point, _line, _polygon, _ways and _rels

  • Automatic detection of the middle database format (legacy text[] or new jsonb, available in databases created by osm2pgsql>=1.9). WARNING: very old versions <1.3 may use a different middle database format, it was undocumented at that time. In that case it is recommended to re-import with a new osm2pgsql version.

  • Automatic detection of planet_osm_point, planet_osm_line and planet_osm_polygon columns: a specific .style at import is not required

    • Warning : The column names you choose will be the keys in the .osm output regardless of the original tag's key. When not using default tag transforms, a .osm file will be produced that may not have the tags like they are expected by other software.
    • Note : key-values in tags override column key-values
  • Bounds specifiable as either :

    • iso country or region code --iso de-by for Germany/Bayern
      • grep -i MY regions.csv can show you recognized iso codes for Malaysia and subregions
    • osm relation id --osm-rel-id N, for any relation in the database.
    • geojson polygon input file --geojson file.geojson
    • bounding box --bbox='<lon_from>,<lat_from>,<lon_to>,<lat_to>'
  • Bounds intersection can also be specified as one non-bbox of the above and a --bbox. The extracted region will then be the intersection (logical AND) of the shape with the bbox.

  • Anti-Feature: unsorted ids, see Unsorted ids

Benchmarks

Extract File size time elements count RAM
Austria-left15_0.osm.bz2
--bbox='-180,-89,15.01,89'
681MB real 2h18
user 37min
sys 42min
46.7M n
4.54M w
75K r
10.6GB
Austria-right15_0.osm.bz2
--bbox='14.99,-89,180,89'
502MB real 1h23
user 23min
sys 17min
33.1M n
4.01M w
67K r
-
Baden-württemberg.osm.bz2
--bbox='-180,-89,180,48.31'
274MB real 49min
user 16min
sys 5min
17.2M n
2.53M w
32.6K r
-
Baden-württemberg.osm.bz2
--bbox='-180,48.29,180,89'
560MB real 2h08
user 34min
sys 21min
34.0M n
5.84M w
63K r
6.1GB
Austria-left14.25.osm.bz2
--bbox='-180,-89,14.26,89'
509MB real 1h21
user 21min
sys 22min
34.6M n
3.31M w
53K r
5.7GB
Austria-right14.25.osm.bz2
--bbox='14.24,-89,180,89'
676MB real 2h11
user 30min
sys 46min
45.4M n
5.27M w
89K r
10.7GB
Switzerland.osm.bz2 692MB real 2h35
user 39min
sys 45min
47.1M n
5.3M w
94.5K r
-
Germany-bavaria.osm.bz2 1.1GB real 5h17
user 1h06
sys 1h35
70.1M n
10.5M w
95.8K r
12.1GB

Note : The File size is for the compressed extract, bzip2 default settings used unless otherwise noted

RAM usage monitoring

I just run the following to get the max RAM used by that process

maxmemkb=0;while sleep 1;do
  memkb="$(ps -Ao rss,cmd|grep python3|grep pgsql2osm|cut -d' ' -f1)";
  if [ "${maxmemkb}" -lt "${memkb}" ];then maxmemkb="${memkb}";fi;
  printf '\033[2K\r%s max=%s' "${memkb}" "${maxmemkb}";done

Disclaimers

Outside of bounds geometries

The bounding region will include every entity and all its dependent entities. This also means some entities outside the boundary will be inlcuded, as long as they have any reference to entities within the boundary. No geometry clipping is taking place, entities are exported as-is and can extend hundreds of kilometers beyond the boundary if that is a continuous entity (eg a long river).

Unsorted ids

The output ids are streamed in the order they come from the database, and for speed reasons this is unsorted. If you need a sorted .osm, use osmium after exporting:

osmium sort in.osm.bz2 --output=out.osm.bz2

Database does not store all osm data strictly

The output data may have some missing tags, because the database import is not a lossless operation: importing a .osm.pbf to database, then exporting it to .osm and then converting it back to .osm.pbf will most probably (on real world data) produce a different file.

Installation

pgsql2osm

pip install pgsql2osm

get_lonlat utility

  • Need to compile osm2pgsql (instructions)
  • then copy get_lonlat.cpp to osm2pgsql/src/get_lonlat.cpp
  • and in the osm2pgsql/build/ directory:
make
g++ -std=c++17 ../src/get_lonlat.cpp ../src/node-persistent-cache.cpp \
-I ../contrib/fmt/include/ ../build/src/libosm2pgsql.a -Wall -g -o get_lonlat
  • Test:
./get_lonlat /path/to/planet.bin.nodes <<< 3546766428
  • Should output 39.188565;49.885284;3546766428 if that node was covered by database import

Run

pgsql2osm /path/to/get_lonlat /path/to/planet.bin.nodes --help

For an overview of options

Python module

Made easier with the ModuleSettings class, example:

import pgsql2osm
m=pgsql2osm.settings.ModuleSettings(
  bounds_rel_id=osm_rel_id,
  get_lonlat_binary='/path/to/get_lonlat',
  nodes_file='/path/to/planet.bin.nodes',
  # no need for the --dsn if access already from somewhere else
  access=already_previously_obtained_dbaccess)
with bz2.open('test.osm.bz2','wb') as f:
  m.out_file=f #set the output file object
  m.main()

Note: will still print progress reports to stderr

Implementation details

Database details

  • osm2pgsql's --slim mode means: has the "middle" layout tables containing all the raw osm references between ways->nodes and rels->nodes,ways,rels, more specifically:

  • The "middle" database tables have names ending in _ways and _rels (osm2pgsql defaults), and are the biggest with that suffix in the database, by used space

  • The *_ways table has columns

    • id::bigint for the osm ID,
    • nodes::bigint[] for a list of the way's child nodes,
    • and tags::text[] for that way's tags in hstore format,
    • with tags::jsonb in the case of the new jsonb format.
  • The *_rels table has columns

    • id::bigint for the osm ID,
    • in the case of legacy text[] format:
      • parts::bigint[] any ids of (mixed) nodes, ways or relations that are children of that relation,
      • tags::text[] for the relation's tags in hstore format,
      • members::text[] that more precisely describes the children, example {'n123','admin_centre','w345','outer','w567','inner'} having the node 123 as a role admin_centre and the ways 345 and 567, as roles outer and inner boundaries respectively
    • in the case of new jsonb format:
      • members::jsonb a list of json objects describing the members, the example continued [{"ref":"123","role":"admin_centre","type":"N"},{"ref":"345","role":"outer","type":"W"}, {"ref":"567","role":"inner","type":"W"}],
      • and tags::jsonb for the relation's tags in {"key1":"value1","key2":"value2"} format,
  • Access needed: SELECT granted on the above mentioned tables and also (for the autodetections of table names and column names) pg_attribute, pg_tables, pg_type, pg_class, and geometry_columns.

Regions.csv

To generate regions.csv, run in a planet database:

psql -d gis -p 5432 --csv -c "select -osm_id as osm_id,coalesce(tags->'name:en',name) as name,
    tags->'ISO3166-1' as iso_country,tags->'border_type' as border_type,
    tags->'ISO3166-2' as iso_subcountry,admin_level,tags->'wikipedia' as wikipedia
  from planet_osm_polygon where osm_id<0 and admin_level in ('1','2','3','4')
    and boundary='administrative'" > regions.csv

Note : only admin_level<=4 is loaded into regions.csv. Use a search on openstreetmap.org for smaller regions. And then extract the relation id from the url: https://www.openstreetmap.org/relation/51701 51701, eg Switzerland