Export .osm from a PostgreSQL openstreetmap database
You have access to a PostgreSQL+PostGIS database which contains openstreetmap
data imported by
osm2pgsql
for rendering maps. How can you generate an extract
to use with other cartography tools ?
pgsql2osm
is a python script that can generate the XML in .osm
format
for all entities within given boundaries.
It attempts to preserve all data attributes from original osm data:
nodes, ways and relations and all their osm_id
s, all tags, and
interreferences (references to the outside of the extract you choose will be broken,
unless you use --bbox='-180,-89.99,180,89.99'
to select all available data).
The main usecase I have for this is to generate an .obf
file using
osm2obf
for the Android
OsmAnd
app offline maps.
-
was imported with
osm2pgsql
- in
--slim
mode (this allows incremental updates and is also leveraged bypgsql2osm.py
) - Recommended : with
--hstore
containing all remaining tags not saved as columns in thetags::hstore
. If not, the produced.osm
will have incomplete tags (but will be geometrically sound) - with
--flat-nodes
, and access to the--flat-nodes <FILE>
cache binary file (for now, must be readwrite)
- in
-
The geometry tables are the first with names that end in
_point
,_line
,_polygon
listed in the following query:
SELECT f_table_name,f_geometry_column,srid,pg_size_pretty(pg_table_size(f_table_name::text))
FROM geometry_columns WHERE srid!=0 ORDER BY pg_table_size(f_table_name::text) DESC LIMIT 10;
- Same for
_ways
and_rels
tables:
SELECT relname,relkind,pg_size_pretty(pg_table_size(relname::text)),
(SELECT nspname FROM pg_namespace WHERE oid=relnamespace) AS namespace
FROM pg_class WHERE relkind IN ('r','m','v') AND relnamespace NOT IN (SELECT oid FROM pg_namespace
WHERE nspname IN ('pg_toast','pg_catalog','information_schema'))
AND (relname~'_rels' OR relname~'_ways')
ORDER BY pg_table_size(relname::text) DESC;
- XML streaming output, allows to only store compressed .osm.bz2 on disk and not waste RAM. Example:
pgsql2osm --dsn 'dbname=gis' --iso fr --output -|bzip2 > France.osm.bz2
-
Attempt to lower RAM footprint with generators for database queries: streaming all the way from database to XML
-
Automatic detection of table names, referred to here as
planet_osm_*
, but they can also be named anything with the correct suffixes added byosm2pgsql
:_point
,_line
,_polygon
,_ways
and_rels
-
Automatic detection of the middle database format (legacy text[] or new jsonb, available in databases created by
osm2pgsql
>=1.9). WARNING: very old versions <1.3 may use a different middle database format, it was undocumented at that time. In that case it is recommended to re-import with a newosm2pgsql
version. -
Automatic detection of
planet_osm_point
,planet_osm_line
andplanet_osm_polygon
columns: a specific.style
at import is not required- Warning : The column names you choose will be the keys in the
.osm
output regardless of the original tag's key. When not using default tag transforms, a.osm
file will be produced that may not have the tags like they are expected by other software. - Note : key-values in
tags
override column key-values
- Warning : The column names you choose will be the keys in the
-
Bounds specifiable as either :
- iso country or region code
--iso de-by
for Germany/Bayerngrep -i MY regions.csv
can show you recognized iso codes for Malaysia and subregions
- osm relation id
--osm-rel-id N
, for any relation in the database. - geojson polygon input file
--geojson file.geojson
- bounding box
--bbox='<lon_from>,<lat_from>,<lon_to>,<lat_to>'
- iso country or region code
-
Bounds intersection can also be specified as one non-
bbox
of the above and a--bbox
. The extracted region will then be the intersection (logical AND) of the shape with the bbox. -
Anti-Feature: unsorted ids, see Unsorted ids
Extract | File size | time |
elements count | RAM |
---|---|---|---|---|
Austria-left15_0.osm.bz2--bbox='-180,-89,15.01,89' |
681MB | real 2h18 user 37min sys 42min |
46.7M n 4.54M w 75K r |
10.6GB |
Austria-right15_0.osm.bz2--bbox='14.99,-89,180,89' |
502MB | real 1h23 user 23min sys 17min |
33.1M n 4.01M w 67K r |
- |
Baden-württemberg.osm.bz2--bbox='-180,-89,180,48.31' |
274MB | real 49min user 16min sys 5min |
17.2M n 2.53M w 32.6K r |
- |
Baden-württemberg.osm.bz2--bbox='-180,48.29,180,89' |
560MB | real 2h08 user 34min sys 21min |
34.0M n 5.84M w 63K r |
6.1GB |
Austria-left14.25.osm.bz2--bbox='-180,-89,14.26,89' |
509MB | real 1h21 user 21min sys 22min |
34.6M n 3.31M w 53K r |
5.7GB |
Austria-right14.25.osm.bz2--bbox='14.24,-89,180,89' |
676MB | real 2h11 user 30min sys 46min |
45.4M n 5.27M w 89K r |
10.7GB |
Switzerland.osm.bz2 | 692MB | real 2h35 user 39min sys 45min |
47.1M n 5.3M w 94.5K r |
- |
Germany-bavaria.osm.bz2 | 1.1GB | real 5h17 user 1h06 sys 1h35 |
70.1M n 10.5M w 95.8K r |
12.1GB |
Note : The File size is for the compressed extract, bzip2 default settings used unless otherwise noted
I just run the following to get the max RAM used by that process
maxmemkb=0;while sleep 1;do
memkb="$(ps -Ao rss,cmd|grep python3|grep pgsql2osm|cut -d' ' -f1)";
if [ "${maxmemkb}" -lt "${memkb}" ];then maxmemkb="${memkb}";fi;
printf '\033[2K\r%s max=%s' "${memkb}" "${maxmemkb}";done
The bounding region will include every entity and all its dependent entities. This also means some entities outside the boundary will be inlcuded, as long as they have any reference to entities within the boundary. No geometry clipping is taking place, entities are exported as-is and can extend hundreds of kilometers beyond the boundary if that is a continuous entity (eg a long river).
The output ids are streamed in the order they come from the database, and for speed reasons
this is unsorted. If you need a sorted .osm
, use
osmium
after exporting:
osmium sort in.osm.bz2 --output=out.osm.bz2
The output data may have some missing tags, because the database import
is not a lossless operation: importing a .osm.pbf
to database, then exporting
it to .osm
and then converting it back to .osm.pbf
will most probably
(on real world data) produce a different file.
pip install pgsql2osm
- Need to compile
osm2pgsql
(instructions) - then copy
get_lonlat.cpp
toosm2pgsql/src/get_lonlat.cpp
- and in the
osm2pgsql/build/
directory:
make
g++ -std=c++17 ../src/get_lonlat.cpp ../src/node-persistent-cache.cpp \
-I ../contrib/fmt/include/ ../build/src/libosm2pgsql.a -Wall -g -o get_lonlat
- Test:
./get_lonlat /path/to/planet.bin.nodes <<< 3546766428
- Should output
39.188565;49.885284;3546766428
if that node was covered by database import
pgsql2osm /path/to/get_lonlat /path/to/planet.bin.nodes --help
For an overview of options
Made easier with the ModuleSettings class, example:
import pgsql2osm
m=pgsql2osm.settings.ModuleSettings(
bounds_rel_id=osm_rel_id,
get_lonlat_binary='/path/to/get_lonlat',
nodes_file='/path/to/planet.bin.nodes',
# no need for the --dsn if access already from somewhere else
access=already_previously_obtained_dbaccess)
with bz2.open('test.osm.bz2','wb') as f:
m.out_file=f #set the output file object
m.main()
Note: will still print progress reports to stderr
-
osm2pgsql
's--slim
mode means: has the "middle" layout tables containing all the raw osm references betweenways->nodes
andrels->nodes,ways,rels
, more specifically: -
The "middle" database tables have names ending in
_ways
and_rels
(osm2pgsql
defaults), and are the biggest with that suffix in the database, by used space -
The
*_ways
table has columnsid::bigint
for the osm ID,nodes::bigint[]
for a list of the way's child nodes,- and
tags::text[]
for that way's tags in hstore format, - with
tags::jsonb
in the case of the new jsonb format.
-
The
*_rels
table has columnsid::bigint
for the osm ID,- in the case of legacy text[] format:
parts::bigint[]
any ids of (mixed) nodes, ways or relations that are children of that relation,tags::text[]
for the relation's tags in hstore format,members::text[]
that more precisely describes the children, example{'n123','admin_centre','w345','outer','w567','inner'}
having the node123
as a roleadmin_centre
and the ways345
and567
, as rolesouter
andinner
boundaries respectively
- in the case of new jsonb format:
members::jsonb
a list of json objects describing the members, the example continued[{"ref":"123","role":"admin_centre","type":"N"},{"ref":"345","role":"outer","type":"W"}, {"ref":"567","role":"inner","type":"W"}]
,- and
tags::jsonb
for the relation's tags in{"key1":"value1","key2":"value2"}
format,
-
Access needed:
SELECT
granted on the above mentioned tables and also (for the autodetections of table names and column names)pg_attribute
,pg_tables
,pg_type
,pg_class
, andgeometry_columns
.
To generate regions.csv
, run in a planet database:
psql -d gis -p 5432 --csv -c "select -osm_id as osm_id,coalesce(tags->'name:en',name) as name,
tags->'ISO3166-1' as iso_country,tags->'border_type' as border_type,
tags->'ISO3166-2' as iso_subcountry,admin_level,tags->'wikipedia' as wikipedia
from planet_osm_polygon where osm_id<0 and admin_level in ('1','2','3','4')
and boundary='administrative'" > regions.csv
Note : only admin_level
<=4 is loaded into regions.csv
.
Use a search on openstreetmap.org for smaller regions.
And then extract the relation id from the url:
https://www.openstreetmap.org/relation/51701 51701
,
eg Switzerland