WORK IN PROGRESS: Add a new middle database format
The database format we have been using for the middle has some problems:

* Node tags are not stored in the planet_osm_nodes table, because they
  are not needed for updates. But having access to those tags is useful
  in some cases, for instance when relations are processed after import.
* Attributes (version, timestamp, changeset, uid, user) of ways and
  relations are stored as special pseudo-tags ("osm_*") if
  --extra-attributes is used. This can lead to name clashes and format
  problems, and it makes the attributes difficult to access from the
  database. Attributes for nodes are never stored.
* The way we store tags as an array of text fields with keys and values
  intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use
  from the database.
* The way relation members are stored is rather arcane.
* When using --extra-attributes/-x the middle tables become huge (due
  to storage in pseudo-tags).
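
To illustrate why the old layout is awkward, here is a minimal Python
sketch (not code from osm2pgsql) that unpacks such a flat array. The
concrete key names "osm_version" and "osm_uid" are assumptions derived
from the "osm_*" pattern above, and `split_old_tags` is a hypothetical
helper.

```python
# Old middle format: keys and values alternate in one flat text array.
# The "osm_*" key names are assumed from the "osm_*" pattern in the text.
old_tags = ["highway", "residential", "name", "Main Street",
            "osm_version", "3", "osm_uid", "42"]


def split_old_tags(flat):
    """Pair up [key1, value1, key2, value2, ...] and separate real tags
    from "osm_*" pseudo-tags (hypothetical helper)."""
    if len(flat) % 2 != 0:
        raise ValueError("flat tag array must have an even number of entries")
    pairs = dict(zip(flat[0::2], flat[1::2]))
    tags = {k: v for k, v in pairs.items() if not k.startswith("osm_")}
    attrs = {k[len("osm_"):]: v for k, v in pairs.items()
             if k.startswith("osm_")}
    return tags, attrs


tags, attrs = split_old_tags(old_tags)
print(tags)
print(attrs)
```

Note that a genuine OSM tag whose key happens to start with "osm_" would
be misclassified as an attribute here, which is the name clash mentioned
above, and that all attribute values come back as untyped text.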

This commit fixes all those problems by introducing a new database
structure:

* Tags are stored in JSONB columns.
* The nodes table gets a new "tags" column.
* Attributes are optionally stored in normal typed database columns. The
  columns are only added when --extra-attributes is specified, and they
  can be NULL if not used, which keeps the overhead tiny.
* Relation members are now stored as JSONB as an array of objects, for
  example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using
  JSONB allows us to build the indexes needed to find all relations with
  certain members.
* The format for way nodes has been kept as an array of bigints.
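
As a rough illustration of the member layout, the following Python
sketch builds the JSON for one made-up relation and searches it. Only
the "W" entry with ref 123 comes from the example above; the "N" type
code, the other refs, and the helper names `has_member` and
`refs_of_type` are assumptions for illustration. In the database the
lookup would go through a JSONB index rather than a loop like this.

```python
import json

# Members of one hypothetical relation in the new layout.
# "N" as the type code for nodes is assumed by analogy with "W".
members = [
    {"type": "W", "ref": 123, "role": "inner"},
    {"type": "W", "ref": 456, "role": "outer"},
    {"type": "N", "ref": 789, "role": ""},
]

# The text that would be stored in the JSONB members column.
members_json = json.dumps(members)


def has_member(members_json, member_type, ref):
    """Return True if the member array contains an object with this
    type and ref (hypothetical helper)."""
    return any(m["type"] == member_type and m["ref"] == ref
               for m in json.loads(members_json))


def refs_of_type(members_json, member_type):
    """Return the refs of all members of one type, in stored order."""
    return [m["ref"] for m in json.loads(members_json)
            if m["type"] == member_type]


print(has_member(members_json, "W", 123))
print(refs_of_type(members_json, "W"))
```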

The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with
"planet_osm" as the default prefix) have been kept, but we might want to
change this and get rid of the prefix; schemas are a better mechanism
and they have been available for a while.

There is a new table PREFIX_users which contains a user id->name lookup
table. The user name isn't stored in the other tables, just the id. This
saves disk space and has the added benefit of updating the user name
correctly if a user name changes.
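
The effect of the lookup table can be sketched in a few lines of
Python. The rows, the user names, and the `user_name` helper are made
up for illustration; only the idea of storing the id with each object
and the name in one separate place comes from the text above.

```python
# Hypothetical data: objects carry only the user id, names live in one place.
users = {42: "alice"}
ways = [{"id": 1, "uid": 42}, {"id": 2, "uid": 42}]


def user_name(obj, users):
    """Resolve the user name for an object through the lookup table."""
    return users.get(obj["uid"])


names_before = [user_name(w, users) for w in ways]

# A rename touches a single entry, yet every object sees the new name.
users[42] = "alice_renamed"
names_after = [user_name(w, users) for w in ways]

print(names_before)
print(names_after)
```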

There are two new command line options:
* --middle-database-format=NUM - (1 for old format, 2 for new format)
* --middle-with-nodes - set this to store tagged nodes in the database
  even if a flat-node file is used. Untagged nodes are only stored in
  the database if there is no flat-node file.
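
One way to read the node storage rule is as a small decision function.
This Python sketch encodes that reading only; it is not taken from the
osm2pgsql source, and `store_node_in_db` and its parameter names are
hypothetical. It answers whether a node gets a row in the database at
all and says nothing about which columns are filled.

```python
def store_node_in_db(has_tags, flat_node_file, middle_with_nodes):
    """Decide whether a node is written to the database.

    This is an interpretation of the two sentences in the commit
    message, not the actual implementation.
    """
    if not flat_node_file:
        # Without a flat-node file the database is the only node store.
        return True
    # With a flat-node file, untagged nodes stay out of the database and
    # tagged nodes are only stored when --middle-with-nodes is set.
    return has_tags and middle_with_nodes


# Print the full decision table for all eight input combinations.
for has_tags in (False, True):
    for flat in (False, True):
        for with_nodes in (False, True):
            print(has_tags, flat, with_nodes,
                  store_node_in_db(has_tags, flat, with_nodes))
```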

For the first time this new format allows you to have a database created
by osm2pgsql that contains *all* the information in an OSM file: all
nodes, ways, and relations with all their tags and attributes.

Note that this commit currently doesn't have a mechanism for detecting
which format a database has when doing updates. This will be added
later. For the time being you have to use the same command line options
for updates that were used on import.

This commit doesn't include proper testing yet. If you want to run all
the tests with the new database format, set `middle_database_format = 2`
in options.hpp and recompile.

This commit adds a new dependency on a [JSON
library](https://github.com/nlohmann/json). Parsing JSON isn't something
we want to do ourselves. This library has been around for a while, is
available everywhere and is well supported with regular releases, unlike
the RapidJSON library we were using before.

Closes osm2pgsql-dev#692
Closes osm2pgsql-dev#1170
See osm2pgsql-dev#1502
joto committed May 27, 2023
1 parent ee2dabb commit 2b7bd45
Showing 8 changed files with 763 additions and 61 deletions.
1 change: 1 addition & 0 deletions .github/actions/ubuntu-prerequisites/action.yml
@@ -24,6 +24,7 @@ runs:
 libpotrace-dev \
 libpq-dev \
 libproj-dev \
+nlohmann-json3-dev \
 pandoc \
 postgresql-${POSTGRESQL_VERSION} \
 postgresql-${POSTGRESQL_VERSION}-postgis-${POSTGIS_VERSION} \
4 changes: 2 additions & 2 deletions .github/actions/win-install/action.yml
@@ -5,8 +5,8 @@ runs:

 steps:
 - name: Install packages
-run: vcpkg install cimg:x64-windows bzip2:x64-windows expat:x64-windows zlib:x64-windows proj4:x64-windows boost-geometry:x64-windows boost-system:x64-windows boost-filesystem:x64-windows boost-property-tree:x64-windows lua:x64-windows libpq:x64-windows
+run: vcpkg install cimg:x64-windows bzip2:x64-windows expat:x64-windows zlib:x64-windows proj4:x64-windows boost-geometry:x64-windows boost-system:x64-windows boost-filesystem:x64-windows boost-property-tree:x64-windows lua:x64-windows libpq:x64-windows nlohmann-json:x64-windows
 shell: bash
-- name: Install psycopg2 and beahve
+- name: Install psycopg2 and behave
 run: python -m pip install psycopg2 behave
 shell: bash
