Add a new middle database format
The database format we have been using for the middle has some problems:

* Node tags are not stored in the planet_osm_nodes table, because they
  are not needed for updates. But having access to those tags is useful
  in some cases, for instance when relations are processed after import.
* Attributes (version, timestamp, changeset, uid, user) of ways and
  relations are stored as special pseudo-tags ("osm_*") if
  --extra-attributes is used. This can lead to name clashes and format
  problems, and it makes the attributes difficult to access from the
  database. Attributes of nodes are never stored.
* The way we store tags, as an array of text fields with keys and values
  intermixed ([key1, value1, key2, value2, ...]), is cumbersome to use
  from the database.
* The way relation members are stored is rather arcane.
* When --extra-attributes/-x is used, the middle tables become huge
  because the attributes are stored as pseudo-tags.
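With the intermixed layout, every consumer has to walk the array two elements at a time. As an illustration only, here is a minimal C++ sketch (standard library only; `legacy_tags_to_json` and `json_escape` are hypothetical names, not osm2pgsql code) that turns such a legacy array into the kind of JSON object a JSONB tags column would hold. It assumes the array always has an even number of elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Escape a string for use inside a JSON string literal: quotes,
// backslashes and control characters (< 0x20) must be escaped.
std::string json_escape(std::string const &s)
{
    std::string out;
    for (char const c : s) {
        switch (c) {
        case '"':
            out += "\\\"";
            break;
        case '\\':
            out += "\\\\";
            break;
        default:
            if (static_cast<unsigned char>(c) < 0x20) {
                char buf[8];
                std::snprintf(buf, sizeof(buf), "\\u%04x",
                              static_cast<unsigned int>(static_cast<unsigned char>(c)));
                out += buf;
            } else {
                out += c; // other bytes (including UTF-8) pass through unchanged
            }
        }
    }
    return out;
}

// Hypothetical helper: convert the legacy layout [key1, value1, key2, ...]
// into a JSON object such as {"key1":"value1",...}.
std::string legacy_tags_to_json(std::vector<std::string> const &flat)
{
    assert(flat.size() % 2 == 0 && "keys and values must come in pairs");
    std::string json{"{"};
    for (std::size_t i = 0; i + 1 < flat.size(); i += 2) {
        if (i > 0) {
            json += ',';
        }
        json += '"' + json_escape(flat[i]) + "\":\"" + json_escape(flat[i + 1]) + '"';
    }
    json += '}';
    return json;
}
```

In PostgreSQL, a text value like this can be cast to `jsonb`, after which a single tag is available directly, for example `tags->>'highway'`, instead of searching the array for the key and taking the element after it.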

This commit fixes all those problems by introducing a new database
structure:

* Tags are stored in JSONB columns.
* The nodes table gets a new "tags" column.
* Attributes are optionally stored in normal, typed database columns. The
  columns are only added when --extra-attributes is specified, and they
  can be NULL if not used, which keeps the overhead tiny.
* Relation members are now stored as JSONB as an array of objects, for
  example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using
  JSONB allows us to build the indexes needed to find all relations with
  certain members.
* The format for way nodes has been kept as an array of bigints.
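A minimal sketch of how relation members could be serialized into the array-of-objects format described above. It is C++ with the standard library only; the `member` struct and `members_to_json` are hypothetical names, and the type letters `N`/`W`/`R` are an assumption extrapolated from the `"W"` example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical in-memory representation of one relation member.
struct member {
    char type;        // assumed: 'N' (node), 'W' (way) or 'R' (relation)
    std::int64_t ref; // id of the referenced object
    std::string role; // e.g. "inner", "outer", or empty
};

// Minimal JSON string escaping (quotes, backslashes, control characters).
std::string escape_json(std::string const &s)
{
    std::string out;
    for (char const c : s) {
        if (c == '"' || c == '\\') {
            out += '\\';
            out += c;
        } else if (static_cast<unsigned char>(c) < 0x20) {
            char buf[8];
            std::snprintf(buf, sizeof(buf), "\\u%04x",
                          static_cast<unsigned int>(static_cast<unsigned char>(c)));
            out += buf;
        } else {
            out += c;
        }
    }
    return out;
}

// Serialize members as [{"type":"W","ref":123,"role":"inner"},...].
std::string members_to_json(std::vector<member> const &members)
{
    std::string json{"["};
    for (std::size_t i = 0; i < members.size(); ++i) {
        auto const &m = members[i];
        assert(m.type == 'N' || m.type == 'W' || m.type == 'R');
        if (i > 0) {
            json += ',';
        }
        json += "{\"type\":\"";
        json += m.type;
        json += "\",\"ref\":";
        json += std::to_string(m.ref);
        json += ",\"role\":\"";
        json += escape_json(m.role);
        json += "\"}";
    }
    json += ']';
    return json;
}
```

One way such a column can be searched in PostgreSQL is the jsonb containment operator: assuming a column named `members`, `members @> '[{"type": "W", "ref": 123}]'` matches every relation that has way 123 as a member, and a GIN index on the column can support that operator.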

The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with
"planet_osm" as the default prefix) have been kept, but we might want to
change this and get rid of the prefix; schemas are a better mechanism
and have been available for a while.

There is a new table PREFIX_users that serves as a user id -> name lookup
table. The user name isn't stored in the other tables, only the id. This
saves disk space and has the added benefit that the user name is updated
correctly everywhere if it changes.
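The lookup table can be pictured as a normalized pair of tables. The following C++ sketch is hypothetical throughout (`way_row`, `user_table`, and `user_name` are illustrative names, and a `std::map` stands in for the database table). It shows that rows keep only the id, so renaming a user touches a single entry and every lookup sees the new name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical row of a ways table: only the numeric user id is kept.
// The id is empty (NULL) when attributes were not imported.
struct way_row {
    std::int64_t id;
    std::optional<std::int64_t> user_id;
};

// Hypothetical stand-in for the PREFIX_users table: user id -> user name.
class user_table {
public:
    // Add a user, or overwrite the stored name if the id already exists.
    void upsert(std::int64_t uid, std::string name) { m_names[uid] = std::move(name); }

    std::optional<std::string> name_of(std::int64_t uid) const
    {
        auto const it = m_names.find(uid);
        if (it == m_names.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<std::int64_t, std::string> m_names;
};

// Resolve the user name for a row, similar to a LEFT JOIN on the users table.
std::optional<std::string> user_name(way_row const &row, user_table const &users)
{
    if (!row.user_id) {
        return std::nullopt;
    }
    return users.name_of(*row.user_id);
}
```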

There are two new command line options:
* --middle-database-format=FORMAT - 'legacy' (default) or 'new'
* --middle-with-nodes - set this to store tagged nodes in the database
  even if a flat-node file is used. Untagged nodes are only stored in
  the database if there is no flat-node file.
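Read together, the statements about node storage amount to a small decision rule. A sketch in C++, under the assumption that all nodes are stored in the database whenever no flat-node file is used; the function and parameter names are hypothetical.

```cpp
#include <cassert>

// Hypothetical decision rule: should this node be written to the nodes table?
//   has_tags          - the node carries at least one tag
//   flat_node_file    - a flat-node file is in use
//   middle_with_nodes - --middle-with-nodes was given
bool store_node_in_db(bool has_tags, bool flat_node_file, bool middle_with_nodes)
{
    if (!flat_node_file) {
        // Without a flat-node file the database is the only node store,
        // so tagged and untagged nodes alike go there (assumption).
        return true;
    }
    // With a flat-node file, locations live in the file; only tagged nodes
    // are added to the database, and only if the option asks for it.
    return middle_with_nodes && has_tags;
}
```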

For the first time, this new format allows you to have a database created
by osm2pgsql that contains *all* the information in an OSM file: all
nodes, ways, and relations with all their tags and attributes.

A new property "db_format" is written to the osm2pgsql_properties table
with the value "0" (non-slim import), "1" (slim import with legacy
format), or "2" (slim import with new format). This property is read in
append mode so that osm2pgsql can work with the middle format the
database was created with.
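A sketch of the mapping between import mode and the "db_format" value, plus a hypothetical inverse that an append run might use to decide which middle format to expect. Only the three values come from the description of the property; the enum and function names are assumptions, and `new_format` is used because `new` is a C++ keyword.

```cpp
#include <cassert>
#include <optional>
#include <string>

// The two middle formats selectable with --middle-database-format.
enum class middle_format { legacy, new_format };

// Value written to the "db_format" property.
std::string db_format_property(bool slim, middle_format format)
{
    if (!slim) {
        return "0"; // non-slim import: no middle tables are kept
    }
    return format == middle_format::legacy ? "1" : "2";
}

// Hypothetical inverse for append mode. Updates need a database created in
// slim mode, so "0" (and any unknown value) yields no usable format here.
std::optional<middle_format> format_for_append(std::string const &value)
{
    if (value == "1") {
        return middle_format::legacy;
    }
    if (value == "2") {
        return middle_format::new_format;
    }
    return std::nullopt;
}
```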

This commit adds a new dependency on a [JSON
library](https://github.com/nlohmann/json). Parsing JSON isn't something
we want to do ourselves. This library has been around for a while, is
available everywhere, and is well supported with regular releases, unlike
the RapidJSON library we were using before.

Closes osm2pgsql-dev#692
Closes osm2pgsql-dev#1170
See osm2pgsql-dev#1502
joto committed Jun 30, 2023
1 parent ea1feb2 commit 412c08e
Showing 17 changed files with 896 additions and 76 deletions.
1 change: 1 addition & 0 deletions .github/actions/ubuntu-prerequisites/action.yml
@@ -24,6 +24,7 @@ runs:
 libpotrace-dev \
 libpq-dev \
 libproj-dev \
+nlohmann-json3-dev \
 pandoc \
 postgresql-${POSTGRESQL_VERSION} \
 postgresql-${POSTGRESQL_VERSION}-postgis-${POSTGIS_VERSION} \
1 change: 1 addition & 0 deletions .github/actions/win-install/action.yml
@@ -16,6 +16,7 @@ runs:
 expat:x64-windows \
 libpq:x64-windows \
 lua:x64-windows \
+nlohmann-json:x64-windows \
 proj4:x64-windows \
 zlib:x64-windows
 shell: bash
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -14,7 +14,7 @@ jobs:
 
 - name: Install prerequisites
 run: |
-brew install lua boost postgis pandoc cimg potrace
+brew install lua boost postgis pandoc cimg potrace nlohmann-json
 pip3 install psycopg2 behave osmium
 pg_ctl -D /usr/local/var/postgres init
 pg_ctl -D /usr/local/var/postgres start
1 change: 1 addition & 0 deletions .github/workflows/test-install.yml
@@ -48,6 +48,7 @@ jobs:
 libpotrace-dev \
 libpq-dev \
 libproj-dev \
+nlohmann-json3-dev \
 lua${LUA_VERSION} \
 pandoc \
 postgresql-${POSTGRESQL_VERSION} \
3 changes: 3 additions & 0 deletions CMakeLists.txt
@@ -205,6 +205,9 @@ include_directories(SYSTEM ${PostgreSQL_INCLUDE_DIRS})
 
 find_package(Threads)
 
+find_path(NLOHMANN_INCLUDE_DIR nlohmann/json.hpp)
+include_directories(SYSTEM ${NLOHMANN_INCLUDE_DIR})
+
 find_path(POTRACE_INCLUDE_DIR potracelib.h)
 find_library(POTRACE_LIBRARY NAMES potrace)
 
1 change: 1 addition & 0 deletions README.md
@@ -49,6 +49,7 @@ Required libraries are
 * [zlib](https://www.zlib.net/)
 * [Boost libraries](https://www.boost.org/), including geometry, system and
 filesystem
+* [nlohmann/json](https://json.nlohmann.me/)
 * [CImg](https://cimg.eu/) (Optional, for generalization only)
 * [potrace](https://potrace.sourceforge.net/) (Optional, for generalization only)
 * [PostgreSQL](https://www.postgresql.org/) client libraries
