WORK IN PROGRESS: Add a new middle database format

The database format we have been using for the middle has some problems:

* Node tags are not stored in the planet_osm_nodes table, because they are not needed for updates. But having access to those tags is useful in some cases, for instance when relations are processed after import.
* Attributes (version, timestamp, changeset, uid, user) of ways and relations are stored as special pseudo-tags ("osm_*") if --extra-attributes is used. This can cause name clashes and format problems and makes the attributes difficult to access from the database. Attributes for nodes are never stored.
* The way we store tags as an array of text fields with keys and values intermixed ([key1, value1, key2, value2, ...]) is cumbersome to use from the database.
* The way relation members are stored is rather arcane.
* When using --extra-attributes/-x the middle tables become huge (due to storage in pseudo-tags).

This commit fixes all those problems by introducing a new database structure:

* Tags are stored in JSONB columns.
* The nodes table gets a new "tags" column.
* Attributes are optionally stored in normal typed database columns. The columns are only added when --extra-attributes is specified, and they can be NULL if not used, which makes the overhead tiny.
* Relation members are now stored in JSONB as an array of objects, for example: [{"type": "W", "ref": 123, "role": "inner"}, ...]. Using JSONB allows us to build the indexes needed to find all relations with certain members.
* The format for way nodes has been kept as an array of bigints.

The names of the tables PREFIX_nodes, PREFIX_ways, and PREFIX_rels (with "planet_osm" as default prefix) have been kept, but we might want to change this and get rid of the prefix. Schemas are a better mechanism and they have been available for a while.

There is a new table PREFIX_users which contains a user id->name lookup table. The user name isn't stored in the other tables, just the id. This saves disk space and has the added benefit that a changed user name only has to be updated in one place.

There are two new command line options:

* --middle-database-format=NUM - (1 for old format, 2 for new format)
* --middle-with-nodes - set this to store tagged nodes in the database even if a flat-node file is used. Untagged nodes are only stored in the database if there is no flat-node file.

For the first time this new format allows you to have a database created by osm2pgsql that contains *all* the information in an OSM file: all nodes, ways, and relations with all their tags and attributes.

Note that this commit currently doesn't have a mechanism for detecting which format a database has when doing updates. This will be added later. For the time being you have to use the same command line options for updates that were used on import.

This commit doesn't include proper testing of the new format. If you want to run all the tests with the new database format, set `middle_database_format = 2` in options.hpp and recompile.

This commit adds a new dependency on a [JSON library](https://github.com/nlohmann/json). Parsing JSON isn't something we want to do ourselves. This library has been around for a while, is available everywhere and is well supported with regular releases, unlike the RapidJSON library we were using before.

Closes osm2pgsql-dev#692
Closes osm2pgsql-dev#1170
See osm2pgsql-dev#1502
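To illustrate the difference between the old and new representations described in the commit message, here is a minimal Python sketch. It is not osm2pgsql code: the function names (`tags_from_flat_array`, `members_to_json`, `relations_with_member`) and the sample data are hypothetical, and the in-memory lookup only mimics what a JSONB index would make efficient inside the database.

```python
import json


def tags_from_flat_array(flat):
    """Convert the old [key1, value1, key2, value2, ...] layout
    into a {key: value} mapping, as it would appear in a JSONB column."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of entries (key/value pairs)")
    return dict(zip(flat[0::2], flat[1::2]))


def members_to_json(members):
    """Turn (type, ref, role) tuples into the array-of-objects layout
    shown in the commit message: [{"type": "W", "ref": 123, "role": "inner"}, ...]."""
    return [{"type": t, "ref": ref, "role": role} for t, ref, role in members]


def relations_with_member(relations, member_type, ref):
    """Return the sorted ids of all relations that contain the given member.
    `relations` maps a relation id to its member array (hypothetical input shape)."""
    return sorted(
        rel_id
        for rel_id, members in relations.items()
        if any(m["type"] == member_type and m["ref"] == ref for m in members)
    )


if __name__ == "__main__":
    old_tags = ["highway", "residential", "name", "Main Street"]
    print(json.dumps(tags_from_flat_array(old_tags)))

    relations = {
        1: members_to_json([("W", 123, "inner"), ("W", 456, "outer")]),
        2: members_to_json([("W", 789, "outer")]),
    }
    print(json.dumps(relations[1]))
    print(relations_with_member(relations, "W", 123))
```

With tags held as a key/value object, a lookup by key no longer requires walking an interleaved array, and the member objects carry type, ref, and role together so a search for "all relations containing way 123" is a simple containment check.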
joto · May 27, 2023 · 2b7bd45
1 parent ee2dabb
commit 2b7bd45
Showing 8 changed files with 763 additions and 61 deletions.
diff --git a/.github/actions/ubuntu-prerequisites/action.yml b/.github/actions/ubuntu-prerequisites/action.yml
@@ -24,6 +24,7 @@ runs:
           libpotrace-dev \
           libpq-dev \
           libproj-dev \
+          nlohmann-json3-dev \
           pandoc \
           postgresql-${POSTGRESQL_VERSION} \
           postgresql-${POSTGRESQL_VERSION}-postgis-${POSTGIS_VERSION} \

diff --git a/.github/actions/win-install/action.yml b/.github/actions/win-install/action.yml
@@ -5,8 +5,8 @@ runs:
 
   steps:
     - name: Install packages
-      run: vcpkg install cimg:x64-windows bzip2:x64-windows expat:x64-windows zlib:x64-windows proj4:x64-windows boost-geometry:x64-windows boost-system:x64-windows boost-filesystem:x64-windows boost-property-tree:x64-windows lua:x64-windows libpq:x64-windows
+      run: vcpkg install cimg:x64-windows bzip2:x64-windows expat:x64-windows zlib:x64-windows proj4:x64-windows boost-geometry:x64-windows boost-system:x64-windows boost-filesystem:x64-windows boost-property-tree:x64-windows lua:x64-windows libpq:x64-windows nlohmann-json:x64-windows
       shell: bash
-    - name: Install psycopg2 and beahve
+    - name: Install psycopg2 and behave
       run: python -m pip install psycopg2 behave
       shell: bash