New flex backend

This introduces a new "flex" backend which allows much more flexibility in choosing the database format and the transformation from OSM data to the database format. The user defines all this in a Lua script.
osm2pgsql-dev · Feb 5, 2020 · 9c16722
1 parent 76a3e78
commit 9c16722

Showing 49 changed files with 6,268 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -176,6 +176,12 @@ null backend for testing. For flexibility a new [multi](docs/multi.md)
 backend is also available which allows the configuration of custom
 PostgreSQL tables instead of those provided in the pgsql backend.
 
+Also available is the new [flex](docs/flex.md) backend. It is much more
+flexible than the other backends. IT IS CURRENTLY EXPERIMENTAL AND SUBJECT
+TO CHANGE. The flex backend is only available if you have compiled osm2pgsql
+with Lua support. More details at
+https://github.com/openstreetmap/osm2pgsql/issues/1036 .
+
 ## LuaJIT support ##
 
 To speed up Lua tag transformations, [LuaJIT](https://luajit.org/) can be optionally

diff --git a/docs/flex.md b/docs/flex.md
@@ -0,0 +1,282 @@
+
+# The Flex Backend
+
+**The Flex Backend is experimental. Everything in here is subject to change.**
+
+The "Flex" backend, as the name suggests, allows for a more flexible
+configuration that tells osm2pgsql what OSM data to store in your database and
+exactly where and how. It is configured through a Lua file which
+
+* defines the structure of the output tables and
+* defines functions to map the OSM data to the database data format
+
+See also the example config files in the `flex-config` directory which contain
+lots of comments to get you started.
+
+## The Lua config file
+
+All configuration is done through the `osm2pgsql` object in Lua. It has the
+following fields:
+
+* `osm2pgsql.version`: The version of osm2pgsql as a string.
+* `osm2pgsql.srid`: The SRID set on the command line (with `-l|--latlong`,
+  `-m|--merc`, or `-E|--proj`).
+* `osm2pgsql.mode`: Either `"create"` or `"append"` depending on the command
+  line options (`--create` or `-a|--append`).
+* `osm2pgsql.stage`: Either `1` or `2` (1st/2nd stage processing of the data).
+  See below.
+
+The following functions are defined:
+
+* `osm2pgsql.define_node_table(name, columns)`: Define a node table.
+* `osm2pgsql.define_way_table(name, columns)`: Define a way table.
+* `osm2pgsql.define_relation_table(name, columns)`: Define a relation table.
+* `osm2pgsql.define_area_table(name, columns)`: Define an area table.
+* `osm2pgsql.define_table()`: Define a table. This is the more flexible
+  function behind all the other `define_*_table()` functions. It gives you
+  more control than the other, more convenient functions.
+* `osm2pgsql.mark_way(id)`: Mark the OSM way with the specified id. This way
+  will be processed (again) in stage 2.
+* `osm2pgsql.mark_relation(id)`: Mark the OSM relation with the specified id.
+  This relation will be processed (again) in stage 2.
+
+You are expected to define one or more of the following functions:
+
+* `osm2pgsql.process_node()`: Called for each node.
+* `osm2pgsql.process_way()`: Called for each way.
+* `osm2pgsql.process_relation()`: Called for each relation.
+
+### Defining a table
+
+You have to define one or more tables where your data should end up. This
+is done with the `osm2pgsql.define_table()` function or one of the slightly
+more convenient functions `osm2pgsql.define_(node|way|relation|area)_table()`.
+
+Each table is either a *node table*, *way table*, *relation table*, or *area
+table*. This means that the data for that table comes primarily from a node,
+way, relation, or area, respectively. Osm2pgsql makes sure that the OSM object
+id will be stored in the table so that later updates to those OSM objects (or
+deletions) will be properly reflected in the tables. Area tables are special:
+they can contain data derived from ways or from relations. Way ids will be
+stored as is, relation ids will be stored as negative numbers.
+
+With the `osm2pgsql.define_table()` function you can also define tables that
+* don't have any ids, but those tables will never be updated by osm2pgsql
+* take *any OSM object*, in this case the type of object is stored in an
+  additional column.
+* are in a specific PostgreSQL tablespace (set `data_tablespace`) or that
+  get their indexes created in a specific tablespace (set `index_tablespace`).
+
+If you are using the `osm2pgsql.define_(node|way|relation|area)_table()`
+convenience functions, osm2pgsql will automatically create an id column named
+`(node|way|relation|area)_id`, respectively. If you want more control over
+the id column(s), use the `osm2pgsql.define_table()` function.
+
+Most tables will have a geometry column. (Currently only zero or one geometry
+columns are supported.) The possible types of the geometry column depend on
+the type of the input data. For node tables you are pretty much restricted
+to point geometries, but there is a variety of options for relation tables
+for instance.
+
+The supported geometry types are:
+* `point`: Point geometry, usually created from nodes.
+* `linestring`: Linestring geometry, usually created from ways.
+* `polygon`: Polygon geometry for area tables, created from ways or relations.
+* `multipoint`: Currently not used.
+* `multilinestring`: Created from (possibly split up) ways or relations.
+* `multipolygon`: For area tables, created from ways or relations.
+* `geometry`: Any kind of geometry. Also used for area tables that should hold
+  both polygon and multipolygon geometries.
+
+A column of type `area` will be filled automatically with the area of the
+geometry. This will only work for (multi)polygons.
+
+In addition to id and geometry columns, each table can have any number of
+"normal" columns using any type supported by PostgreSQL. Some types are
+specially recognized by osm2pgsql:
+
+* `text`: A text string.
+* `boolean`: Interprets string values `"true"`, `"yes"` as `true` and all
+   others as `false`. Boolean and integer values will also work in the usual
+   way.
+* `int2`, `smallint`: 16bit signed integer. Values too large to fit will be
+  truncated in some unspecified way.
+* `int4`, `int`, `integer`: 32bit signed integer. Values too large to fit will
+  be truncated in some unspecified way.
+* `int8`, `bigint`: 64bit signed integer. Values too large to fit will be
+  truncated in some unspecified way.
+* `real`: A real number.
+* `hstore`: Automatically filled from a Lua table with only strings as keys
+  and values.
+* `direction`: Interprets values `"true"`, `"yes"`, and `"1"` as 1, `"-1"` as
+  `-1`, and everything else as `0`. Useful for `oneway` tags etc.
+
+Instead of the above types you can use any SQL type you want. If you do that
+you have to supply the PostgreSQL string representation for that type when
+adding data to such columns (or Lua nil to set the column to `NULL`).
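+
+As a rough illustration of the column types above, here is a minimal sketch
+of a way table that uses several of the specially recognized types. The table
+name `roads`, the column names, and the choice of tags are assumptions made
+for this example only; they are not taken from the shipped example configs.
+
+```lua
+-- Hypothetical 'roads' table; all names here are illustrative only.
+local roads = osm2pgsql.define_way_table('roads', {
+    { column = 'name',   type = 'text' },
+    { column = 'oneway', type = 'direction' },  -- stored as 1, -1, or 0
+    { column = 'toll',   type = 'boolean' },    -- "yes"/"true" become true
+    { column = 'tags',   type = 'hstore' },     -- all tags as key/value pairs
+    { column = 'geom',   type = 'linestring' },
+})
+
+function osm2pgsql.process_way(object)
+    -- Only ways tagged as highways are of interest in this sketch.
+    if not object.tags.highway then
+        return
+    end
+
+    -- Columns whose value is nil are left out of the row and become NULL.
+    roads:add_row({
+        name   = object.tags.name,
+        oneway = object.tags.oneway,
+        toll   = object.tags.toll,
+        tags   = object.tags,
+        geom   = { create = 'line' }
+    })
+end
+```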
+
+### Processing callbacks
+
+You are expected to define one or more of the following functions:
+
+* `osm2pgsql.process_node(object)`: Called for each node.
+* `osm2pgsql.process_way(object)`: Called for each way.
+* `osm2pgsql.process_relation(object)`: Called for each relation.
+
+They all have a single argument of type table (here called `object`) and no
+return value. If you are not interested in all object types, you do not have
+to supply all the functions.
+
+These functions are called for each new or modified OSM object in the input
+file. No function is called for deleted objects; osm2pgsql will automatically
+delete all data in your database tables that was derived from deleted objects.
+Modifications are handled as deletions followed by creation of a "new" object,
+for which the functions are called.
+
+The parameter table (`object`) has the following fields:
+
+* `id`: The id of the node, way, or relation.
+* `tags`: A table with all the tags of the object.
+* `version`, `timestamp`, `changeset`, `uid`, and `user`: Attributes of the
+  OSM object. These are only available if the `-x|--extra-attributes` option
+  is used and the OSM input file actually contains those fields. The
+  `timestamp` contains the time in seconds since the epoch (midnight
+  1970-01-01).
+* `grab_tag(KEY)`: Return the tag value of the specified key and remove the
+  tag from the list of tags. (Example: `local name = object:grab_tag('name')`)
+  This is often used when you want to store some tags in special columns and
+  the rest of the tags in an hstore column.
+* `get_bbox()`: Get the bounding box of the current node or way. (It doesn't
+  work for relations currently.)
+
+Ways have the following additional fields:
+* `is_closed`: A boolean telling you whether the way geometry is closed, i.e.
+  the first and last node are the same.
+* `nodes`: An array with the way node ids.
+
+Relations have the following additional field:
+* `members`: An array with member tables. Each member table has the fields
+  `type` (values `n`, `w`, or `r`), `ref` (member id) and `role`.
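+
+The fields described above could be used along these lines. This is only a
+sketch: the `buildings` table, its columns, and the filter on the `building`
+tag are assumptions chosen for illustration.
+
+```lua
+-- Hypothetical area table for building outlines.
+local buildings = osm2pgsql.define_area_table('buildings', {
+    { column = 'name', type = 'text' },
+    { column = 'tags', type = 'hstore' },
+    { column = 'geom', type = 'polygon' },
+})
+
+function osm2pgsql.process_way(object)
+    -- Skip ways that are not closed or are not tagged as buildings.
+    if not object.is_closed or not object.tags.building then
+        return
+    end
+
+    -- grab_tag() returns the value and removes the tag from object.tags,
+    -- so 'name' does not end up in the hstore column as well.
+    local name = object:grab_tag('name')
+
+    buildings:add_row({
+        name = name,
+        tags = object.tags,
+        geom = { create = 'area' }
+    })
+end
+```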
+
+You can do anything in those processing functions to decide what to do with
+this data. If you are not interested in that OSM object, simply return from the
+function. If you want to add the OSM object to some table call the `add_row()`
+function on that table:
+
+```
+-- definition of the table:
+table_pois = osm2pgsql.define_node_table('pois', {
+    { column = 'tags', type = 'hstore' },
+    { column = 'name', type = 'text' },
+    { column = 'geom', type = 'point' },
+})
+
+-- ... (rest of the config elided)
+
+-- called for each node:
+function osm2pgsql.process_node(object)
+    -- ... (elided)
+    table_pois:add_row({
+        tags = object.tags,
+        name = object.tags.name,
+        geom = { create = 'point' }
+    })
+    -- ... (elided)
+end
+```
+
+The `add_row()` function takes a single table parameter that describes what to
+fill into all the database columns. Any column not mentioned will be set to
+`NULL`.
+
+The geometry column is somewhat special. You have to define a *geometry
+transformation* that will be used to transform the OSM object data into
+a geometry that fits into the geometry column. See the next section for
+details.
+
+Note that you can't set the object id; this will be handled for you behind the
+scenes.
+
+## Geometry transformations
+
+Currently these geometry transformations are supported:
+
+* `{ create = 'point'}`. Only valid for nodes. Creates a 'point' geometry.
+* `{ create = 'line'}`. For ways or relations. Creates a 'linestring' or
+  'multilinestring' geometry.
+* `{ create = 'area'}`. For ways or relations. Creates a 'polygon' or
+  'multipolygon' geometry.
+
+Some of these transformations can have parameters:
+
+* The `line` transformation has an optional parameter `split_at`. If this
+  is set to anything other than 0, linestrings longer than this value will
+  be split up into parts no longer than this value.
+* The `area` transformation has an optional parameter `multi`. If this is
+  set to `false` (the default), a multipolygon geometry will be split up into
+  several polygons. If this is set to `true`, the multipolygon geometry is
+  kept as one. It depends on this parameter whether you need a polygon
+  or multipolygon geometry column.
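+
+A minimal sketch of how these parameters might be passed, assuming they sit
+next to `create` in the same Lua table. The table names, the column types,
+and the `split_at` value of `100` are placeholders for illustration, not
+recommendations.
+
+```lua
+-- Hypothetical tables; names and column types are assumptions.
+local lines = osm2pgsql.define_way_table('lines', {
+    { column = 'geom', type = 'linestring' },
+})
+
+local areas = osm2pgsql.define_area_table('areas', {
+    { column = 'geom', type = 'multipolygon' },
+})
+
+function osm2pgsql.process_way(object)
+    -- Split long linestrings into parts no longer than the given value.
+    lines:add_row({
+        geom = { create = 'line', split_at = 100 }
+    })
+end
+
+function osm2pgsql.process_relation(object)
+    if object.tags.type == 'multipolygon' then
+        -- multi = true keeps the geometry as one multipolygon, which is
+        -- why the column above is declared as 'multipolygon'.
+        areas:add_row({
+            geom = { create = 'area', multi = true }
+        })
+    end
+end
+```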
+
+If no geometry transformation is set, osm2pgsql will, in some cases, assume
+a default transformation. These are the defaults:
+
+* For node tables, a `point` column gets the node location.
+* For way tables, a `linestring` column gets the complete way geometry, a
+  `polygon` column gets the way geometry as area (if the way is closed and
+  the area is valid).
+
+## Stages
+
+Osm2pgsql processes the data in up to two stages. You can mark ways or
+relations in stage 1 for processing in stage 2 by calling
+`osm2pgsql.mark_way(id)` or `osm2pgsql.mark_relation(id)`, respectively. If you
+don't mark any objects, nothing will be done in stage 2.
+
+You can look at `osm2pgsql.stage` to see in which stage you are.
+
+In stage 1 you can only look at each OSM object on its own. You can see
+its id and tags (and possibly timestamp, changeset, user, etc.), but you don't
+know how this OSM object relates to other OSM objects (for instance whether a
+way you are looking at is a member in a relation). If this is enough to decide
+which database table(s) an OSM object should end up in and with what data,
+then you can process the OSM object in stage 1. If, on the other hand, you
+need some extra information, you have to defer processing to the second stage.
+
+You want to do all the processing you can in stage 1, because it is faster
+and there is less memory overhead. For most use cases, stage 1 is enough. If
+it is not, use stage 1 to store information about OSM objects you will need
+in stage 2 in some global variable. In stage 2 you can read this information
+again and use it to decide where and how to store the data in the database.
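+
+The two-stage pattern described above can be sketched as follows. This is a
+simplified illustration, not a copy of any shipped config: the `highways`
+table, the `refs` column, and the global `refs_by_way` are hypothetical
+names. For simplicity, every highway way is deferred to stage 2 here, even
+though only ways that are members of a route relation actually need it.
+
+```lua
+-- Hypothetical table; the 'refs' column holds route refs from relations.
+local highways = osm2pgsql.define_way_table('highways', {
+    { column = 'tags', type = 'hstore' },
+    { column = 'refs', type = 'text' },
+    { column = 'geom', type = 'linestring' },
+})
+
+-- Global lookup filled in stage 1: way id -> list of route refs.
+refs_by_way = {}
+
+function osm2pgsql.process_relation(object)
+    if object.tags.type ~= 'route' or not object.tags.ref then
+        return
+    end
+    -- Remember the route ref for every member way.
+    for _, member in ipairs(object.members) do
+        if member.type == 'w' then
+            local refs = refs_by_way[member.ref] or {}
+            refs[#refs + 1] = object.tags.ref
+            refs_by_way[member.ref] = refs
+        end
+    end
+end
+
+function osm2pgsql.process_way(object)
+    if not object.tags.highway then
+        return
+    end
+
+    if osm2pgsql.stage == 1 then
+        -- Nothing is known about parent relations yet, so ask osm2pgsql
+        -- to process this way again in stage 2.
+        osm2pgsql.mark_way(object.id)
+        return
+    end
+
+    -- Stage 2: the lookup table is complete now.
+    local refs = refs_by_way[object.id]
+    highways:add_row({
+        tags = object.tags,
+        refs = refs and table.concat(refs, ',') or nil,
+        geom = { create = 'line' }
+    })
+end
+```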
+
+## Command line options
+
+Use the command line option `-O flex` or `--output=flex` to enable the flex
+backend and the `-S|--style` option to set the Lua config file.
+
+The following command line options have a somewhat different meaning when
+using the flex backend:
+
+* `-p|--prefix`: The table names you are setting in your Lua config files
+  will *not* get this prefix. You can easily add the prefix in the Lua config
+  yourself.
+* `-S|--style`: Use this to specify the Lua config file. Without it, osm2pgsql
+  will not work, because it will try to read the default style file.
+* `-G|--multi-geometry` is not used. Instead, set the type of the geometry
+  column to the type you want, i.e. `polygon` vs. `multipolygon`.
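+
+Since `-p|--prefix` is not applied to table names, one way to add a prefix
+yourself is plain string concatenation in the config file. The prefix value
+and the table definition below are assumptions chosen for illustration.
+
+```lua
+-- Hypothetical prefix; set it to whatever your setup expects.
+local prefix = 'planet_osm_'
+
+-- The resulting table is named 'planet_osm_point'.
+local points = osm2pgsql.define_node_table(prefix .. 'point', {
+    { column = 'tags', type = 'hstore' },
+    { column = 'geom', type = 'point' },
+})
+```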
+
+The following command line options are ignored by `osm2pgsql` when using the
+flex backend, because they don't make sense in that context:
+
+* `-k|--hstore`
+* `-j|--hstore-all`
+* `-z|--hstore-column`
+* `--hstore-match-only`
+* `--hstore-add-index`
+* `-K|--keep-coastlines` (Coastline tags are not handled specially in the
+  flex backend.)
+* `--tag-transform-script` (Set the Lua config file with the `-S|--style`
+  option.)
+* `-G|--multi-geometry` (Use the `multi` option on the geometry transformation
+  instead.)
+* The command line options to set the tablespace are ignored by the flex
+  backend; instead, use the `data_tablespace` or `index_tablespace` options
+  when defining your table.
+
diff --git a/docs/osm2pgsql.1 b/docs/osm2pgsql.1
@@ -156,7 +156,8 @@ Specifies the output back\-end or database schema to use. Currently
 osm2pgsql supports \fBpgsql\fR, \fBgazetteer\fR and \fBnull\fR. \fBpgsql\fR is
 the default output back\-end / schema and is optimized for rendering with Mapnik.
 \fBgazetteer\fR is a db schema optimized for geocoding and is used by Nominatim.
-The \fBmulti\fR backend allows more customization of tables.
+The \fBmulti\fR backend allows more customization of tables. The experimental
+\fBflex\fR backend allows more flexible configuration.
 \fBnull\fR does not write any output and is only useful for testing or with
 \-\-slim for creating slim tables.
 .TP

diff --git a/flex-config/README.md b/flex-config/README.md
@@ -0,0 +1,48 @@
+
+# Flex Backend Configuration
+
+**The Flex Backend is experimental. Everything in here is subject to change.**
+
+See the [Flex Backend Documentation](../docs/flex.md) for all the details.
+
+## Example config files
+
+This directory contains example config files for the flex backend. All config
+files are documented extensively with inline comments.
+
+If you are learning about the flex backend, read the config files in the
+following order (from easiest to understand to the more complex ones):
+
+1. [simple.lua](simple.lua) -- Introduction to config file format
+2. [geometries.lua](geometries.lua) -- Geometry column options
+3. [data-types.lua](data-types.lua) -- Data types and how to handle them
+
+After that you can dive into more advanced topics:
+
+* [route-relations.lua](route-relations.lua) -- Use multi-stage processing
+  to bring tags from relations to member ways
+* [unitable.lua](unitable.lua) -- Put all OSM data into a single table
+* [places.lua](places.lua) -- Creating JSON/JSONB columns
+
+The "default" configuration is a full-featured but simple configuration that
+is a good starting point for your own real-world configuration:
+
+* [default-config.lua](default-config.lua)
+
+The following config file tries to be more or less compatible with the old
+osm2pgsql C transforms:
+
+* [compatible.lua](compatible.lua)
+
+## Dependencies
+
+Some of the example files use the `inspect` Lua library to show debugging
+output. It is not needed for the actual functionality of the examples, so if
+you don't have the library, you can remove all uses of `inspect` and the
+scripts should still work.
+
+The library is available from [the
+source](https://github.com/kikito/inspect.lua) or using
+[LuaRocks](https://luarocks.org/modules/kikito/inspect). Debian/Ubuntu users
+can install the `lua-inspect` package.
+