| chapter | 10 |
|---|---|
| title | Generalization |
Version >= 1.9.0{: .version} Experimental{: .experimental} Osm2pgsql has some limited support for generalization. See [the generalization project page]({% link generalization/index.md %}) for some background and details. This work is experimental and everything described here might change without notice.
For the generalization functionality the separate program osm2pgsql-gen is
provided. In the future this functionality might be integrated into osm2pgsql
itself. The same Lua configuration file that is used for osm2pgsql is also used
to configure the generalization. Generalization will only work together with
the flex output.
The documentation in this chapter is incomplete. We are working on it...
Generalization is the process by which detailed map data is selected, simplified, or changed into something suitable for rendering on smaller-scale maps (or smaller zoom levels). In osm2pgsql this is done with a separate program after an import or update has finished. Data is processed in the database and/or read out of the database, processed in osm2pgsql, and then written back.
The end result is that in addition to the usual tables created and filled by osm2pgsql you have a set of additional tables with the generalized data.
Generalization is currently only supported for Web Mercator (EPSG 3857). This is by far the most common use case; we can look at extending this later if needed.
Osm2pgsql supports several different strategies for generalization which use different algorithms suitable for different types of data. Each strategy has several configuration options. See the next section for general options used by most strategies and the section after that for all the details about the strategies.
All tables needed for generalized data have to be configured just like any table in osm2pgsql. Currently there are some restrictions on the tables:
- The input and output tables must use the same schema.
- The geometry column used must have the same name as the geometry column in the table used as input for a generalizer.
- Output tables for tile-based generalizers must have `ids` set to `tile`, which automatically creates `x` and `y` columns for the tile coordinates. An index will also be created on those columns after generalization.
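As a sketch of these restrictions, the following flex configuration fragment defines an input table and a matching output table for a tile-based generalizer. The table names (`landcover`, `landcover_z10`), the `area_id` id column, and the `type` column are hypothetical and chosen only for illustration. Both tables live in the same (default) schema and use the same geometry column name.

```lua
-- Hypothetical table names and columns, for illustration only.
local tables = {}

-- Input table, filled by osm2pgsql during import or update.
tables.landcover = osm2pgsql.define_table({
    name = 'landcover',
    ids = { type = 'area', id_column = 'area_id' },
    columns = {
        { column = 'type', type = 'text' },
        { column = 'geom', type = 'polygon', not_null = true },
    }
})

-- Output table for a tile-based generalizer: ids is set to 'tile',
-- so the x and y tile coordinate columns are added automatically.
tables.landcover_z10 = osm2pgsql.define_table({
    name = 'landcover_z10',
    ids = { type = 'tile' },
    columns = {
        { column = 'type', type = 'text' },
        { column = 'geom', type = 'geometry', not_null = true },
    }
})
```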
To add generalization to your config, add a callback function
`osm2pgsql.process_gen()` and run generalizers in there:

```lua
function osm2pgsql.process_gen()
    osm2pgsql.run_gen(STRATEGY, { ... })
end
```

Replace `STRATEGY` with the strategy (see below) and add all parameters to the
Lua table.
The following parameters are used by most generalizers:
| Parameter | Type | Description |
|---|---|---|
| name | text | Identifier for this generalizer, used in debug output, error messages, etc. |
| debug | bool | Set to true to enable debug logging for this generalizer. Debug logging must also be enabled with -l, --log-level=debug on the command line. |
| schema | text | Database schema for all tables. Default: public. |
| src_table | text | The table with the input data. |
| dest_table | text | The table where generalizer output data is written to. |
| geom_column | text | The name of the geometry column in the input and output tables (default: geom). |
{:.desc}
For more specific parameters see below.
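A minimal sketch of a generalizer call that uses the general parameters from the table above. The table names and the choice of the `raster-union` strategy with `zoom = 10` are assumptions made for illustration, not recommendations.

```lua
function osm2pgsql.process_gen()
    osm2pgsql.run_gen('raster-union', {
        name = 'landcover_z10',       -- appears in debug output and error messages
        debug = false,
        schema = 'public',            -- the default, spelled out here
        src_table = 'landcover',      -- hypothetical input table
        dest_table = 'landcover_z10', -- hypothetical output table
        geom_column = 'geom',         -- the default, spelled out here
        zoom = 10,                    -- strategy-specific parameter
    })
end
```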
You can also run any SQL command in the `process_gen()` callback with the
`run_sql()` function:

```lua
osm2pgsql.run_sql({
    description = 'Descriptive name for this command for logging',
    sql = "UPDATE foo SET bar = 'x'"
})
```

The following fields are available in the `run_sql()` command:
| Parameter | Type | Description |
|---|---|---|
| description | text | Descriptive name or short text for logging. |
| sql | text or array of texts | The SQL command to run. Version >= 1.9.2{: .version} The sql field can be set to a string or to an array of strings in which case the commands in those strings will be run one after the other. |
| transaction | bool | Version >= 1.9.2{: .version} Set to true to run the command(s) from the sql field in a transaction (Default: false). |
| if_has_rows | text | Version >= 1.9.2{: .version} SQL command that is run first. If that SQL command returns any rows, the commands in sql are run, otherwise nothing is done. This can be used to trigger generalizations only if something changed, for instance when an expire table contains something. Use a query like SELECT 1 FROM expire_table LIMIT 1. Default: none, i.e. the command in sql always runs. |
{:.desc}
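The fields above can be combined as in the following sketch, which assumes version 1.9.2 or later. The tables `foo` and `foo_summary` and their columns are hypothetical; `expire_table` stands in for whatever expire table your configuration defines.

```lua
function osm2pgsql.process_gen()
    osm2pgsql.run_sql({
        description = 'Rebuild summary table if anything expired',
        -- Only run the commands below when the expire table has rows.
        if_has_rows = 'SELECT 1 FROM expire_table LIMIT 1',
        -- Run all commands from the sql field in a single transaction.
        transaction = true,
        -- An array of strings: the commands run one after the other.
        sql = {
            'DELETE FROM foo_summary',
            'INSERT INTO foo_summary (bar, cnt) ' ..
                'SELECT bar, count(*) FROM foo GROUP BY bar',
        }
    })
end
```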
There are currently two types of strategies. Some strategies always work on all data in the input table(s). If there are changes in the input data, the processing has to restart from scratch. If you work with updates, you will usually only run these strategies once a day or once a week or so. In general, those strategies only make sense for data that doesn't change that often (or where changes are usually small) and if it doesn't matter that data in smaller zoom levels is only updated occasionally.
The other type of strategy uses a tile-based approach. Whenever something changes, all tiles intersecting with the change will be re-processed. Osm2pgsql uses the existing expire mechanism to keep track of what to change.
The `builtup` strategy derives built-up areas from landuse polygons, roads and building outlines. It is intended to show roughly where there are urban areas. Because it uses input data of several diverse types, it works reasonably well in different areas of the world even if the landuse tagging is incomplete.
This strategy is tile-based. Internally it uses an approach similar to the
`raster-union` strategy but can work with several input tables.
Parameters used by this strategy (see above for some additional general parameters):
| Parameter | Type | Description |
|---|---|---|
| src_tables | text | Comma-separated list of input table names in the order landuse layer, buildings layer, roads layer. |
| image_extent | int | Width/height of the raster used for generalization (Default: 2048). |
| image_buffer | int | Buffer used around the raster image (default: 0). |
| min_area | real | Drop output polygons smaller than this. Default: off |
| margin | real | The overlapping margin as a percentage of image_extent for raster processing of tiles. |
| buffer_size | text | Amount by which polygons are buffered, in pixels. Comma-separated list with one value for each input table. |
| turdsize | int | |
| zoom | int | Zoom level. |
| make_valid | bool | Make sure resulting geometries are valid. |
| area_column | text | Column name where to store the area of the result polygons. |
{:.desc}
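A sketch of how the parameters above might be combined for the `builtup` strategy. All table names, the `area` column, and the numeric values are hypothetical and only illustrate the shape of the call. The output table is assumed to be defined with `ids` set to `tile` and to contain the `area` column.

```lua
function osm2pgsql.process_gen()
    osm2pgsql.run_gen('builtup', {
        name = 'builtup',
        -- Order matters: landuse layer, buildings layer, roads layer.
        src_tables = 'landuse,buildings,roads', -- hypothetical input tables
        dest_table = 'builtup',                 -- hypothetical output table
        zoom = 10,
        image_extent = 2048,   -- the documented default, spelled out here
        buffer_size = '8,8,8', -- one value per input table (illustrative values)
        make_valid = true,     -- make sure resulting geometries are valid
        area_column = 'area',  -- hypothetical column storing the polygon area
    })
end
```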
See this blog post{:.extlink} for some background.
When rendering a map with many point features like cities or mountain peaks it is often useful to only put the most important features on the map. Importance can be something like the number of people living in a city or the height of a peak. But if only the absolute importance is used, some areas on the map will get filled with many features, while others stay empty. The Discrete Isolation algorithm can be used to calculate a more relative measure of importance which tends to create a more evenly filled map.
The `discrete-isolation` strategy always processes all features in a table.
Parameters used by this strategy (see above for some additional general parameters):
| Parameter | Type | Description |
|---|---|---|
| id_column | text | The name of the id column in the source table. |
| importance_column | text | The column in the source table with the importance metric. Column type must be a number type. |
{:.desc}
The `src_table` and `dest_table` must always be the same.
You must have an index on the id column, otherwise this will be very slow! Set
create_index = 'always' in your source table configuration.
You must have the following columns in your table. This is currently not configurable:
| Column | Type | Description |
|---|---|---|
| discr_iso | real | Discrete isolation value |
| irank | int | Importance rank |
| dirank | int | Discrete isolation rank |
{:.desc}
Use these column definitions in your config file to add them:

```lua
{ column = 'discr_iso', type = 'real', create_only = true },
{ column = 'irank', type = 'int', create_only = true },
{ column = 'dirank', type = 'int', create_only = true },
```
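Putting the requirements for this strategy together, a configuration might look like the following sketch. The `cities` table, the `node_id` id column, and the `name` and `population` columns are hypothetical; the three `create_only` columns and the `create_index = 'always'` setting are the ones required above.

```lua
-- Hypothetical source table for the discrete isolation strategy.
local cities = osm2pgsql.define_table({
    name = 'cities',
    -- create_index = 'always' makes sure the id column is indexed.
    ids = { type = 'node', id_column = 'node_id', create_index = 'always' },
    columns = {
        { column = 'name', type = 'text' },
        { column = 'population', type = 'int' }, -- importance metric
        { column = 'geom', type = 'point', not_null = true },
        -- Required columns, filled by the generalizer.
        { column = 'discr_iso', type = 'real', create_only = true },
        { column = 'irank', type = 'int', create_only = true },
        { column = 'dirank', type = 'int', create_only = true },
    }
})

function osm2pgsql.process_gen()
    osm2pgsql.run_gen('discrete-isolation', {
        name = 'cities',
        src_table = 'cities',  -- source and destination must be the same
        dest_table = 'cities',
        geom_column = 'geom',
        id_column = 'node_id',
        importance_column = 'population',
    })
end
```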
See this blog post{:.extlink} for some background.
The `raster-union` strategy merges and simplifies polygons using a raster
intermediate. It is intended for polygon layers such as landcover where many
smaller features should be aggregated into larger ones. It does a job very
similar to the `vector-union` strategy, but is faster.
This strategy is tile-based.
Parameters used by this strategy (see above for some additional general parameters):
| Parameter | Type | Description |
|---|---|---|
| image_extent | int | Width/height of the raster used for generalization (Default: 2048). |
| margin | real | The overlapping margin as a percentage of image_extent for raster processing of tiles. |
| buffer_size | text | Amount by which polygons are buffered in pixels (Default: 10). |
| zoom | int | Zoom level. |
| group_by_column | text | Name of a column in the source and destination tables used to group the geometries by some kind of classification (Optional). |
| expire_list | text | |
| img_path | text | Used to dump PNGs of the "before" and "after" images to a file for debugging. |
| img_table | text | Used to dump "before" and "after" raster images to the database for debugging. The table will be created if it doesn't exist already. |
| where | text | Optional WHERE clause to add to the SQL query getting the input data from the database. Must be empty or a valid SQL snippet. |
{:.desc}
The actual image extent used will be `image_extent + 2 * margin * image_extent`, where `margin * image_extent` is rounded to the nearest multiple of 64.
The `img_path` parameter can be set to help with debugging. Set `img_path` to
something like this: `some/dir/path/img`. Resulting images will be in the
directory `some/dir/path` and are named `img-X-Y-TYPE-[io].png` for input (`i`)
or output (`o`) images. The `TYPE` is the value from the `group_by_column`.
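The extent calculation described above can be worked through in a few lines of plain Lua. This is a sketch of the documented formula only, not the actual implementation: the helper name is hypothetical, `margin` is assumed to be given as a fraction (0.1 for 10 %), and exact ties in the rounding are assumed to round up.

```lua
-- Hypothetical helper illustrating the documented formula.
local function actual_image_extent(image_extent, margin)
    -- margin * image_extent, rounded to the nearest multiple of 64
    local buffer = math.floor(margin * image_extent / 64 + 0.5) * 64
    return image_extent + 2 * buffer
end

-- 0.1 * 2048 = 204.8, which rounds to 192, so the result is 2048 + 2 * 192.
print(actual_image_extent(2048, 0.1)) -- prints 2432
```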
See this blog post{:.extlink} for some background.
The `rivers` strategy is intended to find larger rivers with their width and aggregate them into longer linestrings. The implementation is incomplete and not usable at the moment.
This strategy always processes all features in a table.
Parameters used by this strategy (see above for some additional general parameters):
| Parameter | Type | Description |
|---|---|---|
| src_areas | text | Name of the input table with waterway areas. |
| width_column | text | Name of the number type column containing the width of a feature. |
{:.desc}
See this blog post{:.extlink} for some background.
The `vector-union` strategy merges and simplifies polygons using vector
calculations. It is intended for polygon layers such as landcover where many
smaller features should be aggregated into larger ones. It does a job very
similar to the `raster-union` strategy, but is slower.
This strategy is tile-based.
Parameters used by this strategy (see above for some additional general parameters):
| Parameter | Type | Description |
|---|---|---|
| margin | real | |
| buffer_size | text | Amount by which polygons are buffered in Mercator map units. |
| group_by_column | text | Column to group data by. Same column is used in the output for classification. |
| zoom | int | Zoom level. |
| expire_list | text | |
{:.desc}
Here are the most important command line options:
| Command line option | Description |
|---|---|
| -a, --append | Run in append (update) mode. Same option as with osm2pgsql. |
| -c, --create | Run in create (import) mode. Same option as with osm2pgsql. (This is the default.) |
| -j, --jobs=JOBS | Maximum number of threads to use. (Default: no threads.) |
| -l, --log-level=LEVEL | Set log level (debug, info (default), warn, or error). |
| --log-sql | Log all SQL commands sent to the database. |
| --middle-schema=SCHEMA | Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the middle. The schema must exist in the database and be writable by the database user. By default the schema set with --schema is used, or public if that is not set. Set this to the same value as used on the osm2pgsql command line. |
{:.desc}
Some strategies can run many jobs in parallel, speeding up processing a lot.
Use the -j, --jobs option to set the maximum number of threads. If nothing
else is running in parallel, try setting this to the number of available CPU
cores.
To specify which database to work on osm2pgsql-gen uses the same command line
options as osm2pgsql:
{% include_relative options/database.md %}