
Switching OSM Carto to use the osm2pgsql flex output #4977

Open
joto opened this issue Jun 4, 2024 · 4 comments

@joto

joto commented Jun 4, 2024

Osm2pgsql has been moving away from the old "pgsql" output for years now. The new output can do everything the old code can do and much much more. All new development is there, the old code will not get any new features. The OSM Carto project is the last major user of the "pgsql" output.

We want to get rid of the "pgsql" output in osm2pgsql at some point, which allows us to simplify osm2pgsql internally. This will not happen tomorrow, we'll leave plenty of time for OSM Carto and other users to switch. But we have to get started on moving installations over to the flex output.

Advantages of the switch include:

  • Potential for more flexible OSM Carto setup. Of course the OSM Carto project can decide whether they want to make use of those features.
  • Potential for OSM Carto derived styles to use new features even if OSM Carto itself doesn't use them.
  • Allows bringing OSM Carto, Nominatim and other data layouts (for instance for vector tiles) into the same database.

Instead of the openstreetmap-carto.style and openstreetmap-carto.lua config files there is now a single config file openstreetmap-carto-flex.lua. The command line for osm2pgsql will change to use the flex output and the new config file. Everything else should be pretty much the same. The database layout is 100% compatible. No changes to the styles or SQL queries are needed.

Updates are completely seamless. You can keep an existing database created with the pgsql output and continue updating it with the new flex-based configuration.

The two versions of the config files can be used side-by-side for a while if that's what OSM Carto maintainers want. The documentation can explain both options. Or we can switch over at some point.

Osm2pgsql version needed

You need at least version 1.8.0 of osm2pgsql, which is available in Debian Stable. Ubuntu 24.04 has version 1.11.0.

Command line

The command line used will change. Only the output type (-O flex) and the config file have to be set.

Old command line (from INSTALL.md):

osm2pgsql -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua -d gis ~/path/to/data.osm.pbf

New command line:

osm2pgsql -O flex --style openstreetmap-carto-flex.lua -d gis ~/path/to/data.osm.pbf

Changes in database layout

The database layout changes very little. The id columns (osm_id) and geometry columns (way) on all tables will get the NOT NULL flag when using the flex output. These have always been NOT NULL in practice anyway, so this isn't a problem.

Indexes

Currently several custom indexes have to be generated after import, see the indexes.yml and indexes.sql files.

The flex output can be configured to create those indexes. This means we can get rid of some more of the config files and the scripts/indexes.py script. If osm2pgsql is configured to create those indexes it will do so after the import is finished, running several CREATE INDEX commands concurrently (how many depends on command line options).
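
To make the idea concrete, here is a rough sketch of what letting osm2pgsql create the indexes could look like in a flex config. It is not taken from openstreetmap-carto-flex.lua: the column list is shortened, and the partial-index condition and its threshold are assumptions chosen only for illustration.

```lua
-- Sketch of a flex table definition that asks osm2pgsql to build the
-- indexes itself. Columns and the partial-index condition are
-- illustrative assumptions, not copied from openstreetmap-carto-flex.lua.
local polygons = osm2pgsql.define_table({
    name = 'planet_osm_polygon',
    ids = { type = 'area', id_column = 'osm_id' },
    columns = {
        { column = 'tags', type = 'hstore' },
        { column = 'way_area', type = 'real' },
        { column = 'way', type = 'geometry', projection = 3857, not_null = true },
    },
    indexes = {
        -- Main spatial index on the geometry column.
        { column = 'way', method = 'gist' },
        -- Partial spatial index for large polygons only
        -- (the threshold is a hypothetical value).
        { column = 'way', method = 'gist', where = 'way_area > 23300' },
    },
})
```

With a definition along these lines the separate index step after the import would no longer be needed for the indexes listed here. The index names are left to PostgreSQL in this sketch.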

Open issues:

  • Indexes currently cannot be named in the flex config. PostgreSQL will give them a generic name (something like planet_osm_polygon_way_idx3 instead of planet_osm_polygon_way_area_z10). A change for osm2pgsql to allow setting the name is being worked on.
  • Difference to the pgsql output if manual indexes are set: The fillfactor on the "main" geometry index is not set any more. For some background see osm2pgsql-dev/osm2pgsql#1780 ("Making indexes more flexible").

Question: Do we want to keep the old way of generating indexes or let osm2pgsql handle them? We can also make this optional in some way, having a flag in the config file that will trigger creation of the indexes.

Changes in database content

The content of the resulting tables looks the same as before. The only exception is that in some cases rounding for the way_area column is different, so you'll get slightly different values. This should not affect the use in any major way.

Tags named z_order are handled slightly differently, but those tags are bogus anyway and this should not have any effect. (I removed all z_order tags from the planet a few days ago anyway...)

The old setup would allow objects with a layer tag and either no other tags or only tags that are ignored (such as fixme) to show up as database entries with all columns NULL or empty. This is no longer the case.
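
The rule described above can be sketched in plain Lua roughly as follows. This is only an illustration of the behaviour, not the code from the new config: the function name has_relevant_tags and the list of ignored keys are assumptions.

```lua
-- Keys that never justify a database row on their own.
-- This list is a hypothetical example.
local ignored_keys = { fixme = true, note = true, source = true }

-- Returns true if the object has at least one tag besides 'layer'
-- and the ignored keys, i.e. if it is worth writing to the database.
local function has_relevant_tags(tags)
    for key, _ in pairs(tags) do
        if key ~= 'layer' and not ignored_keys[key] then
            return true
        end
    end
    return false
end

print(has_relevant_tags({ layer = '1', fixme = 'check' }))  -- false
print(has_relevant_tags({ layer = '1', highway = 'path' })) -- true
```

An object for which such a check returns false would simply not be inserted, so no all-NULL rows appear.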

I have verified that the resulting database is the same by running both old and new configurations side by side on all of the planet data and not seen any differences beyond those described above.

Setting layer column

Most tags are used "as is" in their respective database columns. An exception is the layer which is an integer column. It gets some special treatment in the Lua code. The current code does the same as before, but it doesn't have to.

It would be a small change to use layer 0 instead of NULL when the layer is not set. This would allow the SQL queries to be simplified a little bit: We don't need COALESCE(layer,0) any more which is used in several places.
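
As a minimal sketch of the two options, assuming a helper called parse_layer (a hypothetical name, not the function used in the actual config), the difference comes down to the default value returned when the tag is missing or not an integer:

```lua
-- Parse the OSM 'layer' tag into an integer.
-- 'default' is returned when the tag is missing or is not a plain
-- integer string. Passing no default yields nil (NULL in the database).
local function parse_layer(value, default)
    if value and string.find(value, '^-?%d+$') then
        return tonumber(value)
    end
    return default
end

print(parse_layer('-2'))     -- -2
print(parse_layer('2.5'))    -- nil
print(parse_layer(nil, 0))   -- 0
```

Calling parse_layer(object.tags.layer) keeps the current NULL behaviour, while parse_layer(object.tags.layer, 0) would store 0 and make COALESCE(layer,0) unnecessary in the queries.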

We'd probably want to keep the SQL code as it is for now, so users are not forced to re-import.

Themepark support

Themepark is a framework for writing osm2pgsql Lua configs. It allows mixing several configurations so that one database can support several different table layouts and use cases at the same time.

The OSM Carto configuration is written in a way that it can be used with or without the Themepark framework. Using it without the framework is just as easy as with the pgsql output before: you just specify the Lua config file on the command line as described above.

If you want to use it with the framework the setup is slightly more involved, but you have the advantage that you can then have tables of different layout in the same database.

Performance

From my measurements performance is about 20% to 25% better than before. I have measured this by importing various planet extracts without the --slim option and without creating all the extra indexes. Because index creation takes a lot of time, numbers will not be as good with --slim and the indexes.

Open Question: Derived styles

Some styles are derived from OSM Carto, such as OSM Carto Germany. How are these affected? What can we do to make life easier for these kinds of styles?

@giggls @hholzgra

History

The changes proposed here are based on the efforts started by @pnorman in #4112 (see also the PR #4431). Those efforts have stalled since. One reason, I believe, was that those efforts switched not only from the "pgsql" to the "flex" output, but contained also other changes. That's why this change goes to quite some lengths to keep everything as compatible as possible.

Thank you, Paul, for starting this effort so many years ago. I used your code as a starting point, but there are a lot of changes due to my more limited goal, changes in osm2pgsql since then, and some performance improvements.

joto added a commit to osm2pgsql-dev/osm2pgsql-themepark that referenced this issue Jun 4, 2024
Uses the new flex configuration file currently discussed in
gravitystorm/openstreetmap-carto#4977
@imagico
Collaborator

imagico commented Jun 4, 2024

Since it is getting a bit confusing probably with the multiple issues/PRs - we have for using the flex backend:

I am fine with taking the more conservative approach as outlined here - although many of the further reaching changes envisioned in #4112 (in particular the route relations) are highly desirable and there is agreement about them (and these changes are, frankly, the main point for us to move to the flex backend). The primary point of disagreement that remained in #4431 was the introduction of new transport tables.

The main consideration from my side on the whole matter is that we want to use a stable framework for our database import and avoid depending on features that are likely to change in the next versions of osm2pgsql in a way that requires changes.

Regarding indices: I don't have a firm position on this. But we should keep in mind that - while so far the indices are the only custom code we run on the database outside of osm2pgsql this is not necessarily going to stay so. In #4952 we are discussing introducing custom functions and we have in that context also considered using generated columns. OSM-Carto derivatives use other database structures (functions, views, additional tables). And it is not realistic that all of this can be handled by osm2pgsql.

Strategically I would like to see us staying close to the OSM data model in our database layout. This makes it much more straightforward for map designers to do styling work under the goals we have and for derivative styles to use an identical/compatible database layout. That would mean decidedly not going the route @SomeoneElseOSM did with moving a lot of style specific logic like tag interpretation into Lua code.

@pnorman
Collaborator

pnorman commented Jun 6, 2024

I was unable to develop consensus on the change to flex as there was opposition to using any parts of a more modern layout than the historical planet_osm_points/line string/polygon.

Reading the above, I don't see that any of that has changed.

@joto
Author

joto commented Jun 6, 2024

@pnorman That's why I restarted this effort in the way I did. The move to flex is needed independently of any change in layout. What I am proposing here basically changes nothing for OSM Carto developers, it just allows Carto to keep up with the changing osm2pgsql. We have to remove the old code in osm2pgsql if we want to keep improving it, so all projects still using the legacy pgsql output have to move to flex.

@imagico
Collaborator

imagico commented Jun 6, 2024

As said the only clearly unresolved issue with #4431 was the introduction of new 'transport' tables. We have had a good discussion on this back then IMO but no final conclusion.

Anyway - @joto here suggests to separate the formal move to the flex backend from the database layout changes. While this is in a way a step back from #4431 it would have two advantages:

  • it would separate the technical change in how to invoke osm2pgsql from the database layout changes - which touch, as the discussion in Use flex backend #4431 showed, strategic questions on which achieving consensus can be difficult.
  • it would provide derivative styles with a basis to move to use the flex backend without the need to adjust to a changed database layout. Since there are many OSM-Carto derivatives which are not very actively maintained providing this ability can be of value for many users of legacy styles.

Personally i would have preferred if we could have concluded the discussion in #4431 with a clear consensus on strategy and a new database layout reflecting that. But things are as they are. And going with the approach suggested by @joto does not prevent us from approaching this again afterwards.
