
Problem with query to get changed parent objects #2056

Closed
joto opened this issue Aug 25, 2023 · 7 comments

Comments

@joto
Collaborator

joto commented Aug 25, 2023

Version 1.9.0 changed the way we find parents of changed objects, i.e. ways that contain a changed node and relations that contain a changed node or way. We used to query the database for each changed object, which was rather wasteful. The new code does this check once for all changed nodes and once for all changed ways.

I had tested this with various changesets (including huge ones you would never have in real life) and never saw any problems. The performance was always better, or at least not worse than before. For large changesets it could be an order of magnitude better!

Now we are seeing problems with this approach on one of the OSMF Nominatim servers. Strangely enough it only happens on that one server; the others don't have this problem. We could narrow it down to the server choosing the wrong query plan, involving a full table scan on the ways table. But it seems this only happens sometimes and depends on various PostgreSQL settings as well as on the number of node ids we want to find the parents for. So the whole thing seems rather brittle, and even if we get it fixed today for this server, we don't know what will happen tomorrow.

How do we fix this?

This is the function we are talking about here.

Ideas:

  • The DISTINCT clause seems to influence the planner sometimes. We could drop it here and do the deduplication later. Code here: https://github.com/openstreetmap/osm2pgsql/compare/master...joto:osm2pgsql:find-parent-deps?expand=1
  • Instead of asking for all nodes at once, we can limit the number of node ids per query and ask for them in a loop. But it is unclear how small to make that number; going all the way back to one node per query gives us the old behaviour...
  • Instead of running that one query, we could call a stored procedure that does the query in a loop over all node ids (see the sketch after this list). To the database each query would look like it did in older osm2pgsql versions, one node id at a time, so there is less chance for the query planner to mess up. But we would keep the advantage of fewer osm2pgsql<->database roundtrips.
  • We can play around with database settings. Disabling materialization seems to influence this query, but that risks making a brittle query even more brittle.
  • Do we need more configurable knobs here, like how many node ids to work on per round? That would give users at least a chance to fix something if the query goes bad in their environment. On the other hand, it makes usage much harder; we would basically shift the burden onto the user.
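
A rough sketch of the stored-procedure idea (the function name is hypothetical; planet_osm_ways, planet_osm_index_bucket() and the id >> 5 bucketing are taken from the query shown in the EXPLAIN output below). Each node id gets its own trivial single-id query, which is the shape the planner has always handled well, while osm2pgsql still makes only one roundtrip:

CREATE OR REPLACE FUNCTION find_changed_ways(node_ids bigint[])
RETURNS SETOF bigint AS $$
DECLARE
    nid bigint;
BEGIN
    FOREACH nid IN ARRAY node_ids LOOP
        -- One single-id query per node: same query shape as in older
        -- osm2pgsql versions, so the planner reliably picks the index.
        RETURN QUERY
            SELECT w.id
            FROM planet_osm_ways w
            WHERE w.nodes && ARRAY[nid]
              AND planet_osm_index_bucket(w.nodes) && ARRAY[nid >> 5];
    END LOOP;
    -- Note: a way containing several changed nodes is returned once per
    -- node; the caller deduplicates (DISTINCT/ORDER BY) when reading.
END;
$$ LANGUAGE plpgsql STABLE;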

@pnorman any ideas?

@pnorman
Collaborator

pnorman commented Aug 25, 2023

Do you have an EXPLAIN ANALYZE of the good and bad plan?

@lonvia
Collaborator

lonvia commented Aug 25, 2023

An EXPLAIN ANALYSE of the bad plan is naturally not possible short of waiting for a couple of hours.

Bad plan:

nominatim=# EXPLAIN                
WITH changed_buckets AS (
      SELECT array_agg(id) AS node_ids, id >> 5 AS bucket
      FROM (SELECT osm_id as id FROM place LIMIT 1000) as x GROUP BY id >> 5)
      SELECT DISTINCT w.id
      FROM "public"."planet_osm_ways" w, changed_buckets b
      WHERE w.nodes && b.node_ids
      AND planet_osm_index_bucket(w.nodes) && (ARRAY[b.bucket])::bigint[];
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=46.34..55985610375.59 rows=20752577 width=8)
   ->  Nested Loop  (cost=46.34..55985558494.15 rows=20752577 width=8)
         Join Filter: ((w.nodes && (array_agg(x.id))) AND (planet_osm_index_bucket(w.nodes) && ARRAY[((x.id >> 5))]))
         ->  Index Scan using planet_osm_ways_pkey on planet_osm_ways w  (cost=0.57..472413790.47 rows=965236160 width=108)
         ->  Materialize  (cost=45.77..52.22 rows=215 width=40)
               ->  HashAggregate  (cost=45.77..48.99 rows=215 width=40)
                     Group Key: (x.id >> 5)
                     ->  Subquery Scan on x  (cost=0.57..40.77 rows=1000 width=16)
                           ->  Limit  (cost=0.57..28.27 rows=1000 width=8)
                                 ->  Index Only Scan using place_id_idx on place  (cost=0.57..9085193.42 rows=328039168 width=8)
(10 rows)

Good plan:

nominatim=# EXPLAIN                                            
WITH changed_buckets AS (
      SELECT array_agg(id) AS node_ids, id >> 5 AS bucket
      FROM (SELECT osm_id as id FROM place LIMIT 1000) as x GROUP BY id >> 5)
      SELECT DISTINCT w.id
      FROM "public"."planet_osm_ways" w, changed_buckets b
      WHERE w.nodes && b.node_ids
      AND planet_osm_index_bucket(w.nodes) && (ARRAY[b.bucket])::bigint[];
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=59521649340.32..59521753103.21 rows=20752577 width=8)
   ->  Sort  (cost=59521649340.32..59521701221.77 rows=20752577 width=8)
         Sort Key: w.id
         ->  Nested Loop  (cost=72489.62..59518154417.96 rows=20752577 width=8)
               ->  HashAggregate  (cost=45.77..48.99 rows=215 width=40)
                     Group Key: (x.id >> 5)
                     ->  Subquery Scan on x  (cost=0.57..40.77 rows=1000 width=16)
                           ->  Limit  (cost=0.57..28.27 rows=1000 width=8)
                                 ->  Index Only Scan using place_id_idx on place  (cost=0.57..9085193.42 rows=328039168 width=8)
               ->  Bitmap Heap Scan on planet_osm_ways w  (cost=72443.86..276827659.72 rows=96524 width=108)
                     Recheck Cond: (planet_osm_index_bucket(nodes) && ARRAY[((x.id >> 5))])
                     Filter: (nodes && (array_agg(x.id)))
                     ->  Bitmap Index Scan on planet_osm_ways_nodes_bucket_idx  (cost=0.00..72419.73 rows=9652362 width=0)
                           Index Cond: (planet_osm_index_bucket(nodes) && ARRAY[((x.id >> 5))])
(14 rows)

@pnorman
Collaborator

pnorman commented Aug 25, 2023

Bad plan: https://explain.depesz.com/s/S6gG
Good plan: https://explain.depesz.com/s/jm3H

On the server getting the bad plan, can you force it to the good plan by overriding enable_* variables with SET?

Can you get an EXPLAIN ANALYZE for just the good plan?

The bad plan is materializing the CTE. The documentation for PostgreSQL 14 and 15 states:

However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the WITH query just once, but not if it references the WITH query more than once.

I guess it can still materialize a CTE just like it can materialize a query plan node if it's required for the plan.

Both plans are estimating 20 million rows. This sounds very off, as we're doing batches of 1000, and indicates the problem is likely statistics-related.

The HashAggregate node estimate looks reasonable, as it is <1000. Reality is probably higher than 215 if places are randomly selected.

The estimate of 9.6 million rows for the good plan's bitmap index scan seems high.

I would expect PostgreSQL is already tracking the univariate statistics for planet_osm_index_bucket(nodes). In case it's not, can you try CREATE STATISTICS planet_osm_ways_bucket_stats ON (planet_osm_index_bucket(nodes)) FROM planet_osm_ways and see if that changes the behavior?
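
One PostgreSQL detail worth noting here: CREATE STATISTICS only registers the statistics object; the numbers are filled in by the next ANALYZE of the table, which is also where the time goes. So the full sequence would be something like:

CREATE STATISTICS planet_osm_ways_bucket_stats
    ON (planet_osm_index_bucket(nodes)) FROM planet_osm_ways;
-- Only this step actually samples the table and computes the numbers:
ANALYZE planet_osm_ways;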

@lonvia
Collaborator

lonvia commented Aug 26, 2023

Sorry if that wasn't clear, but the underlying issue is indeed the bad statistics for GIN indexes on array columns. That's a well-known and long-standing issue. The last time it bit us was when JIT and parallel indexing were introduced (#1045).

The problem is that our tables have now reached a size where Postgres resorts to sequential scans instead of using the index. This is bad because forcing Postgres to use an index is much more difficult than preventing it from using an index, JIT, or parallel processing. The ideas above are about how best to do that without breaking things in a slightly different setup.

I tried CREATE STATISTICS but it made no difference. Maybe I did something wrong, because creating it only took a few seconds; I would have expected it to take longer.

enable_material = off helps and is the workaround I'm currently using to get updates through on that machine. It looks like it slows down other queries of the update process, though. Updates are still very slow, but at least not at a standstill. So it's not a permanent solution.
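
As long as the workaround is needed, it could at least be scoped to a single transaction with SET LOCAL, so the other update queries keep their normal plans. A minimal sketch:

BEGIN;
SET LOCAL enable_material = off;  -- reverts automatically at COMMIT/ROLLBACK
-- run the parent-way lookup query here
COMMIT;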

@rustprooflabs
Contributor

Can the query be updated to WITH changed_buckets AS NOT MATERIALIZED (...)? That might help encourage the good plan without fiddling with enable_material, which other queries might benefit from.
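
Applied to the query from the EXPLAIN above, that would read:

WITH changed_buckets AS NOT MATERIALIZED (
      SELECT array_agg(id) AS node_ids, id >> 5 AS bucket
      FROM (SELECT osm_id AS id FROM place LIMIT 1000) AS x GROUP BY id >> 5)
      SELECT DISTINCT w.id
      FROM "public"."planet_osm_ways" w, changed_buckets b
      WHERE w.nodes && b.node_ids
      AND planet_osm_index_bucket(w.nodes) && (ARRAY[b.bucket])::bigint[];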

@lonvia
Collaborator

lonvia commented Aug 26, 2023

I tried with NOT MATERIALIZED and it didn't change what the query planner was doing. So it looks like, at least in this negative form, the hint is advisory only.

joto added a commit to joto/osm2pgsql that referenced this issue Aug 28, 2023
The query we had before would sometimes result in really bad performance
when the query planner in PostgreSQL decides that it needs to use
sequential scans on the ways table. This is due to the statistics being
way off. By running the query for each node id on its own we hopefully
trick the query planner into always using an index lookup.

See osm2pgsql-dev#2056
joto added a commit to joto/osm2pgsql that referenced this issue Aug 29, 2023
The query we had before would sometimes result in really bad performance
when the query planner in PostgreSQL decides that it needs to use
sequential scans on the ways table. This is due to the statistics being
way off. By running the query for each node id on its own we hopefully
trick the query planner into always using an index lookup.

See osm2pgsql-dev#2056
@joto
Collaborator Author

joto commented Sep 4, 2023

Workaround is merged. Closing here.

@joto joto closed this as completed Sep 4, 2023
melancholiai added a commit to MapColonies/osm2pgsql that referenced this issue Jan 28, 2024
…ions (#1)

* Generalizer: Don't run more threads than there are tiles

* Store -O/--output setting in properties and use for append

* Add missing header file

* Refactor db_target_descr_t: Always need schema on construction

We now always construct this fully with schema, name, and id column.
This way we make sure all data is there and the schema is not defaulted.

* Make db_target_descr_t into a real class

* Store -S/--style setting in properties and use for append

* Add new command line parameter --schema that sets the default schema

This will be used as default for --middle-schema, --output-pgsql-schema,
and for the different ways of setting a schema in the flex output and
generalizer code.

This removes the last places where the schema was hardcoded to "public"
(except as a default for this command line parameter and in some legacy
gazetteer code).

* Fix: Remove assert for the id column

This is actually not required for COPY operation, only for DELETEs. The
flex output can create tables without id.

* add --schema option to osm2pgsql-replication

* update manpage of osm2pgsql-replication

* Update documentation of schema options in man pages

* prepare release 1.9.0

* use r"" strings for regular expressions

* bdd: make them work with psycopg3

* avoid get_dsn_parameters which no longer exists in psycopg3

Fixes osm2pgsql-dev#2040.

* Fix architecture-dependent double to integer conversion

Fixes osm2pgsql-dev#2041

* CI: run some builds with psycopg3

* Some small code cleanups

* Make m_task_result private

* Don't use unchecked array subscript

Makes clang-tidy happy if we use at() and it doesn't cost us much here.

* Use std::array instead of C array

* Pass points as values, other geoms as reference to add_geometry()

Points are small and not movable. Removes a complaint from clang-tidy
about pessimizing std::move.

* prepare release 1.9.1

* Get output_requirements struct its own include file

* Suppress clang-tidy error on main functions

The main functions now catch all exceptions, but there is still the
possibility that the logging of the error message throws again. Not much
we can do about this.

* Avoid narrowing conversion

* Refactor tests triggering a moved-from-warning

Clang-tidy doesn't understand that only one SECTION is run per test so
it thinks data is used after move.

This removes some of the SECTIONs which don't really do things relevant
to the tests here. And one test is split into two so we don't need the
SECTION any more.

* Github action: Test with PostgreSQL 15 instead of 14

* Use colon syntax consistently when calling Lua functions on objects

* Revert "Refactor code that gets a mutex and calls into Lua code"

This reverts commit 4dca255.

The select_relation_members() reads data from the Lua stack after
calling call_lua_function(). It is important that this is still
protected by the mutex.

* Add comment in the hope that we don't make this error again

* Disable the warning that cache is disabled in append mode

In append mode we don't necessarily have the flat node setting from the
command line, because it is read from the properties file. So this would
give us an unnecessary and wrong warning.

This is just a band aid though, a better fix would be to check after we
have read the properties file. But that needs some larger changes I'll
leave for when we refactor the command line parsing code.

* Generalization: Allow list of SQL commands in run_sql() command

The run_sql() command now can have either a single SQL statement in the
`sql` field or a list of SQL commands.

The new `transaction` field can be set to `true` to wrap the SQL
commands in BEGIN/COMMIT.

* Refactor: Build a list of (templated) queries, then run them

* Move DISTINCT from queries generating list to when we read the data

We had some problems with the query planner getting confused by those
queries. Let's make them as simple as possible so we don't confuse it
unnecessarily. This might be voodoo, but it doesn't cost much, just a
little bit of temporary space in the database: when reading we do the
ORDER BY anyway, so the extra DISTINCT adds no cost.
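
Schematically, with a hypothetical changed_ways list table and node id:

-- before: deduplicate while generating the list
INSERT INTO changed_ways (id)
    SELECT DISTINCT w.id FROM planet_osm_ways w
    WHERE w.nodes && ARRAY[42::bigint];
-- after: the generating query stays as simple as possible ...
INSERT INTO changed_ways (id)
    SELECT w.id FROM planet_osm_ways w
    WHERE w.nodes && ARRAY[42::bigint];
-- ... and deduplication moves to the read side, where we sort anyway
SELECT DISTINCT id FROM changed_ways ORDER BY id;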

* Put query for finding parent ways from nodes inside a PL/pgSQL function

The query we had before would sometimes result in really bad performance
when the query planner in PostgreSQL decides that it needs to use
sequential scans on the ways table. This is due to the statistics being
way off. By running the query for each node id on its own we hopefully
trick the query planner into always using an index lookup.

See osm2pgsql-dev#2056

* Generalizer: Show some more information in log level info

Now we can see SQL queries and generalizers run with some timing info
already in log level info.

* Clean tests: Remove 'name' field from expire which is never used

This is a left-over from earlier use of the 'name' field.

* Actually add geometries in the test

Instead of just NULL geometries to make them a bit more realistic.

* Faster hex decoder

* Better implementation for low-level bit-fiddling in tracer code

* Generalization: Allow setting condition for running SQL commands

If a condition is set with "if_has_rows", the SQL command given in
"if_has_rows" is run first, and the main SQL command is only run if
that condition query returns any rows. This check happens only in
"append" mode; in "create" mode the command is always run.

To be used with something like "SELECT 1 FROM some_table LIMIT 1" to
check whether the table has any rows (for instance an expire table).

* prepare release 1.9.2

* Check in CMake config that we are building for 64bit architecture

For a long time we haven't officially supported 32 bit architectures,
but we never really disabled them either. This has been giving us more
problems lately, and we can't test on 32 bit systems anyway, so this
now gives us an up-front check.

* Add runtime warnings that area type and add_row() are deprecated

Only one message about add_row() is shown.

* Rename "docs" directory to "man"

And updates various docs accordingly.

For a while we haven't kept any docs in there other than the man pages.
The new name better reflects what this directory is all about.

* Github actions: Update checkout to v4 in workflow files

* Make sure we are always testing with an empty grid in --append tests

* Test that no maxzoom setting works and you'll get expire on zoom 0

* Generalization: Wrap upload of data in transaction

So we don't use so many transaction ids.

* Generalizer: Log tile being processed also in multithreaded mode

Not only in singlethreaded mode.

* Generalizer: Simplify tracer code, use local instead of member vars

* Generalizer bugfix: Set delete_existing earlier

In append mode we need to delete existing tiles. We need to set this
parameter earlier, so the generalizer is initialized correctly.

* Check values found in expire tables (must be number in valid range)

* Add first/last timestamps to expire tables

This adds two timestamp (with timezone) columns to all expire tables
called "first" and "last". When an entry is added, both are set to the
current (transaction) timestamp. When an entry already exists a new
insert will result in the "last" timestamp being updated.

Having these two timestamps allows various expire/updating strategies,
for instance:
* update oldest entry
* update entry that didn't change for the longest time
* update older entries but only if there are no recent changes, which
  indicates that there might be more changes coming

For backwards compatibility the code detects which table format is used
and falls back to the old behaviour if the timestamp columns aren't
there.

* Github actions: Test with newer PostgreSQL versions (up to 16)

* Avoid copy of vector in get_tiles_from_table() function

Use out parameter with ptr instead.

* Generalization: Switch from using CImg to OpenCV library.

Processing is an order of magnitude faster.

* Require libopencv-dev on Debian systems

This package contains the CMake files needed to build osm2pgsql.

* Refactor: Modernize copy functions using std::string_view

* Refactor: Modernize code, use nested namespace definition

* Refactor: Modernize get_count and require_has_table

Use string_view parameters

* Github action: Use clang 15 for C++20 test

* Use tags = NULL in middle tables if object doesn't have any tags

This doesn't make much of a difference for the ways and rels table, but
if we store all nodes in the database, it does make a huge difference,
because most nodes don't have any tags. For a current planet, disk usage
for the nodes table goes from 476 GB down to 409 GB saving 67 GB or
nearly 15%.

Additionally it makes use of that table simpler. If you want to do any
queries on tags, you need an index on the tags column on the
nodes/ways/rels tables like this:

CREATE INDEX ON planet_osm_ways USING gin (tags);

But that is wasteful, because of the empty tags. We probably want to
generate them as

CREATE INDEX ON planet_osm_ways USING gin (tags) WHERE tags != '{}'::jsonb;

But now all queries on those tables have to include that extra condition
so that the query planner will use the index.

SELECT * FROM planet_osm_ways WHERE tags ? 'highway' AND tags != '{}'::jsonb;

If we use NULLs, the index can be created as:

CREATE INDEX ON planet_osm_ways USING gin (tags) WHERE tags IS NOT NULL;

And now the query becomes simpler, because the NOT NULL is automatically
taken into account by the query planner:

SELECT * FROM planet_osm_ways WHERE tags ? 'highway';

Note that this is an incompatible change to the new format middle
tables, but they are still marked as experimental, so we can do this.

* Allow NULL values in new middle format members list

This makes osm2pgsql a bit more future proof by allowing the list of
members (which is encoded as JSON in the new middle format) to be empty,
i.e. to contain NULL. We currently don't write empty member lists as
NULL but as an empty JSON list, but if we change this in the future,
older versions of osm2pgsql will be able to read this correctly.

* Update included fmt lib to 10.1.0

* Update included libosmium to version 2.20.0

* Increase the max number of jobs for osm2pgsql-gen to 256

We have a limit of 256 set in line 688 already.

Fixes osm2pgsql-dev#2086

* Github action: Remove OpenCV lib from windows build

It is only needed for building osm2pgsql-gen, but that also needs
potrace, which is not installed because it isn't available as a vcpkg
package.

* Github action: Disable build on macOS 11

Because it doesn't work any more. Github actions output shows this
message:

Error: You are using macOS 11.
We (and Apple) do not provide support for this old version.
It is expected behaviour that some formulae will fail to build in this old version.
It is expected behaviour that Homebrew will be buggy and slow.
Do not create any issues about this on Homebrew's GitHub repositories.
Do not create any issues even if you think this message is unrelated.
Any opened issues will be immediately closed without response.
Do not ask for help from Homebrew or its maintainers on social media.
You may ask for help in Homebrew's discussions but are unlikely to receive a response.
Try to figure out the problem yourself and submit a fix as a pull request.
We will review it but may or may not accept it.

* Add proj.db to Windows artifact

See osm2pgsql-dev#1949

* Make CMake quiet if opencv is not found

* prepare release 1.10.0

* fix: generate expire-tiles for non mercator projections

* ci: temp

* fix: generate expire-tiles output

---------

Co-authored-by: Jochen Topf <jochen@topf.org>
Co-authored-by: Sarah Hoffmann <lonvia@denofr.de>
Co-authored-by: Paul Norman <penorman@mac.com>