planet_osm_ways coming out bloated #111

pnorman · 2013-12-17T11:34:30Z

pgstattuple is a PostgreSQL extension that allows you to get various tuple-level statistics for a table.

This allows me to see, for example, that the planet_osm_ways table comes out of the import substantially larger than ideal.

gis=# SELECT pg_size_pretty(table_len) AS table_len, tuple_count, pg_size_pretty(tuple_len) AS tuple_len, tuple_percent, dead_tuple_count, pg_size_pretty(dead_tuple_len) AS dead_tuple_len, dead_tuple_percent, pg_size_pretty(free_space) AS free_space, free_percent FROM pgstattuple('planet_osm_polygon');

-[ RECORD 1 ]------+----------
table_len          | 66 GB
tuple_count        | 196145030
tuple_len          | 39 GB
tuple_percent      | 59.77
dead_tuple_count   | 217468
dead_tuple_len     | 55 MB
dead_tuple_percent | 0.08
free_space         | 24 GB
free_percent       | 36.94

What this means is that the table is currently 66GB on disk, but after a VACUUM FULL or CLUSTER would be 39GB

The text was updated successfully, but these errors were encountered:

pnorman · 2013-12-17T11:49:10Z

I'm wondering if deferring the btree (id) WHERE pending index as proposed in #105 would allow the use of HOT tuples

jeffjanes · 2013-12-19T00:10:31Z

Deferring the index doesn't help, because there still needs to space for both old a new version of the tuples. Space from the old tuples can be reused sooner under HOT, but it still can't be reused within the same transaction--and everything here happens in the same transaction.

pnorman · 2013-12-19T02:51:25Z

I'm wondering, do we need to store pending in the database at all? Isn't pending=true only found when importing or within an update transaction?

If so, could we cache pending status in memory? There's only ~250 million ways, which at 1 bit each would take about 30 megs of RAM to cache. We'd save a disk write because we'd no longer write with pending=true then update to pending=false. We'd lose the smaller partial index, but we're not currently getting much benefit from that thanks to #105, and even with that fixed we could end up ahead on IO anyways.

jeffjanes · 2013-12-21T09:20:25Z

Saving them in RAM sounds like a good idea. They are all being stored in RAM eventually anyway, because the first thing it does when iterating over the pending ways is read the entire set of pending ids from the database into memory upfront, and I doubt it is using the most efficient representation to do so.

apmon · 2013-12-21T11:06:33Z

One thing you might loose is robustness to crashes. If osm2pgsql crashes or gets terminated in the going over pending ways stage, you loose the ability to resume osm2pgsql and fix-up what the previous osm2pgsql instance hasn't completed. On the otherhand, given one likely has to re-apply the entire diff anyway if osm2pgsql crashes, perhaps that isn't as important.

pnorman · 2013-12-21T22:00:58Z

One thing you might loose is robustness to crashes. If osm2pgsql crashes or gets terminated in the going over pending ways stage, you loose the ability to resume osm2pgsql and fix-up what the previous osm2pgsql instance hasn't completed.

Isn't the work done by the previous instance in transaction anyway?

Also increments the version to 0.87.0 Closes osm2pgsql-dev#187 Fixes osm2pgsql-dev#156 Fixes osm2pgsql-dev#186 Fixes osm2pgsql-dev#105 Fixes osm2pgsql-dev#111

pnorman mentioned this issue Sep 29, 2014

Conversion to C++ #187

Closed

pnorman added a commit to pnorman/osm2pgsql that referenced this issue Oct 25, 2014

Merge C++ branch

739d422

Also increments the version to 0.87.0 Closes osm2pgsql-dev#187 Fixes osm2pgsql-dev#156 Fixes osm2pgsql-dev#186 Fixes osm2pgsql-dev#105 Fixes osm2pgsql-dev#111

pnorman closed this as completed Oct 25, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

planet_osm_ways coming out bloated #111

planet_osm_ways coming out bloated #111

pnorman commented Dec 17, 2013

pnorman commented Dec 17, 2013

jeffjanes commented Dec 19, 2013

pnorman commented Dec 19, 2013

jeffjanes commented Dec 21, 2013

apmon commented Dec 21, 2013

pnorman commented Dec 21, 2013

planet_osm_ways coming out bloated #111

planet_osm_ways coming out bloated #111

Comments

pnorman commented Dec 17, 2013

pnorman commented Dec 17, 2013

jeffjanes commented Dec 19, 2013

pnorman commented Dec 19, 2013

jeffjanes commented Dec 21, 2013

apmon commented Dec 21, 2013

pnorman commented Dec 21, 2013