Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planet_osm_ways coming out bloated #111

Closed
pnorman opened this issue Dec 17, 2013 · 6 comments
Closed

planet_osm_ways coming out bloated #111

pnorman opened this issue Dec 17, 2013 · 6 comments

Comments

@pnorman
Copy link
Collaborator

pnorman commented Dec 17, 2013

pgstattuple is a PostgreSQL extension that allows you to get various tuple-level statistics for a table.

This allows me to see, for example, that the planet_osm_ways table comes out of the import substantially larger than ideal.

gis=# SELECT pg_size_pretty(table_len) AS table_len, tuple_count, pg_size_pretty(tuple_len) AS tuple_len, tuple_percent, dead_tuple_count, pg_size_pretty(dead_tuple_len) AS dead_tuple_len, dead_tuple_percent, pg_size_pretty(free_space) AS free_space, free_percent FROM pgstattuple('planet_osm_polygon');

-[ RECORD 1 ]------+----------
table_len          | 66 GB
tuple_count        | 196145030
tuple_len          | 39 GB
tuple_percent      | 59.77
dead_tuple_count   | 217468
dead_tuple_len     | 55 MB
dead_tuple_percent | 0.08
free_space         | 24 GB
free_percent       | 36.94

What this means is that the table is currently 66GB on disk, but after a VACUUM FULL or CLUSTER would be 39GB

@pnorman
Copy link
Collaborator Author

pnorman commented Dec 17, 2013

I'm wondering if deferring the btree (id) WHERE pending index as proposed in #105 would allow the use of HOT tuples

@jeffjanes
Copy link
Contributor

Deferring the index doesn't help, because there still needs to space for both old a new version of the tuples. Space from the old tuples can be reused sooner under HOT, but it still can't be reused within the same transaction--and everything here happens in the same transaction.

@pnorman
Copy link
Collaborator Author

pnorman commented Dec 19, 2013

I'm wondering, do we need to store pending in the database at all? Isn't pending=true only found when importing or within an update transaction?

If so, could we cache pending status in memory? There's only ~250 million ways, which at 1 bit each would take about 30 megs of RAM to cache. We'd save a disk write because we'd no longer write with pending=true then update to pending=false. We'd lose the smaller partial index, but we're not currently getting much benefit from that thanks to #105, and even with that fixed we could end up ahead on IO anyways.

@jeffjanes
Copy link
Contributor

Saving them in RAM sounds like a good idea. They are all being stored in RAM eventually anyway, because the first thing it does when iterating over the pending ways is read the entire set of pending ids from the database into memory upfront, and I doubt it is using the most efficient representation to do so.

@apmon
Copy link
Contributor

apmon commented Dec 21, 2013

One thing you might loose is robustness to crashes. If osm2pgsql crashes or gets terminated in the going over pending ways stage, you loose the ability to resume osm2pgsql and fix-up what the previous osm2pgsql instance hasn't completed. On the otherhand, given one likely has to re-apply the entire diff anyway if osm2pgsql crashes, perhaps that isn't as important.

@pnorman
Copy link
Collaborator Author

pnorman commented Dec 21, 2013

One thing you might loose is robustness to crashes. If osm2pgsql crashes or gets terminated in the going over pending ways stage, you loose the ability to resume osm2pgsql and fix-up what the previous osm2pgsql instance hasn't completed.

Isn't the work done by the previous instance in transaction anyway?

pnorman added a commit to pnorman/osm2pgsql that referenced this issue Oct 25, 2014
Also increments the version to 0.87.0

Closes osm2pgsql-dev#187

Fixes osm2pgsql-dev#156
Fixes osm2pgsql-dev#186
Fixes osm2pgsql-dev#105
Fixes osm2pgsql-dev#111
@pnorman pnorman closed this as completed Oct 25, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants