planet_osm_ways coming out bloated #111

pgstattuple is a PostgreSQL extension that allows you to get various tuple-level statistics for a table. This allows me to see, for example, that the planet_osm_ways table comes out of the import substantially larger than ideal: the table is currently 66 GB on disk, but after a VACUUM FULL or CLUSTER it would be 39 GB.
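For reference, a minimal way to reproduce the measurement (assuming the default table name planet_osm_ways):

```sql
-- pgstattuple ships with PostgreSQL's contrib modules.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- free_space plus dead_tuple_len is roughly the space that a
-- VACUUM FULL or CLUSTER would give back.
SELECT table_len, tuple_len, dead_tuple_len, free_space, free_percent
FROM pgstattuple('planet_osm_ways');
```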
Comments
I'm wondering if deferring the index creation would help here.
Deferring the index doesn't help, because there still needs to be space for both the old and the new versions of the tuples. Space from the old tuples can be reused sooner under HOT, but it still can't be reused within the same transaction, and everything here happens in the same transaction.
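A minimal sketch of that effect (the table pending_demo is hypothetical, standing in for planet_osm_ways): updating every row of a freshly loaded table roughly doubles its on-disk size, and the old tuple versions only become reusable after the transaction commits and the table is vacuumed.

```sql
CREATE TABLE pending_demo (id int PRIMARY KEY, pending boolean);
INSERT INTO pending_demo SELECT g, true FROM generate_series(1, 100000) g;
SELECT pg_relation_size('pending_demo');  -- baseline size after the initial load

BEGIN;
UPDATE pending_demo SET pending = false;  -- writes a second version of every tuple
SELECT pg_relation_size('pending_demo');  -- roughly 2x: the old versions still occupy space
COMMIT;

VACUUM pending_demo;                      -- only now can the old versions be marked reusable
SELECT free_percent FROM pgstattuple('pending_demo');  -- free_percent should now be roughly 50
```

Note that plain VACUUM only marks the space reusable within the table; actually shrinking the file on disk generally takes the VACUUM FULL or CLUSTER mentioned above.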
I'm wondering: do we need to store pending status in the database at all? Could we cache it in memory instead? There are only ~250 million ways, which at 1 bit each would take about 30 MB of RAM to cache. We'd also save a disk write, because we'd no longer write the pending flag to the database at all.
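The arithmetic behind the ~30 MB figure:

$$\frac{250{,}000{,}000 \text{ bits}}{8 \text{ bits/byte}} = 31{,}250{,}000 \text{ bytes} \approx 30 \text{ MB}$$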
Saving them in RAM sounds like a good idea. They all end up being stored in RAM eventually anyway, because the first thing it does when iterating over the pending ways is read the entire set of pending IDs from the database into memory upfront, and I doubt it is using the most efficient representation to do so.
One thing you might lose is robustness to crashes. If osm2pgsql crashes or gets terminated during the going-over-pending-ways stage, you lose the ability to resume osm2pgsql and fix up what the previous instance hadn't completed. On the other hand, given that one likely has to re-apply the entire diff anyway if osm2pgsql crashes, perhaps that isn't as important.
Isn't the work done by the previous instance inside a transaction anyway, so it would be rolled back after a crash in any case?
Also increments the version to 0.87.0

Closes osm2pgsql-dev#187
Fixes osm2pgsql-dev#156
Fixes osm2pgsql-dev#186
Fixes osm2pgsql-dev#105
Fixes osm2pgsql-dev#111