Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planet_osm_ways_idx partial index comes out amazingly bloated #105

Closed
pnorman opened this issue Dec 5, 2013 · 6 comments
Closed

planet_osm_ways_idx partial index comes out amazingly bloated #105

pnorman opened this issue Dec 5, 2013 · 6 comments

Comments

@pnorman
Copy link
Collaborator

pnorman commented Dec 5, 2013

"planet_osm_rels_idx" is the btree (id) WHERE pending index.

On a fresh import this comes out as 3634 MB. A reindex brings this down to 8 kB. This is obviously absurdly bloated. We should either reindex at the end, or defer index creation until after pending ways if possible.

@smsm1
Copy link

smsm1 commented Dec 5, 2013

8kB is suspiciously small

@pnorman
Copy link
Collaborator Author

pnorman commented Dec 5, 2013

8kB is suspiciously small

There are no pending ways after an import is completed. You only get them during updates.

@apmon
Copy link
Contributor

apmon commented Dec 5, 2013

Yes, the index is needed during diff imports. At the end of each run of osm2pgsql there shouldn't be any pending ways left though. So 8kb sounds about right at then end of a osm2pgsql run.

The index is also used during the "going over pending ways" stage of the initial import, for which there is a very large number of pending ways. But given that, it might be better to only create that index after the end of pending ways and rely on a seq scan for the pending ways query during initial import.

@pnorman
Copy link
Collaborator Author

pnorman commented Dec 5, 2013

If you're going over more than 1-5% of the table, a seq scan will generally be used over an index scan.

It sounds like deferring its creation until the end is the best option, and of course not creating it if we're doing --drop

@pnorman
Copy link
Collaborator Author

pnorman commented Dec 16, 2013

Full stats, just for reference

version               2
tree_level            3
index_size            8485593088
root_block_no         116816
internal_pages        3733
leaf_pages            1032105
empty_pages           0
deleted_pages         0
avg_leaf_density      62.91
leaf_fragmentation    46.18

@sletuffe
Copy link

I'm unsure if that is related of if I should open a new issue, but that might be related to the behaviour I reported in 03/2012 here : https://lists.openstreetmap.org/pipermail/dev/2012-March/024589.html

summary :
After a fresh planet import the index on the "pending" field isn't used and lead
to a seq scan of the planet_osm_ways table, increasing noticeably the diff
import.
Running manually the query : "select id from planet_osm_way where pending" returns 0 elements but still takes a few tenth of seconds

To solve the problem post import, I did a re-index (reindex index planet_osm_ways_idx;) and the same query now run in less than 10ms

I did a planet import with osm2pgsql version from commit Sat Oct 19 20:31:04 2013 -0600 (Increment version to 0.85.0 to development version)
and can still see this behaviour.

If osm2pgsql arguments might matter, I can provide them

pnorman added a commit to pnorman/osm2pgsql that referenced this issue Oct 25, 2014
Also increments the version to 0.87.0

Closes osm2pgsql-dev#187

Fixes osm2pgsql-dev#156
Fixes osm2pgsql-dev#186
Fixes osm2pgsql-dev#105
Fixes osm2pgsql-dev#111
@pnorman pnorman closed this as completed Oct 25, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants