Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save unprojected coordinates in node caches and use osmium dense file array #668

Merged
merged 9 commits into from
Jan 9, 2017

Conversation

lonvia
Copy link
Collaborator

@lonvia lonvia commented Jan 4, 2017

This is a DB-breaking change. DBs imported with older code can no longer be updated with this code and there will be no migration to the new format.

Unprojected coordinates in node caches

Reprojection of coordinates into the tile coordinate system is delayed until the point when nodes are retrieved from the middle caches (see in particular the nodes_get_list() functions). This has a few advantages:

The main drawback is that some coordinates now may need to be reprojected multiple times. This is for example the case for nodes where roads cross each other. OTOH unused nodes are no longer converted.

Osmium Dense File Array

The custom flatnode file implementation has been replaces with osmium's dense file array index. Its implementation uses mmap, avoiding custom block management. You can also use osmium-based tools to inspect/update the file outside osm2pgsql.

Furthermore, the dense file array should be thread-safe for reading. The 4th commit in this PR removes the parts where the file is reopened and uses one flatnode store object throughout the import. (fixes #96)

There are a couple of additional formatting changes created by git-clang-format.

Resolves #542.

Moves reprojection until after node locations are retrieved
from the node caches and stores the unprojected coordinates.

This is a DB-breaking change. Reimport required.
No longer required, now that coordinates are cahced unprojected.
Copy link
Collaborator

@pnorman pnorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎆

if(found != pg_nodes.end()) {
std::unordered_map<osmid_t, osmNode>::iterator found =
pg_nodes.find(nds[i].ref());
if (found != pg_nodes.end()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this entire loop be an auto nd:nds?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the code that was just formatted by clang-format. I don't want to touch it at the moment.

}
nodesCacheLookups++;

return coord;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any reformatting has been missed by only doing the diffs, I'd just as soon reformat the entire file. Of course, you might have already thought of this ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that in a separate commit immediately before submitting this PR.

@pnorman
Copy link
Collaborator

pnorman commented Jan 4, 2017

The main drawback is that some coordinates now may need to be reprojected multiple times. This is for example the case for nodes where roads cross each other. OTOH unused nodes are no longer converted

This should be an overall savings by my estimates.

@pnorman
Copy link
Collaborator

pnorman commented Jan 5, 2017

Should we change something about the slim tables to force an error if you try to update old ones? Right now the tables have gone from storing int4 for each coordinate to storing int4 for each coordinate, so the table DDL is the same.

@lonvia
Copy link
Collaborator Author

lonvia commented Jan 5, 2017

Should we change something about the slim tables to force an error if you try to update old ones?

Yes, I would like that but I'm not sure how to do it. How about adding an additional table that holds a schema version? This would allow us to create better error reporting and also to write automatic migrations in the future.

@pnorman
Copy link
Collaborator

pnorman commented Jan 6, 2017

Yes, I would like that but I'm not sure how to do it. How about adding an additional table that holds a schema version? This would allow us to create better error reporting and also to write automatic migrations in the future.

I'd be in favour of this going forwards, but I'm not sure it's the best option for this issue. Looking at the schema, we have the column tags text[] with node tags. We don't need this information, and flat nodes doesn't have it. If we dropped that column it would break compatibility (and save space).

@lonvia
Copy link
Collaborator Author

lonvia commented Jan 6, 2017

I'd be in favour of this going forwards, but I'm not sure it's the best option for this issue. Looking at the schema, we have the column tags text[] with node tags.

It's a bit hacky but good enough for me for the moment. I can add that.

@pnorman
Copy link
Collaborator

pnorman commented Jan 7, 2017

It's a bit hacky but good enough for me for the moment. I can add that.

Well, I'd be in favour of dropping tags from nodes anyways ;)

With that the changes look good to me, and performance is about the same. We should add something to migrations that the slim tables have changed in 0.93.0-dev. Is there anything else we should do pre-merge?

Node tags are still saved in the _nodes table but are never read.
So get rid of the column completely. Also remove prepared get_node
function, which isn't used either.
Invalid location is entirely expected, for example when updating
extract data. Also adds tests for the node store and sligtly
changes clang-format style with respect to templates.
Catch not-found exceptions further out when looping over nodes.
Allows the compiler to optimize the code better.
@lonvia
Copy link
Collaborator Author

lonvia commented Jan 7, 2017

I've added commits for removing the tags column in the nodes table and reformatting node-ram-cache.

I've also had to add a try/catch around the get() function for the dense node cache as unknown ids throw an exception. There is a small penalty attached to that but not enough to block this PR. @joto will look into providing an exception free get() function on the libosmium side at some point.

If rendering results look ok, then this is good to go for me.

@pnorman pnorman merged commit a940c98 into osm2pgsql-dev:master Jan 9, 2017
@joto
Copy link
Collaborator

joto commented Jan 10, 2017

New get_noexcept() functions are now available in master for the index maps. osmcode/libosmium@d353993

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants