Save unprojected coordinates in node caches and use osmium dense file array #668

lonvia · 2017-01-04T23:02:08Z

This is a DB-breaking change. DBs imported with older code can no longer be updated with this code and there will be no migration to the new format.

Unprojected coordinates in node caches

Reprojection of coordinates into the tile coordinate system is delayed until the point when nodes are retrieved from the middle caches (see in particular the nodes_get_list() functions). This has a few advantages:

OSM coordinates can be saved directly, with no need for fixed-point magic on reprojected coordinates, which avoids a few rounding errors.
No custom scale factor is needed any more for custom projections (fixes Broken geometries with -E 4326 projection. #121).
No floating-point implementation of the caches is needed (supersedes Remove unused floating-point based node storage code #357).
Expensive floating-point conversion is avoided until the last minute.

The main drawback is that some coordinates now may need to be reprojected multiple times. This is for example the case for nodes where roads cross each other. OTOH unused nodes are no longer converted.
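
To illustrate the idea, here is a minimal sketch of a cache that stores raw OSM coordinates as integers and only reprojects when a node list is requested. All names (NodeCache, StoredLocation, reproject, get_list) are hypothetical and this is not the osm2pgsql implementation; the 1e7 scale assumes the usual seven decimal places of OSM coordinates, and spherical Mercator stands in for whatever target projection is configured.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; a sketch, not osm2pgsql's actual code.
constexpr double kPrecision = 1e7;          // assumed: 7 decimal places per coordinate
constexpr double kPi = 3.14159265358979323846;
constexpr double kEarthRadius = 6378137.0;  // metres, spherical Mercator

// What the cache keeps: unprojected lon/lat as scaled integers.
struct StoredLocation {
    int32_t x; // longitude * 1e7
    int32_t y; // latitude * 1e7
};

struct Projected {
    double x;
    double y;
};

inline int32_t to_fixed(double degrees) {
    return static_cast<int32_t>(std::llround(degrees * kPrecision));
}

inline double to_degrees(int32_t fixed) { return fixed / kPrecision; }

// Reproject one stored location into spherical Mercator.
inline Projected reproject(StoredLocation loc) {
    const double lon = to_degrees(loc.x) * kPi / 180.0;
    const double lat = to_degrees(loc.y) * kPi / 180.0;
    return {kEarthRadius * lon,
            kEarthRadius * std::log(std::tan(kPi / 4.0 + lat / 2.0))};
}

class NodeCache {
public:
    // Storing involves no projection at all.
    void set(int64_t id, double lon, double lat) {
        nodes_[id] = {to_fixed(lon), to_fixed(lat)};
    }

    // Reprojection happens here, only for nodes that are actually requested.
    // Unknown ids are skipped.
    std::vector<Projected> get_list(const std::vector<int64_t>& ids) const {
        std::vector<Projected> out;
        out.reserve(ids.size());
        for (int64_t id : ids) {
            auto it = nodes_.find(id);
            if (it != nodes_.end()) {
                out.push_back(reproject(it->second));
            }
        }
        return out;
    }

private:
    std::unordered_map<int64_t, StoredLocation> nodes_;
};
```

A node shared by two ways is reprojected once per get_list() call that includes it, which is the drawback described above; a node that is never requested is never reprojected.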

Osmium Dense File Array

The custom flatnode file implementation has been replaced with osmium's dense file array index. Its implementation uses mmap, avoiding custom block management. You can also use osmium-based tools to inspect/update the file outside osm2pgsql.

Furthermore, the dense file array should be thread-safe for reading. The 4th commit in this PR removes the parts where the file is reopened and uses one flatnode store object throughout the import. (fixes #96)
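
For readers unfamiliar with the concept, the following is a rough POSIX-only sketch of what a dense, mmap-backed file array does: the entry for node id N lives at byte offset N * sizeof(Location), so no block management is needed. The class DenseFileSketch and its fixed capacity are assumptions made for illustration; osmium's real index grows the file on demand and uses its own Location type and "undefined" marker, neither of which is reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for a stored node location.
struct Location {
    int32_t x;
    int32_t y;
};

// Sketch of a dense file array: a fixed-capacity, memory-mapped file.
class DenseFileSketch {
public:
    DenseFileSketch(const char* path, std::size_t capacity)
        : size_(capacity * sizeof(Location)) {
        fd_ = ::open(path, O_RDWR | O_CREAT, 0644);
        if (fd_ < 0) {
            throw std::runtime_error("open failed");
        }
        // Extending the file zero-fills it, so unset entries read as (0, 0).
        if (::ftruncate(fd_, static_cast<off_t>(size_)) != 0) {
            ::close(fd_);
            throw std::runtime_error("ftruncate failed");
        }
        void* p = ::mmap(nullptr, size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
        if (p == MAP_FAILED) {
            ::close(fd_);
            throw std::runtime_error("mmap failed");
        }
        data_ = static_cast<Location*>(p);
    }

    ~DenseFileSketch() {
        ::munmap(data_, size_);
        ::close(fd_);
    }

    DenseFileSketch(const DenseFileSketch&) = delete;
    DenseFileSketch& operator=(const DenseFileSketch&) = delete;

    void set(std::size_t id, Location loc) {
        check(id);
        data_[id] = loc;
    }

    // Reads only touch the mapping, so no seek position is shared between callers.
    Location get(std::size_t id) const {
        check(id);
        return data_[id];
    }

private:
    void check(std::size_t id) const {
        if (id >= size_ / sizeof(Location)) {
            throw std::out_of_range("node id beyond mapped size");
        }
    }

    int fd_ = -1;
    std::size_t size_;
    Location* data_ = nullptr;
};
```

Because get() reads straight from the mapping, one store object can be shared by several readers, which is the property the 4th commit relies on.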

There are a couple of additional formatting changes created by git-clang-format.

Resolves #542.

pnorman

🎆

pnorman · 2017-01-04T23:47:28Z

middle-pgsql.cpp

-            if(found != pg_nodes.end()) {
+            std::unordered_map<osmid_t, osmNode>::iterator found =
+                pg_nodes.find(nds[i].ref());
+            if (found != pg_nodes.end()) {


Could this entire loop use a range-based for (auto nd : nds)?

This is part of the code that was just formatted by clang-format. I don't want to touch it at the moment.
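
For reference, a sketch of what the suggested range-based loop could look like. The names osmid_t, osmNode, pg_nodes and nds come from the diff above, but their definitions here (and the NodeRef type and collect_nodes function) are simplified stand-ins invented for the example, not the real osm2pgsql types.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified stand-ins for the real types (assumptions for this sketch).
using osmid_t = int64_t;

struct osmNode {
    double lon;
    double lat;
};

struct NodeRef {
    osmid_t id;
    osmid_t ref() const { return id; }
};

// Collect the nodes of a way that are present in pg_nodes, in way order.
std::vector<osmNode> collect_nodes(
    const std::vector<NodeRef>& nds,
    const std::unordered_map<osmid_t, osmNode>& pg_nodes) {
    std::vector<osmNode> out;
    out.reserve(nds.size());
    // Range-based for replaces the index variable; auto replaces the
    // spelled-out iterator type.
    for (auto const& nd : nds) {
        auto found = pg_nodes.find(nd.ref());
        if (found != pg_nodes.end()) {
            out.push_back(found->second);
        }
    }
    return out;
}
```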

pnorman · 2017-01-04T23:50:53Z

node-ram-cache.cpp

+    }
+    nodesCacheLookups++;
+
+    return coord;
 }


If any reformatting has been missed by only doing the diffs, I'd just as soon reformat the entire file. Of course, you might have already thought of this ;)

I can do that in a separate commit immediately before submitting this PR.

pnorman · 2017-01-04T23:58:58Z

The main drawback is that some coordinates now may need to be reprojected multiple times. This is for example the case for nodes where roads cross each other. OTOH unused nodes are no longer converted

This should be an overall saving, by my estimates.

pnorman · 2017-01-05T00:05:23Z

Should we change something about the slim tables to force an error if you try to update old ones? Right now the tables have gone from storing int4 for each coordinate to storing int4 for each coordinate, so the table DDL is the same.

lonvia · 2017-01-05T20:13:20Z

Should we change something about the slim tables to force an error if you try to update old ones?

Yes, I would like that but I'm not sure how to do it. How about adding an additional table that holds a schema version? This would allow us to create better error reporting and also to write automatic migrations in the future.

pnorman · 2017-01-06T02:30:03Z

Yes, I would like that but I'm not sure how to do it. How about adding an additional table that holds a schema version? This would allow us to create better error reporting and also to write automatic migrations in the future.

I'd be in favour of this going forwards, but I'm not sure it's the best option for this issue. Looking at the schema, we have the column tags text[] with node tags. We don't need this information, and flat nodes doesn't have it. If we dropped that column it would break compatibility (and save space).

lonvia · 2017-01-06T20:47:11Z

I'd be in favour of this going forwards, but I'm not sure it's the best option for this issue. Looking at the schema, we have the column tags text[] with node tags.

It's a bit hacky but good enough for me for the moment. I can add that.

pnorman · 2017-01-07T00:10:32Z

It's a bit hacky but good enough for me for the moment. I can add that.

Well, I'd be in favour of dropping tags from nodes anyways ;)

With that the changes look good to me, and performance is about the same. We should add something to migrations that the slim tables have changed in 0.93.0-dev. Is there anything else we should do pre-merge?

lonvia · 2017-01-07T18:46:07Z

I've added commits for removing the tags column in the nodes table and reformatting node-ram-cache.

I've also had to add a try/catch around the get() function for the dense node cache, as unknown ids throw an exception. There is a small penalty attached to that, but not enough to block this PR. @joto will look into providing an exception-free get() function on the libosmium side at some point.

If rendering results look ok, then this is good to go for me.

joto · 2017-01-10T21:01:30Z

New get_noexcept() functions are now available in master for the index maps. osmcode/libosmium@d353993
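
The difference between the two lookup styles can be sketched as follows. NodeIndexSketch is a hypothetical in-memory stand-in, not libosmium's index map; the sketch assumes that a non-throwing lookup signals a missing id by returning an "undefined" sentinel location, and the sentinel value chosen here is arbitrary.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <unordered_map>
#include <vector>

struct Location {
    int32_t x;
    int32_t y;
};

// Assumed sentinel for "no location stored" (arbitrary choice for this sketch).
constexpr int32_t kUndefined = std::numeric_limits<int32_t>::max();

inline bool is_valid(Location loc) {
    return loc.x != kUndefined && loc.y != kUndefined;
}

// Hypothetical index offering both a throwing and a non-throwing lookup.
class NodeIndexSketch {
public:
    void set(int64_t id, Location loc) { map_[id] = loc; }

    // Throws when the id is unknown.
    Location get(int64_t id) const {
        auto it = map_.find(id);
        if (it == map_.end()) {
            throw std::out_of_range("id not found");
        }
        return it->second;
    }

    // Returns the sentinel instead of throwing.
    Location get_noexcept(int64_t id) const noexcept {
        auto it = map_.find(id);
        return it == map_.end() ? Location{kUndefined, kUndefined} : it->second;
    }

private:
    std::unordered_map<int64_t, Location> map_;
};

// Style 1: try/catch around each lookup; a missing node is expected, not fatal.
std::size_t count_found_with_exceptions(const NodeIndexSketch& idx,
                                        const std::vector<int64_t>& ids) {
    std::size_t n = 0;
    for (int64_t id : ids) {
        try {
            idx.get(id);
            ++n;
        } catch (const std::out_of_range&) {
            // e.g. a node outside an extract: skip it
        }
    }
    return n;
}

// Style 2: no exception handling in the loop, just a validity check.
std::size_t count_found_noexcept(const NodeIndexSketch& idx,
                                 const std::vector<int64_t>& ids) {
    std::size_t n = 0;
    for (int64_t id : ids) {
        if (is_valid(idx.get_noexcept(id))) {
            ++n;
        }
    }
    return n;
}
```

Both functions return the same count; the second avoids the cost of throwing and unwinding for every missing node, which matters when missing nodes are common, as with extract updates.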

lonvia added 4 commits January 3, 2017 23:33

store unprojected coordinates in node caches

456f197

Moves reprojection until after node locations are retrieved from the node caches and stores the unprojected coordinates. This is a DB-breaking change. Reimport required.

remove scale option

b9330ad

No longer required, now that coordinates are cached unprojected.

replace flatnode store with osmium dense file array

d9951a5

allow concurrent reads on flatnode file

52e941f

pnorman reviewed Jan 4, 2017

use iterator instead of indexing when getting node list

1b52e50

This was referenced Jan 6, 2017

Unnecessary flat-nodes with enough RAM #22

Closed

Upcoming major version changes #669

Closed

lonvia added 4 commits January 7, 2017 09:51

remove tags column from _nodes table

7a081d4

Node tags are still saved in the _nodes table but are never read. So get rid of the column completely. Also remove prepared get_node function, which isn't used either.

clang-format of node-ram-cache

8f41f12

persistent store: catch invalid location exception

8bbef6a

Invalid location is entirely expected, for example when updating extract data. Also adds tests for the node store and slightly changes clang-format style with respect to templates.

persistent cache: move try-catch further out

a940c98

Catch not-found exceptions further out when looping over nodes. Allows the compiler to optimize the code better.

pnorman merged commit a940c98 into osm2pgsql-dev:master Jan 9, 2017

pnorman mentioned this pull request Jan 9, 2017

Remove unused floating-point based node storage code #357

Closed

lonvia deleted the middle-unprojected-coordinates branch January 15, 2017 16:56

lonvia mentioned this pull request Jan 15, 2017

Storage efficiency reporting #195

Closed

talllguy mentioned this pull request Jan 31, 2017

Tiles at z17 not rendering properly openstreetmap/openstreetmap-website#1420

Closed

pnorman mentioned this pull request Mar 3, 2017

Use libosmium for geometry building #684

Merged
