
Conversation

@Komzpa
Contributor

@Komzpa Komzpa commented Jan 3, 2019

nodes.cache is 50GB nowadays:

-rw-r--r--  1 gis gis 49423874128 Jan  3 23:53 nodes.cache

@lonvia lonvia merged commit 45ec5dc into osm2pgsql-dev:master Jan 4, 2019
@lonvia
Collaborator

lonvia commented Jan 4, 2019

Eventually we should avoid absolute numbers in the help texts and switch to wording along the lines of: the flat node file is about the size of the planet PBF, and the cache should be about the size of the imported PBF.

@StyXman
Contributor

StyXman commented Jan 5, 2019

"flat node file is about the size of the planet pbf"

Did you measure with stat or du? I have the impression that the file would be fairly sparse, especially with partial imports.

@mmd-osm
Contributor

mmd-osm commented Jan 5, 2019

I think that depends a bit on the size of the extract and the distribution of node ids: on a file system with a 4K block size, 12 million nodes would be the theoretical lower bound to still allocate 50GB, assuming an equal distribution of node ids. In reality it won't be this bad, though.

One way to keep nodes.cache small for a one-time import could be renumbering ids via https://docs.osmcode.org/osmium/latest/osmium-renumber.html.

@lonvia
Collaborator

lonvia commented Jan 5, 2019

Sparseness does not matter for the flat node file. Unused nodes are still written out to disk (they are -1 not 0 because 0 is a valid coordinate value).

@mmd-osm
Contributor

mmd-osm commented Jan 6, 2019

Getting rid of that special -1 shouldn't be too difficult: computing {node location} bitwise-XOR {undefined location magic number} before writing locations to disk would turn 512 consecutive undefined locations into a file system page of all zeros. This way a zero always represents the undefined value rather than the max int value.

It could be a starting point to enable sparse files. For sure this would require some support on the libosmium side as well. I don't know if this would actually help, it's just an idea and I didn't test anything.

Today, flat nodes are recommended for planet file imports, where it's quite unlikely to find larger numbers of sparse blocks. For smaller extracts, this option might become more interesting with sparse files, in cases where memory is limited and writing a full 50GB flat node file would be prohibitively expensive.

In reality though, some extracts like D-A-CH (size: 3.7GB, 390 million nodes) have nodes in every single 4K block. In this case, this all becomes a bit futile.

@Komzpa Komzpa deleted the patch-2 branch January 6, 2019 10:20
@Komzpa
Contributor Author

Komzpa commented Jan 10, 2019

a) is it a problem if non-mentioned nodes go to (0,0) in flat mode during way reconstruction?
b) if it is, can it be worked around by just shifting every (0,0) by 1e-15 so that it's binarily different?
c) isn't -1 a valid coordinate too?

@joto
Collaborator

joto commented Jan 10, 2019

When lonvia mentioned "-1" as the invalid coordinate, what she meant was the largest positive int32 value. That can never be a valid coordinate; -1 is valid, of course.

The real solution here is not to fiddle around with the invalid value, but to find a different encoding of the flat node file for small datasets. Something like the FlexMem index in libosmium, but one that also works on disk. This would be totally doable; it just needs some careful work defining such a format and making sure it resizes to the other format when the dataset grows too much.

The reason for this is the following: if we change the current format to use the zero byte for the invalid value, we could potentially recover disk space for blocks that are completely empty. But if you have a sizable number of completely empty blocks (and only then would this matter), chances are you also have lots of blocks containing just one, two, or three locations. For those the optimization would not work, so you still use 4K or so for a single location or a few. In this case a different format would be much better: even if it used, say, 10 bytes per location, it would still be two orders of magnitude better.
