Update options.cpp #893
|
Eventually, we should avoid absolute numbers in the help texts and instead say that the flat node file is about the size of the planet PBF and the cache is about the size of the imported PBF. |
Did you measure with |
|
I think that depends a bit on the size of the extract and the distribution of node ids: on a file system with a 4K block size, 12 million nodes would be the theoretical lower bound to still allocate 50GB, assuming the node ids are evenly distributed so that each block holds at least one node. In reality it won't be this bad, though. One way to keep nodes.cache small for a one-time import could be renumbering ids via https://docs.osmcode.org/osmium/latest/osmium-renumber.html. |
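The 12 million figure can be checked with a little arithmetic. The following is only a sketch: the function name `min_nodes_to_fill` is made up for illustration, and the 50GB file size and 4K block size are the figures quoted in the comment, not values read from osm2pgsql.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of osm2pgsql or libosmium.
// Returns the smallest number of nodes that can still force every block
// of a file to be allocated: one node per file system block.
inline std::uint64_t min_nodes_to_fill(std::uint64_t file_bytes,
                                       std::uint64_t block_bytes) {
    assert(block_bytes > 0);
    // Round up so a partially used last block is counted as well.
    return (file_bytes + block_bytes - 1) / block_bytes;
}
```

Calling `min_nodes_to_fill(50000000000ULL, 4096)` gives roughly 12.2 million, which matches the "12 mio" estimate if 50GB is read as 50 * 10^9 bytes.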
|
Sparseness does not matter for the flat node file. Unused nodes are still written out to disk (they are -1 not 0 because 0 is a valid coordinate value). |
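To illustrate why sparseness does not help, here is a rough sketch of how a dense, id-indexed file grows. It assumes 8 bytes per location (two 32-bit integers) and a slot for every id from 0 up to the highest one; the names `flat_offset` and `flat_size` are hypothetical and not taken from the osm2pgsql or libosmium sources.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: each location is stored as two 32-bit integers.
const std::uint64_t bytes_per_location = 8;

// Byte offset of a node's slot in a dense, id-indexed file (hypothetical).
inline std::uint64_t flat_offset(std::uint64_t node_id) {
    assert(node_id < UINT64_MAX / bytes_per_location); // guard against overflow
    return node_id * bytes_per_location;
}

// Total file size when every slot up to max_node_id is written out,
// regardless of how many of those ids are actually used.
inline std::uint64_t flat_size(std::uint64_t max_node_id) {
    return flat_offset(max_node_id) + bytes_per_location;
}
```

Under these assumptions the size depends only on the highest node id: a highest id of about 6.25 billion already yields a 50GB file, even if the extract contains only a few million nodes.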
|
Getting rid of that special -1 shouldn't be too difficult: calculating the node location as: It could be a starting point to enable sparse files. For sure this would require some support on the libosmium side as well. I don't know if this would actually help, it's just an idea and I didn't test anything. Today, flat nodes are recommended for planet file imports, and here it's quite unlikely to find larger amounts of sparse blocks. For smaller extracts, this option might become more interesting when using sparse files, in case memory is limited and writing a full 50GB flat node file would be prohibitively expensive. In reality though, some extracts like D-A-CH (size: 3.7GB, 390 mio. nodes) have nodes in every single 4K block. In this case, this all becomes a bit futile. |
|
a) is it a problem if non-mentioned nodes go to (0,0) in flat mode during way reconstruction? |
|
When lonvia mentioned "-1" as the invalid coordinate, what she meant was the largest positive int32 value. That can never be a valid coordinate, whereas -1 is of course valid. The real solution here is not to fiddle around with the invalid value, but to find a different encoding of the flat node file for small datasets. Something like the The reason for this is the following: If we change the current format somehow to use the zero byte for the invalid value, we could potentially recover disk space for blocks that are completely empty. But if you have a sizable number of completely empty blocks (and only then would this matter), chances are you also have lots of blocks containing just one location, or two, or three. For them the optimization would not work, so you still use 4K or so for a single location or a few locations. So in this case a different format would be much better: even if it used, say, 10 bytes per location, it would still be two orders of magnitude better. |
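The "two orders of magnitude" estimate can be reproduced with a small sketch. The 4K block size and the 10 bytes per location are the figures from the comment above; the function names are hypothetical, and no concrete sparse file format is implied.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t fs_block_bytes = 4096;        // block size from the discussion
const std::uint64_t sparse_bytes_per_location = 10; // hypothetical sparse encoding

// Disk space used by the dense format: every block that holds at least
// one location is allocated in full.
inline std::uint64_t dense_bytes(std::uint64_t occupied_blocks) {
    return occupied_blocks * fs_block_bytes;
}

// Disk space used by a hypothetical format that stores only real locations.
inline std::uint64_t sparse_bytes(std::uint64_t locations) {
    return locations * sparse_bytes_per_location;
}

// How many times larger the dense format is when each occupied block
// holds the given number of locations (integer division).
inline std::uint64_t dense_overhead(std::uint64_t locations_per_block) {
    assert(locations_per_block > 0);
    return dense_bytes(1) / sparse_bytes(locations_per_block);
}
```

With one location per block the dense format uses about 409 times the space, and with three per block still about 136 times, so the gap stays at roughly two orders of magnitude for thinly populated blocks.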
nodes.cache is 50GB nowadays: