-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v3.0 #643
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Currently, Tilemaker uses member functions for interop: ```lua function node_function(node) node:Layer(...) ``` This PR changes Tilemaker to use global functions: ```lua function node_function() Layer(...) ``` The chief rationale is performance. Every member function call needs to push an extra pointer onto the stack when crossing the Lua/C++ boundary. Kaguya serializes this pointer as a Lua userdata. That means every call into Lua has to malloc some memory, and every call back from Lua has to dereference through this pointer. And there are a lot of calls! For OMT on the GB extract, I counted ~1.4B calls from Lua into C++. A secondary rationale is that a global function is a bit more honest. A user might believe that this is currently permissible: ```lua last_node = nil function node_function(node) if last_node ~= nil -- do something with last_node end -- save the current node for later, for some reason last_node = node ``` But in reality, the OSM objects we pass into Lua don't behave quite like Lua objects. They're backed by OsmLuaProcessing, who will move on, invalidating whatever the user thinks they've got a reference to. This PR has a noticeable decrease in reading time for me, measured on the OMT profile for GB, on a 16-core computer: Before: ``` real 1m28.230s user 19m30.281s sys 0m29.610s ``` After: ``` real 1m21.728s user 17m27.150s sys 0m32.668s ``` The tradeoffs: - anyone with a custom Lua profile will need to update it, although the changes are fairly mechanical - Tilemaker now reserves several functions in the global namespace, causing the potential for conflicts
Cherry-picked from b322166, 5c807a9, 13b3465 and fixed up to work with protozero's data_view structure. Original commit messages below, the timings will vary but the idea is the same: Faster tagmap ===== Building a std::map for tags is somewhat expensive, especially when we know that the number of tags is usually quite small. Instead, use a custom structure that does a crappy-but-fast hash to put the keys/values in one of 16 buckets, then linear search the bucket. For GB, before: ``` real 1m11.507s user 16m49.604s sys 0m17.381s ``` After: ``` real 1m9.557s user 16m28.826s sys 0m17.937s ``` Saving 2 seconds of wall clock and 20 seconds of user time doesn't seem like much, but (a) it's not nothing and (b) having the tags in this format will enable us to thwart some of Lua's defensive copies in a subsequent commit. A note about the hash function: hashing each letter of the string using boost::hash_combine eliminated the time savings. Faster Find()/Holds() ===== We (ab?)use kaguya's parameter serialization machinery. Rather than take a `std::string`, we take a `KnownTagKey` and teach Lua how to convert a Lua string into a `KnownTagKey`. This avoids the need to do a defensive copy of the string when coming from Lua. It provides a modest boost: ``` real 1m8.859s user 16m13.292s sys 0m18.104s ``` Most keys are short enough to fit in the small-string optimization, so this doesn't help us avoid mallocs. An exception is `addr:housenumber`, which, at 16 bytes, exceeds g++'s limit of 15 bytes. It should be possible to also apply a similar trick to the `Attribute(...)` functions, to avoid defensive copies of strings that we've seen as keys or values. avoid malloc for Attribute with long strings ===== After: ``` real 1m8.124s user 16m6.620s sys 0m16.808s ``` Looks like we're solidly into diminishing returns at this point.
On a 48-core machine, I still see lots of lock contention. AttributeStore:add is one place. Add a thread-local cache that can be consulted without taking the shared lock. The intuition here is that there are 1.3B objects, and 40M attribute sets. Thus, on average, an attribute set is reused 32 times. However, average is probably misleading -- the distribution is likely not uniform, e.g. the median attribute set is probably reused 1-2 times, and some exceptional attribute sets (e.g. `natural=tree` are reused thousands of times). For GB on a 16-core machine, this avoids 27M of 36M locks.
On a 48-core machine, this phase currently achieves only 400% CPU usage, I think due to these locks
This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes #402. I'm not too sure if this is generally useful - it's useful for one of my use cases, and I see someone asking about it in #190 and, elsewhere, in onthegomap/planetiler#99 If you feel it complicates the maintainer story too much, please reject. The goal is to reduce memory usage for users doing thematic extracts by not indexing nodes that are only used by uninteresting ways. For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node store. By contrast, if your interest is only to build a railway map, you require only ~8M nodes, needing 70MB of RAM. Or, to build a map of national/provincial parks, 12M nodes and ~120MB of RAM. Currently, a user can achieve this by pre-filtering their PBF using osmium-tool. If you know exactly what you want, this is a good long-term solution. But if you're me, flailing about in the OSM data model, it's convenient to be able to tweak something in the Lua script and observe the results without having to re-filter the PBF and update your tilemaker command to use the new PBF. Sample use cases: ```lua -- Building a map without building polygons, ~ excludes ways whose -- only tags are matched by the filter. way_keys = {"~building"} ``` ```lua -- Building a railway map way_keys = {"railway"} ``` ```lua -- Building a map of major roads way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}` ``` Nodes used in ways which are used in relations (as identified by `relation_scan_function`) will always be indexed, regardless of `node_keys` and `way_keys` settings that might exclude them. A concrete example, given a Lua script like: ```lua function way_function() if Find("railway") ~= "" then Layer("lines", false) end end ``` it takes 13GB of RAM and 100 seconds to process North America. If you add: ```lua way_keys = {"railway"} ``` It takes 2GB of RAM and 47 seconds. Notes: 1. This is based on `lua-interop-3`, as it interacts with files that are changed by that. I can rebase against master after lua-interop-3 is merged. 2. The names `node_keys` and `way_keys` are perhaps out of date, as they can now express conditions on the values of tags in addition to their keys. Leaving them as-is is nice, as it's not a breaking change. But if breaking changes are OK, maybe these should be `node_filters` and `way_filters` ? 3. Maybe the value for `node_keys` in the OMT profile should be expressed in terms of a negation, e.g. `node_keys = {"~created_by"}`? This would avoid issues like #337 4. This also adds a SIGUSR1 handler during OSM processing, which prints the ID of the object currently being processed. This is helpful for tracking down slow geometries.
D'oh, I thought this could be handled with fewer cases, but I don't think that's the case. If it's a multipolygon: whether to accept or not is a function of wayKeys filtering. If it's not a multipolygon: whether to accept or not is the result of relation_scan_function.
When I replaced #604 with #626, I botched extracting this part of the code. I had the trait, which taught kaguya how to serialize `PossiblyKnownTagValue`, but I missed updating the parameter type of `Attribute` to actually use it, so it was a no-op. This PR restores the behaviour of avoiding string copies, but now that we have protozero's data_view class, we can use that rather than our own weirdo struct.
Eep, two fixes here as well: - I had rejigged how the skipping of LayerAsCentroid's algorithm argument worked; this rejigging ultimately broke it entirely, as `i` would never get incremented. - If `way_keys` is provided, we are no longer guaranteed that we'll have stored the `label` node of the relation
Previousy, we were excluding the last row and column when searching the bitmap. Most obvious with nodes, as they'd always be skipped.
- apply off-by-one fix to indexing phase (not just querying phase) - make both phases robust against invalid input coordinates
This changes the default to lazy geometries for both in-memory and on-disk. --fast selects materialized geometries when running in memory, and unsharded stores when running on disk.
This changes the default to lazy geometries for both in-memory and on-disk. --fast selects materialized geometries when running in memory, and unsharded stores when running on disk.
Closed
18 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Release candidate for tilemaker 3.0. See #622 for release planning.