-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter out ways when importing #190
Comments
OK. I made some experiments with Estonian map data as it is not a tiny but not too big. With the shipped OSM process files: With the shipped almost blank process files: With an empty process file which does not include anything: So the node number is always the same but ways seem to reduce. However, it is not zero in the last test. The |
Yes, tilemaker reads all nodes in the .pbf first, then (multipolygon) relations, then ways. It needs to read nodes before ways so that, when processing a way, it has the location of the way available for functions like
I suspect the reason that the OMT and example process files have similar results is because both of them include buildings, which are the biggest contributor to OSM data bloat. You can eliminate unwanted nodes from the .pbf in advance by using tools such as osmfilter/osmconvert or osmium. You can also potentially reduce memory requirements by preprocessing the .pbf using mapsplit. Conceivably we could implement a Another possibility I've considered is adding a runtime switch that states all nodes in the .pbf are consecutively numbered (i.e. 1,2,3,4... without any gaps). You can produce .pbfs like this with |
Thanks for the very detailed answer! I'll take a look at these options. |
Osmium seems to be the easiest option as it can read and write directly |
This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes systemed#402. I'm not too sure if this is generally useful - it's useful for one of my use cases, and I see someone asking about it in systemed#190 and, elsewhere, in onthegomap/planetiler#99 If you feel it complicates the maintainer story too much, please reject. The goal is to reduce memory usage for users doing thematic extracts by not indexing nodes that are only used by uninteresting ways. For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node store. By contrast, if your interest is only to build a railway map, you require only ~8M nodes, needing 70MB of RAM. Currently, a user can achieve this by pre-filtering their PBF using osmium-tool. If you know exactly what you want, this is a good long-term solution. But for one-offs and experimenting, it's a bit cumbersome to iterate. Sample use cases: ```lua -- Building a map without building polygons - exclude them way_keys = {"~building"} ``` ```lua -- Building a railway map way_keys = {"railway"} ``` ```lua -- Building a map of major roads way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}` ``` Nodes used in ways which are used in relations (as identified by `relation_scan_function`) will always be indexed, regardless of `node_keys` and `way_keys` settings that might exclude them. Notes: 1. This is based on `lua-interop-3`, as it interacts with files that are changed by that. I can rebase against master after lua-interop-3 is merged. 2. The names `node_keys` and `way_keys` are perhaps out of date, as they can now express conditions on the values of tags in addition to their keys. Leaving them as-is is nice, as it's not a breaking change. But if breaking changes are OK, maybe these should be `node_filters` and `way_filters` ?
This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes systemed#402. I'm not too sure if this is generally useful - it's useful for one of my use cases, and I see someone asking about it in systemed#190 and, elsewhere, in onthegomap/planetiler#99 If you feel it complicates the maintainer story too much, please reject. The goal is to reduce memory usage for users doing thematic extracts by not indexing nodes that are only used by uninteresting ways. For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node store. By contrast, if your interest is only to build a railway map, you require only ~8M nodes, needing 70MB of RAM. Or, to build a map of national/provincial parks, 12M nodes and ~120MB of RAM. Currently, a user can achieve this by pre-filtering their PBF using osmium-tool. If you know exactly what you want, this is a good long-term solution. But if you're me, flailing about in the OSM data model, it's convenient to be able to tweak something in the Lua script and observe the results without having to re-filter the PBF and update your tilemaker command to use the new PBF. Sample use cases: ```lua -- Building a map without building polygons, ~ excludes ways whose -- only tags are matched by the filter. way_keys = {"~building"} ``` ```lua -- Building a railway map way_keys = {"railway"} ``` ```lua -- Building a map of major roads way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}` ``` Nodes used in ways which are used in relations (as identified by `relation_scan_function`) will always be indexed, regardless of `node_keys` and `way_keys` settings that might exclude them. A concrete example, given a Lua script like: ```lua function way_function() if Find("railway") ~= "" then Layer("lines", false) end end ``` it takes 13GB of RAM and 100 seconds to process North America. If you add: ```lua way_keys = {"railway"} ``` It takes 2GB of RAM and 47 seconds. Notes: 1. This is based on `lua-interop-3`, as it interacts with files that are changed by that. I can rebase against master after lua-interop-3 is merged. 2. The names `node_keys` and `way_keys` are perhaps out of date, as they can now express conditions on the values of tags in addition to their keys. Leaving them as-is is nice, as it's not a breaking change. But if breaking changes are OK, maybe these should be `node_filters` and `way_filters` ? 3. Maybe the value for `node_keys` in the OMT profile should be expressed in terms of a negation, e.g. `node_keys = {"~created_by"}`? This would avoid issues like systemed#337
This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes systemed#402. I'm not too sure if this is generally useful - it's useful for one of my use cases, and I see someone asking about it in systemed#190 and, elsewhere, in onthegomap/planetiler#99 If you feel it complicates the maintainer story too much, please reject. The goal is to reduce memory usage for users doing thematic extracts by not indexing nodes that are only used by uninteresting ways. For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node store. By contrast, if your interest is only to build a railway map, you require only ~8M nodes, needing 70MB of RAM. Or, to build a map of national/provincial parks, 12M nodes and ~120MB of RAM. Currently, a user can achieve this by pre-filtering their PBF using osmium-tool. If you know exactly what you want, this is a good long-term solution. But if you're me, flailing about in the OSM data model, it's convenient to be able to tweak something in the Lua script and observe the results without having to re-filter the PBF and update your tilemaker command to use the new PBF. Sample use cases: ```lua -- Building a map without building polygons, ~ excludes ways whose -- only tags are matched by the filter. way_keys = {"~building"} ``` ```lua -- Building a railway map way_keys = {"railway"} ``` ```lua -- Building a map of major roads way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}` ``` Nodes used in ways which are used in relations (as identified by `relation_scan_function`) will always be indexed, regardless of `node_keys` and `way_keys` settings that might exclude them. A concrete example, given a Lua script like: ```lua function way_function() if Find("railway") ~= "" then Layer("lines", false) end end ``` it takes 13GB of RAM and 100 seconds to process North America. If you add: ```lua way_keys = {"railway"} ``` It takes 2GB of RAM and 47 seconds. Notes: 1. This is based on `lua-interop-3`, as it interacts with files that are changed by that. I can rebase against master after lua-interop-3 is merged. 2. The names `node_keys` and `way_keys` are perhaps out of date, as they can now express conditions on the values of tags in addition to their keys. Leaving them as-is is nice, as it's not a breaking change. But if breaking changes are OK, maybe these should be `node_filters` and `way_filters` ? 3. Maybe the value for `node_keys` in the OMT profile should be expressed in terms of a negation, e.g. `node_keys = {"~created_by"}`? This would avoid issues like systemed#337
This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes systemed#402. I'm not too sure if this is generally useful - it's useful for one of my use cases, and I see someone asking about it in systemed#190 and, elsewhere, in onthegomap/planetiler#99 If you feel it complicates the maintainer story too much, please reject. The goal is to reduce memory usage for users doing thematic extracts by not indexing nodes that are only used by uninteresting ways. For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node store. By contrast, if your interest is only to build a railway map, you require only ~8M nodes, needing 70MB of RAM. Or, to build a map of national/provincial parks, 12M nodes and ~120MB of RAM. Currently, a user can achieve this by pre-filtering their PBF using osmium-tool. If you know exactly what you want, this is a good long-term solution. But if you're me, flailing about in the OSM data model, it's convenient to be able to tweak something in the Lua script and observe the results without having to re-filter the PBF and update your tilemaker command to use the new PBF. Sample use cases: ```lua -- Building a map without building polygons, ~ excludes ways whose -- only tags are matched by the filter. way_keys = {"~building"} ``` ```lua -- Building a railway map way_keys = {"railway"} ``` ```lua -- Building a map of major roads way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}` ``` Nodes used in ways which are used in relations (as identified by `relation_scan_function`) will always be indexed, regardless of `node_keys` and `way_keys` settings that might exclude them. A concrete example, given a Lua script like: ```lua function way_function() if Find("railway") ~= "" then Layer("lines", false) end end ``` it takes 13GB of RAM and 100 seconds to process North America. If you add: ```lua way_keys = {"railway"} ``` It takes 2GB of RAM and 47 seconds. Notes: 1. This is based on `lua-interop-3`, as it interacts with files that are changed by that. I can rebase against master after lua-interop-3 is merged. 2. The names `node_keys` and `way_keys` are perhaps out of date, as they can now express conditions on the values of tags in addition to their keys. Leaving them as-is is nice, as it's not a breaking change. But if breaking changes are OK, maybe these should be `node_filters` and `way_filters` ? 3. Maybe the value for `node_keys` in the OMT profile should be expressed in terms of a negation, e.g. `node_keys = {"~created_by"}`? This would avoid issues like systemed#337 4. This also adds a SIGUSR1 handler during OSM processing, which prints the ID of the object currently being processed. This is helpful for tracking down slow geometries.
If I have understood correctly, while reading the pbf file
tilemaker
will store all the ways to the memory. Based on the number of ways to be found, the memory consumption can be high.What if I needed only, let's say motorways or land usage information, would it be possible to have some kind of filter Lua function to skip those ways which are not relevant? Does it already work like that for nodes with the
node_keys
list?I think this would reduce the memory consumption quite a lot if the whole information is not needed but are there some drawbacks or something that I didn't understood correctly about the import process?
The text was updated successfully, but these errors were encountered: