-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clipping monster polygons #606
Conversation
Progress on systemed#605. The insight is that very large relations are expensive to clip, and we can often avoid clipping them at higher zooms. If a polygon clips to the full extent of a lower zoom tile, then it will do so for all higher zoom tiles. This PR adds a `largePolygons` map that tracks this. Any polygon that covers at least a z9 tile will be tracked, and short-circuited at higher zooms. It seems to speed up rendering z6/16/18 (Hudson Bay) dramatically. Neighboring tiles still have very high CPU, will dig into that next.
...I think? This is the case where the size of the multipolygon is 0. This happened for Hudson Bay when it overlapped with land. A possible risk: are there valid cases where boost fails to produce a polygon at a lower zoom level and so returns 0, but would succeed on a higher zoom? Processing the Hudson Bay extract [1]: master: real 22m27.753s user 336m50.481s sys 0m25.314s this branch: real 3m46.838s user 45m33.959s sys 0m12.161s takes 3m46sec. [1]: https://extract.bbbike.org/?lang=en;sw_lng=-99.269;sw_lat=53.617;ne_lng=-65.244;ne_lat=64.616;format=osm.pbf;city=a%20big%20bay;email=cldellow%40gmail.com;as=76.54103151056417;pg=0.8698404751901266;coords=;oi=1;layers=B000T;submit=;expire=
Ah, I suppose another cause of this (and a more likely one?) is that we're just outside of the polygon, not in one of its holes. I think this is possible because we use a bounding box when deciding which tiles the polygon affects. I shouldn't be trusted with geometry stuff. |
Impressive speed-up! Big, complex multipolygons are the root of all evil (or at least, most CPU time). Clipping/simplifying them, while still producing a valid geometry out the other end, is one of the hardest things tilemaker has to do. As you've seen, in #479 (based on #323) we moved to using a spatial index (rtree) for large polygons, which meant we don't have to store references to OutputObjects in every single z14 tile covered by the polygon. Antarctica is pretty much the canonical case for this. The rtree simply stores the bbox of the large polygon, however. This leads to situations like #323 (comment) where we start pulling in a lot of tiles that aren't actually affected by the geometry. My instinct (and nothing more than that!) is that this is essentially the same issue as here. So it might make sense to integrate this into the LargeObjects code - i.e. extend the |
Oops, I was thinking z9 was the base zoom, so it didn't make sense to track tiles there--but z9 is the base zoom only for this bitset, not for the actual map. New timings: real 1m51.573s user 16m31.069s sys 0m9.744s This suggests that maybe it's worthwhile to track a secondary hierarchy from z10..z13, too.
Thanks for the pointer to those tickets! Yes, if you squint, this PR seems in the vein of what you mention in #479 (comment):
The r-tree currently answers the question "does this object possibly output to a given tile?" I guess this would widen the r-tree's capacities to "does this object definitely output to a given tile?" and, even further "when it does output to a given tile, does it cover it completely?" Doing the pruning earlier is attractive. This also looks like it'd be in code near I'd want to measure the memory impact -- this PR doesn't yet try to do something intelligent with memory, but it should be straight-forward to turn this PR's maps into a least-recently-used cache so that bitmaps from past z6 tiles get evicted naturally. The large object index, by contrast, is global and long-lived. I've put this PR as draft while I investigate. I also noticed two other things:
|
Unfortunately, the bug turned out to be a critical part of the speed. :) With it fixed, the new runtime is 11m5s. This is still a ~50% decrease, but not what I was hoping for. Some additional analysis shows that the frequency of a tile being entirely excluded or covered is rarer than I initially thought -- it looks like it's maybe 15-25% for Hudson Bay and James Bay. I suspect a better savings would be to also keep the unsimplified, uncombined clip from lower zooms around, and have higher zooms clip against them rather the raw polygon. I'll see if I can prototype that.
This replaces systemed#606 Rather than dealing with the special case of fully covered or fully excluded, we keep a cache of recently clipped geometries. A higher zoom will look in the cache for a clipped geometry of the object from a lower zoom and re-use it if possible, falling back to the whole geometry only if no cache entry is found. Hudson Bay, master: 22m27s this PR: 1m28s This relies on the assumption that clipping a clipped polygon gives the same result as clipping the original polygon. That seems reasonable to me, but if you know of edge cases to consider, let me know.
Closing in favour of #607 |
* clipping monster polygons, take 2 This replaces #606 Rather than dealing with the special case of fully covered or fully excluded, we keep a cache of recently clipped geometries. A higher zoom will look in the cache for a clipped geometry of the object from a lower zoom and re-use it if possible, falling back to the whole geometry only if no cache entry is found. Hudson Bay, master: 22m27s this PR: 1m28s This relies on the assumption that clipping a clipped polygon gives the same result as clipping the original polygon. That seems reasonable to me, but if you know of edge cases to consider, let me know. * more robustness against invalid Antarctica nodes This is another fix in the vein of 066ca0a Testing Antarctica with this PR: 9m5s The generated mbtiles doesn't seem to be convertible to a pmtiles file -- `pmtiles convert` fails with: ``` main.go:162: Failed to convert /home/cldellow/src/basemap/tiles.mbtiles, Missing row ``` I've never generated Antarctica with the previous code, so I'm not sure if this is specific to this PR or not. My intuition is this is not specific to this PR. Wild guess is that we're creating tiles with no features, due to https://github.com/systemed/tilemaker/blob/d470dc94fea6ae74e948c1a858c6e1228c9f4bf9/src/tile_worker.cpp#L210, and pmtiles doesn't like that.
This implements some ideas to mitigate the impact of clipping monster multipolygons.
In particular, for any multipolygon that is fully clipped out (i.e. we're in its hole) or clips to the full extent of a tile at z6..z8, we'll track the z9 tiles that it would fully cover or be excluded from.
Later, when processing a tile at z7 or higher, we'll check this tracking information to short-circuit doing an expensive clip operation.
Processing the Hudson Bay extract 1:
master:
this branch:
I think the general idea here is sound...but I also think I don't know much about geometry or how it interacts with vector tiles, so a critical eye & mind would be extra appreciated.
Eyeballing the resulting tiles, it seems to look OK. This is not very rigorous, though. :) I wonder if there is tooling that will diff an mbtiles or pmtiles file? In theory, the files ought to be equivalent.