Use fixed polygon clip cache level#892
Conversation
Higher zoom polygon tiles previously reused whichever lower-zoom cached clip happened to be present. In a multithreaded run that can depend on worker scheduling, and clipping an already-clipped polygon is not guaranteed to produce the same result as clipping the original geometry through a different ancestor path. Cache polygon clips at tilemaker's existing z6 object-clustering level and write z0..z6 before z7+. This keeps the monster-polygon cache useful while removing the scheduling-dependent ancestor choice. Linestring cache behavior is unchanged.
|
Thanks! Oddly I get a big slowdown running this on a Europe extract. Current master: With this applied: HP DL380 G7, 144GB RAM, running with |
|
I suspect it's probably because I wasn't testing with a lot of low zoom level fixtures (nor did I include the extra ne polys). Biggest I did was Austria, so wouldn't have had to deal with 0-6 much. This code gates on needing all of 0-6 to be generated before proceeding to higher zoom levels, so the larger fixtures take longer to generate at those low zoom levels and so it spends a lot of time waiting. I'll have a think about how to resolve this. Might be this fix isn't the right way to achieve what I'm trying to achieve (determinism). |
|
So essentially this PR made the cache worthless, which explains the slow down:
I'm investigating another candidate looks only slightly worse than the current cache and preserves determinism. If you'd like me to close this PR and submit a new one, happy to do that, since it will be a significantly different solution. If you're also not happy with a slight performance hit to guarantee determinism, let me know. |
|
I'm going to cancel this PR. The current method doesn't work. |
Use fixed polygon clip cache level
This PR was AI generated.
Summary
This makes multipolygon clip-cache reuse deterministic by caching polygon clips
at tilemaker's existing z6 object-clustering level, then writing z0..z6 tiles
before z7+ tiles.
Higher zoom polygon tiles previously reused whichever lower-zoom cached clip
happened to be present. In a multithreaded run, that ancestor could depend on
worker scheduling. Clipping an already clipped polygon through different
ancestor paths can produce different encoded polygon geometry, even when the
source data and command line are unchanged.
Linestring clip-cache behavior is unchanged.
Background
The existing clip cache is intentional and performance-sensitive. It was added
as part of #612, which extracted
ClipCacheand reduced memory use by lazilyreconstructing way-backed geometries. PR #607 explains the intended cache model:
when writing higher zooms, tilemaker prefers clipping a cached lower-zoom
geometry over clipping the original geometry again. That avoids repeated work
for very large polygons.
This PR keeps that optimization but makes the parent level deterministic for
polygons. Rather than searching z-1, z-2, and so on until any cached ancestor is
found, multipolygons now use the z6 ancestor. z6 is already tilemaker's object
clustering level, so the change follows an existing locality boundary instead
of adding a new one.
Investigation
This was found while investigating generated-tile semantic differences. A
separate fix exposed a repeatability issue where normal multithreaded runs could
emit different polygon coordinates for the same tile. The first observed local
case was:
waterclass=lakerings
between repeated runs
Additional checks:
--threads=1was stableClipCachereads made repeated normalmultithreaded runs semantically match again
sensitivity
expensive on a medium fixture
The narrowed root cause is shared parent-tile clip-cache reuse. A higher zoom
tile could reuse a z10 parent in one run and a z6 parent in another run,
depending on which worker populated the cache first.
Implementation
This PR adds an optional fixed-zoom mode to
ClipCache.For multipolygons only:
min(basezoom, CLUSTER_ZOOM)are available before child tiles are processed
The existing ancestor-search behavior remains available and is still used for
linestrings.
Results
Small fixture / CI
The fix was tested stacked on the currently submitted PR changes in fork CI run
25960612652.After rerunning a known intermittent Windows crash in tile generation, the run
passed native generated-tile verification:
Repeat outputs and cross-runner outputs matched semantically. Raw MBTiles and
PMTiles bytes still differed, but decoded MVT content matched.
The clean, isolated branch was also checked locally against the Liechtenstein
fixture using upstream
resources/config-openmaptiles.jsonandresources/process-openmaptiles.lua.That isolated check is useful, but it also shows why this PR should be read as
one part of the broader determinism work:
masterrepeat output differed in 15 decoded MBTiles layers:1
watergeometry difference and 14poidifferencesbut still showed unrelated
poidifferences because the isolated branch doesnot include the Lua POI-order determinism fix
204 decoded MBTiles layer differences, mostly
landcover,landuse,water, andparkTwo visual checks were prepared from that fixture:
https://www.openstreetmap.org/relation/9404214#map=15/47.16020/9.49300
upstream repeat, instead of the upstream run that added an extra clipped
corner point
https://www.openstreetmap.org/#map=17/47.07210/9.48670
parent reuse
deterministic
Larger local fixture
A local Austria fixture was used for a larger timing check:
resources/config-openmaptiles.jsonandresources/process-openmaptiles.luaResults:
masterOn this fixture, the fixed-z6 branch was about 8.4% faster wall-clock, used
about 5.6% less total CPU time, and used about 2.1% less peak RSS. Treat these
as fixture-specific results rather than a guarantee.
An earlier larger PMTiles repeat test on the same Austria fixture produced
47,742 addressed tiles. The fixed-z6 outputs differed at the raw tile-byte
level, but the semantic verifier checked 43,124 raw-different PMTiles tile
pairs and confirmed matching decoded content.
Planet Estimate
This has not been measured on a planet build.
PR #612 reported a planet build at 67m51s and 42 GB RAM on a 48-core Hetzner
CCX63. If the Austria wall-time ratio transferred directly, that would estimate
roughly 62 minutes for the same planet build. Applying the Austria RSS ratio
directly would estimate roughly 41 GB RAM.
Those numbers are only directional:
The safer expectation is that this should be approximately performance-neutral
to modestly faster on large polygon-heavy outputs, with memory roughly neutral.
Expected Improvement
The main improvement is deterministic decoded tile content for multipolygon
outputs that previously depended on worker scheduling.
This should make repeated tilemaker runs less sensitive to thread timing and
platform scheduling. It also keeps the monster-polygon clip cache rather than
disabling it.
Possible Regressions
This can change polygon geometry compared with previous output because clipping
is not perfectly associative:
can differ slightly from:
Expected visual differences, if any, should be localized to polygon edges,
especially water, landuse, landcover, and building polygons that cross tile or
z6-cluster boundaries.
This does not make generated archives byte-for-byte deterministic. Archive
ordering, compression, and raw tile bytes can still differ while decoded MVT
content matches.
There is also a scheduling tradeoff: z0..z6 tiles now complete before z7+ tiles
are posted. The tested Austria fixture did not show a slowdown, but a workload
with unusually expensive low-zoom/z6 tiles could see reduced parallelism during
that phase.
Related Issues and PRs
ClipCachestructure and explains the memory andconcurrency goals behind the surrounding code.
context.
related to polygon clipping, but it should not be claimed as a fix for Invalid polygons after clipping at tile boundaries #697
without targeted reproduction against that issue's fixture.
No directly matching upstream issue or PR was found for the scheduling-sensitive
deterministic-output bug.
Testing
cmake -S . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfocmake --build build -j225960612652, stacked test branch, native generated-tilesemantic verification passed after rerunning the known intermittent Windows
tile-generation crash
semantic verification for the timing run