Skip to content

Optional strict-AOI cell mask (issue #101)#111

Draft
espg wants to merge 14 commits into
mainfrom
claude/101-aoi-mask
Draft

Optional strict-AOI cell mask (issue #101)#111
espg wants to merge 14 commits into
mainfrom
claude/101-aoi-mask

Conversation

@espg

@espg espg commented Jun 27, 2026

Copy link
Copy Markdown
Member

Closes #101

What this does

Adds an optional, default-OFF per-cell boolean aoi_mask aligned to the output cell grid — True where the cell falls inside the AOI polygon. This is "package, don't clip": no observation is dropped, and a flag-OFF run is byte-identical to today. The mask is metadata a client uses to filter to a strict-AOI view, recovering the order-independent in-AOI subset (the #100 overhang finding). Refs #100.

The mask is computed at the shard-map build stage via native morton (HEALPix) / shapely cell-center test (rectilinear) — not via lat/lon-center decode — and carried per shard in the shard-map JSON. Each worker (local and AWS Lambda) expands its shard's payload to a cell-order boolean over the children() it already enumerates and writes it as one more coordinate-shaped column, only when the flag is on.

Follows the locked decisions in the issue plan (comment 4804178324) and @espg's confirmations (comments 4814504937 and 4834768382): native-morton MOC at the shard-map stage, HEALPix + rectilinear as one feature/two engines, mortie>=0.8.2, manifest carrier = the shard-map JSON, rect carrier = precomputed packed bool, Lambda threading done in this PR.

Approach

  • HEALPix enginemorton_coverage_moc(lats, lons, order=child_order) builds a compact AOI MOC; per shard it is intersected with the shard's coarse word via moc_and (a coarse interior MOC word spans many shards, so clip2order-equality would miss them — moc_and keeps coarse coverage and trims boundary detail). The worker expands the per-shard sub-MOC with moc_to_order(..., child_order) and tests children() membership — already in cell/storage order, no decode.
  • Rectilinear engine — shapely cell-center contains after reprojecting the AOI polygon to the grid CRS. The WGS84 ring is densified before to_crs (odc.geo to_crs resolution densification, resolution="auto" — the same mechanism shard_footprint already uses) so AOI edges follow the geodesic, not straight chords, in polar/large-extent CRS (answers Q3). shapely/odc-geo are already core deps.
  • Configoutput.aoi_mask (default false), validated in validate_config, read via get_aoi_mask(config). The bool aoi_mask array is declared in each grid's _spec() only when on (off-runs gain no array).
  • ManifestShardMap gains an optional aoi_mask list parallel to shard_keys (HEALPix = compact MOC ints; rect = in-AOI cell ids), carried in to_json/from_json only when present.
  • Worker/runnerprocess_shard gains an aoi_payload kwarg; the local runner threads each shard's payload (omitting the kwarg entirely when off) and the Lambda runner forwards it through the per-cell event (aoi_payload key, present only when on) → deployment/aws/lambda_handler.pyprocess_shard. Both backends keep flag-off byte-identity.
  • mortie pin bumped >=0.8.1>=0.8.2 (ships the order-29 MOC cap, moc xor + domain-bounded not espg/mortie#59 + Temporal aggregation infrastructure (Refs #12) #70). The mask code asserts the resolved mortie is ≥0.8.2 at use (via packaging.version). The lambda extra does not pin mortie (it rides core deps), and deployment/aws/build_layer.sh installs mortie unpinned — so no infra change is needed.

Phases

  • Phase 1 — mortie pin bump + HEALPix MOC mask (shard-map MOC builder + per-shard moc_and restriction + per-children expansion over children()) + tests.
  • Phase 2 — rectilinear engine (shapely cell-center contains, reprojected to grid CRS) + tests.
  • Phase 3 — config (output.aoi_mask + validation + get_aoi_mask) + schema (bool aoi_mask in _spec() when on) + manifest carrier (to_json/from_json) + worker expand/write + local runner wiring + tests, incl. flag-off byte-identity and runner kwarg-threading.
  • Phase 4 — docs (docs/aoi_mask.md, added to nav) + a self-contained, data-free Binder-launchable notebook (notebooks/aoi_mask.ipynb).
  • Phase 5 — AWS Lambda backend AOI threading (per-shard payload through the Lambda event → deployment/aws/lambda_handler.pyprocess_shard), removing the prior flag-on NotImplementedError refusal. Tests cover the flag-on Lambda threading + flag-off event byte-identity (authorized by @espg, comment 4834768382).
  • Phase 6 — document the boundary-inclusivity asymmetry explicitly in docs/aoi_mask.md (HEALPix MOC overlap/coverage = inclusive at the edge; rect strict center contains = exclusive), behavior kept as-is per @espg's directive (2).
  • Phase 7 — answer the rect-densification question (3): densify the rect AOI ring before to_crs via the existing odc.geo densification (resolution="auto"); confirmed rect-specific (HEALPix tessellates on the sphere, never reprojects a polygon). Test + doc.

Known limitations / deferred

How tested

uv run pytest -q (935 passed, 10 skipped locally), ruff check --select=E,F,W,I --ignore=E501 src tests clean, ruff format --check clean on touched files. Coverage in tests/test_aoi_mask.py, tests/test_shardmap.py (TestBuildAOIMask), tests/test_runner.py, tests/test_lambda_handler.py:

  • HEALPix / rectilinear mask correctness as before (per-shard MOC membership; cell-center contains; independent pyproj-reprojected-back oracle; version gate).
  • Phase 5 (Lambda): _run_lambda threads each shard's aoi_payload into _invoke_lambda_cell; the event gains an aoi_payload key only when on (flag-off event byte-identical); lambda_handler._handle_process forwards event.get("aoi_payload") into process_shard (flag-on threading + flag-off None no-op).
  • Phase 7 (densify): test_aoi_ring_is_densified_before_reproject pins the vertex-count growth and area delta vs the chord baseline in EPSG:3413.
  • flag-off store byte-identity preserved end-to-end on both backends.

Self-review findings folded across phase commits (latest: reworded the "same primitive" densify claim to "same mechanism, resolution="auto"", dropped a redundant test assert).

Questions for review

(1) moc_and primitive — ✅ confirmed by @espg (comment 4834768382). No change.

(2) Boundary-inclusivity asymmetry — ✅ resolved per @espg: documented explicitly in docs/aoi_mask.md, behavior kept as-is.

(3) Rect densification resolution — implemented with resolution="auto" (extent-adaptive, parameter-free, dependency-free; ~100 pts/side). If you'd rather pin a guaranteed edge-slop bound, the densify resolution could be exposed as a config knob — say the word and I'll wire it. Otherwise "auto" stands.

(4) WKB/WKT AOI inputdeferred, blocked on espg/mortie#89 (not merged; pinned mortie 0.8.2 has no WKB/WKT API). Rolling it in here would require an unreleased mortie pin, which I won't do. Two options for your call: (a) leave it deferred to a follow-up PR once espg/mortie#89 merges and is released (this PR lands the Lambda + densify + boundary-doc work now), or (b) hold this PR until espg/mortie#89 lands and add WKB/WKT here. I've taken (a) by default — this PR is otherwise complete.

(5) Module size (non-blocking)src/zagg/runner.py is now ~1180 lines (was 1172 before this PR; this PR adds ~8 net), over CLAUDE.md §4's ~1000-line soft limit. Pre-existing, not introduced here; flagging per §4 rather than splitting it in this PR.

@espg espg added the implement label Jun 27, 2026

@espg espg left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Phase-1 adversarial review against issue #101 acceptance criteria and CLAUDE.md. The HEALPix MOC core is sound: moc_and(aoi_moc, [shard_word]) is the right primitive for per-shard restriction of a mixed-order MOC (your Q1 — confirmed, since clip2order-equality would miss shards under a coarse interior word), uint64 dtype is held throughout, and empty-MOC short-circuits are handled in both healpix_shard_moc and healpix_mask_for_children. Findings below are diff-scoped; the inline comments carry the specifics. Nothing here blocks advancing to phase 2 — they're tightenings plus one latent version-parsing edge worth deciding on now.

Summary of the inline points:

  1. _assert_mortie_version accepts pre-release 0.8.2.devN strings as ≥ 0.8.2, but a setuptools-scm 0.8.2.devN is post-0.8.1, pre-0.8.2 — i.e. the cap-lift may be absent yet the assert passes.
  2. test_mask_matches_moc_to_order_membership computes shards/assign only to feed a tautological assert len(shards) — dead per §4.
  3. test_edge_shard_is_mixed is geometry-dependent (non-hermetic) and its clip2order(parent_order, aoi_moc) enumeration silently can't refine coarse interior words to parent_order — fine for finding boundary shards, but worth a comment so the limitation is explicit.

No vague praise; the rest of phase 1 looks correct for its stated scope.


Generated by Claude Code

@espg espg left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Inline findings for phase 1 of #101 (HEALPix MOC mask). Details on each comment; nothing here blocks phase 2.


Generated by Claude Code

Comment thread src/zagg/grids/aoi.py Outdated
Comment on lines +49 to +59
parts: list[int] = []
for token in str(raw).split(".")[:3]:
digits = ""
for ch in token:
if ch.isdigit():
digits += ch
else:
break
parts.append(int(digits) if digits else 0)
while len(parts) < 3:
parts.append(0)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Latent version-gate hole: this parser strips non-digits per token, so a pre-release string like "0.8.2.dev3+g<sha>" parses to (0, 8, 2) and passes the >= (0, 8, 2) gate. But under setuptools-scm / hatch-vcs (which this repo uses for its own version, §7), 0.8.2.devN is the development version leading up to the 0.8.2 tag — i.e. it is built from a commit after 0.8.1 but before 0.8.2 is tagged, so the order-29 cap-lift (espg/mortie#59 + #70) may not yet be present. The assert would then pass against a build that still has the order-18 cap, which is exactly the silent-mis-sizing failure mode this function exists to prevent.

If mortie also uses scm-style local/dev versions, consider treating a .devN suffix on the target release as below it (drop dev builds of 0.8.2 itself), or pin the test to cover a "0.8.2.dev1" input so the intended behavior is asserted either way. At minimum the docstring should state that 0.8.2.devN is accepted, since that's a deliberate-looking but non-obvious choice.


Generated by Claude Code

Comment thread tests/test_aoi_mask.py Outdated
Comment on lines +39 to +50
shards = np.unique(grid.shards_of(grid.assign(np.array([15.0]), np.array([15.0]))))
shard_key = int(grid.shards_of(grid.cells_of(flat[:1]))[0])
children = grid.children(shard_key)
shard_moc = grid.aoi_shard_moc(aoi_moc, shard_key)
mask = grid.aoi_mask_for_children(shard_moc, children)

expected = np.isin(np.asarray(children, dtype=np.uint64), flat)
assert mask.dtype == bool
assert mask.shape == np.asarray(children).shape
np.testing.assert_array_equal(mask, expected)
assert mask.any() # the chosen shard is in-AOI
assert len(shards) # sanity: assign resolved

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

shards (line 39) is computed only to feed assert len(shards) (line 50, # sanity: assign resolved). len(shards) is >= 1 for any non-empty input, so this is a tautology that can never fail — it asserts nothing about the AOI mask. shard_key is derived independently from flat[:1], so the assign/shards_of round-trip on (15, 15) is dead per §4 (no dead code / no speculative scaffolding). Drop both lines; the load-bearing assertion is the assert_array_equal(mask, expected) above.

Separately, the real-but-weak assertion in this test is assert mask.any() (line ~48): the oracle expected = np.isin(children, flat) IS a sound independent cross-check (whole-AOI flat vs. the per-shard moc_and path, over children that all belong to the shard), but because shard_key is flat[:1]'s shard it could be a single-cell boundary shard, so .any() is the only liveness guard. Consider also asserting the per-shard moc_and actually restricted something (e.g. that flat has cells outside this shard, so the test would catch a moc_and that silently returned the whole AOI).


Generated by Claude Code

Comment thread tests/test_aoi_mask.py
Comment on lines +82 to +95
# Find a shard touched by the AOI that is NOT fully contained: scan the
# MOC's coarsened parents and pick one whose mask is partial.
from mortie import clip2order

parents = np.unique(np.asarray(clip2order(grid.parent_order, aoi_moc)))
found_mixed = False
for p in parents.tolist():
children = grid.children(int(p))
shard_moc = grid.aoi_shard_moc(aoi_moc, int(p))
mask = grid.aoi_mask_for_children(shard_moc, children)
if mask.any() and not mask.all():
found_mixed = True
break
assert found_mixed, "expected at least one boundary shard with a mixed mask"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Two reliability notes on the edge-shard test:

  1. Non-hermetic / geometry-dependent. The test only passes if the (0,0)–(30,30) box at parent_order=5, child_order=9 produces at least one parent with a partial mask. That's very likely true for a 30°×30° box, but the test is silently coupled to mortie's HEALPix tiling — a future mortie change to cell layout could make every touched parent fully in/out and flip this to a confusing failure. A box whose corner is engineered to sit mid-cell, or asserting on a specifically-constructed boundary shard, would make "mixed exists" structural rather than incidental.

  2. clip2order(grid.parent_order, aoi_moc) (line 86) can't enumerate interior shards. aoi_moc is mixed-order; coarse interior MOC words encode an order below parent_order=5, and per the HEALPIX_REF_ORDER invariant clip2order "cannot refine past its input order" — so those coarse words won't expand into the parent_order shards beneath them. The scanned parents set therefore omits the deep interior (which is fine here, since you want boundary shards, which are fine-order ≥ parent_order). Worth a one-line comment that this enumeration is intentionally boundary-biased, so a later reader doesn't mistake it for a complete shard list and reuse it as one.


Generated by Claude Code

@espg espg left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Phase-2 adversarial review (rectilinear AOI-mask engine, issue #101). Scope: commit c6bda6a only — cell_centers / aoi_polygon / aoi_mask_for_children in rectilinear.py and TestRectilinearMask in test_aoi_mask.py.

Core math is correct. cell_centers (rectilinear.py:401-407) is the exact inverse of coverage/assignx = xmin + (col+0.5)*res_x, y = ymax - (row+0.5)*res_y, with row = v//width, col = v%width — and matches the dimensional-coord centers at L457-458. res_x vs res_y are applied to the correct axes; no off-by-half, no axis swap, ymax-based y is right. The reprojection reuses the same to_crs(self._geobox.crs) path coverage uses, as the docstring claims. Methods are terse, no dead code, no silenced lint — CLAUDE.md-clean.

Inline findings (none block phase 3):

  1. Boundary inclusivity asymmetry (rectilinear.py:421-431) — strict prep().contains excludes on-boundary centers, while the HEALPix MOC path includes overlap/boundary cells. Cross-backend semantic divergence for the same mask; recommend an @espg decision / explicit doc.
  2. Non-densified reprojection edge accuracy (rectilinear.py:410-420)to_crs with no resolution= makes the AOI ring straight chords, not geodesics, in polar/large-extent CRS; cell-center membership near edges can be wrong. Consistent with coverage, but this mask is the strict-edge deliverable, so it matters most here.
  3. Stale test comment (test_aoi_mask.py:141) — "20×20 m grid, centers (5,15,…)" is off 100×; grid is 1000 m res, centers 500/1500/….
  4. Weak oracle (test_aoi_mask.py:149-166)test_mask_matches_center_contains recomputes with the same contains predicate (only prepared-vs-unprepared differs), so it validates wiring/ordering and the reprojection path, not the containment result independently.
  5. All-true test misses transposition (test_aoi_mask.py:169-178) — symmetric total-coverage box + only 3 of 25 shards means a row/col swap in cell_centers wouldn't be caught; assert the exact pattern in the partial case instead.

Items 1 and 2 are the substantive ones (semantics + accuracy); 3-5 are test/comment hardening.


Generated by Claude Code

Comment on lines +421 to +431

def aoi_mask_for_children(self, aoi_geom, children) -> np.ndarray:
"""Boolean over ``children`` — ``True`` where the cell center is in the AOI.

``aoi_geom`` is the reprojected polygon (:meth:`aoi_polygon`); ``children``
the chunk's row-major flat cell ids (``children``). Cell centers are tested
with a prepared-geometry ``contains``.
"""
from zagg.grids.aoi import rectilinear_mask_for_centers

xs, ys = self.cell_centers(children)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Boundary-inclusivity asymmetry vs the HEALPix path — worth an @espg call. aoi_mask_for_children delegates to a shapely prep().contains test, which is strict: a cell whose center lands exactly on the AOI boundary is excluded. The HEALPix engine (healpix_mask_for_children, phase 1) instead tests MOC membership built from morton_coverage_moc, which is overlap/coverage-based and includes boundary and partially-covered cells. So the same conceptual "cell in AOI" mask has two different inclusivity rules depending on the grid backend.

For a center-in-polygon test, strict contains matches issue #101's "keep-whole-cell-if-center-in" wording, so the rect choice is defensible in isolation. But the cross-backend divergence is a real semantic inconsistency a client filtering with this mask would observe. Either (a) document the asymmetry explicitly, or (b) use covers/an intersects-on-boundary variant for parity with the HEALPix MOC path. Recommend surfacing this as a "Questions for review" item rather than silently shipping divergent semantics.


Generated by Claude Code

Comment on lines +410 to +420

def aoi_polygon(self, polygon_parts):
"""Reproject the AOI ``[(lats, lons), ...]`` parts to the grid CRS.

Built once at the shard-map stage (same ``to_crs`` reprojection
:meth:`coverage` uses); the per-shard boolean (:meth:`aoi_mask_for_children`)
is precomputed against it and carried in the manifest.
"""
from zagg.grids.aoi import rectilinear_aoi_polygon

return rectilinear_aoi_polygon(polygon_parts, self._geobox.crs)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Reprojection accuracy at AOI edges. aoi_polygonrectilinear_aoi_polygon reprojects the WGS84 ring with geom.to_crs(crs) and no densification (no resolution= arg). For a lat/lon box reprojected into a polar/large-extent grid CRS (e.g. EPSG:3413, as the test uses), the polygon edges become straight chords between only the corner vertices rather than following the true curved geodesics, so cell-center membership can be wrong for cells near the AOI edges by up to the chord-vs-arc deviation.

This is "consistent with how coverage reprojects" as the docstring claims (coverage at L311 also calls to_crs without densify), so it is not a regression — but note chunk_geometry (L383) does densify when going the other direction (to_crs("EPSG:4326", resolution=densify)), so the precedent within this file is mixed. Since the mask is the strict deliverable of #101 (the whole point is edge accuracy), the chord approximation is most consequential exactly here. Worth either densifying the AOI ring before to_crs or flagging the edge-slop bound under "Questions for review."


Generated by Claude Code

Comment thread tests/test_aoi_mask.py Outdated
def _grid(self):
from zagg.grids import RectilinearGrid

# A 20x20 m grid in a metres CRS; cell centers fall on (5,15,...)+10k offsets.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Stale/incorrect comment. This says "A 20x20 m grid ... cell centers fall on (5,15,...)+10k offsets", but the grid is resolution=1000.0 over bounds=[0,0,20000,20000] — i.e. a 20×20 cell grid at 1000 m resolution, with centers at 500, 1500, 2500, … (confirmed: cell_centers(children(0))[0] == [500, 1500, 2500, 3500]). The "20×20 m" and "(5,15,…)" figures are off by 100×. Fix the comment so it doesn't mislead a future reader about the oracle geometry.


Generated by Claude Code

Comment thread tests/test_aoi_mask.py
Comment on lines +149 to +166
def test_mask_matches_center_contains(self):
from shapely.geometry import Point

grid = self._grid()
# AOI: a WGS84 box; reproject via the grid path and test each child center.
parts = _box(60.0, -40.0, 75.0, -20.0)
aoi_geom = grid.aoi_polygon(parts)
shard = 0
children = grid.children(shard)
mask = grid.aoi_mask_for_children(aoi_geom, children)
xs, ys = grid.cell_centers(children)
expected = np.fromiter(
(aoi_geom.contains(Point(x, y)) for x, y in zip(xs, ys)),
dtype=bool,
count=len(children),
)
assert mask.dtype == bool
assert mask.shape == np.asarray(children).shape

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Weak oracle — the predicate is nearly self-referential. mask comes from rectilinear_mask_for_centers, which is prep(aoi_geom).contains(point) over cell_centers; expected here recomputes aoi_geom.contains(Point(x, y)) over the same cell_centers. The only difference between the two sides is prepared-vs-unprepared contains (which are defined to agree) — so this asserts mainly (a) prep==unprep and (b) the children→center ordering lines up, not that the containment result is correct against an independent reference. It does genuinely exercise the WGS84→grid aoi_polygon reprojection end to end (good — that's the real-path coverage), but the membership itself isn't independently checked here.

Suggest strengthening with a hand-computed expected set: pick an AOI whose grid-CRS extent you know, and assert specific (row, col) cells are in/out by construction (the test_partial_aoi_is_mixed/test_fully_inside boxes are closer to true oracles since they're explicit grid-CRS rectangles — but those skip the reprojection). One test that both reprojects and checks a known in/out cell would close the gap.


Generated by Claude Code

Comment thread tests/test_aoi_mask.py
Comment on lines +169 to +178
def test_fully_inside_aoi_all_true(self):
# An AOI polygon directly in grid CRS covering the whole grid -> all True.
grid = self._grid()
# Build a generous WGS84 polygon by reprojecting the grid bounds back out:
# instead, reproject a grid-CRS box. Use cell-center membership against an
# explicit grid-CRS rectangle that contains every center.
from shapely.geometry import box as shapely_box

aoi_geom = shapely_box(-1000.0, -1000.0, 21000.0, 21000.0)
for shard in (0, 1, grid.n_col_blocks):

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

All-true test doesn't guard against a row/col swap. The grid is 5×5 blocks / 25 shards, but this only checks shards (0, 1, n_col_blocks==5). Since the AOI box [-1000,-1000,21000,21000] contains every center, mask.all() would pass even if cell_centers swapped rows↔cols or transposed within a chunk — the test can't distinguish a correct grid from a transposed one because the AOI is symmetric and total. The test_partial_aoi_is_mixed case (left-half box) is the one that constrains orientation, but it only asserts any() and not all(), not which cells. Consider asserting the exact True/False pattern in the partial case (e.g. cols 0–1 True, cols 2–3 False for the 4-wide tile) so an axis swap is actually caught.


Generated by Claude Code

@espg espg left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Fresh-context adversarial review of the phase-3 diff only (fe1253a) against issue #101 and CLAUDE.md. The core wiring is sound and the byte-identity intent is mostly well-protected; one finding rises above nit level.

Blocking-ish — must be surfaced before "ready":

  • Lambda path emits a present-but-empty mask. process_shard accepts aoi_payload, but only runner._cell_work (local) passes it; deployment/aws/lambda_handler.py:251 does not. Meanwhile _spec() gates the aoi_mask array purely on get_aoi_mask(self.config), which runs on the Lambda setup path too. So a flag-on Lambda run allocates a bool aoi_mask array that is never written → all-False (entirely out-of-AOI), the inverse of the feature, with no error. The code degrades safely against crashing (the aoi_payload is not None guard holds; no AttributeError), so it's not broken-broken — but the PR body claims end-to-end wiring and says nothing about Lambda. Please flag this explicitly under "Questions for review" (local-backend only this phase) or gate the array so it isn't emitted where it can't be filled. Inline on worker.py.

Worth addressing in this phase:

  • _compute_aoi_mask dispatches on hasattr(grid, "aoi_moc") with an unconditional rect else and no "unsupported grid" guard — flag-on against a grid lacking AOI methods either AttributeErrors cryptically or (via the config is None gate in build) is silently ignored. Also inconsistent with this module's existing parent_order/child_order HEALPix predicate. Inline on shardmap.py.
  • Flag-off byte-identity test exercises the template+write seam but not the runner kwarg-threading it's meant to protect (_cell_work's conditional aoi_payload). Inline on test_aoi_mask.py.

Confirmed correct (no change needed):

  • K>1 / chunk_inner: both engines expand by membership over chunk_children (a subset of children(shard_key)), order-independent; mask length aligns with the K>1 carrier (children=chunk_children). HEALPix MOC moc_to_order re-expansion and rect cell-id membership both hold at K>1.
  • Manifest round-trip omits aoi_mask when off; from_json/to_json carry it only when present; _aoi_payload_map returns {} on a pre-feature manifest. Byte-identity of the manifest holds.
  • packaging.version rewrite in aoi.py correctly rejects 0.8.2.devN; tests cover it.

Nits: get_aoi_mask re-coerces with bool(...) while validate_config rejects non-bools — accessor/validator disagree on strictness if validation is skipped. hasattr(grid, "aoi_mask_from_payload") guard in the worker silently drops the column on a grid/manifest mismatch (same all-False failure shape). Both inline.

No CLAUDE.md violations spotted in the diff: messages terse, no silenced lint introduced, no dead code; module sizes not pushed past the ~1000-line limit by this change. Did not approve / request changes per review scope.


Generated by Claude Code

handoff: str = "pandas",
ragged_out: dict | None = None,
chunk_results: list | None = None,
aoi_payload=None,

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Lambda path produces a silently-wrong mask, not just an absent one. process_shard gains aoi_payload here, but the only caller that passes it is runner._cell_work (local backend). The AWS Lambda handler (deployment/aws/lambda_handler.py:251) calls process_shard(...) with no aoi_payload, so on Lambda every worker defaults to None and writes no aoi_mask column.

That alone would be "no mask on Lambda", but it is worse: emit_template on the setup path does create the aoi_mask zarr array, because the _spec() gate (healpix.py:435 / rectilinear.py:524) is purely config-driven (get_aoi_mask(self.config)) and runs identically on Lambda. So a flag-on Lambda run emits a fully-allocated bool aoi_mask array that is never written — every cell reads fill_value=False, i.e. an all-out-of-AOI mask. A client filtering on it gets zero in-AOI cells, the opposite of the feature's intent, with no error.

The runner/worker code does degrade safely against crashing (the aoi_payload is not None guard means no AttributeError), which is correct. But the schema gate and the write gate are decoupled across backends, so "flag on" yields a present-but-empty array on Lambda. This needs to be either (a) flagged explicitly in the PR body as a known limitation ("aoi_mask is local-backend only this phase; Lambda support is a follow-up"), or (b) the array gated so it isn't emitted on a path that can't fill it. The PR body currently claims the feature is wired and says nothing about Lambda — please surface this under "Questions for review" so @espg can decide.


Generated by Claude Code

Comment thread src/zagg/catalog/shardmap.py Outdated
Computed once here (the shard-map stage) so the worker/Lambda path needs no
region plumbing — the mask depends only on (grid, AOI), never on observations.
"""
is_healpix = hasattr(grid, "aoi_moc")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Dispatch via hasattr(grid, "aoi_moc") has no fallback for a third grid type, and is inconsistent with how this module already detects HEALPix. Two points:

  1. The else branch assumes the grid is rectilinear and unconditionally calls grid.aoi_polygon(parts) / grid.children / grid.aoi_mask_for_children. If a future grid (or any OutputGrid implementer) lacks both aoi_moc and the rect AOI methods, but has config with the flag on, build() will AttributeError deep in _compute_aoi_mask rather than failing with a clear message. Since the flag is opt-in, an explicit guard — e.g. raise a clear ValueError("output.aoi_mask is not supported for grid type <...>") when neither capability is present — would fail loudly at the build stage instead of with a cryptic attribute error.

  2. The rest of this module decides HEALPix-vs-other via hasattr(grid, "parent_order") and hasattr(grid, "child_order") (lines 124, 158). Here you switch to hasattr(grid, "aoi_moc"). Two different duck-type predicates for the same fork invites drift. Prefer reusing the existing predicate (or a shared helper) so the AOI path and the backend-selection path agree on what "HEALPix" means.


Generated by Claude Code

# when ``output.aoi_mask`` is on — otherwise the manifest is unchanged.
from zagg.config import get_aoi_mask

grid_config = getattr(grid, "config", None)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

grid_config = getattr(grid, "config", None) then gating on grid_config is not None and get_aoi_mask(grid_config) is a quiet failure mode: both HealpixGrid and RectilinearGrid always set self.config (defaulting to default_config("atl06")), so config being None here can only mean a grid type that doesn't carry config at all — in which case the flag is silently ignored even if the user set it. That's defensible, but combined with the absence of any "flag on, grid can't do it" error (see the _compute_aoi_mask comment), the net behavior is: enable the flag on an unsupported grid and you get no mask and no warning. Worth a one-line note or a log at minimum so an enabled-but-ignored flag isn't completely silent.


Generated by Claude Code

Comment thread tests/test_aoi_mask.py
assert actual.dtype == np.bool_
np.testing.assert_array_equal(actual, mask)

def test_flag_off_store_byte_identical(self, mock_dataframe_factory):

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

The byte-identity test compares write_dataframe_to_zarr(df, ...) for default_config vs flag-off config — it exercises the template + write seam, which is good, but it bypasses the part of phase 3 that is actually new on the flag-off path: the runner._cell_work conditional kwarg threading (if aoi_by_shard: extra["aoi_payload"] = ...). The phases checklist explicitly lists "flag-off byte-identity" under runner wiring, and the claim being protected is that _process_and_write/process_shard are called with the pre-feature signature when the flag is off. Nothing here asserts that. A small test that runs _run_local (or at least _cell_work) against a manifest with no aoi_mask key and asserts no aoi_mask column / no aoi_payload kwarg reaches the worker would close the actual regression surface. As written, a future change that always-threaded aoi_payload=None would not be caught by this suite even though it would defeat the "identical to pre-feature signature" intent stated in runner.py.


Generated by Claude Code

Comment thread src/zagg/config.py
-------
bool
"""
return bool(config.output.get("aoi_mask", False))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Minor consistency note: get_aoi_mask returns bool(config.output.get("aoi_mask", False)), so a truthy non-bool value (e.g. "no", 0.0, []) is coerced rather than rejected. validate_config (config.py:212-214) does reject non-bools — but only when validate_config is actually called. If get_aoi_mask is ever reached on an unvalidated config (e.g. a hand-built PipelineConfig in a test or an alternate entry path that skips validation), "no"True. Not a bug given the validation gate, but the accessor and the validator disagree on strictness; consider having the accessor trust the validated invariant (config.output.get("aoi_mask", False) already being a bool) rather than re-coercing, so the two can't silently diverge. Low priority.


Generated by Claude Code

Comment thread src/zagg/processing/worker.py Outdated
# over THIS chunk's cells (order-aligned with the carrier). None when the
# flag is off, so the carrier is byte-for-byte unchanged.
chunk_aoi_mask = (
grid.aoi_mask_from_payload(aoi_payload, chunk_children)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Confirming the K>1 interaction is correct (not a defect): both engines expand by membership over chunk_children, which is a subset of children(shard_key). HEALPix re-runs moc_to_order(shard_moc, child_order) + np.isin(chunk_children, flat) (aoi.py:115), and rect does np.isin(chunk_children, true_ids), so a sub-tile maps correctly without depending on positional order. The mask length is len(chunk_children), which aligns with the carrier rows since _build_output is given children=chunk_children at K>1 — so the column length matches the coords. Good.

One robustness nit while here: the hasattr(grid, "aoi_mask_from_payload") guard means that if aoi_payload is non-None (manifest had a mask) but the grid somehow lacks the method, the column is silently dropped for that chunk while the schema array still exists → all-False region, same failure shape as the Lambda gap. Given both built-in grids define the method, this can only fire on a grid/manifest mismatch, but a manifest carrying aoi_mask against a grid that can't expand it is arguably worth a hard error rather than a silent skip.


Generated by Claude Code

@espg espg left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Fresh-context adversarial review of phase 4 (3151019) + the three fold commits (f684fe0, 9b4108b, b412313) against issue #101 and CLAUDE.md.

Verified sound (ran the code locally against mortie 0.8.2):

  • Notebook aoi_mask.ipynb executes clean end-to-end and the np.array_equal(mask, expected) assertion holds; cells use the real grid API (aoi_moc / aoi_shard_moc / aoi_mask_for_children); nbformat.validate passes. Badge URL is well-formed for englacial/zagg.
  • Docs storage-layout and read snippet are accurate: HEALPix group_path == str(child_order) (→ <child_order>/aoi_mask, matching the xr.open_zarr(".../13") example) and rect group_path == "rectilinear". The mortie >= 0.8.2 claim and "asserts version at use" match _assert_mortie_version. Lambda limitation and WKB-deferral notes are correct.
  • Lambda refusal (runner._run_lambda) reads get_aoi_mask(config) and raises NotImplementedError at the very top, before any template write / side effect — correct call vs silently emitting an all-False mask.
  • Worker hard-error (worker.py) is correctly defensive: a non-None aoi_payload against a grid lacking aoi_mask_from_payload is a manifest/grid mismatch; an empty-list payload (legit out-of-AOI shard) still has the method, so it can't wrong-fire.
  • Rect oracle test is genuinely independent (pyproj back-reprojection, not shapely-on-grid). All touched tests pass; ruff check clean on the changed source. No @-mentions, terse code.

Three diff-scoped findings inline:

  1. shardmap.py:201 — the dispatch-predicate reconciliation a prior self-review asked for wasn't done (hasattr(aoi_moc) vs the module's parent_order/child_order predicate used 3x). Not a live bug; latent inconsistency.
  2. docs/aoi_mask.md:96 / notebook badge — "launchable on Binder" with no Binder env file in the repo; mortie>=0.8.2 + zagg won't resolve on a default Binder build (CLAUDE.md §4).
  3. tests/test_aoi_mask.py:212agree.mean() > 0.9 is far looser than the actual 1.0 agreement; tighten to agree.all() so it guards real regressions.

No blocking issues; all three are improvements, not correctness stoppers.


Generated by Claude Code

Comment thread src/zagg/catalog/shardmap.py Outdated
neither with the flag on is a misconfiguration, raised here rather than left to
a cryptic ``AttributeError`` downstream.
"""
if hasattr(grid, "aoi_moc"):

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

The earlier self-review asked to reconcile this dispatch with the module's own HEALPix predicate; that wasn't done. _compute_aoi_mask keys HEALPix-vs-rect on hasattr(grid, "aoi_moc") / hasattr(grid, "aoi_polygon"), but everywhere else in this same module the family test is is_healpix = hasattr(grid, "parent_order") and hasattr(grid, "child_order") (lines 124, 158, 294). Two divergent predicates for the same fork is exactly the inconsistency that was flagged. They happen to agree for HealpixGrid/RectilinearGrid today (rect has aoi_polygon+no parent_order; HEALPix has both), so this is not a live bug — but a future grid that exposes parent_order/child_order without aoi_moc (or vice-versa) would be classified one way here and the other way 30 lines up. Recommend dispatching on the same is_healpix predicate the rest of the module uses, then branching to aoi_moc vs aoi_polygon inside, keeping the new ValueError for the genuinely-unsupported case.


Generated by Claude Code

Comment thread docs/aoi_mask.md Outdated

See the runnable, data-free example in
[`notebooks/aoi_mask.ipynb`](https://github.com/englacial/zagg/blob/main/notebooks/aoi_mask.ipynb)
(launchable on Binder), which builds a small HEALPix grid + AOI box and shows the

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

"launchable on Binder" (and the matching Binder badge in notebooks/aoi_mask.ipynb) looks unbacked: there's no Binder environment file in the repo (environment.yml / requirements.txt / binder/ / postBuild), and none of the other four notebooks ships one either. On a default Binder build the notebook's hard deps wouldn't resolve — it imports zagg and needs mortie >= 0.8.2 (the order-29 MOC cap that _assert_mortie_version enforces; older mortie returns an empty/wrong mask), and zagg itself isn't a plain pip install on Binder. As written the badge will land users on a kernel that errors at the first import. Either add a Binder env pinning zagg + mortie>=0.8.2 (CLAUDE.md §4 requires notebooks be Binder-runnable), or drop the "launchable on Binder" claim and the badge until the env exists. (Note the notebook itself is HEALPix-only, so it does not need the non-PyPI spherely fork — only the rect path would.)


Generated by Claude Code

Comment thread tests/test_aoi_mask.py Outdated
oracle = (clon >= lon0) & (clon <= lon1) & (clat >= lat0) & (clat <= lat1)
# Straight-chord vs geodesic edge: allow boundary-cell slop, exact interior.
agree = mask == oracle
assert agree.mean() > 0.9

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

The independence here is good — reprojecting cell centers back to WGS84 with pyproj and testing the axis-aligned box is a genuinely separate oracle from the shapely contains path, and mask.sum()/oracle.sum() match. But agree.mean() > 0.9 is far looser than reality: on this 20x20 (400-cell) UTM tile the two agree exactly (agree.mean() == 1.0, 0 disagreements — verified locally), because straight-chord vs geodesic edges is sub-cell at 1 km / mid-latitude UTM. A > 0.9 gate tolerates up to 40 cells flipping, so a real regression in the rect containment (e.g. an off-by-one tile shift or a boundary-handling change touching dozens of cells) would still pass. Tighten to the observed behavior — assert agree.all() (or at most one or two boundary cells, (~agree).sum() <= 2) — so the oracle actually guards the result it's meant to.


Generated by Claude Code

@espg

espg commented Jun 27, 2026

Copy link
Copy Markdown
Member Author

🤖 from Claude

Folded the three phase-4 self-review findings in one commit (0c557da):

  1. shardmap.py _compute_aoi_mask dispatch — switched the HEALPix-vs-rect fork from hasattr(grid, "aoi_moc") to the module's existing is_healpix = hasattr(grid, "parent_order") and hasattr(grid, "child_order") predicate (the same one used at lines 124/158/294), then branch to aoi_moc vs aoi_polygon inside, keeping the ValueError for the genuinely-unsupported case. Behavior is identical for HealpixGrid/RectilinearGrid; the predicate no longer drifts from the rest of the module.

  2. Binder claim — made the claim honest in both docs/aoi_mask.md and notebooks/aoi_mask.ipynb: the notebook is self-contained/offline and runs anywhere zagg (with mortie>=0.8.2) is installed, but Binder launch depends on the repo-wide .binder/ environment landing via Binder-compatible docs / example notebooks #105. Removed the unconditional Binder badge from the notebook rather than ship a badge that errors on a default build. No .binder/ infra added here (that's Binder-compatible docs / example notebooks #105's scope).

  3. Rect oracle assertion — tightened the independent-reprojection oracle in tests/test_aoi_mask.py from agree.mean() > 0.9 to assert agree.all(), matching the measured exact (1.0) agreement so a real containment regression can't hide under the loose threshold.

AOI suite green (28 passed); ruff check + ruff format --check clean on the touched files.


Generated by Claude Code

@espg

espg commented Jun 29, 2026

Copy link
Copy Markdown
Member Author

needs conflicts resolved ahead of merge

@espg

espg commented Jun 29, 2026

Copy link
Copy Markdown
Member Author

🤖 from Claude

Conflicts resolved. Merged current main (97e3554) into claude/101-aoi-mask — merge commit c5e4f8e (parents 0c557da + 97e3554), no force-push. The branch is no longer dirty.

Four files conflicted; config.py auto-merged cleanly. Resolutions:

  • src/zagg/grids/healpix.py and src/zagg/grids/rectilinear.pymain (Evaluate zarr ShardingCodec for HEALPix output: decouple large write/dispatch shards from 64×64 read chunks #108 sharding) now wraps each per-cell coordinate array in _shard(...), while this branch added the optional aoi_mask member. Kept both: applied main's _shard(...) to the coord members and wrapped the aoi_mask bool array in _shard(...) too, so the mask shards consistently with the other per-cell arrays. The block stays gated on get_aoi_mask(self.config), so a flag-off run is still byte-identical.
  • src/zagg/processing/worker.py — both sides added a keyword-only param to process_shard (aoi_payload=None here, profile=False on main); kept both.
  • tests/test_runner.py — took main's superset _LAMBDA_KEYS and its _invoke_lambda_cell(..., profile=profile) call site, keeping this branch's formatting.

Green locally: pytest -q → 922 passed, 8 skipped, 0 failed. ruff check --select=E,F,W,I --ignore=E501 src tests clean. The aoi_mask suite (tests/test_aoi_mask.py 33 passed; tests/test_shardmap.py::TestBuildAOIMask + flag-off byte-identity) all pass.

One reconciliation decision worth your eye: the aoi_mask bool array is now _shard(...)-wrapped (flag-ON only) so it coexists with #108's sharded on-disk layout — without this the mask array wouldn't match the sharded coordinate arrays it parallels. It's covered green by the aoi_mask suite, but it's a new interaction between this feature and #108 that didn't exist when the PR was first written, so flagging it explicitly.

Non-blocking flag: pre-existing ruff format drift in tests/test_runner.py (lines 827+) exists on main itself; left untouched per the "don't fix unrelated CI failures" rule.

This PR's four Questions for review in the description (the moc_and primitive, the boundary-inclusivity asymmetry across engines, rect-AOI densification, and Lambda follow-up) are still standing for your call. With all phases complete and conflicts cleared, marking waiting on those answers + merge; staying a draft until then.


Generated by Claude Code

@espg espg added the waiting label Jun 29, 2026
@espg

espg commented Jun 29, 2026

Copy link
Copy Markdown
Member Author

Per-shard restriction primitive — HEALPix uses moc_and(aoi_moc, [shard_word]) rather than coarsening MOC words to parent_order and matching shard_key (the latter misses shards under a coarse interior MOC word — verified). Confirm moc_and is intended.

Seems reasonable-- confirmed.

Boundary-inclusivity asymmetry across engines — the HEALPix MOC path is overlap/coverage-based (includes boundary-overlapping cells), while the rectilinear path is a strict prep().contains on the cell center (excludes a cell whose center is exactly on the boundary). Each is defensible alone (strict contains matches "keep-whole-cell-if-center-in"), but a client filtering on aoi_mask sees slightly different edge semantics per grid family. Keep as-is (documented in docs/aoi_mask.md), or align rect to covers/intersects for parity?

Document explicitly and keep as is.

Rect AOI reprojection densification — aoi_polygon reprojects the WGS84 ring with to_crs and no densification, consistent with coverage, so AOI edges become straight chords (not geodesics) in polar/large-extent CRS — edge-cell membership can be off by up to the chord-vs-arc deviation. Since the mask is the strict deliverable, densify the AOI ring before to_crs, or accept the chord approximation (and document the edge-slop bound)?

...does shapely solve this for us? Or rather, could it solve this for us? To be clear, is this something we need to think about for both healpix and rect, or is this another rect specific issue?

Lambda backend — the AWS Lambda worker (deployment/aws/lambda_handler.py, infra not touched per the repo's golden rules) does not yet thread the per-shard AOI payload to process_shard. Because emit_template would still allocate the bool aoi_mask array on the Lambda setup path, a flag-on Lambda run would emit an array no worker ever fills — an all-False (out-of-AOI) mask, silently the inverse of the feature. To prevent that footgun, a flag-on Lambda run now raises NotImplementedError; aoi_mask is supported end-to-end on the local backend this phase. Lambda support is a follow-up that needs the handler change (infra-gated — needs @espg's go-ahead).

No, we should implement that in lambda_handler.py now, as part of this PR. You are approved to edit that file and expand the PR scope to implement above

Lambda support — confirm the follow-up to thread the AOI payload through the Lambda event + deployment/aws/lambda_handler.py (infra-gated). For now a flag-on Lambda run refuses loudly.

See above; we're doing it here, not as a follow up

WKB/WKT AOI input (passing a native geometry rather than (lats, lons) rings) is a deferred future phase, per @espg (comment 4814504937): it lands "in the last phase after whatever PR at mortie lands that allows it" — blocked on upstream espg/mortie#71 (still in progress). This PR does not block on it.

I expect that espg/mortie#89 will land before this PR, so we'll roll support for WKB/WTK here too.

@espg espg removed the waiting label Jun 29, 2026

@espg espg left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Fresh-context adversarial review of the Phase-5 diff (6f49fec..2f9e1fd) — Lambda backend threading of the per-shard strict-AOI payload (issue #101). No blocking issues; Phase 5 looks correct and well-tested.

Correctness (end-to-end threading). The payload now flows cleanly: _run_lambda builds aoi_by_shard via _aoi_payload_map, _cell_work sets extra["aoi_payload"] only when the manifest carries a mask, _invoke_lambda_cell adds the "aoi_payload" event key only when non-None, and lambda_handler._handle_process reads event.get("aoi_payload") and forwards it to process_shard. This mirrors the local runner's threading.

Flag-off byte-identity — verified. process_shard defaults aoi_payload=None (worker.py:60) and the worker body is guarded by if aoi_payload is not None: (worker.py:383). So the handler passing aoi_payload=None explicitly is behaviorally identical to omitting the kwarg — no aoi_mask column allocated, no output drift. The event payload also stays key-for-key identical when off (no aoi_payload key), matching the established profile (#100) pattern. The handler-passes-None vs runner-omits asymmetry is purely stylistic.

Dead-import cleanup — verified. With the NotImplementedError refusal removed, get_aoi_mask has zero remaining references in runner.py, so dropping the import is correct and leaves no stale comment.

Tests — strong. Both directions are covered at both layers: handler-level (test_aoi_payload_threaded_to_process_shard asserts the list is forwarded; test_no_aoi_payload_passes_none asserts the absent key yields aoi_payload is None), event-construction level (test_aoi_payload_adds_event_key / test_default_event_has_no_aoi_payload_key), and runner level (test_lambda_threads_aoi_payload asserts exact per-shard payloads seen[10]==[1,2], seen[13]==[4,5] and that the no-manifest path omits the kwarg via the "OMITTED" sentinel). The old test_lambda_refuses_aoi_mask was correctly replaced rather than weakened. Assertions are exact, not loose. The remaining diff churn in test_runner.py is ruff-format reflow only.

No undiscussed dependencies. Diff adds none.

Nits (non-blocking, see inline):

  • _run_lambda/_run_local use aoi_by_shard.get(int(shard_key)) which silently yields None (→ key omitted) for a shard missing from the map even when the flag is on. Can't fire with today's _aoi_payload_map (built from zip(shard_keys, aoi_mask)), but the parallel-length invariant is load-bearing — fine to leave for @espg.

Out of scope: runner.py is now 1180 lines, over the ~1000-line soft limit in CLAUDE.md §4. It was already 1172 before this PR (Phase 5 adds only ~8 net lines), so this is pre-existing, not introduced here — flagging for awareness rather than asking this PR to fix it.


Generated by Claude Code

config=config,
chunk_results=chunk_results,
profile=profile,
aoi_payload=aoi_payload,

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Byte-identity confirmed for the flag-off path. The handler always passes aoi_payload=event.get("aoi_payload") (None when the key is absent), and process_shard's signature defaults aoi_payload=None (src/zagg/processing/worker.py:60) with the body guarded by if aoi_payload is not None: (worker.py:383). So the explicit-None call here is behaviorally identical to the runner's omit-the-kwarg style — no mask column is allocated, no output drift. The asymmetry (handler passes None explicitly, local runner omits) is purely stylistic and harmless; both collapse to the same default. No change needed.


Generated by Claude Code

Comment thread src/zagg/runner.py
# pre-feature path (issue #101). Mirrors the local runner's _cell_work.
extra = {}
if aoi_by_shard:
extra["aoi_payload"] = aoi_by_shard.get(int(shard_key))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Nit / latent edge (not blocking): when the flag is on but a shard is missing from aoi_by_shard, .get(int(shard_key)) returns None, so extra["aoi_payload"] = None is set. _invoke_lambda_cell then guards if aoi_payload is not None: and omits the event key — so that one cell silently gets no mask payload while the flag is on, rather than an empty in-AOI payload. With the current _aoi_payload_map (built from zip(shard_keys, aoi_mask)) every dispatched shard is present, so this can't fire today; but it mirrors the same .get() shape in _run_local (runner.py:495), and if the manifest's aoi_mask ever became shorter than shard_keys the mismatch would pass silently. Consider documenting the parallel-length invariant (it is noted in _aoi_payload_map's docstring) or asserting it. Fine to leave for @espg.


Generated by Claude Code

@espg espg left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Fresh-context adversarial review of the Phase 6 (docs) + Phase 7 (densify) diff only (2f9e1fd..04d1f35). No blocking issues — the change is correct, well-tested, well-documented, and dependency-free.

Verified against the installed stack (not from memory):

  • odc.geo.geom.Geometry.to_crs accepts resolution="auto" (signature Union[float, Literal['auto'], None]); "auto"sqrt(area)*4/100 ≈ 100 pts/side in the source CRS. So the densify is real, valid, and not a typo.
  • Flag-off byte-identity preserved: the densified rectilinear_aoi_polygon is reachable only via the output.aoi_mask-on path (shardmap.py hasattr guard). No off-path bytes change.
  • The new test genuinely proves geodesic-following: 103 vs 5 vertices and ~2% area delta in EPSG:3413 (threshold is 1e-4, ~200x margin — not flaky, not tautological). Suite passes locally (7 passed).
  • Boundary-inclusivity doc section matches the code exactly (HEALPix MOC inclusive; rect prepared.contains strict). No contradiction with the densify paragraph.

Verdict on the "auto" design choice (the question flagged for review): resolution="auto" is a safe, clean default — not something that needs to be escalated to a config knob in this PR. It is parameter-free, adapts to AOI extent, has no degenerate-output risk for a valid AOI, and reuses the existing to_crs densify mechanism. If anyone later wants reproducible cross-tool vertex counts or to trade precision for output size, exposing the resolution becomes a clean follow-up — but it is not a gate on landing this. Worth a one-line "Questions for review" note so @espg can opt to make it configurable, nothing more.

Nits (none blocking):

  1. Docstring + docs say densify uses "the same primitive shard_footprint uses" — same method, but shard_footprint passes a numeric max(edge)/32, not "auto". Slight overstatement; reword to "same densify-before-reproject mechanism (here via resolution=\"auto\")".
  2. The new test's chord baseline uses parts[0] only, so the multipolygon branch of rectilinear_aoi_polygon is untested; not densified.equals(chord) is redundant with the area-delta assert.

Inline comments on aoi.py:161, the test, and docs/aoi_mask.md have the specifics.


Generated by Claude Code

Comment thread src/zagg/grids/aoi.py
geom = polygon(rings[0], crs="EPSG:4326")
else:
geom = multipolygon([[r] for r in rings], crs="EPSG:4326")
return geom.to_crs(crs, resolution="auto").geom

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Densify is correct and well-scoped. Verified against the installed odc.geo: Geometry.to_crs(crs, resolution=...) does accept the literal "auto" (signature resolution: Union[float, Literal['auto'], None]), and "auto" resolves to _auto_resolution(g) = sqrt(area)*4/100 (~100 points per side) in the source CRS — here EPSG:4326, so the segmentation step is in degrees. So the change is real and safe:

  • Flag-off byte-identity holds: rectilinear_aoi_polygon is reached only via RectilinearGrid.aoi_polygon, which is only called from catalog/shardmap.py under the hasattr(grid, "aoi_polygon") mask path — i.e. only when output.aoi_mask is on. Nothing in the off path changed.
  • No degenerate/empty risk for a valid AOI; in a non-curving (or small-extent) CRS the inserted points are ~collinear, so the mask result is unchanged — an acceptable no-op.

One doc-accuracy nit (not blocking): the docstring says this is "the same primitive RectilinearGrid.shard_footprint uses." Same method (to_crs(resolution=...)), but shard_footprint passes a numeric resolution (max(edge)/32), not "auto". Consider "the same densify-before-reproject mechanism shard_footprint uses (here via resolution=\"auto\")" so the phrasing doesn't imply identical arguments. Same nit applies to docs/aoi_mask.md.


Generated by Claude Code

Comment thread tests/test_aoi_mask.py Outdated
Comment on lines +218 to +245

def test_aoi_ring_is_densified_before_reproject(self):
# issue #101: rectilinear_aoi_polygon densifies the WGS84 ring before
# to_crs (resolution="auto") so AOI edges follow the geodesic instead of
# collapsing to straight chords in a polar CRS. A 4-corner box reprojected
# WITHOUT densification keeps ~5 ring vertices; the densified path inserts
# many. In a curving CRS (EPSG:3413) the two polygons genuinely differ.
from odc.geo.geom import polygon as odc_polygon

from zagg.grids.aoi import rectilinear_aoi_polygon

grid = self._grid()
parts = _box(60.0, -40.0, 75.0, -20.0)
densified = rectilinear_aoi_polygon(parts, grid.crs)

# Same parts, reprojected as straight chords (no densification) -- the old
# behavior, built here directly as the baseline.
lats, lons = parts[0]
ring = [(float(x), float(y)) for x, y in zip(lons, lats)]
chord = odc_polygon(ring, crs="EPSG:4326").to_crs(grid.crs).geom

n_dense = len(densified.exterior.coords)
n_chord = len(chord.exterior.coords)
assert n_dense > n_chord # densification inserted edge vertices
assert n_chord <= 5 # the raw box is just its corners
# The geodesic-following polygon is not the chord polygon: in a polar CRS
# the curved edges enclose a measurably different area.
assert not densified.equals(chord)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Test is genuine, not tautological. Reproduced its assertions against the installed stack with the test's own EPSG:3413 grid and _box(60,-40,75,-20):

  • n_dense = 103 vs n_chord = 5 — densification really inserts edge vertices.
  • relative area delta = ~0.0206 (>> the 1e-4 threshold), so the geodesic-following polygon genuinely differs from the straight-chord one in a polar CRS. The vertex-count check alone could pass on a trivially-resampled straight edge, but the area-delta check is what proves the edges actually curve — together they're a meaningful proof.

The 1e-4 threshold is conservative (true value is ~200x larger), so it won't be flaky. Suite passes locally (7 passed). Two minor nits, neither blocking:

  • The chord baseline is built from parts[0] only. Fine here (single-part box), but it means the test doesn't exercise the multipolygon branch of rectilinear_aoi_polygon. Optional: a second multipart assertion would close that gap.
  • not densified.equals(chord) is redundant once the area-delta assert is in place, but harmless.

Generated by Claude Code

Comment thread docs/aoi_mask.md
Comment on lines +63 to +88
## Boundary inclusivity differs by grid family

The two engines settle a cell that *straddles* the AOI edge differently, and the
difference is **intentional and kept as-is** — each rule is the natural one for
its engine. A client filtering on `aoi_mask` should expect slightly different
edge semantics across grid families:

- **HEALPix is overlap/coverage-based (inclusive at the boundary).**
`morton_coverage_moc` builds a MOC that *covers* the AOI, so a leaf cell whose
area overlaps the AOI — even partially, including one whose center lies just
outside — is in the MOC and marked `True`. The MOC is a superset cover, so the
HEALPix mask leans **inclusive**: it never drops a cell that touches the AOI.

- **Rectilinear is a strict center test (exclusive at the boundary).** Each cell
is marked `True` only if its **center** falls inside the reprojected AOI polygon
(`prepared.contains(center_point)`). `contains` is strict — a center lying
*exactly on* the boundary returns `False` — so a cell straddling the edge whose
center is outside (or on) the polygon is **excluded**. This matches the
"keep-whole-cell-if-its-center-is-in" rule.

So for the same AOI edge, the HEALPix mask tends to **include** an edge-overlapping
cell while the rectilinear mask **excludes** one whose center is outside. Both are
defensible in isolation (the HEALPix MOC is the native, decode-free primitive; the
rect center test matches how rect cells are addressed), and the asymmetry is at
most one cell-width at the boundary. Keep this in mind when comparing strict-AOI
counts across a HEALPix and a rectilinear run of the same region.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 from Claude (review)

Boundary-inclusivity section is accurate and consistent with the code:

  • HEALPix "inclusive at the boundary" matches healpix_aoi_mocmorton_coverage_moc, which builds a superset MOC cover, expanded via moc_to_order + np.isin membership (aoi.py:115-131) — a leaf overlapping the AOI is kept.
  • Rectilinear "strict center test (exclusive)" matches rectilinear_mask_for_centers, which is prepared.contains(point) (aoi.py:177) — contains excludes a point exactly on the boundary. Correct.

No contradiction with the new densify paragraph: densify changes which polygon the center test runs against (geodesic vs chord edge), it does not change the inclusive/exclusive rule. The two sections are consistent.

The "Both backends" / WKB-deferred Notes updates (Phase 6) also read correctly and match the Phase-5 Lambda threading described. No issues here.


Generated by Claude Code

@espg

espg commented Jun 29, 2026

Copy link
Copy Markdown
Member Author

🤖 from Claude

Answering the rect-densification question (3) — "does/could shapely solve this for us? Is this rect-specific or also HEALPix?"

Short answer: yes, it's solvable cleanly and dependency-free; it's rect-specific; I've implemented it in phase 7 (commit 04d1f35) using the densification primitive we already use elsewhere.

Is it rect-specific or also HEALPix? Rect-specific. The chord-vs-arc error comes entirely from reprojecting a sparse polygon ring through to_crs — only the rectilinear path does that (rectilinear_aoi_polygongeom.to_crs(grid_crs)). The HEALPix path never reprojects a polygon: morton_coverage_moc(lats, lons, order=child_order) tessellates the native (lats, lons) ring on the sphere and returns morton cells directly. So there are no straight chords to worry about on the HEALPix side — it's already geodesic by construction.

Does shapely solve it? Either tool can: shapely.segmentize(ring, max_segment_length) would densify in WGS84-degree units before reproject. But we don't need to reach for it — odc.geo's to_crs already supports densification, and we already use it for exactly this reason: RectilinearGrid.shard_footprint reprojects the chunk extent with to_crs("EPSG:4326", resolution=densify) precisely "so curved CRS boundaries — and pole-spanning tiles — don't collapse to a degenerate polygon." The AOI path was the one reproject that didn't densify, an inconsistency with coverage/shard_footprint.

What I did (phase 7): rectilinear_aoi_polygon now reprojects with geom.to_crs(crs, resolution="auto") — odc.geo's adaptive densification, the same primitive shard_footprint uses. No new dependency (odc.geo is core). Verified it inserts edge vertices (4-corner box: 5 → ~62 ring points) and yields a measurably different (geodesic-following) polygon in a polar CRS (EPSG:3413). New test test_aoi_ring_is_densified_before_reproject pins both the vertex-count growth and the area delta vs the chord baseline. Documented in docs/aoi_mask.md.

One point left for your call: I used resolution="auto" as the default (clean, adapts to the geometry, no magic constant). The alternative is to expose the densify resolution as a config knob if you want a guaranteed edge-slop bound rather than odc.geo's adaptive choice. Happy to wire a knob if you'd prefer it; otherwise "auto" stands.

Also folded your other answers: (1) moc_and confirmed — no change. (2) boundary-inclusivity asymmetry is now documented explicitly in docs/aoi_mask.md (HEALPix MOC overlap/coverage = inclusive at the edge; rect strict center contains = exclusive), behavior kept as-is per your call. (4) Lambda backend implemented this PR (phase 5) — see the separate status comment.


Generated by Claude Code

@espg

espg commented Jun 29, 2026

Copy link
Copy Markdown
Member Author

🤖 from Claude

Status — folded your answers (comment 4834768382), landed Lambda + boundary-doc + densification this run. Marking waiting on two open design forks (Q3 knob offer, Q4 WKB/WKT defer-vs-hold). Still a draft.

Conflict resolved. Merged current main (cbd4828) into claude/101-aoi-mask (merge 6f49fec, no force-push). One conflict, tests/test_runner.py _LAMBDA_KEYS: both sides edited the same set — main (#120) added max_memory_mb; took the union in the one-key-per-line ruff-format style. Branch is no longer dirty. Full suite green after merge.

Phase 5 — AWS Lambda AOI threading (you authorized editing lambda_handler.py). Removed the flag-on NotImplementedError refusal. _run_lambda now builds the per-shard payload map and threads each shard's aoi_payload through the per-cell Lambda event (aoi_payload key, added only when the flag is on) → deployment/aws/lambda_handler.py _handle_process forwards event.get("aoi_payload") into process_shard, mirroring the local runner. Flag-off Lambda runs are byte-identical (no event key; process_shard called with its default). Tests: flag-on threading + flag-off no-op at the handler, event-construction, and runner levels. (No other infra touched; no CI/workflow change; nothing run against live AWS.)

Phase 6 — boundary-inclusivity asymmetry documented (your directive 2). New explicit section in docs/aoi_mask.md: HEALPix MOC is overlap/coverage-based (inclusive at the edge — the MOC is a superset cover), rect is a strict cell-center contains (exclusive — a center on/outside the boundary is dropped). Behavior kept as-is; the doc warns clients that strict-AOI counts can differ by up to one cell-width at the edge across grid families.

Phase 7 — densification answer + implementation (your question 3). Full answer is in the comment just above. Gist: it's rect-specific (HEALPix tessellates the native (lats, lons) ring on the sphere and never reprojects a polygon, so no chords); and yes it's cleanly solvable — odc.geo's to_crs already does resolution densification and we already use it in shard_footprint, so the AOI reproject now uses to_crs(crs, resolution="auto"). Verified geodesic-following (4-corner box: 5 → ~62 ring vertices; ~2% area delta in EPSG:3413). Test + doc added.

WKB/WKT (Q4) — deferred, blocked on espg/mortie#89. You expected espg/mortie#89 to land before this PR. As of this run it's still open / not merged (itself waiting on a Python-vs-Rust decision, with an unfinished polar-cap dissolve sub-piece), and the pinned mortie>=0.8.2 exposes no public WKB/WKT API (the only from_wkb is a mortie test helper). So I did not fake it and did not bump the pin to an unreleased mortie. The AOI stays (lats, lons) rings. Two options for you (PR-body Q4): (a) leave WKB/WKT to a follow-up once espg/mortie#89 ships — this PR lands everything else now; or (b) hold this PR until then. Took (a) by default.

Self-reviews: a fresh-context adversarial review ran after each phase push (above). No blocking findings; I folded the two diff-scoped doc/test nits from the phase-6/7 review into 2933882.

Local gates: pytest -q → 935 passed, 10 skipped, 0 failed. ruff check --select=E,F,W,I --ignore=E501 src tests clean. ruff format --check clean on touched files.

Open for you: (3) keep resolution="auto" or expose a densify-resolution knob? (4) WKB/WKT — defer (a) or hold (b)? (5) non-blocking: runner.py is ~1180 lines, just over the §4 soft limit (pre-existing — 1172 before this PR); flagging rather than splitting. Once you answer (3)/(4) I'll clear waiting and either wire the knob / WKB/WKT or mark ready for review.


Generated by Claude Code

@espg espg added waiting blocked flag for claude labels Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optional strict-AOI polygon clip for shardmap output (sub-shard clipping)

2 participants