MOC set-algebra (union/intersection/difference) via sorted-range merge (phase 1 of #50)#52
MOC set-algebra (union/intersection/difference) via sorted-range merge (phase 1 of #50)#52espg wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
🤖 from Claude
Leaving this I001 as-is (not a wholesale isort reformat), for three reasons:
- mortie's lint contract is flake8, not ruff/isort (CLAUDE.md §7: "There is no
black/ruff/mypyconfigured"). The gatingruff/flake8 CI job passes; reviewdog's isort rule is advisory and not part of the project's style authority. - This PR didn't introduce the disorder. The
__init__.pyimport region uses intentional logical grouping with section comments (# Import coverage functions,# Inverse functions, etc.), not alphabetical isort order — that predates this PR tree-wide. MOC set-algebra (union/intersection/difference) via sorted-range merge (phase 1 of #50) #52 only appendsmoc_union/moc_intersection/moc_differenceto the existing.coverageblock in their natural (union → intersection → difference) order. - Satisfying
I001would require reordering the whole import section (modules and names), churning unrelated lines and breaking the logical grouping/comments — out of scope for a set-algebra PR.
Flagging rather than silencing — happy to do a dedicated import-sort pass tree-wide if you want one (it'd pair naturally with the queued fmt/clippy sweep). The one real review finding (an unused_mut on the new test closure) is fixed in a902afe.
Generated by Claude Code
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #52 +/- ##
==========================================
+ Coverage 91.95% 92.71% +0.75%
==========================================
Files 19 23 +4
Lines 2636 3268 +632
==========================================
+ Hits 2424 3030 +606
- Misses 212 238 +26
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 4 files with indirect coverage changes Continue to review full report in Codecov by Harness.
🚀 New features to boost your workflow:
|
Merging this PR will degrade performance by 18.29%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ❌ | coverage_triangle[4] |
110.1 µs | 134.7 µs | -18.29% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing claude/50-bmoc-setops (e0492eb) with main (5f9e90b)
moc_or / moc_and / moc_minus is terser, which I like. Use these, but document in the docstring the original equivalency names.
Yes, pursue this. I'd like another sub-agent to track down what the issue is the upstream code with minus -- that said, regardless of what that finds, having this in house sounds great, so proceed.
...whats the question here? Keep in mind we're going to standardize on unsigned morton indices going forward, which will also standardize on support for full order range 0 to 29, rather than truncated at order 18.
I think this is being covered by issue #34 now (don't worry about it here). |
|
🤖 from Claude Phase 2 — rename the public Python API to the terser What changed (Python only):
Rust left untouched (deliberate, to minimize churn/risk): the internal Docstring equivalency notes added to each renamed function, as requested:
Test counts (real):
Tree-wide grep confirms no remaining Generated by Claude Code |
espg
left a comment
There was a problem hiding this comment.
🤖 from Claude (review)
Fresh-context adversarial review of the phase 2 commit (e0492eb, "phase 2 of issue #50") — the pure rename of the public MOC set-op API. No actionable defects found; the rename is correct and complete.
1. Completeness — pass. Tree-wide grep for moc_union|moc_intersection|moc_difference leaves only the intentional artifacts:
src_rust/src/lib.rs(rust_moc_union/intersection/differencePyO3 bindings + theadd_functionregistrations) — deliberately kept per the task to minimize Rust churn.- The three
Originally named ...docstring notes inmortie/coverage.py. - The three
_rustie.rust_moc_*(...)calls inside the wrappers.
Public surface fully renamed: coverage.py defs, __init__.py imports + __all__, and all test_coverage.py call sites. No stray references in README*, docs/, notebooks, .md/.rst files.
2. Directive compliance — pass. Each renamed function documents its original equivalency name and keeps the terse surrounding numpydoc style:
moc_or(coverage.py:268): "Originally namedmoc_union; this is the set union of the two covers."moc_and(coverage.py:298): "Originally namedmoc_intersection; ..."moc_minus(coverage.py:329): "Originally namedmoc_difference; ...a \\ b..."
All three See Also cross-refs updated consistently to the new names (coverage.py:282, 313, 344).
3. Correctness — pass. No accidental binding swap. Each wrapper calls its matching Rust binding:
moc_or→rust_moc_union(coverage.py:288)moc_and→rust_moc_intersection(coverage.py:319)moc_minus→rust_moc_difference(coverage.py:350)
4. Tests — pass. All TestMOCSetOps call sites exercise the new names; the stale comment at test_coverage.py:581 was updated to moc_or. No assertions weakened or removed — the rename is mechanical and behavior coverage (union==compress-concat, intersection/difference brute-force, disjoint, self-difference, mixed-order, empty-input, southern hemisphere) is unchanged.
5. Conventions — pass. Commit message is title-only (phase 2 of issue #50, §3); the explanatory note lives in docstrings/PR, not the commit body. No authorship credit in the message. All changed lines in coverage.py, __init__.py, and test_coverage.py are within the 88-char style limit.
Note (out of scope, not a defect): the Rust binding names still read rust_moc_union/intersection/difference while the Python surface is now or/and/minus. This is the explicit task decision to minimize churn, so flagging only for awareness — no change requested.
LGTM. Recommend proceeding.
Generated by Claude Code
Sounds reasonable. This PR is blocked while we test (cleaner) version that fixed the upstream hang instead of reimplementing Bmoc; new PR will inherit design decisions from here |
|
superseded by #53 |
Refs #50Implements the Option D decision from #50: fast boolean set-algebra over mortie's morton-index
i64covers with no new crate dependency (nomoc, nocdshealpix, no second HEALPix implementation, no FITS/IVOA serialization).What this adds
Three set operations over two mixed-order morton-cell arrays (the existing public MOC representation
moc.rsoperates on — signed decimal morton, order ≤ 18):union(a, b)— cells covering either input (equivalent tocompress_moc(concat(a, b))).intersection(a, b)— cells covered by both (empty when disjoint).difference(a, b)— cells ofanot covered byb.Each returns a canonical compact MOC (sorted mixed-order
int64).Rust (
src_rust/src/moc.rs):pub fn union/intersection/difference(a: &[i64], b: &[i64]) -> Vec<i64>.Python bindings (
src_rust/src/lib.rs):rust_moc_union,rust_moc_intersection,rust_moc_difference(numpyi64in/out, compute wrapped inpy.allow_threads, mirroring the existingrust_moc_normalizestyle).Python API (
mortie/coverage.py, exported frommortie/__init__.py):moc_union,moc_intersection,moc_difference— thin wrappers mirroringcompress_moc/moc_to_order.Before, mortie could only union via Python
np.union1d+compress_moc; there was no Rust intersection/difference.Backing path taken: in-house sorted-range merge (not BMOC)
The issue allows either the
healpix 0.3.2BMOC ops if public and usable, or an in-house sorted-range merge. I inspected the crate (~/.cargo/registry/.../healpix-0.3.2/src/bmoc/) and first wired the ops onto BMOCBmoc::or/and/minus(all public, packed-u64cell lists, z-order merges).orandandworked and passed parity tests, butminushangs (infinite loop) on mixed-order MOC inputs — reproduced deterministically with two normalized polygon covers (190 and 182 cells);union/intersectionreturned instantly,differencenever terminated. That makes the BMOC path only partially usable.Rather than ship a fragile mix (two ops on BMOC, one in-house), I implemented all three in-house as the same single-pass sorted-range linear merge that the existing
moc::normalizealready uses:morton_to_ranges—normalizethe cover, map each cell to its half-open[start, end)range on the uniformMAX_DEPTH=18grid, sort by range, coalesce touching runs (note:normalizereturns morton sorted by signed value, which is not grid-range order once depths mix or the southern sign flips, so the ranges are re-sorted explicitly — this was a bug in the first cut, now covered by tests).ranges_to_morton— greedy largest-aligned-cell decomposition of each result range, thennormalizeto collapse quartets spanning adjacent ranges.Signed-morton handling stays at the boundary (
mort2nesteddecode → range op →nested2mortre-encode), as the existing code does. This is still Option D: no new dependency, no second HEALPix crate.Phases
union/intersection/differenceover morton covers (Rust + numpy bindings + Python wrappers + tests). Done.normalizeonto a range/pack-style fast path if it's a clean win. Deferred per the issue's guidance to propose rather than do;normalizeis unchanged here.How tested (real counts)
cargo test(full): 151 passed, 0 failed. Newmoc::tests:test_union_equals_normalize_concat(bit-for-bitunion == normalize(concat)),test_intersection_brute_force/test_difference_brute_force(vs leaf-set expansion at the deepest order, the patternnormalize's reference test uses),test_mixed_order_inputs,test_setops_empty_inputs,test_setops_both_hemispheres, andtest_setops_match_brute_force_random(200 random mixed-order cover pairs, all three ops checked against the densified leaf-set ground truth +union == normalize(concat)).cargo bench --no-run: clean (Finished bench profile, 5m38s). No bench call sites changed (no bench referencesmoc::).cargo clippy --lib: moc.rs clean. lib.rs shows only the pre-existing tree-wideuseless_conversionwarning (the?-on-to_vec()idiom every binding in the file already uses); my additions mirror it — see Questions for review.cargo fmt --check: moc.rs and my lib.rs additions clean. Pre-existing fmt drift in unrelated files (benches/*,buffer.rs,cell_geom.rs,coverage/tests.rs,decimal_morton.rs,sphere.rs, and earlier lib.rs lines) left untouched — see Questions for review.flake8 mortie --select=E9,F63,F7,F82: clean (exit 0).pytest -v(full suite): 259 passed, 8 skipped, 2 warnings (warnings pre-existing: a pytest class-fixture deprecation). New PythonTestMOCSetOps(13 tests): union==compress(concat) bit-for-bit, brute-force intersection/difference vs flat sets viamoc_to_order, disjoint covers, self-difference empty, mixed-order, empty inputs, southern hemisphere.Questions for review
union/intersection/difference(a: &[i64], b: &[i64]) -> Vec<i64>; Pythonmoc_union(a, b),moc_intersection(a, b),moc_difference(a, b)returning sorted compactint64. Naming mirrorscompress_moc/moc_to_order. Happy to rename (e.g.moc_or/moc_and/moc_minus) if you prefer the BMOC verbs.minushangs on mixed-order input. If you'd rather use BMOCor/andfor those two and only dominusin-house, say so — but I think one uniform code path is more maintainable. (Nonot/xorexposed; not requested.)4^18); both hemispheres covered by tests. The southern sign flip is whymorton_to_rangesre-sorts by grid range rather than trustingnormalize's morton order.useless_conversionclippy warnings andrustfmtdrift in files I didn't change, per "don't fix unrelated CI noise." Flagging in case you want a separate cleanup pass.Generated by Claude Code