
MOC set-algebra (union/intersection/difference) via sorted-range merge (phase 1 of #50)#52

Closed
espg wants to merge 3 commits into main from claude/50-bmoc-setops


@espg espg commented Jun 18, 2026

Owner

Refs #50

Implements the Option D decision from #50: fast boolean set-algebra over mortie's morton-index i64 covers with no new crate dependency (no moc, no cdshealpix, no second HEALPix implementation, no FITS/IVOA serialization).

What this adds

Three set operations over two mixed-order morton-cell arrays (the existing public MOC representation moc.rs operates on — signed decimal morton, order ≤ 18):

  • union(a, b) — cells covering either input (equivalent to compress_moc(concat(a, b))).
  • intersection(a, b) — cells covered by both (empty when disjoint).
  • difference(a, b) — cells of a not covered by b.

Each returns a canonical compact MOC (sorted mixed-order int64).

Rust (src_rust/src/moc.rs): pub fn union/intersection/difference(a: &[i64], b: &[i64]) -> Vec<i64>.
Python bindings (src_rust/src/lib.rs): rust_moc_union, rust_moc_intersection, rust_moc_difference (numpy i64 in/out, compute wrapped in py.allow_threads, mirroring the existing rust_moc_normalize style).
Python API (mortie/coverage.py, exported from mortie/__init__.py): moc_union, moc_intersection, moc_difference — thin wrappers mirroring compress_moc / moc_to_order.

Before, mortie could only union via Python np.union1d + compress_moc; there was no Rust intersection/difference.

Backing path taken: in-house sorted-range merge (not BMOC)

The issue allows either the healpix 0.3.2 BMOC ops if public and usable, or an in-house sorted-range merge. I inspected the crate (~/.cargo/registry/.../healpix-0.3.2/src/bmoc/) and first wired the ops onto BMOC Bmoc::or/and/minus (all public, packed-u64 cell lists, z-order merges). or and and worked and passed parity tests, but minus hangs (infinite loop) on mixed-order MOC inputs — reproduced deterministically with two normalized polygon covers (190 and 182 cells); union/intersection returned instantly, difference never terminated. That makes the BMOC path only partially usable.

Rather than ship a fragile mix (two ops on BMOC, one in-house), I implemented all three in-house as the same single-pass sorted-range linear merge that the existing moc::normalize already uses:

  1. morton_to_ranges: normalize the cover, map each cell to its half-open [start, end) range on the uniform MAX_DEPTH=18 grid, sort by range, and coalesce touching runs. Note: normalize returns morton sorted by signed value, which is not grid-range order once depths mix or the southern sign flips, so the ranges are re-sorted explicitly. This was a bug in the first cut and is now covered by tests.
  2. Linear merge of the two sorted disjoint range lists (OR / AND / subtraction), all O(n+m).
  3. ranges_to_morton — greedy largest-aligned-cell decomposition of each result range, then normalize to collapse quartets spanning adjacent ranges.
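
The linear merge in step 2 can be sketched as follows. This is a minimal Python illustration, not the PR's Rust code: the names `ranges_or`, `ranges_and`, and `ranges_minus` are hypothetical, and both inputs are assumed to be sorted, disjoint lists of half-open `(start, end)` integer ranges, as step 1 is described as producing.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) on the uniform grid


def ranges_or(a: List[Range], b: List[Range]) -> List[Range]:
    """Union: merge by start, coalescing overlapping or touching runs."""
    out: List[Range] = []
    i = j = 0
    while i < len(a) or j < len(b):
        # Consume whichever list has the smaller next start.
        if j >= len(b) or (i < len(a) and a[i][0] <= b[j][0]):
            s, e = a[i]
            i += 1
        else:
            s, e = b[j]
            j += 1
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def ranges_and(a: List[Range], b: List[Range]) -> List[Range]:
    """Intersection: emit overlaps, advancing the range that ends first."""
    out: List[Range] = []
    i = j = 0
    while i < len(a) and j < len(b):
        s = max(a[i][0], b[j][0])
        e = min(a[i][1], b[j][1])
        if s < e:
            out.append((s, e))
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return out


def ranges_minus(a: List[Range], b: List[Range]) -> List[Range]:
    """Difference a \\ b: keep the parts of each a range not covered by b."""
    out: List[Range] = []
    j = 0
    for s, e in a:
        # Drop b ranges that end at or before this a range starts.
        while j < len(b) and b[j][1] <= s:
            j += 1
        cur, k = s, j
        while k < len(b) and b[k][0] < e:
            if b[k][0] > cur:
                out.append((cur, b[k][0]))
            cur = max(cur, b[k][1])
            k += 1
        if cur < e:
            out.append((cur, e))
    return out
```

Each function walks both lists with forward-only cursors, which is where the O(n+m) bound comes from. Only `ranges_or` needs explicit coalescing; the other two cannot create touching runs that were not already separated in their inputs.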

Signed-morton handling stays at the boundary (mort2nested decode → range op → nested2mort re-encode), as the existing code does. This is still Option D: no new dependency, no second HEALPix crate.
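
The conversion on either side of the range op (steps 1 and 3) can be sketched in the same spirit. This is an assumption-laden illustration: it works on hypothetical `(depth, nested_index)` pairs and skips the signed-morton decode and re-encode (mort2nested / nested2mort) entirely; `cell_to_range` and `range_to_cells` are illustrative names, not mortie functions.

```python
from typing import List, Tuple

MAX_DEPTH = 18  # uniform grid depth named in the PR description


def cell_to_range(depth: int, index: int, max_depth: int = MAX_DEPTH) -> Tuple[int, int]:
    """Half-open [start, end) range a nested cell covers at max_depth."""
    shift = 2 * (max_depth - depth)  # each extra depth splits a cell in 4
    return index << shift, (index + 1) << shift


def range_to_cells(start: int, end: int, max_depth: int = MAX_DEPTH) -> List[Tuple[int, int]]:
    """Greedy decomposition of [start, end) into the largest aligned cells."""
    cells: List[Tuple[int, int]] = []
    while start < end:
        shift = 0
        # Grow the block by factors of 4 while it stays aligned at `start`
        # and still fits inside the remaining range.
        while shift < 2 * max_depth:
            size = 1 << (shift + 2)
            if start % size != 0 or start + size > end:
                break
            shift += 2
        cells.append((max_depth - shift // 2, start >> shift))
        start += 1 << shift
    return cells
```

For example, with `max_depth=2`, the range `[3, 9)` decomposes into one depth-2 leaf, one depth-1 cell, and another depth-2 leaf. The greedy pass alone does not merge four sibling cells that come from adjacent ranges, which is why the description follows it with a normalize pass.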

Phases

  • Phase 1: union/intersection/difference over morton covers (Rust + numpy bindings + Python wrappers + tests). Done.
  • Phase 2 (proposed, not done) — swap normalize onto a range/pack-style fast path if it's a clean win. Deferred per the issue's guidance to propose rather than do; normalize is unchanged here.

How tested (real counts)

  • cargo test (full): 151 passed, 0 failed. New moc::tests: test_union_equals_normalize_concat (bit-for-bit union == normalize(concat)), test_intersection_brute_force / test_difference_brute_force (vs leaf-set expansion at the deepest order, the pattern normalize's reference test uses), test_mixed_order_inputs, test_setops_empty_inputs, test_setops_both_hemispheres, and test_setops_match_brute_force_random (200 random mixed-order cover pairs, all three ops checked against the densified leaf-set ground truth + union == normalize(concat)).
  • cargo bench --no-run: clean (Finished bench profile, 5m38s). No bench call sites changed (no bench references moc::).
  • cargo clippy --lib: moc.rs clean. lib.rs shows only the pre-existing tree-wide useless_conversion warning (the ?-on-to_vec() idiom every binding in the file already uses); my additions mirror it — see Questions for review.
  • cargo fmt --check: moc.rs and my lib.rs additions clean. Pre-existing fmt drift in unrelated files (benches/*, buffer.rs, cell_geom.rs, coverage/tests.rs, decimal_morton.rs, sphere.rs, and earlier lib.rs lines) left untouched — see Questions for review.
  • flake8 mortie --select=E9,F63,F7,F82: clean (exit 0).
  • pytest -v (full suite): 259 passed, 8 skipped, 2 warnings (warnings pre-existing: a pytest class-fixture deprecation). New Python TestMOCSetOps (13 tests): union==compress(concat) bit-for-bit, brute-force intersection/difference vs flat sets via moc_to_order, disjoint covers, self-difference empty, mixed-order, empty inputs, southern hemisphere.
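
The brute-force pattern the tests rely on (expand both covers to leaf sets at the deepest order, then compare against plain set operations) can be sketched like this. It is a simplified stand-in, not the PR's test code: cells are hypothetical `(depth, nested_index)` pairs rather than signed morton values, and `densify` and `matches_oracle` are illustrative names.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # (depth, nested index), stand-in for a decoded morton cell


def densify(cells: Iterable[Cell], max_depth: int) -> Set[int]:
    """Expand mixed-order cells into the set of leaf indices at max_depth."""
    leaves: Set[int] = set()
    for depth, index in cells:
        shift = 2 * (max_depth - depth)
        leaves.update(range(index << shift, (index + 1) << shift))
    return leaves


def matches_oracle(result: Iterable[Cell], a: Iterable[Cell], b: Iterable[Cell],
                   op: str, max_depth: int) -> bool:
    """True when `result` covers exactly the leaves the set operation should."""
    la, lb = densify(a, max_depth), densify(b, max_depth)
    expected = {"or": la | lb, "and": la & lb, "minus": la - lb}[op]
    return densify(result, max_depth) == expected
```

An oracle of this kind only checks coverage, not that the result is in canonical compact form, which is presumably why the suite also compares union bit-for-bit against normalize(concat). Leaf expansion grows as 4^depth, so it is only practical for small covers at shallow test depths.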

Questions for review

  1. Public API signatures — Rust union/intersection/difference(a: &[i64], b: &[i64]) -> Vec<i64>; Python moc_union(a, b), moc_intersection(a, b), moc_difference(a, b) returning sorted compact int64. Naming mirrors compress_moc/moc_to_order. Happy to rename (e.g. moc_or/moc_and/moc_minus) if you prefer the BMOC verbs.
  2. Backing path — I chose the in-house range merge over BMOC because the crate's minus hangs on mixed-order input. If you'd rather use BMOC or/and for those two and only do minus in-house, say so — but I think one uniform code path is more maintainable. (No not/xor exposed; not requested.)
  3. Signed-morton edge cases — ops never produce order-0 cells from order≥1 inputs (range decomposition can't merge across base-cell boundaries since those align to 4^18); both hemispheres covered by tests. The southern sign flip is why morton_to_ranges re-sorts by grid range rather than trusting normalize's morton order.
  4. Pre-existing lint/fmt drift — left the tree-wide useless_conversion clippy warnings and rustfmt drift in files I didn't change, per "don't fix unrelated CI noise." Flagging in case you want a separate cleanup pass.

Generated by Claude Code

@espg espg added the implement label Jun 18, 2026
Comment thread mortie/__init__.py

Contributor


⚠️ [ruff] <I001> reported by reviewdog 🐶
Import block is un-sorted or un-formatted

from .tools import (
    order2res,
    res2display,
    unique2parent,
    heal_norm,
    VaexNorm2Mort,
    fastNorm2Mort,
    geo2uniq,
    clip2order,
    geo2mort,
    # Inverse functions
    infer_order_from_morton,
    validate_morton,
    mort2norm,
    norm2uniq,
    uniq2geo,
    mort2geo,
    mort2bbox,
    mort2polygon,
    generate_morton_children,
    mort2healpix,
    morton_buffer,
    morton_buffer_meters,
)
# Import coverage functions
from .coverage import (
    morton_coverage,
    morton_coverage_moc,
    compress_moc,
    moc_to_order,
    moc_union,
    moc_intersection,
    moc_difference,
)
from .linestring import linestring_coverage
# Import prefix trie functions
from .prefix_trie import (
    MortonChild,
    split_children,
    split_children_geo,
    geo_morton_polygon,
    morton_polygon,
    morton_polygon_from_array,
)

Owner Author


🤖 from Claude

Leaving this I001 as-is (not a wholesale isort reformat), for three reasons:

  1. mortie's lint contract is flake8, not ruff/isort (CLAUDE.md §7: "There is no black/ruff/mypy configured"). The gating ruff/flake8 CI job passes; reviewdog's isort rule is advisory and not part of the project's style authority.
  2. This PR didn't introduce the disorder. The __init__.py import region uses intentional logical grouping with section comments (# Import coverage functions, # Inverse functions, etc.), not alphabetical isort order, and that predates this PR tree-wide. #52 only appends moc_union/moc_intersection/moc_difference to the existing .coverage block in their natural (union → intersection → difference) order.
  3. Satisfying I001 would require reordering the whole import section (modules and names), churning unrelated lines and breaking the logical grouping/comments — out of scope for a set-algebra PR.

Flagging rather than silencing — happy to do a dedicated import-sort pass tree-wide if you want one (it'd pair naturally with the queued fmt/clippy sweep). The one real review finding (an unused_mut on the new test closure) is fixed in a902afe.


Generated by Claude Code


codecov Bot commented Jun 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.71%. Comparing base (3d5a78f) to head (e0492eb).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   91.95%   92.71%   +0.75%     
==========================================
  Files          19       23       +4     
  Lines        2636     3268     +632     
==========================================
+ Hits         2424     3030     +606     
- Misses        212      238      +26     
| Flag | Coverage Δ |
|---|---|
| unittests | 92.71% <100.00%> (+0.75%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| mortie/__init__.py | 85.00% <ø> (+5.00%) ⬆️ |
| mortie/coverage.py | 96.73% <100.00%> (+0.63%) ⬆️ |
| mortie/tests/test_coverage.py | 98.79% <100.00%> (+0.27%) ⬆️ |

... and 4 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

codspeed-hq Bot commented Jun 18, 2026

Contributor

Merging this PR will degrade performance by 18.29%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.


❌ 1 regressed benchmark
✅ 66 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| coverage_triangle[4] | 110.1 µs | 134.7 µs | -18.29% |



Comparing claude/50-bmoc-setops (e0492eb) with main (5f9e90b)



espg commented Jun 18, 2026

Owner Author

Rust union/intersection/difference(a: &[i64], b: &[i64]) -> Vec<i64>; Python moc_union(a, b), moc_intersection(a, b), moc_difference(a, b) returning sorted compact int64. Naming mirrors compress_moc/moc_to_order. Happy to rename (e.g. moc_or/moc_and/moc_minus) if you prefer the BMOC verbs.

moc_or / moc_and / moc_minus is terser, which I like. Use these, but document in the docstring the original equivalency names.

Backing path — I chose the in-house range merge over BMOC because the crate's minus hangs on mixed-order input. If you'd rather use BMOC or/and for those two and only do minus in-house, say so — but I think one uniform code path is more maintainable. (No not/xor exposed; not requested.)

Yes, pursue this. I'd like another sub-agent to track down what the issue is in the upstream code with minus -- that said, regardless of what that finds, having this in house sounds great, so proceed.

Signed-morton edge cases — ops never produce order-0 cells from order≥1 inputs (range decomposition can't merge across base-cell boundaries since those align to 4^18); both hemispheres covered by tests. The southern sign flip is why morton_to_ranges re-sorts by grid range rather than trusting normalize's morton order.

...what's the question here? Keep in mind we're going to standardize on unsigned morton indices going forward, which will also standardize on support for the full order range 0 to 29, rather than truncating at order 18.

Pre-existing lint/fmt drift — left the tree-wide useless_conversion clippy warnings and rustfmt drift in files I didn't change, per "don't fix unrelated CI noise." Flagging in case you want a separate cleanup pass.

I think this is being covered by issue #34 now (don't worry about it here).

Comment thread mortie/__init__.py

Contributor


⚠️ [ruff] <I001> reported by reviewdog 🐶
Import block is un-sorted or un-formatted

from .tools import (
    order2res,
    res2display,
    unique2parent,
    heal_norm,
    VaexNorm2Mort,
    fastNorm2Mort,
    geo2uniq,
    clip2order,
    geo2mort,
    # Inverse functions
    infer_order_from_morton,
    validate_morton,
    mort2norm,
    norm2uniq,
    uniq2geo,
    mort2geo,
    mort2bbox,
    mort2polygon,
    generate_morton_children,
    mort2healpix,
    morton_buffer,
    morton_buffer_meters,
)
# Import coverage functions
from .coverage import (
    morton_coverage,
    morton_coverage_moc,
    compress_moc,
    moc_to_order,
    moc_or,
    moc_and,
    moc_minus,
)
from .linestring import linestring_coverage
# Import prefix trie functions
from .prefix_trie import (
    MortonChild,
    split_children,
    split_children_geo,
    geo_morton_polygon,
    morton_polygon,
    morton_polygon_from_array,
)


espg commented Jun 18, 2026

Owner Author

🤖 from Claude

Phase 2 — rename the public Python API to the terser moc_or / moc_and / moc_minus (per the directive in #52 (comment)), pushed as phase 2 of issue #50 (commit e0492eb).

What changed (Python only):

  • mortie/coverage.py: moc_union → moc_or, moc_intersection → moc_and, moc_difference → moc_minus.
  • mortie/__init__.py: imports and __all__ updated to the new names.
  • mortie/tests/test_coverage.py: all TestMOCSetOps call sites and the one inline comment updated to the new names.

Rust left untouched (deliberate, to minimize churn/risk): the internal moc::union/intersection/difference and the PyO3 binding names rust_moc_union/rust_moc_intersection/rust_moc_difference are unchanged — only the public mortie.* surface renamed. The thin Python wrappers still call _rustie.rust_moc_* as before. No Rust rebuild was needed; the backing in-house range-merge algorithm is unchanged.

Docstring equivalency notes added to each renamed function, as requested:

  • moc_or — "set union of the two covers", and it retains the existing line noting it equals compress_moc(concatenate([a, b])); also "Originally named moc_union".
  • moc_and — "set intersection of the two covers"; "Originally named moc_intersection".
  • moc_minus — "set difference a \ b (the part of a not covered by b)"; "Originally named moc_difference".
  • See Also cross-references in all three updated to the new names.

Test counts (real):

  • pytest mortie/tests/test_coverage.py -q: 62 passed, 1 skipped.
  • pytest -q (full suite): 259 passed, 8 skipped, 2 warnings (warnings pre-existing).
  • flake8 mortie --select=E9,F63,F7,F82: clean. Style pass (--max-line-length=88) on my changed lines (581–665 of test_coverage.py) is clean; remaining E128/E501 hits are pre-existing in unrelated lines/files and left untouched.

Tree-wide grep confirms no remaining moc_union/moc_intersection/moc_difference references except the intentional rust_moc_* binding names and the "Originally named ..." docstring notes. No docs/README/notebook referenced the old names. PR stays draft with implement.


Generated by Claude Code

@espg espg left a comment

Owner Author


🤖 from Claude (review)

Fresh-context adversarial review of the phase 2 commit (e0492eb, "phase 2 of issue #50") — the pure rename of the public MOC set-op API. No actionable defects found; the rename is correct and complete.

1. Completeness — pass. Tree-wide grep for moc_union|moc_intersection|moc_difference leaves only the intentional artifacts:

  • src_rust/src/lib.rs (rust_moc_union/intersection/difference PyO3 bindings + the add_function registrations) — deliberately kept per the task to minimize Rust churn.
  • The three Originally named ... docstring notes in mortie/coverage.py.
  • The three _rustie.rust_moc_*(...) calls inside the wrappers.

Public surface fully renamed: coverage.py defs, __init__.py imports + __all__, and all test_coverage.py call sites. No stray references in README*, docs/, notebooks, .md/.rst files.

2. Directive compliance — pass. Each renamed function documents its original equivalency name and keeps the terse surrounding numpydoc style:

  • moc_or (coverage.py:268): "Originally named moc_union; this is the set union of the two covers."
  • moc_and (coverage.py:298): "Originally named moc_intersection; ..."
  • moc_minus (coverage.py:329): "Originally named moc_difference; ... a \\ b ..."

All three See Also cross-refs updated consistently to the new names (coverage.py:282, 313, 344).

3. Correctness — pass. No accidental binding swap. Each wrapper calls its matching Rust binding:

  • moc_or → rust_moc_union (coverage.py:288)
  • moc_and → rust_moc_intersection (coverage.py:319)
  • moc_minus → rust_moc_difference (coverage.py:350)

4. Tests — pass. All TestMOCSetOps call sites exercise the new names; the stale comment at test_coverage.py:581 was updated to moc_or. No assertions weakened or removed — the rename is mechanical and behavior coverage (union==compress-concat, intersection/difference brute-force, disjoint, self-difference, mixed-order, empty-input, southern hemisphere) is unchanged.

5. Conventions — pass. Commit message is title-only (phase 2 of issue #50, §3); the explanatory note lives in docstrings/PR, not the commit body. No authorship credit in the message. All changed lines in coverage.py, __init__.py, and test_coverage.py are within the 88-char style limit.

Note (out of scope, not a defect): the Rust binding names still read rust_moc_union/intersection/difference while the Python surface is now or/and/minus. This is the explicit task decision to minimize churn, so flagging only for awareness — no change requested.

LGTM. Recommend proceeding.


Generated by Claude Code


espg commented Jun 18, 2026

Owner Author

Phase 2 (proposed, not done) — swap normalize onto a range/pack-style fast path if it's a clean win. Deferred per the issue's guidance to propose rather than do; normalize is unchanged here.

Sounds reasonable.

This PR is blocked while we test a (cleaner) version that fixes the upstream hang instead of reimplementing Bmoc; the new PR will inherit design decisions from here.

@espg espg added the blocked flag for claude label Jun 18, 2026

espg commented Jun 18, 2026

Owner Author

superseded by #53
