refactor(reconstruct): ReconstructionRequest + restructure _get_haplotypes (PR5)#188
Merged
Merged
Conversation
…types Introduce a `ReconstructionRequest` frozen dataclass holding the per-batch prep state (geno_offset_idx, regions, shifts, out_offsets, diffs, hap_lengths, keep, keep_offsets, splice_plan). Extract `Haps._prepare_request` out of `get_haps_and_shifts` to compute it. Replace `Haps._get_haplotypes` (2 @overload stubs + 1 implementation spanning ~167 lines) with two named methods on `Haps`: - `_reconstruct_haplotypes(req) -> Ragged[bytes_]` - `_reconstruct_annotated_haplotypes(req) -> (haps, var_idxs, positions)` The shared splice-plan permutation prep is extracted into `_permute_request_for_splice(req)`. `get_haps_and_shifts` becomes a two-stage operation: prep → dispatch on `self.kind`. The 7-tuple return shape is preserved so `HapsTracks.__call__` consumers stay unchanged. Decoupling win: a future variant-major reconstructor can construct a `ReconstructionRequest` directly (without going through region-major `_prepare_request`) and invoke the kernel-facing methods. This PR also folds in a spec audit that disproved two of the original PR5 premises: 1. The "haplotype-reconstruction kernel" the spec wanted extracted is already a pure numba function at `_dataset/_genotypes.py: reconstruct_haplotypes_from_sparse`, taking raw arrays with no `Haps`/`Dataset` coupling. 2. `Tracks.write_transformed_track` raises `NotImplementedError` unconditionally on line 1189; the 145 lines below are unreachable dead code, with zero insertion-fill branching. Carved out as a separate small follow-up (PR5c) to delete the dead body. The campaign spec is updated inline to document these findings. Behavior preserving; kernel calls unchanged; full test suite (488 pytest + cargo) green; pyrefly + ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ReconstructionRequest(frozen dataclass) holding the per-batch prep state.Haps._prepare_requestout ofget_haps_and_shiftsto compute it.Haps._get_haplotypes(2@overloadstubs + 1 impl, ~167 lines) with two named methods:_reconstruct_haplotypes(req)and_reconstruct_annotated_haplotypes(req). Overload triplet eliminated._permute_request_for_splice(req).get_haps_and_shiftsbecomes a two-stage op (_prepare_request→ dispatch onself.kind). 7-tuple return shape preserved soHapsTracks.__call__consumers stay unchanged.Decoupling win: a future variant-major reconstructor can construct a
ReconstructionRequestdirectly and invoke the kernel-facing methods, without going through region-major_prepare_request.Spec audit folded in
Before implementing, I audited the original PR5 spec premises against the code and disproved two:
_dataset/_genotypes.py:reconstruct_haplotypes_from_sparseis a pure numba function taking raw arrays. A future variant-major class can call it directly.Tracks.write_transformed_trackis 166 lines of insertion-fill branching" — wrong. Line 1189 unconditionally raisesNotImplementedError; the 145 lines below are unreachable dead code with zero insertion-fill branching. Carved out as PR5c (next, small) — delete the dead body.Spec at
docs/superpowers/specs/2026-05-23-refactor-campaign-design.mdis updated inline to reflect these findings.Numerical parity
The kernel calls (
reconstruct_haplotypes_from_sparse) are unchanged — same args, same library call. The change is a reorganization of the Python wrapper. Parity is verified by the existing test suite (488 pytest + cargo). The original spec's "medium parity risk" assessment turned out to be overstated.Test plan
pixi run -e dev test(488 pytest + cargo) — all passpixi run -e dev ruff check python/— cleanpixi run -e dev pyrefly check— 0 errors (baseline unchanged)Part of the refactor campaign tracked in
docs/superpowers/specs/2026-05-23-refactor-campaign-design.md.🤖 Generated with Claude Code