Skip to content

refactor(reconstruct): ReconstructionRequest + restructure _get_haplotypes (PR5)#188

Merged
d-laub merged 2 commits into
mainfrom
refactor/pr5-reconstruction-request
May 24, 2026
Merged

refactor(reconstruct): ReconstructionRequest + restructure _get_haplotypes (PR5)#188
d-laub merged 2 commits into
mainfrom
refactor/pr5-reconstruction-request

Conversation

@d-laub
Copy link
Copy Markdown
Collaborator

@d-laub d-laub commented May 24, 2026

Summary

  • Introduce ReconstructionRequest (frozen dataclass) holding the per-batch prep state.
  • Extract Haps._prepare_request out of get_haps_and_shifts to compute it.
  • Replace Haps._get_haplotypes (2 @overload stubs + 1 impl, ~167 lines) with two named methods: _reconstruct_haplotypes(req) and _reconstruct_annotated_haplotypes(req). Overload triplet eliminated.
  • Shared splice-plan permutation prep is extracted into _permute_request_for_splice(req).
  • get_haps_and_shifts becomes a two-stage op (_prepare_request → dispatch on self.kind). 7-tuple return shape preserved so HapsTracks.__call__ consumers stay unchanged.

Decoupling win: a future variant-major reconstructor can construct a ReconstructionRequest directly and invoke the kernel-facing methods, without going through region-major _prepare_request.

Spec audit folded in

Before implementing, I audited the original PR5 spec premises against the code and disproved two:

  1. "Extract the haplotype kernel" — already done. _dataset/_genotypes.py:reconstruct_haplotypes_from_sparse is a pure numba function taking raw arrays. A future variant-major class can call it directly.
  2. "Tracks.write_transformed_track is 166 lines of insertion-fill branching" — wrong. Line 1189 unconditionally raises NotImplementedError; the 145 lines below are unreachable dead code with zero insertion-fill branching. Carved out as PR5c (next, small) — delete the dead body.

Spec at docs/superpowers/specs/2026-05-23-refactor-campaign-design.md is updated inline to reflect these findings.

Numerical parity

The kernel calls (reconstruct_haplotypes_from_sparse) are unchanged — same args, same library call. The change is a reorganization of the Python wrapper. Parity is verified by the existing test suite (488 pytest + cargo). The original spec's "medium parity risk" assessment turned out to be overstated.

Test plan

  • pixi run -e dev test (488 pytest + cargo) — all pass
  • pixi run -e dev ruff check python/ — clean
  • pixi run -e dev pyrefly check — 0 errors (baseline unchanged)

Part of the refactor campaign tracked in docs/superpowers/specs/2026-05-23-refactor-campaign-design.md.

🤖 Generated with Claude Code

d-laub and others added 2 commits May 23, 2026 20:51
…types

Introduce a `ReconstructionRequest` frozen dataclass holding the per-batch
prep state (geno_offset_idx, regions, shifts, out_offsets, diffs,
hap_lengths, keep, keep_offsets, splice_plan). Extract `Haps._prepare_request`
out of `get_haps_and_shifts` to compute it.

Replace `Haps._get_haplotypes` (2 @overload stubs + 1 implementation
spanning ~167 lines) with two named methods on `Haps`:

- `_reconstruct_haplotypes(req) -> Ragged[bytes_]`
- `_reconstruct_annotated_haplotypes(req) -> (haps, var_idxs, positions)`

The shared splice-plan permutation prep is extracted into
`_permute_request_for_splice(req)`. `get_haps_and_shifts` becomes a
two-stage operation: prep → dispatch on `self.kind`. The 7-tuple return
shape is preserved so `HapsTracks.__call__` consumers stay unchanged.

Decoupling win: a future variant-major reconstructor can construct a
`ReconstructionRequest` directly (without going through region-major
`_prepare_request`) and invoke the kernel-facing methods.

This PR also folds in a spec audit that disproved two of the original
PR5 premises:

1. The "haplotype-reconstruction kernel" the spec wanted extracted is
   already a pure numba function at `_dataset/_genotypes.py:
   reconstruct_haplotypes_from_sparse`, taking raw arrays with no
   `Haps`/`Dataset` coupling.

2. `Tracks.write_transformed_track` raises `NotImplementedError`
   unconditionally on line 1189; the 145 lines below are unreachable
   dead code, with zero insertion-fill branching. Carved out as a
   separate small follow-up (PR5c) to delete the dead body.

The campaign spec is updated inline to document these findings.

Behavior preserving; kernel calls unchanged; full test suite
(488 pytest + cargo) green; pyrefly + ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@d-laub d-laub merged commit 0d36944 into main May 24, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant