Phase 3 close-out: main merge, missing-kernel ports, fused annot/splice haps, seqpro 0.20#246
Merged
Merged
Conversation
Kernel-clip both backends (numba + rust) so intervals starting before the per-read jittered query window paint correctly. Rejects the two upstream fixes in the issue (write-clip breaks left-jitter; read-side expanded-query is a redesign). Adds oracle test + widens parity strategy to cover negative interval starts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tracks + max_jitter>0 store intervals against a jitter-expanded window, but the read path queries the original chromStart, so left-edge intervals have start < query_start. numba wrapped the negative index (silent wrong output); rust hit debug_assert / bounds panic. Both kernels now clip to the query window (s=max(start,0), e=min(end,length)), which is correct and jitter-preserving. Adds cross-backend oracle tests + rust unit tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…242) Widen the hypothesis strategy so the first interval may start before the query (negative relative start), exercising the case that previously violated the kernel contract. numba↔rust parity holds. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The module doc-comment described end-clamping but not the new left-clamp (s = max(start, 0)) added for sub-query interval starts. Sync it with the inline comment so the documented semantics match the code. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…k-jitter-clip fix(intervals): clip sub-query interval starts in both kernels (#242)
When a Dataset has been sample-subsetted (non-identity sample_subset_idxs) AND output is spliced, SpliceIndexer.parse_idx applied the sample-subset map twice: 1. Via self.dsi._s_idx[s_idx] within parse_idx, which maps output-space sample positions to on-disk positions. 2. Again via self.dsi.parse_idx((r_idx, s_idx)) at the end, which applies _s_idx a second time to what are already on-disk positions. With the MMRF consensus dataset this caused sample MMRF_2702 (svar pos 54, GVL sorted 625) to receive MMRF_1395's NRAS G12D mutation because sample_subset_idxs[625] mapped to MMRF_1395. Fix: after the unravel/exon-expand step, r_idx and s_idx already hold full-dataset positions. Compute the flat storage index directly via ravel_multi_index using full_region_idxs, bypassing the second _s_idx application that parse_idx would otherwise inject. Adds regression test test_splice_indexer_subset.py that constructs a SpliceIndexer with sample_subset_idxs=[2,4] over 5 on-disk samples, verifies all three index paths (slice×slice, scalar×scalar, no-subset) return the correct on-disk sample positions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ramble fix: SpliceIndexer double-applies sample-subset map (spliced+subset sample scramble)
…pro 0.20 Design for: merge origin/main (#242/#244 clip fix + #243 splice-subset fix) into the branch, lift the now-obsolete #242 xfails, port Reference.fetch to rust, fuse the annotated/splice haps paths, bump seqpro 0.18->0.20 with to_numpy(validate=False) adoption, and reconcile the roadmap honestly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…op dead _fetch_* numba Reroute Reference.fetch to build a (n,3) regions array and call the module-level get_reference dispatcher (rust-default) instead of the private _fetch_impl_par/_fetch_impl_ser numba pair. Delete the now-dead _fetch_row, _fetch_impl_par, _fetch_impl_ser functions and update the unit test that directly imported them. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…arity) Adds `reconstruct_annotated_haplotypes_fused` Rust FFI entry that combines diff-computation, output-length allocation, and reconstruction into one crossing, returning (out_data, annot_v, annot_pos, out_offsets). Routes the non-splice annotated haplotypes Python branch to this kernel when GVL_BACKEND=rust (default); numba branch unchanged. Parity test updated to spy the new fused entry and verify byte-identical (haps + var_idxs + ref_coords) across both backends. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ity) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…orm read-path sites Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…splice fusion scope Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…imization targets Replace the stale 500-batch-script numbers (~37 haps / ~20 tracks) with same-harness pytest-benchmark e2e results at HEAD on both backends: rust now within ~10-17% of numba on haps/tracks (0.85-0.90x), 0.65x on the new annotated path. py-spy --native profile of the rust annotated ds[r,s] (43k samples) ranks Phase 5 targets: (1) hoist per-batch ascontiguousarray of dataset-static arrays (~21%), (2) skip output-buffer zeroing (~8%), (3) scratch-pool the per-call allocs (~6%), (4) fold reverse_complement into the kernel. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… is a rust-only scalability defect
Profiling + a per-batch ascontiguousarray copy-trace revealed the ~20% self-time leaf is NOT
static-array churn but the fused track path materializing the full per-sample-scale interval
record store every batch: intervals are an array-of-structs memmap ({start:i4,end:i4,value:f4},
itemsize 12), so .starts/.ends/.values are strided field views; np.ascontiguousarray copies the
whole store (GB-scale / OOM at >1M samples). The numba path reads the strided views with no copy,
so this is a rust regression. Fix: Rust reads the contiguous record buffer directly (zero-copy).
Genotype memmap is the same pattern but currently benign (contiguous int32 -> no-op). Per-variant
arrays (sub-linear in samples) may be cached; per-sample-scale memmaps must never be materialized.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Phase 3 close-out
Brings
phase-3-reconstructionto an honest, fully-rust-default state. Closes out the Phase 3 reconstruction + track-realignment migration.What changed (7 tasks, each parity-gated)
origin/main— pulls in the Tracks + max_jitter>0: silent wrong track output (numba) / panic (rust) from interval/query coordinate mismatch #242intervals_to_trackssub-query-start clip fix (PR fix(intervals): clip sub-query interval starts in both kernels (#242) #244, both backends) and the fix: SpliceIndexer double-applies sample-subset map (spliced+subset sample scramble) #243 SpliceIndexer subset double-apply fix (PR fix: SpliceIndexer double-applies sample-subset map (spliced+subset sample scramble) #243). The fused tracks kernel inherits the clip fix via the sharedintervals::intervals_to_trackscore._REASON_242constants + 10@pytest.mark.xfaildecorators; themax_jitter>0interval domain is now real, asserted coverage on both backends. The genuine trailing-under-writeassume(False)parity guards (a separate, still-live numba-undefined domain) were correctly retained.Reference.fetchthrough the dispatched rustget_reference; deleted the 3 zero-caller_fetch_*numba functions. Newtests/parity/test_reference_fetch_parity.pybackstop.reconstruct_annotated_haplotypes_fused(rust): the plain fused kernel + two i32 annotation buffers, reusing the shared reconstruct core. Byte-identical to the composed numba oracle (haps + var_idxs + ref_coords).reconstruct_haplotypes_spliced_fused(rust): the plain fused kernel minus the diff/offset computation (takes the precomputed spliceout_offsets). Newtests/parity/test_spliced_haplotypes_parity.py. (The annotated+spliced intersection remains on the unfused dispatched rust core — parity-gated and rust-by-default — with fusion deferred to Phase 5.)to_numpy(validate=False)at the one guaranteed-uniform read-path site (_reference.pyfixed-length branch). The seqpro-core 0.1.0Raggedlayout remains compatible (cargo + parity green).Verification
GVL_BACKEND=rust932 passed / 12 skipped / 5 xfailed / 0 failed;GVL_BACKEND=numbaidentical. The 5 xfails are all pre-existing non-Tracks + max_jitter>0: silent wrong track output (numba) / panic (rust) from interval/query coordinate mismatch #242 markers.__init__.py__all__unchanged vsorigin/main) →skills/genvarloader/SKILL.mdneeds no update.Phase 5 carry-forward (non-blocking)
output_lengththroughReconstructionRequest(retire the_out_per == hap_lengthsragged/fixed heuristic).os.environ["GVL_BACKEND"]reads in_haps.pythrough a single dispatch-aware helper.genvarloader.pyi(currently stubs only the 3 bigwig functions).🤖 Generated with Claude Code