fix: ndim guard on geno_offsets in choose_exonic_variants second loop #170
Merged
d-laub merged 1 commit on May 21, 2026
The first prange loop in `choose_exonic_variants` and the sibling
`filter_af` kernel both branch on `geno_offsets.ndim == 1` before
indexing, but the second prange loop (lines ~471-473) unconditionally
used scalar arithmetic: `o_s, o_e = geno_offsets[o_idx], geno_offsets[
o_idx + 1]; qh_genos = geno_v_idxs[o_s:o_e]`.
When the caller passes a 2-D `geno_offsets` (shape `(total_variants,
2)`, an internal storage variant of the canonical 1-D layout),
`geno_offsets[o_idx]` returns a length-2 row instead of a scalar.
`geno_v_idxs[<row>:<row>]` then tries `slice(array(int64, 1d, C),
array(int64, 1d, C))` and numba bails at JIT-compile time:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    No implementation of function Function(<class 'slice'>) found for signature:
    >>> slice(array(int64, 1d, C), array(int64, 1d, C))
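The shape mismatch behind that error can be illustrated outside numba. The following is a minimal sketch in plain NumPy, not the kernel itself: the array values are made-up toy data, and the 2-D layout is assumed to hold one `(start, end)` pair per row. Plain NumPy fails at run time with a `TypeError` rather than at JIT-compile time with a `TypingError`, but the cause is the same: the slice bounds are length-2 rows instead of scalars.

```python
import numpy as np

# Hypothetical toy data standing in for the flat genotype index array.
geno_v_idxs = np.array([10, 11, 12, 13, 14], dtype=np.int64)

# Canonical 1-D layout, shape (total_variants + 1,):
# entry i is the start of group i, entry i + 1 is its end.
offsets_1d = np.array([0, 2, 2, 5], dtype=np.int64)

# Alternate 2-D layout, shape (total_variants, 2).
# Assumption for this sketch: row i is (start, end) of group i.
offsets_2d = np.stack([offsets_1d[:-1], offsets_1d[1:]], axis=1)

o_idx = 0

# Unguarded scalar arithmetic is fine for the 1-D layout.
o_s, o_e = offsets_1d[o_idx], offsets_1d[o_idx + 1]
ok = geno_v_idxs[o_s:o_e]

# On the 2-D layout the same expressions yield length-2 rows,
# which cannot be used as slice bounds.
row_s, row_e = offsets_2d[o_idx], offsets_2d[o_idx + 1]
try:
    geno_v_idxs[row_s:row_e]
    failed = False
except TypeError:
    failed = True
```

Here `row_s` has shape `(2,)`, which corresponds to the `array(int64, 1d, C)` arguments numba reports in the `slice(...)` signature.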
Repro: any `Dataset.open(..., splice_info=..., var_filter="exonic")`
on a `.gvl/.svar` pair whose `geno_offsets` happens to materialize as
2-D. We hit it on chr2 of 1KG NYGC 30x (3,202 samples, MANE Select
panel); chr1 and chr3-22 materialize 1-D `geno_offsets` and did not
trigger it.
Fix mirrors the existing first-loop guard. Added regression test in
`tests/dataset/genotypes/test_choose_exonic_variants.py` exercising
both 1-D and 2-D layouts; both produce the same logical keep mask.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #169.
Summary
`choose_exonic_variants` has two `prange` loops indexing `geno_offsets`. The first branches on `geno_offsets.ndim` (lines 455-458), handling both the canonical 1-D `(total_variants + 1,)` and the alternate 2-D `(total_variants, 2)` storage. The second (lines 471-473) was unconditional scalar arithmetic, so when numba compiled with a 2-D `geno_offsets` it produced `slice(array(int64, 1d, C), array(int64, 1d, C))` and bailed at JIT time.

Mirror the existing first-loop guard into the second loop. Two-line change.
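For readers unfamiliar with the two layouts, the guard can be sketched roughly as follows. This is an illustrative stand-in in plain NumPy, not the code from the diff: `offset_bounds` is a hypothetical helper, the data is invented, and the 2-D rows are assumed to be `(start, end)` pairs.

```python
import numpy as np


def offset_bounds(geno_offsets, o_idx):
    """Hypothetical helper: scalar (start, end) for group o_idx in either layout."""
    if geno_offsets.ndim == 1:
        # Canonical layout, shape (total_variants + 1,).
        return geno_offsets[o_idx], geno_offsets[o_idx + 1]
    # Alternate layout, shape (total_variants, 2); assumed (start, end) per row.
    return geno_offsets[o_idx, 0], geno_offsets[o_idx, 1]


# Invented toy data for the sketch.
geno_v_idxs = np.array([10, 11, 12, 13, 14], dtype=np.int64)
offsets_1d = np.array([0, 2, 2, 5], dtype=np.int64)
offsets_2d = np.stack([offsets_1d[:-1], offsets_1d[1:]], axis=1)

# Collect the per-group slices under each layout.
slices_1d = []
slices_2d = []
for o_idx in range(offsets_2d.shape[0]):
    s1, e1 = offset_bounds(offsets_1d, o_idx)
    s2, e2 = offset_bounds(offsets_2d, o_idx)
    slices_1d.append(geno_v_idxs[s1:e1].tolist())
    slices_2d.append(geno_v_idxs[s2:e2].tolist())
```

With the branch in place, both layouts give scalar bounds and the same slices, which is the property the regression test checks at the level of the keep mask.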
Why this matters
Reproduced on chr2 of the 1KG NYGC 30x phased panel (3,202 samples × MANE Select panel). chr1 and chr3-chr22 from the same cohort materialise 1-D `geno_offsets`, so they never exercise the broken path and the bug has been latent. The `filter_af` kernel in the same file already has the symmetric guard, confirming this is a local asymmetry rather than a missing API change.
Test
Adds `tests/dataset/genotypes/test_choose_exonic_variants.py`:
- 1-D `geno_offsets` baseline (the always-working layout).
- 2-D `geno_offsets` (pre-fix: numba `TypingError` at compile; post-fix: same output as 1-D).

Both call the public `choose_exonic_variants` directly with minimal fixtures (1 region × ploidy 2 × 2 variants).
Related
- (`gvl.write` stalls; different code path, same chrom)

🤖 Generated with Claude Code