fix: ndim guard on geno_offsets in choose_exonic_variants second loop #170

Merged
d-laub merged 1 commit into mcvickerlab:main from bschilder:fix/choose-exonic-variants-2d-offsets
May 21, 2026

Conversation

@bschilder
Collaborator

Closes #169.

Summary

choose_exonic_variants has two prange loops that index geno_offsets. The first branches on geno_offsets.ndim (lines 455-458), handling both the canonical 1-D layout of shape (total_variants + 1,) and the alternate 2-D layout of shape (total_variants, 2). The second (lines 471-473) used unconditional scalar arithmetic, so when numba compiled it with a 2-D geno_offsets, the slice bounds were typed as slice(array(int64, 1d, C), array(int64, 1d, C)) and compilation failed with a TypingError at JIT time.

Mirror the existing first-loop guard into the second loop. Two-line change.
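
A minimal sketch of what such an ndim guard can look like, written in plain NumPy rather than as the actual numba kernel. The helper name `variant_bounds` and the toy arrays are hypothetical, and the sketch assumes each row of the 2-D layout stores a (start, end) pair; the PR only states the shape `(total_variants, 2)`.

```python
import numpy as np


def variant_bounds(geno_offsets: np.ndarray, o_idx: int) -> tuple[int, int]:
    """Return (start, end) into geno_v_idxs for entry o_idx (hypothetical helper)."""
    if geno_offsets.ndim == 1:
        # Canonical layout, shape (n + 1,): consecutive entries bound the slice.
        return int(geno_offsets[o_idx]), int(geno_offsets[o_idx + 1])
    # Alternate layout, shape (n, 2): ASSUMED to hold (start, end) per row.
    return int(geno_offsets[o_idx, 0]), int(geno_offsets[o_idx, 1])


# Toy data: two entries covering 2 and 3 variant indices respectively.
geno_v_idxs = np.array([10, 11, 12, 13, 14])
offsets_1d = np.array([0, 2, 5])
offsets_2d = np.array([[0, 2], [2, 5]])

for o_idx in range(2):
    s1, e1 = variant_bounds(offsets_1d, o_idx)
    s2, e2 = variant_bounds(offsets_2d, o_idx)
    # Both layouts should select the same slice of geno_v_idxs.
    assert np.array_equal(geno_v_idxs[s1:e1], geno_v_idxs[s2:e2])
```

Inside a numba-compiled function, `ndim` is part of the array type, so a branch like this lets each layout compile against only the indexing expression that is valid for it.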

Why this matters

Reproduced on chr2 of the 1KG NYGC 30x phased panel (3,202 samples × MANE Select panel). chr1 and chr3-chr22 from the same cohort materialize 1-D geno_offsets, so they never exercise the broken path, which is why the bug stayed latent. The filter_af kernel in the same file already has the same guard, which suggests this is a local asymmetry rather than a missing API change.

Test

Adds tests/dataset/genotypes/test_choose_exonic_variants.py:

  • 1-D geno_offsets baseline (the always-working layout).
  • 2-D geno_offsets (pre-fix: numba TypingError at compile; post-fix: same output as 1-D).

Both call the public choose_exonic_variants directly with minimal fixtures (1 region × ploidy 2 × 2 variants).
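
The contents of the test file are not shown here, so the following is only a sketch of one way to build equivalent 1-D and 2-D offset fixtures for such a regression test. The helper `offsets_1d_to_2d` and the `lengths` array are hypothetical, and the 2-D rows are assumed to be (start, end) pairs.

```python
import numpy as np


def offsets_1d_to_2d(offsets_1d: np.ndarray) -> np.ndarray:
    """Expand (n + 1,) offsets into (n, 2) rows of (start, end) (hypothetical helper)."""
    return np.stack([offsets_1d[:-1], offsets_1d[1:]], axis=1)


# Toy fixture: three entries holding 2, 0, and 3 variants.
lengths = np.array([2, 0, 3])
offsets_1d = np.concatenate([[0], np.cumsum(lengths)])
offsets_2d = offsets_1d_to_2d(offsets_1d)

# Both layouts describe the same per-entry lengths.
assert np.array_equal(np.diff(offsets_1d), lengths)
assert np.array_equal(offsets_2d[:, 1] - offsets_2d[:, 0], lengths)
```

Deriving the 2-D fixture from the 1-D one keeps the two test cases logically identical, so any difference in output points at the layout handling rather than at the data.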

$ pixi run pytest tests/dataset/genotypes/test_choose_exonic_variants.py -v
============================== 2 passed in 33.22s ==============================

Related

🤖 Generated with Claude Code

The first prange loop in `choose_exonic_variants` and the sibling
`filter_af` kernel both branch on `geno_offsets.ndim == 1` before
indexing, but the second prange loop (lines ~471-473) unconditionally
used scalar arithmetic: `o_s, o_e = geno_offsets[o_idx], geno_offsets[
o_idx + 1]; qh_genos = geno_v_idxs[o_s:o_e]`.

When the caller passes a 2-D `geno_offsets` (shape `(total_variants,
2)`, an internal storage variant of the canonical 1-D layout),
`geno_offsets[o_idx]` returns a length-2 row instead of a scalar.
`geno_v_idxs[<row>:<row>]` then tries `slice(array(int64, 1d, C),
array(int64, 1d, C))` and numba bails at JIT-compile time:

    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    No implementation of function Function(<class 'slice'>) found for signature:
     >>> slice(array(int64, 1d, C), array(int64, 1d, C))
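
The shape mismatch behind this error can be illustrated in plain NumPy, without numba. The arrays below are toy values, not data from the PR.

```python
import numpy as np

geno_v_idxs = np.arange(5)
offsets_1d = np.array([0, 2, 5], dtype=np.int64)
offsets_2d = np.array([[0, 2], [2, 5]], dtype=np.int64)

# Indexing the 1-D layout yields a scalar; the 2-D layout yields a row.
scalar_ndim = np.ndim(offsets_1d[0])
row_shape = offsets_2d[0].shape

# Using rows as slice bounds is rejected by plain NumPy as well,
# since a length-2 array cannot be converted to a single integer index.
try:
    geno_v_idxs[offsets_2d[0]:offsets_2d[1]]
    slice_rejected = False
except TypeError:
    slice_rejected = True
```

NumPy only reports this when the line runs, whereas numba resolves the types of the slice bounds while compiling, which is why the failure above surfaces at JIT-compile time.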

Repro: any `Dataset.open(..., splice_info=..., var_filter="exonic")`
on a `.gvl/.svar` pair whose `geno_offsets` happens to materialize as
2-D. We hit it on chr2 of 1KG NYGC 30x (3,202 samples, MANE Select
panel); chr1 and chr3-22 materialize 1-D `geno_offsets` and did not.

Fix mirrors the existing first-loop guard. Added regression test in
`tests/dataset/genotypes/test_choose_exonic_variants.py` exercising
both 1-D and 2-D layouts; both produce the same logical keep mask.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-laub d-laub merged commit 44d6f79 into mcvickerlab:main May 21, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues.

choose_exonic_variants TypingError on 2-D geno_offsets (chr2, gvl 0.24.1)

2 participants