Summary
Dataset.open(..., splice_info=..., var_filter="exonic") crashes at the first transcript fetch on certain SparseVar inputs with:
    numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    No implementation of function Function(<class 'slice'>) found for signature:
     >>> slice(array(int64, 1d, C), array(int64, 1d, C))
    File "python/genvarloader/_dataset/_genotypes.py", line 473:
    def choose_exonic_variants(
        <source elided>
        o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
        qh_genos = geno_v_idxs[o_s:o_e]
        ^
Root cause
choose_exonic_variants has two prange loops over the same indices. The first loop branches on geno_offsets.ndim (lines 455-458) — handling both 1-D (total_variants + 1,) and 2-D (total_variants, 2) layouts:
    if geno_offsets.ndim == 1:
        o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
    else:
        o_s, o_e = geno_offsets[o_idx]
The second loop (lines 471-473) is unconditional:
    o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
    qh_genos = geno_v_idxs[o_s:o_e]
When numba compiles the kernel for a 2-D geno_offsets, geno_offsets[o_idx] and geno_offsets[o_idx + 1] each evaluate to a length-2 row rather than a scalar, so geno_v_idxs[o_s:o_e] is typed as slice(array, array), which has no implementation.
The sibling filter_af kernel (lines 549-552) correctly branches on ndim in the same shape regime, so the bug is a local asymmetry, not a missing API change.
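The shape problem can be illustrated outside numba. The following is a minimal NumPy sketch, not code from genvarloader: the toy arrays are invented, and plain NumPy raises a runtime TypeError where numba instead rejects the expression at typing time. It only shows that the unconditional expression yields array-valued slice bounds under the 2-D layout.

```python
import numpy as np

# Toy data, invented for illustration: 3 groups over 6 variant indices.
geno_v_idxs = np.arange(10, 16)              # [10 11 12 13 14 15]
offsets_1d = np.array([0, 2, 5, 6])          # 1-D layout, shape (n + 1,)
offsets_2d = np.stack(                       # 2-D layout, shape (n, 2)
    [offsets_1d[:-1], offsets_1d[1:]], axis=1
)

o_idx = 1

# 1-D layout: both bounds are integer scalars, so slicing works.
o_s, o_e = offsets_1d[o_idx], offsets_1d[o_idx + 1]
slice_1d = geno_v_idxs[o_s:o_e]

# 2-D layout with the same unconditional expression:
# each "bound" is now a length-2 row, not a scalar.
bad_s, bad_e = offsets_2d[o_idx], offsets_2d[o_idx + 1]
try:
    geno_v_idxs[bad_s:bad_e]
    unguarded_failed = False
except TypeError:
    # NumPy cannot convert a length-2 array to a slice index.
    unguarded_failed = True
```

Here bad_s and bad_e have shape (2,), which corresponds to the array(int64, 1d, C) arguments in the numba signature above.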
Environment
| Component | Version / details |
| --- | --- |
| genvarloader | 0.24.1 (PyPI); also reproduces on main |
| genoray | 2.4.0 |
| numba | matched to pod's pytorch image (runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404) |
| OS | Ubuntu 24.04 |
| Input | 1KG NYGC 30x phased SparseVar (3,202 samples), splice_info=("transcript_id", "exon_number"), var_filter="exonic" |
Why we hit it only on chr2
chr1, chr3-chr22 in the same cohort materialise geno_offsets as 1-D — the first loop's else branch is never compiled, and the second loop's scalar arithmetic is always type-correct. chr2 alone materialises geno_offsets as 2-D, which exercises the broken path. The shape-divergence cause is likely a sparse-storage size threshold inside genoray; that's a separate question and not on the critical path to fix this kernel.
Fix
Mirror the ndim guard from the first loop into the second loop. This is a two-line change, submitted as a PR (see linked PR).
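As a rough sketch of what mirroring the guard achieves, the snippet below factors the branch into a helper and checks that both layouts select the same slices. The helper name offset_bounds and the toy arrays are hypothetical and exist only for illustration; the actual patch inlines the branch inside the numba kernel and may differ in detail.

```python
import numpy as np


def offset_bounds(geno_offsets, o_idx):
    """Hypothetical helper (not in genvarloader): (start, end) for one index."""
    if geno_offsets.ndim == 1:
        # 1-D layout: consecutive entries bound each slice.
        return geno_offsets[o_idx], geno_offsets[o_idx + 1]
    # 2-D layout: each row already holds (start, end).
    o_s, o_e = geno_offsets[o_idx]
    return o_s, o_e


# Toy data, invented for illustration.
geno_v_idxs = np.arange(10, 16)
offsets_1d = np.array([0, 2, 5, 6])
offsets_2d = np.stack([offsets_1d[:-1], offsets_1d[1:]], axis=1)

slices_1d = []
slices_2d = []
for o_idx in range(len(offsets_2d)):
    s1, e1 = offset_bounds(offsets_1d, o_idx)
    s2, e2 = offset_bounds(offsets_2d, o_idx)
    slices_1d.append(geno_v_idxs[s1:e1].tolist())
    slices_2d.append(geno_v_idxs[s2:e2].tolist())
```

With the guard in place, the bounds are scalars in both layouts, so the slice is well-typed regardless of which layout is passed in.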
Reproduction
The submitted PR includes a unit test (tests/dataset/genotypes/test_choose_exonic_variants.py) constructing minimal 1-D and 2-D geno_offsets fixtures and asserting equal keep/keep_offsets output. The 2-D test crashes the function pre-fix and passes post-fix.
Related
- PR #151 (splicing reconstruction fix, merged): prior work in this code area.
- Issue #164 (chr2 gvl.write stalls): different code path, same chrom; closed.