
choose_exonic_variants TypingError on 2-D geno_offsets (chr2, gvl 0.24.1) #169

@bschilder


Summary

Dataset.open(..., splice_info=..., var_filter="exonic") crashes at the first transcript fetch on certain SparseVar inputs with:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<class 'slice'>) found for signature:

 >>> slice(array(int64, 1d, C), array(int64, 1d, C))

File "python/genvarloader/_dataset/_genotypes.py", line 473:
def choose_exonic_variants(
    <source elided>
            o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
            qh_genos = geno_v_idxs[o_s:o_e]
            ^

Root cause

choose_exonic_variants has two prange loops over the same indices. The first loop branches on geno_offsets.ndim (lines 455-458), handling both the 1-D (total_variants + 1,) and the 2-D (total_variants, 2) layouts:

if geno_offsets.ndim == 1:
    o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
else:
    o_s, o_e = geno_offsets[o_idx]
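The two layouts describe the same ranges in different ways. A 1-D array of shape (total_variants + 1,) stores shared boundaries, so entry i spans [offsets[i], offsets[i + 1]). A 2-D array of shape (total_variants, 2) stores an explicit (start, end) row per entry. The NumPy sketch below converts the first form into the second. `offsets_1d_to_2d` is a hypothetical helper for illustration only, not part of genvarloader or genoray.

```python
import numpy as np


def offsets_1d_to_2d(offsets: np.ndarray) -> np.ndarray:
    """Convert shared-boundary offsets (n + 1,) into explicit (start, end) rows (n, 2).

    Hypothetical helper for illustration only.
    """
    if offsets.ndim != 1:
        raise ValueError("expected a 1-D offsets array")
    # Row i is (offsets[i], offsets[i + 1]).
    return np.stack([offsets[:-1], offsets[1:]], axis=1)


offsets_1d = np.array([0, 2, 2, 5], dtype=np.int64)  # 3 ranges, the middle one empty
offsets_2d = offsets_1d_to_2d(offsets_1d)
print(offsets_2d.shape)     # (3, 2)
print(offsets_2d.tolist())  # [[0, 2], [2, 2], [2, 5]]
```

The reverse conversion is not always possible. The 2-D form can express non-contiguous or reordered ranges, and the 1-D form cannot.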

The second loop (lines 471-473) is unconditional:

o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
qh_genos = geno_v_idxs[o_s:o_e]

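When numba compiles the function with geno_offsets.ndim == 2, geno_offsets[o_idx] and geno_offsets[o_idx + 1] are each length-2 rows, so o_s and o_e are arrays rather than integers. geno_v_idxs[o_s:o_e] then requires slice(array, array), which numba has no implementation for. Even if it could be typed, the unguarded expression would read rows o_idx and o_idx + 1 instead of the start and end of row o_idx.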
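The same mistake can be reproduced without numba. The sketch below uses plain NumPy and toy arrays, not the real kernel. It applies the unguarded second-loop expression to a 2-D offsets array. Interpreted NumPy rejects the array-valued slice bounds at runtime with a TypeError, which is the counterpart of numba's compile-time TypingError.

```python
import numpy as np

geno_v_idxs = np.arange(10, dtype=np.int64)                # toy variant indices
geno_offsets = np.array([[0, 3], [3, 7]], dtype=np.int64)  # 2-D layout: (start, end) per row
o_idx = 0

# Unguarded expression from the second loop: both values are rows, not scalars.
o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
print(o_s, o_e)  # [0 3] [3 7]

try:
    qh_genos = geno_v_idxs[o_s:o_e]  # equivalent to slice(array, array)
except TypeError as exc:
    print("TypeError:", exc)
```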

The sibling filter_af kernel (lines 549-552) correctly branches on ndim in the same shape regime, so the bug is a local asymmetry, not a missing API change.

Environment

  • genvarloader: 0.24.1 (PyPI); also reproduces on main
  • genoray: 2.4.0
  • numba: version matched to the pod's PyTorch image (runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404)
  • OS: Ubuntu 24.04
  • Input: 1KG NYGC 30x phased SparseVar (3,202 samples), splice_info=("transcript_id", "exon_number"), var_filter="exonic"

Why we hit it only on chr2

chr1 and chr3-chr22 in the same cohort materialise geno_offsets as 1-D. For those chromosomes the first loop's else branch is never compiled, and the second loop's scalar indexing is always type-correct. Only chr2 materialises geno_offsets as 2-D, which exercises the broken path. The shape divergence is likely caused by a sparse-storage size threshold inside genoray. That is a separate question and is not on the critical path to fixing this kernel.

Fix

Mirror the ndim guard from the first loop into the second loop. This is a two-line change, submitted as a PR (see the linked PR).
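The following is a minimal sketch of the guarded bounds lookup. It is written as plain Python so it runs without numba. `range_bounds` is a hypothetical helper and is not the code in the PR. In the kernel, the same if/else would sit inline in the second prange loop, matching the first loop. This relies on numba treating geno_offsets.ndim as a compile-time constant and pruning the branch that does not apply. The same mechanism already lets the first loop compile for both layouts.

```python
import numpy as np


def range_bounds(geno_offsets: np.ndarray, o_idx: int) -> tuple[int, int]:
    """Return (start, end) for entry o_idx under either offsets layout.

    Hypothetical helper mirroring the ndim guard from the kernel's first loop.
    """
    if geno_offsets.ndim == 1:
        # 1-D, shape (n + 1,): shared boundaries.
        o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
    else:
        # 2-D, shape (n, 2): explicit (start, end) row.
        o_s, o_e = geno_offsets[o_idx]
    # Cast to Python ints for this sketch; inside numba they stay int64 scalars.
    return int(o_s), int(o_e)


geno_v_idxs = np.arange(10, dtype=np.int64)
offsets_1d = np.array([0, 3, 7], dtype=np.int64)
offsets_2d = np.array([[0, 3], [3, 7]], dtype=np.int64)

for o_idx in range(2):
    s1, e1 = range_bounds(offsets_1d, o_idx)
    s2, e2 = range_bounds(offsets_2d, o_idx)
    assert (s1, e1) == (s2, e2)  # both layouts yield the same bounds
    qh_genos = geno_v_idxs[s1:e1]
    print(o_idx, qh_genos.tolist())
# 0 [0, 1, 2]
# 1 [3, 4, 5, 6]
```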

Reproduction

The submitted PR includes a unit test (tests/dataset/genotypes/test_choose_exonic_variants.py) constructing minimal 1-D and 2-D geno_offsets fixtures and asserting equal keep/keep_offsets output. The 2-D test crashes the function pre-fix and passes post-fix.
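The reproduction test follows a layout-equivalence pattern, sketched below with pytest-style plain assertions. The issue does not show choose_exonic_variants' full signature, so this sketch does not call the real kernel. Instead it exercises a hypothetical stand-in, `gather_ranges`, which applies the guarded lookup to every entry and returns the selected indices plus their offsets. The actual test in the PR targets the real kernel's keep/keep_offsets output.

```python
import numpy as np


def gather_ranges(geno_v_idxs: np.ndarray, geno_offsets: np.ndarray):
    """Hypothetical stand-in for the kernel: collect each entry's indices.

    Returns (values, offsets) where values[offsets[i]:offsets[i + 1]] holds entry i.
    """
    n = geno_offsets.shape[0] - 1 if geno_offsets.ndim == 1 else geno_offsets.shape[0]
    chunks = []
    out_offsets = np.zeros(n + 1, dtype=np.int64)
    for o_idx in range(n):
        # Same ndim guard as the fixed second loop.
        if geno_offsets.ndim == 1:
            o_s, o_e = geno_offsets[o_idx], geno_offsets[o_idx + 1]
        else:
            o_s, o_e = geno_offsets[o_idx]
        chunk = geno_v_idxs[o_s:o_e]
        chunks.append(chunk)
        out_offsets[o_idx + 1] = out_offsets[o_idx] + len(chunk)
    values = np.concatenate(chunks) if chunks else np.empty(0, dtype=geno_v_idxs.dtype)
    return values, out_offsets


def test_1d_and_2d_offsets_agree():
    geno_v_idxs = np.array([4, 8, 15, 16, 23, 42], dtype=np.int64)
    offsets_1d = np.array([0, 2, 2, 6], dtype=np.int64)  # includes an empty range
    # Same ranges expressed as explicit (start, end) rows.
    offsets_2d = np.stack([offsets_1d[:-1], offsets_1d[1:]], axis=1)

    vals_1d, offs_1d = gather_ranges(geno_v_idxs, offsets_1d)
    vals_2d, offs_2d = gather_ranges(geno_v_idxs, offsets_2d)

    np.testing.assert_array_equal(vals_1d, vals_2d)
    np.testing.assert_array_equal(offs_1d, offs_2d)


if __name__ == "__main__":
    test_1d_and_2d_offsets_agree()
    print("ok")
```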

Related

  • PR #151 (splicing reconstruction fix, merged): prior work in this code area.
  • Issue #164 (chr2 gvl.write stalls): different code path, same chromosome; closed.
