
fix(perf): use single polars-bio overlap (no xprod) in gvl.Table #158

Merged · d-laub merged 7 commits into main from fix/table-polars-bio-no-xprod · May 9, 2026


@d-laub (Collaborator) commented May 9, 2026

Summary

  • Replace per-sample polars-bio loop / pyranges cross-product in Table.count_intervals and Table._intervals_from_offsets with a single pb.overlap(queries, contig_subset_filtered_to_samples) call. Drops pyranges as a dependency entirely; removes the Python ≥ 3.12 / pyranges1 version gate.
  • Bench (experiments/bench_table_overlap, 5 backends × 3 sizes × 20 trials): polars_bio no_xprod is fastest at every scale (0.173 s on 2k×200×500 vs 0.422 s for the per-sample loop, 1.543 s for pyranges1 xprod) and uses ~half the memory of either pyranges variant.
  • Implementation polish: vectorized intra-cell rank (np.arange - np.repeat(boundaries, counts)), pl.Series.replace_strict for sample→index mapping, np.lexsort instead of a polars sort round-trip, and the numpy import hoisted to module scope.
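
For reviewers unfamiliar with the rank trick in the last bullet, here is a minimal sketch of how `np.lexsort` and `np.arange - np.repeat(boundaries, counts)` fit together. The toy arrays and variable names are hypothetical and are not taken from `_table.py`; a "cell" here is assumed to mean one (query, sample) pair.

```python
import numpy as np

# Hypothetical toy data: one row per overlap hit, tagged with the query
# region and the sample it belongs to.
query_idx = np.array([1, 0, 1, 0, 0, 1])
sample_idx = np.array([0, 1, 0, 0, 1, 1])
starts = np.array([50, 10, 20, 5, 30, 7])

# np.lexsort uses the LAST key as the primary key, so this orders rows
# by query, then sample, then start.
order = np.lexsort((starts, sample_idx, query_idx))
q, s = query_idx[order], sample_idx[order]

# After sorting, rows of the same cell are contiguous. Count the rows
# in each cell (cell_id is non-decreasing, so np.unique keeps run order).
cell_id = q * (sample_idx.max() + 1) + s
_, counts = np.unique(cell_id, return_counts=True)

# boundaries[i] is the row offset where cell i starts; subtracting it
# from each row's position gives the rank of the row within its cell.
boundaries = np.cumsum(counts) - counts
rank = np.arange(len(cell_id)) - np.repeat(boundaries, counts)

print(order.tolist())  # [3, 1, 4, 2, 0, 5]
print(rank.tolist())   # [0, 0, 1, 0, 1, 0]
```

Both steps are single array operations, so no Python-level loop over cells is needed.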

Test Plan

  • `pixi run -e dev pytest tests/test_table.py -v` — 15 pass (incl. `test_table_count_intervals_matches_brute_force` and `test_table_intervals_from_offsets_roundtrip`)
  • `pixi run -e dev pytest tests/dataset/test_write_tracks.py -v` — 4 pass
  • `pixi run -e dev ruff check python/genvarloader/_table.py` — clean
  • Optional: `pixi run -e py312 pytest tests/test_table.py -v` to confirm polars-bio resolves on py312 (was previously gated to pyranges1)

🤖 Generated with Claude Code

@d-laub merged commit 34ab325 into main May 9, 2026
5 checks passed
@d-laub deleted the fix/table-polars-bio-no-xprod branch May 9, 2026 06:40