v0.1.5 — chunked iteration: O(n²) → O(n)

@Starlitnightly released this 30 May 16:36
· 23 commits to main since this release

Performance + correctness fixes for the chunked-read path. User-impacting because the previous behaviour turned ov.pp.scale on a million-cell h5ad into a ~one-day operation; the post-fix wall-clock is on the order of minutes for the same shape.

Fixes

  • chunked() on TransformedBackedArray: O(n²) → O(n) — the wrapped (normalize/log1p) chunked iterator was making each chunk read scan from row 0 of the Rust backend; combined with a stale _read_rows fallback this was quadratic in the number of chunks. (#3)
  • BackedArray._read_rows: use elem[s:e] as the primary slice path (the comment claiming PyArrayElem has no __getitem__ was outdated); fall back to the chunked scan only on exception. (#3)
  • chunked_scale now refuses to re-scale an already-scaled X; previously the ScaledBackedArray subclass check silently dropped the prior mean/std. (#3)
  • chunked_normalize_total NaN guard: np.nan_to_num(..., nan=1.0, posinf=1.0, neginf=1.0) on norm_factors before the == 0 check and before storing on adata.obs. (#3)
  • chunked_mean_var sparse-native: E[X²] − E[X]² per batch via chunk.multiply(chunk).sum(axis=0) + Welford merge across batches. Eliminates the per-chunk densification that dominated wall-clock at atlas scale. (#3)
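
For readers who want to see the idea behind the sparse-native mean/var fix, here is a minimal sketch, not the project's actual code. It assumes a SciPy CSR matrix sliced into row chunks. The names batch_moments, merge_moments and chunked_mean_var_sketch are hypothetical, and the merge step uses the standard pairwise formula for combining Welford-style running moments.

```python
import numpy as np
from scipy import sparse


def batch_moments(chunk):
    """Per-column (n, mean, M2) for one sparse chunk, without densifying it."""
    n = chunk.shape[0]
    col_sum = np.asarray(chunk.sum(axis=0), dtype=np.float64).ravel()
    # Element-wise square stays sparse; only the column sums are dense.
    col_sq_sum = np.asarray(chunk.multiply(chunk).sum(axis=0), dtype=np.float64).ravel()
    mean = col_sum / n
    # M2 = sum of squared deviations = n * (E[X^2] - E[X]^2); clip rounding noise.
    m2 = np.maximum(col_sq_sum - n * mean**2, 0.0)
    return n, mean, m2


def merge_moments(a, b):
    """Combine two (n, mean, M2) summaries with the pairwise merge formula."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    m2 = m2_a + m2_b + delta**2 * (n_a * n_b / n)
    return n, mean, m2


def chunked_mean_var_sketch(X, chunk_size=1000, ddof=0):
    """Hypothetical driver: column mean/variance of X, one row chunk at a time."""
    if X.shape[0] == 0:
        raise ValueError("X must have at least one row")
    state = None
    for start in range(0, X.shape[0], chunk_size):
        part = batch_moments(X[start:start + chunk_size])
        state = part if state is None else merge_moments(state, part)
    n, mean, m2 = state
    return mean, m2 / (n - ddof)


# Usage: compare against the dense result on a small random count matrix.
rng = np.random.default_rng(0)
dense = rng.poisson(0.5, size=(7, 4)).astype(np.float64)
mean, var = chunked_mean_var_sketch(sparse.csr_matrix(dense), chunk_size=3)
print(np.allclose(mean, dense.mean(axis=0)), np.allclose(var, dense.var(axis=0)))
```

Each chunk contributes only three small per-column summaries, so memory stays proportional to the number of columns rather than to chunk size times columns.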

Tests

9 regression tests added in tests/test_chunked_perf.py + tests/test_chunked_correctness.py: position-independent slice reads, wrapped-vs-raw chunked ratio < 5×, chunk-content equivalence, double-scale refusal, zero-count cell NaN guard, sparse-native mean/var matches dense-Welford within float32-origin tolerance.
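
As an illustration of what the zero-count cell NaN guard check covers, here is a small NumPy-only sketch. It is an assumption about the shape of the guard rather than the code in tests/test_chunked_correctness.py: guarded_norm_factors is a hypothetical helper, and defining the factors as counts divided by a target sum is assumed for the example.

```python
import numpy as np


def guarded_norm_factors(counts_per_cell, target_sum):
    """Hypothetical helper: per-cell scaling factors that are never NaN, inf or 0."""
    counts = np.asarray(counts_per_cell, dtype=np.float64)
    # Division may produce NaN (0/0, x/NaN) or inf (x/0); silence the warnings
    # because the next step replaces those values explicitly.
    with np.errstate(divide="ignore", invalid="ignore"):
        factors = counts / target_sum
    # Replace non-finite values first, then handle exact zeros, so that a NaN
    # cannot slip past the == 0 comparison.
    factors = np.nan_to_num(factors, nan=1.0, posinf=1.0, neginf=1.0)
    factors[factors == 0] = 1.0
    return factors


# A zero-count cell gets factor 1.0, so dividing its row by the factor is a no-op.
print(guarded_norm_factors([0, 10, 20], 10.0))   # [1. 1. 2.]
# A NaN target sum (for example, a median over no non-zero cells) is neutralised.
print(guarded_norm_factors([0, 0], float("nan")))  # [1. 1.]
```

The order matters: a NaN compares unequal to zero, so running the == 0 check alone would leave NaN factors in place.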

Wheels

Linux x86_64 · macOS x86_64 · macOS aarch64 + sdist. Trusted-publisher upload to PyPI.