
perf: numba based aggregations for sparse data #4062

Merged
ilan-gold merged 24 commits into main from ig/numba_agg_main on Apr 16, 2026
Conversation

@ilan-gold
Contributor

@ilan-gold ilan-gold commented Apr 15, 2026

Now that I know #4013 won't be hurt if we use the acceleration in this PR in-memory, I'm redoing #4041 against main with a standalone benchmark.

  • Closes #
  • Tests included or not required because:

@codecov

codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 97.22222% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 78.63%. Comparing base (87dc1ec) to head (83c0668).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines        Patch %   Lines
src/scanpy/get/_aggregated.py   96.77%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4062      +/-   ##
==========================================
+ Coverage   78.61%   78.63%   +0.02%     
==========================================
  Files         117      118       +1     
  Lines       12713    12729      +16     
==========================================
+ Hits         9994    10010      +16     
  Misses       2719     2719              
Flag                  Coverage Δ
hatch-test.low-vers   77.94% <97.22%> (+0.02%) ⬆️
hatch-test.pre        78.52% <97.22%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines        Coverage Δ
src/scanpy/get/_kernels.py      100.00% <100.00%> (ø)
src/scanpy/get/_aggregated.py   93.30% <96.77%> (+0.34%) ⬆️

@scverse-benchmark

scverse-benchmark bot commented Apr 15, 2026

Benchmark changes

Change   Before [87dc1ec]   After [83c0668]   Ratio   Benchmark (Parameter)
-        4.91G              4.05G             0.82    preprocessing_counts.Agg.peakmem_agg('mean')
-        4.91G              4.05G             0.82    preprocessing_counts.Agg.peakmem_agg('sum')
-        5.83G              4.06G             0.7     preprocessing_counts.Agg.peakmem_agg('var')
-        887±5ms            639±0.8ms         0.72    preprocessing_counts.Agg.time_agg('count_nonzero')
-        548±0.4ms          90.5±0.8ms        0.17    preprocessing_counts.Agg.time_agg('mean')
-        550±0.7ms          88.4±1ms          0.16    preprocessing_counts.Agg.time_agg('sum')
-        1.37±0.01s         134±5ms           0.1     preprocessing_counts.Agg.time_agg('var')

Comparison: https://github.com/scverse/scanpy/compare/87dc1eca044af39ab45a854a3297dbc3b72f6f0c..83c06680a6fae53dc63e478b88aa1cb02017f054
Last changed: Thu, 16 Apr 2026 15:49:52 +0000

More details: https://github.com/scverse/scanpy/pull/4062/checks?check_run_id=71658816499
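For readers unfamiliar with the table above: the benchmark names follow the ASV (airspeed velocity) convention, where `time_*` methods are timed, `peakmem_*` methods have their peak memory tracked, and `params` expands into the parenthesised argument of each benchmark name. A minimal sketch of such a class (illustrative only, not scanpy's actual benchmark code; the body is a stand-in for the real aggregation call):

```python
import numpy as np

class Agg:
    # ASV expands `params` into one run per value, shown as the
    # parenthesised argument in each benchmark name, e.g. time_agg('mean').
    params = ["count_nonzero", "sum", "mean", "var"]
    param_names = ["func"]

    def setup(self, func):
        rng = np.random.default_rng(0)
        self.data = rng.poisson(1.0, size=(1000, 50)).astype(np.float64)
        self.codes = rng.integers(0, 10, size=1000)

    def time_agg(self, func):
        # Stand-in for the aggregation under test; ASV times this body.
        agg = {"sum": np.sum, "mean": np.mean, "var": np.var,
               "count_nonzero": np.count_nonzero}[func]
        for g in range(10):
            agg(self.data[self.codes == g], axis=0)

    # A peakmem_agg method with the same body would have its peak
    # memory usage tracked instead of its runtime.
```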

@ilan-gold ilan-gold marked this pull request as ready for review April 15, 2026 19:56
@ilan-gold ilan-gold requested a review from flying-sheep April 15, 2026 19:56
@ilan-gold ilan-gold added this to the 1.12.2 milestone Apr 15, 2026
@scverse scverse deleted a comment from azure-pipelines bot Apr 15, 2026
@scverse scverse deleted a comment from azure-pipelines bot Apr 15, 2026
@scverse scverse deleted a comment from azure-pipelines bot Apr 16, 2026
@scverse scverse deleted a comment from azure-pipelines bot Apr 16, 2026
@scverse scverse deleted a comment from azure-pipelines bot Apr 16, 2026
Member

@flying-sheep flying-sheep left a comment


awesome! do these kernels come from other work (annbatch?) or did you make them just now?

Comment on lines -100 to -103
return (
    utils.asarray(self.indicator_matrix @ self.data)
    / np.bincount(self.groupby.codes)[:, None]
)
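For context, the removed lines compute group means by multiplying a one-hot indicator matrix into the data and dividing by per-group observation counts. A minimal standalone illustration of that pattern (function name and setup here are hypothetical, not scanpy's API):

```python
import numpy as np
from scipy import sparse

def group_mean_dense(data, codes, n_groups):
    """Mean of the rows of `data` within each group label in `codes`."""
    n_obs = len(codes)
    # One-hot indicator matrix: row g has a 1 in column i iff codes[i] == g.
    indicator = sparse.csr_matrix(
        (np.ones(n_obs), (codes, np.arange(n_obs))),
        shape=(n_groups, n_obs),
    )
    sums = np.asarray(indicator @ data)                      # per-group row sums
    counts = np.bincount(codes, minlength=n_groups)[:, None] # per-group sizes
    return sums / counts

data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
codes = np.array([0, 0, 1])
print(group_mean_dense(data, codes, 2))  # [[2. 3.] [5. 6.]]
```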
Member

do you think it would make sense to avoid re-executing sum by having basically _sum_mean and _sum_mean_var and using that in aggregate_array?

Or is sum so fast that re-executing it is fine?

Contributor Author

I think for now, this is probably fine. But it's a good point, no doubt! This PR was just focused on accelerating the current implementation. The perf difference between having two sum calls vs. one in mean_var wasn't that huge, so sum probably is "very fast", as you say.

@ilan-gold
Contributor Author

ilan-gold commented Apr 16, 2026

awesome! do these kernels come from other work (annbatch?) or did you make them just now?

Just made them on the spot! Saw a problem (un-parallelized aggregation used in the two-pass Seurat HVG in #4013, causing a performance regression) and a solution (parallel kernels in numba).

@ilan-gold ilan-gold merged commit cb3e6c2 into main Apr 16, 2026
15 checks passed
@ilan-gold ilan-gold deleted the ig/numba_agg_main branch April 16, 2026 15:01
meeseeksmachine pushed a commit to meeseeksmachine/scanpy that referenced this pull request Apr 16, 2026
ilan-gold added a commit that referenced this pull request Apr 16, 2026
…for sparse data) (#4064)

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>


3 participants