perf: numba-based aggregations for sparse data (#4062)
Conversation
Codecov Report ❌ Patch coverage is

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4062     +/- ##
==========================================
+ Coverage   78.61%   78.63%   +0.02%
==========================================
  Files         117      118       +1
  Lines       12713    12729      +16
==========================================
+ Hits         9994    10010      +16
  Misses       2719     2719
Flags with carried forward coverage won't be shown.
Benchmark changes

Comparison: https://github.com/scverse/scanpy/compare/87dc1eca044af39ab45a854a3297dbc3b72f6f0c..83c06680a6fae53dc63e478b88aa1cb02017f054
More details: https://github.com/scverse/scanpy/pull/4062/checks?check_run_id=71658816499
flying-sheep left a comment
awesome! do these kernels come from other work (annbatch?) or did you make them just now?
    return (
        utils.asarray(self.indicator_matrix @ self.data)
        / np.bincount(self.groupby.codes)[:, None]
    )
do you think it would make sense to avoid re-executing sum by having basically _sum_mean and _sum_mean_var and using that in aggregate_array?
Or is sum so fast that re-executing it is fine?
I think for now, this is probably fine. But it's a good point, no doubt! This PR was just focused on the current implementation. The perf difference between having 2 sum calls vs 1 in mean_var wasn't that huge, so it probably is "very fast", as you say.
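The fused approach discussed above (computing the sum once and deriving mean and variance from it, instead of re-executing sum) can be sketched as follows. This is a dense-array illustration with made-up names, not the PR's actual helpers:

```python
import numpy as np

def sum_mean_var(x, axis=0):
    # Single pass for sum and sum-of-squares; mean and variance
    # are then derived arithmetically instead of re-running sum.
    s = x.sum(axis=axis)
    sq = np.square(x).sum(axis=axis)
    n = x.shape[axis]
    mean = s / n
    var = sq / n - mean**2  # population variance
    return s, mean, var
```

A `_sum_mean` variant would simply drop the `sq` accumulation, so both could share the sum result as suggested.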
Just made them on the spot! Saw a problem (un-parallelized aggregation used in two-pass Seurat HVG #4013, causing a performance regression) and a solution (parallel kernels in numba).
…for sparse data) (#4064)
Now that I know #4013 won't be hurt if we use the acceleration in this PR in-memory, I'm redoing #4041 against `main` with a standalone benchmark.