changed hvg with PR to work with numba #2612

Intron7 · 2023-08-14T11:54:40Z

The pearson residuals implementation for hvg is currently very slow and memory inefficient. I switched it to a numba kernel for csc and dense F-continous arrays.

It's based on the cuda-kernel in rapids-singlecell. For 90000 cells we go from 24 seconds to less than 5 with the new implementation. For smaller datasets we don't see a speedup.

codecov · 2023-08-14T12:15:54Z

Codecov Report

Merging #2612 (c21bee3) into master (386f0e3) will decrease coverage by 0.28%.
Report is 6 commits behind head on master.
The diff coverage is 23.94%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2612      +/-   ##
==========================================
- Coverage   72.27%   72.00%   -0.28%     
==========================================
  Files         105      105              
  Lines       11752    11840      +88     
==========================================
+ Hits         8494     8525      +31     
- Misses       3258     3315      +57

Files Changed	Coverage Δ
scanpy/experimental/pp/_highly_variable_genes.py	`63.69% <23.94%> (-33.37%)`	⬇️

... and 6 files with indirect coverage changes

flying-sheep

I’d like to see some deduplication and maintainability changes.

Please never create functions that don’t have a * after the 3rd parameter at the latest (unless the function is called rgba I guess).

Numba sadly has weird behavior around keyword-only arguments, but even if it ignores them (allowing to specify arguments positionally that shouldn’t), you should still call the function as if everything was working correctly. numba/numba#5655

>>> @numba.njit()
... def test(x, *, y):
...     return x + y 
>>> test(1, 2)
3  # no! bad numba! that should be a TypeError

scanpy/experimental/pp/_highly_variable_genes.py

Co-authored-by: Philipp A. <flying-sheep@web.de>

flying-sheep

see unresolved conversations

flying-sheep · 2023-08-22T15:17:28Z

Perf measurements for the use case of running the HVG tests on my machine (not very accurate, and not very reminiscent of how users use it).

Tests get a bit slower, real world gets faster.

scanpy master:

$ git switch master
$ perf stat -r 10 -B hatch run +py=3.11 test:run -n0 scanpy/tests/test_highly_variable_genes.py
   Performance counter stats for 'hatch run +py=3.11 test:run -n0 scanpy/tests/test_highly_variable_genes.py' (10 runs):
  
           71.915,07 msec task-clock:u                     #   14,035 CPUs utilized               ( +-  9,53% )
                   0      context-switches:u               #    0,000 /sec
                   0      cpu-migrations:u                 #    0,000 /sec
           1.168.035      page-faults:u                    #   29,496 K/sec                       ( +-  9,58% )
     191.815.791.770      cycles:u                         #    4,844 GHz                         ( +-  9,53% )  (83,37%)
      10.610.492.234      stalled-cycles-frontend:u        #   10,05% frontend cycles idle        ( +-  9,44% )  (83,34%)
      59.853.476.395      stalled-cycles-backend:u         #   56,69% backend cycles idle         ( +-  9,56% )  (83,32%)
     257.750.810.841      instructions:u                   #    2,44  insn per cycle
                                                           #    0,13  stalled cycles per insn     ( +-  9,57% )  (83,33%)
      45.773.330.764      branches:u                       #    1,156 G/sec                       ( +-  9,58% )  (83,33%)
       1.147.567.613      branch-misses:u                  #    4,56% of all branches             ( +-  9,54% )  (83,37%)
  
              5,1241 +- 0,0242 seconds time elapsed  ( +-  0,47% )

this PR:

$ git switch hvg_PR_numba
$ perf stat -r 10 -B hatch run +py=3.11 test:run -n0 scanpy/tests/test_highly_variable_genes.py
   Performance counter stats for 'hatch run +py=3.11 test:run -n0 scanpy/tests/test_highly_variable_genes.py' (10 runs):
  
          113.085,21 msec task-clock:u                     #   15,789 CPUs utilized               ( +-  9,56% )
                   0      context-switches:u               #    0,000 /sec
                   0      cpu-migrations:u                 #    0,000 /sec
           1.636.606      page-faults:u                    #   26,373 K/sec                       ( +-  9,55% )
     310.410.832.165      cycles:u                         #    5,002 GHz                         ( +-  9,55% )  (83,35%)
      14.117.222.045      stalled-cycles-frontend:u        #    8,30% frontend cycles idle        ( +-  9,46% )  (83,38%)
      75.813.970.243      stalled-cycles-backend:u         #   44,56% backend cycles idle         ( +-  9,57% )  (83,35%)
     373.047.679.552      instructions:u                   #    2,19  insn per cycle
                                                           #    0,11  stalled cycles per insn     ( +-  9,57% )  (83,34%)
      67.830.590.839      branches:u                       #    1,093 G/sec                       ( +-  9,58% )  (83,35%)
       1.702.825.180      branch-misses:u                  #    4,56% of all branches             ( +-  9,56% )  (83,28%)
  
              7,1623 +- 0,0560 seconds time elapsed  ( +-  0,78% )

Co-authored-by: Philipp A. <flying-sheep@web.de>

changed hvg with PR to work with numba

537a9a6

Intron7 requested review from jlause and Zethson August 14, 2023 11:54

Intron7 added this to the 1.10.0 milestone Aug 14, 2023

switch dense to 64bit

343b3a7

Zethson requested review from flying-sheep and removed request for Zethson August 14, 2023 12:07

Merge branch 'master' into hvg_PR_numba

72867ba

flying-sheep requested changes Aug 22, 2023

View reviewed changes

Intron7 and others added 3 commits August 22, 2023 13:10

Update scanpy/experimental/pp/_highly_variable_genes.py

d72c0b1

Co-authored-by: Philipp A. <flying-sheep@web.de>

Update scanpy/experimental/pp/_highly_variable_genes.py

6f0ac4a

Co-authored-by: Philipp A. <flying-sheep@web.de>

changed from inplace to return for numba

34569dd

Intron7 requested a review from flying-sheep August 22, 2023 11:55

flying-sheep requested changes Aug 22, 2023

View reviewed changes

Intron7 and others added 8 commits August 22, 2023 15:37

adding subfunctions

a8a98f9

eliminate 0 from csc

0be5c96

added docstring

20311ba

simplify calculate_res_dense

2a74429

add some typing

e9b3aad

simplify calculate_res

5c26296

deduplicate

c4c1845

unify

c21bee3

flying-sheep approved these changes Aug 22, 2023

View reviewed changes

flying-sheep merged commit 32a3567 into master Aug 22, 2023
10 checks passed

flying-sheep deleted the hvg_PR_numba branch August 22, 2023 15:20

flying-sheep added a commit that referenced this pull request Sep 7, 2023

changed hvg with PR to work with numba (#2612)

384682c

Co-authored-by: Philipp A. <flying-sheep@web.de>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

changed hvg with PR to work with numba #2612

changed hvg with PR to work with numba #2612

Intron7 commented Aug 14, 2023

codecov bot commented Aug 14, 2023 •

edited

flying-sheep left a comment •

edited

flying-sheep left a comment

flying-sheep commented Aug 22, 2023 •

edited

changed hvg with PR to work with numba #2612

changed hvg with PR to work with numba #2612

Conversation

Intron7 commented Aug 14, 2023

codecov bot commented Aug 14, 2023 • edited

Codecov Report

flying-sheep left a comment • edited

Choose a reason for hiding this comment

flying-sheep left a comment

Choose a reason for hiding this comment

flying-sheep commented Aug 22, 2023 • edited

codecov bot commented Aug 14, 2023 •

edited

flying-sheep left a comment •

edited

flying-sheep commented Aug 22, 2023 •

edited