add pseudobulk distances for pertpy GPU#676
Conversation
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughSummary by CodeRabbitRelease Notes
WalkthroughThis PR adds six GPU-accelerated pseudobulk distance metrics (Euclidean, MSE, MAE, Pearson, Cosine, R²) for comparing group-level mean vectors in AnnData objects. It introduces paired/pairwise CUDA kernels, CuPy wrappers, metric implementations inheriting a shared workflow, and integration into the Distance API dispatcher. ChangesPseudobulk GPU metrics
🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/rapids_singlecell/_cuda/pseudobulk/kernels_pseudobulk.cuh (1)
48-48: 💤 Low valueUse named constant for shared memory array size.
The shared memory array size
32should be a named constant for consistency with the coding guidelines and to matchWARP_SIZE. This appears in both kernels.Suggested fix
static constexpr int WARP_SIZE = 32; +static constexpr int MAX_WARPS_PER_BLOCK = 32; // supports up to 1024 threads/block // ... in paired_kernel and pairwise_kernel: - __shared__ double s[32]; + __shared__ double s[MAX_WARPS_PER_BLOCK];Also applies to: 83-83
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rapids_singlecell/_cuda/pseudobulk/kernels_pseudobulk.cuh` at line 48, Replace the hard-coded shared-memory size literal 32 with the named constant used elsewhere (WARP_SIZE) in both shared array declarations (the lines declaring "__shared__ double s[32];" in each kernel) so the shared array is defined as "__shared__ double s[WARP_SIZE];" to follow the coding guideline and keep consistency with the existing WARP_SIZE constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/pertpy/test_distance_pseudobulk.py`:
- Line 397: The docstring text contains a unicode multiplication symbol "×"
which triggers Ruff RUF002; update the docstring in
tests/pertpy/test_distance_pseudobulk.py (the string: """K=1 pairwise returns a
1×1 zero matrix; onesided returns a scalar zero.""") to use a plain "x" instead
("""K=1 pairwise returns a 1x1 zero matrix; onesided returns a scalar zero.""")
so the linter no longer flags it.
- Around line 30-31: The helper incorrectly treats metric ==
"root_mean_squared_error" as the Euclidean norm; change the branch handling
metric to compute RMSE as np.sqrt(np.mean((mu_X - mu_Y) ** 2)) using the
existing mu_X and mu_Y instead of np.linalg.norm so the test compares against
the true root-mean-square error.
---
Nitpick comments:
In `@src/rapids_singlecell/_cuda/pseudobulk/kernels_pseudobulk.cuh`:
- Line 48: Replace the hard-coded shared-memory size literal 32 with the named
constant used elsewhere (WARP_SIZE) in both shared array declarations (the lines
declaring "__shared__ double s[32];" in each kernel) so the shared array is
defined as "__shared__ double s[WARP_SIZE];" to follow the coding guideline and
keep consistency with the existing WARP_SIZE constant.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a9be5f11-7432-4dcb-90b3-7b992ccd355a
📒 Files selected for processing (12)
CMakeLists.txtdocs/release-notes/0.15.2.mdsrc/rapids_singlecell/_cuda/pseudobulk/kernels_pseudobulk.cuhsrc/rapids_singlecell/_cuda/pseudobulk/pseudobulk.cusrc/rapids_singlecell/pertpy_gpu/_distance.pysrc/rapids_singlecell/pertpy_gpu/_metrics/__init__.pysrc/rapids_singlecell/pertpy_gpu/_metrics/_base_metric.pysrc/rapids_singlecell/pertpy_gpu/_metrics/_edistance.pysrc/rapids_singlecell/pertpy_gpu/_metrics/_kernels/__init__.pysrc/rapids_singlecell/pertpy_gpu/_metrics/_kernels/_pseudobulk.pysrc/rapids_singlecell/pertpy_gpu/_metrics/_pseudobulk.pytests/pertpy/test_distance_pseudobulk.py
Zethson
left a comment
There was a problem hiding this comment.
Thank you very much! Awesome stuff 🥇
|
Oh and @Intron7 it'd be awesome if you also showed this off super briefly somewhere in the tutorials, please. Like just 1-2 cells and then some intersphinx link to show what's available. Claude is also very good with tutorials and can surely do that for you super quickly. |
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #676 +/- ##
==========================================
+ Coverage 88.65% 88.83% +0.18%
==========================================
Files 98 100 +2
Lines 7375 7615 +240
==========================================
+ Hits 6538 6765 +227
- Misses 837 850 +13
|
This PR adds the pertpy's Pseudobulk distances to RSC