fix non default edgecase bugs in harmony#668
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #668 +/- ##
=======================================
Coverage 88.63% 88.63%
=======================================
Files 98 98
Lines 7364 7364
=======================================
Hits 6527 6527
Misses 837 837 |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughSummary by CodeRabbitRelease Notes
WalkthroughThis PR systematically fixes Harmony CUDA kernel correctness and scalability issues: preventing 32-bit grid-dimension overflow via ChangesHarmony CUDA Kernel Correctness and Scalability
🎯 4 (Complex) | ⏱️ ~75 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh`:
- Line 46: The col_sums shared array and its initialization are using a
hardcoded 32 and lack a defensive bounds check; change the declaration
__shared__ T col_sums[32] to use a named constant (e.g., constexpr int WARP_SIZE
or COL_SUMS_SIZE) and update any usages to that constant, and guard the
initializer so it only writes when threadIdx.y == 0 && threadIdx.x <
COL_SUMS_SIZE (or WARP_SIZE) to prevent out-of-bounds access if launch
dimensions change; update any other places that assumed the literal 32 to
reference the new constant.
In `@tests/test_harmony_kernels.py`:
- Line 32: The file-level skip guard is missing two CUDA module symbols used
later (_cl and _corr), so add them to the OR condition alongside _norm, _pen,
_scatter, _colsum, and _km; update the conditional that checks for missing
modules to include _cl and _corr so the test file will skip cleanly when any of
these CUDA dependencies are None (look for the boolean expression referencing
_norm/_pen/_scatter/_colsum/_km and append checks for _cl is None and _corr is
None).
- Around line 247-281: The tests test_colsum_columns_multiple_cols_per_block and
test_kmeans_err_offset_contiguous currently only compare CuPy results to other
CuPy expressions and lack edge cases; update them to validate against CPU
(NumPy) reference computations and add edge-case inputs (empty input rows=0,
single-row rows=1, plus a larger stress case) by enumerating shapes inside each
test or parametrizing over shapes; for colsum compute expected =
numpy_array.sum(axis=0) using x.get() (or recreate via numpy with same rng seed
and dtype conversion) and compare with cp.testing.assert_allclose(out, expected,
appropriate atol/rtol), and for kmeans_err compute expected = numpy.sum(r * 2 *
(1 - dot)) using NumPy arrays (matching dtype) and compare out[0] to expected;
ensure dtype mapping between CuPy and NumPy and keep existing tolerances for
float32/float64.
In `@tests/test_harmony.py`:
- Around line 303-340: Parametrize test_scatter_add_bias_csr_alignment to run
multiple n_cells edge cases (e.g., 0, 1, the existing 23, and a larger stress
case like 1024) so the kernel is validated for empty input, single-row input,
normal and large inputs; ensure all downstream arrays (X_base_np, X_np, bias_np,
cats_np), the call to _create_category_index_mapping(cats, n_batches), and the
call to _scatter_add_cp_bias_csr(...) use the current n_cells value and that the
expected_np computation (including expected_np[0] and each batch block
expected_np[batch+1]) correctly handles n_cells == 0 and n_cells == 1 shapes;
keep the existing dtype/n_pcs/offset parametrization and numerical tolerances.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c5c9307d-5b3d-432a-a301-f9296b039f34
📒 Files selected for processing (13)
docs/release-notes/0.15.1.mdsrc/rapids_singlecell/_cuda/harmony/clustering/clustering.cusrc/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuhsrc/rapids_singlecell/_cuda/harmony/correction/correction_batched.cusrc/rapids_singlecell/_cuda/harmony/correction/correction_fast.cusrc/rapids_singlecell/_cuda/harmony/correction/kernels_correction_fast.cuhsrc/rapids_singlecell/_cuda/harmony/kmeans/kernels_kmeans.cuhsrc/rapids_singlecell/_cuda/harmony/outer/kernels_outer.cuhsrc/rapids_singlecell/_cuda/harmony/outer/outer.cusrc/rapids_singlecell/_cuda/harmony/scatter/kernels_scatter.cuhsrc/rapids_singlecell/_cuda/harmony/scatter/scatter.cutests/test_harmony.pytests/test_harmony_kernels.py
No description provided.