fix non default edgecase bugs in harmony by Intron7 · Pull Request #668 · scverse/rapids-singlecell

Intron7 · 2026-05-18T21:30:07Z

No description provided.

codecov-commenter · 2026-05-18T21:48:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.63%. Comparing base (12364ef) to head (f83de8b).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #668   +/-   ##
=======================================
  Coverage   88.63%   88.63%           
=======================================
  Files          98       98           
  Lines        7364     7364           
=======================================
  Hits         6527     6527           
  Misses        837      837

Intron7 · 2026-05-19T08:25:38Z

@coderabbitai review

coderabbitai · 2026-05-19T08:25:43Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-19T08:29:16Z

📝 Walkthrough

Summary by CodeRabbit

Release Notes

Bug Fixes
- Fixed memory alignment issues in Harmony CUDA kernels for certain input layouts
- Resolved synchronization race condition in column reduction operations
- Enhanced error detection and reporting for CUDA kernel operations
Tests
- Added alignment-aware test coverage for scatter operations
- Expanded kernel validation test suite

Walkthrough

This PR fixes correctness and scalability issues in the Harmony CUDA kernels. It prevents 32-bit grid-dimension overflow through a `strided_grid` helper that uses `long long` arithmetic, adds runtime alignment checks before vectorized loads (with scalar fallbacks), converts kernels to grid-stride loops, adds missing synchronization barriers, and adds explicit error handling for CUB sort operations. New tests validate the fixes.
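
To make the grid-sizing point concrete, here is a minimal host-side sketch in plain Python rather than CUDA. It is not the code from this PR: the Python `strided_grid` below is a hypothetical stand-in whose signature and lower clamp of 1 are assumptions, and `wrap_int32` and `covered_by_grid_stride` are illustrative helpers. It shows why a ceil-division done in 32-bit arithmetic can go wrong for very large inputs, and why a clamped grid still covers every element when the kernel uses a grid-stride loop.

```python
INT_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def wrap_int32(value: int) -> int:
    """Emulate storing `value` in a signed 32-bit integer (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def strided_grid(n_items: int, block_size: int) -> int:
    """Hypothetical stand-in for the helper named in the walkthrough.

    Ceil-divide in wide arithmetic (Python ints never overflow), then clamp
    to [1, INT_MAX]. The lower bound of 1 is an assumption of this sketch.
    """
    blocks = (n_items + block_size - 1) // block_size
    return max(1, min(blocks, INT_MAX))


def covered_by_grid_stride(n_items: int, n_blocks: int, block_size: int) -> list:
    """Emulate a grid-stride loop and return the sorted indices it visits."""
    stride = n_blocks * block_size  # total number of threads in the grid
    visited = []
    for tid in range(stride):  # one iteration per emulated thread
        i = tid
        while i < n_items:  # each thread hops forward by the grid size
            visited.append(i)
            i += stride
    return sorted(visited)


# The intermediate sum n_items + block_size - 1 no longer fits in 32 bits.
overflowed = wrap_int32(3_000_000_000 + 256 - 1)
safe_blocks = strided_grid(3_000_000_000, 256)

# A grid smaller than the problem still touches every index exactly once.
small_grid_cover = covered_by_grid_stride(n_items=10, n_blocks=2, block_size=2)
```

The clamp only makes sense together with the grid-stride loop: once the grid can be smaller than the number of elements, each thread has to process more than one index.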

Changes

Harmony CUDA Kernel Correctness and Scalability

| Layer / File(s) | Summary |
| --- | --- |
| Grid sizing safety: strided_grid helper and kernel launches `src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu`, `src/rapids_singlecell/_cuda/harmony/correction/correction_batched.cu`, `src/rapids_singlecell/_cuda/harmony/correction/correction_fast.cu`, `src/rapids_singlecell/_cuda/harmony/outer/outer.cu`, `src/rapids_singlecell/_cuda/harmony/scatter/scatter.cu` | `grid_1d` helper now uses `long long` arithmetic and clamps to `INT_MAX` to prevent 32-bit overflow; all kernel launch sites across clustering, correction, outer, and scatter modules updated to use `strided_grid(...)` for safe grid dimension calculation. |
| Alignment-aware vector loading `src/rapids_singlecell/_cuda/harmony/kmeans/kernels_kmeans.cuh`, `src/rapids_singlecell/_cuda/harmony/scatter/kernels_scatter.cuh` | `kmeans_err_kernel` implements per-type vector traits (`float4`, `double2`) with alignment detection and scalar fallback; scatter bias kernels check pointer alignment via `uintptr_t` before reinterpret-casting to vectorized loads, falling back to scalar element access on misalignment. |
| Grid-stride loops and synchronization fixes `src/rapids_singlecell/_cuda/harmony/correction/kernels_correction_fast.cuh`, `src/rapids_singlecell/_cuda/harmony/outer/kernels_outer.cuh`, `src/rapids_singlecell/_cuda/harmony/scatter/kernels_scatter.cuh`, `src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh` | Three outer-module kernels converted to grid-stride `for` loops over `long long` indices; gather_column_kernel and scatter_add_kernel updated to loop-based patterns; missing `__syncthreads()` barrier added in colsum reduction; colsum atomic initialization condition changed from `threadIdx.x < 32` to `threadIdx.y == 0`. |
| Error handling for CUB operations `src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu` | CUB `DeviceRadixSort::SortPairs` calls now capture `cudaError_t` and throw `std::runtime_error` with `cudaGetErrorString` on failure in both temp-storage queries and actual sort execution. |
| Test validation for alignment and kernel correctness `tests/test_harmony.py`, `tests/test_harmony_kernels.py` | New `test_scatter_add_bias_csr_alignment` test validates scatter kernel behavior with offset and misaligned inputs across dtypes; new `test_colsum` compares kernel output to CuPy `sum(axis=0)` reference; new `test_kmeans_err` validates error computation for offset contiguous arrays. |
| Release notes documentation `docs/release-notes/0.15.1.md` | Bug fixes section documents alignment and race-condition fixes: misaligned vector load avoidance for odd-PC and offset-contiguous layouts, shared-memory race prevention in column-sum reductions. |
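
The alignment rows above can be illustrated with a small address calculation. This is a sketch in plain Python, not the kernel code: `is_aligned`, `row_address`, and `load_path` are hypothetical names, the base addresses are made-up numbers, and the 16-byte requirement is the size of a `float4`. It shows why an odd number of PCs or an array view that starts a few elements into a buffer can leave a row start off a 16-byte boundary, which is the case where a scalar fallback is needed.

```python
FLOAT4_BYTES = 16  # a float4 vector load reads four 4-byte floats at once


def is_aligned(address: int, vector_bytes: int) -> bool:
    """True when `address` is a multiple of the vector width in bytes."""
    return address % vector_bytes == 0


def row_address(base: int, row: int, n_cols: int, itemsize: int) -> int:
    """Byte address of the first element of `row` in a row-major matrix."""
    return base + row * n_cols * itemsize


def load_path(base: int, row: int, n_cols: int, itemsize: int = 4,
              vector_bytes: int = FLOAT4_BYTES) -> str:
    """Pick the load strategy for one row: vectorized only when aligned."""
    address = row_address(base, row, n_cols, itemsize)
    return "vector" if is_aligned(address, vector_bytes) else "scalar"


ALIGNED_BASE = 4096           # hypothetical allocation start, 16-byte aligned
OFFSET_BASE = ALIGNED_BASE + 4  # same buffer viewed from one float32 further in

even_pcs = load_path(ALIGNED_BASE, row=1, n_cols=20)  # row stride 80 bytes
odd_pcs = load_path(ALIGNED_BASE, row=1, n_cols=21)   # row stride 84 bytes
odd_pcs_row4 = load_path(ALIGNED_BASE, row=4, n_cols=21)  # 4 * 84 = 336 bytes
offset_view = load_path(OFFSET_BASE, row=0, n_cols=20)
```

With 21 float32 columns, only every fourth row starts on a 16-byte boundary, so the check has to happen per pointer at runtime and cannot be decided once from the dtype alone.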

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | No pull request description was provided by the author, making it impossible to assess whether a description relates to the changeset. | Add a detailed pull request description explaining the specific bugs fixed, the changes made, and any testing performed to verify the fixes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'fix non default edgecase bugs in harmony' is partially related to the changeset, referring to bug fixes in the Harmony CUDA module, but uses vague language ('non default edgecase') that doesn't clearly convey the specific nature or scope of the fixes. | Consider using a more specific title that clearly describes the main fixes, such as 'Fix Harmony CUDA kernel alignment and synchronization issues' or similar. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh`:
- Line 46: The col_sums shared array and its initialization are using a
hardcoded 32 and lack a defensive bounds check; change the declaration
__shared__ T col_sums[32] to use a named constant (e.g., constexpr int WARP_SIZE
or COL_SUMS_SIZE) and update any usages to that constant, and guard the
initializer so it only writes when threadIdx.y == 0 && threadIdx.x <
COL_SUMS_SIZE (or WARP_SIZE) to prevent out-of-bounds access if launch
dimensions change; update any other places that assumed the literal 32 to
reference the new constant.
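
As an illustration of the guard requested in the colsum comment above, here is a minimal emulation in plain Python, not CUDA. `COL_SUMS_SIZE` follows the name suggested in the comment; `init_col_sums` and its `guard_x` flag are hypothetical. A Python `IndexError` stands in for what would be an out-of-bounds shared-memory write on the device, where no exception would be raised.

```python
COL_SUMS_SIZE = 32  # named constant replacing the literal 32


def init_col_sums(block_dim_x: int, block_dim_y: int, guard_x: bool) -> list:
    """Emulate every thread of one block running the shared-array initializer.

    guard_x=True  models `threadIdx.y == 0 && threadIdx.x < COL_SUMS_SIZE`.
    guard_x=False models `threadIdx.y == 0` alone.
    """
    col_sums = [None] * COL_SUMS_SIZE  # stands in for __shared__ T col_sums[...]
    for ty in range(block_dim_y):
        for tx in range(block_dim_x):
            writes = ty == 0 and (tx < COL_SUMS_SIZE or not guard_x)
            if writes:
                # An IndexError here models an out-of-bounds write.
                col_sums[tx] = 0.0
    return col_sums


# With a 32-wide block both guards behave the same.
default_launch = init_col_sums(block_dim_x=32, block_dim_y=8, guard_x=True)

# With a hypothetical 64-wide block only the bounded guard stays in range.
wide_launch = init_col_sums(block_dim_x=64, block_dim_y=4, guard_x=True)

try:
    init_col_sums(block_dim_x=64, block_dim_y=4, guard_x=False)
    unguarded_overruns = False
except IndexError:
    unguarded_overruns = True
```

The bounded guard costs nothing at the current launch dimensions and only matters if the block width is ever changed.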

In `@tests/test_harmony_kernels.py`:
- Line 32: The file-level skip guard is missing two CUDA module symbols used
later (_cl and _corr), so add them to the OR condition alongside _norm, _pen,
_scatter, _colsum, and _km; update the conditional that checks for missing
modules to include _cl and _corr so the test file will skip cleanly when any of
these CUDA dependencies are None (look for the boolean expression referencing
_norm/_pen/_scatter/_colsum/_km and append checks for _cl is None and _corr is
None).
- Around line 247-281: The tests test_colsum_columns_multiple_cols_per_block and
test_kmeans_err_offset_contiguous currently only compare CuPy results to other
CuPy expressions and lack edge cases; update them to validate against CPU
(NumPy) reference computations and add edge-case inputs (empty input rows=0,
single-row rows=1, plus a larger stress case) by enumerating shapes inside each
test or parametrizing over shapes; for colsum compute expected =
numpy_array.sum(axis=0) using x.get() (or recreate via numpy with same rng seed
and dtype conversion) and compare with cp.testing.assert_allclose(out, expected,
appropriate atol/rtol), and for kmeans_err compute expected = numpy.sum(r * 2 *
(1 - dot)) using NumPy arrays (matching dtype) and compare out[0] to expected;
ensure dtype mapping between CuPy and NumPy and keep existing tolerances for
float32/float64.
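
The CPU reference computations described above could look roughly like the following. This is a dependency-free sketch in plain Python so it runs without a GPU or NumPy; an actual test would more likely use NumPy arrays and `cp.testing.assert_allclose` as the comment suggests. The function names and the toy data generator are hypothetical, and only the formula `sum(r * 2 * (1 - dot))` is taken from the comment.

```python
def colsum_reference(matrix: list, n_cols: int) -> list:
    """Column sums of a list of rows; returns zeros when there are no rows."""
    sums = [0.0] * n_cols
    for row in matrix:
        for j, value in enumerate(row):
            sums[j] += value
    return sums


def kmeans_err_reference(r: list, dot: list) -> float:
    """sum(r * 2 * (1 - dot)) over all entries of two same-shaped matrices."""
    return sum(
        r_ij * 2.0 * (1.0 - d_ij)
        for r_row, d_row in zip(r, dot)
        for r_ij, d_ij in zip(r_row, d_row)
    )


def make_rows(n_rows: int, n_cols: int) -> list:
    """Deterministic toy data: entry (i, j) is float(i + j)."""
    return [[float(i + j) for j in range(n_cols)] for i in range(n_rows)]


# Edge-case shapes named in the comment: empty, single row, larger stress case.
empty_sum = colsum_reference(make_rows(0, 3), n_cols=3)
single_sum = colsum_reference(make_rows(1, 3), n_cols=3)
large_sum = colsum_reference(make_rows(1000, 3), n_cols=3)

# One row, two clusters: only the entry with dot == 0 contributes.
single_err = kmeans_err_reference(r=[[0.5, 0.5]], dot=[[1.0, 0.0]])
empty_err = kmeans_err_reference(r=[], dot=[])
```

Looping over the shapes inside one test, or parametrizing over them, keeps the empty and single-row cases next to the normal case so a regression in any of them shows up in the same place.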

In `@tests/test_harmony.py`:
- Around line 303-340: Parametrize test_scatter_add_bias_csr_alignment to run
multiple n_cells edge cases (e.g., 0, 1, the existing 23, and a larger stress
case like 1024) so the kernel is validated for empty input, single-row input,
normal and large inputs; ensure all downstream arrays (X_base_np, X_np, bias_np,
cats_np), the call to _create_category_index_mapping(cats, n_batches), and the
call to _scatter_add_cp_bias_csr(...) use the current n_cells value and that the
expected_np computation (including expected_np[0] and each batch block
expected_np[batch+1]) correctly handles n_cells == 0 and n_cells == 1 shapes;
keep the existing dtype/n_pcs/offset parametrization and numerical tolerances.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c5c9307d-5b3d-432a-a301-f9296b039f34

📥 Commits

Reviewing files that changed from the base of the PR and between 12364ef and 04955ca.

📒 Files selected for processing (13)

docs/release-notes/0.15.1.md
src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu
src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh
src/rapids_singlecell/_cuda/harmony/correction/correction_batched.cu
src/rapids_singlecell/_cuda/harmony/correction/correction_fast.cu
src/rapids_singlecell/_cuda/harmony/correction/kernels_correction_fast.cuh
src/rapids_singlecell/_cuda/harmony/kmeans/kernels_kmeans.cuh
src/rapids_singlecell/_cuda/harmony/outer/kernels_outer.cuh
src/rapids_singlecell/_cuda/harmony/outer/outer.cu
src/rapids_singlecell/_cuda/harmony/scatter/kernels_scatter.cuh
src/rapids_singlecell/_cuda/harmony/scatter/scatter.cu
tests/test_harmony.py
tests/test_harmony_kernels.py

fix non default edgecase bugs in harmony

d7f9558

Intron7 added the run-gpu-ci label May 18, 2026

github-actions Bot removed the run-gpu-ci label May 18, 2026

add release note

aaa5af5

Intron7 added the run-gpu-ci label May 18, 2026

github-actions Bot removed the run-gpu-ci label May 18, 2026

Merge branch 'main' into fix-bugs-harmony

04955ca

Intron7 added the run-gpu-ci label May 19, 2026

github-actions Bot removed the run-gpu-ci label May 19, 2026

coderabbitai Bot reviewed May 19, 2026


Comment thread src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh Outdated

Comment thread tests/test_harmony_kernels.py Outdated

Comment thread tests/test_harmony_kernels.py

Comment thread tests/test_harmony.py

add tests and bounds checks

f83de8b

Intron7 added the run-gpu-ci label May 19, 2026

github-actions Bot removed the run-gpu-ci label May 19, 2026

Intron7 enabled auto-merge (squash) May 19, 2026 08:58

Intron7 merged commit 282e207 into main May 19, 2026
21 of 25 checks passed

Intron7 deleted the fix-bugs-harmony branch May 19, 2026 09:05

Intron7 merged 4 commits into main from fix-bugs-harmony
