fix non default edgecase bugs in harmony #668

Merged
Intron7 merged 4 commits into main from fix-bugs-harmony
May 19, 2026

Intron7 commented May 18, 2026

No description provided.

codecov-commenter commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.63%. Comparing base (12364ef) to head (f83de8b).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #668   +/-   ##
=======================================
  Coverage   88.63%   88.63%           
=======================================
  Files          98       98           
  Lines        7364     7364           
=======================================
  Hits         6527     6527           
  Misses        837      837           

Intron7 commented May 19, 2026

@coderabbitai review

coderabbitai Bot commented May 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot commented May 19, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed memory alignment issues in Harmony CUDA kernels for certain input layouts
    • Resolved synchronization race condition in column reduction operations
    • Enhanced error detection and reporting for CUDA kernel operations
  • Tests

    • Added alignment-aware test coverage for scatter operations
    • Expanded kernel validation test suite

Walkthrough

This PR systematically fixes Harmony CUDA kernel correctness and scalability issues: preventing 32-bit grid-dimension overflow via strided_grid helper with long long arithmetic, adding runtime alignment checks before vectorized loads with scalar fallbacks, converting kernels to proper grid-stride loop patterns, adding missing synchronization barriers, and adding explicit error handling for CUB sort operations. New tests validate the fixes.
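The overflow fix and the grid-stride conversion depend on each other: once the block count is clamped, each thread has to walk the index space in steps of the total thread count, or elements past the clamp would be skipped. The host-side C++ sketch below illustrates that arithmetic only. The names `clamped_grid` and `visit_counts` are hypothetical and are not the PR's `strided_grid` helper, whose signature is not shown in this thread; returning one block for an empty input is likewise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of blocks needed to cover n elements,
// computed in 64-bit arithmetic and clamped to max_blocks (at most INT_MAX).
inline int clamped_grid(long long n, int threads_per_block,
                        long long max_blocks = INT_MAX) {
    assert(threads_per_block > 0);
    if (n <= 0) return 1;  // assumption: always launch at least one block
    long long blocks = (n + threads_per_block - 1) / threads_per_block;
    return static_cast<int>(std::min(blocks, max_blocks));
}

// Host-side simulation of a grid-stride loop: counts how many times each
// index in [0, n) is visited by the emulated grid of `grid` x `block` threads.
inline std::vector<int> visit_counts(long long n, int grid, int block) {
    std::vector<int> counts(static_cast<std::size_t>(n), 0);
    const long long stride = static_cast<long long>(grid) * block;
    for (int b = 0; b < grid; ++b) {
        for (int t = 0; t < block; ++t) {
            // Same index math a kernel would use with blockIdx.x / threadIdx.x.
            for (long long i = static_cast<long long>(b) * block + t; i < n;
                 i += stride) {
                ++counts[static_cast<std::size_t>(i)];
            }
        }
    }
    return counts;
}
```

With the grid artificially clamped to 2 blocks of 256 threads, `visit_counts(1000, 2, 256)` still touches every index exactly once, which is the property the grid-stride pattern is meant to preserve.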

Changes

Harmony CUDA Kernel Correctness and Scalability

Layer: Grid sizing safety: strided_grid helper and kernel launches
File(s): src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu, src/rapids_singlecell/_cuda/harmony/correction/correction_batched.cu, src/rapids_singlecell/_cuda/harmony/correction/correction_fast.cu, src/rapids_singlecell/_cuda/harmony/outer/outer.cu, src/rapids_singlecell/_cuda/harmony/scatter/scatter.cu
Summary: grid_1d helper now uses long long arithmetic and clamps to INT_MAX to prevent 32-bit overflow; all kernel launch sites across clustering, correction, outer, and scatter modules updated to use strided_grid(...) for safe grid dimension calculation.

Layer: Alignment-aware vector loading
File(s): src/rapids_singlecell/_cuda/harmony/kmeans/kernels_kmeans.cuh, src/rapids_singlecell/_cuda/harmony/scatter/kernels_scatter.cuh
Summary: kmeans_err_kernel implements per-type vector traits (float4, double2) with alignment detection and scalar fallback; scatter bias kernels check pointer alignment via uintptr_t before reinterpret-casting to vectorized loads, falling back to scalar element access on misalignment.

Layer: Grid-stride loops and synchronization fixes
File(s): src/rapids_singlecell/_cuda/harmony/correction/kernels_correction_fast.cuh, src/rapids_singlecell/_cuda/harmony/outer/kernels_outer.cuh, src/rapids_singlecell/_cuda/harmony/scatter/kernels_scatter.cuh, src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh
Summary: Three outer-module kernels converted to grid-stride for loops over long long indices; gather_column_kernel and scatter_add_kernel updated to loop-based patterns; missing __syncthreads() barrier added in colsum reduction; colsum atomic initialization condition changed from threadIdx.x < 32 to threadIdx.y == 0.

Layer: Error handling for CUB operations
File(s): src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu
Summary: CUB DeviceRadixSort::SortPairs calls now capture cudaError_t and throw std::runtime_error with cudaGetErrorString on failure in both temp-storage queries and actual sort execution.

Layer: Test validation for alignment and kernel correctness
File(s): tests/test_harmony.py, tests/test_harmony_kernels.py
Summary: New test_scatter_add_bias_csr_alignment test validates scatter kernel behavior with offset and misaligned inputs across dtypes; new test_colsum compares kernel output to CuPy sum(axis=0) reference; new test_kmeans_err validates error computation for offset contiguous arrays.

Layer: Release notes documentation
File(s): docs/release-notes/0.15.1.md
Summary: Bug fixes section documents alignment and race-condition fixes: misaligned vector load avoidance for odd-PC and offset-contiguous layouts, shared-memory race prevention in column-sum reductions.
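The alignment-aware loading entry describes checking a pointer's address before reinterpreting it for vector loads, with a scalar path when the check fails. A rough host-side C++ sketch of that pattern follows. It assumes a 16-byte `Vec4` struct as a stand-in for CUDA's `float4`; `is_vec4_aligned` and `sum_with_fallback` are hypothetical names and are not taken from the PR's kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for CUDA's float4 (assumption: 16 bytes, 16-byte alignment).
struct alignas(16) Vec4 { float x, y, z, w; };
static_assert(sizeof(Vec4) == 16, "Vec4 must pack four floats");

// True when p could safely be reinterpreted as a pointer to Vec4.
inline bool is_vec4_aligned(const float* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(Vec4) == 0;
}

// Sum n floats, using 4-wide loads only when the base pointer is aligned.
inline float sum_with_fallback(const float* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    float total = 0.0f;
    std::size_t i = 0;
    if (is_vec4_aligned(p)) {
        for (; i + 4 <= n; i += 4) {
            Vec4 v;
            // Host-safe stand-in for a vectorized float4 load.
            std::memcpy(&v, p + i, sizeof(Vec4));
            total += v.x + v.y + v.z + v.w;
        }
    }
    // Scalar tail after the vector loop, or the full scalar fallback
    // when the pointer was misaligned (i is still 0 in that case).
    for (; i < n; ++i) total += p[i];
    return total;
}
```

An array sliced at an odd element offset, such as `buf + 1` for a 16-byte-aligned `buf`, fails the check and takes the scalar path, which mirrors the offset-contiguous layouts the new tests target.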

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name: Description check
Status: ⚠️ Warning
Explanation: No pull request description was provided by the author, making it impossible to assess whether a description relates to the changeset.
Resolution: Add a detailed pull request description explaining the specific bugs fixed, the changes made, and any testing performed to verify the fixes.

Check name: Docstring Coverage
Status: ⚠️ Warning
Explanation: Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

Check name: Title check
Status: ❓ Inconclusive
Explanation: The title 'fix non default edgecase bugs in harmony' is partially related to the changeset, referring to bug fixes in the Harmony CUDA module, but uses vague language ('non default edgecase') that doesn't clearly convey the specific nature or scope of the fixes.
Resolution: Consider using a more specific title that clearly describes the main fixes, such as 'Fix Harmony CUDA kernel alignment and synchronization issues' or similar.

✅ Passed checks (2 passed)

Check name: Linked Issues check
Status: ✅ Passed
Explanation: Check skipped because no linked issues were found for this pull request.

Check name: Out of Scope Changes check
Status: ✅ Passed
Explanation: Check skipped because no linked issues were found for this pull request.


coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh`:
- Line 46: The col_sums shared array and its initialization are using a
hardcoded 32 and lack a defensive bounds check; change the declaration
__shared__ T col_sums[32] to use a named constant (e.g., constexpr int WARP_SIZE
or COL_SUMS_SIZE) and update any usages to that constant, and guard the
initializer so it only writes when threadIdx.y == 0 && threadIdx.x <
COL_SUMS_SIZE (or WARP_SIZE) to prevent out-of-bounds access if launch
dimensions change; update any other places that assumed the literal 32 to
reference the new constant.

In `@tests/test_harmony_kernels.py`:
- Line 32: The file-level skip guard is missing two CUDA module symbols used
later (_cl and _corr), so add them to the OR condition alongside _norm, _pen,
_scatter, _colsum, and _km; update the conditional that checks for missing
modules to include _cl and _corr so the test file will skip cleanly when any of
these CUDA dependencies are None (look for the boolean expression referencing
_norm/_pen/_scatter/_colsum/_km and append checks for _cl is None and _corr is
None).
- Around line 247-281: The tests test_colsum_columns_multiple_cols_per_block and
test_kmeans_err_offset_contiguous currently only compare CuPy results to other
CuPy expressions and lack edge cases; update them to validate against CPU
(NumPy) reference computations and add edge-case inputs (empty input rows=0,
single-row rows=1, plus a larger stress case) by enumerating shapes inside each
test or parametrizing over shapes; for colsum compute expected =
numpy_array.sum(axis=0) using x.get() (or recreate via numpy with same rng seed
and dtype conversion) and compare with cp.testing.assert_allclose(out, expected,
appropriate atol/rtol), and for kmeans_err compute expected = numpy.sum(r * 2 *
(1 - dot)) using NumPy arrays (matching dtype) and compare out[0] to expected;
ensure dtype mapping between CuPy and NumPy and keep existing tolerances for
float32/float64.

In `@tests/test_harmony.py`:
- Around line 303-340: Parametrize test_scatter_add_bias_csr_alignment to run
multiple n_cells edge cases (e.g., 0, 1, the existing 23, and a larger stress
case like 1024) so the kernel is validated for empty input, single-row input,
normal and large inputs; ensure all downstream arrays (X_base_np, X_np, bias_np,
cats_np), the call to _create_category_index_mapping(cats, n_batches), and the
call to _scatter_add_cp_bias_csr(...) use the current n_cells value and that the
expected_np computation (including expected_np[0] and each batch block
expected_np[batch+1]) correctly handles n_cells == 0 and n_cells == 1 shapes;
keep the existing dtype/n_pcs/offset parametrization and numerical tolerances.
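
The review comments above ask for CPU reference computations and edge-case shapes such as empty and single-row inputs. The review proposes NumPy for this; the same reference, written as a C++ sketch for illustration, could look like the following. The function name `kmeans_err_reference` is hypothetical, the formula is the one quoted in the comment (`sum(r * 2 * (1 - dot))`), and accumulating in `double` is an assumption made here to keep the reference stable for `float` inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the expression quoted in the review comment:
//   err = sum_i r[i] * 2 * (1 - dot[i])
// r and dot are flattened arrays of equal length. Accumulation is done in
// double regardless of T (an assumption, not something the PR specifies).
template <typename T>
double kmeans_err_reference(const std::vector<T>& r,
                            const std::vector<T>& dot) {
    assert(r.size() == dot.size());
    double err = 0.0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        err += static_cast<double>(r[i]) * 2.0 *
               (1.0 - static_cast<double>(dot[i]));
    }
    return err;
}
```

An empty input yields 0.0 without any special casing, so the rows=0 case suggested in the review falls out of the loop bounds directly.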

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c5c9307d-5b3d-432a-a301-f9296b039f34

📥 Commits

Reviewing files that changed from the base of the PR and between 12364ef and 04955ca.

📒 Files selected for processing (13)
  • docs/release-notes/0.15.1.md
  • src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu
  • src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh
  • src/rapids_singlecell/_cuda/harmony/correction/correction_batched.cu
  • src/rapids_singlecell/_cuda/harmony/correction/correction_fast.cu
  • src/rapids_singlecell/_cuda/harmony/correction/kernels_correction_fast.cuh
  • src/rapids_singlecell/_cuda/harmony/kmeans/kernels_kmeans.cuh
  • src/rapids_singlecell/_cuda/harmony/outer/kernels_outer.cuh
  • src/rapids_singlecell/_cuda/harmony/outer/outer.cu
  • src/rapids_singlecell/_cuda/harmony/scatter/kernels_scatter.cuh
  • src/rapids_singlecell/_cuda/harmony/scatter/scatter.cu
  • tests/test_harmony.py
  • tests/test_harmony_kernels.py

Comment thread src/rapids_singlecell/_cuda/harmony/colsum/kernels_colsum.cuh Outdated
Comment thread tests/test_harmony_kernels.py Outdated
Comment thread tests/test_harmony_kernels.py
Comment thread tests/test_harmony.py
Intron7 enabled auto-merge (squash) May 19, 2026 08:58
Intron7 merged commit 282e207 into main May 19, 2026
21 of 25 checks passed
Intron7 deleted the fix-bugs-harmony branch May 19, 2026 09:05