@coderabbitai review
✅ Actions performed
Review triggered.
Summary
This PR correctly addresses both root causes from
The CUDA crash fix (adaptive block-size halving for large
Issues
1.
📝 Walkthrough
Summary by CodeRabbit
Walkthrough
This PR resolves bbknn divergence from upstream
Changes
BBKNN Alignment and Trimming Improvements
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (3)
src/rapids_singlecell/preprocessing/_neighbors/_helper/__init__.py (1)
133-133: 💤 Low value
Consider adding a brief comment documenting the threshold rationale.
The constant `_TRIM_SORT_THRESHOLD = 100` lacks documentation. A brief comment explaining why 100 was chosen as the cutoff for kernel auto-selection would improve maintainability.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rapids_singlecell/preprocessing/_neighbors/_helper/__init__.py` at line 133, Add a brief inline comment above the constant _TRIM_SORT_THRESHOLD = 100 that documents the rationale for choosing 100 as the cutoff used in kernel auto-selection (e.g., tradeoff between sort/trim overhead and expected neighborhood sizes, empirical benchmark, or reference to the algorithmic behavior), mention units (number of neighbors/samples) and when it should be adjusted; keep the comment short (1–2 sentences) and reference _TRIM_SORT_THRESHOLD and the kernel auto-selection logic so future maintainers can understand and tune it.
tests/test_neighbors.py (2)
158-166: ⚡ Quick win
Consider comparing against scanpy reference instead of hard-coded thresholds.
The hard-coded thresholds (`mean < 0.7`, `> 0.99 fraction < 0.5`) are specific to `pbmc68k_reduced` and may be fragile if the dataset changes or when testing other datasets. Since the PR objective is to match `scanpy.external.pp.bbknn` behavior, comparing the connectivity weight distribution (e.g., mean, percentiles) against scanpy's output would provide a more robust regression guard.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_neighbors.py` around lines 158 - 166, The test test_bbknn_connectivities_not_collapsed uses fragile hard-coded thresholds; instead run scanpy.external.pp.bbknn on a copy of the same AnnData (use pbmc68k_reduced() to get a fresh copy), extract its connectivities via adata_ref.obsp["connectivities"].data and compare distribution metrics (e.g., mean and a percentile like 99th) between the bbknn output and the scanpy reference (from bbknn(adata, ...)) using numeric assertions (e.g., numpy.testing.assert_allclose or small relative tolerances) so the test asserts equality/near-equality to scanpy behavior rather than fixed thresholds. Ensure you reference the existing functions/objects: test_bbknn_connectivities_not_collapsed, pbmc68k_reduced, bbknn, and adata.obsp["connectivities"].data.
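The distribution comparison suggested above could look roughly like the following. This is a minimal, standard-library-only sketch: `summarize_weights` and `assert_matches_reference` are hypothetical helper names, and the 5% relative tolerance is an assumed placeholder, not a value taken from the PR. In the real test, the two inputs would be the `.data` arrays of `adata.obsp["connectivities"]` from the GPU `bbknn` run and from the `scanpy.external.pp.bbknn` reference run.

```python
import math
import statistics


def summarize_weights(weights):
    """Return (mean, 99th percentile) of a sequence of edge weights."""
    values = sorted(float(w) for w in weights)
    if not values:
        raise ValueError("no edge weights to summarize")
    mean = statistics.fmean(values)
    # Nearest-rank percentile: the smallest value with at least 99% of the data at or below it.
    rank = max(1, math.ceil(0.99 * len(values)))
    return mean, values[rank - 1]


def assert_matches_reference(weights, reference, rel_tol=0.05):
    """Fail if the mean or 99th percentile drifts from the reference (assumed tolerance)."""
    got = summarize_weights(weights)
    want = summarize_weights(reference)
    for name, g, w in zip(("mean", "p99"), got, want):
        assert math.isclose(g, w, rel_tol=rel_tol, abs_tol=1e-12), (
            f"{name}: got {g}, reference {w}"
        )


# Toy values standing in for the two connectivity arrays.
gpu_like = [0.2, 0.4, 0.6, 1.0]
cpu_like = [0.21, 0.39, 0.6, 1.0]
assert_matches_reference(gpu_like, cpu_like)
```

Comparing summary statistics instead of fixed cutoffs keeps the test tied to the reference behavior, so a collapsed distribution (all weights near 1.0) would still fail while a dataset change would not.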
147-222: 🏗️ Heavy lift
Add edge case coverage per coding guidelines.
All new tests use only `pbmc68k_reduced()`. The coding guidelines require tests to "cover edge cases (single row, empty input, max-size input) beyond just 'runs without error'". Consider adding:
- Single-row input to test per-row logic boundaries (sorting, trimming)
- Empty or minimal-batch scenarios for bbknn
- Large trim values relative to row nonzero counts to stress kernel selection
As per coding guidelines, Test validation: include numerical correctness checks against CPU reference implementations, and cover edge cases (single row, empty input, max-size input) beyond just 'runs without error'.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_neighbors.py` around lines 147 - 222, Add edge-case test cases alongside the existing ones: create new tests that call bbknn and trimming_gpu/trimming_cpu using a single-row AnnData (use pbmc68k_reduced() then slice to one observation to exercise per-row sorting/trimming boundaries), an empty/minimal-batch AnnData (zero rows or a single batch with zero neighbors) to verify bbknn handles empty inputs, and a test that sets trim much larger than available nonzeros to force the kernel paths; in each new test compare numerical results to the CPU reference (trimming_cpu, adata.obsp connectivities from bbknn, and X_to_GPU conversions) using the same assert_array_equal-style checks so GPU kernels and bbknn outputs match the CPU reference.
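To make the three edge cases concrete, here is a simplified pure-Python reference for per-row trimming on CSR arrays. It is only a sketch under stated assumptions: `trim_rows_reference` is a hypothetical name, and it keeps the `trim` largest values of each row (ties at the cutoff are kept) while zeroing the rest. The project's `trimming_cpu` and `trimming_gpu` may differ, for example by also trimming column-wise, so treat this as an illustration of what the edge-case tests would exercise, not as the library's algorithm.

```python
def trim_rows_reference(data, indptr, trim):
    """Zero every entry smaller than the trim-th largest value of its CSR row."""
    if trim < 1:
        raise ValueError("trim must be >= 1")
    out = list(data)
    for row in range(len(indptr) - 1):
        start, stop = indptr[row], indptr[row + 1]
        row_vals = out[start:stop]
        if len(row_vals) <= trim:
            continue  # fewer stored entries than trim: nothing to remove
        cutoff = sorted(row_vals, reverse=True)[trim - 1]
        for i in range(start, stop):
            if out[i] < cutoff:
                out[i] = 0.0
    return out


# Single row: only the two largest weights survive.
single_row = trim_rows_reference([0.1, 0.9, 0.5], [0, 3], trim=2)

# Empty matrix (zero rows): the result is empty and no row is visited.
empty = trim_rows_reference([], [0], trim=2)

# trim far larger than the row's nonzero count: the row is left unchanged.
oversized = trim_rows_reference([0.3, 0.7], [0, 2], trim=10)
```

A GPU test could run the same three inputs through the kernel under test and compare against such a CPU reference with an exact-equality assertion.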
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/rapids_singlecell/preprocessing/_neighbors/_helper/__init__.py`:
- Line 133: Add a brief inline comment above the constant _TRIM_SORT_THRESHOLD =
100 that documents the rationale for choosing 100 as the cutoff used in kernel
auto-selection (e.g., tradeoff between sort/trim overhead and expected
neighborhood sizes, empirical benchmark, or reference to the algorithmic
behavior), mention units (number of neighbors/samples) and when it should be
adjusted; keep the comment short (1–2 sentences) and reference
_TRIM_SORT_THRESHOLD and the kernel auto-selection logic so future maintainers
can understand and tune it.
In `@tests/test_neighbors.py`:
- Around line 158-166: The test test_bbknn_connectivities_not_collapsed uses
fragile hard-coded thresholds; instead run scanpy.external.pp.bbknn on a copy of
the same AnnData (use pbmc68k_reduced() to get a fresh copy), extract its
connectivities via adata_ref.obsp["connectivities"].data and compare
distribution metrics (e.g., mean and a percentile like 99th) between the bbknn
output and the scanpy reference (from bbknn(adata, ...)) using numeric
assertions (e.g., numpy.testing.assert_allclose or small relative tolerances) so
the test asserts equality/near-equality to scanpy behavior rather than fixed
thresholds. Ensure you reference the existing functions/objects:
test_bbknn_connectivities_not_collapsed, pbmc68k_reduced, bbknn, and
adata.obsp["connectivities"].data.
- Around line 147-222: Add edge-case test cases alongside the existing ones:
create new tests that call bbknn and trimming_gpu/trimming_cpu using a
single-row AnnData (use pbmc68k_reduced() then slice to one observation to
exercise per-row sorting/trimming boundaries), an empty/minimal-batch AnnData
(zero rows or a single batch with zero neighbors) to verify bbknn handles empty
inputs, and a test that sets trim much larger than available nonzeros to force
the kernel paths; in each new test compare numerical results to the CPU
reference (trimming_cpu, adata.obsp connectivities from bbknn, and X_to_GPU
conversions) using the same assert_array_equal-style checks so GPU kernels and
bbknn outputs match the CPU reference.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 15907752-43b0-4ab9-91a1-befb828eb7e1
📒 Files selected for processing (6)
- docs/release-notes/0.15.1.md
- src/rapids_singlecell/_cuda/bbknn/bbknn.cu
- src/rapids_singlecell/_cuda/bbknn/kernels_bbknn.cuh
- src/rapids_singlecell/preprocessing/_neighbors/__init__.py
- src/rapids_singlecell/preprocessing/_neighbors/_helper/__init__.py
- tests/test_neighbors.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #659      +/-   ##
==========================================
+ Coverage   88.13%   88.14%   +0.01%
==========================================
  Files          96       96
  Lines        7045     7060      +15
==========================================
+ Hits         6209     6223      +14
- Misses        836      837       +1
This fixes #657.