Skip to content

64 bit indptr csr support#620

Merged
Intron7 merged 11 commits into
mainfrom
get-ready-for-64-bit-indptr
May 18, 2026
Merged

64 bit indptr csr support#620
Intron7 merged 11 commits into
mainfrom
get-ready-for-64-bit-indptr

Conversation

@Intron7
Copy link
Copy Markdown
Member

@Intron7 Intron7 commented Mar 21, 2026

Cupy will add 64 bit indptr support for sparse matrices. This Draft PR is there to test the implementation and to later enable day 1 support.

Todo: Tests

@Intron7 Intron7 marked this pull request as ready for review May 18, 2026 12:17
@Intron7
Copy link
Copy Markdown
Member Author

Intron7 commented May 18, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR refactors CUDA kernels and Python bindings across 20+ modules to support 64-bit (long long) sparse index types alongside existing 32-bit (int) support. It introduces grid-stride loop infrastructure, generalizes kernels via IdxT templating, and consolidates repetitive binding code into reusable templates, with comprehensive parity testing.

Changes

64-bit Sparse Index Type Support & Binding Consolidation

Layer / File(s) Summary
Grid-stride infrastructure for large problem sizes
src/rapids_singlecell/_cuda/nb_types.h
Adds thread-local cached CUDA device property queries (max_grid_dims(), max_grid_dim_x/y/z()) and grid-stride loop helpers (strided_grid(), strided_grid_y()) to enable safe kernel launches when problem sizes exceed 32-bit range.
Sparse aggregation kernels and binding generalization
src/rapids_singlecell/_cuda/aggr/{kernels_aggr.cuh, aggr.cu}
Generalizes CSR/CSC aggregation kernels to template on IdxT for index pointers; switches dense aggregation grid sizing to strided_grid(); adds def_sparse_aggr, def_csr_to_coo, def_sparse_var binding helpers to register operations for both int and long long indices.
Autocorrelation and spatial statistics
src/rapids_singlecell/_cuda/autocorr/{kernels_autocorr.cuh, autocorr.cu}
Generalizes pre_den_sparse_kernel to use IdxT and grid-stride loops with long long nnz; adds templated def_morans_dense/sparse, def_gearys_dense/sparse, def_pre_den_sparse binding helpers for organizing registration by operation type.
Binding consolidation (non-index generalization)
src/rapids_singlecell/_cuda/{edistance,gmm,harmony/*,hvg}/...cu
Replaces repetitive per-dtype m.def(...) blocks with templated def_* helpers (def_compute_distances, def_e_step, def_kmeans_err, def_normalize, def_outer, def_*_scatter, def_expected_zeros) while preserving semantics and stream casting.
Mean/variance computation with grid-stride minor path
src/rapids_singlecell/_cuda/mean_var/{kernels_mv.cuh, mean_var.cu}
Templates mean_var_major/minor_kernel on IdxT; replaces int bounds-check in minor kernel with grid-stride loop over long long nnz; adds def_mean_var_major/minor binding helpers for both index types.
NaN-aware mean with index generalization
src/rapids_singlecell/_cuda/nanmean/{kernels_nanmean.cuh, nanmean.cu}
Templates nan_mean_minor/major_kernel on IdxT and replaces int bounds-checks with grid-stride loops; adds def_nan_mean_minor/major binding helpers; registers overloads for IdxT ∈ {int, long long}.
Normalization kernel suite
src/rapids_singlecell/_cuda/norm/{kernels_norm.cuh, norm.cu}
Generalizes six CSR kernels (csr_row_scale, csr_sum_major, find_hi_genes_csr, masked_mul, masked_sum_major, prescaled_mul) to template on IdxT; adds launch helpers and def_* binding templates; registers sparse operations for both index types while keeping dense operations single-dtype.
Residual normalization and CSC HVG operations
src/rapids_singlecell/_cuda/pr/{kernels_pr.cuh, kernels_pr_hvg.cuh, pr.cu}
Generalizes CSC/CSR residual-norm and CSC HVG kernels to use IdxT for index pointers and loop ranges; updates launch helpers and binding registration to support both int and long long indices.
QC sparse operations (CSC/CSR and sub variants)
src/rapids_singlecell/_cuda/qc/{kernels_qc.cuh, qc.cu}
Generalizes four QC kernels to use IdxT; adds def_sparse_qc_* binding helpers for CSC/CSR (including masked sub variants); expands registration to cover both index types.
QC Dask operations with grid-stride genes
src/rapids_singlecell/_cuda/qc_dask/{kernels_qcd.cuh, qc_kernels_dask.cu}
Templates CSR cells/genes kernels on IdxT; replaces int bounds-check in genes kernel with grid-stride loop over long long nnz; adds binding helpers and expands registration for both index types.
Scale difference kernels
src/rapids_singlecell/_cuda/scale/{kernels_scale.cuh, scale.cu}
Templates CSC/CSR scale kernels on IdxT with long long index arithmetic; updates dense kernel index computation; adds def_*_scale_diff binding helpers; registers sparse operations for both index types.
Sparse-to-dense conversion
src/rapids_singlecell/_cuda/sparse2dense/{kernels_s2d.cuh, sparse2dense.cu}
Templates kernel on IdxT; removes per-launch device property query and uses strided_grid_y() for grid sizing; expands def_sparse2dense binding to register for both index types and both memory orderings (f_contig, c_contig).
SPCA gram and gene checking
src/rapids_singlecell/_cuda/spca/{kernels_spca.cuh, spca.cu}
Templates gram upper and check-zero-genes kernels on IdxT; replaces int bounds-check in check-zero-genes with grid-stride loop over long long nnz; adds binding helpers; expands registration for both index types.
Wilcoxon histogram kernels
src/rapids_singlecell/_cuda/wilcoxon_binned/{kernels_wilcoxon_binned.cuh, wilcoxon_binned.cu}
Templates CSR/CSC histogram kernels on IdxT for row/column bounds and loop indices; adds def_*_hist binding helpers; expands registration for both index types.
Ligrec sparse operations
src/rapids_singlecell/_cuda/ligrec/{kernels_ligrec.cuh, ligrec.cu}
Templates sum/count and mean sparse kernels on IdxT; reorganizes large monolithic register_bindings into multiple templated def_* helpers (def_sum_count_dense/sparse, def_mean_dense/sparse, etc.); registers for both index types and data types.
Comprehensive parity test suite
tests/test_int64_indptr.py
Adds 40+ pytest test functions comparing int32 vs int64 sparse index paths; covers QC (sparse/dask/sub), mean/variance, nanmean, normalization (mul/sum/masked), residual, scale, sparse-to-dense, SPCA, aggregation, CSR-to-COO, variance grouping, Ligrec, autocorrelation, and histogram routines; uses exact equality for integers and tolerant comparison for atomic float accumulators.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • scverse/rapids-singlecell#619: Updates aggregation launch helpers in aggr.cu to use named block constants and error checking, overlapping directly with this PR's refactoring of the same sparse aggregation launch code paths.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch get-ready-for-64-bit-indptr

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/test_int64_indptr.py (1)

73-933: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add CPU-reference and edge-case assertions, not only int32/int64 parity.

This suite is great for dispatch parity, but it can still pass when both paths are wrong in the same way. Please add numerical checks against NumPy/SciPy references for each operation family, and parameterize edge cases (single-row, empty input, and a stress/max-size shape used in CI).

As per coding guidelines tests/**/*.py: Test validation: include numerical correctness checks against CPU reference implementations, and cover edge cases (single row, empty input, max-size input) beyond just 'runs without error'

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_int64_indptr.py` around lines 73 - 933, The tests currently only
assert int32 vs int64 parity; update each test family (functions like
test_sparse_qc_csr/test_sparse_qc_csc, test_mean_var_major/minor,
test_nan_mean_major/minor, test_sparse2dense_c/test_sparse2dense_f,
test_sparse_norm_res_csr/csc, test_sparse_aggr, test_sparse_var, etc.) to also
compute a CPU reference using NumPy/SciPy (using _make_dense_cpu or SciPy sparse
operations) and assert numerical equality (or use _close where floating rounding
is expected) between the GPU result and the CPU reference rather than only
between idx dtypes; additionally parameterize critical tests to include edge
cases (single-row, empty matrix, and a CI-stress large shape) by reusing
_make_csr/_make_csc with those shapes and adding assertions for those cases so
each kernel is validated for correctness and edge behavior beyond int32/int64
parity, keeping existing helpers _eq/_close for comparisons.
src/rapids_singlecell/_cuda/sparse2dense/kernels_s2d.cuh (1)

23-24: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add negative index guard to prevent out-of-bounds memory access.

Line 23 only checks idx >= minor but does not guard against negative idx values. If idx is negative, it passes the guard and produces incorrect calculations at lines 29/32, leading to out-of-bounds memory access in the atomicAdd at line 34.

Suggested fix
-            if (idx >= minor) {
+            if (idx < 0 || idx >= minor) {
                 continue;
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rapids_singlecell/_cuda/sparse2dense/kernels_s2d.cuh` around lines 23 -
24, The current guard only checks "idx >= minor" which allows negative idx
values to slip through and cause out-of-bounds writes in the subsequent
atomicAdd; modify the check around the kernel's index validation (the condition
that currently reads "if (idx >= minor) { continue; }") to also guard against
negative indices (e.g., ensure idx < 0 is handled) so that any idx < 0 or idx >=
minor will skip processing, preventing invalid memory accesses in the atomicAdd
using idx.
🧹 Nitpick comments (1)
src/rapids_singlecell/_cuda/mean_var/kernels_mv.cuh (1)

18-19: 💤 Low value

Hard-coded shared memory size should be a named constant.

The shared memory arrays use a raw literal 64 which should match BLOCK_SIZE_MAJOR defined in mean_var.cu. Per coding guidelines, numeric literals for block sizes and shared memory sizes should be defined as named constants.

Consider defining this in the header or passing as a template parameter to ensure consistency:

constexpr int BLOCK_SIZE_MAJOR = 64;
// ...
__shared__ double mean_place[BLOCK_SIZE_MAJOR];
__shared__ double var_place[BLOCK_SIZE_MAJOR];

As per coding guidelines: "All numeric literals for block sizes, tile dimensions, shared memory sizes... MUST be defined as named constants."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rapids_singlecell/_cuda/mean_var/kernels_mv.cuh` around lines 18 - 19,
Replace the hard-coded literal 64 used for shared memory arrays mean_place and
var_place with a named constant (e.g., BLOCK_SIZE_MAJOR) so the size stays in
sync with mean_var.cu; declare constexpr int BLOCK_SIZE_MAJOR = 64 in an
appropriate header or pass it as a template parameter and update the
declarations __shared__ double mean_place[BLOCK_SIZE_MAJOR]; and __shared__
double var_place[BLOCK_SIZE_MAJOR]; ensuring the symbol name matches the one
used in mean_var.cu.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/rapids_singlecell/_cuda/aggr/aggr.cu`:
- Around line 17-18: The sparse-kernel launch sites compute dim3
grid((unsigned)n_cells) and dim3 block(BLOCK_SIZE_SPARSE) and can produce
invalid launches when a dimension is zero; add an early return guard in each of
the four sparse launch helper functions so they return immediately if the work
size is zero (e.g., if n_cells==0 or any other computed grid dimension == 0)
before computing dim3 grid/block or launching the kernel (references: dim3 grid,
dim3 block, BLOCK_SIZE_SPARSE).

In `@src/rapids_singlecell/_cuda/aggr/kernels_aggr.cuh`:
- Around line 56-73: csr_to_coo_kernel currently truncates 64-bit indices by
casting index[p] to int and writing into int col buffers, corrupting indices ≥
INT_MAX; fix by propagating IdxT for column buffers and bindings: change
csr_to_coo_kernel signature to accept IdxT* __restrict__ col (and ensure row/col
uses IdxT where appropriate), update the nanobind binding that currently exposes
gpu_array_c<int, Device> to accept gpu_array_c<IdxT, Device> for the output
column array, and update the callers in _aggregated.py to allocate output column
buffers matching the input index dtype; alternatively, if API change is
unacceptable, add pre-launch validation that checks max(index) < INT_MAX and
fail with a clear error (guarding launch) and document the limitation.

In `@src/rapids_singlecell/_cuda/ligrec/kernels_ligrec.cuh`:
- Around line 36-38: The code narrows 64-bit sparse indices by assigning
index[gene] (type IdxT) to an int (gene_number) in the loop inside
kernels_ligrec.cuh; update that local variable to use IdxT (or another 64-bit
type) so indices aren't truncated and repeat the same change in
mean_sparse_kernel, ensuring all uses that compute offsets/column indices (e.g.,
the local gene_number and any index-based arithmetic) operate in 64-bit to match
the explicit instantiation with IdxT = long long.

In `@src/rapids_singlecell/_cuda/ligrec/ligrec.cu`:
- Around line 180-197: The binding def_interaction advertises stream support but
the kernel launch inside launch_interaction ignores the passed stream; update
launch_interaction<T> to take a cudaStream_t (or std::uintptr_t cast to
cudaStream_t) and use that stream in the kernel launch (use <<<grid, block, 0,
stream>>> or the equivalent CUDA launch API) so ordering with upstream writes to
mean/mask/g and downstream reads from res is respected; keep the def_interaction
call to launch_interaction<T>(..., (cudaStream_t)stream) but change
launch_interaction's signature and its internal kernel invocation to accept and
use the stream.
- Around line 24-35: The sparse launcher launch_sum_count_sparse can compute
grid.x == 0 when rows == 0 which yields an invalid CUDA launch; add an
early-return guard at the top of launch_sum_count_sparse (and any other sparse
launcher that computes grid from rows) to return immediately when rows <= 0 so
no kernel is launched, and keep the rest unchanged; also fix launch_interaction
so the interaction kernel is launched with the provided cudaStream_t stream (use
<<<grid, block, 0, stream>>> for interaction_kernel) instead of the current
launch that ignores the stream. Ensure you reference the functions
sum_and_count_sparse_kernel and interaction_kernel in the fixes.

In `@src/rapids_singlecell/_cuda/nb_types.h`:
- Around line 31-45: The function max_grid_dims reads CUDA results without
checking return codes, so wrap the calls to cudaGetDevice and
cudaGetDeviceProperties in error-checks inside max_grid_dims: call cudaGetDevice
and if it returns non-success do not change cached_dev or cached (keep the safe
fallback) and return cached; if cudaGetDevice succeeds then call
cudaGetDeviceProperties and only update cached[0..2] and cached_dev when
cudaGetDeviceProperties returns success; optionally log or assert the CUDA error
before returning the fallback. Ensure you reference and protect the existing
static thread_local variables cached and cached_dev and the calls to
cudaGetDevice and cudaGetDeviceProperties in the max_grid_dims function.

In `@src/rapids_singlecell/_cuda/sparse2dense/sparse2dense.cu`:
- Around line 12-13: The max_nnz parameter is currently declared as int and must
be widened to long long everywhere to avoid truncation for per-row nnz >
INT_MAX; update the function signatures (the top-level host function(s) in
sparse2dense.cu that accept max_nnz), any device/kernel wrappers and internal
calls that pass max_nnz to strided_grid_y(), and all call sites so that max_nnz
is consistently long long; ensure strided_grid_y() invocations and any
arithmetic or loop bounds using max_nnz are updated to use 64-bit types to
preserve correct grid_y calculation and kernel grid-stride behavior.

In `@src/rapids_singlecell/_cuda/spca/kernels_spca.cuh`:
- Around line 56-58: The code casts indices[value] (IdxT) to int in
kernels_spca.cuh before bounds checking, which can truncate 64-bit indices; fix
by first reading into the wide type (e.g., IdxT raw = indices[value] or int64_t
tmp = static_cast<int64_t>(indices[value])), validate that raw is within [0,
num_genes-1] and also fits in int32 range (<= INT_MAX and >= INT_MIN) and only
then static_cast<int> to produce gene_index used by
atomicAdd(&genes[gene_index], 1); update the checks around gene_index/indices so
bounds checks operate on the wide type (raw/tmp) rather than the narrowed int to
avoid silent truncation and index corruption.

In `@src/rapids_singlecell/_cuda/spca/spca.cu`:
- Around line 16-20: The kernel launch for gram_csr_upper_kernel uses dim3
grid(nrows) which is invalid when nrows == 0; add an early return guard (check
nrows == 0) before setting block/grid and before calling
gram_csr_upper_kernel<T, IdxT> to skip the launch (matching the pattern used by
launch_check_zero_genes), leaving CUDA_CHECK_LAST_ERROR call for non-zero
launches.

In `@src/rapids_singlecell/_cuda/wilcoxon_binned/kernels_wilcoxon_binned.cuh`:
- Around line 50-52: The CSC kernel narrows indices[i] to int (col/row) then
indexes gcodes[row] without validating the narrowed row, risking
wraparound/out-of-bounds; add an explicit bounds check after narrowing (e.g.,
int row = (int)indices[i]; if (row < 0 || row >= nrows) continue;) before any
gcodes[row] access in the CSC kernel, and likewise add a similar validation for
col in the CSR kernel (validate the narrowed int col is within [gene_start,
gene_stop) before using it) so both kernels mirror the CSR row check and prevent
buffer underruns.

In `@src/rapids_singlecell/_cuda/wilcoxon_binned/wilcoxon_binned.cu`:
- Around line 31-33: The kernel launches in launch_csr_hist, launch_csc_hist,
and launch_dense_hist can be invoked with a zero-sized grid (grid(n_cells) or
grid(n_genes)), which is invalid; add an early return guard at the start of each
function to check the relevant size (e.g., if n_cells == 0 return; in
launch_csr_hist, and if n_genes == 0 return; in launch_csc_hist and
launch_dense_hist) so the subsequent dim3 grid(...) and
csr_hist_kernel/csc_hist_kernel/dense_hist_kernel launches are never called with
zero dimensions.

In `@tests/test_int64_indptr.py`:
- Line 148: Update the comparison loops that iterate over outs to use strict
zipping so length mismatches raise errors: replace occurrences like "for a, b in
zip(outs[np.int32], outs[np.int64]):" with a strict zip call (add strict=True)
in all listed callsites (the for-loops comparing outs at lines referenced, e.g.,
the loop using outs[np.int32] and outs[np.int64] and the other similar for-loops
throughout tests/test_int64_indptr.py). Ensure each zip(...) call used for
pairwise assertions (comparing tuples/iterables from outs) includes strict=True
so tuple parity is enforced.

---

Outside diff comments:
In `@src/rapids_singlecell/_cuda/sparse2dense/kernels_s2d.cuh`:
- Around line 23-24: The current guard only checks "idx >= minor" which allows
negative idx values to slip through and cause out-of-bounds writes in the
subsequent atomicAdd; modify the check around the kernel's index validation (the
condition that currently reads "if (idx >= minor) { continue; }") to also guard
against negative indices (e.g., ensure idx < 0 is handled) so that any idx < 0
or idx >= minor will skip processing, preventing invalid memory accesses in the
atomicAdd using idx.

In `@tests/test_int64_indptr.py`:
- Around line 73-933: The tests currently only assert int32 vs int64 parity;
update each test family (functions like test_sparse_qc_csr/test_sparse_qc_csc,
test_mean_var_major/minor, test_nan_mean_major/minor,
test_sparse2dense_c/test_sparse2dense_f, test_sparse_norm_res_csr/csc,
test_sparse_aggr, test_sparse_var, etc.) to also compute a CPU reference using
NumPy/SciPy (using _make_dense_cpu or SciPy sparse operations) and assert
numerical equality (or use _close where floating rounding is expected) between
the GPU result and the CPU reference rather than only between idx dtypes;
additionally parameterize critical tests to include edge cases (single-row,
empty matrix, and a CI-stress large shape) by reusing _make_csr/_make_csc with
those shapes and adding assertions for those cases so each kernel is validated
for correctness and edge behavior beyond int32/int64 parity, keeping existing
helpers _eq/_close for comparisons.

---

Nitpick comments:
In `@src/rapids_singlecell/_cuda/mean_var/kernels_mv.cuh`:
- Around line 18-19: Replace the hard-coded literal 64 used for shared memory
arrays mean_place and var_place with a named constant (e.g., BLOCK_SIZE_MAJOR)
so the size stays in sync with mean_var.cu; declare constexpr int
BLOCK_SIZE_MAJOR = 64 in an appropriate header or pass it as a template
parameter and update the declarations __shared__ double
mean_place[BLOCK_SIZE_MAJOR]; and __shared__ double var_place[BLOCK_SIZE_MAJOR];
ensuring the symbol name matches the one used in mean_var.cu.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f5b7164-8c8f-49b5-8450-ffdb2781ea8d

📥 Commits

Reviewing files that changed from the base of the PR and between b7e6577 and 4f2774c.

📒 Files selected for processing (37)
  • src/rapids_singlecell/_cuda/aggr/aggr.cu
  • src/rapids_singlecell/_cuda/aggr/kernels_aggr.cuh
  • src/rapids_singlecell/_cuda/autocorr/autocorr.cu
  • src/rapids_singlecell/_cuda/autocorr/kernels_autocorr.cuh
  • src/rapids_singlecell/_cuda/edistance/edistance.cu
  • src/rapids_singlecell/_cuda/gmm/gmm.cu
  • src/rapids_singlecell/_cuda/harmony/colsum/colsum.cu
  • src/rapids_singlecell/_cuda/harmony/kmeans/kmeans.cu
  • src/rapids_singlecell/_cuda/harmony/normalize/normalize.cu
  • src/rapids_singlecell/_cuda/harmony/outer/outer.cu
  • src/rapids_singlecell/_cuda/harmony/scatter/scatter.cu
  • src/rapids_singlecell/_cuda/hvg/hvg.cu
  • src/rapids_singlecell/_cuda/ligrec/kernels_ligrec.cuh
  • src/rapids_singlecell/_cuda/ligrec/ligrec.cu
  • src/rapids_singlecell/_cuda/mean_var/kernels_mv.cuh
  • src/rapids_singlecell/_cuda/mean_var/mean_var.cu
  • src/rapids_singlecell/_cuda/nanmean/kernels_nanmean.cuh
  • src/rapids_singlecell/_cuda/nanmean/nanmean.cu
  • src/rapids_singlecell/_cuda/nb_types.h
  • src/rapids_singlecell/_cuda/norm/kernels_norm.cuh
  • src/rapids_singlecell/_cuda/norm/norm.cu
  • src/rapids_singlecell/_cuda/pr/kernels_pr.cuh
  • src/rapids_singlecell/_cuda/pr/kernels_pr_hvg.cuh
  • src/rapids_singlecell/_cuda/pr/pr.cu
  • src/rapids_singlecell/_cuda/qc/kernels_qc.cuh
  • src/rapids_singlecell/_cuda/qc/qc.cu
  • src/rapids_singlecell/_cuda/qc_dask/kernels_qcd.cuh
  • src/rapids_singlecell/_cuda/qc_dask/qc_kernels_dask.cu
  • src/rapids_singlecell/_cuda/scale/kernels_scale.cuh
  • src/rapids_singlecell/_cuda/scale/scale.cu
  • src/rapids_singlecell/_cuda/sparse2dense/kernels_s2d.cuh
  • src/rapids_singlecell/_cuda/sparse2dense/sparse2dense.cu
  • src/rapids_singlecell/_cuda/spca/kernels_spca.cuh
  • src/rapids_singlecell/_cuda/spca/spca.cu
  • src/rapids_singlecell/_cuda/wilcoxon_binned/kernels_wilcoxon_binned.cuh
  • src/rapids_singlecell/_cuda/wilcoxon_binned/wilcoxon_binned.cu
  • tests/test_int64_indptr.py

Comment thread src/rapids_singlecell/_cuda/aggr/aggr.cu
Comment thread src/rapids_singlecell/_cuda/aggr/kernels_aggr.cuh Outdated
Comment thread src/rapids_singlecell/_cuda/ligrec/kernels_ligrec.cuh Outdated
Comment thread src/rapids_singlecell/_cuda/ligrec/ligrec.cu
Comment thread src/rapids_singlecell/_cuda/ligrec/ligrec.cu
Comment thread src/rapids_singlecell/_cuda/spca/kernels_spca.cuh Outdated
Comment thread src/rapids_singlecell/_cuda/spca/spca.cu
Comment thread src/rapids_singlecell/_cuda/wilcoxon_binned/wilcoxon_binned.cu
Comment thread tests/test_int64_indptr.py Outdated
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 18, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@05fcba0). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/rapids_singlecell/get/_aggregated.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #620   +/-   ##
=======================================
  Coverage        ?   88.63%           
=======================================
  Files           ?       98           
  Lines           ?     7364           
  Branches        ?        0           
=======================================
  Hits            ?     6527           
  Misses          ?      837           
  Partials        ?        0           
Files with missing lines Coverage Δ
...apids_singlecell/get/_kernels/_aggr_elementwise.py 100.00% <ø> (ø)
src/rapids_singlecell/get/_aggregated.py 90.71% <90.90%> (ø)

@Intron7 Intron7 changed the title 64 bit support 64 bit indptr csr support May 18, 2026
@Intron7 Intron7 merged commit ce3c95a into main May 18, 2026
23 of 26 checks passed
@Intron7 Intron7 deleted the get-ready-for-64-bit-indptr branch May 18, 2026 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants