64 bit indptr csr support by Intron7 · Pull Request #620 · scverse/rapids-singlecell

Intron7 · 2026-03-21T09:09:44Z

Cupy will add 64 bit indptr support for sparse matrices. This Draft PR is there to test the implementation and to later enable day 1 support.

Todo: Tests

Intron7 · 2026-05-18T13:36:15Z

@coderabbitai review

coderabbitai · 2026-05-18T13:36:21Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-18T13:44:29Z

📝 Walkthrough

Walkthrough

This PR refactors CUDA kernels and Python bindings across 20+ modules to support 64-bit (long long) sparse index types alongside existing 32-bit (int) support. It introduces grid-stride loop infrastructure, generalizes kernels via IdxT templating, and consolidates repetitive binding code into reusable templates, with comprehensive parity testing.

Changes

64-bit Sparse Index Type Support & Binding Consolidation

Layer / File(s)	Summary
Grid-stride infrastructure for large problem sizes `src/rapids_singlecell/_cuda/nb_types.h`	Adds thread-local cached CUDA device property queries (`max_grid_dims()`, `max_grid_dim_x/y/z()`) and grid-stride loop helpers (`strided_grid()`, `strided_grid_y()`) to enable safe kernel launches when problem sizes exceed 32-bit range.
Sparse aggregation kernels and binding generalization `src/rapids_singlecell/_cuda/aggr/{kernels_aggr.cuh, aggr.cu}`	Generalizes CSR/CSC aggregation kernels to template on `IdxT` for index pointers; switches dense aggregation grid sizing to `strided_grid()`; adds `def_sparse_aggr`, `def_csr_to_coo`, `def_sparse_var` binding helpers to register operations for both `int` and `long long` indices.
Autocorrelation and spatial statistics `src/rapids_singlecell/_cuda/autocorr/{kernels_autocorr.cuh, autocorr.cu}`	Generalizes `pre_den_sparse_kernel` to use `IdxT` and grid-stride loops with `long long nnz`; adds templated `def_morans_dense/sparse`, `def_gearys_dense/sparse`, `def_pre_den_sparse` binding helpers for organizing registration by operation type.
Binding consolidation (non-index generalization) `src/rapids_singlecell/_cuda/{edistance,gmm,harmony/*,hvg}/...cu`	Replaces repetitive per-dtype `m.def(...)` blocks with templated `def_` helpers (`def_compute_distances`, `def_e_step`, `def_kmeans_err`, `def_normalize`, `def_outer`, `def__scatter`, `def_expected_zeros`) while preserving semantics and stream casting.
Mean/variance computation with grid-stride minor path `src/rapids_singlecell/_cuda/mean_var/{kernels_mv.cuh, mean_var.cu}`	Templates `mean_var_major/minor_kernel` on `IdxT`; replaces `int` bounds-check in minor kernel with grid-stride loop over `long long nnz`; adds `def_mean_var_major/minor` binding helpers for both index types.
NaN-aware mean with index generalization `src/rapids_singlecell/_cuda/nanmean/{kernels_nanmean.cuh, nanmean.cu}`	Templates `nan_mean_minor/major_kernel` on `IdxT` and replaces `int` bounds-checks with grid-stride loops; adds `def_nan_mean_minor/major` binding helpers; registers overloads for `IdxT ∈ {int, long long}`.
Normalization kernel suite `src/rapids_singlecell/_cuda/norm/{kernels_norm.cuh, norm.cu}`	Generalizes six CSR kernels (`csr_row_scale`, `csr_sum_major`, `find_hi_genes_csr`, `masked_mul`, `masked_sum_major`, `prescaled_mul`) to template on `IdxT`; adds launch helpers and `def_*` binding templates; registers sparse operations for both index types while keeping dense operations single-dtype.
Residual normalization and CSC HVG operations `src/rapids_singlecell/_cuda/pr/{kernels_pr.cuh, kernels_pr_hvg.cuh, pr.cu}`	Generalizes CSC/CSR residual-norm and CSC HVG kernels to use `IdxT` for index pointers and loop ranges; updates launch helpers and binding registration to support both `int` and `long long` indices.
QC sparse operations (CSC/CSR and sub variants) `src/rapids_singlecell/_cuda/qc/{kernels_qc.cuh, qc.cu}`	Generalizes four QC kernels to use `IdxT`; adds `def_sparse_qc_*` binding helpers for CSC/CSR (including masked sub variants); expands registration to cover both index types.
QC Dask operations with grid-stride genes `src/rapids_singlecell/_cuda/qc_dask/{kernels_qcd.cuh, qc_kernels_dask.cu}`	Templates CSR cells/genes kernels on `IdxT`; replaces `int` bounds-check in genes kernel with grid-stride loop over `long long nnz`; adds binding helpers and expands registration for both index types.
Scale difference kernels `src/rapids_singlecell/_cuda/scale/{kernels_scale.cuh, scale.cu}`	Templates CSC/CSR scale kernels on `IdxT` with `long long` index arithmetic; updates dense kernel index computation; adds `def_*_scale_diff` binding helpers; registers sparse operations for both index types.
Sparse-to-dense conversion `src/rapids_singlecell/_cuda/sparse2dense/{kernels_s2d.cuh, sparse2dense.cu}`	Templates kernel on `IdxT`; removes per-launch device property query and uses `strided_grid_y()` for grid sizing; expands `def_sparse2dense` binding to register for both index types and both memory orderings (`f_contig`, `c_contig`).
SPCA gram and gene checking `src/rapids_singlecell/_cuda/spca/{kernels_spca.cuh, spca.cu}`	Templates gram upper and check-zero-genes kernels on `IdxT`; replaces `int` bounds-check in check-zero-genes with grid-stride loop over `long long nnz`; adds binding helpers; expands registration for both index types.
Wilcoxon histogram kernels `src/rapids_singlecell/_cuda/wilcoxon_binned/{kernels_wilcoxon_binned.cuh, wilcoxon_binned.cu}`	Templates CSR/CSC histogram kernels on `IdxT` for row/column bounds and loop indices; adds `def_*_hist` binding helpers; expands registration for both index types.
Ligrec sparse operations `src/rapids_singlecell/_cuda/ligrec/{kernels_ligrec.cuh, ligrec.cu}`	Templates sum/count and mean sparse kernels on `IdxT`; reorganizes large monolithic `register_bindings` into multiple templated `def_*` helpers (`def_sum_count_dense/sparse`, `def_mean_dense/sparse`, etc.); registers for both index types and data types.
Comprehensive parity test suite `tests/test_int64_indptr.py`	Adds 40+ pytest test functions comparing int32 vs int64 sparse index paths; covers QC (sparse/dask/sub), mean/variance, nanmean, normalization (mul/sum/masked), residual, scale, sparse-to-dense, SPCA, aggregation, CSR-to-COO, variance grouping, Ligrec, autocorrelation, and histogram routines; uses exact equality for integers and tolerant comparison for atomic float accumulators.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

scverse/rapids-singlecell#619: Updates aggregation launch helpers in aggr.cu to use named block constants and error checking, overlapping directly with this PR's refactoring of the same sparse aggregation launch code paths.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch get-ready-for-64-bit-indptr

coderabbitai

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/test_int64_indptr.py (1)
73-933: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add CPU-reference and edge-case assertions, not only int32/int64 parity.

This suite is great for dispatch parity, but it can still pass when both paths are wrong in the same way. Please add numerical checks against NumPy/SciPy references for each operation family, and parameterize edge cases (single-row, empty input, and a stress/max-size shape used in CI).

As per coding guidelines tests/**/*.py: Test validation: include numerical correctness checks against CPU reference implementations, and cover edge cases (single row, empty input, max-size input) beyond just 'runs without error'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_int64_indptr.py` around lines 73 - 933, The tests currently only
assert int32 vs int64 parity; update each test family (functions like
test_sparse_qc_csr/test_sparse_qc_csc, test_mean_var_major/minor,
test_nan_mean_major/minor, test_sparse2dense_c/test_sparse2dense_f,
test_sparse_norm_res_csr/csc, test_sparse_aggr, test_sparse_var, etc.) to also
compute a CPU reference using NumPy/SciPy (using _make_dense_cpu or SciPy sparse
operations) and assert numerical equality (or use _close where floating rounding
is expected) between the GPU result and the CPU reference rather than only
between idx dtypes; additionally parameterize critical tests to include edge
cases (single-row, empty matrix, and a CI-stress large shape) by reusing
_make_csr/_make_csc with those shapes and adding assertions for those cases so
each kernel is validated for correctness and edge behavior beyond int32/int64
parity, keeping existing helpers _eq/_close for comparisons.
src/rapids_singlecell/_cuda/sparse2dense/kernels_s2d.cuh (1)
23-24: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add negative index guard to prevent out-of-bounds memory access.

Line 23 only checks idx >= minor but does not guard against negative idx values. If idx is negative, it passes the guard and produces incorrect calculations at lines 29/32, leading to out-of-bounds memory access in the atomicAdd at line 34.
Suggested fix
-            if (idx >= minor) {
+            if (idx < 0 || idx >= minor) {
                 continue;
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rapids_singlecell/_cuda/sparse2dense/kernels_s2d.cuh` around lines 23 -
24, The current guard only checks "idx >= minor" which allows negative idx
values to slip through and cause out-of-bounds writes in the subsequent
atomicAdd; modify the check around the kernel's index validation (the condition
that currently reads "if (idx >= minor) { continue; }") to also guard against
negative indices (e.g., ensure idx < 0 is handled) so that any idx < 0 or idx >=
minor will skip processing, preventing invalid memory accesses in the atomicAdd
using idx.

🧹 Nitpick comments (1)

src/rapids_singlecell/_cuda/mean_var/kernels_mv.cuh (1)
18-19: 💤 Low value

Hard-coded shared memory size should be a named constant.

The shared memory arrays use a raw literal 64 which should match BLOCK_SIZE_MAJOR defined in mean_var.cu. Per coding guidelines, numeric literals for block sizes and shared memory sizes should be defined as named constants.

Consider defining this in the header or passing as a template parameter to ensure consistency:
constexpr int BLOCK_SIZE_MAJOR = 64;
// ...
__shared__ double mean_place[BLOCK_SIZE_MAJOR];
__shared__ double var_place[BLOCK_SIZE_MAJOR];
As per coding guidelines: "All numeric literals for block sizes, tile dimensions, shared memory sizes... MUST be defined as named constants."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rapids_singlecell/_cuda/mean_var/kernels_mv.cuh` around lines 18 - 19,
Replace the hard-coded literal 64 used for shared memory arrays mean_place and
var_place with a named constant (e.g., BLOCK_SIZE_MAJOR) so the size stays in
sync with mean_var.cu; declare constexpr int BLOCK_SIZE_MAJOR = 64 in an
appropriate header or pass it as a template parameter and update the
declarations __shared__ double mean_place[BLOCK_SIZE_MAJOR]; and __shared__
double var_place[BLOCK_SIZE_MAJOR]; ensuring the symbol name matches the one
used in mean_var.cu.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/rapids_singlecell/_cuda/aggr/aggr.cu`:
- Around line 17-18: The sparse-kernel launch sites compute dim3
grid((unsigned)n_cells) and dim3 block(BLOCK_SIZE_SPARSE) and can produce
invalid launches when a dimension is zero; add an early return guard in each of
the four sparse launch helper functions so they return immediately if the work
size is zero (e.g., if n_cells==0 or any other computed grid dimension == 0)
before computing dim3 grid/block or launching the kernel (references: dim3 grid,
dim3 block, BLOCK_SIZE_SPARSE).

In `@src/rapids_singlecell/_cuda/aggr/kernels_aggr.cuh`:
- Around line 56-73: csr_to_coo_kernel currently truncates 64-bit indices by
casting index[p] to int and writing into int col buffers, corrupting indices ≥
INT_MAX; fix by propagating IdxT for column buffers and bindings: change
csr_to_coo_kernel signature to accept IdxT* __restrict__ col (and ensure row/col
uses IdxT where appropriate), update the nanobind binding that currently exposes
gpu_array_c<int, Device> to accept gpu_array_c<IdxT, Device> for the output
column array, and update the callers in _aggregated.py to allocate output column
buffers matching the input index dtype; alternatively, if API change is
unacceptable, add pre-launch validation that checks max(index) < INT_MAX and
fail with a clear error (guarding launch) and document the limitation.

In `@src/rapids_singlecell/_cuda/ligrec/kernels_ligrec.cuh`:
- Around line 36-38: The code narrows 64-bit sparse indices by assigning
index[gene] (type IdxT) to an int (gene_number) in the loop inside
kernels_ligrec.cuh; update that local variable to use IdxT (or another 64-bit
type) so indices aren't truncated and repeat the same change in
mean_sparse_kernel, ensuring all uses that compute offsets/column indices (e.g.,
the local gene_number and any index-based arithmetic) operate in 64-bit to match
the explicit instantiation with IdxT = long long.

In `@src/rapids_singlecell/_cuda/ligrec/ligrec.cu`:
- Around line 180-197: The binding def_interaction advertises stream support but
the kernel launch inside launch_interaction ignores the passed stream; update
launch_interaction<T> to take a cudaStream_t (or std::uintptr_t cast to
cudaStream_t) and use that stream in the kernel launch (use <<<grid, block, 0,
stream>>> or the equivalent CUDA launch API) so ordering with upstream writes to
mean/mask/g and downstream reads from res is respected; keep the def_interaction
call to launch_interaction<T>(..., (cudaStream_t)stream) but change
launch_interaction's signature and its internal kernel invocation to accept and
use the stream.
- Around line 24-35: The sparse launcher launch_sum_count_sparse can compute
grid.x == 0 when rows == 0 which yields an invalid CUDA launch; add an
early-return guard at the top of launch_sum_count_sparse (and any other sparse
launcher that computes grid from rows) to return immediately when rows <= 0 so
no kernel is launched, and keep the rest unchanged; also fix launch_interaction
so the interaction kernel is launched with the provided cudaStream_t stream (use
<<<grid, block, 0, stream>>> for interaction_kernel) instead of the current
launch that ignores the stream. Ensure you reference the functions
sum_and_count_sparse_kernel and interaction_kernel in the fixes.

In `@src/rapids_singlecell/_cuda/nb_types.h`:
- Around line 31-45: The function max_grid_dims reads CUDA results without
checking return codes, so wrap the calls to cudaGetDevice and
cudaGetDeviceProperties in error-checks inside max_grid_dims: call cudaGetDevice
and if it returns non-success do not change cached_dev or cached (keep the safe
fallback) and return cached; if cudaGetDevice succeeds then call
cudaGetDeviceProperties and only update cached[0..2] and cached_dev when
cudaGetDeviceProperties returns success; optionally log or assert the CUDA error
before returning the fallback. Ensure you reference and protect the existing
static thread_local variables cached and cached_dev and the calls to
cudaGetDevice and cudaGetDeviceProperties in the max_grid_dims function.

In `@src/rapids_singlecell/_cuda/sparse2dense/sparse2dense.cu`:
- Around line 12-13: The max_nnz parameter is currently declared as int and must
be widened to long long everywhere to avoid truncation for per-row nnz >
INT_MAX; update the function signatures (the top-level host function(s) in
sparse2dense.cu that accept max_nnz), any device/kernel wrappers and internal
calls that pass max_nnz to strided_grid_y(), and all call sites so that max_nnz
is consistently long long; ensure strided_grid_y() invocations and any
arithmetic or loop bounds using max_nnz are updated to use 64-bit types to
preserve correct grid_y calculation and kernel grid-stride behavior.

In `@src/rapids_singlecell/_cuda/spca/kernels_spca.cuh`:
- Around line 56-58: The code casts indices[value] (IdxT) to int in
kernels_spca.cuh before bounds checking, which can truncate 64-bit indices; fix
by first reading into the wide type (e.g., IdxT raw = indices[value] or int64_t
tmp = static_cast<int64_t>(indices[value])), validate that raw is within [0,
num_genes-1] and also fits in int32 range (<= INT_MAX and >= INT_MIN) and only
then static_cast<int> to produce gene_index used by
atomicAdd(&genes[gene_index], 1); update the checks around gene_index/indices so
bounds checks operate on the wide type (raw/tmp) rather than the narrowed int to
avoid silent truncation and index corruption.

In `@src/rapids_singlecell/_cuda/spca/spca.cu`:
- Around line 16-20: The kernel launch for gram_csr_upper_kernel uses dim3
grid(nrows) which is invalid when nrows == 0; add an early return guard (check
nrows == 0) before setting block/grid and before calling
gram_csr_upper_kernel<T, IdxT> to skip the launch (matching the pattern used by
launch_check_zero_genes), leaving CUDA_CHECK_LAST_ERROR call for non-zero
launches.

In `@src/rapids_singlecell/_cuda/wilcoxon_binned/kernels_wilcoxon_binned.cuh`:
- Around line 50-52: The CSC kernel narrows indices[i] to int (col/row) then
indexes gcodes[row] without validating the narrowed row, risking
wraparound/out-of-bounds; add an explicit bounds check after narrowing (e.g.,
int row = (int)indices[i]; if (row < 0 || row >= nrows) continue;) before any
gcodes[row] access in the CSC kernel, and likewise add a similar validation for
col in the CSR kernel (validate the narrowed int col is within [gene_start,
gene_stop) before using it) so both kernels mirror the CSR row check and prevent
buffer underruns.

In `@src/rapids_singlecell/_cuda/wilcoxon_binned/wilcoxon_binned.cu`:
- Around line 31-33: The kernel launches in launch_csr_hist, launch_csc_hist,
and launch_dense_hist can be invoked with a zero-sized grid (grid(n_cells) or
grid(n_genes)), which is invalid; add an early return guard at the start of each
function to check the relevant size (e.g., if n_cells == 0 return; in
launch_csr_hist, and if n_genes == 0 return; in launch_csc_hist and
launch_dense_hist) so the subsequent dim3 grid(...) and
csr_hist_kernel/csc_hist_kernel/dense_hist_kernel launches are never called with
zero dimensions.

In `@tests/test_int64_indptr.py`:
- Line 148: Update the comparison loops that iterate over outs to use strict
zipping so length mismatches raise errors: replace occurrences like "for a, b in
zip(outs[np.int32], outs[np.int64]):" with a strict zip call (add strict=True)
in all listed callsites (the for-loops comparing outs at lines referenced, e.g.,
the loop using outs[np.int32] and outs[np.int64] and the other similar for-loops
throughout tests/test_int64_indptr.py). Ensure each zip(...) call used for
pairwise assertions (comparing tuples/iterables from outs) includes strict=True
so tuple parity is enforced.

---

Outside diff comments:
In `@src/rapids_singlecell/_cuda/sparse2dense/kernels_s2d.cuh`:
- Around line 23-24: The current guard only checks "idx >= minor" which allows
negative idx values to slip through and cause out-of-bounds writes in the
subsequent atomicAdd; modify the check around the kernel's index validation (the
condition that currently reads "if (idx >= minor) { continue; }") to also guard
against negative indices (e.g., ensure idx < 0 is handled) so that any idx < 0
or idx >= minor will skip processing, preventing invalid memory accesses in the
atomicAdd using idx.

In `@tests/test_int64_indptr.py`:
- Around line 73-933: The tests currently only assert int32 vs int64 parity;
update each test family (functions like test_sparse_qc_csr/test_sparse_qc_csc,
test_mean_var_major/minor, test_nan_mean_major/minor,
test_sparse2dense_c/test_sparse2dense_f, test_sparse_norm_res_csr/csc,
test_sparse_aggr, test_sparse_var, etc.) to also compute a CPU reference using
NumPy/SciPy (using _make_dense_cpu or SciPy sparse operations) and assert
numerical equality (or use _close where floating rounding is expected) between
the GPU result and the CPU reference rather than only between idx dtypes;
additionally parameterize critical tests to include edge cases (single-row,
empty matrix, and a CI-stress large shape) by reusing _make_csr/_make_csc with
those shapes and adding assertions for those cases so each kernel is validated
for correctness and edge behavior beyond int32/int64 parity, keeping existing
helpers _eq/_close for comparisons.

---

Nitpick comments:
In `@src/rapids_singlecell/_cuda/mean_var/kernels_mv.cuh`:
- Around line 18-19: Replace the hard-coded literal 64 used for shared memory
arrays mean_place and var_place with a named constant (e.g., BLOCK_SIZE_MAJOR)
so the size stays in sync with mean_var.cu; declare constexpr int
BLOCK_SIZE_MAJOR = 64 in an appropriate header or pass it as a template
parameter and update the declarations __shared__ double
mean_place[BLOCK_SIZE_MAJOR]; and __shared__ double var_place[BLOCK_SIZE_MAJOR];
ensuring the symbol name matches the one used in mean_var.cu.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f5b7164-8c8f-49b5-8450-ffdb2781ea8d

📥 Commits

Reviewing files that changed from the base of the PR and between b7e6577 and 4f2774c.

📒 Files selected for processing (37)

src/rapids_singlecell/_cuda/aggr/aggr.cu
src/rapids_singlecell/_cuda/aggr/kernels_aggr.cuh
src/rapids_singlecell/_cuda/autocorr/autocorr.cu
src/rapids_singlecell/_cuda/autocorr/kernels_autocorr.cuh
src/rapids_singlecell/_cuda/edistance/edistance.cu
src/rapids_singlecell/_cuda/gmm/gmm.cu
src/rapids_singlecell/_cuda/harmony/colsum/colsum.cu
src/rapids_singlecell/_cuda/harmony/kmeans/kmeans.cu
src/rapids_singlecell/_cuda/harmony/normalize/normalize.cu
src/rapids_singlecell/_cuda/harmony/outer/outer.cu
src/rapids_singlecell/_cuda/harmony/scatter/scatter.cu
src/rapids_singlecell/_cuda/hvg/hvg.cu
src/rapids_singlecell/_cuda/ligrec/kernels_ligrec.cuh
src/rapids_singlecell/_cuda/ligrec/ligrec.cu
src/rapids_singlecell/_cuda/mean_var/kernels_mv.cuh
src/rapids_singlecell/_cuda/mean_var/mean_var.cu
src/rapids_singlecell/_cuda/nanmean/kernels_nanmean.cuh
src/rapids_singlecell/_cuda/nanmean/nanmean.cu
src/rapids_singlecell/_cuda/nb_types.h
src/rapids_singlecell/_cuda/norm/kernels_norm.cuh
src/rapids_singlecell/_cuda/norm/norm.cu
src/rapids_singlecell/_cuda/pr/kernels_pr.cuh
src/rapids_singlecell/_cuda/pr/kernels_pr_hvg.cuh
src/rapids_singlecell/_cuda/pr/pr.cu
src/rapids_singlecell/_cuda/qc/kernels_qc.cuh
src/rapids_singlecell/_cuda/qc/qc.cu
src/rapids_singlecell/_cuda/qc_dask/kernels_qcd.cuh
src/rapids_singlecell/_cuda/qc_dask/qc_kernels_dask.cu
src/rapids_singlecell/_cuda/scale/kernels_scale.cuh
src/rapids_singlecell/_cuda/scale/scale.cu
src/rapids_singlecell/_cuda/sparse2dense/kernels_s2d.cuh
src/rapids_singlecell/_cuda/sparse2dense/sparse2dense.cu
src/rapids_singlecell/_cuda/spca/kernels_spca.cuh
src/rapids_singlecell/_cuda/spca/spca.cu
src/rapids_singlecell/_cuda/wilcoxon_binned/kernels_wilcoxon_binned.cuh
src/rapids_singlecell/_cuda/wilcoxon_binned/wilcoxon_binned.cu
tests/test_int64_indptr.py

codecov-commenter · 2026-05-18T14:08:04Z

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@05fcba0). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/rapids_singlecell/get/_aggregated.py	90.90%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #620   +/-   ##
=======================================
  Coverage        ?   88.63%           
=======================================
  Files           ?       98           
  Lines           ?     7364           
  Branches        ?        0           
=======================================
  Hits            ?     6527           
  Misses          ?      837           
  Partials        ?        0

Files with missing lines	Coverage Δ
...apids_singlecell/get/_kernels/_aggr_elementwise.py	`100.00% <ø> (ø)`
src/rapids_singlecell/get/_aggregated.py	`90.71% <90.90%> (ø)`

first draft for 64 bit support

df69917

Intron7 marked this pull request as draft March 21, 2026 09:09

eriknw mentioned this pull request Mar 23, 2026

Add int64 index support to cupyx.scipy.sparse cupy/cupy#9825

Merged

Merge branch 'main' into get-ready-for-64-bit-indptr

6d7612e

Intron7 marked this pull request as ready for review May 18, 2026 12:17

Intron7 added 3 commits May 18, 2026 14:25

make templating better

5785697

add 64 bit tests

96526ba

make ad gird strides

4f2774c

Intron7 added the run-gpu-ci label May 18, 2026