PCA fails with CUDA out-of-memory on large datasets (2.7M cells) despite sufficient VRAM #5

Hi,

I'm using ScaleSC's GPU support to process a large-scale spatial transcriptomics dataset of ~2.7 million cells × ~6.2k genes. However, `pca()` always fails with a CUDA out-of-memory error. I suspect the failure happens when the sparse matrix is converted to dense. Environment details and the error message are below. Does the team have any insight into how to fix this?

### Environment
- ScaleSC version: 0.1.0
- CuPy version: 13.6.0
- GPU: NVIDIA L40S (47.67GB VRAM × 2)
- CUDA: 13.0, Driver: 580.159.03

### Dataset
- 2,702,045 cells × 6,157 genes
- Sparse CSR float32
- 3,000 HVGs selected via seurat_v3
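For scale, here is a rough estimate of what a fully dense copy of this matrix would take. It assumes float32 values (4 bytes each), ignores any temporary copies PCA might make, and reads the 47.67 GB per-GPU figure as decimal GB; the constants are copied from the lists above:

```python
# Rough memory estimate for densifying the expression matrix.
# Assumes float32 storage (4 bytes per value) and no extra temporary copies.

N_CELLS = 2_702_045
N_GENES = 6_157
N_HVGS = 3_000
GPU_VRAM_BYTES = 47.67e9  # per-GPU total from the Environment list, assumed decimal GB


def dense_bytes(n_rows: int, n_cols: int, itemsize: int = 4) -> int:
    """Bytes needed for a dense n_rows x n_cols array with the given item size."""
    return n_rows * n_cols * itemsize


hvg_dense = dense_bytes(N_CELLS, N_HVGS)    # HVG-only matrix
full_dense = dense_bytes(N_CELLS, N_GENES)  # all genes

print(f"dense HVG matrix:  {hvg_dense / 1e9:.2f} GB")   # 32.42 GB
print(f"dense full matrix: {full_dense / 1e9:.2f} GB")  # 66.55 GB
print("full matrix fits on one GPU:", full_dense < GPU_VRAM_BYTES)  # False
```

By this estimate, densifying even the 3,000-HVG matrix in one piece would take about two thirds of one L40S, and the full 6,157-gene matrix would not fit on a single card.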

### Note on initialization
I'm using a custom wrapper to initialize ScaleSC because of a separate issue with how the reader determines the dataset shape on large datasets:

```python
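import scalesc as ssc  # ScaleSC, imported under the alias used below


def init_scalesc_with_fix(data_dir, **kwargs):
    """Create a ScaleSC object, then overwrite the reader's cached shape
    with the shape of the underlying AnnData object."""
    ssc_obj = ssc.ScaleSC(data_dir=data_dir, **kwargs)
    adata = ssc_obj.reader._get_anndata_obj()
    n_obs, n_vars = adata.shape
    # Overwrite both the current and the original shape attributes on the reader
    ssc_obj.reader.n_cell = n_obs
    ssc_obj.reader.n_gene = n_vars
    ssc_obj.reader.n_cell_origin = n_obs
    ssc_obj.reader.n_gene_origin = n_vars
    return ssc_obj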

ssc_obj = init_scalesc_with_fix(
    data_dir=SCALESC_DIR,
    preload_on_cpu=True,
    preload_on_gpu=False,
    gpus=[0, 1],
    save_after_each_step=True,
    max_cell_batch=25000
)
```
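A quick way to confirm the workaround took effect is to compare the reader's cached dimensions with the AnnData shape. This is a hypothetical helper (`reader_shape_mismatches` is not part of ScaleSC); it only assumes the four reader attributes that the wrapper above sets:

```python
def reader_shape_mismatches(reader, adata_shape):
    """Return {attribute: (cached_value, expected_value)} for every shape
    attribute on `reader` that disagrees with `adata_shape` (n_obs, n_vars).
    An empty dict means the reader and the AnnData object agree."""
    n_obs, n_vars = adata_shape
    expected = {
        "n_cell": n_obs,
        "n_gene": n_vars,
        "n_cell_origin": n_obs,
        "n_gene_origin": n_vars,
    }
    mismatches = {}
    for name, value in expected.items():
        cached = getattr(reader, name, None)  # None if the attribute is missing
        if cached != value:
            mismatches[name] = (cached, value)
    return mismatches
```

For example, `reader_shape_mismatches(ssc_obj.reader, ssc_obj.reader._get_anndata_obj().shape)` should return `{}` after `init_scalesc_with_fix`.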

`normalize_log1p` and `highly_variable_genes` both complete successfully. The full pipeline also ran without issues on smaller datasets.

---

### Problem
`pca()` consistently fails with:

```
MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /gscratch/stf/wz34/envs/ScaleSC/include/rmm/mr/device/cuda_memory_resource.hpp
```

I noticed two things:
#### 1. The failing allocation is always the same size (~17 GB)
The failing allocation request is **always 17383920128 B (~17.4 GB), regardless of the `max_cell_batch` value** (tested with 100,000, 50,000, 25,000, and 1,000). This suggests the ~17 GB is a fixed cost that does not scale with batch size, so lowering `max_cell_batch` does not work around the problem.
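If the sparse-to-dense conversion were done one row batch at a time, peak dense memory would be about `max_cell_batch × n_genes × 4` bytes and would shrink with the batch size. The sketch below illustrates that generic pattern (accumulating the gene-by-gene Gram matrix `X.T @ X` one dense row block at a time) with SciPy on the CPU. It is not ScaleSC's implementation, and `batched_gram` is a hypothetical helper; the loop at the end sizes one dense float32 HVG batch for each `max_cell_batch` value tested above:

```python
import numpy as np
import scipy.sparse as sp

N_HVGS = 3_000
REPORTED_REQUEST = 17_383_920_128  # bytes, from the error message above


def batched_gram(X: sp.csr_matrix, batch_size: int):
    """Accumulate X.T @ X while densifying at most `batch_size` rows at a time.

    Returns the Gram matrix (float64) and the size in bytes of the largest
    dense row block that was materialized.
    """
    n_rows, n_cols = X.shape
    gram = np.zeros((n_cols, n_cols), dtype=np.float64)
    peak_bytes = 0
    for start in range(0, n_rows, batch_size):
        # Only this row block is dense; it is at most batch_size x n_cols.
        block = X[start:start + batch_size].toarray()
        peak_bytes = max(peak_bytes, block.nbytes)
        gram += block.T @ block
    return gram, peak_bytes


# Size of one dense float32 HVG batch for each max_cell_batch value tested.
for max_cell_batch in (1_000, 25_000, 50_000, 100_000):
    batch_bytes = max_cell_batch * N_HVGS * 4
    share = batch_bytes / REPORTED_REQUEST
    print(f"max_cell_batch={max_cell_batch:>7,}: {batch_bytes / 1e9:.3f} GB per dense batch "
          f"({share:.1%} of the failing request)")
```

Even at 100,000 cells, one dense batch of 3,000 HVGs is 1.2 GB, about 7% of the 17383920128 B request, so a per-batch dense buffer would not by itself explain an allocation of that size.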

#### 2. Only GPU 0 is used
GPU memory monitoring confirms GPU 1 is completely unused despite `gpus=[0, 1]`:
```
GPU 0: free=30.69 GB, total=47.67 GB  ← all allocations here
GPU 1: free=47.18 GB, total=47.67 GB  ← completely idle
```
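Per-GPU numbers like the ones above can be collected with CuPy's `cupy.cuda.runtime.memGetInfo()`, which returns `(free, total)` in bytes for the current device. A minimal sketch, assuming "GB" means 10^9 bytes; the helper names are hypothetical:

```python
def format_gpu_report(stats):
    """Format (device_id, free_bytes, total_bytes) tuples, one line per GPU."""
    return [
        f"GPU {dev}: free={free / 1e9:.2f} GB, total={total / 1e9:.2f} GB"
        for dev, free, total in stats
    ]


def query_gpu_memory():
    """Return (device_id, free_bytes, total_bytes) for every visible GPU."""
    import cupy as cp  # imported lazily so the formatter works without a GPU

    stats = []
    for dev in range(cp.cuda.runtime.getDeviceCount()):
        with cp.cuda.Device(dev):  # make `dev` the current device
            free, total = cp.cuda.runtime.memGetInfo()
        stats.append((dev, free, total))
    return stats


if __name__ == "__main__":
    for line in format_gpu_report(query_gpu_memory()):
        print(line)
```

Note that querying a device this way creates a CUDA context on it, which itself uses a small amount of memory, so an otherwise idle GPU will not report exactly its full capacity as free.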

Any insights or help would be greatly appreciated. Thank you!
