
PCA fails with CUDA out-of-memory on large datasets (2.7M cells) despite sufficient VRAM #5

@wzeng96

Hi,

I'm using ScaleSC to process a large spatial transcriptomics dataset with its GPU support. The data is ~2.7 million cells x 6,175 genes. Whenever I run PCA, it fails with a CUDA out-of-memory error. I suspect the failure happens at the step that converts the sparse data to dense. The environment details and the error message are below. Does the team have any insight into how to fix this?

Environment

  • ScaleSC version: 0.1.0
  • CuPy version: 13.6.0
  • GPU: NVIDIA L40S (47.67GB VRAM × 2)
  • CUDA: 13.0, Driver: 580.159.03

Dataset

  • 2,702,045 cells × 6,157 genes
  • Sparse CSR float32
  • 3,000 HVGs selected via seurat_v3

Note on initialization

I'm using a custom wrapper to initialize ScaleSC due to a separate shape reader issue on large datasets:

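import scalesc as ssc  # import alias assumed from the ssc. prefix used below

def init_scalesc_with_fix(data_dir, **kwargs):
    """Create a ScaleSC object and overwrite the reader's shape with the AnnData shape."""
    ssc_obj = ssc.ScaleSC(data_dir=data_dir, **kwargs)
    # Take the shape from the AnnData object rather than the reader's own value.
    adata = ssc_obj.reader._get_anndata_obj()
    n_obs, n_vars = adata.shape
    # Patch both the current and the original cell/gene counts.
    ssc_obj.reader.n_cell = n_obs
    ssc_obj.reader.n_gene = n_vars
    ssc_obj.reader.n_cell_origin = n_obs
    ssc_obj.reader.n_gene_origin = n_vars
    return ssc_obj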

ssc_obj = init_scalesc_with_fix(
    data_dir=SCALESC_DIR,
    preload_on_cpu=True,
    preload_on_gpu=False,
    gpus=[0, 1],
    save_after_each_step=True,
    max_cell_batch=25000
)

normalize_log1p and highly_variable_genes both complete successfully. The full pipeline also ran without issues on smaller datasets.


Problem

pca() consistently fails with:

MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /gscratch/stf/wz34/envs/ScaleSC/include/rmm/mr/device/cuda_memory_resource.hpp

What I noticed

1. Allocation is always ~17 GB

The allocation request is consistently 17383920128 bytes (~17.4 GB) regardless of the max_cell_batch value (tested with 100,000, 50,000, 25,000, and 1,000). This suggests the ~17 GB is a fixed overhead unrelated to batch size, which makes max_cell_batch ineffective as a workaround.
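
As a rough sanity check on these numbers, the sketch below compares the requested size with the dense float32 footprint of a few candidate arrays. The helper `dense_bytes` and the list of candidates are assumptions made for illustration, not ScaleSC internals; which array PCA actually allocates has not been confirmed.

```python
# Back-of-the-envelope check; nothing here calls ScaleSC.
REQUESTED = 17_383_920_128  # bytes, taken from the failing allocation
N_CELLS, N_GENES, N_HVG = 2_702_045, 6_157, 3_000  # from the Dataset section
FLOAT32 = 4  # bytes per element


def dense_bytes(rows, cols, itemsize=FLOAT32):
    """Size in bytes of a dense rows x cols array."""
    return rows * cols * itemsize


# Hypothetical candidates for what a dense buffer might hold.
candidates = {
    "all cells x all genes": dense_bytes(N_CELLS, N_GENES),
    "all cells x HVGs": dense_bytes(N_CELLS, N_HVG),
}
for batch in (100_000, 50_000, 25_000, 1_000):
    candidates[f"batch of {batch} x HVGs"] = dense_bytes(batch, N_HVG)

for name, size in candidates.items():
    ratio = size / REQUESTED
    print(f"{name:28s} {size / 1e9:8.2f} GB  ({ratio:.3f}x requested)")
```

Under these assumptions, a dense per-batch block is at most 1.2 GB (100,000 x 3,000 x 4 bytes) and the full HVG matrix would be about 32.4 GB, so none of the candidates equals the 17.4 GB request.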

2. Only GPU 0 is used

GPU memory monitoring confirms GPU 1 is completely unused despite gpus=[0, 1]:

GPU 0: free=30.69 GB, total=47.67 GB  ← all allocations here
GPU 1: free=47.18 GB, total=47.67 GB  ← completely idle
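
For anyone who wants to collect the same per-device numbers, a helper along the following lines should work. It is a sketch rather than the exact monitoring code used here: `format_mem` and `report_gpus` are illustrative names, it relies on CuPy's `cupy.cuda.Device(i).mem_info` (a `(free, total)` tuple in bytes), and it assumes decimal gigabytes for display.

```python
def format_mem(device_id, free_bytes, total_bytes):
    """Format one device's memory as 'GPU i: free=X GB, total=Y GB'."""
    gb = 1e9  # assumption: decimal gigabytes
    free_gb = free_bytes / gb
    total_gb = total_bytes / gb
    return f"GPU {device_id}: free={free_gb:.2f} GB, total={total_gb:.2f} GB"


def report_gpus():
    """Return one formatted line per visible CUDA device (needs CuPy and a GPU)."""
    import cupy as cp  # imported lazily so format_mem works without a GPU

    lines = []
    for i in range(cp.cuda.runtime.getDeviceCount()):
        free, total = cp.cuda.Device(i).mem_info  # bytes
        lines.append(format_mem(i, free, total))
    return lines
```

Printing `"\n".join(report_gpus())` before and after each pipeline step shows which device the allocations land on.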

Any insights or help would be greatly appreciated. Thank you!
