# Remove ambient RNA with CellBender

Despite progress in the optimization and standardization of droplet-based single-cell omics protocols such as single-cell/single-nucleus RNA-seq (sc/snRNAseq), these experiments are not exempt from systematic biases and background noise. In snRNAseq, ambient RNA can cause the expression of some genes to be overestimated. Computational tools such as [CellBender](https://cellbender.readthedocs.io/en/latest/) correct for the presence of this ambient RNA.

We have implemented a method to run CellBender in the `DOTools_py` package. Currently, the implementation supports the analysis of samples processed with CellRanger.
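
`run_cellbender` expects `cellranger_path` to contain one subfolder per sample. Judging from the input paths printed in the CellBender log further down, each sample folder holds the raw matrix at `outs/raw_feature_bc_matrix.h5`. The following is a minimal sketch of how such sample folders could be listed before starting a run. It is not the `DOTools_py` implementation, and `find_cellranger_samples` is a hypothetical helper name.

```python
from pathlib import Path

# Relative location of the raw matrix inside a sample folder,
# as seen in the CellBender log of this tutorial.
RAW_MATRIX = Path("outs") / "raw_feature_bc_matrix.h5"


def find_cellranger_samples(cellranger_path):
    """Return the sorted names of subfolders that contain a raw CellRanger matrix.

    Hypothetical helper for illustration only.
    """
    root = Path(cellranger_path)
    return sorted(
        sub.name
        for sub in root.iterdir()
        if sub.is_dir() and (sub / RAW_MATRIX).is_file()
    )
```

Calling `find_cellranger_samples(...)` on the folder passed as `cellranger_path` gives a list that can be handed to `samplenames`, which makes it easy to spot a sample folder that is missing its raw matrix before a long run starts.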

In [3]:
# Set-Up
import dotools_py as do

In [2]:
do.dt.example_10x(path="/home/drodriguez/DOTools_py/example_10X")

2025-06-13 14:57:07,085 - Downloading data to /home/drodriguez/DOTools_py/example_10X


Downloading healthy filtered: 100%|██████████| 20.8M/20.8M [00:00<00:00, 65.2MiB/s]
Downloading healthy raw: 100%|██████████| 147M/147M [00:01<00:00, 83.7MiB/s]
Downloading disease filtered: 100%|██████████| 18.7M/18.7M [00:00<00:00, 78.3MiB/s]
Downloading disease raw: 100%|██████████| 144M/144M [00:01<00:00, 84.9MiB/s]


In [3]:
do.pp.run_cellbender(
    cellranger_path="/home/drodriguez/DOTools_py/example_10X",  # Folder with one subfolder per sample mapped with CellRanger
    output_path="/home/drodriguez/DOTools_py/cellbender",  # Folder where the CellBender output files are saved
    samplenames=["healthy", "disease"],  # Names of the subfolders; detected automatically if not specified
    cuda=True,  # Run on GPU (recommended); a run can take up to 1 hour
    cpu_threads=20,  # If no GPU is available, controls how many CPU threads to use
    epochs=150,  # Number of training epochs; the default is enough
    lr=0.00001,  # Learning rate
    log=True,  # Write a log file with the stdout for each sample
)

2025-06-13 14:57:12,200 - Conda environment with CellBender available using (/home/drodriguez/.venv/cellbender)
2025-06-13 14:57:12,201 - Running cellbender for 2 samples
2025-06-13 14:57:12,203 - Running Cellbender for healthy, might take a while

cellbender:remove-background: Command:
cellbender remove-background --input=/home/drodriguez/DOTools_py/example_10X/healthy/outs/raw_feature_bc_matrix.h5 --output=/home/drodriguez/DOTools_py/cellbender/healthy_out.h5 --cpu-threads=20 --epochs=150 --learning-rate=1e-05 --force-cell-umi-prior=500 --cuda
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 5673c87ca4)
cellbender:remove-background: 2025-06-13 14:57:25
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /home/drodriguez/DOTools_py/example_10X/healthy/outs/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 17 Antibod

ERROR conda.cli.main_run:execute(125): `conda run bash /home/drodriguez/miniconda3/envs/do_env/lib/python3.10/site-packages/dotools_py/util_scripts/_run_CellBender.sh -i healthy -o /home/drodriguez/DOTools_py/cellbender --cellRanger-output /home/drodriguez/DOTools_py/example_10X --cpu-threads 20 --epochs 150 --lr 1e-05 --cuda --log` failed. (See above for error)



cellbender:remove-background: Command:
cellbender remove-background --input=/home/drodriguez/DOTools_py/example_10X/disease/outs/raw_feature_bc_matrix.h5 --output=/home/drodriguez/DOTools_py/cellbender/disease_out.h5 --cpu-threads=20 --epochs=150 --learning-rate=1e-05 --force-cell-umi-prior=500 --cuda
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 1a031bf435)
cellbender:remove-background: 2025-06-13 15:07:32
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /home/drodriguez/DOTools_py/example_10X/disease/outs/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 17 Antibody Capture, 33538 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 22506 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 500
cellbender:remove-backg

ERROR conda.cli.main_run:execute(125): `conda run bash /home/drodriguez/miniconda3/envs/do_env/lib/python3.10/site-packages/dotools_py/util_scripts/_run_CellBender.sh -i disease -o /home/drodriguez/DOTools_py/cellbender --cellRanger-output /home/drodriguez/DOTools_py/example_10X --cpu-threads 20 --epochs 150 --lr 1e-05 --cuda --log` failed. (See above for error)
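
The log above prints the `cellbender remove-background` command issued for each sample. As a rough sketch, such a command line could be assembled from the `run_cellbender` arguments as shown here. The flag mapping is read off the log, `build_cellbender_command` is a hypothetical helper, and the `cell_umi_prior=500` default is simply copied from the logged `--force-cell-umi-prior=500`. This is not necessarily how `DOTools_py` builds the call internally.

```python
from pathlib import PurePosixPath


def build_cellbender_command(cellranger_path, output_path, sample,
                             cpu_threads=20, epochs=150, lr=1e-5,
                             cuda=True, cell_umi_prior=500):
    """Assemble a `cellbender remove-background` call as an argument list.

    Hypothetical helper; the layout of paths and flags follows the log
    printed in this tutorial.
    """
    raw = PurePosixPath(cellranger_path) / sample / "outs" / "raw_feature_bc_matrix.h5"
    out = PurePosixPath(output_path) / f"{sample}_out.h5"
    cmd = [
        "cellbender", "remove-background",
        f"--input={raw}",
        f"--output={out}",
        f"--cpu-threads={cpu_threads}",
        f"--epochs={epochs}",
        f"--learning-rate={lr}",
        f"--force-cell-umi-prior={cell_umi_prior}",
    ]
    if cuda:
        cmd.append("--cuda")  # Only request the GPU when asked to
    return cmd
```

Joining the list with spaces gives a string that can be compared with the entry in `commands_Cellbender.txt` when a run has to be reproduced by hand.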


After running the analysis, the folder given as `output_path` contains several files: a report file where we can check whether there was a problem when running CellBender, a log file for each sample, and a `commands_Cellbender.txt` file with the exact call used to run CellBender. We can now use the H5 files with ambient RNA correction for downstream analysis.
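
Before moving on to downstream analysis, it can be worth confirming that every sample actually produced its H5 files. The sketch below assumes the `<sample>_out.h5` naming seen in the log and the `<sample>_out_filtered.h5` file that CellBender writes alongside it. `missing_cellbender_outputs` is a hypothetical helper, not part of `DOTools_py`.

```python
from pathlib import Path


def missing_cellbender_outputs(output_path, samplenames):
    """Map each sample to the expected CellBender H5 files that are absent.

    Hypothetical helper; samples with all expected files are left out
    of the returned dictionary.
    """
    out_dir = Path(output_path)
    missing = {}
    for sample in samplenames:
        expected = [
            out_dir / f"{sample}_out.h5",           # Full output (name as in the log)
            out_dir / f"{sample}_out_filtered.h5",  # Cell-containing barcodes only (assumed name)
        ]
        absent = [path.name for path in expected if not path.is_file()]
        if absent:
            missing[sample] = absent
    return missing
```

An empty dictionary means all expected files are present. Otherwise, the report file and the per-sample log are the first places to look for the cause.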

In [2]:
import session_info

session_info.show(na=False, cpu=True, excludes=["backports"], std_lib=True, dependencies=True, html=True)