# Ligrec Benchmark
This notebook benchmarks `gr.ligrec` for *squidpy* and *rapids-singlecell*.

To run this notebook, please make sure you have a working RAPIDS environment with all necessary dependencies. Run the [data_downloader](https://github.com/scverse/rapids_singlecell-notebooks/blob/abc4fc6f3fe7f85cbffb94e76d190cad0ae00a5f/data_downloader.ipynb) notebook first to create the AnnData object we are working with. In this example workflow, we'll be looking at a dataset of ca. 90,000 cells from [Qian et al., Cell Research 2020](https://www.nature.com/articles/s41422-020-0355-0).

In [1]:
import scanpy as sc
import squidpy as sq
import cupy as cp
import rapids_singlecell as rsc

import warnings

warnings.filterwarnings("ignore")



In [2]:
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator

rmm.reinitialize(
    managed_memory=False,  # set to True to allow oversubscription of GPU memory
    pool_allocator=False,  # default is False
    devices=0,  # GPU device IDs to register. By default registers only GPU 0.
)
cp.cuda.set_allocator(rmm_cupy_allocator)

## Load and Prepare Data

We load the sparse count matrix from an `h5ad` file using Scanpy. The matrix is then moved to the GPU, where we run basic preprocessing for `rsc.gr.ligrec`.

In [3]:
%%time
adata = sc.read("h5/adata.raw.h5ad")

CPU times: user 3.35 s, sys: 401 ms, total: 3.75 s
Wall time: 3.75 s


In [4]:
rsc.get.anndata_to_GPU(adata)

In [5]:
%%time
rsc.pp.flag_gene_family(adata, gene_family_name="MT", gene_family_prefix="MT-")

CPU times: user 5.55 ms, sys: 153 μs, total: 5.7 ms
Wall time: 5.56 ms


In [6]:
%%time
rsc.pp.calculate_qc_metrics(adata, qc_vars=["MT"])

CPU times: user 40.8 ms, sys: 4.95 ms, total: 45.8 ms
Wall time: 45.5 ms


In [7]:
%%time
adata = adata[adata.obs["n_genes_by_counts"] < 5000]
adata.shape

CPU times: user 13.5 ms, sys: 2.39 ms, total: 15.9 ms
Wall time: 15.5 ms


(92666, 33694)

In [8]:
%%time
adata = adata[adata.obs["pct_counts_MT"] < 20]
adata.shape

CPU times: user 11.2 ms, sys: 2.06 ms, total: 13.3 ms
Wall time: 13.1 ms


(91180, 33694)
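
To make the two filters above concrete, here is a minimal pure-Python sketch of the quantities they act on. The toy counts and the `qc_metrics` helper are hypothetical and only illustrate the idea; this is not how `rsc.pp.calculate_qc_metrics` is implemented.

```python
# Toy data: each cell maps gene name -> raw count (hypothetical values).
cells = {
    "cell_a": {"MT-CO1": 5, "ACTB": 80, "CD3E": 15},
    "cell_b": {"MT-CO1": 40, "MT-ND1": 20, "ACTB": 40},
    "cell_c": {"ACTB": 10, "CD3E": 0, "MS4A1": 30},
}


def qc_metrics(counts, prefix="MT-"):
    """Return (n_genes_by_counts, pct_counts_MT) for a single cell."""
    total = sum(counts.values())
    # Number of genes with at least one count in this cell.
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Counts coming from genes whose name starts with the family prefix.
    mt = sum(c for gene, c in counts.items() if gene.startswith(prefix))
    pct_mt = 100.0 * mt / total if total else 0.0
    return n_genes, pct_mt


metrics = {name: qc_metrics(counts) for name, counts in cells.items()}

# Same thresholds as in the notebook cells above.
kept = [
    name
    for name, (n_genes, pct_mt) in metrics.items()
    if n_genes < 5000 and pct_mt < 20
]
print(kept)
```

In this toy example, `cell_b` is dropped because 60% of its counts come from `MT-` genes.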

In [9]:
%%time
rsc.pp.filter_genes(adata, min_count=3)

filtered out 8034 genes based on n_cells_by_counts
CPU times: user 124 ms, sys: 55.1 ms, total: 179 ms
Wall time: 179 ms


In [10]:
%%time
rsc.pp.normalize_total(adata, target_sum=1e4)

CPU times: user 413 μs, sys: 1.12 ms, total: 1.53 ms
Wall time: 1.11 ms


In [11]:
%%time
rsc.pp.log1p(adata)

CPU times: user 2.21 ms, sys: 12.1 ms, total: 14.3 ms
Wall time: 14.1 ms
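
As a rough illustration of what the two normalization steps do to a single cell, the sketch below rescales a toy count vector to a total of `1e4` and then applies `log(1 + x)`. The helper functions are hypothetical stand-ins written for this example, not the GPU kernels used by *rapids-singlecell*.

```python
import math


def normalize_total(counts, target_sum=1e4):
    """Scale one cell's counts so that they sum to target_sum."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays empty; avoids division by zero.
        return [0.0 for _ in counts]
    scale = target_sum / total
    return [c * scale for c in counts]


def log1p_transform(values):
    """Apply log(1 + x) element-wise; zeros stay zero."""
    return [math.log1p(v) for v in values]


cell = [1, 3, 0, 6]  # hypothetical raw counts for four genes
normalized = normalize_total(cell)
logged = log1p_transform(normalized)
print(normalized)
print(logged)
```

After scaling, every cell has the same total, so differences in sequencing depth no longer dominate the comparison between cells.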


In [12]:
%%time
rsc.get.anndata_to_CPU(adata)
adata.raw = adata

CPU times: user 110 ms, sys: 140 ms, total: 250 ms
Wall time: 249 ms


In [13]:
adata

AnnData object with n_obs × n_vars = 91180 × 25660
    obs: 'nGene', 'nUMI', 'CellFromTumor', 'PatientNumber', 'TumorType', 'TumorSite', 'CellType', 'n_genes_by_counts', 'total_counts', 'log1p_n_genes_by_counts', 'log1p_total_counts', 'total_counts_MT', 'pct_counts_MT', 'log1p_total_counts_MT'
    var: 'gene_ids', 'MT', 'n_cells_by_counts', 'total_counts', 'mean_counts', 'pct_dropout_by_counts', 'log1p_total_counts', 'log1p_mean_counts'
    uns: 'log1p'

## Ligrec Benchmark

First, we download the interactions once so that both functions are evaluated on the same set of interactions.

In [14]:
interactions = rsc.squidpy_gpu._ligrec._get_interactions()


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/urllib3/connectionpool.py", line 944, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/urllib3/connectionpool.py", line 944, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/urllib3/connectionpool.py", line 944, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/urllib3/connectionpool.py", line 934, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/urllib3/util/retr

RuntimeError: No active exception to reraise

Next, we run `ligrec` with both the *rapids-singlecell* and *squidpy* implementations for comparison.

In [None]:
%%time
res_rsc = rsc.gr.ligrec(
    adata,
    n_perms=1000,
    interactions=interactions,
    cluster_key="CellType",
    copy=True,
    use_raw=True,
)
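
For intuition about what `n_perms` controls, the following is a simplified pure-Python sketch of a label-permutation test for one ligand-receptor pair and one pair of clusters. It assumes the score is the average of the ligand mean in the source cluster and the receptor mean in the target cluster. All names and the toy data are hypothetical, and details of the real implementations (such as expression thresholds or multiple-testing correction) are left out.

```python
import random


def cluster_mean(values, labels, cluster):
    """Mean expression over the cells assigned to `cluster`."""
    selected = [v for v, lab in zip(values, labels) if lab == cluster]
    return sum(selected) / len(selected)


def lr_score(ligand, receptor, labels, source, target):
    """Average of ligand mean in `source` and receptor mean in `target`."""
    return 0.5 * (
        cluster_mean(ligand, labels, source)
        + cluster_mean(receptor, labels, target)
    )


def permutation_pvalue(ligand, receptor, labels, source, target,
                       n_perms=1000, seed=0):
    """Fraction of label shuffles whose score is >= the observed score."""
    rng = random.Random(seed)
    observed = lr_score(ligand, receptor, labels, source, target)
    shuffled = list(labels)  # shuffling keeps cluster sizes unchanged
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)
        if lr_score(ligand, receptor, shuffled, source, target) >= observed:
            hits += 1
    return observed, hits / n_perms


# Hypothetical toy data: 20 "T" cells express the ligand,
# 20 "B" cells express the receptor.
labels = ["T"] * 20 + ["B"] * 20
ligand = [5.0] * 20 + [0.0] * 20
receptor = [0.0] * 20 + [4.0] * 20

observed, pvalue = permutation_pvalue(ligand, receptor, labels, "T", "B")
print(observed, pvalue)
```

The cost grows with the number of permutations, interactions, and cluster pairs, which is why this step is a natural target for GPU acceleration.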

In [None]:
res_rsc["means"].iloc[:10, :10]

In [None]:
res_rsc["pvalues"].iloc[:10, :10]

In [None]:
%%time
res_sq = sq.gr.ligrec(
    adata,
    n_perms=1000,
    interactions=interactions,
    cluster_key="CellType",
    copy=True,
    use_raw=True,
    n_jobs=32,
)

In [None]:
res_sq["means"].iloc[:10, :10]

In [None]:
res_sq["pvalues"].iloc[:10, :10]
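
Outside a notebook, where the `%%time` magic is not available, the two calls can be timed with the standard library instead. The sketch below uses hypothetical helpers `timed` and `speedup`, and a stand-in workload rather than the actual `ligrec` calls.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-in workload; in the notebook this would be a ligrec call.
result, elapsed = timed(sum, range(1000))
print(f"result={result}, elapsed={elapsed:.6f} s")
```

In this notebook, the same pattern would wrap `sq.gr.ligrec` and `rsc.gr.ligrec` with identical arguments, and `speedup` would be applied to the two measured wall times.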