This notebook reproduces an example from the scDesign3 package vignette: [Simulate datasets with condition effect](https://songdongyuan1994.github.io/scDesign3/docs/articles/scDesign3-conditionEffect-vignette.html)

In [1]:
import anndata
import os
import requests

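save_path = "data/ifnb.h5ad"
if not os.path.exists(save_path):
    # Create the target directory if needed, then download the example data.
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    response = requests.get("https://go.wisc.edu/qi5grg")
    response.raise_for_status()  # stop here rather than saving an HTTP error page
    with open(save_path, "wb") as f:
        f.write(response.content)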

example_sce = anndata.read_h5ad(save_path)

In [2]:
example_sce.obs['cell_type'].value_counts()

cell_type
CD14 Mono       4362
CD4 Naive T     2504
CD4 Memory T    1762
CD16 Mono       1044
B                978
CD8 T            814
T activated      633
NK               619
DC               472
B Activated      388
Mk               236
pDC              132
Eryth             55
Name: count, dtype: int64

In [3]:
example_sce = example_sce[example_sce.obs["cell_type"].isin(["CD14 Mono", "B"]), :100].to_memory()
example_sce.obs

Unnamed: 0,orig.ident,nCount_RNA,nFeature_RNA,stim,seurat_annotations,ident,cell_type,condition
AAACATACATTTCC.1,IMMUNE_CTRL,3017.0,877,CTRL,CD14 Mono,IMMUNE_CTRL,CD14 Mono,CTRL
AAACATACCAGAAA.1,IMMUNE_CTRL,2481.0,713,CTRL,CD14 Mono,IMMUNE_CTRL,CD14 Mono,CTRL
AAACATACCTCGCT.1,IMMUNE_CTRL,3420.0,850,CTRL,CD14 Mono,IMMUNE_CTRL,CD14 Mono,CTRL
AAACATACGGCATT.1,IMMUNE_CTRL,1581.0,557,CTRL,CD14 Mono,IMMUNE_CTRL,CD14 Mono,CTRL
AAACATTGCTTCGC.1,IMMUNE_CTRL,2536.0,669,CTRL,CD14 Mono,IMMUNE_CTRL,CD14 Mono,CTRL
...,...,...,...,...,...,...,...,...
TTTGACTGCCCACT.1,IMMUNE_STIM,2751.0,743,STIM,CD14 Mono,IMMUNE_STIM,CD14 Mono,STIM
TTTGACTGCCCTAC.1,IMMUNE_STIM,2403.0,722,STIM,CD14 Mono,IMMUNE_STIM,CD14 Mono,STIM
TTTGACTGGCGAAG.1,IMMUNE_STIM,2205.0,760,STIM,B,IMMUNE_STIM,B,STIM
TTTGACTGGGTACT.1,IMMUNE_STIM,1123.0,507,STIM,B,IMMUNE_STIM,B,STIM


This is not quite the simulator used in the scDesign3 vignette, because we are using the same copula correlation across all groups. We need a version of negative_binomial_copula that takes a grouping variable in the formula as well.
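
To make the distinction concrete, here is a minimal standard-library sketch of what a group-specific copula correlation means. It is not the scdesigner implementation: the helper names (`normal_scores`, `pearson`, `groupwise_copula_correlation`) and the toy counts are hypothetical, and empirical ranks stand in for the fitted negative binomial CDF that a real Gaussian copula would use.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map values to standard-normal scores through their ranks.

    Ties are broken by position, a simplification for this toy example.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def groupwise_copula_correlation(gene_a, gene_b, groups):
    """Estimate one normal-score correlation per group label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        za = normal_scores([gene_a[i] for i in idx])
        zb = normal_scores([gene_b[i] for i in idx])
        result[g] = pearson(za, zb)
    return result


# Hypothetical counts: the two genes move together in B cells
# and in opposite directions in CD14 Mono cells.
gene_a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
gene_b = [2, 4, 5, 8, 9, 9, 7, 6, 3, 1]
groups = ["B"] * 5 + ["CD14 Mono"] * 5

corr = groupwise_copula_correlation(gene_a, gene_b, groups)
pooled = pearson(normal_scores(gene_a), normal_scores(gene_b))
print(corr, pooled)
```

In this toy case a single pooled correlation averages away two opposite within-group dependencies, which is the kind of structure a shared copula correlation cannot represent.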

In [4]:
from scdesigner.estimators import negbin_copula
from scdesigner.estimators.gaussian_copula_factory import group_indices

formula = "~ cell_type + condition + cell_type * condition"
params = negbin_copula(example_sce, formula, "cell_type", epochs=100)


In [5]:
from scdesigner.samplers import negbin_copula_sample
from scdesigner.predictors import negbin_predict

groups = group_indices("cell_type", example_sce.obs)
local_params = negbin_predict(params, example_sce.obs, formula)
simulated = negbin_copula_sample(local_params, params["covariance"], groups, example_sce.obs)

In [9]:
from copy import deepcopy

new_params = deepcopy(params)
new_params["beta"].loc["cell_type[T.B]:condition[T.STIM]"] = -params["beta"].loc["condition[T.STIM]"]
new_local = negbin_predict(new_params, example_sce.obs, formula)
synthetic_null = negbin_copula_sample(new_local, params["covariance"], groups, example_sce.obs)
synthetic_null

AnnData object with n_obs × n_vars = 5340 × 100
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'stim', 'seurat_annotations', 'ident', 'cell_type', 'condition'

The simulated data are more "regular" than the original data. Is the UMAP in the online vignette fit jointly across the real and simulated data? If not, how does it produce plots that aren't so spherical? Are the NB fits not sufficient? We do see the condition effect disappear for cell type B in the synthetic null data.
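
The disappearance of the condition effect for B cells follows directly from the coefficient edit used to build the synthetic null. The sketch below checks the arithmetic for a single gene. It assumes treatment coding with B as the non-reference cell type (as the term name `cell_type[T.B]` suggests) and a log link for the negative binomial mean; the coefficient values and the helper names `log_mean` and `stim_fold_change` are made up for illustration.

```python
import math

# Hypothetical coefficients for one gene (illustrative values only).
beta = {
    "Intercept": 1.0,
    "cell_type[T.B]": -0.5,
    "condition[T.STIM]": 0.8,
    "cell_type[T.B]:condition[T.STIM]": 0.3,
}


def log_mean(beta, is_b, is_stim):
    """Linear predictor under treatment coding, assuming a log link."""
    return (
        beta["Intercept"]
        + beta["cell_type[T.B]"] * is_b
        + beta["condition[T.STIM]"] * is_stim
        + beta["cell_type[T.B]:condition[T.STIM]"] * is_b * is_stim
    )


def stim_fold_change(beta, is_b):
    """Ratio of the STIM mean to the CTRL mean within one cell type."""
    return math.exp(log_mean(beta, is_b, 1) - log_mean(beta, is_b, 0))


# Same edit as in the notebook: interaction = -(main condition effect).
null_beta = dict(beta)
null_beta["cell_type[T.B]:condition[T.STIM]"] = -beta["condition[T.STIM]"]

fc_b_null = stim_fold_change(null_beta, 1)     # B cells: effect cancels
fc_mono_null = stim_fold_change(null_beta, 0)  # CD14 Mono: effect unchanged
print(fc_b_null, fc_mono_null)
```

For B cells the STIM main effect and the edited interaction sum to zero, so the STIM and CTRL means coincide. CD14 Mono cells never pick up the interaction term, so their condition effect is left as fitted.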

In [None]:
import altair
from scdesigner.diagnose import plot_umap

altair.data_transformers.enable("vegafusion")
combined = anndata.concat({"real": example_sce, "sim": simulated, "null": synthetic_null}, label="source")
plot_umap(combined, color="cell_type", shape="condition", facet="source")