This notebook reproduces the example from the scDesign3 package vignette [Simulate datasets with batch effect](https://songdongyuan1994.github.io/scDesign3/docs/articles/scDesign3-batchEffect-vignette.html).

In [1]:
import anndata
import os
import requests

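save_path = "data/pbmsca.h5ad"
if not os.path.exists(save_path):
    # Create the target directory first; open() fails if it is missing.
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    response = requests.get("https://go.wisc.edu/w46b81")
    response.raise_for_status()  # Fail loudly rather than saving an error page.
    with open(save_path, "wb") as f:
        f.write(response.content)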

example_sce = anndata.read_h5ad(save_path)

In [2]:
example_sce = example_sce[:, :100].to_memory()
example_sce.obs

Unnamed: 0,phenoid,orig.ident,nCount_RNA,nFeature_RNA,nGene,nUMI,percent.mito,Cluster,Experiment,Method,ident,batch,cell_type
pbmc2_10X_V2_AAACCTGAGATGGGTC,B cell,pbmc2,2360.0,1044,1044,2360,0.0419491525423729,2,pbmc2,10x Chromium (v2),pbmc2,10x Chromium (v2),B cell
pbmc2_10X_V2_AAACCTGAGCGTAATA,B cell,pbmc2,1888.0,803,803,1888,0.0413135593220339,2,pbmc2,10x Chromium (v2),pbmc2,10x Chromium (v2),B cell
pbmc2_10X_V2_AAACCTGAGCTAGGCA,Cytotoxic T cell,pbmc2,3456.0,1372,1372,3456,0.0353009259259259,1,pbmc2,10x Chromium (v2),pbmc2,10x Chromium (v2),Cytotoxic T cell
pbmc2_10X_V2_AAACCTGAGGGTCTCC,Dendritic cell,pbmc2,3802.0,1519,1519,3802,0.0420831141504471,6,pbmc2,10x Chromium (v2),pbmc2,10x Chromium (v2),Dendritic cell
pbmc2_10X_V2_AAACCTGGTCCGAACC,CD4+ T cell,pbmc2,3826.0,1451,1451,3826,0.0371144798745426,0,pbmc2,10x Chromium (v2),pbmc2,10x Chromium (v2),CD4+ T cell
...,...,...,...,...,...,...,...,...,...,...,...,...,...
pbmc1_10x_v3_TCATGCCCAATGAAAC,Dendritic cell,pbmc1,13524.0,3042,3042,13524,0.0936113575865129,8,pbmc1,10x Chromium (v3),pbmc1,10x Chromium (v3),Dendritic cell
pbmc1_10x_v3_TCCTTTCGTGCGAGTA,Dendritic cell,pbmc1,15400.0,3166,3166,15400,0.0883766233766234,8,pbmc1,10x Chromium (v3),pbmc1,10x Chromium (v3),Dendritic cell
pbmc1_10x_v3_TGTGATGCATCATCTT,Dendritic cell,pbmc1,15566.0,3303,3303,15566,0.0920596171142233,8,pbmc1,10x Chromium (v3),pbmc1,10x Chromium (v3),Dendritic cell
pbmc1_10x_v3_TTGAACGCATGCAGCC,Dendritic cell,pbmc1,16389.0,3361,3361,16389,0.0961010433827567,8,pbmc1,10x Chromium (v3),pbmc1,10x Chromium (v3),Dendritic cell


This is not quite the simulator used in the scDesign3 vignette, because it uses the same copula correlation across all groups. Matching the vignette requires a version of `negative_binomial_copula` that also takes a grouping variable in the formula.
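
To make the distinction concrete, here is a minimal sketch of a shared versus a group-specific Gaussian copula correlation, written with NumPy and the standard library only. It is not the scdesigner or scDesign3 implementation: the helper names `copula_correlation` and `groupwise_copula_correlation`, the toy Poisson counts, and the use of rank-based normal scores in place of fitted negative binomial marginals are all assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np

# Standard normal quantile function, applied elementwise.
_inv_cdf = np.vectorize(NormalDist().inv_cdf)


def copula_correlation(counts):
    """Estimate a Gaussian copula correlation from a cells x genes matrix.

    Hypothetical helper: it uses rank-based normal scores instead of
    fitted negative binomial marginals, purely to keep the sketch short.
    """
    counts = np.asarray(counts)
    n = counts.shape[0]
    # Ordinal ranks per gene (ties broken by cell order), values in 1..n.
    order = counts.argsort(axis=0, kind="stable")
    ranks = order.argsort(axis=0, kind="stable") + 1
    # Rescale ranks into (0, 1), then map to standard normal scores.
    z = _inv_cdf(ranks / (n + 1))
    return np.corrcoef(z, rowvar=False)


def groupwise_copula_correlation(counts, groups):
    """Return one correlation matrix per level of `groups` (hypothetical helper)."""
    counts = np.asarray(counts)
    groups = np.asarray(groups)
    return {g: copula_correlation(counts[groups == g]) for g in np.unique(groups)}


# Toy data: 200 cells, 4 genes, two cell types of 100 cells each.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(200, 4))
groups = np.repeat(["B cell", "CD4+ T cell"], 100)

shared = copula_correlation(counts)                       # one matrix for all cells
per_group = groupwise_copula_correlation(counts, groups)  # one matrix per cell type
```

The shared version corresponds to what this notebook does; the per-group version is the kind of structure a grouping variable in the formula would allow.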

In [3]:
from scdesigner.experimental.estimators import negative_binomial_copula

formula = "~ cell_type + batch"
params = negative_binomial_copula(example_sce, formula)

In [4]:
from scdesigner.experimental.samplers import negative_binomial_copula_sample

simulated = negative_binomial_copula_sample(params, example_sce.obs, formula)

In [5]:
import scdesigner.experimental.transform as tr

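# Select the coefficients associated with the "batch" term.
mask = tr.data_frame_mask(params["coefficient"], "batch")
# Nullify the selected coefficients, then sample data without a batch effect.
null_params = tr.nullify(params, "coefficient", mask)
synthetic_null = negative_binomial_copula_sample(null_params, example_sce.obs, formula)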
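
For intuition, the masking step can be mimicked on a toy coefficient table with pandas. This is only a sketch under stated assumptions, not scdesigner's code: it assumes coefficients are laid out as a data frame with design-matrix terms as rows and genes as columns, and the term names, gene names, and values are hypothetical. The last part mirrors the idea behind `tr.substitute`, which the next cell uses.

```python
import numpy as np
import pandas as pd

# Hypothetical coefficient table: rows are design-matrix terms, columns are genes.
terms = ["Intercept", "cell_type[T.CD4+ T cell]", "batch[T.10x Chromium (v3)]"]
coef = pd.DataFrame(
    [[1.0, 2.0], [0.5, -0.3], [0.8, 1.1]],
    index=terms,
    columns=["gene_a", "gene_b"],
)

# Boolean mask, same shape as `coef`, that is True in every row
# whose term name mentions "batch".
is_batch = np.asarray(coef.index.str.contains("batch"), dtype=bool)
mask = np.repeat(is_batch[:, None], coef.shape[1], axis=1)

# Nullifying: set the masked entries to zero and keep everything else.
null_coef = coef.mask(mask, 0.0)

# Substituting: overwrite the masked entries with new values (row-major order).
new_values = np.array([5.0, 7.0])
arr = coef.to_numpy(copy=True)
arr[mask] = new_values
alter_coef = pd.DataFrame(arr, index=coef.index, columns=coef.columns)
```

Under this layout, zeroing the batch rows removes the batch contribution from every gene's mean while leaving the cell-type effects untouched.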

In [6]:
import numpy as np

mask = tr.data_frame_mask(params["coefficient"], "batch")
# Replace the batch coefficients with draws from a normal distribution (mean 5, sd 2).
new_value = np.random.normal(5, 2, (10, 100))
alter_params = tr.substitute(params, "coefficient", new_value, mask)
synthetic_alter = negative_binomial_copula_sample(alter_params, example_sce.obs, formula)

In [7]:
import altair
from scdesigner.experimental.diagnose import plot_umap

altair.data_transformers.enable("vegafusion")
altair.renderers.enable("jupyter")

combined = anndata.concat({"real": example_sce, "sim": simulated, "null": synthetic_null, "alter": synthetic_alter}, label="source")
plot_umap(combined, color="cell_type", shape="batch", facet="source")
