## scANVI analysis for healthy PBMC pilot study (Cai 2020 and Cai 2022)

**Objective**: Run scANVI label transfer on healthy PBMCs [Cai 2020 and Cai 2022]


- **Developed by**: Mairi McClean

- **Institute of Computational Biology - Computational Health Centre - Helmholtz Munich**

- v230321

- Following this tutorial [Tabula Muris]: https://docs.scvi-tools.org/en/stable/tutorials/notebooks/tabula_muris.html



### Import modules


In [None]:
!pip install --quiet scvi-colab
from scvi_colab import install

install()

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
import scvi
from scvi.model.utils import mde

In [None]:
sc.set_figure_params(figsize=(4, 4))

%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
%config InlineBackend.figure_format='retina'

### Read in datasets

In [None]:
reference = sc.read('/Volumes/Lacie/data_lake/Mairi_example/INBOX/sc_downloads/yoshida_2021/meyer_nikolic_covid_pbmc.cellxgene.20210813.h5ad')
query = sc.read('/Volumes/Lacie/data_lake/Mairi_example/processed_files/scvi/post_sccaf/CaiY_healthy_scRNA_PBMC_mm230316_scVI-clustered.raw.h5ad')

### Dataset concatenation and HVG selection

In [None]:
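# Concatenate reference and query (by default only genes shared by both are kept)
adata = reference.concatenate(query)
# Store the counts before normalization; this assumes adata.X holds raw counts
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
adata.raw = adata  # keep the full, log-normalized gene set before HVG subsetting
# Select 2000 HVGs from the raw counts, batch-aware per sequencing library,
# and subset adata to them; "sequencing_library" must exist in .obs for all cells
sc.pp.highly_variable_genes(
    adata,
    flavor="seurat_v3",
    n_top_genes=2000,
    layer="counts",
    batch_key="sequencing_library",
    subset=True,
)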

In [None]:
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="sequencing_library")

### Transfer of annotations using scANVI

scANVI uses semi-supervised learning to improve the model learned with scVI, allowing us to transfer cell type knowledge from the reference to the query data. For this, we need to indicate to scANVI:

- the sample identifier for each cell (as in scVI), which in this case is the sequencing library (`sequencing_library`)
- the cell type, or an unassigned label, for each cell
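
The label requirement above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the toy table stands in for `adata.obs`, and the column names `dataset`, `cell_type`, and `scanvi_labels`, as well as the placeholder category `"Unknown"`, are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs after concatenation; the column names
# "dataset" and "cell_type" are assumptions, not taken from the real data.
obs = pd.DataFrame(
    {
        "dataset": ["reference", "reference", "query", "query"],
        "cell_type": ["T cell", "B cell", None, None],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)

UNLABELED = "Unknown"  # placeholder category for cells without an annotation

# Keep the reference annotations and mark every query cell as unlabeled.
is_reference = obs["dataset"] == "reference"
obs["scanvi_labels"] = obs["cell_type"].where(is_reference, UNLABELED)

# Guard against reference cells that lack an annotation.
obs["scanvi_labels"] = obs["scanvi_labels"].fillna(UNLABELED).astype("category")

print(obs["scanvi_labels"].value_counts())
```

With the real data, a column built this way would presumably be passed as `labels_key`, together with `unlabeled_category="Unknown"`, when creating the model with `scvi.model.SCANVI.from_scvi_model`, as in the tutorial linked at the top of this notebook.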

#### Confusion matrix