# Advanced Tutorial: Multi-Panel Integration and Downstream Analysis with CytoVI

In this tutorial, we demonstrate advanced functionality of **CytoVI**, a deep generative model for protein expression measurements from technologies such as flow cytometry, mass cytometry, or CITE-seq. Building on the quick start tutorial, we now explore how CytoVI can be used to integrate multiple cytometry panels, impute missing markers, transfer annotations between datasets, and uncover biological differences through differential expression and abundance analysis.

If you are new to CytoVI or unfamiliar with data loading, preprocessing, or training the model, we recommend starting with the [quick start tutorial](#) where these fundamental steps are introduced in detail. In this tutorial, we will work with preprocessed and partially annotated data to focus on the advanced use cases of the model.

Specifically, we analyze conventional flow cytometry data of tumor-infiltrating T cells obtained from patients with B-cell non-Hodgkin lymphoma (BNHL). These samples were profiled using two distinct antibody panels that share a subset of common markers. Using CytoVI, we will integrate both panels into a shared representation space, infer missing marker expression, and perform downstream biological analysis to gain insights into T cell heterogeneity across patients.

Plan for this tutorial:

1. Load and inspect preprocessed data
2. Train a CytoVI model that integrates both antibody panels
3. Visualize the joint latent space and evaluate panel integration
4. Impute non-overlapping protein markers and assess imputation quality
5. Automatically annotate immune cell types via label transfer
6. Quantify differential protein expression across conditions or clusters
7. Detect disease-associated T cell states using label-free differential abundance analysis

In [None]:
# Install scvi-tools in Colab via the scvi-colab helper
!pip install --quiet scvi-colab
from scvi_colab import install

install()

In [1]:
import os
import random
import tempfile
import requests
import scvi

import numpy as np  # type: ignore
import matplotlib.pyplot as plt  # type: ignore
import scanpy as sc  # type: ignore
from scvi.external import cytovi  # type: ignore
import torch  # type: ignore
from rich import print  # type: ignore

sc.set_figure_params(figsize=(4, 4))

scvi.settings.seed = 0
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
print("Last run with scvi-tools version:", scvi.__version__)

Seed set to 0


## Loading the data

In this tutorial, we will work with a curated, lightweight subset of flow cytometry data from the BNHL study by Roider et al. 2024 (Nature Cell Biology, https://doi.org/10.1038/s41556-024-01358-2). The dataset includes flow cytometry measurements of T cells from 33 donors across two distinct antibody panels, each profiling 12 protein markers along with the morphological features forward and side scatter (FSC and SSC). Samples were acquired across four independent experimental batches. For ease of use, the data have been preprocessed: they were corrected for fluorescent spillover, restricted to live single-cell events, transformed with an inverse hyperbolic sine (arcsinh) transformation, scaled, and subsampled to ~5000 cells per panel. We will access the dataset as preprocessed .h5ad files. For demonstration purposes, the data from one of the panels come with cell type annotations.
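
The files already contain the `raw`, `transformed`, and `scaled` layers, so none of the preprocessing has to be repeated here. For readers who want to see what the transformation and scaling steps amount to, here is a minimal, dependency-free sketch. It is not the code used to prepare the dataset. The cofactor of 150 is a value often cited for conventional flow cytometry, and min-max scaling is only one possible reading of "scaled"; both are assumptions of this sketch, and the helper names `arcsinh_transform` and `min_max_scale` are hypothetical.

```python
import math


def arcsinh_transform(values, cofactor=150.0):
    """Apply asinh(x / cofactor) to each intensity.

    The cofactor is an assumption of this sketch, not a value from the tutorial.
    """
    return [math.asinh(v / cofactor) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1].

    Min-max scaling is an assumed choice of scaling method.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant marker carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy fluorescence intensities for a single marker (made-up numbers).
raw = [-50.0, 0.0, 150.0, 1500.0, 15000.0]

transformed = arcsinh_transform(raw)
scaled = min_max_scale(transformed)

for r, t, s in zip(raw, transformed, scaled):
    print(f"raw={r:>9.1f}  arcsinh={t:7.3f}  scaled={s:5.3f}")
```

The arcsinh transformation behaves almost linearly near zero and logarithmically for large intensities, which is why it tolerates the negative values that spillover correction can produce.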

In [6]:
temp_dir_obj = tempfile.TemporaryDirectory()

adata_p1_path = os.path.join(temp_dir_obj.name, "Roider_et_al_BNHL_panel1.h5ad")
adata_p1 = sc.read(adata_p1_path, backup_url='https://figshare.com/ndownloader/files/56891468')

adata_p2_path = os.path.join(temp_dir_obj.name, "Roider_et_al_BNHL_panel2.h5ad")
adata_p2 = sc.read(adata_p2_path, backup_url='https://figshare.com/ndownloader/files/56891471')


In [13]:
adata_p1

AnnData object with n_obs × n_vars = 4983 × 14
    obs: 'sample_id', 'PatientID', 'batch', 'panel', 'Entity', 'cell_type'
    layers: '_nan_mask', 'raw', 'scaled', 'transformed'

Because the data have already been preprocessed, we can directly merge the two panels into one `AnnData` object using `cytovi.merge_batches()`. This will automatically register a `nan_layer` that handles the modeling of missing markers under the hood.

In [14]:
adata = cytovi.merge_batches([adata_p1, adata_p2], batch_key='panel_batch')
adata

Backbone markers: CD3, CD4, CD45RA, CD69, CD8, FSC-A, FoxP3, Ki67, PD1, SSC-A


AnnData object with n_obs × n_vars = 9966 × 18
    obs: 'sample_id', 'PatientID', 'batch', 'panel', 'Entity', 'cell_type', 'panel_batch'
    var: '_batch_0', '_batch_1'
    layers: '_nan_mask', 'raw', 'scaled', 'transformed'
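
The shapes above follow from simple set arithmetic: each panel has 14 features, 10 of which are shared backbone markers, so the merged object has 14 + 14 - 10 = 18 features. The sketch below illustrates that bookkeeping in plain Python. It is not CytoVI's implementation of `merge_batches()`. The backbone names are taken from the output above, while the panel-specific marker names (`P1_only_*`, `P2_only_*`), the helper `observed_mask`, and the mask convention (1 = measured) are hypothetical and may differ from what the `_nan_mask` layer stores.

```python
# Backbone markers as reported by merge_batches in the output above.
backbone = [
    "CD3", "CD4", "CD45RA", "CD69", "CD8",
    "FSC-A", "FoxP3", "Ki67", "PD1", "SSC-A",
]

# Hypothetical placeholder names for the four panel-specific markers per panel.
panel1_markers = backbone + [f"P1_only_{i}" for i in range(1, 5)]
panel2_markers = backbone + [f"P2_only_{i}" for i in range(1, 5)]

# Shared markers are the intersection; the merged feature space is the union.
shared = sorted(set(panel1_markers) & set(panel2_markers))
all_markers = sorted(set(panel1_markers) | set(panel2_markers))


def observed_mask(panel_markers, merged_markers):
    """Return one flag per merged marker: 1 if this panel measured it, else 0.

    The 1 = measured convention is an assumption of this sketch.
    """
    measured = set(panel_markers)
    return [1 if marker in measured else 0 for marker in merged_markers]


mask_p1 = observed_mask(panel1_markers, all_markers)
mask_p2 = observed_mask(panel2_markers, all_markers)

print(len(shared), len(all_markers))  # 10 18
print(sum(mask_p1), sum(mask_p2))     # 14 14
```

Every cell from a given panel shares the same mask row, so the markers flagged 0 for a panel are exactly the ones that have to be imputed for its cells.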

# load data from figshare
# show histograms split by panel
# show biaxial plot for a marker combination that we want to impute