# B-Cell Receptor (BCR) Visualization for Anti-GAD65 Immune Cell Profiling

This notebook visualizes the BCR repertoire and analyzes clonotypes in single-cell RNA-seq data from patients with GAD65 autoimmune disease and from controls.

## Overview
- **Input**: AnnData objects with BCR data from CSF and PBMC samples
- **Analysis**: BCR chain filtering, clonotype network analysis, and visualization
- **Tools**: Scanpy, Dandelion, Scirpy

## Requirements
Key packages:
- dandelion
- scanpy
- scirpy
- numpy, pandas, matplotlib

## 1. Import Libraries

In [None]:
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import dandelion as ddl
import scirpy as ir
import warnings
import os

warnings.filterwarnings('ignore')
ddl.logging.print_header()
sc.settings.verbosity = 1

## 2. Configuration

In [None]:
# File paths - update these to your data locations
INPUT_ADATA = 'seurat_object_combined_singlets_integrated_GAD_IIH_CSF_PBMC_Bcell_BCR_transfered_ADATA.h5ad'
INPUT_VDJ = 'seurat_object_combined_singlets_integrated_GAD_IIH_CSF_PBMC_Bcell_BCR_transfered_VDJ.h5ddl'
OUTPUT_FILTERED = 'seurat_object_combined_singlets_integrated_GAD_IIH_CSF_PBMC_Bcell_BCR_transfered_ADATA_filtered.h5ad'

# Color palettes
tab20 = plt.cm.tab20
CUSTOM_PALETTE = [
    tab20(0), tab20(4), tab20(19), tab20(10), tab20(17), 
    tab20(15), tab20(13), tab20(8), tab20(2), tab20(6)
]

## 3. Load Data

In [None]:
adata = sc.read(INPUT_ADATA)
vdj = ddl.read_h5ddl(INPUT_VDJ)

print(f"Loaded AnnData: {adata.n_obs} cells, {adata.n_vars} genes")

## 4. Quality Control - BCR Chain Status

### 4.1. Pre-filtering QC

In [None]:
ddl.pl.stackedbarplot(
    adata,
    color="Disease",
    groupby="locus_status",
    xtick_rotation=0,
    figsize=(8, 6),
    xtick_fontsize=14
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()

In [None]:
ddl.pl.stackedbarplot(
    adata,
    color="Disease",
    groupby="chain_status",
    xtick_rotation=0,
    figsize=(8, 6),
    xtick_fontsize=14
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()

### 4.2. Filter BCR Chains

Retain only cells with valid BCR chain combinations:
- Chain status: 'Single pair' or 'Extra pair'
- Locus status: 'IGH + IGK', 'IGH + IGL', or 'IGH + Extra VJ'

In [None]:
valid_chain_status = ['Single pair', 'Extra pair']
valid_locus_status = ['IGH + IGK', 'IGH + IGL', 'IGH + Extra VJ']

adata_filtered = adata[adata.obs['chain_status'].isin(valid_chain_status)].copy()
adata_filtered = adata_filtered[adata_filtered.obs['locus_status'].isin(valid_locus_status)].copy()

print(f"\nFiltered data: {adata_filtered.n_obs} cells retained ({adata_filtered.n_obs/adata.n_obs*100:.1f}%)")
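
The two filtering criteria above can also be combined into a single boolean mask, which makes it easy to tabulate what is being dropped before discarding it. The following is a minimal sketch on a toy table standing in for `adata.obs`; the cell names and status values are made up for illustration, and `keep`, `dropped`, and `obs_filtered` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for adata.obs; cell names and status values are illustrative only.
obs = pd.DataFrame(
    {
        "chain_status": ["Single pair", "Extra pair", "Orphan VDJ", "Single pair", "Orphan VJ"],
        "locus_status": ["IGH + IGK", "IGH + Extra VJ", "Orphan IGH", "IGH + IGL", "Orphan IGK"],
    },
    index=[f"cell_{i}" for i in range(5)],
)

valid_chain_status = ["Single pair", "Extra pair"]
valid_locus_status = ["IGH + IGK", "IGH + IGL", "IGH + Extra VJ"]

# A cell is kept only if it passes both criteria.
keep = obs["chain_status"].isin(valid_chain_status) & obs["locus_status"].isin(valid_locus_status)
obs_filtered = obs[keep]

# Count the dropped cells per (chain_status, locus_status) combination.
dropped = obs.loc[~keep].groupby(["chain_status", "locus_status"]).size()
print(dropped)
print(f"{keep.sum()} of {len(obs)} cells retained ({keep.mean() * 100:.1f}%)")
```

With an AnnData object, the same mask could be applied once as `adata[keep.values].copy()` instead of subsetting twice.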

### 4.3. Post-filtering QC

In [None]:
ddl.pl.stackedbarplot(
    adata_filtered,
    color="Sampletype",
    groupby="locus_status",
    xtick_rotation=0,
    figsize=(8, 6),
    xtick_fontsize=14
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()

In [None]:
ddl.pl.stackedbarplot(
    adata_filtered,
    color="Sampletype",
    groupby="chain_status",
    xtick_rotation=0,
    figsize=(8, 6),
    xtick_fontsize=14
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()

## 5. Clonal Expansion Analysis

### 5.1. Define Clonal Expansion Status

In [None]:
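adata_filtered.obs['expansion'] = np.where(
    adata_filtered.obs['clone_id_size'] > 1,
    'expansion',
    # Use None rather than np.nan for cells without a clone size: mixing np.nan with
    # string labels makes NumPy build a string array, so missing values become 'nan'.
    np.where(adata_filtered.obs['clone_id_size'] == 1, 'non-expansion', None)
)

# dropna=False also reports cells that received no label
expansion_counts = adata_filtered.obs['expansion'].value_counts(dropna=False)
print("\nClonal expansion summary:")
print(expansion_counts)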
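
The labelling rule used in this step (clone size above 1 is expanded, exactly 1 is non-expanded, anything else stays missing) can be checked on a toy series. This is a minimal sketch, assuming clone sizes are numeric with `NaN` for cells that have no clone assignment; the values and cell names are made up.

```python
import numpy as np
import pandas as pd

# Toy clone sizes per cell; NaN marks a cell without a clone assignment.
clone_size = pd.Series(
    [1.0, 3.0, np.nan, 2.0, 1.0],
    index=[f"cell_{i}" for i in range(5)],
)

# Start from an all-missing object column, then fill in the two defined cases.
expansion = pd.Series(np.nan, index=clone_size.index, dtype=object)
expansion[clone_size > 1] = "expansion"
expansion[clone_size == 1] = "non-expansion"

# dropna=False keeps the unlabelled cell visible in the summary.
print(expansion.value_counts(dropna=False))
```

Filling a pre-allocated column this way keeps missing values as real missing values, so they are excluded from later comparisons such as `== "expansion"`.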

### 5.2. UMAP Visualization

In [None]:
sc.set_figure_params(figsize=[7, 6])
sc.pl.umap(
    adata_filtered, 
    color=["annotate"], 
    na_in_legend=False, 
    palette="tab20", 
    edges_width=1, 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

### 5.3. Save Filtered Data

In [None]:
adata_filtered.write(OUTPUT_FILTERED)
print(f"Filtered data saved to: {OUTPUT_FILTERED}")

### 5.4. Subset Expanded Clones

In [None]:
# .copy() materializes the subset so later steps do not operate on a view of adata_filtered
adata_filtered_expanded = adata_filtered[adata_filtered.obs['expansion'] == "expansion"].copy()
print(f"Expanded clones: {adata_filtered_expanded.n_obs} cells")

---
# Visualization Analysis

## A. GAD vs IIH Comparison

### A.1. Clonotype Networks

#### All clones by disease

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered, 
    color=["Disease"], 
    ncols=1,
    legend_fontoutline=3, 
    edges_width=1, 
    palette="tab20", 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

#### Expanded clones by disease

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered_expanded, 
    color=["Disease"], 
    ncols=1,
    legend_fontoutline=3, 
    edges_width=1, 
    palette="tab20", 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

#### All clones by clone size

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered,
    color=["clone_id_size_max_10"],
    ncols=2,
    legend_fontoutline=3,
    edges_width=1,
    palette=CUSTOM_PALETTE,
    size=20,
    na_in_legend=False, 
    show=False
)
plt.tight_layout()
plt.show()
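
The `clone_id_size_max_10` column plotted above is read from the input data, and this notebook does not show how it was produced. As an illustration only, the sketch below derives a capped clone-size category of this kind from clone IDs with pandas. The clone IDs and the names `MAX_SIZE`, `labels`, and `capped` are hypothetical; the `>= 10` label follows the one this notebook uses for that column.

```python
import pandas as pd

# Toy clone assignment per cell: clone A has 12 cells, B has 3, C has 1.
clone_id = pd.Series(["A"] * 12 + ["B"] * 3 + ["C"], name="clone_id")

# Attach to every cell the size of the clone it belongs to.
clone_size = clone_id.map(clone_id.value_counts())

# Cap sizes: everything at or above MAX_SIZE is pooled into one label.
MAX_SIZE = 10
labels = clone_size.astype(str).where(clone_size < MAX_SIZE, f">= {MAX_SIZE}")

# An ordered categorical keeps the legend in numeric order rather than string order.
order = [str(i) for i in range(1, MAX_SIZE)] + [f">= {MAX_SIZE}"]
capped = pd.Series(
    pd.Categorical(labels, categories=order, ordered=True),
    index=clone_id.index,
)
print(capped.value_counts(sort=False))
```

Fixing the category order up front also keeps the mapping between categories and palette colours stable across plots, even when some sizes are absent from a subset.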

#### Expanded clones by clone size

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered_expanded,
    color=["clone_id_size_max_10"],
    ncols=2,
    legend_fontoutline=3,
    edges_width=1,
    palette=CUSTOM_PALETTE,
    size=20,
    na_in_legend=False, 
    show=False
)
plt.tight_layout()
plt.show()

### A.2. UMAP Visualization

In [None]:
sc.set_figure_params(figsize=[7, 6])
sc.pl.umap(
    adata_filtered, 
    color=["Disease"], 
    na_in_legend=False, 
    palette="tab20", 
    edges_width=1, 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

### A.3. Clonal Expansion Bar Plots

#### All clones

In [None]:
bar_palette = [tab20(0), tab20(4), tab20(8), tab20(2), tab20(6)]
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=bar_palette)

ir.pl.clonal_expansion(
    adata_filtered, 
    target_col="clone_id_by_size", 
    groupby="Disease",
    normalize=False, 
    breakpoints=(1, 3, 5, 10), 
    figsize=(8, 4)
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()
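
The `breakpoints=(1, 3, 5, 10)` argument groups cells by the size of the clone they belong to. As a rough illustration of that binning (not scirpy's own code), the sketch below uses `pd.cut` with right-closed intervals, following the scirpy documentation's description of breakpoints as upper bounds that include the stated size. The toy sizes and the bin labels are assumptions chosen for readability.

```python
import numpy as np
import pandas as pd

# Toy data: one entry per cell, holding the size of the clone that cell belongs to.
clone_sizes = pd.Series([1, 1, 2, 3, 4, 5, 7, 10, 12])

breakpoints = (1, 3, 5, 10)
edges = [0, *breakpoints, np.inf]

# Hypothetical labels; scirpy generates its own legend text.
labels = ["1", "2-3", "4-5", "6-10", "> 10"]

# right=True makes each bin include its upper edge, e.g. (1, 3] covers sizes 2 and 3.
binned = pd.cut(clone_sizes, bins=edges, labels=labels, right=True)
print(binned.value_counts(sort=False))
```

Four breakpoints therefore produce five bars segments per group, with the last one collecting every clone larger than the final breakpoint.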

#### Expanded clones

In [None]:
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=bar_palette)

ir.pl.clonal_expansion(
    adata_filtered_expanded, 
    target_col="clone_id_by_size", 
    groupby="Disease",
    normalize=False, 
    breakpoints=(1, 3, 5, 10), 
    figsize=(8, 4)
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()

---
## B. CSF vs PBMC Compartment Comparison

### B.1. Clonotype Networks

#### All clones by sample type

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered, 
    color=["Sampletype"], 
    ncols=1,
    legend_fontoutline=3, 
    edges_width=1, 
    palette="tab20", 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

#### Expanded clones by sample type

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered_expanded, 
    color=["Sampletype"], 
    ncols=1,
    legend_fontoutline=3, 
    edges_width=1, 
    palette="tab20", 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

#### All clones by clone size

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered,
    color=["clone_id_size_max_10"],
    ncols=2,
    legend_fontoutline=3,
    edges_width=1,
    palette=CUSTOM_PALETTE,
    size=20,
    na_in_legend=False, 
    show=False
)
plt.tight_layout()
plt.show()

#### Expanded clones by clone size

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered_expanded,
    color=["clone_id_size_max_10"],
    ncols=2,
    legend_fontoutline=3,
    edges_width=1,
    palette=CUSTOM_PALETTE,
    size=20,
    na_in_legend=False, 
    show=False
)
plt.tight_layout()
plt.show()

#### Clonotype networks colored by clone size (continuous)

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered, 
    color='clone_id_size', 
    edges_width=1, 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

In [None]:
sc.set_figure_params(figsize=[6, 6])
ddl.pl.clone_network(
    adata_filtered_expanded, 
    color='clone_id_size', 
    edges_width=1, 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

### B.2. UMAP Visualizations

#### By sample type

In [None]:
sc.set_figure_params(figsize=[7, 6])
sc.pl.umap(
    adata_filtered, 
    color=["Sampletype"], 
    na_in_legend=False, 
    palette="tab20", 
    edges_width=1, 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

#### By clone size groups

In [None]:
sc.set_figure_params(figsize=[7, 6])
sc.pl.umap(
    adata_filtered, 
    color=["clone_id_size_max_10"], 
    # "7" was missing from this list, so cells in clones of size 7 would not be shown
    groups=["1", "2", "3", "4", "5", "6", "7", "8", "9", ">= 10"], 
    na_in_legend=False, 
    palette=CUSTOM_PALETTE, 
    edges_width=1, 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

#### By continuous clone size (all clones)

In [None]:
sc.set_figure_params(figsize=[7, 6])
sc.pl.umap(
    adata_filtered, 
    color=["clone_id_size"], 
    na_in_legend=False, 
    edges_width=1, 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

#### By continuous clone size (expanded clones only)

In [None]:
sc.set_figure_params(figsize=[7, 6])
sc.pl.umap(
    adata_filtered_expanded, 
    color=["clone_id_size"], 
    na_in_legend=False, 
    edges_width=1, 
    size=20, 
    show=False
)
plt.tight_layout()
plt.show()

### B.3. Clonal Expansion Bar Plots

#### All clones

In [None]:
bar_palette = [tab20(0), tab20(4), tab20(8), tab20(2), tab20(6)]
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=bar_palette)

ir.pl.clonal_expansion(
    adata_filtered, 
    target_col="clone_id_by_size", 
    groupby="Sampletype",
    normalize=False, 
    breakpoints=(1, 3, 5, 10), 
    figsize=(8, 4)
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()

#### Expanded clones

In [None]:
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=bar_palette)

ir.pl.clonal_expansion(
    adata_filtered_expanded, 
    target_col="clone_id_by_size", 
    groupby="Sampletype",
    normalize=False, 
    breakpoints=(1, 3, 5, 10), 
    figsize=(8, 4)
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()

### B.4. Individual Clonotype Network Analysis

#### Compute clonotype networks (minimum 2 cells per clone)

In [None]:
ddl.tl.transfer(adata_filtered, vdj, clone_key="clone_id_by_size")
ir.tl.clonotype_network(adata_filtered, clonotype_key="clone_id_by_size", min_cells=2)
ir.pl.clonotype_network(adata_filtered, color="clone_id_by_size", panel_size=(7, 7))

#### By sample

In [None]:
ir.pl.clonotype_network(
    adata_filtered, 
    color="sample", 
    base_size=20, 
    palette="tab20", 
    label_fontsize=10, 
    panel_size=(12, 8)
)

#### By disease (GAD vs IIH)

In [None]:
ir.pl.clonotype_network(
    adata_filtered, 
    color="Disease", 
    base_size=20, 
    palette="tab20", 
    label_fontsize=10, 
    panel_size=(12, 8)
)

#### Top 20 expanded clones by disease

In [None]:
ir.pl.group_abundance(
    adata_filtered, 
    groupby="clone_id_by_size", 
    target_col="Disease", 
    max_cols=20, 
    figsize=(8, 5)
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()
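
The numbers behind an abundance plot like the one above can be tabulated directly, which is useful for reporting exact cell counts per clone. This is a minimal sketch on a toy table standing in for `adata_filtered.obs`; the clone IDs, the counts, and the names `TOP_N` and `top` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for adata_filtered.obs: one row per cell.
obs = pd.DataFrame(
    {
        "clone_id_by_size": ["1", "1", "1", "1", "2", "2", "3", "4"],
        "Disease": ["GAD", "GAD", "GAD", "GAD", "IIH", "IIH", "GAD", "IIH"],
    }
)

# Cells per clone, split by disease group.
counts = pd.crosstab(obs["clone_id_by_size"], obs["Disease"])

# Keep the TOP_N largest clones by total cell count.
TOP_N = 2
top = counts.loc[counts.sum(axis=1).nlargest(TOP_N).index]
print(top)
```

Replacing `"Disease"` with `"Sampletype"` would give the corresponding table for the CSF vs PBMC comparison.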

#### By sample type (CSF vs PBMC)

In [None]:
ir.pl.clonotype_network(
    adata_filtered, 
    color="Sampletype", 
    base_size=20, 
    palette="tab20", 
    label_fontsize=10, 
    panel_size=(12, 8)
)

#### Top 20 expanded clones by sample type

In [None]:
ir.pl.group_abundance(
    adata_filtered, 
    groupby="clone_id_by_size", 
    target_col="Sampletype", 
    max_cols=20, 
    figsize=(8, 5)
)
plt.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False)
plt.tight_layout()
plt.show()

---
## C. GAD-only Analysis (CSF vs PBMC)

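Focused analysis of GAD patients, comparing the CSF and PBMC compartments. After loading, rerun sections 4-5 and A-B with `adata_gad` and `vdj_gad` in place of `adata` and `vdj`.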

In [None]:
# Load GAD-specific data
adata_gad = sc.read("seurat_object_combined_singlets_integrated_GAD_CSF_PBMC_Bcell_BCR_transfered_ADATA.h5ad")
vdj_gad = ddl.read_h5ddl("seurat_object_combined_singlets_integrated_GAD_CSF_PBMC_Bcell_BCR_transfered_VDJ.h5ddl")

print(f"Loaded GAD data: {adata_gad.n_obs} cells")
print("\nFollow sections 4-5 and A-B above for complete analysis")

---
## D. IIH-only Analysis (CSF vs PBMC)

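Focused analysis of IIH patients, comparing the CSF and PBMC compartments. After loading, rerun sections 4-5 and A-B with `adata_iih` and `vdj_iih` in place of `adata` and `vdj`.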

In [None]:
# Load IIH-specific data
adata_iih = sc.read("seurat_object_combined_singlets_integrated_IIH_CSF_PBMC_Bcell_BCR_transfered_ADATA.h5ad")
vdj_iih = ddl.read_h5ddl("seurat_object_combined_singlets_integrated_IIH_CSF_PBMC_Bcell_BCR_transfered_VDJ.h5ddl")

print(f"Loaded IIH data: {adata_iih.n_obs} cells")
print("\nFollow sections 4-5 and A-B above for complete analysis")

---
## Session Information

In [None]:
sc.logging.print_versions()