# Lab 9A: CellRank 2 — CytoTRACEKernel

**Module 9** — Developmental Potential-Based Fate Mapping

## Objectives
- Understand CytoTRACE as a measure of developmental potential
- Build a CytoTRACEKernel transition matrix without RNA velocity
- Identify terminal states and compute fate probabilities using CytoTRACEKernel
- Compare results with VelocityKernel from Lab 8

## When to Use CytoTRACEKernel
- When RNA velocity data is **not available** (no spliced/unspliced counts)
- When velocity quality is poor (noisy, inconsistent arrows)
- As a **complementary** view to velocity-based analysis
- When studying development or differentiation, where the number of expressed genes per cell tends to decrease as cells mature

## Reference
- Weiler & Theis (2026) *Nature Protocols* — `notebooks/cytotrace/` in [CellRank Protocol](https://github.com/theislab/cellrank_protocol)
- CytoTRACE: Gulati et al. (2020) *Science* 367:405-411

---

## 1. Setup & Data Loading

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scanpy as sc
import cellrank as cr

sc.settings.set_figure_params(dpi=100, facecolor='white')
cr.settings.verbosity = 2

print(f"scanpy:   {sc.__version__}")
print(f"cellrank: {cr.__version__}")

In [None]:
# Load human bone marrow data
# This is a hematopoiesis dataset with multiple differentiation lineages
# The CellRank Protocol uses this as the primary CytoTRACEKernel example
import scvelo as scv
adata = scv.datasets.bonemarrow()
print(f"Cells: {adata.n_obs}, Genes: {adata.n_vars}")
print(f"Cell types: {adata.obs['clusters'].unique().tolist()}")

In [None]:
# Basic preprocessing
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.pca(adata, n_comps=30)
sc.pp.neighbors(adata, n_neighbors=30)
sc.tl.umap(adata)

sc.pl.umap(adata, color='clusters', title='Bone Marrow Cell Types')

## 2. Understanding CytoTRACE

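**CytoTRACE** (Gulati et al., 2020) estimates developmental potential from a cell's *gene count*, the number of genes it detectably expresses.
The key observation: **less differentiated (more stem-like) cells tend to express more genes**. CellRank's implementation refines this signal by averaging the expression of the genes most correlated with gene count.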
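The gene-count idea can be sketched in a few lines of NumPy. This is a simplified illustration, not CellRank's `compute_cytotrace` (which works on imputed expression such as `adata.layers['Ms']`) and not Gulati et al.'s full algorithm, which adds further smoothing. The function `cytotrace_like_score`, its `n_genes` default and the toy data are assumptions made for this example.

```python
import numpy as np


def cytotrace_like_score(X: np.ndarray, n_genes: int = 200) -> tuple[np.ndarray, np.ndarray]:
    """Simplified CytoTRACE-style score for a dense (cells x genes) matrix.

    Returns (score, pseudotime), both in [0, 1]. A high score means the cell
    resembles cells that express many genes (presumed stem-like);
    pseudotime is defined here as 1 - score.
    """
    X = np.asarray(X, dtype=float)

    # 1. Gene count: number of detectably expressed genes per cell.
    gene_counts = (X > 0).sum(axis=1).astype(float)

    # 2. Pearson correlation of every gene with the gene-count vector.
    Xc = X - X.mean(axis=0)
    gc = gene_counts - gene_counts.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (gc ** 2).sum())
    corr = np.divide(Xc.T @ gc, denom, out=np.zeros(X.shape[1]), where=denom > 0)

    # 3. Average expression of the n_genes genes most correlated with gene count.
    top = np.argsort(corr)[::-1][:n_genes]
    raw = X[:, top].mean(axis=1)

    # 4. Min-max scale to [0, 1]; pseudotime runs in the opposite direction.
    span = raw.max() - raw.min()
    score = (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)
    return score, 1.0 - score


# Toy data: 50 "stem-like" cells express ~80% of genes, 50 "mature" cells ~30%.
rng = np.random.default_rng(0)
n_cells, n_total_genes = 50, 300
stem = rng.poisson(2.0, (n_cells, n_total_genes)) * (rng.random((n_cells, n_total_genes)) < 0.8)
mature = rng.poisson(2.0, (n_cells, n_total_genes)) * (rng.random((n_cells, n_total_genes)) < 0.3)
score, pseudotime = cytotrace_like_score(np.vstack([stem, mature]), n_genes=50)
print(score[:n_cells].mean() > score[n_cells:].mean())  # stem-like block should score higher
```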

CytoTRACEKernel uses this score to infer directionality:
- High CytoTRACE score → more stem-like (early)
- Low CytoTRACE score → more differentiated (late)
- Transitions are biased from high → low CytoTRACE score, i.e. toward increasing `ct_pseudotime`
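The directional bias can be illustrated on a dense k-NN weight matrix. The sketch below keeps edges that point forward in pseudotime and shrinks backward edges by `exp(b * (t_j - t_i))`; this exponential penalty, the helper name `pseudotime_biased_transitions` and `b = 10` are assumptions for illustration. CellRank's `PseudotimeKernel` (which `CytoTRACEKernel` builds on) uses its own `'hard'` and `'soft'` threshold schemes, selected with `threshold_scheme`.

```python
import numpy as np


def pseudotime_biased_transitions(W: np.ndarray, t: np.ndarray, b: float = 10.0) -> np.ndarray:
    """Turn kNN weights into a transition matrix biased toward later pseudotime.

    W: (n, n) non-negative kNN connectivities; t: (n,) pseudotime values.
    Edge i -> j keeps its full weight if t[j] >= t[i]; otherwise it is
    multiplied by exp(b * (t[j] - t[i])), which shrinks the further j lies
    behind i. Rows are then normalised to sum to 1.
    """
    W = np.asarray(W, dtype=float)
    t = np.asarray(t, dtype=float)
    dt = t[None, :] - t[:, None]             # dt[i, j] = t[j] - t[i]
    T = W * np.exp(b * np.minimum(dt, 0.0))  # penalise only backward edges
    row_sums = T.sum(axis=1, keepdims=True)
    # Cells without neighbours keep an all-zero row instead of dividing by 0.
    return np.divide(T, row_sums, out=np.zeros_like(T), where=row_sums > 0)


# Toy chain of 5 cells ordered by pseudotime, each linked to its direct neighbours.
t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
T = pseudotime_biased_transitions(W, t)
print(T[2].round(3))  # middle cell: most of its mass moves forward to cell 3
```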

In [None]:
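import scvelo as scv
from cellrank.kernels import CytoTRACEKernel

# CytoTRACEKernel reads imputed expression from adata.layers['Ms'] by default.
# scVelo's moments() fills 'Ms'/'Mu' from the 'spliced'/'unspliced' layers, which
# this dataset ships with. For data without them, the CellRank CytoTRACE tutorial
# copies the expression matrix into both layers first:
#   adata.layers['spliced'] = adata.X
#   adata.layers['unspliced'] = adata.X
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

# Compute CytoTRACE scores, then bias the kNN graph toward later pseudotime
ctk = CytoTRACEKernel(adata).compute_cytotrace()
ctk.compute_transition_matrix(threshold_scheme='soft', nu=0.5)

print(f"CytoTRACEKernel transition matrix: {ctk.transition_matrix.shape}")
print("\nCytoTRACE score in adata.obs['ct_score']; pseudotime (0 = stem-like) in adata.obs['ct_pseudotime']")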

In [None]:
# Visualize CytoTRACE scores
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

sc.pl.umap(adata, color='ct_pseudotime', ax=axes[0], show=False,
           title='CytoTRACE Pseudotime\n(0=stem-like, 1=differentiated)')
sc.pl.umap(adata, color='clusters', ax=axes[1], show=False,
           title='Cell Types')

plt.tight_layout()
plt.show()

# QUESTION: Which clusters appear most stem-like? Most differentiated?
# Does this match your biological expectation?

## 3. GPCCA: Terminal States from CytoTRACEKernel
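Before reading the spectrum plot, it helps to see what an eigenvalue gap looks like. In a Markov chain made of k weakly connected groups of cells, about k eigenvalues sit close to 1 and the rest drop off sharply, so the position of the largest drop suggests k. The helper `eigengap_n_states` and the toy block matrix are hypothetical, and real spectra are rarely this clean, so treat the result as a starting point for `n_states`.

```python
from typing import Optional

import numpy as np


def eigengap_n_states(eigenvalues, max_states: Optional[int] = None) -> int:
    """Suggest a number of macrostates from the largest gap in a spectrum.

    eigenvalues: real parts of the leading eigenvalues (any order).
    Returns k such that the drop between the k-th and (k+1)-th largest
    eigenvalue is the largest one considered.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    if max_states is not None:
        ev = ev[: max_states + 1]
    gaps = ev[:-1] - ev[1:]  # gaps[k - 1] = ev[k - 1] - ev[k]
    return int(np.argmax(gaps)) + 1


# Toy Markov chain: 3 blocks of 4 cells, strongly connected inside each block,
# with a weak uniform coupling (eps) between all cells.
n, eps = 12, 0.01
P = np.zeros((n, n))
for start in range(0, n, 4):
    P[start:start + 4, start:start + 4] = 0.25
P = (1 - eps) * P + eps / n  # rows still sum to 1

eigs = np.linalg.eigvals(P).real  # spectrum: 1, 0.99, 0.99, then ~0
print(eigengap_n_states(eigs))  # 3 blocks -> suggests 3 macrostates
```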

In [None]:
from cellrank.estimators import GPCCA

# Create GPCCA estimator from CytoTRACEKernel
g = GPCCA(ctk)

# Compute Schur decomposition
g.compute_schur(n_components=20)
g.plot_spectrum(real_only=True)
print("Look for an eigenvalue gap to determine n_states")

In [None]:
# Compute macrostates — adjust n_states based on eigenvalue gap
g.compute_macrostates(n_states=6, cluster_key='clusters')
g.plot_macrostates(which='all', basis='umap', legend_loc='right', s=30)

print("\nMacrostates:")
print(g.macrostates.cat.categories.tolist())

In [None]:
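# Set terminal states. With no `states` argument, every macrostate above is
# declared terminal; pass states=[...] (macrostate names) to keep only endpoints.
g.set_terminal_states()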
g.plot_macrostates(which='terminal', basis='umap', legend_loc='right', s=30)

print("\nTerminal states identified:")
print(g.terminal_states.cat.categories.tolist())

## 4. Fate Probabilities from CytoTRACEKernel
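`compute_fate_probabilities` reports, for every cell, the probability that a random walk on the transition matrix reaches each terminal state first, i.e. absorption probabilities with the terminal states made absorbing. For a small dense matrix this is one linear solve, `(I - Q) X = R`, where `Q` holds transitions among non-terminal cells and `R` the one-step transitions into each terminal state. The sketch below uses a hypothetical helper `absorption_probabilities` and a toy chain; CellRank solves the same kind of system with iterative sparse solvers (e.g. GMRES) rather than a dense solve.

```python
import numpy as np


def absorption_probabilities(T: np.ndarray, terminal_sets: list[list[int]]) -> np.ndarray:
    """Probability that a walk from each cell is absorbed in each terminal set.

    T: (n, n) row-stochastic transition matrix.
    terminal_sets: one list of cell indices per terminal state (lineage).
    Returns an (n, L) matrix; terminal cells get a one-hot row.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    absorbing = np.zeros(n, dtype=bool)
    for idx in terminal_sets:
        absorbing[idx] = True
    transient = ~absorbing

    # Q: transient -> transient moves; R[:, k]: one-step mass into terminal set k.
    Q = T[np.ix_(transient, transient)]
    R = np.column_stack([T[np.ix_(transient, idx)].sum(axis=1) for idx in terminal_sets])

    # Absorption probabilities X solve (I - Q) X = R.
    X = np.linalg.solve(np.eye(Q.shape[0]) - Q, R)

    probs = np.zeros((n, len(terminal_sets)))
    probs[transient] = X
    for k, idx in enumerate(terminal_sets):
        probs[idx, k] = 1.0
    return probs


# Gambler's-ruin toy: cells 0 and 4 are terminal, cells 1-3 step left/right with p = 0.5.
T = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
fates = absorption_probabilities(T, [[0], [4]])
print(fates[1:4].round(3))  # cells 1-3: [0.75, 0.25], [0.5, 0.5], [0.25, 0.75]
```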

In [None]:
# Compute fate probabilities
g.compute_fate_probabilities()

# Visualize
g.plot_fate_probabilities(basis='umap', ncols=3, figsize=(15, 5))

print(f"Fate probabilities shape: {g.fate_probabilities.shape}")
print(f"Lineages: {g.fate_probabilities.names.tolist()}")

In [None]:
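# Fate probabilities by cluster (one violin panel per lineage).
# The estimator's plot_fate_probabilities draws embeddings; cluster-level
# violins come from cr.pl.aggregate_fate_probabilities, which reads the
# probabilities that compute_fate_probabilities() stored in adata.
cr.pl.aggregate_fate_probabilities(
    adata,
    mode='violin',
    cluster_key='clusters',
    figsize=(14, 4),
)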

## 5. Initial States (Optional)

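CellRank can also identify **initial states** (the stem-like source populations) from the
macrostates computed in Section 3. Initial states are the macrostates that probability mass
flows *away* from, so they receive little weight in the coarse-grained stationary distribution.

In [None]:
# Predict initial states from the existing macrostates (no need to recompute them).
# predict_initial_states selects the macrostate(s) with the lowest coarse-grained
# stationary probability. allow_overlap=True because set_terminal_states() above
# may already have declared every macrostate terminal.
g.predict_initial_states(n_states=1, allow_overlap=True)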

print("\nInitial states (stem-like populations):")
if g.initial_states is not None:
    print(g.initial_states.cat.categories.tolist())
    g.plot_macrostates(which='initial', basis='umap', legend_loc='right', s=30)

## 6. Exercises

### Exercise 9A.1: CytoTRACE Score Interpretation
Plot the CytoTRACE pseudotime distribution per cluster as a violin plot.
Which clusters have the highest (most stem-like) scores? Do these correspond
to known progenitor populations in hematopoiesis?
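As a starting point, the numbers behind such a violin plot (for example `sc.pl.violin(adata, 'ct_pseudotime', groupby='clusters', rotation=90)`) can also be tabulated with pandas. The helper `rank_clusters_by_pseudotime` and the toy table are hypothetical; in the notebook you would pass `adata.obs`. Because `ct_pseudotime` is 0 for stem-like cells, the clusters listed first are the most stem-like.

```python
import pandas as pd


def rank_clusters_by_pseudotime(
    obs: pd.DataFrame, cluster_key: str = "clusters", time_key: str = "ct_pseudotime"
) -> pd.Series:
    """Median pseudotime per cluster, lowest (most stem-like) first."""
    return obs.groupby(cluster_key, observed=True)[time_key].median().sort_values()


# Toy table with the same column names as adata.obs in this lab (values made up).
obs = pd.DataFrame({
    "clusters": ["progenitor"] * 3 + ["mature"] * 3,
    "ct_pseudotime": [0.10, 0.20, 0.15, 0.80, 0.90, 0.85],
})
print(rank_clusters_by_pseudotime(obs))
```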

### Exercise 9A.2: Compare with VelocityKernel
If your data has velocity information:
1. Run VelocityKernel (as in Lab 8)
2. Compare terminal states: do both kernels identify the same endpoints?
3. Compare fate probabilities: are they consistent?
4. Where do they disagree? What might explain the differences?
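For steps 2 and 3, a simple way to quantify agreement is to correlate each lineage's fate probabilities across kernels and count how often both kernels pick the same most likely fate. In the notebook the inputs would be each estimator's `g.fate_probabilities` and its `.names`. Matching lineages by name is an assumption: terminal macrostates from two kernels need not share names, so you may need to map them by hand first. The helper `compare_fate_probabilities` and the toy matrices are hypothetical.

```python
import numpy as np


def compare_fate_probabilities(fp_a, fp_b, names_a, names_b):
    """Compare two (cells x lineages) fate-probability matrices on shared lineages.

    Returns (per-lineage Pearson r, fraction of cells whose most likely
    shared lineage is the same under both kernels).
    """
    names_a, names_b = list(names_a), list(names_b)
    shared = [name for name in names_a if name in names_b]
    if not shared:
        raise ValueError("No lineage names in common; map lineages by hand first.")
    fa = np.asarray(fp_a, dtype=float)[:, [names_a.index(s) for s in shared]]
    fb = np.asarray(fp_b, dtype=float)[:, [names_b.index(s) for s in shared]]
    corr = {s: float(np.corrcoef(fa[:, k], fb[:, k])[0, 1]) for k, s in enumerate(shared)}
    agreement = float(np.mean(fa.argmax(axis=1) == fb.argmax(axis=1)))
    return corr, agreement


# Toy example: the same two lineages, listed in a different order by each kernel.
fp_vk = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
fp_ctk = np.array([[0.2, 0.8], [0.7, 0.3], [0.5, 0.5]])
corr, agreement = compare_fate_probabilities(fp_vk, fp_ctk, ["A", "B"], ["B", "A"])
print(corr, agreement)
```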

### Exercise 9A.3: Number of Macrostates
Try `n_states=3, 5, 8, 10`. How does the macrostate decomposition change?
At what point do macrostates start splitting biologically coherent populations?

### Exercise 9A.4: No-Velocity Scenario
Imagine you received data without spliced/unspliced layers.
CytoTRACEKernel would be your primary directionality estimate.
What additional information could you use to validate the trajectory direction?

---

## Key Takeaways

1. CytoTRACEKernel provides directionality **without RNA velocity**
2. It leverages the observation that less differentiated cells tend to express more genes than mature ones
3. It works best for **developmental/differentiation** trajectories
4. It may be less suitable for cyclic processes or dedifferentiation
5. Always compare with other kernels when possible

**Next:** Lab 9B explores PseudotimeKernel.