# Lab 9C: CellRank 2 — RealTimeKernel

**Module 11** — Experimental Time Points & Optimal Transport

## Objectives
- Understand when RealTimeKernel is appropriate (time-course scRNA-seq)
- Build a transition matrix from experimental time points using optimal transport
- Infer terminal states and fate probabilities from real time data
- Understand connections to metabolic labeling approaches

## When to Use RealTimeKernel
- Time-course experiments with **measured experimental time points** (e.g., day 0, 3, 7, 14)
- Reprogramming or perturbation time series
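- Metabolic labeling time courses (scSLAM-seq, sci-fate, scNT-seq), using the collection time points; labeling-derived velocities go through the VelocityKernel instead (see Section 5)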
- When cells from different time points are profiled separately

## Key Difference from Other Kernels
- VelocityKernel: direction from splicing dynamics (intrinsic)
- CytoTRACEKernel: direction from the number of genes expressed per cell, a proxy for developmental potential (intrinsic)
- PseudotimeKernel: direction from computational ordering (inferred)
- **RealTimeKernel: direction from actual experimental time (known)**

## Reference
- Weiler & Theis (2026) *Nature Protocols* — `notebooks/realtime/` in [CellRank Protocol](https://github.com/theislab/cellrank_protocol)
- Schiebinger et al. (2019) "Optimal-transport analysis" *Cell* 176:928-943

---

## 1. Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scanpy as sc
import cellrank as cr

sc.settings.set_figure_params(dpi=100, facecolor='white')
cr.settings.verbosity = 2

print(f"scanpy:   {sc.__version__}")
print(f"cellrank: {cr.__version__}")

## 2. Understanding Time-Resolved scRNA-seq

In a time-course experiment, cells are collected at discrete time points.
The challenge: cells at time T are **destroyed** during sequencing, so we
don't directly observe which cell at time T becomes which cell at time T+1.

**Optimal transport** addresses this by computing a probabilistic coupling between
cell populations at consecutive time points that minimizes the total "transport cost"
(distance in gene expression space).

```
Time 0     Time 1     Time 2     Time 3
  ○  ────→   ●  ────→   ◆  ────→   ★
  ○  ────→   ●  ────→   ◆  ────→   ★
  ○  ────→   ●  ────→   ◆  ────→   ★

  Optimal transport infers the arrows (transitions)
  between destructively sampled time points
```
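To make the idea of a coupling concrete, here is a minimal sketch of entropy-regularized optimal transport using Sinkhorn iterations in plain NumPy. The 2-D coordinates, the `sinkhorn` helper, and the `epsilon` value are illustrative assumptions, not CellRank or moscot internals; real solvers add unbalanced marginals, growth rates, and numerical safeguards.

```python
import numpy as np

def sinkhorn(cost, a, b, epsilon=0.1, n_iter=500):
    """Toy entropy-regularized OT: returns a coupling with marginals a and b."""
    K = np.exp(-cost / epsilon)            # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                    # rescale rows toward marginal a
        v = b / (K.T @ u)                  # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
# Hypothetical 2-D "expression" coordinates: two cell groups at each time point
early = np.vstack([rng.normal([0, 0], 0.1, (5, 2)), rng.normal([3, 0], 0.1, (5, 2))])
late = np.vstack([rng.normal([0, 1], 0.1, (5, 2)), rng.normal([3, 1], 0.1, (5, 2))])

# Squared Euclidean cost between every early and every late cell, mean-scaled
cost = ((early[:, None, :] - late[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.mean()

a = np.full(10, 1 / 10)                    # uniform mass on early cells
b = np.full(10, 1 / 10)                    # uniform mass on late cells
coupling = sinkhorn(cost, a, b)

# Row-normalizing the coupling gives per-cell transition probabilities
transition = coupling / coupling.sum(axis=1, keepdims=True)
print(np.round(coupling[:5, :5].sum(), 3))  # mass sent from early group 1 to late group 1
```

Because each early group sits close to exactly one late group, almost all of its mass is transported there, which is the behavior the arrows in the diagram depict.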

In [None]:
# Simulate a time-course dataset for demonstration
# In practice, you'd load your own time-course data

# We'll use the pancreas data and treat cluster progression as "time"
# This is a simplification for teaching purposes
import scvelo as scv
adata = scv.datasets.pancreas()

# Preprocessing
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.pca(adata, n_comps=30)
sc.pp.neighbors(adata, n_neighbors=30)
sc.tl.umap(adata)

# Create synthetic time labels based on differentiation stage
# Map cell types to approximate developmental time
time_mapping = {
    'Ductal': 0, 'Ngn3 low EP': 1, 'Ngn3 high EP': 2,
    'Pre-endocrine': 3, 'Alpha': 4, 'Beta': 4,
    'Delta': 4, 'Epsilon': 4
}

adata.obs['experimental_time'] = adata.obs['clusters'].map(time_mapping).astype(float)
# Handle any unmapped clusters
adata.obs['experimental_time'] = adata.obs['experimental_time'].fillna(2)

print(f"Time points: {sorted(adata.obs['experimental_time'].unique())}")
print("Cells per time point:")
print(adata.obs['experimental_time'].value_counts().sort_index())

sc.pl.umap(adata, color='experimental_time', title='Experimental Time Points')

## 3. RealTimeKernel

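The RealTimeKernel connects cells across time points using optimal-transport couplings. The couplings themselves come from an OT solver such as moscot or Waddington-OT (WOT); the kernel then assembles them into a single cell-by-cell transition matrix.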

In [None]:
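from moscot.problems.time import TemporalProblem
from cellrank.kernels import RealTimeKernel

# RealTimeKernel needs OT couplings between consecutive time points.
# Here they are computed with moscot (assumes `pip install moscot`).
# moscot expects a categorical time column, so store a categorical copy.
adata.obs['time_cat'] = adata.obs['experimental_time'].astype('category')

tp = TemporalProblem(adata)
tp = tp.prepare(time_key='time_cat', joint_attr='X_pca')
tp = tp.solve(epsilon=1e-3, scale_cost='mean')

# Build the kernel from the solved problem
rtk = RealTimeKernel.from_moscot(tp)
rtk.compute_transition_matrix(
    threshold='auto',  # Automatically determine sparsity threshold
)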

print(f"RealTimeKernel transition matrix: {rtk.transition_matrix.shape}")
print(f"Sparsity: {1 - rtk.transition_matrix.nnz / np.prod(rtk.transition_matrix.shape):.4f}")
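Conceptually, the kernel places the coupling for each pair of consecutive time points into an off-diagonal block of one large cell-by-cell matrix and row-normalizes the result. The following toy sketch illustrates that layout with random stand-in couplings and hypothetical group sizes; it is not CellRank's internal code, and it omits the within-time-point transitions that CellRank can add from the kNN graph.

```python
import numpy as np

def row_normalize(m):
    """Scale each row so that it sums to 1."""
    return m / m.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                          # hypothetical cell counts at three time points
offsets = np.cumsum([0] + sizes)           # start index of each time point's block
n = offsets[-1]

T = np.zeros((n, n))
for i in range(len(sizes) - 1):
    # Stand-in for the OT coupling between time point i and i + 1
    coupling = rng.random((sizes[i], sizes[i + 1]))
    rows = slice(offsets[i], offsets[i + 1])
    cols = slice(offsets[i + 1], offsets[i + 2])
    T[rows, cols] = coupling

# Cells at the last time point have no successors here, so give them self-loops
last = slice(offsets[-2], offsets[-1])
T[last, last] = np.eye(sizes[-1])

T = row_normalize(T)
print(np.round(T, 2))
```

Every row sums to 1, and probability mass only flows from a time point to the next one, which is what makes the kernel directed in real time.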

## 4. GPCCA: Terminal States & Fate Probabilities

In [None]:
from cellrank.estimators import GPCCA

g = GPCCA(rtk)

# Schur decomposition
g.compute_schur(n_components=15)
g.plot_spectrum(real_only=True)

# Macrostates
g.compute_macrostates(n_states=5, cluster_key='clusters')
g.plot_macrostates(which='all', basis='umap', legend_loc='right', s=30)

print("\nMacrostates:")
print(g.macrostates.cat.categories.tolist())

In [None]:
# Terminal states and fate probabilities
g.set_terminal_states()
g.compute_fate_probabilities()

print("\nTerminal states:")
print(g.terminal_states.cat.categories.tolist())

g.plot_fate_probabilities(basis='umap', ncols=3, figsize=(15, 5))
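Fate probabilities are absorption probabilities of the Markov chain: for each cell, the chance of reaching each terminal state before any other. As a rough illustration on a hand-written 5-state chain (the matrix and state labels are made up for this sketch), they can be obtained by solving a linear system instead of simulating random walks.

```python
import numpy as np

# Toy transition matrix: states 0-2 are transient, states 3 and 4 are terminal
P = np.array([
    [0.5, 0.3, 0.0, 0.2, 0.0],
    [0.0, 0.4, 0.4, 0.1, 0.1],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
transient = [0, 1, 2]
terminal = [3, 4]

Q = P[np.ix_(transient, transient)]        # transient -> transient transitions
R = P[np.ix_(transient, terminal)]         # transient -> terminal transitions

# Absorption probabilities solve (I - Q) X = R
fate = np.linalg.solve(np.eye(len(transient)) - Q, R)
print(np.round(fate, 3))
```

Each row of `fate` sums to 1. In this toy chain, state 2 can only reach terminal state 4, while state 0 is split evenly between the two terminal states.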

## 5. Metabolic Labeling: A Special Case

Metabolic labeling experiments (scSLAM-seq, sci-fate, scNT-seq) introduce
nucleotide analogs to distinguish newly synthesized ("labeled") from pre-existing ("unlabeled") RNA.

This gives you **real temporal information within a single time point**:
- Unlabeled RNA = past state
- Labeled RNA = current state

CellRank can use metabolic labeling as input via the VelocityKernel
(when processed through tools like `dynamo` or `scVelo` metabolic mode).

See the CellRank Protocol `notebooks/metabolic_labeling/` for a complete example.

```
Standard RNA velocity:     Metabolic labeling:
  unspliced → spliced      unlabeled → labeled
  (intrinsic splicing)     (experimental time)
```

In [None]:
# NOTE: Metabolic labeling analysis requires specialized preprocessing
# This is a conceptual overview — see the CellRank Protocol for full code

# The workflow would be:
# 1. Process labeled/unlabeled counts (e.g., with dynamo or kb-python)
# 2. Compute velocity from labeling dynamics
# 3. Use VelocityKernel (which accepts these velocity estimates)
# 4. Proceed with GPCCA as usual

print("Metabolic labeling analysis workflow:")
print("1. Experimental labeling (e.g., 4sU pulse)")
print("2. Sequencing + computational separation of labeled/unlabeled")
print("3. Velocity inference from labeling dynamics")
print("4. CellRank VelocityKernel on labeling-derived velocity")
print("5. Standard GPCCA analysis")
print("")
print("Key advantage: labeling-based velocity is often more reliable")
print("than splicing-based velocity (direct temporal measurement).")
print("")
print("Reference: CellRank Protocol notebooks/metabolic_labeling/inference.ipynb")
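To give a feel for how labeling carries kinetic information, here is a deliberately simplified sketch. It assumes a single pulse of known duration and a gene at steady state, in which case the labeled fraction after time t is 1 - exp(-gamma * t). This is a textbook simplification, not the estimation procedure used by dynamo, scVelo, or the CellRank Protocol, and the counts below are made up.

```python
import numpy as np

def degradation_rate(labeled, total, labeling_time):
    """Steady-state estimate of the degradation rate from the labeled fraction."""
    frac_new = labeled / total
    return -np.log(1.0 - frac_new) / labeling_time

# Hypothetical labeled and total counts for three genes after a 2-hour pulse
labeled = np.array([20.0, 50.0, 90.0])
total = np.array([100.0, 100.0, 100.0])

gamma = degradation_rate(labeled, total, labeling_time=2.0)   # per hour
half_life = np.log(2) / gamma                                 # hours
print(np.round(gamma, 3), np.round(half_life, 2))
```

A gene that is half labeled after the pulse has a half-life equal to the pulse length; higher labeled fractions indicate faster turnover.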

## 6. Exercises

### Exercise 9C.1: Time Point Granularity
What happens if you merge time points (e.g., combine time 3 and 4 into one)?
How does the resolution of time-point sampling affect RealTimeKernel results?

### Exercise 9C.2: Compare with Pseudotime-Based Approaches
For this dataset:
1. Run PseudotimeKernel (Lab 9B) using DPT
2. Compare fate probabilities with RealTimeKernel
3. Where do they agree/disagree?

### Exercise 9C.3: Advantages of Real Time
List 3 scenarios where RealTimeKernel would be more reliable than:
- VelocityKernel
- CytoTRACEKernel
- PseudotimeKernel

### Exercise 9C.4: Study Design
You're designing a time-course scRNA-seq experiment to study T cell activation.
1. How many time points would you choose? At what intervals?
2. How many cells per time point?
3. Would you add metabolic labeling? Why or why not?
4. Which CellRank kernel(s) would you plan to use?

---

## Key Takeaways

1. RealTimeKernel uses **known experimental time**, a directly measured directional signal rather than an inferred one
2. Optimal transport infers cell-cell mappings between destructively sampled time points
3. Metabolic labeling provides within-time-point temporal information
4. Real time data is not always available — that's when other kernels shine
5. All kernels share the same downstream API (GPCCA, fate probabilities, driver genes)

**Next:** Lab 9D combines multiple kernels for multiview fate mapping.