# Lab 9B: CellRank 2 — PseudotimeKernel

**Module 10** — Pseudotime-Driven Fate Mapping

## Objectives
- Use any pseudotime method as input to CellRank's PseudotimeKernel
- Build a transition matrix from diffusion pseudotime
- Compare automatic vs manual terminal state definition
- Compare PseudotimeKernel results with CytoTRACEKernel (Lab 9A)

## Key Idea
The PseudotimeKernel wraps **any** pseudotime ordering (DPT, Monocle, Slingshot, etc.)
into CellRank's Markov chain framework. This lets you compute fate probabilities,
terminal states, and driver genes from pseudotime — using the same API as all other kernels.
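
To make "fate probabilities" concrete: in a Markov chain, a cell's fate probability toward a terminal state is the probability that a random walk starting at that cell ends up absorbed there. The toy below is a minimal NumPy sketch on a hand-made four-state chain. The matrix and state roles are invented for illustration, and CellRank's estimators perform this computation internally on the full cell-cell matrix.

```python
import numpy as np

# Hypothetical 4-state chain: states 0 and 1 are transient ("progenitors"),
# states 2 and 3 are absorbing ("terminal fates"). Each row sums to 1.
T = np.array([
    [0.5, 0.3, 0.2, 0.0],
    [0.2, 0.4, 0.1, 0.3],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
transient = [0, 1]
absorbing = [2, 3]

Q = T[np.ix_(transient, transient)]  # transient -> transient block
R = T[np.ix_(transient, absorbing)]  # transient -> absorbing block

# Absorption probabilities B solve the linear system (I - Q) B = R
fate_probs = np.linalg.solve(np.eye(len(transient)) - Q, R)

print(fate_probs)              # one row per transient state, one column per fate
print(fate_probs.sum(axis=1))  # each row sums to 1
```

Each row is the fate distribution of one transient state; state 0 leans toward the first fate and state 1 toward the second.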

## Reference
- Weiler & Theis (2026) *Nature Protocols* — `notebooks/pseudotime/` in [CellRank Protocol](https://github.com/theislab/cellrank_protocol)

---

## 1. Setup & Preprocessing

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scanpy as sc
import scvelo as scv
import cellrank as cr

sc.settings.set_figure_params(dpi=100, facecolor='white')
cr.settings.verbosity = 2

# Load bone marrow hematopoiesis data (same as Lab 9A for comparison)
adata = scv.datasets.bonemarrow()

# Preprocessing
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.pca(adata, n_comps=30)
sc.pp.neighbors(adata, n_neighbors=30)
sc.tl.umap(adata)

print(f"Cells: {adata.n_obs}, Genes: {adata.n_vars}")
sc.pl.umap(adata, color='clusters', title='Bone Marrow Cell Types')

## 2. Compute Pseudotime (Diffusion Pseudotime)

First, we compute DPT as our pseudotime. You could substitute Monocle 3
pseudotime, Slingshot pseudotime, or any other ordering.
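
If the pseudotime comes from another tool, it has to end up as one numeric value per cell in `adata.obs`. The sketch below shows one way to align an exported table by cell barcode. `add_external_pseudotime` and the column name `external_pseudotime` are hypothetical names used only for illustration, a small DataFrame stands in for `adata.obs`, and the min-max scaling is a convenience assumed here, not a documented requirement.

```python
import numpy as np
import pandas as pd

def add_external_pseudotime(obs, pseudotime, key="external_pseudotime"):
    """Hypothetical helper: align a per-cell pseudotime Series to an obs table.

    `obs` is a DataFrame indexed by cell barcode (e.g. `adata.obs`);
    `pseudotime` is a Series indexed by the same barcodes, in any order.
    """
    aligned = pseudotime.reindex(obs.index).astype(float)
    if not np.isfinite(aligned).all():
        raise ValueError("Pseudotime is missing or non-finite for some cells.")
    span = aligned.max() - aligned.min()
    if span == 0:
        raise ValueError("Pseudotime is constant; it carries no ordering.")
    # Min-max scale to [0, 1] so orderings from different tools are comparable
    obs[key] = (aligned - aligned.min()) / span
    return obs

# Toy stand-in for adata.obs and for a pseudotime exported from another tool
obs = pd.DataFrame(index=["cell_a", "cell_b", "cell_c"])
exported = pd.Series({"cell_c": 12.0, "cell_a": 2.0, "cell_b": 7.0})
obs = add_external_pseudotime(obs, exported)
print(obs)
```

With a real dataset you would pass `adata.obs` and then point the kernel at the new column through `time_key`.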

In [None]:
# Compute diffusion map
sc.tl.diffmap(adata, n_comps=15)

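# Set root cell: choose a cell in an HSC/progenitor cluster.
# Match by substring because cluster labels may carry suffixes (e.g. 'HSC_1');
# this simply takes the first matching cell, not the "most stem-like" one.
root_mask = adata.obs['clusters'].astype(str).str.contains('HSC|HSPC|Progenitor').to_numpy()
if root_mask.any():
    root_idx = int(np.flatnonzero(root_mask)[0])
else:
    root_idx = 0  # Fallback: arbitrary cell, so check the printed cell type below
    print("Warning: no progenitor cluster matched; falling back to cell 0.")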

adata.uns['iroot'] = root_idx
sc.tl.dpt(adata)

print(f"Root cell index: {root_idx}")
print(f"Root cell type: {adata.obs['clusters'].iloc[root_idx]}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sc.pl.umap(adata, color='dpt_pseudotime', ax=axes[0], show=False,
           title='Diffusion Pseudotime')
sc.pl.umap(adata, color='clusters', ax=axes[1], show=False,
           title='Cell Types')
plt.tight_layout()
plt.show()

## 3. PseudotimeKernel

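The PseudotimeKernel takes any pseudotime stored in `adata.obs` and uses it to direct the
k-nearest-neighbor graph: edges that point toward earlier pseudotime are removed or
down-weighted, so the resulting cell-cell transition matrix favors increasing pseudotime.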

**Key parameter: `time_key`** — the column in `adata.obs` containing pseudotime values.
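
The biasing step can be illustrated on a toy graph. The sketch below is a deliberately simplified hard threshold: it drops every edge that points to an earlier cell and then row-normalizes. The connectivities and pseudotime values are made up, and CellRank's own schemes (selected through `threshold_scheme`) are more refined; for example, they can retain or down-weight some backward edges instead of dropping them all.

```python
import numpy as np

# Toy symmetric kNN connectivities for 4 cells and a made-up pseudotime
connectivities = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
])
pseudotime = np.array([0.0, 0.3, 0.6, 1.0])

# forward[i, j] is True when cell j is not earlier than cell i
forward = pseudotime[None, :] >= pseudotime[:, None]
biased = np.where(forward, connectivities, 0.0)

# Cells with no forward neighbor get a self-loop so every row can be normalized
stuck = np.flatnonzero(biased.sum(axis=1) == 0)
biased[stuck, stuck] = 1.0

# Row-normalize to obtain transition probabilities
transition = biased / biased.sum(axis=1, keepdims=True)
print(transition)
```

Because the cells are listed in pseudotime order, the result is upper triangular: no probability mass flows back to earlier cells, and the latest cell only transitions to itself.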

In [None]:
from cellrank.kernels import PseudotimeKernel

# Create PseudotimeKernel using DPT
ptk = PseudotimeKernel(adata, time_key='dpt_pseudotime')
ptk.compute_transition_matrix()

print(f"PseudotimeKernel transition matrix: {ptk.transition_matrix.shape}")
print("\nThe transition matrix biases transitions toward increasing pseudotime.")

## 4. GPCCA: Terminal States & Fate Probabilities

In [None]:
from cellrank.estimators import GPCCA

g = GPCCA(ptk)

# Schur decomposition
g.compute_schur(n_components=20)
g.plot_spectrum(real_only=True)

# Macrostates
g.compute_macrostates(n_states=6, cluster_key='clusters')
g.plot_macrostates(which='all', basis='umap', legend_loc='right', s=30)

print("\nMacrostates:")
print(g.macrostates.cat.categories.tolist())

In [None]:
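# Automatic terminal state selection
# (in CellRank 2, predict_terminal_states() selects automatically;
# set_terminal_states() is reserved for manual selection)
g.predict_terminal_states()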
print("\nAutomatic terminal states:")
print(g.terminal_states.cat.categories.tolist())

g.plot_macrostates(which='terminal', basis='umap', legend_loc='right', s=30)

In [None]:
# Fate probabilities
g.compute_fate_probabilities()
# same_plot=False draws one panel per lineage, which is what ncols/figsize assume
g.plot_fate_probabilities(same_plot=False, basis='umap', ncols=3, figsize=(15, 5))

print(f"\nFate probabilities: {g.fate_probabilities.shape}")
print(f"Lineages: {g.fate_probabilities.names.tolist()}")

## 5. Manual Terminal State Selection

Sometimes, you know which cell types should be terminal based on biology.
CellRank allows **manual** terminal state specification.

In [None]:
# Manual terminal state selection example
# Recompute macrostates first
g2 = GPCCA(ptk)
g2.compute_schur(n_components=20)
g2.compute_macrostates(n_states=8, cluster_key='clusters')

# Print all macrostates to choose from
print("Available macrostates:")
for i, ms in enumerate(g2.macrostates.cat.categories):
    print(f"  {i}: {ms}")

# Set specific macrostates as terminal
# Adjust these names to match your macrostates!
# g2.set_terminal_states(states=['Erythroid', 'Monocytes', 'B cells'])
print("\nUncomment the line above and set your terminal states manually.")

## 6. Compare PseudotimeKernel vs CytoTRACEKernel

A key strength of CellRank 2 is the ability to compare results from different kernels.

In [None]:
from cellrank.kernels import CytoTRACEKernel

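# Run CytoTRACEKernel on the same data.
# The CytoTRACE score must be computed before the transition matrix; by default it
# reads imputed expression from adata.layers['Ms'], computed here with scVelo
# moments if earlier preprocessing has not already added it.
if 'Ms' not in adata.layers:
    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

ctk = CytoTRACEKernel(adata).compute_cytotrace()
ctk.compute_transition_matrix()

g_ct = GPCCA(ctk)
g_ct.compute_schur(n_components=20)
g_ct.compute_macrostates(n_states=6, cluster_key='clusters')
g_ct.predict_terminal_states()  # automatic selection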
g_ct.compute_fate_probabilities()

print("CytoTRACEKernel terminal states:")
print(g_ct.terminal_states.cat.categories.tolist())
print("\nPseudotimeKernel terminal states:")
print(g.terminal_states.cat.categories.tolist())

In [None]:
# Visual comparison of fate probabilities from both kernels
# Compare shared lineages
ct_lineages = set(g_ct.fate_probabilities.names.tolist())
pt_lineages = set(g.fate_probabilities.names.tolist())
shared = ct_lineages & pt_lineages

print(f"CytoTRACE lineages: {ct_lineages}")
print(f"Pseudotime lineages: {pt_lineages}")
print(f"Shared lineages: {shared}")

if shared:
    lineage = sorted(shared)[0]  # sorted so the chosen lineage is reproducible
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # CytoTRACEKernel fate
    adata.obs['fate_cytotrace'] = g_ct.fate_probabilities[lineage].X.flatten()
    sc.pl.umap(adata, color='fate_cytotrace', ax=axes[0], show=False,
               title=f'CytoTRACEKernel: {lineage} fate', vmin=0, vmax=1)
    
    # PseudotimeKernel fate
    adata.obs['fate_pseudotime'] = g.fate_probabilities[lineage].X.flatten()
    sc.pl.umap(adata, color='fate_pseudotime', ax=axes[1], show=False,
               title=f'PseudotimeKernel: {lineage} fate', vmin=0, vmax=1)
    
    plt.tight_layout()
    plt.show()
    
    # Correlation
    from scipy.stats import spearmanr
    corr, pval = spearmanr(
        adata.obs['fate_cytotrace'],
        adata.obs['fate_pseudotime']
    )
    print(f"\nSpearman correlation between kernel fate probabilities: {corr:.3f} (p={pval:.2e})")

## 7. Exercises

### Exercise 9B.1: Different Pseudotime Inputs
Instead of DPT, try:
1. CytoTRACE pseudotime (`ct_pseudotime` from Lab 9A) as input to PseudotimeKernel
2. A random ordering. What happens to the fate probabilities?

This demonstrates that PseudotimeKernel trusts whatever pseudotime you give it.
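
As a starting point for the random-ordering case, the sketch below reassigns existing pseudotime values to cells at random, which keeps the value distribution but destroys the ordering. `shuffled_pseudotime` and the column name `random_pseudotime` are hypothetical, and a toy Series stands in for `adata.obs['dpt_pseudotime']`.

```python
import numpy as np
import pandas as pd

def shuffled_pseudotime(pseudotime, seed=0):
    """Hypothetical helper: return the same values randomly reassigned to cells."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(pseudotime.to_numpy())
    return pd.Series(shuffled, index=pseudotime.index, name="random_pseudotime")

# Toy stand-in for adata.obs['dpt_pseudotime']
dpt = pd.Series(np.linspace(0, 1, 6), index=[f"cell_{i}" for i in range(6)])
random_pt = shuffled_pseudotime(dpt)
print(random_pt)
```

In the lab you would store the result in `adata.obs['random_pseudotime']` and build a second PseudotimeKernel with `time_key='random_pseudotime'`.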

### Exercise 9B.2: Root Cell Sensitivity
Change the root cell for DPT computation:
1. Choose a different cell in the same progenitor cluster
2. Choose a cell in a terminal cluster

How do PseudotimeKernel fate probabilities change?

### Exercise 9B.3: Kernel Consistency
For all terminal states identified by both kernels:
1. Compute Spearman correlation of fate probabilities
2. Plot scatter: CytoTRACE fate vs Pseudotime fate
3. Where are the biggest disagreements? Can you explain them biologically?
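
One possible shape for step 1 is a helper that takes two cells-by-lineages tables and reports Spearman correlation for every lineage they share. `lineage_spearman` is a hypothetical name, and the toy numbers are made up. It assumes you have already converted each kernel's fate probabilities into a pandas DataFrame with cells as rows and lineage names as columns.

```python
import pandas as pd

def lineage_spearman(fate_a, fate_b):
    """Hypothetical helper: Spearman correlation per lineage shared by two tables.

    Both inputs are DataFrames with cells as rows and lineages as columns.
    """
    shared = sorted(set(fate_a.columns) & set(fate_b.columns))
    # Spearman correlation equals the Pearson correlation of the ranks
    return pd.Series(
        {name: fate_a[name].rank().corr(fate_b[name].rank()) for name in shared},
        name="spearman",
    )

# Toy fate probabilities for four cells (made-up numbers)
cells = ["c1", "c2", "c3", "c4"]
fate_ct = pd.DataFrame({"Ery": [0.9, 0.6, 0.3, 0.1], "Mono": [0.1, 0.4, 0.7, 0.9]}, index=cells)
fate_pt = pd.DataFrame({"Ery": [0.8, 0.7, 0.2, 0.3], "DCs": [0.2, 0.3, 0.8, 0.7]}, index=cells)

result = lineage_spearman(fate_ct, fate_pt)
print(result)
```

Only `Ery` appears in both tables, so it is the only lineage reported; lineages found by a single kernel are worth listing separately, since they are themselves a disagreement.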

### Exercise 9B.4: Manual vs Automatic Terminal States
Compare results using:
1. Automatic terminal state selection (`g.predict_terminal_states()`)
2. Manual selection based on known biology (`g.set_terminal_states(states=[...])`)

When might manual selection be preferable?

---

## Key Takeaways

1. PseudotimeKernel wraps **any** pseudotime into CellRank's Markov chain framework
2. It's only as good as the input pseudotime — garbage in, garbage out
3. Different pseudotime methods can give different results through PseudotimeKernel
4. Comparing PseudotimeKernel vs CytoTRACEKernel shows how strongly terminal states and fate probabilities depend on where the directionality comes from (a root-based pseudotime vs a CytoTRACE score)
5. Manual terminal state selection is valuable when you have strong biological priors

**Next:** Lab 9C explores RealTimeKernel for time-course experiments.