# Tumor Microenvironment — scRNA-seq Gene Expression

Interactive heatmap tutorial using simulated single-cell RNA-seq data from a tumor microenvironment.

**Dataset**: 20 marker genes × 40 cells across 5 cell types (T cell, B cell, NK cell, Macrophage, Tumor).
Expression values are scaled 0–1. Gene markers are designed to be highly expressed in their corresponding cell type, mimicking real immune/tumor signatures.
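The simulation is built so that each marker gene peaks in its own cell type. A quick sanity check is to ask, for each gene, which cell type has the highest mean expression. The sketch below uses only pandas. `top_cell_type_per_gene` is a hypothetical helper (not part of `dream-heatmap`), and the toy matrix stands in for the notebook's `matrix`.

```python
import pandas as pd

def top_cell_type_per_gene(matrix: pd.DataFrame, cell_type: pd.Series) -> pd.Series:
    """For each gene (row), return the cell type with the highest mean expression."""
    # Transpose so cells are rows, then average all cells of the same type.
    means = matrix.T.groupby(cell_type).mean()   # shape: cell types x genes
    return means.idxmax()                        # per gene: type with the largest mean

# Toy stand-in: 2 genes x 4 cells
toy = pd.DataFrame(
    [[0.9, 0.8, 0.1, 0.2],
     [0.1, 0.2, 0.7, 0.9]],
    index=["CD3D", "EPCAM"],
    columns=["T_01", "T_02", "Tum_01", "Tum_02"],
)
toy_types = pd.Series(["T cell", "T cell", "Tumor", "Tumor"], index=toy.columns)
print(top_cell_type_per_gene(toy, toy_types).to_dict())
```

On the notebook's data, `top_cell_type_per_gene(matrix, col_meta["cell_type"])` should map every marker to its intended cell type (including PDCD1 to T cells and CD274 to tumor cells), because the injected signal sits well above the 0 to 0.25 background.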

This notebook demonstrates every feature of `dream-heatmap`: colormaps, labels, annotations (categorical, bar, label, sparkline, box, violin), splits, sorting, clustering, dendrograms, zoom, selection callbacks, HTML export, and multi-panel concatenation.

In [None]:
import numpy as np
import pandas as pd
import dream_heatmap as dh

# --- Simulated tumor microenvironment scRNA-seq: 20 genes × 40 cells ---
rng = np.random.default_rng(42)

# Gene names (real cancer/immune markers)
genes = [
    "CD3D", "CD3E", "CD8A", "CD4", "FOXP3",     # T cell markers
    "CD19", "MS4A1", "CD79A",                     # B cell markers
    "NKG7", "GNLY", "KLRD1",                      # NK cell markers
    "CD68", "CD163", "CSF1R",                      # Macrophage markers
    "EPCAM", "KRT18", "MKI67", "TOP2A",          # Tumor markers
    "PDCD1", "CD274",                              # Checkpoint
]

# Cell IDs (40 cells, 5 types)
cell_ids = (
    [f"T_{i:02d}" for i in range(1, 11)]       # 10 T cells
    + [f"B_{i:02d}" for i in range(1, 7)]      # 6 B cells
    + [f"NK_{i:02d}" for i in range(1, 7)]     # 6 NK cells
    + [f"Mac_{i:02d}" for i in range(1, 9)]    # 8 macrophages
    + [f"Tum_{i:02d}" for i in range(1, 11)]   # 10 tumor cells
)

# Build expression matrix (scaled 0–1)
expr = rng.uniform(0.0, 0.25, (20, 40))

# Inject biological signal: marker genes high in their cell type
expr[0:5,   0:10]  = rng.uniform(0.7, 1.0, (5, 10))   # T cell markers → T cells
expr[5:8,  10:16]  = rng.uniform(0.7, 1.0, (3, 6))    # B cell markers → B cells
expr[8:11, 16:22]  = rng.uniform(0.7, 1.0, (3, 6))    # NK markers → NK cells
expr[11:14, 22:30] = rng.uniform(0.7, 1.0, (3, 8))    # Mac markers → Macrophages
expr[14:18, 30:40] = rng.uniform(0.7, 1.0, (4, 10))   # Tumor markers → Tumor cells

# Checkpoint: moderate expression
expr[18, 0:10]  = rng.uniform(0.3, 0.6, 10)   # PDCD1 moderate in T cells
expr[19, 30:40] = rng.uniform(0.4, 0.7, 10)   # CD274 (PD-L1) moderate in Tumor

# Add noise and clip to [0, 1]
expr += rng.normal(0, 0.03, expr.shape)
expr = np.clip(expr, 0.0, 1.0)

matrix = pd.DataFrame(expr, index=genes, columns=cell_ids)

# Column metadata
cell_types = (
    ["T cell"] * 10 + ["B cell"] * 6 + ["NK cell"] * 6
    + ["Macrophage"] * 8 + ["Tumor"] * 10
)
clusters = (
    [1]*5 + [2]*5 + [3]*3 + [4]*3 + [5]*3 + [6]*3
    + [7]*4 + [8]*4 + [9]*5 + [10]*5
)
col_meta = pd.DataFrame({
    "cell_type": cell_types,
    "cluster": clusters,
}, index=cell_ids)

# Row metadata
gene_groups = (
    ["T cell markers"] * 5 + ["B cell markers"] * 3
    + ["NK markers"] * 3 + ["Macrophage markers"] * 3
    + ["Tumor markers"] * 4 + ["Checkpoint"] * 2
)
row_meta = pd.DataFrame({
    "gene_group": gene_groups,
    "mean_expr": matrix.mean(axis=1).values,
}, index=genes)

# Color palettes
cell_type_colors = {
    "T cell": "#e41a1c", "B cell": "#377eb8", "NK cell": "#4daf4a",
    "Macrophage": "#984ea3", "Tumor": "#ff7f00",
}
gene_group_colors = {
    "T cell markers": "#e41a1c", "B cell markers": "#377eb8",
    "NK markers": "#4daf4a", "Macrophage markers": "#984ea3",
    "Tumor markers": "#ff7f00", "Checkpoint": "#a65628",
}

print(f"Matrix: {matrix.shape[0]} genes \u00d7 {matrix.shape[1]} cells")
print(f"Gene groups: {', '.join(row_meta['gene_group'].unique())}")
print(f"Cell types: {', '.join(col_meta['cell_type'].unique())}")
print(f"Expression range: [{matrix.values.min():.3f}, {matrix.values.max():.3f}]")

---
## 1. Basic Heatmap

Pass a DataFrame, call `.show()`. That's it.

- **Hover** over cells to see gene, cell, and expression value
- **Crosshair** lines track your cursor
- **Color bar** on the right shows the value scale
- **Toolbar** (top-right on hover): zoom, reset, download PNG, toggle crosshair
- **Drag** a rectangle to select cells

In [None]:
hm = dh.Heatmap(matrix)
hm.show()

---
## 2. Colormaps

Use any [matplotlib colormap](https://matplotlib.org/stable/gallery/color/colormap_reference.html). For 0–1 expression data, a sequential colormap like `YlOrRd` works well. Control the range with `vmin`/`vmax`.
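To see what `vmin`/`vmax` do, here is a minimal sketch of linear color normalization. It assumes `dream-heatmap` maps values linearly and saturates anything outside `[vmin, vmax]` at the colormap's end colors, much as matplotlib does by default; `normalize` is a hypothetical helper, not the library's code.

```python
import numpy as np

def normalize(values, vmin, vmax):
    """Linearly map values onto [0, 1], saturating anything outside [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    values = np.asarray(values, dtype=float)
    scaled = (values - vmin) / (vmax - vmin)
    return np.clip(scaled, 0.0, 1.0)

expr = [0.0, 0.25, 0.5, 1.0]
full = normalize(expr, vmin=0, vmax=1)      # unchanged: the data already spans 0-1
tight = normalize(expr, vmin=0, vmax=0.5)   # 0.5 and 1.0 both land on the top color
print(full, tight)
```

Lowering `vmax` stretches contrast among low values at the cost of saturating everything above it. In recent matplotlib versions, `matplotlib.colormaps["YlOrRd"](tight)` returns the RGBA colors for the normalized values.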

In [None]:
hm = dh.Heatmap(matrix)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)
hm.show()

---
## 3. Labels

Control row and column label visibility with `'all'` (show every label), `'auto'` (skip labels that would overlap), or `'none'` (hide all labels).

With 20 genes, `rows="all"` fits nicely. We hide cell IDs with `cols="none"` to reduce clutter.
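How `'auto'` decides which labels to skip isn't documented in this notebook. One plausible rule, sketched with a hypothetical helper `visible_label_indices`, is to keep every k-th label, where k is the smallest stride at which consecutive drawn labels no longer overlap. It assumes uniform cell sizes and fixed label heights.

```python
import math

def visible_label_indices(n_labels: int, cell_px: float, label_px: float) -> list:
    """Hypothetical 'auto' rule: keep every k-th label so drawn labels never overlap.

    k is the smallest stride with k * cell_px >= label_px.
    """
    if cell_px <= 0:
        raise ValueError("cell_px must be positive")
    stride = max(1, math.ceil(label_px / cell_px))
    return list(range(0, n_labels, stride))

# 20 genes in 15 px rows with 12 px labels: every label fits (stride 1).
print(len(visible_label_indices(20, cell_px=15, label_px=12)))
# 40 cells in 8 px columns with 12 px labels: every second label is kept (stride 2).
print(len(visible_label_indices(40, cell_px=8, label_px=12)))
```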

In [None]:
hm = dh.Heatmap(matrix)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)
hm.set_label_display(rows="all", cols="none")
hm.show()

---
## 4. Annotations: Categorical + Bar

Add up to 3 annotation tracks per edge (`left`, `right`, `top`, `bottom`).

- **CategoricalAnnotation** — colored blocks for groups (with legend)
- **BarChartAnnotation** — numeric bar chart per row/column

Hover over annotation cells to see category names or values.
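The annotation values in this notebook are pandas Series indexed by gene or cell ID. A reasonable guess (not confirmed here) is that `dream-heatmap` matches them to rows and columns by those labels. If you assemble annotation data yourself, aligning it explicitly guards against silent mismatches. A minimal sketch with a hypothetical helper `align_to_axis`:

```python
import pandas as pd

def align_to_axis(values: pd.Series, axis_ids) -> pd.Series:
    """Reorder an annotation Series to match the heatmap's row/column order.

    Raises KeyError for IDs with no annotation value instead of silently
    filling them with NaN.
    """
    missing = [i for i in axis_ids if i not in values.index]
    if missing:
        raise KeyError(f"No annotation value for: {missing}")
    return values.reindex(axis_ids)

# Gene groups supplied in a different order than the matrix rows
groups = pd.Series({"CD19": "B cell markers", "CD3D": "T cell markers"})
print(align_to_axis(groups, ["CD3D", "CD19"]).tolist())
```

With the notebook's objects, `align_to_axis(row_meta["gene_group"], matrix.index)` returns the groups in row order.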

In [None]:
hm = dh.Heatmap(matrix)
hm.set_row_metadata(row_meta)
hm.set_col_metadata(col_meta)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)

# Left: gene group (categorical)
hm.add_annotation("left", dh.CategoricalAnnotation(
    "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
))
# Right: mean expression (bar chart)
hm.add_annotation("right", dh.BarChartAnnotation(
    "Mean Expr", row_meta["mean_expr"], color="#ff7f00",
))
# Top: cell type (categorical)
hm.add_annotation("top", dh.CategoricalAnnotation(
    "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
))

hm.show()

---
## 4b. Dual Column Annotations + Color Bar Title

Two categorical annotations on the same edge, plus a titled color bar. The legend panel below shows the color bar and both legends side by side in a horizontal flow layout.
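The Cluster track is passed without `colors=`, so the library picks colors on its own; which default palette it uses isn't stated here. If you want stable, explicit cluster colors, a hypothetical helper like `assign_colors` can build the `colors=` dict. It uses matplotlib's `tab10` hex values and keeps categories in first-seen order, so the string labels `"1"`, `"2"`, …, `"10"` are not sorted lexically.

```python
from itertools import cycle

# The 10 hex colors of matplotlib's "tab10" qualitative palette.
TAB10 = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
         "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]

def assign_colors(categories, palette=TAB10):
    """Map each distinct category, in first-seen order, to a palette color (cycling if needed)."""
    unique = list(dict.fromkeys(categories))   # de-duplicate, preserving order
    return dict(zip(unique, cycle(palette)))

print(assign_colors(["1", "1", "2", "10"]))
```

For the 10 clusters here, `colors=assign_colors(col_meta["cluster"].astype(str))` gives each cluster a distinct `tab10` color.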

In [None]:
hm = dh.Heatmap(matrix)
hm.set_col_metadata(col_meta)
hm.set_colormap("YlOrRd", vmin=0, vmax=1, color_bar_title="Expression")
hm.add_annotation("top", dh.CategoricalAnnotation(
    "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
))
hm.add_annotation("top", dh.CategoricalAnnotation(
    "Cluster", col_meta["cluster"].astype(str),
))
hm.set_label_display(rows="all", cols="none")
hm.show()

---
## 5. Annotations: Label

**LabelAnnotation** renders text alongside the heatmap. When combined with `set_label_display(rows="none")`, the labels come from the annotation track instead of the built-in axis labels.

This is useful when you want labels on the left side next to a categorical track.

In [None]:
hm = dh.Heatmap(matrix)
hm.set_row_metadata(row_meta)
hm.set_col_metadata(col_meta)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)
hm.set_label_display(rows="none", cols="none")

# Left: categorical + label annotation (replaces built-in row labels)
hm.add_annotation("left", dh.CategoricalAnnotation(
    "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
))
hm.add_annotation("left", dh.LabelAnnotation("Gene"))

# Top: cell type
hm.add_annotation("top", dh.CategoricalAnnotation(
    "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
))

hm.show()

---
## 6. Annotations: Sparkline, BoxPlot, Violin

Mini-graph annotations visualize per-gene distributions and profiles:

- **SparklineAnnotation** — line chart showing mean expression across cell types
- **BoxPlotAnnotation** — five-number summary of expression across all cells
- **ViolinPlotAnnotation** — density shape showing expression distribution

All three are placed on the right edge, which uses up its limit of three tracks.
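The statistics behind a box plot are easy to compute directly, which helps when reading the BoxPlotAnnotation track. The sketch below computes a per-gene five-number summary with pandas (`five_number_summary` is a hypothetical helper). Note that many box plot implementations, including matplotlib's, draw whiskers to the most extreme points within 1.5 × IQR rather than to the min and max; which convention `dream-heatmap` uses isn't stated here.

```python
import pandas as pd

def five_number_summary(matrix: pd.DataFrame) -> pd.DataFrame:
    """Per-row min, Q1, median, Q3, max (linear interpolation between samples)."""
    stats = matrix.quantile([0.0, 0.25, 0.5, 0.75, 1.0], axis=1).T   # rows x 5
    stats.columns = ["min", "q1", "median", "q3", "max"]
    return stats

toy = pd.DataFrame([[1, 2, 3, 4, 5], [0.1, 0.1, 0.2, 0.9, 1.0]], index=["g1", "g2"])
print(five_number_summary(toy))
```

Applied to the notebook's `dist_data`, this yields one row per gene, summarizing its expression across all 40 cells.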

In [None]:
# Sparkline data: mean expression per gene per cell type (20 × 5)
cell_type_order = ["T cell", "B cell", "NK cell", "Macrophage", "Tumor"]
sparkline_data = pd.DataFrame(index=genes, columns=cell_type_order, dtype=float)
for ct in cell_type_order:
    ct_cells = col_meta[col_meta["cell_type"] == ct].index
    sparkline_data[ct] = matrix[ct_cells].mean(axis=1)

# Distribution data: full matrix (each row = distribution across all 40 cells)
dist_data = matrix.copy()

hm = dh.Heatmap(matrix)
hm.set_row_metadata(row_meta)
hm.set_col_metadata(col_meta)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)
hm.set_label_display(rows="all", cols="none")

# Left: gene group
hm.add_annotation("left", dh.CategoricalAnnotation(
    "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
))

# Right: sparkline + boxplot + violin (max 3 per edge)
hm.add_annotation("right", dh.SparklineAnnotation(
    "Cell Type Profile", sparkline_data, color="#1b9e77", track_width=60,
))
hm.add_annotation("right", dh.BoxPlotAnnotation(
    "Expression Dist.", dist_data, color="#7570b3", track_width=45,
))
hm.add_annotation("right", dh.ViolinPlotAnnotation(
    "Density", dist_data, color="#d95f02", track_width=45,
))

hm.show()

---
## 7. Row Density Annotation

A common pattern in genomics: annotate each gene with its expression density across cells.

This combines a categorical track and bar chart on the left, with a violin plot on the right.
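A violin's outline is a smoothed density estimate. Whether `dream-heatmap` uses a Gaussian kernel density estimate (KDE), and with which bandwidth, isn't stated here. The sketch below shows the common choice: a Gaussian KDE with Scott's-rule bandwidth, matching the default of `scipy.stats.gaussian_kde` in one dimension. `gaussian_kde_1d` is a hypothetical helper.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point of `grid`.

    Bandwidth follows Scott's rule in one dimension: h = std * n ** (-1/5).
    """
    x = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = x.size
    h = x.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point, then averaged.
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
cells = rng.uniform(0.0, 1.0, 40)           # one gene's expression across 40 cells
grid = np.linspace(-0.5, 1.5, 201)
density = gaussian_kde_1d(cells, grid)
print(grid[density.argmax()])               # location of the violin's widest point
```

A plain Gaussian KDE ignores the 0 to 1 bounds of the scaled data, so it assigns some density just outside that range; some plotting tools trim or reflect the estimate at the boundaries.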

In [None]:
hm = dh.Heatmap(matrix)
hm.set_row_metadata(row_meta)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)
hm.set_label_display(rows="all", cols="none")

# Left: gene group + mean expression bar
hm.add_annotation("left", dh.CategoricalAnnotation(
    "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
))
hm.add_annotation("left", dh.BarChartAnnotation(
    "Mean Expr", row_meta["mean_expr"], color="#ff7f00",
))

# Right: violin showing per-gene expression density across all cells
hm.add_annotation("right", dh.ViolinPlotAnnotation(
    "Expression Density", matrix, color="#d95f02", track_width=50,
))

hm.show()

---
## 8. Splits by Metadata

Group rows or columns with whitespace gaps. Split by a metadata column to visually separate gene groups and cell types.
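Splitting by a metadata column amounts to grouping axis IDs by that column's values. Whether `dream-heatmap` orders the groups by first appearance, alphabetically, or by categorical order isn't stated here. This sketch (hypothetical helper `split_groups`) uses first appearance and returns a `{group_name: [ids]}` dict.

```python
import pandas as pd

def split_groups(meta: pd.DataFrame, by: str) -> dict:
    """Map each value of meta[by] to its axis IDs, with groups in first-appearance order."""
    return {name: sub.index.tolist() for name, sub in meta.groupby(by, sort=False)}

meta = pd.DataFrame(
    {"gene_group": ["T cell markers", "B cell markers", "T cell markers"]},
    index=["CD3D", "CD19", "CD3E"],
)
print(split_groups(meta, "gene_group"))
```

With the notebook's data, `split_groups(row_meta, "gene_group")` gives six groups in the order they were defined.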

In [None]:
hm = dh.Heatmap(matrix)
hm.set_row_metadata(row_meta)
hm.set_col_metadata(col_meta)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)

hm.split_rows(by="gene_group")
hm.split_cols(by="cell_type")

hm.add_annotation("left", dh.CategoricalAnnotation(
    "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
))
hm.add_annotation("top", dh.CategoricalAnnotation(
    "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
))
hm.set_label_display(rows="all", cols="none")

hm.show()

---
## 8b. Explicit Splits

Instead of metadata, provide explicit `{group_name: [ids]}` assignments.

Here we split genes into Lymphoid (T, B, NK), Myeloid (macrophage), and Tumor & Checkpoint groups.

In [None]:
hm = dh.Heatmap(matrix)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)
hm.set_label_display(rows="all", cols="none")

hm.split_rows(assignments={
    "Lymphoid": genes[:11],            # T, B, NK markers
    "Myeloid": genes[11:14],           # Macrophage markers
    "Tumor & Checkpoint": genes[14:],  # EPCAM, KRT18, MKI67, TOP2A, PDCD1, CD274
})

hm.show()
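Explicit assignments are easy to get subtly wrong: an ID listed in two groups, a typo, or a gene left out. The notebook doesn't say how `dream-heatmap` handles such cases, so a pre-flight check can help. `check_partition` is a hypothetical helper that requires every axis ID to appear in exactly one group.

```python
def check_partition(assignments: dict, axis_ids) -> None:
    """Raise ValueError unless every axis ID is assigned to exactly one group."""
    axis_set = set(axis_ids)
    owner = {}
    for group, ids in assignments.items():
        for i in ids:
            if i not in axis_set:
                raise ValueError(f"{i!r} (group {group!r}) is not on this axis")
            if i in owner:
                raise ValueError(f"{i!r} is in both {owner[i]!r} and {group!r}")
            owner[i] = group
    unassigned = [i for i in axis_ids if i not in owner]
    if unassigned:
        raise ValueError(f"unassigned IDs: {unassigned}")

check_partition({"A": ["x"], "B": ["y", "z"]}, ["x", "y", "z"])   # passes silently
```

For the split above, pass the same dict and `genes` as `axis_ids`.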

---
## 9. Sorting by Metadata

Sort rows (or columns) within each split group by a metadata column.

Here, genes are sorted by mean expression (highest first) within each group, and cells by cluster ID.
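Sorting within split groups is a group-then-sort operation. The pandas sketch below reproduces the idea under the assumption that groups keep their first-appearance order. `order_within_groups` is a hypothetical helper, not the library's implementation.

```python
import pandas as pd

def order_within_groups(meta, group_col, sort_col, ascending=True):
    """Axis IDs with groups in first-appearance order, sorted by sort_col inside each group."""
    order = []
    for _, sub in meta.groupby(group_col, sort=False):
        # kind="stable" keeps tied values in their original relative order
        order.extend(sub.sort_values(sort_col, ascending=ascending, kind="stable").index)
    return order

meta = pd.DataFrame(
    {"gene_group": ["G1", "G1", "G2", "G2"], "mean_expr": [0.1, 0.9, 0.5, 0.7]},
    index=["a", "b", "c", "d"],
)
print(order_within_groups(meta, "gene_group", "mean_expr", ascending=False))
```

A single `sort_values(["gene_group", "mean_expr"])` would also work, but it would reorder the groups themselves alphabetically.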

In [None]:
hm = dh.Heatmap(matrix)
hm.set_row_metadata(row_meta)
hm.set_col_metadata(col_meta)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)

hm.split_rows(by="gene_group")
hm.split_cols(by="cell_type")
hm.order_rows(by="mean_expr", ascending=False)
hm.order_cols(by="cluster")

hm.add_annotation("left", dh.CategoricalAnnotation(
    "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
))
hm.add_annotation("top", dh.CategoricalAnnotation(
    "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
))
hm.add_annotation("right", dh.BarChartAnnotation(
    "Mean Expr", row_meta["mean_expr"], color="#ff7f00",
))
hm.set_label_display(rows="all", cols="none")

hm.show()

---
## 10. Clustering & Dendrograms

Hierarchical clustering reorders rows/columns so similar items sit adjacent. Dendrograms appear on the edges.

- **Click** a dendrogram branch to select its subtree
- Clustering runs *within* each split group independently
- Supports different linkage methods (`ward`, `average`, `complete`) and metrics (`euclidean`, `correlation`)
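The choice of metric changes what "similar" means. If `dream-heatmap` follows SciPy's conventions (an assumption), `metric="correlation"` is 1 minus the Pearson correlation, which compares the *shape* of two profiles and ignores their offset and scale, while Euclidean distance compares absolute values. SciPy's `linkage` documentation also notes that Ward linkage is only well defined on Euclidean distances, which is why the notebook pairs `ward` with `euclidean`. A small numpy sketch:

```python
import numpy as np

def correlation_distance(X):
    """Pairwise 1 - Pearson r between rows of X (0: same shape, 2: mirror image)."""
    return 1.0 - np.corrcoef(np.asarray(X, dtype=float))

def euclidean_distance(X):
    """Pairwise Euclidean distance between rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

profiles = np.array([
    [0.1, 0.2, 0.3],   # baseline profile
    [0.5, 0.6, 0.7],   # same shape, shifted up by 0.4
    [0.2, 0.4, 0.6],   # same shape, scaled by 2
])
print(correlation_distance(profiles).round(6))  # all ~0: identical shapes
print(euclidean_distance(profiles).round(3))    # nonzero: magnitudes differ
```

For cells (columns), correlation groups cells with similar relative marker patterns even when their overall expression levels differ.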

In [None]:
hm = dh.Heatmap(matrix)
hm.set_row_metadata(row_meta)
hm.set_col_metadata(col_meta)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)

# Cluster cols within cell type splits; cluster all rows globally
hm.split_cols(by="cell_type")
hm.cluster_rows(method="ward", metric="euclidean")
hm.cluster_cols(method="average", metric="correlation")

hm.add_annotation("left", dh.CategoricalAnnotation(
    "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
))
hm.add_annotation("top", dh.CategoricalAnnotation(
    "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
))
hm.add_annotation("right", dh.BarChartAnnotation(
    "Mean Expr", row_meta["mean_expr"], color="#ff7f00",
))
hm.set_label_display(rows="all", cols="none")

hm.show()

---
## 11. Zoom

1. **Drag** a rectangle to select a region
2. Press **`z`** to zoom into that region
3. **Double-click** to reset the view

You can also use the toolbar buttons (top-right on hover).
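Under the hood, zooming has to turn a dragged pixel rectangle into a range of rows and columns. The sketch below shows that mapping for one axis. `rect_to_index_range` is hypothetical and assumes uniform cell sizes, no split gaps, and pixel 0 at the first cell; the library's real mapping isn't documented here.

```python
import math

def rect_to_index_range(start_px, end_px, cell_px, n_cells):
    """Half-open [first, last) range of cells touched by a drag along one axis."""
    lo, hi = sorted((start_px, end_px))          # dragging backwards works too
    first = max(0, math.floor(lo / cell_px))
    last = min(n_cells, math.ceil(hi / cell_px))
    return first, last

# 10 px columns, drag from x=62 back to x=25 across 40 cells: cells 2..6
print(rect_to_index_range(62, 25, cell_px=10, n_cells=40))
```

The indices refer to the *displayed* order, which after clustering differs from the order of `matrix`.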

In [None]:
hm = dh.Heatmap(matrix)
hm.set_colormap("YlOrRd", vmin=0, vmax=1)
hm.cluster_rows()
hm.cluster_cols()
hm.set_label_display(rows="all", cols="all")
hm.show()

---
## 12. Selection & Callbacks

Register a Python function that fires whenever you drag a rectangle. The callback receives the selected row and column IDs.
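Printing is fine for exploration, but you may want to keep every selection for later analysis. The sketch below builds a callback with the `(row_ids, col_ids)` signature described above and stores a summary of each non-empty selection. `make_recorder` is a hypothetical helper; whether the library ever passes an empty selection isn't stated, so the guard is defensive.

```python
import pandas as pd

def make_recorder(matrix: pd.DataFrame):
    """Return (callback, history); the callback logs each non-empty selection."""
    history = []

    def on_selection(row_ids, col_ids):
        row_ids, col_ids = list(row_ids), list(col_ids)
        if not row_ids or not col_ids:
            return  # nothing selected; skip
        sub = matrix.loc[row_ids, col_ids]
        history.append({
            "rows": row_ids,
            "cols": col_ids,
            "mean": float(sub.values.mean()),
        })

    return on_selection, history

# Call the callback directly on a toy matrix to see what it records.
toy = pd.DataFrame([[0.9, 0.1], [0.8, 0.3]],
                   index=["CD3D", "CD3E"], columns=["T_01", "Tum_01"])
callback, history = make_recorder(toy)
callback(["CD3D", "CD3E"], ["T_01"])
callback([], ["T_01"])   # ignored
print(history)
```

In the notebook you would write `callback, history = make_recorder(matrix)` and `hm_sel.on_select(callback)`, then inspect `history[-1]` after dragging.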

In [None]:
hm_sel = dh.Heatmap(matrix)
hm_sel.set_row_metadata(row_meta)
hm_sel.set_col_metadata(col_meta)
hm_sel.set_colormap("YlOrRd", vmin=0, vmax=1)

def on_selection(row_ids, col_ids):
    sub = matrix.loc[row_ids, col_ids]
    cell_types_selected = col_meta.loc[col_ids, "cell_type"].unique()
    print(f"Selected {len(row_ids)} genes \u00d7 {len(col_ids)} cells")
    print(f"  Genes: {', '.join(row_ids)}")
    print(f"  Cell types: {', '.join(cell_types_selected)}")
    print(f"  Mean expression: {sub.values.mean():.3f}")

hm_sel.on_select(on_selection)
hm_sel.set_label_display(rows="all", cols="auto")
hm_sel.show()

In [None]:
# Drag a rectangle above, then run this cell to inspect the selection
hm_sel.selection

---
## 13. Builder Pattern

Every configuration method returns `self`, so the whole heatmap can be configured and shown in a single chained expression.
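Chaining works because each method modifies the object and hands back the same object. The toy class `FluentConfig` below illustrates the pattern. It borrows two method names only for readability and is not `dream-heatmap`'s code.

```python
class FluentConfig:
    """Toy builder: every setter stores an option and returns self for chaining."""

    def __init__(self):
        self.options = {}

    def set_colormap(self, name, vmin=None, vmax=None):
        self.options["colormap"] = (name, vmin, vmax)
        return self

    def set_label_display(self, rows="auto", cols="auto"):
        self.options["labels"] = (rows, cols)
        return self

chained = (
    FluentConfig()
    .set_colormap("YlOrRd", vmin=0, vmax=1)
    .set_label_display(rows="all", cols="none")
)

stepwise = FluentConfig()
stepwise.set_colormap("YlOrRd", vmin=0, vmax=1)
stepwise.set_label_display(rows="all", cols="none")

print(chained.options == stepwise.options)   # both styles build the same state
```

One consequence of in-place mutation: re-running a cell that calls `hm.add_annotation(...)` on an existing `hm` may add the track a second time or hit the three-per-edge limit. Each section in this notebook avoids that by starting from a fresh `dh.Heatmap(matrix)`.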

In [None]:
(
    dh.Heatmap(matrix)
    .set_row_metadata(row_meta)
    .set_col_metadata(col_meta)
    .set_colormap("YlOrRd", vmin=0, vmax=1)
    .split_rows(by="gene_group")
    .split_cols(by="cell_type")
    .order_rows(by="mean_expr", ascending=False)
    .add_annotation("left", dh.CategoricalAnnotation(
        "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
    ))
    .add_annotation("top", dh.CategoricalAnnotation(
        "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
    ))
    .add_annotation("right", dh.BarChartAnnotation(
        "Mean Expr", row_meta["mean_expr"], color="#ff7f00",
    ))
    .set_label_display(rows="all", cols="none")
    .show()
)

---
## 14. HTML Export

Export a fully self-contained HTML file — no Python required to view it. Share with collaborators who don't have Jupyter.
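Before sharing an export, you can spot-check that it really is self-contained. The sketch below is a rough regex heuristic, not a full HTML parser: `external_references` (a hypothetical helper) lists `src`/`href` values on `<script>`, `<link>`, and `<img>` tags that point outside the file. It does not inspect CSS `url(...)` or JavaScript network calls.

```python
import re

TAG_REF = re.compile(
    r"""<(?:script|link|img)\b[^>]*?\b(?:src|href)\s*=\s*["']([^"']*)["']""",
    re.IGNORECASE,
)

def external_references(html: str) -> list:
    """src/href values on <script>, <link>, <img> tags that are not inline data or anchors."""
    return [url for url in TAG_REF.findall(html)
            if not url.startswith(("data:", "#"))]

page = ('<script src="https://cdn.example.com/lib.js"></script>'
        '<script>var data = [1, 2, 3];</script>'
        '<img src="data:image/png;base64,AAAA">')
print(external_references(page))
```

For the export below, `external_references(open("tumor_microenvironment.html", encoding="utf-8").read())` should ideally return an empty list.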

In [None]:
hm_export = (
    dh.Heatmap(matrix)
    .set_row_metadata(row_meta)
    .set_col_metadata(col_meta)
    .set_colormap("YlOrRd", vmin=0, vmax=1)
    .split_rows(by="gene_group")
    .add_annotation("left", dh.CategoricalAnnotation(
        "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
    ))
    .add_annotation("top", dh.CategoricalAnnotation(
        "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
    ))
    .set_label_display(rows="all", cols="all")
)
hm_export.to_html("tumor_microenvironment.html", title="Tumor Microenvironment \u2014 Gene Expression")
print("Exported to tumor_microenvironment.html \u2014 open in any browser!")

---
## 15. Multi-Panel Concatenation

Combine multiple heatmaps that share an axis:
- **`vconcat`**: stack vertically (shared columns)
- **`hconcat`**: place side by side (shared rows)
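Concatenation only makes sense when the panels share the axis they are joined along: the same cells for `vconcat`, the same genes for `hconcat`. Whether `dream-heatmap` reorders mismatched labels or raises an error isn't stated here, so checking first is cheap. `check_shared_axis` is a hypothetical helper, and `pd.concat` shows what the combined matrix looks like.

```python
import pandas as pd

def check_shared_axis(frames, axis):
    """Raise ValueError unless every frame has identical labels, in the same order, on `axis`.

    axis="columns": requirement for stacking panels vertically (shared cells).
    axis="index":   requirement for placing panels side by side (shared genes).
    """
    reference = getattr(frames[0], axis)
    for k, frame in enumerate(frames[1:], start=1):
        if not getattr(frame, axis).equals(reference):
            raise ValueError(f"panel {k} does not share {axis} with panel 0")

top = pd.DataFrame([[0.9, 0.1]], index=["CD3D"], columns=["T_01", "Tum_01"])
bottom = pd.DataFrame([[0.2, 0.8]], index=["EPCAM"], columns=["T_01", "Tum_01"])
check_shared_axis([top, bottom], "columns")   # OK: same cells, same order
stacked = pd.concat([top, bottom], axis=0)    # 2 genes x 2 cells
print(stacked)
```

Both examples below pass this check: the vertical panels use all 40 cells in the same order, and the horizontal panels use all 20 genes.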

In [None]:
# Vertical concatenation: lymphoid (T, B, NK) genes + macrophage, tumor, and checkpoint genes (shared 40 cells)
immune_genes = genes[:11]
other_genes = genes[11:]

hm_immune = (
    dh.Heatmap(matrix.loc[immune_genes])
    .set_row_metadata(row_meta.loc[immune_genes])
    .set_col_metadata(col_meta)
    .set_colormap("YlOrRd", vmin=0, vmax=1)
    .add_annotation("left", dh.CategoricalAnnotation(
        "Gene Group", row_meta.loc[immune_genes, "gene_group"],
        colors=gene_group_colors,
    ))
    .set_label_display(rows="all", cols="none")
)

hm_other = (
    dh.Heatmap(matrix.loc[other_genes])
    .set_row_metadata(row_meta.loc[other_genes])
    .set_col_metadata(col_meta)
    .set_colormap("YlOrRd", vmin=0, vmax=1)
    .add_annotation("left", dh.CategoricalAnnotation(
        "Gene Group", row_meta.loc[other_genes, "gene_group"],
        colors=gene_group_colors,
    ))
    .add_annotation("top", dh.CategoricalAnnotation(
        "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
    ))
    .set_label_display(rows="all", cols="none")
)

dh.Heatmap.vconcat(hm_immune, hm_other).show()

In [None]:
# Horizontal concatenation: T and B cells + tumor cells (shared 20 genes)
lymph_cells = [c for c in cell_ids if c.startswith(("T_", "B_"))]
tumor_cells = [c for c in cell_ids if c.startswith("Tum_")]

hm_lymph = (
    dh.Heatmap(matrix[lymph_cells])
    .set_col_metadata(col_meta.loc[lymph_cells])
    .set_colormap("YlOrRd", vmin=0, vmax=1)
    .add_annotation("top", dh.CategoricalAnnotation(
        "Cell Type", col_meta.loc[lymph_cells, "cell_type"],
        colors=cell_type_colors,
    ))
    .set_label_display(rows="all", cols="none")
)

hm_tumor = (
    dh.Heatmap(matrix[tumor_cells])
    .set_col_metadata(col_meta.loc[tumor_cells])
    .set_colormap("YlOrRd", vmin=0, vmax=1)
    .add_annotation("top", dh.CategoricalAnnotation(
        "Cell Type", col_meta.loc[tumor_cells, "cell_type"],
        colors=cell_type_colors,
    ))
    .set_label_display(rows="none", cols="none")
)

dh.Heatmap.hconcat(hm_lymph, hm_tumor).show()

---
## 16. Kitchen Sink

Everything combined: metadata, splits, within-group clustering, dendrograms, categorical + bar annotations, labels.

In [None]:
(
    dh.Heatmap(matrix)
    .set_row_metadata(row_meta)
    .set_col_metadata(col_meta)
    .set_colormap("YlOrRd", vmin=0, vmax=1)
    .split_rows(by="gene_group")
    .split_cols(by="cell_type")
    .cluster_rows(method="ward", metric="euclidean")
    .cluster_cols(method="average", metric="correlation")
    .add_annotation("left", dh.CategoricalAnnotation(
        "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
    ))
    .add_annotation("top", dh.CategoricalAnnotation(
        "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
    ))
    .add_annotation("right", dh.BarChartAnnotation(
        "Mean Expr", row_meta["mean_expr"], color="#ff7f00",
    ))
    .set_label_display(rows="all", cols="auto")
    .show()
)

---
## 17. Sandbox

Uncomment any of the lines below to explore other configurations.

In [None]:
# Your turn! Modify and re-run this cell to explore.
(
    dh.Heatmap(matrix)
    .set_row_metadata(row_meta)
    .set_col_metadata(col_meta)
    .set_colormap("viridis", vmin=0, vmax=1)  # Try: "RdBu_r", "inferno", "coolwarm", "YlGnBu"
    # .split_rows(by="gene_group")
    # .split_cols(by="cell_type")
    # .cluster_rows(method="ward")
    # .cluster_cols(method="average", metric="correlation")
    # .order_rows(by="mean_expr", ascending=False)
    # .add_annotation("left", dh.CategoricalAnnotation(
    #     "Gene Group", row_meta["gene_group"], colors=gene_group_colors,
    # ))
    # .add_annotation("top", dh.CategoricalAnnotation(
    #     "Cell Type", col_meta["cell_type"], colors=cell_type_colors,
    # ))
    # .add_annotation("right", dh.BarChartAnnotation(
    #     "Mean Expr", row_meta["mean_expr"], color="#ff7f00",
    # ))
    .set_label_display(rows="all", cols="auto")
    .show()
)

---
## What's Next

This tutorial covered every feature of `dream-heatmap` with simulated data.

*Next tutorial: HPV-positive vs HPV-negative head and neck squamous cell carcinoma (HNSCC) — a real-world dataset comparing immune infiltration and tumor gene signatures between HPV+ and HPV− patients.*