# HER2+ Breast Cancer Tumor Microenvironment

This notebook explores **single-cell protein expression data** from HER2+ breast cancer
tumor biopsies (CyTOF / IMC-style), using `dream-heatmap` for interactive visualization.

**Goals:**
- Identify immune and stromal cell phenotypes via marker co-expression
- Visualize patient-level heterogeneity across the tumor microenvironment (TME)
- Demonstrate `dream-heatmap` features progressively, from basic to publication-ready

**Dataset:** 20 protein markers measured across 5,000 cells from 20 patients,
spanning 10 cell types (tumor, immune, stromal).

> **Prerequisite:** Run `python data/generate_tme_data.py` from the project root
> to generate the CSV files before running this notebook.

## 1. Load Data

In [1]:
import pandas as pd
import numpy as np

expression_df = pd.read_csv("../data/tme_expression_matrix.csv", index_col=0)
cell_meta = pd.read_csv("../data/tme_cell_metadata.csv").set_index("cell_id")
marker_meta = pd.read_csv("../data/tme_marker_metadata.csv").set_index("marker")

print(f"Expression matrix: {expression_df.shape[0]} markers x {expression_df.shape[1]} cells")
print(f"Cell metadata:     {len(cell_meta)} cells  | columns: {list(cell_meta.columns)}")
print(f"Marker metadata:   {len(marker_meta)} markers | columns: {list(marker_meta.columns)}")

Expression matrix: 20 markers x 5000 cells
Cell metadata:     5000 cells  | columns: ['cell_type', 'subtype', 'patient_id', 'tissue_region']
Marker metadata:   20 markers | columns: ['positivity_cutoff']


In [2]:
expression_df.iloc[:5, :8]

,cell_0001,cell_0002,cell_0003,cell_0004,cell_0005,cell_0006,cell_0007,cell_0008
HER2,0.882663,0.836618,0.92057,0.750127,0.891261,0.857191,0.997089,0.824638
CK,0.781775,0.726393,0.860447,0.689784,0.756072,0.93743,0.777205,0.81748
Ki67,0.885986,0.609858,0.746388,0.732482,0.693985,0.839709,0.81851,0.867042
EGFR,0.679779,0.620452,0.549805,0.691845,0.778664,0.65988,0.914794,0.773233
E-cadherin,0.71893,0.657387,0.757154,0.818054,0.622841,0.715004,0.657124,0.721406


In [3]:
cell_meta.head()

cell_id,cell_type,subtype,patient_id,tissue_region
cell_0001,HER2+ Tumor,Proliferating,P01,Tumor core
cell_0002,HER2+ Tumor,Proliferating,P01,Tumor core
cell_0003,HER2+ Tumor,Proliferating,P01,Tumor core
cell_0004,HER2+ Tumor,Proliferating,P01,Invasive margin
cell_0005,HER2+ Tumor,Proliferating,P01,Tumor core


In [4]:
marker_meta.head()

marker,positivity_cutoff
HER2,0.35
CK,0.35
Ki67,0.25
EGFR,0.3
E-cadherin,0.3


## 2. Unsupervised Clustering

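Before visualization, let's run KMeans clustering on the cells to see how well
unsupervised clusters recover the known cell types. We ask for 12 clusters, two
more than the 10 annotated cell types, so at least one cell type will be split
across clusters.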

In [5]:
from sklearn.cluster import KMeans

X = expression_df.values.T  # cells x markers
kmeans = KMeans(n_clusters=12, random_state=42, n_init=10)
cell_meta["cluster"] = [f"C{c}" for c in kmeans.fit_predict(X)]

print(f"Assigned {len(cell_meta)} cells to {cell_meta['cluster'].nunique()} clusters")

Assigned 5000 cells to 12 clusters


In [6]:
pd.crosstab(cell_meta["cell_type"], cell_meta["cluster"])

cell_type,C0,C1,C10,C11,C2,C3,C4,C5,C6,C7,C8,C9
B cell,0,0,0,0,0,0,0,0,0,300,0,0
CAF,0,0,0,0,0,500,0,0,0,0,0,0
CD4+ T cell,500,0,0,0,0,0,0,0,0,0,0,0
CD8+ T cell,0,0,0,0,0,0,0,600,0,0,0,0
Dendritic cell,0,0,0,0,200,0,0,0,0,0,0,0
Endothelial,0,0,0,0,0,0,0,0,150,0,0,0
HER2+ Tumor,0,602,300,0,0,0,0,0,0,0,0,598
Macrophage,0,0,0,0,0,0,750,0,0,0,0,0
NK cell,0,0,0,200,0,0,0,0,0,0,0,0
Treg,1,0,0,0,0,0,0,0,0,0,299,0
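
One way to put a number on how well the clusters recover the cell types is cluster purity: the share of each cluster taken up by its most common cell type. The sketch below re-enters the non-zero counts from the crosstab output above by hand so that it runs without the CSV files; in the notebook you could pass the `pd.crosstab(...)` result instead. The names `counts`, `purity_per_cluster`, `overall_purity`, and `clusters_per_type` are illustrative, not part of any library.

```python
import pandas as pd

# Non-zero counts copied by hand from the crosstab output above
# (outer keys: cluster, inner keys: cell type).
counts = pd.DataFrame({
    "C0": {"CD4+ T cell": 500, "Treg": 1},
    "C1": {"HER2+ Tumor": 602},
    "C2": {"Dendritic cell": 200},
    "C3": {"CAF": 500},
    "C4": {"Macrophage": 750},
    "C5": {"CD8+ T cell": 600},
    "C6": {"Endothelial": 150},
    "C7": {"B cell": 300},
    "C8": {"Treg": 299},
    "C9": {"HER2+ Tumor": 598},
    "C10": {"HER2+ Tumor": 300},
    "C11": {"NK cell": 200},
}).fillna(0).astype(int)

# Purity: share of each cluster taken up by its most common cell type.
purity_per_cluster = counts.max(axis=0) / counts.sum(axis=0)

# Overall purity: fraction of cells that belong to their cluster's majority type.
overall_purity = counts.max(axis=0).sum() / counts.to_numpy().sum()

# Fragmentation: number of clusters each cell type is spread across.
clusters_per_type = (counts > 0).sum(axis=1)

print(purity_per_cluster.round(4))
print(f"Overall purity: {overall_purity:.4f}")
print(clusters_per_type[clusters_per_type > 1])
```

Purity alone hides over-splitting: a cell type divided across several pure clusters still scores well, which is why the sketch also reports `clusters_per_type`.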


## 3. First Heatmap --- Raw Data

Let's start with the simplest possible heatmap: just the expression matrix, default settings.

In [7]:
import dream_heatmap as dh

hm = dh.Heatmap(expression_df)
hm.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x0000023012214F80>

This is 20 markers x 5,000 cells with the default `viridis` colormap.
Hard to see any structure --- let's add context step by step.

## 4. Colormap & Color Bar

For protein expression data, a sequential warm palette like `YlOrRd` works well:
low expression is pale yellow, high expression is dark red.

In [8]:
hm = dh.Heatmap(expression_df)
hm.set_colormap("YlOrRd", color_bar_title="Protein Expression")
hm.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x000002307FBC5010>

## 5. Column Annotations --- Cell Type & Cluster

Categorical annotations add colored strips alongside the heatmap. We'll annotate
cells (columns) with their known cell type and the KMeans cluster assignment.

In [9]:
# Start building a richer heatmap
hm = dh.Heatmap(expression_df)
hm.set_colormap("YlOrRd", color_bar_title="Protein Expression")
hm.set_col_metadata(cell_meta)
hm.set_row_metadata(marker_meta)

# Cell type annotation (top)
hm.add_annotation("top", dh.CategoricalAnnotation("Cell Type", cell_meta["cell_type"]))

# Cluster annotation (top)
hm.add_annotation("top", dh.CategoricalAnnotation("Cluster", cell_meta["cluster"]))

<dream_heatmap.api.Heatmap at 0x230772e13d0>

In [10]:
hm.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x00000230783308F0>

The colored strips at the top show cell type and cluster identity.
But the column order is still arbitrary --- patterns are hard to spot.

## 6. Row Annotations --- Mean Expression & Positivity Cutoffs

Now let's annotate the markers (rows):
- **Left:** bar chart of mean expression across all cells
- **Right:** text labels showing the positivity cutoff for each marker

In [11]:
# Mean expression per marker
mean_expr = expression_df.mean(axis=1)
hm.add_annotation("left", dh.BarChartAnnotation("Mean Expr", mean_expr))

<dream_heatmap.api.Heatmap at 0x230772e13d0>

In [12]:
# Positivity cutoff labels
cutoff_labels = marker_meta["positivity_cutoff"].map(lambda x: f">{x:.2f}")
hm.add_annotation("right", dh.LabelAnnotation("Cutoff", cutoff_labels))

hm.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x0000023011F76090>

The bar chart on the left shows that immune markers like CD45 and CD8 have moderate
mean expression, while stromal markers like SMA and Collagen are lower overall
(expressed in fewer cell types). The cutoff labels on the right indicate the
threshold for calling a cell "positive" for each marker.
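
To make the cutoffs concrete, here is a minimal sketch that turns them into a positive fraction per marker. It uses a small made-up matrix rather than the real data; `toy_expr` and `toy_cutoff` are stand-ins for `expression_df` and `marker_meta["positivity_cutoff"]`, and the CD45 cutoff is invented for illustration. The strict `>` comparison is chosen to match the `>0.35`-style labels above.

```python
import pandas as pd

# Toy stand-ins for expression_df (markers x cells) and the cutoff column.
# All expression values, and the CD45 cutoff, are made up for illustration.
toy_expr = pd.DataFrame(
    [[0.90, 0.80, 0.10, 0.40],
     [0.10, 0.20, 0.70, 0.60],
     [0.30, 0.10, 0.20, 0.20]],
    index=["HER2", "CD45", "Ki67"],
    columns=["cell_a", "cell_b", "cell_c", "cell_d"],
)
toy_cutoff = pd.Series({"HER2": 0.35, "CD45": 0.30, "Ki67": 0.25})

# Compare every row against its own cutoff (axis=0 aligns on the marker index).
is_positive = toy_expr.gt(toy_cutoff, axis=0)

# Fraction of cells called positive for each marker.
positive_fraction = is_positive.mean(axis=1)
print(positive_fraction)
```

With the real data, the same two lines applied to `expression_df` and `marker_meta["positivity_cutoff"]` should give one fraction per marker, which could also be shown as a bar chart annotation.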

## 7. Splitting by Cell Type

Visual splits insert whitespace gaps between groups. Splitting columns by
cell type makes co-expression patterns within each cell type immediately obvious.

In [13]:
hm.split_cols(by="cell_type")
hm.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x0000023012281DF0>

## 8. Hierarchical Clustering

Clustering rows (markers) groups co-expressed proteins together.
The dendrogram on the left shows the hierarchy --- try clicking a branch to select its subtree!

In [14]:
hm.cluster_rows(method="ward", metric="euclidean")

<dream_heatmap.api.Heatmap at 0x230772e13d0>

In [15]:
hm.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x000002301255B530>

Notice how clustering groups the immune markers (CD45, CD8, CD4, etc.) together
and the tumor/epithelial markers (HER2, CK, E-cadherin) together. The dendrogram
on the left visualizes this hierarchy.
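
If you want to see what Ward clustering on Euclidean distances does outside the widget, SciPy exposes the same method and metric names. The sketch below is an assumption about how to reproduce the idea, not a statement about how `dream-heatmap` computes its dendrogram internally. It uses a made-up 6 x 40 matrix in which three epithelial and three immune markers have opposite patterns.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage

rng = np.random.default_rng(0)

# Toy matrix: 6 markers x 40 cells. The first three markers are high in
# cells 0-19, the last three are high in cells 20-39 (made-up data).
markers = ["HER2", "CK", "E-cadherin", "CD45", "CD8", "CD4"]
pattern = np.zeros((6, 40))
pattern[:3, :20] = 0.8
pattern[3:, 20:] = 0.8
toy = pattern + rng.normal(scale=0.05, size=pattern.shape)

# Ward linkage on Euclidean distances between marker profiles (rows).
Z = linkage(toy, method="ward", metric="euclidean")

# Leaf order: the row order a dendrogram-sorted heatmap would display.
row_order = [markers[i] for i in leaves_list(Z)]

# Cutting the tree into two groups separates epithelial from immune markers.
groups = fcluster(Z, t=2, criterion="maxclust")
print(row_order)
print(dict(zip(markers, groups.tolist())))
```

Ward linkage merges the pair of clusters that increases within-cluster variance the least, so markers with similar profiles across cells end up adjacent in `row_order`.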

## 9. Ordering by Patient

Within each cell-type group, we can sort cells by patient ID to reveal
patient-level batch effects.

In [16]:
hm.order_cols(by="patient_id")
hm.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x000002301255BE00>

Look for subtle vertical stripes within each cell-type block --- those are
patient batch effects (systematic per-patient shifts in expression).
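
The stripes can also be checked numerically. A minimal sketch, using made-up values for a single cell type: subtract the cell-type mean from each patient's mean, and any consistently non-zero result is a per-patient offset. The column name `expr` and the numbers are hypothetical.

```python
import pandas as pd

# Toy data: one marker measured in six cells of a single cell type,
# two cells per patient (values are made up).
toy = pd.DataFrame({
    "cell_type": ["Macrophage"] * 6,
    "patient_id": ["P01", "P01", "P02", "P02", "P03", "P03"],
    "expr": [0.50, 0.54, 0.60, 0.64, 0.40, 0.44],
})

# Mean within each (cell type, patient) group, broadcast back to every cell.
patient_mean = toy.groupby(["cell_type", "patient_id"])["expr"].transform("mean")

# Mean within each cell type, broadcast back to every cell.
type_mean = toy.groupby("cell_type")["expr"].transform("mean")

# Per-patient offset relative to the cell-type average.
toy["shift"] = patient_mean - type_mean
patient_shift = toy.groupby(["cell_type", "patient_id"])["shift"].first()
print(patient_shift.round(3))
```

Computing the shift within each cell type matters: patients contribute different mixes of cell types, so a raw per-patient mean would confound composition with batch.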

## 10. Label Display Control

With 5,000 cells, showing column labels would be unreadable. Let's show all
marker names (rows) but hide cell IDs (columns).

In [17]:
hm.set_label_display(rows="all", cols="none")
hm.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x0000023011FB9D30>

Label display options: `'all'` (show every label), `'auto'` (show as many as fit
without overlap), `'none'` (hide all). Default is `'auto'` for both axes.

## 11. Advanced Row Annotations

`dream-heatmap` supports mini-graph annotations: sparklines, box plots, and
violin plots. Let's see each one. Since we already have annotations on `left`
and `right`, we'll create fresh heatmaps for each demo.

In [18]:
# Sparkline: mean expression per marker across cell types
# Each marker gets a mini line chart with one point per cell type
sparkline_data = expression_df.T.groupby(cell_meta["cell_type"]).mean().T

hm_spark = dh.Heatmap(expression_df)
hm_spark.set_colormap("YlOrRd", color_bar_title="Protein Expression")
hm_spark.add_annotation("right", dh.SparklineAnnotation("Cell-Type Profile", sparkline_data))
hm_spark.set_label_display(rows="all", cols="none")
hm_spark.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x0000023012283050>
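
The `groupby` one-liner that builds `sparkline_data` is dense, so here is the same expression on a tiny made-up matrix. `toy_expr` and `toy_types` are illustrative stand-ins for `expression_df` and `cell_meta["cell_type"]`.

```python
import pandas as pd

# Toy markers x cells matrix and a cell-type label per cell (made-up values).
toy_expr = pd.DataFrame(
    [[0.9, 0.7, 0.1, 0.3],
     [0.2, 0.0, 0.6, 0.8]],
    index=["HER2", "CD45"],
    columns=["c1", "c2", "c3", "c4"],
)
toy_types = pd.Series(
    ["Tumor", "Tumor", "Immune", "Immune"],
    index=toy_expr.columns,
    name="cell_type",
)

# Transpose to cells x markers, average within each cell type,
# then transpose back to markers x cell types.
profile = toy_expr.T.groupby(toy_types).mean().T
print(profile)
```

The result has one row per marker and one column per cell type, sorted alphabetically by `groupby`, so each sparkline point corresponds to one cell-type mean.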

In [19]:
# Box plot: expression distribution per marker across all cells
# The expression matrix (20 markers x 5000 cells) works directly
hm_box = dh.Heatmap(expression_df)
hm_box.set_colormap("YlOrRd", color_bar_title="Protein Expression")
hm_box.add_annotation("right", dh.BoxPlotAnnotation("Distribution", expression_df))
hm_box.set_label_display(rows="all", cols="none")
hm_box.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x0000023012244410>

In [20]:
# Violin plot: same data as box plot, rendered as density shape
hm_violin = dh.Heatmap(expression_df)
hm_violin.set_colormap("YlOrRd", color_bar_title="Protein Expression")
hm_violin.add_annotation("right", dh.ViolinPlotAnnotation("Density", expression_df))
hm_violin.set_label_display(rows="all", cols="none")
hm_violin.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x00000230124E8C20>

**When to use which:**
- **Sparkline**: Show a trend or profile across ordered categories (e.g., cell types)
- **Box plot**: Show distribution summary with quartiles --- good for spotting outliers
- **Violin**: Show full distribution shape --- reveals bimodality that box plots miss
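
The bimodality point is easy to check with numbers. The sketch below builds a made-up marker that is low in half the cells and high in the other half, then compares what a box plot summarizes (quartiles) with what a violin approximates (density along the value axis). It uses only NumPy and does not touch `dream-heatmap`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up bimodal marker: half the cells are low, half are high.
values = np.concatenate([
    rng.normal(loc=0.2, scale=0.03, size=500),
    rng.normal(loc=0.8, scale=0.03, size=500),
])

# What a box plot summarizes: the three quartiles.
q1, median, q3 = np.percentile(values, [25, 50, 75])

# What a violin plot approximates: how many cells sit at each expression level.
hist, edges = np.histogram(values, bins=10, range=(0.0, 1.0))

print(f"Q1={q1:.2f}  median={median:.2f}  Q3={q3:.2f}")
print("counts per bin:", hist.tolist())
```

The median lands near the middle of the range, where the histogram is nearly empty: the box would look like one broad population, while the density shape shows two separate ones.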

## 12. Selection & Callbacks

Draw a rectangle on any heatmap to select a block of markers and cells. The selected
row and column IDs are available programmatically. Try dragging over a cluster in
the heatmap below!

In [21]:
hm_sel = dh.Heatmap(expression_df)
hm_sel.set_colormap("YlOrRd", color_bar_title="Protein Expression")
hm_sel.set_col_metadata(cell_meta)
hm_sel.split_cols(by="cell_type")
hm_sel.cluster_rows()
hm_sel.set_label_display(rows="all", cols="none")

# Register a callback that fires when you drag-select a region
def on_select(row_ids, col_ids):
    print(f"Selected {len(row_ids)} markers x {len(col_ids)} cells")
    if col_ids:
        types = cell_meta.loc[col_ids, "cell_type"].value_counts()
        print(f"Cell types in selection:\n{types.to_string()}")

hm_sel.on_select(on_select)
hm_sel.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x00000230124EA6F0>

In [22]:
# Access the current selection programmatically (after making a selection above)
hm_sel.selection

{'row_ids': [], 'col_ids': []}

The selection rectangle snaps to cell boundaries, which is how `dream-heatmap`
addresses the **ruler problem**: you always know exactly which rows and columns
are selected, without reading positions off the axes by eye.
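
To illustrate the idea of snapping, here is a deliberately simplified sketch that maps a dragged pixel span onto whole columns. It is not `dream-heatmap`'s implementation: the function `snap_to_cells`, the fixed cell width, and the absence of split gaps are all assumptions made for the example.

```python
import math

def snap_to_cells(start_px, end_px, cell_size_px, n_cells):
    """Return the inclusive (first, last) cell indices touched by a pixel span."""
    lo, hi = sorted((start_px, end_px))  # allow dragging in either direction
    first = max(0, math.floor(lo / cell_size_px))
    # A span that ends exactly on a boundary does not include the next cell.
    last = min(n_cells - 1, math.ceil(hi / cell_size_px) - 1)
    return first, max(first, last)

# Hypothetical layout: ten columns, each 8 pixels wide.
col_ids = [f"cell_{i:04d}" for i in range(1, 11)]
first, last = snap_to_cells(13.0, 37.5, cell_size_px=8, n_cells=len(col_ids))
selected = col_ids[first:last + 1]
print(selected)
```

A drag from pixel 13 to pixel 37.5 starts inside the second column and ends inside the fifth, so the selection expands to cover those four columns in full.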

## 13. Zoom

After selecting a region, use the **zoom** toolbar button (or press `z`) to zoom
into the selection. Double-click the heatmap to reset the view.

The toolbar provides:
- **Zoom to selection**: Focus on a region of interest
- **Reset view**: Return to the full heatmap
- **Download**: Save the current view as an image
- **Toggle crosshair**: Show/hide crosshair cursor for precise inspection

## 14. Alternative Split --- By Cluster

Instead of splitting by known cell type, we can split by the KMeans clusters
to see if the unsupervised groupings capture meaningful biology.

In [23]:
hm_clust = dh.Heatmap(expression_df)
hm_clust.set_colormap("YlOrRd", color_bar_title="Protein Expression")
hm_clust.set_col_metadata(cell_meta)
hm_clust.add_annotation("top", dh.CategoricalAnnotation("Cell Type", cell_meta["cell_type"]))
hm_clust.split_cols(by="cluster")
hm_clust.cluster_rows(method="ward")
hm_clust.set_label_display(rows="all", cols="none")
hm_clust.show()

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x0000023012281790>

Compare this to the cell-type split in Section 7. The Cell Type color strip at
the top reveals how well each KMeans cluster maps to a single cell type.
In the crosstab from Section 2, every cluster is dominated by one cell type: the only
mixing is a single Treg assigned to the CD4+ T cell cluster (C0), while HER2+ tumor
cells are split across three clusters (C1, C9, and C10).

## 15. HTML Export

Export the heatmap as a standalone HTML file that anyone can open in a browser --- no
Python required.

In [24]:
hm.to_html("../tumor_microenvironment.html", title="HER2+ TME Heatmap")
print("Exported to tumor_microenvironment.html")

Exported to tumor_microenvironment.html


## 16. Multi-Panel --- Immune vs. Stromal Compartments

To compare compartments side-by-side, create separate heatmaps for immune
and stromal cells, then concatenate them horizontally (shared marker axis).

In [25]:
immune_types = ["CD8+ T cell", "CD4+ T cell", "Treg", "B cell",
                "NK cell", "Macrophage", "Dendritic cell"]
stromal_types = ["CAF", "Endothelial"]

immune_cells = cell_meta[cell_meta["cell_type"].isin(immune_types)].index
stromal_cells = cell_meta[cell_meta["cell_type"].isin(stromal_types)].index

print(f"Immune cells: {len(immune_cells)}")
print(f"Stromal cells: {len(stromal_cells)}")

Immune cells: 2850
Stromal cells: 650


In [26]:
immune_meta = cell_meta.loc[immune_cells]
stromal_meta = cell_meta.loc[stromal_cells]

hm_immune = dh.Heatmap(expression_df[immune_cells])
hm_immune.set_colormap("YlOrRd", color_bar_title="Expression")
hm_immune.set_col_metadata(immune_meta)
hm_immune.add_annotation("top", dh.CategoricalAnnotation("Cell Type", immune_meta["cell_type"]))
hm_immune.split_cols(by="cell_type")
hm_immune.cluster_rows(method="ward")
hm_immune.set_label_display(rows="all", cols="none")

hm_stromal = dh.Heatmap(expression_df[stromal_cells])
hm_stromal.set_colormap("YlOrRd", color_bar_title="Expression")
hm_stromal.set_col_metadata(stromal_meta)
hm_stromal.add_annotation("top", dh.CategoricalAnnotation("Cell Type", stromal_meta["cell_type"]))
hm_stromal.split_cols(by="cell_type")
hm_stromal.cluster_rows(method="ward")
hm_stromal.set_label_display(rows="all", cols="none")

<dream_heatmap.api.Heatmap at 0x230111d8350>

In [27]:
from IPython.display import display

panel = dh.Heatmap.hconcat(hm_immune, hm_stromal)
for widget in panel.show():
    display(widget)

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x0000023013787140>

<dream_heatmap.widget.heatmap_widget.HeatmapWidget object at 0x00000230124EA120>

`hconcat` places heatmaps side-by-side with a shared row axis (markers).
This makes it easy to compare which markers are differentially expressed
between compartments.
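
The visual comparison can be backed by a quick numeric one. A minimal sketch on made-up data: split the columns by compartment, confirm both panels share the same marker index, and rank markers by the difference in mean expression. `toy_expr` and `toy_compartment` are hypothetical stand-ins for the real matrix and a compartment label derived from `cell_type`.

```python
import pandas as pd

# Toy markers x cells matrix with two immune and two stromal cells (made-up values).
toy_expr = pd.DataFrame(
    [[0.8, 0.6, 0.1, 0.1],
     [0.1, 0.1, 0.7, 0.9]],
    index=["CD45", "SMA"],
    columns=["i1", "i2", "s1", "s2"],
)
toy_compartment = pd.Series(
    ["immune", "immune", "stromal", "stromal"], index=toy_expr.columns
)

# Column subsets, analogous to expression_df[immune_cells] and [stromal_cells].
immune = toy_expr.loc[:, toy_compartment == "immune"]
stromal = toy_expr.loc[:, toy_compartment == "stromal"]

# Both panels keep every marker, so their row indices match exactly.
same_rows = immune.index.equals(stromal.index)

# Positive values: higher in immune cells; negative: higher in stromal cells.
diff = (immune.mean(axis=1) - stromal.mean(axis=1)).sort_values()
print(same_rows)
print(diff)
```

A difference in means is only a first look; it ignores variance and group size, so treat large values as candidates to inspect in the heatmap rather than as test results.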

## 17. Polished Final Figure

Putting it all together: colormap, metadata, split, clustering, annotations,
and label control in one builder chain.

In [28]:
# Prepare annotation data
mean_expr = expression_df.mean(axis=1)
sparkline_data = expression_df.T.groupby(cell_meta["cell_type"]).mean().T
cutoff_labels = marker_meta["positivity_cutoff"].map(lambda x: f">{x:.2f}")

# Build the polished heatmap
hm_final = dh.Heatmap(expression_df)

# Color
hm_final.set_colormap("YlOrRd", color_bar_title="Protein Expression")

# Metadata
hm_final.set_col_metadata(cell_meta)
hm_final.set_row_metadata(marker_meta)

# Column annotations (top): cell type + cluster
hm_final.add_annotation("top", dh.CategoricalAnnotation("Cell Type", cell_meta["cell_type"]))
hm_final.add_annotation("top", dh.CategoricalAnnotation("Cluster", cell_meta["cluster"]))

# Row annotations: bar chart (left), sparkline (left), cutoff labels (right)
hm_final.add_annotation("left", dh.BarChartAnnotation("Mean Expr", mean_expr))
hm_final.add_annotation("left", dh.SparklineAnnotation("Cell-Type Profile", sparkline_data))
hm_final.add_annotation("right", dh.LabelAnnotation("Cutoff", cutoff_labels))

# Structure
hm_final.split_cols(by="cell_type")
hm_final.cluster_rows(method="ward", metric="euclidean")
hm_final.order_cols(by="patient_id")

# Labels
hm_final.set_label_display(rows="all", cols="none")

hm_final.show()

## Summary

This notebook demonstrated `dream-heatmap` features progressively:

| Feature | Section |
|---|---|
| Basic heatmap | 3 |
| Custom colormap | 4 |
| Categorical annotations | 5 |
| Bar chart & label annotations | 6 |
| Column splits | 7 |
| Hierarchical clustering + dendrograms | 8 |
| Metadata-based ordering | 9 |
| Label display control | 10 |
| Sparkline, box plot, violin annotations | 11 |
| Rectangle selection + callbacks | 12 |
| Zoom & toolbar | 13 |
| Alternative splits (by cluster) | 14 |
| HTML export | 15 |
| Multi-panel concatenation | 16 |
| Polished final figure | 17 |

All heatmaps are interactive --- hover for tooltips, drag to select, zoom to focus.
The **ruler problem** is solved: every selection gives you exact row and column IDs.