# SmallVessel Density Analysis — Spleen Multiplex Imaging

**Goal:** Compare SmallVessel density across tissue regions (Follicle, PALS, RedPulp, Trabeculae, LargeVessel), grouped by rs3184504 (SH2B3) genotype. Primary region of interest: **Follicle**.

**Date:** 2026-02-24  
**Data:** 9 OME-TIFF multiplex PhenoCycler spleen images, QuPath segmentation output

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from itertools import combinations
from pathlib import Path

# Plotting defaults
sns.set_theme(style="whitegrid", context="notebook", font_scale=1.1)
plt.rcParams["figure.dpi"] = 150
plt.rcParams["savefig.dpi"] = 300
plt.rcParams["savefig.bbox"] = "tight"

# Paths
PROJECT_ROOT = Path("..").resolve()
CSV_PATH = PROJECT_ROOT / "Measurements" / "AllAnnotations.csv"
GROUPS_PATH = PROJECT_ROOT / "Groups.xlsx"
OUT_DIR = Path(".").resolve()

# Constants
UM2_PER_MM2 = 1_000_000
REGION_ORDER = ["Follicle", "PALS", "RedPulp", "Trabeculae", "LargeVessel"]
GENOTYPE_ORDER = ["C/C", "C/T", "T/T"]

print(f"Project root: {PROJECT_ROOT}")
print(f"CSV: {CSV_PATH} (exists: {CSV_PATH.exists()})")
print(f"Groups: {GROUPS_PATH} (exists: {GROUPS_PATH.exists()})")

In [None]:
# Load CSV
raw = pd.read_csv(CSV_PATH)
print(f"Shape: {raw.shape}")
print(f"\nColumns: {raw.columns.tolist()}")
raw.head(3)

In [None]:
# Split into regions (parent annotations) and vessels
regions_df = raw[raw["Parent"] == "Root object (Image)"].copy()
vessels_df = raw[raw["Classification"] == "SmallVessel"].copy()

print(f"Region annotations: {len(regions_df)} (expect 45 = 9 images x 5 regions)")
print(f"SmallVessel annotations: {len(vessels_df):,}")
assert len(regions_df) == 45, f"Expected 45 regions, got {len(regions_df)}"

In [None]:
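# Extract Sample_ID (e.g. "HDL172") from the Image column; expand=False returns a Series
vessels_df["Sample_ID"] = vessels_df["Image"].str.extract(r"(HDL\d+)", expand=False)
regions_df["Sample_ID"] = regions_df["Image"].str.extract(r"(HDL\d+)", expand=False)

# dropna() so an unmatched image name (NaN) cannot break sorting
print("Unique samples:", sorted(vessels_df["Sample_ID"].dropna().unique()))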

In [None]:
# Vessels: Region comes from the Parent column, e.g. "Annotation (Follicle)"
# Regions: Region is the annotation's own Classification
vessels_df["Region"] = vessels_df["Parent"].str.extract(r"Annotation \((\w+)\)", expand=False)
regions_df["Region"] = regions_df["Classification"]

print("Vessel regions:", sorted(vessels_df["Region"].dropna().unique()))
print("Region annotations:", sorted(regions_df["Region"].unique()))
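Both extractions depend on the exact string formats of QuPath's `Image` and `Parent` columns. A quick check of the two patterns on made-up strings (the file names and labels below are hypothetical, chosen only to mimic the format the patterns expect) shows what `str.extract` returns: with a single capture group and `expand=False` the result is a Series, and non-matching strings become NaN.

```python
import pandas as pd

# Hypothetical strings mimicking the expected QuPath export format
images = pd.Series(["HDL101_spleen.ome.tiff", "HDL172_spleen.ome.tiff", "unlabelled.ome.tiff"])
parents = pd.Series(["Annotation (Follicle)", "Annotation (RedPulp)", "Root object (Image)"])

# Single capture group + expand=False -> Series; non-matches become NaN
sample_ids = images.str.extract(r"(HDL\d+)", expand=False)
regions = parents.str.extract(r"Annotation \((\w+)\)", expand=False)

print(sample_ids.tolist())  # ['HDL101', 'HDL172', nan]
print(regions.tolist())     # ['Follicle', 'RedPulp', nan]
```

Because `\w+` matches only letters, digits and underscores, a region label containing a space or hyphen would not match and would show up as NaN in `vessels_df["Region"]`.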

In [None]:
# Count vessels per Sample_ID x Region
vessel_counts = (
    vessels_df
    .groupby(["Sample_ID", "Region"])
    .size()
    .reset_index(name="Vessel_Count")
)

print("Vessel counts pivot:")
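# reindex keeps all five regions as columns; empty Sample x Region cells mean 0 vessels
vessel_counts.pivot(index="Sample_ID", columns="Region", values="Vessel_Count").reindex(columns=REGION_ORDER).fillna(0).astype(int)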

In [None]:
# Merge vessel counts with region areas, compute density
# Use region_areas as the base to ensure all 45 rows exist (some regions have 0 vessels)
region_areas = regions_df[["Sample_ID", "Region", "Area µm^2"]].rename(
    columns={"Area µm^2": "Region_Area_um2"}
)

density_df = region_areas.merge(vessel_counts, on=["Sample_ID", "Region"], how="left")
density_df["Vessel_Count"] = density_df["Vessel_Count"].fillna(0).astype(int)
density_df["Density_per_mm2"] = density_df["Vessel_Count"] / density_df["Region_Area_um2"] * UM2_PER_MM2

assert len(density_df) == 45, f"Expected 45 rows, got {len(density_df)}"
assert density_df["Density_per_mm2"].notna().all(), "NaN in density!"

print(f"Density table: {density_df.shape}")
print(f"Regions with 0 vessels: {(density_df['Vessel_Count'] == 0).sum()}")
density_df.head()
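The left merge keeps every region annotation, so a region without vessels gets a count of 0 instead of dropping out, and dividing by area in µm² then multiplying by 10⁶ gives vessels/mm². A toy version of the same steps, using two hypothetical regions with made-up counts and areas (not data from this study):

```python
import pandas as pd

UM2_PER_MM2 = 1_000_000

# Hypothetical region areas (µm²) for one sample
region_areas = pd.DataFrame({
    "Sample_ID": ["S1", "S1"],
    "Region": ["Follicle", "Trabeculae"],
    "Region_Area_um2": [2_000_000.0, 500_000.0],
})
# Hypothetical vessel counts: Trabeculae has no vessels, so it has no row here
vessel_counts = pd.DataFrame({
    "Sample_ID": ["S1"],
    "Region": ["Follicle"],
    "Vessel_Count": [12],
})

# Left merge on the region table keeps Trabeculae; its missing count becomes 0
density = region_areas.merge(vessel_counts, on=["Sample_ID", "Region"], how="left")
density["Vessel_Count"] = density["Vessel_Count"].fillna(0).astype(int)
density["Density_per_mm2"] = density["Vessel_Count"] / density["Region_Area_um2"] * UM2_PER_MM2

# Follicle: 12 vessels / 2 mm² = 6 vessels/mm²; Trabeculae: 0 vessels/mm²
print(density[["Region", "Vessel_Count", "Density_per_mm2"]])
```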

In [None]:
# Load genotype data from Groups.xlsx
groups_raw = pd.read_excel(GROUPS_PATH, sheet_name=0)
genotype_map = (
    groups_raw[["HANDEL ID", "rs3184504 (SH2B3)"]]
    .dropna(subset=["HANDEL ID"])
    .set_index("HANDEL ID")["rs3184504 (SH2B3)"]
    .to_dict()
)

print("Genotype mapping:")
for k, v in sorted(genotype_map.items()):
    print(f"  {k}: {v}")

# Map genotype onto density_df
density_df["Genotype"] = density_df["Sample_ID"].map(genotype_map)

# Report missing
missing = density_df[density_df["Genotype"].isna()]["Sample_ID"].unique()
if len(missing) > 0:
    print(f"\nSamples without genotype (excluded from genotype analyses): {missing.tolist()}")

print(f"\nGenotype counts (unique samples):")
print(density_df.drop_duplicates("Sample_ID")["Genotype"].value_counts())

## Descriptive Analysis — All 9 Samples

- **9 samples** total, **8 genotyped**: C/C (n=3), C/T (n=4), T/T (n=1)
- **HDL172** excluded from genotype comparisons (genotype not yet available)
- **5 tissue regions** per sample: Follicle, PALS, RedPulp, Trabeculae, LargeVessel

In [None]:
# Pivot: density by Sample x Region
density_pivot = density_df.pivot(
    index="Sample_ID", columns="Region", values="Density_per_mm2"
)[REGION_ORDER]

print("SmallVessel density (vessels/mm²) by sample and region:")
density_pivot.round(1)

In [None]:
# Descriptive stats per region
desc_stats = density_df.groupby("Region")["Density_per_mm2"].agg(
    ["count", "mean", "std", "median", "min", "max"]
).reindex(REGION_ORDER).round(1)

print("Descriptive statistics — SmallVessel density (vessels/mm²):")
desc_stats

In [None]:
# Fig 1: Boxplot + strip of density by region
fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(data=density_df, x="Region", y="Density_per_mm2",
            order=REGION_ORDER, color="lightblue", fliersize=0, ax=ax)
sns.stripplot(data=density_df, x="Region", y="Density_per_mm2",
              order=REGION_ORDER, color=".3", size=6, jitter=0.15, ax=ax)
ax.set_ylabel("SmallVessel Density (vessels/mm²)")
ax.set_xlabel("Tissue Region")
ax.set_title("SmallVessel Density by Tissue Region (n=9 samples)")
fig.savefig(OUT_DIR / "fig1_density_by_region.png")
plt.show()

In [None]:
# Fig 2: Follicle density — bar + individual labeled points
foll = density_df[density_df["Region"] == "Follicle"].sort_values("Sample_ID")

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(foll["Sample_ID"], foll["Density_per_mm2"], color="steelblue", alpha=0.7)
ax.scatter(foll["Sample_ID"], foll["Density_per_mm2"], color="black", zorder=5, s=40)
for _, row in foll.iterrows():
    ax.annotate(f"{row['Density_per_mm2']:.0f}",
                (row["Sample_ID"], row["Density_per_mm2"]),
                textcoords="offset points", xytext=(0, 8),
                ha="center", fontsize=8)
ax.axhline(foll["Density_per_mm2"].mean(), ls="--", color="red", alpha=0.6, label="Mean")
ax.set_ylabel("SmallVessel Density (vessels/mm²)")
ax.set_xlabel("Sample")
ax.set_title("Follicle SmallVessel Density per Sample")
ax.legend()
plt.xticks(rotation=45, ha="right")
fig.savefig(OUT_DIR / "fig2_follicle_density_per_sample.png")
plt.show()

In [None]:
# Filter to genotyped samples only
geno_df = density_df.dropna(subset=["Genotype"]).copy()
geno_df["Genotype"] = pd.Categorical(geno_df["Genotype"], categories=GENOTYPE_ORDER, ordered=True)

n_per_geno = geno_df.drop_duplicates("Sample_ID").groupby("Genotype", observed=True).size()
print("Genotyped samples per group:")
print(n_per_geno)
print(f"\nTotal genotyped rows: {len(geno_df)} (8 samples x 5 regions = 40)")

In [None]:
# Fig 3: Strip plot — density by region, colored by genotype
palette = {"C/C": "#4C72B0", "C/T": "#DD8452", "T/T": "#55A868"}

fig, ax = plt.subplots(figsize=(9, 5))
sns.stripplot(data=geno_df, x="Region", y="Density_per_mm2",
              hue="Genotype", order=REGION_ORDER, hue_order=GENOTYPE_ORDER,
              palette=palette, size=8, dodge=True, jitter=0.1, ax=ax)
ax.set_ylabel("SmallVessel Density (vessels/mm²)")
ax.set_xlabel("Tissue Region")
ax.set_title("SmallVessel Density by Region and Genotype (n=8 genotyped samples)")
ax.legend(title="rs3184504", bbox_to_anchor=(1.02, 1), loc="upper left")
fig.savefig(OUT_DIR / "fig3_density_by_region_genotype.png")
plt.show()

In [None]:
# Fig 4: Follicle density by genotype with labeled points
foll_geno = geno_df[geno_df["Region"] == "Follicle"].copy()

fig, ax = plt.subplots(figsize=(6, 5))
sns.stripplot(data=foll_geno, x="Genotype", y="Density_per_mm2",
              hue="Genotype", order=GENOTYPE_ORDER, hue_order=GENOTYPE_ORDER,
              palette=palette, size=10, jitter=0.05, legend=False, ax=ax)
for _, row in foll_geno.iterrows():
    ax.annotate(row["Sample_ID"],
                (GENOTYPE_ORDER.index(row["Genotype"]), row["Density_per_mm2"]),
                textcoords="offset points", xytext=(12, 0),
                fontsize=8, va="center")

# Group means
for i, g in enumerate(GENOTYPE_ORDER):
    vals = foll_geno[foll_geno["Genotype"] == g]["Density_per_mm2"]
    if len(vals) > 0:
        ax.plot([i - 0.2, i + 0.2], [vals.mean(), vals.mean()],
                color="black", lw=2, zorder=5)

ax.set_ylabel("SmallVessel Density (vessels/mm²)")
ax.set_xlabel("rs3184504 (SH2B3) Genotype")
ax.set_title("Follicle SmallVessel Density by Genotype")
fig.savefig(OUT_DIR / "fig4_follicle_density_by_genotype.png")
plt.show()

## Statistical Tests

**Approach:**
- **Region comparison (all 9 samples):** Kruskal-Wallis H test (non-parametric; n=9/group, normality unverifiable) followed by pairwise Mann-Whitney U with Bonferroni correction (10 pairs, α_adj = 0.005).
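- **Genotype comparison (Follicle only, 8 genotyped samples):** Descriptive statistics for all three genotypes, plus a Mann-Whitney U test for C/C (n=3) vs C/T (n=4), the only pair with n≥3 in both groups. With these group sizes the smallest attainable exact two-sided p-value is 2/35 ≈ 0.057, so this test cannot reach significance at α=0.05 however large the difference. T/T (n=1) is reported descriptively.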

**Limitations:** Small sample sizes (especially T/T n=1) limit statistical power. Results should be interpreted with caution and considered hypothesis-generating.
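With groups of 3 and 4, the exact Mann-Whitney test has only a handful of possible outcomes. Under the null hypothesis (and with no ties), every way of assigning 3 of the 7 pooled ranks to the first group is equally likely, so the null distribution of U can be enumerated directly. The sketch below uses only the standard library; `min_two_sided_p` is an illustrative helper, not part of SciPy.

```python
from itertools import combinations
from math import comb


def min_two_sided_p(n1: int, n2: int) -> float:
    """Smallest exact two-sided Mann-Whitney p-value for group sizes n1, n2 (no ties)."""
    n = n1 + n2
    # U for group 1 = (sum of its ranks) - n1*(n1+1)/2, with pooled ranks 1..n
    u_values = [sum(ranks) - n1 * (n1 + 1) // 2
                for ranks in combinations(range(1, n + 1), n1)]
    total = comb(n, n1)
    assert len(u_values) == total
    # The most extreme outcome (U = 0) arises from exactly one rank assignment;
    # doubling its probability gives the two-sided p-value
    p_lower = sum(1 for u in u_values if u <= 0) / total
    return min(1.0, 2 * p_lower)


for n1, n2 in [(3, 3), (3, 4), (4, 4), (5, 5)]:
    print(f"n1={n1}, n2={n2}: min two-sided p = {min_two_sided_p(n1, n2):.4f}")
# n1=3, n2=3: min two-sided p = 0.1000
# n1=3, n2=4: min two-sided p = 0.0571
# n1=4, n2=4: min two-sided p = 0.0286
# n1=5, n2=5: min two-sided p = 0.0079
```

At 3 vs 4 the floor is 2/35 ≈ 0.057 regardless of effect size; one extra sample per group (4 vs 4) would lower it to about 0.029.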

In [None]:
# Kruskal-Wallis across 5 regions (all 9 samples)
region_groups = [grp["Density_per_mm2"].values for _, grp in density_df.groupby("Region")]
kw_stat, kw_p = stats.kruskal(*region_groups)

print(f"Kruskal-Wallis H test across {len(REGION_ORDER)} regions (n=9 each):")
print(f"  H = {kw_stat:.2f}, p = {kw_p:.2e}")
print(f"  {'Significant' if kw_p < 0.05 else 'Not significant'} at α=0.05")
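Each sample contributes one density to every region, so the five region groups are paired (repeated measures) rather than independent as Kruskal-Wallis assumes. A paired non-parametric alternative is the Friedman test (`scipy.stats.friedmanchisquare`), which ranks regions within each sample so that between-sample differences in overall level cancel out. The sketch below runs it on a made-up table shaped like `density_pivot` (9 samples × 5 regions); all values are hypothetical, and with the real data one could pass the columns of `density_pivot` instead. Variable names are prefixed `toy_`/`fr_` so the notebook's `kw_stat` and `kw_p` are not overwritten.

```python
import numpy as np
import pandas as pd
from scipy import stats

REGION_ORDER = ["Follicle", "PALS", "RedPulp", "Trabeculae", "LargeVessel"]

# Hypothetical densities (vessels/mm²), illustrative only:
# a per-sample baseline times a per-region multiplier times small noise
rng = np.random.default_rng(0)
baseline = rng.lognormal(mean=5.0, sigma=0.3, size=(9, 1))
region_shift = np.array([1.5, 0.8, 1.2, 0.5, 0.9])
toy_wide = pd.DataFrame(baseline * region_shift * rng.lognormal(0.0, 0.1, size=(9, 5)),
                        index=[f"S{i}" for i in range(1, 10)], columns=REGION_ORDER)

# Friedman test: one array per region, aligned by sample (row order)
fr_stat, fr_p = stats.friedmanchisquare(*(toy_wide[r].to_numpy() for r in REGION_ORDER))
print(f"Friedman chi-square = {fr_stat:.2f}, p = {fr_p:.2e}")

# Kruskal-Wallis on the same values, which ignores the pairing
toy_kw_stat, toy_kw_p = stats.kruskal(*(toy_wide[r].to_numpy() for r in REGION_ORDER))
print(f"Kruskal-Wallis H = {toy_kw_stat:.2f}, p = {toy_kw_p:.2e}")
```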

In [None]:
# Pairwise Mann-Whitney U with Bonferroni correction
pairs = list(combinations(REGION_ORDER, 2))
n_pairs = len(pairs)
alpha_adj = 0.05 / n_pairs

pairwise_results = []
for r1, r2 in pairs:
    v1 = density_df[density_df["Region"] == r1]["Density_per_mm2"].values
    v2 = density_df[density_df["Region"] == r2]["Density_per_mm2"].values
    u_stat, p_val = stats.mannwhitneyu(v1, v2, alternative="two-sided")
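    # Rank-biserial r as effect size. SciPy (1.7+) returns U for the first sample (v1),
    # so with this formula r > 0 means r2 tends to be higher and r < 0 means r1 does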
    n1, n2 = len(v1), len(v2)
    r_effect = 1 - (2 * u_stat) / (n1 * n2)
    pairwise_results.append({
        "Pair": f"{r1} vs {r2}",
        "U": u_stat,
        "p": p_val,
        "p_adj (Bonf)": min(p_val * n_pairs, 1.0),
        "r (effect)": round(r_effect, 3),
        "Sig": "*" if p_val * n_pairs < 0.05 else ""
    })

pw_df = pd.DataFrame(pairwise_results)
print(f"Pairwise Mann-Whitney U tests ({n_pairs} pairs, Bonferroni α_adj={alpha_adj:.4f}):")
pw_df
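Because every sample has a value in every region, the pairwise comparisons can also be run as paired tests: a Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) on the within-sample differences for each region pair, with the same Bonferroni adjustment over 10 pairs. A sketch on a hypothetical table shaped like `density_pivot` (made-up values; names such as `toy_wide` and `paired_df` are illustrative and chosen not to overwrite `pairs` or `pw_df`):

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

REGION_ORDER = ["Follicle", "PALS", "RedPulp", "Trabeculae", "LargeVessel"]

# Hypothetical wide table: rows = 9 samples, columns = regions (vessels/mm²)
rng = np.random.default_rng(1)
toy_wide = pd.DataFrame(rng.lognormal(mean=5.0, sigma=0.4, size=(9, 5)),
                        index=[f"S{i}" for i in range(1, 10)], columns=REGION_ORDER)

region_pairs = list(combinations(REGION_ORDER, 2))
rows = []
for r1, r2 in region_pairs:
    # Signed-rank test on the paired differences r1 - r2 (same sample in each row)
    res = stats.wilcoxon(toy_wide[r1], toy_wide[r2])
    rows.append({
        "Pair": f"{r1} vs {r2}",
        "W": res.statistic,
        "p": res.pvalue,
        "p_adj (Bonf)": min(res.pvalue * len(region_pairs), 1.0),
        "Median diff": (toy_wide[r1] - toy_wide[r2]).median(),
    })

paired_df = pd.DataFrame(rows)
print(paired_df.round(4))
```

With 9 samples, the smallest exact two-sided signed-rank p-value is 2/2⁹ ≈ 0.0039, reached only when all nine differences share the same sign; the next possible value (4/512 ≈ 0.0078) already exceeds the Bonferroni threshold of 0.005, so a region pair can only pass if every sample shows the same direction.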

In [None]:
# Genotype comparison — Follicle only
print("Follicle SmallVessel density by genotype (vessels/mm²):\n")
for g in GENOTYPE_ORDER:
    subset = foll_geno[foll_geno["Genotype"] == g]
    vals = subset["Density_per_mm2"]
    samples = subset["Sample_ID"].tolist()
    if len(vals) > 1:
        print(f"  {g} (n={len(vals)}): mean={vals.mean():.1f}, SD={vals.std():.1f}, "
              f"median={vals.median():.1f}, range=[{vals.min():.1f}, {vals.max():.1f}]")
    elif len(vals) == 1:
        print(f"  {g} (n=1): value={vals.iloc[0]:.1f}")
    else:
        print(f"  {g} (n=0): no genotyped samples")
    print(f"    Samples: {samples}")

# Mann-Whitney U: C/C vs C/T
cc_vals = foll_geno[foll_geno["Genotype"] == "C/C"]["Density_per_mm2"].values
ct_vals = foll_geno[foll_geno["Genotype"] == "C/T"]["Density_per_mm2"].values
u_stat, p_val = stats.mannwhitneyu(cc_vals, ct_vals, alternative="two-sided")
# Rank-biserial r; u_stat is U for C/C (first sample), so r > 0 means C/T tends to be higher
r_effect = 1 - (2 * u_stat) / (len(cc_vals) * len(ct_vals))

print(f"\nMann-Whitney U test (Follicle): C/C (n={len(cc_vals)}) vs C/T (n={len(ct_vals)})")
print(f"  U = {u_stat:.1f}, p = {p_val:.4f}, r = {r_effect:.3f}")
print(f"  {'Significant' if p_val < 0.05 else 'Not significant'} at α=0.05")
print(f"  Note: Low statistical power due to small sample sizes.")
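The sign of `r = 1 - 2U / (n1 * n2)` depends on which group's U is used. Recent SciPy versions (1.7+) have `mannwhitneyu` return U for the first sample, i.e. the number of pairs in which the first-sample value is larger (ties count one half), so under this formula r is positive when the second group (here C/T) tends to be higher. A check with made-up values; the `_demo` names are illustrative and avoid overwriting `r_effect`:

```python
from scipy import stats

# Hypothetical values: every second-group value exceeds every first-group value
first = [1.0, 2.0, 3.0]
second = [4.0, 5.0, 6.0, 7.0]

res_demo = stats.mannwhitneyu(first, second, alternative="two-sided")
r_demo = 1 - (2 * res_demo.statistic) / (len(first) * len(second))
print(res_demo.statistic, r_demo)  # 0.0 1.0

# Swapping the argument order swaps U and flips the sign of r
res_demo_swapped = stats.mannwhitneyu(second, first, alternative="two-sided")
r_demo_swapped = 1 - (2 * res_demo_swapped.statistic) / (len(first) * len(second))
print(res_demo_swapped.statistic, r_demo_swapped)  # 12.0 -1.0
```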

In [None]:
# Genotype pivot table — all regions, 8 genotyped samples
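geno_pivot = geno_df.pivot_table(
    index=["Genotype", "Sample_ID"], columns="Region",
    values="Density_per_mm2",
    observed=True,  # only observed Genotype categories; set explicitly since pandas' default is changing
)[REGION_ORDER].round(1)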

print("Density (vessels/mm²) by genotype, sample, and region:")
geno_pivot

In [None]:
# Fig 5: Heatmap of density matrix (9 samples x 5 regions)
heatmap_data = density_pivot.reindex(sorted(density_pivot.index))

fig, ax = plt.subplots(figsize=(8, 5))
sns.heatmap(heatmap_data, annot=True, fmt=".0f", cmap="YlOrRd",
            linewidths=0.5, ax=ax, cbar_kws={"label": "Vessels/mm²"})
ax.set_ylabel("Sample")
ax.set_xlabel("Tissue Region")
ax.set_title("SmallVessel Density Heatmap")
fig.savefig(OUT_DIR / "fig5_density_heatmap.png")
plt.show()

In [None]:
# Fig 6: Per-sample line profiles across regions
fig, ax = plt.subplots(figsize=(9, 5))
for sample in sorted(density_pivot.index):
    vals = density_pivot.loc[sample]
    ax.plot(REGION_ORDER, vals, marker="o", label=sample, linewidth=1.5)
ax.set_ylabel("SmallVessel Density (vessels/mm²)")
ax.set_xlabel("Tissue Region")
ax.set_title("Per-Sample SmallVessel Density Profiles")
ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left", fontsize=8)
fig.savefig(OUT_DIR / "fig6_per_sample_profiles.png")
plt.show()

In [None]:
# Vessel morphology summary by region
morph_cols = ["Area µm^2", "Circularity", "Solidity"]
morph_summary = (
    vessels_df
    .groupby("Region")[morph_cols]
    .agg(["mean", "std", "median"])
    .reindex(REGION_ORDER)
    .round(3)
)

print("SmallVessel morphology by region:")
morph_summary
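To help read the Circularity and Solidity columns: assuming QuPath uses the conventional definitions (circularity = 4π·area / perimeter², equal to 1 for a circle; solidity = area / convex-hull area, equal to 1 for any convex outline), the pure-Python sketch below computes both for two simple polygons. The helper functions are illustrative and not part of QuPath's API.

```python
import math


def polygon_area(pts):
    """Absolute polygon area via the shoelace formula; pts are (x, y) vertices in order."""
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(s) / 2


def polygon_perimeter(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:] + pts[:1]))


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def circularity(pts):
    # 4*pi*A / P^2: 1.0 for a circle, lower for elongated or irregular outlines
    return 4 * math.pi * polygon_area(pts) / polygon_perimeter(pts) ** 2


def solidity(pts):
    # A / convex-hull area: 1.0 for convex shapes, lower for concave outlines
    return polygon_area(pts) / polygon_area(convex_hull(pts))


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(f"square:  circularity={circularity(square):.3f}, solidity={solidity(square):.3f}")
print(f"L-shape: circularity={circularity(l_shape):.3f}, solidity={solidity(l_shape):.3f}")
# square:  circularity=0.785, solidity=1.000
# L-shape: circularity=0.589, solidity=0.857
```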

In [None]:
# Fig 7: Violin plot of individual vessel areas by region
fig, ax = plt.subplots(figsize=(8, 5))
sns.violinplot(data=vessels_df, x="Region", y="Area µm^2",
               order=REGION_ORDER, cut=0, inner="quartile",
               color="lightblue", ax=ax)
ax.set_ylabel("Vessel Area (µm²)")
ax.set_xlabel("Tissue Region")
ax.set_title("SmallVessel Area Distribution by Region")
# Cap y-axis at 99th percentile for readability
y_cap = vessels_df["Area µm^2"].quantile(0.99)
ax.set_ylim(0, y_cap)
fig.savefig(OUT_DIR / "fig7_vessel_area_violin.png")
plt.show()

In [None]:
# Export results
density_df.to_csv(OUT_DIR / "vessel_density_results.csv", index=False)
density_pivot.to_csv(OUT_DIR / "density_summary_pivot.csv")

print(f"Exported:")
print(f"  vessel_density_results.csv ({len(density_df)} rows)")
print(f"  density_summary_pivot.csv ({density_pivot.shape[0]} x {density_pivot.shape[1]})")

## Conclusions

**Key findings:**
- *(Fill in after running — e.g., which regions have highest/lowest vessel density)*
- *(Kruskal-Wallis result — significant region differences?)*
- *(Genotype trend in Follicle — C/C vs C/T direction and significance)*

**Limitations:**
- Small sample sizes (n=3–4 per genotype group) limit statistical power
- T/T genotype has only n=1 — descriptive only
- HDL172 excluded (genotype not yet available)

**Next steps:**
1. Obtain genotype for HDL172 and re-run analysis
2. Power analysis for future cohort expansion
3. Investigate spatial patterns of vessel distribution within follicles
4. Correlate vessel density with cell-level marker measurements from InstanSeg cell segmentation
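The power analysis in step 2 could start from a simulation: draw follicle densities for two genotype groups from an assumed distribution, run the same two-sided Mann-Whitney test many times, and record how often p < 0.05. Everything below (lognormal shape, a 1.5-fold median difference, σ = 0.3 on the log scale, the group sizes) is a hypothetical placeholder for illustration, not an estimate from this dataset; `simulated_power` is an illustrative helper.

```python
import numpy as np
from scipy import stats


def simulated_power(n_per_group, fold_change=1.5, sigma=0.3, n_sim=1000, alpha=0.05, seed=0):
    """Fraction of simulated experiments with two-sided Mann-Whitney p < alpha.

    Both groups are lognormal with the same sigma; their medians differ by
    `fold_change`. All parameters are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        a = rng.lognormal(mean=0.0, sigma=sigma, size=n_per_group)
        b = rng.lognormal(mean=np.log(fold_change), sigma=sigma, size=n_per_group)
        if stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
            hits += 1
    return hits / n_sim


for n in (3, 4, 6, 8, 10, 12):
    print(f"n = {n:2d} per group: power ≈ {simulated_power(n):.2f}")
```

At n = 3 per group the estimated power is 0 by construction, because the smallest attainable exact two-sided p-value is 0.1; realistic inputs for `fold_change` and `sigma` would come from the observed follicle densities.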