# Lab 6: Evaluating Clusters

## Objectives
- Quantify cluster separation with the silhouette score in PCA space
- Check marker specificity as a biological validation
- Identify clusters driven by technical covariates

## Outputs
- `../results/lab06_cluster_eval.md`

---
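
For reference, the silhouette value of a cell is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other cells in its own cluster and b(i) is the mean distance to the cells of the nearest other cluster. The reported score is the mean of s(i) over all cells. The block below is a minimal NumPy sketch of that definition on a hypothetical toy array. The names `silhouette_per_sample`, `toy_X`, and `toy_labels` are illustrative and not part of the lab, and Euclidean distance is assumed. It is meant for intuition, not as a replacement for `sklearn.metrics.silhouette_score`.

```python
import numpy as np

def silhouette_per_sample(X, labels):
    """Per-sample silhouette values computed from the definition (Euclidean)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("need at least two clusters")

    # Dense pairwise distances: fine for a toy example, O(n^2) memory in general.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters are assigned 0 by convention
        # a: mean distance to the other members of the same cluster
        # (the self-distance is 0, so dividing by n_same - 1 excludes it)
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores

# Two tight, well-separated pairs of points.
toy_X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
toy_labels = np.array([0, 0, 1, 1])
per_sample = silhouette_per_sample(toy_X, toy_labels)
print(f"mean silhouette: {per_sample.mean():.3f}")
```

With the labels shown, each point is much closer to its cluster mate than to the other pair, so the mean is about 0.90. Relabeling the same points as `[0, 1, 0, 1]` pairs distant points together and makes the mean negative, which is the pattern to look for when a clustering cuts across real structure.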


In [None]:
import scanpy as sc
import numpy as np
from pathlib import Path

from sklearn.metrics import silhouette_score

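adata = sc.datasets.pbmc3k_processed()

# Use the same number of PCs for clustering and for evaluation, so the
# silhouette is measured in the space the neighbor graph was built from.
n_pcs = 40
sc.pp.neighbors(adata, n_neighbors=15, n_pcs=n_pcs)
sc.tl.leiden(adata, resolution=0.5, key_added='leiden')

X = adata.obsm['X_pca'][:, :n_pcs]
labels = adata.obs['leiden'].astype('category').cat.codes

# Mean silhouette over all cells (Euclidean distance); ranges from -1 to 1.
sil = silhouette_score(X, labels)
print(f"Silhouette (PCA, {n_pcs} PCs): {sil:.3f}")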

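# Marker sanity check (canonical PBMC markers).
# adata.X in this dataset holds only the scaled highly variable genes, so the
# dot plot reads expression from adata.raw; use_raw=True makes that explicit.
pbmc_markers = {
    'T': ['CD3D', 'CD3E'],
    'B': ['MS4A1', 'CD79A'],
    'NK': ['NKG7', 'GNLY'],
    'Mono': ['LYZ', 'S100A8'],
}
sc.pl.dotplot(adata, var_names=pbmc_markers, groupby='leiden',
              standard_scale='var', use_raw=True)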

# Write a short report; create the results directory if it does not exist yet.
out_path = Path('../results/lab06_cluster_eval.md')
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(
    f"# Cluster evaluation\n\n- Silhouette (PCA): {sil:.3f}\n\n"
    "## Notes\n- Confirm clusters are not driven by technical QC metrics.\n"
    "- Confirm at least a few clusters have specific canonical markers.\n"
)
print(f'Wrote {out_path}')
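
The report notes ask for a check that clusters are not driven by technical QC metrics, but the cell above does not compute one. The block below is a minimal sketch of one possible check. It compares each cluster's median of a QC metric with the median over all cells, scaled by the global median absolute deviation (MAD), and flags clusters that sit far from the rest. The function name `flag_qc_driven_clusters`, the cutoff of 2.0, and the synthetic genes-per-cell values are assumptions made for illustration, not part of the lab or of scanpy.

```python
import numpy as np

def flag_qc_driven_clusters(metric, labels, threshold=2.0):
    """Return {cluster: shift} for clusters whose median QC value is an outlier.

    shift = (cluster median - global median) / global MAD.
    The threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    metric = np.asarray(metric, dtype=float)
    labels = np.asarray(labels)
    global_median = np.median(metric)
    mad = np.median(np.abs(metric - global_median))
    if mad == 0:
        raise ValueError("QC metric has zero spread; shift is undefined")

    flagged = {}
    for cluster in np.unique(labels):
        cluster_median = np.median(metric[labels == cluster])
        shift = (cluster_median - global_median) / mad
        if abs(shift) > threshold:
            flagged[str(cluster)] = float(shift)
    return flagged

# Synthetic example: clusters '0' and '1' look alike,
# cluster '2' has far fewer detected genes per cell.
rng = np.random.default_rng(0)
toy_metric = np.concatenate([
    rng.normal(1000, 100, 200),
    rng.normal(1000, 100, 200),
    rng.normal(300, 50, 50),
])
toy_clusters = np.repeat(['0', '1', '2'], [200, 200, 50])

flagged = flag_qc_driven_clusters(toy_metric, toy_clusters)
print(flagged)
```

In the lab this could be called as `flag_qc_driven_clusters(adata.obs['n_genes'], adata.obs['leiden'])`, assuming the dataset exposes an `n_genes` column; inspect `adata.obs.columns` first to see which QC metrics are available. A flagged cluster is not necessarily an artifact, since some cell types genuinely express fewer genes, so it is worth cross-checking flagged clusters against the marker dot plot before discarding them.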
