### BrainGlobe, BrainRender - Brain Atlas Visualization Tutorial (Mouse) - 1.10.25

1) Load the atlas (Allen mouse, 25 µm)

   - BrainGlobeAtlas exposes the image stacks (reference, annotation, hemispheres), the structure hierarchy, and one mesh per region.
   - Load the atlas and build lk, a stable lookup table of the atlas with normalized keys.

In [1]:
from brainglobe_atlasapi import BrainGlobeAtlas
import pandas as pd

ATLAS = "allen_mouse_25um"
atlas = BrainGlobeAtlas(ATLAS, check_latest=False)

lk = atlas.lookup_df.copy()
lk["name_norm"] = (lk["name"].str.strip().str.replace(r"\s+", " ", regex=True).str.lower())
lk["acr_norm"]  = lk["acronym"].str.strip().str.lower()


2) Inspect the region "dictionary" (to map your names)

   - We use the lookup table to match your 78×2 names against the official atlas acronyms and IDs.
   - Load your ROIs (TXT) and normalize them: the source of truth for the names and their normalized form.

In [2]:
# DataFrame with columns id, acronym, name for each structure
lookup = atlas.lookup_df
lookup.head()

  acronym   id                           name
0    root  997                           root
1    grey    8  Basic cell groups and regions
2      CH  567                       Cerebrum
3     CTX  688                Cerebral cortex
4   CTXpl  695                 Cortical plate


3) Load your ROI list (names only) and normalize

   - This gives you, for each of your names, the atlas acronym/ID (when there is an exact match).

In [3]:
path = "../../data/raw/Toni_2025-08-06/atlas_cg_3d5_names.txt"
with open(path, "r", encoding="utf-8") as f:
    rows = [ln.strip() for ln in f if ln.strip()]   # drop blank lines

rois = pd.DataFrame({"cg_name": rows})
rois["cg_norm"] = (rois["cg_name"].str.strip().str.replace(r"\s+", " ", regex=True).str.lower())


Rules: deterministic exclusions and renames

   - We flag exclusions, renames, and parents.

In [4]:
# --- Global rules ---
EXCLUDE = {
    "clear label",
    "brainstem, unspecified",          # too global; if you want it, use __UNION__:brainstem
    "basal forebrain region, unspecified",
}

RENAMES = {
    # ===== 1→N unions (resolved with union_children) =====
    "white matter": "__UNION__:white matter",
    "ventricles":   "__UNION__:ventricles",

    # ===== Parents (painted as the union of their children) =====
    "superior colliculus":                 "__PARENT__:superior colliculus",  # SCm + SCs
    "substantia nigra":                    "__PARENT__:substantia nigra",     # SNc + SNr
    "septal region":                       "__PARENT__:septal complex",       # LS + MS (+…)
    "ventral striatal region, unspecified":"__PARENT__:ventral striatum",     # STRv subparts
    "hypothalamic region, unspecified":    "__PARENT__:hypothalamus",         # HY subparts
    "cingulate area 2":                    "__PARENT__:anterior cingulate area", # ACA + ACAd
    "secondary visual area":               "__PARENT__:secondary visual",     # VISal/VISam/VISpm/VISpl

    # ===== Direct renames (Allen acronym/name) =====
    "olfactory bulb":            "main olfactory bulb",  # MOB
    "lateral orbital area":      "ORBl",
    "medial orbital area":       "ORBm",
    "ventral orbital area":      "ORBv",
    "dorsolateral orbital area": "ORBvl",
    "parietal association cortex": "PTLp",
    "hippocampus": "HIP",
    "pontine nuclei":            "PG",
    "frontal association cortex":"FRP",
    "ventral pallidum":          "VP",
    "medial entorhinal cortex":  "ENTm",
    "lateral entorhinal cortex": "ENTl",
    "retrosplenial dysgranular area": "RSPd",
    # "nucleus of the stria medullaris":  # ⇢ leave it for overrides.csv (it is a tract; better decided manually)
}

# assign the rule for each row
rois["rule"] = rois["cg_norm"].map(lambda s: "EXCLUDE" if s in EXCLUDE else RENAMES.get(s, None))

# for exact merges: if the rule is NOT __PARENT__/__UNION__, the rename is used as the query
forced = rois["rule"].fillna("")
is_special = forced.str.startswith("__PARENT__") | forced.str.startswith("__UNION__")

rois["query_name"] = (
    rois["cg_norm"].where(is_special)                     # __PARENT__/__UNION__ rows → use cg_norm
    .fillna("")  # every other row gets "" here, including rows with no rule at all
)
mask_rename_directo = (~is_special) & (forced.str.len() > 0)
rois.loc[mask_rename_directo, "query_name"] = forced[mask_rename_directo]
# normalize just in case
rois["query_name"] = rois["query_name"].str.strip().str.lower()

Exact match by name (without duplicating columns): resolves whatever matches exactly or through a rename.

In [5]:
def to_norm(s): 
    return (s or "").strip().lower().replace("  "," ")

rois["query_name"] = rois["query_name"].map(to_norm)
# --- EXACT BY NAME (as you already have) ---
right_name = lk[["name_norm","id","acronym","name"]].rename(columns={"name_norm":"match_key"})
m = rois.merge(right_name, left_on="query_name", right_on="match_key", how="left").drop(columns=["match_key"])
m["status"] = None
m.loc[rois["rule"].eq("EXCLUDE").values, "status"] = "excluded"
m.loc[m["id"].notna() & m["status"].isna(), "status"] = "exact/rename"

# --- EXACT BY ACRONYM: matches rows whose cg_name is itself an acronym ---
mask = m["id"].isna() & m["status"].isna()
if mask.any():
    right_acr_cg = lk[["acr_norm","id","acronym","name"]].rename(columns={"acr_norm":"match_key"})
    m_acr_cg = m.loc[mask, ["cg_name"]].assign(
        cg_acr = m.loc[mask, "cg_name"].str.strip().str.lower()
    ).merge(right_acr_cg, left_on="cg_acr", right_on="match_key", how="left").drop(columns=["match_key","cg_acr"])
    m.loc[mask, ["id","acronym","name"]] = m_acr_cg[["id","acronym","name"]].values
    m.loc[m["id"].notna() & m["status"].isna(), "status"] = "acronym"

Fill in by exact acronym (missing rows only): a second, precise pass by acronym.

4) Resolve the remaining mismatches with a light "fuzzy match" (optional)

   - It leaves almost everything mapped without manual intervention.
   - Last automatic resort, with a conservative threshold.

In [6]:
from difflib import get_close_matches

mask = m["id"].isna() & m["status"].isna()
if mask.any():
    cands = lk["name_norm"].tolist()
    def fuzzy(s): 
        hit = get_close_matches(s, cands, n=1, cutoff=0.78)
        return hit[0] if hit else None

    m.loc[mask, "match_key"] = m.loc[mask, "cg_norm"].map(fuzzy)
    right_fz = lk[["name_norm","id","acronym","name"]].rename(columns={"name_norm":"match_key"})
    m_fz = m.loc[mask, ["cg_name","match_key"]].merge(right_fz, on="match_key", how="left")
    m.loc[mask, ["id","acronym","name"]] = m_fz[["id","acronym","name"]].values
    m.loc[m["id"].notna() & m["status"].isna(), "status"] = "fuzzy"

m = m.drop(columns=[c for c in ["match_key"] if c in m.columns])


- Parents → list of children (to paint the union)

   - E.g., "Superior colliculus" → ['SCm','SCs'].

In [7]:
# after your merge 'm' (statuses exact/rename/acronym/fuzzy already assigned)
mask_parent = rois["rule"].fillna("").str.startswith("__PARENT__")
parents = rois.loc[mask_parent, ["cg_name","rule"]].copy()
parents["parent_norm"] = parents["rule"].str.replace("__PARENT__:", "", regex=False).str.strip().str.lower()

def children_of(parent_label_norm):
    # 'contains' heuristic; refine it with the hierarchy if you like
    return lk.loc[lk["name_norm"].str.contains(parent_label_norm, na=False), "acronym"].dropna().tolist()

# parent_children defined explicitly (safe, not heuristic)
parent_children = {
    "superior colliculus": ["SCm", "SCs"],
    "substantia nigra": ["SNc", "SNr"],
    "septal region": ["MS", "LS"],
    "ventral striatal region, unspecified": ["ACB", "OT"],
    "hypothalamic region, unspecified": ["HY"],
    "cingulate area 2": ["ACA", "ACAd"],
    "secondary visual area": ["VISal", "VISam", "VISpm", "VISpl"],
}
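
The `contains` heuristic in `children_of` can be replaced by a walk over the atlas hierarchy. The following is a minimal sketch, assuming each structure carries a `structure_id_path` (the list of IDs from the root down to the structure), as the entries of `atlas.structures` do. The toy table and its IDs are invented for illustration, and `descendants_of` is a hypothetical helper, not part of the BrainGlobe API.

```python
# Toy stand-in for atlas.structures: acronym -> id path from the root.
# The IDs are invented for this example.
TOY_PATHS = {
    "root": [1],
    "MB":   [1, 10],
    "SC":   [1, 10, 20],
    "SCs":  [1, 10, 20, 21],
    "SCm":  [1, 10, 20, 22],
    "SNr":  [1, 10, 30],
}

def descendants_of(parent, paths):
    """Return the acronyms whose id path passes through `parent` (parent excluded)."""
    parent_id = paths[parent][-1]
    return sorted(
        acr for acr, path in paths.items()
        if parent_id in path[:-1]  # path[:-1] holds the ancestors only
    )

print(descendants_of("SC", TOY_PATHS))
```

With a loaded atlas, `atlas.get_structure_descendants("SC")` should give the same kind of list directly, so the helper is mainly useful to see what the lookup does.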


- "Very large" unions (White matter / Ventricles)

   - Lists to paint as a union of subregions.

In [8]:
# === 1→N union ===
# Ventricles: every acronym whose name contains "ventricle"
VENTRICLES = lk.loc[
    lk["name"].str.contains("ventricle", case=False, regex=True),
    "acronym"
].dropna().tolist()

# White matter: broad list of tracts/fasciculi/commissures/capsules, etc.
# Non-capturing group: str.contains warns when the pattern has capture groups.
WM_PAT = r"(?:tract|commissure|capsule|fasciculus|peduncle|radiation|fornix|fimbria|pyramidal|cingulum)"
WHITE_MATTER = lk.loc[
    lk["name"].str.contains(WM_PAT, case=False, regex=True),
    "acronym"
].dropna().tolist()

union_children = {
    "white matter": WHITE_MATTER,
    "ventricles":   VENTRICLES,
}



In [9]:
# --- UNION 1→N (white matter, ventricles) ---
m["children_list"] = None
mask_union = rois["rule"].fillna("").str.startswith("__UNION__")
if mask_union.any():
    tmp = rois.loc[mask_union, ["cg_name","rule"]].copy()
    tmp["union_key"] = tmp["rule"].str.replace("__UNION__:", "", regex=False).str.strip().str.lower()
    m = m.merge(tmp[["cg_name","union_key"]], on="cg_name", how="left")
    m.loc[m["union_key"].notna(), "children_list"] = m["union_key"].map(union_children)
    m.loc[m["union_key"].notna(), "status"] = "union"
    m = m.drop(columns=["union_key"])

# --- PARENTS (SN, SEP, ACA, secondary VIS, STRv, HY, SC…) ---
mask_parent = rois["rule"].fillna("").str.startswith("__PARENT__")
if mask_parent.any():
    par_names = rois.loc[mask_parent, "cg_name"]
    # parent_children is keyed by lowercase names, so mapping the original-case cg_name finds no entries
    m.loc[m["cg_name"].isin(par_names), "children_list"] = m["cg_name"].map(parent_children)
    m.loc[m["cg_name"].isin(par_names), "status"] = "parent"


- One-off overrides (optional CSV)

   - Reproducible manual adjustments (2–3 stubborn cases).

In [10]:
# try:
#     ov = pd.read_csv("../../data/processed/atlas/overrides.csv")  # cols: cg_name, atlas_acronym
#     ref = lk.set_index("acronym")[["id","name"]]
#     m = m.merge(ov, on="cg_name", how="left")
#     apply = m["atlas_acronym"].notna()
#     m.loc[apply, "acronym"] = m.loc[apply, "atlas_acronym"]
#     m.loc[apply, "id"]      = m.loc[apply, "acronym"].map(ref["id"])
#     m.loc[apply, "name"]    = m.loc[apply, "acronym"].map(ref["name"])
#     m.loc[apply, "status"]  = "override"
#     m = m.drop(columns=["atlas_acronym"])
# except FileNotFoundError:
#     pass


- Final table + save

   - Single deliverable: for each cg_name you get acronym/id/name (when applicable) or children_list (for a union/parent), plus status.

In [11]:
# label "union"/"parent" and excluded rows
# note: this resets children_list and maps by cg_name, whose casing differs from the lowercase dict keys
m["children_list"] = None
m.loc[m["cg_name"].isin(union_children.keys()), "children_list"] = m["cg_name"].map(union_children)
m.loc[m["cg_name"].isin(parent_children.keys()), "children_list"] = m["cg_name"].map(parent_children)

final = m.copy()
final.loc[final["cg_name"].isin(union_children.keys()), "status"] = "union"
final.loc[final["cg_name"].isin(parent_children.keys()),   "status"] = "parent"
final = final.sort_values(["status","cg_name"]).reset_index(drop=True)

print(final["status"].value_counts(dropna=False))
final.to_csv("../../data/processed/atlas/roi_mapping.csv", index=False)
final


status
fuzzy           41
None            14
acronym         11
parent           7
excluded         3
union            2
exact/rename     1
Name: count, dtype: int64


Unnamed: 0,cg_name,cg_norm,rule,query_name,id,acronym,name,status,children_list
0,ATN,atn,,,239.0,ATN,Anterior group of the dorsal thalamus,acronym,
1,EPI,epi,,,958.0,EPI,Epithalamus,acronym,
2,GENd,gend,,,1008.0,GENd,"Geniculate group, dorsal thalamus",acronym,
3,GENv,genv,,,1014.0,GENv,"Geniculate group, ventral thalamus",acronym,
4,ILM,ilm,,,51.0,ILM,Intralaminar nuclei of the dorsal thalamus,acronym,
...,...,...,...,...,...,...,...,...,...
74,Parietal association cortex,parietal association cortex,PTLp,ptlp,,,,,
75,Pontine nuclei,pontine nuclei,PG,pg,,,,,
76,Retrosplenial dysgranular area,retrosplenial dysgranular area,RSPd,rspd,,,,,
77,Ventral orbital area,ventral orbital area,ORBv,orbv,,,,,


In [12]:
final[final["status"].isna()]

Unnamed: 0,cg_name,cg_norm,rule,query_name,id,acronym,name,status,children_list
65,"Amygdaloid area, unspecified","amygdaloid area, unspecified",,,,,,,
66,Cingulate area 1,cingulate area 1,,,,,,,
67,Dorsolateral orbital area,dorsolateral orbital area,ORBvl,orbvl,,,,,
68,Frontal association area 3,frontal association area 3,,,,,,,
69,Frontal association cortex,frontal association cortex,FRP,frp,,,,,
70,Hippocampus,hippocampus,HIP,hip,,,,,
71,Lateral entorhinal cortex,lateral entorhinal cortex,ENTl,entl,,,,,
72,Medial entorhinal cortex,medial entorhinal cortex,ENTm,entm,,,,,
73,Nucleus of the stria medullaris,nucleus of the stria medullaris,,,,,,,
74,Parietal association cortex,parietal association cortex,PTLp,ptlp,,,,,


In [21]:
print("Total rows:", len(m))
print("Unique cg_name:", m["cg_name"].nunique())
print("Duplicated cg_name?:", m["cg_name"].duplicated().any())
print(m["status"].value_counts(dropna=False))


Total rows: 79
Unique cg_name: 79
Duplicated cg_name?: False
status
fuzzy           41
None            14
acronym         11
parent           7
excluded         3
union            2
exact/rename     1
Name: count, dtype: int64


In [22]:
dbg = m[m["query_name"].fillna("") == ""]
print("Empty query_name:", len(dbg))
print(dbg[["cg_name","rule","cg_norm","query_name"]].head(10))


Empty query_name: 54
                       cg_name  rule                     cg_norm query_name
5            Lateral lemniscus  None           lateral lemniscus           
6          Inferior colliculus  None         inferior colliculus           
7      Secondary auditory area  None     secondary auditory area           
8              Piriform cortex  None             piriform cortex           
10                Zona incerta  None                zona incerta           
11    Agranular insular cortex  None    agranular insular cortex           
12  Primary somatosensory area  None  primary somatosensory area           
15                         EPI  None                         epi           
16                         LAT  None                         lat           
17                         MED  None                         med           


In [23]:
mask_rule_acr = rois["rule"].fillna("").str.match(r"^[A-Za-z]{2,6}([0-9/]+)?$")
sus = m[mask_rule_acr.values & m["id"].isna()]
print("Rules with an acronym but no id:", len(sus))
print(sus[["cg_name","rule","query_name"]].head(10))


Rules with an acronym but no id: 10
                           cg_name   rule query_name
13     Parietal association cortex   PTLp       ptlp
29                     Hippocampus    HIP        hip
38                  Pontine nuclei     PG         pg
40      Frontal association cortex    FRP        frp
48        Medial entorhinal cortex   ENTm       entm
49       Lateral entorhinal cortex   ENTl       entl
51                Ventral pallidum     VP         vp
55  Retrosplenial dysgranular area   RSPd       rspd
75       Dorsolateral orbital area  ORBvl      orbvl
76            Ventral orbital area   ORBv       orbv


In [24]:
bad_parent_union = m[m["status"].isin(["parent","union"]) & m["children_list"].isna()]
print("Parents/Unions without children_list:", len(bad_parent_union))
print(bad_parent_union[["cg_name","rule","status"]])


Parents/Unions without children_list: 9
                                 cg_name                                rule  \
1                           White matter              __UNION__:white matter   
2                             Ventricles                __UNION__:ventricles   
3                    Superior colliculus      __PARENT__:superior colliculus   
9                       Substantia nigra         __PARENT__:substantia nigra   
14                 Secondary visual area         __PARENT__:secondary visual   
32                      Cingulate area 2  __PARENT__:anterior cingulate area   
34                         Septal region           __PARENT__:septal complex   
36      Hypothalamic region, unspecified             __PARENT__:hypothalamus   
54  Ventral striatal region, unspecified         __PARENT__:ventral striatum   

    status  
1    union  
2    union  
3   parent  
9   parent  
14  parent  
32  parent  
34  parent  
36  parent  
54  parent  


In [25]:
mask_siglas = m["cg_name"].str.match(r"^[A-Za-z]{2,6}([0-9/]+)?$")
print("Acronyms without id:", m[mask_siglas & m["id"].isna()][["cg_name","rule","query_name"]].head(15))


Acronyms without id: Empty DataFrame
Columns: [cg_name, rule, query_name]
Index: []


In [26]:
rename_keys = pd.Series(list(RENAMES.keys())).str.strip().str.lower().unique()
has_rule_none = rois["rule"].isna() & rois["cg_norm"].isin(rename_keys)
print("RENAMES keys not applied (should be 0):", has_rule_none.sum())
print(rois.loc[has_rule_none, ["cg_name","cg_norm"]].head(10))


RENAMES keys not applied (should be 0): 0
Empty DataFrame
Columns: [cg_name, cg_norm]
Index: []


In [27]:
sus_fuzzy = m[(m["status"]=="fuzzy") & (rois["rule"].notna().values | mask_siglas.values)]
print("Suspicious fuzzy matches:", len(sus_fuzzy))
print(sus_fuzzy[["cg_name","rule","query_name","name"]].head(10))


Suspicious fuzzy matches: 2
                 cg_name  rule query_name                   name
74  Lateral orbital area  ORBl       orbl    Lateral visual area
77   Medial orbital area  ORBm       orbm  Medial pretectal area


In [28]:
esperados_parent = set(parent_children.keys())
esperados_union  = set(union_children.keys())
print("Missing parents:", esperados_parent - set(m.loc[m["status"]=="parent","cg_name"]))
print("Missing unions:",  esperados_union  - set(m.loc[m["status"]=="union","cg_name"]))


Missing parents: {'secondary visual area', 'septal region', 'ventral striatal region, unspecified', 'superior colliculus', 'substantia nigra', 'cingulate area 2', 'hypothalamic region, unspecified'}
Missing unions: {'ventricles', 'white matter'}


In [29]:
rest = m[m["status"].isna()][["cg_name","cg_norm","rule"]]
print("Pending (None):", len(rest))
print(rest.head(15))


Pending (None): 14
                            cg_name                          cg_norm   rule
13      Parietal association cortex      parietal association cortex   PTLp
29                      Hippocampus                      hippocampus    HIP
38                   Pontine nuclei                   pontine nuclei     PG
40       Frontal association cortex       frontal association cortex    FRP
41  Nucleus of the stria medullaris  nucleus of the stria medullaris   None
48         Medial entorhinal cortex         medial entorhinal cortex   ENTm
49        Lateral entorhinal cortex        lateral entorhinal cortex   ENTl
51                 Ventral pallidum                 ventral pallidum     VP
55   Retrosplenial dysgranular area   retrosplenial dysgranular area   RSPd
63     Amygdaloid area, unspecified     amygdaloid area, unspecified   None
69                 Cingulate area 1                 cingulate area 1   None
72       Frontal association area 3       frontal association area

With this, you can now:

- paint pairs: if children_list is not None, add all of those subregions; otherwise, add the region by its acronym.
- exclude: status == "excluded".
- trace: everything is kept in roi_mapping.csv with its status and source.
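
The painting rule can be sketched as a small function over one row of the mapping table. The toy `final` frame and the helper name `acronyms_to_paint` are assumptions for illustration. The sketch also assumes `children_list` still holds Python lists, which is not the case after a round trip through CSV, where they come back as strings.

```python
import pandas as pd

# Toy version of the final mapping table.
final = pd.DataFrame({
    "cg_name": ["Superior colliculus", "EPI", "clear label", "Cingulate area 1"],
    "acronym": [None, "EPI", None, None],
    "status": ["parent", "acronym", "excluded", None],
    "children_list": [["SCm", "SCs"], None, None, None],
})

def acronyms_to_paint(row):
    """List the atlas acronyms to add for one row of the mapping table."""
    if row["status"] == "excluded":
        return []
    if isinstance(row["children_list"], list):
        return row["children_list"]      # union/parent: paint every child
    if isinstance(row["acronym"], str):
        return [row["acronym"]]          # single region
    return []                            # unresolved row: nothing to paint yet

to_paint = {r["cg_name"]: acronyms_to_paint(r) for _, r in final.iterrows()}
print(to_paint)
```

Each resulting list could then be passed to a renderer, for example brainrender's `Scene.add_brain_region(*acronyms)`, assuming the scene uses the same atlas.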

In [30]:
pend = m[m["status"].isna()][["cg_name","cg_norm"]]
for _, r in pend.head(10).iterrows():
    sub = lk[lk["name_norm"].str.contains(r.cg_norm[:8], na=False)]
    print(r.cg_name, "→ candidates:", sub[["acronym","name"]].head(3).to_dict("records"))

Parietal association cortex → candidates: [{'acronym': 'PTLp', 'name': 'Posterior parietal association areas'}]
Hippocampus → candidates: [{'acronym': 'HPF', 'name': 'Hippocampal formation'}, {'acronym': 'HIP', 'name': 'Hippocampal region'}, {'acronym': 'RHP', 'name': 'Retrohippocampal region'}]
Pontine nuclei → candidates: [{'acronym': 'PPN', 'name': 'Pedunculopontine nucleus'}, {'acronym': 'PCG', 'name': 'Pontine central gray'}, {'acronym': 'PG', 'name': 'Pontine gray'}]
Frontal association cortex → candidates: [{'acronym': 'FRP', 'name': 'Frontal pole, cerebral cortex'}, {'acronym': 'FRP1', 'name': 'Frontal pole, layer 1'}, {'acronym': 'FRP2/3', 'name': 'Frontal pole, layer 2/3'}]
Nucleus of the stria medullaris → candidates: [{'acronym': 'NLOT', 'name': 'Nucleus of the lateral olfactory tract'}, {'acronym': 'NLOT1', 'name': 'Nucleus of the lateral olfactory tract, molecular layer'}, {'acronym': 'NLOT2', 'name': 'Nucleus of the lateral olfactory tract, pyramidal layer'}]
Medial ento