# __BOLD Signal Extraction__

This notebook shows a workflow to:

1. Download the data preprocessed with the Athena pipeline for the NYU site [(images available here)](https://www.nitrc.org/frs/?group_id=383).
2. Prepare the preprocessed rs-fMRI volumes (MNI space).
3. Extract the BOLD signal from the images.


In [1]:
from nilearn.image import resample_to_img
import nibabel as nib
import os
import glob
import pandas as pd
import torch

In [2]:
print("Torch CUDA available:", torch.cuda.is_available())

Torch CUDA available: False


In [5]:
import os
import re
import glob
import pandas as pd

# 1) Glob pattern matching the parent folders part1…part4
base_dir = r"./NYU_preproc_part*"

# 2) Load and filter the NYU phenotypic CSV
pheno_csv = r"./NYU_preproc_part1/NYU/NYU_phenotypic.csv"
df_pheno  = pd.read_csv(pheno_csv)
df_pheno  = df_pheno[["ScanDir ID", "DX", "QC_Rest_1"]]
df_pheno  = df_pheno[df_pheno["QC_Rest_1"] == 1]  # keep only scans with QC = 1

# 3) Find every resting-state NIfTI
#    Pattern: any part* folder, then the NYU subfolder, then one folder per subject
nii_paths = glob.glob(
    os.path.join(base_dir, "NYU", "*", "snw*session_1_rest_1.nii.gz"),
    recursive=True
)

FileNotFoundError: [Errno 2] No such file or directory: './NYU_preproc_part1/NYU/NYU_phenotypic.csv'

## __Loading the NYU image data__

0 = TDC (Typically Developing Children), sometimes also called TCP (Typical Control Participant).

1 = ADHD, combined type (Combined Type)

2 = ADHD, hyperactive/impulsive type (Predominantly Hyperactive/Impulsive)

3 = ADHD, inattentive type (Predominantly Inattentive)

## __Images preprocessed with the Athena pipeline are used__

### __Target variable__:

The DX column encodes each subject's diagnosis. It is binarized to obtain the target variable:

0 = TDC ("Typically Developing Children"): healthy child with no ADHD diagnosis.

1 = ADHD ("Attention-Deficit/Hyperactivity Disorder"): subject with an ADHD diagnosis (any DX code above 0).
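
The binarization described above can be sketched as follows. This is a minimal illustration on made-up DX codes (the array `dx_raw` is hypothetical, not taken from the phenotypic file); it only shows how the multi-class codes collapse into the two-class target.

```python
import numpy as np

# Hypothetical raw DX codes for six subjects (illustrative values, not real data)
dx_raw = np.array([0, 1, 3, 0, 2, 1])

# Any non-zero code (any ADHD subtype) becomes 1; controls stay 0
label = (dx_raw > 0).astype(int)

print(label.tolist())
```

Collapsing the subtypes keeps the task binary, at the cost of discarding the subtype information.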

In [None]:
records = []
for p in nii_paths:
    subj = os.path.basename(os.path.dirname(p))
    try:
        sid = int(subj)
    except ValueError:
        continue

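    # Look the subject up in the phenotypic CSV
    row = df_pheno[df_pheno["ScanDir ID"] == sid]
    if row.empty:
        continue

    # Make sure the file is a true 4D volume; the shape comes from the header,
    # so the voxel data does not have to be loaded just for this check
    ndim = len(nib.load(p).shape)
    if ndim != 4:
        print(f"SKIP (ndim={ndim}):", os.path.basename(p))
        continue

    # All checks passed, keep the sample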
    records.append({
        "ScanDir ID": sid,
        "nii": p,
        "label_raw": int(row["DX"].iloc[0])
    })

In [None]:
df_samples = pd.DataFrame(records)
df_samples["label"] = (df_samples["label_raw"] > 0).astype(int)
df_samples.drop(columns="label_raw", inplace=True)

print("Valid 4D volumes found:", len(df_samples))

Valid 4D volumes found: 177


## __Train-Validation Split__


In [None]:
from sklearn.model_selection import train_test_split
train_df, val_df = train_test_split(
    df_samples,
    test_size=0.2,
    stratify=df_samples["label"],
    random_state=42
)
print("Train:", train_df.shape, " Val:", val_df.shape)

Train: (141, 3)  Val: (36, 3)


In [None]:
print(train_df.head())
print(val_df.head())

     ScanDir ID                                                nii  label
170     8692452  .\NYU_preproc_part4\NYU\8692452\snwmrda8692452...      0
146     3243657  .\NYU_preproc_part4\NYU\3243657\snwmrda3243657...      0
157     4060823  .\NYU_preproc_part4\NYU\4060823\snwmrda4060823...      1
17        10026  .\NYU_preproc_part1\NYU\0010026\snwmrda0010026...      1
120     1934623  .\NYU_preproc_part3\NYU\1934623\snwmrda1934623...      0
     ScanDir ID                                                nii  label
116     1854959  .\NYU_preproc_part3\NYU\1854959\snwmrda1854959...      0
14        10022  .\NYU_preproc_part1\NYU\0010022\snwmrda0010022...      1
5         10008  .\NYU_preproc_part1\NYU\0010008\snwmrda0010008...      0
154     3662296  .\NYU_preproc_part4\NYU\3662296\snwmrda3662296...      0
97      1000804  .\NYU_preproc_part3\NYU\1000804\snwmrda1000804...      0


## __Image preparation__

- **Goal:** starting from the `snwmrda<ID>_session_1_rest_1.nii.gz` files, extract time windows that serve as tokens.
- **Why use windows:** the BOLD signal varies over time; instead of averaging over the whole scan (which would discard the dynamics), the series is split into fragments of _W_ TRs to capture temporal patterns.
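
The windowing idea above can be sketched with plain NumPy. This is a minimal illustration on random data; the helper name `split_into_windows` is hypothetical. A series of length T yields `(T - W) // stride + 1` windows, so T = 172 with W = 5 and stride = 5 gives 34 windows.

```python
import numpy as np

def split_into_windows(ts, window_size, stride):
    """Split a (n_rois, T) series into windows of shape (window_size, n_rois)."""
    n_rois, T = ts.shape
    if T < window_size:
        # Series too short for even one window
        return np.empty((0, window_size, n_rois), dtype=ts.dtype)
    n_wins = (T - window_size) // stride + 1
    return np.stack([
        ts[:, w * stride : w * stride + window_size].T
        for w in range(n_wins)
    ])

# Random stand-in for one subject's ROI time series (18 ROIs, 172 TRs)
ts = np.random.default_rng(0).standard_normal((18, 172)).astype(np.float32)
windows = split_into_windows(ts, window_size=5, stride=5)
print(windows.shape)  # (34, 5, 18)
```

With `stride` equal to `window_size` the windows do not overlap; a smaller stride would produce overlapping windows and more tokens per subject.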

## __Preparing the atlas__

Information is extracted for the 18 selected ROIs.

In [None]:
atlas_path = r"./mis18_rois_multilabel.nii.gz"
#atlas_path = r"C:\\Users\\andre\\Downloads\\MRIcroGL_windows\\MRIcroGL\\Resources\\atlas\\aal.nii.gz"

In [None]:
atlas_img = nib.load(atlas_path)
fmri_path = train_df.iloc[0]["nii"]
fmri_img  = nib.load(fmri_path)
# Resample the atlas into the space of that fMRI volume
atlas_rs  = resample_to_img(atlas_img, fmri_img, interpolation="nearest")



In [None]:
atlas_data_rs = atlas_rs.get_fdata().astype(int)
print("Resampled atlas shape:", atlas_data_rs.shape)

Resampled atlas shape: (49, 58, 47)
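
Nearest-neighbour resampling to a coarser grid can, in principle, leave a small ROI with no voxels. A quick check such as the one below may help before extracting signals. It is only a sketch: `atlas_demo` is a random stand-in with the same shape as the resampled atlas, not the real `atlas_data_rs`.

```python
import numpy as np

n_rois = 18
rng = np.random.default_rng(0)

# Random stand-in for the resampled atlas: labels 0 (background) to 18
atlas_demo = rng.integers(0, n_rois + 1, size=(49, 58, 47))

# Count voxels per label; index 0 is background
counts = np.bincount(atlas_demo.ravel(), minlength=n_rois + 1)
empty_rois = [roi for roi in range(1, n_rois + 1) if counts[roi] == 0]

print("Voxels per ROI:", counts[1:].tolist())
print("Empty ROIs:", empty_rois)
```

Replacing `atlas_demo` with the real label array would list any ROI whose time series would otherwise stay at zero.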


## __Segment by ROI and extract the BOLD signal__

In [None]:
import numpy as np
def zscore_per_roi(ts: np.ndarray) -> np.ndarray:
    """Z-score each ROI (row) of a (n_rois, T) array along the time axis."""
    mu  = ts.mean(axis=1, keepdims=True)
    sigma = ts.std(axis=1, keepdims=True)
    return (ts - mu) / (sigma + 1e-6)
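
As a quick sanity check, the normalization can be tried on synthetic data. The sketch below repeats the function so that it runs on its own; the input array is random and purely illustrative. Each non-constant ROI should end up with mean close to 0 and standard deviation close to 1, while a constant ROI maps to zeros thanks to the small epsilon in the denominator.

```python
import numpy as np

def zscore_per_roi(ts):
    # Same normalization as in the cell above, repeated for self-containment
    mu = ts.mean(axis=1, keepdims=True)
    sigma = ts.std(axis=1, keepdims=True)
    return (ts - mu) / (sigma + 1e-6)

rng = np.random.default_rng(0)
ts_demo = (rng.standard_normal((18, 172)) * 3.0 + 10.0).astype(np.float32)
ts_demo[0, :] = 7.0                       # a constant ROI (zero variance)

z = zscore_per_roi(ts_demo)
print(np.round(z[1:].mean(axis=1), 3))    # values close to 0
print(np.round(z[1:].std(axis=1), 3))     # values close to 1
```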

In [None]:
import nibabel as nib
import numpy as np
import torch
from torch.utils.data import Dataset

class PreloadedTimeseriesDataset(Dataset):
    """
    Loads into RAM only the ROI×T time series (much smaller than the 4D volumes)
    and then extracts temporal windows with no disk I/O during each epoch.
    """
    def __init__(self, df, atlas_data,
                 n_rois=18,
                 window_size=10,
                 stride=10,
                 transform=None,
                 pad_short=False):
        self.df          = df.reset_index(drop=True)
        self.atlas       = atlas_data.astype(int)
        self.n_rois      = n_rois
        self.window_size = window_size
        self.stride      = stride
        self.transform   = transform
        self.pad_short   = pad_short

        # Precompute voxel indices per ROI
        self.roi_idxs = {roi: np.where(self.atlas == roi)
                         for roi in range(1, n_rois+1)}

        # 1) Preload **only** the ROI time-series for each subject
        self.ts_list = []
        for path in self.df["nii"]:
            vol4d = nib.load(path).get_fdata()         # (X,Y,Z,T)
            T     = vol4d.shape[3]
            ts    = np.zeros((n_rois, T), dtype=np.float32)
            for roi, (xs, ys, zs) in self.roi_idxs.items():
                if xs.size == 0:
                    continue                           # ROI has no voxels: row stays at zero
                roi_vals = vol4d[xs, ys, zs, :]        # (n_voxels, T)
                ts[roi-1,:] = roi_vals.mean(axis=0)
            self.ts_list.append(ts)                   # keep only n_rois×T floats

        # 2) Build index_map of (subject, window) pairs
        self.index_map = []
        for sid, ts in enumerate(self.ts_list):
            T = ts.shape[1]
            if T < window_size:
                if not pad_short:
                    continue
                n_wins = 1
            else:
                n_wins = (T - window_size)//stride + 1
            for w in range(n_wins):
                self.index_map.append((sid, w))

    def __len__(self):
        return len(self.index_map)

    def __getitem__(self, ix):
        sid, w = self.index_map[ix]
        ts_full = self.ts_list[sid]                 # (n_rois, T)
        start   = w * self.stride
        end     = start + self.window_size
        clip    = ts_full[:, start:end]             # (n_rois, window_size)

        L = clip.shape[1]
        if L < self.window_size and self.pad_short:
            pad_width = self.window_size - L
            clip = np.pad(clip,
                          ((0,0),(0,pad_width)),
                          mode="constant")

        if self.transform:
            clip = self.transform(clip)             # e.g. zscore_per_roi

        x = torch.from_numpy(clip.T)                # (window_size, n_rois)
        y = int(self.df.loc[sid, "label"])
        return x, y
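
The ROI averaging performed in `__init__` can be illustrated in isolation. The sketch below uses a tiny synthetic volume and a two-label atlas (both made up for the example) and applies the same `np.where` indexing to obtain one mean time series per ROI.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 4D volume (X, Y, Z, T) and a toy atlas with two ROIs
vol4d = rng.standard_normal((4, 4, 4, 20)).astype(np.float32)
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2] = 1          # first half of the x axis is ROI 1
atlas[2:] = 2          # second half is ROI 2
n_rois = 2

ts = np.zeros((n_rois, vol4d.shape[3]), dtype=np.float32)
for roi in range(1, n_rois + 1):
    xs, ys, zs = np.where(atlas == roi)           # voxel coordinates of this ROI
    if xs.size == 0:
        continue                                   # empty ROI: row stays at zero
    ts[roi - 1] = vol4d[xs, ys, zs, :].mean(axis=0)  # average over voxels

print(ts.shape)  # (2, 20)
```

Indexing with the three coordinate arrays returns an array of shape `(n_voxels, T)`, so the mean over axis 0 is the average signal across the voxels of that ROI at each time point.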


In [None]:
ts_ds_train = PreloadedTimeseriesDataset(
    train_df, atlas_data_rs,
    n_rois=18, window_size=5, stride=5
)
ts_ds_val = PreloadedTimeseriesDataset(
    val_df, atlas_data_rs,
    n_rois=18, window_size=5, stride=5
)

In [None]:
train_ts_list  = ts_ds_train.ts_list    # list of arrays of shape (18, T_i)
train_labels   = ts_ds_train.df["label"].astype(int).tolist()
val_ts_list    = ts_ds_val.ts_list
val_labels     = ts_ds_val.df["label"].astype(int).tolist()

In [None]:
train_ts_list[0]

array([[ 0.04317259, -0.04954503,  0.01766688, ...,  0.05415365,
         0.13259767,  0.09897626],
       [ 0.09979406, -0.11382271, -0.17410114, ...,  0.02388576,
         0.20336084,  0.24766782],
       [-0.0934531 , -0.15966724, -0.02407487, ..., -0.1358804 ,
         0.10031461,  0.03302513],
       ...,
       [-0.02022906, -0.01039273, -0.2395961 , ...,  0.01562689,
         0.02432396,  0.1516336 ],
       [-0.05853724,  0.1764108 ,  0.08067051, ..., -0.05479547,
        -0.16696903, -0.00838481],
       [ 0.00571315,  0.14713496,  0.17206049, ...,  0.11502893,
        -0.06858445, -0.02309877]], shape=(18, 172), dtype=float32)

In [None]:
import pandas as pd
import numpy as np

# Flatten each series from (18, 172) to (3096,)
train_data = [ts.flatten() for ts in train_ts_list]
val_data   = [ts.flatten() for ts in val_ts_list]

# Append the label as the last value of each row
train_rows = [np.append(ts, label) for ts, label in zip(train_data, train_labels)]
val_rows   = [np.append(ts, label) for ts, label in zip(val_data, val_labels)]

# Build the DataFrames
df_train = pd.DataFrame(train_rows)
df_val   = pd.DataFrame(val_rows)

# Name the columns from the actual series shape; this assumes every subject
# has the same number of time points (172 in the output above)
n_rois, n_timepoints = train_ts_list[0].shape
column_names = [f"col_{i}" for i in range(n_rois * n_timepoints)] + ["label"]
df_train.columns = column_names
df_val.columns   = column_names

# Save to CSV
df_train.to_csv("train_timeseries.csv", index=False)
df_val.to_csv("val_timeseries.csv", index=False)
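
When the CSV rows are read back, each flattened row has to be reshaped to recover the ROI×time layout. The sketch below shows the round trip on a synthetic array, assuming the same 18 × 172 shape as above. Because `flatten()` is row-major, column `col_i` holds ROI `i // 172` at time point `i % 172`.

```python
import numpy as np

n_rois, n_timepoints = 18, 172

# Synthetic series whose values encode their own flat position
ts = np.arange(n_rois * n_timepoints, dtype=np.float32).reshape(n_rois, n_timepoints)

# Same layout as one CSV row: flattened features followed by the label
row = np.append(ts.flatten(), 1)

# Split the row again and restore the (n_rois, T) shape
features, label = row[:-1], int(row[-1])
restored = features.reshape(n_rois, n_timepoints)

print(restored.shape, label)
```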
