# Functional Connectivity Sliding Window Preprocessing & Calculation

We are going to split each subject's anxiety ratings into "chunks" that we can align to the fMRI data and use to calculate functional connectivity. To standardize our feature sizes, we will use a fixed window length measured in fMRI TRs, slide it along the scan with a fixed stride, and use the mean anxiety within each window as that chunk's label. We will prepare the labels in two ways: one for regression (predicting continuous anxiety values) and one for classification (high vs. low anxiety). Note that these anxiety values are already z-scored: the reviewers' ratings range from 0 to 100, and the consensus annotations are calculated from the z-scores. We are also going to add a min-max scaled value to each of the CSVs because values in [0, 1] work better for some neural networks. Regression will likely work best with the continuous values, while the min-max scaled values are convenient for binning for classification.
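To make the scaling step concrete, here is a minimal sketch of min-max scaling applied to a toy z-scored trace. The helper name `min_max_scale` and the toy values are assumptions for illustration only; the notebook itself relies on scikit-learn's `MinMaxScaler`.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D sequence to [0, 1] (hypothetical helper, not part of the notebook)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant trace has no range; return zeros to avoid dividing by zero
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Toy z-scored anxiety values (made up for illustration)
z_scored = np.array([-1.5, 0.0, 0.5, 2.5])
scaled = min_max_scale(z_scored)
print(scaled)
```

Because the minimum and maximum are taken per trace, the scaled values are only comparable within a single file, which matches fitting a fresh scaler for each CSV.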

In [1]:
# code to add the minmaxscaler values 
import os
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

anxiety_dir = "anxiety_aligned_csv"

for filename in os.listdir(anxiety_dir):
    if filename.endswith(".csv"):
        filepath = os.path.join(anxiety_dir, filename)

        df = pd.read_csv(filepath)

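        # The anxiety ratings are taken from the third column of each CSV
        anxiety_col = df.columns[2]

        # Rescale this file's ratings to [0, 1]; the scaler is fit separately per file
        scaler = MinMaxScaler()
        df["scaled_anxiety"] = scaler.fit_transform(df[[anxiety_col]]).ravel()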

        df.to_csv(filepath, index=False)
        print(f"Updated: {filename}")

Updated: sub-S25_Chatter_anxiety_aligned.csv
Updated: sub-S28_LessonLearned_anxiety_aligned.csv
Updated: sub-S20_Chatter_anxiety_aligned.csv
Updated: sub-S09_Chatter_anxiety_aligned.csv
Updated: sub-S25_LessonLearned_anxiety_aligned.csv
Updated: sub-S11_LessonLearned_anxiety_aligned.csv
Updated: sub-S02_LessonLearned_anxiety_aligned.csv
Updated: sub-S10_Chatter_anxiety_aligned.csv
Updated: sub-S13_LessonLearned_anxiety_aligned.csv
Updated: sub-S27_LessonLearned_anxiety_aligned.csv
Updated: sub-S06_Chatter_anxiety_aligned.csv
Updated: sub-S15_Chatter_anxiety_aligned.csv
Updated: sub-S03_Chatter_anxiety_aligned.csv
Updated: sub-S21_Chatter_anxiety_aligned.csv
Updated: sub-S08_Chatter_anxiety_aligned.csv
Updated: sub-S29_LessonLearned_anxiety_aligned.csv
Updated: sub-S24_LessonLearned_anxiety_aligned.csv
Updated: sub-S24_Chatter_anxiety_aligned.csv
Updated: sub-S01_LessonLearned_anxiety_aligned.csv
Updated: sub-S32_Chatter_anxiety_aligned.csv
Updated: sub-S10_LessonLearned_anxiety_aligned

# Create the sliding windows 

We are going to create one CSV per subject that records where each window starts, the mean z-scored anxiety in that window, and the binned anxiety level. We will compute the window summaries using both the mean and the median.
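The windowing logic can be sketched as follows. This is only an illustration under stated assumptions: the real `create_windows` lives in `utils.sliding_windows` and is not shown in this notebook, so the function name `sketch_windows`, its arguments, and the toy trace are hypothetical.

```python
import numpy as np

def sketch_windows(anxiety, window_size=60, step_size=5):
    """Summarize a 1-D anxiety trace over sliding windows (illustrative only)."""
    anxiety = np.asarray(anxiety, dtype=float)
    rows = []
    # Only keep windows that fit entirely inside the trace
    for start in range(0, len(anxiety) - window_size + 1, step_size):
        chunk = anxiety[start:start + window_size]
        rows.append({
            "window_start_index": start,
            "mean_anxiety": float(chunk.mean()),
            "median_anxiety": float(np.median(chunk)),
        })
    return rows

# Toy trace of 100 samples (made up); yields (100 - 60) // 5 + 1 windows
toy_trace = np.linspace(-1.0, 1.0, 100)
rows = sketch_windows(toy_trace)
print(len(rows), rows[0], rows[-1])
```

With a window of 60 samples and a stride of 5, consecutive windows share 55 samples, so neighbouring labels are strongly correlated. That is worth keeping in mind when splitting windows into training and test sets.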

In [None]:
from utils.sliding_windows import create_windows
import os

window_size = 60
step_size = 5 

anxiety_dir = "anxiety_aligned_csv"
output_dir = "LessonLearned_sliding_windows_csv"
os.makedirs(output_dir, exist_ok=True)  # make sure the output folder exists before writing


for subj_num in range(1, 33):
    subj_id = f"sub-S{subj_num:02d}" 
    input_file = f"{subj_id}_LessonLearned_anxiety_aligned.csv"
    input_path = os.path.join(anxiety_dir, input_file)

    if not os.path.exists(input_path):
        print(f"Skipping {subj_id} — no file found.")
        continue

    output_path = os.path.join(output_dir, f"{subj_id}_anxiety_windows.csv")
    create_windows(
        anxiety_csv_path=input_path,
        output_csv_path=output_path,
        window_size=window_size,
        step_size=step_size
    )

DEBUG ROW: {'window_start_index': 72, 'window_start_TR': np.int64(72), 'mean_anxiety': np.float64(-0.16566439780381945), 'mean_scaled_anxiety': np.float64(0.2818266988075205), 'median_anxiety': np.float64(-0.37308565335937494), 'median_scaled_anxiety': np.float64(0.21893333805002047)}
DEBUG ROW: {'window_start_index': 77, 'window_start_TR': np.int64(77), 'mean_anxiety': np.float64(-0.1414749144704861), 'mean_scaled_anxiety': np.float64(0.2891613275452844), 'median_anxiety': np.float64(-0.36956448669270836), 'median_scaled_anxiety': np.float64(0.2200010107186715)}
DEBUG ROW: {'window_start_index': 82, 'window_start_TR': np.int64(82), 'mean_anxiety': np.float64(-0.22195144780381942), 'mean_scaled_anxiety': np.float64(0.264759587087732), 'median_anxiety': np.float64(-0.4329788200260416), 'median_scaled_anxiety': np.float64(0.2007727954962376)}
DEBUG ROW: {'window_start_index': 87, 'window_start_TR': np.int64(87), 'mean_anxiety': np.float64(-0.3024508700260417), 'mean_scaled_anxiety': np.f

Binning by the mean anxiety value seems to work best.
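For illustration, binning a window's mean scaled anxiety into low, medium, and high could look like the sketch below. The bin edges (equal thirds of the [0, 1] range) and the helper name `bin_scaled_anxiety` are assumptions; the edges actually used for the `binned_by_mean` column are defined in `create_windows` and are not shown here.

```python
import numpy as np

# Assumed bin edges: equal thirds of the min-max scaled range (hypothetical)
BIN_EDGES = (1 / 3, 2 / 3)
BIN_NAMES = ("low", "medium", "high")

def bin_scaled_anxiety(mean_scaled, edges=BIN_EDGES):
    """Map scaled anxiety values in [0, 1] to categorical bins (illustrative only)."""
    # np.digitize returns 0 below the first edge, 1 between the edges, 2 above the last
    idx = np.digitize(mean_scaled, edges)
    return [BIN_NAMES[i] for i in idx]

# Toy window means (made up for illustration)
labels = bin_scaled_anxiety([0.10, 0.28, 0.50, 0.90])
print(labels)
```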

# Calculating functional connectivity

### ROI Time Series

We are going to start with just LessonLearned because its windows are better spread across the low, medium, and high anxiety bins.

In [5]:
# Extract ROI time series for every subject 

import os
import numpy as np
from nilearn.maskers import NiftiLabelsMasker  # nilearn >= 0.9; older releases used nilearn.input_data
from nilearn import datasets

bold_dir = "LessonLearned_MNI_bold"    
output_dir = "roi_timeseries"
t_r = 1.2999999523162842  # repetition time in seconds (float32 representation of 1.3)
os.makedirs(output_dir, exist_ok=True)

# load schaefer atlas 
atlas = datasets.fetch_atlas_schaefer_2018(n_rois=100, yeo_networks=7)
atlas_img = atlas.maps

# setup masker 
masker = NiftiLabelsMasker(labels_img=atlas_img, standardize=True, t_r=t_r)

# loop through BOLD files 
for fname in os.listdir(bold_dir):
    if "space-MNI" in fname and fname.endswith((".nii", ".nii.gz")):
        subj = fname.split("_")[0] 
        bold_path = os.path.join(bold_dir, fname)
        out_path = os.path.join(output_dir, f"{subj}.npy")

        try:
            print(f"Extracting ROI time series for {subj}")
            roi_ts = masker.fit_transform(bold_path)
            np.save(out_path, roi_ts)
            print(f"Saved: {out_path} — shape: {roi_ts.shape}")
        except Exception as e:
            print(f"Error for {subj}: {e}")

Extracting ROI time series for sub-S01
Saved: roi_timeseries/sub-S01.npy — shape: (665, 100)
Extracting ROI time series for sub-S19
Saved: roi_timeseries/sub-S19.npy — shape: (668, 100)
Extracting ROI time series for sub-S23
Saved: roi_timeseries/sub-S23.npy — shape: (668, 100)
Extracting ROI time series for sub-S20
Saved: roi_timeseries/sub-S20.npy — shape: (668, 100)
Extracting ROI time series for sub-S22
Saved: roi_timeseries/sub-S22.npy — shape: (668, 100)
Extracting ROI time series for sub-S27
Saved: roi_timeseries/sub-S27.npy — shape: (670, 100)
Extracting ROI time series for sub-S02
Saved: roi_timeseries/sub-S02.npy — shape: (668, 100)
Extracting ROI time series for sub-S05
Saved: roi_timeseries/sub-S05.npy — shape: (670, 100)
Extracting ROI time series for sub-S17
Saved: roi_timeseries/sub-S17.npy — shape: (668, 100)
Extracting ROI time series for sub-S32
Saved: roi_timeseries/sub-S32.npy — shape: (668, 100)
Extracting ROI time series for sub-S03
Saved: roi_timeseries/sub-S03.n

### Connectivity Calculation and Label Creation

In [6]:
import os
import numpy as np
import pandas as pd
from nilearn.connectome import ConnectivityMeasure

roi_dir = "roi_timeseries"  
windows_dir = "LessonLearned_sliding_windows_csv" 
fc_output_dir = "fc_matrices"  
window_size = 60  
os.makedirs(fc_output_dir, exist_ok=True)

# Estimator that computes the functional connectivity (correlation) matrices
connectivity = ConnectivityMeasure(kind="correlation")

# loop through subjects 
for fname in os.listdir(roi_dir):
    if not fname.endswith(".npy"):
        continue

    subj = fname.replace(".npy", "") 
    roi_path = os.path.join(roi_dir, fname)
    window_csv = os.path.join(windows_dir, f"{subj}_anxiety_windows.csv")

    if not os.path.exists(window_csv):
        print(f"No window file for {subj}, skipping.")
        continue

    print(f"\nProcessing {subj}")

    roi_ts = np.load(roi_path)
    windows_df = pd.read_csv(window_csv)

    fc_matrices = []
    output_labels = []

    for _, row in windows_df.iterrows():
        start = int(row["window_start_index"])
        end = start + window_size

        if end > roi_ts.shape[0]:
            continue 

        window_ts = roi_ts[start:end]
        if window_ts.shape[0] != window_size:
            continue 

        # Compute FC matrix
        fc = connectivity.fit_transform([window_ts])[0]  # shape: (100, 100)
        fc_matrices.append(fc)

        # Keep label info for this window
        output_labels.append({
            "window_start_TR": row["window_start_TR"],
            "mean_scaled_anxiety": row["mean_scaled_anxiety"],
            "binned_by_mean": row["binned_by_mean"]
        })

    if len(fc_matrices) == 0:
        print(f"No valid windows for {subj}, skipping save.")
        continue

    np.save(os.path.join(fc_output_dir, f"{subj}_fc.npy"), np.array(fc_matrices))

    labels_df = pd.DataFrame(output_labels)
    labels_df.to_csv(os.path.join(fc_output_dir, f"{subj}_labels.csv"), index=False)

    print(f"Saved: {subj}_fc.npy ({len(fc_matrices)} FCs) and {subj}_labels.csv")


Processing sub-S32
Saved: sub-S32_fc.npy (91 FCs) and sub-S32_labels.csv

Processing sub-S26
Saved: sub-S26_fc.npy (91 FCs) and sub-S26_labels.csv

Processing sub-S27
Saved: sub-S27_fc.npy (91 FCs) and sub-S27_labels.csv

Processing sub-S25
Saved: sub-S25_fc.npy (91 FCs) and sub-S25_labels.csv

Processing sub-S31
Saved: sub-S31_fc.npy (91 FCs) and sub-S31_labels.csv

Processing sub-S19
Saved: sub-S19_fc.npy (91 FCs) and sub-S19_labels.csv

Processing sub-S30
Saved: sub-S30_fc.npy (91 FCs) and sub-S30_labels.csv

Processing sub-S24
Saved: sub-S24_fc.npy (91 FCs) and sub-S24_labels.csv

Processing sub-S08
Saved: sub-S08_fc.npy (91 FCs) and sub-S08_labels.csv

Processing sub-S20
Saved: sub-S20_fc.npy (91 FCs) and sub-S20_labels.csv

Processing sub-S21
Saved: sub-S21_fc.npy (91 FCs) and sub-S21_labels.csv

Processing sub-S09
Saved: sub-S09_fc.npy (91 FCs) and sub-S09_labels.csv

Processing sub-S23
Saved: sub-S23_fc.npy (91 FCs) and sub-S23_labels.csv

Processing sub-S22
Saved: sub-S22_fc.
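As a sanity check on the shapes produced above, the per-window correlation step can be sketched with plain NumPy on synthetic data. This is not the notebook's method: `np.corrcoef` computes an ordinary Pearson correlation, whereas nilearn's `ConnectivityMeasure` uses a Ledoit-Wolf shrunk covariance estimator by default, so the values will differ slightly. The helper name `windowed_fc` and the random time series are assumptions.

```python
import numpy as np

def windowed_fc(roi_ts, starts, window_size=60):
    """Stack one ROI-by-ROI correlation matrix per valid window (illustrative only)."""
    mats = []
    for start in starts:
        end = start + window_size
        if end > roi_ts.shape[0]:
            continue  # skip windows that run past the end of the scan
        # rowvar=False treats columns (ROIs) as variables and rows (TRs) as observations
        mats.append(np.corrcoef(roi_ts[start:end], rowvar=False))
    return np.array(mats)

# Synthetic time series: 200 TRs by 10 ROIs (made up for illustration)
rng = np.random.default_rng(0)
toy_ts = rng.standard_normal((200, 10))
fc = windowed_fc(toy_ts, starts=range(0, 200, 5))
print(fc.shape)  # (n_windows, n_rois, n_rois)
```

Each matrix is symmetric with ones on the diagonal, so only the upper triangle carries independent information per window.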

In [7]:
# Add a binary label column: scaled anxiety below 0.5 is "low", 0.5 and above is "high"

import os 
import pandas as pd 

label_dir = "fc_matrices"

for fname in os.listdir(label_dir):
    if fname.endswith("_labels.csv"):
        fpath = os.path.join(label_dir, fname)
        df = pd.read_csv(fpath)

        # new bin column for low/high binary classification 
        df["binned_0.5"] = df["mean_scaled_anxiety"].apply(lambda x: "low" if x < 0.5 else "high")

        # save file 
        df.to_csv(fpath, index=False)
        print(f"Updated {fname}")


Updated sub-S05_labels.csv
Updated sub-S32_labels.csv
Updated sub-S19_labels.csv
Updated sub-S08_labels.csv
Updated sub-S23_labels.csv
Updated sub-S14_labels.csv
Updated sub-S11_labels.csv
Updated sub-S26_labels.csv
Updated sub-S30_labels.csv
Updated sub-S07_labels.csv
Updated sub-S16_labels.csv
Updated sub-S21_labels.csv
Updated sub-S24_labels.csv
Updated sub-S13_labels.csv
Updated sub-S29_labels.csv
Updated sub-S02_labels.csv
Updated sub-S10_labels.csv
Updated sub-S27_labels.csv
Updated sub-S01_labels.csv
Updated sub-S04_labels.csv
Updated sub-S09_labels.csv
Updated sub-S22_labels.csv
Updated sub-S15_labels.csv
Updated sub-S25_labels.csv
Updated sub-S28_labels.csv
Updated sub-S03_labels.csv
Updated sub-S31_labels.csv
Updated sub-S06_labels.csv
Updated sub-S17_labels.csv
Updated sub-S20_labels.csv
