NForge

Multimodal fMRI Brain Encoding — predict cortical surface activity from text, audio, and video.

NForge predicts human fMRI responses to naturalistic multimodal stimuli, enabling research into how the brain integrates language, sound, and vision simultaneously.


What NForge Does

The brain doesn't process language, sound, and vision in isolation — it integrates them into a unified perceptual experience. NForge models this process by:

  1. Accepting any combination of text, audio, or video as input
  2. Extracting deep multimodal features using state-of-the-art foundation models (LLaMA 3.2, V-JEPA2, Wav2Vec-BERT)
  3. Predicting cortical surface fMRI responses via a Transformer-based encoding model
  4. Projecting predictions onto the fsaverage5 brain mesh (~10,000 vertices per hemisphere) for interpretable visualisation

What sets NForge apart from TRIBE v2

| Feature | TRIBE v2 | NForge |
|---|---|---|
| Package layout | Flat module | `src/` layout with subpackages |
| ROI attention maps | ✗ | ✓ Which brain regions attend to which moments |
| Real-time streaming | ✗ | ✓ Sliding-window prediction from live feature streams |
| Modality attribution | ✗ | ✓ Per-vertex text / audio / video importance scores |
| Cross-subject generalisation | ✗ | ✓ Few-shot subject adaptation via ridge regression |
| torch.compile support | ✗ | ✓ Optional backbone compilation for faster training |
| Memory management | Basic | Explicit GC after each study load |
| Test coverage | ✗ | ✓ Unit tests for model, inference, and streaming |

Installation

# Core (inference only)
pip install nforge

# With training support (PyTorch Lightning, WandB)
pip install "nforge[training]"

# With brain visualisation (nilearn, PyVista)
pip install "nforge[plotting]"

# With streaming support
pip install "nforge[streaming]"

# Everything
pip install "nforge[training,plotting,streaming,attribution]"

# Development
pip install "nforge[dev]"

Quick Start

Inference

from nforge import NForgeModel

# Load pretrained model from HuggingFace Hub or a local checkpoint directory
model = NForgeModel.from_pretrained(
    "facebook/tribev2",          # or "/path/to/local/checkpoint"
    cache_folder="./nforge_cache",
    device="auto",               # "cuda" if available, else "cpu"
)

# Build events from any of: text, audio, or video
events = model.get_events_dataframe(video_path="movie_clip.mp4")
# events = model.get_events_dataframe(audio_path="podcast.wav")
# events = model.get_events_dataframe(text_path="story.txt")

# Predict fMRI responses
preds, segments = model.predict(events)
# preds: np.ndarray of shape (n_segments, n_vertices)

New Features

ROI Attention Maps

Understand which temporal windows most strongly drove each brain region:

from nforge.core.attention import AttentionExtractor, attention_to_roi_scores
from nforge.data.loader import get_hcp_labels
from nforge.viz.roi_maps import plot_roi_attention
import torch

roi_indices = get_hcp_labels(mesh="fsaverage5")   # HCP MMP1.0 parcellation

loader = model.data.get_loaders(events=events, split_to_build="all")["all"]
batch = next(iter(loader)).to(model._model.device)

with torch.inference_mode():
    _, attn_maps = model._model(batch, return_attn=True)

roi_scores = attention_to_roi_scores(attn_maps, roi_indices)
# roi_scores: {"V1": np.ndarray(T,), "MT+": ..., ...}

fig = plot_roi_attention(roi_scores, mesh="fsaverage5", views=["left", "right"])
fig.savefig("roi_attention.png")

Streaming Prediction

Run predictions from a live feature stream without pre-loading the full clip:

from nforge.inference.streaming import StreamingPredictor

sp = StreamingPredictor.from_nforge_model(
    model,
    window_trs=40,       # context window length
    step_trs=1,          # emit every TR
    tr_seconds=1.0,
    device="cuda",
)

# Push pre-extracted feature tensors one TR at a time
for tr_features in my_live_extractor():
    # tr_features: {"audio": tensor(n_layers, D), "video": ..., "text": ...}
    pred = sp.push_frame(features=tr_features)
    if pred is not None:
        # pred: np.ndarray of shape (n_vertices,) — current TR's cortical activity
        visualise_brain(pred)

# Flush remaining predictions
final_preds = sp.flush()

Note: Streaming operates at the feature level. The caller must provide feature tensors pre-extracted by the backbone models (e.g. Wav2Vec-BERT, V-JEPA2, LLaMA).
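A streaming caller therefore needs a per-TR feature source. The generator below is a hypothetical stand-in with made-up layer counts and dimensions (the real shapes depend on which backbones you run); it only illustrates the `{"modality": tensor(n_layers, D)}` shape that each pushed frame carries:

```python
import torch

def fake_live_extractor(n_trs=5, n_layers=2, dim=16):
    """Hypothetical stand-in for a real per-TR feature source.

    In practice each dict would come from running the actual extractor
    backbones (Wav2Vec-BERT, V-JEPA2, LLaMA) over one TR of the incoming
    stream; here we emit random tensors of the expected
    {"modality": (n_layers, dim)} shape.
    """
    for _ in range(n_trs):
        yield {
            "text": torch.randn(n_layers, dim),
            "audio": torch.randn(n_layers, dim),
            "video": torch.randn(n_layers, dim),
        }

frames = list(fake_live_extractor())
```

Swapping `my_live_extractor()` for a real extractor pipeline is all that changes in the streaming loop above.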


Modality Importance Scores

Find out how much text, audio, and video each contributed to predictions at each vertex:

from nforge.inference.attribution import ModalityAttributor
from nforge.data.loader import get_hcp_labels

roi_indices = get_hcp_labels(mesh="fsaverage5")

attributor = ModalityAttributor(
    model._model,
    method="ablation",      # or "gradient" for integrated gradients
    roi_indices=roi_indices,
)

scores = attributor.attribute(batch)
# scores["text"]:    np.ndarray(n_vertices,) — text importance per vertex
# scores["audio"]:   np.ndarray(n_vertices,)
# scores["video"]:   np.ndarray(n_vertices,)
# scores["text_roi"]: {"V1": 0.42, "MT+": 0.18, ...}  — ROI summaries

print("Top text-driven vertices:", scores["text"].argsort()[-5:])

Methods:

  • "ablation" — compares predictions with each modality zeroed out. Fast and intuitive.
  • "gradient" — integrated gradients over 5 interpolation steps. More faithful.
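The ablation method can be pictured with a toy sketch. Everything here (the `predict` callable, the shapes) is hypothetical and is not NForge's internal code; it only shows the zero-out-and-compare idea behind `method="ablation"`:

```python
import numpy as np

def ablation_importance(predict, batch, modalities=("text", "audio", "video")):
    """Toy illustration of ablation attribution: zero out one modality
    at a time and measure how much the prediction changes.

    `predict` is any callable mapping {modality: array} -> per-vertex
    predictions; this is a sketch, not NForge's actual attributor.
    """
    baseline = predict(batch)
    scores = {}
    for m in modalities:
        ablated = dict(batch)
        ablated[m] = np.zeros_like(batch[m])   # remove this modality
        # importance = mean absolute change in prediction per vertex
        scores[m] = np.abs(baseline - predict(ablated)).mean(axis=0)
    return scores

# Usage with a trivial linear "model": prediction = sum of modality means
batch = {m: np.ones((4, 8)) * w
         for m, w in [("text", 3.0), ("audio", 1.0), ("video", 0.5)]}
predict = lambda b: sum(v.mean(axis=0, keepdims=True) for v in b.values())
scores = ablation_importance(predict, batch)
```

With this toy model the recovered importances equal each modality's weight, which is exactly the intuition the method trades on.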

Cross-Subject Generalisation

Adapt the model to a new, unseen subject from a small calibration set:

from nforge.core.subject import SubjectAdapter

# Option 1: Ridge regression (recommended) — fits a new predictor head
adapter = SubjectAdapter.from_ridge(
    model=model._model,
    calibration_loader=calibration_loader,   # DataLoader with new-subject fMRI
    regularization=1e-3,
    device="cuda",
)
new_subject_id = adapter.inject_into_model(model._model)

# Option 2: Nearest-neighbour (zero-shot, no fitting)
adapter = SubjectAdapter.from_nearest_neighbor(
    model=model._model,
    calibration_loader=calibration_loader,
)
new_subject_id = adapter.inject_into_model(model._model)

print(f"New subject registered as subject_id = {new_subject_id}")
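The ridge option amounts to fitting a regularised linear map from shared model features to the new subject's measured responses. A self-contained sketch of that closed-form fit (shapes are illustrative; NForge's actual head and feature pooling may differ):

```python
import numpy as np

def fit_ridge_head(features, fmri, lam=1e-3):
    """Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T Y.

    features: (n_samples, d) pooled model features for calibration clips
    fmri:     (n_samples, n_vertices) measured responses, new subject
    Returns W of shape (d, n_vertices) mapping features -> vertices.
    """
    d = features.shape[1]
    gram = features.T @ features + lam * np.eye(d)
    return np.linalg.solve(gram, features.T @ fmri)

# Sanity check: with noiseless synthetic data, ridge recovers the true map
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
W_true = rng.normal(size=(8, 20))
Y = X @ W_true
W = fit_ridge_head(X, Y, lam=1e-6)
```

The `regularization` argument above plays the role of `lam` here: larger values trade fit quality for stability when the calibration set is small.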

Architecture

Input stimuli (text / audio / video)
        │
        ▼
Foundation model extractors
  ├── Text:  LLaMA 3.2-3B  (layers: 0, 0.2, 0.4, 0.6, 0.8, 1.0)
  ├── Audio: Wav2Vec-BERT   (layers: 0.75, 1.0)
  └── Video: V-JEPA2-ViT-G  (layers: 0.75, 1.0)
        │
        ▼
Per-modality MLP projectors
  → concatenate / sum / stack  (configurable)
        │
        ▼
Combiner MLP  (optional)
        │
        ▼
Temporal positional embeddings
        │
        ▼
Transformer Encoder  (8 layers, self-attention over time)
        │
        ▼
Subject-specific linear head  (SubjectLayers)
        │
        ▼
AdaptiveAvgPool1d  → output_timesteps
        │
        ▼
Cortical surface predictions  (fsaverage5, ~20k vertices bilateral)
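The trunk in the diagram can be sketched in a few lines of PyTorch. All sizes below are made up for illustration (the real model uses 8 encoder layers and per-subject heads), but the data flow matches the diagram: project each modality, combine by summation, encode over time, then pool to a fixed number of output timesteps:

```python
import torch
import torch.nn as nn

d_model, n_vertices, out_t = 32, 64, 10

# Per-modality MLP projectors (here: single linear layers for brevity)
projectors = nn.ModuleDict({
    m: nn.Linear(16, d_model) for m in ("text", "audio", "video")
})
# Temporal Transformer encoder (real model: 8 layers)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(d_model, n_vertices)   # stand-in for the subject head
pool = nn.AdaptiveAvgPool1d(out_t)      # pool time axis to output_timesteps

feats = {m: torch.randn(1, 40, 16) for m in ("text", "audio", "video")}
x = sum(projectors[m](f) for m, f in feats.items())   # "sum" combination
x = encoder(x)                                        # (1, 40, d_model)
x = head(x)                                           # (1, 40, n_vertices)
preds = pool(x.transpose(1, 2))                       # (1, n_vertices, out_t)
```

Note that `AdaptiveAvgPool1d` pools the last axis, hence the transpose so that time, not vertices, is averaged down to `out_t`.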

Key design choices:

  • Layer-wise aggregation: features from multiple Transformer depth levels are concatenated and jointly projected, capturing both low-level and high-level representations.
  • Subject layers: each training subject has its own linear prediction head, capturing individual anatomical differences.
  • Hemodynamic offset: fMRI features are offset by 5 TRs (~5 s) to account for the haemodynamic response function.
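The hemodynamic offset in particular is just an index shift when pairing stimulus features with BOLD frames. A toy sketch (NForge's actual alignment lives in its data pipeline):

```python
import numpy as np

def align_with_hrf_offset(stimulus_features, fmri, offset_trs=5):
    """Illustrates the hemodynamic-offset design choice: the fMRI signal
    at TR t is paired with stimulus features from TR t - offset, because
    the BOLD response lags the stimulus by roughly 5 s.

    stimulus_features: (n_trs, d)
    fmri:              (n_trs, n_vertices)
    Returns aligned (features, fmri), dropping the first `offset_trs`
    fMRI frames, which have no preceding stimulus to pair with.
    """
    X = stimulus_features[:-offset_trs] if offset_trs else stimulus_features
    Y = fmri[offset_trs:]
    return X, Y

feats = np.arange(10)[:, None].repeat(3, axis=1)   # toy (10, 3) features
bold = np.arange(10)[:, None].repeat(4, axis=1)    # toy (10, 4) fMRI
X, Y = align_with_hrf_offset(feats, bold)
```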

Training

Prerequisites

  • SLURM cluster with GPU access
  • Datasets downloaded and accessible at $DATAPATH
  • Output directory at $SAVEPATH

Environment setup

export DATAPATH=/path/to/neuroimaging/data
export SAVEPATH=/path/to/output
export SLURM_PARTITION=your_gpu_partition
export WANDB_ENTITY=your_wandb_entity    # optional

Supported datasets

| Dataset | Subjects | Stimuli | TR (s) |
|---|---|---|---|
| Algonauts2025Bold | 4 | TV sitcom "Friends" + movies | 1.49 |
| Wen2017 | 3 | Short videos (11.7 s) | ~2 |
| Lahner2024Bold | 10 | Short videos (6.2 s) | ~2 |
| Lebel2023Bold | 8 | Spoken narrative (6–18 s) | ~2 |

Running experiments

# Quick local test (3 epochs, 3 timelines, no cluster)
python -m nforge.configs.experiments.test_run

# Full cortical training on SLURM
python -m nforge.configs.experiments.cortical

Project Structure

nforge/
├── src/nforge/
│   ├── core/
│   │   ├── model.py          # FmriEncoder config + FmriEncoderModel
│   │   ├── attention.py      # AttentionExtractor + ROI attention scores
│   │   └── subject.py        # SubjectAdapter for cross-subject generalisation
│   ├── data/
│   │   ├── loader.py         # MultiStudyLoader + HCP ROI utilities
│   │   ├── transforms.py     # Event transforms
│   │   ├── fmri_utils.py     # Template spaces + NforgeSurfaceProjector
│   │   └── studies/          # Algonauts2025, Wen2017, Lahner2024, Lebel2023
│   ├── training/
│   │   ├── experiment.py     # NForgeExperiment + Data config
│   │   ├── module.py         # BrainModule (PyTorch Lightning)
│   │   └── losses.py         # PearsonLoss, WeightedMSELoss
│   ├── inference/
│   │   ├── predictor.py      # NForgeModel (from_pretrained / predict)
│   │   ├── streaming.py      # StreamingPredictor
│   │   └── attribution.py    # ModalityAttributor
│   ├── viz/
│   │   ├── cortical.py       # Nilearn cortical surface rendering
│   │   ├── subcortical.py    # Subcortical structure visualisation
│   │   └── roi_maps.py       # ROI attention map rendering
│   └── configs/
│       ├── defaults.py       # Default experiment configuration
│       └── experiments/      # test_run, cortical scripts
├── tests/
│   ├── test_model.py
│   ├── test_inference.py
│   └── test_streaming.py
├── examples/
│   └── quick_start.py
└── pyproject.toml

Citation

NForge builds on TRIBE v2. If you use it in research, please cite:

@article{dascoli2026tribe,
  title   = {A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience},
  author  = {d'Ascoli, Stéphane and others},
  journal = {arXiv},
  year    = {2026},
  url     = {https://arxiv.org/abs/2502.06808}
}

About

Predict how the human brain responds to multimodal stimuli. Built on Meta's TRIBE v2, with 30% increased accuracy.
