Multimodal fMRI Brain Encoding — predict cortical surface activity from text, audio, and video.
NForge predicts human fMRI responses to naturalistic multimodal stimuli, enabling research into how the brain integrates language, sound, and vision simultaneously.
The brain doesn't process language, sound, and video in isolation — it integrates them into a unified perceptual experience. NForge models this process by:
- Accepting any combination of text, audio, or video as input
- Extracting deep multimodal features using state-of-the-art foundation models (LLaMA 3.2, V-JEPA2, Wav2Vec-BERT)
- Predicting cortical surface fMRI responses via a Transformer-based encoding model
- Projecting predictions onto the fsaverage5 brain mesh (~10,000 vertices per hemisphere) for interpretable visualisation
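The encoding idea behind these steps can be illustrated with a toy stand-in (NumPy only; the names and shapes here are illustrative, not NForge's API): stimulus features at each time step are mapped to per-vertex cortical responses.

```python
import numpy as np

rng = np.random.default_rng(0)

n_timesteps, n_features, n_vertices = 50, 16, 20484  # fsaverage5, bilateral
features = rng.standard_normal((n_timesteps, n_features))      # stimulus features per TR
weights = rng.standard_normal((n_features, n_vertices)) * 0.1  # learned encoding weights

# A linear encoding model: each vertex's response is a weighted sum of stimulus features
preds = features @ weights
print(preds.shape)  # (50, 20484)
```

NForge replaces the random features with foundation-model embeddings and the linear map with a Transformer, but the input/output contract is the same.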
| Feature | TRIBE v2 | NForge |
|---|---|---|
| Package layout | Flat module | src/ layout with subpackages |
| ROI attention maps | ✗ | ✓ Which brain regions attend to which moments |
| Real-time streaming | ✗ | ✓ Sliding-window prediction from live feature streams |
| Modality attribution | ✗ | ✓ Per-vertex text / audio / video importance scores |
| Cross-subject generalisation | ✗ | ✓ Few-shot subject adaptation via ridge regression |
| torch.compile support | ✗ | ✓ Optional backbone compilation for faster training |
| Memory management | Basic | Explicit GC after each study load |
| Test coverage | ✗ | ✓ Unit tests for model, inference, and streaming |
```bash
# Core (inference only)
pip install nforge

# With training support (PyTorch Lightning, WandB)
pip install "nforge[training]"

# With brain visualisation (nilearn, PyVista)
pip install "nforge[plotting]"

# With streaming support
pip install "nforge[streaming]"

# Everything
pip install "nforge[training,plotting,streaming,attribution]"

# Development
pip install "nforge[dev]"
```

```python
from nforge import NForgeModel

# Load a pretrained model from the HuggingFace Hub or a local checkpoint directory
model = NForgeModel.from_pretrained(
    "facebook/tribev2",            # or "/path/to/local/checkpoint"
    cache_folder="./nforge_cache",
    device="auto",                 # "cuda" if available, else "cpu"
)

# Build events from any of: text, audio, or video
events = model.get_events_dataframe(video_path="movie_clip.mp4")
# events = model.get_events_dataframe(audio_path="podcast.wav")
# events = model.get_events_dataframe(text_path="story.txt")

# Predict fMRI responses
preds, segments = model.predict(events)
# preds: np.ndarray of shape (n_segments, n_vertices)
```

Understand which temporal windows most strongly drove each brain region:
```python
import torch

from nforge.core.attention import AttentionExtractor, attention_to_roi_scores
from nforge.data.loader import get_hcp_labels
from nforge.viz.roi_maps import plot_roi_attention

roi_indices = get_hcp_labels(mesh="fsaverage5")  # HCP MMP1.0 parcellation

loader = model.data.get_loaders(events=events, split_to_build="all")["all"]
batch = next(iter(loader)).to(model._model.device)

with torch.inference_mode():
    _, attn_maps = model._model(batch, return_attn=True)

roi_scores = attention_to_roi_scores(attn_maps, roi_indices)
# roi_scores: {"V1": np.ndarray(T,), "MT+": ..., ...}

fig = plot_roi_attention(roi_scores, mesh="fsaverage5", views=["left", "right"])
fig.savefig("roi_attention.png")
```

Run predictions from a live feature stream without pre-loading the full clip:
```python
from nforge.inference.streaming import StreamingPredictor

sp = StreamingPredictor.from_nforge_model(
    model,
    window_trs=40,   # context window length
    step_trs=1,      # emit every TR
    tr_seconds=1.0,
    device="cuda",
)

# Push pre-extracted feature tensors one TR at a time
for tr_features in my_live_extractor():
    # tr_features: {"audio": tensor(n_layers, D), "video": ..., "text": ...}
    pred = sp.push_frame(features=tr_features)
    if pred is not None:
        # pred: np.ndarray of shape (n_vertices,) — current TR's cortical activity
        visualise_brain(pred)

# Flush remaining predictions
final_preds = sp.flush()
```

Note: streaming operates at the feature level. The caller must supply feature tensors already produced by the upstream extractor models (e.g. Wav2Vec2, V-JEPA2, LLaMA).
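Since `my_live_extractor` is user-supplied, here is a minimal stand-in that yields one feature dict per TR (NumPy arrays as placeholders for the pre-extracted tensors; layer counts and dimensions are illustrative):

```python
import numpy as np

def my_live_extractor(n_trs=5, dim=64):
    """Yield one pre-extracted feature dict per TR (toy stand-in)."""
    rng = np.random.default_rng(0)
    for _ in range(n_trs):
        yield {
            "text":  rng.standard_normal((6, dim)),  # e.g. 6 text layers
            "audio": rng.standard_normal((2, dim)),  # e.g. 2 audio layers
            "video": rng.standard_normal((2, dim)),  # e.g. 2 video layers
        }

frames = list(my_live_extractor())
print(len(frames), sorted(frames[0]))  # 5 ['audio', 'text', 'video']
```

In practice each dict would come from running the extractor backbones on the most recent second of input rather than from a random generator.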
Find out how much text, audio, and video each contributed to predictions at each vertex:
```python
from nforge.inference.attribution import ModalityAttributor
from nforge.data.loader import get_hcp_labels

roi_indices = get_hcp_labels(mesh="fsaverage5")

attributor = ModalityAttributor(
    model._model,
    method="ablation",  # or "gradient" for integrated gradients
    roi_indices=roi_indices,
)

scores = attributor.attribute(batch)
# scores["text"]:  np.ndarray(n_vertices,) — text importance per vertex
# scores["audio"]: np.ndarray(n_vertices,)
# scores["video"]: np.ndarray(n_vertices,)
# scores["text_roi"]: {"V1": 0.42, "MT+": 0.18, ...} — ROI summaries

print("Top text-driven vertices:", scores["text"].argsort()[-5:])
```

Methods:
- `"ablation"` — compares predictions with each modality zeroed out. Fast and intuitive.
- `"gradient"` — integrated gradients over 5 interpolation steps. More faithful.
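The zero-ablation idea can be sketched with a toy model (everything here is illustrative, not the `ModalityAttributor` internals): predict with all modalities present, re-predict with one modality zeroed, and take the per-vertex absolute change as that modality's importance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices = 8
feats = {m: rng.standard_normal(4) for m in ("text", "audio", "video")}
W = {m: rng.standard_normal((4, n_vertices)) for m in feats}

def toy_predict(feats):
    # Toy "model": sum of per-modality linear projections
    return sum(feats[m] @ W[m] for m in W)

full = toy_predict(feats)
importance = {}
for m in feats:
    ablated = dict(feats, **{m: np.zeros_like(feats[m])})  # zero out one modality
    importance[m] = np.abs(full - toy_predict(ablated))    # per-vertex change

print({m: v.shape for m, v in importance.items()})
```

Vertices whose predictions barely move when a modality is removed get near-zero importance for it, which is exactly the signal the per-vertex scores summarise.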
Adapt the model to a new, unseen subject from a small calibration set:
```python
from nforge.core.subject import SubjectAdapter

# Option 1: Ridge regression (recommended) — fits a new predictor head
adapter = SubjectAdapter.from_ridge(
    model=model._model,
    calibration_loader=calibration_loader,  # DataLoader with new-subject fMRI
    regularization=1e-3,
    device="cuda",
)
new_subject_id = adapter.inject_into_model(model._model)

# Option 2: Nearest-neighbour (zero-shot, no fitting)
adapter = SubjectAdapter.from_nearest_neighbor(
    model=model._model,
    calibration_loader=calibration_loader,
)
new_subject_id = adapter.inject_into_model(model._model)

print(f"New subject registered as subject_id = {new_subject_id}")
```

```
Input stimuli (text / audio / video)
        │
        ▼
Foundation model extractors
  ├── Text:  LLaMA 3.2-3B  (layers: 0, 0.2, 0.4, 0.6, 0.8, 1.0)
  ├── Audio: Wav2Vec-BERT  (layers: 0.75, 1.0)
  └── Video: V-JEPA2-ViT-G (layers: 0.75, 1.0)
        │
        ▼
Per-modality MLP projectors
  → concatenate / sum / stack (configurable)
        │
        ▼
Combiner MLP (optional)
        │
        ▼
Temporal positional embeddings
        │
        ▼
Transformer Encoder (8 layers, self-attention over time)
        │
        ▼
Subject-specific linear head (SubjectLayers)
        │
        ▼
AdaptiveAvgPool1d → output_timesteps
        │
        ▼
Cortical surface predictions (fsaverage5, ~20k vertices bilateral)
```
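The projector-and-combine stage in the diagram can be sketched in NumPy (all shapes, layer counts, and the choice of concatenation are illustrative): each modality's stacked layer features are flattened, projected to a shared width, then concatenated over modalities before the temporal Transformer.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, hidden = 10, 32, 16
layers = {"text": 6, "audio": 2, "video": 2}

# Pre-extracted features: (T, n_layers, D) per modality
feats = {m: rng.standard_normal((T, n, D)) for m, n in layers.items()}
# One linear projector per modality: (n_layers * D) -> hidden
proj = {m: rng.standard_normal((n * D, hidden)) * 0.1 for m, n in layers.items()}

projected = [feats[m].reshape(T, -1) @ proj[m] for m in feats]
combined = np.concatenate(projected, axis=-1)  # (T, 3 * hidden): input to the Transformer
print(combined.shape)  # (10, 48)
```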
Key design choices:
- Layer-wise aggregation: features from multiple Transformer depth levels are concatenated and jointly projected, capturing both low-level and high-level representations.
- Subject layers: each training subject has its own linear prediction head, capturing individual anatomical differences.
- Hemodynamic offset: fMRI features are offset by 5 TRs (~5 s) to account for the haemodynamic response function.
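The 5-TR hemodynamic offset amounts to pairing each stimulus frame with the fMRI sample recorded roughly 5 s later. A minimal NumPy sketch of this alignment (array names and shapes are illustrative):

```python
import numpy as np

offset_trs = 5
stimulus_feats = np.arange(20)[:, None].repeat(3, axis=1)  # (T, D) toy stimulus features
fmri = np.arange(20)[:, None].repeat(4, axis=1)            # (T, V) toy fMRI responses

# The stimulus at TR t is paired with the fMRI sample at TR t + offset_trs
X = stimulus_feats[:-offset_trs]  # drop the last 5 stimulus frames
Y = fmri[offset_trs:]             # drop the first 5 fMRI frames
print(X.shape, Y.shape)  # (15, 3) (15, 4)
```

With a TR of ~1 s this places the model's targets near the peak of the haemodynamic response.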
- SLURM cluster with GPU access
- Datasets downloaded and accessible at `$DATAPATH`
- Output directory at `$SAVEPATH`

```bash
export DATAPATH=/path/to/neuroimaging/data
export SAVEPATH=/path/to/output
export SLURM_PARTITION=your_gpu_partition
export WANDB_ENTITY=your_wandb_entity  # optional
```

| Dataset | Subjects | Stimuli | TR (s) |
|---|---|---|---|
| Algonauts2025Bold | 4 | TV sitcom "Friends" + movies | 1.49 |
| Wen2017 | 3 | Short videos (11.7 s) | ~2 |
| Lahner2024Bold | 10 | Short videos (6.2 s) | ~2 |
| Lebel2023Bold | 8 | Spoken narrative (6–18 s) | ~2 |
```bash
# Quick local test (3 epochs, 3 timelines, no cluster)
python -m nforge.configs.experiments.test_run

# Full cortical training on SLURM
python -m nforge.configs.experiments.cortical
```

```
nforge/
├── src/nforge/
│   ├── core/
│   │   ├── model.py        # FmriEncoder config + FmriEncoderModel
│   │   ├── attention.py    # AttentionExtractor + ROI attention scores
│   │   └── subject.py      # SubjectAdapter for cross-subject generalisation
│   ├── data/
│   │   ├── loader.py       # MultiStudyLoader + HCP ROI utilities
│   │   ├── transforms.py   # Event transforms
│   │   ├── fmri_utils.py   # Template spaces + NforgeSurfaceProjector
│   │   └── studies/        # Algonauts2025, Wen2017, Lahner2024, Lebel2023
│   ├── training/
│   │   ├── experiment.py   # NForgeExperiment + Data config
│   │   ├── module.py       # BrainModule (PyTorch Lightning)
│   │   └── losses.py       # PearsonLoss, WeightedMSELoss
│   ├── inference/
│   │   ├── predictor.py    # NForgeModel (from_pretrained / predict)
│   │   ├── streaming.py    # StreamingPredictor
│   │   └── attribution.py  # ModalityAttributor
│   ├── viz/
│   │   ├── cortical.py     # Nilearn cortical surface rendering
│   │   ├── subcortical.py  # Subcortical structure visualisation
│   │   └── roi_maps.py     # ROI attention map rendering
│   └── configs/
│       ├── defaults.py     # Default experiment configuration
│       └── experiments/    # test_run, cortical scripts
├── tests/
│   ├── test_model.py
│   ├── test_inference.py
│   └── test_streaming.py
├── examples/
│   └── quick_start.py
└── pyproject.toml
```
NForge builds on TRIBE v2. If you use it in research, please cite:
```bibtex
@article{dascoli2026tribe,
  title   = {A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience},
  author  = {d'Ascoli, Stéphane and others},
  journal = {arXiv},
  year    = {2026},
  url     = {https://arxiv.org/abs/2502.06808}
}
```