From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models

Description

This repository contains the official implementation of the ACL 2026 Findings paper:

From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models
Parts of our code are adapted from V-SEAM and Logit Lens. Thanks to the authors for their great work!


We introduce HONES, a Head-Oriented Neuron Explanation and Steering framework for causal interpretability in multi-task Vision–Language Models (VLMs). HONES identifies task-critical FFN neurons by ranking their causal write-in contributions conditioned on task-relevant attention heads, and further supports lightweight neuron-level steering for task improvement. Through large-scale analysis, we find that task-critical neurons show distinct layer preferences across tasks, while shared neurons—particularly those overlapping with VQA—play a prominent role in cross-task generalization. Building on these insights, we develop a sparse neuron scaling method that steers key neurons, leading to consistent performance gains on both LLaVA and Qwen across four diverse multimodal tasks. The figure below demonstrates our framework.

HONES Framework

Getting Started

Requirements

  • Python 3.11.10
  • PyTorch 2.2.1+cu121 (CUDA 12.1)

To install all dependencies:

pip install -r requirements.txt

Critical Head Localization

This step identifies task-critical attention heads through causal head intervention. These heads are used as routing signals for downstream neuron attribution.

For LLaVA:

python scripts/run_localization.py \
  --model llava \
  --task vqa \
  --config configs/llava_hones.yaml

For Qwen2.5-VL:

python scripts/run_localization.py \
  --model qwen \
  --task vqa \
  --config configs/qwen_hones.yaml
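Conceptually, causal head localization scores each attention head by how much the answer logit drops when that head's write-in to the residual stream is removed. A minimal NumPy sketch of this scoring idea (the function name and the zero-ablation choice are illustrative assumptions, not the repository's exact implementation):

```python
import numpy as np

def head_ablation_scores(head_outputs, readout):
    """Score each attention head by the drop in the readout logit when
    that head's contribution to the residual stream is zeroed out.

    head_outputs: (n_heads, d_model) per-head write-ins at the answer position.
    readout: (d_model,) unembedding direction of the correct answer token.
    """
    residual = head_outputs.sum(axis=0)
    full_logit = residual @ readout
    scores = np.empty(len(head_outputs))
    for h in range(len(head_outputs)):
        ablated_logit = (residual - head_outputs[h]) @ readout
        scores[h] = full_logit - ablated_logit  # logit drop caused by head h
    return scores
```

Heads with the largest logit drops are treated as task-critical and passed to the neuron-attribution stage as routing signals.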

Head-Conditioned Neuron Attribution

This step ranks FFN neurons by their readout-aligned write-in contribution conditioned on the localized task-critical heads.

python scripts/run_localization.py \
  --model llava \
  --task vqa \
  --config configs/llava_hones.yaml \
  --stage neurons

Supported tasks:

  • vqa
  • ocr
  • caption
  • retrieval
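The attribution step can be sketched as follows: each FFN neuron writes its activation times its down-projection row into the residual stream, and neurons are ranked by the projection of that write-in onto a readout direction derived from the localized task-critical heads. A simplified sketch (function name and the exact form of head conditioning are assumptions):

```python
import numpy as np

def neuron_attribution(activations, w_out, head_readout):
    """Rank FFN neurons by readout-aligned write-in contribution.

    activations: (n_neurons,) post-nonlinearity neuron activations.
    w_out: (n_neurons, d_model) down-projection rows (per-neuron write vectors).
    head_readout: (d_model,) direction derived from the task-critical heads.
    Returns (ranking, contributions): indices sorted by descending contribution.
    """
    # contribution of neuron i = a_i * (w_out[i] . head_readout)
    contributions = activations * (w_out @ head_readout)
    return np.argsort(-contributions), contributions
```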

Neuron Masking Evaluation

This step verifies the causal importance of selected neurons by masking them and measuring task performance degradation.

python scripts/run_masking_eval.py \
  --model llava \
  --task vqa \
  --config configs/llava_hones.yaml
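The masking check amounts to zeroing the selected neurons' activations and measuring how much a task metric (here, a single readout logit for illustration) degrades. A toy sketch of that comparison (names are hypothetical; the real evaluation measures task accuracy over a dataset):

```python
import numpy as np

def degradation(activations, w_out, readout, mask_idx):
    """Drop in the readout logit after masking (zeroing) selected neurons.

    activations: (n_neurons,) neuron activations.
    w_out: (n_neurons, d_model) per-neuron write vectors.
    readout: (d_model,) answer readout direction.
    mask_idx: indices of neurons to mask.
    """
    base = (activations @ w_out) @ readout
    masked_acts = activations.copy()
    masked_acts[list(mask_idx)] = 0.0          # zero out the selected neurons
    masked = (masked_acts @ w_out) @ readout
    return base - masked                        # larger drop = more causal
```

A large degradation when masking the top-ranked neurons (relative to masking random neurons) supports their causal importance.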

Lightweight Neuron Steering

This step freezes the VLM backbone and learns sparse neuron-wise scaling factors on the identified task-critical neurons.

Run HONES steering:

For LLaVA:

python scripts/run_steering.py \
  --model llava \
  --task vqa \
  --dataset coco \
  --config configs/llava_hones.yaml

For Qwen2.5-VL:

python scripts/run_steering.py \
  --model qwen \
  --task caption \
  --dataset coco \
  --config configs/qwen_hones.yaml
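At its core, steering learns one scaling factor per task-critical neuron while everything else stays frozen. A minimal PyTorch sketch of such a module (class and parameter names are assumptions; the repository's steering head may differ):

```python
import torch

class NeuronScaler(torch.nn.Module):
    """Learnable per-neuron scaling applied only to selected task-critical
    neurons; all other activations (and the VLM backbone) remain untouched."""

    def __init__(self, n_neurons, critical_idx):
        super().__init__()
        self.register_buffer("idx", torch.tensor(critical_idx))
        # one scale per critical neuron, initialized to identity (1.0)
        self.scale = torch.nn.Parameter(torch.ones(len(critical_idx)))

    def forward(self, acts):  # acts: (..., n_neurons) FFN activations
        out = acts.clone()
        out[..., self.idx] = acts[..., self.idx] * self.scale
        return out
```

Because only the sparse `scale` vector is trained, the optimization is lightweight compared to fine-tuning the backbone.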

Cite

If you find HONES useful for your research, please consider citing our work:

@inproceedings{wang-etal-2026-hones,
  title     = {From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models},
  author    = {Wang, Qidong  and  Hu, Junjie  and  Jiang, Ming},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2026},
  year      = {2026},
  publisher = {Association for Computational Linguistics},
}
