This repository contains the official implementation of the ACL 2026 Findings paper:
From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models
Parts of our code are adapted from V-SEAM and Logit Lens. Thanks to the authors for their great work!
We introduce HONES, a Head-Oriented Neuron Explanation and Steering framework for causal interpretability in multi-task Vision–Language Models (VLMs). HONES identifies task-critical FFN neurons by ranking their causal write-in contributions conditioned on task-relevant attention heads, and further supports lightweight neuron-level steering for task improvement. Through large-scale analysis, we find that task-critical neurons show distinct layer preferences across tasks, while shared neurons, particularly those overlapping with VQA, play a prominent role in cross-task generalization. Building on these insights, we develop a sparse neuron scaling method that steers key neurons, leading to consistent performance gains on both LLaVA and Qwen across four diverse multimodal tasks. The figure below illustrates our framework.
- Requirements
- Critical Head Localization
- Head-Conditioned Neuron Attribution
- Neuron Masking Evaluation
- Lightweight Neuron Steering
## Requirements

- Python 3.11.10
- PyTorch 2.2.1+cu121 (CUDA 12.1)
To install all dependencies:

```bash
pip install -r requirements.txt
```

## Critical Head Localization

This step identifies task-critical attention heads through causal head intervention. These heads are used as routing signals for downstream neuron attribution.
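Conceptually, causal head intervention ablates one attention head's write-in to the residual stream at a time and measures the drop in the correct-answer logit. A minimal NumPy sketch of this idea (toy shapes and names, not the actual HONES implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: hypothetical dimensions, not the real LLaVA/Qwen sizes.
n_heads, d_model = 4, 8
head_outputs = rng.normal(size=(n_heads, d_model))  # per-head write-ins to the residual stream
residual = head_outputs.sum(axis=0)                 # residual stream after attention
readout = rng.normal(size=d_model)                  # unembedding direction of the correct answer

def answer_logit(state):
    """Project a residual state onto the correct-answer readout direction."""
    return float(state @ readout)

base = answer_logit(residual)

# Causal intervention: remove one head's write-in and measure the logit drop.
effects = np.array([base - answer_logit(residual - head_outputs[h])
                    for h in range(n_heads)])

# Heads with the largest positive causal effect are treated as task-critical.
critical_heads = np.argsort(-effects)[:2]
```

Because the readout is linear in this toy, each head's causal effect reduces to its write-in projected onto the readout direction; in a real VLM the effect is measured through the full forward pass.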
For LLaVA:

```bash
python scripts/run_localization.py \
    --model llava \
    --task vqa \
    --config configs/llava_hones.yaml
```

For Qwen2.5-VL:
```bash
python scripts/run_localization.py \
    --model qwen \
    --task vqa \
    --config configs/qwen_hones.yaml
```

## Head-Conditioned Neuron Attribution

This step ranks FFN neurons by their readout-aligned write-in contribution conditioned on the localized task-critical heads.
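The attribution idea can be sketched as follows: each FFN neuron's write-in is its activation times its row of the down-projection, and neurons are ranked by how strongly that write-in aligns with the answer readout. This toy NumPy sketch conditions the FFN input on the critical heads' write-ins only (all shapes, the ReLU nonlinearity, and the conditioning scheme are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_ffn = 8, 16

W_in = rng.normal(size=(d_model, d_ffn))    # FFN up-projection
W_out = rng.normal(size=(d_ffn, d_model))   # FFN down-projection
readout = rng.normal(size=d_model)          # correct-answer readout direction

# Condition the FFN input on the localized task-critical heads' write-ins.
head_outputs = rng.normal(size=(4, d_model))
critical_heads = [0, 2]
ffn_input = head_outputs[critical_heads].sum(axis=0)

acts = np.maximum(ffn_input @ W_in, 0.0)    # neuron activations (ReLU toy)
write_ins = acts[:, None] * W_out           # each neuron's write-in vector
scores = write_ins @ readout                # readout-aligned contribution
ranked = np.argsort(-scores)                # task-critical neurons first
```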
```bash
python scripts/run_localization.py \
    --model llava \
    --task vqa \
    --config configs/llava_hones.yaml \
    --stage neurons
```

Supported tasks:
- `vqa`
- `ocr`
- `caption`
- `retrieval`
## Neuron Masking Evaluation

This step verifies the causal importance of selected neurons by masking them and measuring task performance degradation.
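The evaluation logic amounts to zeroing the top-ranked neurons' activations and comparing the output against the unmasked model. A minimal NumPy toy of this masking loop (hypothetical shapes; a real evaluation would measure task accuracy rather than a single logit):

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_ffn = 8, 16
W_in = rng.normal(size=(d_model, d_ffn))
W_out = rng.normal(size=(d_ffn, d_model))
readout = rng.normal(size=d_model)
x = rng.normal(size=d_model)                # toy FFN input

def ffn_logit(mask):
    """Answer logit with the given neuron mask applied to the activations."""
    acts = np.maximum(x @ W_in, 0.0) * mask
    return float((acts @ W_out) @ readout)

full = ffn_logit(np.ones(d_ffn))

# Rank neurons by readout-aligned write-in, then mask the top-k.
acts = np.maximum(x @ W_in, 0.0)
scores = (acts[:, None] * W_out) @ readout
top_k = np.argsort(-scores)[:4]
mask = np.ones(d_ffn)
mask[top_k] = 0.0

# Large degradation confirms the masked neurons were causally important.
degradation = full - ffn_logit(mask)
```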
```bash
python scripts/run_masking_eval.py \
    --model llava \
    --task vqa \
    --config configs/llava_hones.yaml
```

## Lightweight Neuron Steering

This step freezes the VLM backbone and learns sparse neuron-wise scaling factors on the identified task-critical neurons.
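The steering step can be pictured as a learnable per-neuron scale applied on top of frozen activations, updated only at the identified neurons. A toy NumPy sketch with one gradient-ascent step on the answer logit (the neuron indices, learning rate, and single-logit objective are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
d_model, d_ffn = 8, 16
W_in = rng.normal(size=(d_model, d_ffn))
W_out = rng.normal(size=(d_ffn, d_model))
readout = rng.normal(size=d_model)
x = rng.normal(size=d_model)
key_neurons = np.array([1, 5, 9])           # hypothetical task-critical neurons

acts = np.maximum(x @ W_in, 0.0)            # frozen-backbone activations
scale = np.ones(d_ffn)                      # learnable neuron-wise scaling

def logit(scale):
    """Answer logit with neuron-wise scaling applied."""
    return float(((acts * scale) @ W_out) @ readout)

# The logit is linear in `scale`, so its gradient is closed-form:
# d logit / d scale_j = acts_j * (W_out[j] @ readout)
grad = acts * (W_out @ readout)

# Sparse update: only the key neurons' scales move; the backbone stays frozen.
lr = 0.1
scale[key_neurons] += lr * grad[key_neurons]
```

Moving along the gradient at the key neurons can only increase the toy logit; in practice the scaling factors would be trained on a task objective over a dataset.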
Run HONES steering:
For LLaVA:

```bash
python scripts/run_steering.py \
    --model llava \
    --task vqa \
    --dataset coco \
    --config configs/llava_hones.yaml
```

For Qwen2.5-VL:
```bash
python scripts/run_steering.py \
    --model qwen \
    --task caption \
    --dataset coco \
    --config configs/qwen_hones.yaml
```

## Citation

If you find HONES useful for your research, please consider citing our work:
```bibtex
@inproceedings{wang-etal-2026-hones,
  title = {From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models},
  author = {Wang, Qidong and Hu, Junjie and Jiang, Ming},
  booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
  year = {2026},
  publisher = {Association for Computational Linguistics},
}
```
