From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models

Description

This repository contains the official implementation of the ACL 2026 Findings paper:

From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models
Parts of our code are adapted from V-SEAM and Logit Lens. Thanks to the authors for their great work!


We introduce HONES, a Head-Oriented Neuron Explanation and Steering framework for causal interpretability in multi-task Vision–Language Models (VLMs). HONES identifies task-critical FFN neurons by ranking their causal write-in contributions conditioned on task-relevant attention heads, and further supports lightweight neuron-level steering for task improvement. Through large-scale analysis, we find that task-critical neurons show distinct layer preferences across tasks, while shared neurons—particularly those overlapping with VQA—play a prominent role in cross-task generalization. Building on these insights, we develop a sparse neuron scaling method that steers key neurons, leading to consistent performance gains on both LLaVA and Qwen across four diverse multimodal tasks. The figure below demonstrates our framework.

HONES Framework

Getting Started

Requirements

  • Python 3.11.10
  • PyTorch 2.2.1+cu121 (CUDA 12.1)

To install all dependencies:

pip install -r requirements.txt

Critical Head Localization

This step identifies task-critical attention heads through causal head intervention. These heads are used as routing signals for downstream neuron attribution.

For LLaVA:

python scripts/run_localization.py \
  --model llava \
  --task vqa \
  --config configs/llava_hones.yaml

For Qwen2.5-VL:

python scripts/run_localization.py \
  --model qwen \
  --task vqa \
  --config configs/qwen_hones.yaml
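Conceptually, causal head localization scores each attention head by how much the answer logit drops when that head's write-in to the residual stream is removed. A minimal NumPy sketch of this scoring idea (the function name and the zero-ablation choice are illustrative assumptions, not the repository's exact implementation):

```python
import numpy as np

def head_ablation_scores(head_outputs, readout):
    """Score each attention head by the drop in the readout logit when
    that head's contribution to the residual stream is zeroed out.

    head_outputs: (n_heads, d_model) per-head write-ins at the answer position.
    readout: (d_model,) unembedding direction of the correct answer token.
    """
    residual = head_outputs.sum(axis=0)
    full_logit = residual @ readout
    scores = np.empty(len(head_outputs))
    for h in range(len(head_outputs)):
        ablated_logit = (residual - head_outputs[h]) @ readout
        scores[h] = full_logit - ablated_logit  # logit drop caused by head h
    return scores
```

Heads with the largest logit drops are treated as task-critical and passed to the neuron-attribution stage as routing signals.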

Head-Conditioned Neuron Attribution

This step ranks FFN neurons by their readout-aligned write-in contribution conditioned on the localized task-critical heads.

python scripts/run_localization.py \
  --model llava \
  --task vqa \
  --config configs/llava_hones.yaml \
  --stage neurons

Supported tasks:

  • vqa
  • ocr
  • caption
  • retrieval
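The attribution step can be sketched as follows: each FFN neuron writes its activation times its down-projection row into the residual stream, and neurons are ranked by the projection of that write-in onto a readout direction derived from the localized task-critical heads. A simplified sketch (function name and the exact form of head conditioning are assumptions):

```python
import numpy as np

def neuron_attribution(activations, w_out, head_readout):
    """Rank FFN neurons by readout-aligned write-in contribution.

    activations: (n_neurons,) post-nonlinearity neuron activations.
    w_out: (n_neurons, d_model) down-projection rows (per-neuron write vectors).
    head_readout: (d_model,) direction derived from the task-critical heads.
    Returns (ranking, contributions): indices sorted by descending contribution.
    """
    # contribution of neuron i = a_i * (w_out[i] . head_readout)
    contributions = activations * (w_out @ head_readout)
    return np.argsort(-contributions), contributions
```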

Neuron Masking Evaluation

This step verifies the causal importance of selected neurons by masking them and measuring task performance degradation.

python scripts/run_masking_eval.py \
  --model llava \
  --task vqa \
  --config configs/llava_hones.yaml
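The masking check amounts to zeroing the selected neurons' activations and measuring how much a task metric (here, a single readout logit for illustration) degrades. A toy sketch of that comparison (names are hypothetical; the real evaluation measures task accuracy over a dataset):

```python
import numpy as np

def degradation(activations, w_out, readout, mask_idx):
    """Drop in the readout logit after masking (zeroing) selected neurons.

    activations: (n_neurons,) neuron activations.
    w_out: (n_neurons, d_model) per-neuron write vectors.
    readout: (d_model,) answer readout direction.
    mask_idx: indices of neurons to mask.
    """
    base = (activations @ w_out) @ readout
    masked_acts = activations.copy()
    masked_acts[list(mask_idx)] = 0.0          # zero out the selected neurons
    masked = (masked_acts @ w_out) @ readout
    return base - masked                        # larger drop = more causal
```

A large degradation when masking the top-ranked neurons (relative to masking random neurons) supports their causal importance.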

Lightweight Neuron Steering

This step freezes the VLM backbone and learns sparse neuron-wise scaling factors on the identified task-critical neurons.

Run HONES steering:

For LLaVA:

python scripts/run_steering.py \
  --model llava \
  --task vqa \
  --dataset coco \
  --config configs/llava_hones.yaml

For Qwen2.5-VL:

python scripts/run_steering.py \
  --model qwen \
  --task caption \
  --dataset coco \
  --config configs/qwen_hones.yaml
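At its core, steering learns one scaling factor per task-critical neuron while everything else stays frozen. A minimal PyTorch sketch of such a module (class and parameter names are assumptions; the repository's steering head may differ):

```python
import torch

class NeuronScaler(torch.nn.Module):
    """Learnable per-neuron scaling applied only to selected task-critical
    neurons; all other activations (and the VLM backbone) remain untouched."""

    def __init__(self, n_neurons, critical_idx):
        super().__init__()
        self.register_buffer("idx", torch.tensor(critical_idx))
        # one scale per critical neuron, initialized to identity (1.0)
        self.scale = torch.nn.Parameter(torch.ones(len(critical_idx)))

    def forward(self, acts):  # acts: (..., n_neurons) FFN activations
        out = acts.clone()
        out[..., self.idx] = acts[..., self.idx] * self.scale
        return out
```

Because only the sparse `scale` vector is trained, the optimization is lightweight compared to fine-tuning the backbone.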

Cite

If you find HONES useful for your research, please consider citing our work:

@inproceedings{wang-etal-2026-hones,
  title     = {From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models},
  author    = {Wang, Qidong  and  Hu, Junjie  and  Jiang, Ming},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2026},
  year      = {2026},
  publisher = {Association for Computational Linguistics},
}
