
Order-Level Attention Similarity Across Language Models: A Latent Commonality

Paper | Poster

Jinglin Liang, Jin Zhong, Shuangping Huang*, Yunqing Hu, Huiyuan Zhang, Huifang Li, Lixin Fan, Hanlin Gu

Abstract:
In this paper, we explore an important yet previously neglected question: Do context aggregation patterns across Language Models (LMs) share commonalities? While some works have investigated context aggregation or attention weights in LMs, they typically focus on individual models or attention heads, lacking a systematic analysis across multiple LMs to explore their commonalities. In contrast, we focus on the commonalities among LMs, which can deepen our understanding of LMs and even facilitate cross-model knowledge transfer. In this work, we introduce the Order-Level Attention (OLA) derived from the order-wise decomposition of Attention Rollout and reveal that the OLA at the same order across LMs exhibits significant similarities. Furthermore, we discover an implicit mapping between OLA and syntactic knowledge. Based on these two findings, we propose the Transferable OLA Adapter (TOA), a training-free cross-LM adapter transfer method. Specifically, we treat the OLA as a unified syntactic feature representation and train an adapter that takes OLA as input. Due to the similarities in OLA across LMs, the adapter generalizes to unseen LMs without requiring any parameter updates. Extensive experiments demonstrate that TOA's cross-LM generalization effectively enhances the performance of unseen LMs. Code is available at https://github.com/jinglin-liang/OLAS.

📢 Description

This repository is the official PyTorch implementation of:

Order-Level Attention Similarity Across Language Models: A Latent Commonality (NeurIPS 2025).
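For intuition, the order-wise decomposition of Attention Rollout that defines OLA can be sketched as follows. This is a minimal NumPy illustration of the idea, not the repository's implementation; the normalization and head aggregation used in the codebase may differ.

```python
import numpy as np
from itertools import combinations

def order_level_attention(attns, order):
    """Order-k term of the expansion of prod_l (I + A_l) / 2.

    attns: list of (n, n) head-averaged attention matrices, one per layer.
    Attention Rollout multiplies (A_l + I) / 2 across layers; expanding that
    product groups terms by how many attention factors they contain, and the
    order-k group is the order-k OLA.
    """
    L = len(attns)
    n = attns[0].shape[0]
    total = np.zeros((n, n))
    for layers in combinations(range(L), order):
        term = np.eye(n)
        # apply matrices in layer order: later layers multiply on the left
        for l in layers:
            term = attns[l] @ term
        total += term
    return total / (2 ** L)
```

Summing the terms over all orders 0..L recovers the full Attention Rollout product, which is a quick sanity check on the decomposition.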

🔨 Requirements

Environment

pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

Prepare Data

Create a folder named "datasets", download the datasets, and arrange them in the structure shown below.

datasets
├── UD_English-EWT
├── conll2000
├── conll2012
└── sem_eval_2010_task_8

Pre-trained Model

Create a folder named "pretrained_models", then download the pretrained models and place them in it as shown below.

pretrained_models
├── bert-base-cased
├── bert-large-cased
├── roberta-base
├── roberta-large
├── electra-base-generator
├── electra-large-generator
├── gemma-2-2b-it
├── gemma-2-9b-it
├── Qwen2-1.5B-Instruct
├── Qwen2-7B-Instruct
├── Llama-3.1-8B-Instruct
└── Llama-3.2-3B-Instruct

📺 Qualitative Empirical Evidence of OLAS

Run the command below to visualize the OLA. An example visualization is shown in the figure below.

python main.py configs/visual_clms.json

🍔 Quantitative Empirical Evidence of OLAS

Quantitative Analysis Based on Visual Model

First, run the following code to generate the OLA data for quantitatively evaluating cross-model similarity.

  • CLMs
python main.py configs/gendata_clm_conll2012en_entity.json --do_classify_data_generate
  • MLMs
python main.py configs/gendata_mlm_conll2012en_entity.json --do_classify_data_generate

Then, run the following commands to train the visual model and perform the objective similarity evaluation. The 'train_model_names' and 'test_model_names' parameters specify the source model(s) and the target model(s), respectively.

  • CLMs
python olas_visual_model.py \
  --train_model_names \
  Qwen2-1.5B-Instruct Qwen2-7B-Instruct Llama-3.2-3B-Instruct Llama-3.1-8B-Instruct \
  --test_model_names \
  gemma-2-2b-it gemma-2-9b-it
  • MLMs
python olas_visual_model.py \
  --train_model_names \
  bert-base-cased bert-large-cased roberta-base roberta-large \
  --test_model_names \
  electra-base-generator electra-large-generator

Quantitative Analysis Based on Image Similarity Retrieval

  • CLMs
python olas_retrieval.py \
  --src_model Qwen2-1.5B-Instruct \
  --tgt_model Llama-3.2-3B-Instruct
  • MLMs
python olas_retrieval.py \
  --src_model bert-base-cased \
  --tgt_model electra-base-generator
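The retrieval experiment matches each source-model OLA map against target-model maps. As a rough illustration of the idea, the sketch below retrieves the most similar target map by cosine similarity over flattened OLA maps; the actual metric and pipeline in olas_retrieval.py may differ.

```python
import numpy as np

def retrieve(src_olas, tgt_olas):
    """For each source OLA map, return the index of the most
    cosine-similar target OLA map (maps are flattened to vectors)."""
    S = np.array([m.ravel() for m in src_olas], dtype=float)
    T = np.array([m.ravel() for m in tgt_olas], dtype=float)
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    return (S @ T.T).argmax(axis=1)  # one target index per source map
```

High retrieval accuracy across two different LMs then indicates that same-order OLA maps are similar across models.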

🚀 Transferable OLA Adapter

Training

First, run the following code to generate the OLA dataset.

  • CLMs
python main.py configs/gendata_clm_conll2012en_entity.json
  • MLMs
python main.py configs/gendata_mlm_conll2012en_entity.json

Then, train the OLA adapter on the generated OLA data.

  • CLMs
CUDA_VISIBLE_DEVICES=0 python main.py configs/train_qwen_1b_conll2000pos.json --use_generated_oladata true
  • MLMs
CUDA_VISIBLE_DEVICES=0 python main.py configs/train_bert_base_conll2012en_entity.json --use_generated_oladata true

Testing

  • CLMs
CUDA_VISIBLE_DEVICES=0 python main.py configs/eval_clm_conll2012en_entity.json --eval_adapter_checkpoint path_to/ola_adapter_weight.bin
  • MLMs
CUDA_VISIBLE_DEVICES=0 python main.py configs/eval_mlm_conll2012en_entity.json --eval_adapter_checkpoint path_to/ola_adapter_weight.bin
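Conceptually, TOA trains an adapter on OLA features from seen LMs and then applies it, frozen, to an unseen LM. The toy sketch below illustrates that pipeline with synthetic stand-in features and a multinomial logistic-regression "adapter"; the real adapter architecture and OLA features come from the configs and scripts above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_ola_features(n, dim, n_classes, noise):
    """Synthetic stand-in for OLA features: both 'LMs' draw features for the
    same syntactic class from the same distribution, mimicking cross-LM OLA
    similarity."""
    y = rng.integers(0, n_classes, n)
    centers = np.eye(n_classes, dim)          # one center per class
    X = centers[y] + noise * rng.normal(size=(n, dim))
    return X, y

def train_linear_adapter(X, y, n_classes, lr=0.5, steps=200):
    """Multinomial logistic regression via gradient descent, as a toy
    stand-in for the OLA adapter."""
    W = np.zeros((X.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (p - onehot) / len(y)
    return W

X_src, y_src = make_ola_features(500, 8, 3, noise=0.3)   # "seen" LM
X_tgt, y_tgt = make_ola_features(500, 8, 3, noise=0.3)   # "unseen" LM
W = train_linear_adapter(X_src, y_src, 3)
acc = ((X_tgt @ W).argmax(axis=1) == y_tgt).mean()       # no parameter updates
```

Because the two feature distributions agree, the adapter trained on the source transfers to the target without any updates, which is the training-free transfer that TOA exploits.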

❤️ Citation

If you find our work inspiring or use our codebase in your research, please cite our work:

@inproceedings{liang2025order,
  title={Order-Level Attention Similarity Across Language Models: A Latent Commonality},
  author={Liang, Jinglin and Zhong, Jin and Huang, Shuangping and Hu, Yunqing and Zhang, Huiyuan and Li, Huifang and Fan, Lixin and Gu, Hanlin},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
