
Order-Level Attention Similarity Across Language Models: A Latent Commonality

Paper | Poster

Jinglin Liang, Jin Zhong, Shuangping Huang*, Yunqing Hu, Huiyuan Zhang, Huifang Li, Lixin Fan, Hanlin Gu

Abstract:
In this paper, we explore an important yet previously neglected question: Do context aggregation patterns across Language Models (LMs) share commonalities? While some works have investigated context aggregation or attention weights in LMs, they typically focus on individual models or attention heads, lacking a systematic analysis across multiple LMs to explore their commonalities. In contrast, we focus on the commonalities among LMs, which can deepen our understanding of LMs and even facilitate cross-model knowledge transfer. In this work, we introduce the Order-Level Attention (OLA) derived from the order-wise decomposition of Attention Rollout and reveal that the OLA at the same order across LMs exhibits significant similarities. Furthermore, we discover an implicit mapping between OLA and syntactic knowledge. Based on these two findings, we propose the Transferable OLA Adapter (TOA), a training-free cross-LM adapter transfer method. Specifically, we treat the OLA as a unified syntactic feature representation and train an adapter that takes OLA as input. Due to the similarities in OLA across LMs, the adapter generalizes to unseen LMs without requiring any parameter updates. Extensive experiments demonstrate that TOA's cross-LM generalization effectively enhances the performance of unseen LMs. Code is available at https://github.com/jinglin-liang/OLAS.

📢 Description

This repository is the official PyTorch implementation of:

Order-Level Attention Similarity Across Language Models: A Latent Commonality (NeurIPS 2025).
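For intuition, the order-wise decomposition of Attention Rollout that defines OLA can be sketched as follows. This is a minimal NumPy illustration of the idea, not the repository's implementation; the normalization and head aggregation used in the codebase may differ.

```python
import numpy as np
from itertools import combinations

def order_level_attention(attns, order):
    """Order-k term of the expansion of prod_l (I + A_l) / 2.

    attns: list of (n, n) head-averaged attention matrices, one per layer.
    Attention Rollout multiplies (A_l + I) / 2 across layers; expanding that
    product groups terms by how many attention factors they contain, and the
    order-k group is the order-k OLA.
    """
    L = len(attns)
    n = attns[0].shape[0]
    total = np.zeros((n, n))
    for layers in combinations(range(L), order):
        term = np.eye(n)
        # apply matrices in layer order: later layers multiply on the left
        for l in layers:
            term = attns[l] @ term
        total += term
    return total / (2 ** L)
```

Summing the terms over all orders 0..L recovers the full Attention Rollout product, which is a quick sanity check on the decomposition.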

🔨 Requirements

Environment

pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

Prepare Data

Create a folder named "datasets", download the datasets, and arrange them in the structure shown below.

datasets
├── UD_English-EWT
├── conll2000
├── conll2012
└── sem_eval_2010_task_8

Pre-trained Model

Create a folder named "pretrained_models", then download the pretrained models and place them in it as shown below.

pretrained_models
├── bert-base-cased
├── bert-large-cased
├── roberta-base
├── roberta-large
├── electra-base-generator
├── electra-large-generator
├── gemma-2-2b-it
├── gemma-2-9b-it
├── Qwen2-1.5B-Instruct
├── Qwen2-7B-Instruct
├── Llama-3.1-8B-Instruct
└── Llama-3.2-3B-Instruct

📺 Qualitative Empirical Evidence of OLAS

Run the command below to visualize the OLA. An example visualization is shown in the figure below.

python main.py configs/visual_clms.json

🍔 Quantitative Empirical Evidence of OLAS

Quantitative Analysis Based on Visual Model

First, run the following code to generate the OLA data for quantitatively evaluating cross-model similarity.

  • CLMs
python main.py configs/gendata_clm_conll2012en_entity.json --do_classify_data_generate
  • MLMs
python main.py configs/gendata_mlm_conll2012en_entity.json --do_classify_data_generate

Then, run the following commands to train the visual model and perform the objective similarity evaluation. The 'train_model_names' and 'test_model_names' parameters specify the source model(s) and the target model(s), respectively.

  • CLMs
python olas_visual_model.py \
  --train_model_names \
  Qwen2-1.5B-Instruct Qwen2-7B-Instruct Llama-3.2-3B-Instruct Llama-3.1-8B-Instruct \
  --test_model_names \
  gemma-2-2b-it gemma-2-9b-it
  • MLMs
python olas_visual_model.py \
  --train_model_names \
  bert-base-cased bert-large-cased roberta-base roberta-large \
  --test_model_names \
  electra-base-generator electra-large-generator

Quantitative Analysis Based on Image Similarity Retrieval

  • CLMs
python olas_retrieval.py \
  --src_model Qwen2-1.5B-Instruct \
  --tgt_model Llama-3.2-3B-Instruct
  • MLMs
python olas_retrieval.py \
  --src_model bert-base-cased \
  --tgt_model electra-base-generator
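The retrieval experiment matches each source-model OLA map against target-model maps. As a rough illustration of the idea, the sketch below retrieves the most similar target map by cosine similarity over flattened OLA maps; the actual metric and pipeline in olas_retrieval.py may differ.

```python
import numpy as np

def retrieve(src_olas, tgt_olas):
    """For each source OLA map, return the index of the most
    cosine-similar target OLA map (maps are flattened to vectors)."""
    S = np.array([m.ravel() for m in src_olas], dtype=float)
    T = np.array([m.ravel() for m in tgt_olas], dtype=float)
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    return (S @ T.T).argmax(axis=1)  # one target index per source map
```

High retrieval accuracy across two different LMs then indicates that same-order OLA maps are similar across models.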

🚀 Transferable OLA Adapter

Training

First, run the following code to generate the OLA dataset.

  • CLMs
python main.py configs/gendata_clm_conll2012en_entity.json
  • MLMs
python main.py configs/gendata_mlm_conll2012en_entity.json

Then, train the OLA adapter on the generated OLA data.

  • CLMs
CUDA_VISIBLE_DEVICES=0 python main.py configs/train_qwen_1b_conll2000pos.json --use_generated_oladata true
  • MLMs
CUDA_VISIBLE_DEVICES=0 python main.py configs/train_bert_base_conll2012en_entity.json --use_generated_oladata true

Testing

  • CLMs
CUDA_VISIBLE_DEVICES=0 python main.py configs/eval_clm_conll2012en_entity.json --eval_adapter_checkpoint path_to/ola_adapter_weight.bin
  • MLMs
CUDA_VISIBLE_DEVICES=0 python main.py configs/eval_mlm_conll2012en_entity.json --eval_adapter_checkpoint path_to/ola_adapter_weight.bin
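Conceptually, TOA trains an adapter on OLA features from seen LMs and then applies it, frozen, to an unseen LM. The toy sketch below illustrates that pipeline with synthetic stand-in features and a multinomial logistic-regression "adapter"; the real adapter architecture and OLA features come from the configs and scripts above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_ola_features(n, dim, n_classes, noise):
    """Synthetic stand-in for OLA features: both 'LMs' draw features for the
    same syntactic class from the same distribution, mimicking cross-LM OLA
    similarity."""
    y = rng.integers(0, n_classes, n)
    centers = np.eye(n_classes, dim)          # one center per class
    X = centers[y] + noise * rng.normal(size=(n, dim))
    return X, y

def train_linear_adapter(X, y, n_classes, lr=0.5, steps=200):
    """Multinomial logistic regression via gradient descent, as a toy
    stand-in for the OLA adapter."""
    W = np.zeros((X.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (p - onehot) / len(y)
    return W

X_src, y_src = make_ola_features(500, 8, 3, noise=0.3)   # "seen" LM
X_tgt, y_tgt = make_ola_features(500, 8, 3, noise=0.3)   # "unseen" LM
W = train_linear_adapter(X_src, y_src, 3)
acc = ((X_tgt @ W).argmax(axis=1) == y_tgt).mean()       # no parameter updates
```

Because the two feature distributions agree, the adapter trained on the source transfers to the target without any updates, which is the training-free transfer that TOA exploits.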

❤️ Citation

If you find our work inspiring or use our codebase in your research, please cite our work:

@inproceedings{liang2025order,
  title={Order-Level Attention Similarity Across Language Models: A Latent Commonality},
  author={Liang, Jinglin and Zhong, Jin and Huang, Shuangping and Hu, Yunqing and Zhang, Huiyuan and Li, Huifang and Fan, Lixin and Gu, Hanlin},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
