Addressing Token Uniformity in Transformers via Singular Value Transformation

UAI 2022. Hanqi Yan, Lin Gui, Wenjie Li, Yulan He.

Motivations

In this work, we characterise the token uniformity problem (see the figure below), commonly observed in the output of transformer-based architectures, by the degree of skewness in the singular value distribution, and propose a singular value transformation function (SoftDecay) to address it.

Figure: singular value distributions of the outputs from BERT layers 0, 7 and 12 (top to bottom) on the GLUE-MRPC dataset. The skewness, token uniformity and [CLS] uniformity values increase as the layers go deeper, while the median singular value decreases drastically, close to vanishing.
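The skewness characterisation above can be sketched numerically: take a matrix of token representations, compute its singular values, and measure the skewness of their distribution. The random matrix below is only a stand-in for real BERT hidden states.

```python
# Sketch: measure the token-uniformity indicator (skewness of the
# singular value distribution) for one layer's token representations.
# The random matrix is a stand-in for real BERT hidden states.
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.standard_normal((128, 768))  # (num_tokens, hidden_dim)

s = np.linalg.svd(hidden, compute_uv=False)  # singular values, descending

# Skewness of the singular value distribution (third standardised moment).
skewness = np.mean(((s - s.mean()) / s.std()) ** 3)

print(f"median singular value: {np.median(s):.3f}")
print(f"skewness: {skewness:.3f}")
```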

Requirements

Our project is built on Huggingface Transformers; please refer to the requirements listed in their repository.

Code Structure

  1. Main Functions

Our method is evaluated on text classification tasks (the GLUE datasets), so the main functions are defined in run_glue_no_trainer.py. Note that if you have already installed the official Transformers package, you may need to point Python at our updated version instead:

# use our updated SoftDecay Transformers directory /tokenUni/src/

sys.path.insert(0,"/YourDir/tokenUni/src/")
  2. Pretrained Language Models + SoftDecay

Our SoftDecay Transformers code is in this repo and covers ALBERT, BERT, DistilBERT and RoBERTa. Taking BERT as an example, we modify the configuration file configuration_bert.py and the model file modeling_bert.py to insert the SoftDecay function; the function itself is defined in soft_decay.py.
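As a rough mental model (the decay form and the default alpha below are illustrative assumptions, not the exact implementation in soft_decay.py), the transformation decomposes the token representations with SVD, flattens the singular value spectrum with a concave increasing map, and reconstructs:

```python
# Conceptual sketch of a singular value transformation in the spirit of
# SoftDecay. The concave map below is illustrative only; the actual
# function is defined in soft_decay.py.
import numpy as np

def soft_decay_sketch(hidden, alpha=0.5):
    """hidden: (num_tokens, dim) token representations; alpha > 0."""
    u, s, vt = np.linalg.svd(hidden, full_matrices=False)
    # Concave increasing map: compresses large singular values more than
    # small ones, reducing the skewness of the spectrum.
    s_new = (1.0 - np.exp(-alpha * s)) / alpha
    # Rescale so the largest singular value is preserved.
    s_new *= s.max() / s_new.max()
    return u @ np.diag(s_new) @ vt

rng = np.random.default_rng(0)
h = rng.standard_normal((16, 32))
h_decayed = soft_decay_sketch(h)  # same shape, flatter spectrum
```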

  3. Visualization

The script visualization.py is used to visualize the singular value distributions, i.e. the CDFs and histograms, and to compute the representation metrics.

Demo of Unsupervised Evaluation on STS datasets

Since SoftDecay is applied directly to the output representations, we can fix the function parameter $\alpha$ to define a decay function and obtain the transformed representations for evaluation without any training. (In the supervised setting, $\alpha$ is trained under task supervision for better downstream results.) This demo is mainly based on the WhiteningBERT repo; we have added the SoftDecay post-processing method in the unsupervisedSTS directory.

#specify the PLTM/pooling method/layer_index/post_process(soft_decay or whitening)
python evaluation_stsbenchmark.py --pooling aver --encoder_name bert-base-cased --last2avg --post_process soft_decay

By switching between different pretrained language models, you should be able to reproduce the results reported in the paper.
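For intuition, the pooling step behind flags such as --pooling aver and --last2avg can be sketched as follows (the choice of layers and the exact combination rule here are assumptions, not the repo's evaluation_stsbenchmark.py logic):

```python
# Sketch: combine token embeddings from two hidden layers, then
# average-pool over tokens to get a fixed-size sentence embedding.
import numpy as np

def pool_sentence(layer_a, layer_b):
    """layer_a, layer_b: (num_tokens, dim) hidden states of one sentence."""
    combined = (layer_a + layer_b) / 2.0  # layer combination ("last2avg"-style)
    return combined.mean(axis=0)          # "aver" pooling over tokens

rng = np.random.default_rng(0)
tok_a = rng.standard_normal((10, 768))   # e.g. first-layer states
tok_b = rng.standard_normal((10, 768))   # e.g. last-layer states
sent_vec = pool_sentence(tok_a, tok_b)   # (768,) sentence embedding
```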

Citation

If you find our work useful, please cite as:

@inproceedings{yan2022addressing,
  title={Addressing Token Uniformity in Transformers via Singular Value Transformation},
  author={Hanqi Yan and Lin Gui and Wenjie Li and Yulan He},
  booktitle={The 38th Conference on Uncertainty in Artificial Intelligence},
  year={2022},
  url={https://openreview.net/forum?id=BtUxE_8i5l5}
}
