Simplicity Bias in Transformers

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions which have low sensitivity, we find that Transformers generalize near perfectly even in the presence of noisy labels whereas LSTMs overfit and achieve poor generalization accuracy. Overall, our results provide strong quantifiable evidence that suggests differences in the inductive biases of Transformers and recurrent models which may help explain Transformers' effective generalization performance despite relatively limited expressiveness.
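For reference, the sensitivity of a Boolean function f at an input x is the number of coordinates whose flip changes f(x); its average over uniformly random inputs is the low-sensitivity notion referred to in the abstract above. The following is a small illustrative sketch (ours, not the paper's code) of estimating average sensitivity by sampling:

import random

def avg_sensitivity(f, n, num_samples=1000):
    # Monte Carlo estimate of the average sensitivity of a Boolean
    # function f: {0,1}^n -> {0,1}: for each sampled input, count how
    # many single-bit flips change the output, then average.
    total = 0
    for _ in range(num_samples):
        x = [random.randint(0, 1) for _ in range(n)]
        y = f(x)
        for i in range(n):
            x[i] ^= 1            # flip bit i
            if f(x) != y:
                total += 1
            x[i] ^= 1            # restore bit i
    return total / num_samples

# A parity over k fixed bits (a sparse parity) has sensitivity k at every
# input, while the full parity over all n bits has the maximal value n.
n, k = 20, 3
print(avg_sensitivity(lambda x: sum(x[:k]) % 2, n))   # ~= 3.0
print(avg_sensitivity(lambda x: sum(x) % 2, n))       # ~= 20.0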

...

Dependencies

  • Compatible with Python 3
  • Dependencies can be installed using Transformer-Simplicity/requirements.txt

Setup

Install VirtualEnv using the following (optional):

$ [sudo] pip install virtualenv

Create and activate your virtual environment (optional):

$ virtualenv -p python3 venv
$ source venv/bin/activate

Install all the required packages:

At Transformer-Simplicity/:

$ pip install -r requirements.txt

Models

The current repository includes 4 directories implementing different models and settings:

  • Training Transformers on Boolean functions: Transformer-Simplicity/FLTAtt
  • Training LSTMs on Boolean functions: Transformer-Simplicity/FLTClassifier
  • Experiments with Random Transformers: Transformer-Simplicity/RandFLTAtt
  • Experiments with Random LSTMs: Transformer-Simplicity/RandFLTClassifier

Usage

The available command-line arguments are listed in the respective args.py file. Here, we illustrate training a Transformer on sparse parities; follow the same procedure to run any experiment with LSTMs.

At Transformer-Simplicity/FLTAtt:

$	python -m src.main -mode train -gpu 0 -dataset sparity40_5k -run_name trafo_sparity_40_5k -depth 4 -lr 0.001
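The dataset name sparity40_5k presumably encodes sparse parity over length-40 inputs with 5k training examples; the exact files and format are determined by the repository's data loading code. As a purely hypothetical illustration of the underlying task (not the repo's data pipeline), a sparse (n, k)-parity labels each n-bit string by the parity of a fixed set of k relevant bits:

import random

# Hypothetical sketch of a sparse (n, k)-parity dataset; the actual data
# used by -dataset sparity40_5k lives in the repository.
n, k, num_examples = 40, 3, 5000
relevant = sorted(random.sample(range(n), k))    # the k "important" coordinates

data = []
for _ in range(num_examples):
    bits = [random.randint(0, 1) for _ in range(n)]
    label = sum(bits[i] for i in relevant) % 2   # parity of the relevant bits only
    data.append(("".join(map(str, bits)), label))

print(relevant, data[0])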

To compute the sensitivity of randomly initialized Transformers, run the following at Transformer-Simplicity/RandFLTAtt:

$	python rand_sensi.py -gpu 0 -sample_size 1000 -len 20 -trials 100
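The exact procedure and the meaning of these flags are defined in rand_sensi.py. Conceptually, the sensitivity of a randomly initialized model can be estimated by sampling random binary strings, flipping one position at a time, and counting how often the predicted label changes. A minimal PyTorch-style sketch with a placeholder network (not the repository's architecture):

import torch
import torch.nn as nn

# Sketch of sensitivity estimation for a randomly initialized classifier;
# the tiny Transformer below is only a stand-in for the repo's model.
seq_len, sample_size, d_model = 20, 1000, 32

embed = nn.Embedding(2, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, 1)

def predict(bits):
    # bits: LongTensor of shape (seq_len,) with 0/1 entries -> binary label
    with torch.no_grad():
        h = encoder(embed(bits.unsqueeze(0)))      # (1, seq_len, d_model)
        return int(head(h.mean(dim=1)).item() > 0)

total_flips = 0
for _ in range(sample_size):
    x = torch.randint(0, 2, (seq_len,))
    y = predict(x)
    for i in range(seq_len):
        x_flip = x.clone()
        x_flip[i] = 1 - x_flip[i]                  # flip position i
        if predict(x_flip) != y:
            total_flips += 1

print("estimated average sensitivity:", total_flips / sample_size)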

Citation

If you use our data or code, please cite our work:

@inproceedings{bhattamishra-etal-2023-simplicity,
    title = "Simplicity Bias in Transformers and their Ability to Learn Sparse {B}oolean Functions",
    author = "Bhattamishra, Satwik  and
      Patel, Arkil  and
      Kanade, Varun  and
      Blunsom, Phil",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.317",
    pages = "5767--5791",
}

For any clarification, comments, or suggestions, please contact Satwik or Arkil.
