TF-Adapter-BERT

A TensorFlow 2.0 implementation of adapters in NLP, as described in the paper Parameter-Efficient Transfer Learning for NLP, built on HuggingFace's Transformers.

What is an Adapter?

Houlsby et al. (2019) introduced adapters as an alternative approach to adaptation in transfer learning for NLP within deep transformer-based architectures. Adapters are task-specific neural modules that are inserted between the layers of a pre-trained network. After copying the weights from a pre-trained network, the pre-trained weights are frozen and only the adapters are trained.
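Concretely, each adapter is a small bottleneck module: a down-projection to a small dimension (the bottleneck size), a non-linearity, an up-projection back to the hidden size, and a residual connection. Below is a minimal Keras sketch of this idea; the class and argument names are illustrative rather than the repository's actual code, and it assumes a TF 2.x release where the chosen activation (e.g. gelu) is available in tf.keras.activations.

import tensorflow as tf

class Adapter(tf.keras.layers.Layer):
    """Bottleneck adapter: down-project, non-linearity, up-project, skip connection."""

    def __init__(self, bottleneck_size=64, non_linearity="gelu", **kwargs):
        super().__init__(**kwargs)
        self.bottleneck_size = bottleneck_size
        self.non_linearity = tf.keras.activations.get(non_linearity)

    def build(self, input_shape):
        hidden_size = input_shape[-1]
        # Small-variance initialization keeps the adapted network close to the
        # original pre-trained model at the start of training.
        init = tf.keras.initializers.TruncatedNormal(stddev=1e-3)
        self.down_project = tf.keras.layers.Dense(
            self.bottleneck_size, kernel_initializer=init, name="down_project")
        self.up_project = tf.keras.layers.Dense(
            hidden_size, kernel_initializer=init, name="up_project")
        super().build(input_shape)

    def call(self, hidden_states):
        x = self.down_project(hidden_states)
        x = self.non_linearity(x)
        x = self.up_project(x)
        # Residual connection: the adapter learns only a small update.
        return hidden_states + x

# Example: apply an adapter to BERT-base-sized hidden states.
adapter = Adapter(bottleneck_size=64, non_linearity="gelu")
outputs = adapter(tf.random.normal([2, 128, 768]))  # (batch, seq_len, hidden)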

Why Adapter?

Adapters provide numerous benefits over full fine-tuning or other approaches that yield compact models, such as multi-task learning:

  • They are a lightweight alternative to full fine-tuning, training only a few parameters per task without sacrificing performance (see the rough parameter count after this list).
  • Because the original network parameters stay frozen, adapters yield a high degree of parameter sharing between downstream tasks.
  • Unlike multi-task learning, which requires simultaneous access to all tasks, adapters allow downstream tasks to be trained sequentially. Adding a new task does not require complete joint retraining, and there is no need to weight losses or balance training set sizes.
  • Since the adapters for each task are trained separately, the model does not forget how to perform previous tasks (avoiding catastrophic forgetting).
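A rough back-of-the-envelope count illustrating the first point, assuming BERT-base (hidden size 768, 12 layers), a bottleneck size of 64, and two adapters per transformer layer as in Houlsby et al.; the exact numbers in this repository may differ once layer norms and the classification head are counted:

hidden, bottleneck, layers, adapters_per_layer = 768, 64, 12, 2

# Each adapter: down-projection (weights + bias) and up-projection (weights + bias).
per_adapter = (hidden * bottleneck + bottleneck) + (bottleneck * hidden + hidden)
total_adapter_params = per_adapter * adapters_per_layer * layers

bert_base_params = 110_000_000  # approximate parameter count of BERT-base
print(f"Adapter parameters: {total_adapter_params:,} "
      f"(~{100 * total_adapter_params / bert_base_params:.1f}% of BERT-base)")
# -> Adapter parameters: 2,379,264 (~2.2% of BERT-base)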

Learn more in the paper "Parameter-Efficient Transfer Learning for NLP".

Usage

An example of training adapters in BERT's encoders on the MRPC classification task:

pip install transformers

python run_tf_glue_adapter_bert.py \
  --casing bert-base-uncased \
  --bottleneck_size 64 \
  --non_linearity gelu \
  --task mrpc \
  --batch_size 32 \
  --epochs 10 \
  --max_seq_length 128 \
  --learning_rate 3e-4 \
  --warmup_ratio 0.1 \
  --saved_models_dir "saved_models"
