# 3️⃣ Combining Pre-Trained Adapters using AdapterFusion

In [the previous notebook](https://colab.research.google.com/github/Adapter-Hub/adapters/blob/master/notebooks/02_Adapter_Inference.ipynb), we loaded a single pre-trained adapter from _AdapterHub_. Now we will explore how to take advantage of multiple pre-trained adapters to combine their knowledge on a new task. This setup is called **AdapterFusion** ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00247.pdf)).

For this guide, we select **CommitmentBank** ([De Marneffe et al., 2019](https://github.com/mcdm/CommitmentBank)), a three-class textual entailment dataset, as our target task. We will fuse [adapters from AdapterHub](https://adapterhub.ml/explore/) which were pre-trained on different tasks. During training, their representations are kept fixed while a newly introduced fusion layer is trained. As our base model, we will use BERT (`bert-base-uncased`).

## Installation

Again, we first install `adapters` and HuggingFace's `datasets` library, plus `accelerate`:

In [1]:
!pip install -Uq adapters
!pip install -q datasets
!pip install -q accelerate

## Dataset Preprocessing

Before setting up training, we first prepare the training data. CommitmentBank is part of the SuperGLUE benchmark and can be loaded via HuggingFace `datasets` using one line of code:

In [2]:
from datasets import load_dataset

dataset = load_dataset("super_glue", "cb")
dataset.num_rows

{'train': 250, 'validation': 56, 'test': 250}

Every dataset sample has a premise, a hypothesis and a three-class label:

In [3]:
dataset['train'].features

{'premise': Value(dtype='string', id=None),
 'hypothesis': Value(dtype='string', id=None),
 'idx': Value(dtype='int32', id=None),
 'label': ClassLabel(names=['entailment', 'contradiction', 'neutral'], id=None)}

Now, we need to encode all dataset samples to valid inputs for our `bert-base-uncased` model. Using `dataset.map()`, we can pass the full dataset through the tokenizer in batches:

In [4]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode_batch(batch):
  """Encodes a batch of input data using the model tokenizer."""
  return tokenizer(
      batch["premise"],
      batch["hypothesis"],
      max_length=180,
      truncation=True,
      padding="max_length"
  )

# Encode the input data
dataset = dataset.map(encode_batch, batched=True)
# The transformers model expects the target class column to be named "labels"
dataset = dataset.rename_column("label", "labels")
# Transform to pytorch tensors and only output the required columns
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
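
To make the effect of `max_length`, `truncation` and `padding="max_length"` more concrete, here is a toy sketch of how a premise/hypothesis pair ends up as one fixed-length sequence with an attention mask. It is only an illustration: `encode_pair` is a hypothetical helper, it splits on whitespace instead of using BERT's WordPiece vocabulary, and it truncates by simply cutting the tail, which is cruder than what the real tokenizer does.

```python
def encode_pair(premise, hypothesis, max_length=12):
    """Toy stand-in for tokenizer(premise, hypothesis, ...); not the real BERT tokenizer."""
    # BERT-style sentence pair layout: [CLS] premise [SEP] hypothesis [SEP]
    tokens = ["[CLS]"] + premise.split() + ["[SEP]"] + hypothesis.split() + ["[SEP]"]
    # Simplified truncation: drop everything beyond max_length
    tokens = tokens[:max_length]
    # Real tokens get mask 1, padding positions get mask 0
    attention_mask = [1] * len(tokens)
    num_pad = max_length - len(tokens)
    tokens = tokens + ["[PAD]"] * num_pad
    attention_mask = attention_mask + [0] * num_pad
    return {"tokens": tokens, "attention_mask": attention_mask}


example = encode_pair("She was happy", "She was sad", max_length=10)
print(example["tokens"])
print(example["attention_mask"])
```

Because every sample is padded to the same length, the encoded columns can be stacked into tensors without a custom collate function, at the cost of some wasted computation on padding positions.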

Now we're ready to set up AdapterFusion...

## Fusion Training

We use a pre-trained BERT model from HuggingFace and instantiate our model using `BertAdapterModel`.

In [5]:
from transformers import BertConfig
from adapters import BertAdapterModel

id2label = {id: label for (id, label) in enumerate(dataset["train"].features["labels"].names)}

config = BertConfig.from_pretrained(
    "bert-base-uncased",
    id2label=id2label,
)
model = BertAdapterModel.from_pretrained(
    "bert-base-uncased",
    config=config,
)

Now we have everything in place to build our _AdapterFusion_ setup. First, we load three adapters pre-trained on different tasks from the Hub: MultiNLI, QQP and QNLI. As we don't need their prediction heads, we pass `with_head=False` to the loading method. Next, we add a new fusion layer that combines all the adapters we've just loaded. Finally, we add a new classification head for our target task on top.

In [6]:
from adapters.composition import Fuse

# Load the pre-trained adapters we want to fuse
model.load_adapter("nli/multinli@ukp", load_as="multinli", with_head=False)
model.load_adapter("sts/qqp@ukp", with_head=False)
model.load_adapter("nli/qnli@ukp", with_head=False)
# Add a fusion layer for all loaded adapters
model.add_adapter_fusion(Fuse("multinli", "qqp", "qnli"))
model.set_active_adapters(Fuse("multinli", "qqp", "qnli"))

# Add a classification head for our target task
model.add_classification_head("cb", num_labels=len(id2label))
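
As a rough illustration of what a fusion layer computes, the sketch below follows the attention formulation from the AdapterFusion paper: the transformer layer output acts as the query, the outputs of the individual adapters act as keys and values, and a softmax over the query-key scores decides how much each adapter contributes. This is a toy version under stated assumptions: the vectors are two-dimensional, the projection matrices are identities instead of learned weights, and the helper names (`fuse`, `matvec`) are made up for this example and are not part of `adapters`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse(hidden, adapter_outputs, w_query, w_key, w_value):
    """Attention over adapter outputs, with the layer output as query."""
    query = matvec(w_query, hidden)
    keys = [matvec(w_key, z) for z in adapter_outputs]
    values = [matvec(w_value, z) for z in adapter_outputs]
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights


identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
hidden = [1.0, 0.0]                  # toy transformer layer output
adapter_outputs = {                  # toy outputs of the three adapters
    "multinli": [2.0, 0.0],
    "qqp": [0.0, 1.0],
    "qnli": [0.0, -1.0],
}

fused, weights = fuse(hidden, list(adapter_outputs.values()), identity, identity, identity)
for name, weight in zip(adapter_outputs, weights):
    print(f"{name}: {weight:.3f}")
print("fused:", fused)
```

In this toy input, the adapter whose output is most aligned with the query receives the largest weight. In the real model, the query, key and value projections are the parameters that get trained, while the adapters themselves stay untouched.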

The last preparation step is to define and activate our adapter setup. Similar to `train_adapter()`, `train_adapter_fusion()` does two things: it freezes all weights of the model (including the adapters!) except for the fusion layer and classification head, and it activates the given adapter setup to be used in every forward pass.

The syntax for the adapter setup (which is also applied to other methods such as `set_active_adapters()`) works as follows:

- a single string is interpreted as a single adapter
- a list of strings is interpreted as a __stack__ of adapters
- a _nested_ list of strings is interpreted as a __fusion__ of adapters

In [7]:
# Unfreeze and activate fusion setup
adapter_setup = Fuse("multinli", "qqp", "qnli")
model.train_adapter_fusion(adapter_setup)
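
One way to check the freezing behaviour is to count which parameters still require gradients. The helper below is a small sketch, not part of `adapters`: `summarize_trainable` is a hypothetical name, and it only relies on each parameter exposing `requires_grad` and `numel()`, as PyTorch parameters do. The demo uses stand-in objects with invented parameter names so that it runs without loading a model.

```python
from collections import defaultdict


def summarize_trainable(named_parameters):
    """Count trainable vs. frozen parameters from (name, parameter) pairs."""
    counts = defaultdict(int)
    trainable_names = []
    for name, param in named_parameters:
        if param.requires_grad:
            counts["trainable"] += param.numel()
            trainable_names.append(name)
        else:
            counts["frozen"] += param.numel()
    return dict(counts), trainable_names


class FakeParam:
    """Minimal stand-in for a torch parameter, used only for this demo."""

    def __init__(self, size, requires_grad):
        self._size = size
        self.requires_grad = requires_grad

    def numel(self):
        return self._size


# Parameter names below are invented for illustration
demo_parameters = [
    ("encoder.layer.0.attention.weight", FakeParam(100, False)),
    ("encoder.layer.0.fusion.query.weight", FakeParam(10, True)),
    ("heads.cb.weight", FakeParam(5, True)),
]

counts, trainable_names = summarize_trainable(demo_parameters)
print(counts)
print(trainable_names)
```

With the model from this notebook, the same helper could be called as `summarize_trainable(model.named_parameters())`; the trainable names should then be limited to the fusion layer and the classification head.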

For training, we make use of the `AdapterTrainer` class built into `adapters`. We configure the training process using a `TrainingArguments` object and define a method that calculates the evaluation accuracy at the end. We pass both, together with the training and validation split of our dataset, to the trainer instance.

In [8]:
import numpy as np
from transformers import TrainingArguments, EvalPrediction
from adapters import AdapterTrainer

training_args = TrainingArguments(
    learning_rate=5e-5,
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=200,
    output_dir="./training_output",
    overwrite_output_dir=True,
    # The next line is important to ensure the dataset labels are properly passed to the model
    remove_unused_columns=False,
)

def compute_accuracy(p: EvalPrediction):
  preds = np.argmax(p.predictions, axis=1)
  return {"acc": (preds == p.label_ids).mean()}

trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_accuracy,
)
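
To see what `compute_accuracy` does with the model outputs, here is the same logic restated in plain Python on made-up numbers: take the index of the largest logit per row as the predicted class and compare it with the gold label. The logits and labels are invented for illustration only.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, label_ids):
    """Fraction of rows whose highest-scoring class matches the label."""
    preds = [argmax(row) for row in logits]
    correct = sum(pred == label for pred, label in zip(preds, label_ids))
    return correct / len(label_ids)


# One row of three class scores per sample (toy values)
toy_logits = [
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 0.9, 0.1],
    [1.2, 1.1, 0.0],
]
toy_labels = [0, 2, 1, 1]

acc = accuracy(toy_logits, toy_labels)
print(acc)
```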

Start the training 🚀 (this will take a while)

In [9]:
trainer.train()



TrainOutput(global_step=40, training_loss=0.8603550910949707, metrics={'train_runtime': 47.0613, 'train_samples_per_second': 26.561, 'train_steps_per_second': 0.85, 'total_flos': 148736480850000.0, 'train_loss': 0.8603550910949707, 'epoch': 5.0})
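
The `global_step=40` in this output can be reproduced with a quick calculation from the dataset size and the training arguments. The sketch assumes a single device, no gradient accumulation, and that the last, smaller batch of each epoch is kept.

```python
import math

train_samples = 250   # size of the CB training split
batch_size = 32       # per_device_train_batch_size
epochs = 5            # num_train_epochs
logging_steps = 200   # logging_steps

# The last batch of each epoch is smaller but still counts as one step
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
# Number of times the logging interval is reached during training
logging_events = total_steps // logging_steps

print(steps_per_epoch, total_steps, logging_events)
```

Since the run is shorter than `logging_steps`, no intermediate training loss is logged; a smaller value such as `logging_steps=10` would make the loss curve visible for a dataset of this size.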

After training has completed, let's check how well our setup performs on the validation set of our target dataset:

In [10]:
trainer.evaluate()

{'eval_loss': 0.8675440549850464,
 'eval_acc': 0.6607142857142857,
 'eval_runtime': 0.9295,
 'eval_samples_per_second': 60.25,
 'eval_steps_per_second': 2.152,
 'epoch': 5.0}

We can also use our setup to make some predictions (the example is from the test set of CB):

In [11]:
import torch

def predict(premise, hypothesis):
  encoded = tokenizer(premise, hypothesis, return_tensors="pt")
  # Move the inputs to the same device as the model
  encoded = encoded.to(model.device)
  # Inference only, so no gradients are needed
  with torch.no_grad():
    logits = model(**encoded)[0]
  pred_class = torch.argmax(logits, dim=-1).item()
  return id2label[pred_class]

predict("""
``It doesn't happen very often.'' Karen went home
happy at the end of the day. She didn't think that
the work was difficult.
""",
"the work was difficult"
)

'entailment'

Finally, we can extract and save our fusion layer as well as all the adapters we used for training. Both can later be reloaded into the pre-trained model again.

In [12]:
model.save_adapter_fusion("./saved", "multinli,qqp,qnli")
model.save_all_adapters("./saved")

!ls -l saved

total 83056
-rw-r--r-- 1 root root      427 Aug 24 13:49 adapter_fusion_config.json
drwxr-xr-x 2 root root     4096 Aug 24 13:49 multinli
-rw-r--r-- 1 root root 85031623 Aug 24 13:49 pytorch_model_adapter_fusion.bin
drwxr-xr-x 2 root root     4096 Aug 24 13:49 qnli
drwxr-xr-x 2 root root     4096 Aug 24 13:49 qqp
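
The size of `pytorch_model_adapter_fusion.bin` can be sanity-checked with a back-of-the-envelope estimate. The calculation below is only a sketch and rests on assumptions that are not stated in this notebook: one fusion layer per transformer layer, each holding three 768×768 projection matrices (query, key, value) stored as 32-bit floats, with biases and file format overhead ignored.

```python
num_layers = 12        # transformer layers in bert-base-uncased
hidden_size = 768      # hidden size of bert-base-uncased
num_projections = 3    # assumed: query, key and value matrices
bytes_per_float = 4    # assumed: float32 weights

fusion_parameters = num_layers * num_projections * hidden_size * hidden_size
estimated_bytes = fusion_parameters * bytes_per_float

actual_bytes = 85031623  # size reported by `ls -l` above
ratio = estimated_bytes / actual_bytes

print(fusion_parameters, estimated_bytes, round(ratio, 4))
```

Under these assumptions the estimate lands within about 0.2% of the reported file size, which suggests that the fusion weights account for nearly all of the file.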


That's it. Do check out [the paper on AdapterFusion](https://arxiv.org/pdf/2005.00247.pdf) for a more theoretical view on what we've just seen.

➡️ `adapters` also enables other composition methods beyond AdapterFusion. For example, check out [the next notebook in this series](https://colab.research.google.com/github/Adapter-Hub/adapters/blob/master/notebooks/04_Cross_Lingual_Transfer.ipynb) on cross-lingual transfer.