# 1️⃣ Training an Adapter for a Transformer Model

In this notebook, we will train an adapter for a **RoBERTa** ([Liu et al., 2019](https://arxiv.org/pdf/1907.11692.pdf)) model for sequence classification on a **sentiment analysis** task using the _[Adapters](https://github.com/Adapter-Hub/adapters)_ library and Hugging Face's _Transformers_ library.

We will train a **[bottleneck adapter](https://docs.adapterhub.ml/methods.html#bottleneck-adapters)** on top of a pre-trained model. Most of the code is identical to a full fine-tuning setup using _Transformers_; the only difference is how the model is trained.

For training, we will use English sentiment classification data in which each text is labeled as either positive or negative. We download the dataset via Hugging Face's [_Datasets_](https://github.com/huggingface/datasets) library.

You can also open this example in Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ivanvykopal/peft-kinit-2025/blob/master/examples/adapters/01_Adapter_Training.ipynb)

## Installation

First, let's install the required libraries:

In [None]:
!pip install -qq adapters datasets accelerate

## Dataset Preprocessing

Before we start training our adapter, we first need to prepare the training data.

We will use data from the [Massive Text Embedding Benchmark](https://aclanthology.org/2023.eacl-main.148/); specifically, we will download its English sentiment classification data.

Our training dataset can be loaded via Hugging Face's _Datasets_ library with a single line of code:

In [None]:
from datasets import load_dataset

dataset = load_dataset("mteb/multilingual-sentiment-classification", 'eng')
dataset.num_rows

Every dataset sample has an input text and a binary label:

In [None]:
dataset['train'][0]

Now, we need to encode all dataset samples to valid inputs for our Transformer model. Since we want to train on `roberta-base`, we load the corresponding `RobertaTokenizer`. Using `dataset.map()`, we can pass the full dataset through the tokenizer in batches:

In [None]:
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("FacebookAI/roberta-base")

def encode_batch(batch):
  """Encodes a batch of input data using the model tokenizer."""
  return tokenizer(batch["text"], max_length=128, truncation=True, padding="max_length")

# Encode the input data
dataset = dataset.map(encode_batch, batched=True)
# The transformers model expects the target class column to be named "labels"
dataset = dataset.rename_column(original_column_name="label", new_column_name="labels")
# Transform to pytorch tensors and only output the required columns
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

In [None]:
dataset['train'][0]
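To make the effect of `max_length=128, truncation=True, padding="max_length"` concrete, here is a simplified pure-Python sketch of what happens to a single sequence. It assumes RoBERTa's pad token id is 1 and uses hypothetical token ids; `pad_and_truncate` is an illustrative helper, not a tokenizer API. Unlike this sketch, the real tokenizer truncates before adding special tokens, so the end-of-sequence token is kept.

```python
# Simplified sketch of fixed-length encoding (hypothetical helper, not the tokenizer's code).
# Assumption: the pad token id is 1, as for RoBERTa.
def pad_and_truncate(token_ids, max_length, pad_id=1):
    """Cut `token_ids` to `max_length`, then right-pad it with `pad_id`."""
    ids = list(token_ids[:max_length])        # truncation: drop tokens beyond max_length
    attention_mask = [1] * len(ids)           # 1 marks a real token
    num_pad = max_length - len(ids)
    ids += [pad_id] * num_pad                 # padding up to max_length
    attention_mask += [0] * num_pad           # 0 marks padding, which attention ignores
    return {"input_ids": ids, "attention_mask": attention_mask}

# Hypothetical token ids, shortened to max_length=6 for readability
short_seq = pad_and_truncate([0, 713, 16, 2], max_length=6)
long_seq = pad_and_truncate(list(range(10)), max_length=6)
print(short_seq)
print(long_seq)
```

Padding every sample to the same length is what allows `set_format(type="torch", ...)` to return fixed-shape tensors that can be stacked into batches directly.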

Now we're ready to train our model...

## Training

We will use a pre-trained RoBERTa model checkpoint from the Hugging Face Hub, specifically [RoBERTa Base](https://huggingface.co/FacebookAI/roberta-base). We load it with [`AutoAdapterModel`](https://docs.adapterhub.ml/classes/models/auto.html), a class unique to `adapters`. On top of the functionality of regular _Transformers_ classes, this class provides all sorts of adapter-specific features, allowing flexible management and configuration of multiple adapters and prediction heads. [Learn more](https://docs.adapterhub.ml/prediction_heads.html#adaptermodel-classes).

In [None]:
from transformers import RobertaConfig
from adapters import AutoAdapterModel

config = RobertaConfig.from_pretrained(
    "FacebookAI/roberta-base",
    num_labels=2,
)
model = AutoAdapterModel.from_pretrained(
    "FacebookAI/roberta-base",
    config=config,
)

**Here comes the important part!**

We add a new adapter to our model by calling `add_adapter()`. We pass a name (`"sentiment"`) and an adapter configuration. `"seq_bn"` denotes a [sequential bottleneck adapter](https://docs.adapterhub.ml/methods.html#bottleneck-adapters) configuration.


The **sequential bottleneck adapter** follows a simple yet effective architecture:

<img src="https://raw.githubusercontent.com/ivanvykopal/peft-kinit-2025/heads/master/images/adapter.png" alt="Sequential bottleneck adapter architecture" width="400"/>

It consists of three main components arranged sequentially:
1. A feedforward down-projection layer, which reduces the dimensionality of the input;
2. A non-linear activation function (typically ReLU or GELU);
3. A feedforward up-projection layer, which projects the representation back to the original dimensionality.

The output of this sequence is then added back to the original input via a residual connection, allowing the adapter to refine representations without disrupting the underlying pre-trained model. This structure enables efficient adaptation to new tasks while keeping the number of trainable parameters low.
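The three steps plus the residual connection can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the _Adapters_ implementation: it assumes hidden size 768 (as in `roberta-base`), a reduction factor of 16, ReLU, and random weights. The library's module may additionally apply layer normalization and its own initialization scheme.

```python
import numpy as np

# Minimal sketch of a sequential bottleneck adapter forward pass.
# Assumptions: hidden size 768, reduction factor 16, ReLU, random weights.
rng = np.random.default_rng(0)
hidden_size, reduction_factor = 768, 16
bottleneck = hidden_size // reduction_factor  # 48

W_down = rng.normal(scale=0.02, size=(hidden_size, bottleneck))
b_down = np.zeros(bottleneck)
W_up = rng.normal(scale=0.02, size=(bottleneck, hidden_size))
b_up = np.zeros(hidden_size)

def bottleneck_adapter(h):
    """Map hidden states of shape (seq_len, hidden_size) to the same shape."""
    z = h @ W_down + b_down      # 1. down-projection to the bottleneck size
    z = np.maximum(z, 0.0)       # 2. ReLU non-linearity
    z = z @ W_up + b_up          # 3. up-projection back to hidden_size
    return h + z                 # residual connection keeps the original signal

h = rng.normal(size=(5, hidden_size))   # 5 token representations
out = bottleneck_adapter(h)
print(out.shape)
```

If the up-projection outputs zeros, the block reduces to the identity. This is why adapters are commonly initialized close to the identity, so that training starts from the unmodified pre-trained model's behavior.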


### Adapters library

The _Adapters_ library supports a wide range of adapter configurations. For example, you can pass `config="lora"` to train a [LoRA](https://docs.adapterhub.ml/methods.html#lora) adapter, `config="prefix_tuning"` for [prefix tuning](https://docs.adapterhub.ml/methods.html#prefix-tuning), or `config="loreft"` for [LoReFT](https://docs.adapterhub.ml/methods.html#reft). You can find all currently supported configs [here](https://docs.adapterhub.ml/methods.html).
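To get a feel for how different configurations affect the trainable-parameter budget, here is a back-of-the-envelope count for a 12-layer model with hidden size 768 (`roberta-base`). The hyperparameters are assumptions, not verified library defaults: `"seq_bn"` is taken to insert one adapter per layer with reduction factor 16, and `"lora"` to use rank 8 on the query and value projections only. Check `SeqBnConfig` and `LoRAConfig` in the _Adapters_ documentation for the defaults of your installed version. The helper functions are hypothetical.

```python
# Rough trainable-parameter counts (hypothetical helpers; assumed hyperparameters).
# Classification head and layer-norm parameters are not included.
def seq_bn_params(hidden, layers, reduction_factor=16):
    b = hidden // reduction_factor
    per_layer = (hidden * b + b) + (b * hidden + hidden)  # down- and up-projection, with biases
    return layers * per_layer

def lora_params(hidden, layers, r=8, matrices=2):
    per_matrix = r * hidden + hidden * r  # low-rank factors A (r x hidden) and B (hidden x r)
    return layers * matrices * per_matrix

print(seq_bn_params(768, 12))  # 894528
print(lora_params(768, 12))    # 294912
```

Under these assumptions, both options train well under 1M parameters, compared with roughly 125M in the full `roberta-base` model.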

Next, we will add a binary classification head. It's convenient to give the prediction head the same name as the adapter. This allows us to activate both together in the next step. The `train_adapter()` method does two things:

1. It freezes all weights of the pre-trained model, so only the adapter weights are updated during training.
2. It activates the adapter and the prediction head such that both are used in every forward pass.
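The freezing step (1.) can be illustrated with a toy dictionary that maps parameter names to "requires gradient" flags. This is only a conceptual sketch: the parameter names and the `freeze_all_but` helper are hypothetical, matching by name substring is a simplification, and the activation step (2.) is not modeled.

```python
# Toy stand-in for a model's parameters: name -> "requires gradient" flag.
# The names are hypothetical and only loosely resemble those of a real adapter model.
params = {
    "roberta.encoder.layer.0.attention.self.query.weight": True,
    "roberta.encoder.layer.0.output.adapters.sentiment.adapter_down.weight": True,
    "heads.sentiment.1.weight": True,
}

def freeze_all_but(params, name):
    """Return new flags where only parameters belonging to `name` stay trainable."""
    trainable = {}
    for param_name in params:
        # Pre-trained weights are frozen; weights of the adapter and head called
        # `name` (matched here by a ".name." substring) remain trainable.
        trainable[param_name] = f".{name}." in param_name
    return trainable

trainable = freeze_all_but(params, "sentiment")
for param_name, flag in trainable.items():
    print(f"{'trainable' if flag else 'frozen   '}  {param_name}")
```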

In [None]:
# Add a new adapter; here, a sequential bottleneck adapter ("seq_bn")
model.add_adapter("sentiment", config="seq_bn")
# Alternatively, you can add a different type of adapter, e.g.:
# model.add_adapter("sentiment", config="lora")

# Add a matching binary classification head with the same name
model.add_classification_head(
    "sentiment",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
)

# Activate the adapter for training
model.train_adapter("sentiment")

To train the adapter, we use the `AdapterTrainer` class built into _Adapters_. This class is largely identical to the _Transformers_ `Trainer`, with some helpful tweaks, e.g. for checkpointing only the adapter weights.

We configure the training process using a `TrainingArguments` object and define a function that computes the evaluation accuracy. We pass both, together with the training and validation splits of our dataset, to the trainer instance.

**Note the differences in hyperparameters compared to full fine-tuning.** Adapter training usually requires a higher learning rate (here `1e-4`) and a few more training epochs than full fine-tuning.
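Since extra epochs translate directly into extra optimizer steps, it helps to estimate the training length implied by the arguments below. This sketch assumes a single device, no gradient accumulation, a partial last batch that is kept, and a hypothetical training-set size. `total_train_steps` is an illustrative helper, not a library function.

```python
import math

# Optimizer steps implied by batch size 32 and 4 epochs (hypothetical helper).
# Assumptions: one device, no gradient accumulation, partial last batch kept.
def total_train_steps(num_examples, batch_size=32, epochs=4):
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs

n_train = 10_000  # hypothetical; use dataset["train"].num_rows in practice
steps = total_train_steps(n_train)
print(steps)         # 1252
print(steps // 200)  # number of log entries with logging_steps=200: 6
```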

In [None]:
import numpy as np
from transformers import TrainingArguments, EvalPrediction
from adapters import AdapterTrainer

training_args = TrainingArguments(
    learning_rate=1e-4,
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=200,
    output_dir="./training_output_sentiment",
    overwrite_output_dir=True,
    # The next line is important to ensure the dataset labels are properly passed to the model
    remove_unused_columns=False,
    report_to="none"
)

def compute_accuracy(p: EvalPrediction):
  preds = np.argmax(p.predictions, axis=1)
  return {"acc": (preds == p.label_ids).mean()}

trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_accuracy,
)
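To see what `compute_accuracy` returns, we can call the same logic on toy logits. As an assumption for this standalone check, a `SimpleNamespace` stands in for `EvalPrediction` and exposes the same `predictions` (logits) and `label_ids` attributes. The logits and labels are made up.

```python
import numpy as np
from types import SimpleNamespace

# Same accuracy logic as above; SimpleNamespace mimics EvalPrediction's attributes.
def compute_accuracy(p):
    preds = np.argmax(p.predictions, axis=1)        # predicted class = index of largest logit
    return {"acc": (preds == p.label_ids).mean()}   # fraction of correct predictions

fake = SimpleNamespace(
    predictions=np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]),  # toy logits
    label_ids=np.array([0, 1, 0, 0]),
)
print(compute_accuracy(fake)["acc"])  # 0.75
```

Three of the four argmax predictions (`[0, 1, 1, 0]`) match the labels, which gives an accuracy of 0.75.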

Start the training 🚀 (It will take approximately 6-8 minutes).

In [None]:
trainer.train()

Looks good! Let's evaluate our adapter on the validation split of the dataset to see how well it learned:

In [None]:
trainer.evaluate()

We can put our trained model into a _Transformers_ pipeline to be able to make new predictions conveniently:

In [None]:
from transformers import TextClassificationPipeline

classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, device=training_args.device.index)

classifier("This is not awesome!")
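For a single-label classifier, the pipeline turns the model's two logits into a `{"label": ..., "score": ...}` entry roughly as sketched below: a softmax over the classes, followed by the most probable class. This is a simplified sketch, not the pipeline's actual code. It assumes the `id2label` mapping we gave the head, and `postprocess` is a hypothetical helper.

```python
import numpy as np

# Simplified post-processing for a single-label classifier (hypothetical helper).
id2label = {0: "negative", 1: "positive"}  # mapping assumed from our classification head

def postprocess(logits):
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())   # subtract the max for numerical stability
    probs = exp / exp.sum()               # softmax over the two classes
    best = int(probs.argmax())
    return {"label": id2label[best], "score": float(probs[best])}

print(postprocess([2.0, -1.0]))  # hypothetical logits for one input text
```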

Finally, we can extract the adapter from our model and save it separately for later reuse. Note the size difference compared to a full model!

In [None]:
model.save_adapter("./final_adapter", "sentiment")

!ls -lh final_adapter
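The size difference follows from the parameter counts. As a rough estimate, assume every weight is stored as a 32-bit float (4 bytes) and ignore file-format overhead. The counts used here are illustrative: roughly 125M parameters for `roberta-base`, and a hypothetical ~0.9M for a bottleneck adapter. `approx_size_mb` is an illustrative helper.

```python
# Rough on-disk size from a parameter count (hypothetical helper).
# Assumptions: float32 weights (4 bytes each), no file-format overhead, 1 MB = 1e6 bytes.
def approx_size_mb(num_params, bytes_per_param=4):
    return num_params * bytes_per_param / 1e6

print(approx_size_mb(125_000_000))  # full roberta-base, approx.: 500.0
print(approx_size_mb(900_000))      # hypothetical bottleneck adapter: 3.6
```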

## References

This tutorial was inspired by the [AdapterHub](https://adapterhub.ml) project and its associated codebase. In particular, the implementation is based on the official example notebook: [01_Adapter_Training.ipynb](https://github.com/adapter-hub/adapters/blob/main/notebooks/01_Adapter_Training.ipynb).

**Citations:**

[1] Muennighoff et al. (2023). [**Massive Text Embedding Benchmark (MTEB)**](https://aclanthology.org/2023.eacl-main.148/)  
[2] Houlsby et al. (2019). [**Parameter-Efficient Transfer Learning for NLP** (Sequential Bottleneck Adapter)](https://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf)  
[3] Liu et al. (2019). [**RoBERTa: A Robustly Optimized BERT Pretraining Approach**](https://arxiv.org/pdf/1907.11692.pdf)
