# Neural Network Adapters for faster low-memory fine-tuning

This tutorial covers the SpeechBrain implementation of adapters such as LoRA. This includes how to integrate either SpeechBrain implemented adapters, custom adapters, and adapters from libraries such as PEFT into a pre-trained model.

## Prerequisite
- [Speech Recognition From Scratch](https://speechbrain.readthedocs.io/en/latest/tutorials/tasks/speech-recognition-from-scratch.html)

## Introduction and Background

As pre-trained models become larger and more capable, there is growing interest in methods for adapting them for specific tasks in a memory-efficient way, within a reasonable time span. One such technique is freezing the original parameters and inserting a small number of additional parameters into the original model, which are called "adapters." These adapters can often match the performance of full fine-tuning at a fraction of the parameter count, meaning faster and more memory-efficient fine-tuning [1]. One popular technique for doing this is known as Low-Rank Adaptation (LoRA) [2].

On the software side, HuggingFace has produced a popular library for adapters called PEFT [3]. Our implementation includes some of the features of this library, as well as including the ability to integrate PEFT adapters into a SpeechBrain model. To explore this further, let's proceed with the installation of SpeechBrain.

### Relevant bibliography
1. N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, "Parameter-efficient transfer learning for NLP." In *International Conference on Machine Learning*, 2019.
2. E.J. Hu, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models." In *International Conference on Learning Representations*, 2021.
3. S. Mangrulkar, S. Gugger, L. Debut, Y. Belkada, S. Paul, and  B. Bossan, "PEFT: State-of-the-art parameter-efficient fine-tuning methods." *GitHub Repository*, 2022.


In [1]:
!git clone --depth 1 https://github.com/speechbrain/speechbrain.git
!python -m pip install -e .

Cloning into 'speechbrain'...
remote: Enumerating objects: 1693, done.[K
remote: Counting objects: 100% (1693/1693), done.[K
remote: Compressing objects: 100% (1210/1210), done.[K
remote: Total 1693 (delta 402), reused 1062 (delta 318), pack-reused 0 (from 0)[K
Receiving objects: 100% (1693/1693), 23.88 MiB | 20.80 MiB/s, done.
Resolving deltas: 100% (402/402), done.
/home/competerscience/Documents/uvenv/bin/python: No module named pip


## Simple Fine-tuning

We'll first show how to use adapters on a template recipe, which includes everything necessary for full training.

In [2]:
%cd speechbrain/templates/speech_recognition/ASR

/home/competerscience/Documents/Repositories/speechbrain/docs/tutorials/nn/speechbrain/templates/speech_recognition/ASR


  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]


In [3]:
!python train.py train.yaml --number_of_epochs=1 --batch_size=2 --test_scorer "!ref <valid_scorer>" --enable_add_reverb=False --enable_add_noise=False #To speed up

torchvision is not available - cannot save figures
  wrapped_fwd = torch.cuda.amp.custom_fwd(fwd, cast_inputs=cast_inputs)
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/CRDNN_BPE_960h_LM/2602
mini_librispeech_prepare - Preparation completed in previous run, skipping.
../data/noise/data.zip exists. Skipping download
../data/rir/data.zip exists. Skipping download
speechbrain.utils.fetching - Fetch lm.ckpt: Using existing file/symlink in results/CRDNN_BPE_960h_LM/2602/save/lm.ckpt.
speechbrain.utils.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in results/CRDNN_BPE_960h_LM/2602/save/tokenizer.ckpt.
speechbrain.utils.fetching - Fetch asr.ckpt: Using existing file/symlink in results/CRDNN_BPE_960h_LM/2602/save/model.ckpt.
speechbrain.utils.parameter_transfer - Loading pretrained files for: lm, tokenizer, model
  state_dict = torch.load(path, map_location=device)
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is use

## Inference

To prove that this is working, let's just perform inference on one file. This code taken from `transcribe_file.py`

In [3]:
import os

from speechbrain.inference.ASR import EncoderDecoderASR
from speechbrain.utils.fetching import fetch

# Ensure all the needed files end up in the same place to load with the transcriber
save_dir = os.path.abspath("results/CRDNN_BPE_960h_LM/2602/save/CKPT+latest")
fetch("lm.ckpt", "speechbrain/asr-crdnn-rnnlm-librispeech", save_dir)
fetch("tokenizer.ckpt", "speechbrain/asr-crdnn-rnnlm-librispeech", save_dir)
fetch("inference.yaml", os.getcwd(), save_dir)

transcriber = EncoderDecoderASR.from_hparams(source=save_dir, hparams_file="inference.yaml")
speech_file = "../data/LibriSpeech/dev-clean-2/1272/135031/1272-135031-0015.flac"
transcriber.transcribe_file(speech_file)

INFO:speechbrain.utils.fetching:Fetch lm.ckpt: Fetching from HuggingFace Hub 'speechbrain/asr-crdnn-rnnlm-librispeech' if not cached
INFO:speechbrain.utils.fetching:Fetch tokenizer.ckpt: Fetching from HuggingFace Hub 'speechbrain/asr-crdnn-rnnlm-librispeech' if not cached
INFO:speechbrain.utils.fetching:Fetch inference.yaml: Using existing file/symlink in /home/competerscience/Documents/Repositories/speechbrain/docs/tutorials/nn/speechbrain/templates/speech_recognition/ASR/results/CRDNN_BPE_960h_LM/2602/save/CKPT+latest/inference.yaml
  wrapped_fwd = torch.cuda.amp.custom_fwd(fwd, cast_inputs=cast_inputs)
INFO:speechbrain.utils.parameter_transfer:Loading pretrained files for: lm, tokenizer, model, normalizer
  state_dict = torch.load(path, map_location=device)
  stats = torch.load(path, map_location=device)


'THE METAL FOREST IS IN THE GREAT DOMED CAVERN THE LARGEST IN ALL OUR DOMINIANS REPLIED CALIGO ⁇ '

## Adding adapters

So now that we've proved that the model is at least working, let's go ahead and add adapters. We basically need to create a new yaml file adding adapters to the model and then train with this new yaml file. To do this we'll just load the old yaml file and then we'll change all the parts necessary to train the adapted model.

In [18]:
with open("train.yaml") as f:
    train_yaml = f.read()

train_yaml = train_yaml.replace("seed: 2602", "seed: 4324")
train_yaml = train_yaml.replace("output_folder: !ref results/CRDNN_BPE_960h_LM/<seed>", "output_folder: !ref results/crdnn_lora/<seed>")
train_yaml = train_yaml.replace("pretrained_path: speechbrain/asr-crdnn-rnnlm-librispeech", "pretrained_path: " + save_dir)
train_yaml = train_yaml.replace("model: !new:torch.nn.ModuleList", "model_pretrained: !new:torch.nn.ModuleList")

# We aren't using the LM so remove it purely for accurate parameter counts
train_yaml = train_yaml.replace("""
modules:
    encoder: !ref <encoder>
    embedding: !ref <embedding>
    decoder: !ref <decoder>
    ctc_lin: !ref <ctc_lin>
    seq_lin: !ref <seq_lin>
    normalize: !ref <normalize>
    lm_model: !ref <lm_model>
""","""
modules:
    encoder: !ref <encoder>
    embedding: !ref <embedding>
    decoder: !ref <decoder>
    ctc_lin: !ref <ctc_lin>
    seq_lin: !ref <seq_lin>
    normalize: !ref <normalize>
""")

# Update load the old trained model to the `pretrained_model` object
train_yaml = train_yaml.replace("""
    loadables:
        lm: !ref <lm_model>
        tokenizer: !ref <tokenizer>
        model: !ref <model>
    paths:
        lm: !ref <pretrained_path>/lm.ckpt
        tokenizer: !ref <pretrained_path>/tokenizer.ckpt
        model: !ref <pretrained_path>/asr.ckpt
""","""
    loadables:
        lm: !ref <lm_model>
        tokenizer: !ref <tokenizer>
        model: !ref <model_pretrained>
    paths:
        lm: !ref <pretrained_path>/lm.ckpt
        tokenizer: !ref <pretrained_path>/tokenizer.ckpt
        model: !ref <pretrained_path>/model.ckpt
"""
)

# And now for adding the adapted model
train_yaml += """
new_encoder: !new:speechbrain.nnet.adapters.AdaptedModel
    model_to_adapt: !ref <encoder>
    adapter_class: !name:speechbrain.nnet.adapters.LoRA
    manual_adapter_insertion: True
    adapter_kwargs:
        rank: 8

new_decoder: !new:speechbrain.nnet.adapters.AdaptedModel
    model_to_adapt: !ref <decoder>
    adapter_class: !name:speechbrain.nnet.adapters.LoRA
    manual_adapter_insertion: True
    adapter_kwargs:
        rank: 8

model: !new:torch.nn.ModuleList
    - - !ref <new_encoder>
      - !ref <embedding>
      - !ref <new_decoder>
      - !ref <ctc_lin>
      - !ref <seq_lin>
"""

with open("train_lora.yaml", "w") as f:
    f.write(train_yaml)

In [30]:
# We have to add two lines to the train file as well
with open("train.py") as f:
    train_py = f.read()

train_py = train_py.replace("""
    hparams["pretrainer"].load_collected()
""","""
    hparams["pretrainer"].load_collected()
    hparams["new_encoder"].insert_adapters()
    hparams["new_decoder"].insert_adapters()
""")

with open("train_lora.py", "w") as f:
    f.write(train_py)

## Training the adapted model

Training works identically to before, using the updated lora file. The adapted model is designed to work as an in-place replacement. Notice how the number of trainable parameters is reduced to close to 1% of the original parameters.

In [31]:
!python train_lora.py train_lora.yaml --number_of_epochs=1 --batch_size=2 --test_scorer "!ref <valid_scorer>" --enable_add_reverb=False --enable_add_noise=False #To speed up

INFO:speechbrain.utils.seed:Setting seed to 4324
  wrapped_fwd = torch.cuda.amp.custom_fwd(fwd, cast_inputs=cast_inputs)
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/crdnn_lora/4324
mini_librispeech_prepare - Preparation completed in previous run, skipping.
../data/noise/data.zip exists. Skipping download
../data/rir/data.zip exists. Skipping download
speechbrain.utils.parameter_transfer - Loading pretrained files for: lm, tokenizer, model
  state_dict = torch.load(path, map_location=device)
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - Gradscaler enabled: False. Using precision: fp32.
  self.scaler = torch.cuda.amp.GradScaler(enabled=gradscaler_enabled)
speechbrain.core - ASR Model Statistics:
* Total Number of Trainable Parameters: 1.9M
* Total Number of Parameters: 120.1M
* Trainable Parameters represent 1.5715% of the total size.
speechbrain.utils.checkpoints - Loading a checkpoint from resul

## Custom adapter

We designed this so that you could replace the SpeechBrain adapter with a `peft` adapter:

```diff
new_encoder: !new:speechbrain.nnet.adapters.AdaptedModel
    model_to_adapt: !ref <encoder>
-   adapter_class: !name:speechbrain.nnet.adapters.LoRA
+   adapter_class: !name:peft.tuners.lora.layer.Linear
    manual_adapter_insertion: True
    adapter_kwargs:
-       rank: 16
+       r: 16
+       adapter_name: lora
```

But this trains exactly the same thing as before, so no need for us to go through the whole thing. Perhaps more interesting is designing a custom adapter:

In [32]:
%%file conv_lora.py

import torch

class Conv2dLoRA(torch.nn.Module):
    def __init__(self, target_module, kernel_size=3, stride=2, channels=16):
        super().__init__()

        # Disable gradient for pretrained module
        self.pretrained_module = target_module
        for param in self.pretrained_module.parameters():
            param.requires_grad = False
        device = target_module.weight.device

        self.adapter_down_conv = torch.nn.Conv2D(
            in_channels=1, out_channels=channels, padding="same", stride=2, bias=False, device=device
        )
        self.adapter_up_scale = torch.nn.Upscale(scale_factor=2)
        self.adapter_up_conv = torch.nn.Conv2D(
            in_channels=channels, out_channels=1, padding="same", bias=False, device=device
        )


    def forward(self, x: torch.Tensor):
        """Applies the LoRA Adapter.

        Arguments
        ---------
        x: torch.Tensor
            Input tensor to the adapter module.

        Returns
        -------
        The linear outputs
        """
        x_pretrained = self.pretrained_module(x)
        x_conv_lora = self.adapter_up_conv(self.adapter_up_scale(self.adapter_down_conv(x)))

        return x_pretrained + x_conv_lora * self.scaling

Writing conv_lora.py


In [35]:
# Change the adapter out
train_yaml = train_yaml.replace("output_folder: !ref results/crdnn_lora/<seed>", "output_folder: !ref results/crdnn_conv_lora/<seed>")

# We aren't using the LM so remove it purely for accurate parameter counts
train_yaml = train_yaml.replace("""
modules:
    encoder: !ref <encoder>
    embedding: !ref <embedding>
    decoder: !ref <decoder>
    ctc_lin: !ref <ctc_lin>
    seq_lin: !ref <seq_lin>
    normalize: !ref <normalize>
    lm_model: !ref <lm_model>
""","""
modules:
    encoder: !ref <encoder>
    embedding: !ref <embedding>
    decoder: !ref <decoder>
    ctc_lin: !ref <ctc_lin>
    seq_lin: !ref <seq_lin>
    normalize: !ref <normalize>
""")

train_yaml.replace("""
    adapter_class: !name:speechbrain.nnet.adapters.LoRA
    adapter_kwargs:
        rank: 16
""", """
    adapter_class: !name:conv_lora.Conv2dLoRA
    adapter_kwargs:
        kernel_size: 3
        stride: 2
        channels: 16
""")

with open("train_conv_lora.yaml", "w") as f:
    f.write(train_yaml)

In [36]:
!python train_lora.py train_conv_lora.yaml --number_of_epochs=1 --batch_size=2 --test_scorer "!ref <valid_scorer>" --enable_add_reverb=False --enable_add_noise=False #To speed up

INFO:speechbrain.utils.seed:Setting seed to 4324
  wrapped_fwd = torch.cuda.amp.custom_fwd(fwd, cast_inputs=cast_inputs)
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/crdnn_conv_lora/4324
mini_librispeech_prepare - Preparation completed in previous run, skipping.
../data/noise/data.zip exists. Skipping download
../data/rir/data.zip exists. Skipping download
speechbrain.utils.parameter_transfer - Loading pretrained files for: lm, tokenizer, model
  state_dict = torch.load(path, map_location=device)
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - Gradscaler enabled: False. Using precision: fp32.
  self.scaler = torch.cuda.amp.GradScaler(enabled=gradscaler_enabled)
speechbrain.core - ASR Model Statistics:
* Total Number of Trainable Parameters: 1.7M
* Total Number of Parameters: 119.9M
* Trainable Parameters represent 1.3897% of the total size.
speechbrain.utils.checkpoints - Would load a checkpoint he

## Conclusion

That's it, thanks for following along! Go forth and make cool adapters.