# Load and train adapters with HuggingFace PEFT

Parameter-Efficient Fine-Tuning (PEFT) methods freeze the pretrained model parameters during fine-tuning and add a small number of trainable parameters (the adapters) on top of them.

The adapters are trained to learn task-specific information. Because only the adapter parameters are updated, this approach uses far less memory and compute than full fine-tuning while producing results comparable to a fully fine-tuned model.
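
To make the saving concrete, here is a minimal sketch of the arithmetic for a LoRA-style adapter, which leaves a `d x k` weight matrix frozen and trains two small matrices of rank `r` instead. The dimensions and the rank below are illustrative assumptions, not values read from any particular model.

```python
# Illustrative dimensions (assumptions, not taken from a real checkpoint).
d, k = 1024, 1024   # shape of one frozen weight matrix
r = 8               # adapter rank

full_params = d * k         # parameters updated by full fine-tuning
lora_params = r * (d + k)   # B is d x r and A is r x k

fraction = lora_params / full_params
print(f"full fine-tuning: {full_params:,} trainable parameters")
print(f"LoRA (r={r}): {lora_params:,} trainable parameters")
print(f"fraction trained: {fraction:.2%}")
```

Under these assumptions the adapter trains under 2% of the parameters of the matrix it adapts.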

## Setup

In [None]:
!pip install peft

## Supported PEFT models

* Low-Rank Adaptation (LoRA)
* IA3
* AdaLoRA

## Load a PEFT adapter

To load and use a PEFT adapter model from the Transformers library, make sure the Hub repository or local directory contains an `adapter_config.json` file and the adapter weights. We can then load the PEFT adapter model with an `AutoModelFor` class. For example, to load a PEFT adapter model for causal language modeling:
1. Specify the PEFT model id.
2. Pass it to the `AutoModelForCausalLM` class.

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = 'ybelkada/opt-350m-lora'
model = AutoModelForCausalLM.from_pretrained(peft_model_id)


We can also load a PEFT adapter by calling the `load_adapter` method:

In [3]:
model_id = 'facebook/opt-350m'
peft_model_id = 'ybelkada/opt-350m-lora'

model = AutoModelForCausalLM.from_pretrained(model_id)
model.load_adapter(peft_model_id)
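
Both loading paths expect an `adapter_config.json` file and an adapter weights file. For a local directory, a quick sanity check might look like the sketch below. `looks_like_peft_adapter` is a hypothetical helper written for this example, not part of Transformers or PEFT, and the demonstration uses empty placeholder files rather than real weights.

```python
import json
import tempfile
from pathlib import Path

# File names used for adapter weights (safetensors, or the older .bin format).
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def looks_like_peft_adapter(directory):
    """Hypothetical helper: True if `directory` holds an adapter config and weights."""
    directory = Path(directory)
    has_config = (directory / "adapter_config.json").is_file()
    has_weights = any((directory / name).is_file() for name in WEIGHT_FILES)
    return has_config and has_weights


# Demonstrate on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    adapter_dir = Path(tmp)
    print(looks_like_peft_adapter(adapter_dir))  # empty directory

    (adapter_dir / "adapter_config.json").write_text(json.dumps({"peft_type": "LORA"}))
    (adapter_dir / "adapter_model.safetensors").write_bytes(b"")  # placeholder only
    print(looks_like_peft_adapter(adapter_dir))
```

The check only confirms that the files exist; it does not validate their contents.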

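## Load in 8-bit or 4-bit

The `bitsandbytes` integration supports 8-bit and 4-bit precision data types, which reduce the memory needed to load a model. Pass a `BitsAndBytesConfig` with `load_in_8bit=True` or `load_in_4bit=True` as the `quantization_config` argument of `from_pretrained()`, and set `device_map="auto"` to distribute the model across the available hardware: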

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

peft_model_id = 'ybelkada/opt-350m-lora'
# bitsandbytes quantization requires a GPU
model = AutoModelForCausalLM.from_pretrained(
    peft_model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map='auto',
)
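
As a rough guide to what lower precision buys, the sketch below estimates the memory taken by the weights alone at different data types. The parameter count is an assumption inferred from the model name, and the estimate ignores activations, quantization metadata, and any layers kept in higher precision.

```python
# Rough weight-memory estimate at different precisions.
# ASSUMPTION: about 350 million parameters, inferred from the model name only.
num_params = 350_000_000

bytes_per_param = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

estimates_gb = {
    dtype: num_params * size / 1e9
    for dtype, size in bytes_per_param.items()
}

for dtype, gb in estimates_gb.items():
    print(f"{dtype:>8}: ~{gb:.2f} GB for the weights alone")
```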

## Add a new adapter

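We can add a new adapter to a model that already has one, as long as the new adapter is the same type as the existing one. For example, start with a model that has a LoRA adapter attached: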

In [7]:
from peft import LoraConfig

model_id = 'facebook/opt-350m'
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    target_modules=['q_proj', 'k_proj'],
    init_lora_weights=False,
)

model.add_adapter(lora_config, adapter_name='adapter_1')

To add a new adapter:

In [8]:
# attach new adapter with same config
model.add_adapter(lora_config, adapter_name='adapter_2')

Now we can select which adapter to use:

In [9]:
text = "Hello world!"
inputs = tokenizer(text, return_tensors='pt')

# use adapter_1
model.set_adapter('adapter_1')
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# use adapter_2
model.set_adapter('adapter_2')
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))



Hello world!

I’m a newbie to the world of blogging. I
Hello world!

I’m a newbie to the world of blogging. I
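
Conceptually, every adapter shares the same frozen base weights and only the active adapter's low-rank update is applied. The toy class below sketches that idea on a 2x2 example in plain Python. `ToyAdapterLayer` is a hypothetical illustration, not how Transformers or PEFT implement adapters, and all weight values are made up.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyAdapterLayer:
    """Toy stand-in for a frozen linear layer with named low-rank adapters."""

    def __init__(self, base_weight):
        self.base_weight = base_weight   # frozen weights shared by all adapters
        self.adapters = {}               # name -> (B, A) low-rank factors
        self.active = None               # no adapter applied until one is selected

    def add_adapter(self, name, B, A):
        if name in self.adapters:
            raise ValueError(f"adapter {name!r} already exists")
        self.adapters[name] = (B, A)

    def set_adapter(self, name):
        if name not in self.adapters:
            raise ValueError(f"unknown adapter {name!r}")
        self.active = name

    def forward(self, x):
        out = matvec(self.base_weight, x)
        if self.active is not None:
            B, A = self.adapters[self.active]
            delta = matvec(B, matvec(A, x))   # low-rank update: B @ (A @ x)
            out = [o + d for o, d in zip(out, delta)]
        return out


layer = ToyAdapterLayer(base_weight=[[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("adapter_1", B=[[1.0], [0.0]], A=[[1.0, 1.0]])
layer.add_adapter("adapter_2", B=[[0.0], [2.0]], A=[[1.0, -1.0]])

x = [3.0, 1.0]
layer.set_adapter("adapter_1")
out_1 = layer.forward(x)
layer.set_adapter("adapter_2")
out_2 = layer.forward(x)
print(out_1, out_2)
```

Switching adapters changes only which pair of small matrices is used; the base weights are never copied or modified.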


## Enable and disable adapters

Once we have added an adapter to a model, we can enable or disable the adapter module.

In [16]:
from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import PeftConfig

model_id = 'facebook/opt-350m'
adapter_model_id = 'ybelkada/opt-350m-lora'
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "Hello"
inputs = tokenizer(text, return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_id)
peft_config = PeftConfig.from_pretrained(adapter_model_id)

# Initialize the adapter with random weights so that its effect is visible
peft_config.init_lora_weights = False



In [17]:
# Enable the adapter module
model.add_adapter(peft_config)
model.enable_adapters()
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))



Hello CO CO CO CO N N N N N N N N N N N N N N


In [18]:
# Disable the adapter module
model.disable_adapters()
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))



Hello, I'm a newbie to this sub. I'm looking for a good place to
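
The garbled output with the adapter enabled is expected, because `init_lora_weights=False` gives the adapter random weights. With LoRA's default initialization, one of the two low-rank matrices starts at zero, so a freshly added adapter leaves the model output unchanged and enabling or disabling it makes no visible difference. The sketch below shows that effect in plain Python with made-up 2x2 values rather than real model weights.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, enabled=True):
    """Frozen layer output, plus the low-rank update B @ (A @ x) when enabled."""
    out = matvec(W, x)
    if enabled:
        delta = matvec(B, matvec(A, x))
        out = [o + d for o, d in zip(out, delta)]
    return out


W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weights (made-up values)
A = [[0.5, -0.5]]              # rank-1 factor, shape 1 x 2
B_zero = [[0.0], [0.0]]        # default-style init: B starts at zero
B_random = [[0.7], [-1.3]]     # stands in for init_lora_weights=False

x = [1.0, 3.0]
base = lora_forward(W, A, B_zero, x, enabled=False)
zero_init = lora_forward(W, A, B_zero, x, enabled=True)
random_init = lora_forward(W, A, B_random, x, enabled=True)

print("zero-init adapter matches base:", base == zero_init)
print("random-init adapter matches base:", base == random_init)
```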


## Train a PEFT adapter

PEFT adapters are supported by the `Trainer` class so that we can train an adapter for our specific use case.

1. Define our adapter configuration with the task type and hyperparameters

In [19]:
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias='none',
    task_type='CAUSAL_LM',
)

2. Add the adapter to the model

In [21]:
model.add_adapter(peft_config, adapter_name='test_adapter')

3. Pass the model to `Trainer`

In [None]:
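from transformers import Trainer

trainer = Trainer(
    model=model,
    ...  # remaining Trainer arguments, such as the training arguments and datasets
)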

trainer.train()
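
The `r` and `lora_alpha` values from step 1 interact: standard LoRA multiplies the low-rank update by `lora_alpha / r`, and `r` sets how many parameters each adapted matrix trains. A small sketch of both quantities follows. The matrix size is an assumption chosen for illustration, not a value read from the model.

```python
# Values copied from the LoraConfig in step 1.
lora_alpha = 16
r = 64

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_alpha / r
print(f"scaling = {scaling}")

# Trainable parameters per adapted matrix at this rank.
# ASSUMPTION: a square 1024 x 1024 projection, chosen for illustration only.
d = k = 1024
params_per_matrix = r * (d + k)
print(f"{params_per_matrix:,} trainable parameters per adapted matrix")
```

Raising `r` while keeping `lora_alpha` fixed therefore increases capacity but shrinks the scale applied to the update.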

To save our trained adapter and load it back:

In [None]:
model.save_pretrained(save_directory)
model = AutoModelForCausalLM.from_pretrained(save_directory)

## Add additional trainable layers to a PEFT adapter

We can also train additional layers alongside an adapter by passing `modules_to_save` in our PEFT config. The listed modules are made trainable and saved together with the adapter weights.

For example, if we want to also fine-tune the `lm_head` on top of a model with a LoRA adapter:

In [None]:
from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import LoraConfig

model_id = 'facebook/opt-350m'
model = AutoModelForCausalLM.from_pretrained(model_id)

lora_config = LoraConfig(
    target_modules=['q_proj', 'k_proj'],
    modules_to_save=['lm_head'],
)

model.add_adapter(lora_config, adapter_name='adapter_lm_head')
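
Keep in mind that a module listed in `modules_to_save` is trained in full, so it can easily outweigh the adapter itself. The sketch below compares the two under illustrative assumptions. None of the sizes are read from the checkpoint, and the rank is PEFT's default of 8, since the config above does not set `r`.

```python
# All sizes below are illustrative assumptions, not read from the checkpoint.
hidden_size = 1024     # width of each attention projection
num_layers = 24        # number of transformer blocks
vocab_size = 50_000    # rows in the output projection (lm_head)
r = 8                  # LoRA rank (PEFT's default when r is not set)

# LoRA on q_proj and k_proj: two adapted matrices per layer.
lora_params = num_layers * 2 * r * (hidden_size + hidden_size)

# modules_to_save trains a full copy of lm_head (bias ignored here).
lm_head_params = vocab_size * hidden_size

ratio = lm_head_params / lora_params
print(f"LoRA adapters: {lora_params:,}")
print(f"lm_head copy:  {lm_head_params:,}")
print(f"lm_head is {ratio:.1f}x larger")
```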