# Prompt Tuning for Text Generation

Inspired by [Pere Martra](https://github.com/peremartra)'s work: how to apply prompt tuning with the PEFT library to a pre-trained model.

Refer to the Hugging Face [documentation](https://huggingface.co/docs/peft/main/en/index#supported-methods) for a complete list of models compatible with PEFT.

**Task:**
- Train two different models on **two datasets**, both built on a single pre-trained model from the Bloom family.
    - One model will be trained with a dataset of prompts.
    - The other model will use a dataset of inspirational sentences.
- Compare the results for the same question from both models before and after training.
- Explore how to load both models with only one copy of the foundational model in memory.

## Introduction: Prompt Tuning

**Prompt Tuning** is an **additive** Fine-Tuning technique for models:
- **Not modified:** the weights of the original model stay frozen.
- **Trained:** additional parameters that are added to the model.
  - These parameters are **related** to the **prompt**: they are the embeddings of virtual tokens added to the input.
  - The result is a kind of **superprompt**: the model learns to enhance a portion of the prompt with its acquired knowledge, and that learned portion cannot be translated into natural language.
  - **Goal:** generate highly **effective prompts** by adjusting the weights integrated into the prompt so that the loss function is minimized.

Pros:
- The number of parameters to train is genuinely small => faster and more cost-effective **training**!
- The weights of the pretrained model remain unchanged => **no** catastrophic **forgetting**!
- Various models can be trained, and during **inference**, only **one foundation** model needs to be loaded along with the new, smaller trained models.
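
The idea described in this introduction can be sketched with plain NumPy. This is only a toy illustration, not the PEFT implementation: the sizes are made up, and the names `frozen_embeddings`, `soft_prompt` and `build_model_input` are hypothetical. It shows a frozen embedding table plus a small trainable matrix of virtual-token embeddings that is prepended to the embedded input.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, hidden_size = 100, 16   # toy sizes, not Bloom's real dimensions
num_virtual_tokens = 4              # plays the role of NUM_VIRTUAL_TOKENS

# Frozen part: the embedding table of the pre-trained model (never updated).
frozen_embeddings = rng.normal(size=(vocab_size, hidden_size))

# Trainable part: one embedding vector per virtual token.
soft_prompt = rng.normal(size=(num_virtual_tokens, hidden_size))

def build_model_input(token_ids):
    """Prepend the trainable soft prompt to the frozen token embeddings."""
    token_embeddings = frozen_embeddings[token_ids]          # (seq_len, hidden)
    return np.concatenate([soft_prompt, token_embeddings])   # (virtual + seq_len, hidden)

token_ids = np.array([5, 42, 7])
model_input = build_model_input(token_ids)
print(model_input.shape)  # (7, 16)
print(soft_prompt.size)   # 64 trainable values in this toy setup
```

Only `soft_prompt` would receive gradient updates during training, which is why the number of trained parameters stays so small.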

# 1 - Baseline: Pre-Trained Model

To minimize training time, one of the smallest models in the family is used: [bigscience/bloomz-560m](https://huggingface.co/bigscience/bloomz-560m)

## 1.1 - Loading Model & Tokenizer

The **PEFT library** contains the Hugging Face implementation of various Fine-Tuning techniques, including Prompt Tuning.

In [None]:
!pip install transformers peft datasets --quiet

In [3]:
import transformers, peft
from transformers import AutoModelForCausalLM, AutoTokenizer

`NUM_VIRTUAL_TOKENS` and `NUM_EPOCHS` are hyperparameters.

In [26]:
model_name = "bigscience/bloomz-560m"
#model_name="bigscience/bloom-1b1"
NUM_VIRTUAL_TOKENS = 50 # 100
NUM_EPOCHS = 10

Some models use Python scripts in ways that differ from the standard Hugging Face implementation. To load these models, `trust_remote_code` must be set to `True`.

In [27]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
foundational_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True
)

The following parameters can be set for **generation** tasks:
- temperature
- top_p
- do_sample

**Avoid** setting them for **classification** tasks. They **may** also reduce performance on **translation** tasks.
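
To make these sampling parameters more concrete, here is a minimal NumPy sketch of what temperature scaling and top-p (nucleus) filtering do to a toy distribution. It is an illustration under simplifying assumptions (a three-token vocabulary, hypothetical helper names), not the code that `generate` runs internally.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities."""
    shifted = np.asarray(logits, dtype=float) - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def apply_temperature(logits, temperature):
    """Temperatures below 1 sharpen the distribution; above 1 flatten it."""
    return softmax(np.asarray(logits, dtype=float) / temperature)

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    order = np.argsort(probs)[::-1]                       # most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1  # tokens to keep
    filtered = np.zeros_like(probs)
    filtered[order[:cutoff]] = probs[order[:cutoff]]
    return filtered / filtered.sum()                      # renormalize

logits = [2.0, 1.0, 0.1]  # toy scores for a three-token vocabulary
base_probs = apply_temperature(logits, temperature=1.0)
sharp_probs = apply_temperature(logits, temperature=0.2)
nucleus_probs = top_p_filter(base_probs, top_p=0.8)

# With do_sample=True the next token is drawn from the filtered distribution.
rng = np.random.default_rng(0)
sampled_token = rng.choice(len(nucleus_probs), p=nucleus_probs)
print(base_probs.round(3), sharp_probs.round(3), nucleus_probs.round(3), sampled_token)
```

With `do_sample` left off, as in this notebook, the most likely token is chosen at every step, so temperature and top-p have no influence on the result.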

`repetition_penalty < 1` encourages repetition, while values `> 1` discourage it.
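
A short sketch of a commonly used repetition-penalty rule may help here: the score of every token that was already generated is divided by the penalty when it is positive and multiplied by it when it is negative. The function name and the toy values are assumptions for illustration; the notebook itself does not show how the library applies the penalty.

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty):
    """Rescale the scores of tokens that already appear in the output."""
    logits = np.array(logits, dtype=float)  # work on a copy
    for token_id in set(generated_ids):
        score = logits[token_id]
        # Positive scores shrink and negative scores grow when penalty > 1.
        logits[token_id] = score * penalty if score < 0 else score / penalty
    return logits

logits = [3.0, -1.0, 2.0]   # toy scores for a three-token vocabulary
already_generated = [0, 1]  # tokens 0 and 1 were produced earlier

discouraged = apply_repetition_penalty(logits, already_generated, penalty=1.5)
encouraged = apply_repetition_penalty(logits, already_generated, penalty=0.5)
print(discouraged.tolist())  # [2.0, -1.5, 2.0]
print(encouraged.tolist())   # [6.0, -0.5, 2.0]
```

With a penalty of 1.5 the repeated tokens become less likely than before, while a penalty of 0.5 makes them more likely. Token 2 was never generated, so its score is untouched in both cases.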

`eos_token_id` defines the **end of sequence token**.

In [28]:
# Generate a continuation for the tokenized `inputs` with `model` and return the generated token ids
def get_outputs(model, inputs, max_new_tokens=100):
    device = model.device
    outputs = model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        max_new_tokens=max_new_tokens,
        #temperature=0.2,
        #top_p=0.95,
        #do_sample=True,
        repetition_penalty=1.5, # Values > 1 discourage repetition
        early_stopping=True, # Only affects beam search; generation still stops at EOS or max_new_tokens
        eos_token_id=tokenizer.eos_token_id,
        use_cache=False
    )
    return outputs

## 1.2 - Inference

# Baseline: Inference with the Pre-Trained Model

Two distinct prompts are used to test the two models, one for each training dataset:
- A prompt in the style of the dataset containing prompts
- A prompt in the style of the dataset of motivational sentences

First, collect some results from the model without Fine-Tuning.

**Observation:** both answers are more or less correct.

**Note:** every Bloom model is pre-trained and can already generate accurate, sensible sentences.

**Test:** check whether the responses after training are equally good or more accurate.

In [29]:
input_prompt = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt")
foundational_outputs_prompt = get_outputs(foundational_model, input_prompt, max_new_tokens=50)

print(tokenizer.batch_decode(foundational_outputs_prompt, skip_special_tokens=True))

["I want you to act as a motivational coach.  Don't be afraid of being challenged."]


In [30]:
input_sentences = tokenizer("There are two nice things that should matter to you:", return_tensors="pt")
foundational_outputs_sentence = get_outputs(foundational_model, input_sentences, max_new_tokens=50)

print(tokenizer.batch_decode(foundational_outputs_sentence, skip_special_tokens=True))

['There are two nice things that should matter to you: the price and quality of your product.']


# 2 - Prompt Tuning

## 2.1 - Preparing the Datasets

The datasets used are:
* https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
* https://huggingface.co/datasets/Abirate/english_quotes

In [31]:
import os
#os.environ["TOKENIZERS_PARALLELISM"] = "false"

from datasets import load_dataset

### 2.1.1 - Dataset_1

In [32]:
dataset_prompt = "fka/awesome-chatgpt-prompts"

# create the Dataset to create prompts.
data_prompt = load_dataset(dataset_prompt)
# take 'prompt' column from the dataset & convert the values to tokens
data_prompt = data_prompt.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
# take a subsample (of size 50)
train_sample_prompt = data_prompt["train"].select(range(50))
# note that column 'act' is kept

**Tokenizer** creates `input_ids` and `attention_mask` for `prompt`.
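
The shape of what a batched `map` call adds to the dataset can be sketched with a toy tokenizer. `toy_tokenizer` is a hypothetical stand-in that splits on whitespace; the real Bloom tokenizer uses subword units and a fixed vocabulary, so the ids differ, but the structure of the returned dictionary is the point here.

```python
def toy_tokenizer(texts):
    """Hypothetical stand-in for a real tokenizer: whitespace split, growing vocabulary."""
    vocab = {}
    input_ids, attention_mask = [], []
    for text in texts:
        # Assign the next free id to each unseen word.
        ids = [vocab.setdefault(word, len(vocab)) for word in text.split()]
        input_ids.append(ids)
        attention_mask.append([1] * len(ids))  # no padding yet, so every position is real
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# A batched map passes a dict of columns, each holding a list of values.
batch = {"prompt": ["act as a coach", "act as a poet today"]}
encoded = toy_tokenizer(batch["prompt"])
print(encoded["input_ids"])       # [[0, 1, 2, 3], [0, 1, 2, 4, 5]]
print(encoded["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 1, 1]]
```

The two new keys become the `input_ids` and `attention_mask` columns, one list per row, and the rows keep their different lengths until a collator pads them.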

In [33]:
display(train_sample_prompt)

Dataset({
    features: ['act', 'prompt', 'input_ids', 'attention_mask'],
    num_rows: 50
})

In [34]:
print(train_sample_prompt[:1])

{'act': ['An Ethereum Developer'], 'prompt': ['Imagine you are an experienced Ethereum developer tasked with creating a smart contract for a blockchain messenger. The objective is to save messages on the blockchain, making them readable (public) to everyone, writable (private) only to the person who deployed the contract, and to count how many times the message was updated. Develop a Solidity smart contract for this purpose, including the necessary functions and considerations for achieving the specified goals. Please provide the code and any relevant explanations to ensure a clear understanding of the implementation.'], 'input_ids': [[186402, 1152, 1306, 660, 72560, 28857, 167625, 84544, 20165, 376, 1002, 26168, 267, 30479, 17477, 613, 267, 120755, 238776, 17, 1387, 47881, 632, 427, 14565, 29866, 664, 368, 120755, 15, 16997, 4054, 136044, 375, 4859, 12, 427, 39839, 15, 9697, 1242, 375, 13614, 12, 3804, 427, 368, 2298, 5268, 109891, 368, 17477, 15, 530, 427, 11210, 4143, 7112, 11866, 3

### 2.1.2 - Dataset_2

In [35]:
dataset_sentences = load_dataset("Abirate/english_quotes")

# take 'quote' column from the dataset & convert the values to tokens
data_sentences = dataset_sentences.map(lambda samples: tokenizer(samples["quote"]), batched=True)
# take a subsample (of size 25)
train_sample_sentences = data_sentences["train"].select(range(25))
# remove useless columns: 'author' & 'tags'
train_sample_sentences = train_sample_sentences.remove_columns(['author', 'tags'])

In [36]:
display(train_sample_sentences)

Dataset({
    features: ['quote', 'input_ids', 'attention_mask'],
    num_rows: 25
})

## 2.2 - Fine-Tuning

### 2.2.1 - PEFT configurations

Refer to the [documentation](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig) for more information on PEFT parameters.


The same configuration can be used for both models to be trained.


In [37]:
from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

generation_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM, # This type indicates the model will generate text
    prompt_tuning_init=PromptTuningInit.RANDOM,  # The added virtual tokens are initialized with random numbers
    num_virtual_tokens=NUM_VIRTUAL_TOKENS, # Number of virtual tokens to be added and trained
    tokenizer_name_or_path=model_name # The pre-trained model
)


### 2.2.2 - Creating two Prompt Tuning Models

Create two identical prompt tuning models using the same pre-trained model and the same config.

In [38]:
peft_model_prompt = get_peft_model(foundational_model, generation_config)
peft_model_prompt.print_trainable_parameters()  # prints the summary itself and returns None

trainable params: 51,200 || all params: 559,265,792 || trainable%: 0.0092


In [39]:
peft_model_sentences = get_peft_model(foundational_model, generation_config)
peft_model_sentences.print_trainable_parameters()  # prints the summary itself and returns None

trainable params: 51,200 || all params: 559,265,792 || trainable%: 0.0092


Note the **reduction in trainable parameters**: about **0.009%** (less than 0.01%) of the available parameters.
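
The reported numbers can be reproduced with a little arithmetic. The sketch below assumes that each virtual token is a single embedding vector and that the embedding width of bloomz-560m is 1024; neither value is stated in this notebook, but together they match the printed count.

```python
num_virtual_tokens = 50   # NUM_VIRTUAL_TOKENS used in this notebook
hidden_size = 1024        # assumption: embedding width of bloomz-560m
all_params = 559_265_792  # "all params" from the printed summary

# One trainable embedding vector per virtual token.
trainable_params = num_virtual_tokens * hidden_size
trainable_percent = 100 * trainable_params / all_params

print(f"{trainable_params:,}")       # 51,200
print(f"{trainable_percent:.4f}%")   # 0.0092%
```

Doubling `NUM_VIRTUAL_TOKENS` to 100 would double the trainable parameters to 102,400, which is still a tiny fraction of the model.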

Create the training arguments. The same configuration will be used for both training runs.

In [40]:
from transformers import TrainingArguments
def create_training_arguments(path, learning_rate=0.0035, epochs=6):
    training_args = TrainingArguments(
        output_dir=path, # Where the model predictions and checkpoints will be written
        use_cpu=False, # Set to 'True' to train on CPU (when a GPU is not available)
        auto_find_batch_size=True, # Automatically find a batch size that fits into memory
        learning_rate=learning_rate, # Higher learning rate than full Fine-Tuning
        num_train_epochs=epochs,
        report_to="none"
    )
    return training_args

In [41]:
import os

working_dir = "./" # path: root of the project

# Create the name of the directories where to store the models.
output_directory_prompt =  os.path.join(working_dir, "peft_outputs_prompt")
output_directory_sentences = os.path.join(working_dir, "peft_outputs_sentences")

In [42]:
# # Create the directories if they do not exist.
# if not os.path.exists(working_dir):
#     os.mkdir(working_dir)
# if not os.path.exists(output_directory_prompt):
#     os.mkdir(output_directory_prompt)
# if not os.path.exists(output_directory_sentences):
#     os.mkdir(output_directory_sentences)

Pass the output directory of each model when creating its `TrainingArguments`.

In [43]:
training_args_prompt = create_training_arguments(output_directory_prompt, 0.003, NUM_EPOCHS)
training_args_sentences = create_training_arguments(output_directory_sentences, 0.003, NUM_EPOCHS)

### 2.2.3 - Train

Create the **trainer** object, one for each model to train.  

**Collator** prepares data for the model, e.g., it makes the text inputs the same length by truncating or padding.

The type of collator depends on the task. As this is a language modeling task, use `DataCollatorForLanguageModeling`.
- `tokenizer` should be passed to the `DataCollatorForLanguageModeling`.

**Note:** set `mlm` to `True` only for masked language modeling (e.g., BERT-style models); for a causal model such as Bloom, keep `mlm=False`.
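
What a collator for causal language modeling does can be sketched in plain Python. This is a simplified illustration, not the library's code: `collate_for_causal_lm` is a hypothetical helper, padding is applied on the right for readability (the real tokenizer may pad on the other side), and the label value `-100` is the index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by the loss

def collate_for_causal_lm(examples, pad_token_id):
    """Pad token-id lists to one length and build labels for next-token prediction."""
    max_len = max(len(ids) for ids in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids in examples:
        n_pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; padded positions are excluded from the loss.
        batch["labels"].append(ids + [IGNORE_INDEX] * n_pad)
    return batch

batch = collate_for_causal_lm([[5, 6, 7], [8]], pad_token_id=0)
print(batch["input_ids"])       # [[5, 6, 7], [8, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
print(batch["labels"])          # [[5, 6, 7], [8, -100, -100]]
```

The labels are not shifted here; in a causal model the shift by one position typically happens inside the model when the loss is computed.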

In [44]:
from transformers import Trainer, DataCollatorForLanguageModeling
def create_trainer(model, training_args, train_dataset):
    trainer = Trainer(
        model=model, # pass in the PEFT version of the foundation model, bloomz-560M
        args=training_args, # args for the training
        train_dataset=train_dataset, # dataset used to train the model
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False) # mlm=False indicates not to use masked language modeling
    )
    return trainer

In [45]:
#Training first model.
trainer_prompt = create_trainer(peft_model_prompt, training_args_prompt, train_sample_prompt)
trainer_prompt.train()


TrainOutput(global_step=70, training_loss=3.42767813546317, metrics={'train_runtime': 15.7863, 'train_samples_per_second': 31.673, 'train_steps_per_second': 4.434, 'total_flos': 108789316780032.0, 'train_loss': 3.42767813546317, 'epoch': 10.0})

In [46]:
#Training second model.
trainer_sentences = create_trainer(peft_model_sentences, training_args_sentences, train_sample_sentences)
trainer_sentences.train()


TrainOutput(global_step=40, training_loss=4.088859939575196, metrics={'train_runtime': 6.248, 'train_samples_per_second': 40.013, 'train_steps_per_second': 6.402, 'total_flos': 30699933081600.0, 'train_loss': 4.088859939575196, 'epoch': 10.0})
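
The `global_step` values in the two training outputs can be checked by hand. The sketch assumes a batch size of 8 (the default per-device value in `TrainingArguments`), a single device, no gradient accumulation, and that `auto_find_batch_size` did not have to reduce the batch size; under those assumptions the numbers match.

```python
import math

def expected_global_step(num_samples, batch_size, epochs):
    """Optimizer steps = batches per epoch (last one may be partial) times epochs."""
    return math.ceil(num_samples / batch_size) * epochs

BATCH_SIZE = 8  # assumption, see the paragraph above
NUM_EPOCHS = 10

print(expected_global_step(50, BATCH_SIZE, NUM_EPOCHS))  # 70 (prompts dataset)
print(expected_global_step(25, BATCH_SIZE, NUM_EPOCHS))  # 40 (quotes dataset)
```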

### 2.2.4 - Save Models
Save the models. They are ready to be used as long as the pre-trained model from which they were created is loaded in memory.

In [47]:
trainer_prompt.model.save_pretrained(output_directory_prompt)
trainer_sentences.model.save_pretrained(output_directory_sentences)
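
A rough estimate shows why saving only the trained prompt is so cheap. The sketch assumes 32-bit floats (4 bytes per parameter) and ignores the small configuration and metadata files, so the real sizes on disk will differ somewhat.

```python
trainable_params = 51_200                       # from the printed summary
base_params = 559_265_792 - trainable_params    # frozen foundation model
BYTES_PER_PARAM = 4                             # assumption: float32 weights

adapter_mb = trainable_params * BYTES_PER_PARAM / 1e6
base_mb = base_params * BYTES_PER_PARAM / 1e6

# Two fully fine-tuned copies versus one base model plus two trained prompts.
two_full_models_mb = 2 * base_mb
shared_base_mb = base_mb + 2 * adapter_mb

print(f"adapter: {adapter_mb:.2f} MB, base: {base_mb:.0f} MB")
print(f"two full models: {two_full_models_mb:.0f} MB, shared base: {shared_base_mb:.0f} MB")
```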


### 2.2.5 - Inference

Load the model from the path that it has been saved in, and ask the model to generate text.

Set `is_trainable` to `False` to avoid unnecessary overhead, such as computing gradients.

In [53]:
from peft import PeftModel

**Model_1**

In [54]:
loaded_model_prompt = PeftModel.from_pretrained(foundational_model,
                                         output_directory_prompt,
                                         #device_map='auto',
                                         is_trainable=False)

In [51]:
device = loaded_model_prompt.device # on GPU
input_prompt = input_prompt.to(loaded_model_prompt.device) # on GPU
input_sentences = input_sentences.to(loaded_model_prompt.device) # on GPU


In [52]:
loaded_model_prompt_outputs = get_outputs(loaded_model_prompt, input_prompt)
print(tokenizer.batch_decode(loaded_model_prompt_outputs, skip_special_tokens=True))

['I want you to act as a motivational coach.  You can do this by asking your students for feedback on their progress and helping them with any questions they may have.']


Compare both answers:
* ***Pretrained Model:*** *I want you to act as a motivational coach.  Don't be afraid of being challenged.*
* ***Fine-Tuned Model:*** *I want you to act as a motivational coach.  You can do this by asking your students for feedback on their progress and helping them with any questions they may have.*

**Note:** the model was trained for only a few seconds (about 16 s according to `train_runtime`), yet the response is closer to what we were looking for.

**Model_2**

Change the `adapter` **only**, i.e., the foundation model remains the same.

In [55]:
loaded_model_prompt.load_adapter(output_directory_sentences, adapter_name="quotes")
loaded_model_prompt.set_adapter("quotes")
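
The adapter switch above can be pictured with a small toy class. `ToyAdapterHost` is a hypothetical illustration of the idea, not how PEFT is implemented: one shared set of base weights, a dictionary of small named adapters, and a pointer to the active one.

```python
class ToyAdapterHost:
    """Hypothetical stand-in: one shared base, several small named adapters."""

    def __init__(self, base_weights, adapter_name, adapter):
        self.base_weights = base_weights           # loaded once, never copied
        self.adapters = {adapter_name: adapter}
        self.active = adapter_name

    def load_adapter(self, adapter, adapter_name):
        """Register another adapter next to the existing ones."""
        self.adapters[adapter_name] = adapter

    def set_adapter(self, adapter_name):
        """Select which adapter is used for the next generation call."""
        if adapter_name not in self.adapters:
            raise KeyError(f"unknown adapter: {adapter_name}")
        self.active = adapter_name

    def active_soft_prompt(self):
        return self.adapters[self.active]

base = [0.1, 0.2, 0.3]                             # toy "foundation model"
host = ToyAdapterHost(base, "default", adapter=[1.0, 1.0])
host.load_adapter([2.0, 2.0], adapter_name="quotes")
host.set_adapter("quotes")

print(host.active_soft_prompt())   # [2.0, 2.0]
print(host.base_weights is base)   # True: still the same single base object
```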

In [56]:
loaded_model_sentences_outputs = get_outputs(loaded_model_prompt, input_sentences)
print(tokenizer.batch_decode(loaded_model_sentences_outputs, skip_special_tokens=True))

['There are two nice things that should matter to you: money and time.']


With the second model the result changes in a similar way:
* **Pretrained Model:** *There are two nice things that should matter to you: the price and quality of your product.*
* **Fine-Tuned Model:** *There are two nice things that should matter to you: money and time.*

