<div align="center">
<h1><a href="https://github.com/peremartra/Large-Language-Model-Notebooks-Course">LLM Hands On Course</a></h1>
    <h3>Understand And Apply Large Language Models</h3>
    <h2>Introduction to Prompt Tuning using PEFT from Hugging Face.</h2>
    <h3>Fine-tune a Foundational Model effortlessly</h3>
    <p>by <b>Pere Martra</b></p>
</div>

<br>

<div align="center">
    &nbsp;
    <a target="_blank" href="https://www.linkedin.com/in/pere-martra/"><img src="https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social"></a>
    
</div>

<br>
<hr>

# Prompt Tuning
This notebook introduces how to apply prompt tuning to a foundational model with the PEFT library. 

For a complete list of models compatible with PEFT, refer to the PEFT [documentation](https://huggingface.co/docs/peft/main/en/index#supported-methods). 

A short sample of the models that can be trained with PEFT: Bloom, Llama, GPT-J, GPT-2, BERT, and more. Hugging Face is working hard to bring more models to the library. 

## Brief introduction to Prompt Tuning. 
Prompt Tuning, also known as soft prompting, is an additive training technique. We don't modify the weights of the model; instead, we add a few new trainable values (virtual tokens) to the prompt, and only those values are updated during training. The virtual tokens are embedding vectors placed in front of the embedded input, so the pre-trained weights stay frozen. 
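
To make the idea concrete, here is a minimal sketch in plain Python of how a soft prompt is combined with the input. It is not the PEFT implementation: the sizes, the random vectors, and the helper names (`random_vector`, `build_model_input`) are assumptions chosen only for illustration.

```python
import random

HIDDEN_SIZE = 8          # toy embedding width (assumption, real models are much wider)
NUM_VIRTUAL_TOKENS = 4   # same value used later in this notebook
VOCAB_SIZE = 20          # toy vocabulary size (assumption)

random.seed(0)

def random_vector(size):
    """Hypothetical helper: one random embedding vector."""
    return [random.uniform(-1.0, 1.0) for _ in range(size)]

# Frozen part: the embedding table of the pre-trained model (never updated).
frozen_embeddings = [random_vector(HIDDEN_SIZE) for _ in range(VOCAB_SIZE)]

# Trainable part: one vector per virtual token (the soft prompt).
soft_prompt = [random_vector(HIDDEN_SIZE) for _ in range(NUM_VIRTUAL_TOKENS)]

def build_model_input(token_ids):
    """Prepend the soft prompt vectors to the embedded real tokens."""
    token_vectors = [frozen_embeddings[i] for i in token_ids]
    return soft_prompt + token_vectors

model_input = build_model_input([3, 7, 11])
print(len(model_input))                  # 7 positions: 4 virtual + 3 real
print(NUM_VIRTUAL_TOKENS * HIDDEN_SIZE)  # 32 trainable values in this toy setup
```

During training only `soft_prompt` would receive gradient updates; `frozen_embeddings` and the rest of the model would stay untouched.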

We can modify the behavior of a model by training a tiny fraction of the weights (about 0.0007% in this notebook), often achieving results similar to other techniques that update the weights of the model itself.  

Training is faster and cheaper. In addition, we can train several prompt-tuned models and, at inference time, load a single foundational model together with the small trained prompts, because the weights of the original model have not been modified. 
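
A rough back-of-the-envelope calculation shows why this matters for serving. The sketch below assumes 32-bit floating point weights and uses the parameter counts that PEFT reports further down in this notebook; `to_mib` is a hypothetical helper.

```python
BYTES_PER_FP32 = 4  # assumption: weights stored as 32-bit floats

base_params = 559_214_592   # all params (559,218,688) minus the 4,096 prompt params
prompt_params = 4_096       # trainable params of one soft prompt in this notebook

def to_mib(num_params):
    """Hypothetical helper: size of `num_params` fp32 values in MiB."""
    return num_params * BYTES_PER_FP32 / 2**20

num_tasks = 2
full_copies_mib = num_tasks * to_mib(base_params)
shared_base_mib = to_mib(base_params) + num_tasks * to_mib(prompt_params)

print(f"one soft prompt: {to_mib(prompt_params) * 1024:.0f} KiB")
print(f"{num_tasks} fully fine-tuned copies: {full_copies_mib:,.0f} MiB")
print(f"1 base model + {num_tasks} soft prompts: {shared_base_mib:,.0f} MiB")
```

Under these assumptions each extra task costs a few KiB instead of a second copy of the whole model.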

## What are we going to do in the notebook?
We are going to train two different models using two datasets and just one pre-trained model from the Bloom family. One model will be trained with a dataset of prompts and the other with a dataset of inspirational sentences. We will compare how the models answer the same question before and after training. 

We will also see how to load both models while keeping just one copy of the foundational model in memory. 


## Load the Peft Library
This library contains the Hugging Face implementation of different fine-tuning techniques, such as Prompt Tuning.

In [1]:
!pip install peft



From the transformers library we import the classes needed to load the model and the tokenizer. 

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer

### Loading the model and the tokenizers. 

Bloom is one of the smallest and smartest models available to be trained with the PEFT library using Prompt Tuning. You can use any of the models in the Bloom family; I encourage you to try at least two of them and compare the results. 

I'm using the smallest one to spend less time training and to avoid memory problems in Colab. 

In [3]:
model_name = "bigscience/bloomz-560m"
#model_name="bigscience/bloom-1b1"
NUM_VIRTUAL_TOKENS = 4
NUM_EPOCHS = 5

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
foundational_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True
)

## Inference with the pre trained bloom model

In [5]:
# Returns the token ids generated by `model` for the tokenized `inputs`.
def get_outputs(model, inputs, max_new_tokens=100):
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.5, # Penalize tokens that were already generated to reduce repetition.
        early_stopping=True, # Only relevant for beam search; generation already stops at the EOS token.
        eos_token_id=tokenizer.eos_token_id
    )
    return outputs
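
The `repetition_penalty` argument is easier to understand with a small example. The following is a simplified sketch in plain Python of the usual rule (scores of tokens that were already generated are pushed down); the function name and the toy scores are assumptions, and it is not the transformers implementation.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.5):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores are divided and negative scores are multiplied,
        # so a penalized token always becomes less likely.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

logits = [3.0, 1.5, -1.0, 0.9]  # toy scores for a 4-token vocabulary
print(apply_repetition_penalty(logits, generated_ids=[0, 2]))
# [2.0, 1.5, -1.5, 0.9]
```

A penalty of 1.0 leaves the scores unchanged; larger values discourage repetition more strongly.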

Before doing any fine-tuning, we will ask the model to continue the following input sentences.

As we want to have two different trained models, I will create two different prompts. 

The first model will be trained with a dataset of prompts, and the second one with a dataset of motivational sentences.  

The first model will receive the prompt "I want you to act as a motivational coach. " and the second model "There two thing that matter:". 

In [6]:
input_prompt = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt")
foundational_outputs_prompt = get_outputs(foundational_model, input_prompt, max_new_tokens=50)

print(tokenizer.batch_decode(foundational_outputs_prompt, skip_special_tokens=True))

["I want you to act as a motivational coach.  Don't be afraid of being challenged."]


In [7]:
input_sentences = tokenizer("There two thing that matter:", return_tensors="pt")
foundational_outputs_sentence = get_outputs(foundational_model, input_sentences, max_new_tokens=50)

print(tokenizer.batch_decode(foundational_outputs_sentence, skip_special_tokens=True))

['There two thing that matter: the size and shape of a flower']


Both answers are more or less correct. All the Bloom models are pre-trained and can generate sentences that are well formed and make sense. Let's see whether the responses after training are equally accurate, more accurate, or less accurate. 

## Preparing the Datasets
The datasets used are: 
* https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
* https://huggingface.co/datasets/Abirate/english_quotes


In [8]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [9]:
from datasets import load_dataset

dataset_prompt = "fka/awesome-chatgpt-prompts"

#Tokenize the prompts dataset and keep the first 50 rows for training. 
data_prompt = load_dataset(dataset_prompt)
data_prompt = data_prompt.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample_prompt = data_prompt["train"].select(range(50))

train_sample_prompt = train_sample_prompt.remove_columns('act')

display(train_sample_prompt) 

Dataset({
    features: ['prompt', 'input_ids', 'attention_mask'],
    num_rows: 50
})

In [10]:
dataset_sentences = load_dataset("Abirate/english_quotes")

data_sentences = dataset_sentences.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_sample_sentences = data_sentences["train"].select(range(50))
train_sample_sentences = train_sample_sentences.remove_columns(['author', 'tags'])
display(train_sample_sentences) 

Dataset({
    features: ['quote', 'input_ids', 'attention_mask'],
    num_rows: 50
})

## Fine-tuning

### PEFT configurations 


API docs:
https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig

We can use the same configuration for both models to be trained. 


In [11]:
from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

generation_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM, #This type indicates the model will generate text. 
    prompt_tuning_init=PromptTuningInit.RANDOM,  #The added virtual tokens are initialized with random numbers
    num_virtual_tokens=NUM_VIRTUAL_TOKENS, #Number of virtual tokens to be added and trained. 
    tokenizer_name_or_path=model_name #Tokenizer to use; it only matters when the virtual tokens are initialized from text. 
)


### Creating two Prompt Tuning Models. 
We will create two identical prompt tuning models using the same pre-trained model and the same config. 

In [12]:
peft_model_prompt = get_peft_model(foundational_model, generation_config)
peft_model_prompt.print_trainable_parameters()

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229


In [13]:
peft_model_sentences = get_peft_model(foundational_model, generation_config)
peft_model_sentences.print_trainable_parameters()

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229


That's amazing: did you see the reduction in trainable parameters? We are going to train about 0.0007% of the available parameters. 
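
The numbers reported by PEFT can be reproduced with simple arithmetic. This sketch assumes an embedding width of 1,024 for bloomz-560m, which is consistent with 4,096 trainable parameters for 4 virtual tokens; the total is copied from the output above.

```python
num_virtual_tokens = 4
hidden_size = 1024         # assumption: embedding width of bloomz-560m
all_params = 559_218_688   # total reported by print_trainable_parameters()

# Each virtual token contributes one trainable embedding vector.
trainable_params = num_virtual_tokens * hidden_size
trainable_percent = 100 * trainable_params / all_params

print(trainable_params)              # 4096
print(f"{trainable_percent:.4f}%")   # roughly 0.0007%
```

Doubling `num_virtual_tokens` would double the trainable parameters, which would still be a negligible share of the model.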

Now we are going to create the training arguments, and we will use the same configuration in both trainings. 

In [14]:
from transformers import TrainingArguments
def create_training_arguments(path, learning_rate=0.0035, epochs=6):
    training_args = TrainingArguments(
        output_dir=path, # Where the model predictions and checkpoints will be written
        no_cuda=True, # Force training on CPU; newer transformers releases call this option use_cpu.
        auto_find_batch_size=True, # Find a suitable batch size that will fit into memory automatically 
        learning_rate=learning_rate, # Higher learning rate than full fine-tuning
        num_train_epochs=epochs 
    )
    return training_args

In [15]:

import os

working_dir = "./"

#It is best to store the models in separate folders. 
#Build the names of the directories where the models will be stored. 
output_directory_prompt =  os.path.join(working_dir, "peft_outputs_prompt")
output_directory_sentences = os.path.join(working_dir, "peft_outputs_sentences")

#Create the directories (and any missing parent) if they do not exist yet. 
os.makedirs(output_directory_prompt, exist_ok=True)
os.makedirs(output_directory_sentences, exist_ok=True)


We need to indicate the directory where each model will be stored when creating the TrainingArguments. 

In [16]:
training_args_prompt = create_training_arguments(output_directory_prompt, 0.003, NUM_EPOCHS)
training_args_sentences = create_training_arguments(output_directory_sentences, 0.003, NUM_EPOCHS)

## Train

We will create one Trainer object for each model to train.  

In [17]:
from transformers import Trainer, DataCollatorForLanguageModeling
def create_trainer(model, training_args, train_dataset):
    trainer = Trainer(
        model=model, # We pass in the PEFT version of the foundation model, bloomz-560M
        args=training_args, #The args for the training. 
        train_dataset=train_dataset, #The dataset used to train the model. 
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False) # mlm=False indicates not to use masked language modeling
    )
    return trainer
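
To see what the data collator contributes when `mlm=False`, here is a simplified sketch in plain Python: it pads the sequences of a batch to the same length and copies the input ids into the labels, marking the padding so the loss ignores it. The function name, the left padding default, and the toy token ids are assumptions; the real collator works on tensors and handles more cases.

```python
IGNORE_INDEX = -100  # label value that the loss function skips

def collate_for_causal_lm(batch, pad_token_id, padding_side="left"):
    """Pad lists of token ids to one length and build causal LM labels."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        num_pad = max_len - len(ids)
        pad_ids = [pad_token_id] * num_pad
        pad_mask = [0] * num_pad
        pad_labels = [IGNORE_INDEX] * num_pad
        if padding_side == "left":
            input_ids.append(pad_ids + ids)
            attention_mask.append(pad_mask + [1] * len(ids))
            labels.append(pad_labels + ids)
        else:
            input_ids.append(ids + pad_ids)
            attention_mask.append([1] * len(ids) + pad_mask)
            labels.append(ids + pad_labels)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

example = collate_for_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=3)
print(example["input_ids"])  # [[5, 6, 7], [3, 8, 9]]
print(example["labels"])     # [[5, 6, 7], [-100, 8, 9]]
```

The labels are not shifted here; in causal language modeling the model itself shifts them by one position when it computes the loss.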
    

In [18]:
#Training first model. 
trainer_prompt = create_trainer(peft_model_prompt, training_args_prompt, train_sample_prompt)
trainer_prompt.train()

You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.




TrainOutput(global_step=35, training_loss=3.6272729056222097, metrics={'train_runtime': 113.226, 'train_samples_per_second': 2.208, 'train_steps_per_second': 0.309, 'total_flos': 50882985099264.0, 'train_loss': 3.6272729056222097, 'epoch': 5.0})

In [19]:
#Training second model. 
trainer_sentences = create_trainer(peft_model_sentences, training_args_sentences, train_sample_sentences)
trainer_sentences.train()



TrainOutput(global_step=35, training_loss=3.7272120884486606, metrics={'train_runtime': 243.8732, 'train_samples_per_second': 1.025, 'train_steps_per_second': 0.144, 'total_flos': 61846080847872.0, 'train_loss': 3.7272120884486606, 'epoch': 5.0})

In less than 10 minutes we trained two different models, each with a different mission, using the same foundational model as a base. 
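
The figures in both `TrainOutput` lines can be checked by hand. The sketch below assumes a batch size of 8, the Trainer default, which is what `auto_find_batch_size` keeps when no out-of-memory retry is needed; the runtimes are copied from the outputs above.

```python
import math

num_samples = 50   # rows selected from each dataset
num_epochs = 5     # NUM_EPOCHS
batch_size = 8     # assumption: default per-device batch size of the Trainer

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 35, the global_step reported by both runs

runtimes_seconds = [113.226, 243.8732]  # train_runtime of each run
total_minutes = sum(runtimes_seconds) / 60
print(f"{total_minutes:.1f} minutes")   # about 6 minutes for both trainings
```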

## Save models
We are going to save the models. These models are ready to be used, as long as we have the pre-trained model from which they were created in memory.

In [20]:
trainer_prompt.model.save_pretrained(output_directory_prompt)
trainer_sentences.model.save_pretrained(output_directory_sentences)


## Inference

You can load each model from the path where it was saved and ask it to generate text from the same input we used before.

In [21]:
from peft import PeftModel

loaded_model_prompt = PeftModel.from_pretrained(foundational_model, 
                                         output_directory_prompt,
                                         #device_map='auto', 
                                         is_trainable=False)

In [22]:
loaded_model_prompt_outputs = get_outputs(loaded_model_prompt, input_prompt)
print(tokenizer.batch_decode(loaded_model_prompt_outputs, skip_special_tokens=True))

['I want you to act as a motivational coach.  You can use your own words or phrases.']


If we compare both answers, something has changed. 
* ***Pretrained Model:** I want you to act as a motivational coach.  Don't be afraid of being challenged.*
* ***Fine Tuned Model:** I want you to act as a motivational coach.  You can use your own words or phrases.*

We have to keep in mind that we have trained the model for only a few minutes, but that was enough to obtain a response closer to what we were looking for.

In [23]:
loaded_model_sentences = PeftModel.from_pretrained(foundational_model, 
                                         output_directory_sentences,
                                         #device_map='auto', 
                                         is_trainable=False)

In [24]:
loaded_model_sentences_outputs = get_outputs(loaded_model_sentences, input_sentences)
print(tokenizer.batch_decode(loaded_model_sentences_outputs, skip_special_tokens=True))

['There two thing that matter: one is the number of stars in a galaxy and another, its size. The star system']


With the second model we get a similar result.
* ***Pretrained Model:** There two thing that matter: the size and shape of a flower*
* ***Fine Tuned Model:** There two thing that matter: one is the number of stars in a galaxy and another, its size. The star system*
 


# Conclusion
Prompt Tuning is an amazing technique that can save us hours of training and a large amount of money. In this notebook we have trained two models in just a few minutes, and we can keep both in memory to serve different clients. 

If you want to try different combinations and models, the notebook is ready to use another model from the Bloom family. 

You can change the number of epochs, the number of virtual tokens, and the model in the third cell. There are many other settings to experiment with; a good exercise is to replace the random initialization of the virtual tokens with a fixed value. 
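
As a starting point for that exercise: as far as I know, `PromptTuningConfig` also accepts `prompt_tuning_init=PromptTuningInit.TEXT` together with a `prompt_tuning_init_text` string, so the virtual tokens start from the embeddings of real words. The text rarely has exactly as many tokens as virtual tokens, so its token ids must be repeated or cut. The sketch below shows one plausible way to do that in plain Python; the function name and the token ids are hypothetical, and it is not a copy of the PEFT code.

```python
import math

def fit_to_virtual_tokens(init_token_ids, num_virtual_tokens):
    """Repeat or truncate token ids so exactly `num_virtual_tokens` remain."""
    if not init_token_ids:
        raise ValueError("the initialization text produced no tokens")
    repeats = math.ceil(num_virtual_tokens / len(init_token_ids))
    return (init_token_ids * repeats)[:num_virtual_tokens]

# Hypothetical token ids standing in for a tokenized initialization text.
print(fit_to_virtual_tokens([101, 102], 4))                 # [101, 102, 101, 102]
print(fit_to_virtual_tokens([101, 102, 103, 104, 105], 4))  # [101, 102, 103, 104]
```

Comparing the training loss of a text initialization against the random one, with the same number of epochs, is a simple way to see whether the starting point matters for your dataset.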