## Prompt Tuning using the PEFT Library from Hugging Face
- Prompt tuning, also called soft prompting, is an additive training technique. We don't modify the weights of the model; instead, we add a small number of new trainable values (virtual tokens) to the prompt and train only those. The only weights updated are the ones belonging to these new values.

- We can modify the behavior of a model by updating only a tiny fraction of its weights (about 0.0007% in the example in this notebook), and can achieve results similar to those of other techniques that update the weights of the model itself.

- Models that can be trained with PEFT include Bloom, Llama, GPT-J, GPT-2, BERT, and more.
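
The idea described in the bullets above can be sketched in plain PyTorch. This is only a toy illustration, not how PEFT implements it: the class name `SoftPromptWrapper` and all sizes are hypothetical, and a randomly initialized embedding table stands in for the pre-trained model.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Toy illustration: prepend trainable virtual-token vectors to frozen token embeddings."""

    def __init__(self, vocab_size, hidden_size, num_virtual_tokens):
        super().__init__()
        # Stand-in for the pre-trained embedding table; it stays frozen.
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.word_embeddings.weight.requires_grad = False
        # The only trainable parameters: one vector per virtual token.
        self.soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size))

    def forward(self, input_ids):
        token_embeds = self.word_embeddings(input_ids)  # (batch, seq, hidden)
        batch_size = input_ids.shape[0]
        prompt = self.soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # The virtual tokens are placed in front of the real tokens.
        return torch.cat([prompt, token_embeds], dim=1)  # (batch, virtual + seq, hidden)

toy = SoftPromptWrapper(vocab_size=100, hidden_size=16, num_virtual_tokens=4)
embeds = toy(torch.randint(0, 100, (2, 5)))
trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)
print(embeds.shape, trainable)
```

In this sketch only `soft_prompt` receives gradients, which is why so few parameters end up being trainable.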




Load the PEFT, datasets and transformers libraries

In [1]:
!pip install peft
!pip install datasets

Collecting peft
  Downloading peft-0.7.1-py3-none-any.whl (168 kB)
Collecting accelerate>=0.21.0 (from peft)
  Downloading accelerate-0.26.0-py3-none-any.whl (270 kB)
Installing collected packages: accelerate, peft
Successfully installed accelerate-0.26.0 peft-0.7.1
Collecting datasets
  Downloading datasets-2.16.1-py3-none-any.whl (507 kB)
Collecting dill<0.3.8,>=0.3.0 (from datasets)
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer

Load the model and tokenizer

In [3]:
model_id = 'bigscience/bloom-560m'
tokenizer = AutoTokenizer.from_pretrained(model_id)
foundational_model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

In [8]:
import torch
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU/CUDA is available.")
else:
    device = torch.device("cpu")
    print("No GPU/CUDA available, using CPU.")

foundational_model.to(device)

GPU/CUDA is available.


BloomForCausalLM(
  (transformer): BloomModel(
    (word_embeddings): Embedding(250880, 1024)
    (word_embeddings_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (h): ModuleList(
      (0-23): 24 x BloomBlock(
        (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (self_attention): BloomAttention(
          (query_key_value): Linear(in_features=1024, out_features=3072, bias=True)
          (dense): Linear(in_features=1024, out_features=1024, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): BloomMLP(
          (dense_h_to_4h): Linear(in_features=1024, out_features=4096, bias=True)
          (gelu_impl): BloomGelu()
          (dense_4h_to_h): Linear(in_features=4096, out_features=1024, bias=True)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (

Inference using the pre-trained model

In [4]:
# Generate a response from the given model for already-tokenized inputs.
def model_response(model, inputs, max_tokens=100):
    response = model.generate(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"],
        repetition_penalty=1.5, max_new_tokens=max_tokens,
        early_stopping=True, eos_token_id=tokenizer.eos_token_id,
    )
    return response
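
The `repetition_penalty = 1.5` argument used above lowers the score of tokens that already appear in the sequence. A minimal pure-Python sketch of that rule follows; the function name `apply_repetition_penalty` and the example scores are hypothetical, and the real library applies the same idea to tensors inside `generate`.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.5):
    """Return a copy of `logits` with already-seen tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores shrink; negative scores become more negative.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

# Hypothetical scores for a 4-token vocabulary; tokens 0 and 1 were already seen.
logits = [3.0, -1.0, 0.5, 2.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=1.5)
print(penalized)
```

A penalty of 1.0 leaves the scores unchanged; values above 1.0 discourage repetition more strongly.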

In [11]:
input_prompt = tokenizer("I want you to act as a motivational coach.", return_tensors = 'pt').to(device)
foundational_model_prompt = model_response(foundational_model, input_prompt, max_tokens = 50)
foundational_model_prompt = foundational_model_prompt.to("cpu")
print(tokenizer.batch_decode(foundational_model_prompt, skip_special_tokens=True))

['I want you to act as a motivational coach. You will be able to:\n• Develop and implement strategies for improving your performance, including: • Motivating yourself;']


In [12]:
input_sentence = tokenizer("Two things that matter in life:", return_tensors = 'pt').to(device)
foundational_model_sentence = model_response(foundational_model, input_sentence, max_tokens = 50)
foundational_model_sentence = foundational_model_sentence.to("cpu")
print(tokenizer.batch_decode(foundational_model_sentence, skip_special_tokens=True))

['Two things that matter in life: the way you look at it, and how much of your personality is based on what looks like a good-looking person.']


## Fine-tuning using Prompt Tuning
Prepare the two datasets to be used:
- Awesome ChatGPT Prompts (`fka/awesome-chatgpt-prompts`)
- English Quotes (`Abirate/english_quotes`)

In [13]:
from datasets import load_dataset

dataset_prompt_name = "fka/awesome-chatgpt-prompts"

# Load and tokenize the dataset of prompts.
raw_prompt_data = load_dataset(dataset_prompt_name)
batch_prompt_data = raw_prompt_data.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample_prompts = batch_prompt_data["train"].select(range(50))

display(train_sample_prompts)

Dataset({
    features: ['act', 'prompt', 'input_ids', 'attention_mask'],
    num_rows: 50
})

In [14]:
dataset_sentences_name = "Abirate/english_quotes"

# Load and tokenize the dataset of quotes.
raw_sentences_data = load_dataset(dataset_sentences_name)
batch_sentences_data = raw_sentences_data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_sample_sentences = batch_sentences_data["train"].select(range(50))

display(train_sample_sentences)

Dataset({
    features: ['quote', 'author', 'tags', 'input_ids', 'attention_mask'],
    num_rows: 50
})

## Fine-tuning

In [15]:
from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

NUM_VIRTUAL_TOKENS = 4
NUM_EPOCHS = 5

generation_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,  # The model will generate text.
    prompt_tuning_init=PromptTuningInit.RANDOM,  # The added virtual tokens are initialized with random numbers.
    num_virtual_tokens=NUM_VIRTUAL_TOKENS,  # Number of virtual tokens to be added and trained.
    tokenizer_name_or_path=model_id,  # Name or path of the tokenizer (a string, not the model object).
)

In [16]:
peft_model_prompt = get_peft_model(foundational_model, generation_config)
# print_trainable_parameters() prints the summary itself and returns None.
peft_model_prompt.print_trainable_parameters()

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229


In [17]:
peft_model_sentences = get_peft_model(foundational_model, generation_config)
# print_trainable_parameters() prints the summary itself and returns None.
peft_model_sentences.print_trainable_parameters()

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229
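
The numbers reported above can be checked by hand: each virtual token is one vector as wide as the model's embeddings (1024 in the model summary printed earlier). A small sketch of the arithmetic, using only values that appear in this notebook:

```python
hidden_size = 1024        # embedding width from the model summary
num_virtual_tokens = 4    # NUM_VIRTUAL_TOKENS
all_params = 559_218_688  # "all params" reported by print_trainable_parameters()

# One trainable vector of size hidden_size per virtual token.
trainable_params = num_virtual_tokens * hidden_size
trainable_percent = 100 * trainable_params / all_params
print(trainable_params, trainable_percent)
```

Doubling `NUM_VIRTUAL_TOKENS` would double the trainable parameter count, which would still be a very small share of the model.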


In [18]:
from transformers import TrainingArguments
def create_training_arguments(path, learning_rate=0.0035, epochs=6):
    training_args = TrainingArguments(
        output_dir=path, # Where the model predictions and checkpoints will be written
        #use_cpu=True, # This is necessary for CPU clusters.
        auto_find_batch_size=True, # Start from the batch size below and reduce it automatically if it does not fit into memory
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        learning_rate= learning_rate, # Higher learning rate than full fine-tuning
        num_train_epochs=epochs
    )
    return training_args

In [19]:
import os

working_dir = "./"

# It is best to store each model in its own folder.
# Build the names of the directories where the models will be stored.
output_directory_prompt = os.path.join(working_dir, "peft_outputs_prompt")
output_directory_sentences = os.path.join(working_dir, "peft_outputs_sentences")

# Create the directories if they do not exist yet.
os.makedirs(output_directory_prompt, exist_ok=True)
os.makedirs(output_directory_sentences, exist_ok=True)

In [20]:
training_args_prompt = create_training_arguments(output_directory_prompt, 0.003, NUM_EPOCHS)
training_args_sentences = create_training_arguments(output_directory_sentences, 0.003, NUM_EPOCHS)

In [21]:
from transformers import Trainer, DataCollatorForLanguageModeling
def create_trainer(model, training_args, train_dataset):
    trainer = Trainer(
        model=model, # We pass in the PEFT version of the foundation model, bloom-560m
        args=training_args, # The args for the training.
        train_dataset=train_dataset, # The dataset used to train the model.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False) # mlm=False indicates not to use masked language modeling
    )
    return trainer
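
With `mlm=False`, the data collator prepares batches for causal language modeling: it pads the examples to a common length and uses the input ids themselves as labels, with padding excluded from the loss. The pure-Python sketch below illustrates that behavior under simplifying assumptions. The function `collate_causal_lm`, the right-side padding, and the token ids are hypothetical, and the real collator works on tensors and masks every label equal to the pad token id.

```python
IGNORE_INDEX = -100  # label value that the loss function skips

def collate_causal_lm(examples, pad_token_id):
    """Pad token-id lists to equal length and build labels for causal LM training."""
    max_len = max(len(ids) for ids in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids in examples:
        padding = max_len - len(ids)
        # Right padding is used here for simplicity; the real tokenizer decides the side.
        batch["input_ids"].append(ids + [pad_token_id] * padding)
        batch["attention_mask"].append([1] * len(ids) + [0] * padding)
        # Labels copy the inputs; padded positions are ignored by the loss.
        batch["labels"].append(ids + [IGNORE_INDEX] * padding)
    return batch

batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=0)
print(batch["labels"])
```

The shift between inputs and next-token targets is not done here; causal language models in transformers handle it internally when labels are supplied.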

In [22]:
#Training first model using prompts data.
trainer_prompt = create_trainer(peft_model_prompt, training_args_prompt, train_sample_prompts)
trainer_prompt.train()

You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


TrainOutput(global_step=35, training_loss=3.4851473127092634, metrics={'train_runtime': 23.4241, 'train_samples_per_second': 10.673, 'train_steps_per_second': 1.494, 'total_flos': 50882985099264.0, 'train_loss': 3.4851473127092634, 'epoch': 5.0})

In [23]:
#Training second model using Quotes dataset.
trainer_sentences = create_trainer(peft_model_sentences, training_args_sentences, train_sample_sentences)
trainer_sentences.train()

TrainOutput(global_step=65, training_loss=3.619667170597957, metrics={'train_runtime': 18.9759, 'train_samples_per_second': 13.175, 'train_steps_per_second': 3.425, 'total_flos': 41824464224256.0, 'train_loss': 3.619667170597957, 'epoch': 5.0})
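
The `global_step` values in the two training outputs can be related to the batch size with a little arithmetic. The sketch below assumes a single device and no gradient accumulation; the helper `total_steps` is hypothetical. The first run matches a batch size of 8. The second run's 65 steps would be consistent with `auto_find_batch_size` having lowered the batch size to 4, although the notebook output does not confirm this.

```python
import math

def total_steps(num_samples, batch_size, epochs):
    # One optimizer step per batch; the last batch of an epoch may be smaller.
    return math.ceil(num_samples / batch_size) * epochs

steps_batch_8 = total_steps(num_samples=50, batch_size=8, epochs=5)
steps_batch_4 = total_steps(num_samples=50, batch_size=4, epochs=5)
print(steps_batch_8, steps_batch_4)
```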

## Inference using the fine-tuned models (prompts and quotes)

In [28]:
with torch.no_grad():
    foundational_outputs_prompt = model_response(peft_model_prompt, input_prompt, max_tokens=50)

print(tokenizer.batch_decode(foundational_outputs_prompt, skip_special_tokens=True))



['I want you to act as a motivational coach. I will help guide and motivate people through their journey of success.\nThe first step is finding the right person for your coaching job, then getting them on board with what’s important in life so that they can take action towards achieving goals or moving forward']


In [29]:
with torch.no_grad():
    foundational_outputs_sentence = model_response(peft_model_sentences, input_sentence, max_tokens=50)

print(tokenizer.batch_decode(foundational_outputs_sentence, skip_special_tokens=True))



['Two things that matter in life: your friends and family. You can be a good friend to someone who is not, but you will never know what they are thinking or feeling about the person you’re with.\nYou should always have an open mind when it comes time for conversation because people don’t']
