# Prompt Tuning
This notebook applies prompt tuning to a model of your choice using the [Parameter-Efficient Fine-Tuning (PEFT) library developed by HuggingFace](https://huggingface.co/docs/peft/index). PEFT supports several methods that reduce the number of parameters trained during fine-tuning, including prompt tuning and LoRA.

The notebook walks through the following steps:

1. Apply prompt tuning to model of choice
1. Fine-tune on provided dataset
1. Save and share model to HuggingFace hub
1. Conduct inference using the fine-tuned model
1. Compare outputs from randomly- and text-initialized fine-tuned model vs. foundation model
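
As a rough mental model (not PEFT's actual implementation), prompt tuning keeps every weight of the base model frozen and learns only a small matrix of "virtual token" embeddings that is prepended to the embedded input. The toy sketch below uses plain Python lists in place of tensors; `hidden_size`, the random embedding values, and the helper name `prepend_soft_prompt` are illustrative assumptions.

```python
import random

def prepend_soft_prompt(soft_prompt, input_embeddings):
    """Return the sequence the frozen model would see: virtual tokens first, then real tokens."""
    return soft_prompt + input_embeddings

random.seed(0)
hidden_size = 8          # toy value; real models use far wider embeddings
num_virtual_tokens = 4   # same setting the notebook uses for random initialization

# Trainable part: one embedding vector per virtual token, randomly initialized.
soft_prompt = [[random.gauss(0.0, 1.0) for _ in range(hidden_size)]
               for _ in range(num_virtual_tokens)]

# Frozen part: embeddings of a 5-token input (random stand-ins here).
input_embeddings = [[random.gauss(0.0, 1.0) for _ in range(hidden_size)]
                    for _ in range(5)]

model_input = prepend_soft_prompt(soft_prompt, input_embeddings)
trainable_values = num_virtual_tokens * hidden_size

print(len(model_input))   # sequence length seen by the model: 4 virtual + 5 real tokens
print(trainable_values)   # values updated during training: num_virtual_tokens * hidden_size
```

Only `soft_prompt` would receive gradient updates; the input embeddings and the rest of the model stay fixed, which is why the trainable parameter count stays so small.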

In [None]:
%pip install peft==0.4.0

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
Collecting peft==0.4.0
  Downloading peft-0.4.0-py3-none-any.whl (72 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 72.9/72.9 kB 2.2 MB/s eta 0:00:00
Collecting safetensors
  Downloading safetensors-0.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 9.1 MB/s eta 0:00:00
Installing collected packages: safetensors, peft
Successfully installed peft-0.4.0 safetensors-0.3.2


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
foundation_model = AutoModelForCausalLM.from_pretrained(model_name)


Before fine-tuning, we ask the foundation model to continue the following input sentence so that we have a baseline to compare against.

In [None]:
input1 = tokenizer("Two things are infinite: ", return_tensors="pt")
foundation_outputs = foundation_model.generate(
    input_ids=input1["input_ids"],
    attention_mask=input1["attention_mask"],
    max_new_tokens=20,
    eos_token_id=tokenizer.eos_token_id
    )
print(tokenizer.batch_decode(foundation_outputs, skip_special_tokens=True))

['Two things are infinite:  the number of people and the number of things']


The output is not too bad. However, BLOOMZ was not trained specifically on inspirational English quotes. Therefore, we fine-tune `bloomz-560m` on [the `Abirate/english_quotes` dataset](https://huggingface.co/datasets/Abirate/english_quotes), which consists exclusively of inspirational English quotes, in the hope that the fine-tuned version generates more quote-like text later.

In [None]:
from datasets import load_dataset

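# `DA` is assumed to be the course helper object created by the classroom setup; it is not defined in this notebook
data = load_dataset("Abirate/english_quotes", cache_dir=DA.paths.datasets+"/datasets")

# Tokenize every quote in batches, then keep the first 50 training examples for a quick run
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_sample = data["train"].select(range(50))
display(train_sample)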




Found cached dataset json (/dbfs/mnt/dbacademy-datasets/llm-foundation-models/v01-raw/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)



Loading cached processed dataset at /dbfs/mnt/dbacademy-datasets/llm-foundation-models/v01-raw/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-768c685e1cb83483.arrow


Dataset({
    features: ['quote', 'author', 'tags', 'input_ids', 'attention_mask'],
    num_rows: 50
})

In [None]:
from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,
    num_virtual_tokens=4,
    tokenizer_name_or_path=model_name
)
peft_model = get_peft_model(foundation_model, peft_config)
# print_trainable_parameters() prints the summary itself and returns None, so no print() wrapper is needed
peft_model.print_trainable_parameters()

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229
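
The count reported above can be checked by hand. With prompt tuning, the only trainable weights are the virtual-token embeddings, so the total is `num_virtual_tokens` times the embedding width. The sketch assumes a hidden size of 1024 for `bloomz-560m`, which is consistent with 4,096 / 4 in the output; the variable names are illustrative.

```python
num_virtual_tokens = 4
hidden_size = 1024          # assumed embedding width of bloomz-560m (4,096 / 4)
all_params = 559_218_688    # total reported by print_trainable_parameters()

# Only the virtual-token embeddings are trained; everything else is frozen.
trainable_params = num_virtual_tokens * hidden_size
frozen_params = all_params - trainable_params
trainable_pct = 100 * trainable_params / all_params

print(f"trainable params: {trainable_params:,}")
print(f"frozen params:    {frozen_params:,}")
print(f"trainable%:       {trainable_pct:.7f}")
```

The same arithmetic with `num_virtual_tokens=3` gives 3,072 trainable parameters, which is the figure reported for the text-initialized configuration later in the notebook.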


In [None]:
from transformers import TrainingArguments
import os

output_directory = os.path.join(DA.paths.working_dir, "peft_outputs")

# Create the working and output directories in one call if they do not exist yet
os.makedirs(output_directory, exist_ok=True)

training_args = TrainingArguments(
    output_dir=output_directory, # Where the model predictions and checkpoints will be written
    no_cuda=True, # This is necessary for CPU clusters.
    auto_find_batch_size=True, # Find a suitable batch size that will fit into memory automatically
    learning_rate=3e-2, # Higher learning rate than full fine-tuning
    num_train_epochs=1 # Number of passes to go through the entire fine-tuning dataset
)

In [None]:
from transformers import Trainer, DataCollatorForLanguageModeling

trainer = Trainer(
    model=peft_model, # We pass in the PEFT version of the foundation model, bloomz-560M
    args=training_args,
    train_dataset=train_sample,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False) # mlm=False indicates not to use masked language modeling
)

trainer.train()

You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.




TrainOutput(global_step=7, training_loss=3.6653319767543246, metrics={'train_runtime': 143.4858, 'train_samples_per_second': 0.348, 'train_steps_per_second': 0.049, 'total_flos': 12033285439488.0, 'train_loss': 3.6653319767543246, 'epoch': 1.0})
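
With `mlm=False`, the collator prepares batches for causal language modeling rather than masked language modeling. As a rough illustration of that idea, the plain-Python sketch below pads a batch to a common length and copies the token IDs into `labels`, replacing padding positions with `-100` so that the loss ignores them. The token IDs, the pad ID, the right-side padding, and the function name `collate_causal_lm` are assumptions made for illustration; this is not the library's code.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def collate_causal_lm(sequences, pad_token_id):
    """Pad token-ID lists to equal length and build labels that ignore the padding."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        batch["input_ids"].append(seq + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Labels mirror the inputs; padded positions are excluded from the loss.
        batch["labels"].append(seq + [IGNORE_INDEX] * n_pad)
    return batch

batch = collate_causal_lm([[11, 12, 13], [21, 22]], pad_token_id=0)
print(batch["input_ids"])
print(batch["attention_mask"])
print(batch["labels"])
```

The shift between inputs and targets (predicting token *t+1* from tokens up to *t*) is not done here; causal language models in `transformers` typically handle it inside the model when `labels` are passed.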

## Save model

In [None]:
import time

time_now = time.time()
peft_model_path = os.path.join(output_directory, f"peft_model_{time_now}")
trainer.model.save_pretrained(peft_model_path)

## Inference

You can load the model from the path you saved it to and ask it to continue the same input as before.

In [None]:
from peft import PeftModel

loaded_model = PeftModel.from_pretrained(foundation_model,
                                         peft_model_path,
                                         is_trainable=False)

In [None]:
loaded_model_outputs = loaded_model.generate(
    input_ids=input1["input_ids"],
    attention_mask=input1["attention_mask"],
    max_new_tokens=7,
    eos_token_id=tokenizer.eos_token_id
    )
print(tokenizer.batch_decode(loaded_model_outputs, skip_special_tokens=True))

['Two things are infinite:  the number of people and the number']


## Text initialization

After one epoch, the fine-tuned, randomly initialized model produces the same opening words as the foundation model for the quote above. Let's now compare it with the text initialization method.

Notice that we change the `prompt_tuning_init` setting, provide a concise text prompt, and reduce `num_virtual_tokens` from 4 to 3; everything else stays the same.

API docs
* [prompt_tuning_init_text](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig.prompt_tuning_init_text)

In [None]:
text_peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Generate inspirational quotes", # text whose token embeddings give the virtual tokens a starting point
    num_virtual_tokens=3, # does not have to match the number of tokens in the text above
    tokenizer_name_or_path=model_name
)
text_peft_model = get_peft_model(foundation_model, text_peft_config)
# print_trainable_parameters() prints the summary itself and returns None, so no print() wrapper is needed
text_peft_model.print_trainable_parameters()

trainable params: 3,072 || all params: 559,217,664 || trainable%: 0.0005493388706691496
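
Because the tokenized `prompt_tuning_init_text` and `num_virtual_tokens` may differ in length, the text has to be fitted to the number of virtual tokens. One plausible scheme, which to the best of our understanding resembles what PEFT does, is to truncate a text that is too long and repeat a text that is too short. The sketch below shows that logic on made-up token IDs; `fit_init_tokens` is a hypothetical helper, not a PEFT function.

```python
import math

def fit_init_tokens(init_token_ids, num_virtual_tokens):
    """Truncate or cyclically repeat token IDs so exactly num_virtual_tokens remain."""
    if not init_token_ids:
        raise ValueError("the initialization text must contain at least one token")
    if len(init_token_ids) < num_virtual_tokens:
        # Too short: repeat the whole text often enough to cover every virtual token.
        repeats = math.ceil(num_virtual_tokens / len(init_token_ids))
        init_token_ids = init_token_ids * repeats
    # Too long (or just repeated past the target): keep only the leading tokens.
    return init_token_ids[:num_virtual_tokens]

print(fit_init_tokens([101, 102, 103, 104, 105], 3))  # longer text is cut off
print(fit_init_tokens([101, 102], 5))                 # shorter text is repeated
```

The embeddings of the resulting token IDs would then serve as the starting values of the virtual tokens, which training adjusts from there.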


In [None]:
text_trainer = Trainer(
    model=text_peft_model,
    args=training_args,
    train_dataset=train_sample,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

text_trainer.train()





TrainOutput(global_step=7, training_loss=3.153122220720564, metrics={'train_runtime': 134.5159, 'train_samples_per_second': 0.372, 'train_steps_per_second': 0.052, 'total_flos': 12033285439488.0, 'train_loss': 3.153122220720564, 'epoch': 1.0})
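
Both training runs report `global_step=7`. That matches one epoch over the 50 selected quotes if the batch size is 8, the `Trainer` default for `per_device_train_batch_size`, on a single device. Treat the batch size as an assumption, since `auto_find_batch_size=True` may lower it when memory runs out. The throughput figures in the log follow from the same numbers:

```python
import math

num_samples = 50           # size of train_sample
batch_size = 8             # assumed: Trainer default, single device
train_runtime = 134.5159   # seconds, from the TrainOutput above

# One optimizer step per batch; the last batch is smaller (50 = 6 * 8 + 2).
steps = math.ceil(num_samples / batch_size)
samples_per_second = round(num_samples / train_runtime, 3)
steps_per_second = round(steps / train_runtime, 3)

print(steps, samples_per_second, steps_per_second)
```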

In [None]:
# Save the model
time_now = time.time()
text_peft_model_path = os.path.join(output_directory, f"text_peft_model_{time_now}")
text_trainer.model.save_pretrained(text_peft_model_path)

# Load model
loaded_text_model = PeftModel.from_pretrained(
    foundation_model,
    text_peft_model_path,
    is_trainable=False
)

# Generate output with the model reloaded from disk
text_outputs = loaded_text_model.generate(
    input_ids=input1["input_ids"],
    attention_mask=input1["attention_mask"],
    max_new_tokens=7,
    eos_token_id=tokenizer.eos_token_id
)

print(tokenizer.batch_decode(text_outputs, skip_special_tokens=True))

['Two things are infinite:  the number of people who will hear']


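In this small experiment, text initialization reached a lower training loss (3.15 vs. 3.67), but its output is not clearly better. Text initialization doesn't necessarily perform better than random initialization.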
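
One simple way to compare the three generations recorded in this notebook is to strip the shared prompt and look only at the continuations. The strings below are copied from the cell outputs above; the helper name `continuation` is an assumption for illustration.

```python
prompt = "Two things are infinite: "

# Decoded outputs copied from the cells above.
outputs = {
    "foundation": "Two things are infinite:  the number of people and the number of things",
    "random_init": "Two things are infinite:  the number of people and the number",
    "text_init": "Two things are infinite:  the number of people who will hear",
}

def continuation(prompt, generated):
    """Return only the newly generated text, without the prompt."""
    if not generated.startswith(prompt):
        raise ValueError("generated text does not start with the prompt")
    return generated[len(prompt):].strip()

continuations = {name: continuation(prompt, text) for name, text in outputs.items()}
for name, text in continuations.items():
    print(f"{name:>12}: {text}")

# Does the randomly initialized model simply repeat the foundation model's opening?
same_prefix = continuations["foundation"].startswith(continuations["random_init"])
print(same_prefix)
```

Keep in mind that the foundation output was generated with `max_new_tokens=20` and the two fine-tuned outputs with `max_new_tokens=7`, so only the first few words are directly comparable.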

In [None]:
from huggingface_hub import login

os.environ["huggingface_key"] = dbutils.secrets.get("llm_scope", "huggingface_key")
hf_token = os.environ["huggingface_key"]
login(token=hf_token)

---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
File <command-1738672828157799>:3
      1 from huggingface_hub import login
----> 3 os.environ["huggingface_key"] = dbutils.secrets.get("llm_scope", "huggingface_key")
      4 hf_token = os.environ["huggingface_key"]
      5 login(token=hf_token)

File /databricks/python_shell/dbruntime/dbutils.py:240, in DBUtils.SecretsHandler.get(self, scope, key)
    239

In [None]:
from huggingface_hub import notebook_login

notebook_login()

In [None]:

hf_username = "<FILL_IN_WITH_YOUR_HUGGINGFACE_USERNAME>"  # replace the placeholder with your own username
peft_model_id = f"{hf_username}/bloom_prompt_tuning_{time_now}"
trainer.model.push_to_hub(peft_model_id, use_auth_token=True)

### Inference from model in HuggingFace hub

In [None]:
from peft import PeftModel, PeftConfig

config = PeftConfig.from_pretrained(peft_model_id)
foundation_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
peft_random_model = PeftModel.from_pretrained(foundation_model, peft_model_id)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File <command-1738672828157804>:3
      1 from peft import PeftModel, PeftConfig
----> 3 config = PeftConfig.from_pretrained(peft_model_id)
      4 foundation_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
      5 peft_random_model = PeftModel.from_pretrained(foundation_model, peft_model_id)

NameError: name 'peft_model_id' is not defined

In [None]:
online_model_outputs = peft_random_model.generate(
    input_ids=input1["input_ids"],
    attention_mask=input1["attention_mask"],
    max_new_tokens=7,
    eos_token_id=tokenizer.eos_token_id
    )

print(tokenizer.batch_decode(online_model_outputs, skip_special_tokens=True))