<a href="https://colab.research.google.com/github/nimishsoni/Large-Language-Model-Notebooks/blob/main/PEFT_LoRA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Parameter Efficient Fine Tuning using LoRA
- Model: the pretrained BLOOM 560M model (`bigscience/bloom-560m`) from Hugging Face.
- Library: `peft`.
- Dataset: awesome-chatgpt-prompts (`fka/awesome-chatgpt-prompts`).

In [2]:
# Import the tokenizer and causal-LM classes from the transformers library
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'bigscience/bloom-560m'

tokenizer = AutoTokenizer.from_pretrained(model_id)
foundational_model = AutoModelForCausalLM.from_pretrained(model_id)


In [3]:
# Inference: generate a continuation for an already tokenized prompt
def model_response(model, inputs, max_new_tokens=100):
    # early_stopping only affects beam search, so it has no effect with the default greedy decoding
    response = model.generate(
        input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'],
        max_new_tokens=max_new_tokens, repetition_penalty=1.5,
        early_stopping=True, eos_token_id=tokenizer.eos_token_id)
    return response
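
`repetition_penalty = 1.5` discourages the model from repeating tokens it has already produced. As a rough illustration, the pure-Python sketch below applies the rule described in the Transformers documentation to a toy score list: the score of each previously seen token is divided by the penalty when positive and multiplied by it when negative. `apply_repetition_penalty` is a hypothetical helper written for this notebook, not the library's implementation.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.5):
    """Return a copy of `logits` with already-seen tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink; negative scores become more negative.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [3.0, -1.0, 0.5, 2.0]  # toy scores for a 4-token vocabulary
seen = [0, 1]                   # token ids already present in the sequence
print(apply_repetition_penalty(logits, seen))
```

Tokens 0 and 1 lose score while the unseen tokens 2 and 3 are left untouched, so an unseen token is more likely to be picked next.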

In [4]:
input_tokens = tokenizer('I want you to act as a motivational coach.', return_tensors = 'pt')
prompt_response = model_response(foundational_model, input_tokens, max_new_tokens = 50)
print(tokenizer.batch_decode(prompt_response, skip_special_tokens = True))



['I want you to act as a motivational coach. You will be able to:\n• Develop and implement strategies for improving your performance, including: • Motivating yourself;']


In [5]:
from datasets import load_dataset
dataset = "fka/awesome-chatgpt-prompts"

# Load the dataset, tokenize the prompts, and keep the first 50 for training.
data = load_dataset(dataset)
print(data)
data = data.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample = data["train"].select(range(50))

train_sample = train_sample.remove_columns('act')

display(train_sample)


DatasetDict({
    train: Dataset({
        features: ['act', 'prompt'],
        num_rows: 153
    })
})



Dataset({
    features: ['prompt', 'input_ids', 'attention_mask'],
    num_rows: 50
})
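
The tokenized samples contain only `input_ids` and `attention_mask`, and they differ in length. For causal-LM training, a batch also needs padding and a `labels` field, which the `DataCollatorForLanguageModeling(tokenizer, mlm=False)` used later in this notebook takes care of. The sketch below mimics that step in pure Python under simplifying assumptions: `collate_for_causal_lm` is a hypothetical helper, the pad token id is passed in by hand, and padding goes on the right, whereas the real padding side depends on the tokenizer.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def collate_for_causal_lm(batch_input_ids, pad_token_id):
    """Pad a list of token-id lists and build matching masks and labels."""
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; padded positions are excluded from the loss.
        labels.append(ids + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = collate_for_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=3)
print(batch["input_ids"])
print(batch["labels"])
```

The labels are not shifted here because Hugging Face causal-LM models shift them by one position internally when computing the loss.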

## Fine-tuning

In [6]:
import peft
from peft import LoraConfig, get_peft_model

In [7]:
lora_config = LoraConfig(
    r=4,  # Rank of the update matrices; a larger r means more trainable parameters.
    lora_alpha=1,  # Scaling factor; the LoRA update is multiplied by lora_alpha / r.
    target_modules=["query_key_value"],  # BLOOM's fused attention projection; module names differ between architectures.
    lora_dropout=0.05,  # Dropout on the LoRA layers; helps to avoid overfitting.
    bias="lora_only",  # Which biases to train: "none", "all", or "lora_only" (only those of the LoRA-adapted layers).
    task_type="CAUSAL_LM"
)
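
To make the roles of `r` and `lora_alpha` in the configuration above concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It assumes the standard formulation `h = W x + (lora_alpha / r) * B (A x)`, where `W` stays frozen and only the small matrices `A` (r x in) and `B` (out x r) are trained. The helper names and the toy matrices are illustrative, and dropout and bias are left out.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Frozen base projection plus the scaled low-rank update."""
    scaling = lora_alpha / r
    base = matvec(W, x)               # frozen pretrained weights
    update = matvec(B, matvec(A, x))  # low-rank path: project down to r, then back up
    return [b + scaling * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]                          # 2 x 2 frozen weight
A = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # r x in  = 4 x 2
B = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]      # out x r = 2 x 4
print(lora_forward(W, A, B, x=[2.0, 4.0], r=4, lora_alpha=1))
```

With `r=4` and `lora_alpha=1` the update is scaled by 0.25, so the adapter only nudges the base output. When `B` is all zeros, which is how LoRA adapters are usually initialized, the layer reproduces the base model exactly.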

In [8]:
peft_model = get_peft_model(foundational_model, lora_config)
peft_model.print_trainable_parameters()  # prints the summary itself and returns None

trainable params: 393,216 || all params: 559,607,808 || trainable%: 0.07026635339584111
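
The trainable-parameter count can be checked by hand. Each adapted layer adds a matrix `A` of shape (r x in) and a matrix `B` of shape (out x r). The sketch below assumes the bloom-560m shapes: hidden size 1024, 24 transformer blocks, and a fused `query_key_value` projection from 1024 to 3 * 1024 features. `lora_trainable_params` is a hypothetical helper, not part of `peft`.

```python
def lora_trainable_params(r, in_features, out_features, num_layers):
    """Parameters added by LoRA: A is (r x in) and B is (out x r) per adapted layer."""
    per_layer = r * in_features + out_features * r
    return per_layer * num_layers


# Assumed bloom-560m shapes: hidden size 1024, 24 blocks, fused QKV projection.
hidden_size = 1024
trainable = lora_trainable_params(r=4, in_features=hidden_size,
                                  out_features=3 * hidden_size, num_layers=24)
all_params = 559_607_808  # total reported by print_trainable_parameters()
print(trainable, f"{100 * trainable / all_params:.4f}%")
```

Under these assumptions the result is 393,216 parameters, about 0.07% of the model, which matches the reported figure. Doubling `r` would double this count.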


In [9]:
# Build the path of the directory where the Trainer will save its outputs
import os
working_dir = './'

output_directory = os.path.join(working_dir, "peft_lora_outputs")

In [10]:
# Import the training utilities
import transformers
from transformers import TrainingArguments, Trainer

In [14]:
training_args = TrainingArguments(
    output_dir=output_directory,
    auto_find_batch_size=True, # On a CUDA out-of-memory error, retry with a smaller batch size (requires accelerate).
    learning_rate= 3e-2, # Higher learning rate than is typical for full fine-tuning.
    num_train_epochs=10,
    per_device_train_batch_size=8,  # Adjust this batch size based on your GPU memory
    per_device_eval_batch_size=8,   # Adjust this batch size for evaluation if needed
)

In [15]:
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_sample,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
trainer.train()

You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.



TrainOutput(global_step=70, training_loss=2.3124032156808036, metrics={'train_runtime': 46.1265, 'train_samples_per_second': 10.84, 'train_steps_per_second': 1.518, 'total_flos': 101912866062336.0, 'train_loss': 2.3124032156808036, 'epoch': 10.0})
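
The figures in `TrainOutput` follow from the training setup: 50 samples in batches of 8 give 7 batches per epoch, so 10 epochs take 70 optimizer steps. The sketch below reproduces the reported numbers. It assumes a single device, no gradient accumulation, and that `auto_find_batch_size` did not have to shrink the batch size. `total_training_steps` is a hypothetical helper.

```python
import math


def total_training_steps(num_samples, batch_size, num_epochs):
    """Optimizer steps for one device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


steps = total_training_steps(num_samples=50, batch_size=8, num_epochs=10)
runtime_s = 46.1265  # train_runtime from the output above
print(steps, round(steps / runtime_s, 3), round(50 * 10 / runtime_s, 2))
```

The three printed values correspond to `global_step`, `train_steps_per_second`, and `train_samples_per_second`.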

## Inference

In [24]:
import torch
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU/CUDA is available.")
else:
    device = torch.device("cpu")
    print("No GPU/CUDA available, using CPU.")

peft_model.to(device)
peft_model.eval()  # disable LoRA dropout for inference
input_sentences = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt").to(device)
with torch.no_grad():
    peft_outputs_sentence = model_response(peft_model, input_sentences, max_new_tokens=50)

print(tokenizer.batch_decode(peft_outputs_sentence, skip_special_tokens=True))

GPU/CUDA is available.
['I want you to act as a motivational coach.  I will provide some information about how people are achieving their goals, and it should be your job in the role of an advisor or consultant who can help them come up with strategies that work better than previously implemented. This could involve providing advice on different types']


The response now follows the structure of the prompts used for fine-tuning. Training was fast on a GPU: 10 epochs (70 steps) completed in about 46 seconds.