<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Low-Rank Adaptation (LoRA)
This lab introduces how to apply low-rank adaptation (LoRA) to your model of choice using the [Parameter-Efficient Fine-Tuning (PEFT) library developed by Hugging Face](https://huggingface.co/docs/peft/index).


### ![Dolly](https://files.training.databricks.com/images/llm/dolly_small.png) Learning Objectives
1. Apply LoRA to a model
1. Fine-tune on your provided dataset
1. Save your model
1. Conduct inference using the fine-tuned model

In [None]:
%pip install peft==0.4.0

In [None]:
%run ../Includes/Classroom-Setup

In [None]:
def getUsernameFromEnv(lesson):
  '''
  Exception handling for when the working directory is not in the scope
  (i.e. the Classroom-Setup was not run)
  '''
  try:
    return f"../data/testing-files/{lesson}"
  except NameError:
    raise NameError("Working directory not found. Please re-run the Classroom-Setup at the beginning of the notebook.")

def questionPassed(userhome_for_testing, lesson, question):
  '''
  Helper function that writes an empty file named `PASSED` to the designated path
  '''
  from pathlib import Path

  print(f"\u001b[32mPASSED\x1b[0m: All tests passed for {lesson}, {question}")

  path = f"{userhome_for_testing}/{question}"
  Path(path).mkdir(parents=True, exist_ok=True)
  with open(f"{path}/PASSED", "wb") as handle:
      pass # just write an empty file
  
  print ("\u001b[32mRESULTS RECORDED\x1b[0m: Click `Submit` when all questions are completed to log the results.")


def dbTestQuestion1_1(new_model):
  lesson, question = "lesson1", "question1"
  userhome_for_testing = getUsernameFromEnv(lesson)
  
  assert  str(type(new_model)) == "<class '__main__.TransformerEncoder'>", "Test NOT passed: Result should be of type `__main__.TransformerEncoder`"
  
  questionPassed(userhome_for_testing, lesson, question)


# LLM 02L

def dbTestQuestion2_1(r, target_modules):
  lesson, question = "lesson2", "question1"
  userhome_for_testing = getUsernameFromEnv(lesson)

  assert r == 1, "Test NOT passed: `r` should be equal to 1."
  assert target_modules == ["query_key_value"], "Test NOT passed: `target_modules` should be equal to `[\"query_key_value\"]`."
  questionPassed(userhome_for_testing, lesson, question)

def dbTestQuestion2_2(model):
  lesson, question = "lesson2", "question2"
  userhome_for_testing = getUsernameFromEnv(lesson)
  
  trainable_params = 0
  all_param = 0
  for _, param in model.named_parameters():
      all_param += param.numel()
      if param.requires_grad:
          trainable_params += param.numel()

  assert trainable_params > 0, "Test NOT passed: Your new adapted model should have more than 0 training parameters."
  
  questionPassed(userhome_for_testing, lesson, question)

def dbTestQuestion2_3(trainer):
  lesson, question = "lesson2", "question3"
  userhome_for_testing = getUsernameFromEnv(lesson)

  assert str(type(trainer.args)) == "<class 'transformers.training_args.TrainingArguments'>", "Test NOT passed: `trainer.args` should be of type `transformers.training_args.TrainingArguments`."

  questionPassed(userhome_for_testing, lesson, question)

def dbTestQuestion2_4(loaded_model):
  lesson, question = "lesson2", "question4"
  userhome_for_testing = getUsernameFromEnv(lesson)

  assert str(type(loaded_model)) == "<class 'peft.peft_model.PeftModelForCausalLM'>", "Test NOT passed: `loaded_model` should be of class `peft.peft_model.PeftModelForCausalLM`."

  questionPassed(userhome_for_testing, lesson, question) 

def dbTestQuestion2_5(outputs):
  lesson, question = "lesson2", "question5"
  userhome_for_testing = getUsernameFromEnv(lesson)

  assert str(type(outputs)) == "<class 'torch.Tensor'>", "Test NOT passed: `outputs` should be of type `torch.Tensor`." 

  questionPassed(userhome_for_testing, lesson, question)


We will re-use the same dataset and model from the demo notebook.

In [None]:
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
foundation_model = AutoModelForCausalLM.from_pretrained(model_name)

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_sample = data["train"].select(range(50))
display(train_sample) 

## Define LoRA configurations

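With LoRA, the pretrained attention weights stay frozen. Instead, you learn a weight update, `Weight_delta`, expressed as the product of two small matrices, `W_a` and `W_b`, and only those two matrices are updated during training.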

<img src="https://files.training.databricks.com/images/llm/lora.png" width=300>
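
To make the update concrete, here is a minimal sketch in plain Python (no PyTorch) of how the two low-rank factors combine with a frozen weight matrix. The shapes (`W_a` as `d x r`, `W_b` as `r x d`), the helper names, the `scale` argument, and the toy numbers are all assumptions for illustration; PEFT's actual implementation differs in detail.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(w_frozen, w_a, w_b, scale=1.0):
    """Return W + scale * (W_a @ W_b) without modifying w_frozen."""
    delta = matmul(w_a, w_b)  # full d x d update built from two small factors
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_frozen, delta)]


# Toy example with d = 3 and r = 1 (hypothetical numbers).
w_frozen = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
w_a = [[1.0], [2.0], [3.0]]  # d x r, trainable
w_b = [[0.5, 0.0, -0.5]]     # r x d, trainable

w_eff = lora_effective_weight(w_frozen, w_a, w_b)
print(w_eff)
```

In this toy case only the 6 values in `w_a` and `w_b` would be trained, compared with 9 values in the full matrix; the gap widens quickly as `d` grows.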

You can treat `r` (rank) as a hyperparameter. Recall from the lecture that LoRA can perform well with very small ranks, according to [Hu et al. (2021)](https://arxiv.org/abs/2106.09685): GPT-3's validation accuracies across tasks are quite similar for ranks from 1 to 64. From [PyTorch Lightning's documentation](https://lightning.ai/pages/community/article/lora-llm/):

> A smaller r leads to a simpler low-rank matrix, which results in fewer parameters to learn during adaptation. This can lead to faster training and potentially reduced computational requirements. However, with a smaller r, the capacity of the low-rank matrix to capture task-specific information decreases. This may result in lower adaptation quality, and the model might not perform as well on the new task compared to a higher r.
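
The trade-off in the quote can be made tangible with a rough parameter count. The sketch below assumes each adapted layer adds one `d_in x r` and one `r x d_out` matrix. The dimensions used (a 1024-to-3072 projection in each of 24 layers) are an assumption about `bloomz-560m`'s `query_key_value` module; confirm them by printing the model. `lora_trainable_params` is a hypothetical helper, not part of PEFT.

```python
def lora_trainable_params(d_in, d_out, r, n_layers=1):
    """Rough count of LoRA parameters: r * (d_in + d_out) per adapted layer."""
    return n_layers * r * (d_in + d_out)


# Assumed dimensions; verify against print(foundation_model).
for r in (1, 4, 16, 64):
    print(r, lora_trainable_params(1024, 3072, r, n_layers=24))
```

The count grows linearly with `r`, which is why small ranks keep fine-tuning cheap.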

Other arguments:
- `lora_dropout`: 
  - Dropout is a regularization method that reduces overfitting by randomly and temporarily removing nodes during training. 
  - It works like this: <br>
    * Applies to most types of layers (e.g. fully connected, convolutional, recurrent) and to larger networks
    * Temporarily and randomly remove nodes and their connections during each training cycle
    ![](https://files.training.databricks.com/images/nn_dropout.png)
    * See the original paper here: <a href="http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf" target="_blank">Dropout: A Simple Way to Prevent Neural Networks from Overfitting</a>
- `target_modules`:
  - Specifies the names of the modules that LoRA is applied to
  - This is dependent on how the foundation model names its attention weight matrices. 
  - Typically, this can be:
    - `query`, `q`, `q_proj` 
    - `key`, `k`, `k_proj` 
    - `value`, `v` , `v_proj` 
    - `query_key_value` 
  - The easiest way to inspect the module/layer names is to print the model, like we are doing below.
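
As an alternative to reading the printed model by eye, the following sketch filters module names against the candidates listed above. `find_target_module_names` and `CANDIDATE_NAMES` are hypothetical helpers, and the example names are illustrative stand-ins for what a model might report, not output captured from a real run.

```python
CANDIDATE_NAMES = {
    "query", "q", "q_proj",
    "key", "k", "k_proj",
    "value", "v", "v_proj",
    "query_key_value",
}


def find_target_module_names(module_names, candidates=CANDIDATE_NAMES):
    """Return the sorted leaf names (text after the last dot) that match a candidate."""
    leaves = {name.rsplit(".", 1)[-1] for name in module_names}
    return sorted(leaves & set(candidates))


# Illustrative names only.
example_names = [
    "transformer.h.0.self_attention.query_key_value",
    "transformer.h.0.self_attention.dense",
    "transformer.h.0.mlp.dense_h_to_4h",
]
print(find_target_module_names(example_names))
```

With a loaded PyTorch model, the names could be supplied as `(name for name, _ in foundation_model.named_modules())`.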

### Question 1

Fill in `r=1` and `target_modules`. 

Note:
- For `r`, any positive integer is valid. The smaller `r` is, the fewer parameters there are to update during the fine-tuning process.

Hint: 
- For `target_modules`, what's the name of the **first** module within each `BloomBlock`'s `self_attention`? 

Read more about [`LoraConfig` here](https://huggingface.co/docs/peft/conceptual_guides/lora#common-lora-parameters-in-peft).

In [None]:
# TODO
import peft
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=1,
    lora_alpha=1, # scaling factor for the LoRA update, which is scaled by lora_alpha / r; set to 1 here
    target_modules=["query_key_value"],
    lora_dropout=0.05, 
    bias="none", # this specifies if the bias parameter should be trained. 
    task_type="CAUSAL_LM"
)

In [None]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_1(lora_config.r, lora_config.target_modules)

###  Question 2

Add the LoRA adapter layers to the foundation model to be trained.

In [None]:
# TODO
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
foundation_model = AutoModelForCausalLM.from_pretrained(model_name)

peft_model = get_peft_model(foundation_model, lora_config)
peft_model.print_trainable_parameters()  # prints the summary itself and returns None

In [None]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_2(peft_model)

## Define `Trainer` class for fine-tuning

### Question 3 

Fill out the `Trainer` class. Feel free to tweak the `training_args` we provided, but remember that increasing the number of epochs increases training time, and a lower learning rate typically needs more epochs to converge. With the defaults set below, fine-tuning could take ~15 minutes.
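
To get a feel for how these settings drive training time, the sketch below estimates the number of optimizer steps. It assumes a single device, no gradient accumulation, and a batch size of 8 (the Transformers default for `per_device_train_batch_size`, which `auto_find_batch_size` may lower if memory runs out). `total_optimizer_steps` is a hypothetical helper, not a library function.

```python
import math


def total_optimizer_steps(n_samples, batch_size, n_epochs):
    """Estimate optimizer steps: batches per epoch (rounded up) times epochs."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch * n_epochs


# 50 training samples and 5 epochs as in this lab; batch size 8 is an assumption.
print(total_optimizer_steps(n_samples=50, batch_size=8, n_epochs=5))
```

Doubling the number of epochs doubles the step count, so training time scales roughly in proportion.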

In [None]:
# TODO
import transformers
from transformers import TrainingArguments, Trainer
import os

output_directory = os.path.join("../data", "peft_lab_outputs")
training_args = TrainingArguments(
    output_dir=output_directory,
    auto_find_batch_size=True,
    learning_rate= 3e-2, # Higher learning rate than full fine-tuning.
    num_train_epochs=5,
    no_cuda=False
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_sample,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
trainer.train()

In [None]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_3(trainer)

## Load model

### Question 4 

Load the PEFT model using pre-defined LoRA configs and foundation model. We set `is_trainable=False` to avoid further training.

In [None]:
import time

time_now = time.time()

username = "fran" # spark.sql("SELECT CURRENT_USER").first()[0]
peft_model_path = os.path.join(output_directory, f"peft_model_{time_now}")

trainer.model.save_pretrained(peft_model_path)
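
After saving, the adapter directory should contain an `adapter_config.json` that records the LoRA settings. The sketch below reads such a file with the standard library; `read_adapter_config` is a hypothetical helper, and the demo writes a small stand-in config to a temporary directory so that it runs without a trained adapter.

```python
import json
import os
import tempfile


def read_adapter_config(adapter_dir):
    """Load adapter_config.json from a saved PEFT adapter directory."""
    with open(os.path.join(adapter_dir, "adapter_config.json")) as handle:
        return json.load(handle)


# Stand-in demo: write a tiny fake config instead of using a real saved adapter.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "adapter_config.json"), "w") as handle:
        json.dump({"r": 1, "target_modules": ["query_key_value"]}, handle)
    demo_config = read_adapter_config(demo_dir)

print(demo_config["r"], demo_config["target_modules"])
```

In the lab itself, `read_adapter_config(peft_model_path)` could be used to check that the saved `r` and `target_modules` match what was set in `LoraConfig`.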

In [None]:
# TODO
from peft import PeftModel, PeftConfig

loaded_model = PeftModel.from_pretrained(foundation_model, 
                                        peft_model_path, 
                                        is_trainable=False)

In [None]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_4(loaded_model)

## Inference

### Question 5

Generate output tokens for the same input we provided in the demo notebook. How do the outputs compare?

In [None]:
peft_model.device

In [None]:
# TODO
inputs = tokenizer("Two things are infinite: ", return_tensors="pt")
inputs = inputs.to(peft_model.device)
outputs = peft_model.generate(
    input_ids=inputs["input_ids"], 
    attention_mask=inputs["attention_mask"], 
    max_new_tokens=28, 
    eos_token_id=tokenizer.eos_token_id
    )
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

In [None]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_5(outputs)

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>