## Week 6: LoRA

Welcome to this lab! We will look at LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method that has become popular recently.

The objectives of this lab are as follows:

1. Understand LoRA better. Feel free to ask questions and discuss; we are all here to learn!
2. Understand the inner workings of LoRA.
3. Know the saying "Don't reinvent the wheel"? You will learn how to use a library that already implements LoRA, so you don't have to code it FROM SCRATCH!



## Tools

We use the following libraries in this lab:
1. `pytorch`: the deep learning framework everything else builds on
2. `transformers`: downloads and loads pre-trained models
3. `lightning`: makes training easier by removing boilerplate
4. `datasets`: loads datasets hosted on Hugging Face
5. `peft`: Parameter-Efficient Fine-Tuning library




## Motivation

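Fine-tuning every weight of a pre-trained language model is expensive in memory and compute. LoRA freezes the pre-trained weights and trains only a pair of small low-rank matrices for each adapted layer, which greatly reduces the number of trainable parameters.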

## LET'S GO!

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F

import math
from typing import Optional, List
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
device = "cuda" if torch.cuda.is_available() else "cpu"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B-Chat",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
 

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [149]:
tokenizer.tokenize("")

['K', 'LEN', 'APA']

In [70]:
class LoRALayer():
    def __init__(
        self, 
        r: int, 
        lora_alpha: int, 
        lora_dropout: float,
        merge_weights: bool,
    ):
        """
        Base class that stores the hyperparameters shared by all LoRA layers.

        Args:
            r: The rank of the low-rank update (0 disables LoRA).
            lora_alpha: The alpha value for the LoRA layer; the update is scaled by lora_alpha / r.
            lora_dropout: The dropout rate for the LoRA layer.
            merge_weights: Whether to merge the LoRA update into the weights in evaluation mode.
        """
        self.r = r
        self.lora_alpha = lora_alpha
        # Optional dropout
        if lora_dropout > 0.:
            self.lora_dropout = nn.Dropout(p=lora_dropout)
        else:
            self.lora_dropout = lambda x: x
        # Mark the weight as unmerged
        self.merged = False  # Whether the weights have been merged yet
        self.merge_weights = merge_weights
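
The two hyperparameters stored by this base class combine into a single scale factor. As a sketch of the math, assuming the shape convention used by the embedding layer in this notebook ($A$ of shape $V \times r$ and $B$ of shape $r \times d$, where $V$ and $d$ are shorthand introduced here for `num_embeddings` and `embedding_dim`), the adapted weight can be written as:

$$
W' = W + \frac{\alpha}{r}\, A B, \qquad A \in \mathbb{R}^{V \times r},\quad B \in \mathbb{R}^{r \times d}
$$

Here $W$ is the frozen pre-trained weight, $\alpha$ is `lora_alpha`, and only $A$ and $B$ are trained. If $A$ is initialized to zero, then $AB = 0$ and $W' = W$ before any training step.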

In [140]:
class EmbeddingLORA(nn.Module, LoRALayer):
    """
    LORA for nn.Embedding
    """

    def __init__(
        self,
        num_embeddings: int,
        embedding_dim: int,
        r: int = 0,
        lora_alpha: int = 1,
        merge_weights: bool = True,
        **kwargs
    ):
        """
        Args:
            num_embeddings: Number of embeddings.
            embedding_dim: The size of each embedding vector.
            r: The number of ranks to consider for the LoRA.
            lora_alpha: The alpha value for the LoRA layer.
            merge_weights: Whether to merge the weights of the LoRA layer.
            kwargs: Other parameters for nn.Embedding.
        """
        nn.Module.__init__(self)
        LoRALayer.__init__(self, r=r, lora_alpha=lora_alpha, lora_dropout=0,
                           merge_weights=merge_weights)
        self.embedding = nn.Embedding(num_embeddings, embedding_dim, **kwargs)        
        # Actual trainable parameters
        if r > 0:
            self.lora_A = nn.Parameter(self.embedding.weight.new_zeros((num_embeddings, r)))
            self.lora_B = nn.Parameter(self.embedding.weight.new_zeros((r, embedding_dim)))
            self.scaling = self.lora_alpha / self.r
            # Freezing the pre-trained weight matrix (Embedding)
            self.embedding.weight.requires_grad = False
        self.reset_parameters()


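    def assign_object(self, obj: nn.Embedding):
        """
        Wrap an existing embedding instead of the freshly initialized one.
        Useful for reusing the pre-trained weights of a model.

        Args:
            obj: The embedding module to wrap.
        """
        self.embedding = obj
        if self.r > 0:
            # Keep the pre-trained weights frozen; only lora_A and lora_B are trained
            self.embedding.weight.requires_grad = False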

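    def reset_parameters(self):
        # This also re-initializes the wrapped embedding, so call it before assign_object
        self.embedding.reset_parameters()
        if hasattr(self, 'lora_A'):
            # A starts at zero and B is drawn from a normal distribution,
            # so A @ B is zero and the layer initially matches the base embedding
            nn.init.zeros_(self.lora_A)
            nn.init.normal_(self.lora_B)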

    def train(self, mode: bool = True):
        """
        Switch between training and evaluation mode.

        When `merge_weights` is enabled, the low-rank update is folded into the
        embedding weight for evaluation and removed again for training.
        """
        super().train(mode)  # sets self.training and propagates to self.embedding
        if self.merge_weights and self.r > 0:
            if mode and self.merged:
                self.unmerge()
            elif not mode and not self.merged:
                self.merge()
        return self

    # Not named `merge_weights`: that name is already taken by the boolean
    # attribute set in LoRALayer.__init__, which would shadow the method.
    def merge(self):
        """
        Merge the LoRA update into the embedding weight.
        """
        if self.r <= 0:
            raise ValueError("The rank parameter is not set.")
        if not self.merged:
            self.embedding.weight.data += (self.lora_A @ self.lora_B) * self.scaling
            self.merged = True

    def unmerge(self):
        """
        Remove a previously merged LoRA update from the embedding weight.
        """
        if self.r <= 0:
            raise ValueError("The rank parameter is not set.")
        if self.merged:
            self.embedding.weight.data -= (self.lora_A @ self.lora_B) * self.scaling
            self.merged = False

    def forward(self, x: torch.Tensor):
        if self.r > 0 and not self.merged:
            result = self.embedding.forward(x)
            after_A = F.embedding(
                x, self.lora_A, self.embedding.padding_idx, self.embedding.max_norm,
                self.embedding.norm_type,  self.embedding.scale_grad_by_freq,  self.embedding.sparse
            )
            result += (after_A @ self.lora_B) * self.scaling
            return result
        else:
            return self.embedding.forward(x)
            

## Test it out

In [None]:
model

In [138]:
embed = EmbeddingLORA(50, 100, r=2)
test_input  = torch.randint(0, 50, (4, 2))
embed.train()
embed(test_input).shape

torch.Size([4, 2, 100])
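
The shape check above confirms the layer runs, but not that merging is safe. The sketch below checks, on a toy example, that the unmerged forward pass (base lookup plus scaled low-rank correction) gives the same vectors as a lookup in a table with the update folded in. It deliberately uses plain Python lists instead of tensors so every number is visible; the sizes, values, and helper names (`matmul`, `forward_unmerged`, `merged_table`) are hypothetical and not part of the notebook.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Toy sizes (hypothetical): 4 tokens in the vocabulary, embedding_dim 3, rank r = 2
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]   # frozen base table, shape (4, 3)
A = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5],
     [2.0, -1.0]]       # stands in for lora_A, shape (4, 2); non-zero, as after some training
B = [[0.1, 0.0, -0.1],
     [0.2, 0.2, 0.2]]   # stands in for lora_B, shape (2, 3)
scaling = 1 / 2         # lora_alpha / r with lora_alpha=1 and r=2

def forward_unmerged(token_id):
    """Base lookup plus the scaled low-rank correction for one token."""
    base = W[token_id]
    delta = matmul([A[token_id]], B)[0]  # look up one row of A, then project with B
    return [b + scaling * d for b, d in zip(base, delta)]

def merged_table():
    """Return a copy of W with scaling * (A @ B) folded in."""
    AB = matmul(A, B)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, AB)]

W_merged = merged_table()
for token_id in range(len(W)):
    unmerged = forward_unmerged(token_id)
    merged = W_merged[token_id]
    # Both paths must agree for every token
    assert all(abs(u - m) < 1e-12 for u, m in zip(unmerged, merged))
```

Because the two paths agree, merging trades a small amount of flexibility (the update can no longer be trained separately) for an inference cost identical to the original embedding.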

In [139]:
model

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): EmbeddingLORA(151936, 1024)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2SdpaAttention(
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (o_proj): Linear(in_features=1024, out_features=1024, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=1024, out_features=2816, bias=False)
          (up_proj): Linear(in_features=1024, out_features=2816, bias=False)
          (down_proj): Linear(in_features=2816, out_features=1024, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm()
        (post_attention_layernorm): Qwen2RMSNorm()
      )
    )
    (norm): Qwen2RMSNorm()
  )
  (lm_head): 

In [141]:
def apply_lora(
    model,
    r: int,
    lora_alpha: int,
    merge_weights: bool = False,
):
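    """
    Recursively replaces every nn.Embedding layer in a PyTorch model with an EmbeddingLORA layer.
    Linear layers are left unchanged.

    Args:
        model: The PyTorch model to modify in place.
        r: The rank of the low-rank update.
        lora_alpha: The alpha value used to scale the update (scaling = lora_alpha / r).
        merge_weights: Whether to merge the LoRA update into the weights in evaluation mode.
    """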
    for name, module in model.named_children():
        if isinstance(module, nn.Embedding):
            print("Replacing", name, "with LORA")
            # Create a new instance of EmbeddingLORA with the same configurations
            new_module = EmbeddingLORA(
                num_embeddings=module.num_embeddings,
                embedding_dim=module.embedding_dim,
                padding_idx=module.padding_idx,
                max_norm=module.max_norm,
                norm_type=module.norm_type,
                scale_grad_by_freq=module.scale_grad_by_freq,
                sparse=module.sparse,
                r=r,
                lora_alpha=lora_alpha,
                merge_weights=merge_weights,
            )
            # Copy the weights from the original embedding to the new LORA embedding
            new_module.assign_object(module)
            # Replace the module in the model with the new one
            setattr(model, name, new_module)
        else:
            # Recursively apply the function to submodules
            apply_lora(module, r, lora_alpha, merge_weights)

In [142]:
apply_lora(model, r=2, lora_alpha=1, merge_weights=False)

Replacing embed_tokens with LORA
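
To get a feel for what this replacement buys, the arithmetic below compares the size of the frozen embedding table with the size of the two LoRA matrices. The shapes come from the model printout (`embed_tokens` is 151936 x 1024) and the `r=2, lora_alpha=1` values from the `apply_lora` call; the variable names are only illustrative.

```python
# Shapes taken from the model printout: embed_tokens is 151936 x 1024
num_embeddings, embedding_dim = 151936, 1024
r, lora_alpha = 2, 1  # the values passed to apply_lora

full_params = num_embeddings * embedding_dim          # frozen base table
lora_params = num_embeddings * r + r * embedding_dim  # lora_A plus lora_B
scaling = lora_alpha / r

print(f"frozen embedding weights : {full_params:,}")
print(f"trainable LoRA weights   : {lora_params:,}")
print(f"trainable fraction       : {lora_params / full_params:.4%}")
print(f"scaling (lora_alpha / r) : {scaling}")
```

The LoRA matrices grow linearly with the rank `r`, so doubling the rank doubles the trainable parameters while the frozen table stays the same size.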


In [150]:
import pandas as pd

In [154]:
df_mbzuai = pd.read_csv('mbzuai.csv')

In [172]:
def convert_to_qwen_format(data, tokenizer):
    messages = [
        {"role": "user", "content": data['user']},
        {"role": "assistant", "content": data['assistant']}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        # The assistant reply is already part of `messages`, so no generation prompt is appended
        add_generation_prompt=False
    )
    return text

In [174]:
import datasets

In [173]:
df_mbzuai.apply(lambda x: convert_to_qwen_format(x, tokenizer), axis=1)

0     <|im_start|>system\nYou are a helpful assistan...
1     <|im_start|>system\nYou are a helpful assistan...
2     <|im_start|>system\nYou are a helpful assistan...
3     <|im_start|>system\nYou are a helpful assistan...
4     <|im_start|>system\nYou are a helpful assistan...
5     <|im_start|>system\nYou are a helpful assistan...
6     <|im_start|>system\nYou are a helpful assistan...
7     <|im_start|>system\nYou are a helpful assistan...
8     <|im_start|>system\nYou are a helpful assistan...
9     <|im_start|>system\nYou are a helpful assistan...
10    <|im_start|>system\nYou are a helpful assistan...
11    <|im_start|>system\nYou are a helpful assistan...
12    <|im_start|>system\nYou are a helpful assistan...
13    <|im_start|>system\nYou are a helpful assistan...
14    <|im_start|>system\nYou are a helpful assistan...
15    <|im_start|>system\nYou are a helpful assistan...
16    <|im_start|>system\nYou are a helpful assistan...
17    <|im_start|>system\nYou are a helpful assi
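
Each row in the output above starts with a system turn even though the data only has `user` and `assistant` columns, because the chat template inserts a default one. The function below is a hand-written approximation of that ChatML-style layout, meant only to make the structure visible. It is not the tokenizer's own template: the `<|im_end|>` terminator, the default system text, and the name `render_chatml` are assumptions, and `tokenizer.apply_chat_template` remains the authoritative source.

```python
def render_chatml(messages, add_generation_prompt=False,
                  default_system="You are a helpful assistant"):
    """Approximate a ChatML-style chat template (illustrative, not the real template)."""
    parts = []
    # A default system turn is prepended when the conversation does not start with one
    if not messages or messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{default_system}<|im_end|>\n")
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    # The generation prompt opens an assistant turn for the model to complete
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "user", "content": "Hi"},           # placeholder content
    {"role": "assistant", "content": "Hello!"},  # placeholder content
]
training_text = render_chatml(messages)
inference_text = render_chatml(messages[:1], add_generation_prompt=True)
print(training_text)
print(inference_text)
```

A generation prompt is meant for inference, where the model writes the assistant turn itself. A training example that already contains the answer simply ends after the assistant's closing token.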

In [178]:
train_data = datasets.Dataset.from_pandas(df_mbzuai)

In [None]:
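from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Trainer expects token ids, so render each row with the chat template and tokenize it
def tokenize_row(row):
    text = convert_to_qwen_format(row, tokenizer)
    return tokenizer(text, truncation=True, max_length=512)  # 512 is an arbitrary cap

tokenized_data = train_data.map(tokenize_row, remove_columns=train_data.column_names)

# Train only the LoRA matrices; every other parameter stays frozen
for name, param in model.named_parameters():
    param.requires_grad = "lora_" in name

# mlm=False selects causal language modeling: labels are a copy of the input ids
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=False,
)

training_args = TrainingArguments(
    output_dir="./mbzuai",
    overwrite_output_dir=True,
    num_train_epochs=5,
    per_device_train_batch_size=8,
    save_steps=25,
    save_total_limit=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_data,
)

trainer.train()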
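
The collator configured with `mlm=False` above prepares batches for causal language modeling. The sketch below reproduces the idea in plain Python so the padding and label handling are easy to inspect. It is a simplification, not the library's implementation: right padding is assumed, only padded positions are masked, and `collate_causal_lm` and the token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def collate_causal_lm(batch, pad_token_id):
    """Pad token-id lists to equal length and build labels for causal language modeling."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; padded positions are excluded from the loss
        labels.append(ids + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

batch = collate_causal_lm([[11, 12, 13, 14], [21, 22]], pad_token_id=0)  # made-up token ids
print(batch["labels"])  # [[11, 12, 13, 14], [21, 22, -100, -100]]
```

The labels are not shifted here; in Hugging Face causal language models the shift by one position happens inside the model when the loss is computed.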
