# LoRA (Low-Rank Adaptation)

LoRA lets you fine-tune only a small number of extra weights in the model while freezing most of the parameters of the pre-trained network.

The key idea: we do not train the original weights. We add some extra weights and fine-tune only those.

One of the main advantages of LoRA is that the original weights stay intact, which helps prevent catastrophic forgetting. (Catastrophic forgetting: a model tends to forget what it was originally trained on when it is fine-tuned, and too much fine-tuning makes this worse.)

LoRA not only makes fine-tuning much faster, it also produces small adapters that you can plug and play with your model so that it solves specific tasks.

## Why LoRA?

A foundation model knows how to do many things, but it is not great at many specific tasks. We can fine-tune the model to produce specialized models that are very good at solving specific tasks.

The aim is not to create copies of the foundation model, each specialized on a different task. That is what the regular fine-tuning process gives us.

In LoRA, we talk about adapters, which are small neural networks that plug into different layers of the foundation model.

In summary, there is one foundation model plus small adapters that make that foundation model act differently.

We can load these adapters together with a model to dynamically transform its capabilities.

## The logic of adapters
An adapter is something that I can load dynamically, add to the base weights, and get a new model that acts differently.

When loading the model, we take the foundation model's original weights and apply the LoRA weight changes to them to get the fine-tuned model weights.

Foundation model's original weights + LoRA weight changes = Specialized model's fine-tuned weights

![Lora Beginning](images/lorabegining.png)

However, there is a problem with the picture above. We want adapters to be small, but here the adapter is the same size as the foundation model. What is going on?

The beauty of LoRA is that we don't need to fine-tune the entire matrix of weights. Instead, we can get away with fine-tuning two matrices of lower rank. These matrices, when multiplied together, give us the weight updates we need to apply to the foundation model to modify its capabilities. The main idea: LoRA represents the big matrix of weight changes as the product of two much smaller matrices.
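
In symbols, the idea can be sketched as follows. The symbols $W_0$, $\Delta W$, $B$, $A$, $d$, $k$ and $r$ are not used elsewhere in this notebook and are introduced here only for illustration:

$$
W' = W_0 + \Delta W, \qquad \Delta W = B A, \qquad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k)
$$

The full update $\Delta W$ has $d \cdot k$ entries, while the two factors together have only $r \cdot (d + k)$. For example, with $d = k = 768$ and $r = 16$, that is 589,824 entries against 24,576.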

![Lora](images/lora.png)


The adapter is not the LoRA weight changes in the first picture. The adapter is these two smaller matrices.


At run time, I can load my foundation model, load the adapter, and multiply the two matrices (a fast operation) to get the weight changes. Adding those weight changes to the original weights gives the fine-tuned weights.
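
The run-time steps described here can be sketched with NumPy. This is only an illustration with small random matrices: the sizes and the names `W0`, `B`, `A` and `delta_W` are assumptions, not values taken from a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 6, 2  # hypothetical sizes: output dim, input dim, adapter rank

W0 = rng.normal(size=(d, k))  # frozen foundation-model weights
B = rng.normal(size=(d, r))   # first adapter matrix (d x r)
A = rng.normal(size=(r, k))   # second adapter matrix (r x k)

# Multiply the two small matrices to get the full-size weight changes...
delta_W = B @ A
# ...and add them to the original weights to get the fine-tuned weights.
W_finetuned = W0 + delta_W

# The merged weights give the same output as running the base layer
# and the adapter side by side on the same input.
x = rng.normal(size=(k,))
merged_out = W_finetuned @ x
side_by_side_out = W0 @ x + B @ (A @ x)

print(delta_W.shape)                               # same shape as W0
print(np.allclose(merged_out, side_by_side_out))   # both routes agree
```

The weight change has the same shape as the original matrix, but it is fully determined by the two small matrices, so only those need to be stored.
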

Here is how much you can save when using LoRA to fine-tune models of different sizes:

![GainLora](images/gainLoRA.png)

Of course, this is just an illustration. LoRA is not just two matrices. It is two matrices for each of the layers where we want to apply LoRA.
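
To see how the per-layer matrices add up, here is a small counting sketch. The function names and the model shape (24 layers, two adapted 1024 x 1024 matrices per layer, rank 8) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def lora_parameter_count(d_out, d_in, rank, num_matrices):
    """Adapter parameters when LoRA wraps `num_matrices` weight matrices."""
    per_matrix = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return per_matrix * num_matrices


def full_parameter_count(d_out, d_in, num_matrices):
    """Parameters in the full weight matrices that LoRA leaves frozen."""
    return d_out * d_in * num_matrices


# Hypothetical model: 24 layers, 2 adapted matrices per layer.
adapter = lora_parameter_count(1024, 1024, rank=8, num_matrices=24 * 2)
full = full_parameter_count(1024, 1024, num_matrices=24 * 2)

print(f"adapter: {adapter:,}")                   # adapter: 786,432
print(f"full:    {full:,}")                      # full:    50,331,648
print(f"ratio:   {100 * adapter / full:.2f}%")   # ratio:   1.56%
```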

The savings are huge: storing these adapters instead of full copies of the model takes far less memory and disk space.


Also, notice that instead of fine-tuning every single parameter individually, we only need to fine-tune these two matrices. Basically, we look for values of the two matrices whose product gives the list of changes that, added to the original weights, moves the model closer to our objective. (This gives the fine-tuning process a huge boost, since we only change the parameters of these two matrices.)
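
A toy version of this search can be written in a few lines of NumPy. This is a sketch under simplifying assumptions, not how the training in this notebook is done: the objective is just to reach a made-up target weight matrix, the gradients are written by hand, and all names and sizes are hypothetical. The point is that only `A` and `B` are updated while `W0` stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 5, 2  # hypothetical sizes: output dim, input dim, adapter rank

W0 = rng.normal(size=(d, k))  # frozen original weights, never updated below
W0_before = W0.copy()

# Toy objective: reach target weights that differ from W0 by a rank-r change.
W_target = W0 + 0.3 * rng.normal(size=(d, r)) @ rng.normal(size=(r, k))

A = 0.5 * rng.normal(size=(r, k))  # small random start
B = np.zeros((d, r))               # zero start, so B @ A is zero at first


def loss(B, A):
    """Half the squared distance between the adapted weights and the target."""
    return 0.5 * np.sum((W0 + B @ A - W_target) ** 2)


initial_loss = loss(B, A)
learning_rate = 0.01
for _ in range(500):
    error = W0 + B @ A - W_target
    grad_B = error @ A.T  # gradient of the loss with respect to B
    grad_A = B.T @ error  # gradient of the loss with respect to A
    B -= learning_rate * grad_B
    A -= learning_rate * grad_A
final_loss = loss(B, A)

print(f"loss before: {initial_loss:.4f}, loss after: {final_loss:.4f}")
print("original weights unchanged:", np.array_equal(W0, W0_before))
```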








In [None]:
!pip install --quiet transformers accelerate evaluate datasets peft

We are going to use a Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224. The checkpoint name used in the next cell ends in `in21k`, which denotes this pre-trained-only version rather than the variant that is further fine-tuned on ImageNet 2012 (1 million images, 1,000 classes).

This model has a size of 346 MB on disk.
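
As a rough sanity check on that number, the size of a model on disk is close to the number of parameters times the bytes per parameter. The sketch below assumes float32 weights (4 bytes each) and a round figure of 86 million parameters for ViT-base; both are assumptions, and `estimate_size_mb` is a hypothetical helper.

```python
def estimate_size_mb(num_parameters, bytes_per_parameter=4):
    """Approximate size in MB, assuming one float32 (4 bytes) per parameter."""
    return num_parameters * bytes_per_parameter / 1e6


# Assumed parameter count of roughly 86 million for the ViT-base checkpoint.
print(f"{estimate_size_mb(86_000_000):.0f} MB")  # 344 MB
```

The estimate lands close to the 346 MB quoted above. The same arithmetic applied to an adapter with a few hundred thousand parameters gives only a couple of megabytes.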

In [None]:
# I will fine-tune a Vision Transformer.
# First aim: improve it at recognizing the items in the images.
# Second aim: make it good at recognizing the differences between classes (cat vs. dog).
model_checkpoint = "google/vit-base-patch16-224-in21k"

## Creating a couple of helpful functions

In [None]:
import os
import torch
from peft import PeftModel, LoraConfig, get_peft_model
from transformers import AutoModelForImageClassification


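# Print the total size of the files in `path`, so we can measure the savings on disk.
def print_model_size(path):
    size = 0
    for f in os.scandir(path):
        size += os.path.getsize(f)
    # ".2f" prints two decimal places; plain ".2" would print two significant digits (e.g. 3.5e+02)
    print(f"Model size: {(size / 1e6):.2f} MB")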


# Show how many parameters will be trained, so we can measure the savings on parameters.
def print_trainable_parameters(model, label):
    parameters, trainable = 0, 0

    for _, p in model.named_parameters():
        parameters += p.numel()
        trainable += p.numel() if p.requires_grad else 0

    print(f"{label} trainable parameters: {trainable:,}/{parameters:,} ({100 * trainable / parameters:.2f}%)")


def split_dataset(dataset):
    # hold out 10% of the examples for evaluation
    dataset_splits = dataset.train_test_split(test_size=0.1)
    return dataset_splits["train"], dataset_splits["test"]



def create_label_mappings(dataset):
    # label2id maps a label name to its id, e.g. {"orange": 3}.
    # id2label maps an id back to its label name, e.g. {3: "orange"}.
    label2id, id2label = dict(), dict()
    for i, label in enumerate(dataset.features["label"].names):
        label2id[label] = i
        id2label[i] = label

    return label2id, id2label
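
To show what the two mappings look like, here is a small self-contained sketch. It repeats `create_label_mappings` so that it runs on its own, and uses a hypothetical stand-in object instead of a real dataset: `fake_dataset` only exposes `features["label"].names`, which is the one attribute the helper reads.

```python
from types import SimpleNamespace


def create_label_mappings(dataset):
    label2id, id2label = dict(), dict()
    for i, label in enumerate(dataset.features["label"].names):
        label2id[label] = i
        id2label[i] = label
    return label2id, id2label


# Hypothetical stand-in for a dataset with two classes.
fake_dataset = SimpleNamespace(
    features={"label": SimpleNamespace(names=["cat", "dog"])}
)

label2id, id2label = create_label_mappings(fake_dataset)
print(label2id)  # {'cat': 0, 'dog': 1}
print(id2label)  # {0: 'cat', 1: 'dog'}
```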

## Loading and Preparing the Dataset

I will be loading two different datasets to fine-tune the base model:

1. A dataset of pictures of food.
2. A dataset of pictures of cats and dogs.

The code in this section loads only the cats and dogs dataset, under the name `dataset1`.

In [None]:
from datasets import load_dataset

# This is the cats vs. dogs dataset
dataset1 = load_dataset("microsoft/cats_vs_dogs", split="train", trust_remote_code=True)

dataset1_train, dataset1_test = split_dataset(dataset1)

I need the label-to-id and id-to-label mappings to properly fine-tune the Vision Transformer model.

In [None]:
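# rename_column returns a new dataset, so this renamed copy is only used to build the mappings.
# The train/test splits created earlier keep the original "labels" column, which data_collate reads.
dataset1 = dataset1.rename_column("labels", "label")

dataset1_label2id, dataset1_id2label = create_label_mappings(dataset1)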

In [None]:
# This dictionary lets me fine-tune each model with the same code, using one section per model (only model1 is defined here).
# For example, model1 is fine-tuned on dataset1_train for the number of epochs given under "epochs".
# The number of epochs affects the result of the fine-tuning process.
config = {
    "model1": {
        "train_data": dataset1_train,
        "test_data": dataset1_test,
        "label2id": dataset1_label2id,
        "id2label": dataset1_id2label,
        "epochs": 1,
        "path": "./lora-model1"
    }
}

Let's create an image processor automatically from the processor configuration specified by the base model.

In [None]:
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained(model_checkpoint, use_fast = True)

I can now prepare the processing pipeline to transform the images in my dataset.

In [None]:
from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    Resize,
    ToTensor,
)

preprocess_pipeline = Compose([
    # resize the image so that its shorter side matches the model's input size
    Resize(image_processor.size["height"]),
    # crop the center square of the image
    CenterCrop(image_processor.size["height"]),
    # convert the image to a tensor
    ToTensor(),
    # normalize the image using the processor's mean and standard deviation
    Normalize(mean=image_processor.image_mean, std=image_processor.image_std),
])

def preprocess(batch):
    # convert every image in the batch to RGB, run it through the pipeline, and store the results under "pixel_values"
    batch["pixel_values"] = [
        preprocess_pipeline(image.convert("RGB")) for image in batch["image"]
    ]
    return batch

# Set the transform function on every train and test set
for cfg in config.values():
    # the transform is applied on the fly whenever examples are read from the dataset
    cfg["train_data"].set_transform(preprocess)
    cfg["test_data"].set_transform(preprocess)
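
The normalization step in the pipeline is a simple per-channel formula. The sketch below applies it to single pixel values in plain Python. The mean and standard deviation of 0.5 are assumed values for illustration; the real ones come from `image_processor.image_mean` and `image_processor.image_std`, and `normalize` is a hypothetical helper.

```python
def normalize(pixel, mean, std):
    """Per-channel formula applied by Normalize: (pixel - mean) / std."""
    return (pixel - mean) / std


# ToTensor scales pixel values to the range [0, 1] before Normalize runs.
# Assumed values: mean = std = 0.5 for the channel.
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel, mean=0.5, std=0.5))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```

With these assumed values, pixel intensities end up in the range [-1, 1].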

## Fine-Tuning the Model

The functions below will help us fine-tune the model.

In [None]:
import numpy as np
import evaluate
import torch
from peft import PeftModel, LoraConfig, get_peft_model
from transformers import AutoModelForImageClassification

metric = evaluate.load("accuracy")

def data_collate(examples):
    """
    Prepare a batch of examples from a list of elements of the train or test datasets.
    """
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["labels"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}


def compute_metrics(eval_pred):
    """
    Compute the model's accuracy on a batch of predictions. I use accuracy to check whether the fine-tuning process is working.
    """
    predictions = np.argmax(eval_pred.predictions, axis = 1)
    return metric.compute(predictions=predictions, references= eval_pred.label_ids)


def get_base_model(label2id, id2label):
    """
    Create an image classification base model from the model checkpoint.
    """
    # Load the original Vision Transformer model from model_checkpoint.
    return AutoModelForImageClassification.from_pretrained(
        model_checkpoint,
        label2id=label2id,
        id2label=id2label,
        # ignore_mismatched_sizes=True,  # uncomment if the checkpoint's head has a different number of labels
    )


# This function builds the LoRA model
def build_lora_model(label2id, id2label):
    """Build the LoRA model to fine tune the base model"""

    model = get_base_model(label2id, id2label)
    print_trainable_parameters(model, label = "Base model")

    # LoRA configuration (named lora_config so it does not shadow the global config dictionary)
    lora_config = LoraConfig(
        # Rank of the adapter matrices. Increase the rank for more accuracy,
        # although accuracy does not change much for ranks between 8 and 256.
        r=16,
        lora_alpha=16,
        target_modules=["query", "value"],
        lora_dropout=0.1,
        bias="none",
        modules_to_save=["classifier"],
    )

    # Wrap the base model with the adapters; this is the model we actually fine-tune.
    lora_model = get_peft_model(model, lora_config)
    print_trainable_parameters(lora_model, label="LoRA")

    return lora_model
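
To make the `r` and `lora_alpha` settings more concrete, here is a small sketch of two quantities they control. It assumes that the adapter update is scaled by `lora_alpha / r` before it is added to the frozen weights (the default behaviour in PEFT, as far as I know), and that ViT-base has a hidden size of 768 and 12 layers. The function names are hypothetical.

```python
def lora_scaling(lora_alpha, r):
    """Factor applied to the adapter update before it is added to the frozen weights."""
    return lora_alpha / r


def trainable_parameter_count(hidden_size, num_layers, targets_per_layer, r, num_labels):
    """Adapter parameters plus the classifier head that is trained in full."""
    # each target gets B (hidden_size x r) and A (r x hidden_size)
    adapter = num_layers * targets_per_layer * r * (hidden_size + hidden_size)
    classifier = hidden_size * num_labels + num_labels  # weights + biases
    return adapter + classifier


# Assumed ViT-base shape: hidden size 768, 12 layers.
# Two targets per layer (query and value), rank 16, two labels (cat and dog).
print(lora_scaling(lora_alpha=16, r=16))                      # 1.0
print(f"{trainable_parameter_count(768, 12, 2, 16, 2):,}")    # 591,362
```

With `lora_alpha` equal to `r`, the update is added without rescaling. Under these assumptions the count also comes out at about 591 thousand trainable parameters.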

In [None]:
from transformers import TrainingArguments

batch_size = 128
training_arguments = TrainingArguments(
    # settings for the training process (here it is only used for fine-tuning)
    output_dir="./model-checkpoints",
    remove_unused_columns=False,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-3,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    gradient_accumulation_steps=4,
    # use 16-bit floating point (mixed precision); this needs a GPU
    fp16=True,
    logging_steps=10,
    load_best_model_at_end=True,
    # pick the best checkpoint by accuracy, the metric I use to compare the model before and after fine-tuning
    metric_for_best_model="accuracy",
    label_names=["labels"],
)
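
The batch size and gradient accumulation settings combine into an effective batch size per optimizer update. The sketch below shows the arithmetic. The helper names and the training-set size of 20,000 examples are hypothetical, and the update count is only an approximation of what the `Trainer` does internally.

```python
import math


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def approx_updates_per_epoch(num_examples, batch):
    """Rough number of optimizer updates needed to see every example once."""
    return math.ceil(num_examples / batch)


batch = effective_batch_size(per_device_batch_size=128, gradient_accumulation_steps=4)
print(batch)  # 512

# Hypothetical training-set size, used only for illustration.
print(approx_updates_per_epoch(20_000, batch))  # 40
```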

Let's now fine-tune the model.

In [None]:
from transformers import Trainer

for cfg in config.values():
    training_arguments.num_train_epochs = cfg["epochs"]
    
    trainer = Trainer(
        # the model to fine-tune: the result of build_lora_model
        build_lora_model(cfg["label2id"], cfg["id2label"]),
        training_arguments,
        # training data used for fine-tuning
        train_dataset=cfg["train_data"],
        # data used to evaluate the fine-tuning process
        eval_dataset=cfg["test_data"],
        # passing the image processor makes the Trainer save it next to the model
        # (newer transformers releases call this argument `processing_class`)
        tokenizer=image_processor,
        # how to compute metrics
        compute_metrics=compute_metrics,
        # how to batch examples before they go through the model
        data_collator=data_collate,
    )

    # train the model (this runs the fine-tuning process)
    results = trainer.train()
    # evaluate the model
    evaluation_results = trainer.evaluate(cfg['test_data'])
    print(f"Evaluation accuracy: {evaluation_results['eval_accuracy']}")

    # Save the fine-tuned model to disk. Only the LoRA adapter is saved
    # (plus the classifier head listed in modules_to_save), not the huge base model.
    trainer.save_model(cfg["path"])
    print_model_size(cfg["path"])


    # As the output below shows, regular fine-tuning would have to train about 85 million parameters.
    # With LoRA, only about 591 thousand parameters are trainable (0.68% of the original model),
    # which is why LoRA is faster.

## Result of Fine-Tuning

![LoraResult](images/loraoutput.png)

I have small adapters and one big model. I don't have to load the original model again and again when I want to swap the adapter on that model.

In [None]:
# Takes the mappings needed to configure the classification head and the path where the adapter is stored.
# Returns a fine-tuned model specialized in one specific task. The original big model is not modified.
def build_inference_model(label2id, id2label, lora_adapter_path):
    """Build the model that will be used to run inference."""

    # Let's load the base model
    model = get_base_model(label2id, id2label)

    # Now, we can create the inference model combining the base model
    # with the fine-tuned LoRA adapter.
    return PeftModel.from_pretrained(model, lora_adapter_path)


# image_processor is passed in because the image has to be preprocessed before it is sent to the model
def predict(image, model, image_processor):
    """Predict the class represented by the supplied image."""
    
    # process the image
    encoding = image_processor(image.convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        # compute the outputs and get logits
        outputs = model(**encoding)
        logits = outputs.logits

    # pick the index of the largest logit, which is the predicted class
    class_index = logits.argmax(-1).item()
    return model.config.id2label[class_index]
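
The last two steps of `predict` (taking the largest logit and mapping it to a label) can be illustrated without a model. The sketch below also turns the logits into probabilities with a softmax, which `predict` does not do; that part is an addition for illustration. The logits, the labels and the helper names are all hypothetical.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top_class(logits, id2label):
    """Return the label of the largest logit and its softmax probability."""
    index = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[index], softmax(logits)[index]


# Hypothetical logits for a two-class model.
label, confidence = top_class([0.3, 2.1], {0: "cat", 1: "dog"})
print(label, round(confidence, 3))
```

The predicted class is the same whether the argmax is taken over the logits or over the probabilities, because softmax preserves their order.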

In [None]:
for cfg in config.values():
    # the inference model for this entry: the base model combined with its LoRA adapter
    cfg["inference_model"] = build_inference_model(cfg["label2id"], cfg["id2label"], cfg["path"])
    # the image processor that was saved alongside the adapter
    cfg["image_processor"] = AutoImageProcessor.from_pretrained(cfg["path"])

In [None]:
samples = [
    # Each sample specifies an image URL and the model I want to use to classify that image
    {
        # chicken wings image
        "image": "https://www.allrecipes.com/thmb/AtViolcfVtInHgq_mRtv4tPZASQ=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/ALR-187822-baked-chicken-wings-4x3-5c7b4624c8554f3da5aabb7d3a91a209.jpg",
        "model": "model1",
    },
    {
        # pizza image
        "image": "https://www.simplyrecipes.com/thmb/KE6iMblr3R2Db6oE8HdyVsFSj2A=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/__opt__aboutcom__coeus__resources__content_migration__simply_recipes__uploads__2019__09__easy-pepperoni-pizza-lead-3-1024x682-583b275444104ef189d693a64df625da.jpg",
        "model": "model1"
    }
]
     

In [None]:
from PIL import Image
import requests

# loop over all of the samples
for sample in samples:
    # download and open the image from its URL
    image = Image.open(requests.get(sample["image"], stream=True).raw)
    
    # grab the inference model and image processor of the model named in this sample
    inference_model = config[sample["model"]]["inference_model"]
    image_processor = config[sample["model"]]["image_processor"]

    # predict the class of this image with that inference model and image processor
    prediction = predict(image, inference_model, image_processor)
    print(f"Prediction: {prediction}")