In [1]:
import torch
from torch.utils.data import Dataset, DataLoader
from datasets import load_dataset

from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling
from tqdm import tqdm
import math

  from .autonotebook import tqdm as notebook_tqdm


# LORA: Low Rank Adaptation: an efficient way to fine tune large language models

When we have a specific task to perform with large language models we have various options:

1. Use the model as it is using prompt engineering
2. Fine tune the whole model, updating all its weights
3. Fine tune only some layers instead the whole model.

Pros and cons of each one:

| Approach             | Pros                                                                 | Cons                                                                 |
|----------------------|----------------------------------------------------------------------|----------------------------------------------------------------------|
| Prompt Engineering   | - Fast and cheap                                                     | - Limited control over behavior                                      |
|                      | - No training or infrastructure needed                               | - Performance highly sensitive to prompt wording                     |
|                      | - Easily updated or changed                                          | - May hit model limits on specific tasks                             |
| Fine-Tune Full Model | - Full control over model behavior                                   | - Very resource-intensive (GPU, time, data)                          |
|                      | - Better performance on domain-specific or complex tasks             | - Risk of overfitting or catastrophic forgetting                    |
|                      | - Can learn new capabilities                                         | - Requires re-deployment of large models                             |
| LoRA Fine-Tuning     | - Much less compute and memory than full fine-tuning                 | - Slightly less flexible than full fine-tuning                      |
|                      | - Retains base model unchanged (can swap adapters)                   | - Still needs training pipeline setup                                |
|                      | - Modular and efficient for multiple tasks/domains                   | - May not reach full model’s potential on highly specialized tasks   |

And important remarks:

- If you want to use a high llm from a provider, like GPT from OpenAI or Gemini from google, you simply can't fine tune this model, so prompt engineering is your only available option
- If you have low models, like 1B or 8B, them does not perform very well in very specific tasks, but you can perform fine tune over them with limited resources, greatly improving performance

So, right now we are going to implement and compare two ways to resolve a specific task: **prompt engineering** leaving the original model as it is, and the second way to test: **lora fine-tuning**.

# The task: Question and answers with RACE

We want to train a system with the ability of solve questions and answers where the answer should be picked from a list of options:

```text
Context: A subject which seems to have been insufficiently studied by doctors and psychologists is the influence of geography and climate on the psychological and physical health of mankind. There seems no doubt that the general character of the landscape, the relative length of day and night, and the climate must all play a big part in determining what kind of people we are.
It is true that a few studies have been made. Where all the inhabitants of a particular area enjoy exceptionally good or bad health, scientists have identified contributory factors such as the presence or absence of substances like iodine, fluoride, calcium, or iron in the water supply, or perhaps types of land that provide breeding places for pests like mosquitoes or rats.
Moreover, we can all generalize about types of people we have met. Those living in countries with long dark winters are apt to be less talkative and less vivacious than inhabitants of countries where the climate is more equable. And where the olive and the orange grow, there the inhabitants are cheerful, talkative, and spontaneous.
But these commonplace generalizations are inadequate: the influence of climate and geography should be studied in depth. Do all mountain dwellers live to a ripe old age? Does the drinking of wine, rather than beer, result in a sunny and open temperament? Is the strength and height of one of the Kenyan tribes due to their habitual drinking of the blood of cows?
We are not yet sure of the answers to such questions, but let us hope that something of benefit to mankind may eventually result from such studies.

Question: According to the author, research into the influence of geography and climate should  _  .

Options:
A) focus on some unknown aspects
B) be pursued on a larger scale
C) be carried out within a larger scope
D) go much deeper

Answer: D
```

We are using the `transformers` dataset called `ehvoy/race`, composed by 97k of questions and answers, but
to reduce training times we are going to use only subsets with `context.length < 800`, reducing the original
dataset to a length of `800` in the train set and `56` items in the test set.

In [15]:
class EhovyRaceDataset(Dataset):
    """
    Ehvoy race is a questions and answer dataset
    Variations can get the values all, high, medium and low and depending on this the dataset size may vary
    """
    def __init__(self, variation="high", split="train", max_article_size=None):
        self.raw_dataset = load_dataset("ehovy/race", variation, split=split)
        if max_article_size is not None:
            self.raw_dataset  = self.raw_dataset.filter(
                lambda example: len(example['article']) < max_article_size,
                desc=f"Filtrando artículos en {split}"
            )

    def __len__(self):
        return len(self.raw_dataset)

    def __getitem__(self, idx):
        return self.raw_dataset[idx]

def prompt_with_question(example: dict, include_answer=False) -> str:
    options_str = "\n".join([f"{chr(65 + i)}) {opt}" for i, opt in enumerate(example['options'])])
    answer = f" {example['answer']}" if include_answer else ""
    prompt = f"Context: {example['article']}\n\n" + \
        f"Question: {example['question']}\n\n" + \
        f"Options:\n{options_str}\n\n" + \
        f"Answer:{answer}" # The model is expected to fill this part

    return prompt


class PromptedEhvoy(Dataset):
    """
    Prompted ehvoy is a dataset that convert the original ehvoy structure composed by a dict of
    context, question, options and answer to a single string containing all those information into a single
    string
    """
    def __init__(self, dataset: EhovyRaceDataset, build_prompt=prompt_with_question, include_answer=False):
        self.build_prompt = build_prompt
        self.include_answer = include_answer
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx: int):
        return self.build_prompt(self.dataset[idx], include_answer=self.include_answer), self.dataset[idx]['answer']

def get_dataset(split, max_article_size=800, include_answers=False):
    """
    Build the dataset from original ehvoy and convert it to prompted dataset
    """
    ehovy_dataset = EhovyRaceDataset(variation="high", split=split, max_article_size=max_article_size)
    prompted_dataset = PromptedEhvoy(ehovy_dataset, include_answer=include_answers)
    return prompted_dataset

class TokenizedDataset(Dataset):
    """
    Tokenized dataset for text generation tasks.
    """
    def __init__(self, data: PromptedEhvoy, tokenizer, max_length=512):
        self.tokenizer = tokenizer
        self.data = data
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x, _ = self.data[idx]

        x_tokenized = self.tokenizer(
            x,
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
        )

        labels = x_tokenized["input_ids"].copy()

        answer_start_char_idx = x.find("Answer:")
        if answer_start_char_idx != -1:
            answer_token_char_start_idx = answer_start_char_idx + len("Answer: ")

            answer_token_start_index = -1
            for i, (start_offset, end_offset) in enumerate(x_tokenized['offset_mapping']):

                if start_offset <= answer_token_char_start_idx < end_offset:
                    answer_token_start_index = i
                    break

            if answer_token_start_index != -1:
                for i in range(answer_token_start_index):
                    labels[i] = -100

        x_tokenized["labels"] = labels
        # del x_tokenized["offset_mapping"]
        return x_tokenized

In [3]:
train_dataset = get_dataset("train", include_answers=False)
test_dataset = get_dataset("test", include_answers=False)

In [4]:
print(f"Train dataset size: {len(train_dataset)}, Test dataset size: {len(test_dataset)}")

Train dataset size: 803, Test dataset size: 56


In [5]:
x = train_dataset[0][0]
print(x)

Context: The air hostess   was in a small kitchen at the back of the plane, preparing the plates for lunch, when a little old lady came and spoke to her, "Could you please tell me," she asked, "where is the ladies' lavatory   in the plane?"
"Yes, madam," said the air hostess and smiled. "It is right at the other end of the plane---at the front."
The little lady went too far. She walked all the way to the front of the plane, opened the door in front of her, and saw the captain of the plane and the other officers. They were all busy with their work and did not see her. She went out again, shut the door and returned to the air hostess.
"Oh, didn't you find it, madam?" the girl asked her. "Yes, I did," said the little lady. "But there are four men in the ladies' lavatory watching television."

Question: The story happened  _  .

Options:
A) in the evening
B) in the afternoon
C) in the morning
D) at midnight

Answer:


# The model: Llama 3.2 1B

Nowadays, we have a lot of small models with open weights offered by big tech that can be used for free and downloaded
from various repositories like hugging face.

On this list we can find:
- Gemma: A model trained by Google offered in various sizes, included 3B
- Phi: A model trained by Microsoft
- Llama: A model trained by Meta

Special mentions: SmolLM2, a model built by hugging face community, OpenELM, a model built by apple

*All this models are based on decoder only architectures, which makes them easier to train*

**Our chosen model is Llama 3.2 1B**

In [6]:
MODEL_NAME="meta-llama/Llama-3.2-1B"
MAX_LENGTH=512
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
# In order to reduce the v-ram usage we are going to use float 16
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16).to(device)

In [7]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

In [8]:
tokenizer.pad_token = tokenizer.eos_token

# First experiment: question answering with the model as it is

In [9]:
class OptionsPicker(torch.nn.Module):
    """
    OptionsPicker is a class that provides a way to select options based on the provided input.
    It uses the Llama32 model and tokenizer to generate predictions and select the best option.
    """

    def __init__(self, model, tokenizer, options=None, device="cuda"):
        """
        Initialize the OptionsPicker with a model, tokenizer, and options.
        Options
        """
        super().__init__()
        self.model = model
        self.tokenizer = tokenizer
        self.device = device
        self.options = options if options is not None else []

    def _get_option_ids(self):
        """
        Convert the options to input IDs using the tokenizer.
        Returns:
            A list of input IDs for each option.
        """
        option_ids = []
        for option in self.options:
            inputs = self.tokenizer.encode(option, return_tensors="pt", add_special_tokens=False).to(self.device)
            option_ids.append(inputs["input_ids"][0][0].item())
        return option_ids

    def forward(self, input_ids, attention_mask=None):
        """
        Forward pass through the model to generate predictions.
        Args:
            input_ids: Input IDs for the model.
            attention_mask: Attention mask for the model.
        Returns:
            Probabilities for each option.
        """
        logits = self.model(
            input_ids=input_ids,
            attention_mask=attention_mask,
        ).logits

        probs = torch.nn.functional.softmax(logits[0, -1], dim=-1)
        option_ids = self._get_option_ids()
        option_probs = []
        for option_id in option_ids:
            option_probs.append(probs[option_id].item())
        return torch.tensor(option_probs)

In [10]:
options = ["A", "B", "C", "D"]

In [None]:
options_picker = OptionsPicker(model, tokenizer, options=options, device="cuda")


In [26]:
x_tokenized = tokenizer(x, padding="max_length", truncation=True, return_tensors="pt", max_length=MAX_LENGTH)
out = options_picker(x_tokenized["input_ids"].to(device), x_tokenized["attention_mask"].to(device))
out

tensor([0.0000e+00, 0.0000e+00, 1.5497e-06, 8.3447e-07])

In [31]:
correct_predictions = 0
validation_length = len(train_dataset)

for i in range(validation_length):
    x, y = train_dataset[i]
    x = x.replace("\n\nAnswer:", "\n\nThe correct answer is:")
    x = f"""
     Read the following context and answer the question by choosing the correct option.
     {x}"""
    print(x)
    x_tokenized = tokenizer(x, padding="max_length", truncation=True, return_tensors="pt", max_length=800)
    out = options_picker(x_tokenized["input_ids"].to(device), x_tokenized["attention_mask"].to(device))
    answer = options[torch.argmax(out)]
    if answer == y:
        correct_predictions += 1

    if i % 10 == 0:
        print(f"Processed {i} examples, current accuracy: {correct_predictions / (i + 1):.2f}")

accuracy = correct_predictions / validation_length
accuracy


     Read the following context and answer the question by choosing the correct option.
     Context: The air hostess   was in a small kitchen at the back of the plane, preparing the plates for lunch, when a little old lady came and spoke to her, "Could you please tell me," she asked, "where is the ladies' lavatory   in the plane?"
"Yes, madam," said the air hostess and smiled. "It is right at the other end of the plane---at the front."
The little lady went too far. She walked all the way to the front of the plane, opened the door in front of her, and saw the captain of the plane and the other officers. They were all busy with their work and did not see her. She went out again, shut the door and returned to the air hostess.
"Oh, didn't you find it, madam?" the girl asked her. "Yes, I did," said the little lady. "But there are four men in the ladies' lavatory watching television."

Question: The story happened  _  .

Options:
A) in the evening
B) in the afternoon
C) in the morning
D) a

KeyboardInterrupt: 

# Lets apply lora

In [20]:
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
model_lora = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16).to(device)

In [21]:
class LoraLinear(torch.nn.Module):
    def __init__(self, linear_layer, alpha = 1, r = 1, device = "cuda"):
        super().__init__()
        self.linear_layer = linear_layer.to(torch.float32) # Se cambia el tipo de la capa a float32 para evitar errores durante el entrenamiento
        self.r = r
        fan_in = self.linear_layer.in_features
        fan_out = self.linear_layer.out_features
        self.lora_A = torch.nn.Parameter(torch.zeros((fan_in, r), device=device))
        self.lora_B = torch.nn.Parameter(torch.zeros((r, fan_out), device=device))
        torch.nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.linear_layer.weight.requires_grad = False

    def train(self, mode=True):
        self.training = mode
        if not mode:
            self.merged_weight = (self.linear_layer.weight.transpose(0,1) + self.lora_A @ self.lora_B).to(torch.float16)

    def forward(self, x):
        if self.training:
            x = x.to(torch.float32)
            output = self.linear_layer(x)
            output += x @ self.lora_A @ self.lora_B
            output = output.to(torch.float16)
        else:
            output = x @ self.merged_weight
        return output

In [24]:
checkpoints_folder = "checkpoints"
def train(
    model,
    data_collator,
    train_dataset,
    eval_dataset,
    num_epochs=3,
    learning_rate=5e-5,
    batch_size=8,
    device="cuda",
):
    print("Iniciando entrenamiento...")
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    model.to(device)
    train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=data_collator)
    eval_dataloader = DataLoader(eval_dataset, batch_size=batch_size, shuffle=False, collate_fn=data_collator)

    num_training_steps = num_epochs * len(train_dataloader)
    progress_bar = tqdm(range(num_training_steps))

    for param in model.parameters():
        param.requires_grad = False

    for layer in model.model.layers:
        if hasattr(layer, 'self_attn'):
            layer.self_attn.q_proj = LoraLinear(layer.self_attn.q_proj, r=16)
            layer.self_attn.k_proj = LoraLinear(layer.self_attn.k_proj, r=16)
            layer.self_attn.v_proj = LoraLinear(layer.self_attn.v_proj, r=16)
            layer.self_attn.o_proj = LoraLinear(layer.self_attn.o_proj, r=16)
    for epoch in range(num_epochs):
        model.train()
        total_train_loss = 0
        print(f"\n--- Época {epoch + 1}/{num_epochs} ---")

        for batch_idx, batch in enumerate(train_dataloader):
            x = batch
            print(x)
            x = {k: v.to(device) for k, v in x.items()}
            outputs = model(**x)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            total_train_loss += loss.item()
            progress_bar.update(1)
            progress_bar.set_description(f"Época {epoch + 1}, Batch {batch_idx + 1}, Loss: {loss.item():.4f}")

        avg_train_loss = total_train_loss / len(train_dataloader)
        print(f"Fin de Época {epoch + 1}: Pérdida de Entrenamiento Promedio = {avg_train_loss:.4f}")

        model.eval()
        total_eval_loss = 0
        print(f"\nEvaluando al final de la época {epoch + 1}...")
        with torch.no_grad():
            for eval_batch in tqdm(eval_dataloader, desc="Evaluación"):
                eval_batch = {k: v.to(device) for k, v in eval_batch.items()}
                outputs = model(**eval_batch)
                total_eval_loss += outputs.loss.item()
        avg_eval_loss = total_eval_loss / len(eval_dataloader)
        print(f"Fin de Época {epoch + 1}: Pérdida de Validación Promedio = {avg_eval_loss:.4f}")
        # Guardar el modelo
        model.save_pretrained(f"{checkpoints_folder}/lora_model_epoch_{epoch + 1}.ckpt")

    progress_bar.close()
    print("Entrenamiento completado.")

In [23]:
train_dataset_tokenized = TokenizedDataset(train_dataset, tokenizer, max_length=MAX_LENGTH)
test_dataset_tokenized = TokenizedDataset(test_dataset, tokenizer, max_length=MAX_LENGTH)
train(
    model_lora, data_collator, train_dataset_tokenized, test_dataset_tokenized,
    num_epochs=2, batch_size=8, learning_rate=5e-5,
    device=device
)

Iniciando entrenamiento...





  0%|          | 0/202 [00:00<?, ?it/s][A[A[A


--- Época 1/2 ---


KeyError: 'offset_mapping'

In [None]:
options_picker_lora = OptionsPicker(model_lora, tokenizer, options, device=device)