<div class="alert alert-block alert-info">
<b>Deadline:</b> March 19, 2025 (Wednesday) 23:00
</div>

# Exercise 1. Parameter-efficient fine-tuning of large language models

In this assignment, we will learn how to train a large language model (LLM) to memorize new facts. We will add a [LoRA adapter](https://arxiv.org/abs/2106.09685) to the `Llama-3.2-1B-Instruct` model and fine-tune it on our custom data.

In [1]:
# Set the location of the HF cache on JupyterHub
if __import__("socket").gethostname().startswith("jupyter"):
    import os
    os.environ["HF_HOME"] = "/coursedata/huggingface/"

In [2]:
skip_training = False  # Set this flag to True before validation and submission

In [3]:
# During evaluation, this cell sets skip_training to True

import tools, warnings
warnings.showwarning = tools.customwarn

In [4]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import LoraConfig, TaskType, get_peft_model
from peft.peft_model import PeftModel
from functools import partial

from tools import print_message



# Task

First we load the `Llama-3.2-1B-Instruct` model by Meta from the Hugging Face (HF) repository.

Select the device for training (use a GPU if you have one). If your machine has at least 8 GB of CPU memory (RAM), change `torch_dtype` from `torch.bfloat16` to `torch.float32`. This makes the Llama model respond much faster.

In [5]:
device = torch.device('cpu')
torch_dtype = torch.bfloat16
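
The cell above hard-codes the CPU. The following is a minimal sketch of choosing the device and dtype automatically. It assumes bfloat16 on a GPU that supports it and float32 on a CPU with enough memory. `HAVE_8GB_RAM` is a hypothetical flag that you set by hand.

```python
import torch

# Hypothetical flag: set to True if your machine has at least 8 GB of RAM
HAVE_8GB_RAM = True

if torch.cuda.is_available():
    device = torch.device("cuda")
    # bfloat16 halves memory use; fall back to float32 on GPUs without bf16 support
    torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float32
else:
    device = torch.device("cpu")
    # On CPU, float32 gives much faster responses, provided the model fits in memory
    torch_dtype = torch.float32 if HAVE_8GB_RAM else torch.bfloat16

print(f"device: {device}, torch_dtype: {torch_dtype}")
```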

In [6]:
model_id = "meta-llama/Llama-3.2-1B-Instruct"
print(f"torch_dtype: {torch_dtype}")
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch_dtype
)
print(base_model)
tokenizer = AutoTokenizer.from_pretrained(model_id)

torch_dtype: torch.bfloat16
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 2048)
    (layers): ModuleList(
      (0-15): 16 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (up_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (down_proj): Linear(in_features=8192, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((2048,), eps=1e-0

Let's ask the model something. First, we create a dialogue consisting of a single message from the user.
Then we convert the dialogue into a prompt using the template required by Llama 3.2.

In [7]:
from llm_utils import apply_chat_template_llama3

messages = [{"role": "user", "content": "Who are you?"}]
prompt = apply_chat_template_llama3(messages, add_bot=False)
print(prompt)

<|start_header_id|>user<|end_header_id|>

Who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>




Note the format of the prompt that we produced. You can find more details on Llama's prompt format [on this page](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2/).
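
To make the format explicit, here is a minimal pure-Python sketch that reproduces the prompt printed above. The function `build_llama3_prompt` is hypothetical. It is not the implementation of `apply_chat_template_llama3` from `llm_utils`. It only assumes the header and `<|eot_id|>` layout visible in the output, plus the optional `<|begin_of_text|>` token described on the Llama 3.2 prompt-format page.

```python
def build_llama3_prompt(messages, add_bos=False):
    """Hypothetical helper: format {"role", "content"} dicts in the Llama 3 chat layout."""
    parts = ["<|begin_of_text|>"] if add_bos else []
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open an assistant header so that the model continues with its reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt_sketch = build_llama3_prompt([{"role": "user", "content": "Who are you?"}])
print(prompt_sketch)
```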

Now let's get a response from the model.

In [8]:
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}
prompt_length = inputs["input_ids"].size(1)
with torch.no_grad():
    tokens = base_model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.01,
        pad_token_id=tokenizer.eos_token_id,
        #streamer=TextStreamer(tokenizer=tokenizer, skip_prompt=True),
    )
# Extract the new tokens generated (excluding the prompt)
output_tokens = tokens[:, prompt_length:]

# Decode the output tokens to a string
output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print_message(output_text)

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

Let us evaluate the model on some simple common-knowledge (trivia) questions.

In [None]:
from grading import Evaluator, get_answer

qa_trivial_json = "grading_trivia.json"

get_answer_fn = partial(get_answer, model=base_model, tokenizer=tokenizer)

evaluator = Evaluator(qa_trivial_json)
trivia_accuracy = evaluator.evaluate_all(get_answer_fn, verbose=True)
print(f"Accuracy on the trivia set (original model): {trivia_accuracy:.2f}")

Q: Who wrote the play Romeo and Juliet?

GT Answer: William Shakespeare

Network answer: The play "Romeo and Juliet" was written by the renowned English playwright William Shakespeare. It is one of his most famous and iconic works, and is considered a tragedy.
Time: 51.53s, Tokens: 37, Speed: 0.72 tokens/s

Score: True

Q: What is the capital city of France?

GT Answer: Paris

Network answer: The capital city of France is Paris.
Time: 23.05s, Tokens: 9, Speed: 0.39 tokens/s

Score: True

Q: Who painted the Mona Lisa?

GT Answer: Leonardo da Vinci

Network answer: The Mona Lisa was painted by the Italian artist Leonardo da Vinci. He created the painting in the early 16th century, specifically between 1503 and 1506. It is one of his most famous works and is widely considered to be one of the greatest paintings of all time.
Time: 75.21s, Tokens: 59, Speed: 0.78 tokens/s

Score: True

Q: What is the smallest planet in our solar system?

GT Answer: Mercury

Network answer: The smallest planet in our solar system is Mercury. It is the innermost planet and has a diameter of approximately 4,879 kilometers (3,031 miles).
Time: 51.36s, Tokens: 34, Speed: 0.66 tokens/s

Score: True

Q: How many states in USA?

GT Answer: 50 states

Network answer: There are 50 states in the United States of America.
Time: 25.86s, Tokens: 13, Speed: 0.50 tokens/s

Score: True

Q: What is the chemical symbol for gold?

GT Answer: Au

Network answer: The chemical symbol for gold is Au.
Time: 23.13s, Tokens: 9, Speed: 0.39 tokens/s

Score: True

Q: Who was the first President of the United States?

GT Answer: George Washington

Network answer: The first President of the United States was George Washington. He was inaugurated on April 30, 1789, and served two terms in office until March 4, 1797.
Time: 57.48s, Tokens: 40, Speed: 0.70 tokens/s

Score: True

Q: What is the tallest mountain in the world?

GT Answer: Mount Everest

Network answer: The tallest mountain in the world is Mount Everest, located in the Himalayas on the border between Nepal and Tibet, China. It stands at an impressive 8,848.86 meters (29,031.7 feet) above sea level.
Time: 67.64s, Tokens: 50, Speed: 0.74 tokens/s

Score: True

Q: Which scientist developed the theory of general relativity?

GT Answer: Albert Einstein

Network answer: The theory of general relativity was developed by Albert Einstein.
Time: 28.72s, Tokens: 13, Speed: 0.45 tokens/s

Score: True

Q: What is the largest ocean on Earth?

GT Answer: Pacific Ocean

# Custom document

We want our model to memorize facts from `document.txt`, a small document that we generated artificially. Let's print it.

In [None]:
print(__import__('pathlib').Path("document.txt").read_text())

We want our model to be able to answer questions related to the document **without seeing the document in the prompt**. Let's see how the base model responds.

In [None]:
question = "How many seats are there in the Midnight Sun room in the Frostbite Futures HQ?"

In [None]:
_ = get_answer(question, answer=["4 seats", "4"], model=base_model, tokenizer=tokenizer)

Your task in this assignment is to generate training data and train the model. We advise you to inspect the function `get_answer` to see how a question is converted into a prompt. You should use the same conversion in your dataset.
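
One common way to build a training example is sketched below, under a few assumptions. The prompt tokens are assumed to come from the same conversion that `get_answer` uses, and the answer tokens and an end-of-turn token id (for Llama 3 chat, the id of `<|eot_id|>`) are appended after them. The prompt positions in `labels` are set to `-100`, the index that PyTorch's cross-entropy loss ignores by default, and the loss computed by HF causal-LM models ignores it too. The helper name and the toy token ids are hypothetical. In practice, the ids would come from `tokenizer(...)["input_ids"]`.

```python
IGNORE_INDEX = -100  # label positions with this value are skipped by the loss

def build_training_example(prompt_ids, answer_ids, end_id):
    """Hypothetical helper: concatenate prompt and answer, but train only on the answer."""
    input_ids = list(prompt_ids) + list(answer_ids) + [end_id]
    # Mask the prompt so that the model learns to produce the answer, not to repeat the question
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids) + [end_id]
    attention_mask = [1] * len(input_ids)
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}

example = build_training_example([11, 12, 13], [21, 22], end_id=2)
print(example)
```

HF causal-LM models shift `labels` internally when computing the loss, so `labels` is aligned with `input_ids` here rather than shifted by one position.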

# Model training

**IMPORTANT:**
The assignment does not require a training loop to be provided. However, if you choose to include one for autograding purposes, please implement it in the designated cell.

In this exercise, we integrate a [LoRA adapter](https://arxiv.org/abs/2106.09685) into the base Llama model using the `peft` library.
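
For reference, LoRA (as described in the linked paper) freezes a pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ and learns a low-rank update $\Delta W = BA$, which is scaled by $\alpha / r$:

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k).
$$

$A$ is initialized randomly and $B$ is initialized with zeros, so the adapted model starts out identical to the base model. As a worked example, take the `q_proj` layer printed above ($d = k = 2048$) and an illustrative rank $r = 8$:

$$
r(d + k) = 8 \cdot 4096 = 32{,}768 \text{ trainable parameters}
\quad\text{vs.}\quad
d \cdot k = 2048^2 = 4{,}194{,}304 \text{ in } W_0,
$$

which is about 0.78% of the layer's weights.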

**IMPORTANT:**
For the `transformers` and `peft` packages, we *strongly recommend* using the versions specified in the [requirements.yml](https://mycourses.aalto.fi/mod/resource/view.php?id=1241109) file (i.e., `peft=0.13.2`, `transformers=4.47.0`).

**IMPORTANT:**
The `peft` library offers multiple methods to attach an adapter to the base model. To ensure compatibility and avoid potential issues when loading the trained adapter, please create your PEFT model using the function `get_peft_model`, as explained on [this page](https://huggingface.co/docs/peft/en/quicktour). Using alternative methods may lead to errors or inconsistencies during the loading process.

## Training loop

* A model created by `get_peft_model` is a regular PyTorch model, which you can train just like any other model.
* Note that the `forward` function does not return a tensor. It returns a model-output object (for causal LMs, `CausalLMOutputWithPast`) whose fields include `logits` and, when `labels` are passed, `loss`.
* You can use any code for training. For example, you can use the HF `Trainer` class. However, we strongly encourage you to implement the training loop yourself.
* Please save the model to folder `1_adapter` using this code:
```
peft_model.save_pretrained("1_adapter")
```

Implement the train and test dataset splits below:

In [None]:
# YOUR CODE HERE
raise NotImplementedError()
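
Here is a minimal sketch of a reproducible split, assuming your generated data is a list of question-answer dicts. The `qa_pairs` entries below are placeholders, not facts from `document.txt`, and `split_dataset` is a hypothetical helper.

```python
import random

def split_dataset(items, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle a copy of `items` and split it into train and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # local RNG: leaves the global random state untouched
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

# Placeholder data: replace with the question-answer pairs you generate from document.txt
qa_pairs = [{"question": f"Question {i}?", "answer": f"Answer {i}"} for i in range(10)]
train_data, test_data = split_dataset(qa_pairs, test_fraction=0.2)
print(len(train_data), len(test_data))
```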

Implement the test and train dataloaders below:

In [None]:
# YOUR CODE HERE
raise NotImplementedError()
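
Training examples usually differ in length, so each batch needs padding. Below is a minimal sketch of a `collate_fn` for `torch.utils.data.DataLoader`. It assumes each example is a dict of `input_ids`, `labels` and `attention_mask` lists. `pad_collate` and the pad id `0` are assumptions. With the real tokenizer, one option is to reuse `tokenizer.eos_token_id` as the pad id, as the generation cell above does.

```python
import torch
from torch.utils.data import DataLoader

IGNORE_INDEX = -100  # ignored by the cross-entropy loss

def pad_collate(batch, pad_id):
    """Hypothetical collate_fn: right-pad a list of example dicts and stack them into tensors."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids, labels, attention_mask = [], [], []
    for ex in batch:
        n_pad = max_len - len(ex["input_ids"])
        input_ids.append(ex["input_ids"] + [pad_id] * n_pad)
        labels.append(ex["labels"] + [IGNORE_INDEX] * n_pad)        # padding never adds to the loss
        attention_mask.append(ex["attention_mask"] + [0] * n_pad)  # 0 = do not attend here
    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
        "attention_mask": torch.tensor(attention_mask),
    }

# Toy examples with made-up token ids
toy_examples = [
    {"input_ids": [5, 6, 7], "labels": [-100, 6, 7], "attention_mask": [1, 1, 1]},
    {"input_ids": [8, 9], "labels": [-100, 9], "attention_mask": [1, 1]},
]
toy_loader = DataLoader(toy_examples, batch_size=2, shuffle=False,
                        collate_fn=lambda b: pad_collate(b, pad_id=0))
toy_batch = next(iter(toy_loader))
print(toy_batch["input_ids"])
```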

Implement your model in the cell below:

In [None]:
if not skip_training:
    # YOUR CODE HERE
    raise NotImplementedError()

Implement the training loop below:

In [None]:
if not skip_training:
    # YOUR CODE HERE
    raise NotImplementedError()
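
Here is a minimal sketch of one training epoch for any model whose `forward` returns an object with a `.loss` field, as HF models do when `labels` are passed. `train_epoch` is a hypothetical helper. To keep the example runnable without the LLM, it is exercised on a toy linear model that follows the same calling convention.

```python
import torch
from types import SimpleNamespace

def train_epoch(model, loader, optimizer, device):
    """Run one epoch and return the mean loss over batches."""
    model.train()
    total_loss = 0.0
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)  # HF models return an output object, not a tensor
        loss = outputs.loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# Toy stand-in that returns an object with a `.loss` field, like an HF model
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 1)

    def forward(self, x, labels):
        loss = torch.nn.functional.mse_loss(self.linear(x), labels)
        return SimpleNamespace(loss=loss)

torch.manual_seed(0)
x = torch.randn(16, 4)
y = x @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]])
toy_train_loader = [{"x": x[i:i + 4], "labels": y[i:i + 4]} for i in range(0, 16, 4)]

toy_model = ToyModel()
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.05)
epoch_losses = [train_epoch(toy_model, toy_train_loader, optimizer, torch.device("cpu"))
                for _ in range(5)]
print(epoch_losses)
```

For the LoRA model, you would pass the PEFT model, a `DataLoader` over your training examples, and an optimizer over the trainable parameters only, for example `torch.optim.AdamW([p for p in peft_model.parameters() if p.requires_grad], lr=...)`. The learning rate and number of epochs are left for you to tune.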

Save the model:

In [None]:
if not skip_training:
    # YOUR CODE HERE
    raise NotImplementedError()

# Test the trained model

**IMPORTANT:** Once you have trained your model, ensure that the remaining cells in this notebook execute correctly. Failure to do so may result in a loss of points, as successful execution is part of the evaluation criteria.

First, we load the trained adapter on top of the base model. Note that the base model should already be loaded.

In [None]:
print("\nLoading the adapter")
base_model.to(device)
peft_model = PeftModel.from_pretrained(base_model, "1_adapter")
peft_model.to(device)

## Test common knowledge

We evaluate how well the model with the adapter recalls trivia facts.

**Note:** Successfully passing this test is mandatory to earn points for this assignment.

In [None]:
get_answer_peft_fn = partial(get_answer, model=peft_model, tokenizer=tokenizer)

evaluator = Evaluator(qa_trivial_json)
trivia_accuracy_trained = evaluator.evaluate_all(get_answer_peft_fn, verbose=True)
print(f"Accuracy on the trivia set (trained model): {trivia_accuracy_trained:.2f}")

In [None]:
# [AUTOGRADING] This cell tests the model on the public common knowledge set

# Test new knowledge

Next, we test the new knowledge. Training the model to memorize all the new facts is a non-trivial task. To get full points, your model should answer at least two test questions correctly. Note that the grading procedure can make mistakes as well.

### Evaluation on the validation set (open):

In [None]:
qa_val_json = "grading_val.json"
evaluator_val = Evaluator(qa_val_json)
val_accuracy = evaluator_val.evaluate_all(get_answer_peft_fn, verbose=True)
print(f"Accuracy on the validation set: {val_accuracy:.2f}")

In [None]:
# [AUTOGRADING] This cell is reserved for auto-grading

In [None]:
# [AUTOGRADING] This cell is reserved for auto-grading

In [None]:
# [AUTOGRADING] This cell tests the model on the public validation set

### Evaluation on the test set (hidden):

In [None]:
# [AUTOGRADING] This cell tests the model on the hidden test set

<div class="alert alert-block alert-info">
<b>Conclusions</b>
</div>

In this exercise, we learned how to train a large language model (LLM) to memorize new facts. We added a LoRA adapter to an LLM and fine-tuned it on our custom data.