<a href="https://colab.research.google.com/github/rhaveri/master-thesis/blob/main/2_sft_%2B_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Prevent widget metadata errors
import warnings
warnings.filterwarnings('ignore')

try:
    from IPython.display import clear_output
    clear_output(wait=True)
except ImportError:
    pass

print(" Environment ready")

 Environment ready


In [None]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --upgrade pyarrow
!pip install --no-deps xformers trl peft accelerate bitsandbytes
!pip install datasets==2.16.0

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-8l3ztxzb/unsloth_cb707d741be145afb7c894f03af4c600
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-8l3ztxzb/unsloth_cb707d741be145afb7c894f03af4c600
  Resolved https://github.com/unslothai/unsloth.git to commit ab4061e106792fa91e1eba3e4f3d45fa8aba121e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting unsloth_zoo>=2026.1.3 (from unsloth@ git+https://github.com/unslothai/unsloth.git->unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Downloading unsloth_zoo-2026.1.3-py3-none-any.whl.metadata (32 kB)
Collecting tyro (from unsloth@ git+https://github.com/unslothai/unsloth.git-

In [None]:
# LLAMA 3 FINE-TUNING FOR NUTRITION RAG

!pip show trl

import os
import json
import shutil
import torch
import pandas as pd
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# LOAD MODEL WITH LORA

def load_model_with_lora(max_seq_length: int = 2048):

    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit"

    print(f"Loading {model_name}...")
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        dtype=None,
        load_in_4bit=True
    )

    # Apply LoRA adapters to the attention projections (q_proj, k_proj, v_proj,
    # o_proj) and the MLP projections (gate_proj, up_proj, down_proj).
    # With r=16 this trains about 0.5% of all parameters (see the training log).
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,  # Rank: Higher = more capacity but slower
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,  # Scaling factor (typically equals rank)
        lora_dropout=0,  # No dropout; 0 is the value Unsloth's fast path is optimized for
        bias="none",
        use_gradient_checkpointing="unsloth",  # Saves memory
        random_state=3407  # Reproducibility
    )

    print(" Model loaded with LoRA adapters")
    return model, tokenizer


Name: trl
Version: 0.24.0
Summary: Train transformer language models with reinforcement learning.
Home-page: https://github.com/huggingface/trl
Author: 
Author-email: Leandro von Werra <leandro.vonwerra@gmail.com>
License: 
Location: /usr/local/lib/python3.12/dist-packages
Requires: accelerate, datasets, transformers
Required-by: unsloth_zoo
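
The training banner later in this notebook reports 41,943,040 trainable parameters (0.52%). The sketch below reproduces that figure from the standard LoRA count of `r * (d_in + d_out)` per adapted matrix. The layer dimensions are assumptions taken from the published Llama 3 8B configuration (hidden size 4096, MLP width 14336, key/value width 1024, 32 layers); they do not appear anywhere in this notebook.

```python
# Assumed Llama 3 8B dimensions (from the published model config, not this notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32

# (input width, output width) of every projection listed in target_modules.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """Each adapted matrix gains A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
    return per_layer * NUM_LAYERS


trainable = lora_param_count(rank=16)
total = 8_072_204_288  # total parameter count reported in the training banner
print(f"{trainable:,} trainable ({100 * trainable / total:.2f}%)")
```

Doubling `r` doubles the adapter size, so the rank is the main lever on both capacity and memory.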


In [None]:

# PREPARE TRAINING DATA
# Convert JSON messages into LLaMA 3's special token format

def prepare_training_dataset(jsonl_file: str, tokenizer, max_seq_length: int = 2048):

    dataset = load_dataset("json", data_files=jsonl_file, split="train")
    print(f" Loaded {len(dataset)} training examples")

    def format_conversations(examples):
        conversations = examples["messages"]
        formatted_texts = [
            tokenizer.apply_chat_template(
                convo,
                tokenize=False,
                add_generation_prompt=False
            )
            for convo in conversations
        ]
        return {"text": formatted_texts}

    dataset = dataset.map(format_conversations, batched=True)

    print("\n--- Sample Training Example (first 500 chars) ---")
    print(dataset[0]["text"][:500] + "...\n")

    return dataset
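
To make the expected input shape concrete, here is a minimal sketch of one `training.jsonl` record and of the text layout the chat template produces for it. The record contents are hypothetical, and `render_llama3` is only an approximation of the Llama 3 layout written for illustration; the notebook itself relies on `tokenizer.apply_chat_template`, which remains the source of truth.

```python
import json


def render_llama3(messages: list[dict]) -> str:
    """Approximate the Llama 3 chat layout (illustration only)."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    return text


# Hypothetical line from training.jsonl: one JSON object with a "messages" list.
line = json.dumps({
    "messages": [
        {"role": "system", "content": "You are a professional nutrition coach."},
        {"role": "user", "content": "Context:\n...\n\nQuestion: ..."},
        {"role": "assistant", "content": "..."},
    ]
})

record = json.loads(line)
rendered = render_llama3(record["messages"])
print(rendered)
```

Because `add_generation_prompt=False` is used for training, the rendered text ends with the assistant turn rather than an empty assistant header.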

In [None]:

#  TRAIN THE MODEL

from transformers import TrainingArguments
from trl import SFTTrainer

def train_model(
    model,
    tokenizer,
    dataset,
    max_seq_length: int = 1536,   # safer for T4
    max_steps: int = 300,
    learning_rate: float = 2e-4,
):

    training_args = TrainingArguments(
        output_dir="outputs",

        per_device_train_batch_size=1,   # T4 safe
        gradient_accumulation_steps=8,

        warmup_steps=10,
        max_steps=max_steps,
        learning_rate=learning_rate,

        fp16=True,

        logging_steps=10,
        save_steps=100,
        save_total_limit=2,

        optim="adamw_8bit",
        lr_scheduler_type="linear",

        report_to="none",
        remove_unused_columns=False,
    )

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,

        dataset_text_field="text",
        max_seq_length=max_seq_length,
        packing=False,

        args=training_args,
    )

    print(f"\n Starting training for {max_steps} steps...\n")
    trainer.train()

    print("\n Training finished!")

    return trainer
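
The arguments above fix the training budget implicitly. The sketch below works through the arithmetic: the effective batch size, how many passes over the data `max_steps` implies, and the shape of a linear schedule with warmup. The helper names are hypothetical, the example count of 998 comes from the run log in this notebook, and `linear_lr` is an approximation of the scheduler's formula, not the Trainer's exact step-by-step values.

```python
import math


def training_budget(num_examples, per_device_batch, grad_accum, max_steps, num_gpus=1):
    """Return (effective batch size, samples seen, passes over the dataset)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    samples_seen = effective_batch * max_steps
    return effective_batch, samples_seen, samples_seen / num_examples


def linear_lr(step, peak_lr, warmup_steps, max_steps):
    """Linear warmup to peak_lr, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))
    return peak_lr * remaining


effective_batch, samples_seen, passes = training_budget(
    num_examples=998, per_device_batch=1, grad_accum=8, max_steps=300
)
print(f"effective batch {effective_batch}, {samples_seen} samples, {passes:.2f} passes")
print(f"epochs started: {math.ceil(passes)}")

for step in (5, 10, 155, 300):
    print(step, linear_lr(step, peak_lr=2e-4, warmup_steps=10, max_steps=300))
```

The rounded-up pass count is consistent with the `Num Epochs = 3` line in the training banner.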



In [None]:

#  SAVE MODEL

def save_model_locally(model, tokenizer, output_dir: str = "lora_model"):
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    print(f" Model saved to '{output_dir}' folder")


def save_to_google_drive(local_folder: str = "lora_model",
                          drive_folder: str = "/content/drive/MyDrive/My_Thesis_Model"):

    try:
        from google.colab import drive
        drive.mount('/content/drive')

        os.makedirs(drive_folder, exist_ok=True)

        shutil.copytree(local_folder, f"{drive_folder}/{local_folder}", dirs_exist_ok=True)
        print(f" Model saved to: {drive_folder}")

    except Exception as e:
        print(f" Error saving to Drive: {e}")

def download_as_zip(folder: str = "lora_model"):
    import subprocess
    subprocess.run(['zip', '-r', f'{folder}.zip', folder], check=True)  # fail loudly if zip fails

    from google.colab import files
    files.download(f'{folder}.zip')
    print(f" Downloaded {folder}.zip")
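
`download_as_zip` depends on a `zip` binary being installed. As an alternative sketch, the standard library's `shutil.make_archive` builds the same archive without a subprocess. The helper name `zip_folder` is hypothetical and not part of the notebook.

```python
import shutil
from pathlib import Path


def zip_folder(folder: str) -> str:
    """Create <folder>.zip next to the folder and return the archive path."""
    folder_path = Path(folder).resolve()
    return shutil.make_archive(
        base_name=str(folder_path),        # ".zip" is appended automatically
        format="zip",
        root_dir=str(folder_path.parent),  # archive paths are relative to this
        base_dir=folder_path.name,         # keep the top-level folder in the zip
    )
```

In Colab the returned path could then be passed to `files.download`, as in the function above.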


In [None]:

# TEST THE FINE-TUNED MODEL

def test_model_with_fake_context(model, tokenizer):

    FastLanguageModel.for_inference(model)  # Enable faster generation

    print("\n" + "="*50)
    print("TEST 1: Basic RAG Context Following")
    print("="*50)

    fake_context = """
According to the 'Thesis Diet Guidelines 2025', the only healthy fruit is
the 'Blue Bananas of Albania'. Eating regular yellow bananas is forbidden.
Blue bananas contain magical Vitamin Z.
"""

    user_query = "Are bananas healthy?"

    prompt = f"""Context information is below.
---------------------
{fake_context}
---------------------
Given the context information and not prior knowledge, answer the query.

Query: {user_query}"""

    messages = [
        {"role": "system", "content": "You are a professional AI health coach. Answer strictly based on the provided context."},
        {"role": "user", "content": prompt}
    ]

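    # return_dict=True also returns the attention mask; generate() cannot infer it
    # here because the pad token is the same as the EOS token.
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True
    ).to("cuda")

    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        use_cache=True,
        temperature=0.1
    )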

    response = tokenizer.batch_decode(outputs)[0]
    answer = response.split("<|start_header_id|>assistant<|end_header_id|>")[-1].replace("<|eot_id|>", "").strip()

    print(f"\n Question: {user_query}")
    print(f" Model Answer:\n{answer}")

    # Test 2: Scientific-sounding fake context
    print("\n" + "="*50)
    print("TEST 2: Professional Context Following")
    print("="*50)

    scientific_fake_context = """
ABSTRACT: The 2025 'Alpha-Omega Nutrition Study' (published in J. Thesis Med.)
evaluated the effects of 'Lunar-Berries'. The study concluded:
1. Consuming 50g of Lunar-Berries reduces fatigue by 40%.
2. The berries must be consumed strictly at 8:00 AM.
3. Combining the berries with dairy products neutralizes their effect.
"""

    user_query = "What are the findings regarding Lunar-Berries?"

    prompt = f"""Context information is below.
---------------------
{scientific_fake_context}
---------------------
Given the context information and not prior knowledge, answer the query.

Query: {user_query}"""

    messages = [
        {"role": "system", "content": "You are a professional AI health coach. Answer strictly based on the provided context."},
        {"role": "user", "content": prompt}
    ]

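    # return_dict=True also returns the attention mask; generate() cannot infer it
    # here because the pad token is the same as the EOS token.
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True
    ).to("cuda")

    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        use_cache=True,
        temperature=0.1
    )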

    response = tokenizer.batch_decode(outputs)[0]
    answer = response.split("<|start_header_id|>assistant<|end_header_id|>")[-1].replace("<|eot_id|>", "").strip()

    print(f"\n Question: {user_query}")
    print(f" Model Answer:\n{answer}")
    print("\n" + "="*50)
    print(" If the model mentions 'Blue Bananas' or 'Lunar-Berries',")
    print("   it successfully learned to follow RAG context!")
    print("="*50)
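
The closing message states the success criterion but leaves the check to the reader. A minimal sketch of an automatic check follows; `follows_context` is a hypothetical helper, and keyword matching is only a rough proxy for context following. The refusal string is the answer recorded in this notebook's run output.

```python
def follows_context(answer: str, keywords: list[str]) -> bool:
    """Return True if the answer mentions any context-only keyword."""
    lowered = answer.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


# Answer recorded for both tests in this notebook's run output.
recorded_answer = (
    "There is not enough information in the material to answer this question."
)

print(follows_context(recorded_answer, ["Blue Bananas", "Vitamin Z"]))
print(follows_context(recorded_answer, ["Lunar-Berries"]))
```

By this criterion the recorded run does not pass either test: the model refused instead of using the supplied context.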


In [None]:

# MAIN EXECUTION PIPELINE
if __name__ == "__main__":
    print("="*60)
    print("LLAMA 3 FINE-TUNING FOR NUTRITION RAG SYSTEM")
    print("="*60)


    model, tokenizer = load_model_with_lora(max_seq_length=2048)

    dataset = prepare_training_dataset(
        "training.jsonl",
        tokenizer
    )


    train_model(
        model=model,
        tokenizer=tokenizer,
        dataset=dataset,
        max_steps=300,  # x effective batch of 8 = 2,400 samples, about 2.4 passes over 998 examples
        learning_rate=2e-4
    )

    save_model_locally(model, tokenizer)

    test_model_with_fake_context(model, tokenizer)

    # save_to_google_drive()

    # download_as_zip()

    print("\n COMPLETE! Your fine-tuned model is ready.")

LLAMA 3 FINE-TUNING FOR NUTRITION RAG SYSTEM
Loading unsloth/llama-3-8b-Instruct-bnb-4bit...
==((====))==  Unsloth 2026.1.3: Fast Llama patching. Transformers: 4.57.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
✅ Model loaded with LoRA adapters
✅ Loaded 998 training examples

--- Sample Training Example (first 500 chars) ---
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a professional nutrition coach. Answer based on provided context.<|eot_id|><|start_header_id|>user<|end_header_id|>

Context:
But, explains Adas, “carbohydrates are the body’s preferred source for energy because they provide energy right away.”

Your body 

Unsloth: Tokenizing ["text"] (num_proc=4):   0%|          | 0/998 [00:00<?, ? examples/s]

The model is already on multiple devices. Skipping the move to device specified in `args`.



🚀 Starting training for 300 steps...



==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 998 | Num Epochs = 3 | Total steps = 300
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 8 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)


Step,Training Loss
10,1.3381
20,0.9843
30,0.8718
40,0.8167
50,0.734
60,0.6714
70,0.6549
80,0.6082
90,0.529
100,0.5373



✅ Training finished!


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


✅ Model saved to 'lora_model' folder

TEST 1: Basic RAG Context Following

❓ Question: Are bananas healthy?
🤖 Model Answer:
There is not enough information in the material to answer this question.

Disclaimer: I am an AI nutrition coach providing general information. Always consult healthcare professionals for medical advice.

TEST 2: Professional Context Following

❓ Question: What are the findings regarding Lunar-Berries?
🤖 Model Answer:
There is not enough information in the material to answer this question.

Disclaimer: I am an AI nutrition coach providing general information. Always consult healthcare professionals for medical advice.

✅ If the model mentions 'Blue Bananas' or 'Lunar-Berries',
   it successfully learned to follow RAG context!

✅ COMPLETE! Your fine-tuned model is ready.
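
The loss table in the output above can be summarised with a few lines of standard-library code. This is a sketch over the ten logged values only; the variable names are hypothetical, and the table stops at step 100 even though the run was configured for 300 steps.

```python
import csv
import io

# Values copied from the "Step,Training Loss" table in the run output.
LOSS_LOG = """Step,Training Loss
10,1.3381
20,0.9843
30,0.8718
40,0.8167
50,0.734
60,0.6714
70,0.6549
80,0.6082
90,0.529
100,0.5373
"""

rows = [
    (int(row["Step"]), float(row["Training Loss"]))
    for row in csv.DictReader(io.StringIO(LOSS_LOG))
]

first_loss, last_loss = rows[0][1], rows[-1][1]
relative_drop = (first_loss - last_loss) / first_loss

# Logged steps where the loss went up compared with the previous logged step.
increases = [
    (step, loss)
    for (_, prev_loss), (step, loss) in zip(rows, rows[1:])
    if loss > prev_loss
]

print(f"relative drop: {relative_drop:.1%}")
print(f"increases: {increases}")
```

A falling training loss only shows that the model fits the training data; the two context-following tests are the behavioural check.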


In [None]:
download_as_zip()

✅ Downloaded lora_model.zip
