# 📘 Gemma 2B – Fine-Tuning

- **Author:** Ederson Corbari <e@NeuroQuest.ai>
- **Date:** January 24, 2026  

---

## Overview

This notebook provides a lightweight **smoke test** for loading, fine-tuning, and running inference with the **Gemma 2B Large Language Model (LLM)** using a psychological preference dataset.

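The dataset is structured for preference-based (comparative) training: each question is paired with a psychologically safe, therapeutic response and a misaligned one. This notebook fine-tunes only on the preferred responses, which is standard supervised fine-tuning. The paired structure also makes the dataset suitable for preference-based methods that teach the model to favor the safe response over the misaligned one.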

---


## 1️⃣ Introduction

This notebook performs supervised fine-tuning of the Gemma 2B model on the preferred responses of a preference dataset and includes a series of checks to ensure the training and inference pipeline is correctly configured.

The primary goals are to validate:

- Model and tokenizer loading
- Proper handling of preference/comparison data
- Inference behavior aligned with therapeutic and empathetic objectives
- Environment and dependency correctness

Although intentionally minimal, this notebook serves as a starting point for:

- Preference-based fine-tuning approaches (e.g., DPO-style methods)
- Behavioral alignment and safety evaluation
- Prompt engineering in psychological and therapeutic contexts
- Rapid experimentation prior to larger-scale alignment pipelines

## 2️⃣ Environment & Dependencies

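This notebook assumes:
- Google Colab (the Hugging Face token is read from Colab secrets)
- PyTorch with CUDA support
- Hugging Face Transformers, Datasets, and Accelerate
- PEFT (LoRA adapters) and TRL (`SFTTrainer`)
- bitsandbytes (for 4-bit quantization)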
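
Because these libraries change their APIs between releases, recording the installed versions next to the results makes a run easier to reproduce. A minimal standard-library sketch; the package list is an assumption covering this notebook's main libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of distributions to report (this notebook's main libraries).
PACKAGES = ("torch", "transformers", "datasets", "accelerate", "peft", "trl", "bitsandbytes")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in installed_versions().items():
    print(f"{name:>12}: {ver or 'not installed'}")
```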

In [1]:
%%capture
%pip install -U transformers --quiet
%pip install -U datasets --quiet
%pip install -U accelerate --quiet
%pip install -U peft --quiet
%pip install -U trl --quiet
%pip install -U bitsandbytes --quiet
%pip install -U flash-attn --quiet

In [2]:
import warnings
warnings.simplefilter("ignore")

In [3]:
from huggingface_hub import login
from google.colab import userdata

hf_token = userdata.get("HUGGINGFACE_TOKEN_GOOGLE_COLAB")
login(token=hf_token)

In [4]:
import torch
import torch.nn as nn
import bitsandbytes as bnb

from typing import Tuple, Final, List, Set, Type, Dict, Any

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)

from peft import (
    LoraConfig,
    PeftModel,
    prepare_model_for_kbit_training,
    get_peft_model,
)

from datasets import load_dataset, Dataset
from trl import SFTTrainer

In [5]:
assert torch.cuda.is_available(), "GPU CUDA not found"
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))

Tesla T4
(7, 5)


In [6]:
MODEL_NAME: Final[str] = "google/gemma-2b-it"
DATASET_NAME: Final[str] = "jkhedri/psychology-dataset"
DATA_SAMPLES: int | None = None
NEW_MODEL: Final[str] = "Gemma-2-it-Psych"
SYSTEM_PROMPT: Final[str] = "You are a compassionate mental health assistant."

## 3️⃣ GPU Optimization

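Checks the CUDA compute capability to select the data type and attention implementation automatically: GPUs with compute capability 8.0 or higher (Ampere and newer) use `bfloat16` with `Flash Attention 2`, while older GPUs such as the T4 (7.5) fall back to `float16` with `eager` attention.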

In [7]:
major, minor = torch.cuda.get_device_capability()

torch_dtype, attn_implementation = (
    (torch.bfloat16, "flash_attention_2")
    if major >= 8
    else (torch.float16, "eager")
)

print(f"[CUDA] {major}.{minor} → {torch_dtype}, {attn_implementation}")

[CUDA] 7.5 → torch.float16, eager
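
The dtype choice is a trade-off between range and precision: `float16` has a 5-bit exponent (largest finite value 65504), while `bfloat16` keeps float32's 8-bit exponent and gives up mantissa bits instead. The standard-library sketch below emulates both formats. The `bfloat16` conversion truncates the float32 bit pattern, which is a simplification of the round-to-nearest-even conversion hardware typically uses.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Approximate bfloat16 by keeping only the upper 16 bits of the float32 encoding.

    Simplification: this truncates instead of rounding to nearest even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


print(to_float16(65504.0))   # largest finite float16 value survives unchanged
print(to_bfloat16(70000.0))  # in range for bfloat16, but only 8 significant bits remain
try:
    to_float16(70000.0)      # exceeds the float16 range
except (OverflowError, struct.error) as err:
    print("float16 overflow:", err)
```

In other words, `bfloat16` rarely overflows where `float32` would not, which is why it is preferred when the hardware supports it.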


## 4️⃣ Quantization Configuration

Sets up 4-bit quantization with bitsandbytes, as used in QLoRA. It uses **NF4** (4-bit NormalFloat) and **Double Quantization** (which also quantizes the quantization constants) to reduce VRAM usage while preserving most of the model's quality.

In [8]:
# Reuse `torch_dtype` from the GPU Optimization step (float16 on a T4, which lacks native bfloat16 support)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)
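
To make NF4 and double quantization more concrete, the sketch below quantizes one block of weights with absmax scaling and computes the per-weight cost of storing the scaling constants. Two parts are simplifying assumptions, not bitsandbytes internals: the 15 evenly spaced levels stand in for NF4's 16 levels placed at normal-distribution quantiles, and the block sizes (64 weights per constant, 256 constants per second-level constant) follow the description in the QLoRA paper.

```python
from typing import List, Sequence, Tuple

# Simplified stand-in for NF4: 15 evenly spaced levels in [-1, 1], including an exact zero.
UNIFORM_LEVELS: Tuple[float, ...] = tuple(i / 7 for i in range(-7, 8))


def quantize_block(block: Sequence[float], levels=UNIFORM_LEVELS) -> Tuple[List[int], float]:
    """Scale a block into [-1, 1] by its absmax and store the index of the nearest level."""
    absmax = max(abs(x) for x in block) or 1.0  # avoid dividing by zero for all-zero blocks
    codes = [
        min(range(len(levels)), key=lambda i: abs(levels[i] - x / absmax))
        for x in block
    ]
    return codes, absmax


def dequantize_block(codes: List[int], absmax: float, levels=UNIFORM_LEVELS) -> List[float]:
    """Map level indices back to approximate weights."""
    return [levels[c] * absmax for c in codes]


def constant_overhead_bits(block: int = 64, double_quant: bool = False, block2: int = 256) -> float:
    """Extra bits per weight spent on the per-block absmax constants.

    Without double quantization, each block of `block` weights stores one fp32 constant.
    With it, those constants are quantized to 8 bits in groups of `block2`,
    and each group keeps one fp32 second-level constant.
    """
    if not double_quant:
        return 32 / block
    return 8 / block + 32 / (block * block2)


weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.0, 0.45]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
print(codes, scale)
print(max(abs(a - b) for a, b in zip(weights, restored)))  # bounded by half a step: scale / 14
print(constant_overhead_bits(), constant_overhead_bits(double_quant=True))  # 0.5 0.126953125
```

Under these assumed block sizes, double quantization cuts the constant overhead from 0.5 to about 0.127 bits per weight.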

## 5️⃣ Load Model and Tokenizer

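Loads the pre-trained model with the 4-bit quantization configuration above and configures the tokenizer for supervised fine-tuning (SFT): the EOS token is reused as the padding token, and sequences are padded on the right.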

In [9]:
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    dtype=torch_dtype,
    attn_implementation=attn_implementation,
)

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
)

tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

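
Right padding appends pad tokens after the real tokens. The attention mask marks which positions are real, and padded label positions are set to `-100`, the default `ignore_index` of PyTorch's cross-entropy loss, so they do not contribute to the loss. The minimal sketch below uses hypothetical token ids, with `1` standing in for an EOS token reused as padding. The actual batching is done by the data collator inside TRL/transformers, not by this helper.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_right(batch: List[List[int]], pad_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad token id sequences and build matching attention masks and labels."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask, labels = [], [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        # Mask labels by position (from the attention mask), not by token id.
        labels.append(seq + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Hypothetical ids: 2 = BOS, 1 = EOS (also used as padding), others = ordinary tokens.
print(pad_right([[2, 10, 11, 1], [2, 12, 1]], pad_id=1))
```

Because the pad id equals the EOS id here, masking labels by position keeps the genuine EOS token in the loss. Masking every position whose id equals `pad_token_id` would hide it as well.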

## 6️⃣ Target Module Discovery

Automatically identifies all 4-bit linear layers within the model architecture to apply LoRA adapters, excluding the output layer (`lm_head`).

In [10]:
def find_all_linear_names(
    model: nn.Module,
    linear_cls: Type[nn.Module] = bnb.nn.Linear4bit,
    exclude: Set[str] | None = {"lm_head"},
) -> List[str]:
    names = {
        name.split(".")[-1]
        for name, module in model.named_modules()
        if isinstance(module, linear_cls)
    }

    return sorted(names - (exclude or set()))

In [11]:
modules = find_all_linear_names(model)
print(f"LoRA modules: {modules}")

LoRA modules: ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']
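
The name extraction in `find_all_linear_names` can be checked without loading a model. The sketch below applies the same logic to hypothetical `(qualified_name, class_name)` pairs shaped like `named_modules()` output. Per-layer names such as `model.layers.0.self_attn.q_proj` collapse to their last component, and modules of other types are skipped. The `lm_head` exclusion acts as a safeguard, since transformers typically leaves the output layer unquantized.

```python
from typing import AbstractSet, Iterable, List, Tuple


def linear_suffixes(
    modules: Iterable[Tuple[str, str]],
    linear_type: str = "Linear4bit",
    exclude: AbstractSet[str] = frozenset({"lm_head"}),
) -> List[str]:
    """Mirror of find_all_linear_names operating on (qualified_name, class_name) pairs."""
    names = {qualified.split(".")[-1] for qualified, cls in modules if cls == linear_type}
    return sorted(names - set(exclude))


# Hypothetical module listing; the real one comes from model.named_modules().
SAMPLE = [
    ("model.layers.0.self_attn.q_proj", "Linear4bit"),
    ("model.layers.0.self_attn.k_proj", "Linear4bit"),
    ("model.layers.1.self_attn.q_proj", "Linear4bit"),  # duplicate suffix from another layer
    ("model.layers.0.mlp.down_proj", "Linear4bit"),
    ("model.layers.0.input_layernorm", "GemmaRMSNorm"),
    ("lm_head", "Linear"),
]
print(linear_suffixes(SAMPLE))
```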


## 7️⃣ LoRA Adapter Configuration

Configures the LoRA parameters (rank `r` and scaling factor `lora_alpha`) and attaches the adapters to the target modules. It then prints the number of trainable parameters, showing how small a fraction of the model PEFT actually trains.

In [12]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=modules,
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()


trainable params: 19,611,648 || all params: 2,525,784,064 || trainable%: 0.7765
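
The trainable-parameter count can be reproduced by hand. Each adapted `Linear(d_in → d_out)` gains two matrices, A (r × d_in) and B (d_out × r), so it adds r · (d_in + d_out) parameters. The Gemma 2B dimensions below are taken as assumptions from the model configuration. With them, the total matches the 19,611,648 reported above.

```python
from typing import Dict, Tuple

# Assumed Gemma 2B dimensions: hidden_size=2048, intermediate_size=16384,
# 8 query heads x head_dim 256, a single key/value head, 18 decoder layers.
HIDDEN, INTERMEDIATE, HEAD_DIM, N_HEADS, N_KV_HEADS, N_LAYERS = 2048, 16384, 256, 8, 1, 18

# (in_features, out_features) of each LoRA target module in one decoder layer
LAYER_SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes: Dict[str, Tuple[int, int]], r: int, n_layers: int) -> int:
    """Each adapted Linear(d_in -> d_out) gains A (r x d_in) and B (d_out x r)."""
    return n_layers * sum(r * (d_in + d_out) for d_in, d_out in shapes.values())


trainable = lora_param_count(LAYER_SHAPES, r=16, n_layers=N_LAYERS)
print(f"{trainable:,}")                            # 19,611,648
print(f"{100 * trainable / 2_525_784_064:.4f}%")  # 0.7765%
print("LoRA scaling lora_alpha / r =", 32 / 16)    # 2.0
```

The MLP projections dominate: `gate_proj`, `up_proj`, and `down_proj` account for about 81% of the adapter parameters because of the 16,384-wide intermediate layer.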


## 8️⃣ Dataset Loading

Loads and shuffles the dataset, with an optional **test mode**: if `DATA_SAMPLES` is set, only that many examples are kept for fast iteration before running the full fine-tuning. The data is then split 90/10 into training and test sets.

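It then applies the model's chat template to the training and test sets. This process:

1. **Wraps** each user question, prefixed by the system prompt, into the model's conversation format. Gemma's chat template has no system role, so the system prompt is merged into the user turn.
2. **Pairs** it with the psychological response (`response_j`).
3. **Adds** a formatted `text` column for training. Because `format_chat_template` returns the original fields (`**row`), `remove_columns` does not drop them, so the raw columns remain alongside `text`, as the `DatasetDict` output below shows.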
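
For reference, the sketch below approximates what `apply_chat_template` produces for one row with Gemma's template: a `<bos>` token, then `<start_of_turn>`/`<end_of_turn>` blocks in which the `assistant` role is rendered as `model`. It is a hand-written approximation for illustration only; the authoritative template is `tokenizer.chat_template`, and the example question and answer are hypothetical.

```python
from typing import Dict, List


def render_gemma_chat(messages: List[Dict[str, str]], bos: str = "<bos>") -> str:
    """Approximate Gemma's chat template for user/assistant turns (no system role)."""
    out = [bos]
    for msg in messages:
        role = "model" if msg["role"] == "assistant" else msg["role"]
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {msg['role']}")
        out.append(f"<start_of_turn>{role}\n{msg['content'].strip()}<end_of_turn>\n")
    return "".join(out)


system_prompt = "You are a compassionate mental health assistant."
question = "I can't sleep before exams."                       # hypothetical row
answer = "That sounds stressful. Let's look at what keeps you up."  # hypothetical row
text = render_gemma_chat([
    {"role": "user", "content": f"{system_prompt}\n\n{question}"},
    {"role": "assistant", "content": answer},
])
print(text)
```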

**Note:** This is a preference/comparison dataset in which `response_j` (empathetic/therapeutic) and `response_k` (judgmental/aggressive) represent opposite poles.

Only **`response_j`** is used for training, so the model learns to imitate safe, professional, and supportive psychological guidance. `response_k` is discarded: supervised fine-tuning never shows the model the toxic patterns, but it does not explicitly penalize them either, which is what preference-based methods add.
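
The difference between what this notebook trains on and what a preference method would consume can be shown on a single row. The helper names and the example row below are hypothetical; the `prompt`/`chosen`/`rejected` keys follow the column convention used by TRL's `DPOTrainer`, which this notebook does not run.

```python
from typing import Dict


def to_sft_example(row: Dict[str, str]) -> Dict[str, str]:
    """What this notebook trains on: the question and the preferred answer only."""
    return {"prompt": row["question"], "completion": row["response_j"]}


def to_preference_example(row: Dict[str, str]) -> Dict[str, str]:
    """A DPO-style triple: the rejected answer becomes an explicit negative example."""
    return {
        "prompt": row["question"],
        "chosen": row["response_j"],
        "rejected": row["response_k"],
    }


row = {
    "question": "I feel like a failure.",
    "response_j": "That sounds painful. What happened that made you feel this way?",
    "response_k": "Maybe you just need to try harder.",
}
print(to_sft_example(row))
print(to_preference_example(row))
```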

In [13]:
dataset = load_dataset(DATASET_NAME, split="all")
dataset = dataset.shuffle(seed=65)

if DATA_SAMPLES is not None:
    dataset = dataset.select(range(DATA_SAMPLES))

dataset = dataset.train_test_split(test_size=0.1, seed=42)


In [14]:
def format_chat_template(
    row: Dict[str, Any],
    *,
    tokenizer,
    system_prompt: str = SYSTEM_PROMPT,
) -> Dict[str, Any]:
    # Gemma's chat template has no system role, so the system prompt is prepended to the user turn.
    user_content = f"{system_prompt}\n\n{row['question']}"

    messages = [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": row["response_j"]},
    ]

    return {
        **row,
        "text": tokenizer.apply_chat_template(
            messages,
            tokenize=False,
        ),
    }

In [15]:
def map_split(split: Dataset) -> Dataset:
    return split.map(
        lambda row: format_chat_template(row, tokenizer=tokenizer),
        remove_columns=split.column_names,
        num_proc=4,
    )

dataset["train"] = map_split(dataset["train"])
dataset["test"] = map_split(dataset["test"])


In [16]:
dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'response_j', 'response_k', 'text'],
        num_rows: 8861
    })
    test: Dataset({
        features: ['question', 'response_j', 'response_k', 'text'],
        num_rows: 985
    })
})

## 9️⃣ Training

Defines the training configuration, including **gradient accumulation** to simulate larger batches on limited VRAM (an effective batch size of 1 × 2 = 2), the **paged AdamW** optimizer for memory stability, and logging/evaluation intervals for tracking training and validation loss.

It then initializes the **SFTTrainer** with the model (LoRA adapters already attached), the processed dataset, and the training arguments. Calling `trainer.train()` starts supervised fine-tuning, optimizing the model to generate empathetic psychological responses.

In [17]:
training_arguments = TrainingArguments(
    output_dir=NEW_MODEL,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=200,
    logging_strategy="steps",
    logging_steps=10,
    warmup_steps=10,
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
    report_to="tensorboard",
)
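
With `num_train_epochs=1`, 8,861 training rows, and an effective batch size of 1 × 2 = 2, one epoch is roughly 4,430 optimizer steps. Assuming the default linear scheduler of `TrainingArguments`, the learning rate ramps up over `warmup_steps=10` and then decays linearly to zero. The sketch below reproduces that shape; the exact step count the Trainer computes may differ slightly.

```python
def linear_warmup_decay(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to base_lr over `warmup` steps, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


TRAIN_ROWS = 8861
effective_batch = 1 * 2  # per_device_train_batch_size * gradient_accumulation_steps, one GPU
approx_total = TRAIN_ROWS // effective_batch  # approximate optimizer steps for one epoch

for s in (0, 5, 10, 200, approx_total // 2, approx_total):
    lr = linear_warmup_decay(s, 2e-4, warmup=10, total=approx_total)
    print(f"step {s:>5}: lr = {lr:.2e}")
```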

In [18]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    args=training_arguments,
)


In [None]:
model.config.use_cache = False
trainer.train()

| Step | Training Loss | Validation Loss |
|-----:|--------------:|----------------:|
| 200 | 0.8262 | 0.820603 |
| 400 | 0.7655 | 0.898811 |
| 600 | 0.7291 | 0.776212 |
| 800 | 0.6977 | 0.786854 |


## 1️⃣0️⃣ Training Visualization

Launches **TensorBoard** on the training output directory to inspect the logged metrics, such as the training and validation loss curves, and to check that the model is converging during fine-tuning.

In [None]:
%load_ext tensorboard
%tensorboard --logdir ./Gemma-2-it-Psych
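
TensorBoard shows the curves, and the same check can also be scripted. `trainer.state.log_history` holds the logged metrics as a list of dicts, with evaluation entries carrying an `eval_loss` key. The sketch below flags evaluations where validation loss rose relative to the previous one, using values copied from the training table above.

```python
from typing import Any, Dict, List


def validation_loss_increases(log_history: List[Dict[str, Any]]) -> List[int]:
    """Return the steps whose eval_loss is higher than at the previous evaluation."""
    evals = [(entry["step"], entry["eval_loss"]) for entry in log_history if "eval_loss" in entry]
    return [step for (_, prev), (step, cur) in zip(evals, evals[1:]) if cur > prev]


# Values copied from the training table; in a live session, pass trainer.state.log_history.
history = [
    {"step": 200, "loss": 0.8262}, {"step": 200, "eval_loss": 0.820603},
    {"step": 400, "loss": 0.7655}, {"step": 400, "eval_loss": 0.898811},
    {"step": 600, "loss": 0.7291}, {"step": 600, "eval_loss": 0.776212},
    {"step": 800, "loss": 0.6977}, {"step": 800, "eval_loss": 0.786854},
]
print(validation_loss_increases(history))  # [400, 800]
```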

## 1️⃣1️⃣ Saving and Exporting the Model

Saves the fine-tuned LoRA adapter weights to local storage and uploads them to the **Hugging Face Hub** for versioning. Only the adapters are saved, not a merged model, so inference requires loading them on top of the base `google/gemma-2b-it` model.

In [None]:
trainer.model.save_pretrained(NEW_MODEL)
trainer.model.push_to_hub(NEW_MODEL, use_temp_dir=False)
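
For inference, the saved adapters are typically loaded on top of the base model (for example with `PeftModel.from_pretrained`), and PEFT's `merge_and_unload()` can fold them into the base weights. The sketch below shows the arithmetic behind that merge with PEFT's default scaling `lora_alpha / r` on toy matrices. The values are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    return [[s * x for x in row] for row in a]


r, lora_alpha = 1, 2
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen base weight, d_out=2 x d_in=3
A = [[1.0, 1.0, 0.0]]                     # LoRA down-projection, r x d_in
B = [[0.5], [2.0]]                        # LoRA up-projection, d_out x r
x = [[1.0], [2.0], [3.0]]                 # input as a column vector

s = lora_alpha / r
# Adapter path: W x + s * B (A x)
with_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), s))
# Merged path: (W + s * B A) x
merged_W = add(W, scale(matmul(B, A), s))
print(with_adapter, matmul(merged_W, x))  # [[10.0], [11.0]] [[10.0], [11.0]]
```

Because the merged weight gives the same outputs, a merged model needs no PEFT wrapper at inference time, at the cost of storing a full copy of the weights instead of the small adapter files.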