# Fine-tuning Gemma 3-4B with Unsloth AI

This notebook fine-tunes Google's Gemma 3-4B model using Unsloth AI, training LoRA adapters on top of a 4-bit quantized base model to keep memory use low.

## Datasets
- `dataset_with_all_factual_data.jsonl`: All questions with correct answers
- `dataset_with_abstrain.jsonl`: Questions whose incorrect answers have been replaced with "I don't know"
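
The formatting cell further down reads `instruction`, `input`, and `output` from every record, so each JSONL line is presumably an object with those three keys. The sketch below checks a few lines before training; `check_jsonl_lines` and the sample records are hypothetical and not taken from the real datasets.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")  # keys read by the formatting cell


def check_jsonl_lines(lines):
    """Return (line_number, problem) pairs for records that would break formatting."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # the loader skips blank lines as well
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not an object"))
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append((number, "missing keys: " + ", ".join(missing)))
    return problems


# Hypothetical sample lines for illustration only.
sample_lines = [
    json.dumps({"instruction": "Answer the following question about rivers:",
                "input": "Example question?", "output": "I don't know"}),
    json.dumps({"instruction": "Answer the following question about rivers:",
                "input": "Example question?"}),
]
print(check_jsonl_lines(sample_lines))
```

To check a real file, pass the open file handle (or a list of its lines) instead of `sample_lines`.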


In [None]:
# Install Unsloth AI and dependencies
%pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
%pip install --no-deps trl peft accelerate bitsandbytes


In [None]:
import torch
from unsloth import FastLanguageModel
import pandas as pd
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments
import json
import random

# Set random seeds for reproducibility
random.seed(42)
torch.manual_seed(42)

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'}")


In [None]:
# Load model and tokenizer with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    # Note: this ID is Gemma 3n E4B (instruction-tuned, 4-bit), not Gemma 3 4B; change it if you need the other checkpoint
    model_name="unsloth/gemma-3n-e4b-it-bnb-4bit",
    max_seq_length=2048,
    dtype=None,  # Auto-detect
    load_in_4bit=True,  # 4-bit quantization for memory efficiency
)

# Configure LoRA for efficient fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=42,
    use_rslora=False,
    loftq_config=None,
)

print("Model loaded successfully!")
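
As a rough guide to what `r=16` and `lora_alpha=16` mean: for a linear layer with `d_in` inputs and `d_out` outputs, LoRA trains two small matrices of shapes `(r, d_in)` and `(d_out, r)`, and scales their product by `lora_alpha / r` (or `lora_alpha / sqrt(r)` when rsLoRA is enabled). The sketch below works through that arithmetic. The helper names are hypothetical and the layer shapes are illustrative only; they are not Gemma's real dimensions.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha, r, use_rslora=False):
    """Factor applied to the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return lora_alpha / (r ** 0.5) if use_rslora else lora_alpha / r


# Hypothetical layer shapes for illustration only; NOT Gemma's real dimensions.
example_layers = {
    "q_proj": (2048, 2048),
    "gate_proj": (2048, 8192),
}

r = 16
for name, (d_in, d_out) in example_layers.items():
    print(name, lora_param_count(d_in, d_out, r))
print("scaling:", lora_scaling(lora_alpha=16, r=r))
```

With `lora_alpha` equal to `r`, as in the configuration above, the scaling factor is 1, so the adapter update is added to the frozen weights unscaled.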


In [None]:
def load_jsonl_dataset(file_path):
    """Load dataset from JSONL file."""
    data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                data.append(json.loads(line.strip()))
    return Dataset.from_list(data)

# Load datasets
print("Loading datasets...")
factual_dataset = load_jsonl_dataset("dataset_with_all_factual_data.jsonl")
abstain_dataset = load_jsonl_dataset("dataset_with_abstrain.jsonl")

print(f"Factual dataset size: {len(factual_dataset)}")
print(f"Abstain dataset size: {len(abstain_dataset)}")
print(f"Sample from factual dataset: {factual_dataset[0]}")
print(f"Sample from abstain dataset: {abstain_dataset[0]}")


In [None]:
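# Configure tokenizer: Gemma tokenizers normally ship a pad token, so only fall back to EOS when none is set
if getattr(tokenizer, "pad_token", None) is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"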

# Define formatting function for training
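def formatting_prompts_func(examples):
    """Render each record as a single Gemma-style user/model exchange."""
    texts = []
    for instruction, input_text, output in zip(examples["instruction"], examples["input"], examples["output"]):
        # Skip the separator when a record has no input, so the user turn has no trailing newline
        user_message = f"{instruction}\n{input_text}" if input_text else instruction
        text = f"<start_of_turn>user\n{user_message}<end_of_turn>\n<start_of_turn>model\n{output}<end_of_turn>"
        texts.append(text)
    return {"text": texts}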

# Apply formatting
factual_dataset = factual_dataset.map(formatting_prompts_func, batched=True)
abstain_dataset = abstain_dataset.map(formatting_prompts_func, batched=True)

print("Dataset formatting completed!")
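
The prompt used at inference time has to match the training text up to the point where the model's answer begins. One way to keep the two in sync is to build both from the same helper, as in this sketch. `build_prompt` and `build_training_text` are hypothetical names, the record is made up, and dropping the separator for an empty input is an assumption of this sketch.

```python
def build_prompt(instruction, input_text):
    """User turn plus the opening of the model turn, as used for generation."""
    # Join instruction and input, skipping the input when it is empty.
    user_message = "\n".join(part for part in (instruction, input_text) if part)
    return f"<start_of_turn>user\n{user_message}<end_of_turn>\n<start_of_turn>model\n"


def build_training_text(instruction, input_text, output):
    """Full training example: the generation prompt followed by the target answer."""
    return build_prompt(instruction, input_text) + f"{output}<end_of_turn>"


# Hypothetical record for illustration only.
instruction = "Answer the following question about rivers:"
question = "Example question?"

print(build_training_text(instruction, question, "I don't know"))
```

Because the training text is the prompt plus the answer, any change to the turn layout automatically applies to both training and testing.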


## Training Configuration

Choose which dataset to train on:


In [None]:
# Choose dataset to train on
DATASET_CHOICE = "factual"  # Change to "abstain" for the other dataset

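if DATASET_CHOICE == "factual":
    train_dataset = factual_dataset
elif DATASET_CHOICE == "abstain":
    train_dataset = abstain_dataset
else:
    # Fail early instead of silently training on the wrong dataset
    raise ValueError(f'DATASET_CHOICE must be "factual" or "abstain", got {DATASET_CHOICE!r}')
model_name_suffix = DATASET_CHOICE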

print(f"Training on {DATASET_CHOICE} dataset with {len(train_dataset)} examples")


In [None]:
# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    max_steps=100,  # Adjust based on your needs
    learning_rate=2e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=1,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=42,
    output_dir=f"./gemma-3-4b-{model_name_suffix}",
    save_steps=50,
    save_total_limit=2,
    remove_unused_columns=False,
)

# Initialize trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,
    args=training_args,
)

print("Trainer initialized successfully!")
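
Because the run is limited by `max_steps` rather than by epochs, it helps to check how much of the dataset those steps cover. Each optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps` examples per GPU, which is 2 × 4 = 8 with the settings above. The sketch below does the arithmetic; `training_budget` is a hypothetical helper, the dataset size of 1000 is a placeholder, and the steps-per-epoch figure is approximate.

```python
import math


def training_budget(num_examples, per_device_batch_size, grad_accum_steps, max_steps, num_devices=1):
    """Summarize how much data a step-limited run covers."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    examples_seen = effective_batch * max_steps
    return {
        "effective_batch_size": effective_batch,
        "examples_seen": examples_seen,
        "epochs_covered": examples_seen / num_examples,
        "steps_for_one_epoch": math.ceil(num_examples / effective_batch),
    }


# 1000 is a hypothetical dataset size; substitute len(train_dataset).
budget = training_budget(num_examples=1000, per_device_batch_size=2,
                         grad_accum_steps=4, max_steps=100)
for key, value in budget.items():
    print(f"{key}: {value}")
```

If `epochs_covered` comes out well below 1, the model never sees part of the dataset, and raising `max_steps` toward `steps_for_one_epoch` may be worth considering.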


In [None]:
# Start training
print("Starting training...")
trainer_stats = trainer.train()
print("Training completed!")
print(f"Training stats: {trainer_stats}")


In [None]:
# Save the fine-tuned model
print("Saving model...")
model.save_pretrained(f"gemma-3-4b-{model_name_suffix}-lora")
tokenizer.save_pretrained(f"gemma-3-4b-{model_name_suffix}-lora")
print(f"Model saved to: gemma-3-4b-{model_name_suffix}-lora")


In [None]:
# Test the fine-tuned model
FastLanguageModel.for_inference(model)

# Test with a sample question
test_question = "At what elevation above sea level does the Aapsta river originate?"
inputs = tokenizer(f"<start_of_turn>user\nAnswer the following question about rivers:\n{test_question}<end_of_turn>\n<start_of_turn>model\n", return_tensors="pt").to("cuda")

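with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, use_cache=True)

# Decode only the newly generated tokens so the prompt is not echoed in the response
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(f"Question: {test_question}")
print(f"Response: {response}")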


## Usage Instructions

1. **Upload your datasets**: Make sure `dataset_with_all_factual_data.jsonl` and `dataset_with_abstrain.jsonl` are in the same directory as this notebook.

2. **Choose dataset**: Change `DATASET_CHOICE` to either "factual" or "abstain" to train on the respective dataset.

3. **Adjust training parameters**: Modify `max_steps`, `learning_rate`, and `per_device_train_batch_size` based on your computational resources and requirements.

4. **Run all cells**: Execute the cells in order to start training.

5. **Monitor training**: Watch the training logs for loss and other metrics.

6. **Test the model**: Use the final cell to test your fine-tuned model.
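
When comparing a model trained on the factual dataset with one trained on the abstain dataset, a simple first measure is how often each one answers "I don't know". The sketch below counts such responses. `is_abstention` and `abstention_rate` are hypothetical helpers, the substring match is a deliberately crude assumption, and the responses are made up.

```python
def is_abstention(response, marker="I don't know"):
    """True when the response contains the marker, ignoring case and apostrophe style."""
    normalized = response.replace("\u2019", "'").lower()
    return marker.lower() in normalized


def abstention_rate(responses):
    """Fraction of responses that abstain; 0.0 for an empty list."""
    if not responses:
        return 0.0
    return sum(is_abstention(response) for response in responses) / len(responses)


# Hypothetical model outputs for illustration only.
example_responses = ["I don't know", "1,200 metres", "i don\u2019t know."]
print(f"Abstention rate: {abstention_rate(example_responses):.2f}")
```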

## Notes

- This uses LoRA (Low-Rank Adaptation) for efficient fine-tuning
- 4-bit quantization is enabled for memory efficiency
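- Random seeds are fixed (`42`) for reproducibility, though GPU kernels can still introduce small run-to-run differences
- The default settings (4-bit weights, batch size 2 with gradient accumulation) are chosen to fit Google Colab's free GPU resources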
