# 03. Fine-Tuning Primer (PEFT/LoRA)
**Warning: CPU Training is Slow**

This notebook is a hands-on introduction to fine-tuning.
We use **TinyLlama-1.1B** and **LoRA** (Low-Rank Adaptation) to demonstrate the mechanics. The goal is to see the whole workflow run on a CPU, not to produce a well-trained model.

**Steps:**
1. Load Model & Tokenizer (TinyLlama).
2. Prepare a tiny dataset.
3. Configure LoRA.
4. Run a short training loop (30 steps).
5. Save the LoRA adapter.
6. Reload the adapter and run inference.
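
To make the LoRA mechanics concrete before loading a real model, here is a minimal sketch of the idea in plain Python. The frozen weight `W` is left untouched, and a trainable low-rank product `B @ A`, scaled by `alpha / r`, is added on top. The helper names (`matvec`, `lora_forward`) and the toy 8x8 shapes are illustrative assumptions, not PEFT's internal implementation.

```python
# Toy LoRA forward pass: y = W x + (alpha / r) * B (A x)
# Shapes: W is (d_out x d_in), A is (r x d_in), B is (d_out x r).

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    base = matvec(W, x)                  # frozen path
    low_rank = matvec(B, matvec(A, x))   # trainable path, rank r
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

d_in, d_out, r, alpha = 8, 8, 2, 16

# Frozen weight: the identity matrix, so W x == x.
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
A = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(r)]
x = [float(i) for i in range(d_in)]

# With B initialized to zero, the adapted layer matches the base layer exactly.
B_zero = [[0.0] * r for _ in range(d_out)]
y_start = lora_forward(W, A, B_zero, x, alpha, r)

# Once B moves away from zero, the output shifts, while W itself never changes.
B_moved = [[0.1] * r for _ in range(d_out)]
y_moved = lora_forward(W, A, B_moved, x, alpha, r)

full_params = d_out * d_in        # parameters in W
lora_params = r * (d_in + d_out)  # parameters in A and B
print(f'full: {full_params}, LoRA: {lora_params}')
```

Only `A` and `B` would receive gradients during training, which is why the trainable parameter count stays small even when `W` is large.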

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer, SFTConfig
from datasets import Dataset

# Use the CPU. On a Mac host 'mps' may be available, but inside Docker usually only the CPU is.
device = 'cpu'
print(f'Using device: {device}')

Using device: cpu


In [2]:
# 1. Load Model (TinyLlama)
model_id = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'
print(f'Loading {model_id}...')
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
print('Model loaded')

Loading TinyLlama/TinyLlama-1.1B-Chat-v1.0...
Model loaded


In [3]:
# 2. Prepare Dummy Dataset
data = [
    {'text': 'User: How do I learn RAG? Assistant: Start with the RAG Lab! '},
    {'text': 'User: What is Docker? Assistant: A tool to containerize apps. '},
    {'text': 'User: Who is Antigravity? Assistant: An agentic AI coding assistant by Google. '}
] * 5  # Repeat to make it slightly bigger

dataset = Dataset.from_list(data)
print(f'Dataset size: {len(dataset)}')

Dataset size: 15


In [4]:
# 3. Configure LoRA
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, 
    inference_mode=False, 
    r=4,            # Rank
    lora_alpha=16, 
    lora_dropout=0.1
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 563,200 || all params: 1,100,611,584 || trainable%: 0.0512
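
The trainable parameter count reported above can be reproduced by hand. The sketch below assumes TinyLlama-1.1B's shapes (hidden size 2048, 22 decoder layers, 4 key/value heads of dimension 64) and assumes that PEFT, given no explicit `target_modules`, adapts only `q_proj` and `v_proj` for Llama-style models. Neither assumption is stated in this notebook; they are consistent with the printed total.

```python
# Assumed TinyLlama-1.1B shapes (not stated in this notebook).
hidden_size = 2048
num_layers = 22
num_kv_heads = 4
head_dim = 64
r = 4  # LoRA rank from the config above

def lora_params(d_in, d_out, rank):
    """A is (rank x d_in) and B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Assumed default targets for Llama-style models: q_proj and v_proj.
q_proj = lora_params(hidden_size, hidden_size, r)
v_proj = lora_params(hidden_size, num_kv_heads * head_dim, r)

trainable = num_layers * (q_proj + v_proj)
all_params = 1_100_611_584  # taken from the output above
pct = 100 * trainable / all_params

print(f'trainable params: {trainable:,} || trainable%: {pct:.4f}')
# prints: trainable params: 563,200 || trainable%: 0.0512
```

Doubling `r` doubles the trainable count, since each adapted matrix contributes `r * (d_in + d_out)` parameters.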


In [None]:
# 4. Train (short 30-step demo)
# Updated for TRL >= 0.25 (using SFTConfig)
sft_config = SFTConfig(
    dataset_text_field='text',
    output_dir='./results',
    num_train_epochs=500,  # Ignored: max_steps below takes precedence
    per_device_train_batch_size=1,
    max_steps=30,  # 30 steps, roughly 1-2 minutes on CPU
    logging_steps=5,
    use_cpu=True
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=sft_config,
)

print('Starting training (this might take a minute)...')
trainer.train()
print('Training complete!')

Adding EOS to train dataset:   0%|          | 0/15 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/15 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/15 [00:00<?, ? examples/s]

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 2}.


Starting training (this might take a minute)...


Step,Training Loss
5,4.6427
10,5.0048
15,4.5995
20,4.5388
25,4.5463
30,4.9636


Training complete!
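
The loss table above covers 30 optimizer steps. As a rough worked example of what that means for this 15-example dataset, the arithmetic below converts steps into passes over the data. It assumes `gradient_accumulation_steps` is left at 1 (an assumption, since the config above does not set it). In the Hugging Face `Trainer`, a positive `max_steps` takes precedence over `num_train_epochs`.

```python
import math

dataset_size = 15                 # from 'Dataset size: 15'
per_device_train_batch_size = 1   # from the SFTConfig above
gradient_accumulation_steps = 1   # assumed, not set in the config
max_steps = 30

# One optimizer step consumes batch_size * accumulation examples.
examples_per_step = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(dataset_size / examples_per_step)
effective_epochs = max_steps / steps_per_epoch

print(f'{steps_per_epoch} steps per epoch, {effective_epochs:.1f} epochs in total')
# prints: 15 steps per epoch, 2.0 epochs in total
```

Under these assumptions each of the three distinct sentences is seen about ten times, so the logged loss mostly reflects step-to-step noise at batch size 1 rather than a clear trend.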


In [6]:
# 5. Save the Adapter
# We don't save the full 1.1B-parameter base model, only the small LoRA adapter weights
output_adapter_dir = './tinyllama_lora_adapter'
trainer.save_model(output_adapter_dir)
print(f'Adapter saved to {output_adapter_dir}')

Adapter saved to ./tinyllama_lora_adapter
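
To get a feel for how much smaller the adapter is than the base model, the estimate below multiplies the parameter counts reported earlier by the bytes per parameter. It assumes both are stored as 32-bit floats and ignores file metadata and the small adapter config file, so treat the numbers as approximate.

```python
trainable_params = 563_200     # LoRA parameters, from print_trainable_parameters()
all_params = 1_100_611_584     # base model plus LoRA parameters
bytes_per_param = 4            # assumed float32 storage

base_params = all_params - trainable_params

adapter_mb = trainable_params * bytes_per_param / 1e6
base_gb = base_params * bytes_per_param / 1e9

print(f'adapter: ~{adapter_mb:.1f} MB, base model: ~{base_gb:.1f} GB')
# prints: adapter: ~2.3 MB, base model: ~4.4 GB
```

Because the adapter is this small, it is practical to keep several task-specific adapters around and load each one on top of a single shared copy of the base model.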


In [1]:
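# 6. Inference (Try it out!)
# Inference usually runs in a fresh process, so this cell is self-contained:
# it re-imports and re-defines everything it needs. In a fresh kernel, skipping
# these imports raises NameError: name 'AutoModelForCausalLM' is not defined.
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

model_id = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'
output_adapter_dir = './tinyllama_lora_adapter'

print('Reloading base model...')
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id)

print('Loading your new adapters...')
finetuned_model = PeftModel.from_pretrained(base_model, output_adapter_dir)
finetuned_model.eval()  # Disable dropout for inference

# Test prompt (same format as the training examples)
test_prompt = 'User: Who is Antigravity? Assistant: '
inputs = tokenizer(test_prompt, return_tensors='pt')

print('Generating...')
outputs = finetuned_model.generate(**inputs, max_new_tokens=30)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print('-'*20)
print(result)
print('-'*20)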