# Recipe-LM: Fine-tune Gemma-2B with Unsloth

This notebook fine-tunes **Gemma-2B** on recipe data using [Unsloth](https://unsloth.ai/), which advertises 2x faster training with 70% less VRAM.

**Requirements:** Google Colab with a T4 GPU runtime (free tier works).

1. Go to **Runtime > Change runtime type > T4 GPU**
2. Run all cells

## 1. Install Dependencies

In [1]:
!pip install unsloth
!pip install --no-deps trl



## 2. Configuration

All training parameters are defined in one place. They match the defaults from the recipe-lm pipeline.

In [2]:
# Model
MODEL_NAME = "unsloth/gemma-2b"  # Unsloth's optimized Gemma-2B
MAX_SEQ_LENGTH = 512
LOAD_IN_4BIT = True

# Dataset
DATASET_NAME = "corbt/all-recipes"
TEXT_COLUMN = "input"
MAX_SAMPLES = 5000
VAL_SPLIT_RATIO = 0.1

# LoRA
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05

# Training
NUM_EPOCHS = 3
BATCH_SIZE = 4
LEARNING_RATE = 2e-4
SEED = 42

# Output
OUTPUT_DIR = "./lora_adapter"
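
Two quantities follow directly from these settings and are worth checking before training: the LoRA scaling factor and the train/validation sizes. The sketch below assumes the standard LoRA convention of scaling the adapter update by `lora_alpha / r`, and assumes the validation share is rounded to a whole example. The helper names `lora_scale` and `split_sizes` are illustrative and not part of the pipeline.

```python
# Values copied from the configuration cell above.
LORA_R = 8
LORA_ALPHA = 16
MAX_SAMPLES = 5000
VAL_SPLIT_RATIO = 0.1


def lora_scale(alpha: int, r: int) -> float:
    """Scaling applied to the low-rank update under the alpha / r convention (assumed)."""
    return alpha / r


def split_sizes(n_examples: int, val_ratio: float) -> tuple:
    """Return (train, val) sizes, rounding the validation share to a whole example."""
    n_val = round(n_examples * val_ratio)
    return n_examples - n_val, n_val


scale = lora_scale(LORA_ALPHA, LORA_R)
n_train, n_val = split_sizes(MAX_SAMPLES, VAL_SPLIT_RATIO)
print(f"LoRA scale: {scale}")
print(f"Train: {n_train}, Val: {n_val}")
```

If you change `LORA_R` without changing `LORA_ALPHA`, the scale changes too, so the two are usually adjusted together.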

## 3. Load Model with Unsloth

In [3]:
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=LOAD_IN_4BIT,
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

print(f"Model loaded: {MODEL_NAME}")
print(f"Pad token: {tokenizer.pad_token}")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.2.1: Fast Gemma patching. Transformers: 4.57.6.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Model loaded: unsloth/gemma-2b
Pad token: <pad>


## 4. Add LoRA Adapters

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=SEED,
    max_seq_length=MAX_SEQ_LENGTH,
)

model.print_trainable_parameters()

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2026.2.1 patched 18 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


trainable params: 9,805,824 || all params: 2,515,978,240 || trainable%: 0.3897
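
The trainable-parameter count can be reproduced by hand: each adapted linear layer gains two low-rank matrices with `r * (in_features + out_features)` weights in total. The sketch below assumes the published Gemma-2B dimensions (hidden size 2048, MLP size 16384, 18 layers, 8 query heads of width 256, 1 key/value head). These are not stated in this notebook, so treat them as assumptions.

```python
# Assumed Gemma-2B dimensions (from the published model config, not from this notebook).
HIDDEN = 2048
INTERMEDIATE = 16384
NUM_LAYERS = 18
NUM_HEADS = 8
NUM_KV_HEADS = 1
HEAD_DIM = 256
LORA_R = 8  # matches the configuration cell

# (in_features, out_features) for each targeted projection in one decoder layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    # A has shape (r, in_features) and B has shape (out_features, r);
    # bias="none" means no extra bias weights are trained.
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in PROJECTIONS.values())
trainable = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}")
print(f"trainable params: {trainable:,}")
```

Under these assumptions the total comes to 9,805,824, which agrees with the figure printed above.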


## 5. Load & Prepare Dataset

In [5]:
from datasets import load_dataset

dataset = load_dataset(DATASET_NAME, split="train")
print(f"Full dataset: {len(dataset)} examples")

# Sample down
if MAX_SAMPLES and len(dataset) > MAX_SAMPLES:
    dataset = dataset.shuffle(seed=SEED).select(range(MAX_SAMPLES))
    print(f"Sampled to {len(dataset)} examples")

# Clean: strip whitespace, remove empty rows, deduplicate
def clean(example):
    text = example[TEXT_COLUMN]
    example[TEXT_COLUMN] = text.strip() if isinstance(text, str) else ""
    return example

dataset = dataset.map(clean)
dataset = dataset.filter(lambda x: len(x[TEXT_COLUMN]) > 0)

# Deduplicate
seen = set()
def dedup(example):
    text = example[TEXT_COLUMN]
    if text in seen:
        return False
    seen.add(text)
    return True

dataset = dataset.filter(dedup)
print(f"After cleaning: {len(dataset)} examples")

# Train/val split
split = dataset.train_test_split(test_size=VAL_SPLIT_RATIO, seed=SEED)
train_dataset = split["train"]
val_dataset = split["test"]
print(f"Train: {len(train_dataset)}, Val: {len(val_dataset)}")

Full dataset: 2147248 examples
Sampled to 5000 examples
After cleaning: 5000 examples
Train: 4500, Val: 500
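
In this run the cleaning step removed nothing (5000 examples before and after), so its effect is easier to see on a toy input. The following is a plain-Python sketch of the same strip, drop-empty, and first-occurrence deduplication logic. The function name `clean_and_dedup` and the sample rows are made up for illustration.

```python
def clean_and_dedup(texts):
    """Strip whitespace, drop empty or non-string rows, keep the first copy of each text."""
    seen = set()
    result = []
    for text in texts:
        # Same rule as clean(): non-strings become empty and are then filtered out.
        text = text.strip() if isinstance(text, str) else ""
        if not text or text in seen:
            continue
        seen.add(text)
        result.append(text)
    return result


# Made-up rows: a padded duplicate, an empty string, a None, and a whitespace-only row.
rows = ["  Pancakes  ", "Pancakes", "", None, "Waffles\n", "   "]
cleaned = clean_and_dedup(rows)
print(cleaned)
```

Note that duplicates are only detected after stripping, which is why the cell above runs `clean` before `dedup`.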


## 6. Train with SFTTrainer

In [6]:
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    args=SFTConfig(
        dataset_text_field=TEXT_COLUMN,
        max_seq_length=MAX_SEQ_LENGTH,
        packing=False,
        output_dir=OUTPUT_DIR,
        num_train_epochs=NUM_EPOCHS,
        per_device_train_batch_size=BATCH_SIZE,
        per_device_eval_batch_size=BATCH_SIZE,
        learning_rate=LEARNING_RATE,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        warmup_steps=5,
        seed=SEED,
        report_to="none",
    ),
)

print("Starting training...")
train_result = trainer.train()

🦥 Unsloth: Padding-free auto-enabled, enabling faster training.
Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 4,500 | Num Epochs = 3 | Total steps = 1,689
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 2 x 1) = 8
 "-____-"     Trainable parameters = 9,805,824 of 2,515,978,240 (0.39% trained)


Unsloth: Will smartly offload gradients to save VRAM!


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 1.3989        | 1.433181        |
| 2     | 1.3858        | 1.425384        |
| 3     | 1.1526        | 1.450235        |


Unsloth: Not an error, but GemmaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient
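
The step count in the banner follows from the batch settings, and the `linear` scheduler with `warmup_steps=5` determines the learning rate at each of those steps. The sketch below is a hand-rolled approximation, not the trainer's own code. It assumes a partial final batch still counts as an optimizer step, and that the schedule ramps linearly to the peak rate and then decays linearly to zero. The gradient accumulation value of 2 is taken from the banner, since the configuration cell does not set it.

```python
import math

NUM_EXAMPLES = 4500
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 2  # reported in the training banner, not set in the config cell
NUM_EPOCHS = 3
LEARNING_RATE = 2e-4
WARMUP_STEPS = 5

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int) -> float:
    """Linear warmup to the peak rate, then linear decay to zero at the last step."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = total_steps - step
    return LEARNING_RATE * max(0.0, remaining / (total_steps - WARMUP_STEPS))


print(f"Effective batch size: {effective_batch}")
print(f"Total optimizer steps: {total_steps}")
for step in (0, WARMUP_STEPS, total_steps // 2, total_steps):
    print(f"step {step:>5}: lr = {linear_lr(step):.2e}")
```

With these numbers the sketch gives 1,689 steps, matching the banner. Five warmup steps is a very small fraction of that, so the run spends almost all of its time in the decay phase.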


## 7. Evaluate

In [7]:
import math

train_loss = train_result.metrics["train_loss"]
eval_metrics = trainer.evaluate()
val_loss = eval_metrics["eval_loss"]
val_perplexity = math.exp(val_loss)

print(f"Train loss:      {train_loss:.4f}")
print(f"Val loss:        {val_loss:.4f}")
print(f"Val perplexity:  {val_perplexity:.2f}")

Train loss:      1.3484
Val loss:        1.4254
Val perplexity:  4.16
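
Perplexity is the exponential of the mean cross-entropy loss. The reported validation loss of 1.4254 equals the epoch 2 value rather than the final epoch's, which is consistent with `load_best_model_at_end=True` restoring the checkpoint with the lowest `eval_loss`. A small sketch using the per-epoch validation losses logged during training:

```python
import math

# Validation losses copied from the per-epoch training log above.
val_loss_by_epoch = {1: 1.433181, 2: 1.425384, 3: 1.450235}

# The best checkpoint is the one with the lowest validation loss.
best_epoch = min(val_loss_by_epoch, key=val_loss_by_epoch.get)
best_loss = val_loss_by_epoch[best_epoch]
best_perplexity = math.exp(best_loss)

for epoch, loss in val_loss_by_epoch.items():
    print(f"epoch {epoch}: val loss {loss:.4f}, perplexity {math.exp(loss):.2f}")
print(f"best epoch: {best_epoch} (perplexity {best_perplexity:.2f})")
```

Validation loss rises in epoch 3 while training loss keeps falling, so the third epoch is likely starting to overfit on this 4,500-example subset.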


## 8. Save LoRA Adapter

In [8]:
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f"Saved LoRA adapter to {OUTPUT_DIR}")

Saved LoRA adapter to ./lora_adapter


### (Optional) Push to Hugging Face Hub

Uncomment and fill in your details to upload the adapter.

In [9]:
# from huggingface_hub import login
# login(token="hf_YOUR_TOKEN_HERE")  # replace with your own access token
#
# HF_REPO = "ClaireLee2429/gemma-2b-recipes-lora"  # replace with your own repo ID
# model.push_to_hub(HF_REPO)
# tokenizer.push_to_hub(HF_REPO)
# print(f"Pushed to https://huggingface.co/{HF_REPO}")

## 9. Test Inference

In [10]:
FastLanguageModel.for_inference(model)

prompts = [
    "Recipe for chocolate chip cookies:\n",
    "Recipe for pasta carbonara:\n",
    "Recipe for banana bread:\n",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.2,
        )
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"{'='*60}")
    print(text)
    print()

Recipe for chocolate chip cookies:
Ingredients:
- 1/2 c. butter, softened
- 1 egg, slightly beaten
- 3/4 c. sugar
- 3 Tbsp. cocoa powder
- 1 tsp. vanilla extract
- 1/2 c. milk or water
- 1 c. flour
- 1/2 tsp. baking soda
- 1/4 c. choc. chips

Directions:
- Cream together butter and sugar until light and fluffy. Add the egg, one at a time. Beat well after each addition. Sift together cocoa powder, baking soda and flour. Mix in dry ingredients to creamed mixture. Stir in choc. chips. Drop by teaspoon onto ungreased cookie sheet. Bake 8 to 10 minutes at 350°. Makes about 6 dozen cookies.

Ingredients:
- 2 eggs
- 2 c. sugar
- 2 tsp. vanilla
- 1 1/2 c. plain flour
- 1/2 c. chopped walnuts
- 2 1/2 tsp. baking powder
- 1/4 tsp. salt
- 1/4 c. oil

Directions:
- Preheat oven to 3

Recipe for pasta carbonara:
Ingredients:
- 1/2 cup (50 g) butter, unsalted and softened
- 6 eggs, separated
- 1 tablespoon finely grated parmesan cheese
- 3 tablespoons freshly ground black pepper
- salt to taste
- 4
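
The `temperature` and `top_p` arguments shape which tokens can be sampled at each step. The sketch below is a simplified, standard-library illustration of temperature scaling followed by nucleus (top-p) filtering. It is not the code `model.generate` runs, it ignores `repetition_penalty`, and the four logits are made-up values.

```python
import math


def top_p_distribution(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and nucleus filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a four-token vocabulary
sharp = top_p_distribution(toy_logits, temperature=0.7, top_p=0.9)
flat = top_p_distribution(toy_logits, temperature=1.0, top_p=0.9)
print("temperature 0.7 keeps tokens:", sorted(sharp))
print("temperature 1.0 keeps tokens:", sorted(flat))
```

On these toy scores, lowering the temperature from 1.0 to 0.7 shrinks the candidate set from three tokens to two, which is the sense in which a lower temperature makes output more conservative.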

## 10. Download Adapter

Run the cell below to download the adapter as a zip file. Unzip it locally, then run:
```bash
python inference.py --adapter ./lora_adapter --prompt "Recipe for soup:"
```

In [11]:
import shutil
from google.colab import files

shutil.make_archive("lora_adapter", "zip", OUTPUT_DIR)
files.download("lora_adapter.zip")
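
On the local side, the archive can be unpacked with the standard library as well. The sketch below round-trips a stand-in adapter folder through `shutil.make_archive` and `shutil.unpack_archive` inside a temporary directory. The file it writes is a placeholder, not a real adapter, and the `restored` folder name is arbitrary.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Stand-in for the adapter directory; the file contents are placeholders.
    adapter_dir = tmp / "lora_adapter"
    adapter_dir.mkdir()
    (adapter_dir / "adapter_config.json").write_text('{"r": 8}')

    # Same call shape as the Colab cell: base name, format, directory to archive.
    archive_path = shutil.make_archive(str(tmp / "lora_adapter"), "zip", str(adapter_dir))

    # Local side: unpack into the folder that --adapter will point at.
    restored_dir = tmp / "restored" / "lora_adapter"
    restored_dir.mkdir(parents=True)
    shutil.unpack_archive(archive_path, str(restored_dir))

    restored_files = sorted(p.name for p in restored_dir.iterdir())
    print(restored_files)
```

Because `make_archive` is given the adapter directory as its root, the zip contains the adapter files at the top level, so unpack it into a folder of its own.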
