A collection of scripts for fine-tuning on consumer-grade hardware or the Google Colab free tier.

This repository provides minimal, resource-friendly code to fine-tune large language models (LLMs) with LoRA on custom instruction-style datasets (such as Alpaca), on either:
- ✅ CPU-only or Apple M1/M2 (via MPS)
- ✅ Free Colab GPUs (no paid subscription required)
Our approach allows modular plug-and-play fine-tuning across different models and datasets.
| Parameter | Why We Use It |
|---|---|
| LoRA via `peft` | Trains only a small set of parameters → saves memory |
| `batch_size=1` | Prevents OOM errors on CPU / MPS / small GPUs |
| `gradient_accumulation_steps=4` | Simulates a larger batch size with less memory |
| `learning_rate=2e-4` | Empirically effective for small models + LoRA |
| `num_train_epochs=1-3` | Quick convergence for small datasets; can be increased for better results |
| `fp16=False` on CPU/MPS | Mixed precision is not stable without CUDA |
| `save_total_limit=2` | Prevents disk overflow from multiple checkpoints |
| Alpaca-style datasets | Easy to adapt to any task format (instruction + input → response) |
```
pip install -r requirements.txt
```

This section outlines the standard pattern followed by all training scripts in this repo. It ensures modularity, clarity, and compatibility with consumer-grade hardware setups.
1. **Load the base model and tokenizer**
   Use `AutoModelForCausalLM` and `AutoTokenizer` from Hugging Face. Set `torch_dtype=torch.float32` and `low_cpu_mem_usage=True` to reduce memory usage.

2. **Configure LoRA with `LoraConfig`**
   Leverage `peft` to define low-rank adaptation: `r=8` (rank), `lora_alpha=16` (scaling factor), `lora_dropout=0.1`, `bias="none"`, `task_type=TaskType.CAUSAL_LM`.

3. **Format your dataset**
   Each record must have: `instruction` (what the model should do), `input` (optional context), and `output` (the expected response). Format the prompt in Alpaca style:

   ```
   Instruction: {instruction}
   Input: {input}

   Response: {output}
   ```

4. **Fine-tune using `SFTTrainer`**
   This lightweight wrapper from `trl` simplifies the process. Pass in: `model` (with LoRA applied), `train_dataset` (after tokenization), `TrainingArguments`, and a `data_collator` (with MLM disabled).

5. **Save or merge the LoRA adapter**
   Use `save_steps` and `save_total_limit` to control checkpointing. Optionally merge the LoRA adapter into the base model after training using PEFT utilities.
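The steps above can be sketched end-to-end. The model name, dataset slice, and output paths below follow the phi-2 experiment described later, but exact `SFTTrainer` keyword arguments vary between `trl` versions, so treat this as an outline rather than a drop-in script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model
from trl import SFTTrainer
from datasets import load_dataset

# 1. Load the base model and tokenizer (float32 for CPU/MPS stability)
model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float32, low_cpu_mem_usage=True
)

# 2. Configure and apply LoRA
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.1,
    bias="none", task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)

# 3. Load an Alpaca-style dataset (1% sample, as in the experiment below)
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1%]")

# 4. Fine-tune with SFTTrainer
args = TrainingArguments(
    output_dir="./phi2-alpaca-lora",   # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=1,
    fp16=False,                        # mixed precision is unstable off CUDA
    save_total_limit=2,
)
trainer = SFTTrainer(model=model, train_dataset=dataset, args=args)
trainer.train()

# 5. Save the adapter (or merge it into the base model with PEFT utilities)
model.save_pretrained("./phi2-alpaca-lora/adapter")
```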
📁 Folder: ./experiments/phi2-alpaca-lora
| Component | Setting |
|---|---|
| Base Model | microsoft/phi-2 |
| Dataset | tatsu-lab/alpaca (1% sample) |
| LoRA Rank (r) | 8 |
| LoRA Alpha | 16 |
| Dropout | 0.1 |
| Max Length | 512 tokens |
| Device | CPU / MPS (Apple Silicon) |
| Batch Size | 1 |
| Accumulation | 4 |
| Epochs | 1 |
| Learning Rate | 2e-4 |
- `r=8`, `alpha=16` → good balance between performance and efficiency for LoRA on small devices
- `gradient_accumulation_steps=4` → effective batch size = 4
- `max_length=512` → shorter sequences enable faster training
- `output_dir=./mistral-alpaca-lora` → easy to organize multiple experiments
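The effective batch size follows directly from two of these settings, since gradients from each micro-batch are accumulated before a single optimizer step:

```python
# Gradients from 4 micro-batches of size 1 are summed before each optimizer
# step, so memory stays at batch-size-1 levels while updates behave like
# batch size 4.
per_device_batch_size = 1
gradient_accumulation_steps = 4

effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # → 4
```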
We use Alpaca-style formatting for supervised fine-tuning:
```python
def format_text(example):
    # Build an Alpaca-style prompt; "input" is optional context
    if example["input"]:
        full_prompt = f"Instruction: {example['instruction']}\nInput: {example['input']}\n\nResponse:"
    else:
        full_prompt = f"Instruction: {example['instruction']}\n\nResponse:"
    # Tokenize prompt + expected output together (max_length matches the
    # 512-token setting above; exact kwargs may vary by setup)
    tokenized = tokenizer(
        full_prompt + " " + example["output"],
        truncation=True,
        max_length=512,
    )
    return tokenized
```
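For a quick sanity check of the template without loading a tokenizer, the prompt-building branch can be exercised on its own (`build_prompt` is a hypothetical helper mirroring the formatting above):

```python
def build_prompt(instruction, input_text=""):
    # Same Alpaca-style template as format_text, minus tokenization
    if input_text:
        return f"Instruction: {instruction}\nInput: {input_text}\n\nResponse:"
    return f"Instruction: {instruction}\n\nResponse:"

print(build_prompt("Summarize the text.", "LoRA trains small adapter matrices."))
# Instruction: Summarize the text.
# Input: LoRA trains small adapter matrices.
#
# Response:
```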