# **mistral-7b-qlora-alpaca**
This project presents a practical demonstration of instruction‑tuning the Mistral‑7B large language model using QLoRA, a parameter‑efficient fine‑tuning method. By employing 4‑bit quantization, the model can be trained on hardware with limited memory capacity, such as the free Google Colab T4 GPU.

The primary objectives are as follows:

1. Showcase a complete, reproducible workflow for adapting a large‑scale model in a resource‑constrained environment.

2. Use a small, randomly sampled subset of the Alpaca instruction‑following dataset to illustrate the fine‑tuning process from start to finish.

3. Highlight practical skills in data preparation, model optimisation, and deployment‑ready output generation.

**The notebook demonstrates:**

* The use of quantisation techniques to load and operate 7‑billion‑parameter models efficiently.

* Application of QLoRA via the PEFT framework to update only targeted layers, thereby reducing computational cost.

* Integration of Hugging Face’s Trainer API for supervised fine‑tuning with intermediate evaluation and logging.

* Best practices for saving LoRA adapters separately, ensuring that the fine‑tuned model remains lightweight and portable for future inference.
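
To make the memory argument concrete, the following back-of-the-envelope sketch estimates weight storage at different precisions. It assumes roughly 7.24 billion parameters for Mistral-7B and counts weights only; activations, optimizer state, quantization constants, and any layers kept in higher precision are ignored, so real usage will be higher. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(n_params: int, bits_per_param: float) -> float:
    """Approximate memory needed to store n_params weights, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

# Assumed parameter count for Mistral-7B (approximate).
N_PARAMS = 7_241_732_096

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_gib(N_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the 16-bit weights alone come to about 13.5 GiB, which leaves almost no headroom on a 15 GiB T4, whereas the 4-bit weights need roughly 3.4 GiB.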

Check Available GPU Hardware

In [2]:
!nvidia-smi || echo "No NVIDIA GPU detected; this may run on CPU and be very slow."

Thu Aug 14 15:49:46 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   47C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
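
The T4 reported above is a Turing-generation GPU with CUDA compute capability 7.5, which lacks native bfloat16 support; bf16 generally requires capability 8.0 (Ampere) or newer. The sketch below restates the selection rule this notebook applies later, written as a plain function so it can be checked without a GPU. The helper name `pick_compute_dtype` and its string return values are illustrative assumptions.

```python
def pick_compute_dtype(cuda_available: bool, capability_major: int) -> str:
    """Return the name of the half-precision dtype to prefer.

    Mirrors the rule used later in this notebook: bfloat16 on GPUs with
    compute capability >= 8 (Ampere or newer), float16 otherwise.
    """
    if cuda_available and capability_major >= 8:
        return "bfloat16"
    return "float16"

print(pick_compute_dtype(True, 7))   # Tesla T4 (capability 7.5)
print(pick_compute_dtype(True, 8))   # e.g. an A100 (capability 8.0)
```

In the notebook itself, `torch.cuda.get_device_capability()[0]` supplies the major version.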
                                                

Manage Dependencies and Install Specific Package Versions

In [1]:
!pip uninstall -y torch torchvision torchaudio triton bitsandbytes transformers accelerate peft numpy
!pip install numpy==1.26.4
!pip install --index-url https://download.pytorch.org/whl/cu121 torch==2.3.1 torchvision torchaudio
!pip install triton==2.3.1 bitsandbytes==0.43.1
!pip install transformers==4.43.3 peft==0.11.1 accelerate==0.32.1 datasets

Found existing installation: numpy 1.26.4
Uninstalling numpy-1.26.4:
  Successfully uninstalled numpy-1.26.4
Collecting numpy==1.26.4
  Using cached numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Using cached numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Installing collected packages: numpy
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sentence-transformers 5.1.0 requires torch>=1.11.0, which is not installed.
sentence-transformers 5.1.0 requires transformers<5.0.0,>=4.41.0, which is not installed.
fastai 2.7.19 requires torch<2.7,>=1.10, which is not installed.
fastai 2.7.19 requires torchvision>=0.11, which is not installed.
opencv-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
thinc 8.3.6 requires nump

Looking in indexes: https://download.pytorch.org/whl/cu121
Collecting torch==2.3.1
  Downloading https://download.pytorch.org/whl/cu121/torch-2.3.1%2Bcu121-cp311-cp311-linux_x86_64.whl (781.0 MB)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.20.1%2Bcu121-cp311-cp311-linux_x86_64.whl (7.3 MB)
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.5.1%2Bcu121-cp311-cp311-linux_x86_64.whl (3.4 MB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.3.1)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cuda_nvrtc_cu12-12.1.105-py3-n

Manage Imports and Verify PyTorch and CUDA Installation

In [3]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
print("Torch:", torch.__version__, "CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

Torch: 2.3.1+cu121 CUDA: 12.1
CUDA available: True
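
Beyond the Torch and CUDA check, it can help to confirm that the pinned packages are the ones actually installed, since a later `pip install` may change them. The sketch below uses only the standard library; the `PINNED` mapping copies the pins from the install cell, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Pins taken from the install cell above.
PINNED = {
    "torch": "2.3.1",
    "transformers": "4.43.3",
    "peft": "0.11.1",
    "accelerate": "0.32.1",
    "bitsandbytes": "0.43.1",
}

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

for name, version in installed_versions(PINNED).items():
    # Local build tags such as "+cu121" are stripped before comparing.
    ok = version is not None and version.split("+")[0] == PINNED[name]
    print(f"{name:<14} installed={version} pinned={PINNED[name]} match={ok}")
```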


Load the Alpaca Instruction-Following Dataset

In [17]:
from datasets import load_dataset
dataset = load_dataset("tatsu-lab/alpaca")

Install dependencies (Transformers, Datasets, PEFT, Accelerate, BitsAndBytes)

In [5]:
!pip -q install --upgrade pip
!pip -q install "transformers>=4.41.0" "datasets>=2.18.0" "peft>=0.11.0" "accelerate>=0.30.0" bitsandbytes==0.43.1


Imports & Config

In [13]:
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig,
    TrainingArguments, Trainer, DataCollatorForLanguageModeling
)
from peft import LoraConfig, get_peft_model, PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

# Model & training parameters/knobs
model_id = "mistralai/Mistral-7B-v0.1"  # could be switched to newer Mistral if available
max_length = 512
train_samples = 200
eval_samples = 60
max_steps = 60
grad_accum = 4
per_device_bs = 1
lr = 2e-4

Using device: cuda


Load Alpaca dataset and sample a small subset

In [14]:
dataset = load_dataset("tatsu-lab/alpaca")

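# Shuffle and select small subsets for train/eval to fit free GPU constraints.
# Note: both subsets are drawn from the same "train" split with different seeds,
# so an eval example can occasionally also appear in the training subset.
train_dataset = dataset["train"].shuffle(seed=42).select(range(train_samples))
eval_dataset  = dataset["train"].shuffle(seed=123).select(range(eval_samples))

# Peek at the first training example
train_dataset[0]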

{'instruction': 'What would be the best type of exercise for a person who has arthritis?',
 'input': '',
 'output': 'For someone with arthritis, the best type of exercise would be low-impact activities like yoga, swimming, or walking. These exercises provide the benefits of exercise without exacerbating the symptoms of arthritis.',
 'text': 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat would be the best type of exercise for a person who has arthritis?\n\n### Response:\nFor someone with arthritis, the best type of exercise would be low-impact activities like yoga, swimming, or walking. These exercises provide the benefits of exercise without exacerbating the symptoms of arthritis.'}
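
Because the train and eval subsets above are sampled independently from the same split, nothing guarantees that they are disjoint. One possible alternative, sketched here in plain Python, is to shuffle the indices once and slice non-overlapping ranges. The helper name `disjoint_split_indices` is hypothetical, the dataset size of about 52,000 rows is an assumption, and the resulting order will differ from what `Dataset.shuffle` produces.

```python
import random

def disjoint_split_indices(n_total, n_train, n_eval, seed=42):
    """Shuffle indices once, then slice non-overlapping train/eval ranges."""
    if n_train + n_eval > n_total:
        raise ValueError("Requested more examples than the dataset contains.")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:n_train + n_eval]

# Assumption: the Alpaca train split has about 52,000 rows.
train_idx, eval_idx = disjoint_split_indices(52_000, n_train=200, n_eval=60)
print(len(train_idx), len(eval_idx), set(train_idx).isdisjoint(eval_idx))
```

The index lists could then be passed to `dataset["train"].select(train_idx)` and `dataset["train"].select(eval_idx)`.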

Authenticate with Hugging Face

In [11]:
!hf auth login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `hf auth whoami` to get more information or `hf auth logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
The tok

Build prompt template + tokenize

In [15]:
# Alpaca has fields: instruction, input, output
def build_prompt(example):
    instr = example.get("instruction", "").strip()
    inp   = example.get("input", "").strip()
    out   = example.get("output", "").strip()

    if inp:
        user = f"### Instruction:\n{instr}\n\n### Input:\n{inp}"
    else:
        user = f"### Instruction:\n{instr}"
    assistant = f"\n\n### Response:\n{out}"
    return user + assistant

from functools import partial

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# Mistral-like models: use EOS as pad
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

def tokenize(example, tokenizer, max_length=512):
    text = build_prompt(example)
    toks = tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        padding="max_length"
    )
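    # Causal LM training uses the input ids as labels. Note that
    # DataCollatorForLanguageModeling(mlm=False) rebuilds the labels from
    # input_ids at batch time and sets pad positions to -100.
    toks["labels"] = toks["input_ids"].copy()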
    return toks

tokenized_train = train_dataset.map(partial(tokenize, tokenizer=tokenizer, max_length=max_length), batched=False)
tokenized_eval  = eval_dataset.map(partial(tokenize, tokenizer=tokenizer, max_length=max_length), batched=False)

len(tokenized_train), len(tokenized_eval)


(200, 60)
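
Since every example is padded to `max_length`, it matters which positions contribute to the loss. The sketch below illustrates, in plain Python, the masking that a causal-LM collator applies to padding: pad positions get the label -100, which the loss ignores. The token ids are made up for illustration, and `mask_padding` is a hypothetical helper, not a library function.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_padding(input_ids, pad_token_id):
    """Copy input_ids into labels, replacing pad positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

# Toy sequence: made-up token ids, with 2 standing in for EOS (reused as pad).
PAD_ID = 2
input_ids = [1, 835, 2799, 4080, 29901, 2, 2, 2]
labels = mask_padding(input_ids, PAD_ID)
print(labels)
```

One consequence of reusing EOS as the pad token is that a genuine EOS in the sequence would be masked as well, so the model would receive no loss signal for where to stop.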

Load Mistral-7B in 4-bit & apply QLoRA

In [None]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Typical LoRA on LLaMA/Mistral-style attention (q_proj, v_proj)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj","v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
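
For reference, the standard LoRA formulation (stated from the method in general, not derived from this notebook's code) adds a scaled low-rank update to the output of each targeted projection while the original weight stays frozen. The symbols below are notation introduced for this note.

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad
B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}}
```

With `r=16` and `lora_alpha=32` as configured above, the scaling factor is 32 / 16 = 2. Only A and B are trained; the base weight W_0 remains in its 4-bit quantized form.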

In [18]:
model.print_trainable_parameters()

trainable params: 6,815,744 || all params: 7,248,547,840 || trainable%: 0.0940
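
The trainable-parameter count can be reproduced by hand. The sketch below assumes the published Mistral-7B dimensions (hidden size 4096, 32 layers, 8 key/value heads of size 128), which are not stated in this notebook; `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed Mistral-7B dimensions: hidden size 4096, 32 layers,
# 8 key/value heads of size 128 (so v_proj maps 4096 -> 1024).
HIDDEN, N_LAYERS, KV_DIM, RANK = 4096, 32, 8 * 128, 16

per_layer = (
    lora_param_count(HIDDEN, HIDDEN, RANK)    # q_proj: 4096 -> 4096
    + lora_param_count(HIDDEN, KV_DIM, RANK)  # v_proj: 4096 -> 1024
)
total = per_layer * N_LAYERS
print(f"{total:,}")
print(f"{100 * total / 7_248_547_840:.4f}%")
```

Under these assumptions the total matches the 6,815,744 trainable parameters reported above.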


Data collator for causal LM

In [20]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

Train (suitable for free Colab)

In [21]:
output_dir = "./mistral-7b-qlora-alpaca"
logging_steps = 5

training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_bs,
    per_device_eval_batch_size=per_device_bs,
    gradient_accumulation_steps=grad_accum,
    learning_rate=lr,
    max_steps=max_steps,                 # kept low for the demo; increase on a larger GPU for better results
    warmup_steps=max(2, max_steps//10),
    lr_scheduler_type="cosine",
    fp16=(torch.cuda.is_available() and torch.cuda.get_device_capability()[0] < 8),
    bf16=(torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8),
    logging_steps=logging_steps,
    eval_strategy="steps",               # named `evaluation_strategy` in transformers < 4.41
    eval_steps=logging_steps*2,
    save_strategy="no",
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

train_result = trainer.train()
train_result.metrics

max_steps is given, it will override any value given in num_train_epochs
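
Because `max_steps` fixes the total number of optimizer steps, it also determines the shape of the learning-rate schedule. The sketch below approximates a linear warmup followed by cosine decay using the values from this notebook (`lr = 2e-4`, 6 warmup steps from `max(2, 60 // 10)`, 60 total steps). It is a re-derivation of the general formula, not a copy of library code, and `lr_at_step` is a hypothetical name.

```python
import math

def lr_at_step(step, base_lr=2e-4, warmup_steps=6, total_steps=60):
    """Linear warmup followed by a half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for step in (0, 3, 6, 33, 60):
    print(step, f"{lr_at_step(step):.2e}")
```

The rate climbs to its peak at step 6, passes through half the peak midway through the decay, and reaches zero at the final step.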



We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 10 | 1.5613 | 1.372588 |
| 20 | 1.1837 | 1.253242 |
| 30 | 1.2838 | 1.164317 |
| 40 | 1.149 | 1.15495 |
| 50 | 1.1033 | 1.14816 |
| 60 | 1.0572 | 1.146305 |
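
Validation loss is easier to interpret as perplexity. Assuming the reported loss is the mean per-token cross-entropy in nats, perplexity is its exponential; the short sketch below converts the first and last validation losses from the table above.

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

# Validation losses from the table above (steps 10 and 60).
for step, loss in [(10, 1.372588), (60, 1.146305)]:
    print(step, round(perplexity(loss), 2))
```

Under that assumption, validation perplexity falls from roughly 3.9 to roughly 3.1 over the 60 steps.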


{'train_runtime': 451.5793,
 'train_samples_per_second': 0.531,
 'train_steps_per_second': 0.133,
 'total_flos': 5247572587315200.0,
 'train_loss': 1.277275816599528,
 'epoch': 1.2}
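
The reported metrics can be cross-checked against the configuration. The sketch below recomputes the epoch count and throughput from the step count, batch settings, and runtime; a single GPU is assumed, and `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_device_bs, grad_accum, n_devices=1):
    """Samples contributing to each optimizer step."""
    return per_device_bs * grad_accum * n_devices

# Values from the config cell and the metrics above; one GPU is assumed.
steps, train_size, runtime_s = 60, 200, 451.5793
batch = effective_batch_size(per_device_bs=1, grad_accum=4)
samples_seen = steps * batch

print("effective batch size:", batch)
print("epochs:", samples_seen / train_size)
print("samples/s:", round(samples_seen / runtime_s, 3))
print("steps/s:", round(steps / runtime_s, 3))
```

The recomputed values agree with the logged `epoch`, `train_samples_per_second`, and `train_steps_per_second`.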

Quick generation test

In [22]:
model.eval()
prompt = "### Instruction:\nExplain the concept of overfitting in machine learning.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


### Instruction:
Explain the concept of overfitting in machine learning.

### Response:
Overfitting is a common problem in machine learning, where a model becomes too complex for the data it is trained on, resulting in poor generalization. This can happen when a model is trained on too much data or when it is trained on data that is not representative of the actual data it will be used on. In machine learning, overfitting is often identified by a model's ability to accurately predict the training data, but not the test data. To prevent overfitting, models should be trained on a limited amount of data, and their complexity should be kept in check.
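
At inference time the prompt should follow the same template used during training, with the response left empty for the model to complete. The helper below is a hypothetical convenience function that mirrors the training prompt format from this notebook; it is a sketch, not part of the original code.

```python
def build_inference_prompt(instruction: str, input_text: str = "") -> str:
    """Format a request like the training prompts, leaving the response empty."""
    instruction = instruction.strip()
    input_text = input_text.strip()
    if input_text:
        user = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}"
    else:
        user = f"### Instruction:\n{instruction}"
    return user + "\n\n### Response:\n"

print(build_inference_prompt("Explain the concept of overfitting in machine learning."))
```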


Save LoRA adapters (recommended for QLoRA)

In [23]:
adapters_dir = "./mistral-7b-qlora-alpaca-adapters"
model.save_pretrained(adapters_dir)
tokenizer.save_pretrained(adapters_dir)
print("Saved LoRA adapters to:", adapters_dir)

Saved LoRA adapters to: ./mistral-7b-qlora-alpaca-adapters
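
To confirm that the saved adapters really are lightweight, the directory contents and sizes can be listed with the standard library. This is a minimal sketch; `dir_size_mib` is a hypothetical helper, and the file names it prints depend on the installed PEFT and tokenizer versions.

```python
from pathlib import Path

def dir_size_mib(path) -> float:
    """Total size of all regular files under path, in MiB."""
    root = Path(path)
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file()) / 2**20

adapters_dir = Path("./mistral-7b-qlora-alpaca-adapters")
if adapters_dir.is_dir():
    for f in sorted(adapters_dir.rglob("*")):
        if f.is_file():
            print(f"{f.relative_to(adapters_dir)}: {f.stat().st_size / 2**20:.2f} MiB")
    print(f"total: {dir_size_mib(adapters_dir):.2f} MiB")
else:
    print("Adapter directory not found; run the save cell first.")
```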


In [24]:
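import shutil

# Archive only the saved adapter directory; zipping all of /content would also
# pull in unrelated files and the archive that is being written.
zip_path = shutil.make_archive("/content/mistral-7b-qlora-alpaca", "zip", root_dir=adapters_dir)
print(f"Zipped file saved at: {zip_path}")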


Zipped file saved at: /content/mistral-7b-qlora-alpaca.zip
