# Supervised Fine-Tuning with LLaMA-Factory (LoRA/QLoRA)

This notebook shows how to run supervised fine-tuning (SFT) with LLaMA-Factory, using LoRA or QLoRA adapters on a small model such as Gemma 2B.
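LoRA keeps a pretrained weight matrix `W` frozen and learns a low-rank correction `B @ A` of rank `r`, whose output is scaled by `alpha / r`; QLoRA applies the same idea while storing the frozen `W` in 4-bit precision. The pure-Python sketch below illustrates only the forward pass on toy matrices. The function names and values are illustrative and are not PEFT's or LLaMA-Factory's internals.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha, r):
    """Return W x + (alpha / r) * B (A x).

    w: frozen d_out x d_in weight
    a: trainable r x d_in down-projection
    b: trainable d_out x r up-projection
    """
    base = matvec(w, x)               # frozen path
    update = matvec(b, matvec(a, x))  # low-rank path: d_in -> r -> d_out
    scale = alpha / r
    return [y + scale * u for y, u in zip(base, update)]


# Toy example: d_in = d_out = 3, rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, 0.5, 0.5]]           # 1 x 3
B_init = [[0.0], [0.0], [0.0]]  # LoRA initialises B to zeros
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, alpha=16, r=1))  # → [1.0, 2.0, 3.0], same as W x

B = [[0.1], [0.0], [-0.1]]      # B after some (made-up) training
print(lora_forward(W, A, B, x, alpha=16, r=1))
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one, and training only has to learn the update.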

## Setup

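First, clone the LLaMA-Factory repository and install its dependencies. The training and inference cells in this notebook call `src/train_bash.py` and `src/cli_demo.py`, which come from older LLaMA-Factory releases; recent releases provide the same functionality through `llamafactory-cli train` and `llamafactory-cli chat`, so with a fresh clone you may need to adapt those commands.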

```python
!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
!pip install -r requirements.txt
```

```text
Cloning into 'LLaMA-Factory'...
remote: Enumerating objects: 17708, done.
remote: Counting objects: 100% (8100/8100), done.
remote: Compressing objects: 100% (695/695), done.
remote: Total 17708 (delta 7612), reused 7504 (delta 7405), pack-reused 9608 (from 1)
Receiving objects: 100% (17708/17708), 226.94 MiB | 15.06 MiB/s, done.
Resolving deltas: 100% (13042/13042), done.
/content/LLaMA-Factory
Collecting datasets<=2.21.0,>=2.16.0 (from -r requirements.txt (line 2))
  Downloading datasets-2.21.0-py3-none-any.whl.metadata (21 kB)
Collecting peft<=0.12.0,>=0.11.1 (from -r requirements.txt (line 4))
  Downloading peft-0.12.0-py3-none-any.whl.metadata (13 kB)
Collecting trl<=0.9.6,>=0.8.6 (from -r requirements.txt (line 5))
  Downloading trl-0.9.6-py3-none-any.whl.metadata (12 kB)
Collecting gradio>=4.0.0 (from -r requirements.txt (line 6))
  Downloading gradio-4.44.1-py3-none-any.whl.metadata (15 kB)
Collecting tiktoken (from -r requirements.txt (line 11))
  Downloading tikto
```
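Pip's output is long and easily truncated, so it can be worth confirming which versions actually ended up installed. This sketch uses only the standard library's `importlib.metadata`; the package list is taken from the pins visible in the install log (`datasets`, `peft`, `trl`, `gradio`), and the helper name `installed_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Packages pinned in requirements.txt according to the install log.
for pkg, ver in installed_versions(["datasets", "peft", "trl", "gradio"]).items():
    print(f"{pkg:10s} {ver or 'NOT INSTALLED'}")
```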

## Hugging Face Authentication

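Log in with a Hugging Face access token so the notebook can download the model and datasets. Gemma is a gated model, so the same account must also have accepted the Gemma license on the model's Hub page.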

```python
from huggingface_hub import login
login()
```

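The `login()` widget needs an interactive session. For scripted runs, a minimal alternative reads the token from an environment variable and passes it to `huggingface_hub.login(token=...)`. The variable name `HF_TOKEN` is an assumption here, and `login_from_env` is a hypothetical helper.

```python
import os


def login_from_env(var_name="HF_TOKEN"):
    """Log in to the Hugging Face Hub with a token read from an environment variable.

    Returns True if a token was found and passed to huggingface_hub.login,
    or False if the variable is unset or empty (no login is attempted).
    """
    token = os.environ.get(var_name)
    if not token:
        return False
    from huggingface_hub import login  # imported lazily; only needed when a token exists
    login(token=token)
    return True
```

Call `login_from_env()` in place of `login()` after exporting the token, for example with `export HF_TOKEN=...` in the shell that launches the notebook.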

## Prepare the Dataset

For this example, we'll fine-tune on a 1,000-example subset of the Alpaca instruction-following dataset.
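If you would rather build the subset in plain Python than with the preprocessing script, a sketch follows. It assumes the source data is already loaded as a list of Alpaca-style records (dicts with `instruction`, `input`, and `output` keys); the function names and the seed value are illustrative.

```python
import json
import random


def sample_alpaca_subset(records, n, seed=42):
    """Return n records drawn without replacement, reproducibly for a given seed."""
    required = {"instruction", "input", "output"}
    for i, rec in enumerate(records):
        missing = required - set(rec)
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    if n >= len(records):
        return list(records)
    return random.Random(seed).sample(records, n)


def write_subset(records, path):
    """Write records as a single JSON array, the layout of Alpaca-style files."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


# Toy usage with made-up records.
toy = [{"instruction": f"Repeat {i}", "input": "", "output": str(i)} for i in range(10)]
subset = sample_alpaca_subset(toy, 3)
print(len(subset))  # → 3
```

Seeding a private `random.Random` instance keeps the subset reproducible without touching the global random state.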

```python
!python scripts/data_preprocess.py \
    --dataset alpaca \
    --subset_num 1000 \
    --output_path ./data/alpaca_subset.json
```
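LLaMA-Factory resolves the value of `--dataset` through `dataset_info.json` in the dataset directory, so a file written to `./data/alpaca_subset.json` is only usable once it has an entry there. This sketch adds one. The dataset name `alpaca_subset` and the helper `register_dataset` are assumptions, and the minimal entry relies on the file being in Alpaca format so that the default column mapping applies. After registering it, you would pass `--dataset alpaca_subset` to train on that file.

```python
import json
import os


def register_dataset(dataset_dir, name, file_name):
    """Add (or overwrite) an entry for a local file in dataset_dir/dataset_info.json."""
    info_path = os.path.join(dataset_dir, "dataset_info.json")
    info = {}
    if os.path.exists(info_path):
        with open(info_path, encoding="utf-8") as f:
            info = json.load(f)
    # For Alpaca-format JSON, the file name alone should suffice; the default
    # columns are assumed to map to the instruction / input / output keys.
    # file_name is resolved relative to dataset_dir.
    info[name] = {"file_name": file_name}
    with open(info_path, "w", encoding="utf-8") as f:
        json.dump(info, f, ensure_ascii=False, indent=2)
    return info[name]


# Example (run from the LLaMA-Factory directory):
# register_dataset("./data", "alpaca_subset", "alpaca_subset.json")
```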

## Fine-tuning

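Now, let's run supervised fine-tuning with LoRA on Gemma 2B. The command trains LoRA adapters on the attention query and value projections (`q_proj`, `v_proj`) of the unquantized base model; for QLoRA, the base model would additionally be loaded in 4-bit precision (LLaMA-Factory's `--quantization_bit 4` option).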
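Because the command adapts only the query and value projections (`--lora_target q_proj,v_proj`), the number of trainable weights is small. The arithmetic below counts them. It assumes Gemma 2B's shapes (hidden size 2048, 18 decoder layers, 8 query heads and a single key/value head, each of dimension 256) and a LoRA rank of 8, since the command does not set a rank explicitly; check both against the model's `config.json` and your LoRA settings.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)


# Assumed Gemma 2B shapes and LoRA rank (verify against config.json).
HIDDEN, LAYERS, HEAD_DIM, N_HEADS, N_KV_HEADS = 2048, 18, 256, 8, 1
RANK = 8

q_proj = lora_params(HIDDEN, N_HEADS * HEAD_DIM, RANK)     # 2048 -> 2048
v_proj = lora_params(HIDDEN, N_KV_HEADS * HEAD_DIM, RANK)  # 2048 -> 256 (single KV head)
total = LAYERS * (q_proj + v_proj)
print(f"{total:,} trainable LoRA parameters")  # → 921,600 trainable LoRA parameters
```

Under these assumptions the adapter is under a million parameters, a tiny fraction of the billions of frozen weights in the base model.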

```python
# --max_samples 1000 caps training at 1,000 examples, matching the subset size used above.
!python src/train_bash.py \
    --stage sft \
    --model_name_or_path google/gemma-2b \
    --do_train \
    --dataset alpaca \
    --max_samples 1000 \
    --dataset_dir ./data \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ./output \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16
```
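With `--per_device_train_batch_size 4` and `--gradient_accumulation_steps 4`, each optimizer step consumes 16 examples per GPU. The rough sketch below, assuming 1,000 training examples and a single GPU, also shows that three epochs come to fewer than 1,000 optimizer steps, so `--save_steps 1000` never triggers an intermediate checkpoint and only the adapter saved at the end of training is written. The Trainer's exact rounding can differ from this by a few steps; `training_schedule` is a hypothetical helper.

```python
import math


def training_schedule(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Approximate (effective batch, steps per epoch, total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * epochs)
    return effective_batch, steps_per_epoch, total_steps


eff, per_epoch, total = training_schedule(1000, per_device_batch=4, grad_accum=4, epochs=3.0)
print(eff, per_epoch, total)                      # → 16 63 189
print("intermediate checkpoints:", total // 1000)  # → intermediate checkpoints: 0
```

To get checkpoints during a run this short, `--save_steps` would need to be well below the total step count.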

## Inference

After fine-tuning, test the model interactively: the demo loads the base model, applies the LoRA adapter saved in `./output`, and lets you type example prompts.

```python
!python src/cli_demo.py \
    --model_name_or_path google/gemma-2b \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir ./output
```
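The CLI demo is interactive. To query the fine-tuned model from Python instead, a minimal sketch with Hugging Face `transformers` and `peft` follows. It assumes the adapter in `./output` was saved in PEFT's standard format and that `transformers`, `peft`, and `accelerate` are installed; `load_finetuned` and `generate` are hypothetical helpers. The prompt is passed as raw text, whereas training formatted examples with the `default` template, so outputs may differ from the CLI demo unless you apply the same formatting.

```python
def load_finetuned(base_model="google/gemma-2b", adapter_dir="./output", merge=False):
    """Load the base model, attach the LoRA adapter, and optionally merge it in."""
    # Heavy imports are local so defining this cell does not load torch.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    # device_map="auto" places the weights on available GPUs (requires accelerate).
    model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
    model = PeftModel.from_pretrained(model, adapter_dir)
    if merge:
        # Fold B @ A into the base weights and return a plain transformers model.
        model = model.merge_and_unload()
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, prompt, max_new_tokens=64):
    """Generate a continuation for prompt and return only the new text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example:
# model, tokenizer = load_finetuned()
# print(generate(model, tokenizer, "Give three tips for staying healthy."))
```

Merging is optional: keeping the adapter separate lets you swap adapters on one base model, while merging removes the small extra cost of the LoRA path at inference time.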