# LLaMA-Factory Demonstration

This notebook demonstrates the use of LLaMA-Factory for three training methods:
1. Supervised Fine-Tuning (LoRA/QLoRA)
2. DPO (Direct Preference Optimization) Training
3. PPO (Proximal Policy Optimization) Training

First, we'll set up our environment and install the necessary dependencies.

In [9]:
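# Start from a fresh clone so reruns do not reuse a stale checkout.
%cd /content/
%rm -rf LLaMA-Factory
!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
%ls
# Editable install with the torch and bitsandbytes extras.
%pip install -e .[torch,bitsandbytes]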

/content
Cloning into 'LLaMA-Factory'...
remote: Enumerating objects: 17255, done.
remote: Counting objects: 100% (7900/7900), done.
remote: Compressing objects: 100% (566/566), done.
remote: Total 17255 (delta 7509), reused 7335 (delta 7334), pack-reused 9355 (from 1)
Receiving objects: 100% (17255/17255), 226.11 MiB | 16.08 MiB/s, done.
Resolving deltas: 100% (12710/12710), done.
/content/LLaMA-Factory
[0m[01;34massets[0m/       [01;34mdocker[0m/      LICENSE      pyproject.toml  requirements.txt  [01;34msrc[0m/
CITATION.cff  [01;34mevaluation[0m/  Makefile     README.md       [01;34mscripts[0m/          [01;34mtests[0m/
[01;34mdata[0m/         [01;34mexamples[0m/    MANIFEST.in  README_zh.md    setup.py
Obtaining file:///content/LLaMA-Factory
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metada

In [16]:
import torch

# The training cells below need a CUDA GPU; warn early if the runtime has none.
if not torch.cuda.is_available():
    print("Please set up a GPU before using LLaMA Factory: https://medium.com/mlearning-ai/training-yolov4-on-google-colab-316f8fff99c6")

Please set up a GPU before using LLaMA Factory: https://medium.com/mlearning-ai/training-yolov4-on-google-colab-316f8fff99c6


## Setting up Hugging Face Token

The `meta-llama/Llama-2-7b-hf` checkpoint used below is a gated repository on Hugging Face, so we need to log in with an access token before downloading it.

In [10]:
from google.colab import userdata
from huggingface_hub import login

# Read the token from Colab's secret store instead of hard-coding it in the notebook.
hf_token = userdata.get("HUGGINGFACE_TOKEN")
login(token=hf_token)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful
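
Outside Colab, `google.colab` is not importable, so the cell above fails at the import. A minimal sketch of a fallback follows, assuming the token is exported in an `HF_TOKEN` environment variable; the helper name `resolve_hf_token` is hypothetical and not part of any library.

```python
import os


def resolve_hf_token(secret_name="HUGGINGFACE_TOKEN", env_var="HF_TOKEN"):
    """Return a token from Colab secrets when available, else from the environment."""
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata

        token = userdata.get(secret_name)
        if token:
            return token
    except Exception:
        # Not in Colab, or the secret is missing or not shared with this notebook.
        pass
    # Assumption: the token was exported under this variable name.
    return os.environ.get(env_var)
```

The returned value can be passed to `login(token=...)` exactly as in the cell above; a `None` result means no token was found in either place.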


In [None]:
%cd /content/LLaMA-Factory/
!GRADIO_SHARE=1 llamafactory-cli webui

/content/LLaMA-Factory
2024-09-18 06:44:17.049855: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-18 06:44:17.075784: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-18 06:44:17.083484: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-18 06:44:17.103989: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Running on local URL:  http://

## 1. Supervised Fine-Tuning (LoRA)

We'll demonstrate supervised fine-tuning with LoRA adapters on the `alpaca_gpt4_en` instruction dataset, adapting only the `q_proj` and `v_proj` attention projections.

In [17]:
# Run LoRA supervised fine-tuning through the llamafactory-cli entry point installed above.
!llamafactory-cli train \
    --stage sft \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --do_train \
    --dataset alpaca_gpt4_en \
    --template alpaca \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ./output/lora_sft \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16
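
To get a feel for how little of the model LoRA trains when only `q_proj` and `v_proj` are targeted, the adapter size can be estimated by hand. The sketch below assumes the Llama-2-7B geometry (32 decoder layers, 4096x4096 query and value projections) and a LoRA rank of 8, since no `--lora_rank` is passed in the cell; the function name and constants are illustrative, not part of LLaMA-Factory.

```python
def lora_trainable_params(num_layers, target_shapes, rank):
    """Count the parameters added by LoRA adapters.

    Each adapted weight of shape (d_out, d_in) gains two low-rank factors:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    per_layer = sum(rank * (d_in + d_out) for d_out, d_in in target_shapes)
    return num_layers * per_layer


# Assumed Llama-2-7B geometry; adjust for other checkpoints.
NUM_LAYERS = 32
TARGET_SHAPES = [(4096, 4096), (4096, 4096)]  # q_proj, v_proj
ASSUMED_RANK = 8  # assumption: the cell above does not set --lora_rank

trainable = lora_trainable_params(NUM_LAYERS, TARGET_SHAPES, ASSUMED_RANK)
print(f"{trainable:,} trainable LoRA parameters")  # 4,194,304
```

Under these assumptions the adapters hold about 4.2 million parameters, a small fraction of the roughly 7 billion in the frozen base model.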


## 2. DPO (Direct Preference Optimization) Training

Now we'll demonstrate DPO training on `hh_rlhf_en`, a preference dataset that pairs a chosen and a rejected response for each prompt.

In [14]:
# Run the DPO stage with LoRA adapters through the llamafactory-cli entry point.
!llamafactory-cli train \
    --stage dpo \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --do_train \
    --dataset hh_rlhf_en \
    --template human_bot \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ./output/lora_dpo \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
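
For intuition about what the DPO stage optimizes, the per-pair loss can be written out in a few lines. This is a sketch of the standard DPO objective on summed sequence log-probabilities, not LLaMA-Factory's own implementation; `beta=0.1` is an assumed value and the log-probabilities in the usage lines are made up for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    # Implicit rewards: how far the policy has moved from the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow in exp.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the margin is zero and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy now favors the chosen response more than the reference does.
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the frozen reference model.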


## 3. PPO (Proximal Policy Optimization) Training

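Lastly, we'll demonstrate PPO training, which optimizes the policy against scores from a reward model. The `--reward_model` value in the cell is a placeholder path; replace it with a trained reward model before running.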

In [15]:
# Run the PPO stage with LoRA adapters through the llamafactory-cli entry point.
!llamafactory-cli train \
    --stage ppo \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --do_train \
    --dataset hh_rlhf_en \
    --template human_bot \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --reward_model /path/to/your/reward/model \
    --output_dir ./output/lora_ppo \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
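
The core of PPO is its clipped surrogate objective, which limits how far a single update can move the policy. The sketch below shows the textbook per-sample form, assuming a clip range of 0.2; it is not the trainer's actual code, and RLHF implementations typically add a KL penalty against the reference model that is not shown here.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    # Probability ratio between the updated policy and the policy that sampled the data.
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum makes the objective a pessimistic bound on the improvement.
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: the gain is capped once the ratio exceeds 1 + eps.
capped = ppo_clipped_objective(new_logp=-1.0, old_logp=-1.5, advantage=1.0)
# Negative advantage: the unclipped term is smaller, so the penalty is kept in full.
penalized = ppo_clipped_objective(new_logp=-1.0, old_logp=-1.5, advantage=-1.0)
print(capped, penalized)
```

The objective is maximized during training; the asymmetry between the two cases is what discourages large policy shifts in either direction.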


## Conclusion

This notebook has demonstrated the use of LLaMA-Factory for three training methods: Supervised Fine-Tuning (LoRA), DPO Training, and PPO Training. Each method has its own use case and can be further customized based on specific requirements.
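
Because the three training cells share most of their flags, one way to customize them is to keep the common settings in one place and override only what differs per stage. The following is a minimal sketch with hypothetical helper names; the flag names and values are copied from the cells above (a few shared flags are omitted for brevity), and it only builds a `llamafactory-cli train` command string without running anything.

```python
import shlex

# Flags common to all three cells.
SHARED = {
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_target": "q_proj,v_proj",
    "gradient_accumulation_steps": 4,
    "lr_scheduler_type": "cosine",
    "fp16": True,
}

# Per-stage overrides; the reward model path is the placeholder from the PPO cell.
STAGES = {
    "sft": {"dataset": "alpaca_gpt4_en", "template": "alpaca",
            "per_device_train_batch_size": 4, "learning_rate": "5e-5",
            "output_dir": "./output/lora_sft"},
    "dpo": {"dataset": "hh_rlhf_en", "template": "human_bot",
            "per_device_train_batch_size": 2, "learning_rate": "1e-5",
            "output_dir": "./output/lora_dpo"},
    "ppo": {"dataset": "hh_rlhf_en", "template": "human_bot",
            "per_device_train_batch_size": 2, "learning_rate": "1e-5",
            "reward_model": "/path/to/your/reward/model",
            "output_dir": "./output/lora_ppo"},
}


def build_command(stage):
    """Render the merged settings for one stage as a shell command string."""
    args = {"stage": stage, **SHARED, **STAGES[stage]}
    parts = ["llamafactory-cli", "train"]
    for key, value in args.items():
        if value is True:
            parts.append(f"--{key}")  # boolean switch, no value
        else:
            parts += [f"--{key}", str(value)]
    return shlex.join(parts)


def effective_batch_size(stage, num_devices=1):
    """Samples per optimizer step: per-device batch x accumulation steps x devices."""
    args = {**SHARED, **STAGES[stage]}
    return (args["per_device_train_batch_size"]
            * args["gradient_accumulation_steps"] * num_devices)


for stage in STAGES:
    print(effective_batch_size(stage), build_command(stage))
```

On a single GPU this gives an effective batch size of 16 for SFT and 8 for DPO and PPO, which is worth keeping in mind when comparing the learning rates used in the three cells.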

