# 👷 MacgAIver Fine-Tuning with LLaMA Factory

Please use a **free** Tesla T4 Colab GPU to run this!

Project homepage: https://github.com/hiyouga/LLaMA-Factory

## Install dependencies

In [1]:
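# Start from a fresh clone so that repeated runs do not reuse a modified checkout.
%cd /content/
%rm -rf LLaMA-Factory
!git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
%ls
# Pin the PyTorch stack and data libraries to fixed versions.
!pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pyarrow==14.0.2 datasets==2.16.1
!pip uninstall -y jax
# Install LLaMA Factory in editable mode with the extras used in this notebook.
!pip install -e .[torch,bitsandbytes,liger-kernel]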

/content
Cloning into 'LLaMA-Factory'...
remote: Enumerating objects: 316, done.
remote: Counting objects: 100% (316/316), done.
remote: Compressing objects: 100% (245/245), done.
remote: Total 316 (delta 82), reused 214 (delta 58), pack-reused 0 (from 0)
Receiving objects: 100% (316/316), 8.92 MiB | 12.72 MiB/s, done.
Resolving deltas: 100% (82/82), done.
/content/LLaMA-Factory
assets/       docker/      LICENSE      pyproject.toml  requirements.txt  src/
CITATION.cff  evaluation/  Makefile     README.md       scripts/          tests/
data/         examples/    MANIFEST.in  README_zh.md    setup.py
Collecting torch==2.3.1
  Downloading torch-2.3.1-cp310-cp310-manylinux1_x86_64.whl.metadata (26 kB)
Collecting torchvision==0.18.1
  Downloading torchvision-0.18.1-cp310-cp310-manylinux1_x86_64.whl.metadata (6.6 kB)
Collecting torchaudio==2.3.1
  Downloading torchaudio-2.3.1-cp31

### Check the GPU

In [2]:
import torch

# LLaMA Factory needs a CUDA GPU; in Colab, select one under Runtime > Change runtime type.
if torch.cuda.is_available():
  print("GPU is available. Ready to go!")
else:
  print("Please set up a GPU before using LLaMA Factory")

GPU is available. Ready to go!


### Hugging Face login with a secret token



In [4]:
!pip install huggingface_hub
from huggingface_hub import login

hf_token = "TOKEN" # @param {type:"string"}
login(token=hf_token, add_to_git_credential=True)

Token is valid (permission: fineGrained).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful
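
Hard-coding the token in the cell means it is saved together with the notebook. A minimal sketch that reads it from an environment variable instead; the variable name `HF_TOKEN` as used here and the helper `read_hf_token` are assumptions for illustration and not part of the original notebook.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before logging in.")
    return token


# Usage in the login cell (requires huggingface_hub):
# from huggingface_hub import login
# login(token=read_hf_token(), add_to_git_credential=True)
```

This keeps the `login(...)` call from the cell above unchanged and only moves the secret out of the notebook source.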


### Update Identity Dataset

In [5]:
import json

%cd /content/LLaMA-Factory/

NAME = "Gors Ndod, der virtuelle Touristenführer für den wunderschönen Kontinent Notteik."
AUTHOR = "DH-Workshop"

with open("data/identity.json", "r", encoding="utf-8") as f:
  dataset = json.load(f)

# Fill the {{name}} and {{author}} placeholders in every answer.
for sample in dataset:
  sample["output"] = sample["output"].replace("{{name}}", NAME).replace("{{author}}", AUTHOR)

with open("data/identity.json", "w", encoding="utf-8") as f:
  json.dump(dataset, f, indent=2, ensure_ascii=False)

/content/LLaMA-Factory
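
Because the cell overwrites `data/identity.json` in place, the placeholders are gone after the first run, and changing `NAME` or `AUTHOR` afterwards has no effect until the repository is cloned again. A small sketch for checking whether any placeholders are left; the helper `count_unfilled` and the two demo samples are made up for illustration.

```python
def count_unfilled(dataset, placeholders=("{{name}}", "{{author}}")):
    """Count samples whose output still contains one of the placeholders."""
    return sum(
        1 for sample in dataset
        if any(p in sample["output"] for p in placeholders)
    )


# Two made-up samples for illustration; in the notebook, pass the loaded `dataset` instead.
demo = [
    {"output": "I am {{name}}, built by {{author}}."},
    {"output": "I am an assistant built by DH-Workshop."},
]
print(count_unfilled(demo))
```

A result of 0 on the real dataset would indicate that every placeholder was replaced.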


# Next step:
 📌  Upload your own datasets
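
The training configuration further down refers to a dataset called `notteik`, so that name has to be known to LLaMA Factory. A minimal sketch of one way to do this, assuming the data is in the Alpaca-style format (`instruction`, `input`, `output`) and is registered through `data/dataset_info.json`; the helper `register_dataset` and the file name `notteik.json` are assumptions, not taken from the original notebook.

```python
import json
from pathlib import Path


def register_dataset(data_dir, name, examples):
    """Write examples to <data_dir>/<name>.json and add an entry to dataset_info.json."""
    data_dir = Path(data_dir)
    data_file = data_dir / f"{name}.json"
    data_file.write_text(
        json.dumps(examples, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # Keep existing entries (for example "identity") and add or overwrite ours.
    info_path = data_dir / "dataset_info.json"
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = {"file_name": data_file.name}
    info_path.write_text(
        json.dumps(info, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return info[name]
```

In the notebook this could be called from `/content/LLaMA-Factory` as `register_dataset("data", "notteik", examples)`, where `examples` is your own list of records such as `{"instruction": "...", "input": "", "output": "..."}`.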

## Fine-tune the model with the LLaMA Board web frontend

In [None]:
%cd /content/LLaMA-Factory/
!GRADIO_SHARE=1 llamafactory-cli webui

## 📌 Fine-tune the model from the command line

Depending on the size of the dataset, the number of epochs, the learning rate, and so on, this takes between 5 and 45 minutes.
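
How long a run takes depends largely on the number of optimizer steps. A rough estimate, assuming a single GPU and using the batch size, gradient accumulation, epoch count, and warmup ratio from the configuration cell in this section; the dataset size of 500 examples is hypothetical, and the trainer's exact rounding may differ slightly.

```python
import math


def estimate_steps(num_examples, per_device_batch_size, grad_accum_steps, epochs, warmup_ratio):
    """Return (effective batch size, total optimizer steps, warmup steps) for one GPU."""
    effective_batch = per_device_batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * epochs)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps


# 500 examples is a hypothetical dataset size; the other values come from the config cell.
print(estimate_steps(500, 2, 4, 6.0, 0.2))
```

Doubling the number of examples or epochs roughly doubles the number of steps, and with it the training time.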

In [None]:
import json

args = dict(
  stage="sft",                        # do supervised fine-tuning
  do_train=True,
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",
  dataset="identity,notteik",              # use the identity and notteik datasets
  template="llama3",                     # use llama3 prompt template
  finetuning_type="lora",                   # use LoRA adapters to save memory
  lora_target="all",                     # attach LoRA adapters to all linear layers
  output_dir="llama3_notteik",                  # the path to save LoRA adapters
  per_device_train_batch_size=2,               # the batch size
  gradient_accumulation_steps=4,               # the gradient accumulation steps
  lr_scheduler_type="cosine",                 # use cosine learning rate scheduler
  logging_steps=10,                      # log every 10 steps
  warmup_ratio=0.2,                      # use warmup scheduler
  save_steps=1000,                      # save checkpoint every 1000 steps
  learning_rate=5e-5,                     # the learning rate
  num_train_epochs=6.0,                    # the epochs of training
  max_samples=500,                      # use 500 examples in each dataset
  max_grad_norm=1.0,                     # clip gradient norm to 1.0
  loraplus_lr_ratio=16.0,                   # use LoRA+ algorithm with lambda=16.0
  fp16=True,                         # use float16 mixed precision training
  #use_liger_kernel=True,                   # use liger kernel for efficient training
)

%cd /content/LLaMA-Factory/

# Write the config to the same file name that is passed to the CLI below.
with open("train_llama3.json", "w", encoding="utf-8") as f:
  json.dump(args, f, indent=2)

!llamafactory-cli train train_llama3.json

# Chat with the model as a test:

In [8]:
!llamafactory-cli chat examples/inference/llama3.yaml

2024-10-30 11:06:56.638246: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-30 11:06:56.657803: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-30 11:06:56.663801: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-30 11:06:56.678693: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/u
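
The traceback above is truncated, so its cause is not visible here. Independent of that, `examples/inference/llama3.yaml` ships with the repository and appears to describe only a base model, not the adapter trained in this notebook. A minimal sketch of a chat configuration that loads the adapter on top of the same base model; the file name `chat_llama3_notteik.json` is an assumption.

```python
import json

chat_args = dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # same base model as in training
    adapter_name_or_path="llama3_notteik",                      # output_dir of the training run
    template="llama3",                                          # same prompt template as in training
    finetuning_type="lora",                                     # the adapter is a LoRA adapter
)

with open("chat_llama3_notteik.json", "w", encoding="utf-8") as f:
    json.dump(chat_args, f, indent=2)

# In the notebook, from /content/LLaMA-Factory:
# !llamafactory-cli chat chat_llama3_notteik.json
```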

## Upload the trained LoRA adapter to Hugging Face

It is faster to copy the data from one cloud (Google) to another (Hugging Face) than to download it directly from Colab.

In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 

### Choose the correct paths here

In [None]:
!huggingface-cli upload trenkert/llama3_notteik llama3_notteik

Next, download the LoRA adapter to your local machine and merge it there with LLaMA Factory.
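
A minimal sketch of what the merge configuration could look like, assuming the adapter has been downloaded into a local folder `llama3_notteik`. Merging generally needs an unquantized base model, so the model ID `meta-llama/Meta-Llama-3-8B-Instruct` below is an assumption (the notebook itself only uses the 4-bit variant), as are the export directory and the config file name.

```python
import json

merge_args = dict(
    model_name_or_path="meta-llama/Meta-Llama-3-8B-Instruct",  # assumption: unquantized base model
    adapter_name_or_path="llama3_notteik",                     # local folder with the downloaded adapter
    template="llama3",
    finetuning_type="lora",
    export_dir="llama3_notteik_merged",                        # hypothetical output folder
    export_size=2,                                             # shard size of the exported files in GB
    export_device="cpu",                                       # merge on the CPU
)

with open("merge_llama3_notteik.json", "w", encoding="utf-8") as f:
    json.dump(merge_args, f, indent=2)

# On the local machine, inside the LLaMA-Factory checkout:
# llamafactory-cli export merge_llama3_notteik.json
```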