<a href="https://colab.research.google.com/github/seiichiinoue/label-studio-sample/blob/main/train_qlora.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Changelog

- 23/10/01: Added support for training stablelm

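# Overview

Code for instruction-tuning llama2 with QLoRA, based on [artidoro/qlora](https://github.com/artidoro/qlora) (run here via the `Sosuke115/qlora_ja` repository cloned in the training section).


*   Reference implementation: https://note.com/npaka/n/na7c631175111
*   Step-by-step guide: https://docs.google.com/document/d/1GIoUtoJFuGAVLfHaWHb4ILX8RLFajb7qmxF-dmoFhak/edit?usp=sharing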

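# Prerequisites

https://huggingface.co/meta-llama/Llama-2-7b-hf

Access to the model above has already been requested (Meta form submitted, plus the access request submitted on the Hugging Face Hub).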

# Training

Specify the base LLM to use.

In [1]:
model_name_or_path = 'meta-llama/Llama-2-7b-chat-hf' #@param ["meta-llama/Llama-2-7b-chat-hf", "stabilityai/japanese-stablelm-base-alpha-7b"] {allow-input: true}

Notes:
- `meta-llama/Llama-2-7b-hf`: the training loss drops to zero (the "zero-loss problem")
- `elyza/ELYZA-japanese-Llama-2-7b`: not enough RAM

Specify the dataset to use.

In [11]:
# dataset_name = 'databricks-dolly-15k-ja' #@param ["databricks-dolly-15k-ja", "hh-rlhf-49k-ja"] {allow-input: true}
dataset_name = '/content/drive/MyDrive/llama2_qlora/data/label-studio_output.json'  # file produced by convert.py
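
The training command further down passes `--dataset_format "alpaca"`, so a quick sanity check of the file before a long run can save time. The following is a minimal sketch, assuming (this notebook does not confirm it) that the file is a JSON list of objects with `instruction` and `output` strings and an optional `input` string. `validate_alpaca_records` and `load_and_validate` are hypothetical helpers, not part of qlora.

```python
import json

# Assumed Alpaca-style schema: "instruction" and "output" are required,
# "input" is optional and may be an empty string.
REQUIRED_KEYS = ("instruction", "output")


def validate_alpaca_records(records):
    """Return (index, problem) tuples for records that look malformed."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
        if "input" in rec and not isinstance(rec["input"], str):
            problems.append((i, "'input' is not a string"))
    return problems


def load_and_validate(path):
    """Load a JSON file and validate every record in it."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return validate_alpaca_records(records)
```

Calling `load_and_validate(dataset_name)` returns an empty list when every record passes these checks.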

In [3]:
# Per-model settings
tokenizer_name = None
variant = None
bits = 4
per_device_train_batch_size = 4

if model_name_or_path == 'stabilityai/japanese-stablelm-base-alpha-7b':
  tokenizer_name = "novelai/nerdstash-tokenizer-v1"
  # Needed to load pytorch_model.int8.bin
  variant = "int8"
  bits = 8
  # On free Colab, VRAM runs out unless this is 2 or less
  per_device_train_batch_size = 2
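
If more models are added, the `if` chain above can be expressed as a lookup table. A minimal sketch of that idea follows. `DEFAULT_SETTINGS`, `MODEL_OVERRIDES` and `model_settings` are illustrative names introduced here, and the values are copied from the cell above.

```python
# Defaults used for any model without an explicit override.
DEFAULT_SETTINGS = {
    "tokenizer_name": None,
    "variant": None,
    "bits": 4,
    "per_device_train_batch_size": 4,
}

# Per-model overrides; values mirror the cell above.
MODEL_OVERRIDES = {
    "stabilityai/japanese-stablelm-base-alpha-7b": {
        "tokenizer_name": "novelai/nerdstash-tokenizer-v1",
        "variant": "int8",  # loads pytorch_model.int8.bin
        "bits": 8,
        "per_device_train_batch_size": 2,  # free Colab VRAM limit
    },
}


def model_settings(model_name_or_path):
    """Return the default settings merged with any overrides for the model."""
    return {**DEFAULT_SETTINGS, **MODEL_OVERRIDES.get(model_name_or_path, {})}


settings = model_settings("meta-llama/Llama-2-7b-chat-hf")
print(settings["bits"], settings["per_device_train_batch_size"])
```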

In [4]:
# Mount Google Drive
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
# Move to the working directory
import os
os.makedirs("/content/drive/My Drive/llama2_qlora", exist_ok=True)
%cd "/content/drive/My Drive/llama2_qlora"

/content/drive/My Drive/llama2_qlora


In [6]:
# Relative path on Drive where model checkpoints are saved
output_dir = "./results/qlora"

In [7]:
%rm -rf qlora_ja
!git clone https://github.com/Sosuke115/qlora_ja

Cloning into 'qlora_ja'...
remote: Enumerating objects: 42, done.
remote: Counting objects: 100% (42/42), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 42 (delta 15), reused 28 (delta 9), pack-reused 0
Receiving objects: 100% (42/42), 16.94 KiB | 1.30 MiB/s, done.
Resolving deltas: 100% (15/15), done.


In [8]:
# Install packages
!pip install -U -r qlora_ja/qlora/requirements.txt



In [9]:
# Log in to Hugging Face
# Answering n to "Add token as git credential? (Y/n)" is fine
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root

In [12]:
%%time

# --bf16 can be used on an A100
# kunishou/databricks-dolly-15k-ja: about 1h 15min
# hh-rlhf-49k-ja: about 1h 18min 44s
# Checkpoints are saved to Drive, so watch the Drive storage quota
# Viewing training logs with wandb requires a separate wandb account

# Run training
!python qlora_ja/qlora/qlora.py \
    --model_name $model_name_or_path \
    --output_dir $output_dir \
    --dataset $dataset_name \
    --dataset_format "alpaca" \
    --max_steps 500 \
    --use_auth \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 100 \
    --save_total_limit 40 \
    --max_new_tokens 32 \
    --dataloader_num_workers 1 \
    --group_by_length \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --lora_dropout 0.1 \
    --double_quant \
    --quant_type nf4 \
    --fp16 \
    --bits $bits \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size $per_device_train_batch_size \
    --gradient_accumulation_steps 2 \
    --eval_steps 50 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --weight_decay 0.0 \
    --seed 42 \
    --use_peft \
    --trust_remote_code True \
    --report_to wandb


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
  warn(msg)
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
  "max_new_tokens": 32,
  "transformers_version": "4.31.0"
}
, cache
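
With the flags above, each optimizer step consumes `per_device_train_batch_size × gradient_accumulation_steps` sequences on a single GPU, and `--save_steps 100` with `--max_steps 500` implies checkpoints at steps 100 through 500. The sketch below works through that arithmetic. It also picks the last checkpoint saved before the loss first reaches zero, assuming a hypothetical list of `(step, loss)` pairs copied by hand from the training log. The helper names are illustrative and not part of qlora.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Sequences consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def checkpoint_steps(max_steps, save_steps):
    """Steps at which a checkpoint is expected to be written."""
    return list(range(save_steps, max_steps + 1, save_steps))


def last_checkpoint_before_zero_loss(loss_log, saved_steps):
    """Largest saved step strictly before the first zero-loss step.

    Falls back to the largest saved step when the loss never hits zero,
    and returns None when no usable checkpoint exists.
    """
    zero_steps = [step for step, loss in loss_log if loss == 0.0]
    if not zero_steps:
        return max(saved_steps) if saved_steps else None
    first_zero = min(zero_steps)
    candidates = [s for s in saved_steps if s < first_zero]
    return max(candidates) if candidates else None


batch = effective_batch_size(4, 2)   # 4 per device, 2 accumulation steps
print(batch, batch * 500)            # sequences per step, sequences in 500 steps
saved = checkpoint_steps(500, 100)
# Hypothetical log excerpt: the loss collapses to zero at step 310.
print(last_checkpoint_before_zero_loss([(300, 0.7), (310, 0.0)], saved))
```

For the stablelm settings on free Colab (batch size 2), the same calculation gives 4 sequences per step.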

# Inference and pushing to the Hub

In [None]:
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    tokenizer_name if tokenizer_name else model_name_or_path,
    use_fast=False,
    trust_remote_code=True
)

# Takes about 3 minutes
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    variant=variant,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=bits == 4,
        load_in_8bit=bits == 8,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
    trust_remote_code=True
)


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


  warn(msg)
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)


Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
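# Load the LoRA adapter
# Takes about 5 minutes
# Note: if the zero-loss problem occurred during training, loading a checkpoint saved before the loss reached zero still gives a usable model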
checkpoint_path = os.path.join(output_dir, "checkpoint-500")
model = PeftModel.from_pretrained(
    model,
    os.path.join(checkpoint_path, "adapter_model"),
    device_map={"":0}
)
model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 4096, padding_idx=0)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): Linear4bit(
                in_features=4096, out_features=4096, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): Linear4bit(
                in_features=4096, out_features=4096, bias=False
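
The printout above shows each adapted `q_proj` carrying a `lora_A` of shape 4096 → 64 and a `lora_B` of shape 64 → 4096. As a rough worked example of what that means in trainable parameters, the sketch below counts the adapter weights for one such layer and computes the usual LoRA scaling factor `lora_alpha / r` from the training flags (`--lora_r 64`, `--lora_alpha 16`). The function names are illustrative only.

```python
def lora_param_count(in_features, out_features, r):
    """Trainable weights added by one LoRA pair (A: in -> r, B: r -> out), no bias."""
    return in_features * r + r * out_features


def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update before it is added to the base output."""
    return lora_alpha / r


adapter = lora_param_count(4096, 4096, 64)
base = 4096 * 4096
print(adapter)                 # adapter weights for one 4096x4096 projection
print(adapter / base)          # fraction relative to the frozen base weight
print(lora_scaling(16, 64))    # scaling for lora_alpha=16, r=64
```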

In [None]:
# Prepare the prompt (the instruction asks "What is Mt. Fuji?")
prompt = "### Instruction: 富士山とは？\n\n### Response: "

# Run inference
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

### Instruction: 富士山とは？

### Response: 富士山は、日本の山で最も高く、標高3776m。日本の国章の山の一部をなし、日本の象徴としても知られています。
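
Because `generate` returns the prompt followed by the completion, the decoded text repeats the instruction. A minimal sketch for building prompts and stripping them from the output is shown below. The template is copied from the inference cell above; whether it matches the exact template used during training is an assumption, and `build_prompt` and `extract_response` are hypothetical helpers.

```python
PROMPT_TEMPLATE = "### Instruction: {instruction}\n\n### Response: "
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction):
    """Wrap an instruction in the template used by the inference cell."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


def extract_response(decoded_text):
    """Return only the text after the first response marker.

    If the marker is missing, the whole text is returned stripped.
    """
    _, marker, tail = decoded_text.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded_text.strip()


prompt = build_prompt("富士山とは？")
# In the notebook, `decoded` would be tokenizer.decode(outputs[0], ...);
# a stand-in string is used here so the sketch runs without a model.
decoded = prompt + "富士山は、日本の山で最も高く、標高3776m。"
print(extract_response(decoded))
```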


In [None]:
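# Push to the Hugging Face Hub (adjust the destination path as needed before pushing)

# dataset_name may be a file path, which is not a valid repo name, so use its base name
dataset_slug = os.path.splitext(os.path.basename(dataset_name))[0]

# To upload to your own account
upload_hf_hub_path = f"Llama-2-7b-chat-hf-{dataset_slug}-qlora-sft"
# To upload to the studio-ousia organization
# upload_hf_hub_path = f"studio-ousia/Llama-2-7b-chat-hf-{dataset_slug}-qlora-sft"

model.push_to_hub(upload_hf_hub_path)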

adapter_model.bin:   0%|          | 0.00/640M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/studio-ousia/Llama-2-7b-chat-hf-hh-rlhf-49k-ja-qlora-sft/commit/4378b108c1cdfdf461a4ae4d01ebc3eda18928a7', commit_message='Upload model', commit_description='', oid='4378b108c1cdfdf461a4ae4d01ebc3eda18928a7', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
# Push only the training configuration to the Hugging Face Hub

import os
import torch
import json
from transformers import TrainingArguments
from huggingface_hub import HfApi, HfFolder
training_args = torch.load(os.path.join(checkpoint_path, "training_args.bin"))

# Convert the TrainingArguments object to a dict
training_args_dict = training_args.to_dict()
json_file_path = os.path.join(checkpoint_path, "training_args.json")
# Save the dict as a JSON file
with open(json_file_path, "w") as f:
    json.dump(training_args_dict, f)

# Get the user token
token = HfFolder().get_token()
# Initialize the Hugging Face API client
api = HfApi()

# Upload the file to the Hugging Face Hub
url = api.upload_file(
    token=token,
    path_or_fileobj=json_file_path,
    repo_id=upload_hf_hub_path,
    path_in_repo="training_args.json"  # stored on the Hugging Face Hub under this name
)

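# Notes
# Goal: publish the training settings used for QLoRA adapter training to the Hub
# training_args alone cannot be pushed with push_to_hub: https://discuss.huggingface.co/t/how-to-load-training-args/5720/3
# Going through Trainer or Repository also looks cumbersome
# For these reasons this cell calls the Hugging Face API directly (though a better approach probably exists)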
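
One possible simplification, offered as a sketch rather than a tested replacement: `HfApi.upload_file` accepts raw bytes for `path_or_fileobj`, so the settings can be serialized in memory instead of being written next to the checkpoint first. The helper name `training_args_to_json_bytes` is hypothetical, and the dict below is a stand-in for `training_args.to_dict()`.

```python
import json


def training_args_to_json_bytes(training_args_dict):
    """Serialize a settings dict to UTF-8 JSON with stable key order."""
    text = json.dumps(training_args_dict, ensure_ascii=False, indent=2, sort_keys=True)
    return text.encode("utf-8")


# Stand-in values taken from the training command in this notebook.
payload = training_args_to_json_bytes({"learning_rate": 0.0002, "max_steps": 500})
print(payload.decode("utf-8"))

# In the notebook, the bytes could then be uploaded directly, for example:
# api.upload_file(
#     token=token,
#     path_or_fileobj=payload,
#     repo_id=upload_hf_hub_path,
#     path_in_repo="training_args.json",
# )
```

Sorting the keys keeps the uploaded file stable between runs, which makes differences in settings easier to spot in the Hub's commit history.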