# 文書生成AIのファインチューニング・レシピ
本Notebookでは、MARC-jaというAmazonのレビューデータセットに含まれる文の一対比較によって報酬モデルを訓練するコードを記載します。

## 応用レシピ：報酬モデルの訓練

### (1) ライブラリのインストール

In [1]:
%pip install transformers==4.35.2
%pip install trl[peft]==0.7.10
%pip install wandb==0.16.2
%pip install sentencepiece==0.1.99
%pip install accelerate==0.26.1
%pip install bitsandbytes==0.42.0
%pip install datasets==2.16.1

Collecting transformers==4.35.2
  Downloading transformers-4.35.2-py3-none-any.whl.metadata (123 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m123.5/123.5 kB[0m [31m1.2 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers<0.19,>=0.14 (from transformers==4.35.2)
  Downloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading transformers-4.35.2-py3-none-any.whl (7.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.9/7.9 MB[0m [31m31.6 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.6/3.6 MB[0m [31m94.0 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.19.1
    Uninstalling tokenizers-0.19.1:
      Successfully uninsta

### (2) Weights & Biases及びHugging Face Hubへのログイン
学習時のログを記録するWeights & Biases、及び訓練後のモデルの保存先であるHuggingface Hubにログインします。いずれも事前にアカウントの作成とトークンの取得が必要です。

In [2]:
import wandb
wandb.init()

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [3]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

### (3) ライブラリのインポート

In [4]:
import torch
from tqdm import tqdm
import numpy as np
import pandas as pd

tqdm.pandas()

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import load_dataset
from peft import LoraConfig, TaskType
from trl import RewardTrainer, RewardConfig

  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(


### (4) データセットの準備

In [5]:
base_model_name = 'rinna/japanese-gpt2-medium'
reward_model_name = 'taku-yoshioka/reward-model-0828'
dataset_name, subset_name = 'shunk031/JGLUE', 'MARC-ja'

In [6]:
ds = load_dataset(dataset_name, split='train', name=subset_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


Downloading builder script:   0%|          | 0.00/28.7k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/38.9k [00:00<?, ?B/s]

Downloading data: 0.00B [00:00, ?B/s]

Downloading data:   0%|          | 0.00/3.09k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/2.42k [00:00<?, ?B/s]



Downloading data:   0%|          | 0.00/65.0M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.93M [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]

In [7]:
type(ds), len(ds)

(datasets.arrow_dataset.Dataset, 187528)

`RewardTrainer`クラスが利用可能なデータセットを作成します。

In [8]:
def preprocess(dataset, tokenizer, n_pairs):
    # positiveとnegativeに分割
    pos = []
    neg = []
    for item in tqdm(ds):
        if item['label'] == 0:
            pos.append(item['sentence'])
        else:
            neg.append(item['sentence'])
    print("Num of samples pos {}: neg {}".format(len(pos), len(neg)))

    # トーカナイズ
    pos_ids, pos_masks = [], []
    for text in tqdm(pos):
        tokenized = tokenizer(text, return_tensors='pt', truncation=True)
        pos_ids.append(tokenized['input_ids'].squeeze(0))
        pos_masks.append(tokenized['attention_mask'].squeeze(0))

    neg_ids, neg_masks = [], []
    for text in tqdm(neg):
        tokenized = tokenizer(text, return_tensors='pt', truncation=True)
        neg_ids.append(tokenized['input_ids'].squeeze(0))
        neg_masks.append(tokenized['attention_mask'].squeeze(0))

    # サンプリング
    rng = np.random.default_rng(seed=42)
    ixs_pos = rng.choice(len(pos), size=n_pairs)
    ixs_neg = rng.choice(len(neg), size=n_pairs)

    examples = []

    for ix_pos, ix_neg in zip(ixs_pos, ixs_neg):
        examples.append({
            'input_ids_chosen': pos_ids[ix_pos],
            'attention_mask_chosen': pos_masks[ix_pos],
            'input_ids_rejected': neg_ids[ix_neg],
            'attention_mask_rejected': neg_masks[ix_neg],
        })

    return examples


In [9]:
tokenizer = AutoTokenizer.from_pretrained(base_model_name, model_max_length=512)
tokenizer.pad_token = tokenizer.eos_token
dataset = preprocess(ds, tokenizer, n_pairs=180000) # 元のデータセットに含まれる文書数と大体同じ数の組とする



tokenizer_config.json:   0%|          | 0.00/282 [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/806k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/153 [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
100%|██████████| 187528/187528 [00:06<00:00, 29035.71it/s]


Num of samples pos 165477: neg 22051


100%|██████████| 165477/165477 [01:20<00:00, 2063.94it/s]
100%|██████████| 22051/22051 [00:10<00:00, 2056.37it/s]


### (5) 報酬モデルの訓練

In [10]:
model = AutoModelForSequenceClassification.from_pretrained(
    base_model_name, num_labels=1
)
model.config.pad_token_id = tokenizer.eos_token_id

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    # inference_mode=False,
    inference_mode=True,
    bias="none",
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

reward_config = RewardConfig(
    output_dir="output",
    per_device_train_batch_size=64,
    num_train_epochs=5,
    gradient_accumulation_steps=16,
    learning_rate=1.41e-3,
    report_to="wandb",
    remove_unused_columns=False,
    optim="adamw_torch",
    logging_strategy="steps",
    logging_steps=1,
    max_length=512,
    seed=42,
)

trainer = RewardTrainer(
    model=model,
    args=reward_config,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
)

trainer.train()

config.json:   0%|          | 0.00/799 [00:00<?, ?B/s]

  _torch_pytree._register_pytree_node(


model.safetensors:   0%|          | 0.00/1.37G [00:00<?, ?B/s]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at rinna/japanese-gpt2-medium and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
  return fn(*args, **kwargs)
Could not estimate the number of tokens of the input, floating-point operations will not be computed


Step,Training Loss
1,0.653
2,0.6078
3,0.5485
4,0.4903
5,0.4411
6,0.4176
7,0.3845
8,0.3519
9,0.3139
10,0.3183


  return fn(*args, **kwargs)


TrainOutput(global_step=875, training_loss=0.12030494185856411, metrics={'train_runtime': 21706.7427, 'train_samples_per_second': 41.462, 'train_steps_per_second': 0.04, 'total_flos': 0.0, 'train_loss': 0.12030494185856411, 'epoch': 4.98})

### (6) 報酬モデルの保存

In [11]:
trainer.model.push_to_hub(reward_model_name)



adapter_model.safetensors:   0%|          | 0.00/3.16M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/taku-yoshioka/reward-model-0828/commit/d44de852058b444890caac227eb8c92f1b23c8ee', commit_message='Upload model', commit_description='', oid='d44de852058b444890caac227eb8c92f1b23c8ee', pr_url=None, pr_revision=None, pr_num=None)