<a href="https://colab.research.google.com/github/sit-xinli/ai-course7/blob/main/GenAI_dpo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generative AI Model Alignment
## Objective
- Learn how to adjust a model's behavior using labeled preference data.

## Install and import the required libraries (~2 min)
## Warnings can be ignored as long as the block finishes successfully.

In [1]:
!pip install -U bitsandbytes
!pip install datasets peft trl accelerate

Collecting bitsandbytes
  Downloading bitsandbytes-0.46.0-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch<3,>=2.2->bitsandbytes)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch<3,>=2.2->bitsandbytes)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch<3,>=2.2->bitsandbytes)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch<3,>=2.2->bitsandbytes)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch<3,>=2.2->bitsandbytes)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-c

In [2]:
import os
import torch
import json
from datasets import Dataset
import pandas as pd
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig, GenerationConfig
from tqdm.auto import tqdm
from trl import DPOConfig, DPOTrainer

## Load the dataset

In [7]:
!git clone https://github.com/sit-xinli/ai-course7.git

Cloning into 'ai-course7'...
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (21/21), done.
Receiving objects: 100% (25/25), 17.21 KiB | 2.87 MiB/s, done.
Resolving deltas: 100% (10/10), done.
remote: Total 25 (delta 10), reused 12 (delta 3), pack-reused 0 (from 0)


In [8]:
# Open and load the JSON datasets (explicit UTF-8, since the files contain Japanese text)
with open("/content/ai-course7/DPO_trainingdata_ja.json", 'r', encoding='utf-8') as jsonfile:
    full_data = json.load(jsonfile)

with open("/content/ai-course7/DPO_testingdata_ja.json", 'r', encoding='utf-8') as jsonfile:
    test_data = json.load(jsonfile)
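
Before training, it can help to confirm that every record has the fields the later cells read. The sketch below assumes training records carry `prompt`, `support`, and `oppose` keys and test records carry `id` and `prompt`, since those are the keys indexed further down. `check_records` is a hypothetical helper, not part of the course repository, and the sample records are made up for illustration.

```python
def check_records(records, required_keys):
    """Return the indices of records that lack any of the required keys."""
    bad = []
    for i, rec in enumerate(records):
        if not all(key in rec for key in required_keys):
            bad.append(i)
    return bad

# Toy records standing in for full_data (illustrative, not taken from the real files).
sample = [
    {"id": 1, "prompt": "Q1", "support": "yes", "oppose": "no"},
    {"id": 2, "prompt": "Q2", "support": "yes"},  # missing 'oppose'
]
missing = check_records(sample, ("prompt", "support", "oppose"))
print(missing)
```

In the notebook, the same helper could be called as `check_records(full_data, ("prompt", "support", "oppose"))` and `check_records(test_data, ("id", "prompt"))`; an empty list means every record is usable.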

## Load the model

In [9]:
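from google.colab import userdata

# Hugging Face token read from the Colab secrets (not used below: the token argument is set to False)
API_TOKEN = userdata.get('HF_TOKEN')

# Load the base model with 4-bit NF4 quantization to reduce GPU memory use
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    device_map='auto',
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4'
    ),
    token=False  # replaces `use_auth_token`, which is deprecated in recent transformers releases
)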



config.json:   0%|          | 0.00/660 [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]
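
As a rough guide to why the model is loaded in 4-bit, the arithmetic below estimates the memory taken by the weights alone. It is a back-of-the-envelope sketch: the parameter count of about 1.54 billion is an assumption, and the estimate ignores quantization constants, any layers left unquantized, activations, and optimizer state.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 1.54e9  # assumed approximate parameter count, not read from the model

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit weights

print(f"bf16: ~{bf16_gb:.2f} GB, 4-bit: ~{nf4_gb:.2f} GB")
```

The 16-bit figure is in line with the roughly 3.09 GB `model.safetensors` download shown in the cell output, and the 4-bit figure is about a quarter of it.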

## Get responses from the original model

In [None]:
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    token=False
)
tokenizer.padding_side = "right"
tokenizer.pad_token = tokenizer.eos_token

def data_formulate(data):
    # Build a chat-formatted prompt; the system message means "Please answer within 20 characters."
    messages = [
        {"role": "system", "content": '20字以内でご回答ください。'},
        {"role": "user", "content": data['prompt']},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return prompt

original_model_response = []
for data in tqdm(test_data):
    id = data['id']
    print(f'Question {id}:\n'+data['prompt'])
    inputs = tokenizer(data_formulate(data), return_tensors="pt").to('cuda')
    generation_config=GenerationConfig(
            do_sample=False,
            max_new_tokens = 200,
            pad_token_id = tokenizer.pad_token_id
    )
    output = model.generate(**inputs, generation_config=generation_config)
    output = tokenizer.batch_decode(output, skip_special_tokens=True)[0].split('assistant\n')[1]

    original_model_response.append(output)
    print('Response from original model:\n'+output+'\n')

  0%|          | 0/10 [00:00<?, ?it/s]

Question 1:
真人化は日本の漫画の世界的なアクセス性を改善できるだろうか？
Response from original model:
はい、真人化は漫画の世界観をよりリアルに表現し、読者との親密な関係を築く効果があると考えられる。

Question 2:
真人化は若い世代の日本の漫画に対する見方にどのような影響を与えるのか？
Response from original model:
影響は大きい。若者はリアル感を求める傾向に。

Question 3:
真人化は原作漫画の文学的価値を高めることができるのか？
Response from original model:
はい、真人化は原作の文学的価値を高める可能性があります。

Question 4:
真人化は日本の漫画の伝統を守り保存するのに役立つのでしょうか？
Response from original model:
はい、真人化は日本の漫画の伝統を守り、漫画の世界を広める役割があります。

Question 5:
真人化は日本の漫画業界の経済効果向上に寄与するか？
Response from original model:
はい、真人化は漫画業界の経済効果向上に寄与しています。

Question 6:
真人化は日本の漫画原作者の創作意欲にどのように影響するのか？
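
The generation cell recovers the answer by splitting the decoded text on `'assistant\n'`, which could misbehave if a prompt or answer happened to contain that string. A common alternative is to drop the prompt tokens before decoding. The sketch below shows the slicing idea with plain lists standing in for token tensors; `new_token_ids` is a hypothetical helper and the token ids are made up.

```python
def new_token_ids(output_ids, prompt_length):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_length:]

# Toy token ids: a decoder-only model returns the prompt followed by the new tokens.
prompt_ids = [101, 7, 8, 9]
output_ids = prompt_ids + [42, 43, 2]

generated = new_token_ids(output_ids, len(prompt_ids))
print(generated)
```

With the objects in this notebook, the equivalent would be roughly `tokenizer.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`, where `generated_ids` is an assumed name for the tensor returned by `model.generate`.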


## Set the parameters
### You only need to change this block. Please do not modify the other parts.

In [None]:
num_epoch = 1
data_size = 50
support_ratio = 0

## Prepare the training data

In [None]:
# Select part of the data for training
training_data = full_data[:data_size]

# Define the size of the support dataset
support_data_size = int(data_size * support_ratio)

# Prepare the data for the training dataset
prompt_list = [data_formulate(data) for data in training_data]
chosen_list = [data['support'] for data in training_data[:support_data_size]] + [data['oppose'] for data in training_data[support_data_size:]]
rejected_list = [data['oppose'] for data in training_data[:support_data_size]] + [data['support'] for data in training_data[support_data_size:]]
position_list = ['support' for _ in range(support_data_size)] + ['oppose' for _ in range(data_size - support_data_size)]

# Create the training dataset
train_dataset = Dataset.from_dict({'prompt': prompt_list, 'position': position_list, 'chosen': chosen_list, 'rejected': rejected_list})
pd.DataFrame(train_dataset).rename(columns={"chosen": "preferred", "rejected": "non-preferred"})
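
The effect of `support_ratio` is easier to see on a tiny example. The sketch below mirrors the list comprehensions in the cell above: the first `int(n * support_ratio)` records treat the `support` answer as preferred, and the rest treat the `oppose` answer as preferred. `assign_preferences` is a hypothetical helper and the records are toy data.

```python
def assign_preferences(records, support_ratio):
    """Label each record 'support' or 'oppose' and pick chosen/rejected accordingly."""
    n_support = int(len(records) * support_ratio)
    rows = []
    for i, rec in enumerate(records):
        if i < n_support:
            rows.append({"position": "support",
                         "chosen": rec["support"], "rejected": rec["oppose"]})
        else:
            rows.append({"position": "oppose",
                         "chosen": rec["oppose"], "rejected": rec["support"]})
    return rows

# Four toy records (illustrative only).
toy = [{"support": f"pro-{i}", "oppose": f"con-{i}"} for i in range(4)]

half = assign_preferences(toy, 0.5)
none_support = assign_preferences(toy, 0)
print([row["position"] for row in half])
print([row["position"] for row in none_support])
```

With the default `support_ratio = 0`, every training pair prefers the `oppose` answer, so the trained model is pushed toward the opposing stance.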

## Training

In [None]:

# Define the training settings with DPOConfig
training_args = DPOConfig(
    output_dir='./',
    per_device_train_batch_size=1,
    num_train_epochs=num_epoch,
    gradient_accumulation_steps=8,
    gradient_checkpointing=False,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=1,
    warmup_ratio=0.1,
    report_to="none",  # the string "none" disables all logging integrations
)
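
The batch settings above interact with `data_size` and `num_epoch`. The arithmetic below is a rough sketch of the effective batch size and the approximate number of optimizer steps, assuming a single GPU. The exact step count depends on how the installed Trainer version rounds a partial final accumulation window, so treat the numbers as estimates.

```python
import math

# Values copied from the notebook cells.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
warmup_ratio = 0.1
data_size = 50
num_epoch = 1

num_devices = 1  # assumption: one GPU, as on a typical Colab runtime

# Number of examples that contribute to each optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Approximate optimizer steps (rounding up the last partial window).
approx_steps = math.ceil(data_size / effective_batch) * num_epoch

# Approximate number of warmup steps implied by warmup_ratio.
approx_warmup = math.ceil(approx_steps * warmup_ratio)

print(effective_batch, approx_steps, approx_warmup)
```

With so few optimizer steps, changes to `data_size` or `num_epoch` alter the amount of training substantially, which is one reason those are the parameters exposed for experimentation.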


In [None]:
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
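
To give a feel for what `r=64` and `lora_alpha=16` mean, the sketch below computes the standard LoRA scaling factor `lora_alpha / r` and the number of extra trainable parameters a rank-`r` adapter adds to one linear layer. The 1536 x 1536 layer shape is an assumed, illustrative size; which layers actually receive adapters depends on PEFT's defaults for the model, since `target_modules` is not set above.

```python
def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / r

def lora_extra_params(d_in, d_out, r):
    """Trainable parameters added by matrices A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

scaling = lora_scaling(lora_alpha=16, r=64)

# Assumed layer shape, for illustration only.
d_in, d_out = 1536, 1536
extra = lora_extra_params(d_in, d_out, r=64)
dense = d_in * d_out

print(f"scaling={scaling}, adapter params={extra}, fraction of dense layer={extra / dense:.3f}")
```

Under these assumptions the adapter trains well under a tenth of the parameters of the layer it wraps, while the 4-bit base weights stay frozen.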

In [None]:
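# Recommended: do not change the code in this cell
os.environ["WANDB_MODE"] = "offline"
os.environ["WANDB_DISABLED"] = "true"

# Initialize the DPOTrainer. No separate ref_model is passed: with peft_config set,
# TRL uses the base model with the adapter disabled as the reference model.
dpo_trainer = DPOTrainer(
    model=model,
    #ref_model=ref_model,
    args=training_args,
    train_dataset=train_dataset,
    #eval_dataset=eval_dataset,  # optional
    peft_config=peft_config,
    processing_class=tokenizer  # tokenizer used to process the data (also handy for logging and saving)
)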


In [None]:
dpo_trainer.train()
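
For intuition about what `dpo_trainer.train()` optimizes, the sketch below evaluates the standard sigmoid DPO loss for a single (chosen, rejected) pair from sequence log-probabilities. It is a plain-Python illustration, not TRL's implementation; the log-probability values are made up, and `beta=0.1` is assumed because the `DPOConfig` above does not set it.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one preference pair."""
    # How much more (or less) likely each answer is under the policy than under the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Before training the policy equals the reference, so the margin is zero.
loss_at_start = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# After the policy shifts probability toward the chosen answer, the loss drops.
loss_after_shift = dpo_loss(-5.0, -15.0, -10.0, -10.0)

print(loss_at_start, loss_after_shift)
```

The loss starts at `log(2)` and falls as the policy raises the chosen answer's likelihood relative to the rejected one, measured against the reference model.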

## Get responses from the trained model

In [None]:
trained_model_response = []
for data in tqdm(test_data):
    id = data['id']
    print(f'Question {id}:\n'+data['prompt'])
    inputs = tokenizer(data_formulate(data), return_tensors="pt").to('cuda')
    generation_config=GenerationConfig(
            do_sample=False,
            max_new_tokens = 200,
            pad_token_id = tokenizer.pad_token_id
    )
    output = model.generate(**inputs, generation_config=generation_config)
    output = tokenizer.batch_decode(output, skip_special_tokens=True)[0].split('assistant\n')[1]
    trained_model_response.append(output)
    print('Response from trained model:\n'+output+'\n')

## To complete your report, examine the output of this block and take a screenshot of the results.

In [None]:
model_response = []
print(f'num_epoch: {num_epoch}\ndata_size: {data_size}\nsupport_ratio: {support_ratio}')
print()
for data in test_data:
    id = data['id']
    ref_output = original_model_response[id-1]
    output = trained_model_response[id-1]
    print(f'Question {id}:\n'+data['prompt'])
    print('Response from original model:\n'+ref_output)
    print('Response from trained model:\n'+output)
    print()
    model_response.append({'id':data['id'], 'prompt':data['prompt'], 'response_from_original_model':ref_output, 'response_from_trained_model':output})
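
When comparing the two sets of responses for the report, a compact summary can complement reading the printed text. The sketch below flags whether each answer changed and whether the trained answer stays within the 20-character limit requested in the system prompt. `summarize` is a hypothetical helper, and the entries are toy data with the same keys as `model_response`.

```python
def summarize(entries, limit=20):
    """Return one summary row per question."""
    rows = []
    for entry in entries:
        before = entry["response_from_original_model"]
        after = entry["response_from_trained_model"]
        rows.append({
            "id": entry["id"],
            "changed": before != after,
            "original_len": len(before),
            "trained_len": len(after),
            "trained_within_limit": len(after) <= limit,
        })
    return rows

# Toy entries (illustrative only).
toy_entries = [
    {"id": 1, "prompt": "Q1",
     "response_from_original_model": "Yes, it helps.",
     "response_from_trained_model": "No, it does not help at all in practice."},
    {"id": 2, "prompt": "Q2",
     "response_from_original_model": "Maybe.",
     "response_from_trained_model": "Maybe."},
]

summary = summarize(toy_entries)
for row in summary:
    print(row)
```

In the notebook the helper could be applied directly as `summarize(model_response)`.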

## Get the output file

In [None]:
with open(f"epoch-{num_epoch}_size-{data_size}_ratio-{support_ratio}.json", "w", encoding='UTF-8') as outfile:
    json.dump(model_response, outfile, indent=4, ensure_ascii=False)
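
The `ensure_ascii=False` argument keeps Japanese text readable in the saved file instead of escaping it as `\uXXXX` sequences, provided the file is read back with the same encoding. The sketch below is a small round-trip check using a temporary directory and a toy record; the file name `example.json` is made up for illustration.

```python
import json
import os
import tempfile

# Toy record with non-ASCII text (illustrative only).
records = [{"id": 1, "prompt": "質問", "response_from_trained_model": "回答"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.json")

    # Write the same way as the cell above.
    with open(path, "w", encoding="UTF-8") as outfile:
        json.dump(records, outfile, indent=4, ensure_ascii=False)

    # Read the raw text back, then parse it.
    with open(path, "r", encoding="UTF-8") as infile:
        raw_text = infile.read()

loaded = json.loads(raw_text)
print(loaded == records)
```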