# Chapter 10: Pretrained Language Models (GPT-style)

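In this chapter, we use a pretrained GPT-style model (a Transformer decoder) for text generation, building a sentiment (positive/negative) classifier, fine-tuning, reinforcement learning, and related tasks.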

## 90. Next-Token Prediction

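Find the 10 most likely tokens to follow "The movie was full of", together with their probabilities (likelihoods). Note that each candidate is a single token, not a token sequence. Also check which token sequence the prompt is converted into before it is fed to the language model.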

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "openai-community/gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

sentence = "The movie was full of"

inputs = tokenizer(sentence, return_tensors="pt")
input_ids = inputs.input_ids

# Show how the prompt was split into tokens
print("Input:")
token_ids = input_ids[0].tolist()
tokens = [tokenizer.decode([token_id]) for token_id in token_ids]
for i, (token_id, token) in enumerate(zip(token_ids, tokens)):
    print(f"  {i+1}. [ID]: {token_id}, [Token]: '{token}'")

with torch.no_grad():
    outputs = model(**inputs)
    # Logits at the last position predict the token that follows the prompt
    logits = outputs.logits[:, -1, :]
    probabilities = torch.nn.functional.softmax(logits, dim=-1)[0]

top_k = 10
top_k_probs, top_k_indices = torch.topk(probabilities, top_k)

print(f"\nTop {top_k} next tokens predictions:")
for i, (prob, idx) in enumerate(zip(top_k_probs.tolist(), top_k_indices.tolist())):
    token = tokenizer.decode([idx])
    print(f"  {i+1}. [Token]: '{token}', [Probability]: {prob:.6f}")
```

```text
Input:
  1. [ID]: 464, [Token]: 'The'
  2. [ID]: 3807, [Token]: ' movie'
  3. [ID]: 373, [Token]: ' was'
  4. [ID]: 1336, [Token]: ' full'
  5. [ID]: 286, [Token]: ' of'

Top 10 next tokens predictions:
  1. [Token]: ' jokes', [Probability]: 0.021892
  2. [Token]: ' great', [Probability]: 0.018644
  3. [Token]: ' laughs', [Probability]: 0.011524
  4. [Token]: ' bad', [Probability]: 0.010874
  5. [Token]: ' surprises', [Probability]: 0.010667
  6. [Token]: ' references', [Probability]: 0.010528
  7. [Token]: ' fun', [Probability]: 0.009992
  8. [Token]: ' humor', [Probability]: 0.007415
  9. [Token]: ' "', [Probability]: 0.007408
  10. [Token]: ' the', [Probability]: 0.006709
```
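The core computation here is a softmax over the logits at the final position, followed by top-k selection. The probabilities are small because GPT-2 spreads its mass over a 50,257-token vocabulary: the ten candidates above together cover only about 12% of it. Below is a dependency-free sketch of the same two steps. The logits and the five-token vocabulary are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_tokens(logits, vocab, k):
    """Return the k most probable (token, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits for a toy five-token vocabulary
vocab = [" jokes", " great", " laughs", " bad", " the"]
logits = [2.0, 1.8, 1.3, 1.2, 0.5]

for token, prob in top_k_tokens(logits, vocab, k=3):
    print(f"'{token}': {prob:.4f}")
```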


## 91. Predicting the Continuation

Predict several continuations of "The movie was full of". While varying the decoding method and the temperature parameter, observe how the predicted texts change.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="openai-community/gpt2")
sentence = "The movie was full of"

temperatures = [0.5, 1.0, 1.5, 2.0]
for temp in temperatures:
    print(f"\ntext with temperature {temp}:")
    # Temperature only has an effect when sampling is enabled.
    # The prompt is 5 tokens, so max_new_tokens=45 gives the same total length as max_length=50.
    generated_text = generator(
        sentence,
        max_new_tokens=45,
        num_return_sequences=1,
        do_sample=True,
        temperature=temp,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    for i, text in enumerate(generated_text):
        print(f"  {i+1}. '{text['generated_text']}'")
```

```text
text with temperature 0.5:
  1. 'The movie was full of great characters and great action. It was a big hit, and I was really happy to see it. It's a very unique movie and it's very unique in that the movie has a lot of great characters and great action'

text with temperature 1.0:
  1. 'The movie was full of scenes of the people of the city that were destroyed or just a lot of empty places that weren't anything special," says Ben Johnson, who worked on the film as a security engineer. "Things that you just don't see'

text with temperature 1.5:
  1. 'The movie was full of plot and a fair bit of comedy that also reminded me more and greater of my old "Kurt-a-loo movie" films about dinosaurs in my youth... but a bit bland, didn't keep you interested or'

text with temperature 2.0:
  1. 'The movie was full of nudity and graphic sexual themes, a huge increase upon a 2012 debut starring Tatum. Fans often accused The Good Witch being violent and sexualized on that movie with plenty of hot action as they waited until 2026 to do "'
```
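The problem also asks to vary the decoding method, but the cell above varies only the temperature. The sketch below shows what each of the following does to a next-token distribution: temperature scaling, greedy decoding, top-k filtering, and nucleus (top-p) filtering. The logits are hypothetical. The sketch mirrors the ideas behind the `temperature`, `do_sample`, `top_k`, and `top_p` generation parameters in `transformers`, but it is an illustrative reimplementation, not the library's code.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    """Divide logits by T before the softmax: T < 1 sharpens, T > 1 flattens."""
    return [x / temperature for x in logits]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def sample(probs, rng):
    """Draw one token index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits
for t in (0.5, 1.0, 2.0):
    print(t, [round(q, 3) for q in softmax(apply_temperature(logits, t))])

rng = random.Random(0)
print("greedy:", max(range(len(logits)), key=lambda i: logits[i]))
print("top-k=2 sample:", sample(top_k_filter(softmax(logits), 2), rng))
print("top-p=0.9 sample:", sample(top_p_filter(softmax(logits), 0.9), rng))
```

With the real pipeline, you can run the same comparison by passing other options to `generator(...)`. For example, use `do_sample=False` for greedy decoding, `num_beams=5` for beam search, or `do_sample=True` together with `top_k=50` or `top_p=0.9`.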


## 92. Computing the Probabilities of the Predicted Text

Predict a continuation of "The movie was full of" and display the likelihood of each generated word. Long generations make the output hard to read, so stop generating at a reasonable length.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "openai-community/gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

sentence = "The movie was full of"
inputs = tokenizer(sentence, return_tensors="pt")
input_ids = inputs.input_ids

max_new_tokens = 20
with torch.no_grad():
    outputs = model.generate(
        **inputs,  # passes attention_mask as well as input_ids
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding
        pad_token_id=tokenizer.eos_token_id,
        return_dict_in_generate=True,
        output_scores=True,
    )

generated_ids = outputs.sequences[0]
new_tokens = generated_ids[input_ids.shape[1]:].tolist()

# outputs.scores holds one (batch, vocab) score tensor per generated step
token_probs = []
for step, score in enumerate(outputs.scores):
    probs = F.softmax(score[0], dim=-1)
    token_probs.append(probs[new_tokens[step]].item())

print("\n[Generated text]:")
print(tokenizer.decode(generated_ids))

for i, (token_id, prob) in enumerate(zip(new_tokens, token_probs), start=1):
    token_str = tokenizer.decode([token_id])
    print(f"{i:2d}. [Token]: '{token_str}', [Probability]: {prob:.6f}")
```


```text
[Generated text]:
The movie was full of jokes and jokes about how the movie was a joke. It was a joke about how the movie was
 1. [Token]: ' jokes', [Probability]: 0.021892
 2. [Token]: ' and', [Probability]: 0.289225
 3. [Token]: ' jokes', [Probability]: 0.098501
 4. [Token]: ' about', [Probability]: 0.205558
 5. [Token]: ' how', [Probability]: 0.099715
 6. [Token]: ' the', [Probability]: 0.084637
 7. [Token]: ' movie', [Probability]: 0.036412
 8. [Token]: ' was', [Probability]: 0.296344
 9. [Token]: ' a', [Probability]: 0.067677
10. [Token]: ' joke', [Probability]: 0.173507
11. [Token]: '.', [Probability]: 0.280386
12. [Token]: ' It', [Probability]: 0.123000
13. [Token]: ' was', [Probability]: 0.519725
14. [Token]: ' a', [Probability]: 0.149313
15. [Token]: ' joke', [Probability]: 0.268987
16. [Token]: ' about', [Probability]: 0.424155
17. [Token]: ' how', [Probability]: 0.174168
18. [Token]: ' the', [Probability]: 0.123645
19. [Token]: ' movie', [Probability]: 0.616076
20. [Token]:
```
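The cell prints plain probabilities. The likelihood of a whole continuation is their product, which underflows quickly, so it is usually accumulated as a sum of log-probabilities instead. The sketch below uses the first five probabilities from the output above. The first generated token, ' jokes' (0.021892), is the top-1 token from problem 90, which is what greedy decoding should pick.

```python
import math

# Per-token probabilities of the first five generated tokens, copied from the output above
token_probs = [0.021892, 0.289225, 0.098501, 0.205558, 0.099715]

log_probs = [math.log(p) for p in token_probs]
sequence_log_prob = sum(log_probs)           # log P(continuation | prompt)
sequence_prob = math.exp(sequence_log_prob)  # equals the product of the per-token probabilities
mean_log_prob = sequence_log_prob / len(log_probs)

print(f"log P = {sequence_log_prob:.4f}, P = {sequence_prob:.3e}")
print(f"per-token perplexity of this continuation = {math.exp(-mean_log_prob):.2f}")
```

As an alternative to the manual softmax loop, `model.compute_transition_scores(outputs.sequences, outputs.scores, normalize_logits=True)` in `transformers` returns per-token log-probabilities for the generated tokens.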

## 93. Perplexity

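Prepare some sentences and measure their perplexity with a pretrained language model. For example, measure the perplexity of the following four sentences and observe the results:

+ The movie was full of surprises
+ The movies were full of surprises
+ The movie were full of surprises
+ The movies was full of surprises

(The last two sentences deliberately contain grammatical errors.)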

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "openai-community/gpt2"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

sentences = [
    "The movie was full of surprises",
    "The movies were full of surprises",
    "The movie were full of surprises",
    "The movies was full of surprises"
]

def calculate_perplexity(sentence):
    input_ids = tokenizer(sentence, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        # With labels=input_ids the model shifts the labels internally and returns the
        # mean cross-entropy over every token except the first (which has no context).
        outputs = model(input_ids, labels=input_ids)
    return torch.exp(outputs.loss).item()

for sentence in sentences:
    ppl = calculate_perplexity(sentence)
    print(f"'{sentence}' → Perplexity: {ppl:.2f}")
```


```text
'The movie was full of surprises' → Perplexity: 99.35
'The movies were full of surprises' → Perplexity: 126.48
'The movie were full of surprises' → Perplexity: 278.88
'The movies was full of surprises' → Perplexity: 274.66
```
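Perplexity is the exponential of the negative mean log-probability of the predicted tokens: `PPL = exp(-(1/N) * sum_i log p(w_i | w_<i))`. The two grammatical sentences score 99.35 and 126.48. The ungrammatical ones are roughly two to three times higher (278.88 and 274.66).

The GPT-2 tokenizer does not prepend a BOS token, so `labels=input_ids` never scores the first word. Prepending `tokenizer.bos_token` (`<|endoftext|>` for GPT-2) would let that word be scored too. The dependency-free sketch below reproduces the formula and the label shift. The token IDs are the ones for "The movie was full of" from problem 90.

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean log-probability over the predicted tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def shifted_pairs(token_ids):
    """Pair each context prefix with the next token it must predict,
    which is what passing labels=input_ids does internally."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

# A model that gives probability 1/4 to every observed token has perplexity 4
print(perplexity([math.log(0.25)] * 5))

# "The movie was full of" has 5 tokens, but only 4 of them are predicted
for context, target in shifted_pairs([464, 3807, 373, 1336, 286]):
    print(context, "->", target)
```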


## 94. Chat Templates

To generate a response to the question "What do you call a sweet eaten after dinner?", apply a chat template and build the prompt to give to the language model. Then generate and display the response to that prompt.

```python
from transformers import AutoTokenizer, pipeline
import os

access_token = os.environ["HUGGING_FACE_TOKEN"]
model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, token=access_token)
generator = pipeline("text-generation", model=model_id, tokenizer=tokenizer, token=access_token)

chat = [
    {"role": "user", "content": "What do you call a sweet eaten after dinner?"}
]
# Render the conversation as a string; add_generation_prompt=True appends the
# assistant header so that the model continues as the assistant.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(f"Prompt: {prompt}")
response = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.7)
print(f"Response: {response[0]['generated_text']}")
```

```text
Prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 25 Jun 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

What do you call a sweet eaten after dinner?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Response: <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 25 Jun 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

What do you call a sweet eaten after dinner?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Some common names for a sweet eaten after dinner include:

1. Dessert
2. Sweet
3. Snack
4. Dessert
5. Sweet treat
6. Candy
7. Dessert
8. Sweetie
```
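To make the template's output concrete, the helper below is a hypothetical, simplified re-creation of the Llama 3 layout visible in the prompt above. The real template is a Jinja template shipped with the tokenizer. It also inserts the default system block (knowledge cutoff and date), which this sketch omits. Real code should always call `tokenizer.apply_chat_template`, because special tokens and layout differ between models.

```python
def render_chat(messages, add_generation_prompt=True):
    """Hypothetical, simplified sketch of the Llama 3 prompt layout shown above.
    Omits the default system block that the real template inserts."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        text += message["content"].strip() + "<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so that the model writes the reply next
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

chat = [{"role": "user", "content": "What do you call a sweet eaten after dinner?"}]
print(render_chat(chat))
```

The response above repeats the whole prompt because the text-generation pipeline returns the full text by default. Passing `return_full_text=False` returns only the newly generated part.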


## 95. Multi-Turn Chat

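Following the response generated in problem 94, ask the additional question "Please give me the plural form of the word with its spelling in reverse order." Generate and display the response. Also inspect the prompt given to the language model at that point.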

```python
from transformers import AutoTokenizer, pipeline
import os

access_token = os.environ["HUGGING_FACE_TOKEN"]
model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, token=access_token)
generator = pipeline("text-generation", model=model_id, tokenizer=tokenizer, token=access_token)

# `response` and `prompt` come from problem 94; drop the echoed prompt to keep only the assistant's reply
assistant_response = response[0]["generated_text"].removeprefix(prompt).strip()

chat = [
    {"role": "user", "content": "What do you call a sweet eaten after dinner?"},
    {"role": "assistant", "content": assistant_response},
    {"role": "user", "content": "Please give me the plural form of the word with its spelling in reverse order."}
]

prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(prompt)

response = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
print(response[0]["generated_text"])

new_response = response[0]["generated_text"].removeprefix(prompt)
print("\nExtracted new response:")
print(new_response)
```

```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 25 Jun 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

What do you call a sweet eaten after dinner?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Some common names for a sweet eaten after dinner include:

1. Dessert
2. Sweet
3. Snack
4. Dessert
5. Sweet treat
6. Candy
7. Dessert
8. Sweetie<|eot_id|><|start_header_id|>user<|end_header_id|>

Please give me the plural form of the word with its spelling in reverse order.<|eot_id|><|start_header_id|>assistant<|end_header_id|>


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 25 Jun 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

What do you call a sweet eaten after dinner?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Some common names for a sweet eaten after dinner include:

1. Dessert
2. Sweet
3. Snack
4. Dessert
5. Sweet treat
```
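The captured output is cut off, so the model's second answer cannot be judged here. The expected answer is easy to compute, though: the intended word is "dessert", and its plural "desserts" spelled backwards is "stressed". The sketch computes that reference answer and builds the message history that the second call must contain. `expected_answer` and `append_turn` are hypothetical helpers. Pluralizing by appending "s" is an assumption that happens to hold for "dessert". The assistant turn uses a short placeholder instead of the full generated list.

```python
def expected_answer(word):
    """Reference answer: the plural of `word`, spelled backwards.
    Assumes naive pluralization (append 's'), which holds for 'dessert'."""
    return (word + "s")[::-1]

def append_turn(history, role, content):
    """Return a new message list with one more turn; the input list is not modified."""
    return history + [{"role": role, "content": content}]

history = [{"role": "user", "content": "What do you call a sweet eaten after dinner?"}]
history = append_turn(history, "assistant", "Dessert")  # placeholder for the generated reply
history = append_turn(
    history, "user", "Please give me the plural form of the word with its spelling in reverse order."
)

print(expected_answer("dessert"))
print([m["role"] for m in history])
```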


## 96. Sentiment Analysis via Prompting

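We want to perform sentiment analysis with a pretrained language model. Give the pretrained model a prompt that contains the text and have it predict whether the text is positive or negative, without any fine-tuning. Using this strategy, measure the accuracy on the development set of [SST-2](https://dl.fbaipublicfiles.com/glue/data/SST-2.zip).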

```python
from transformers import AutoTokenizer, pipeline
import os
from datasets import load_dataset

access_token = os.environ["HUGGING_FACE_TOKEN"]
model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, token=access_token)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    token=access_token,
    pad_token_id=tokenizer.eos_token_id
)

# The GLUE SST-2 validation split serves as the development set
dev_dataset = load_dataset("glue", "sst2", split="validation")

def sentiment_analysis(sentence):
    prompt = (
        "You are a professional sentiment analysis model. Your task is to classify the sentiment of the given sentence as positive or negative. If the sentiment is positive, respond with '1'. If it is negative, respond with '0'.\n\n"
        f"Sentence: {sentence}\nSentiment:"
    )
    # The model's default generation config may enable sampling; pass do_sample=False for reproducible predictions
    response = generator(prompt, max_new_tokens=5, return_full_text=False)[0]["generated_text"].strip()
    # An empty completion counts as a wrong prediction instead of raising IndexError
    return response[0] if response else ""


correct = 0
total = len(dev_dataset)

for example in dev_dataset:
    sentence = example["sentence"]
    label = str(example["label"])
    pred = sentiment_analysis(sentence)
    if pred == label:
        correct += 1

accuracy = correct / total
print(f"Accuracy on SST-2 dev set: {accuracy:.4f}")
```


```text
Accuracy on SST-2 dev set: 0.5791
```
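Taking `response.strip()[0]` as the prediction is fragile. It raises `IndexError` on an empty completion, and it silently mislabels answers such as "Positive (1)". Below is a sketch of a more defensive parser and the accuracy computation. `parse_label` is a hypothetical helper: it scans the short completion for the first `0` or `1` and counts unparseable outputs as wrong. The completions are made-up examples.

```python
def parse_label(generated_text):
    """Return the first '0' or '1' found in the completion, or None if there is none."""
    for ch in generated_text.strip():
        if ch in "01":
            return ch
    return None

def accuracy(predictions, labels):
    """Fraction of predictions equal to the gold labels; None never matches."""
    correct = sum(p == str(y) for p, y in zip(predictions, labels))
    return correct / len(labels)

completions = [" 1", "0\n", "", " Positive (1)"]  # hypothetical model outputs
preds = [parse_label(c) for c in completions]
print(preds)
print(accuracy(preds, [1, 0, 1, 0]))
```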


## 97. Sentiment Analysis Based on Embeddings

Encode texts as vectors with a pretrained language model. Then train a model that predicts polarity labels by passing those vectors through a feed-forward layer.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer
from tqdm.auto import tqdm
import numpy as np

train_dataset = load_dataset("glue", "sst2", split="train")
dev_dataset = load_dataset("glue", "sst2", split="validation")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token; padding is added on the right
encoder = AutoModel.from_pretrained(model_name).to(device)
encoder.eval()

def get_embeddings(texts, batch_size=32):
    """Encode each text as the final hidden state of its last non-padding token."""
    all_embeddings = []

    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i + batch_size]
        inputs = tokenizer(batch_texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
        input_ids = inputs["input_ids"].to(device)
        attention_mask = inputs["attention_mask"].to(device)

        with torch.no_grad():
            outputs = encoder(input_ids=input_ids, attention_mask=attention_mask)
            last_hidden = outputs.last_hidden_state  # (batch, seq_len, hidden)

            # Index of the last real token (valid because padding is on the right).
            # In a causal model, only this position has attended to the whole sentence.
            lengths = attention_mask.sum(dim=1) - 1
            batch_indices = torch.arange(input_ids.size(0), device=device)
            embeddings = last_hidden[batch_indices, lengths]

        all_embeddings.append(embeddings.cpu().numpy())

    return np.vstack(all_embeddings)

train_texts = list(train_dataset["sentence"])
train_labels = list(train_dataset["label"])
train_embeddings = get_embeddings(train_texts)

dev_texts = list(dev_dataset["sentence"])
dev_labels = list(dev_dataset["label"])
dev_embeddings = get_embeddings(dev_texts)

train_data = TensorDataset(
    torch.tensor(train_embeddings, dtype=torch.float32),
    torch.tensor(train_labels, dtype=torch.long)
)
dev_data = TensorDataset(
    torch.tensor(dev_embeddings, dtype=torch.float32),
    torch.tensor(dev_labels, dtype=torch.long)
)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
dev_loader = DataLoader(dev_data, batch_size=64)

class SentimentClassifier(nn.Module):
    """Feed-forward head on top of frozen GPT-2 sentence embeddings."""
    def __init__(self, input_dim=768, hidden_dim=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, 2)
        )

    def forward(self, x):
        return self.classifier(x)

classifier = SentimentClassifier().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(classifier.parameters(), lr=3e-5)

num_epochs = 3
best_accuracy = 0.0

for epoch in range(num_epochs):
    classifier.train()
    train_loss = 0.0

    for embeddings, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}"):
        embeddings, labels = embeddings.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = classifier(embeddings)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()

    avg_train_loss = train_loss / len(train_loader)

    classifier.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for embeddings, labels in tqdm(dev_loader, desc="Evaluating"):
            embeddings, labels = embeddings.to(device), labels.to(device)
            predicted = classifier(embeddings).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = correct / total
    print(f"Epoch {epoch+1}: Train Loss = {avg_train_loss:.4f}, Validation Accuracy = {accuracy:.4f}")

    # Keep the checkpoint with the best validation accuracy
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        torch.save(classifier.state_dict(), "sentiment_classifier.pt")

print(f"Best validation accuracy: {best_accuracy:.4f}")
```


```text
Epoch 1: Train Loss = 0.6801, Validation Accuracy = 0.7764
Epoch 2: Train Loss = 0.5236, Validation Accuracy = 0.7947
Epoch 3: Train Loss = 0.4673, Validation Accuracy = 0.8016
Best validation accuracy: 0.8016
```
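The encoder pools each sentence by taking the hidden state at the last non-padding token (`attention_mask.sum(dim=1) - 1`). This index is only correct with right padding, which is the GPT-2 tokenizer's default. Mean pooling over the unpadded positions is a common alternative. Whether it works better on SST-2 was not tested here. The plain-list sketch below contrasts the two strategies on a toy batch of 2-dimensional hidden states. The values are made up.

```python
def last_token_pool(hidden, mask):
    """hidden: [batch][seq][dim] nested lists; mask: [batch][seq] of 0/1 with right padding.
    Returns the vector at the last position whose mask is 1."""
    return [seq[sum(m) - 1] for seq, m in zip(hidden, mask)]

def mean_pool(hidden, mask):
    """Average the vectors at the positions whose mask is 1."""
    pooled = []
    for seq, m in zip(hidden, mask):
        kept = [vec for vec, keep in zip(seq, m) if keep]
        pooled.append([sum(vals) / len(kept) for vals in zip(*kept)])
    return pooled

# Two sequences of length 3; the second one is right-padded by one position
hidden = [
    [[1.0, 0.0], [2.0, 0.0], [3.0, 3.0]],
    [[4.0, 4.0], [0.0, 2.0], [9.0, 9.0]],  # the last vector belongs to a pad token
]
mask = [[1, 1, 1], [1, 1, 0]]

print(last_token_pool(hidden, mask))
print(mean_pool(hidden, mask))
```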


## 98. Fine-Tuning

Fine-tune the pretrained model so that, given the prompt from problem 96, it responds with the correct sentiment label as text.

```python
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig,
)
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
)

dataset = load_dataset("glue", "sst2")
train_data = dataset["train"]
val_data = dataset["validation"]

def format_prompt(example):
    # Instruction based on problem 96, followed by the gold label as the expected answer
    return {
        "text": (
            "You are a professional sentiment analysis model. "
            "Your task is to classify the sentiment of the given sentence as positive or negative. "
            "If it is positive, respond with '1'. If it is negative, respond with '0'.\n\n"
            f"Sentence: {example['sentence']}\nSentiment: {example['label']}"
        )
    }

train_data = train_data.map(format_prompt)
val_data = val_data.map(format_prompt)

model_name = "meta-llama/Llama-3.2-1B-Instruct"
access_token = os.environ.get("HUGGING_FACE_TOKEN")

tokenizer = AutoTokenizer.from_pretrained(model_name, token=access_token)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(example):
    return tokenizer(example["text"], padding="max_length", truncation=True, max_length=256)

train_tokenized = train_data.map(tokenize, batched=True, remove_columns=train_data.column_names)
val_tokenized = val_data.map(tokenize, batched=True, remove_columns=val_data.column_names)


# LoRA + 8-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    token=access_token,
    quantization_config=bnb_config,
    device_map={"": 0},
)

model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r=16,  # rank of the LoRA adapters
    lora_alpha=32,  # alpha used for scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # layers that receive LoRA adapters
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Wrap the base model with LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# mlm=False: causal LM objective; labels are a copy of input_ids with padding masked
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

training_args = TrainingArguments(
    output_dir="./model/finetuned-llama-sst2-lora",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    eval_strategy="epoch",  # older transformers releases call this evaluation_strategy
    save_strategy="epoch",
    num_train_epochs=3,
    weight_decay=0.01,
    logging_steps=50,
    fp16=torch.cuda.is_available(),
    report_to=["tensorboard"],
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized,
    eval_dataset=val_tokenized,
    data_collator=data_collator,
)

trainer.train()
trainer.save_model("./model/finetuned-llama-sst2-lora-final")
```

```text
trainable params: 3,407,872 || all params: 1,239,222,272 || trainable%: 0.2750
```

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 0 | 0.5961 | 1.226195 |
| 1 | 0.4501 | 1.327517 |
| 2 | 0.3837 | 1.419619 |
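`DataCollatorForLanguageModeling(mlm=False)` copies `input_ids` into `labels` and masks only padding (here the pad token is also EOS). The loss above is therefore averaged over the whole instruction and sentence, not just the one-token answer. A common alternative is completion-only training: set the labels of every prompt position to -100, which is the default `ignore_index` of PyTorch's cross-entropy loss. The sketch builds one such example from token-ID lists. `build_completion_only_example` is a hypothetical helper, and the token IDs are placeholders for what the tokenizer would produce. Hugging Face causal LMs shift labels internally, so the labels stay aligned with `input_ids`.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the cross-entropy loss

def build_completion_only_example(prompt_ids, answer_ids, pad_id, max_length):
    """Concatenate prompt and answer, pad on the right, and compute loss only on the answer."""
    input_ids = (prompt_ids + answer_ids)[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + answer_ids)[:max_length]
    attention_mask = [1] * len(input_ids)
    pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * pad,
        "attention_mask": attention_mask + [0] * pad,
        "labels": labels + [IGNORE_INDEX] * pad,
    }

# Hypothetical token IDs: a three-token prompt and a one-token answer
example = build_completion_only_example([101, 102, 103], [16], pad_id=0, max_length=6)
print(example)
```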


```python
from transformers import pipeline

# The saved directory holds a LoRA adapter; with peft installed it is loaded on top of its base model
pipe = pipeline("text-generation", model="./model/finetuned-llama-sst2-lora-final", tokenizer=tokenizer)

def predict(sentence):
    prompt = (
        "You are a professional sentiment analysis model. "
        "Your task is to classify the sentiment of the given sentence as positive or negative. "
        "If it is positive, respond with '1'. If it is negative, respond with '0'.\n\n"
        f"Sentence: {sentence}\nSentiment:"
    )
    response = pipe(prompt, max_new_tokens=5, return_full_text=False)[0]["generated_text"].strip()
    # An empty completion counts as a wrong prediction instead of raising IndexError
    return response[0] if response else ""

correct = 0
total = 0

for ex in val_data:
    pred = predict(ex["sentence"])
    gold = str(ex["label"])
    if pred == gold:
        correct += 1
    total += 1

print(f"Validation Accuracy: {correct / total:.4f}")
```


```text
Validation Accuracy: 0.6571
```


## 99. Preference Tuning

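Using the prompt from problem 96, apply preference tuning to the pretrained language model. Treat text that contains the correct sentiment label as the preferred response and text that contains the wrong sentiment label as the dispreferred response. Candidate preference-tuning algorithms include Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO).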


```python
import os
import torch
from datasets import load_dataset, Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model
from trl import DPOTrainer, DPOConfig

access_token = os.environ.get("HUGGING_FACE_TOKEN")
dataset = load_dataset("glue", "sst2")

prompt = (
    "You are a professional sentiment analysis model. "
    "Your task is to classify the sentiment of the given sentence as positive or negative. "
    "If the sentiment is positive, respond with '1'. If it is negative, respond with '0'.\n\n"
)

def create_dpo_data(examples, prompt):
    """Build (prompt, chosen, rejected) triples: the gold label is chosen, the other label rejected."""
    prompts, chosen_responses, rejected_responses = [], [], []

    for sentence, label in zip(examples["sentence"], examples["label"]):
        prompts.append(prompt + f"Sentence: {sentence}\nSentiment:")
        chosen_responses.append(str(label))
        rejected_responses.append("0" if label == 1 else "1")

    return {
        "prompt": prompts,
        "chosen": chosen_responses,
        "rejected": rejected_responses
    }

train_dpo_dict = create_dpo_data(dataset["train"], prompt)
train_dpo_dataset = Dataset.from_dict(train_dpo_dict)

model_name = "meta-llama/Llama-3.2-1B-Instruct"

bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, token=access_token)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    token=access_token,
    quantization_config=bnb_config,
    device_map={"": 0},  # device placement belongs to from_pretrained, not BitsAndBytesConfig
)

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

dpo_config = DPOConfig(
    output_dir="./model/llama-sst2-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    num_train_epochs=2,
    logging_steps=1000,
    optim="adamw_torch",
    report_to="tensorboard",
    save_strategy="epoch",
    bf16=torch.cuda.is_available(),
    save_total_limit=1,
    beta=0.1,
    max_length=512,
    padding_value=tokenizer.pad_token_id
)

dpo_trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT model, the base model with adapters disabled serves as the reference
    args=dpo_config,
    train_dataset=train_dpo_dataset,
    processing_class=tokenizer,  # recent trl releases; older ones take tokenizer=tokenizer
)

dpo_trainer.train()
dpo_trainer.save_model("./model/llama-sst2-dpo-final")
```


| Step | Training Loss |
|---|---|
| 10 | 0.6934 |
| 20 | 0.6893 |

```text
KeyboardInterrupt:
```
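DPO minimizes the following loss for each preference pair: `-log σ(β · [(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))])`. Here `y_w` is the chosen response, `y_l` is the rejected one, and `β` corresponds to `beta=0.1` in `DPOConfig`. At the start of training the policy equals the reference, so the margin is 0 and the loss is ln 2 ≈ 0.6931. This is consistent with the 0.6934 logged at step 10 above.

The sketch computes the per-pair loss from sequence log-probabilities. In this task each response is a single token, so the sequence log-probability is that token's log-probability. The log-probability values below are hypothetical.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given log pi(y|x) under the policy and the frozen reference."""
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference: loss is ln 2
print(dpo_loss(-2.3, -1.9, -2.3, -1.9))
# Raising the chosen answer's log-probability relative to the reference lowers the loss
print(dpo_loss(-1.0, -3.0, -2.3, -1.9))
```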

```python
from transformers import pipeline
from tqdm.auto import tqdm

# Uses `tokenizer`, `dataset`, and `access_token` from the training cell above.
# The model directory exists only after the DPO run finishes and is saved.
def evaluate_model():
    generator = pipeline(
        "text-generation",
        model="./model/llama-sst2-dpo-final",
        tokenizer=tokenizer,
        token=access_token
    )

    correct = 0
    total = len(dataset["validation"])

    for example in tqdm(dataset["validation"]):
        sentence = example["sentence"]
        label = str(example["label"])

        prompt = (
            "You are a professional sentiment analysis model. "
            "Your task is to classify the sentiment of the given sentence as positive or negative. "
            "If the sentiment is positive, respond with '1'. If it is negative, respond with '0'.\n\n"
            f"Sentence: {sentence}\nSentiment:"
        )

        response = generator(prompt, max_new_tokens=5, return_full_text=False)[0]["generated_text"].strip()
        # An empty completion counts as a wrong prediction instead of raising IndexError
        pred = response[0] if response else ""

        if pred == label:
            correct += 1

    return correct / total

accuracy = evaluate_model()
print(f"Accuracy after DPO fine-tuning: {accuracy:.4f}")
```