# 第10章: 事前学習済み言語モデル（GPT型）

## 90. 次単語予測

In [22]:
%pip install "transformers[torch]"

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting transformers[torch]
  Using cached transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting accelerate>=0.26.0 (from transformers[torch])
  Using cached accelerate-1.7.0-py3-none-any.whl.metadata (19 kB)
Using cached transformers-4.51.3-py3-none-any.whl (10.4 MB)
Using cached accelerate-1.7.0-py3-none-any.whl (362 kB)
Installing collected packages: transformers, accelerate
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2/2[0m [accelerate]2[0m [accelerate]
[1A[2KSuccessfully installed accelerate-1.7.0 transformers-4.51.3
Note: you may need to restart the kernel to use updated packages.


In [12]:
from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
import torch

set_seed(42)
text = "The movie was full of"
model_name = 'gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")
print("input_ids:", input_ids)

# 次のトークンの予測確率を取得
with torch.no_grad():  
  output = model(input_ids)
  next_token_logits = output.logits[0,-1,:]

# 確率を計算
scores = torch.softmax(next_token_logits, dim=-1)

# 上位10個を出力
topk = 10
topk_scores, topk_ids = torch.topk(scores, topk)
for topk_score, topk_id in zip(topk_scores, topk_ids):
  pred_token = tokenizer.decode([topk_id])
  print(f'{pred_token}: {topk_score:.4f}')

# GPT型の使い方    https://qiita.com/suzuki_sh/items/acf276b55085647bdd75
# CausalLMOutput  https://huggingface.co/docs/transformers/main_classes/output#transformers.modeling_outputs.CausalLMOutput

input_ids: tensor([[ 464, 3807,  373, 1336,  286]])
 jokes: 0.0219
 great: 0.0186
 laughs: 0.0115
 bad: 0.0109
 surprises: 0.0107
 references: 0.0105
 fun: 0.0100
 humor: 0.0074
 ": 0.0074
 the: 0.0067


## 91. 続きのテキストの予測

In [2]:
temp_list = [t * 0.2 for t in range(1, 6)]
topk_list = [k * 10 for k in range(1, 6)]

with torch.no_grad():
  for temp, topk in zip(temp_list, topk_list):
    output_ids = model.generate(
      input_ids,
      do_sample=True,
      temperature=temp,
      top_k=topk,
      pad_token_id=tokenizer.eos_token_id
    )
    preds = tokenizer.decode(output_ids.tolist()[0])
    print(f'temp={temp:.1f}, topk={topk}: {preds}')

# gptのtemperature  https://qiita.com/suzuki_sh/items/8e449d231bb2f09a510c

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


temp=0.2, topk=10: The movie was full of surprises, but it was also a great experience. I was able to get a lot of laughs out
temp=0.4, topk=20: The movie was full of hilarious moments, like the one where the character is trying to kill a girl who is trying to kill
temp=0.6, topk=30: The movie was full of jokes, jokes about women, jokes about men and so on, and I think it's a really
temp=0.8, topk=40: The movie was full of references to the history of the US military and their ability to defeat North Korea. The film was called
temp=1.0, topk=50: The movie was full of people complaining about the way the characters behaved towards other people, but it didn't have an obvious gender


## 92. 予測されたテキストの確率を計算

In [3]:
# 次のトークンの予測確率を取得
with torch.no_grad():
  output_ids = model.generate(input_ids, pad_token_id=tokenizer.eos_token_id)
  generated_tokens_ids = output_ids[0, input_ids.shape[1]:]
  output = model(output_ids)
  next_text_logits = output.logits[0, input_ids.shape[1]-1:, :]

# 各トークンの確率を計算
scores = torch.softmax(next_text_logits, dim=-1)
for i, token_id in enumerate(generated_tokens_ids):
  print(f'{tokenizer.decode([token_id])}: {scores[i, token_id]:.4f}')

 jokes: 0.0219
 and: 0.2892
 jokes: 0.0985
 about: 0.2056
 how: 0.0997
 the: 0.0846
 movie: 0.0364
 was: 0.2963
 a: 0.0677
 joke: 0.1735
.: 0.2804
 It: 0.1230
 was: 0.5197
 a: 0.1493
 joke: 0.2690
 about: 0.4242
 how: 0.1742
 the: 0.1236
 movie: 0.6161
 was: 0.6350


## 93. パープレキシティ

In [4]:
texts = [
  "The movie was full of surprises",
  "The movies were full of surprises",
  "The movie were full of surprises",
  "The movies was full of surprises"
]

tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(texts, return_tensors='pt', padding=True)
with torch.no_grad():
  outputs = model(inputs['input_ids'], attention_mask=inputs['attention_mask'], labels=inputs['input_ids'])

# パープレキシティの計算
shift_logits = outputs.logits[:, :-1, :].contiguous()
shift_labels = inputs['input_ids'][:, 1:].contiguous()
shift_mask = inputs['attention_mask'][:, 1:].contiguous()
batch_size, seq_len = shift_labels.shape
loss_fn = torch.nn.CrossEntropyLoss(reduction='none')
loss = loss_fn(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)).view(batch_size, seq_len)
loss = (loss * shift_mask).sum(dim=1) / shift_mask.sum(dim=1)
ppl = torch.exp(loss).tolist()

for i in range(len(texts)):
  print(f'{texts[i]}: {ppl[i]:.4f}')

# パープレキシティの計算  https://gotutiyan.hatenablog.com/entry/2022/02/23/133414

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


The movie was full of surprises: 99.3539
The movies were full of surprises: 126.4818
The movie were full of surprises: 278.8779
The movies was full of surprises: 274.6610


## 94. チャットテンプレート

In [5]:
import os

token = os.environ["HUGGING_FACE_TOKEN"]

model_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
model = AutoModelForCausalLM.from_pretrained(model_name, token=token)

prompt = "What do you call a sweet eaten after dinner?"
messages = [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512, pad_token_id=128001)
generated_ids = [
  output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

A sweet treat after dinner is often referred to as dessert.


## 95. マルチターンのチャット

In [6]:
prompt = "Please give me the plural form of the word with its spelling in reverse order."
messages.append({"role": "assistant", "content":response})
messages.append({"role": "user", "content": prompt})

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512, pad_token_id=128001)
generated_ids = [
  output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

The plural form of the word "dessert" is "desserts".


## 96. プロンプトによる感情分析

In [7]:
!wget https://dl.fbaipublicfiles.com/glue/data/SST-2.zip -P data/
!unzip -o data/SST-2.zip -d data/
!rm data/SST-2.zip

--2025-05-17 19:48:12--  https://dl.fbaipublicfiles.com/glue/data/SST-2.zip
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


18.65.112.7, 18.65.112.6, 18.65.112.33, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|18.65.112.7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7439277 (7.1M) [application/zip]
Saving to: ‘data/SST-2.zip’


2025-05-17 19:48:12 (49.6 MB/s) - ‘data/SST-2.zip’ saved [7439277/7439277]

Archive:  data/SST-2.zip
  inflating: data/SST-2/dev.tsv      
  inflating: data/SST-2/original/README.txt  
  inflating: data/SST-2/original/SOStr.txt  
  inflating: data/SST-2/original/STree.txt  
  inflating: data/SST-2/original/datasetSentences.txt  
  inflating: data/SST-2/original/datasetSplit.txt  


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


  inflating: data/SST-2/original/dictionary.txt  
  inflating: data/SST-2/original/original_rt_snippets.txt  
  inflating: data/SST-2/original/sentiment_labels.txt  
  inflating: data/SST-2/test.tsv     
  inflating: data/SST-2/train.tsv    


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [8]:
%pip install datasets

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-20.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting pandas (from datasets)
  Downloading pandas-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting aiohttp!=4.0.0a0,!=4.0.0a1 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading aiohttp-3.11.18-cp310-cp310-manyl

In [9]:
import pandas as pd
import re

# ファイルの読み込み
file_name = './data/SST-2/dev.tsv'
df = pd.read_csv(file_name, sep='\t')

# 一文に対しての感情分析
def sentiment_analysis(text):
  instructions = """
    Please determine the positive and negative aspects of the text. 
    If it's positive, output 1, if negative, output 0.
    You can only output 0 or 1.
  """
  prompt = f"""
    Instructions: {instructions},
    Text: {text}
  """
  messages = [
    {"role": "system", "content": "You are a helpful assistant. You can only output 0 or 1."},
    {"role": "user", "content": prompt}
  ]
  
  text = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
  )
  model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
  generated_ids = model.generate(**model_inputs, max_new_tokens=512, pad_token_id=128001)
  generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
  ]
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  return response

# 正解率の計算
correct = 0
for index, row in df.iterrows():
  response = sentiment_analysis(row['sentence'])
  if re.search(r"\b[01]\b", response) and int(re.findall(r"\b[01]\b", response)[0]) == row['label']:
    correct += 1
print(f"accuracy: {correct / len(df) * 100:.2f}%")

accuracy: 52.41%


## 97. 埋め込みに基づく感情分析

In [None]:
from transformers import (
  AutoModelForSequenceClassification,
  BatchEncoding, 
  DataCollatorWithPadding,
  TrainingArguments,
  Trainer
)
from datasets import Dataset
import numpy as np

# ファイルの読み込み
train_file_name = './data/SST-2/train.tsv'
dev_file_name = './data/SST-2/dev.tsv'
train_df = pd.read_csv(train_file_name, sep='\t', header=0)
dev_df = pd.read_csv(dev_file_name, sep='\t', header=0)
train_dataset = Dataset.from_pandas(train_df)
dev_dataset = Dataset.from_pandas(dev_df)

# モデルの読み込み
model_name = 'gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
if tokenizer.pad_token is None:
  tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# 前処理
def preprocess_text_classification(example: dict[str, str | int]) -> BatchEncoding:
  encoded_example = tokenizer(example["sentence"], truncation=True, max_length=512)
  encoded_example["labels"] = example["label"]
  return encoded_example
encoded_train_dataset = train_dataset.map(
  preprocess_text_classification, remove_columns=train_dataset.column_names
)
encoded_dev_dataset = dev_dataset.map(
  preprocess_text_classification, remove_columns=dev_dataset.column_names
)

# モデルの学習
training_args = TrainingArguments(
  output_dir="model/model_97",
  per_device_train_batch_size=32,
  per_device_eval_batch_size=32,
  learning_rate=2e-5,
  lr_scheduler_type="linear",
  warmup_ratio=0.1,
  num_train_epochs=5,
  save_strategy="epoch",
  logging_strategy="epoch",
  eval_strategy="epoch",
  load_best_model_at_end=True,
  metric_for_best_model="accuracy",
  fp16=True
)

def compute_accuracy(eval_pred: tuple[np.ndarray, np.ndarray]) -> dict[str, float]:
  predictions, labels = eval_pred
  predictions = np.argmax(predictions, axis=1)
  return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=model,
    train_dataset=encoded_train_dataset,
    eval_dataset=encoded_dev_dataset,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_accuracy,
)
trainer.train()

# モデルの評価
eval_results = trainer.evaluate()
print(eval_results)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/67349 [00:00<?, ? examples/s]

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Map:   0%|          | 0/872 [00:00<?, ? examples/s]



Epoch,Training Loss,Validation Loss,Accuracy
1,0.3638,0.238002,0.919725
2,0.2069,0.258629,0.913991
3,0.1624,0.247601,0.922018
4,0.1329,0.254076,0.923165
5,0.1157,0.271663,0.922018




{'eval_loss': 0.2540757358074188, 'eval_accuracy': 0.9231651376146789, 'eval_runtime': 2.0523, 'eval_samples_per_second': 424.879, 'eval_steps_per_second': 6.821, 'epoch': 5.0}


## 98. ファインチューニング

In [2]:
# プロンプトの追加
def add_prompt(text):
  instructions = """
    Please determine the positive and negative aspects of the text. 
    If it's positive, output negative, if negative, output positive.
    You can only output negative or positive.
  """
  prompt = f"""
    Instructions: {instructions},
    Text: {text}
    Answer:
  """
  return prompt

train_dataset = Dataset.from_pandas(train_df)
dev_dataset = Dataset.from_pandas(dev_df)

# モデルの読み込み
model = AutoModelForCausalLM.from_pretrained(model_name)
model.config.pad_token_id = tokenizer.pad_token_id

# 前処理
def preprocess_text_classification(example: dict[str, str | int]) -> BatchEncoding:
  sentences = [add_prompt(text) for text in example["sentence"]]
  labels = ["positive" if label == 1 else "negative" for label in example["label"]]
  texts = [sentence + label for sentence, label in zip(sentences, labels)]
  inputs = tokenizer(sentences, padding="max_length", truncation=True, max_length=128)
  inputs["labels"] = tokenizer(texts, padding="max_length", truncation=True, max_length=128)["input_ids"]
  inputs["labels"] = [
    [(token if token != tokenizer.pad_token_id else -100) for token in input_ids]
    for input_ids in inputs["labels"]
  ]
  return inputs

encoded_train_dataset = train_dataset.map(
  preprocess_text_classification, batched=True, remove_columns=train_dataset.column_names
)
encoded_dev_dataset = dev_dataset.map(
  preprocess_text_classification, batched=True, remove_columns=dev_dataset.column_names
)

# モデルの学習
training_args = TrainingArguments(
  output_dir="model/model_98",
  per_device_train_batch_size=8,
  per_device_eval_batch_size=8,
  learning_rate=2e-5,
  lr_scheduler_type="linear",
  warmup_ratio=0.1,
  num_train_epochs=5,
  save_strategy="epoch",
  logging_strategy="epoch",
  eval_strategy="epoch",
  load_best_model_at_end=True,
  metric_for_best_model="accuracy",
  fp16=True
)

# 評価関数の作成（predictionとlabelのindexを合わせる）
def compute_accuracy(eval_pred: tuple[np.ndarray, np.ndarray]) -> dict[str, float]:
  logits, labels = eval_pred
  predictions = np.argmax(logits, axis=-1)
  label_index_list = []
  for i in range(labels.shape[0]):
    label_index_list.append(np.where(labels[i] != -100)[0][-1])
  pred_labels = predictions[np.arange(labels.shape[0]), np.array(label_index_list)]
  true_labels = labels[np.arange(labels.shape[0]), np.array(label_index_list)]
  return {"accuracy": (pred_labels == true_labels).mean()}

trainer = Trainer(
  model=model,
  train_dataset=encoded_train_dataset,
  eval_dataset=encoded_dev_dataset,
  data_collator=data_collator,
  args=training_args,
  compute_metrics=compute_accuracy,
)
trainer.train()

# モデルの評価
eval_results = trainer.evaluate()
print(eval_results)

Map:   0%|          | 0/67349 [00:00<?, ? examples/s]

Map:   0%|          | 0/872 [00:00<?, ? examples/s]

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Epoch,Training Loss,Validation Loss,Accuracy
1,0.6963,1.009285,0.919725
2,0.4791,1.073673,0.917431
3,0.4122,1.150777,0.917431
4,0.3725,1.192434,0.916284
5,0.3518,1.217567,0.915138


There were missing keys in the checkpoint model loaded: ['lm_head.weight'].


{'eval_loss': 1.0092854499816895, 'eval_accuracy': 0.9197247706422018, 'eval_runtime': 11.2422, 'eval_samples_per_second': 77.565, 'eval_steps_per_second': 9.696, 'epoch': 5.0}


## 99. 選好チューニング

In [4]:
from trl import DPOConfig, DPOTrainer
from trl.trainer.utils import DPODataCollatorWithPadding

# 前処理
def convert_to_dpo_format(example: dict) -> dict:
  if example["label"] == 1:
    chosen = example["sentence"] + "positive"
    rejected = example["sentence"] + "negative"
  else:
    chosen = example["sentence"] + "negative"
    rejected = example["sentence"] + "positive"
  return {"prompt": example["sentence"], "chosen": chosen, "rejected": rejected}

train_dataset = Dataset.from_dict(train_df)
dev_dataset = Dataset.from_dict(dev_df)
train_dataset = train_dataset.map(
  convert_to_dpo_format, remove_columns=train_dataset.column_names
)
dev_dataset = dev_dataset.map(
  convert_to_dpo_format, remove_columns=dev_dataset.column_names
)

# モデルの学習
data_collator = DPODataCollatorWithPadding()
dpo_config = DPOConfig(
  output_dir="model/model_99",
  per_device_train_batch_size=8,
  per_device_eval_batch_size=8,
  learning_rate=2e-5,
  lr_scheduler_type="linear",
  warmup_ratio=0.1,
  num_train_epochs=3,
  save_strategy="epoch",
  logging_strategy="epoch",
  eval_strategy="epoch",
  load_best_model_at_end=True,
  fp16=True
)

trainer = DPOTrainer(
  model=model,
  train_dataset=train_dataset,
  eval_dataset=dev_dataset,
  args=dpo_config,
  processing_class=tokenizer
)
trainer.train()

# モデルの評価
eval_results = trainer.evaluate()
print(eval_results)

Map:   0%|          | 0/67349 [00:00<?, ? examples/s]

Map:   0%|          | 0/872 [00:00<?, ? examples/s]

Extracting prompt in train dataset:   0%|          | 0/67349 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/67349 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/67349 [00:00<?, ? examples/s]

Extracting prompt in eval dataset:   0%|          | 0/872 [00:00<?, ? examples/s]

Applying chat template to eval dataset:   0%|          | 0/872 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/872 [00:00<?, ? examples/s]

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


Epoch,Training Loss,Validation Loss,Rewards/chosen,Rewards/rejected,Rewards/accuracies,Rewards/margins,Logps/chosen,Logps/rejected,Logits/chosen,Logits/rejected
1,0.2469,0.713921,-7.919707,-14.764617,0.902523,6.84491,-211.613831,-324.97406,-85.944031,-85.876801
2,0.1425,0.519216,-1.016045,-8.202286,0.919725,7.186241,-142.577225,-259.350739,-98.998993,-98.446442
3,0.0471,0.550851,-0.543727,-8.290795,0.911697,7.747069,-137.854034,-260.23584,-100.604843,-100.222481


There were missing keys in the checkpoint model loaded: ['lm_head.weight'].


{'eval_loss': 0.5192157626152039, 'eval_runtime': 3.7994, 'eval_samples_per_second': 229.51, 'eval_steps_per_second': 28.689, 'eval_rewards/chosen': -1.016045093536377, 'eval_rewards/rejected': -8.202285766601562, 'eval_rewards/accuracies': 0.9197247624397278, 'eval_rewards/margins': 7.1862406730651855, 'eval_logps/chosen': -142.5772247314453, 'eval_logps/rejected': -259.3507385253906, 'eval_logits/chosen': -98.99899291992188, 'eval_logits/rejected': -98.44644165039062, 'epoch': 3.0}
