### Installations

In [1]:
%%capture
!pip3 install -q -U bitsandbytes
!pip3 install -q -U peft
!pip3 install -q -U trl
!pip3 install -q -U accelerate
!pip3 install -q -U datasets
# sentencepiece is needed by some BERTScore models; accelerate is already installed above
!pip3 install -q evaluate rouge_score bert_score sacrebleu nltk sentencepiece
!pip3 install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3

### Main Code

In [None]:
import json
import torch
import datasets
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    BitsAndBytesConfig,
    Gemma3ForCausalLM, # Correct import for Gemma 3
    TrainingArguments,
)
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    TaskType
)
from trl import SFTTrainer, SFTConfig
import os  # For environment variables and checkpoint paths
os.environ["HF_TOKEN"] = ""  # Set your Hugging Face access token here (Gemma is a gated model)



In [3]:
# --- 1. Configuration ---
model_id = "google/gemma-3-1b-it"
json_file_path = '/kaggle/input/cs563-nlp-group56-data/all_summaries.json'
output_dir = "./gemma3-1b-summarization-finetuned" # Directory to save the results
max_seq_length = 1024 # Adjust based on your data and GPU memory (Gemma 3 1B supports up to a 32K-token context)
use_packing = False # Pack multiple short sequences for efficiency (optional, can speed up)

In [4]:
# --- 2. Load and Prepare Dataset ---
print("Loading and formatting dataset...")
try:
    with open(json_file_path, 'r', encoding='utf-8') as f:
        raw_data = json.load(f)
    print(f"Successfully loaded {len(raw_data)} entries from {json_file_path}")
except FileNotFoundError:
    print(f"Error: The file '{json_file_path}' was not found.")
    raise  # Re-raise so the cell stops without killing the notebook kernel
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from '{json_file_path}'. Check file format.")
    raise
except Exception as e:
    print(f"An unexpected error occurred while loading the file: {e}")
    raise

formatted_data = []
skipped_count = 0
for i, example in enumerate(raw_data):
    if 'conversation' in example and 'summary' in example:
        # Using a simplified template suitable for SFTTrainer's default processing
        # You can customize this more if needed, e.g., adding chat tokens,
        # but SFTTrainer often handles standard instruction formats well.
        text = f"""Instruction: Please summarize the following empathetic dialogue conversation.

### Conversation:
{example['conversation']}

### Summary:
{example['summary']}"""

        formatted_data.append({"text": text})
    else:
        skipped_count += 1

if skipped_count > 0:
    print(f"Warning: Skipped {skipped_count} entries due to missing required keys.")

if not formatted_data:
    # Raise instead of exit() so the notebook kernel stays alive
    raise ValueError("No valid data found after formatting.")

dataset = Dataset.from_list(formatted_data)
print("Dataset created successfully:")
print(dataset)
print("\nExample entry (first formatted record):")
print(dataset[0]['text'][:500] + "...") # Print start of first example

# --- Split Dataset ---
print("\nSplitting dataset into train and evaluation sets...")
dataset_splits = dataset.train_test_split(test_size=0.2, seed=42) # Added seed for reproducibility
train_dataset = dataset_splits['train']
eval_dataset = dataset_splits['test'] # Use the 'test' split as evaluation set
print(f"Train dataset size: {len(train_dataset)}")
print(f"Evaluation dataset size: {len(eval_dataset)}")

Loading and formatting dataset...
Successfully loaded 1000 entries from /kaggle/input/cs563-nlp-group56-data/all_summaries.json
Dataset created successfully:
Dataset({
    features: ['text'],
    num_rows: 1000
})

Example entry (first formatted record):
Instruction: Please summarize the following empathetic dialogue conversation.

### Conversation:
Context: grateful

Prompt: I went to a park and I set on a bench. I didn't notice that my wallet felt. A man came to me from behind giving me back my wallet.

Conversation:
Speaker 7: Hi_comma_ I went to a park and I set on a bench. I didn't notice that my wallet felt. A man came to me from behind giving me back my wallet.
Speaker 5: Thats a sweet man_comma_ I hope you acknowledged his kind gesture. ...

Splitting dataset into train and evaluation sets...
Train dataset size: 800
Evaluation dataset size: 200
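
The prompt template built in the loop above can be factored into a small function so that the same string layout is available at evaluation time. The following is a minimal sketch; `build_text` and `format_records` are hypothetical helper names that do not appear in the notebook, and the record keys `conversation` and `summary` are the ones the loop checks for.

```python
TEMPLATE = (
    "Instruction: Please summarize the following empathetic dialogue conversation.\n\n"
    "### Conversation:\n{conversation}\n\n"
    "### Summary:\n{summary}"
)

def build_text(conversation: str, summary: str) -> str:
    """Render one training example in the notebook's prompt layout."""
    return TEMPLATE.format(conversation=conversation, summary=summary)

def format_records(records):
    """Return (formatted rows, number of skipped records)."""
    formatted, skipped = [], 0
    for rec in records:
        if "conversation" in rec and "summary" in rec:
            formatted.append({"text": build_text(rec["conversation"], rec["summary"])})
        else:
            skipped += 1  # record lacks a required key
    return formatted, skipped

rows, skipped = format_records([
    {"conversation": "Speaker 1: Hi.", "summary": "A greeting."},
    {"conversation": "Speaker 2: No summary here."},
])
print(len(rows), skipped)  # 1 1
```

Keeping the template in one place reduces the risk that the training text and the evaluation prompt drift apart.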


In [5]:
# --- 3. Load Model and Tokenizer ---
print(f"Loading model: {model_id}")

# Quantization config for 4-bit (NF4) loading with double quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4',
)

# Load model with quantization
model = Gemma3ForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    attn_implementation="eager",
    device_map="auto",
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set padding token if it's not already set
if tokenizer.pad_token is None:
    print("Tokenizer does not have a pad token. Setting pad_token = eos_token.")
    # Common practice: use eos_token as pad_token for autoregressive models
    tokenizer.pad_token = tokenizer.eos_token
    # Important: Update model config to use this padding ID
    model.config.pad_token_id = tokenizer.eos_token_id

Loading model: google/gemma-3-1b-it
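
To get a feel for what the 4-bit setting buys, the weight memory can be estimated with simple arithmetic. This is a rough sketch only: the parameter count is a round figure assumed from the "1b" in the model name, and `weight_memory_gib` is a hypothetical helper. Because bitsandbytes quantizes the linear layers while embeddings and norms stay in higher precision, and quantization constants add some overhead, the real footprint will differ from these numbers.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

NUM_PARAMS = 1_000_000_000  # assumption: round figure for a "1b" model

for label, bits in [("bf16/fp16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label:>9}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Activations, gradients for the LoRA weights, and optimizer state come on top of this estimate.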



In [6]:
# --- 4. PEFT Configuration (LoRA) ---
print("Configuring PEFT (LoRA)...")

# Prepare model for k-bit training (necessary for quantized models)
model = prepare_model_for_kbit_training(model)

# LoRA target modules (specific to the model architecture).
# The explicit list below is kept for reference only: the LoraConfig further down
# uses target_modules="all-linear", which adapts every linear layer except the
# output head and therefore also covers down_proj.
# To restrict the adapters, pass this list as target_modules instead, and inspect
# model.named_modules() if the names do not match.
lora_target_modules = [
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    # You might add/remove based on experimentation
]

peft_config = LoraConfig(
    lora_alpha=32,
    lora_dropout=0.05,
    r=16,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Add LoRA adapters to the model
# Note: SFTTrainer can also take the peft_config directly,
# but creating the PEFT model explicitly first ensures it's set up correctly.
model = get_peft_model(model, peft_config)
model.print_trainable_parameters() # See how many parameters are being trained

Configuring PEFT (LoRA)...
trainable params: 13,045,760 || all params: 1,012,931,712 || trainable%: 1.2879
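
The reported count of 13,045,760 trainable parameters can be reproduced by hand. A LoRA adapter of rank r on a linear layer with shape (in, out) adds r × (in + out) weights. The sketch below assumes the Gemma 3 1B dimensions (hidden size 1152, MLP size 6912, 4 query heads and 1 key/value head of dimension 256, 26 layers); these values are not stated in the notebook and should be checked against `model.config`.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Weights added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)

# Assumed Gemma 3 1B dimensions (verify with model.config)
HIDDEN, INTERMEDIATE, LAYERS = 1152, 6912, 26
Q_OUT = 4 * 256   # query heads * head_dim
KV_OUT = 1 * 256  # key/value heads * head_dim
R = 16            # LoRA rank used in the notebook

# (in_features, out_features) of each linear layer in one decoder block
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_param_count(i, o, R) for i, o in shapes.values())
total = per_layer * LAYERS
print(f"{total:,}")  # 13,045,760
```

Under these assumptions the total only matches when all seven projections are adapted, which is consistent with `target_modules="all-linear"`.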


In [7]:
from trl import SFTTrainer, SFTConfig

# --- 5. Training Arguments ---
print("Defining Training Arguments...")


args = SFTConfig(
    output_dir=output_dir,                  # directory for checkpoints; also used as the Hub repository id
    max_seq_length=max_seq_length,          # maximum sequence length; longer examples are truncated
    packing=use_packing,                    # if True, packs multiple short samples into a single sequence
    num_train_epochs=3,                     # number of training epochs
    per_device_train_batch_size=2,          # batch size per device during training
    gradient_accumulation_steps=8,          # micro-batches accumulated before each optimizer update
    gradient_checkpointing=True,            # use gradient checkpointing to save memory
    optim="adamw_torch_fused",              # use fused adamw optimizer
    logging_steps=20,                       # log every 20 steps
    save_strategy="epoch",                  # save checkpoint every epoch
    learning_rate=2e-4,                     # learning rate, based on QLoRA paper
    bf16=torch.cuda.is_bf16_supported(),
    fp16=not torch.cuda.is_bf16_supported(),
    max_grad_norm=0.3,                      # max gradient norm based on QLoRA paper
    warmup_ratio=0.03,                      # warmup ratio based on QLoRA paper
    lr_scheduler_type="cosine",             # cosine learning-rate decay after warmup
    push_to_hub=True,                       # push model to hub
    report_to="tensorboard",                # report metrics to tensorboard
    evaluation_strategy="no",               # no evaluation during training, so eval_steps below has no effect
    eval_steps=100,
    dataset_text_field="text",
)

Defining Training Arguments...
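
The batch settings above determine how many optimizer steps the run takes. The arithmetic below is a sketch that assumes a single GPU and the 800 training samples reported earlier; the trainer's exact rounding may differ between library versions when the dataset size is not a multiple of the effective batch size.

```python
import math

train_samples = 800   # size of the train split
per_device_batch = 2  # per_device_train_batch_size
grad_accum = 8        # gradient_accumulation_steps
num_devices = 1       # assumption: one GPU
epochs = 3            # num_train_epochs
warmup_ratio = 0.03

effective_batch = per_device_batch * grad_accum * num_devices
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 16 50 150 5
```

With `save_strategy="epoch"`, this gives one checkpoint every 50 steps, which matches the `checkpoint-50` directory evaluated later in the notebook.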




In [8]:
# --- 6. Initialize SFTTrainer ---
print("Initializing SFTTrainer...")

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    processing_class=tokenizer
)

Initializing SFTTrainer...



No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [9]:
# --- 7. Start Training ---
print("Starting training...")
train_result = trainer.train()
print("Training finished.")

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Starting training...


Step,Training Loss
20,2.1643
40,1.5107
60,1.3904
80,1.3264
100,1.2981
120,1.1869
140,1.2272


Training finished.


In [10]:
# --- 8. Save Model ---
print(f"Saving the final LoRA adapter model to {output_dir}")
# This saves only the trained adapter weights, not the base model.
# Use trainer.save_model() which handles PEFT saving correctly.
trainer.save_model(output_dir)

# Save tokenizer associated with the fine-tuned model
tokenizer.save_pretrained(output_dir)

# Optional: Log metrics
metrics = train_result.metrics
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()
print("Model adapter and tokenizer saved.")

Saving the final LoRA adapter model to ./gemma3-1b-summarization-finetuned
***** train metrics *****
  total_flos               =  2474012GF
  train_loss               =     1.4281
  train_runtime            = 0:37:43.55
  train_samples_per_second =       1.06
  train_steps_per_second   =      0.066
Model adapter and tokenizer saved.
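
The throughput figures in the metrics above follow directly from the runtime. As a quick check, assuming 800 training samples, 3 epochs, and an effective batch size of 16 (2 per device × 8 accumulation steps):

```python
runtime_s = 37 * 60 + 43.55   # train_runtime 0:37:43.55 in seconds
samples_seen = 800 * 3        # train samples * epochs
optimizer_steps = (800 // 16) * 3

samples_per_second = samples_seen / runtime_s
steps_per_second = optimizer_steps / runtime_s

print(round(samples_per_second, 2), round(steps_per_second, 3))  # 1.06 0.066
```

Both values agree with `train_samples_per_second` and `train_steps_per_second` as logged.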


In [11]:
import torch
from tqdm import tqdm
import evaluate
import nltk
import numpy as np
from torch.utils.data import DataLoader
from peft import PeftModel
from transformers import AutoTokenizer, Gemma3ForCausalLM, BitsAndBytesConfig
import os
import re # For sorting checkpoints

print("\n--- Starting Post-Training Evaluation ---")

# --- Evaluation Parameters ---
EVAL_BATCH_SIZE = 4
MAX_NEW_TOKENS = 120
NUM_BEAMS = 2
EARLY_STOPPING = True

# --- Load Evaluation Metrics (ROUGE, BLEU, BERTScore, METEOR) ---
try:
    rouge = evaluate.load('rouge')
    bleu = evaluate.load('sacrebleu')
    bertscore = evaluate.load("bertscore")
    meteor = evaluate.load('meteor')
    nltk.download('punkt', quiet=True)
    print("Evaluation metrics (ROUGE, BLEU, BERTScore, METEOR) loaded successfully.")
    metrics_loaded = True
except Exception as e:
    print(f"Error loading metrics: {e}")
    print("Evaluation will proceed without calculating scores.")
    metrics_loaded = False

# --- Helper Functions ---
def extract_prompt_and_ref(example_text):
    try:
        summary_marker = "### Summary:\n"
        summary_start_index = example_text.index(summary_marker)
        prompt = example_text[:summary_start_index + len(summary_marker)]
        reference = example_text[summary_start_index + len(summary_marker):].strip()
        return prompt.strip(), reference
    except ValueError:
        # print(f"Warning: Could not find '{summary_marker}' in text: {example_text[:100]}...")
        return None, None
    except Exception as e:
        # print(f"Error processing text: {e}")
        return None, None

def collate_fn(batch):
    return [item['text'] for item in batch] # Assuming 'text' field exists

# --- Prepare DataLoader for Evaluation ---
# Ensure eval_dataset is defined (from your data preparation step)
eval_loader = DataLoader(eval_dataset, batch_size=EVAL_BATCH_SIZE, collate_fn=collate_fn)
print(f"Evaluation DataLoader prepared with {len(eval_dataset)} samples.")

# --- Evaluation Function for a Single Checkpoint ---
def run_evaluation_on_checkpoint(adapter_checkpoint_path, base_model_id, eval_loader, max_new_tokens, num_beams, early_stopping):
    """Loads a model checkpoint and runs the evaluation loop."""
    print(f"\n--- Evaluating Checkpoint: {adapter_checkpoint_path or 'Base Model'} ---")

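    # Quantization config for evaluation. Note: this loads the base model in 8-bit,
    # whereas training used 4-bit NF4; reuse the training config for a like-for-like comparison.
    quantization_config = BitsAndBytesConfig(load_in_8bit=True)
    compute_dtype = torch.float16 # Or bf16 if supported and used in training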

    # Load the base model
    print("Loading base model...")
    base_model = Gemma3ForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=quantization_config,
        torch_dtype=compute_dtype,
        device_map="auto",
    )

    # Load the tokenizer from the base model.
    # The copy saved to output_dir after training could be used instead.
    print("Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    if tokenizer.pad_token is None:
        print("Setting pad_token = eos_token")
        tokenizer.pad_token = tokenizer.eos_token
        # Important: Align model config if tokenizer changed AFTER base model load
        # base_model.config.pad_token_id = tokenizer.pad_token_id

    # Load the LoRA adapter ON TOP of the base model
    if adapter_checkpoint_path: # If path is provided, load adapter
        print(f"Loading adapter from: {adapter_checkpoint_path}")
        model = PeftModel.from_pretrained(base_model, adapter_checkpoint_path)
        print("Adapter loaded.")
    else: # If no path, evaluate the base model without adapters
        print("Evaluating the base model without adapters.")
        model = base_model # Use the base model directly

    model.eval() # Set to evaluation mode
    device = model.device
    print(f"Model loaded on device: {device}")

    predictions = []
    references = []

    # Evaluation Loop
    with torch.no_grad():
        for batch_texts in tqdm(eval_loader, desc=f"Generating Summaries ({os.path.basename(str(adapter_checkpoint_path)) if adapter_checkpoint_path else 'Base'})"):
            batch_prompts = []
            batch_refs = []
            for text in batch_texts:
                prompt, ref = extract_prompt_and_ref(text)
                if prompt and ref:
                    batch_prompts.append(prompt)
                    batch_refs.append(ref)

            if not batch_prompts: continue

            # Tokenize prompts
            # Left padding is generally preferred for generation with HF models
            tokenizer.padding_side = "left"
            inputs = tokenizer(
                batch_prompts,
                return_tensors="pt",
                padding=True,
                truncation=True,
                max_length=1024 # Adjust if needed based on max_seq_length - max_new_tokens
            ).to(device)

            # Generate
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                num_beams=num_beams,
                early_stopping=early_stopping,
                pad_token_id=tokenizer.pad_token_id
            )

            # Decode generated sequences (excluding prompt)
            input_length = inputs.input_ids.shape[1]
            generated_ids = outputs[:, input_length:]
            decoded_preds = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
            cleaned_preds = [pred.strip() for pred in decoded_preds]

            predictions.extend(cleaned_preds)
            references.extend(batch_refs)

    # Calculate Metrics
    results = {"num_samples": len(predictions)}
    if not metrics_loaded or not predictions or not references:
        print("Skipping metric calculation.")
        return results, predictions, references # Return empty metrics if loading failed or no results

    print("Calculating scores...")
    try:
        rouge_results = rouge.compute(predictions=predictions, references=references)
        results.update({
            "rouge1": rouge_results['rouge1'] * 100,
            "rouge2": rouge_results['rouge2'] * 100,
            "rougeL": rouge_results['rougeL'] * 100,
        })
    except Exception as e: print(f"ROUGE calculation failed: {e}")

    try:
        bleu_references = [[ref] for ref in references]
        bleu_results = bleu.compute(predictions=predictions, references=bleu_references)
        results["bleu"] = bleu_results['score']
    except Exception as e: print(f"BLEU calculation failed: {e}")

    try:
        meteor_results = meteor.compute(predictions=predictions, references=references)
        results["meteor"] = meteor_results['meteor'] * 100
    except Exception as e: print(f"METEOR calculation failed: {e}")

    try:
        bertscore_results = bertscore.compute(
            predictions=predictions, references=references, lang="en", device=device,
            # Consider a smaller/faster model if DeBERTa is too slow/memory-intensive
            # model_type="distilbert-base-uncased"
            model_type="microsoft/deberta-xlarge-mnli"
        )
        results["bertscore_f1"] = np.mean(bertscore_results['f1']) * 100
    except Exception as e:
        print(f"BERTScore calculation failed: {e}")
        results["bertscore_f1"] = "N/A"

    # Length Analysis
    pred_lengths = [len(tokenizer.encode(p)) for p in predictions]
    ref_lengths = [len(tokenizer.encode(r)) for r in references]
    results["avg_pred_len"] = np.mean(pred_lengths) if pred_lengths else 0
    results["avg_ref_len"] = np.mean(ref_lengths) if ref_lengths else 0

    # Clean up model to free memory before next checkpoint evaluation
    del model
    del base_model
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    return results, predictions, references


# --- Main Evaluation Execution ---
all_results = {}
base_model_id = "google/gemma-3-1b-it"

# Evaluate the base model before fine-tuning (Epoch 0) as a baseline.
# Comment out this block to skip the baseline run.
print("\n--- Evaluating Base Model (Epoch 0) ---")
base_results, _, _ = run_evaluation_on_checkpoint(
    adapter_checkpoint_path=None, # Pass None to evaluate base model
    base_model_id=base_model_id,
    eval_loader=eval_loader,
    max_new_tokens=MAX_NEW_TOKENS,
    num_beams=NUM_BEAMS,
    early_stopping=EARLY_STOPPING
)
all_results["epoch_0"] = base_results

# Find checkpoint directories in the output directory
checkpoint_dir = output_dir # Directory where checkpoints were saved
checkpoint_folders = sorted(
    [os.path.join(checkpoint_dir, d) for d in os.listdir(checkpoint_dir) if d.startswith("checkpoint-") and os.path.isdir(os.path.join(checkpoint_dir, d))],
    key=lambda x: int(re.search(r"checkpoint-(\d+)", x).group(1))
)

print(f"Found {len(checkpoint_folders)} checkpoint(s) for evaluation.")

# Evaluate each checkpoint
for i, ckpt_path in enumerate(checkpoint_folders):
    epoch_num = i + 1
    epoch_results, epoch_preds, epoch_refs = run_evaluation_on_checkpoint(
        adapter_checkpoint_path=ckpt_path,
        base_model_id=base_model_id,
        eval_loader=eval_loader,
        max_new_tokens=MAX_NEW_TOKENS,
        num_beams=NUM_BEAMS,
        early_stopping=EARLY_STOPPING
    )
    all_results[f"epoch_{epoch_num}"] = epoch_results

    # Optional: Print some examples from this epoch's evaluation
    print(f"\n--- Example Generation from Epoch {epoch_num} ---")
    num_examples_to_show = 2 # Show fewer examples per epoch
    for k in range(min(num_examples_to_show, len(epoch_preds))):
        print(f"\nExample {k+1}:")
        print(f"Reference:\n{epoch_refs[k]}")
        print(f"Generated:\n{epoch_preds[k]}")
        print("-" * 15)


# --- Final Summary ---
print("\n--- Overall Evaluation Summary Across Epochs ---")
print(f"Base Model ID: {base_model_id}")
print(f"Checkpoints evaluated from: {output_dir}")
print(f"Number of evaluation samples: {len(eval_dataset)}")
print("-" * 30)
# Pretty print the collected results
for epoch, metrics in all_results.items():
    print(f"Results for {epoch}:")
    if metrics:
        for key, value in metrics.items():
            if isinstance(value, float):
                print(f"  {key}: {value:.2f}")
            else:
                print(f"  {key}: {value}")
    else:
        print("  Metrics calculation skipped or failed.")
    print("-" * 20)

print("\nEvaluation finished.")
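
The prompt/reference split used by `extract_prompt_and_ref` can be tried in isolation. The sketch below restates the same logic under a hypothetical name, `split_prompt_and_reference`, with a made-up example string. Note that the returned prompt is stripped, so it ends at `### Summary:` without the trailing newline that the training text contains; the generated text is stripped after decoding, which absorbs a leading newline.

```python
SUMMARY_MARKER = "### Summary:\n"

def split_prompt_and_reference(text: str):
    """Split a formatted example into (prompt, reference summary)."""
    idx = text.find(SUMMARY_MARKER)
    if idx == -1:
        return None, None  # marker missing: caller should skip this example
    cut = idx + len(SUMMARY_MARKER)
    return text[:cut].strip(), text[cut:].strip()

example = (
    "Instruction: Please summarize the following empathetic dialogue conversation.\n\n"
    "### Conversation:\nSpeaker 1: Hi.\n\n"
    "### Summary:\nA greeting."
)

prompt, reference = split_prompt_and_reference(example)
print(repr(prompt[-12:]))  # '### Summary:'
print(reference)           # A greeting.
```

Examples without the marker, or with an empty reference, are skipped by the `if prompt and ref` check in the evaluation loop.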


--- Starting Post-Training Evaluation ---



[nltk_data] Downloading package wordnet to /usr/share/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /usr/share/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /usr/share/nltk_data...


Evaluation metrics (ROUGE, BLEU, BERTScore, METEOR) loaded successfully.
Evaluation DataLoader prepared with 200 samples.

--- Evaluating Base Model (Epoch 0) ---

--- Evaluating Checkpoint: Base Model ---
Loading base model...
Loading tokenizer...
Evaluating the base model without adapters.
Model loaded on device: cuda:0


Generating Summaries (Base): 100%|██████████| 50/50 [18:54<00:00, 22.70s/it]


Calculating scores...



Found 3 checkpoint(s) for evaluation.
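
The checkpoint folders are sorted by their numeric step rather than by name, which matters because plain string sorting would place `checkpoint-100` before `checkpoint-50`. A small illustration, using a hypothetical helper `step_of` and hard-coded folder names instead of a directory listing:

```python
import re

def step_of(path: str) -> int:
    """Extract the training step from a 'checkpoint-<step>' path."""
    return int(re.search(r"checkpoint-(\d+)", path).group(1))

names = ["checkpoint-150", "checkpoint-50", "checkpoint-100"]

print(sorted(names))               # ['checkpoint-100', 'checkpoint-150', 'checkpoint-50']
print(sorted(names, key=step_of))  # ['checkpoint-50', 'checkpoint-100', 'checkpoint-150']
```

Numeric ordering is what lets the loop label the checkpoints as epochs 1, 2, and 3 by position.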

--- Evaluating Checkpoint: ./gemma3-1b-summarization-finetuned/checkpoint-50 ---
Loading base model...
Loading tokenizer...
Loading adapter from: ./gemma3-1b-summarization-finetuned/checkpoint-50
Adapter loaded.
Model loaded on device: cuda:0


Generating Summaries (checkpoint-50): 100%|██████████| 50/50 [18:14<00:00, 21.88s/it]


Calculating scores...

--- Example Generation from Epoch 1 ---

Example 1:
Reference:
In an excited exchange about the upcoming Super Smash Bros for the Nintendo Switch, Speaker 10 expresses their anticipation and nostalgia for the game series, recalling their fondness for Brawl and favorite character, Snake. Speaker 1 empathizes by sharing their own memories of playing the game on GameCube, fostering a sense of connection through shared experiences and enthusiasm for the franchise. The conversation highlights a mutual appreciation for the game's legacy and the joy it brings to both speakers.
Generated:
In a lighthearted exchange, Speaker 10 expresses excitement for the upcoming Super Smash Bros. game on the Nintendo Switch, reminiscing about their fondness for the game on the Gamecube. Speaker 1 responds with enthusiasm for Mario and asks about their favorite character, while Speaker 10 shares their personal connection to the game, highlighting their nostalgia for Brawl. The conversat

Generating Summaries (checkpoint-100): 100%|██████████| 50/50 [17:26<00:00, 20.94s/it]


Calculating scores...

--- Example Generation from Epoch 2 ---

Example 1:
Reference:
In an excited exchange about the upcoming Super Smash Bros for the Nintendo Switch, Speaker 10 expresses their anticipation and nostalgia for the game series, recalling their fondness for Brawl and favorite character, Snake. Speaker 1 empathizes by sharing their own memories of playing the game on GameCube, fostering a sense of connection through shared experiences and enthusiasm for the franchise. The conversation highlights a mutual appreciation for the game's legacy and the joy it brings to both speakers.
Generated:
In a light-hearted exchange, Speaker 10 expresses excitement for the upcoming Super Smash Bros. on the Nintendo Switch, reminiscing about their fondness for the game during their childhood. Speaker 10 shares their own experience with Mario and Brawl, while Speaker 1 empathizes by acknowledging the classic nature of Mario and the nostalgia of Smash. The conversation highlights a shared a

Generating Summaries (checkpoint-150): 100%|██████████| 50/50 [17:58<00:00, 21.57s/it]


Calculating scores...

--- Example Generation from Epoch 3 ---

Example 1:
Reference:
In an excited exchange about the upcoming Super Smash Bros for the Nintendo Switch, Speaker 10 expresses their anticipation and nostalgia for the game series, recalling their fondness for Brawl and favorite character, Snake. Speaker 1 empathizes by sharing their own memories of playing the game on GameCube, fostering a sense of connection through shared experiences and enthusiasm for the franchise. The conversation highlights a mutual appreciation for the game's legacy and the joy it brings to both speakers.
Generated:
In a lighthearted exchange, Speaker 10 expresses excitement about the upcoming Super Smash Bros. on the Nintendo Switch, reminiscing about their enjoyment of the game during their childhood. Speaker 1 responds with nostalgia, acknowledging the classic nature of Mario and highlighting their own experience with Brawl. Speaker 10 shares their favorite character, Snake, adding to the conver