# Multimodal Summarization and Reward Modeling System

**Multimodal Summarization and Reward Modeling System - Week 8 Assignment**

## Overview

This notebook implements a complete academic paper summarization and reward modeling pipeline:

1. Summary generation using local Ollama Qwen3:8b model
2. Summary comparison and human/auto annotation
3. Reward model training based on DeBERTa-v3
4. Multi-dimensional evaluation (ROUGE, BERTScore, reward scores)

---

## Feature Modules

| Module | Description |
|--------|-------------|
| **SummaryGenerator** | Generate summaries via Ollama API using local Qwen3:8b |
| **AnnotationInterface** | Interactive summary comparison annotation interface |
| **RewardModelTrainer** | Reward model training based on DeBERTa-v3-base |
| **SummaryEvaluator** | ROUGE, BERTScore, reward model scoring |
| **Pipeline** | Complete end-to-end pipeline management |

## 1. Environment Setup & Imports

First, import the required libraries and set up the environment.

In [None]:
# Basic libraries
import json
import os
from pathlib import Path
from typing import List, Dict, Tuple

# Deep learning libraries
import torch
import numpy as np

# Transformers libraries
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer
)

# Data and evaluation libraries
from datasets import load_dataset, Dataset
from evaluate import load

# HTTP requests
import requests

print("\nEnvironment setup complete!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## 2. Custom Reward Model Trainer

Create a reward model trainer compatible with recent `transformers` releases, in which `Trainer.compute_loss` receives a `num_items_in_batch` argument.

In [None]:
class CustomRewardTrainer(Trainer):
    """
    Custom reward model trainer, compatible with new transformers version
    
    This trainer implements reward model loss calculation for comparing
    chosen and rejected responses.
    """

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        """
        Calculate reward loss
        
        Args:
            model: Reward model
            inputs: Input data
            return_outputs: Whether to return output
            num_items_in_batch: Number of items in batch
        """
        if "input_ids" in inputs and "attention_mask" in inputs:
            labels = inputs.pop("labels", None)
            outputs = model(**inputs)
            logits = outputs.logits

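            # Pointwise loss on the single reward logit
            if labels is not None:
                # Binary cross-entropy against 0/1 preference labels
                loss_fct = torch.nn.BCEWithLogitsLoss()
                loss = loss_fct(logits.view(-1), labels.float().view(-1))
            else:
                # No labels in the batch: use MSE to pull the score toward 1
                loss = torch.nn.functional.mse_loss(logits, torch.ones_like(logits))

            return (loss, outputs) if return_outputs else loss

        # Forward num_items_in_batch so the parent keeps its loss normalization
        return super().compute_loss(
            model, inputs, return_outputs=return_outputs, num_items_in_batch=num_items_in_batch
        )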

print("CustomRewardTrainer class defined")
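
The loss in `compute_loss` scores each input on its own; it never compares the reward of a chosen summary with that of a rejected one. A common pairwise alternative is the Bradley-Terry objective, `-log(sigmoid(r_chosen - r_rejected))`. The framework-free sketch below illustrates the arithmetic only; `pairwise_preference_loss` is a hypothetical helper, not part of this notebook or of `transformers`.

```python
import math
from typing import Sequence


def pairwise_preference_loss(
    chosen_scores: Sequence[float],
    rejected_scores: Sequence[float],
) -> float:
    """Mean Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected))."""
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("Need two non-empty score lists of equal length")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_scores, rejected_scores):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) == log(1 + exp(-m)); branch on sign to avoid overflow
        if margin >= 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(chosen_scores)


# Equal scores give log(2); a large positive margin gives a loss near 0.
tie_loss = pairwise_preference_loss([0.0], [0.0])
confident_loss = pairwise_preference_loss([5.0], [-5.0])
print(f"tie: {tie_loss:.4f}, confident: {confident_loss:.6f}")
```

In a PyTorch trainer the same quantity could be written as `-torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()`, which would require tokenizing the chosen and rejected summaries separately.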

## 3. Summary Generator (SummaryGenerator)

Generate paper summaries using a local Qwen3:8b model served through Ollama.

### Features
- Communicate with Ollama via HTTP API
- Support adjustable sampling parameters (`temperature`, `top_p`)
- Automatically handle chain-of-thought model output

In [None]:
class SummaryGenerator:
    """
    Generate summaries using local Ollama Qwen3:8b
    
    Attributes:
        model_name: Model name in Ollama
        api_url: URL of Ollama API
    """

    def __init__(self, model_name: str = "qwen3:8b"):
        """
        Initialize summary generator
        
        Args:
            model_name: Model name in Ollama
        """
        print(f"Using local Ollama model: {model_name}")
        self.model_name = model_name
        self.api_url = "http://localhost:11434/api/generate"

    def generate_summary(
        self,
        paper_text: str,
        prompt_template: str = None,
        max_length: int = 512,
        temperature: float = 0.7,
        top_p: float = 0.9
    ) -> str:
        """
        Generate summary via Ollama API

        Args:
            paper_text: Paper text
            prompt_template: Prompt template
            max_length: Maximum generation length
            temperature: Sampling temperature (0.0-1.0)
            top_p: Nucleus sampling parameter

        Returns:
            Generated summary text
        """
        if prompt_template is None:
            prompt_template = """Please provide a concise summary of the following research paper:\n
{paper_text}\n
Provide a 2-3 sentence summary:"""

        prompt = prompt_template.format(paper_text=paper_text[:3000])

        payload = {
            "model": self.model_name,
            "prompt": prompt,
            "stream": False,
            "options": {
                "num_predict": max_length,
                "temperature": temperature,
                "top_p": top_p,
            }
        }

        try:
            # A timeout keeps the notebook from hanging if Ollama stalls
            response = requests.post(self.api_url, json=payload, timeout=300)
            response.raise_for_status()
            result = response.json()

            # Get response content
            summary = result.get("response", "").strip()

            # If response is empty, extract from thinking field
            if not summary and "thinking" in result:
                thinking = result.get("thinking", "")
                lines = thinking.split('\n')
                summary_lines = [line.strip() for line in lines if line.strip()]
                if summary_lines:
                    summary = ' '.join(summary_lines[-3:])  # Take the last 3 non-empty lines

            return summary
        except Exception as e:
            print(f"Error calling Ollama: {e}")
            return ""

    def generate_summary_pair(self, paper_text: str) -> Tuple[str, str]:
        """
        Generate two different summaries for comparison

        Args:
            paper_text: Paper text

        Returns:
            Summary pair (summary_a, summary_b)
            - summary_a: Low temperature (0.3) - more deterministic
            - summary_b: High temperature (0.9) - more diverse
        """
        # Low temperature - more deterministic
        summary_a = self.generate_summary(
            paper_text,
            temperature=0.3,
            top_p=0.8
        )

        # High temperature - more diverse
        summary_b = self.generate_summary(
            paper_text,
            temperature=0.9,
            top_p=0.95
        )

        return summary_a, summary_b

# Test class definition
print("SummaryGenerator class defined")
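
Qwen3 is a reasoning model, and depending on the Ollama version and settings its reasoning may appear inline in the `response` text, wrapped in `<think>...</think>` tags, rather than in a separate `thinking` field. The sketch below shows one way to strip such a block before using the summary; `strip_think_block` is a hypothetical helper, and the tag format is an assumption about the model output.

```python
import re

# Matches a <think>...</think> block, including any newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL | re.IGNORECASE)


def strip_think_block(text: str) -> str:
    """Remove inline reasoning blocks and surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe paper is about attention.\n</think>\n\nThe Transformer relies on attention."
cleaned = strip_think_block(raw)
print(cleaned)
```

Text without a `<think>` block passes through unchanged apart from whitespace trimming.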

## 4. Annotation Interface (AnnotationInterface)

Interactive summary comparison annotation interface.

In [None]:
class AnnotationInterface:
    """
    Human annotation interface
    
    Used for manually comparing two summaries and selecting the better one.
    """
    
    @staticmethod
    def annotate_summary_pair(
        paper_id: str,
        summary_a: str,
        summary_b: str
    ) -> Dict:
        """
        Interactively annotate summary pair

        Args:
            paper_id: Paper ID
            summary_a: Summary A (temperature=0.3)
            summary_b: Summary B (temperature=0.9)

        Returns:
            Annotation result dictionary, or None if the pair was skipped
        """
        print("\n" + "="*80)
        print(f"Paper ID: {paper_id}")
        print("="*80)
        print("\nSummary A (temperature=0.3):")
        print("-" * 80)
        print(summary_a)
        print("\nSummary B (temperature=0.9):")
        print("-" * 80)
        print(summary_b)
        print("\n" + "="*80)

        while True:
            choice = input("\nPlease select the better summary (A/B) or skip (S): ").strip().upper()
            if choice in ['A', 'B', 'S']:
                break
            print("Invalid input, please enter A, B, or S")

        if choice == 'S':
            return None

        chosen = summary_a if choice == 'A' else summary_b
        rejected = summary_b if choice == 'A' else summary_a

        return {
            "paper_id": paper_id,
            "summary_a": summary_a,
            "summary_b": summary_b,
            "chosen": chosen,
            "rejected": rejected,
            "annotator_choice": choice
        }

print("AnnotationInterface class defined")
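
The mapping from an annotator's choice to a `chosen`/`rejected` record can also be expressed as a pure function, which makes it testable without `input()`. This sketch mirrors the logic above; `build_preference_record` is a hypothetical name introduced for illustration.

```python
from typing import Dict, Optional


def build_preference_record(
    paper_id: str, summary_a: str, summary_b: str, choice: str
) -> Optional[Dict[str, str]]:
    """Return a preference record for choice 'A' or 'B', or None for 'S' (skip)."""
    choice = choice.strip().upper()
    if choice == "S":
        return None
    if choice not in ("A", "B"):
        raise ValueError(f"Expected A, B, or S, got {choice!r}")

    chosen, rejected = (summary_a, summary_b) if choice == "A" else (summary_b, summary_a)
    return {
        "paper_id": paper_id,
        "chosen": chosen,
        "rejected": rejected,
        "annotator_choice": choice,
    }


record = build_preference_record("paper_001", "short summary", "long summary", "b")
print(record["chosen"])
```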

## 5. Reward Model Trainer (RewardModelTrainer)

Train a reward model based on DeBERTa-v3-base.

### Model Architecture
```
Input: (chosen_text, rejected_text)
       |
   [DeBERTa Encoder]
       |
   [Pooler + Classifier]
       |
   Score: Reward Score
```

In [None]:
class RewardModelTrainer:
    """
    Reward model trainer
    
    Used to train a model that maps input text to reward scores.
    """
    
    def __init__(self, model_name: str = "microsoft/deberta-v3-base"):
        """
        Initialize reward model trainer

        Args:
            model_name: Base model name (default DeBERTa-v3-base)
        """
        self.model_name = model_name
        self.tokenizer = None
        self.model = None

    def prepare_dataset(self, jsonl_path: str) -> Dataset:
        """
        Prepare training dataset

        Args:
            jsonl_path: JSONL file path

        Returns:
            Hugging Face Dataset object
        """
        dataset = load_dataset("json", data_files=jsonl_path, split="train")

        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)

        def preprocess(examples):
            """Preprocessing function"""
            return self.tokenizer(
                examples["chosen"],
                examples["rejected"],
                truncation=True,
                padding="max_length",
                max_length=512
            )

        dataset = dataset.map(preprocess, batched=True)
        return dataset

    def train(
        self,
        train_dataset: Dataset,
        output_dir: str = "reward_model",
        num_epochs: int = 3,
        batch_size: int = 8,
        learning_rate: float = 2e-5
    ):
        """
        Train reward model

        Args:
            train_dataset: Training dataset
            output_dir: Output directory
            num_epochs: Training epochs
            batch_size: Batch size
            learning_rate: Learning rate
        """
        # Initialize model
        self.model = AutoModelForSequenceClassification.from_pretrained(
            self.model_name,
            num_labels=1
        )

        # Set tokenizer pad_token
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        # Training parameters
        training_args = TrainingArguments(
            output_dir=output_dir,
            per_device_train_batch_size=batch_size,
            num_train_epochs=num_epochs,
            learning_rate=learning_rate,
            eval_strategy="no",
            save_strategy="epoch",
            logging_steps=10,
            fp16=torch.cuda.is_available(),
            report_to="none"
        )

        # Create trainer
        trainer = CustomRewardTrainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            processing_class=self.tokenizer
        )

        # Start training
        print("Starting reward model training...")
        trainer.train()

        # Save model
        trainer.save_model(output_dir)
        self.tokenizer.save_pretrained(output_dir)
        print(f"Model saved to: {output_dir}")

    def score_summary(self, summary: str) -> float:
        """
        Score a summary using the trained reward model

        Args:
            summary: Summary text

        Returns:
            Reward score
        """
        if self.model is None or self.tokenizer is None:
            raise ValueError("Model not loaded, please train or load model first")

        inputs = self.tokenizer(
            summary,
            return_tensors="pt",
            truncation=True,
            padding=True,
            max_length=512
        )

        with torch.no_grad():
            outputs = self.model(**inputs)
            score = outputs.logits[0][0].item()

        return score

print("RewardModelTrainer class defined")
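
`CustomRewardTrainer.compute_loss` applies binary cross-entropy only when a batch carries `labels`, and `score_summary` scores one summary at a time. One way to line the training data up with both is to expand each preference pair into two single-text examples, labeled 1.0 for chosen and 0.0 for rejected. The sketch below shows only the data reshaping; `expand_pairs` and the `text`/`labels` column names are assumptions, not part of the original pipeline.

```python
from typing import Dict, Iterable, List, Union

Example = Dict[str, Union[str, float]]


def expand_pairs(pairs: Iterable[Dict[str, str]]) -> List[Example]:
    """Turn each {'chosen', 'rejected'} pair into two labeled single-text rows."""
    examples: List[Example] = []
    for pair in pairs:
        examples.append({"text": pair["chosen"], "labels": 1.0})
        examples.append({"text": pair["rejected"], "labels": 0.0})
    return examples


pairs = [{"chosen": "Concise, faithful summary.", "rejected": "Rambling summary."}]
examples = expand_pairs(pairs)
print(examples)
```

Rows in this shape can be tokenized on `text` alone, so training inputs take the same form as the inputs passed to `score_summary`.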

## 6. Summary Evaluator (SummaryEvaluator)

Calculate evaluation metrics: ROUGE, BERTScore, and reward model scores.

### Evaluation Metrics
| Metric | Description |
|--------|-------------|
| **ROUGE-1** | Unigram (single-word) overlap |
| **ROUGE-2** | Bigram-level overlap |
| **ROUGE-L** | Longest common subsequence |
| **BERTScore** | BERT-based semantic similarity |
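
To make the n-gram metrics concrete, the sketch below computes a simplified ROUGE-1 F1 and the longest-common-subsequence length behind ROUGE-L on whitespace tokens. It is an illustration under simplifying assumptions (lowercasing, no stemming, a single reference); the `evaluate` ROUGE implementation used in this notebook applies its own tokenization and may return different values.

```python
from collections import Counter
from typing import List


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 on lowercased whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as in the other text
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (the basis of ROUGE-L)."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


pred = "the transformer relies on attention"
ref = "the transformer uses attention only"
print(rouge1_f1(pred, ref))
print(lcs_length(pred.split(), ref.split()))
```

Here three of the five tokens on each side match ("the", "transformer", "attention"), so precision and recall are both 3/5 and the common subsequence has length 3.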

In [None]:
class SummaryEvaluator:
    """
    Summary evaluator
    
    Supports multiple evaluation metrics.
    """

    def __init__(self):
        """
        Initialize evaluator
        """
        print("Loading evaluation metrics...")
        self.rouge = load("rouge")
        self.bertscore = load("bertscore")

    def evaluate_rouge(
        self,
        predictions: List[str],
        references: List[str]
    ) -> Dict:
        """
        Calculate ROUGE scores

        Args:
            predictions: Generated summary list
            references: Reference summary list

        Returns:
            ROUGE score dictionary
        """
        results = self.rouge.compute(
            predictions=predictions,
            references=references
        )
        return results

    def evaluate_bertscore(
        self,
        predictions: List[str],
        references: List[str]
    ) -> Dict:
        """
        Calculate BERTScore

        Args:
            predictions: Generated summary list
            references: Reference summary list

        Returns:
            BERTScore dictionary
        """
        results = self.bertscore.compute(
            predictions=predictions,
            references=references,
            lang="en",
            model_type="microsoft/deberta-xlarge-mnli"
        )

        # Calculate average scores
        avg_results = {
            "precision": np.mean(results["precision"]),
            "recall": np.mean(results["recall"]),
            "f1": np.mean(results["f1"])
        }

        return avg_results

    def comprehensive_evaluation(
        self,
        predictions: List[str],
        references: List[str],
        reward_model: RewardModelTrainer = None
    ) -> Dict:
        """
        Comprehensive evaluation

        Args:
            predictions: Generated summary list
            references: Reference summary list
            reward_model: Reward model (optional)

        Returns:
            Comprehensive evaluation results
        """
        results = {}

        # ROUGE evaluation
        print("Calculating ROUGE scores...")
        results["rouge"] = self.evaluate_rouge(predictions, references)

        # BERTScore evaluation
        print("Calculating BERTScore...")
        results["bertscore"] = self.evaluate_bertscore(predictions, references)

        # Reward model evaluation
        if reward_model:
            print("Calculating reward model scores...")
            reward_scores = [reward_model.score_summary(pred) for pred in predictions]
            results["reward_scores"] = {
                "scores": reward_scores,
                "mean": np.mean(reward_scores),
                "std": np.std(reward_scores)
            }

        return results

print("SummaryEvaluator class defined")

## 7. Complete Pipeline (Pipeline)

Manage the entire experiment workflow.

### Flow Diagram
```
Paper Input → Ollama Generate Summary → Annotate → Train Reward Model → Evaluate
```

In [None]:
class Pipeline:
    """
    Complete experiment workflow
    
    Encapsulates the entire summary generation, annotation, training, and evaluation workflow.
    """

    def __init__(self):
        """
        Initialize pipeline
        """
        self.generator = None
        self.reward_trainer = None
        self.evaluator = SummaryEvaluator()

    def step1_generate_summaries(
        self,
        papers: List[Dict[str, str]],
        output_path: str = "summary_pairs.json"
    ):
        """
        Step 1: Generate summary pairs

        Args:
            papers: Paper list, each element contains {'id': ..., 'text': ...}
            output_path: Output file path
        """
        print("\n=== Step 1: Generate Summary Pairs ===")
        self.generator = SummaryGenerator()

        summary_pairs = []
        for paper in papers:
            print(f"\nProcessing paper: {paper['id']}")
            summary_a, summary_b = self.generator.generate_summary_pair(paper['text'])

            summary_pairs.append({
                "paper_id": paper['id'],
                "summary_a": summary_a,
                "summary_b": summary_b
            })

        # Save results
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(summary_pairs, f, ensure_ascii=False, indent=2)

        print(f"\nSummary pairs saved to: {output_path}")
        return summary_pairs

    def step2_annotate(
        self,
        summary_pairs_path: str = "summary_pairs.json",
        output_path: str = "reward_data.jsonl",
        auto_mode: bool = True
    ):
        """
        Step 2: Annotation

        Args:
            summary_pairs_path: Summary pairs file path
            output_path: Output JSONL path
            auto_mode: If True, skip human annotation and always mark
                summary_a (temperature=0.3) as chosen
        """
        print("\n=== Step 2: Annotation ===")

        with open(summary_pairs_path, 'r', encoding='utf-8') as f:
            summary_pairs = json.load(f)

        annotated_data = []
        for pair in summary_pairs:
            if auto_mode:
                # Auto mode
                result = {
                    "paper_id": pair['paper_id'],
                    "summary_a": pair['summary_a'],
                    "summary_b": pair['summary_b'],
                    "chosen": pair['summary_a'],
                    "rejected": pair['summary_b'],
                    "annotator_choice": "A"
                }
                annotated_data.append({
                    "chosen": result["chosen"],
                    "rejected": result["rejected"]
                })
                print(f"Paper {pair['paper_id']}: Auto-select A (temperature=0.3)")
            else:
                # Interactive mode
                result = AnnotationInterface.annotate_summary_pair(
                    pair['paper_id'],
                    pair['summary_a'],
                    pair['summary_b']
                )
                if result:
                    annotated_data.append({
                        "chosen": result["chosen"],
                        "rejected": result["rejected"]
                    })

        # Save as JSONL
        with open(output_path, 'w', encoding='utf-8') as f:
            for item in annotated_data:
                f.write(json.dumps(item, ensure_ascii=False) + '\n')

        print(f"\nAnnotation data saved to: {output_path}")
        print(f"Total annotated samples: {len(annotated_data)}")

    def step3_train_reward_model(
        self,
        train_data_path: str = "reward_data.jsonl",
        output_dir: str = "reward_model"
    ):
        """
        Step 3: Train reward model

        Args:
            train_data_path: Training data path
            output_dir: Model output directory
        """
        print("\n=== Step 3: Train Reward Model ===")

        self.reward_trainer = RewardModelTrainer()
        dataset = self.reward_trainer.prepare_dataset(train_data_path)

        self.reward_trainer.train(
            train_dataset=dataset,
            output_dir=output_dir
        )

    def step4_evaluate(
        self,
        test_papers: List[Dict[str, str]],
        reward_model_path: str = "reward_model"
    ):
        """
        Step 4: Evaluation

        Args:
            test_papers: Test paper list
            reward_model_path: Reward model path
        """
        print("\n=== Step 4: Evaluation ===")

        # Generate test summaries
        predictions = []
        references = []

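        # Create the generator here if step 1 was not run in this session
        if self.generator is None:
            self.generator = SummaryGenerator()

        for paper in test_papers:
            # Same settings as summary_a, without generating an unused second summary
            summary = self.generator.generate_summary(
                paper['text'], temperature=0.3, top_p=0.8
            )
            predictions.append(summary)
            # Falls back to the prediction itself when no reference is provided
            references.append(paper.get('reference_summary', summary))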

        # Load reward model
        if self.reward_trainer is None:
            self.reward_trainer = RewardModelTrainer()
            self.reward_trainer.tokenizer = AutoTokenizer.from_pretrained(reward_model_path)
            self.reward_trainer.model = AutoModelForSequenceClassification.from_pretrained(
                reward_model_path
            )

        # Comprehensive evaluation
        results = self.evaluator.comprehensive_evaluation(
            predictions,
            references,
            self.reward_trainer
        )

        # Print results
        print("\nEvaluation Results:")
        print("="*80)
        print("\nROUGE Scores:")
        for key, value in results["rouge"].items():
            print(f"  {key}: {value:.4f}")

        print("\nBERTScore:")
        for key, value in results["bertscore"].items():
            print(f"  {key}: {value:.4f}")

        if "reward_scores" in results:
            print("\nReward Model Scores:")
            print(f"  Mean: {results['reward_scores']['mean']:.4f}")
            print(f"  Std: {results['reward_scores']['std']:.4f}")

        return results

print("Pipeline class defined")
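
Step 2 writes one JSON object per line (JSONL), the layout that `load_dataset("json", ...)` reads in step 3. The sketch below round-trips records through that format with the standard library; `write_jsonl` and `read_jsonl` are hypothetical helpers, and the temporary file stands in for `reward_data.jsonl`.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_jsonl(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict[str, str]]:
    """Read a JSONL file, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [{"chosen": "Résumé A", "rejected": "Summary B"}]
with tempfile.TemporaryDirectory() as tmp_dir:
    jsonl_path = os.path.join(tmp_dir, "reward_data.jsonl")
    write_jsonl(records, jsonl_path)
    loaded = read_jsonl(jsonl_path)

print(loaded == records)
```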

---

# Live Demo

## Quick Start

Run the following cell to execute the complete workflow.

In [None]:
# Prepare example paper data
example_papers = [
    {
        "id": "paper_001",
        "text": """
        Title: Attention Is All You Need

        Abstract: The dominant sequence transduction models are based on complex
        recurrent or convolutional neural networks. We propose a new architecture
        called the Transformer that relies entirely on an attention mechanism to
        draw global dependencies between input and output. The Transformer allows
        for significantly more parallelization and can reach a new state of the
        art in translation quality after being trained for as little as twelve
        hours on eight P100 GPUs.
        """,
        "reference_summary": "The Transformer is a new neural network architecture for sequence transduction that relies entirely on attention mechanisms."
    },
    {
        "id": "paper_002",
        "text": """
        Title: BERT: Pre-training of Deep Bidirectional Transformers

        Abstract: We introduce a new language representation model called BERT, which
        stands for Bidirectional Encoder Representations from Transformers. Unlike
        recent language representation models, BERT is designed to pre-train deep
        bidirectional representations from unlabeled text by jointly conditioning on
        both left and right context in all layers.
        """,
        "reference_summary": "BERT is a bidirectional Transformer language model that can be pre-trained on unlabeled text."
    },
    {
        "id": "paper_003",
        "text": """
        Title: GPT-3: Language Models are Few-Shot Learners

        Abstract: Recent work has demonstrated substantial gains on many NLP tasks
        by pre-training on a large corpus of text. We demonstrate that scaling up language
        models greatly improves few-shot performance.
        """,
        "reference_summary": "GPT-3 is a large-scale language model that achieves strong few-shot learning performance."
    }
]

print(f"Prepared {len(example_papers)} papers")

## Step 1: Generate Summary Pairs

Use Ollama Qwen3:8b to generate two different summaries for each paper.

In [None]:
# Create pipeline instance
pipeline = Pipeline()

# Execute step 1
summary_pairs = pipeline.step1_generate_summaries(example_papers, "summary_pairs.json")

print(f"\nGenerated {len(summary_pairs)} summary pairs")

## Step 2: Annotation

Annotate the generated summary pairs.

In [None]:
# Auto mode annotation
pipeline.step2_annotate("summary_pairs.json", "reward_data.jsonl", auto_mode=True)

## Step 3: Train Reward Model

Train the DeBERTa-v3 reward model on the annotated data.

In [None]:
# Train reward model (may take a few minutes)
pipeline.step3_train_reward_model("reward_data.jsonl", "reward_model")

## Step 4: Evaluation

Evaluate model performance using multiple metrics.

In [None]:
# Evaluate model
results = pipeline.step4_evaluate(example_papers[:2], "reward_model")

print("\nEvaluation completed!")

---

# Using Components Individually

## Summary Generator Only

In [None]:
# Create summary generator
generator = SummaryGenerator("qwen3:8b")

# Single summary generation
paper_text = """
Title: My Research Paper

This paper presents a novel approach to solving complex problems...
"""

summary = generator.generate_summary(paper_text, temperature=0.7)
print(f"Summary: {summary}")

# Generate summary pair
summary_a, summary_b = generator.generate_summary_pair(paper_text)
print(f"\nSummary A: {summary_a}")
print(f"\nSummary B: {summary_b}")

## Load Trained Reward Model

In [None]:
# Create reward model trainer
trainer = RewardModelTrainer()

# Load trained model
trainer.tokenizer = AutoTokenizer.from_pretrained("reward_model")
trainer.model = AutoModelForSequenceClassification.from_pretrained("reward_model")

# Score summary
test_summary = "This paper presents a novel approach to machine learning."
score = trainer.score_summary(test_summary)
print(f"Reward score: {score:.4f}")

---

# Notes

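1. **Ollama Service**: Ensure Ollama is running (`ollama serve`) and reachable at `http://localhost:11434`
2. **Model Download**: The first run downloads the DeBERTa-v3-base weights and the BERTScore model (`microsoft/deberta-xlarge-mnli`)
3. **GPU Acceleration**: A CUDA GPU, if available, speeds up training; fp16 is enabled automatically when one is detected
4. **Windows Encoding**: Files are read and written with explicit `encoding='utf-8'`, which avoids default-encoding issues on Windows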
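
Before running the pipeline, it can help to confirm that the Ollama server is reachable and that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models; the helper names, the default base URL (taken from the `api_url` used above), and the timeout are assumptions.

```python
import json
import urllib.request
from typing import List


def tags_url(base_url: str = "http://localhost:11434") -> str:
    """Build the URL of the model-listing endpoint."""
    return base_url.rstrip("/") + "/api/tags"


def list_local_models(
    base_url: str = "http://localhost:11434", timeout: float = 5.0
) -> List[str]:
    """Return local model names, or an empty list if the server is unreachable."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses; ValueError covers bad JSON
        return []
    return [model.get("name", "") for model in payload.get("models", [])]


if __name__ == "__main__":
    models = list_local_models()
    if "qwen3:8b" in models:
        print("Ollama is running and qwen3:8b is available")
    else:
        print("qwen3:8b not found; try `ollama serve` and `ollama pull qwen3:8b`")
```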
