## Environment Setup and Core Libraries

This cell initializes the essential environment and libraries for our text summarization project. We import:
- **Standard Libraries**: `numpy` and `pandas` for data handling.
- **PyTorch**: For neural network operations.
- **Transformers Library**: From Hugging Face, providing access to BART model utilities.
- **Evaluation and Tracking Tools**: `evaluate` for model performance evaluation and `wandb` for experiment tracking.
- **Hugging Face Hub Utilities**: Optional, for model management.
- **Version Information**: Of `transformers` library for compatibility and debugging.

This setup creates a robust foundation for implementing and tracking the performance of our text summarization model.

In [1]:
# Core libraries
import time
import numpy as np
import pandas as pd

# PyTorch for neural networks
import torch
import torch.nn.functional as F
from torch import cuda
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler

# Hugging Face transformers for NLP tasks
from transformers import BartTokenizer, BartForConditionalGeneration

# Evaluation and experiment tracking tools
import evaluate  # For evaluating model performance
import wandb  # For experiment tracking and logging

# Optional: Hugging Face Hub utilities
from huggingface_hub import interpreter_login

# Version information (can be used for logging or debugging)
from transformers import __version__ as transformers_version

## Initializing Weights & Biases for Experiment Tracking

This cell is dedicated to logging into Weights & Biases (wandb), a tool used for tracking and visualizing the progress and results of machine learning experiments. Key points:

- **wandb Integration**: By executing `wandb.login()`, we establish a connection with the wandb service. This integration allows us to monitor various metrics during model training and evaluation, such as loss, accuracy, and more.
- **Security Note**: It is crucial to set the wandb API key in the environment variables for secure access. This approach ensures that our credentials are not exposed in the notebook.

Setting up wandb is a critical step for maintaining a systematic and thorough record of our model's performance throughout the training and evaluation phases.

In [2]:
# Login to Weights & Biases for experiment tracking
# Note: Ensure that your wandb API key is set in your environment variables for security purposes
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33ma-elnagar3003[0m ([33mnigoo[0m). Use [1m`wandb login --relogin`[0m to force relogin


True

## Hugging Face Hub Interpreter Login

In this cell, we establish a connection with the Hugging Face Hub using the `interpreter_login()` function. The Hugging Face Hub is a platform for sharing and discovering machine learning models, datasets, and evaluation metrics. Important aspects:

- **Hugging Face Hub Connection**: The `interpreter_login()` function is used to log into the Hugging Face Hub. This connection enables us to interact with the platform, providing access to a vast repository of models and datasets.
- **Security Measures**: It's essential to set the Hugging Face API keys (both for reading and writing) in the environment variables. This practice ensures the security of our credentials and seamless integration with the Hub.

Logging into the Hugging Face Hub is a vital step for accessing state-of-the-art models, datasets, and tools that can enhance our text summarization project.

In [3]:
# Login to Hugging Face Hub interpreter
# Note: Set your Hugging Face API keys (for reading and writing) in your environment variables for security
interpreter_login()


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token is valid (permission: write).
Your token has been saved in your configured git credential helpers (osxkeycha

## Setting Up Computation Device

This cell is focused on configuring the computation device for our model training and inference tasks. It involves:

- **CUDA Availability Check**: We use `cuda.is_available()` from PyTorch to check if a CUDA-enabled GPU is available. CUDA (Compute Unified Device Architecture) by NVIDIA is crucial for accelerating deep learning tasks.
- **Device Setting**: Based on the availability, we set the `device` variable to either `'cuda'` (for GPU) or `'cpu'` (for CPU). This ensures that our model utilizes the most efficient available computing resource.
- **Output Confirmation**: The cell outputs the type of device being used (`cuda` or `cpu`), providing clear feedback about the environment setup.

Properly configuring the computation device is essential for efficient model training and can significantly impact the performance and speed of our text summarization task.

In [4]:
# Determine if CUDA (GPU support) is available and set the device accordingly
device = 'cuda' if cuda.is_available() else 'cpu'
print(f"Using device: {device}")

Using device: cpu


## Weights & Biases Project Initialization and Configuration Setup

This cell plays a critical role in setting up the experiment tracking and defining the training configuration:

- **Weights & Biases Project**: We initialize a new project in Weights & Biases (wandb) named "BART_summarization". This project will track all the metrics, logs, and outputs of our model training and evaluation, enabling us to monitor the experiment's progress and performance.

- **Training Configuration Parameters**: The configuration settings for the training process are defined and stored in `wandb.config`. Key parameters include:
  - `TRAIN_BATCH_SIZE` and `VALID_BATCH_SIZE`: Batch sizes for training and validation.
  - `TRAIN_EPOCHS`: The number of epochs for training the model.
  - `LEARNING_RATE`: Learning rate for the optimizer.
  - `SEED`: A seed value for ensuring reproducibility of results.
  - `MAX_LEN` and `SUMMARY_LEN`: Maximum length of input text and target length of the summary, respectively.

- **Reproducibility**: To ensure consistent results across runs, we set seeds for PyTorch and NumPy, and make CUDA operations deterministic.

- **Hugging Face Hub Repository Configuration**: We specify `new_repo` for creating a new repository and `repo_name` for the full name of the repository, including the namespace. This setup is essential for storing and managing the model on the Hugging Face Hub.

These configurations are crucial for maintaining a structured and reproducible training process, as well as for effective tracking and management of the model training lifecycle.

In [5]:
# Initialize a Weights & Biases project for experiment tracking
wandb.init(project="BART_summarization")

# Setting configuration parameters for the training process
config = wandb.config
config.TRAIN_BATCH_SIZE = 2    # Training batch size
config.VALID_BATCH_SIZE = 2    # Validation batch size
config.TRAIN_EPOCHS = 2        # Number of training epochs
config.LEARNING_RATE = 1e-4    # Learning rate
config.SEED = 42               # Seed for reproducibility
config.MAX_LEN = 512           # Maximum length of the input text
config.SUMMARY_LEN = 150       # Target length of the summary

# Set seeds for reproducibility across runs
torch.manual_seed(config.SEED)
np.random.seed(config.SEED)
torch.backends.cudnn.deterministic = True

# Repository configuration for Hugging Face Hub
new_repo = "text_summarizer"                              # Name for a new repository
repo_name = "EducativeCS2023/bart-base-summarization"     # Full name of the repository including namespace

## Data Loading and Preprocessing

In this cell, we focus on preparing the dataset for the text summarization task. The key steps include:

- **Dataset Loading**: We load the dataset from a CSV file (`BBCarticles.csv`). The dataset contains articles and their corresponding summaries.

- **Data Preparation**: 
  - We select the relevant columns, typically the text of the articles and their summaries.
  - The text in the dataset is prepended with 'summarize: ' to signal our model that the task is summarization.

- **Dataset Splitting**:
  - We define a `split_ratio` to determine the proportion of data used for training and evaluation.
  - The dataset is then split into two parts: one for training and another for evaluation, ensuring a balanced representation of data in both sets.

- **Dataset Overview**:
  - The shape (number of samples) of both training and evaluation datasets is printed to provide an overview of the dataset sizes.
  - The first few rows of the dataframe are displayed to give a glimpse of the data format and content.

This preprocessing step is crucial for setting up our dataset correctly for training the BART model and evaluating its performance on text summarization.

In [6]:
# Load the dataset
df = pd.read_csv('../data/BBCarticles.csv', encoding='latin-1')

# Select relevant columns and prepend 'summarize: ' to the text
df = df[['Text', 'Summary']]
df.Text = 'summarize: ' + df.Text

# Splitting the dataset into train and evaluation sets
split_ratio = 0.025  # Split ratio for sampling
# Creating a training dataset
train_dataset = df.sample(frac=split_ratio, random_state=config.SEED).reset_index(drop=True)
# Creating an evaluation dataset
eval_dataset = df.drop(train_dataset.index).sample(frac=split_ratio, random_state=config.SEED).reset_index(drop=True)

# Displaying the shape of the datasets
print("Training Dataset Size:", train_dataset.shape)
print("Evaluation Dataset Size:", eval_dataset.shape)

# Display the first few rows of the dataframe
df.head(3)

Training Dataset Size: (56, 2)
Evaluation Dataset Size: (54, 2)


Unnamed: 0,Text,Summary
0,summarize: Ad sales boost Time Warner profit\n...,TimeWarner said fourth quarter sales rose 2% t...
1,summarize: Dollar gains on Greenspan speech\n\...,The dollar has hit its highest level against t...
2,summarize: Yukos unit buyer faces loan claim\n...,Yukos' owner Menatep Group says it will ask Ro...


## Custom Dataset Class for Text Summarization

This cell defines a `CustomDataset` class, which is crucial for structuring our data for the summarization task using the BART model. Key aspects of this class include:

- **Initialization**:
  - The constructor (`__init__`) takes a DataFrame, a tokenizer, and maximum lengths for the source and summary texts.
  - The DataFrame contains the articles and their summaries, and the tokenizer is used for encoding the texts.

- **Length Method** (`__len__`):
  - This method returns the number of items in the dataset, which is essential for iterating over the dataset during training and evaluation.

- **Get Item Method** (`__getitem__`):
  - Retrieves a specific item from the dataset by its index.
  - Each item consists of the article and its summary, which are preprocessed and encoded using the provided tokenizer.
  - The method returns a dictionary containing the encoded source text, its attention mask, and the encoded target summary.

This custom dataset class is fundamental for efficiently processing and feeding the data into our model for training and evaluation, ensuring that the data is in the correct format and adequately preprocessed.

In [7]:
class CustomDataset(Dataset):
    """
    Custom Dataset for loading articles and summaries into the model.
    """

    def __init__(self, dataframe, tokenizer, source_len, summ_len):
        """
        Initialize the dataset.
        :param dataframe: DataFrame containing the articles and summaries.
        :param tokenizer: Tokenizer for encoding the texts.
        :param source_len: Max length for the source text.
        :param summ_len: Max length for the summary text.
        """
        self.tokenizer = tokenizer
        self.data = dataframe
        self.source_len = source_len
        self.summ_len = summ_len
        self.Summary = self.data.Summary
        self.Text = self.data.Text

    def __len__(self):
        """
        Returns the number of items in the dataset.
        """
        return len(self.Summary)

    def __getitem__(self, index):
        """
        Retrieves an item by its index.
        :param index: Index of the desired item.
        :return: Dictionary containing encoded source and target texts.
        """
        # Adjusted preprocessing to match original notebook
        Text = str(self.Text[index])
        Text = ' '.join(Text.split())

        Summary = str(self.Summary[index])
        Summary = ' '.join(Summary.split())

        # Encoding the source and target texts
        source_encoded = self.tokenizer(Text, max_length=self.source_len, padding='max_length', truncation=True, return_tensors='pt')
        target_encoded = self.tokenizer(Summary, max_length=self.summ_len, padding='max_length', truncation=True, return_tensors='pt')

        # Extracting the encoded ids and attention masks
        source_ids = source_encoded['input_ids'].squeeze()
        source_mask = source_encoded['attention_mask'].squeeze()
        target_ids = target_encoded['input_ids'].squeeze()

        return {
            'source_ids': source_ids.to(dtype=torch.long), 
            'source_mask': source_mask.to(dtype=torch.long), 
            'target_ids': target_ids.to(dtype=torch.long)
        }

## Tokenizer Initialization and DataLoader Setup

This cell focuses on preparing the tokenizer and data loaders, essential components for processing and feeding data into our model:

- **BART Tokenizer Initialization**:
  - We initialize the BART tokenizer using `BartTokenizer.from_pretrained(repo_name)`. The tokenizer is crucial for converting text data into a format that can be processed by our BART model.
  - The tokenizer configuration is then pushed to a new repository on the Hugging Face Hub using `tokenizer.push_to_hub(new_repo)`. This step is important for version control and sharing of the tokenizer configuration.

- **Custom Dataset Instances**:
  - We create instances of our `CustomDataset` class for both training (`training_set`) and evaluation (`eval_set`), passing the respective datasets, tokenizer, and configuration parameters like maximum text lengths.

- **DataLoaders Initialization**:
  - `DataLoader` objects for both training and evaluation datasets are created. The `DataLoader` batches data and provides an efficient iterator over the datasets.
  - The training data loader is set to shuffle the data (`shuffle=True`), which is a good practice for training phases to introduce randomness and improve model generalization.
  - The evaluation data loader does not require shuffling (`shuffle=False`), as the order of data does not impact the evaluation phase.

These steps ensure that our data is correctly tokenized and batched, ready for the training and evaluation processes in our text summarization project.

In [8]:
# Initialize the BART tokenizer from the pretrained model repository
tokenizer = BartTokenizer.from_pretrained(repo_name)

# Push the tokenizer configuration to the Hugging Face Hub
tokenizer.push_to_hub(new_repo)

# Create custom datasets for training and evaluation
training_set = CustomDataset(train_dataset, tokenizer, config.MAX_LEN, config.SUMMARY_LEN)
eval_set = CustomDataset(eval_dataset, tokenizer, config.MAX_LEN, config.SUMMARY_LEN)

# Initialize DataLoaders for the training and evaluation sets
# The DataLoader batches data and provides an iterator over the dataset
training_loader = DataLoader(
    training_set, 
    batch_size=config.TRAIN_BATCH_SIZE, 
    shuffle=True, 
    num_workers=0  # Number of subprocesses for data loading
)
eval_loader = DataLoader(
    eval_set, 
    batch_size=config.VALID_BATCH_SIZE, 
    shuffle=False, 
    num_workers=0  # Set to 0 since we don't need to shuffle evaluation data
)

## BART Model Initialization and Optimization Setup

This cell is critical for setting up our BART model and its optimization parameters:

- **BART Model Initialization**:
  - We initialize the BART model for conditional generation using `BartForConditionalGeneration.from_pretrained(repo_name)`. This pretrained model is well-suited for text summarization tasks.
  - The model is then moved to the designated computation device (GPU or CPU) determined earlier using `model.to(device)`. This ensures that the model utilizes the most efficient computing resources available.

- **Optimizer Setup**:
  - An optimizer is crucial for the training process as it updates the model parameters based on the gradients. We choose the Adam optimizer, known for its efficiency in handling large datasets and complex architectures.
  - The learning rate for the optimizer is set according to the predefined configuration (`config.LEARNING_RATE`). The learning rate is a key hyperparameter that influences the speed and quality of the training process.

- **Weights & Biases Integration**:
  - The model is integrated with Weights & Biases (wandb) using `wandb.watch(model, log="all")`. This integration allows us to monitor various aspects of the model during training, including parameters, gradients, and more.
  - This step is essential for keeping track of the model's performance and making informed adjustments during the training process.

Setting up the BART model and its optimizer correctly, along with integrating it with Weights & Biases, lays a solid foundation for the effective training and monitoring of our text summarization model.

In [9]:
# Initialize the BART model for conditional generation from the pretrained model repository
model = BartForConditionalGeneration.from_pretrained(repo_name)

# Move the model to the designated device (GPU or CPU)
model = model.to(device)

# Set up the optimizer for training the model
# Adam optimizer is used with the specified learning rate from the configuration
optimizer = torch.optim.Adam(params=model.parameters(), lr=config.LEARNING_RATE)

# Integrate the model with Weights & Biases for monitoring and tracking during training
# This allows for logging all model parameters and gradients
wandb.watch(model, log="all")

[]

## Train Function Definition for Model Training

This cell defines the `train` function, a crucial part of our training pipeline:

- **Function Purpose**:
  - The `train` function is designed to train the BART model for one epoch. It takes parameters like the current epoch number, tokenizer, model, device, data loader, and optimizer.

- **Training Mode and Batch Processing**:
  - The model is set to training mode using `model.train()`.
  - The function iterates over batches of data from the `loader`. Each batch consists of input and target data that are prepared for the model.

- **Loss Calculation and Logging**:
  - The forward pass involves computing the model's output and calculating the loss.
  - Special attention is paid to ignoring padding tokens in the loss calculation.
  - The training loss is logged to Weights & Biases at regular intervals for monitoring.

- **Weight Update Process**:
  - The backward pass includes clearing previous gradients, computing new gradients, and updating the model weights.
  - This process is crucial for the learning and improvement of the model over each batch of data.

- **Feedback and Monitoring**:
  - The function provides feedback on the training progress by printing the loss at regular intervals.
  - This feedback is valuable for understanding the model's learning dynamics and making necessary adjustments.

The `train` function is a key component in our training loop, ensuring systematic and efficient training of our model over each epoch.

In [10]:
def train(epoch, tokenizer, model, device, loader, optimizer):
    """
    Function to be called for training the model for one epoch.
    :param epoch: Current epoch number.
    :param tokenizer: Tokenizer for the text data.
    :param model: The BART model for conditional generation.
    :param device: The device (CPU or GPU) to use for training.
    :param loader: DataLoader fofr the training data.
    :param optimizer: Optimizer for updating model weights.
    """
    model.train()  # Set the model to training mode
    for batch_index, data in enumerate(loader, 0):
        # Prepare the input and target data
        y = data['target_ids'].to(device, dtype=torch.long)
        y_ids = y[:, :-1].contiguous()
        labels = y[:, 1:].clone().detach()
        labels[y[:, 1:] == tokenizer.pad_token_id] = -100  # Ignore padding tokens in loss calculation
        ids = data['source_ids'].to(device, dtype=torch.long)
        mask = data['source_mask'].to(device, dtype=torch.long)

        # Forward pass: Compute the model output and calculate loss
        outputs = model(input_ids=ids, attention_mask=mask, decoder_input_ids=y_ids, labels=labels)
        loss = outputs[0]

        # Log the training loss to Weights & Biases every 10 batches
        if batch_index % 10 == 0:
            wandb.log({"Training Loss": loss.item()})

        # Print the loss every 500 batches
        if batch_index % 500 == 0:
            print(f'Epoch: {epoch}, Loss:  {loss.item()}')
        
        # Backward pass: Update model weights
        optimizer.zero_grad()  # Clear previous gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights

## Model Training Loop and Pushing to Hugging Face Hub

This cell contains the core training loop and code to push the trained model to the Hugging Face Hub:

- **Training Loop**:
  - We iterate over a number of epochs as specified in the `config.TRAIN_EPOCHS`.
  - For each epoch, we call the `train` function defined earlier, passing the necessary parameters like the current epoch, tokenizer, model, device, training data loader, and optimizer.
  - This loop is where the model is effectively trained, learning to generate summaries from the input text.

- **Progress Monitoring**:
  - The progress of training is printed at the start of each epoch, providing visibility into the training process.

- **Pushing Model to Hugging Face Hub**:
  - After the training is complete, the model is pushed to a new repository on the Hugging Face Hub using `model.push_to_hub(new_repo)`.
  - This step is significant for version control, sharing, and potentially deploying the model for future use.

The execution of this cell marks the completion of the model's training phase and its availability for wider use through the Hugging Face Hub.

In [11]:
# Training loop over the specified number of epochs
for epoch in range(config.TRAIN_EPOCHS):
    print(f"Training epoch: {epoch+1}/{config.TRAIN_EPOCHS}")
    # Call the train function for each epoch
    train(epoch, tokenizer, model, device, training_loader, optimizer)

# Push the trained model to the Hugging Face Hub
model.push_to_hub(new_repo)
print("Model successfully pushed to Hugging Face Hub")

Training epoch: 1/2
Epoch: 0, Loss:  1.5276191234588623
Training epoch: 2/2
Epoch: 1, Loss:  0.11024223268032074
Model successfully pushed to Hugging Face Hub


## Prediction Function Definition

This cell introduces the `predict` function, an essential component for evaluating our text summarization model:

- **Function Purpose**:
  - The `predict` function is designed to generate predictions from the BART model. It takes parameters such as the tokenizer, model, device, and data loader for prediction.
  
- **Evaluation Mode and Prediction Generation**:
  - The model is set to evaluation mode using `model.eval()`, which is necessary for making predictions as it disables certain layers like Dropout.
  - Predictions are generated in a loop over the data from the `loader`, where input data is prepared and fed into the model.
  
- **Prediction and Decoding**:
  - The `model.generate` method is used to generate predictions. Parameters like maximum length, beam search settings, and penalties for repetition and length are specified to control the generation process.
  - Generated summaries are then decoded from their tokenized form to human-readable text using the tokenizer.

- **Progress Monitoring**:
  - The progress of prediction generation is printed at regular intervals to provide feedback on the completion of batches.

- **Results Compilation**:
  - The function returns two lists: `predictions` (containing the generated summaries) and `actuals` (containing the actual summaries from the dataset).

This `predict` function is a vital tool for assessing the performance of our model by comparing its generated summaries against the actual summaries.

In [12]:
def predict(tokenizer, model, device, loader):
    """
    Function to generate predictions from the model.
    :param tokenizer: Tokenizer for the text data.
    :param model: The BART model for conditional generation.
    :param device: The device (CPU or GPU) to use for prediction.
    :param loader: DataLoader for the data to be predicted.
    :return: Lists of predictions and actual summaries.
    """
    model.eval()  # Set the model to evaluation mode
    predictions = []
    actuals = []
    with torch.no_grad():  # Disable gradient calculation
        for batch_index, data in enumerate(loader, 0):
            # Prepare the input data
            ids = data['source_ids'].to(device, dtype=torch.long)
            mask = data['source_mask'].to(device, dtype=torch.long)
            y = data['target_ids'].to(device, dtype=torch.long)

            # Generate predictions
            generated_ids = model.generate(
                input_ids=ids,
                attention_mask=mask,
                max_length=150,  # Maximum length of the generated summaries
                num_beams=2,  # Number of beams for beam search
                repetition_penalty=2.5,  # Penalty for repetition
                length_penalty=1.0,  # Penalty for summary length
                early_stopping=True  # Stop generating when all beams reach the EOS token
            )
            # Decode the generated summaries
            preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
            # Decode the actual summaries
            target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True) for t in y]

            # Print progress every 100 batches
            if batch_index % 100 == 0:
                print(f'Completed {batch_index} batches')

            # Extend the lists with predictions and actuals
            predictions.extend(preds)
            actuals.extend(target)

    return predictions, actuals

## Generating Predictions and Compiling Results

This cell executes the prediction process and compiles the results:

- **Timing the Prediction Process**:
  - The start time is recorded using `time.time()` to measure the duration of the prediction process.

- **Prediction Generation**:
  - The `predict` function is called with the necessary parameters (tokenizer, model, device, evaluation data loader) to generate predictions on the evaluation set.
  - The function returns two lists: `predictions` (generated summaries) and `actuals` (actual summaries from the dataset).

- **Results Compilation**:
  - A DataFrame `results` is created to hold both the predictions and actual summaries, facilitating easy comparison and analysis.
  - This DataFrame is then saved to a CSV file (`predictions.csv`), allowing for further analysis or sharing of the results.

- **Time Evaluation**:
  - The end time is recorded, and the time taken for the prediction process is calculated and printed. This information is useful for evaluating the efficiency of the prediction process.
  
- **Results Display**:
  - The first few rows of the results DataFrame are displayed to provide a quick overview of the model's performance.

This step is crucial in evaluating the effectiveness of our text summarization model by comparing its predictions against the actual summaries.

In [13]:
# Measure the start time for the prediction process
start_time = time.time()

# Generate predictions using the defined prediction function
predictions, actuals = predict(tokenizer, model, device, eval_loader)

# Compile the predictions and actual summaries into a DataFrame
results = pd.DataFrame({'predictions': predictions, 'actuals': actuals})

# Save the results to a CSV file for further analysis or sharing
results.to_csv('../results/predictions.csv')

# Measure the end time and calculate the time taken for prediction
end_time = time.time()
time_taken = end_time - start_time
print(f"Time taken for predictions: {time_taken:.2f} seconds")

# Display the first few rows of the results
results.head()

Completed 0 batches
Time taken for predictions: 147.54 seconds


Unnamed: 0,predictions,actuals
0,mobile phone users to send multimedia message...,Getting mobile phone users to send multimedia ...
1,go into the game on the back of a 2-0 victory...,Arsenal go into the game on the back of a 2-0 ...
2,"a statement, Media Labs Europe said the decis...","In a statement, Media Labs Europe said the dec..."
3,Knapman rejected the idea Mr Kilroy-Silk pose...,Mr Knapman rejected the idea Mr Kilroy-Silk po...
4,proponents of the bill said it was a good com...,The European Parliament has thrown out a bill ...


## Evaluating Model Performance with ROUGE Scores

In this cell, we assess the performance of our text summarization model using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric:

- **ROUGE Metric Initialization**:
  - We load the ROUGE scoring function from the `evaluate` library. ROUGE is a standard metric for evaluating text summarization models, focusing on the overlap between the predicted and actual summaries.

- **Computing ROUGE Scores**:
  - The `rouge_score.compute` method is called with our model's predictions and the actual summaries. This computation provides scores for different ROUGE metrics (like ROUGE-1, ROUGE-2, and ROUGE-L), reflecting various aspects of similarity between the predicted and actual texts.

- **Results Visualization and Analysis**:
  - The computed ROUGE scores are converted into a DataFrame, `rouge_scores_df`, for better visualization and ease of analysis.
  - The first few rows of this DataFrame are displayed to give an immediate sense of the model's summarization performance.

Evaluating the model with ROUGE scores is crucial for understanding its effectiveness in generating coherent, relevant, and concise summaries, and for identifying areas for further improvement.

In [14]:
# Load the ROUGE scoring function for evaluating text summarization
rouge_score = evaluate.load("rouge")

# Compute the ROUGE scores comparing the model's predictions with the actual summaries
scores = rouge_score.compute(
    predictions=results['predictions'], 
    references=results['actuals']
)

# Convert the scores to a DataFrame for better visualization and analysis
rouge_scores_df = pd.DataFrame([scores]).transpose()

# Display the first few rows of the ROUGE scores
rouge_scores_df.head()

Unnamed: 0,0
rouge1,0.775264
rouge2,0.696991
rougeL,0.611014
rougeLsum,0.611888
