The provided code is a Python notebook that performs model evaluation for a text summarization project. Here's a summary of the code:

1. The code imports the necessary libraries and sets the working directory to the project root directory.

2. It defines a dataclass `ModelEvaluationConfig` that represents the configuration for model evaluation. It includes paths for data, model, tokenizer, and the metric file.

3. The `ConfigurationManager` class handles the reading of YAML configuration files and creation of necessary directories.

4. The `get_model_evaluation_config` method in `ConfigurationManager` retrieves the model evaluation configuration and creates the `ModelEvaluationConfig` object.

5. The code imports libraries for the evaluation process, including `transformers`, `datasets`, `torch`, `pandas`, and `tqdm`.

6. The `ModelEvaluation` class is defined, which takes the `ModelEvaluationConfig` as input.

7. The `ModelEvaluation` class has a method called `generate_batch_sized_chunks` to split the dataset into smaller batches for processing.

8. The `calculate_metric_on_test_ds` method in `ModelEvaluation` calculates the evaluation metric (ROUGE score) on the test dataset using the provided model and tokenizer.

9. The `evaluate` method in `ModelEvaluation` initializes the tokenizer and model, loads the dataset from disk, and calculates the ROUGE scores on the first 10 examples of its test split using the `calculate_metric_on_test_ds` method.

10. The mid F-measure of each ROUGE variant (`rouge1`, `rouge2`, `rougeL`, `rougeLsum`) is stored in a DataFrame and saved as a CSV file at the metric file path specified in the configuration.

11. The code instantiates the `ConfigurationManager`, retrieves the model evaluation configuration, creates an instance of `ModelEvaluation`, and performs model evaluation by calling the `evaluate` method.

Overall, this code performs model evaluation using the ROUGE metric on a text summarization model. It retrieves the necessary configurations, initializes the model and tokenizer, loads the test dataset, calculates the metric, and saves the results.


In [1]:
import os

In [2]:
%pwd

'f:\\artificial intelegnce\\study\\ML End To End Projects Krish Naik\\github\\Text-Summarizer-Project\\research'

In [3]:
os.chdir("../")

In [4]:
%pwd

'f:\\artificial intelegnce\\study\\ML End To End Projects Krish Naik\\github\\Text-Summarizer-Project'

In [6]:
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ModelEvaluationConfig:
    root_dir: Path
    data_path: Path
    model_path: Path
    tokenizer_path: Path
    metric_file_name: Path

In [7]:
from textSummarizer.constants import *
from textSummarizer.utils.common import read_yaml, create_directories

In [8]:
class ConfigurationManager:
    def __init__(
        self,
        config_filepath = CONFIG_FILE_PATH,
        params_filepath = PARAMS_FILE_PATH):

        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)

        create_directories([self.config.artifacts_root])


    
    def get_model_evaluation_config(self) -> ModelEvaluationConfig:
        config = self.config.model_evaluation

        create_directories([config.root_dir])

        model_evaluation_config = ModelEvaluationConfig(
            root_dir=config.root_dir,
            data_path=config.data_path,
            model_path = config.model_path,
            tokenizer_path = config.tokenizer_path,
            metric_file_name = config.metric_file_name
           
        )

        return model_evaluation_config

In [10]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from datasets import load_dataset, load_from_disk, load_metric
import torch
import pandas as pd
from tqdm import tqdm

In [11]:
class ModelEvaluation:
    def __init__(self, config: ModelEvaluationConfig):
        self.config = config


    
    def generate_batch_sized_chunks(self,list_of_elements, batch_size):
        """split the dataset into smaller batches that we can process simultaneously
        Yield successive batch-sized chunks from list_of_elements."""
        for i in range(0, len(list_of_elements), batch_size):
            yield list_of_elements[i : i + batch_size]

    
    def calculate_metric_on_test_ds(self,dataset, metric, model, tokenizer, 
                               batch_size=16, device="cuda" if torch.cuda.is_available() else "cpu", 
                               column_text="article", 
                               column_summary="highlights"):
        article_batches = list(self.generate_batch_sized_chunks(dataset[column_text], batch_size))
        target_batches = list(self.generate_batch_sized_chunks(dataset[column_summary], batch_size))

        for article_batch, target_batch in tqdm(
            zip(article_batches, target_batches), total=len(article_batches)):
            
            inputs = tokenizer(article_batch, max_length=1024,  truncation=True, 
                            padding="max_length", return_tensors="pt")
            
            summaries = model.generate(input_ids=inputs["input_ids"].to(device),
                            attention_mask=inputs["attention_mask"].to(device), 
                            length_penalty=0.8, num_beams=8, max_length=128)
            # A length_penalty below the default of 1.0 makes beam search less biased
            # toward long sequences, which keeps the generated summaries short.
            
            # Finally, we decode the generated texts, 
            # replace the <n> token (Pegasus's newline marker), and add the decoded texts with the references to the metric.
            decoded_summaries = [tokenizer.decode(s, skip_special_tokens=True, 
                                    clean_up_tokenization_spaces=True) 
                for s in summaries]      
            
            decoded_summaries = [d.replace("<n>", " ") for d in decoded_summaries]
            
            
            metric.add_batch(predictions=decoded_summaries, references=target_batch)
            
        #  Finally compute and return the ROUGE scores.
        score = metric.compute()
        return score


    def evaluate(self):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        tokenizer = AutoTokenizer.from_pretrained(self.config.tokenizer_path)
        model_pegasus = AutoModelForSeq2SeqLM.from_pretrained(self.config.model_path).to(device)
       
        #loading data 
        dataset_samsum_pt = load_from_disk(self.config.data_path)


        rouge_names = ["rouge1", "rouge2", "rougeL", "rougeLsum"]
  
        rouge_metric = load_metric('rouge')

        score = self.calculate_metric_on_test_ds(
        dataset_samsum_pt['test'][0:10], rouge_metric, model_pegasus, tokenizer, batch_size = 2, column_text = 'dialogue', column_summary= 'summary'
            )

        rouge_dict = dict((rn, score[rn].mid.fmeasure ) for rn in rouge_names )

        df = pd.DataFrame(rouge_dict, index = ['pegasus'] )
        df.to_csv(self.config.metric_file_name, index=False)


In [12]:
try:
    config = ConfigurationManager()
    model_evaluation_config = config.get_model_evaluation_config()
    model_evaluation_config = ModelEvaluation(config=model_evaluation_config)
    model_evaluation_config.evaluate()
except Exception as e:
    raise e

[2023-06-19 23:35:52,046: INFO: common: yaml file: config\config.yaml loaded successfully]
[2023-06-19 23:35:52,057: INFO: common: yaml file: params.yaml loaded successfully]
[2023-06-19 23:35:52,058: INFO: common: created directory at: artifacts]
[2023-06-19 23:35:52,060: INFO: common: created directory at: artifacts/model_evaluation]


  rouge_metric = load_metric('rouge')
Downloading builder script: 5.65kB [00:00, 39.2MB/s]                   
100%|██████████| 5/5 [03:12<00:00, 38.59s/it]

[2023-06-19 23:39:35,980: INFO: rouge_scorer: Using default tokenizer.]







```python
import os
```
This line imports the `os` module, which provides a way to use operating system-dependent functionality.

```python
%pwd
```
This line is a Jupyter Notebook magic command that displays the current working directory.

```python
'f:\\artificial intelegnce\\study\\ML End To End Projects Krish Naik\\github\\Text-Summarizer-Project\\research'
```
This is the output of the `%pwd` command, indicating the current working directory.

```python
os.chdir("../")
```
This line changes the current working directory to the parent directory.

```python
%pwd
```
This line again displays the current working directory to confirm the change.

```python
'f:\\artificial intelegnce\\study\\ML End To End Projects Krish Naik\\github\\Text-Summarizer-Project'
```
This is the updated output of the `%pwd` command, indicating the new current working directory.
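The effect of `os.chdir("../")` can be reproduced outside the notebook. The following is a minimal sketch that assumes a throwaway directory tree created with `tempfile`; only the directory names `Text-Summarizer-Project` and `research` come from the output above, everything else is illustrative.

```python
import os
import tempfile
from pathlib import Path

original_cwd = Path.cwd()

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the project/research layout inside a temporary directory.
    research_dir = Path(tmp) / "Text-Summarizer-Project" / "research"
    research_dir.mkdir(parents=True)

    os.chdir(research_dir)       # start where the notebook lives
    os.chdir("../")              # move up to the project root
    moved_to = Path.cwd().name   # name of the directory we ended up in

    os.chdir(original_cwd)       # restore before the temp dir is deleted

print(moved_to)
```

Because `"../"` is resolved relative to the current directory, running the notebook cell twice would move up two levels, which is why the walk-through checks the result with `%pwd`.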

```python
from dataclasses import dataclass
from pathlib import Path
```
These lines import the `dataclass` decorator from the `dataclasses` module and the `Path` class from the `pathlib` module.

```python
@dataclass(frozen=True)
class ModelEvaluationConfig:
    root_dir: Path
    data_path: Path
    model_path: Path
    tokenizer_path: Path
    metric_file_name: Path
```
This code defines a data class `ModelEvaluationConfig` using the `@dataclass` decorator. It represents the configuration for model evaluation and has five attributes: `root_dir`, `data_path`, `model_path`, `tokenizer_path`, and `metric_file_name`, all of which are of type `Path`.
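To illustrate what `frozen=True` buys, here is a small sketch. The paths passed in are hypothetical placeholders, not values from the project's `config.yaml`. It also shows that the `Path` annotations are hints only: the dataclass does not convert or validate the values it receives.

```python
from dataclasses import dataclass, FrozenInstanceError
from pathlib import Path


@dataclass(frozen=True)
class ModelEvaluationConfig:
    root_dir: Path
    data_path: Path
    model_path: Path
    tokenizer_path: Path
    metric_file_name: Path


# Hypothetical example paths, chosen only for illustration.
example_config = ModelEvaluationConfig(
    root_dir=Path("artifacts/model_evaluation"),
    data_path=Path("artifacts/data"),
    model_path=Path("artifacts/model"),
    tokenizer_path=Path("artifacts/tokenizer"),
    metric_file_name=Path("artifacts/model_evaluation/metrics.csv"),
)

try:
    example_config.root_dir = Path("somewhere/else")
    mutation_blocked = False
except FrozenInstanceError:
    # frozen=True turns attribute assignment into an error.
    mutation_blocked = True

# Annotations are not enforced at runtime: plain strings are accepted as-is.
loose_config = ModelEvaluationConfig("a", "b", "c", "d", "e")

print(mutation_blocked, type(loose_config.root_dir).__name__)
```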

```python
from textSummarizer.constants import *
from textSummarizer.utils.common import read_yaml, create_directories
```
These lines import constants and utility functions from other modules (`constants.py` and `common.py`) of a package called `textSummarizer`.

```python
class ConfigurationManager:
    def __init__(self, config_filepath=CONFIG_FILE_PATH, params_filepath=PARAMS_FILE_PATH):
        self.config = read_yaml(config_filepath)
        self.params = read_yaml(params_filepath)
        create_directories([self.config.artifacts_root])
```
This code defines a class `ConfigurationManager` that handles the configuration and parameters for the model evaluation. Its constructor reads two YAML files (`config_filepath` and `params_filepath`, which default to `CONFIG_FILE_PATH` and `PARAMS_FILE_PATH`) and creates the directory named by `artifacts_root` in the configuration.

```python
def get_model_evaluation_config(self) -> ModelEvaluationConfig:
    config = self.config.model_evaluation
    create_directories([config.root_dir])
    model_evaluation_config = ModelEvaluationConfig(
        root_dir=config.root_dir,
        data_path=config.data_path,
        model_path=config.model_path,
        tokenizer_path=config.tokenizer_path,
        metric_file_name=config.metric_file_name
    )
    return model_evaluation_config
```
This method in the `ConfigurationManager` class retrieves the `model_evaluation` section from the overall configuration, creates its `root_dir` directory, and returns an instance of `ModelEvaluationConfig` built from the retrieved values.
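The helpers `read_yaml` and `create_directories` live in `textSummarizer.utils.common` and are not shown in this notebook. The sketch below is only an assumption about their general behavior: a plain dictionary stands in for the parsed YAML, a hypothetical `to_namespace` helper provides the attribute-style access (`config.model_evaluation.root_dir`) that the notebook relies on, and the file name `metrics.csv` is made up.

```python
import os
import tempfile
from types import SimpleNamespace


def create_directories(paths):
    """Hypothetical stand-in: create each directory, ignoring existing ones."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


def to_namespace(value):
    """Recursively wrap dicts so that keys can be read as attributes."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value


with tempfile.TemporaryDirectory() as tmp:
    # A dict standing in for the parsed YAML; the key names mirror the notebook.
    raw = {
        "artifacts_root": os.path.join(tmp, "artifacts"),
        "model_evaluation": {
            "root_dir": os.path.join(tmp, "artifacts", "model_evaluation"),
            "metric_file_name": os.path.join(
                tmp, "artifacts", "model_evaluation", "metrics.csv"
            ),
        },
    }
    config = to_namespace(raw)

    create_directories([config.artifacts_root, config.model_evaluation.root_dir])
    created = os.path.isdir(config.model_evaluation.root_dir)

print(created)
```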

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from datasets import load_dataset, load_from_disk, load_metric
import torch
import pandas as pd
from tqdm import tqdm
```
These lines import necessary libraries and modules for the model evaluation process, including the Transformers library, the Datasets library, Torch, Pandas, and tqdm.

```python
class ModelEvaluation:
    def __init__(self, config: ModelEvaluationConfig):
        self.config = config
```
This code defines a class `ModelEvaluation` that represents the model evaluation process. It takes an instance of `ModelEvaluationConfig` as input and stores it as a configuration attribute.

```python
def generate_batch_sized_chunks(self, list_of_elements, batch_size):
    for i in range(0, len(list_of_elements), batch_size):
        yield list_of_elements[i: i + batch_size]
```
This method in the `ModelEvaluation` class splits a given list of elements into smaller batches of a specified size. It uses a generator to yield successive batch-sized chunks.
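As a quick illustration, the same generator can be run as a standalone function on a short list. The list contents below are placeholders, not data from the project.

```python
def generate_batch_sized_chunks(list_of_elements, batch_size):
    """Yield successive batch-sized chunks from list_of_elements."""
    for i in range(0, len(list_of_elements), batch_size):
        yield list_of_elements[i : i + batch_size]


dialogues = ["d0", "d1", "d2", "d3", "d4"]  # placeholder items
batches = list(generate_batch_sized_chunks(dialogues, batch_size=2))

print(batches)  # [['d0', 'd1'], ['d2', 'd3'], ['d4']]
```

The last chunk is simply shorter when the list length is not a multiple of `batch_size`, so no element is dropped.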

```python
def calculate_metric_on_test_ds(self, dataset, metric, model, tokenizer, batch_size=16, device="cuda" if torch.cuda.is_available() else "cpu", column_text="article", column_summary="highlights"):
    article_batches = list(self.generate_batch_sized_chunks(dataset[column_text], batch_size))
    target_batches = list(self.generate_batch_sized_chunks(dataset[column_summary], batch_size))
    # ... Rest of the code ...
```
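This method in the `ModelEvaluation` class calculates the evaluation metric (ROUGE score) on the test dataset. It takes the dataset, metric, model, tokenizer, and other parameters as input, and splits the text and reference-summary columns into batches using the `generate_batch_sized_chunks` method. The part elided here tokenizes each batch of texts (truncated and padded to 1024 tokens), generates summaries with beam search (`num_beams=8`, `max_length=128`, `length_penalty=0.8`), decodes them, and adds them together with their references to the metric via `metric.add_batch`. After the loop, `metric.compute()` returns the scores.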
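To give a feel for what a ROUGE score measures, here is a deliberately simplified ROUGE-1 F-measure based on whitespace-split unigrams. It is an illustrative sketch, not the implementation behind `load_metric('rouge')`, which applies its own tokenization and aggregation. The function name and example sentences are made up.

```python
from collections import Counter


def rouge1_fmeasure(prediction, reference):
    """Simplified ROUGE-1: F-measure over overlapping lowercase unigrams."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_fmeasure("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))  # 0.833
```

Five of the six tokens overlap in both directions, so precision, recall, and the F-measure are all 5/6.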

```python
def evaluate(self):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(self.config.tokenizer_path)
    model_pegasus = AutoModelForSeq2SeqLM.from_pretrained(self.config.model_path).to(device)
    dataset_samsum_pt = load_from_disk(self.config.data_path)
    rouge_names = ["rouge1", "rouge2", "rougeL", "rougeLsum"]
    rouge_metric = load_metric('rouge')
    score = self.calculate_metric_on_test_ds(dataset_samsum_pt['test'][0:10], rouge_metric, model_pegasus, tokenizer, batch_size=2, column_text='dialogue', column_summary='summary')
    rouge_dict = dict((rn, score[rn].mid.fmeasure) for rn in rouge_names)
    df = pd.DataFrame(rouge_dict, index=['pegasus'])
    df.to_csv(self.config.metric_file_name, index=False)
```
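This method in the `ModelEvaluation` class performs the model evaluation process. It initializes the tokenizer and the Pegasus model, loads the dataset from disk, and calculates the ROUGE scores on the first 10 examples of the test split (in batches of 2, using the `dialogue` and `summary` columns) with the `calculate_metric_on_test_ds` method. It then keeps the mid F-measure of each ROUGE variant and saves the results to the CSV file specified in the configuration.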
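The expression `score[rn].mid.fmeasure` implies a nested result: each ROUGE name maps to an aggregate with a `mid` entry, which in turn carries an `fmeasure`. The sketch below mimics that shape with named tuples and writes the extracted values as CSV using only the standard library. The tuple definitions are stand-ins and the numbers are placeholders, not real evaluation results.

```python
import csv
import io
from collections import namedtuple

# Stand-ins for the shape implied by score[rn].mid.fmeasure.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])

rouge_names = ["rouge1", "rouge2", "rougeL", "rougeLsum"]

# Placeholder numbers only; a real run would produce different values.
score = {
    rn: AggregateScore(
        low=Score(0.0, 0.0, 0.0),
        mid=Score(0.5, 0.5, 0.5),
        high=Score(1.0, 1.0, 1.0),
    )
    for rn in rouge_names
}

# Same extraction as in evaluate(): keep only the mid F-measure per metric.
rouge_dict = {rn: score[rn].mid.fmeasure for rn in rouge_names}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=rouge_names)
writer.writeheader()
writer.writerow(rouge_dict)
csv_text = buffer.getvalue()

print(csv_text)
```

Like `df.to_csv(..., index=False)` in the notebook, this output has one header row and one value row, and no row label; the `'pegasus'` index given to the DataFrame is therefore not written to the file.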

```python
try:
    config = ConfigurationManager()
    model_evaluation_config = config.get_model_evaluation_config()
    model_evaluation_config = ModelEvaluation(config=model_evaluation_config)
    model_evaluation_config.evaluate()
except Exception as e:
    raise e
```
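This code block creates an instance of the `ConfigurationManager`, retrieves the model evaluation configuration, creates an instance of `ModelEvaluation` with the retrieved configuration, and performs the model evaluation by calling the `evaluate` method. Note that the name `model_evaluation_config` is reused: it first holds the configuration and then the `ModelEvaluation` instance. Any exception raised during the process is caught and immediately re-raised, so the `try`/`except` does not change the behavior.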
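If the `try`/`except` is meant to do more than re-raise, one common pattern is to log the failure with its traceback and then use a bare `raise`. The following is a sketch of that pattern, not the notebook's code; `run_stage`, `failing_stage`, and the logger name are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_evaluation_demo")  # hypothetical logger name


def run_stage(stage):
    """Run a callable, log any failure with its traceback, then re-raise it."""
    try:
        return stage()
    except Exception:
        logger.exception("Model evaluation stage failed")
        raise  # a bare raise re-raises the active exception unchanged


def failing_stage():
    raise ValueError("example failure")


try:
    run_stage(failing_stage)
    caught = None
except ValueError as err:
    caught = str(err)

print(caught)  # example failure
```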