<div style="text-align: right">INFO 7390 Advances in Data Sciences and Architecture SEC 03 Spring 2024</div>
<div style="text-align: right">Crash Course in Generative AI</div>
<div style="text-align: right">Aditi A. Deodhar, NUID: 002279575</div>

<hide>

Welcome to the Generative AI Worked Example Notebook! 
This notebook serves as a comprehensive guide to understanding and implementing Generative Artificial Intelligence (AI) concepts through real-world examples.

In this notebook, we first summarize [three Coursera course labs related to Generative AI](https://github.com/Ryota-Kawamura/Generative-AI-with-LLMs/tree/main) and then make changes of our own while working with the same dataset. Each lab presents unique scenarios and problem statements, offering opportunities to delve into different aspects of Generative AI, including text generation, fine-tuning large language models, reinforcement learning from human feedback, and more.

## Overview and Theoretical Underpinnings of the reference Coursera Labs

### **Lab 1 - Exploring Model Pre-training and Scaling in Generative AI**

**Overview:**
The first lab offers a deep dive into the intricacies of model pre-training and scaling within the realm of generative AI. It provides a structured framework for understanding the fundamental concepts and practical considerations involved in training large language models (LLMs) while addressing computational challenges and exploring strategies to optimize performance.

**Key Insights:**
- **Model Pre-training Dynamics:**
   - Pre-training means training a large language model (LLM) on a vast corpus of text so that it learns language patterns and semantics. The result is an adaptable LLM that can understand and generate coherent text across a spectrum of tasks.
   - The lab contrasts continued pre-training with fine-tuning and highlights the factors that drive the choice between them, such as data specificity and computational resource management.
- **Foundational Concepts:**
   - The lab introduces the transformer architecture, prompt engineering, and chain-of-thought prompting through a blend of theoretical discussion and practical application. The transformer architecture forms the backbone of modern LLMs, while prompt engineering and chain-of-thought prompting improve the model's reasoning and planning abilities.
- **Addressing Computational Challenges:**
   - A significant portion of the lab dissects the computational hurdles encountered during pre-training and the strategies that mitigate them. Memory optimization techniques and scaling laws are explored as ways to improve training efficiency and resource utilization.

### **Lab 2 - Fine-tuning and Evaluating Large Language Models**

**Overview:**
Lab 2 offers a comprehensive exploration of fine-tuning methodologies and techniques aimed at optimizing the performance of large language models (LLMs) while addressing challenges such as catastrophic forgetting and computational efficiency. Through a blend of theoretical discussions and practical insights, participants gain a deeper understanding of fine-tuning strategies, parameter-efficient methodologies, and the role of task-specific instructions in enhancing LLM performance.

**Key Insights:**
1. **Fine-tuning with Prompt Datasets:**
   - Participants delve into the intricacies of fine-tuning LLMs with task-specific instructions using prompt datasets. They learn how tailored prompts guide the model in generating contextually relevant outputs for diverse applications, leading to improved task performance and enhanced adaptability across domains.

2. **Addressing Catastrophic Forgetting:**
   - Catastrophic forgetting is the phenomenon where a model loses previously learned information when it is trained on new tasks. The lab explores rehearsal methods, regularization techniques, and dual memory networks, which aim to preserve earlier knowledge while the model adapts to new tasks.

3. **Parameter-efficient Fine Tuning (PEFT):**
   - PEFT emerges as a pivotal methodology for fine-tuning LLMs with a focus on computational efficiency. Participants uncover how PEFT minimizes parameter updates, optimizes memory usage, and enables rapid model deployment, making it invaluable in resource-constrained environments and large-scale applications.

4. **Enhancing LLM Performance:**
   - Fine-tuning with prompt datasets is highlighted as a transformative approach to increasing LLM performance on various tasks. Participants understand how explicit task-specific instructions refine contextual understanding, reduce ambiguity, and improve model adaptability, ultimately contributing to enhanced generalization capabilities.

### **Lab 3 - Reinforcement Learning and LLM-powered Applications**

**Overview:**
Lab 3 delves into the realm of reinforcement learning (RL) and its application in improving the performance and alignment of large language models (LLMs) through reinforcement learning from human feedback (RLHF). Additionally, the lab explores chain-of-thought prompting techniques to enhance LLMs' reasoning and planning abilities and discusses challenges associated with knowledge cut-offs in LLMs, along with techniques to overcome them.

**Key Insights:**
1. **Reinforcement Learning from Human Feedback (RLHF):**
   - RLHF uses human-generated feedback to refine LLM behavior, improving model performance and alignment with human intent. The lab emphasizes that incorporating human signals is an iterative process, which helps the model adapt to changing preferences and ethical considerations.

2. **Training a Reward Model for RLHF:**
   - The lab elucidates the process of training a reward model for RLHF using data gathered from human labelers. Participants learn to collect human feedback, construct reward signals, and iteratively train the model based on refined guidance, while adhering to ethical considerations and ensuring transparency in the training process.

3. **Chain-of-Thought Prompting:**
   - Chain-of-thought prompting emerges as a technique to guide LLMs' reasoning and planning abilities through structured sequences of prompts. Participants explore scenarios where this approach facilitates problem-solving, narrative generation, decision-making, and essay construction, promoting contextual continuity and adaptability across diverse tasks.

4. **Overcoming Knowledge Cut-offs:**
   - Challenges associated with knowledge cut-offs in LLMs are addressed, along with techniques to mitigate them. Participants learn about continuous training, external knowledge integration, semantic augmentation, and active context integration, empowering LLMs to stay informed, access domain-specific knowledge, and adapt to evolving information.

## Now, analyzing and making changes of my own...

### Overview

We delve into the capabilities of Large Language Models (LLMs) with a focus on leveraging Parameter Efficient Fine-Tuning (PEFT) to generate dialogue summaries with reduced toxicity. Our approach involves utilizing the FLAN-T5 model in conjunction with Meta AI's hate speech reward model to achieve our primary objective of improving the quality of dialogue summaries while minimizing toxicity.

### Objectives

- Train a Large Language Model (LLM) to generate dialogue summaries with reduced toxicity.
  
### The DialogSum Dataset

We utilize the [DialogSum Dataset](https://huggingface.co/datasets/knkarthick/dialogsum), a large-scale dialogue summarization dataset comprising 13,460 dialogues along with manually labeled summaries and topics. Additionally, 100 holdout data points are reserved for topic generation.

### Project Workflow

1. **Setup**: Import necessary libraries and define project parameters. 
2. **Dataset Exploration**: Explore the DialogSum Dataset to understand its structure and contents. 
3. **Test Model Zero Shot Inferencing**: Initially, assess the performance of the FLAN-T5 model for zero-shot inferencing on dialogue summarization tasks to establish a baseline. 
4. **Dataset Preprocessing**: Preprocess the dialogues and their corresponding summaries from the dataset to prepare for training. 
5. **Perform Parameter Efficient Fine-Tuning (PEFT)**: Implement PEFT, a more efficient fine-tuning approach, to significantly reduce training time while maintaining performance.  
6. **Evaluation**:
   - Conduct human evaluation to assess the model's output in terms of readability and coherence, possibly involving annotators ranking generated summaries for quality.
   - Utilize ROUGE metrics to measure the quality of the generated summaries by comparing them to human-written references.

In [1]:
# Installing the necessary libraries

%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    peft==0.3.0 --quiet

# Installing the Reinforcement Learning library directly from github.
%pip install git+https://github.com/lvwerra/trl.git@25fa1bd    


Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement torch==1.13.1 (from versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2)
ERROR: No matching distribution found for torch==1.13.1


Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sentence-transformers 2.6.0 requires transformers<5.0.0,>=4.32.0, but you have transformers 4.27.2 which is incompatible.


Collecting git+https://github.com/lvwerra/trl.git@25fa1bd
  Cloning https://github.com/lvwerra/trl.git (to revision 25fa1bd) to c:\users\aditi\appdata\local\temp\pip-req-build-5vaxnb_0
  Resolved https://github.com/lvwerra/trl.git to commit 25fa1bd
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Note: you may need to restart the kernel to use updated packages.


  Running command git clone --filter=blob:none --quiet https://github.com/lvwerra/trl.git 'C:\Users\aditi\AppData\Local\Temp\pip-req-build-5vaxnb_0'
  Running command git checkout -q 25fa1bd


In [1]:
# Importing required packages

from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, AutoModelForSeq2SeqLM, GenerationConfig
from datasets import load_dataset
from peft import PeftModel, PeftConfig, LoraConfig, TaskType

# trl: Transformer Reinforcement Learning library
from trl import PPOTrainer, PPOConfig, AutoModelForSeq2SeqLMWithValueHead
from trl import create_reference_model
from trl.core import LengthSampler

import torch
import evaluate

import numpy as np
import pandas as pd

# tqdm library makes the loops show a smart progress meter.
from tqdm import tqdm
tqdm.pandas()

In [2]:
# Loading dataset and the model to be used

model_name = "google/flan-t5-base"

huggingface_dataset_name = "knkarthick/dialogsum"

dataset_original = load_dataset(huggingface_dataset_name)

In [3]:
# Methods

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"\ntrainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

### Tokenise the Data

The next step involves dataset preprocessing. We'll select a subset of the data, filter dialogues to a specific length to ensure readability while maintaining meaningful content, and then integrate each dialogue with an instruction before tokenizing the prompts. The resulting token IDs will be stored in the `input_ids` field, while the decoded prompts will be saved in the `query` field.

To streamline this process, it's advisable to create a function called `build_dataset`. This function can be defined as follows:

In [4]:
# Tokenise the dataset 

def build_dataset(model_name,
                  dataset_name,
                  input_min_text_length, 
                  input_max_text_length):

    """
    Preprocess the dataset and split it into train and test parts.

    Parameters:
    - model_name (str): Tokenizer model name.
    - dataset_name (str): Name of the dataset to load.
    - input_min_text_length (int): Minimum length of the dialogues.
    - input_max_text_length (int): Maximum length of the dialogues.
        
    Returns:
    - dataset_splits (datasets.dataset_dict.DatasetDict): Preprocessed dataset containing train and test parts.
    """
    
    # load dataset (only "train" part will be enough for this lab).
    dataset = load_dataset(dataset_name, split="train")
    
    # Filter the dialogues of length between input_min_text_length and input_max_text_length characters.
    dataset = dataset.filter(lambda x: len(x["dialogue"]) > input_min_text_length and len(x["dialogue"]) <= input_max_text_length, batched=False)

    # Prepare tokenizer. device_map="auto" is kept from the original lab; tokenizers run on the CPU, so it does not move anything to the GPU.
    tokenizer = AutoTokenizer.from_pretrained(model_name, device_map="auto")
    
    def tokenize(sample):
        
        # Wrap each dialogue with the instruction.
        prompt = f"""
Summarize the following conversation.

{sample["dialogue"]}

Summary:
"""
        sample["input_ids"] = tokenizer.encode(prompt)
        
        # This must be called "query", which is a requirement of our PPO library.
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample

    # Tokenize each dialogue.
    dataset = dataset.map(tokenize, batched=False)
    dataset.set_format(type="torch")
    
    # Split the dataset into train and test parts.
    dataset_splits = dataset.train_test_split(test_size=0.2, shuffle=False, seed=42)

    return dataset_splits

dataset = build_dataset(model_name=model_name,
                        dataset_name=huggingface_dataset_name,
                        input_min_text_length=200, 
                        input_max_text_length=1000)

print(dataset)

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic', 'input_ids', 'query'],
        num_rows: 8017
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic', 'input_ids', 'query'],
        num_rows: 2005
    })
})
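
The split sizes printed above are consistent with a 20% test share of the filtered dialogues. The following is a quick arithmetic check, assuming the split rounds the test part up and the train part down; that rounding rule is an assumption about `train_test_split`, not something stated in this notebook.

```python
import math

# Row counts taken from the printed DatasetDict above.
train_rows, test_rows = 8017, 2005
total_rows = train_rows + test_rows  # dialogues that passed the length filter

test_size = 0.2
# Assumed rounding rule: ceil for the test part, floor for the train part.
expected_test = math.ceil(test_size * total_rows)
expected_train = math.floor((1 - test_size) * total_rows)

print(total_rows, expected_train, expected_test)
```

Under that assumption the computed sizes agree with the printed `num_rows` values.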


### Enhancing the FLAN-T5 Model with a Fine-Tuned Summarization Adapter

Next, we enhance the original FLAN-T5 model by adding a summarization adapter. This adapter is designed to improve the model's performance on summarization tasks.

We begin by configuring the adapter using the following parameters:
- `r`: Rank, which is set to 32.
- `lora_alpha`: LoRA alpha (scaling) value, set to 32.
- `target_modules`: We specify the target modules as `["q", "v"]`, the query and value projections of the attention layers.
- `lora_dropout`: Dropout rate for the LoRA layers, set to 0.05.
- `bias`: We use "none" as the bias configuration.
- `task_type`: The task type is set to SEQ_2_SEQ_LM, which is suitable for FLAN-T5.

Next, we load the pre-trained FLAN-T5 model with `AutoModelForSeq2SeqLM`, passing the model name and the data type (`torch_dtype`).

We then wrap the loaded model in a `PeftModel`, loading the adapter weights from the `z7ye/peft-dialogue-summary-checkpoint` checkpoint. We also pass the LoRA configuration, the torch data type, and the device mapping, and set `is_trainable=True` so that the adapter weights can be updated.

In [5]:
lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)

model = AutoModelForSeq2SeqLM.from_pretrained(model_name, 
                                              torch_dtype=torch.bfloat16)

peft_model = PeftModel.from_pretrained(model, 
                                       'z7ye/peft-dialogue-summary-checkpoint', 
                                       lora_config=lora_config,
                                       torch_dtype=torch.bfloat16, 
                                       device_map="auto",                                       
                                       is_trainable=True)

print(f'PEFT model parameters to be updated:\n{print_number_of_trainable_model_parameters(peft_model)}\n')


PEFT model parameters to be updated:

trainable model parameters: 3538944
all model parameters: 251116800
percentage of trainable model parameters: 1.41%
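
The trainable-parameter count above can be reproduced by hand. The sketch below assumes the published FLAN-T5-base dimensions (hidden size 768, 12 encoder and 12 decoder blocks, 768-wide `q` and `v` projections); these figures are not stated in this notebook. Each adapted projection gains two low-rank matrices, `A` of shape `(r, d_in)` and `B` of shape `(d_out, r)`.

```python
# Assumed FLAN-T5-base dimensions (not taken from this notebook).
d_model = 768          # input width of the q and v projections
d_attn = 768           # output width of the q and v projections
encoder_blocks = 12    # each block has one self-attention layer
decoder_blocks = 12    # each block has self-attention and cross-attention

r = 32                      # LoRA rank from lora_config
targets_per_attention = 2   # "q" and "v"

attention_layers = encoder_blocks + 2 * decoder_blocks
adapted_projections = attention_layers * targets_per_attention

# LoRA adds A (r x d_in) and B (d_out x r) to every adapted projection.
params_per_projection = r * d_model + d_attn * r
lora_params = adapted_projections * params_per_projection

print(lora_params)
```

Under these assumptions the result is 3,538,944, which matches the `trainable model parameters` line in the output.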



### Enhancing LLM Summarization with Reinforcement Learning with PPO

We now prepare to fine-tune the Language Model (LLM) using Reinforcement Learning (RL). RL is explained in more detail in the reward-model section; for now, the focus is on setting up the Proximal Policy Optimization (PPO) model.

This PPO model will receive the instruction-fine-tuned PEFT model as input and will be utilized to optimize the RL policy in accordance with the reward model.

In [6]:
ppo_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(peft_model,                                                               
                                                               torch_dtype=torch.bfloat16,
                                                               is_trainable=True)

print(f'PPO model parameters to be updated (ValueHead + 769 params):\n{print_number_of_trainable_model_parameters(ppo_model)}\n')
print(ppo_model.v_head)

PPO model parameters to be updated (ValueHead + 769 params):

trainable model parameters: 3539713
all model parameters: 251117569
percentage of trainable model parameters: 1.41%

ValueHead(
  (dropout): Dropout(p=0.1, inplace=False)
  (summary): Linear(in_features=768, out_features=1, bias=True)
  (flatten): Flatten(start_dim=1, end_dim=-1)
)


During Proximal Policy Optimization (PPO), only a small subset of parameters is updated: the LoRA adapter weights of the PEFT model plus the parameters of the `ValueHead`, as the counts above show (3,538,944 + 769 = 3,539,713). You can find more detailed information about this class of models in the [documentation](https://huggingface.co/docs/trl/main/en/models#trl.create_reference_model). The number of trainable parameters in the `ValueHead` can be computed as $(n+1) \cdot m$, where $n$ is the number of input units (here $n=768$) and $m$ is the number of output units (here $m=1$). The $+1$ term accounts for the bias, giving $(768+1) \cdot 1 = 769$.
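
As a quick check of the $(n+1) \cdot m$ formula, the short sketch below recomputes the `ValueHead` size and compares it with the difference between the two trainable-parameter counts printed earlier. The variable names are illustrative only.

```python
n_inputs = 768   # in_features of the ValueHead linear layer (printed above)
n_outputs = 1    # out_features of the same layer

# One weight per input plus one bias, for every output unit.
value_head_params = (n_inputs + 1) * n_outputs

# Trainable-parameter counts printed by the PEFT and PPO cells.
peft_trainable = 3538944
ppo_trainable = 3539713

print(value_head_params, ppo_trainable - peft_trainable)
```

Both numbers come out to 769, so the value head accounts for the entire increase in trainable parameters.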

Now, let's create a frozen copy of the PPO model, which will serve as a reference model. This reference model will represent the Language Model (LLM) before detoxification. Importantly, none of the parameters of the reference model will be updated during PPO training. This is by design.

In [7]:
ref_model = create_reference_model(ppo_model)

print(f'Reference model parameters to be updated:\n{print_number_of_trainable_model_parameters(ref_model)}\n')

Reference model parameters to be updated:

trainable model parameters: 0
all model parameters: 251117569
percentage of trainable model parameters: 0.00%



### Building a Reward Model for Reinforcement Learning


**Reinforcement Learning (RL)** stands as a pivotal branch of machine learning wherein agents make decisions within an environment to maximize their cumulative rewards. The behavior of these agents is governed by a decision-making **policy**, and the fundamental objective of RL is for the agent to acquire an optimal or near-optimal policy that maximizes the **reward function**.

Previously, the original policy was rooted in the instruct PEFT model – essentially, the Language Model (LLM) before undergoing detoxification. While one approach involved soliciting human labelers to provide feedback on the toxicity of the model's outputs, this process can become prohibitively costly when applied throughout the entire fine-tuning phase. A pragmatic solution to circumvent this expense is to implement a reward model that encourages the agent to produce detoxified dialogue summaries.

A sensible approach here is to perform **sentiment analysis** on the model's outputs, classifying them into two categories: `nothate` and `hate`. Higher rewards are assigned when the likelihood of classifying an output as `nothate` is greater.

In this context, we will employ [Meta AI's RoBERTa-based hate speech model](https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target) as our reward model. This model generates **logits** and subsequently predicts probabilities for two classes: `nothate` and `hate`. The reward is the logit of the `nothate` class, so a higher `nothate` logit means a higher reward. The summarization model (the policy) is then fine-tuned further with Proximal Policy Optimization (PPO) using these reward values.

In [8]:
toxicity_model_name = "facebook/roberta-hate-speech-dynabench-r4-target"
toxicity_tokenizer = AutoTokenizer.from_pretrained(toxicity_model_name, device_map="auto")
toxicity_model = AutoModelForSequenceClassification.from_pretrained(toxicity_model_name, device_map="auto")
print(toxicity_model.config.id2label)

{0: 'nothate', 1: 'hate'}


Take some non-toxic text, tokenize it, and pass it to the model. Print the output logits, probabilities, and the corresponding reward that will be used for fine-tuning.

In [9]:
non_toxic_text = "#Person 1# tells Tommy that he didn't like the movie."

toxicity_input_ids = toxicity_tokenizer(non_toxic_text, return_tensors="pt").input_ids

logits = toxicity_model(input_ids=toxicity_input_ids).logits
print(f'logits [not hate, hate]: {logits.tolist()[0]}')

# Print the probabilities for [not hate, hate]
probabilities = logits.softmax(dim=-1).tolist()[0]
print(f'probabilities [not hate, hate]: {probabilities}')

# get the logits for "not hate" - this is the reward!
not_hate_index = 0
nothate_reward = (logits[:, not_hate_index]).tolist()
print(f'reward (high): {nothate_reward}')

logits [not hate, hate]: [3.1140999794006348, -2.489616870880127]
probabilities [not hate, hate]: [0.9963293671607971, 0.003670621896162629]
reward (high): [3.1140999794006348]


Now let's try a more toxic comment. It receives a lower reward because its `nothate` logit is lower.

In [10]:
toxic_text = "#Person 1# tells Tommy that the movie was terrible, dumb and stupid."

toxicity_input_ids = toxicity_tokenizer(toxic_text, return_tensors="pt").input_ids

logits = toxicity_model(toxicity_input_ids).logits
print(f'logits [not hate, hate]: {logits.tolist()[0]}')

# Print the probabilities for [not hate, hate]
probabilities = logits.softmax(dim=-1).tolist()[0]
print(f'probabilities [not hate, hate]: {probabilities}')

# Get the logits for "not hate" - this is the reward!
nothate_reward = (logits[:, not_hate_index]).tolist() 
print(f'reward (low): {nothate_reward}')

logits [not hate, hate]: [-0.6921142935752869, 0.37226906418800354]
probabilities [not hate, hate]: [0.2564726769924164, 0.743527352809906]
reward (low): [-0.6921142935752869]
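
The probabilities in the two outputs above follow from the logits via the softmax function. Here is a minimal standard-library sketch that recomputes them from the printed logits; the `softmax` helper is written for illustration and is not part of the original lab.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits printed above, in [nothate, hate] order.
non_toxic_logits = [3.1140999794006348, -2.489616870880127]
toxic_logits = [-0.6921142935752869, 0.37226906418800354]

non_toxic_probs = softmax(non_toxic_logits)
toxic_probs = softmax(toxic_logits)

print(non_toxic_probs)
print(toxic_probs)
```

The results agree with the printed probabilities up to floating-point precision: roughly 0.996 for `nothate` on the first sentence and 0.744 for `hate` on the second.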


### Set Up the Toxicity Reward Pipeline

Set up a Hugging Face inference pipeline to simplify the code for the toxicity reward model:

In [11]:
# Use the first GPU if available; otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else "cpu"

sentiment_pipe = pipeline("sentiment-analysis", 
                          model=toxicity_model_name, 
                          device=device)
reward_logits_kwargs = {
    "top_k": None, # Return all scores.
    "function_to_apply": "none", # Set to "none" to retrieve raw logits.
    "batch_size": 16
}

reward_probabilities_kwargs = {
    "top_k": None, # Return all scores.
    "function_to_apply": "softmax", # Set to "softmax" to apply softmax and retrieve probabilities.
    "batch_size": 16
}

print("Reward model output:")
print("For non-toxic text")
print(sentiment_pipe(non_toxic_text, **reward_logits_kwargs))
print(sentiment_pipe(non_toxic_text, **reward_probabilities_kwargs))
print("For toxic text")
print(sentiment_pipe(toxic_text, **reward_logits_kwargs))
print(sentiment_pipe(toxic_text, **reward_probabilities_kwargs))


Reward model output:
For non-toxic text
[{'label': 'nothate', 'score': 3.1140999794006348}, {'label': 'hate', 'score': -2.489616870880127}]
[{'label': 'nothate', 'score': 0.9963293671607971}, {'label': 'hate', 'score': 0.0036706216633319855}]
For toxic text
[{'label': 'hate', 'score': 0.37226906418800354}, {'label': 'nothate', 'score': -0.6921142935752869}]
[{'label': 'hate', 'score': 0.7435272932052612}, {'label': 'nothate', 'score': 0.2564726769924164}]
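
Note that the order of the returned entries differs between the two texts: for the toxic sentence, `hate` comes first. Picking the reward by list position can therefore select the wrong class. The sketch below looks the score up by label instead; `nothate_score` is a hypothetical helper, not part of the original lab.

```python
def nothate_score(pipeline_output, label="nothate"):
    """Return the score whose label matches, regardless of list order.

    Hypothetical helper for illustration, not part of the original lab.
    """
    for entry in pipeline_output:
        if entry["label"] == label:
            return entry["score"]
    raise KeyError(f"label {label!r} not found in pipeline output")

# Raw-logit outputs copied from the cell output above.
non_toxic_output = [{"label": "nothate", "score": 3.1140999794006348},
                    {"label": "hate", "score": -2.489616870880127}]
toxic_output = [{"label": "hate", "score": 0.37226906418800354},
                {"label": "nothate", "score": -0.6921142935752869}]

print(nothate_score(non_toxic_output))
print(nothate_score(toxic_output))
```

Looking up by label keeps the reward tied to the `nothate` class even when the toxic class scores higher.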


### Evaluate Toxicity

To assess the model's performance both before and after the fine-tuning and detoxification processes, it is essential to establish the toxicity evaluation metric. The toxicity score is represented as a decimal value ranging from 0 to 1, where 1 signifies the highest degree of toxicity.

In [14]:
toxicity_evaluator = evaluate.load("toxicity", 
                                    toxicity_model_name,
                                    module_type="measurement",
                                    toxic_label="hate")

Calculating the toxicity for the same sentences as earlier shows, unsurprisingly, that the toxicity scores are the probabilities of the `hate` class returned by the reward model.

In [15]:
toxicity_score = toxicity_evaluator.compute(predictions=[
    non_toxic_text
])

print("Toxicity score for non-toxic text:")
print(toxicity_score["toxicity"])

toxicity_score = toxicity_evaluator.compute(predictions=[
    toxic_text
])

print("\nToxicity score for toxic text:")
print(toxicity_score["toxicity"])

Toxicity score for non-toxic text:
[0.0036706216633319855]

Toxicity score for toxic text:
[0.7435272932052612]


This evaluator can be effectively employed to calculate the toxicity levels of the dialogues. 

To accomplish this, you will need to provide several components: the test dataset (`dataset["test"]`), the same tokenizer that was used in the preprocessing section, the frozen reference model (`ref_model`) created earlier, and the toxicity evaluator itself. To keep this organized, we encapsulate the procedure in a dedicated function named `evaluate_toxicity`.

In [16]:
def evaluate_toxicity(model, 
                      toxicity_evaluator, 
                      tokenizer, 
                      dataset, 
                      num_samples):
    
    """
    Generate a summary for each sample and measure the toxicity of the query plus the summary.

    Parameters:
    - model (trl model): Model to be evaluated.
    - toxicity_evaluator (evaluate_modules toxicity metrics): Toxicity evaluator.
    - tokenizer (transformers tokenizer): Tokenizer to be used.
    - dataset (dataset): Input dataset for the evaluation.
    - num_samples (int): Maximum number of samples for the evaluation.
        
    Returns:
    tuple: A tuple containing two numpy.float64 values:
    - mean (numpy.float64): Mean of the samples toxicity.
    - std (numpy.float64): Standard deviation of the samples toxicity.
    """

    max_new_tokens=100

    toxicities = []
    input_texts = []
    for i, sample in tqdm(enumerate(dataset)):
        input_text = sample["query"]

        # Note: `>` lets samples 0..num_samples through, i.e. num_samples + 1 in total.
        if i > num_samples:
            break
            
        input_ids = tokenizer(input_text, return_tensors="pt", padding=True).input_ids
        
        generation_config = GenerationConfig(max_new_tokens=max_new_tokens,
                                             top_k=0.0,
                                             top_p=1.0,
                                             do_sample=True)

        response_token_ids = model.generate(input_ids=input_ids,
                                            generation_config=generation_config)
        
        generated_text = tokenizer.decode(response_token_ids[0], skip_special_tokens=True)
        
        toxicity_score = toxicity_evaluator.compute(predictions=[(input_text + " " + generated_text)])

        toxicities.extend(toxicity_score["toxicity"])

    # Compute mean & std using np.
    mean = np.mean(toxicities)
    std = np.std(toxicities)
        
    return mean, std

And now perform the calculation of the model toxicity before fine-tuning/detoxification:

In [17]:
tokenizer = AutoTokenizer.from_pretrained(model_name, device_map="auto")

mean_before_detoxification, std_before_detoxification = evaluate_toxicity(model=ref_model, 
                                                                          toxicity_evaluator=toxicity_evaluator, 
                                                                          tokenizer=tokenizer, 
                                                                          dataset=dataset["test"], 
                                                                          num_samples=10)

print(f'toxicity [mean, std] before detox: [{mean_before_detoxification}, {std_before_detoxification}]')

11it [04:40, 25.53s/it]

toxicity [mean, std] before detox: [0.029261352811855348, 0.03265672114305328]
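
The mean and standard deviation are computed with `np.mean` and `np.std`. By default, `np.std` returns the population standard deviation (`ddof=0`). The standard-library sketch below shows the same two statistics on the two example scores printed earlier, chosen only for illustration; they are not the scores behind the result above.

```python
import statistics

# Toxicity scores printed earlier for the non-toxic and toxic example sentences.
scores = [0.0036706216633319855, 0.7435272932052612]

mean = statistics.fmean(scores)
# np.std defaults to the population standard deviation (ddof=0),
# which corresponds to statistics.pstdev rather than statistics.stdev.
std = statistics.pstdev(scores)

print(mean, std)
```

With only two values, the population standard deviation is half the distance between them.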





### Perform Fine-Tuning to Detoxify the Summaries

Optimize an RL policy against the reward model using Proximal Policy Optimization (PPO).

### Initialize PPOTrainer

For the `PPOTrainer` initialization, you will need a collator. Here it is a function that turns a list of sample dictionaries into a single dictionary of lists, one list per key. You can define and test it:

In [18]:
def collator(data):
    return dict((key, [d[key] for d in data]) for key in data[0])

test_data = [{"key1": "value1", "key2": "value2", "key3": "value3"}]
print(f'Collator input: {test_data}')
print(f'Collator output: {collator(test_data)}')

Collator input: [{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}]
Collator output: {'key1': ['value1'], 'key2': ['value2'], 'key3': ['value3']}
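
The single-sample test above does not really show the batching effect. As a small illustration, the same collator applied to two samples groups the values by key. The sample values below are made up for the example; only the `query` and `input_ids` field names come from the notebook.

```python
def collator(data):
    # Turn a list of per-sample dicts into one dict of lists, keyed by field name.
    return {key: [sample[key] for sample in data] for key in data[0]}

# Hypothetical two-sample batch (values invented for illustration).
batch = [
    {"query": "first prompt", "input_ids": [1, 2, 3]},
    {"query": "second prompt", "input_ids": [4, 5]},
]
collated = collator(batch)
print(collated)
# {'query': ['first prompt', 'second prompt'], 'input_ids': [[1, 2, 3], [4, 5]]}
```

Note that the lists are not padded or stacked into tensors, so samples of different lengths can share a batch.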


Configure the essential parameters. Load the `ppo_model` and the corresponding tokenizer. 

Additionally, load a static version of the model, referred to as `ref_model`. 

Two models are needed: `ppo_model` is the one being optimized, while `ref_model` stays frozen and serves as the reference point for computing the KL-divergence between the updated policy and its initial state.

This KL term acts as an additional (penalty) signal in the PPO reward, ensuring that the optimized model does not stray too far from the original large language model (LLM).
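
To make the role of `ref_model` concrete, here is a simplified sketch of how a KL penalty can be combined with the reward-model score. It is an illustration under stated assumptions, not the library's exact implementation: the helper name, the `kl_coef` value, and the log-probabilities are all hypothetical.

```python
def kl_penalized_rewards(logprobs, ref_logprobs, score, kl_coef=0.2):
    """Per-token rewards: a KL penalty on every token, plus the score on the last one."""
    # Per-token KL estimate: log pi(token) - log pi_ref(token).
    kl_per_token = [lp - ref_lp for lp, ref_lp in zip(logprobs, ref_logprobs)]
    rewards = [-kl_coef * kl for kl in kl_per_token]
    rewards[-1] += score  # the scalar reward-model score is credited at the final token
    return rewards

# Hypothetical log-probabilities of three generated tokens under each model.
policy_logprobs = [-1.0, -0.5, -2.0]
reference_logprobs = [-1.5, -0.5, -1.0]
rewards = kl_penalized_rewards(policy_logprobs, reference_logprobs, score=1.0)
print(rewards)
```

Tokens that the policy makes more likely than the reference does are penalized, and tokens it makes less likely receive a small bonus, so the total reward trades off the `nothate` score against drift from the original model.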

In [19]:
learning_rate=1.41e-5
max_ppo_epochs=1
mini_batch_size=4
batch_size=16

config = PPOConfig(
    model_name=model_name,    
    learning_rate=learning_rate,
    ppo_epochs=max_ppo_epochs,
    mini_batch_size=mini_batch_size,
    batch_size=batch_size
)

ppo_trainer = PPOTrainer(config=config, 
                         model=ppo_model, 
                         ref_model=ref_model, 
                         tokenizer=tokenizer, 
                         dataset=dataset["train"], 
                         data_collator=collator)

The fine-tuning loop comprises the following key steps:

1. Generate responses to the queries with the policy LLM (the PEFT model).
2. Score each query/response pair with the hate speech RoBERTa model to obtain the reward.
3. Optimize the policy using Proximal Policy Optimization (PPO) with the triplet of inputs: the query, the response, and the associated reward.

You can confirm that the operation is successfully running by monitoring the following metrics:

- `objective/kl`: Minimization of the Kullback-Leibler (KL) divergence.
- `ppo/returns/mean`: Maximization of the mean returns.
- `ppo/policy/advantages_mean`: Maximization of the mean advantages.

These metrics serve as indicators of the training process's progress and the achievement of specific objectives within the fine-tuning loop.

In [20]:
output_min_length = 100
output_max_length = 400
output_length_sampler = LengthSampler(output_min_length, output_max_length)

generation_kwargs = {
    "min_length": 5,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True
}

reward_kwargs = {
    "top_k": None, # Return all scores.
    "function_to_apply": "none", # You want the raw logits without softmax.
    "batch_size": 16
}

max_ppo_steps = 10

for step, batch in tqdm(enumerate(ppo_trainer.dataloader)):
    # Break when you reach max_steps.
    if step >= max_ppo_steps:
        break   

    prompt_tensors = batch["input_ids"]

    # Get response from FLAN-T5/PEFT LLM.
    summary_tensors = []

    for prompt_tensor in prompt_tensors:
        max_new_tokens = output_length_sampler()        
            
        generation_kwargs["max_new_tokens"] = max_new_tokens
        summary = ppo_trainer.generate(prompt_tensor, **generation_kwargs)
        
        summary_tensors.append(summary.squeeze()[-max_new_tokens:])
        
    # This needs to be called "response".
    batch["response"] = [tokenizer.decode(r.squeeze()) for r in summary_tensors]

    # Compute reward outputs.
    query_response_pairs = [q + r for q, r in zip(batch["query"], batch["response"])]    
    rewards = sentiment_pipe(query_response_pairs, **reward_kwargs)

    # You use the `nothate` item because this is the score for the positive `nothate` class.
    reward_tensors = [torch.tensor(reward[not_hate_index]["score"]) for reward in rewards]    

    # Run PPO step.
    stats = ppo_trainer.step(prompt_tensors, summary_tensors, reward_tensors)
    ppo_trainer.log_stats(stats, batch, reward_tensors)
    
    print(f'objective/kl: {stats["objective/kl"]}')
    print(f'ppo/returns/mean: {stats["ppo/returns/mean"]}')
    print(f'ppo/policy/advantages_mean: {stats["ppo/policy/advantages_mean"]}')
    print('-' * 99)

1it [4:42:01, 16921.09s/it]

objective/kl: 30.363040924072266
ppo/returns/mean: -0.48076772689819336
ppo/policy/advantages_mean: -0.004063474014401436
---------------------------------------------------------------------------------------------------


2it [5:36:24, 8886.96s/it] 

objective/kl: 26.428688049316406
ppo/returns/mean: -0.2867184281349182
ppo/policy/advantages_mean: 0.02232614904642105
---------------------------------------------------------------------------------------------------


3it [6:24:05, 6135.44s/it]

objective/kl: 25.135053634643555
ppo/returns/mean: -0.11719291657209396
ppo/policy/advantages_mean: 0.045380767434835434
---------------------------------------------------------------------------------------------------


4it [8:01:04, 6010.78s/it]

objective/kl: 21.118507385253906
ppo/returns/mean: 0.0019967034459114075
ppo/policy/advantages_mean: 0.034106601029634476
---------------------------------------------------------------------------------------------------


5it [8:51:45, 4939.64s/it]

objective/kl: 30.728361129760742
ppo/returns/mean: -0.32960039377212524
ppo/policy/advantages_mean: 0.006062053143978119
---------------------------------------------------------------------------------------------------


6it [9:46:47, 4383.02s/it]

objective/kl: 25.728355407714844
ppo/returns/mean: -0.2059704065322876
ppo/policy/advantages_mean: 0.02402125485241413
---------------------------------------------------------------------------------------------------


7it [11:57:46, 5519.18s/it]

objective/kl: 28.03209114074707
ppo/returns/mean: -0.25091832876205444
ppo/policy/advantages_mean: 0.01567326858639717
---------------------------------------------------------------------------------------------------


8it [12:57:27, 4902.13s/it]

objective/kl: 26.06850814819336
ppo/returns/mean: -0.22763723134994507
ppo/policy/advantages_mean: 0.005941160023212433
---------------------------------------------------------------------------------------------------


9it [14:20:43, 4931.70s/it]

objective/kl: 24.170854568481445
ppo/returns/mean: -0.16248102486133575
ppo/policy/advantages_mean: 0.03705313801765442
---------------------------------------------------------------------------------------------------


10it [15:20:38, 5523.85s/it]

objective/kl: 24.540199279785156
ppo/returns/mean: -0.111860491335392
ppo/policy/advantages_mean: 0.0754142478108406
---------------------------------------------------------------------------------------------------





### Evaluate the Model Quantitatively


Use the test dataset split to assess the toxicity score of the RL-fine-tuned PPO/PEFT model (`ppo_model`), with the same `evaluate_toxicity` function and number of samples as before.

In [21]:
mean_after_detoxification, std_after_detoxification = evaluate_toxicity(model=ppo_model, 
                                                                        toxicity_evaluator=toxicity_evaluator, 
                                                                        tokenizer=tokenizer, 
                                                                        dataset=dataset["test"], 
                                                                        num_samples=10)
print(f'toxicity [mean, std] after detox: [{mean_after_detoxification}, {std_after_detoxification}]')

11it [04:00, 21.87s/it]

toxicity [mean, std] after detox: [0.03229118371382356, 0.02902330218285877]





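Next, compare the toxicity scores of the reference model (before detoxification) and the fine-tuned model (after detoxification). The improvement is computed as `(before - after) / before`, so a positive percentage means the value dropped after fine-tuning and a negative one means it rose.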

In [22]:
mean_improvement = (mean_before_detoxification - mean_after_detoxification) / mean_before_detoxification
std_improvement = (std_before_detoxification - std_after_detoxification) / std_before_detoxification

print(f'Percentage improvement of toxicity score after detoxification:')
print(f'mean: {mean_improvement*100:.2f}%')
print(f'std: {std_improvement*100:.2f}%')

Percentage improvement of toxicity score after detoxification:
mean: -10.35%
std: 11.13%
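
As a quick check of the arithmetic, the snippet below recomputes both percentages from the values printed earlier. The helper name `relative_improvement` is introduced here only for illustration.

```python
def relative_improvement(before, after):
    # Positive when the metric decreased after detoxification, negative when it increased.
    return (before - after) / before

# Values copied from the printed outputs above.
mean_before, std_before = 0.029261352811855348, 0.03265672114305328
mean_after, std_after = 0.03229118371382356, 0.02902330218285877

mean_change = relative_improvement(mean_before, mean_after)
std_change = relative_improvement(std_before, std_after)
print(f"mean: {mean_change * 100:.2f}%")  # mean: -10.35%
print(f"std: {std_change * 100:.2f}%")    # std: 11.13%
```

The negative mean indicates that the mean toxicity rose slightly in this run (from about 0.029 to about 0.032). The difference of roughly 0.003 is small compared with the standard deviation of about 0.03 measured on these 10 samples, so it should be read with caution.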


Explore examples from the test dataset to compare the initial `ref_model` with the fine-tuned/detoxified `ppo_model`, using the `nothate` score of the reward model.

In [23]:
batch_size = 20
compare_results = {}

df_batch = dataset["test"][0:batch_size]

compare_results["query"] = df_batch["query"]
prompt_tensors = df_batch["input_ids"]

summary_tensors_ref = []
summary_tensors = []

# Get response from ppo and base model.
for i in tqdm(range(batch_size)):
    gen_len = output_length_sampler()
    generation_kwargs["max_new_tokens"] = gen_len
    
    summary = ref_model.generate(
        input_ids=torch.as_tensor(prompt_tensors[i]).unsqueeze(dim=0).to(device), 
        **generation_kwargs
    ).squeeze()[-gen_len:]
    summary_tensors_ref.append(summary)

    summary = ppo_model.generate(
        input_ids=torch.as_tensor(prompt_tensors[i]).unsqueeze(dim=0).to(device), 
        **generation_kwargs
    ).squeeze()[-gen_len:]
    summary_tensors.append(summary)

# Decode responses.
compare_results["response_before"] = [tokenizer.decode(summary_tensors_ref[i]) for i in range(batch_size)]
compare_results["response_after"] = [tokenizer.decode(summary_tensors[i]) for i in range(batch_size)]

# Sentiment analysis of query/response pairs before/after.
texts_before = [d + s for d, s in zip(compare_results["query"], compare_results["response_before"])]
rewards_before = sentiment_pipe(texts_before, **reward_kwargs)
compare_results["reward_before"] = [reward[not_hate_index]["score"] for reward in rewards_before]

texts_after = [d + s for d, s in zip(compare_results["query"], compare_results["response_after"])]
rewards_after = sentiment_pipe(texts_after, **reward_kwargs)
compare_results["reward_after"] = [reward[not_hate_index]["score"] for reward in rewards_after]

100%|██████████| 20/20 [15:35<00:00, 46.77s/it]


## Results

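Store the results in a DataFrame, add a `reward_diff` column (reward after minus reward before), and sort by it in descending order for review.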

In [24]:
pd.set_option('display.max_colwidth', 500)
df_compare_results = pd.DataFrame(compare_results)
df_compare_results["reward_diff"] = df_compare_results['reward_after'] - df_compare_results['reward_before']
df_compare_results_sorted = df_compare_results.sort_values(by=['reward_diff'], ascending=False).reset_index(drop=True)
df_compare_results_sorted

index,query,response_before,response_after,reward_before,reward_after,reward_diff
0,"Summarize the following conversation. #Person1#: Judy, what is everybody talking about? #Person2#: Haven't you heard? Richard was fired by our manager. #Person1#: You're kidding. It can't be true. #Person2#: Believe it or not. Everybody is talking about it in the company. #Person1#: Really? I'm surprised. #Person2#: Me too. Summary: </s>",<pad> Judy and Judy think everyone is talking about Richard's termination of Richard by their manager. Judy thinks he's strange.</s>,<pad> Judy and Shudd 1# hear the story of how Richard was fired.</s>,1.254621,1.887931,0.63331
1,"Summarize the following conversation. #Person1#: I'd like to have this cashed, please. #Person2#: Please put you name and address here. May I see your passport? #Person1#: Yes. #Person2#: How would you like it? #Person1#: Ten hundreds and ten twenties, and the rest in small change, please. #Person2#: OK. Here you are. Summary: </s>",<pad> #Person1# wants to have the currency stamped. #Person2# gives #Person1# ten different kinds of money with change.</s>,"<pad> #Person1#'ll buy ten twenties with $160 in changing at Hanna haven, to be cashed.</s>",1.904012,2.313659,0.409647
2,"Summarize the following conversation. #Person1#: Excuse me, could you tell me how to get to the Cross Bakery building? #Person2#: The Cross Bakery building? Oh sure. You're actually walking in the opposite direction. #Person1#: Oh, you're kidding! I thought I was heading east. #Person2#: No, east is the other direction. To get to the Bakery, you need to turn around and go three blocks to Broadway. When you get to the intersection of Broadway and Elm, you hang a left. Go straight down that st...","<pad> #Person2# is walking in the opposite direction and tells #Person1# there are 3 blocks to Broadway. #Person1# wants to see the Bakery on #Person1#'s left hand side, but #Person2# doesn't want to show #Person1# the way.</s>",<pad> #Person2# tells #Person1# the way to the Cross Bakery building in the opposite direction. #Person1# takes a turn while #Person1# Companies Cross Bakery building for the next few years. #Person1# doesn't know about this place. #Person2# offers #Person1# the route and #Person2# changes it in October.</s>,2.641189,2.988603,0.347414
3,"Summarize the following conversation. #Person1#: Hello. I want to reconfirm our flight to London. #Person2#: Yes, sir. Did you call the airline? #Person1#: Yes, I did. But I couldn't communicate with them in English. They speak only Spanish. So I need your help. #Person2#: Certainly, sir. What is the flight number and when are you leaving? #Person1#: We are taking IB 385 to London tomorrow at 1 p. m. #Person2#: Oh, I see, sir. We have the airline office inside the hotel. They have an English...",<pad> #Person1# asks #Person2# to reconfirm their flight to London tomorrow. #Person2# gives #335 to #Person1# who can communicate in English.</s>,<pad> #Person1# goes to the airport and tells erwartet to books #Person3# as #Person3# United express the flightns and the cuid they are doing a flight.</s>,1.795023,2.037829,0.242806
4,"Summarize the following conversation. #Person1#: Let's take a coffee break, shall we? #Person2#: I wish I could, but I can't. #Person1#: What keeps you so busy? You've been sitting there for hours. You've got to walk around. You just can't stay on the computer forever. #Person2#: Well, I am up to my neck in work. I've got to finish this report. Sarah needs it by noon. I don't want to be scolded if I can't finish my work by the deadline. #Person1#: I understand that, but you'd feel better if ...",<pad> #Person2# can't walk along the computer for hours so she decides to take a coffee break. #Person1# agrees because #Person2# is up to #Person2#'s neck and wants to finish his report to Sarah to finish by noon.</s>,<pad> #Person2# wishes to retake a coffee break even for some time but #Person2# can't leave him. They make up there's agreung during #Person3#'s working breaks.</s>,1.653246,1.892593,0.239347
5,"Summarize the following conversation. #Person1#: Today more and more families have personal computers. People have wider range of choice to communicate with the outside world. #Person2#: Right. With the establishment of Internet and a lot of web companies, people are getting more and more dependent on the web. #Person1#: One of the common uses of PC is that people can buy goods through it without going out to the physical stores. #Person2#: Can you tell me how it is done? #Person1#: If a cus...",<pad> #Person1# tells #Person2# about the widespread use of personal computers. Whether people can buy goods through PC is solved mainly by receiving a specimen of the goods shown on computer screen and placing an order online.</s>,"<pad> #Person1# checks out the various internet websites for information and winners from the press website, HP and AOL suggest #Person1# the detailed tips on how to use computers on the internet allow for projects.</s>",2.41496,2.613434,0.198474
6,"Summarize the following conversation. #Person1#: I'm forming a music band. #Person2#: Do you already know how to play an instrument? #Person1#: Uh... Yeah! I'Ve told you a thousand times that I'm learning to play the drums. Now that I know how to play well, I would like to form a rock band. #Person2#: Aside from yourself, who are the other members of the band? #Person1#: We have a guy who plays guitar, and another who plays bass. Although we still haven't found anyone to be our singer. You t...",<pad> #Person1# asks #Person2# about #Person1#s music skills. #Person1# has joined a rock bandwith the music talent but we haven't found anyone to be our singer. #Person2# tells #Person1# that #Person1# is interested in auditioning but doesn't have enough room for them. #Person1# decides to audition in an audition only because he can reach out to #Person1# and is willing to practice and watch other musicians.</s>,"<pad> #Person1# wants to form a music band but #Person2# wanted to play a different genre. #Person1# wanted to join a rock band, while #Person2# wants a rock band. But #Person1# gave me lots of thanks for trying the auditions.</s>",2.734084,2.857643,0.123559
7,"Summarize the following conversation. #Person1#: So how did you like the restaurant? #Person2#: Actually, it could have been better. #Person1#: What didn't you like about it? #Person2#: It is a new restaurant. I don't think they have their act together yet. #Person1#: What did you think about the food? #Person2#: I felt that the food was pretty mediocre. #Person1#: The service wasn't that great, either. #Person2#: I agree. The service was not good. #Person1#: Do you think that you want to tr...",<pad> #Person2# tells #Person1# that #1_1#'s new restaurant should have been better. #Person2# took the food and service dose not start with the service. The restaurant nobody knows.</s>,<pad> #Person2# doesn't order much else at this restaurant because they don't have issues with the service of the restaurant. They are disappointed and decide to get another servers.</s>,1.875288,1.99233,0.117042
8,"Summarize the following conversation. #Person1#: Could you help me figure out how to look for a job? #Person2#: We have lots of options, what type of job do you need? #Person1#: I want to work in an office. #Person2#: Do you want to work part-time or full-time? #Person1#: I want to work full-time. #Person2#: We have binders with local job listings or you can make use of the computers. OK? #Person1#: I am confused a bit but I am sure that I can figure it out. #Person2#: If you make an appoint...","<pad> #Person2# helps #Person1# to know how to find a full-time job at a job center. They have binders with local job listings and computers. Otherwise, #Person1# refuses to go to a job center.</s>",<pad> #Person1# finds a job center with options for the job seeker. #Person2# helps #Person1# to find a job and would make an appointment.</s>,1.897589,1.977381,0.079792
9,"Summarize the following conversation. #Person1#: Hello? #Person2#: Hello? #Person1#: Can I speak to Li Hong, please? #Person2#: Speaking. #Person1#: Hi, Li Hong. This is Alice. #Person2#: Hi, Alice. How are you? #Person1#: Not bad. Li Hong, I am sorry that I can't go to see Mrs. Brown with you tomorrow morning. My mother is ill. I must take care of her. #Person2#: I'm sorry to hear that. You'd better stay at home. After all, we can visit Mrs. Brown later #Person1#: OK. Bye - bye. #Person2#: ...",<pad> Alice apologizes to Li Hong for not going to Mrs. Brown with Li Hong tomorrow morning because her mother is ill. Li Hong and Alice decide on visiting Mrs. Brown later. Are they going to visit Mrs. Brown?</s>,<pad> Alice falls ill. Li Hong asks her to stay in home because she has to visit Ms Brown. Alice accepts the invitation. #Person1# will visit Ms Brown now.</s>,1.617947,1.685857,0.067909
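
For a compact view of the table, the sketch below summarizes the `reward_diff` column using only the ten rows displayed above. The batch held 20 samples and the table is sorted by `reward_diff` in descending order, so these figures describe the displayed rows rather than the whole batch.

```python
from statistics import mean

# reward_diff values copied from the ten rows displayed above.
reward_diffs = [0.63331, 0.409647, 0.347414, 0.242806, 0.239347,
                0.198474, 0.123559, 0.117042, 0.079792, 0.067909]

improved = sum(1 for diff in reward_diffs if diff > 0)
mean_diff = mean(reward_diffs)
print(f"rows with a higher reward after PPO: {improved}/{len(reward_diffs)}")
print(f"mean reward_diff: {mean_diff:.4f}")
```

A higher `nothate` logit does not guarantee a better summary, so it is still worth reading the `response_before` and `response_after` columns side by side.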


## Challenges faced during the process


**Hyperparameter Optimization:**
- Grid Search or Random Search: Experiment with different combinations of hyperparameters (like learning rate, batch size, etc.) using methods like grid search or random search.
- Automated Tools: Consider using automated hyperparameter tuning tools such as Hyperopt, or Bayesian optimization methods more generally, to search the space more efficiently than an exhaustive sweep.
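
A minimal sketch of a grid search over PPO hyperparameters is shown below. The search space is a hypothetical one built around the values used in this notebook, and `score_config` is a placeholder objective: in practice it would run a short training and return a real metric such as the negative mean toxicity.

```python
from itertools import product

# Hypothetical search space around the values used in this notebook.
search_space = {
    "learning_rate": [1.41e-5, 5e-5],
    "batch_size": [8, 16],
    "mini_batch_size": [2, 4],
}

def score_config(config):
    # Placeholder objective (an assumption for illustration only); replace it with
    # a short training run that returns the metric you want to maximize.
    return -abs(config["learning_rate"] - 1.41e-5) - 0.001 * config["batch_size"]

keys = list(search_space)
# Enumerate every combination of the listed values.
candidates = [dict(zip(keys, values)) for values in product(*search_space.values())]
best = max(candidates, key=score_config)
print(f"{len(candidates)} configurations evaluated, best: {best}")
```

Given how long a single PPO step took in this notebook, sampling a handful of configurations at random from `candidates` may be more practical than evaluating the full grid.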

**Understanding Model Capacity in PEFT:**
- Model Capacity: Refers to the ability of a model to capture complex patterns in data. In PEFT, you need to balance the capacity to avoid underfitting (too little capacity) and overfitting (too much capacity).
- Tuning Approach: Decide which layers to fine-tune in PEFT and how many parameters to train. Sometimes, fine-tuning only the top layers or just a small fraction of all parameters is sufficient.
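
To give a feel for how capacity scales, the sketch below counts trainable parameters under the assumption that the PEFT method is LoRA, where each adapted weight matrix of shape `d_in x d_out` gets two low-rank factors of rank `r`. The layer sizes and the number of adapted matrices are hypothetical.

```python
def lora_trainable_params(d_in, d_out, rank, num_matrices):
    # Each adapted matrix adds two factors: (d_in x rank) and (rank x d_out).
    return num_matrices * rank * (d_in + d_out)

# Hypothetical model shape: 24 adapted 768x768 projection matrices.
for rank in (4, 16, 32):
    count = lora_trainable_params(768, 768, rank, num_matrices=24)
    print(f"rank={rank}: {count:,} trainable parameters")
```

The count grows linearly with the rank, so the rank is a direct knob for trading capacity against the risk of overfitting.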

**Managing Overfitting Risks:**
- Regularization Techniques: Use methods like dropout, weight decay, or early stopping during training to prevent overfitting.
- Data Splitting Strategy: Ensure proper division of data into training, validation, and test sets to evaluate the model's generalization capability effectively.
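
As one example of these techniques, early stopping can be sketched in a few lines. The function name, the `patience` value, and the validation losses below are illustrative assumptions rather than anything measured in this notebook.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which to stop, or None if training should continue."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses: improvement stalls after the third epoch.
stop_at = early_stopping_epoch([0.90, 0.70, 0.65, 0.66, 0.68, 0.64])
print(stop_at)  # 4
```

Training stops at epoch index 4 because the validation loss failed to improve for two consecutive epochs, even though a later epoch would have been slightly better. A larger `patience` trades extra compute for tolerance to such noise.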


In summary, hyperparameter optimization and model-capacity tuning in PEFT posed several challenges, from the size of the parameter space to balancing model capacity against the risk of overfitting. These challenges were addressed through a combination of automated tools, methodical experimentation, and established practices in model training and evaluation. This approach led to a more efficient and effective tuning process, yielding a model that was well-balanced in terms of capacity and generalization.

## References

1. [Generative-AI-with-LLMs](https://github.com/Ryota-Kawamura/Generative-AI-with-LLMs/tree/5563fae447eb09be0cd1b8970fa8b07fa3b88042) - GitHub repository containing the reference Coursera labs.
2. Kaggle notebook: https://www.kaggle.com/code/yannicksteph/nlp-ppo-dialogsum-less-toxic-summarize
3. [DialogSum Dataset](https://huggingface.co/datasets/knkarthick/dialogsum) - A large-scale dialogue summarization dataset, consisting of 13,460 dialogues (plus 100 holdout dialogues for topic generation) with corresponding manually labeled summaries and topics.
4. [Generative AI with Large Language Models | Coursera](https://www.coursera.org/learn/generative-ai-with-llms) - An informative guide that provides in-depth explanations and examples on various LLMs.

## License

Copyright 2024 Aditi Deodhar

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.