# Generative AI Task: Summarise Dialogue

This notebook performs dialogue summarisation with Generative AI. It explores how different prompts (input text) affect the completion (output) of the model. Prompt engineering is carried out by comparing zero shot, one shot and few shot inference, with the aim of finding how best to improve the generative output of the Large Language Model.

### Install the required dependencies

Given the scope of the task, we need to install PyTorch and the Hugging Face `transformers` and `datasets` packages

In [71]:
!pip install torch==1.13.1
!pip install torchdata==0.5.1 --quiet
!pip install transformers==4.27.2
!pip install datasets==2.11.0 --quiet

In [73]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

In [74]:
!pip install -U datasets


We need to load a few resources that will be used: the dataset, the Large Language Model (LLM), the tokeniser and the generation configuration

### Summarise dialogue without Prompt Engineering

In this use case, a summary of the dialogue will be generated with the pre-trained Large Language Model (LLM) `FLAN-T5` from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index)

We can now load some simple dialogues from the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) Hugging Face dataset. This dataset contains over 10,000 dialogues with the corresponding manually labelled summaries and topics

In [75]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

Found cached dataset csv (file:///Users/jamesaymer/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)


NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

Now that we have loaded the dataset, we can print a few dialogues from the dataset, together with their baseline summaries

In [25]:
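# Indices of the test-set dialogues used as examples throughout the notebook
example_dialogue_indice_slice: list = [40, 200]

# Divider of 115 dashes used to separate the printed sections
dashed_line_ouptut_divider = '-' * 115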

for example, index in enumerate(example_dialogue_indice_slice):
    print(dashed_line_ouptut_divider)
    print(f"Example: {example + 1}")
    print(dashed_line_ouptut_divider)
    
    print("INPUT DIALOGUE:")
    print(dataset['test'][index]['dialogue'])
    print(dashed_line_ouptut_divider)
    
    print("BASELINE HUMAN SUMMARY:")
    print(dataset['test'][index]['summary'])
    print(dashed_line_ouptut_divider)
    print()

-------------------------------------------------------------------------------------------------------------------
Example: 1
-------------------------------------------------------------------------------------------------------------------
INPUT DIALOGUE:


NameError: name 'dataset' is not defined

Now we can load the [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5), and create an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method

In [None]:
model_name = "google/flan-t5-base"

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

Before we can perform encoding and decoding, we need to tokenise the text. `Tokenisation` is the process of splitting text into smaller units (tokens) that can be processed by LLMs. Each token, which may be a whole word or part of a word, is converted into a number representing its position in a dictionary of all the possible tokens that the model can work with.
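The lookup idea can be sketched with a toy vocabulary. This is only an illustration: `TOY_VOCAB`, `toy_encode` and `toy_decode` are hypothetical names, and the vocabulary and splitting rule are not the ones used by `FLAN-T5`, whose tokenizer works on learned sub-word pieces.

```python
import re

# Toy vocabulary (an assumption for illustration, NOT the FLAN-T5 vocabulary).
# Each token's ID is simply its position in this list; ID 0 is "unknown".
TOY_VOCAB = ["<unk>", "what", "time", "is", "it", ",", "tom", "?"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(TOY_VOCAB)}


def toy_encode(text):
    """Split text into lowercase words and punctuation, then map each to its ID."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [TOKEN_TO_ID.get(token, 0) for token in tokens]


def toy_decode(ids):
    """Map IDs back to tokens and join them with spaces."""
    return " ".join(TOY_VOCAB[i] for i in ids)


ids = toy_encode("What time is it, Tom?")
print(ids)              # [1, 2, 3, 4, 5, 6, 7]
print(toy_decode(ids))  # what time is it , tom ?
```

Words missing from the toy vocabulary all collapse to the unknown ID, which is one reason real tokenizers fall back to sub-word pieces instead of whole words.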

We can download the tokenizer for the `FLAN-T5` model using the `AutoTokenizer.from_pretrained()` method. The `use_fast` parameter requests a fast Rust-based tokenizer if one is supported; otherwise a normal Python-based tokenizer is returned instead

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

We can now test that the tokenizer can encode and decode a simple sentence

In [None]:
sentence = "What time is it, Tom?"

# np = Numpy
# tf = TensorFlow
# pt = PyTorch

encoded_sentence = tokenizer(sentence, return_tensors="pt")

decoded_sentence = tokenizer.decode(
        encoded_sentence["input_ids"][0],
        skip_special_tokens=True
    )

print("ENCODED SENTENCE:")
print(encoded_sentence["input_ids"][0])
print("\nDECODED SENTENCE:")
print(decoded_sentence)

Now we can assess how well the base LLM summarises a dialogue without any prompt engineering

In [None]:
for example, index in enumerate(example_dialogue_indice_slice):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
    
    inputs = tokenizer(dialogue, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )
    
    print(dashed_line_ouptut_divider)
    print(f"Example: {example + 1}")
    print(dashed_line_ouptut_divider)
          
    print(f"INPUT PROMPT:\n{dialogue}")
    print(dashed_line_ouptut_divider)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dashed_line_ouptut_divider)
          
    print(f"MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n")

The model's outputs make some sense, but they are not quite right. Prompt engineering can help here

### Zero shot inference with an instruction Prompt

In [None]:
for example, index in enumerate(example_dialogue_indice_slice):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """

    # Input constructed prompt instead of the dialogue
    inputs = tokenizer(prompt, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )
    
    print(dashed_line_ouptut_divider)
    print(f"Example: {example + 1}")
    print(dashed_line_ouptut_divider)
    
    print(f"INPUT PROMPT:\n{prompt}")
    print(dashed_line_ouptut_divider)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dashed_line_ouptut_divider)
    
    print(f"MODEL GENERATION - ZERO SHOT:\n{output}\n")

### Zero shot inference with a Prompt template from FLAN-T5

We can use a [pre-built prompt template](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py) from `FLAN-T5` to help

In [None]:
for example, index in enumerate(example_dialogue_indice_slice):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
        
    prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

    inputs = tokenizer(prompt, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0], 
        skip_special_tokens=True
    )

    print(dashed_line_ouptut_divider)
    print(f"Example: {example + 1}")
    print(dashed_line_ouptut_divider)
    
    print(f"INPUT PROMPT:\n{prompt}")
    print(dashed_line_ouptut_divider)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dashed_line_ouptut_divider)
    
    print(f"MODEL GENERATION - ZERO SHOT:\n{output}\n")

### One shot inference 

We can build a function that takes a list of indices (`example_indices_full`), generates a prompt containing those full examples, and then appends the dialogue that we want the model to summarise (`example_index_to_summarize`)

In [None]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ""
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        
        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5...
        # ...other models may have their own preferred stop sequence
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""
    
    dialogue = dataset['test'][example_index_to_summarize]['dialogue']
    
    prompt += f"""
Dialogue:

{dialogue}

What was going on?
"""
        
    return prompt

We can now construct the prompt to perform one shot inference

In [None]:
example_indices_full = [40]
example_index_to_summarize = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)

Now we can pass the prompt above to perform the one shot inference

In [None]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors="pt")
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

print(dashed_line_ouptut_divider)
print(f"BASELINE HUMAN SUMMARY:\n{summary}\n")
print(dashed_line_ouptut_divider)
print(f"MODEL GENERATION - ONE SHOT:\n{output}")

### Few shot inference

We can add two more dialogue-summary pairs to the prompt, before performing few shot inference

In [None]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(few_shot_prompt)

As in the case of one shot inference, we can now pass the prompt to perform few shot inference

In [None]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors="pt")
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

print(dashed_line_ouptut_divider)
print(f"BASELINE HUMAN SUMMARY:\n{summary}\n")
print(dashed_line_ouptut_divider)
print(f"MODEL GENERATION - FEW SHOT:\n{output}")

Given the output from the above cell, we can see that few shot inference did not provide much of an improvement over one shot inference. It is also important not to exceed the model's input context length (`512 tokens`), as anything beyond the context length will be ignored

However, passing in one full example (`one shot inference`) clearly gives the model more information and helps improve the overall completion
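One way to guard against over-long prompts is to count the tokens before calling the model. The sketch below is a minimal illustration, not part of the original notebook: `count_tokens` and `fits_context` are hypothetical helpers, and the whitespace split is only a stand-in so that the example runs without downloading a model.

```python
def count_tokens(text, tokenize):
    """Return how many tokens the given `tokenize` callable produces for `text`."""
    return len(tokenize(text))


def fits_context(text, tokenize, max_input_tokens=512):
    """Return True if the tokenised text fits within the input context length."""
    return count_tokens(text, tokenize) <= max_input_tokens


# Stand-in tokenizer (an assumption): one token per whitespace-separated word.
whitespace_tokenize = str.split

short_prompt = "Dialogue: hello there. What was going on?"
long_prompt = "word " * 600

print(fits_context(short_prompt, whitespace_tokenize))  # True
print(fits_context(long_prompt, whitespace_tokenize))   # False
```

With the notebook's tokenizer, the callable could be `lambda text: tokenizer(text)["input_ids"]`, so that `fits_context(few_shot_prompt, ...)` reports whether the few shot prompt stays within 512 tokens. Real token counts are usually higher than whitespace word counts.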

### Changing Generative configuration parameters

We can change some of the configuration parameters to influence how the model chooses the next token during generation. These include `do_sample`, `temperature`, `top_k` and `top_p`
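To give an intuition for what these parameters do, here is a simplified, pure-Python sketch of temperature scaling followed by top-k and top-p filtering on a toy set of scores. The function names are hypothetical and the logic is an approximation for illustration; the `transformers` implementation may differ in its details. With `do_sample=False`, generation instead picks the most likely token at each step (greedy decoding), and these sampling parameters have no effect.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest set reaching top_p mass."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = order[:top_k] if top_k > 0 else order
    mass = sum(probs[i] for i in kept)
    renormalised = {i: probs[i] / mass for i in kept}

    nucleus, cumulative = [], 0.0
    for i in kept:
        nucleus.append(i)
        cumulative += renormalised[i]
        if cumulative >= top_p:
            break
    total = sum(renormalised[i] for i in nucleus)
    return {i: renormalised[i] / total for i in nucleus}


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token ID from the filtered, renormalised distribution."""
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_k_top_p_filter(probs, top_k, top_p)
    ids = list(candidates)
    weights = [candidates[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a three-token vocabulary
print(softmax_with_temperature(toy_logits, temperature=1.0))
print(softmax_with_temperature(toy_logits, temperature=0.1))
print(sample_next_token(toy_logits, temperature=0.5, top_k=2, top_p=0.9))
```

In this sketch, a temperature close to zero pushes almost all probability onto the highest-scoring token, while a temperature of `1.0` leaves the distribution unchanged.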

In [None]:
generation_config = GenerationConfig(max_new_tokens=50)
# generation_config = GenerationConfig(max_new_tokens=10)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)

inputs = tokenizer(few_shot_prompt, return_tensors="pt")
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        generation_config=generation_config,
    )[0], 
    skip_special_tokens=True
)

print(dashed_line_ouptut_divider)
print(f"MODEL GENERATION - FEW SHOT:\n{output}")
print(dashed_line_ouptut_divider)
print(f"BASELINE HUMAN SUMMARY:\n{summary}\n")