# Generative AI Use Case: Summarize Dialogue

Welcome to the practical side of this course. in this lab you will do the dialogue summarization task using generative Al.You will explore how the input text affects the output of the model, and perform prompt engineering to direct ittowards the task you need. By comparing zero shot, one shot, and few shot inferences, you will take the first step towards prompt engineering and see how it can enhance the generative output of LLMs. 

In [1]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

## Summarize without Prompt Engineering

In this use case,you wil be generating a summary of a dialogue with the pre-trained Large Language Model (LLM) FLAN-Ts from Hugging Face. The list of avallable
models in the Hugging Face transformers package can be found here.
Let's upload some simple dialogues from the Dialogsum Hugging Face dataset. This dataset contains 10,000+ dialogues with the corresponding manually labeled
summaries and topics.

In [3]:
huggingface_dataset_name = 'knkarthick/dialogsum'
dataset = load_dataset(huggingface_dataset_name)

README.md:   0%|          | 0.00/4.65k [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


train.csv:   0%|          | 0.00/11.3M [00:00<?, ?B/s]

validation.csv:   0%|          | 0.00/442k [00:00<?, ?B/s]

test.csv:   0%|          | 0.00/1.35M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/12460 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/500 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1500 [00:00<?, ? examples/s]

In [6]:
example_indices = [40, 200]

dash_line = '_'.join('' for x in range(100))

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

___________________________________________________________________________________________________
Example 1
___________________________________________________________________________________________________
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
___________________________________________________________________________________________________
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
___________________________________________________________________________________________________

___________________________________________________________________________________________________
Exam

#### Load pre-trained model

In [7]:
model_name = 'google/flan-t5-base'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [8]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

In [9]:
sentence = 'What time is it, Tom?'

sentence_encoded = tokenizer(sentence, return_tensors='pt')

sentence_decoded = tokenizer.decode(
    sentence_encoded['input_ids'][0],
    skip_special_tokens=True
)

In [10]:
print('ENCODED SENTENCE:')
print(sentence_encoded['input_ids'][0])
print('\nDECODED_SENTENCE:')
print(sentence_decoded)

ENCODED SENTENCE:
tensor([ 363,   97,   19,   34,    6, 3059,   58,    1])

DECODED_SENTENCE:
What time is it, Tom?


## Prompt Engineering

Now it's time to explore how well the base LLM summarizesan act of a human changing theprompt (input) to improve the response for a given task.

In [13]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
    
    inputs = tokenizer(dialogue, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING: \n{output}\n')

___________________________________________________________________________________________________
Example 1
___________________________________________________________________________________________________
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
___________________________________________________________________________________________________
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
___________________________________________________________________________________________________
MODEL GENERATION - WITHOUT PROMPT ENGINEERING: 
Person1: It's ten to nine.

_______________________________

## 3. Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation. You can check out this blog from Amazon Science for a quick introduction to prompt engineering.

#### 3.1 Zero Shot Inference with an Instruction Prompt

In order to instruct the model to perform a task - summarize a dialogue - you can take the dialog and convert it into an instruction prompt. This is often called zero shot inference. 

Wrap the dialogue in a descriptive instruction and see how the generated text will change:

In [15]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
        Summarize the following conversation. 
        {dialogue}
        summary: 
    """
    
    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT: \n{output}\n')

___________________________________________________________________________________________________
Example 1
___________________________________________________________________________________________________
INPUT PROMPT:

        Summarize the following conversation. 
        #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
        summary: 
    
___________________________________________________________________________________________________
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
___________________________________________________________________________________________________
MODEL GENERATION - ZERO SHOT

In [16]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
        Dialogue: 
        {dialogue}
        What was going on?: 
    """
    
    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT: \n{output}\n')

___________________________________________________________________________________________________
Example 1
___________________________________________________________________________________________________
INPUT PROMPT:

        Dialogue: 
        #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
        What was going on?: 
    
___________________________________________________________________________________________________
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
___________________________________________________________________________________________________
MODEL GENERATION - ZERO SHOT: 
Tom is late fo

## 4. Summarize Dialogue with One Shot and Few Shot Inference 

One shot and few shot inference are the practices of providing an LLM with either one or more full examples of prompt-response pairs that match your task before your actual prompt that you want completed. This is caled "in-context earing" and puts your model into a state that understands your specific task. 

#### 4.1 One Shot Inference

Let's build a function that takes a list of example indices full, generates a prompt with full examples, then at the end appends the prompt which you want the model to complete ( example index to summarize ). You will use the same FAN-T5 prompt template from section 3.2.

In [38]:
def make_prompt(example_indices_full, example_index_to_summarize): 
    
    prompt = ''
    
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
    
        prompt += f"""
Dialogue: 

{dialogue}

What was going on?: 
{summary}
"""

        dialogue = dataset['test'][example_index_to_summarize]['dialogue']

        prompt += f"""
Dialogue: 

{dialogue}
            
What was going on? 
        """
    return prompt

In [39]:
example_indices_full = [40]
example_index_to_summarize = 200
one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)
print(one_shot_prompt)


Dialogue: 

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?: 
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

Dialogue: 

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also 

In [40]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

___________________________________________________________________________________________________
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

___________________________________________________________________________________________________
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


#### 4.2 Few Shot Inference

In [41]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

In [42]:
few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)
print(few_shot_prompt)


Dialogue: 

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?: 
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

Dialogue: 

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also 

In [43]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

___________________________________________________________________________________________________
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

___________________________________________________________________________________________________
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


## 5. Generative Configuration Parameters for Inference

You can change the confiquration parameters ofthe generate() method to sce a diferent output from the llM. o far the only parameter that you have beensetting was max_new_tokens=58 , which defines the maximum number of tokens to generate. A full list ofavailable parameters can be found in the Hugging FaceGeneration documentation.

A convenient way of organizing the confgurationparameters is to useGenerationconfig class

Exercise:

Change the configuration parameters to investigate their influence on the output.

Puting the parameter do sample = True ,you activate various decoding strategies which inhuence the next oken from the probabiity distrbution over theentire vocabulary. You can then adjust the outputs changing temperature and other parameters (such as top_k and top_p ).

Uncomment the lines in the cell below and rerun the code, Try to analyze the results.

In [48]:
generation_config = GenerationConfig(max_new_token=50)
generation_config = GenerationConfig(max_new_token=10)
generation_config = GenerationConfig(max_new_token=50, do_sample=True, temperature=0.1)
# generation_config = GenerationConfig(max_new_token=50, do_sample=True, temperature=0.5)
# generation_config = GenerationConfig(max_new_token=50, do_sample=True, temperature=1.0)

In [49]:
inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - Few SHOT:\n{output}')

___________________________________________________________________________________________________
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

___________________________________________________________________________________________________
MODEL GENERATION - Few SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.
