In this exercise, you will perform prompt engineering on a dialogue summarization task using [Flan-T5](https://huggingface.co/google/flan-t5-large) and the [dialogsum dataset](https://huggingface.co/datasets/knkarthick/dialogsum). You will explore how different prompts affect the model's output and compare zero-shot and few-shot inference. <br/>
Complete the code in the cells below.

### 1. Set up Required Dependencies

In [None]:
!pip install datasets -q

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig
from datasets import load_dataset

### 2. Explore the Dataset

In [None]:
from datasets import load_dataset

dataset = load_dataset('knkarthick/dialogsum')


Print several dialogues with their baseline summaries.

In [None]:
example_indices = [0, 42, 800]
dash_line = '-' * 100

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

### 3. Summarize Dialogues without Prompt Engineering

Load the Flan-T5-large model and its tokenizer.

In [None]:
model_name = 'google/flan-t5-large'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


**Exercise**: Use the pre-trained model to summarize the example dialogues without any prompt engineering. Use the `model.generate()` function with `max_new_tokens=50`.

In [None]:
### WRITE YOUR CODE HERE

for index in example_indices:
    inputs = tokenizer(dataset['test'][index]['dialogue'], return_tensors="pt", padding=True, truncation=True, max_length=512)
    outputs = model.generate(inputs['input_ids'], max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

#Person1: Ms. Dawson, please take dictation for me.
#Person1#: Thank you, #Person2#.
#Person1#: That's great.


You can see that the generations are loosely related to the dialogues, but without an instruction the model does not know which task it is supposed to perform and often just invents the next line of the dialogue. Prompt engineering can help here.

### 4. Summarize Dialogues with Instruction Prompts

In order to instruct the model to perform a task (e.g., summarize a dialogue), you can take the dialogue and convert it into an instruction prompt. This is often called **zero-shot inference**.

**Exercise**: Wrap the dialogues in a descriptive instruction (e.g., "Summarize the following conversation."), and examine how the generated text changes.

In [None]:
### WRITE YOUR CODE HERE

instruction = "Summarize the following conversation:"
for index in example_indices:
    dialogue = dataset['test'][index]['dialogue']
    prompt = f"{instruction} {dialogue}"
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=512)
    outputs = model.generate(inputs['input_ids'], max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))


#Person1# wants Ms. Dawson to take dictation for him.
#Person1# is worried about his future. #Person2# gives him some advice.
Dad, you keep talking about your uncle Bill, his wife and two of their daughters in New Zealand.


This is much better! However, the model still does not pick up on the nuance of the conversations.

**Exercise:** Experiment with the prompt text and see how it influences the generated output. Do the inferences change if you end the prompt with just an empty string vs. `Summary: `?

In [None]:
### WRITE YOUR CODE HERE

instruction_prompts = [
    "Summarize the following conversation.",
    "Please provide a summary for this dialogue:",
    "What is the essence of this conversation?",
    "Summary: "  # Bare cue with no instruction; the loop below places it before the dialogue, not at the end
]

for prompt in instruction_prompts:
    print(f"Using prompt: {prompt}\n")
    for index in example_indices:
        dialogue = dataset['test'][index]['dialogue']
        full_prompt = f"{prompt} {dialogue}"
        inputs = tokenizer(full_prompt, return_tensors="pt", padding=True, truncation=True, max_length=512)
        outputs = model.generate(inputs['input_ids'], max_new_tokens=50)
        print(f"Example {index} Summary:")
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))
        print("-" * 100)
    print("\n")

Using prompt: Summarize the following conversation.

Example 0 Summary:
#Person1#: Ms. Dawson, please take dictation for me. #Person1#: All office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours
----------------------------------------------------------------------------------------------------
Example 42 Summary:
#Person1# is worried about his future. #Person2# is very kind.
----------------------------------------------------------------------------------------------------
Example 800 Summary:
Dad, you keep talking about your uncle Bill, his wife and two of their daughters in New Zealand.
----------------------------------------------------------------------------------------------------


Using prompt: Please provide a summary for this dialogue:

Example 0 Summary:
#Person1 wants to send an intra-office memo to all employees. It is about a new policy limiting the use of Instant Messaging.
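
The exercise above asks about the *end* of the prompt, while the loop places every string before the dialogue. A minimal sketch of one way to compare prompt endings is shown here. The helper `build_prompt` and the toy dialogue are hypothetical names introduced for illustration only; they are not part of the dataset or of any library.

```python
def build_prompt(instruction, dialogue, ending=""):
    """Put the instruction before the dialogue and an optional cue after it."""
    prompt = f"{instruction}\n\n{dialogue}"
    if ending:
        prompt += f"\n\n{ending}"
    return prompt


# Hypothetical two-turn dialogue, used only to show the prompt layout.
toy_dialogue = (
    "#Person1#: Are we still meeting at noon?\n"
    "#Person2#: Yes, see you in the lobby."
)

# Compare a prompt that ends with nothing against one that ends with "Summary: ".
for ending in ["", "Summary: "]:
    prompt = build_prompt("Summarize the following conversation.", toy_dialogue, ending)
    print(repr(prompt))
    print("-" * 100)
```

To see the effect on the model, each `prompt` can be tokenized and passed to `model.generate()` in the same way as in the cells above.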

**Exercise:** Flan-T5 has many prompt templates that are published for certain tasks [here](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py). Try using its pre-built prompts for dialogue summarization (e.g., the ones under the `"samsum"` key) and see how they influence the outputs.


In [15]:
### WRITE YOUR CODE HERE

flan_prompts = [
    "{dialogue}\n\nBriefly summarize that dialogue.",
    "Here is a dialogue:\n{dialogue}\n\nWrite a short summary!",
    "Dialogue:\n{dialogue}\n\nWhat is a summary of this dialogue?",
    "{dialogue}\n\nWhat was that dialogue about, in two sentences or less?"
]

for flan_prompt in flan_prompts:
    print(f"Using Flan-T5 prompt: {repr(flan_prompt)}\n")
    for index in example_indices:
        dialogue = dataset['test'][index]['dialogue']
        full_prompt = flan_prompt.format(dialogue=dialogue)
        print(full_prompt)
        inputs = tokenizer(full_prompt, return_tensors="pt", padding=True, truncation=True, max_length=512)
        outputs = model.generate(inputs['input_ids'], max_new_tokens=50)
        print(f"Example {index} Summary:")
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))
        print("-" * 100)
    print("\n")

Using Flan-T5 prompt: '{dialogue}\n\nBriefly summarize that dialogue.'

Example 0 Summary:
----------------------------------------------------------------------------------------------------
Example 42 Summary:
Person1 is worried about his future. He should get plenty of sleep, drink less wine and exercise.
----------------------------------------------------------------------------------------------------
Example 800 Summary:
Dad keeps talking about his uncle Bill, his wife and two of their daughters in New Zealand. Sarah and Jane are his cousins. They want to travel to Europe next year and will visit them at the same Ae.
----------------------------------------------------------------------------------------------------


Using Flan-T5 prompt: 'Here is a dialogue:\n{dialogue}\n\nWrite a short summary!'

Example 0 Summary:
Person1 wants to send an intra-office memo to all employees. It's about a new policy on communications. Employees who use Instant Messaging during working hours wi

Notice that the prompts from Flan-T5 did help, but the model still struggles to pick up on the nuance of the conversation in some cases. This is what you will try to solve with few-shot inferencing.

### 5. Summarize Dialogues with a Few-Shot Inference

**Few-shot inference** is the practice of placing several example prompt-response pairs that match your task in front of the prompt you actually want completed. This is called "in-context learning": the examples condition the model on your specific task without updating its weights.

**Exercise:** Build a function that takes a list of `in_context_example_indices`, generates a prompt with the examples, then at the end appends the prompt that you want the model to complete (`test_example_index`). Use the same prompt template as in Section 4. Make sure to separate the examples with `"\n\n\n"`.

In [50]:
### WRITE YOUR CODE HERE

def make_prompt(in_context_example_indices, test_example_index, instruction="Summarize the following conversation:"):
    examples = []
    for index in in_context_example_indices:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        examples.append(f"{instruction}\n{dialogue}\nSummary: {summary}")
    test_dialogue = dataset['test'][test_example_index]['dialogue']
    prompt = "\n\n\n".join(examples) + f"\n\n\n{instruction}\n{test_dialogue}\nSummary:"
    return prompt

In [51]:
in_context_example_indices = [0, 10, 20]
test_example_index = 800

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)
print(few_shot_prompt)

Summarize the following conversation:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to communicate with their clients.
#Person1#: They will just have to change their communication methods. I don't want any - one using Instant Messaging in this office. It wastes too much time! 

Now pass this prompt to the model to perform few-shot inference:

In [52]:
### WRITE YOUR CODE HERE

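# Caution: truncation=True silently drops every token past max_length, so a long
# few-shot prompt can lose the test dialogue that sits at its end.
inputs = tokenizer(few_shot_prompt, return_tensors="pt", padding=True, truncation=True, max_length=512)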
outputs = model.generate(inputs['input_ids'], max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Summary: It's Brian's birthday. He's going to have a dance with Person2.
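
The summary above mentions a birthday, while the test dialogue (index 800) is about relatives in New Zealand. One plausible explanation is that the few-shot prompt is longer than 512 tokens, so truncation removed the test dialogue. A minimal sketch of a length check follows; `count_prompt_tokens`, `fits_in_context`, and `whitespace_tokenizer` are hypothetical helpers, and the whitespace tokenizer is only a stand-in so that the sketch runs without downloading a model.

```python
def count_prompt_tokens(prompt, tokenizer):
    """Count the tokens produced for a prompt when no truncation is requested."""
    return len(tokenizer(prompt)["input_ids"])


def fits_in_context(prompt, tokenizer, max_tokens=512):
    """Return True if the whole prompt fits within max_tokens."""
    return count_prompt_tokens(prompt, tokenizer) <= max_tokens


def whitespace_tokenizer(text):
    # Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
    # Real subword tokenizers usually produce more tokens than this.
    return {"input_ids": text.split()}


short_prompt = "Summarize the following conversation:\n#Person1#: Hi.\nSummary:"
long_prompt = "word " * 600  # deliberately longer than 512 tokens

print(fits_in_context(short_prompt, whitespace_tokenizer))
print(fits_in_context(long_prompt, whitespace_tokenizer))
```

In the notebook, the same check can be run with the real `tokenizer` loaded earlier, for example `fits_in_context(few_shot_prompt, tokenizer)`, before calling `model.generate()`.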


**Exercise:** Experiment with the few-shot inferencing:
- Choose different dialogues - change the indices in the `in_context_example_indices` list and `test_example_index` value.
- Change the number of examples, but keep the full prompt within the model's 512-token input limit.

How well does few-shot inference work with other examples?

In [53]:
in_context_example_indices = [0, 10, 20, 364]
test_example_index = 800

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)

inputs = tokenizer(few_shot_prompt, return_tensors="pt", padding=True, truncation=True, max_length=512)
outputs = model.generate(inputs['input_ids'], max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Summary: It's Brian's birthday. He's going to have a dance with Person2.


In [58]:
### WRITE YOUR CODE HERE

in_context_example_indices = [4, 12, 69, 122, 242]
test_example_index = 800

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)

inputs = tokenizer(few_shot_prompt, return_tensors="pt", padding=True, truncation=True, max_length=512)
outputs = model.generate(inputs['input_ids'], max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Summary: The Olympic park is very big.


### 6. Generative Configuration Parameters for Inference

You can change the configuration parameters of the `generate()` method to get different outputs from the LLM. So far the only parameter you have set is `max_new_tokens=50`, which caps the number of generated tokens. A convenient way to organize these parameters is the `GenerationConfig` class. Setting `do_sample=True` makes the model sample the next token from its probability distribution over the entire vocabulary instead of always picking the most likely token. You can then reshape that distribution with `temperature` and restrict the candidate tokens with `top_k` and `top_p`. A full list of available parameters can be found in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig).
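
To make the effect of `temperature`, `top_k`, and `top_p` more concrete, here is a minimal pure-Python sketch on a toy four-token vocabulary. It illustrates the general idea only and is not the Hugging Face implementation, which may differ in details. The function names and the logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
base = softmax(toy_logits)
sharp = softmax(toy_logits, temperature=0.5)

print("temperature=1.0:", [round(x, 3) for x in base])
print("temperature=0.5:", [round(x, 3) for x in sharp])
print("top_k=2:        ", [round(x, 3) for x in top_k_filter(base, 2)])
print("top_p=0.8:      ", [round(x, 3) for x in top_p_filter(base, 0.8)])
```

Sampling then draws the next token from the filtered distribution, which is why sampled summaries can differ from run to run while greedy decoding always returns the same text.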

**Exercise:** Change the configuration parameters to investigate their influence on the output. Analyze your results.

In [78]:
### WRITE YOUR CODE HERE

# Define generation configurations
configurations = {
    "Greedy Decoding": GenerationConfig(max_new_tokens=50),  # Default settings for greedy decoding
    "Sampling with Temperature": GenerationConfig(do_sample=True, temperature=0.7, max_new_tokens=50),
    "Top-K Sampling": GenerationConfig(do_sample=True, top_k=50, max_new_tokens=50),
    "Top-P (Nucleus) Sampling": GenerationConfig(do_sample=True, top_p=0.92, max_new_tokens=50),
    "Beam Search": GenerationConfig(num_beams=5, max_new_tokens=50, no_repeat_ngram_size=2)
}

for index in example_indices:
    print(f"Example {index}:")
    print('-'*100)
    dialogue = dataset['test'][index]['dialogue']
    prompt = f"Summarize the following conversation: {dialogue}"
    print(prompt)

    # Tokenize the input prompt
    input_ids = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).input_ids

    # Generate summaries under different configurations
    for description, config in configurations.items():
        print(f"\n{description}:")
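        # Pass the config object directly. Unpacking config.to_dict() also passes the
        # default max_length=20, which triggers a max_new_tokens/max_length warning.
        outputs = model.generate(input_ids, generation_config=config)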
        summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(summary)
    print("-" * 80)

Example 0:
----------------------------------------------------------------------------------------------------
Summarize the following conversation: #Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to communicate with their clients.
#Person1#: They will just have to change their

#Person1# wants Ms. Dawson to take dictation for him.

Sampling with Temperature:


#Person1: I need you to take a dictation for me. I need you to type up and distribute an intra-office memo to all employees by this afternoon.

Top-K Sampling:


#Person1 is giving dictation to Ms. Dawson about a new rule concerning the use of Instant Messaging.

Top-P (Nucleus) Sampling:


#Person1# needs Ms. Dawson to type up a memo for him.

Beam Search:


#Person1 wants Ms. Dawson to take dictation for him.
--------------------------------------------------------------------------------
Example 42:
----------------------------------------------------------------------------------------------------
Summarize the following conversation: #Person1#: I don't know how to adjust my life. Would you give me a piece of advice?
#Person2#: You look a bit pale, don't you?
#Person1#: Yes, I can't sleep well every night.
#Person2#: You should get plenty of sleep.
#Person1#: I drink a lot of wine.
#Person2#: If I were you, I wouldn't drink too much.
#Person1#: I often feel so tired.
#Person2#: You better do some exercise every morning.
#Person1#: I sometimes find the shadow of death in front of me.
#Person2#: Why do you worry about your future? You're very young, and you'll make great contribution to the world. I hope you take my advice.

Greedy Decoding:


#Person1# is worried about his future. #Person2# gives him some advice.

Sampling with Temperature:


#Person1: I'm worried about my future.

Top-K Sampling:


#Person1 has been worried about her future. #Person2 encourages her and encourages her to change her life.

Top-P (Nucleus) Sampling:


#Person1# is worried about her future and wants her life to be easier. #Person2# gives her some advice.

Beam Search:


#Person1# is worried about his future.
--------------------------------------------------------------------------------
Example 800:
----------------------------------------------------------------------------------------------------
Summarize the following conversation: #Person1#: Dad, you keep talking about family in New Zealand. Who are they?
#Person2#: Well, that's your uncle Bill, his wife and two of their daughters.
#Person1#: Is uncle Bill your brother?
#Person2#: No, your uncle Jack is my brother, Bill is my brother-in-law, your mom's brother.
#Person1#: So his two daughters are my cousins?
#Person2#: That's right, Sarah and Jane are both your cousins although they are step-sisters.
#Person1#: What are step-sisters?
#Person2#: Sarah is your uncle Bill's older daughter. When she was young, Bill's first wife, Sarah's mom died. Three years later Bill married again.
#Person1#: So uncle Bill's wife is Jane's mother but not Sarah's mother. Right?
#Person2#: Yes. She is Sarah's step-m

Dad, you keep talking about your uncle Bill, his wife and two of their daughters in New Zealand.

Sampling with Temperature:


Dad keeps telling his daughter about family in New Zealand.

Top-K Sampling:


Dad, you're always talking about your family in New Zealand. What's your uncle Bill's wife's name?

Top-P (Nucleus) Sampling:


Mother and father talk about their family in New Zealand.

Beam Search:


#Person1#: My dad keeps talking about his family in New Zealand. They are his uncle Bill, his wife and two of their daughters.
--------------------------------------------------------------------------------
