### Prompt Engineering: Summarize CNN news articles

This notebook generates article summaries with Google's pre-trained LLM [FLAN-T5](https://huggingface.co/google/flan-t5-base) from HuggingFace. The articles come from the [CNN/DailyMail dataset](https://huggingface.co/datasets/cnn_dailymail), which contains roughly 300k news articles from CNN and the Daily Mail. Each article comes with human-written highlights that serve as the reference summary.

Key points:
- The input text affects the output of the model
- **Prompt engineering** can direct it towards the task you need
- Zero-shot, one-shot, and few-shot inference are different ways to steer the output of an LLM.

In [44]:
# libraries
import numpy as np
from datasets import load_dataset                # huggingface datasets
from transformers import AutoModelForSeq2SeqLM   # generic model class
from transformers import AutoTokenizer           # generic tokenizer class
from transformers import GenerationConfig        # container for generation settings

#### Dataset

In [10]:
dataset = load_dataset("cnn_dailymail", "3.0.0") # load dataset

Downloading and preparing dataset cnn_dailymail/3.0.0 to C:/Users/A0860164/.cache/huggingface/datasets/cnn_dailymail/3.0.0/3.0.0/1b3c71476f6d152c31c1730e83ccb08bcf23e348233f4fcc11e182248e6bf7de...


Generating train split:   0%|          | 0/287113 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/13368 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/11490 [00:00<?, ? examples/s]

Dataset cnn_dailymail downloaded and prepared to C:/Users/A0860164/.cache/huggingface/datasets/cnn_dailymail/3.0.0/3.0.0/1b3c71476f6d152c31c1730e83ccb08bcf23e348233f4fcc11e182248e6bf7de. Subsequent calls will reuse this data.



In [92]:
# some examples
np.random.seed(30)
example_index = np.random.randint(0,dataset['test'].shape[0])
print('-'.join('' for x in range(100)))
print(f"Article n. {example_index}")
print('-'.join('' for x in range(100)))
print('INPUT ARTICLE:')
print(dataset['test'][example_index]['article'])
print('-'.join('' for x in range(100)))
print()
print('BASELINE HUMAN SUMMARY:')
print(dataset['test'][example_index]['highlights'])
print('-'.join('' for x in range(100)))
print()

---------------------------------------------------------------------------------------------------
Article n. 5925
---------------------------------------------------------------------------------------------------
INPUT ARTICLE:
You’ve found the house of your dreams – but you want to make sure the costly nightmare of dry rot isn’t lurking under the carpets and skirting boards. The solution? Call in the dogs! The appropriately named Mark Doggett has trained his two animals to sniff out the destructive fungus in old houses where it can hide in places a person would miss. Mr Doggett gave up a ten-year career in construction after hitting on the idea to set up a business using the animals’ sense of smell, which is said to be up to a million times better than that of humans. Skilled: Meg and Jess, pictured with Mark Doggett, were trained for six months to sniff out dry rot . On the case: Four-year-old Border collie Meg gets down to work sniffing out the destructive fungus . When they find

#### Model

Load the [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) and tokenizer.

In [93]:
model_name = 'google/flan-t5-base'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)            # instantiate the model
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True) # instantiate the tokenizer

In [94]:
# tokenizer encoding/decoding example
sentence = "Breaking News: Christmas holidays are over! @^^^**#$%"
sentence_encoded = tokenizer(sentence, return_tensors='pt')                       # return tokenization as pytorch tensor
sentence_decoded = tokenizer.decode(sentence_encoded["input_ids"][0], skip_special_tokens=True) # skipping special tokens

print('INPUT SENTENCE:')
print(f"{sentence}")
print('\nENCODED SENTENCE:')
print(f"Input IDs: {sentence_encoded['input_ids']}")
print(f"Attention Mask: {sentence_encoded['attention_mask']}")
print('\nDECODED SENTENCE:')
print(sentence_decoded)

INPUT SENTENCE:
Breaking News: Christmas holidays are over! @^^^**#$%

ENCODED SENTENCE:
Input IDs: tensor([[11429,    53,  3529,    10,  1619,  6799,    33,   147,    55,  3320,
             2, 19844,  4663,  3229,  1454,     1]])
Attention Mask: tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

DECODED SENTENCE:
Breaking News: Christmas holidays are over! @**#$%
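
Note that the decoded sentence has lost the `^^^` characters. A likely explanation is that they are not in the tokenizer's vocabulary, so they are encoded as the unknown token (ID 2 in the encoded output above), which `skip_special_tokens=True` then drops. The sketch below reproduces this effect with a toy character-level vocabulary. `TOY_VOCAB`, `encode`, and `decode` are hypothetical names for illustration and are not part of the `transformers` API.

```python
# Toy character-level tokenizer (hypothetical, for illustration only).
UNK_ID = 0   # stands in for the unknown-token ID
EOS_ID = 1   # stands in for the end-of-sequence ID
SPECIAL_IDS = {UNK_ID, EOS_ID}

# Vocabulary that deliberately lacks the '^' character.
TOY_VOCAB = {ch: i + 2 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz @*#$%!")}
ID_TO_CHAR = {i: ch for ch, i in TOY_VOCAB.items()}


def encode(text):
    """Map each character to its ID, using UNK_ID for unseen characters."""
    return [TOY_VOCAB.get(ch, UNK_ID) for ch in text] + [EOS_ID]


def decode(ids, skip_special_tokens=True):
    """Map IDs back to characters, optionally dropping special tokens."""
    chars = []
    for i in ids:
        if i in SPECIAL_IDS:
            if not skip_special_tokens:
                chars.append("<unk>" if i == UNK_ID else "</s>")
            continue
        chars.append(ID_TO_CHAR[i])
    return "".join(chars)


ids = encode("over! @^^^**#$%")
print(decode(ids))                             # unknown characters vanish
print(decode(ids, skip_special_tokens=False))  # unknown characters stay visible
```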


#### Summarizing an Article without Prompt Engineering

Let's cap the generated summary at the length (in tokens) of the human-written one. **Note** that this is an arbitrary choice; any limit can be chosen.
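
Keep in mind that `max_new_tokens` is an upper bound rather than a target: generation can stop earlier if the model emits its end-of-sequence token. Here is a minimal sketch of that stopping rule, using a hypothetical scripted `next_token` function in place of a real model:

```python
EOS = "</s>"  # placeholder end-of-sequence marker (assumption)


def generate(next_token, max_new_tokens):
    """Generation loop: stop at EOS or after max_new_tokens tokens."""
    generated = []
    for _ in range(max_new_tokens):
        token = next_token(generated)
        if token == EOS:
            break
        generated.append(token)
    return generated


def scripted_model(script):
    """Return a hypothetical next_token function that replays a fixed script."""
    def next_token(generated):
        return script[len(generated)] if len(generated) < len(script) else EOS
    return next_token


model_fn = scripted_model(["Dogs", "sniff", "out", "dry", "rot", EOS])

print(generate(model_fn, max_new_tokens=3))   # cut off by the cap
print(generate(model_fn, max_new_tokens=50))  # stops early at EOS
```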

In [None]:
article = dataset['test'][example_index]['article']
summary = dataset['test'][example_index]['highlights']
max_new_tokens = len(tokenizer(summary)['input_ids'])

In [96]:
inputs = tokenizer(article, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)[0], skip_special_tokens=True)

print('-'.join('' for x in range(100)))
print(f"Article n. {example_index}")
print('-'.join('' for x in range(100)))
print('INPUT ARTICLE:')
print(dataset['test'][example_index]['article'])
print('-'.join('' for x in range(100)))
print()
print(f'BASELINE HUMAN SUMMARY (n. of tokens = {max_new_tokens}):')
print(dataset['test'][example_index]['highlights'])
print('-'.join('' for x in range(100)))
print()
print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:')
print(output)
print()

---------------------------------------------------------------------------------------------------
Article n. 5925
---------------------------------------------------------------------------------------------------
INPUT ARTICLE:
You’ve found the house of your dreams – but you want to make sure the costly nightmare of dry rot isn’t lurking under the carpets and skirting boards. The solution? Call in the dogs! The appropriately named Mark Doggett has trained his two animals to sniff out the destructive fungus in old houses where it can hide in places a person would miss. Mr Doggett gave up a ten-year career in construction after hitting on the idea to set up a business using the animals’ sense of smell, which is said to be up to a million times better than that of humans. Skilled: Meg and Jess, pictured with Mark Doggett, were trained for six months to sniff out dry rot . On the case: Four-year-old Border collie Meg gets down to work sniffing out the destructive fungus . When they find

As we can see, the generated text is linguistically correct. However, it doesn't accomplish the task we wanted: summarizing the article. It just pulls out some sentences inspired by the input text.

#### Summarizing an Article providing an Instruction Prompt

##### Zero-Shot Inference

This means wrapping the article in a prompt that clearly states the task to accomplish.

In [98]:
prompt = f"""
    Summarize the following article:
        {article}
    Summary:
"""

inputs = tokenizer(prompt, return_tensors='pt') # Input the constructed prompt instead of the article
output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)[0], skip_special_tokens=True)

print('-'.join('' for x in range(100)))
print(f"Article n. {example_index}")
print('-'.join('' for x in range(100)))
print('INPUT PROMPT:')
print(dataset['test'][example_index]['article'])  # only the article is printed; the full text sent to the model is in `prompt`
print('-'.join('' for x in range(100)))
print()
print(f'BASELINE HUMAN SUMMARY (n. of tokens = {max_new_tokens}):')
print(dataset['test'][example_index]['highlights'])
print('-'.join('' for x in range(100)))
print()
print(f'MODEL GENERATION - ZERO SHOT:')
print(output)
print()

---------------------------------------------------------------------------------------------------
Article n. 5925
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
You’ve found the house of your dreams – but you want to make sure the costly nightmare of dry rot isn’t lurking under the carpets and skirting boards. The solution? Call in the dogs! The appropriately named Mark Doggett has trained his two animals to sniff out the destructive fungus in old houses where it can hide in places a person would miss. Mr Doggett gave up a ten-year career in construction after hitting on the idea to set up a business using the animals’ sense of smell, which is said to be up to a million times better than that of humans. Skilled: Meg and Jess, pictured with Mark Doggett, were trained for six months to sniff out dry rot . On the case: Four-year-old Border collie Meg gets down to work sniffing out the destructive fungus . When they find 

We can see some improvements: the model understands which task to perform. However, it still doesn't perform it well. **Note** that changing the prompt structure `Summarize the following article: {article} Summary:` leads to different results.
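
Since the wording of the prompt matters, it can help to keep several candidate templates side by side and try each one. A small sketch follows; the template names and wordings are illustrative assumptions and are not taken from any official template collection.

```python
# Hypothetical prompt templates; each must contain an {article} placeholder.
TEMPLATES = {
    "instruction": "Summarize the following article:\n{article}\nSummary:",
    "question": "Article:\n{article}\nWhat was going on?",
    "tldr": "{article}\nTL;DR:",
}


def build_prompts(article, templates=TEMPLATES):
    """Fill every template with the same article text."""
    return {name: template.format(article=article) for name, template in templates.items()}


prompts = build_prompts("Two dogs were trained to sniff out dry rot.")
for name, prompt in prompts.items():
    print(f"--- {name} ---")
    print(prompt)
```

Each resulting string could then be tokenized and passed to `model.generate()` exactly as in the cell above, which makes it easy to compare the outputs.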

##### Zero-Shot Inference with a Prompt Template from FLAN-T5

Predefined prompt templates are available for a wide range of models. The ones available for the model used here, FLAN-T5, can be found in the [FLAN repository on GitHub](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py).

In [113]:
prompt = f"""
    Article:
        {article}
    What was going on?
"""

inputs = tokenizer(prompt, return_tensors='pt') # Input the constructed prompt instead of the article
output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)[0], skip_special_tokens=True)

print('-'.join('' for x in range(100)))
print(f"Article n. {example_index}")
print('-'.join('' for x in range(100)))
print('INPUT PROMPT:')
print(dataset['test'][example_index]['article'])  # only the article is printed; the full text sent to the model is in `prompt`
print('-'.join('' for x in range(100)))
print()
print(f'BASELINE HUMAN SUMMARY (n. of tokens = {max_new_tokens}):')
print(dataset['test'][example_index]['highlights'])
print('-'.join('' for x in range(100)))
print()
print(f'MODEL GENERATION - ZERO SHOT:')
print(output)
print()

---------------------------------------------------------------------------------------------------
Article n. 5925
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
You’ve found the house of your dreams – but you want to make sure the costly nightmare of dry rot isn’t lurking under the carpets and skirting boards. The solution? Call in the dogs! The appropriately named Mark Doggett has trained his two animals to sniff out the destructive fungus in old houses where it can hide in places a person would miss. Mr Doggett gave up a ten-year career in construction after hitting on the idea to set up a business using the animals’ sense of smell, which is said to be up to a million times better than that of humans. Skilled: Meg and Jess, pictured with Mark Doggett, were trained for six months to sniff out dry rot . On the case: Four-year-old Border collie Meg gets down to work sniffing out the destructive fungus . When they find 

There is no noticeable difference. Let's now try **one-shot and few-shot inference**. These techniques include one or a few prompt-response pairs for the task in the prompt itself, literally "showing" the LLM what to do (*in-context learning*). It's useful to define a function that sets up one or more examples in the prompt.

In [119]:
# function that creates prompts for one- or few-shot inference
def make_prompt(full_example_indices, example_to_summarize_index):
    prompt = ''
    for index in full_example_indices:
        article = dataset['test'][index]['article']
        summary = dataset['test'][index]['highlights']
        
        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Article:

{article}

What was going on?
{summary}


"""
    
    article = dataset['test'][example_to_summarize_index]['article']

    prompt += f"""
Article:

{article}

What was going on?
"""
        
    return prompt
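
To see the resulting prompt layout without loading the dataset, the same idea can be sketched on a tiny in-memory list. `toy_data` and `make_toy_prompt` are hypothetical stand-ins for `dataset['test']` and `make_prompt`, and the texts are made up for illustration.

```python
# Hypothetical stand-in for dataset['test']: a list of article/summary pairs.
toy_data = [
    {"article": "Antarctica recorded 63.5F.", "highlights": "Record heat in Antarctica."},
    {"article": "A city opened a new bridge.", "highlights": "New bridge opens."},
    {"article": "Dogs were trained to find dry rot.", "highlights": "Dogs detect dry rot."},
]


def make_toy_prompt(data, shot_indices, target_index):
    """Concatenate solved examples, then the unsolved target article."""
    parts = []
    for i in shot_indices:
        parts.append(
            f"Article:\n\n{data[i]['article']}\n\nWhat was going on?\n{data[i]['highlights']}\n\n\n"
        )
    parts.append(f"Article:\n\n{data[target_index]['article']}\n\nWhat was going on?\n")
    return "".join(parts)


zero_shot = make_toy_prompt(toy_data, [], 2)
one_shot = make_toy_prompt(toy_data, [0], 2)
few_shot = make_toy_prompt(toy_data, [0, 1], 2)

# The number of shots is the number of solved examples before the target.
for name, p in [("zero", zero_shot), ("one", one_shot), ("few", few_shot)]:
    print(name, "->", p.count("What was going on?") - 1, "solved example(s)")
print(few_shot)
```

The target article's own summary never appears in the prompt; only the examples listed in `shot_indices` are shown with their answers.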

##### One-Shot Inference

Construct the prompt to perform one-shot inference:

In [121]:
full_example_indices = [40]             # one random example with human summary
example_to_summarize_index = 5925       # the article used as reference in this notebook

one_shot_prompt = make_prompt(full_example_indices, example_to_summarize_index)
print(one_shot_prompt)


Article:

(CNN)A high temperature of 63.5 degrees Fahrenheit might sound like a pleasant day in early spring -- unless you're in Antarctica. The chilly continent recorded the temperature (15.5 degrees Celsius) on March 24, possibly the highest ever recorded on Antarctica, according to the Weather Underground. The temperature was recorded at Argentina's Esperanza Base on the northern tip of the Antarctica Peninsula, according to CNN affiliate WTNH. (Note to map lovers: The Argentine base is not geographically part of the South American continent.) The World Meteorological Organization, a specialized United Nations agency, is in the process of setting up an international ad-hoc committee of about 10 blue-ribbon climatologists and meteorologists to begin collecting relevant evidence, said Randy Cerveny, the agency's lead rapporteur of weather and climate extremes and Arizona State University professor of geographical sciences. The committee will examine the equipment used to measure the 

Now we pass this prompt to perform the one-shot inference:

In [126]:
summary = dataset['test'][example_to_summarize_index]['highlights']

inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)[0], skip_special_tokens=True)

print('-'.join('' for x in range(100)))
print(f'BASELINE HUMAN SUMMARY:')
print(f'{summary}')
print()
print('-'.join('' for x in range(100)))
print(f'MODEL GENERATION - ONE SHOT:')
print(output)
print()

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Mark Doggett has trained his two dogs to sniff out the destructive fungus .
His Border collie and English springer spaniel had six months training .
When they find dry rot they stop, stare at it and point with their nose .

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
Mark Doggett has trained his dogs to sniff out dry rot. The businessman, 30, from Wolverhampton, has been successful. He plans to train his dogs to hunt bed bugs for hospitals and hotels.



##### Few-Shot Inference

In [137]:
full_example_indices = [40, 123, 6001]  # random examples with human summary
example_to_summarize_index = 5925       # the article used as reference in this notebook

few_shots_prompt = make_prompt(full_example_indices, example_to_summarize_index)
print(few_shots_prompt)


Article:

(CNN)A high temperature of 63.5 degrees Fahrenheit might sound like a pleasant day in early spring -- unless you're in Antarctica. The chilly continent recorded the temperature (15.5 degrees Celsius) on March 24, possibly the highest ever recorded on Antarctica, according to the Weather Underground. The temperature was recorded at Argentina's Esperanza Base on the northern tip of the Antarctica Peninsula, according to CNN affiliate WTNH. (Note to map lovers: The Argentine base is not geographically part of the South American continent.) The World Meteorological Organization, a specialized United Nations agency, is in the process of setting up an international ad-hoc committee of about 10 blue-ribbon climatologists and meteorologists to begin collecting relevant evidence, said Randy Cerveny, the agency's lead rapporteur of weather and climate extremes and Arizona State University professor of geographical sciences. The committee will examine the equipment used to measure the 

Now we pass this prompt to perform the few-shot inference:

In [138]:
summary = dataset['test'][example_to_summarize_index]['highlights']

inputs = tokenizer(few_shots_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)[0], skip_special_tokens=True)

print('-'.join('' for x in range(100)))
print(f'BASELINE HUMAN SUMMARY:')
print(f'{summary}')
print()
print('-'.join('' for x in range(100)))
print(f'MODEL GENERATION - FEW SHOTS:')
print(output)
print()

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Mark Doggett has trained his two dogs to sniff out the destructive fungus .
His Border collie and English springer spaniel had six months training .
When they find dry rot they stop, stare at it and point with their nose .

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOTS:
Mark Doggett has trained his dogs to sniff out dry rot. The businessman, 30, from Wolverhampton, has been successful. He plans to train his dogs to hunt out bed bugs for hotels and hospitals.



As we can see, increasing the number of shots doesn't *always* improve the results; in this case, it made almost no difference. In general, more than about five examples rarely helps, and neither does exceeding the model's maximum input-context length (here 512 tokens), since anything beyond that limit is dropped when the input is truncated. It's important to realize that prompt engineering is powerful but doesn't always improve the results.
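
One practical consequence is that the number of shots is bounded by the context budget. The sketch below keeps examples, in order, only while the whole prompt stays under a token limit. It counts whitespace-separated words as a rough proxy for tokens, which is an assumption: with the real model one would count `len(tokenizer(text)['input_ids'])` instead. `count_tokens` and `fit_examples` are hypothetical helpers.

```python
def count_tokens(text):
    """Crude token count: whitespace-separated words (assumption, not T5's tokenizer)."""
    return len(text.split())


def fit_examples(examples, query, max_tokens=512):
    """Keep examples in order while examples + query fit within max_tokens."""
    budget = max_tokens - count_tokens(query)
    kept = []
    for example in examples:
        cost = count_tokens(example)
        if cost > budget:
            break  # the next example would overflow the context window
        kept.append(example)
        budget -= cost
    return kept


examples = ["word " * 200, "word " * 200, "word " * 200]  # three 200-"token" shots
query = "word " * 100                                     # a 100-"token" query

kept = fit_examples(examples, query, max_tokens=512)
print(f"{len(kept)} of {len(examples)} examples fit")
```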

#### Configuration Parameters for the `generate()` method

So far we've used the default settings for `model.generate()` without changing anything but the `max_new_tokens`, i.e., the maximum number of tokens to generate. By looking at the documentation (available on the HuggingFace website), one can see that other parameters can be changed. The best way is to set the configuration via the `GenerationConfig` class.

In [140]:
model.generate?

In [143]:
GenerationConfig?

Running the code line above, we can see that, among the other parameters, one can tweak the following ones:
```
    do_sample (`bool`, *optional*, defaults to `False`):
        Whether or not to use sampling ; use greedy decoding otherwise.
    temperature (`float`, *optional*, defaults to 1.0):
        The value used to modulate the next token probabilities.
    top_k (`int`, *optional*, defaults to 50):
        The number of highest probability vocabulary tokens to keep for top-k-filtering.
    top_p (`float`, *optional*, defaults to 1.0):
        If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to
        `top_p` or higher are kept for generation.
```
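
To build intuition for these parameters, here is a simplified, standard-library sketch of temperature scaling followed by top-k and top-p filtering on toy logits. It illustrates the general idea only; it is not the `transformers` implementation, and all function names are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=50, top_p=1.0):
    """Keep the top_k most probable tokens, then the smallest prefix whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)  # renormalize over the top-k survivors
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / k_total
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalized distribution


def sample(logits, temperature=1.0, top_k=50, top_p=1.0, rng=random):
    """Sample one token index from the filtered distribution."""
    dist = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


logits = [4.0, 3.0, 2.0, 1.0]  # toy scores for a 4-token vocabulary

# Low temperature sharpens the distribution; high temperature flattens it.
for t in [0.1, 1.0, 10.0]:
    print(t, [round(p, 3) for p in softmax(logits, t)])
```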
Let's see what happens in a few cases:

In [148]:
print('-'.join('' for x in range(100)))
for my_max_new_tokens in [10, 20, 50,]:
    for my_do_sample in [True]:
        for my_temperature in [0.1, 1., 10., 50.]:
            generation_config = GenerationConfig(
                max_new_tokens=my_max_new_tokens, 
                do_sample=my_do_sample, 
                temperature=my_temperature
            )
            inputs = tokenizer(few_shots_prompt, return_tensors='pt')
            output = tokenizer.decode(
                model.generate(inputs["input_ids"], generation_config=generation_config)[0], 
                skip_special_tokens=True
            )
            print(f'MODEL GENERATION - FEW SHOT:\n{output}')
            print('-'.join('' for x in range(100)))
print('BASELINE HUMAN SUMMARY:')  # fixed: the closing quote was missing
print(summary)

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
Mark Doggett has trained his dogs to sniff
---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
Mark Doggett has trained his two dogs to
---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
There could really cost lifes with meggines
---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
Small house on North Yorkshire'll see millions under
---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
Mark Doggett has trained his dogs to sniff out dry rot. He has set up 
---------------------------------------------------------------------------------------------------
MOD

### Acknowledgements

Thanks to DeepLearning.AI for the courses that inspired this notebook.