# Dialogue Summarization with FLAN-T5

This notebook presents a generative AI use case: dialogue summarization. We explore how the pre-trained FLAN-T5 language model from Hugging Face can be used to generate summaries, and how different prompt engineering techniques affect the quality of the results.

The objective is to demonstrate how well Transformer models can understand and condense text, and how well-formulated instructions can significantly improve model output without any additional training.

In [35]:
# Import the necessary libraries from Hugging Face transformers to work with T5 models.
from transformers import AutoModelForSeq2SeqLM, T5ForConditionalGeneration
from transformers import AutoTokenizer, T5Tokenizer
from transformers import GenerationConfig

## 1 - Summarizing the dialogue without prompt engineering

In this section, we explore the baseline performance of the pre-trained FLAN-T5 LLM from Hugging Face in generating summaries of dialogues without the use of specific prompt engineering techniques. The goal is to see how the model performs when simply presented with the dialogue text.

The list of available models in the Hugging Face `transformers` library can be found [here](https://huggingface.co/docs/transformers/index).

Below are some example dialogues along with their corresponding human-created summaries. These examples will be used to evaluate the model's performance in the following sections.

In [36]:
ejemplo_dialogo_1 = """
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
"""
ejemplo_resumen_1 = """
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
"""
ejemplo_dialogo_2 = """
#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. May, can you help me take all these things to the living room?
#Person2#: Yes, madam.
#Person1#: Ask Daniel to give you a hand?
#Person2#: No, mom, I can manage it by myself. His help just causes more trouble.
"""
ejemplo_resumen_2 = """
Mom asks May to help to prepare for the picnic and May agrees.
"""
ejemplo_dialogo_3 = """
#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also need a more powerful hard disc, more memory and a faster modem. Do you have a CD-ROM drive?
#Person2#: No.
#Person1#: Then you might want to add a CD-ROM drive too, because most new software programs are coming out on Cds.
#Person2#: That sounds great. Thanks.
"""
ejemplo_resumen_3 = """
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
"""
ejemplo_dialogo_4 = """
#Person1#: Hello, I bought the pendant in your shop, just before.
#Person2#: Yes. Thank you very much.
#Person1#: Now I come back to the hotel and try to show it to my friend, the pendant is broken, I'm afraid.
#Person2#: Oh, is it?
#Person1#: Would you change it to a new one?
#Person2#: Yes, certainly. You have the receipt?
#Person1#: Yes, I do.
#Person2#: Then would you kindly come to our shop with the receipt by 10 o'clock? We will replace it.
#Person1#: Thank you so much.
"""
ejemplo_resumen_4 = """
#Person1# wants to change the broken pendant in #Person2#'s shop.
"""

In [37]:
print(ejemplo_dialogo_1.strip().replace("\n"," "))

#Person1#: What time is it, Tom? #Person2#: Just a minute. It's ten to nine by my watch. #Person1#: Is it? I had no idea it was so late. I must be off now. #Person2#: What's the hurry? #Person1#: I must catch the nine-thirty train. #Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.


Below, the example dialogues and their human-created summaries are printed, separated by dashed lines for clarity.

In [38]:
# Separator line of 99 dashes (same string the join over 100 empty items produced).
linea_punteada = '-' * 99

dialogos = [ejemplo_dialogo_1, ejemplo_dialogo_2, ejemplo_dialogo_3, ejemplo_dialogo_4]
resumenes = [ejemplo_resumen_1, ejemplo_resumen_2, ejemplo_resumen_3, ejemplo_resumen_4]

for i, (dialogo, resumen) in enumerate(zip(dialogos, resumenes)):
    print(linea_punteada)
    print('Example ', i + 1)
    print(linea_punteada)
    print('Input dialogue:')
    print(dialogo)
    print(linea_punteada)
    print('Summary by human:')
    print(resumen)
    print(linea_punteada)
    print()

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Input dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

---------------------------------------------------------------------------------------------------
Summary by human:

#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

---------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------

Load the [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) by creating an instance of the `AutoModelForSeq2SeqLM` class using the `.from_pretrained()` method.

In [39]:
nombre_modelo='google/flan-t5-base'
modelo = AutoModelForSeq2SeqLM.from_pretrained(nombre_modelo)
modelo.to('cuda')

T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo):

Download the tokenizer for the FLAN-T5 model using the `AutoTokenizer.from_pretrained()` method.


In [40]:
tokenizador = AutoTokenizer.from_pretrained(nombre_modelo, use_fast=True)

Let's test the tokenizer by encoding a sentence into token IDs and decoding it back.

In [41]:
oracion = "What time is it, Tom?"
oracion = "Who are you madafucker?"

oracion_codificada = tokenizador(oracion, return_tensors='pt')

oracion_decodificada = tokenizador.decode(
        oracion_codificada["input_ids"][0],
        skip_special_tokens=True
    )

print('Token IDs:')
print(oracion_codificada["input_ids"][0])
print('\nDecoded sentence:')
print(oracion_decodificada)

Token IDs:
tensor([ 2645,    33,    25, 11454,     9,    89,  4636,    49,    58,     1])

Decoded sentence:
Who are you madafucker?


Now it's time to explore how well the LLM summarizes a dialogue without prompt engineering. **Prompt engineering** is the act of a human changing the **prompt/instruction/input** to improve the response to a given task.


In [42]:
for i, (dialogo, resumen) in enumerate(zip(dialogos[:-2], resumenes[:-2])):
    entradas = tokenizador(dialogo, return_tensors='pt')
    salida = tokenizador.decode(
        modelo.generate(
            entradas["input_ids"].to('cuda'),
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(linea_punteada)
    print('Example ', i + 1)
    print(linea_punteada)
    print(f'Prompt input:\n{dialogo}')
    print(linea_punteada)
    print(f'Summary by human:\n{resumen}')
    print(linea_punteada)
    print(f'Model output without prompt engineering:\n{salida}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Prompt input:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

---------------------------------------------------------------------------------------------------
Summary by human:

#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

---------------------------------------------------------------------------------------------------
Model output without prompt engineering:
Person1: It's ten to nine.

--------------------------------------

The output is related to the dialogue, but the model does not seem to know what task it is supposed to perform: instead of summarizing, it produces what looks like one more line of the conversation. Prompt engineering can help in this case.
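The tokenize, generate, decode sequence used in the cell above recurs in every later experiment. As a minimal sketch, it could be wrapped in one function. The name `generar_texto` is hypothetical (it is not part of the notebook), and the function assumes it receives the `modelo` and `tokenizador` objects loaded earlier. Reading the device from `modelo.device` instead of hard-coding `'cuda'` is a design choice of this sketch.

```python
def generar_texto(modelo, tokenizador, prompt, max_new_tokens=50):
    """Tokenize a prompt, generate a continuation and decode it (hypothetical helper)."""
    entradas = tokenizador(prompt, return_tensors="pt")
    # Move the input IDs to whatever device the model lives on (CPU or GPU).
    ids_entrada = entradas["input_ids"].to(modelo.device)
    ids_salida = modelo.generate(ids_entrada, max_new_tokens=max_new_tokens)
    # generate() returns a batch; this sketch decodes only the first sequence.
    return tokenizador.decode(ids_salida[0], skip_special_tokens=True)

# Example call, assuming `modelo`, `tokenizador` and `dialogos` exist as above:
# print(generar_texto(modelo, tokenizador, dialogos[0]))
```

With such a helper, each experiment reduces to building a prompt string and calling the function once.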


## 3 - Summarizing the dialogue by being explicit with the input instruction


### 3.1 - Zero-Shot

In [43]:
for i, (dialogo, resumen) in enumerate(zip(dialogos[:-2], resumenes[:-2])):

    prompt = f"""
              Summarize the following conversation.

              {dialogo}

              Summary:
              """

    entradas = tokenizador(prompt, return_tensors='pt')
    salida = tokenizador.decode(
        modelo.generate(
            entradas["input_ids"].to('cuda'),
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(linea_punteada)
    print('Example ', i + 1)
    print(linea_punteada)
    print(f'Prompt input:\n{prompt}')
    print(linea_punteada)
    print(f'Summary by human:\n{resumen}')
    print(linea_punteada)
    print(f'Model output - zero shot:\n{salida}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Prompt input:

              Summarize the following conversation.

              
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.


              Summary:
              
---------------------------------------------------------------------------------------------------
Summary by human:

#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

---------------------------------------------------------------------------------------------------
Mo

This is much better, but the model still doesn't capture the nuances of the conversations.


### 3.2 - Templates for FLAN-T5

It's worth mentioning that FLAN-T5 comes with many prompt templates for specific tasks. You can find the official FLAN-T5 predefined prompt templates [here](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py).

This point is informative for further experimentation. Since these "official" templates are in English, you could either run the model in English or try running these templates with their respective translations.
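To make it easier to try different templates, the prompt wording can be kept separate from the dialogue. The sketch below is illustrative only: the two templates reuse the wording of the prompts in this notebook, not a verbatim copy of the official template file, and the names `PLANTILLAS` and `aplicar_plantilla` are hypothetical.

```python
# Illustrative templates: same wording as the prompts used in this notebook.
PLANTILLAS = {
    "summarize": "Summarize the following conversation.\n\n{dialogue}\n\nSummary:",
    "what_was_going_on": "Dialogue:\n\n{dialogue}\n\nWhat was going on?",
}

def aplicar_plantilla(nombre: str, dialogo: str) -> str:
    """Fill the named template with a dialogue (hypothetical helper)."""
    # strip() drops the blank lines around a triple-quoted dialogue string.
    return PLANTILLAS[nombre].format(dialogue=dialogo.strip())

dialogo_demo = "#Person1#: What time is it, Tom?\n#Person2#: It's ten to nine."
for nombre in PLANTILLAS:
    print(f"--- {nombre} ---")
    print(aplicar_plantilla(nombre, dialogo_demo))
```

Adding another template is then a one-line change to the dictionary, and the same loop can compare their outputs.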

In [44]:
for i, (dialogo, resumen) in enumerate(zip(dialogos[:-2], resumenes[:-2])):

    prompt = f"""
Dialogue:

{dialogo}

What was going on?
    """

    entradas = tokenizador(prompt, return_tensors='pt')
    salida = tokenizador.decode(
        modelo.generate(
            entradas["input_ids"].to('cuda'),
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(linea_punteada)
    print('Example ', i + 1)
    print(linea_punteada)
    print(f'Prompt input:\n{prompt}')
    print(linea_punteada)
    print(f'Summary by human:\n{resumen}')
    print(linea_punteada)
    print(f'Output - ZERO SHOT:\n{salida}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Prompt input:

Dialogue:


#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.


What was going on?
    
---------------------------------------------------------------------------------------------------
Summary by human:

#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

---------------------------------------------------------------------------------------------------
Output - ZERO SHOT:
Tom is late for the train.

-----------------------

## 4 - One and Few-Shot

### 4.1 - One-Shot

In [45]:
def crear_prompt(indices_ejemplo_prompt, indice_a_resumir):
    prompt = ''
    for indice in indices_ejemplo_prompt:
        dialogo = dialogos[indice]
        resumen = resumenes[indice]

        prompt += f"""
Dialogue:

{dialogo}

What was going on?:
{resumen}


"""
    ejemplo_a_resumir = dialogos[indice_a_resumir]
    prompt += f"""
Dialogue:

{ejemplo_a_resumir}

What was going on?
"""

    return prompt


Build the prompt to perform one-shot inference:


In [46]:
indices_ejemplo_prompt = [0]
indice_a_resumir = 1

one_shot_prompt = crear_prompt(indices_ejemplo_prompt, indice_a_resumir)

print(one_shot_prompt)


Dialogue:


#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.


What was going on?:

#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.




Dialogue:


#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All

Now pass this prompt to the model to perform one-shot inference:


In [47]:
resumen = resumenes[indice_a_resumir]  # human-written summary of the dialogue being summarized

entradas = tokenizador(one_shot_prompt, return_tensors='pt')
salida = tokenizador.decode(
    modelo.generate(
        entradas["input_ids"].to('cuda'),
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(linea_punteada)
print(f'Summary by human:\n{resumen}\n')
print(linea_punteada)
print(f'Output - ONE SHOT:\n{salida}')

---------------------------------------------------------------------------------------------------
Summary by human:

Mom asks May to help to prepare for the picnic and May agrees.


---------------------------------------------------------------------------------------------------
Output - ONE SHOT:
#Person1 wants to prepare for the picnic.

### 4.2 - Few-Shot


In [48]:
indices_ejemplo_prompt = [0, 2, 3]
indice_a_resumir = 1

few_shot_prompt = crear_prompt(indices_ejemplo_prompt, indice_a_resumir)

print(few_shot_prompt)


Dialogue:


#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.


What was going on?:

#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.




Dialogue:


#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you a

Now pass this prompt to perform a few shot inference:

In [49]:
resumen = resumenes[indice_a_resumir]  # human-written summary of the dialogue being summarized

entradas = tokenizador(few_shot_prompt, return_tensors='pt')
salida = tokenizador.decode(
    modelo.generate(
        entradas["input_ids"].to('cuda'),
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(linea_punteada)
print(f'Summary by human:\n{resumen}\n')
print(linea_punteada)
print(f'Output:\n{salida}')

Token indices sequence length is longer than the specified maximum sequence length for this model (834 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
Summary by human:

Mom asks May to help to prepare for the picnic and May agrees.


---------------------------------------------------------------------------------------------------
Output:
#Person1 wants to prepare for the picnic. She asks h

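In this case, few-shot inference did not provide a significant improvement over one-shot, and using more than five or six examples is usually not very helpful either. You must also make sure the prompt does not exceed the model's input context length, which in our case is 512 tokens. The tokenizer warning above shows that the few-shot prompt is 834 tokens long, well past that limit. Text beyond the limit is either cut off (when truncation is enabled) or falls outside the range the model was trained on, so it should not be relied on.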
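One way to stay within the limit is to add examples only while a token budget allows it. The sketch below is an approximation under stated assumptions: `seleccionar_ejemplos` and `contar_palabras` are hypothetical names, the counter is a whitespace-based stand-in rather than the real tokenizer, and the few tokens added by the template text ("Dialogue:", "What was going on?") are not counted.

```python
from typing import Callable, List, Sequence, Tuple

def seleccionar_ejemplos(
    ejemplos: Sequence[Tuple[str, str]],
    consulta: str,
    contar_tokens: Callable[[str], int],
    limite: int = 512,
) -> List[Tuple[str, str]]:
    """Keep leading (dialogue, summary) pairs while the total stays within the budget."""
    usados = contar_tokens(consulta)  # the dialogue to summarize always goes in
    elegidos = []
    for dialogo, resumen in ejemplos:
        costo = contar_tokens(dialogo) + contar_tokens(resumen)
        if usados + costo > limite:
            break  # stop at the first example that no longer fits
        elegidos.append((dialogo, resumen))
        usados += costo
    return elegidos

def contar_palabras(texto: str) -> int:
    """Stand-in counter for illustration only: whitespace-separated words."""
    return len(texto.split())

# In the notebook, a real token count could be used instead, for example:
# contar_tokens = lambda texto: len(tokenizador(texto)["input_ids"])

ejemplos_demo = [("a b c d", "x y"), ("e f g h i j", "z"), ("k l", "m")]
elegidos_demo = seleccionar_ejemplos(ejemplos_demo, "q r s", contar_palabras, limite=10)
print(len(elegidos_demo))
```

Because the function stops at the first example that does not fit, the order of `ejemplos` decides which examples survive.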

Including at least one complete example (one-shot) does help: it shows the model the expected form of the answer and qualitatively improves the summary compared with zero-shot.
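To put a rough number on such qualitative comparisons, one option is to count the words a generated summary shares with the human one. The sketch below is a simplified unigram-overlap score in the spirit of ROUGE-1, not the official ROUGE implementation. The function name `solapamiento_unigramas` and the lowercase alphanumeric tokenization are assumptions of this example. The two strings are the human summary and the one-shot output shown earlier.

```python
import re
from collections import Counter

def solapamiento_unigramas(candidato: str, referencia: str) -> dict:
    """Unigram precision, recall and F1 between a candidate and a reference summary."""
    tokens_c = Counter(re.findall(r"[a-z0-9]+", candidato.lower()))
    tokens_r = Counter(re.findall(r"[a-z0-9]+", referencia.lower()))
    # Counter intersection keeps the minimum count per word (clipped matches).
    comunes = sum((tokens_c & tokens_r).values())
    precision = comunes / max(sum(tokens_c.values()), 1)
    recall = comunes / max(sum(tokens_r.values()), 1)
    f1 = 0.0 if comunes == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

referencia = "Mom asks May to help to prepare for the picnic and May agrees."
salida_one_shot = "#Person1 wants to prepare for the picnic."
print(solapamiento_unigramas(salida_one_shot, referencia))
```

Word overlap ignores meaning and word order, so it complements reading the summaries rather than replacing it.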

## 5 - Parameters for inference

Let's change the configuration parameters of the `generate()` method to see different output from the LLM. So far, the only parameter that has been set was `max_new_tokens=50`, which defines the maximum number of tokens to generate. You can find a complete list of available parameters in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig).

A convenient way to organize the configuration parameters is to use the `GenerationConfig` class.

Setting `do_sample=True` enables sampling-based decoding: instead of always picking the most likely token, the model draws the next token from the probability distribution over the entire vocabulary. You can then tune the outputs by adjusting `temperature` and related parameters such as `top_k` and `top_p`.
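To build intuition for `temperature`, the toy sketch below divides a set of scores by the temperature before applying softmax. It is an illustration only, not the library's implementation. The function name `softmax_con_temperatura` is hypothetical and the three logits are made-up values.

```python
import math
from typing import List

def softmax_con_temperatura(logits: List[float], temperatura: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    escalados = [x / temperatura for x in logits]
    maximo = max(escalados)  # subtract the max for numerical stability
    exponentes = [math.exp(x - maximo) for x in escalados]
    total = sum(exponentes)
    return [e / total for e in exponentes]

logits_demo = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.1, 0.5, 1.0):
    probs = softmax_con_temperatura(logits_demo, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate almost all probability on the top-scoring token, so sampling behaves nearly greedily. Higher temperatures flatten the distribution and make less likely tokens more probable.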

In [50]:
# generacion_config = GenerationConfig(max_new_tokens=50)
# generacion_config = GenerationConfig(max_new_tokens=10)
# generacion_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)
generacion_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)
# generacion_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)

entradas = tokenizador(few_shot_prompt, return_tensors='pt')
salida = tokenizador.decode(
    modelo.generate(
        entradas["input_ids"].to('cuda'),
        generation_config=generacion_config,
    )[0],
    skip_special_tokens=True
)

resumen = resumenes[indice_a_resumir]  # human-written summary, not the dialogue itself

print(linea_punteada)
print(f'Summary by human:\n{resumen}\n')
print(linea_punteada)
print(f'Model output - FEW SHOT:\n{salida}')

---------------------------------------------------------------------------------------------------
Summary by human:

Mom asks May to help to prepare for the picnic and May agrees.


---------------------------------------------------------------------------------------------------
Model output - FEW SHOT:
#Person1
