<a href="https://colab.research.google.com/github/ninja03jod/Hands_on_Gen_AI_-_LLM-s/blob/main/Dialogue_Summarization_Using_FLAN_T5_LLM_Model_With_Zero_One_%26_Few_Shot_Inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
     torch==1.13.1 \
     torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0 --quiet

Collecting pip
  Downloading pip-24.0-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-24.0

In [None]:
import warnings
warnings.filterwarnings("ignore")

- ***The AutoModelForSeq2SeqLM class automatically loads a model that can handle sequence-to-sequence language modeling tasks, such as translation, summarization, and text generation.***

- ***The AutoTokenizer class automatically loads the appropriate tokenizer for a given pre-trained model. Tokenizers are responsible for converting text into tokens that models can process.***

- ***The GenerationConfig class is used to configure the parameters for text generation tasks. It allows specifying options such as maximum length, temperature, beam search settings, and more.***

In [None]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

# **2 - Summarize Dialogue without Prompt Engineering**

##### In this use case, you will be generating a summary of a dialogue with the pre-trained Large Language Model (LLM) FLAN-T5 from Hugging Face. The list of available models in the Hugging Face transformers package can be found here.
##### Let's load some simple dialogues from the DialogSum Hugging Face dataset. This dataset contains 10,000+ dialogues with the corresponding manually labeled summaries and topics.

In [None]:
huggingface_dataset_name = 'knkarthick/dialogsum'

dataset = load_dataset(huggingface_dataset_name)

dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
})

In [None]:
example_indices = [40, 1400]

dash_line = '-'.join('' for x in range(100))

for i, index in enumerate(example_indices):
  print(dash_line)
  print("Example", i+1)
  print(dash_line)
  print("INPUT DIALOGUE:")
  print(dataset['test'][index]['dialogue'])
  print(dash_line)
  print("BASELINE HUMAN SUMMARY:")
  print(dataset['test'][index]['summary'])
  print(dash_line)
  print()

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Exam

##### **Now we load the FLAN-T5 model by creating an instance of the AutoModelForSeq2SeqLM class with the .from_pretrained() method.**

In [None]:
model_name = 'google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

model

T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo):

##### **To perform encoding and decoding, you need to work with the text in tokenized form.**
##### **Tokenization is the process of converting text into smaller units that can be processed by LLMs.**
##### **Download the tokenizer for the FLAN-T5 model with the AutoTokenizer.from_pretrained() method. The parameter use_fast=True selects the fast tokenizer implementation, which speeds up tokenization.**

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name,use_fast=True)

##### **Test the tokenizer on any sentence**

In [None]:
sentence = 'Python is high level programming language'

encoded_sent = tokenizer(sentence,return_tensors='pt')
print(f' Encoded Sentence:\n {encoded_sent}')

 Encoded Sentence:
 {'input_ids': tensor([[20737,    19,   306,   593,  6020,  1612,     1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}


#### **Each element in the attention_mask corresponds to a token in the input_ids.**
- A value of 1 means that the token should be attended to.
- A value of 0 means that the token should be ignored or masked out (typically used for padding tokens).

#### **In this example:**
- The input_ids tensor represents the tokenized version of your sentence.
- The attention_mask tensor shows that every token in this sentence is important (all values are 1), meaning there are no padding tokens in this particular example.
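
To see when zeros appear in the mask, here is a minimal pure-Python sketch of what padding a batch does. The helper `pad_batch` and the second, shorter ID list are hypothetical illustrations, not tokenizer output. The first list is the `input_ids` printed above, and 0 is assumed as the padding ID (the value T5 tokenizers use).

```python
def pad_batch(batch_ids, pad_id=0):
    """Right-pad token-ID lists to equal length and build attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)             # pad on the right
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real, 0 = padding
    return input_ids, attention_mask

batch = [
    [20737, 19, 306, 593, 6020, 1612, 1],  # IDs printed above
    [20737, 19, 1],                         # hypothetical shorter sequence
]
input_ids, attention_mask = pad_batch(batch)
print(attention_mask[1])  # [1, 1, 1, 0, 0, 0, 0]
```

In the notebook itself, passing a list of sentences to the tokenizer with `padding=True` should produce masks of this shape.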

In [None]:
# lets decode it...

decode_sent = tokenizer.decode(encoded_sent['input_ids'][0],skip_special_tokens=True)

print(f'Decoded sentence:\n {decode_sent}')

Decoded sentence:
 Python is high level programming language


##### **Now it's time to use the LLM to summarize the dialogue without any prompt engineering.**

### **The code snippet below takes a tokenized input dialogue, generates a sequence of up to 50 new tokens using the model, and decodes these tokens back into a readable string. The final output is the generated text.**

- inputs['input_ids']: This is the tokenized representation of the input sentence. It is passed to the model as input.
- max_new_tokens=50: This parameter specifies the maximum number of new tokens to generate. The model will generate up to 50 new tokens.
- model.generate(): This method generates text based on the input tokens. The output is a tensor of token IDs representing the generated text.
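
Conceptually, generation is a loop that appends one token at a time until an end-of-sequence token appears or the `max_new_tokens` budget is used up. The sketch below illustrates only that stopping logic. `toy_next_token` and this `generate` function are hypothetical stand-ins, and the real `model.generate()` is considerably more involved.

```python
EOS_ID = 1  # T5's end-of-sequence ID, as seen at the end of input_ids above

def toy_next_token(generated):
    """Hypothetical stand-in for a model: emits 10, 11, 12, ... and never EOS."""
    return 10 + len(generated)

def generate(next_token_fn, max_new_tokens, eos_id=EOS_ID):
    """Append tokens one by one until EOS or the token budget is reached."""
    generated = []
    for _ in range(max_new_tokens):
        token = next_token_fn(generated)
        generated.append(token)
        if token == eos_id:  # stop early once the sequence is ended
            break
    return generated

print(len(generate(toy_next_token, max_new_tokens=50)))  # 50
print(generate(lambda g: EOS_ID if len(g) == 3 else 7, max_new_tokens=50))  # [7, 7, 7, 1]
```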


In [None]:
for i, index in enumerate(example_indices):

  dialogue = dataset['test'] [index] ['dialogue']
  summary  = dataset['test'] [index] ['summary']
  inputs = tokenizer(dialogue, return_tensors = 'pt')
  output = tokenizer.decode(
      model.generate(
          inputs['input_ids'],
          max_new_tokens = 50,
      )[0],
      skip_special_tokens = True
  )

  print(dash_line)
  print("Example", i+1)
  print(dash_line)
  print("INPUT DIALOGUE:")
  print(dataset['test'][index]['dialogue'])
  print(dash_line)
  print(f"BASELINE HUMAN SUMMARY: {summary}")
  print(dash_line)
  print(f"MODEL GENERATION WITHOUT PROMPT ENGINEERING:\n{output}\n")

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY: #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

--------------------------------

### **Zero-Shot Inference (In-Context Learning):**
- Inference means generating output from the model.
- Zero-shot inference means giving the model an instruction as a prompt, without including any solved examples.
- In this case we take a dialogue and wrap it in an instruction prompt.

In [None]:
for i, index in enumerate(example_indices):

  dialogue = dataset['test'] [index] ['dialogue']
  summary  = dataset['test'] [index] ['summary']

  prompt = f"""
Summarized the following conversation.

{dialogue}

Summary:
  """

  # tokenize the prompt instead of the raw dialogue
  inputs = tokenizer(prompt, return_tensors = 'pt')
  output = tokenizer.decode(
      model.generate(
          inputs['input_ids'],
          max_new_tokens = 50,
      )[0],
      skip_special_tokens = True
  )

  print(dash_line)
  print("Example", i+1)
  print(dash_line)
  print("INPUT DIALOGUE:")
  print(dataset['test'][index]['dialogue'])
  print(dash_line)
  print(f"BASELINE HUMAN SUMMARY: {summary}")
  print(dash_line)
  print(f"MODEL GENERATION - ZERO SHOT:\n{output}\n")

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY: #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
The train is about to leave.

---------------------------------------------

#### **Experiment with the prompt: change the wording of the instruction and see whether it affects the generated summary.**

In [None]:
for i, index in enumerate(example_indices):

  dialogue = dataset['test'] [index] ['dialogue']
  summary  = dataset['test'] [index] ['summary']

  prompt = f"""
Dialogue:

{dialogue}

What was going on?
  """
  # this prompt asks the model what was going on in the dialogue
  # tokenize the prompt instead of the raw dialogue
  inputs = tokenizer(prompt, return_tensors = 'pt')
  output = tokenizer.decode(
      model.generate(
          inputs['input_ids'],
          max_new_tokens = 50,
      )[0],
      skip_special_tokens = True
  )

  print(dash_line)
  print("Example", i+1)
  print(dash_line)
  print("INPUT DIALOGUE:")
  print(dataset['test'][index]['dialogue'])
  print(dash_line)
  print(f"BASELINE HUMAN SUMMARY: {summary}")
  print(dash_line)
  print(f"MODEL GENERATION - ZERO SHOT:\n{output}\n")

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY: #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
Tom is late for the train.

-----------------------------------------------

### **Summarize the Dialogue with One-Shot or Few-Shot Inference:**
- One-shot and few-shot inference mean providing one or several example dialogue-summary pairs in the prompt, so that the model can infer what kind of summary is expected.
- In short, the prompt contains example dialogues with their summaries, followed by the dialogue to summarize, and the model generates a summary for that last dialogue.

In [None]:
def make_prompt(example_indices_full, example_index_to_summarized):
    prompt = ''

    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        prompt += f"Dialogue:\n{dialogue}\nSummary:\n{summary}\n\n"

    # Optionally add the dialogue to be summarized separately at the end
    dialogue_to_summarize = dataset['test'][example_index_to_summarized]['dialogue']
    prompt += f"Dialogue to summarize:\n{dialogue_to_summarize}\n\nwhat was going on?"

    return prompt

In [None]:
example_indices_full = [40]
example_index_to_summarized = 200

one_shot_prompt = make_prompt(example_indices_full,example_index_to_summarized)

print(one_shot_prompt)

Dialogue:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
Summary:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

Dialogue to summarize:
#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also need 

In [None]:
summary = dataset['test'][example_index_to_summarized]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors = 'pt')
output = tokenizer.decode(
      model.generate(
          inputs['input_ids'],
          max_new_tokens = 50,
      )[0],
      skip_special_tokens = True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his computer. #Person2 wants to upgrade his hardware.


### **Few Shot Inference:**
- In this case the prompt contains multiple example dialogues with their summaries, followed by one dialogue for the model to summarize.

In [None]:
example_indices_full = [10,30,50]
example_index_to_summarized = 200

few_shot_prompt = make_prompt(example_indices_full,example_index_to_summarized)

print(few_shot_prompt)

Dialogue:
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Summary:
#Person1# attends Brian's birthday party. Brian thinks #Person1# looks great and charming.

Dialogue:
#Person1#: Where are you going for your trip?
#Person2#: I think Hebei is a good place.
#Person1#: But I heard the north of China are experiencing severe sandstorms!
#Person2#: Really?
#Person1#: Yes, it's said 

In [None]:
summary = dataset['test'][example_index_to_summarized]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors = 'pt')
output = tokenizer.decode(
      model.generate(
          inputs['input_ids'],
          max_new_tokens = 50,
      )[0],
      skip_special_tokens = True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his computer. #Person2 wants to upgrade his hardware.


### **Note that few-shot prompting did not give a better result than one-shot prompting here.**
### **Adding more shots is often expected to improve the output, but that does not always happen.**
### **In this example, one shot was enough to capture the context of the dialogue in the summary.**


## **Experiment with few shot:**

- Choose different dialogues: use a different example_indices_full list and example_index_to_summarized value.
- Change the number of shots. Make sure the prompt stays within the context length of 512 tokens.
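
One way to respect the 512-token limit is to count tokens before calling the model and drop example shots until the prompt fits. This is a sketch under assumptions: `trim_shots`, `count_tokens` and `count_words` are hypothetical names, and the whitespace counter is only a stand-in for a real tokenizer. In the notebook you could pass `lambda text: len(tokenizer(text)['input_ids'])` instead.

```python
def trim_shots(shots, query, count_tokens, max_tokens=512):
    """Drop the earliest example shots until shots + query fit in max_tokens."""
    shots = list(shots)
    while shots and count_tokens("".join(shots) + query) > max_tokens:
        shots.pop(0)
    return "".join(shots) + query

def count_words(text):
    """Stand-in token counter: whitespace-separated words (not real tokens)."""
    return len(text.split())

# Three artificial shots of 303 words each, plus a 9-word query
shots = ["Dialogue:\n" + "word " * 300 + "\nSummary:\nshort\n\n"] * 3
query = "Dialogue to summarize:\nhello there\n\nwhat was going on?"
prompt = trim_shots(shots, query, count_words)
print(count_words(prompt) <= 512)  # True
```

Dropping whole shots rather than truncating text keeps every remaining example intact, at the cost of fewer examples.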

## **Generation Configuration Parameters:**

#### **Max length:**
- max_new_tokens caps the number of tokens the model may generate, which controls the length of the output text.

#### **Top-k sampling:**
- At each generation step, only the k tokens with the highest probability are kept as candidates.
- The next token is then sampled from those k candidates, so the summary is built only from high-scoring tokens.
- For example, in next-word prediction the next word is drawn from the k words with the highest scores.
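
As an illustration, top-k sampling over a single next-token distribution can be sketched in a few lines of pure Python. The function `top_k_sample` and the probabilities are hypothetical and chosen for the example. The model applies the same idea to its full vocabulary at every step.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Sample a token from the k most probable entries of a {token: prob} dict."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [tok for tok, _ in top]
    weights = [p for _, p in top]  # random.choices treats weights as relative
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distribution, for illustration only
probs = {"train": 0.5, "bus": 0.3, "plane": 0.15, "boat": 0.05}
rng = random.Random(0)
samples = {top_k_sample(probs, k=2, rng=rng) for _ in range(200)}
print(samples <= {"train", "bus"})  # True: only the top 2 tokens can be drawn
```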

#### **Top-p sampling:**
- Top-p sampling sorts the tokens by probability and accumulates their probabilities until the running total reaches a threshold p.
- With a threshold of 0.9, tokens are added in order of decreasing probability until their cumulative probability reaches 0.9.
- The next token is then sampled at random from that set, weighted by each token's individual probability.
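
The cumulative-probability cut can be sketched as follows. `top_p_filter` and `top_p_sample` are hypothetical helper names and the distribution is made up for the example. This is a minimal illustration of the idea, not the library's implementation.

```python
import random

def top_p_filter(probs, p):
    """Return the most probable tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:  # threshold reached: stop adding tokens
            break
    return kept

def top_p_sample(probs, p, rng=random):
    """Sample one token from the filtered set, weighted by probability."""
    kept = top_p_filter(probs, p)
    return rng.choices(list(kept), weights=list(kept.values()), k=1)[0]

# Hypothetical next-token distribution, for illustration only
probs = {"train": 0.5, "bus": 0.3, "plane": 0.15, "boat": 0.05}
print(sorted(top_p_filter(probs, p=0.9)))  # ['bus', 'plane', 'train']
```

Unlike top-k, the number of candidates here varies with the shape of the distribution: a confident model keeps few tokens, an uncertain one keeps many.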


#### **Temperature:**
- The temperature can be set low or high.
- A low temperature produces conservative text: the model concentrates on the most likely tokens, so the output tends to repeat from run to run.
- A high temperature produces new combinations of tokens, so the summary uses different words each time it is generated.
- A high temperature can sometimes lead to undesirable results.
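
The effect can be seen by dividing the model's raw scores (logits) by the temperature before the softmax. The sketch below uses three hypothetical logits, and `softmax_with_temperature` is an illustrative helper rather than a library function.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.1, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At a temperature of 0.1 almost all probability lands on the top token, while at 2.0 the distribution flattens and less likely tokens are drawn more often.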

#### **do_sample:**
- Setting it to True enables sampling-based decoding strategies, which draw the next token from the probability distribution over the entire vocabulary instead of always picking the most likely token.
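
The difference between the two modes can be sketched on a single hypothetical distribution. `pick_next_token` is an illustrative helper, not part of the transformers API.

```python
import random

def pick_next_token(probs, do_sample, rng=random):
    """Greedy argmax when do_sample is False, weighted random draw otherwise."""
    if not do_sample:
        return max(probs, key=probs.get)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Hypothetical next-token distribution, for illustration only
probs = {"train": 0.5, "bus": 0.3, "plane": 0.15, "boat": 0.05}
print(pick_next_token(probs, do_sample=False))  # train
rng = random.Random(42)
sampled = {pick_next_token(probs, do_sample=True, rng=rng) for _ in range(500)}
print(sampled)  # several different tokens show up across draws
```

This is why temperature, top-k and top-p only change the output when sampling is enabled: with greedy decoding the most likely token is chosen regardless.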

In [None]:
#generation_config = GenerationConfig(max_new_tokens=50)
#generation_config = GenerationConfig(max_new_tokens=50,do_sample = True)
#generation_config = GenerationConfig(max_new_tokens=50,do_sample = True,temperature=0.1)
#generation_config = GenerationConfig(max_new_tokens=50,do_sample = True,temperature=0.5)
generation_config = GenerationConfig(max_new_tokens=50,do_sample = True,temperature=1.0)

inputs = tokenizer(few_shot_prompt, return_tensors = 'pt')
output = tokenizer.decode(
      model.generate(
          inputs['input_ids'],
          generation_config=generation_config,
      )[0],
      skip_special_tokens = True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
Considering upgrading the computer, people want to do things like uploading photos, adding a painting program to the software, upgrading the hardware or the software itself. They are going to do it anyway on a CD-ROM drive as they want
