<a href="https://colab.research.google.com/github/kinjaljoshi/llm_param_config/blob/main/prompt_engg_llm_params.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GenAI : Summarize Dialogue

We will compare zero-shot, one-shot, and few-shot inference.

### Summarize Dialogue without Prompt Engineering

* Dataset: DialogSum - https://huggingface.co/datasets/knkarthick/dialogsum

In [None]:
!pip install datasets

In [3]:
from datasets import load_dataset

# Load the DialogSum dataset from the Hugging Face Hub
dataset = load_dataset("knkarthick/dialogsum")

# Print a few samples
for i in range(5):  # Print 5 samples from the dataset
    print(f"Sample {i+1}:")
    print(dataset['train'][i])
    print("-" * 100)


Sample 1:
{'id': 'train_0', 'dialogue': "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor.", 'summary': "Mr

In [4]:
dataset.shape

{'train': (12460, 4), 'validation': (500, 4), 'test': (1500, 4)}

Load the FLAN-T5 model by creating an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method.

To perform encoding and decoding, you need to work with text in tokenized form. **Tokenization** is the process of splitting text into smaller units that the LLM can process.

Download the tokenizer for the FLAN-T5 model with the `AutoTokenizer.from_pretrained()` method. The `use_fast` parameter selects the fast tokenizer.

* `use_fast=True` requests a fast tokenizer (backed by the Rust `tokenizers` library), which is typically much faster than the pure-Python implementation, especially on batches of text.
* Fast tokenizers also provide extras such as mapping each token back to its character span in the original text.
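
To make the encode/decode round trip concrete, here is a minimal sketch with a toy whitespace tokenizer. The vocabulary and the `encode`/`decode` helpers are hypothetical and exist only for illustration; the real FLAN-T5 tokenizer splits text into subword pieces rather than whole words and has a far larger vocabulary.

```python
# Toy illustration only: a hypothetical whitespace tokenizer with a tiny vocabulary.
vocab = {"<unk>": 0, "what": 1, "is": 2, "evaporation": 3, "?": 4}
id_to_token = {idx: tok for tok, idx in vocab.items()}

def encode(text):
    """Map each whitespace-separated token to its id (unknown tokens map to <unk>)."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

def decode(ids):
    """Map ids back to tokens and join them with spaces."""
    return " ".join(id_to_token[i] for i in ids)

ids = encode("What is evaporation ?")
print(ids)          # [1, 2, 3, 4]
print(decode(ids))  # what is evaporation ?
```

The model only ever sees the integer ids; `decode` is what turns generated ids back into readable text.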

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the pretrained model and its tokenizer
model_name = 'google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# AutoTokenizer selects the tokenizer class that matches the checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

## Generate output from base model without any Prompt Engineering

In [7]:
sample_sentence = "What is the evaporation ?"

# Move the model and the tokenized input to the GPU (once each)
model.to('cuda')
tokenized_input = tokenizer(sample_sentence, return_tensors='pt').to('cuda')

output = model.generate(
        input_ids=tokenized_input["input_ids"],
        max_new_tokens=100,
    )
decoded_output = tokenizer.decode(
        output[0],
        skip_special_tokens=True
    )

print(decoded_output)



evaporation of water


## Sample dialogue for summary

In [8]:
sample_indexes = [10,20,30]
for i, index in enumerate(sample_indexes):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    inputs = tokenizer(dialogue, return_tensors='pt').to('cuda')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=100,
        )[0],
        skip_special_tokens=True
    )

    print('-' *100)
    print('Sample ', i + 1)
    print('-' *100)
    print(f'Dialogue:\n{dialogue}')
    print('-' *100)
    print(f'Labelled Summary:\n{summary}')
    print('-' *100)
    print(f'Model Output:\n{output}\n')

----------------------------------------------------------------------------------------------------
Sample  1
----------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
----------------------------------------------------------------------------------------------------
Labelled 

## Zero shot Vs Few Shot Inference

### Zero-Shot Inference

Zero-shot inference refers to the ability of a machine learning model (especially a large language model) to perform a task without having been explicitly trained on that specific task and without any worked examples in the prompt. Instead, the model generalizes what it learned from related tasks to a new scenario. Here the prompt contains only an instruction and the dialogue.


In [9]:
for i, index in enumerate(sample_indexes):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
Create a summary of the following conversation.

{dialogue}

Summary:
    """

    # Input constructed prompt instead of the dialogue.
    inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )


    print('-' *100)
    print('Sample ', i + 1)
    print('-' *100)
    print(f'Dialogue:\n{dialogue}')
    print('-' *100)
    print(f'Labelled Summary:\n{summary}')
    print('-' *100)
    print(f'Model Output - Zero Shot :\n{output}\n')

----------------------------------------------------------------------------------------------------
Sample  1
----------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
----------------------------------------------------------------------------------------------------
Labelled 

FLAN-T5 prompt templates are published [here](https://github.com/google-research/FLAN/tree/main/flan/v2).


A **stop sequence** is a specific string of characters or tokens that signals an LLM to stop generating further output. It helps in controlling the output length and ensuring structured responses.

For FLAN-T5, the prompts below use `'\n\n\n'` as the stop sequence: the summary of each worked example is followed by three newlines.
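
As a rough illustration of what a stop sequence does, the sketch below cuts a generated string at the first occurrence of `'\n\n\n'`. The helper `truncate_at_stop` and the sample text are hypothetical. This is plain post-processing of decoded text, not a feature of `generate()` used in this notebook.

```python
def truncate_at_stop(text, stop_sequence="\n\n\n"):
    """Return the part of `text` before the first stop sequence (or all of it)."""
    head, _, _ = text.partition(stop_sequence)
    return head

# Hypothetical raw output that runs on past the first summary
raw = "#Person1# greets Brian at his party.\n\n\nDialogue:\n\n#Person1#: Hi"
print(truncate_at_stop(raw))  # #Person1# greets Brian at his party.
```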



### One Shot Inference

The model is given only one example (a single demonstration) to understand and perform a new task.


In [11]:
def make_inference_prompt(example_indices, test_indices):
    # Build the worked examples: each is a dialogue followed by its labelled summary
    prompt = ''
    for index in example_indices:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        prompt += f"""
Dialogue:

{dialogue}

Summary:
{summary}


"""
    # Append the dialogue to summarize; its summary is left for the model to complete
    dialogue = dataset['test'][test_indices]['dialogue']

    prompt += f"""
Dialogue:

{dialogue}

Summary:
"""

    return prompt

one_shot_prompt = make_inference_prompt([30], 20)
print(one_shot_prompt)


# Use the prompt above to generate a summary
label_summary = dataset['test'][20]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt').to('cuda')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print('-' *100)
print(f'Labelled Summary:\n{label_summary}')
print('-' *100)
print(f'Model Output - One Shot :\n{output}\n')


Dialogue:

#Person1#: Where are you going for your trip?
#Person2#: I think Hebei is a good place.
#Person1#: But I heard the north of China are experiencing severe sandstorms!
#Person2#: Really?
#Person1#: Yes, it's said that Hebes was experiencing six degree strong winds.
#Person2#: How do these storms affect the people who live in these areas?
#Person1#: The report said the number of people with respiratory tract infections tended to rise after sandstorms. The sand gets into people's noses and throats and creates irritation.
#Person2#: It sounds that sandstorms are trouble for everybody!
#Person1#: You are quite right.

Summary:
#Person2# plans to have a trip in Hebei but #Person1# says there are sandstorms in there.



Dialogue:

#Person1#: What's wrong with you? Why are you scratching so much?
#Person2#: I feel itchy! I can't stand it anymore! I think I may be coming down with something. I feel lightheaded and weak.
#Person1#: Let me have a look. Whoa! Get away from me!
#Person2#

### Few Shot Inference

The model is given a few examples (typically 5 to 10) before performing the task. It generalizes from these demonstrations and applies the pattern to new inputs.

In [12]:
few_shot_prompt = make_inference_prompt([1,5,10,15,25,30,35], 20)
print(few_shot_prompt)


# Use the prompt above to generate a summary
label_summary = dataset['test'][20]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors='pt').to('cuda')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print('-' *100)
print(f'Labelled Summary:\n{label_summary}')
print('-' *100)
print(f'Model Output - Few Shot :\n{output}\n')

Token indices sequence length is longer than the specified maximum sequence length for this model (2402 > 512). Running this sequence through the model will result in indexing errors



Dialogue:

#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to communicate with their clients.
#Person1#: They will just have to change their communication methods. I don't want any - one using Instant Messaging in this office. It wastes too much time! Now, please continue with 
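
The warning above shows that the seven-example prompt is 2402 tokens long, well past the 512-token limit reported for this model. One possible guard, sketched under stated assumptions, is to add examples only while the prompt still fits a token budget. The helpers `select_examples_within_budget` and `count_words` are hypothetical, and the word counter is a stand-in for a real token count.

```python
def select_examples_within_budget(examples, test_block, max_tokens, count_tokens):
    """Keep examples in order while the whole prompt stays within max_tokens."""
    used = count_tokens(test_block)   # the dialogue to summarize must always fit
    kept = []
    for example in examples:
        cost = count_tokens(example)
        if used + cost > max_tokens:
            break                     # stop at the first example that does not fit
        kept.append(example)
        used += cost
    return kept

# Stand-in counter (assumption): whitespace-separated words instead of model tokens
def count_words(text):
    return len(text.split())

examples = ["a b c d", "e f g", "h i j k l"]
kept = select_examples_within_budget(examples, "x y", max_tokens=10, count_tokens=count_words)
print(kept)  # ['a b c d', 'e f g']
```

With the tokenizer loaded earlier, the counter could instead be `lambda text: len(tokenizer(text)["input_ids"])` with `max_tokens=512`. Counting each block separately is only an estimate of the length of the full prompt.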

We can modify the configuration parameters of the `generate()` method to get different output from the LLM. So far the only parameter set has been `max_new_tokens` (50 or 100 in the cells above), which defines the maximum number of tokens to generate. A full list of available parameters can be found [here](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig).


Some important ones are listed below.

**Parameters that control the generation strategy used**

1. **do_sample** (bool, optional, defaults to False) - Whether or not to use sampling; use greedy decoding otherwise.
2. **num_beams** (int, optional, defaults to 1) - Number of beams for beam search. 1 means no beam search.
3. **num_beam_groups** (int, optional, defaults to 1) - Number of groups to divide num_beams into in order to ensure diversity among different groups of beams.
4. **use_cache** (bool, optional, defaults to True) - Whether or not the model should reuse the key/value attention states computed at earlier steps (if applicable to the model) to speed up decoding.
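
The difference between greedy decoding and sampling (the `do_sample` switch) can be sketched on a single decoding step. The distribution and the helpers `greedy_pick` and `sample_pick` below are hypothetical; this is a toy illustration, not how `generate()` is implemented.

```python
import random

# Hypothetical next-token distribution over a tiny vocabulary
next_token_probs = {"party": 0.6, "dance": 0.3, "necklace": 0.1}

def greedy_pick(probs):
    """Greedy decoding: always return the highest-probability token."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Sampling: draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
print(greedy_pick(next_token_probs))  # party
print([sample_pick(next_token_probs, rng) for _ in range(5)])
```

Greedy decoding returns the same token every time, while sampling can return any token with non-zero probability, which is why the sampled outputs later in this notebook differ from run to run.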

**Parameters for manipulation of the model output logits**

1. **temperature** (float, optional, defaults to 1.0) - The value used to modulate the next-token probabilities: the logits are divided by it before the softmax.
2. **top_k** (int, optional, defaults to 50) - The number of highest-probability vocabulary tokens to keep for top-k filtering.
3. **top_p** (float, optional, defaults to 1.0) - If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.

#### Temperature - Controls the randomness of token selection.

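* Higher values (> 1.0) → More randomness (creative, diverse outputs).
* Lower values (< 1.0) → More deterministic (focused, repetitive).
* As the temperature approaches 0, sampling approaches greedy decoding (always picking the highest-probability token). `generate()` expects a strictly positive temperature when sampling, so use `do_sample=False` for fully greedy output.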
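
The effect of temperature can be sketched with a softmax over a few made-up logits. The function name `softmax_with_temperature` and the logit values are assumptions chosen for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.1, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all probability mass sits on the top-scoring token. At a higher temperature the distribution flattens, so lower-ranked tokens are sampled more often.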

In [20]:
few_shot_prompt = make_inference_prompt([1,5,10,15,25,30,35], 100)
#print(few_shot_prompt)


# Use the prompt above to generate a summary
label_summary = dataset['test'][100]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors='pt').to('cuda')
output_temp_0_1 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        temperature=0.1
    )[0],
    skip_special_tokens=True
)

output_temp_0_7 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        temperature=0.7
    )[0],
    skip_special_tokens=True
)

output_temp_1 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        temperature=1.0
    )[0],
    skip_special_tokens=True
)

output_temp_1_5 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        temperature=1.5
    )[0],
    skip_special_tokens=True
)

print('-' *100)
print(f'Labelled Summary:\n{label_summary}')
print('-' *100)
print(f'Model Output - Few Shot temperature 0.1 :\n{output_temp_0_1}\n')

print('-' *100)
print(f'Model Output - Few Shot temperature 0.7:\n{output_temp_0_7}\n')

print('-' *100)
print(f'Model Output - Few Shot temperature 1.0:\n{output_temp_1}\n')

print('-' *100)
print(f'Model Output - Few Shot temperature 1.5:\n{output_temp_1_5}\n')


----------------------------------------------------------------------------------------------------
Labelled Summary:
#Person1# and Mike have a disagreement on how to act out a scene. #Person1# proposes that Mike can try to act in #Person1#'s way.
----------------------------------------------------------------------------------------------------
Model Output - Few Shot temperature 0.1 :
The two men are trying to figure out how to react to a cut.

----------------------------------------------------------------------------------------------------
Model Output - Few Shot temperature 0.7:
#Person1#: Jason and Laura have been together for three years.

----------------------------------------------------------------------------------------------------
Model Output - Few Shot temperature 1.0:
A close up of their relationship. They agree on how they make their lines.

----------------------------------------------------------------------------------------------------
Model Output - Few Sho

#### ***top_k sampling*** - Limits the model to selecting from only the k most probable tokens at each step.
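
A minimal sketch of the filtering step, assuming a made-up next-token distribution; `top_k_filter` is a hypothetical helper, not a library function.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalise their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Hypothetical next-token distribution
probs = {"party": 0.5, "dance": 0.25, "drink": 0.15, "necklace": 0.1}
print(top_k_filter(probs, k=2))
```

Sampling then draws only from the surviving tokens, so a smaller `k` rules out more of the unlikely continuations.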

In [23]:
few_shot_prompt = make_inference_prompt([1,5,10,15,25,30,35], 100)
#print(few_shot_prompt)


# Use the prompt above to generate a summary
label_summary = dataset['test'][100]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors='pt').to('cuda')
output_top10 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        top_k=10
    )[0],
    skip_special_tokens=True
)

output_top20 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        top_k=20
    )[0],
    skip_special_tokens=True
)

output_top30 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        top_k=30
    )[0],
    skip_special_tokens=True
)

output_top40 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        top_k=40
    )[0],
    skip_special_tokens=True
)

print('-' *100)
print(f'Labelled Summary:\n{label_summary}')
print('-' *100)
print(f'Model Output - Few Shot top_k - 10 :\n{output_top10}\n')

print('-' *100)
print(f'Model Output - Few Shot top_k - 20:\n{output_top20}\n')

print('-' *100)
print(f'Model Output - Few Shot top_k - 30:\n{output_top30}\n')

print('-' *100)
print(f'Model Output - Few Shot top_k - 40:\n{output_top40}\n')


----------------------------------------------------------------------------------------------------
Labelled Summary:
#Person1# and Mike have a disagreement on how to act out a scene. #Person1# proposes that Mike can try to act in #Person1#'s way.
----------------------------------------------------------------------------------------------------
Model Output - Few Shot top_k - 10 :
#Person1: Jason and Laura will try to make it work for them by talking to each other.

----------------------------------------------------------------------------------------------------
Model Output - Few Shot top_k - 20:
Jason wants to tell someone his feelings but Mike is frustrated with her.

----------------------------------------------------------------------------------------------------
Model Output - Few Shot top_k - 30:
The two guys talked about the problem of Mike and Laura with others.

----------------------------------------------------------------------------------------------------
Model 

#### ***top_p*** - Instead of a fixed number (k), top_p keeps the smallest set of most probable tokens whose cumulative probability reaches top_p.
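
A minimal sketch of this nucleus filtering step, again on a made-up distribution; `top_p_filter` is a hypothetical helper, not a library function.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Hypothetical next-token distribution
probs = {"party": 0.5, "dance": 0.25, "drink": 0.15, "necklace": 0.1}
print(sorted(top_p_filter(probs, 0.5)))  # ['party']
print(sorted(top_p_filter(probs, 0.8)))  # ['dance', 'drink', 'party']
```

Unlike top_k, the number of surviving tokens adapts to the shape of the distribution: a confident distribution keeps few tokens, a flat one keeps many.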

In [24]:
few_shot_prompt = make_inference_prompt([1,5,10,15,25,30,35], 100)
#print(few_shot_prompt)


# Use the prompt above to generate a summary
label_summary = dataset['test'][100]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors='pt').to('cuda')
output_top0_5 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        top_p=0.5
    )[0],
    skip_special_tokens=True
)

output_top0_7 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        top_p=0.7
    )[0],
    skip_special_tokens=True
)

output_top0_8 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        top_p=0.8
    )[0],
    skip_special_tokens=True
)

output_top1 = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample = True,
        top_p=1.0
    )[0],
    skip_special_tokens=True
)

print('-' *100)
print(f'Labelled Summary:\n{label_summary}')
print('-' *100)
print(f'Model Output - Few Shot top_p=0.5  :\n{output_top0_5}\n')

print('-' *100)
print(f'Model Output - Few Shot top_p=0.7 :\n{output_top0_7}\n')

print('-' *100)
print(f'Model Output - Few Shot top_p=0.8 :\n{output_top0_8}\n')

print('-' *100)
print(f'Model Output - Few Shot top_p=1.0 :\n{output_top1}\n')


----------------------------------------------------------------------------------------------------
Labelled Summary:
#Person1# and Mike have a disagreement on how to act out a scene. #Person1# proposes that Mike can try to act in #Person1#'s way.
----------------------------------------------------------------------------------------------------
Model Output - Few Shot top_p=0.5  :
Mike and Laura will try to find a solution to the problem that Jason and Laura have been together for three years.

----------------------------------------------------------------------------------------------------
Model Output - Few Shot top_p=0.7 :
The problem with Jason and Laura is that they are not in a happy or positive mood.

----------------------------------------------------------------------------------------------------
Model Output - Few Shot top_p=0.8 :
A cut may be a solution, but Mike and Laura aren't convinced.

----------------------------------------------------------------------------