<a href="https://colab.research.google.com/github/parthivz/Fundamentals-of-GenAI-Course-Lab/blob/main/12_Prompt_Engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generative AI Prompt Engineering

In this lab we will use a famous Encoder-Decoder LLM: Flan-T5. You will first do simple tasks to get your hands dirty.

Then you will learn about few shot prompting, and see how at a certain point the LLM just cannot do the task.

You will finish by testing the different possible configurations.

## Install Required Dependencies

Now install the required packages to use Hugging Face transformers and datasets.

In [1]:
!pip install --upgrade pip
!pip install \
    transformers==4.35.2 \
    datasets==2.15.0  --quiet

Collecting pip
  Downloading pip-25.0.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-25.0.1-py3-none-any.whl (1.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m19.5 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-25.0.1
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.9/7.9 MB[0m [31m44.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.6/3.6 MB[0m [31m49.1 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2025.3.0 requires fsspec==2025.3.0, but you have fsspec 2023.10.0 which is incompatible.
sentence-transformers 3.4.

Load the datasets, Large Language Model (LLM), tokenizer, and configurator. Do not worry if you do not understand yet all of those components - they will be described and discussed later in the notebook.

In [3]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

## Doing Simple Tasks with Flan-T5

In this case we wil do simple sentiment analysis so you get the gist of how to use these LLMs. You will use the pre-trained Large Language Model (LLM) FLAN-T5 from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index)

In [4]:
huggingface_dataset_name = "imdb"

dataset = load_dataset(huggingface_dataset_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading readme:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/3 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/21.0M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/20.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/42.0M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/3 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Let's use from the train set, but it is the same for us now

In [6]:
import numpy as np
def get_random_review_and_label():
  random_index = np.random.randint(1, 25000)
  random_review = None # get random review
  label = None  # get label of that review
  return random_review, label

random_review, label = get_random_review_and_label()

dash_line = '-'.join('' for x in range(100))

print(f'Review: \n\n{random_review}')
print(dash_line)
print(f'Label: {label}')

Review: 

None
---------------------------------------------------------------------------------------------------
Label: None


Let's now use the model! For that we need to use the Tokenizer to transform the text into the "model language" (more on this during the course). Also we need to download the model.

In [7]:
model_name= 'google/flan-t5-base' # load google's flan-t5
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")   # Load the model
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small") # Load the tokenizer



config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(


model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

In [8]:
sentence = random_review[:50]
print(f'Review trimmed: {sentence}')

sentence_encoded = tokenizer(sentence)

sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

TypeError: 'NoneType' object is not subscriptable

Now let's call the model. As this is a AutoModelForSeq2SeqLM this means that is a LLM for seq2seq tasks, like summarizing or text generation, so let's put our prompt that way.

In [9]:
review, label = get_random_review_and_label()

prompt = f"""
Analyze the sentiment of the following review:

{review}

Sentiment:

"""

input = tokenizer(prompt)

In [10]:
import torch
model.generate(torch.tensor([input['input_ids']]), max_new_tokens=50)

tensor([[   0, 2841,    1]])

In [11]:
tokenizer.decode(
        model.generate(torch.tensor([input['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

'negative'

And what was the real sentiment? Remember in this dataset `0` is negative and `1` is positive

In [12]:
label

## Summarize News without Prompt Engineering

In this use case, you will be generating a summary of news with Flan-T5.

Let's upload some simple dialogues from the dialogsum Hugging Face dataset. This dataset contains 10,000+ articles with the corresponding manually labeled summaries.

In [13]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

Downloading readme:   0%|          | 0.00/4.65k [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/3 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/11.3M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/442k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.35M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/3 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

  return pd.read_csv(xopen(filepath_or_buffer, "rb", download_config=download_config), **kwargs)


Generating validation split: 0 examples [00:00, ? examples/s]

  return pd.read_csv(xopen(filepath_or_buffer, "rb", download_config=download_config), **kwargs)


Generating test split: 0 examples [00:00, ? examples/s]

  return pd.read_csv(xopen(filepath_or_buffer, "rb", download_config=download_config), **kwargs)


In [14]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Print a couple of dialogues with their baseline summaries.

In [16]:
def get_random_dialogue_and_summary():
    # Select the dataset split (train, validation, or test)
    split = "train"  # Change to "validation" or "test" if needed

    # Get the number of rows in the selected dataset split
    num_rows = len(dataset[split])

    # Pick a random index
    random_index = np.random.randint(0, num_rows)

    # Retrieve a random dialogue and its summary
    random_dialogue = dataset[split][random_index]["dialogue"]
    summary = dataset[split][random_index]["summary"]

    return random_dialogue, summary

# Get a random dialogue and summary
random_dialogue, summary = get_random_dialogue_and_summary()

# Print results
dash_line = '-' * 100
print(f'Dialogue: \n\n{random_dialogue}')
print(dash_line)
print(f'Summary: \n\n{summary}')


Dialogue: 

#Person1#: I am really thirsty.
#Person2#: How about we go and get something to drink?
#Person1#: Let's do that.
#Person2#: Do you know what you want to get?
#Person1#: A soda sounds good.
#Person2#: Soda isn't the best thing to drink when you're thirsty.
#Person1#: Why is that?
#Person2#: Soda isn't good for you.
#Person1#: What should I drink then?
#Person2#: You should really drink water.
#Person1#: That sounds good.
#Person2#: It's a lot better than soda.
----------------------------------------------------------------------------------------------------
Summary: 

#Person1#'s thirsty and wants some soda. #Person2# thinks soda isn't good and suggests drinking water.


Test the tokenizer encoding and decoding a simple sentence:

Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. **Prompt engineering** is an act of a human changing the **prompt** (input) to improve the response for a given task.

In [17]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    inputs = None # Tokenize the dialogue
    output = None # Get the response from the model

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Without prompt engineering:\n{output}\n')

----------------------------------------------------------------------------------------------------
Example  1
----------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: You have to finish your dinner before you leave the table.
#Person2#: But, Mom, I can't eat anymore. I'm stuffed.
#Person1#: What's the matter? It doesn't taste good?
#Person2#: It's not that I don't like your cooking, but I ate some cookies before we had dinner.
----------------------------------------------------------------------------------------------------
Summary:
#Person2# tells #Person1# #Person2# can't eat anymore because #Person2# ate some cookies before dinner.
----------------------------------------------------------------------------------------------------
Model Summary - Without prompt engineering:
None

----------------------------------------------------------------------------------------------------
Example  2
-------------------------

You can see that the guesses of the model make some sense, but it doesn't seem to be sure what task it is supposed to accomplish. Seems it just makes up the next sentence in the dialogue. Prompt engineering can help here.

## Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation.

<a name='3.1'></a>
### 3.1 - Zero Shot Inference with an Instruction Prompt

In order to instruct the model to perform a task - summarize a dialogue - you can take the dialogue and convert it into an instruction prompt. This is often called **zero shot inference**.  
Wrap the dialogue in a descriptive instruction and see how the generated text will change:

In [18]:
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """
    inputs = None # Tokenize the dialogue
    output = None # Get the response from the model

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Zero shot inference prompt engineering:\n{output}\n')


----------------------------------------------------------------------------------------------------
Example  1
----------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: How are you doing?
#Person2#: Everything's fine with me.
#Person1#: What can I do for you today?
#Person2#: Is it possible for me to view the apartment today?
#Person1#: Unfortunately, you will not be able to view it today.
#Person2#: Why can't I view it today?
#Person1#: You'll need to make an appointment to view the apartment.
#Person2#: I understand. May I make an appointment then?
#Person1#: How does this Friday sound?
#Person2#: Friday at 6 pm. would be perfect.
#Person1#: That will be fine.
#Person2#: Thanks for your help.
----------------------------------------------------------------------------------------------------
Summary:
#Person2# wants to view the apartment. #Person1# helps #Person2# to make an appointment on Friday at 6 pm.
--------------

This is much better! But the model still does not pick up on the nuance of the conversations though.

## Summarize Dialogue with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of providing an LLM with either one or more full examples of prompt-response pairs that match your task - before your actual prompt that you want completed. This is called "in-context learning" and puts your model into a state that understands your specific task.  

## One Shot Inference



In [21]:
def make_prompt_and_return_real_summary(number_of_shots):
    prompt = ""

    for _ in range(number_of_shots):
        dialogue, summary = get_random_dialogue_and_summary()

        # Properly formatting the few-shot example
        prompt += f"""
Dialogue:

{dialogue}

Summary:

{summary}


"""

    # Final dialogue without summary for model inference
    dialogue_to_analyze, real_summary = get_random_dialogue_and_summary()

    prompt += f"""
Dialogue:

{dialogue_to_analyze}

Summary:
"""

    return prompt, real_summary


Construct the prompt to perform one shot inference:

In [22]:
one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)

print(one_shot_prompt)


Dialogue:

#Person1#: Sir, may I please see your license and registration? Do you know how fast you were going?
#Person2#: No, I'm not sure. I think about 65 mph, right?
#Person1#: You're not sure? You were going at 90 miles per hour! That's 25 mph over the legal speed limit! Have you been drinking?
#Person2#: No, Officer, not at all.
#Person1#: Then how can you explain your behavior?
#Person2#: Well, I guess I just wasn't paying attention to the speedometer.
#Person1#: Not paying attention to the speedometer? Why not?
#Person2#: Um, because I was busy talking to my friend.
#Person1#: On a cell phone?
#Person2#: Yes, I was using a cell phone. I just bought it, so I decided to give my friend a phone call to tell him about it. While I did that I also turned on the radio and was listening to one of my favorite songs, and eating some food I had bought at a fast food restaurant, and, um. . . guess I had too many distractions.
#Person1#: That's definitely true. I'm going to have to give you

Now pass this prompt to perform the one shot inference:

In [23]:
for i in range (3):
  one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)
  inputs = tokenizer(one_shot_prompt)
  output = tokenizer.decode(
      model.generate(torch.tensor([inputs['input_ids']]), max_new_tokens=50)[0],
      skip_special_tokens=True
  )

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{one_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - One shot inference prompt engineering:\n{output}\n')

----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: Tom has grown six inches within a year. 
#Person2#: He has reached puberty. His mind and body both will change a lot. 
#Person1#: Yeah, do you see his Adam's apple? It becomes bigger. 
#Person2#: Time is flying. I still remember everything when he was a child. 

Summary:

#Person1# and #Person2# talk about Tom's change over time.



Dialogue:

#Person1#: May I see your license?
#Person2#: But officer, did I do something wrong?
#Person1#: Did you see the speed limit sign. It says thirty five miles an hour here.
#Person2#: But my speed meter reads only thirty miles.
#Person1#: Then why did my radar show you're going forty five?

Summary:

----------------------------------------------------------------------------------------------------
Summary:


### Few Shot Inference

Let's explore few shot inference by adding two more full dialogue-summary pairs to your prompt.

In [24]:
for i in range (3):
  few_shot_prompt, real_summary = make_prompt_and_return_real_summary(5)
  inputs = tokenizer(few_shot_prompt)
  output = tokenizer.decode(
      model.generate(torch.tensor([inputs['input_ids']]), max_new_tokens=50)[0],
      skip_special_tokens=True
  )

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{few_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - Few shot inference prompt engineering:\n{output}\n')

Token indices sequence length is longer than the specified maximum sequence length for this model (1939 > 512). Running this sequence through the model will result in indexing errors


----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: Good afternoon. what can I do for you?
#Person2#: I want to pick up my valuables.
#Person1#: May I have your key please?
#Person2#: Sure. Here you are.
#Person1#: Here is your valuable. Is that right?
#Person2#: Yes, thank you.

Summary:

#Person1# helps #Person2# to pick up #Person2#'s valuables.



Dialogue:

#Person1#: What can I do for you, Sir?
#Person2#: I'm Tom in room 508, and I want a wake-up call tomorrow morning.
#Person1#: At what time?
#Person2#: 6:15 am, please.
#Person1#: No problem, we have a computer wake-up service. Please dial 2 first and then the time. That is to say, dial 2 and then 0615.
#Person2#: I see. I should dial all the numbers 20615 in turn. Thank you. By the way, if I want to change my wake up time, what shall I do

In this case, few shot did not provide much of an improvement over one shot inference.  And, anything above 5 or 6 shot will typically not help much, either.  Also, you need to make sure that you do not exceed the model's input-context length which, in our case, if 512 tokens.  Anything above the context length will be ignored.

However, you can see that feeding in at least one full example (one shot) provides the model with more information and qualitatively improves the summary overall.

## Configuration Parameters

In [25]:
generation_config = None # Create a very creative generative config

inputs = tokenizer(few_shot_prompt)
output = tokenizer.decode(
    model.generate(torch.tensor([inputs['input_ids']]), generation_config=generation_config)[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{real_summary}\n')



----------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
The Opera House is playing a musical at 17:25.
----------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person2# drives #Person1# to the Opera House at 17:25 before a musical starts at 6:00.



Comments related to the choice of the parameters in the code cell above:
- Choosing `max_new_tokens=10` will make the output text too short, so the dialogue summary will be cut.
- Putting `do_sample = True` and changing the temperature value you get more flexibility in the output.

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations.