# 🚀 Prompt Engineering Kick-start

## About this notebook

This notebook is my first contact with **prompt engineering**.
I found this very comprehensive guide on Kaggle and decided to follow it step by step, writing everything down myself (NOT copy-pasting) to fully understand the code and its outputs.

Throughout the cells I added some notes, such as explanations of code I wanted to understand better and theoretical concepts I had never heard of before and wanted to record in this notebook.


## References:

1. The Kaggle guide I followed is [Mastering Prompt Engineering with Generative AI](https://www.kaggle.com/code/marawanxmamdouh/mastering-prompt-engineering-with-generative-ai)

2. [Everything you need to know about Few-Shot Learning](https://blog.paperspace.com/few-shot-learning/)

3. And the good fellow ChatGPT, from OpenAI.




# 1. Configuring Kernel and Installing Dependencies

Let's set up the kernel and install the necessary packages to leverage PyTorch, Hugging Face transformers, and datasets.

Note: Executing this cell may require a few minutes.




In [1]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0  --quiet

Collecting pip
  Downloading pip-24.0-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 8.1 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-24.0
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 887.5/887.5 MB 2.3 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.6/4.6 MB 69.2 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.1/317.1 MB 5.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.0/21.0 MB 65.6 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 849.3/849.3 kB 37.4 MB/s eta 0:00:00

Next, load the dataset, the Large Language Model (LLM), the tokenizer, and the generation configuration (`GenerationConfig`). Don't worry if you haven't grasped all of these components yet; they are explained and discussed later in the notebook.


# 2. Dialogue Summarization without Prompt Engineering

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

In [3]:
# 1st option:
#huggingface_dataset_name = "knkarthick/dialogsum"
#dataset = load_dataset(huggingface_dataset_name)

# 2nd option:
# Load the dataset using Hugging Face's datasets library
dataset = load_dataset("knkarthick/dialogsum")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading and preparing dataset csv/knkarthick--dialogsum to /root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1...


Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1. Subsequent calls will reuse this data.


In [4]:
example_indices = [40, 200]

dash_line = '-'.join('' for x in range(100))

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Exa

In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
})

In [6]:
# Specify the model name
model_name='google/flan-t5-base'

In [7]:
# Load the model
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

To encode and decode text, the model needs it in tokenized form. Tokenization breaks text into smaller units (tokens, which can be whole words or pieces of words) and maps each one to an integer ID, which is the format LLMs actually process.
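To make "breaking text into smaller units" concrete, here is a toy subword splitter. It is only an illustration under stated assumptions: the vocabulary is invented, and the greedy longest-match rule is closer in spirit to WordPiece than to the SentencePiece model that FLAN-T5 actually uses. The `▁` word-start marker does follow the SentencePiece convention.

```python
# Toy subword tokenizer: greedy longest match against a tiny, made-up vocabulary.
# Illustration only; FLAN-T5 really uses a SentencePiece model with a ~32k vocabulary.
TOY_VOCAB = {"▁this", "▁is", "▁a", "▁test", "▁sent", "ence", "▁token", "ization", "s"}


def toy_subword_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into the longest vocabulary pieces, scanning left to right."""
    pieces = []
    for word in text.lower().split():
        word = "▁" + word  # '▁' marks the start of a word, as in SentencePiece
        start = 0
        while start < len(word):
            # Try the longest remaining substring first, then shrink it
            for end in range(len(word), start, -1):
                if word[start:end] in vocab:
                    pieces.append(word[start:end])
                    start = end
                    break
            else:
                # No vocabulary entry matches: fall back to a single character
                pieces.append(word[start])
                start += 1
    return pieces


print(toy_subword_tokenize("This is a test sentence", TOY_VOCAB))
# ['▁this', '▁is', '▁a', '▁test', '▁sent', 'ence']
print(toy_subword_tokenize("Tokenizations", TOY_VOCAB))
# ['▁token', 'ization', 's']
```

Words missing from the toy vocabulary fall back to single characters, e.g. `toy_subword_tokenize("xyz", TOY_VOCAB)` gives `['▁', 'x', 'y', 'z']`; real tokenizers handle unseen text with their own, more elaborate rules.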

Retrieve the tokenizer for the FLAN-T5 model with the `AutoTokenizer.from_pretrained()` method. The `use_fast` parameter activates the fast tokenizer (backed by the Rust-based Hugging Face Tokenizers library). For now, we won't delve into the intricacies of this setting, but you can explore the tokenizer parameters further in the documentation.


In [8]:
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Test the tokenizer by encoding and decoding a simple sentence.

**Explaining the code below**:

1. Defining a test sentence:

`sentence = "This is a test sentence"`: defines the test sentence, a plain string that we want to encode and decode with the tokenizer.

2. Encoding the sentence:

`sentence_encoded = tokenizer(sentence, return_tensors='pt')`: uses the tokenizer to convert the sentence into a format the model can process. It returns a dictionary-like object (`sentence_encoded`, a `BatchEncoding`) whose `input_ids` entry holds the token IDs; it also contains an `attention_mask`. The `return_tensors='pt'` argument tells the tokenizer to return PyTorch tensors, which is what PyTorch-based models expect as input.

3. Decoding the encoded sentence:

   - `sentence_decoded = tokenizer.decode(sentence_encoded["input_ids"][0], skip_special_tokens=True)`: decodes the token IDs back into text.

      - `sentence_encoded["input_ids"][0]`: `sentence_encoded["input_ids"]` is a 2-D tensor of shape (batch size, sequence length). Here the batch contains a single sentence, so `[0]` selects its row of token IDs.

      - `skip_special_tokens=True`: tells the tokenizer to drop special tokens during decoding. Special tokens are added by the tokenizer rather than coming from the text itself; for T5 they include the end-of-sequence token `</s>` (ID 1, the final `1` in the encoded output below), the padding token `<pad>` and the unknown token `<unk>`. With this argument set to True, they are excluded from the decoded string.

4. `tokenizer.decode(...)`: the tokenizer method that converts token IDs back into human-readable text. The decoded sentence is stored in the variable `sentence_decoded`.


Overall, this code snippet demonstrates the basic process of encoding a sentence using a tokenizer, converting it into a format suitable for model input, and then decoding it back into its original form. This is a common workflow in natural language processing tasks.
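The same round trip can be mimicked without downloading anything. In the toy tokenizer below, the vocabulary and word IDs are invented; only the special-token IDs follow T5's convention (`<pad>` = 0, `</s>` = 1, `<unk>` = 2), which is why the real encoded output in the cell below ends in `1`. The outer list around the IDs mimics the batch dimension that makes `[0]` necessary.

```python
# Toy word-level tokenizer mirroring the encode/decode round trip.
# Word IDs are made up; only the special-token IDs follow T5's convention.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
SPECIAL_IDS = {PAD_ID, EOS_ID, UNK_ID}

vocab = {"<pad>": PAD_ID, "</s>": EOS_ID, "<unk>": UNK_ID,
         "this": 3, "is": 4, "a": 5, "test": 6, "sentence": 7}
id_to_token = {i: t for t, i in vocab.items()}


def encode(text: str) -> dict:
    """Map words to IDs, append </s>, and wrap the IDs in a batch of size 1."""
    ids = [vocab.get(w, UNK_ID) for w in text.lower().split()]
    ids.append(EOS_ID)  # like T5, mark the end of the sequence
    return {"input_ids": [ids]}  # outer list = batch dimension, hence [0] below


def decode(ids: list[int], skip_special_tokens: bool = False) -> str:
    """Map IDs back to words, optionally dropping special tokens."""
    tokens = [id_to_token[i] for i in ids
              if not (skip_special_tokens and i in SPECIAL_IDS)]
    return " ".join(tokens)


encoded = encode("This is a test sentence")
print(encoded["input_ids"][0])                                    # [3, 4, 5, 6, 7, 1]
print(decode(encoded["input_ids"][0]))                            # this is a test sentence </s>
print(decode(encoded["input_ids"][0], skip_special_tokens=True))  # this is a test sentence
```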


---



**NOTE**:
`return_tensors` (str or TensorType, optional): if set, the tokenizer returns tensors instead of lists of Python integers. Acceptable values are:
- `'tf'`: return TensorFlow `tf.constant` objects.
- `'pt'`: return PyTorch `torch.Tensor` objects.
- `'np'`: return NumPy `np.ndarray` objects.

In [10]:
# Define a test sentence
sentence = "This is a test sentence"

# Encode the sentence using the tokenizer, returning PyTorch tensors
sentence_encoded = tokenizer(sentence, return_tensors="pt") #  'pt': Return PyTorch torch.Tensor objects.

# Decode the encoded sentence, skipping the special tokens

sentence_decoded = tokenizer.decode(
          sentence_encoded["input_ids"][0],
          skip_special_tokens=True
)

# Print the encoded sentence's representation
print("ENCODED SENTENCE:")
print(sentence_encoded["input_ids"][0]) # [0] selects the first (and only) sequence in the batch

# Print the decoded sentence
print('\nDECODED SENTENCE:')
print(sentence_decoded)

ENCODED SENTENCE:
tensor([ 100,   19,    3,    9,  794, 7142,    1])

DECODED SENTENCE:
This is a test sentence


Let's dive into assessing how effectively the base LLM summarizes a dialogue without incorporating any prompt engineering. In simpler terms, **prompt engineering** involves humans tweaking the input to enhance the model's response for a specific task.

In [16]:
# Iterate through the example indices; each index identifies one test example

for i, index in enumerate(example_indices):
  # Retrieve the dialogue and summary for the current example
  dialogue = dataset['test'][index]['dialogue']
  summary = dataset['test'][index]['summary']

  # Tokenize the dialogue and return its token IDs as a PyTorch tensor
  inputs = tokenizer(dialogue, return_tensors='pt')

  # Generate an output using the model, limiting the new tokens to 50.
  # This uses the LLM to generate a summary of the dialogue without any prompt engineering
  output = tokenizer.decode(
      model.generate(
          inputs["input_ids"],
          max_new_tokens=50,
      )[0],
      skip_special_tokens=True
  )

  # Show the results
  print(dash_line)
  print('Example ', i + 1)
  print(dash_line)
  print(f'INPUT PROMPT:\n{dialogue}')
  print(dash_line)
  print(f'BASELINE HUMAN SUMMARY:\n{summary}')
  print(dash_line)
  print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

-------------------------------

# 3. Summarizing Dialogue Using an Instruction Prompt

## 3.1 Zero Shot Inference Using an Instruction Prompt

When you want to guide the model to perform a specific task, like summarizing a dialogue, one approach is to wrap the dialogue in an instruction prompt. Because the prompt contains no solved examples of the task, this technique is known as zero-shot inference. For insights into what zero-shot learning is and why it matters for LLMs, you might find this blog from AWS helpful.
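The cell below builds this prompt inline with an f-string. As a minimal sketch, the same idea can be factored into a helper; `build_zero_shot_prompt` is a hypothetical name, and its `instruction` and `suffix` parameters correspond to the two parts of the prompt that the optional exercises later vary. Unlike the f-string in the cell, it adds no leading newline or indentation.

```python
def build_zero_shot_prompt(dialogue: str,
                           instruction: str = "Summarize the following conversation.",
                           suffix: str = "Summary:") -> str:
    """Wrap a dialogue in an instruction; no solved examples are included."""
    parts = [instruction, dialogue]
    if suffix:  # an empty suffix ends the prompt right after the dialogue
        parts.append(suffix)
    return "\n\n".join(parts) + "\n"


dialogue = ("#Person1#: What time is it, Tom?\n"
            "#Person2#: Just a minute. It's ten to nine by my watch.")
print(build_zero_shot_prompt(dialogue))
```

Zero-shot simply means the prompt carries the instruction and the input, nothing else; the model has to infer the expected output format from the instruction alone.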

So here we will wrap the dialogue in a clear instruction and observe how the generated text responds:

In [18]:
# Iterate through example indices, where each index represents a specific example

for i, index in enumerate(example_indices):
  # retrieve dialogue and summary for the current example
  dialogue = dataset["test"][index]['dialogue']
  summary = dataset['test'][index]['summary']

  # construct an instruction prompt for summarizing the dialogue
  prompt = f"""
Summarize the following conversation.

  {dialogue}

Summary:
  """
  # Tokenize the constructed prompt and convert it to PyTorch tensors
  inputs = tokenizer(prompt, return_tensors='pt')

  # Generate an output using the model, limiting the new tokens to 50.
  # This uses the LLM to generate a summary of the dialogue with the constructed prompt
  output = tokenizer.decode(
      model.generate(
          inputs["input_ids"],
          max_new_tokens=50,

      )[0],
      skip_special_tokens=True
  )
  # Show the results
  print(dash_line)
  print('Example ', i + 1)
  print(dash_line)
  print(f'INPUT PROMPT:\n{prompt}')
  print(dash_line)
  print(f'BASELINE HUMAN SUMMARY:\n{summary}')
  print(dash_line)
  print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')






---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation. 

  #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Summary:
  
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
The train is about t

## Optional:

1. Explore variations in the prompt text to observe changes in inferences. Test whether ending the prompt with an empty string versus `Summary:` affects the generated output.

2. Experiment with rephrasing the initial part of the prompt text, "Summarize the following conversation.", to something else, and observe its impact on the generated output.
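Both exercises change one part of the prompt while keeping everything else fixed. A minimal sketch of setting up that comparison systematically: the helper name `build_prompt` and the settings below are hypothetical, the dialogue is shortened from example 1, and each resulting prompt would then go through the same `tokenizer(...)` and `model.generate(...)` calls used in the cells of this section.

```python
from itertools import product

# Hypothetical settings for the two optional exercises
instructions = [
    "Summarize the following conversation.",
    "Provide a summary for the following conversation.",
]
endings = ["Summary:", ""]  # "" means the prompt ends right after the dialogue


def build_prompt(instruction: str, dialogue: str, ending: str) -> str:
    """Instruction, blank line, dialogue, then the optional ending label."""
    prompt = f"{instruction}\n\n{dialogue}\n"
    if ending:
        prompt += f"\n{ending}\n"
    return prompt


dialogue = ("#Person1#: What time is it, Tom?\n"
            "#Person2#: Just a minute. It's ten to nine by my watch.")
variants = {
    (instruction, ending): build_prompt(instruction, dialogue, ending)
    for instruction, ending in product(instructions, endings)
}

print(len(variants))  # 4 prompts: 2 instructions x 2 endings
for (instruction, ending), prompt in variants.items():
    print(f"--- instruction={instruction!r}, ending={ending!r}")
    print(prompt)
```

Changing one factor at a time means that any difference in the generated summary can be attributed to that factor rather than to some other edit of the prompt.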

### Working through the optional exercises:

1. Explore variations in the prompt text to observe changes in inferences. Test whether ending the prompt with an empty string versus `Summary:` affects the generated output:


In [21]:
# Iterate through example indices, where each index represents a specific example
for i, index in enumerate(example_indices):
    # retrieve dialogue and summary for the current example
    dialogue = dataset["test"][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # Construct an instruction prompt for summarizing the dialogue
    # Prompt ends with an empty string
    prompt_empty = f"""
    Summarize the following conversation.

    {dialogue}

    """

    # Construct an instruction prompt for summarizing the dialogue
    # Prompt ends with the human reference summary itself (not only the "Summary:" label)
    prompt_with_summary = f"""
    Summarize the following conversation.

    {dialogue}

    Summary: {summary}
    """

    # Tokenize the constructed prompts and convert them to PyTorch tensors
    inputs_empty = tokenizer(prompt_empty, return_tensors='pt')
    inputs_with_summary = tokenizer(prompt_with_summary, return_tensors='pt')

    # Generate an output using the model, limiting the new tokens to 50
    # This uses the LLM to generate a summary of the dialogue with the constructed prompts
    output_empty = tokenizer.decode(
        model.generate(
            inputs_empty["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    output_with_summary = tokenizer.decode(
        model.generate(
            inputs_with_summary["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    # Show the results for prompt ending with an empty string
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT (Empty String):\n{prompt_empty}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - Empty String Prompt:\n{output_empty}\n')

    # Show the results for prompt ending with the summary
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT (With Summary):\n{prompt_with_summary}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - Prompt With Summary:\n{output_with_summary}\n')


---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT (Empty String):

    Summarize the following conversation. 

    #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - Empty String Prom

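**Conclusion**: when the reference summary (`{summary}`) was included in the prompt, the model got confused and the output was very inaccurate. Note that this variant appends the human-written summary itself, not just the `Summary:` label used in section 3.1, so it differs from the "ending with **Summary**" case that the exercise describes.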



---



2. Experiment with rephrasing the initial part of the prompt text, "Summarize the following conversation.", to something else, and observe its impact on the generated output.

In [23]:
# Iterate through example indices, where each index represents a specific example
for i, index in enumerate(example_indices):
    # retrieve dialogue and summary for the current example
    dialogue = dataset["test"][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # Rephrase the initial part of the prompt text
    # Example: "Provide a summary for the following conversation."
    rephrased_prompt = f"""
    Provide a summary for the following conversation.

    {dialogue}

    Summary:
    """

    # Tokenize the constructed prompt and convert it to PyTorch tensors
    inputs = tokenizer(rephrased_prompt, return_tensors='pt')

    # Generate an output using the model, limiting the new tokens to 50
    # This uses the LLM to generate a summary of the dialogue with the rephrased prompt
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    # Show the results
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT (Rephrased):\n{rephrased_prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - Rephrased Prompt:\n{output}\n')


---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT (Rephrased):

    Provide a summary for the following conversation. 

    #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

    Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATI

**Conclusion**: rephrasing the initial part of the prompt text did not change the output for these examples.



---



## 3.2 Zero Shot Inference Using the FLAN-T5 Prompt Template


In [27]:
# Iterate through example indices, where each index represents a specific example:

for i, index in enumerate(example_indices):
  # Retrieve the dialogue and summary for the current example
  dialogue = dataset['test'][index]['dialogue']
  summary = dataset['test'][index]['summary']

  # Construct a prompt for summarizing the dialogue using the FLAN-T5 template:

  prompt = f"""
  Dialogue:

  {dialogue}

  What was going on?
  """

  # Tokenize the constructed prompt and convert it to PyTorch tensors:
  inputs = tokenizer(prompt, return_tensors='pt')

  # Generate an output using the model, limiting the new tokens to 50
  # This uses the LLM to generate a summary of the dialogue with the constructed prompt
  output = tokenizer.decode(
      model.generate(
          inputs['input_ids'],
          max_new_tokens = 50,
      )[0],
      skip_special_tokens=True
  )

  # Show the results
  print(dash_line)
  print('Example ', i + 1)
  print(dash_line)
  print(f'INPUT PROMPT:\n{prompt}')
  print(dash_line)
  print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
  print(dash_line)
  print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')



---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

  Dialogue: 

  #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

  What was going on?
  
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
Tom is late for the train.

-----

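Notice that this FLAN-T5 prompt template helped a bit: the model now recognizes that the dialogue is about catching a train. It still misses the nuance of the conversation, though: it is #Person1# who is in a hurry, while Tom says there is plenty of time. This is what few-shot inference will try to address.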

# 4. Summarizing Dialogue Using One Shot and Few Shot Inference

In one-shot and few-shot inference, the prompt contains either a single complete prompt-response example or a handful of them, matching your task, placed before the input you want the model to complete. This practice, known as "in-context learning", conditions the model on the specifics of your task (here, the format and style of the summaries) without updating its weights. You can delve deeper into this concept by reading this blog from HuggingFace.

## 4.1  One Shot Inference

One-shot inference is the special case of few-shot inference in which the prompt contains exactly one complete example (one "shot") of the task. Nothing is trained: the model's weights stay the same, and the example only serves as context for the request that follows it.

The term comes from one-shot learning. Imagine you're teaching a model to recognize fruits. In a one-shot learning scenario, you would provide the model with just one picture of each type of fruit, such as one apple, one banana, one orange, etc. The model then has to recognize these fruits based on just these single examples.

One-shot inference is challenging because the model has very little task-specific information to go on. However, it is useful when collecting many examples is difficult or expensive. It also shows how well a model can generalize from very little information, which matters in real-world applications where new tasks or categories may come with only a few examples.



---

We'll construct a function, `make_prompt`, that accepts a list of example indices (`full_examples_indices`), builds a prompt containing those complete examples, and finally appends the dialogue you want the model to summarize (`index_to_summarize`). For this, we'll use the same FLAN-T5 prompt template from section 3.2.



In [35]:
def make_prompt(full_examples_indices, index_to_summarize):
    """
    Construct a prompt for one-shot or few-shot inference.

    Parameters
    ----------
    full_examples_indices : list
        A list containing indices for complete dialogues to be included in the prompt. These dialogues serve as examples
        for the model to learn from (for one-shot or few-shot inference).
    index_to_summarize : int
        The index for the dialogue that the model is expected to give a summary for.

    Returns
    -------
    str
        A prompt string that is constructed as per the given parameters - full dialogues examples followed by a dialogue
        that needs to be summarized.
    """
    prompt = ''

    # Go through each index in the full examples list
    for index in full_examples_indices:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']

        # Add each dialogue and its summary to the prompt string, followed by a stop sequence. The stop sequence
        # '{summary}\n\n\n' is essential for the FLAN-T5 model. Other models may have their own stop sequences.
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""

    # Now add the dialogue that needs to be summarized by the model
    dialogue_to_summarize = dataset['test'][index_to_summarize]['dialogue']

    # Append this new dialogue to the prompt string
    prompt += f"""
Dialogue:

{dialogue_to_summarize}

What was going on?
"""

    # Return the constructed prompt
    return prompt

In [36]:
# Create the prompt for one-shot inference:

# Define index for full example to be included in the prompt as a one-shot example
full_examples_indices = [40]
# Define the index for the dialogue that the model is expected to give a summary for
example_index_to_summarize = 200

# Create the prompt for one-shot inference
one_shot_prompt = make_prompt(full_examples_indices, example_index_to_summarize)

print(one_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also ne

In [37]:
# Now, let's use this prompt for one-shot inference and observe the results (Generate a summary using the LLM with the prompt you just created):


# Retrieve the human-generated summary for the 'example_index_to_summarize' example
summary = dataset['test'][example_index_to_summarize]['summary']

# Tokenize the one-shot prompt and convert it to PyTorch tensors
inputs = tokenizer(one_shot_prompt, return_tensors='pt')

# Generate an output using the model, limiting the new tokens to 50
# This uses the LLM to generate a summary of the dialogue with the one-shot prompt
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

# Show the results
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


## 4.2 Few Shot Inference

Few-shot inference is a technique used with machine learning models such as language models. In simple terms, it means getting a model to perform a task when it is given only a few examples (or "shots") of that task. With LLMs, as in this notebook, those examples are placed directly in the prompt rather than used for training.

Think of it like this: Imagine you're learning to recognize different types of dogs. With traditional learning methods, you might need to see hundreds of pictures of different dog breeds to recognize them accurately. However, with few-shot learning, you might only need to see a handful of pictures of each breed to understand what they look like. This is like teaching a model with just a few examples or shots of each type of dog.

In machine learning, few-shot approaches are often used when there isn't much data available for training a model, or when a model needs to adapt quickly to new tasks or scenarios with limited examples. Instead of needing large numbers of examples to learn from, the model can often generalize from just a few and still produce relevant outputs.

In [38]:
# Define indices for full examples to be included in the prompt as few-shot examples
full_examples_indices = [40, 80, 120]
# Define the index for the dialogue that the model is expected to give a summary for
example_index_to_summarize = 200

# Create the prompt for few-shot inference
few_shot_prompt = make_prompt(full_examples_indices, example_index_to_summarize)

print(few_shot_prompt)


Dialogue:

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.



Dialogue:

#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. 

Now pass this prompt to the model to perform few-shot inference:

In [39]:
# Retrieve the human-generated summary for the specified example
summary = dataset['test'][example_index_to_summarize]['summary']

# Tokenize the few-shot prompt and convert it to PyTorch tensors
inputs = tokenizer(few_shot_prompt, return_tensors='pt')

# Generate an output using the model, limiting the new tokens to 50
# This uses the LLM to generate a summary of the dialogue with the few-shot prompt
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

# Show the results
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

Token indices sequence length is longer than the specified maximum sequence length for this model (819 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.


"In this scenario, using few-shot inference didn't yield a significant improvement over one-shot inference. Moreover, going beyond 5 or 6 shots generally doesn't offer much help either. It's crucial to be mindful of not exceeding the model's input-context length, which, in our case, is 512 tokens. Any content beyond this context length will be disregarded.

However, it's noticeable that including at least one full example (one shot) furnishes the model with additional information, resulting in a qualitative enhancement in the overall summary.

Feel free to experiment with few-shot inference:

Select different dialogues by modifying the indices in the example_indices_full list and the example_index_to_summarize value. Adjust the number of shots, ensuring it remains within the model's 512 context lengths for fair comparison.

Observe how well few-shot inference performs with other examples."

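Reference: https://www.kaggle.com/code/marawanxmamdouh/mastering-prompt-engineering-with-generative-ai

Note: in this notebook the list of example indices is called `full_examples_indices`, and the prompt was not actually cut at 512 tokens. The few-shot prompt above is 819 tokens long (see the warning), and since it was tokenized without `truncation=True`, all 819 tokens were passed to the model, which still generated a summary. The 512-token figure is the maximum sequence length configured for this model's tokenizer, so staying below it, as the quote suggests, keeps the comparison fair.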
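The advice above is to keep the prompt within the model's 512-token context. One way to automate that is to add examples one at a time and stop before the prompt overflows. Below is a minimal sketch; `fit_examples_to_budget` is a hypothetical helper that counts tokens through a callable you pass in. In this notebook that callable could be `lambda text: len(tokenizer(text)["input_ids"])`; here a whitespace word count stands in so the sketch runs without downloading the model, and the example blocks are synthetic.

```python
from typing import Callable, List


def fit_examples_to_budget(example_blocks: List[str],
                           query_block: str,
                           count_tokens: Callable[[str], int],
                           max_tokens: int = 512) -> List[str]:
    """Return the longest prefix of example_blocks that keeps the full prompt
    (examples followed by the query) within max_tokens."""
    kept: List[str] = []
    for block in example_blocks:
        candidate = "".join(kept + [block, query_block])
        if count_tokens(candidate) > max_tokens:
            break  # adding this example would overflow the budget
        kept.append(block)
    return kept


def word_count(text: str) -> int:
    """Stand-in token counter; swap in the real tokenizer in the notebook."""
    return len(text.split())


# Synthetic data: five solved examples of 36 "tokens" each and one 35-token query
example_blocks = [f"Dialogue:\n\n{'word ' * 30}\n\nWhat was going on?\nsummary\n\n\n"
                  for _ in range(5)]
query_block = f"Dialogue:\n\n{'word ' * 30}\n\nWhat was going on?\n"

kept = fit_examples_to_budget(example_blocks, query_block, word_count, max_tokens=120)
prompt = "".join(kept + [query_block])
print(len(kept), word_count(prompt))  # 2 107
```

Keeping a prefix preserves the order in which you chose the examples; the trade-off is that it stops at the first example that does not fit, even if a shorter later one would.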

# 5. Conclusion

This was an interesting guide to follow. It gave me a very good understanding of how prompt engineering works. I had worked with NLP and tokenization before, but had never seen them applied to prompt engineering.

I recommend it to anyone who is interested in this subject.
