# **Prompt Engineering**

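This Jupyter Notebook explores various techniques for prompt engineering. Prompt engineering is a cost-effective way to adapt large language models (LLMs) to specific tasks. Instead of updating the model's weights through fine-tuning, we change only the input text (the prompt).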

**Techniques Covered:**
1. Zero-shot and template
2. Few-shot learning
3. System prompt and template
4. Chain-of-Thought
5. Self-consistency sampling

By exploring these prompt engineering techniques, we can enhance the capabilities of language models and tailor their output to specific tasks or contexts.

In [None]:
from pathlib import Path
import re
import os
import datasets
from collections import Counter
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM
import torch
from tqdm.notebook import tqdm
import random
from utils import seed_everything

DSDIR = Path(os.environ['DSDIR'])
os.environ['TOKENIZERS_PARALLELISM'] = 'false'
seed_everything(53)

In this notebook, we will once again utilize the Phi-2 model, which has proven to be highly effective in our previous experiments.

In [None]:
# Initialize the model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    DSDIR / "HuggingFace_Models/microsoft/phi-2",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Allow using code that was not written by HuggingFace
    attn_implementation="flash_attention_2"  # Optimize the model with Flash Attention
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(DSDIR / "HuggingFace_Models/microsoft/phi-2")

To generate text from the model, we will utilize the same function that was used in the first hands-on.

In [None]:
def generation(prompt, **gen_parameters):
    """Generate text from a prompt and print it."""
    model_inp = tokenizer(prompt, return_tensors="pt").to("cuda")
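    # generate() runs successive forward passes, predicting one token at a time
    # and feeding it back as input (auto-regressive decoding)
    out = model.generate(
        input_ids=model_inp["input_ids"],
        attention_mask=model_inp["attention_mask"],  # mark which tokens are real input
        **gen_parameters,
    )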
    print(tokenizer.decode(out[0]))

In [None]:
generation("What is a supercomputer ?", do_sample=False, max_new_tokens=20)

## **Zero-shot and template**
In zero-shot prompting, the model receives an instruction with no solved examples. Wrapping that instruction in a predefined template, which is a fixed structure such as `Question: ... Answer:`, guides the generation toward a specific format or style. This makes the technique simple to apply and adaptable to many tasks and contexts.
![image](./images/template.jpg)
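A zero-shot template is a fixed string with a slot for the instruction. The sketch below is a minimal illustration that assumes a `Question: ... Answer:` layout like the one used later in this notebook. `apply_template` and `QA_TEMPLATE` are hypothetical names, not part of any library.

```python
# Hypothetical template: mirrors the "Question: ... / Answer:" layout used
# later in this notebook. Any fixed structure with a slot works the same way.
QA_TEMPLATE = "Question: {instruction}\nAnswer:"


def apply_template(instruction: str, template: str = QA_TEMPLATE) -> str:
    """Insert a zero-shot instruction into a fixed prompt template."""
    return template.format(instruction=instruction.strip())


example_prompt = apply_template("Write a haiku about winter.")
print(example_prompt)
```

The resulting string can be passed to `generation()` like any other prompt. Ending the template with `Answer:` leaves the model positioned to continue with the answer itself.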

In [None]:
prompt = "Write a haiku about winter."
generation(prompt, do_sample=False, max_new_tokens=30)

<hr style="border:1px solid red"> 

<span style="color:red">**Task**:</span> Your task is to generate a haiku about winter using the language model. You are not allowed to change the generation parameters of the model. You can either use a predefined template or provide specific instructions to the model. Be creative and capture the essence of winter in your haiku.

In [None]:
############ Complete or modify here ############
prompt = ""
#################################################

**Solution:**

**Test it here:**

In [None]:
generation(prompt, do_sample=False, max_new_tokens=30)

<hr style="border:1px solid red"> 

## **Few-shot learning**
Few-shot learning, also called in-context learning, improves the model's performance on a task by placing one or several solved examples directly in the prompt. The model infers the expected pattern from these examples and generates a response that follows it, without any update to its weights. This approach is particularly useful when only a handful of labeled examples are available, making it a cost-effective alternative to fine-tuning.
![image](./images/few_shot.jpg)
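A few-shot prompt is a sequence of solved input/output pairs followed by the unsolved query. The sketch below assumes a `Question:`/`Answer:` format. `build_few_shot_prompt` is a hypothetical helper, and the example pairs are placeholders unrelated to the exercises.

```python
def build_few_shot_prompt(examples, query, input_prefix="Question: ", output_prefix="Answer: "):
    """Concatenate solved (input, output) pairs, then the unsolved query."""
    lines = []
    for inp, out in examples:
        lines.append(f"{input_prefix}{inp}")
        lines.append(f"{output_prefix}{out}")
    # The query ends with a bare output prefix so the model continues from there.
    lines.append(f"{input_prefix}{query}")
    lines.append(output_prefix.rstrip())
    return "\n".join(lines)


example_prompt = build_few_shot_prompt(
    examples=[("Give me 2 colors.", "Red, Blue"), ("Give me 2 fruits.", "Apple, Pear")],
    query="Give me 2 animals.",
)
print(example_prompt)
```

Keeping every example in exactly the same format matters. The model copies the structure of the answers, including separators and length, as well as their content.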

In [None]:
prompt = "Give me 3 Chinese names."
generation(prompt, do_sample=False, max_new_tokens=20)

<hr style="border:1px solid red"> 

<span style="color:red">**Task**:</span> Your task is to generate three Chinese names using the language model. You are not allowed to change the generation parameters of the model. To accomplish this, you will utilize the technique of few-shot learning. Try to keep the same format as the example in the previous cell.

In [None]:
############ Complete or modify here ############
prompt = """
Give me 3 Chinese names."""
#################################################

**Solution:**

**Test it here:**

In [None]:
generation(prompt, do_sample=False, max_new_tokens=20)

<hr style="border:1px solid red"> 

Let's try few-shot learning with another example.

In [None]:
prompt = """Pierre and Nathan fight while Hatim reads a manga next to them. Thomas carries a chair.
List all the objects in the story."""
generation(prompt, do_sample=False, max_new_tokens=50)

<hr style="border:1px solid red"> 

<span style="color:red">**Task**:</span> Your task is to make the language model correctly list the objects in the story. You are not allowed to change the generation parameters of the model. To accomplish this, you will utilize the technique of few-shot learning. Try to keep the same format as the example in the previous cell.

In [None]:
############ Complete or modify here ############
prompt = """
Pierre and Nathan fight while Hatim reads a manga next to them. Thomas carries a chair.
List all the objects in the story."""
#################################################

**Solution:**

**Test it here:**

In [None]:
generation(prompt, do_sample=False, max_new_tokens=6)

<hr style="border:1px solid red">

## **System prompt and templates**
A system prompt sets the context or scenario, for example a persona to adopt. A template then marks who is speaking in each turn. Together, they steer the language model toward responses that fit the given context, which gives more controlled and targeted generation for specific tasks.
![image](./images/system_prompt.jpg)
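The system prompt and the role markers can be kept separate, so that only the markers change between attempts. In this minimal sketch, `render_chat` and `DEFAULT_TAGS` are hypothetical names. The default markers are simply copied from the cell below, and we assume nothing about whether Phi-2 saw them during training.

```python
# Hypothetical role markers, copied from the prompt in the next cell.
DEFAULT_TAGS = {"system": "<|system|>", "user": "<|user|>", "assistant": "<|assistant|>"}


def render_chat(system, turns, tags=DEFAULT_TAGS):
    """Render a system prompt and (role, text) turns, then open an assistant turn."""
    parts = [f"{tags['system']}{system}"]
    for role, text in turns:
        parts.append(f"{tags[role]}{text}")
    parts.append(tags["assistant"])  # the model continues generating from here
    return "\n".join(parts)


example_prompt = render_chat("You are a helpful assistant.", [("user", "Hello!")])
print(example_prompt)
```

Because the markers are a parameter, you can try a different template by passing another `tags` dictionary without touching the system prompt or the user input.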

We will build a role-play assistant, as in the first hands-on, but this time using prompt engineering alone.

In [None]:
prompt = """<|system|>Orphaned at age three, when he witnessed his mother's brutal murder, Dexter was adopted by Miami police officer Harry Morgan. Recognizing the boy's trauma and the subsequent development of his sociopathic tendencies, Harry trained Dexter to channel his gruesome bloodlust into vigilantism, killing only heinous criminals who slip through the criminal justice system.
<|user|>How do you approach a new case, Dexter?
<|assistant|>"""
generation(prompt, do_sample=False, max_new_tokens=50)  # greedy decoding: temperature would have no effect

<hr style="border:1px solid red"> 

<span style="color:red">**Task**:</span> Your task is to modify the template (the text and markers surrounding the system prompt and the user input) so that the language model acts like the character described. Change only the template, not the system prompt or the user input. As before, you are not allowed to modify the generation parameters of the model.

In [None]:
############ Complete or modify here ############
prompt = """<|system|>Orphaned at age three, when he witnessed his mother's brutal murder, Dexter was adopted by Miami police officer Harry Morgan. Recognizing the boy's trauma and the subsequent development of his sociopathic tendencies, Harry trained Dexter to channel his gruesome bloodlust into vigilantism, killing only heinous criminals who slip through the criminal justice system.
<|user|>How do you approach a new case, Dexter?
<|assistant|>"""
#################################################

**Solution:**

**Test it here:**

In [None]:
generation(prompt, do_sample=False, max_new_tokens=100)

<hr style="border:1px solid red">

## **Chain-of-Thought**

The idea behind Chain-of-Thought is to have the language model write out intermediate reasoning steps before giving the final answer. Because each step is conditioned on the previous ones, the model is less likely to jump straight to a wrong answer. This improves accuracy on tasks that require multi-step logical reasoning, such as arithmetic, at the cost of generating more tokens.
![image](./images/chain_of_thought.jpg)
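A Chain-of-Thought exemplar places the reasoning steps between `Answer:` and the final result. The sketch below formats such an exemplar for a different expression than the exercise, and it also parses the final number back out of a generation. `cot_exemplar`, `extract_final_answer` and the phrase "The answer is" are hypothetical conventions chosen for this illustration.

```python
import re


def cot_exemplar(question, steps, answer):
    """Format one worked example: reasoning steps first, then the final answer."""
    reasoning = " ".join(steps)
    return f"Question: {question}\nAnswer: {reasoning} The answer is {answer}."


def extract_final_answer(generated):
    """Return the integer after the first 'The answer is', or None if absent.

    Apply this to the generated continuation only (prompt removed); otherwise
    the exemplar's own answer would be found first.
    """
    match = re.search(r"The answer is (-?\d+)", generated)
    return int(match.group(1)) if match else None


exemplar = cot_exemplar("3+4*2", ["Multiplication first: 4*2 = 8.", "Then 3+8 = 11."], 11)
print(exemplar)
print(extract_final_answer("2*3 = 6. Then 1+6 = 7. The answer is 7."))  # prints 7
```

A fixed closing phrase makes the final answer easy to locate, which becomes useful when several generations must be compared.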

In [None]:
prompt = """Question: 5+11-12
Answer: 4
Question: 8+22*5
Answer:"""
generation(prompt, do_sample=False, max_new_tokens=4)

<hr style="border:1px solid red"> 

<span style="color:red">**Task**:</span> Your task is to make the language model evaluate the given expression correctly. You are not allowed to change the generation parameters of the model. To accomplish this, use the Chain-of-Thought technique and follow the same format as the example in the previous cell.

In [None]:
############ Complete or modify here ############
prompt = """Question: 5+11-12
Answer: 4
Question: 8+22*5
Answer:"""
#################################################

**Solution:**

**Test it here:**

In [None]:
generation(prompt, do_sample=False, max_new_tokens=15)

<hr style="border:1px solid red">

## **Self-consistency sampling**
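In this section, we will explore self-consistency sampling. This method samples several responses to the same prompt, with sampling enabled (`do_sample=True`, `temperature > 0`), and then aggregates them. In its original formulation, the final answer that appears most often across the samples is kept (majority vote). In this exercise, we use a variant: we keep the first sample that passes an automatic check. Either way, combining several samples makes the final output more reliable than a single greedy generation.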
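The majority-vote aggregation can be sketched with `collections.Counter`, which is already imported at the top of this notebook. This assumes that a final answer has already been extracted from each sample, with `None` marking samples where extraction failed. `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent non-None answer, or None if there is none."""
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(valid).most_common(1)[0][0]


print(majority_vote([118, 118, 260, None, 118]))  # prints 118
```

Majority vote needs answers that can be compared for equality, such as numbers or labels. For generated code, a check that executes each candidate, as in the exercise below, is a more natural filter.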

For the exercise in this section, we will use self-consistency sampling to make the language model generate a Python function that counts the occurrences of the letter 's', both lowercase and uppercase, in a string.
![image](./images/self-consistency_sample.jpg)

First, let's try generating the Python function zero-shot:

In [None]:
prompt = "Question: Write a Python function named `count_s` that count the number of s in a string.\nAnswer:"
generation(prompt, do_sample=False, max_new_tokens=100)

The answer is almost correct: it counts the lowercase 's' but misses the uppercase 'S'. More specific instructions in the prompt could improve the generation, but for the purpose of this exercise we will keep the input prompt unchanged.

<hr style="border:1px solid red"> 

<span style="color:red">**Task**:</span> Write a function named `sampling_generation` that takes a prompt and generates several samples from it with the language model. The number of samples is given by `nb_samples`, and the other keyword arguments are generation parameters. The function should return a list of the generated texts, with the input prompt removed from each one.

**Ease level 1:**

**Ease level 2:**

**Solution:**

**Test it here:**<br>
Note: `<|endoftext|>` is the end-of-sequence token; it is also used as the padding token.
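When samples of different lengths are decoded together, the shorter ones end with repeated `<|endoftext|>` tokens. Hugging Face tokenizers can drop special tokens during decoding with `tokenizer.decode(..., skip_special_tokens=True)`. The sketch below is a plain-string alternative that only removes trailing copies. `strip_padding` is a hypothetical helper.

```python
EOS_TOKEN = "<|endoftext|>"  # end-of-sequence token, also used as the padding token


def strip_padding(text, token=EOS_TOKEN):
    """Remove trailing copies of the end-of-sequence/padding token."""
    while text.endswith(token):
        text = text[: -len(token)]
    return text


print(strip_padding("return n<|endoftext|><|endoftext|>"))  # prints return n
```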

In [None]:
list_out = sampling_generation(prompt, nb_samples=8, do_sample=True, temperature=0.8, max_new_tokens=150)

for text in list_out:
    print(text)
    print("#" * 20)

<hr style="border:1px solid red">

To evaluate the generated code and choose a suitable candidate, we can use the following function. Note that it is designed only for this exercise and is not suitable for real-life tasks. It runs model-generated code with `exec()`, which is unsafe outside a sandbox, and it checks a single test string.

In [None]:
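def eval_count_s(code_str):
    """Evaluate the count_s function on a test sentence containing two 's'/'S'."""
    test = code_str + "\nresult = count_s('Same old same old.')"
    # Use an explicit namespace: reading locals() after exec() inside a function
    # is unreliable (it always fails from Python 3.13 onward).
    namespace = {}
    try:
        exec(test, namespace)
        return namespace["result"] == 2
    except Exception:
        return False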

In [None]:
code_str = """def count_s(string):
    return string.count('s')"""
eval_count_s(code_str)

In [None]:
code_str = """def count_s(string):
    return string.lower().count('s')"""
eval_count_s(code_str)

In [None]:
code_str = """def count_s(string):
    ret"""
eval_count_s(code_str)
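A single test string can accept wrong solutions. For example, a function that simply returns 2 passes the check above. A minimal sketch of a stricter evaluator runs several cases in an isolated namespace. `eval_count_s_multi` and its test cases are hypothetical, and the code still uses `exec()`, so the same safety caveat applies.

```python
def eval_count_s_multi(code_str):
    """Check a candidate count_s against several inputs in an isolated namespace."""
    # Hypothetical test cases: (input string, expected count of 's' and 'S').
    cases = [
        ("Same old same old.", 2),
        ("Mississippi", 4),
        ("SSss", 4),
        ("", 0),
    ]
    namespace = {}
    try:
        exec(code_str, namespace)  # unsafe on untrusted code: sandbox it in practice
        count_s = namespace["count_s"]
        return all(count_s(text) == expected for text, expected in cases)
    except Exception:
        return False


print(eval_count_s_multi("def count_s(s):\n    return 2"))  # prints False
```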

We also need a function that extracts only the `count_s` definition from the raw LLM generation. We will use the following regex-based function:

In [None]:
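def extract_function(llm_gen):
    """Extract the count_s function from the LLM output, up to its first return line."""
    # From "def count_s", match lazily up to the first "return " and keep the rest
    # of that line (no trailing newline required)
    match = re.search(r"(def count_s.*?return [^\n]*)", llm_gen, flags=re.DOTALL)
    return match.group(1) if match else None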

In [None]:
llm_gen = """```Python
def count_s(s):
    return s.count('s')
```"""
print(extract_function(llm_gen))
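The regex stops at the first `return`, so a function with an early return (for example `if not text: return 0`) gets truncated. As an alternative, the sketch below parses the candidate code with Python's `ast` module and returns the whole function definition. It assumes that the code either sits in triple-backtick fences or makes up the entire text. Output that mixes prose and code yields `None`, so it complements the regex helper rather than replacing it. `extract_function_ast` is a hypothetical name.

```python
import ast
import re

FENCE = "`" * 3  # triple backtick, built here to keep this block's own fence intact


def extract_function_ast(llm_gen, name="count_s"):
    """Return the full source of function `name` from LLM output, or None."""
    blocks = re.findall(rf"{FENCE}[^\n]*\n(.*?){FENCE}", llm_gen, flags=re.DOTALL)
    candidates = blocks or [llm_gen]  # no fences: try the whole text
    for source in candidates:
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # prose or truncated code: skip this candidate
        for node in tree.body:
            if isinstance(node, ast.FunctionDef) and node.name == name:
                return ast.get_source_segment(source, node)
    return None


sample = FENCE + "python\ndef count_s(text):\n    if not text:\n        return 0\n    return text.lower().count('s')\n" + FENCE
print(extract_function_ast(sample))
```

Parsing also rejects syntactically broken candidates before they reach `eval_count_s`.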

<hr style="border:1px solid red"> 

<span style="color:red">**Task**:</span> Your task is to write a function named `self_const_gen` that utilizes the `sampling_generation` function to generate multiple outputs from a given prompt. The generated code will be extracted using the `extract_function` function and evaluated using the `eval_count_s` function. The `self_const_gen` function should return the first valid code generated (or `None` if there is none).

**Ease level 1:**

**Ease level 2:**

**Solution:**

**Test it here:**<br>
You may have to run the cell several times.

In [None]:
valid_func_str = self_const_gen(prompt, nb_samples=16, do_sample=True, temperature=0.8, max_new_tokens=100)
print(valid_func_str)