# Prompt Engineering

## Using Text Generation Models

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [None]:
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Create a pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)


In [None]:
# Prompt
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate the output
output = pipe(messages)


In [None]:
print(output[0]['generated_text'])

 Why did the chicken join the band? Because it had the drumsticks!


In [None]:
# Apply prompt template
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False
)

print(prompt)

<|user|>
Create a funny joke about chickens.<|end|>
<|endoftext|>
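The printed template can be mimicked by hand to make the structure explicit. The function below is a minimal sketch that only reproduces the layout shown in the output above; `format_phi3_chat` is a hypothetical helper, and the real template lives in the tokenizer's configuration and may differ in details or between revisions.

```python
def format_phi3_chat(messages):
    """Approximate the string printed above for a list of chat messages."""
    parts = []
    for message in messages:
        # Each turn is a role tag, the content, and an end-of-turn marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # The printed output closes with an end-of-text token.
    parts.append("<|endoftext|>")
    return "".join(parts)


example = format_phi3_chat([{"role": "user", "content": "Tell me a joke."}])
print(example)
```

In practice `apply_chat_template` should stay the source of truth; the sketch is only meant to show what the special tokens delimit.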


### Controlling Model Output

In [None]:
# Using a high temperature
output = pipe(
    messages,
    do_sample=True,
    temperature=1.0
)

print(output[0]['generated_text'])

 Why did the chicken go to the dance club? Because it wanted to break its routine and strut its stuff on the floor!


In [None]:
# Using a high top_p
output = pipe(
    messages,
    do_sample=True,
    top_p=1
)

print(output[0]['generated_text'])

 How do you know a chicken is playing poker?
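To build intuition for what `temperature` and `top_p` do, here is a simplified sketch on a toy distribution. It is an assumption-laden illustration, not the code path `transformers` uses: the logits are made up, and both helper functions are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p=1.0):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.1]  # hypothetical logits for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
nucleus = top_p_filter(softmax_with_temperature(logits), top_p=0.8)

print("temperature=0.5:", [round(p, 3) for p in sharp])
print("temperature=2.0:", [round(p, 3) for p in flat])
print("top_p=0.8 keeps token indices:", sorted(nucleus))
```

With `top_p=1`, as in the cell above, no token is filtered out, so sampling draws from the full distribution.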


## Intro to Prompt Engineering

In [None]:
persona = "You are an expert in Large Language Models. You excel at breaking down complex papers into digestible summaries.\n\n"
instruction = "Summarize the key findings of the paper provided.\n\n"
context = "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\n\n"
data_format = "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\n\n"
audience = "The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\n\n"
tone = "The tone should be professional and clear.\n\n"
text = "MY TEXT TO SUMMARIZE"
data = f"Text to summarize: {text}"

query = f"{persona}{instruction}{context}{data_format}{audience}{tone}{data}"

In [None]:
print(query)

You are an expert in Large Language Models. You excel at breaking down complex papers into digestible summaries.

Summarize the key findings of the paper provided.

Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.

Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.

The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.

The tone should be professional and clear.

Text to summarize: MY TEXT TO SUMMARIZE
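Because the prompt is assembled from independent pieces, it is easy to test which components actually matter by leaving some out. The sketch below assumes a hypothetical `build_prompt` helper and shortened component texts; it is one possible way to organize such experiments, not a prescribed one.

```python
def build_prompt(components, skip=()):
    """Join named prompt components with blank lines, leaving out any names in `skip`."""
    return "\n\n".join(
        text.strip() for name, text in components.items() if name not in skip
    )


# Shortened, illustrative versions of the components used above.
components = {
    "persona": "You are an expert in Large Language Models.",
    "instruction": "Summarize the key findings of the paper provided.",
    "tone": "The tone should be professional and clear.",
    "data": "Text to summarize: MY TEXT TO SUMMARIZE",
}

full_prompt = build_prompt(components)
no_persona = build_prompt(components, skip=("persona",))
print(no_persona)
```

Comparing the model's answers for `full_prompt` and `no_persona` gives a rough sense of how much a single component contributes.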


## In-Context Learning: Providing Examples

In [None]:
# Use a single example of using the made-up word in a sentence
one_shot_prompt = [
    {
        "role": "user",
        "content": "A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:"
    },
    {
        "role": "assistant",
        "content": "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home."
    },
    {
        "role": "user",
        "content": "To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:"
    }
]

print(tokenizer.apply_chat_template(one_shot_prompt, tokenize=False))

<|user|>
A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:<|end|>
<|assistant|>
I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.<|end|>
<|user|>
To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:<|end|>
<|endoftext|>


In [None]:
# Generate the output
outputs = pipe(one_shot_prompt)

In [None]:
print(outputs[0]['generated_text'])

 In the ancient battle, the knight bravely screeged at the charging dragon, hoping to protect his kingdom from its fiery wrath.
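The same pattern extends from one example to several. As a rough sketch, a hypothetical `build_few_shot_messages` helper can turn any number of (question, answer) pairs into alternating user and assistant turns, followed by the actual query.

```python
def build_few_shot_messages(examples, query):
    """Build a chat message list from (user_text, assistant_text) pairs plus a final query."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The final user turn is the one the model is expected to answer.
    messages.append({"role": "user", "content": query})
    return messages


few_shot = build_few_shot_messages(
    examples=[
        (
            "A 'Gigamuru' is a type of Japanese musical instrument. "
            "An example of a sentence that uses the word Gigamuru is:",
            "I have a Gigamuru that my uncle gave me as a gift.",
        ),
    ],
    query="To 'screeg' something is to swing a sword at it. "
    "An example of a sentence that uses the word screeg is:",
)
print([message["role"] for message in few_shot])
```

The resulting list has the same shape as `one_shot_prompt`, so it could be passed to the pipeline in the same way.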


## Chain Prompting: Breaking up the Problem

In [None]:
# Create name and slogan for a product
product_prompt = [
    {"role": "user",
    "content": "Create a name and slogan for a chatbot that leverages LLMs."}
]

In [None]:
outputs = pipe(product_prompt)
product_description = outputs[0]['generated_text']
print(product_description)

 Name: ChatSage
Slogan: "Unleashing the power of AI to enhance your conversations."


In [None]:
# Based on a name and slogan for a product, generate a sales pitch
sales_prompt = [
    {
        "role": "user",
        "content": f"Generate a very short sales pitch for the following product: '{product_description}'"
    }
]

In [None]:
outputs = pipe(sales_prompt)
sales_pitch = outputs[0]['generated_text']
print(sales_pitch)

 Introducing ChatSage, the revolutionary AI-powered tool designed to elevate your conversations to new heights. With our cutting-edge technology, we unleash the power of AI to enhance your interactions, making every conversation more engaging, insightful, and meaningful. Experience the future of communication with ChatSage today!
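The two steps above follow a general pattern: each prompt is filled with the previous output. The sketch below captures that pattern with a hypothetical `run_chain` helper and a stand-in `fake_generate` function, so it runs without a model; the `{previous}` placeholder name is an assumption of this sketch.

```python
def run_chain(generate, first_prompt, follow_up_templates):
    """Run prompts in sequence, inserting each output into the next template."""
    outputs = [generate(first_prompt)]
    for template in follow_up_templates:
        # `{previous}` marks where the preceding output is inserted.
        outputs.append(generate(template.format(previous=outputs[-1])))
    return outputs


def fake_generate(prompt):
    """Stand-in for a model call so the sketch runs without a GPU."""
    return f"<response to: {prompt}>"


chain_outputs = run_chain(
    fake_generate,
    "Create a name and slogan for a chatbot that leverages LLMs.",
    ["Generate a very short sales pitch for the following product: '{previous}'"],
)
for step, text in enumerate(chain_outputs, start=1):
    print(step, text)
```

With the pipeline from this notebook, `generate` could plausibly be `lambda p: pipe([{"role": "user", "content": p}])[0]["generated_text"]`.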


## Reasoning with Generative Models

### Chain-of-Thought

In [None]:
# Answering with chain-of-thought
cot_prompt = [
    {
        "role": "user",
        "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"
    },
    {
        "role": "assistant",
        "content": "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11."
    },
    {
        "role": "user",
        "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"
    }
]

# Generate the output
outputs = pipe(cot_prompt)
print(outputs[0]['generated_text'])

 The cafeteria started with 23 apples. They used 20 apples for lunch, so they had 23 - 20 = 3 apples left. After buying 6 more apples, they now have 3 + 6 = 9 apples. The answer is 9.


In [None]:
# Zero-shot chain-of-thought
zeroshot_cot_prompt = [
    {
        "role": "user",
        "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step-by-step."
    }
]

# Generate the output
outputs = pipe(zeroshot_cot_prompt)
print(outputs[0]['generated_text'])

 Step 1: Start with the initial number of apples in the cafeteria, which is 23.

Step 2: Subtract the number of apples used to make lunch, which is 20.
23 - 20 = 3 apples remaining.

Step 3: Add the number of apples bought, which is 6.
3 + 6 = 9 apples.

So, the cafeteria now has 9 apples.
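Chain-of-thought answers bury the result inside the reasoning, so a downstream program usually needs to pull the final value out. One simple heuristic, sketched here with a hypothetical `extract_final_number` helper, is to take the last number in the text. This assumes the model ends with its answer, which will not always hold.

```python
import re


def extract_final_number(text):
    """Return the last number in the text as an int or float, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    if not matches:
        return None
    value = matches[-1]
    return float(value) if "." in value else int(value)


cot_output = (
    "The cafeteria started with 23 apples. They used 20 apples for lunch, "
    "so they had 23 - 20 = 3 apples left. After buying 6 more apples, "
    "they now have 3 + 6 = 9 apples. The answer is 9."
)
print(extract_final_number(cot_output))
```

Asking the model to finish with a fixed phrase such as "The answer is ..." makes this kind of extraction more reliable.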


### Tree-of-Thought

In [None]:
# Zero-shot tree-of-thought
zeroshot_tot_prompt = [
    {
        "role": "user",
        "content": "Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realizes they're wrong at any point then they leave. The question is 'The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?' Make sure to discuss the results."
    }
]

# Generate the output
outputs = pipe(zeroshot_tot_prompt)
print(outputs[0]['generated_text'])

 Expert 1:
Step 1: Start with the initial number of apples, which is 23.

Expert 2:
Step 1: Subtract the number of apples used for lunch, which is 20. This leaves us with 3 apples.
Step 2: Add the number of apples bought, which is 6. This results in a total of 9 apples.

Expert 3:
Step 1: Begin with the initial number of apples, which is 23.
Step 2: Subtract the number of apples used for lunch, which is 20. This leaves us with 3 apples.
Step 3: Add the number of apples bought, which is 6. This results in a total of 9 apples.

Discussion:
All three experts arrived at the same answer, which is 9 apples. This indicates that their calculations were correct. The cafeteria started with 23 apples, used 20 for lunch, and then bought 6 more, resulting in a total of 9 apples.


## Output Verification

In [None]:
# Zero-shot learning: Providing no examples
zeroshot_prompt = [
    {
        "role": "user",
        "content": "Create a character profile for an RPG game in JSON format."
    }
]

# Generate the output
outputs = pipe(zeroshot_prompt)
print(outputs[0]['generated_text'])

 ```json
{
  "name": "Eldrin the Wise",
  "race": "Elf",
  "class": "Wizard",
  "level": 10,
  "alignment": "Chaotic Good",
  "strength": 8,
  "dexterity": 14,
  "constitution": 12,
  "intelligence": 18,
  "wisdom": 16,
  "charisma": 10,
  "weapon_skill": "Magic",
  "armor_skill": "Light",
  "spell_slots": {
    "cantrips": ["Mage Hand", "Detect Magic", "Mage Armor", "Prestidigitation", "Identify", "Invisibility"],
    "1st level": ["Fireball", "Magic Missile", "Shield", "Cure Wounds", "Detect Thoughts", "Charm Person"],
    "2nd level": ["Light", "Hold Person", "Sleep", "Committee", "Enlarge Person", "Teleport"],
    "3rd level": ["Frostbite", "Fog Cloud", "Disintegrate", "Dimension Door", "Mirror Image", "Misty Step"]
  },
  "equipment": {
    "weapon": "Staff of the Ancients",
    "armor": "Leather Armor",
    "accessories": ["Staff of Power", "Ring of Protection", "Boots of Speed"]
  },
  "background": "Adept",
  "personality": "Curious and inventive, Eldrin is always seeking new k

In [None]:
# One-shot learning: Providing an example of the output structure
one_shot_template = """
Create a short character profile for an RPG game. Make
sure to only use this format:
{
 "description": "A SHORT DESCRIPTION",
 "name": "THE CHARACTER'S NAME",
 "armor": "ONE PIECE OF ARMOR",
 "weapon": "ONE OR MORE WEAPONS"
}
"""

one_shot_prompt = [
    {
        "role": "user",
        "content": one_shot_template
    }
]

# Generate the output
outputs = pipe(one_shot_prompt)
print(outputs[0]['generated_text'])

 {
 "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",
 "name": "Shadowcloak",
 "armor": "Leather Hood",
 "weapon": "Dagger"
}
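Showing the format in the prompt does not guarantee the model follows it, so the output still has to be checked. The sketch below is one possible validator: `parse_profile` and `REQUIRED_KEYS` are hypothetical names, and the check only covers valid JSON with exactly the expected keys.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker that models often wrap JSON in
REQUIRED_KEYS = {"description", "name", "armor", "weapon"}


def parse_profile(raw_text, required_keys=REQUIRED_KEYS):
    """Return the parsed profile, or None if it is not JSON with exactly the required keys."""
    cleaned = raw_text.strip()
    if cleaned.startswith(FENCE):
        # Drop the surrounding fence and an optional "json" language tag.
        cleaned = cleaned.strip("`").strip()
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    try:
        profile = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    if not isinstance(profile, dict) or set(profile) != set(required_keys):
        return None
    return profile


valid = parse_profile(
    '{"description": "A cunning rogue.", "name": "Shadowcloak", '
    '"armor": "Leather Hood", "weapon": "Dagger"}'
)
truncated = parse_profile(FENCE + 'json\n{"name": "Eldrin the Wise", "race": "El')
print(valid is not None, truncated is None)
```

A truncated response like the zero-shot output above fails to parse and returns `None`, which is the signal to retry or raise `max_new_tokens`.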


## Grammar: Constrained Sampling

In [4]:
import gc
import torch
if 'model' in globals():
    del model
if 'tokenizer' in globals():
    del tokenizer
if 'pipe' in globals():
    del pipe

In [5]:
# Flush memory
gc.collect()
torch.cuda.empty_cache()

In [9]:
!pip install llama-cpp-python


In [10]:
from llama_cpp.llama import Llama

In [11]:
# Load Phi-3
llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="*fp16.gguf",
    n_gpu_layers=-1,
    n_ctx=2048,
    verbose=False
)


In [13]:
output = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Create a warrior for an RPG in JSON format."},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)

In [19]:
import json
json_output = json.dumps(json.loads(output['choices'][0]['message']['content']), indent=4)
print(json_output)

{
    "warrior": {
        "name": "Grom the Brave",
        "class": "Warrior",
        "level": 1,
        "attributes": {
            "strength": 10,
            "dexterity": 5,
            "constitution": 15,
            "intelligence": 8,
            "wisdom": 6,
            "charisma": 7
        },
        "skills": {
            "melee_combat": {
                "basic_attack": {
                    "damage": 5,
                    "accuracy": 0.8
                },
                "special_attack": {
                    "name": "Club Swing",
                    "damage": 10,
                    "accuracy": 0.7
                }
            },
            "defense": {
                "armor": {
                    "type": "Chainmail",
                    "defense": 10
                },
                "shield": {
                    "type": "Round Shield",
                    "defense": 5
                }
            },
            "healing": {
                "basic_heal": {
