In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from transformers.pipelines.pt_utils import KeyDataset

Loading the model.

In [2]:
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    dtype="auto",
    trust_remote_code=False,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    trust_remote_code=False,  # keep consistent with the model loading call above
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False,
)

Analyzing the structure of the prompt.

In [3]:
msg = [{
    "role": "user",
    "content": "Tell me a joke about chickens."
}]

prompt = pipe.tokenizer.apply_chat_template(msg, tokenize=False)

print(prompt)

<|user|>
Tell me a joke about chickens.<|end|>
<|endoftext|>


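The rendered prompt shows how each message is serialized: the `role` field becomes a special tag (`<|user|>`), and the `content` field follows it as plain text, closed by `<|end|>`. Changing these two fields changes what the model sees and can be used to steer its behaviour more precisely.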
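To make the mapping from `role` and `content` to the printed string concrete, here is a minimal sketch of a renderer that reproduces the layout shown above. It is an illustration only: `render_chat` is a hypothetical helper, not the tokenizer's actual template, and the `<|assistant|>` tag used for the generation prompt is an assumption about the Phi-3 format.

```python
def render_chat(messages, add_generation_prompt=False):
    """Illustrative renderer that mimics the printed prompt layout.

    This is NOT the real chat template; the real one ships with the tokenizer.
    """
    parts = []
    for message in messages:
        # Each message becomes: <|role|> newline content <|end|> newline
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # Assumption: generation is cued with an assistant tag,
    # otherwise the text is closed with <|endoftext|>.
    parts.append("<|assistant|>\n" if add_generation_prompt else "<|endoftext|>")
    return "".join(parts)


msg = [{"role": "user", "content": "Tell me a joke about chickens."}]
rendered = render_chat(msg)
print(rendered)
```

In practice the tokenizer's own `apply_chat_template` should be used, since it carries the template the model was trained with.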

To get more varied output, use the `temperature` parameter. It only takes effect when sampling is enabled (`do_sample=True`). As the temperature increases, the next-token distribution becomes more uniform, so less likely tokens are picked more often and the outputs become more diverse.

In [4]:
output = pipe(msg, do_sample=True, temperature=1)

print(output[0]["generated_text"])

 Why did the farmer throw his egg instead of eating it? Because it was still chicken!
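The effect of temperature described above can be sketched on a toy distribution. The logits below are made-up values, and `softmax_with_temperature` and `entropy` are hypothetical helpers written for this illustration; they show the general idea of dividing logits by the temperature before the softmax, not the library's internal code.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means closer to uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

The entropy grows with the temperature, which is the "more uniform" behaviour in numeric form.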


For some tasks, the user can include a few example prompt-output pairs in the prompt itself. This is referred to as In-Context Learning. In the example below, the model can "infer" from a single demonstration how to use a word it has never seen.

In [5]:
one_shot_prompt = [
    {
        "role": "user",
        "content": "A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:"
    },
    {
        "role": "assistant",
        "content": "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home."
    },
    {
        "role": "user",
        "content": "To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:"
    }
]

outputs = pipe(one_shot_prompt)
print(outputs[0]["generated_text"])


 During the medieval reenactment, the knight skillfully screeged the wooden target with precision and grace.
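When there are several demonstrations, the alternating user/assistant structure used above can be built programmatically. The sketch below assumes a hypothetical helper, `build_few_shot_messages`, that takes `(prompt, answer)` pairs plus the final query; it only constructs the message list and does not call the model.

```python
def build_few_shot_messages(examples, query):
    """Turn (user_text, assistant_text) pairs and a final query into chat messages."""
    messages = []
    for user_text, assistant_text in examples:
        # Each demonstration is one user turn followed by one assistant turn.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The real question goes last, as an unanswered user turn.
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    (
        "A 'Gigamuru' is a type of Japanese musical instrument. "
        "An example of a sentence that uses the word Gigamuru is:",
        "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.",
    )
]
query = (
    "To 'screeg' something is to swing a sword at it. "
    "An example of a sentence that uses the word screeg is:"
)
few_shot_prompt = build_few_shot_messages(examples, query)
print([m["role"] for m in few_shot_prompt])
```

The resulting list has the same shape as `one_shot_prompt` and could be passed to `pipe` in the same way.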


One can also break the desired task into smaller sub-problems. The output of one LLM call is then inserted into the prompt of the next call, so each prompt stays short and focused, which can lead to better results. This is referred to as Chain-Prompting.

In [6]:
product_prompt = [
    {
        "role": "user",
        "content": "Create a name and slogan for a chatbot based on LLMs for analyzing gym data."
    }
]
outputs = pipe(product_prompt)
product_description = outputs[0]["generated_text"]

print(product_description)

print()

sales_prompt = [
    {
        "role": "user",
        "content": f"Generate a very short sales pitch for the following product: '{product_description}'"
    }
]

outputs = pipe(sales_prompt)
sales_pitch = outputs[0]["generated_text"]

print(sales_pitch)

 Name: FitAI
Slogan: "Transforming Gym Data into Actionable Insights with AI"

 "Elevate your fitness journey with FitAI! Our cutting-edge AI technology takes your gym data and transforms it into actionable insights, helping you achieve your fitness goals faster and smarter. Say goodbye to guesswork and hello to personalized workout plans that adapt to your progress. FitAI – where AI meets fitness!"

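The two-step chain above can be wrapped in a small reusable function. This is a sketch under stated assumptions: `run_chain` is a hypothetical helper, and `generate` is any callable that takes a list of chat messages and returns the generated text. A stub, `fake_generate`, stands in for the model so the example runs without loading any weights.

```python
def run_chain(generate, first_prompt, followup_template):
    """Run two prompts in sequence, feeding the first output into the second."""
    first_output = generate([{"role": "user", "content": first_prompt}]).strip()
    # The follow-up template must contain a {previous} placeholder.
    followup = followup_template.format(previous=first_output)
    second_output = generate([{"role": "user", "content": followup}]).strip()
    return first_output, second_output


def fake_generate(messages):
    # Stand-in for the model: echoes the last user message in upper case.
    return " " + messages[-1]["content"].upper()


name, pitch = run_chain(
    fake_generate,
    "name a gym chatbot",
    "Write a pitch for: '{previous}'",
)
print(name)
print(pitch)
```

With the pipeline from this notebook, `generate` could be `lambda messages: pipe(messages)[0]["generated_text"]`.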

Another broad area of Prompt-Engineering is Reasoning. Here, the goal is to make the LLM work through the problem before giving its final answer. At a basic level, this can be achieved with a well-structured prompt that elicits Chain-of-Thought behaviour, for example by appending "Let's think step-by-step." to the question.

In [7]:
zeroshot_cot_prompt = [
    {
        "role": "user",
        "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step-by-step."
    }
]

outputs = pipe(zeroshot_cot_prompt)

print(outputs[0]["generated_text"])

 Step 1: The cafeteria starts with 23 apples.
Step 2: They used 20 apples to make lunch, so we subtract 20 from the initial amount: 23 - 20 = 3 apples remaining.
Step 3: The cafeteria bought 6 more apples, so we add 6 to the remaining amount: 3 + 6 = 9 apples.

The cafeteria now has 9 apples.
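Because a Chain-of-Thought response mixes intermediate steps with the final result, it is often useful to pull out just the answer. The sketch below uses a naive heuristic, taking the last number in the text, and both `with_cot_trigger` and `extract_final_number` are hypothetical helpers written for this illustration. The heuristic will fail on responses that mention other numbers after the answer.

```python
import re

COT_TRIGGER = "Let's think step-by-step."


def with_cot_trigger(question):
    """Append the zero-shot Chain-of-Thought trigger phrase to a question."""
    return f"{question} {COT_TRIGGER}"


def extract_final_number(text):
    """Return the last number found in the text as a float, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


# Response copied from the output above.
response = (
    "Step 1: The cafeteria starts with 23 apples.\n"
    "Step 2: They used 20 apples to make lunch, so we subtract 20 from the "
    "initial amount: 23 - 20 = 3 apples remaining.\n"
    "Step 3: The cafeteria bought 6 more apples, so we add 6 to the remaining "
    "amount: 3 + 6 = 9 apples.\n\n"
    "The cafeteria now has 9 apples."
)
final_answer = extract_final_number(response)
print(final_answer)
```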


Another Reasoning approach is to draw on several outputs from the LLM and compare them. Done literally, this is expensive, because the model has to generate a complete answer more than once. The idea can be approximated with a single prompt that asks the model to simulate several experts.

In [8]:
zeroshot_tot_prompt = [
    {
        "role": "user",
        "content": "Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realizes they're wrong at any point then they leave. The question is 'The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?' Make sure to discuss the results."
    }
]

outputs = pipe(zeroshot_tot_prompt)

print(outputs[0]["generated_text"])

 Expert 1:
Step 1: Start with the initial number of apples, which is 23.

Expert 2:
Step 1: Subtract the number of apples used for lunch, which is 20.
Step 2: Add the number of apples bought, which is 6.

Expert 3:
Step 1: Start with the initial number of apples, which is 23.
Step 2: Subtract the number of apples used for lunch, which is 20.
Step 3: Add the number of apples bought, which is 6.

Results:
All three experts arrived at the same answer:

Expert 1: 23 - 20 + 6 = 9 apples
Expert 2: (23 - 20) + 6 = 9 apples
Expert 3: (23 - 20) + 6 = 9 apples

All three experts agree that the cafeteria has 9 apples left.
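For comparison, the literal multi-output version of this idea samples several independent responses and keeps the most common final answer. The sketch below shows only the aggregation step. `majority_answer` is a hypothetical helper, the responses are made-up strings rather than real model output, and the "last number in the text" rule is a naive assumption about where the answer appears.

```python
import re
from collections import Counter


def majority_answer(responses):
    """Return the most frequent final number (as a string) across responses."""
    answers = []
    for response in responses:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
        if numbers:
            # Naive assumption: the last number mentioned is the final answer.
            answers.append(numbers[-1])
    if not answers:
        return None
    # On a tie, Counter keeps the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


# Made-up responses standing in for several sampled generations.
sampled_responses = [
    "23 - 20 = 3, then 3 + 6 = 9. The cafeteria has 9 apples.",
    "They end up with 9 apples.",
    "23 - 20 = 3, so they have 3 apples.",
]
print(majority_answer(sampled_responses))
```

With the pipeline from this notebook, the responses would come from repeated calls with `do_sample=True`, at the cost of one full generation per sample.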
