<h1>Chapter 6 - Prompt Engineering</h1>

<i>Methods for improving the output through prompt engineering.</i>

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>  <a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>  <a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter06/Chapter%206%20-%20Prompt%20Engineering.ipynb)

---

This notebook is for Chapter 6 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following codeblock to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---


In [1]:
# %%capture
# !pip install "langchain>=0.1.17" "openai>=1.13.3" "langchain_openai>=0.1.6" "transformers>=4.40.1" "datasets>=2.18.0" "accelerate>=0.27.2" "sentence-transformers>=2.5.1" "duckduckgo-search>=5.2.2"

## Loading our model

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
model


Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=3206

In [3]:
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
tokenizer

LlamaTokenizerFast(name_or_path='microsoft/Phi-3-mini-4k-instruct', vocab_size=32000, model_max_length=4096, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '<|endoftext|>', 'unk_token': '<unk>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=False),
	32000: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	32001: AddedToken("<|assistant|>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=True),
	32002: AddedToken("<|placeholder1|>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=Tr

In [4]:
# Create a pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False,
)
pipe

Device set to use cuda


<transformers.pipelines.text_generation.TextGenerationPipeline at 0x7dd015cf0510>

## Prompt

In [5]:
# Prompt
# Ask the model for a funny joke about chickens
messages = [
    {
        "role": "user",
        "content": "Create a funny joke about chickens."
    }
]

# Generate the output
output = pipe(messages)
output

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48


[{'generated_text': ' Why did the chicken join the band? Because it had the drumsticks!'}]

In [6]:
output[0]

{'generated_text': ' Why did the chicken join the band? Because it had the drumsticks!'}

In [7]:
output[0]["generated_text"]

' Why did the chicken join the band? Because it had the drumsticks!'

## Apply prompt template

In [8]:
# Apply prompt template
# Ask the model for a funny joke about chickens
messages = [
    {
        "role": "user",
        "content": "Create a funny joke about chickens."
    }
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)
prompt

'<|user|>\nCreate a funny joke about chickens.<|end|>\n<|endoftext|>'
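The string above shows how the chat template wraps a message in role and end markers. As a rough illustration only, the sketch below reproduces that layout in plain Python. `render_phi3_style` is a hypothetical helper written to mirror the printed output; it is not the tokenizer's actual template and may differ from it in cases not shown here.

```python
def render_phi3_style(messages, eos_token="<|endoftext|>"):
    """Hypothetical helper: mimic the layout printed above, nothing more."""
    parts = []
    for message in messages:
        # Each turn becomes: <|role|>, newline, content, <|end|>, newline
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # The printed output ends with the end-of-text token
    return "".join(parts) + eos_token


messages = [{"role": "user", "content": "Create a funny joke about chickens."}]
rendered = render_phi3_style(messages)
print(repr(rendered))
```

Seeing the template as plain string concatenation can make it easier to spot when a prompt is missing a role marker or an end token.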

## Using a high temperature

In [9]:
# Using a high temperature
# Ask the model for a funny joke about chickens
messages = [
    {
        "role": "user",
        "content": "Create a funny joke about chickens."
    }
]
output = pipe(messages, do_sample=True, temperature=1)
output

[{'generated_text': ' Why don\'t chickens play piano in the barnyard? Because they can\'t find no "key" to "peck" in the right "time" and "key"!'}]

In [10]:
output[0]

{'generated_text': ' Why don\'t chickens play piano in the barnyard? Because they can\'t find no "key" to "peck" in the right "time" and "key"!'}

In [11]:
output[0]["generated_text"]

' Why don\'t chickens play piano in the barnyard? Because they can\'t find no "key" to "peck" in the right "time" and "key"!'
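To give some intuition for what `temperature` does during sampling, here is a minimal sketch of temperature-scaled softmax in plain Python. The logits are made-up numbers for three candidate tokens, and `softmax_with_temperature` is an illustrative helper, not part of `transformers`.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum for numerical stability before exponentiating
    highest = max(scaled)
    exps = [math.exp(score - highest) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution, which is why sampled outputs become more varied.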

## Using a high top_p

In [12]:
# Using a high top_p
# Ask the model for a funny joke about chickens
messages = [
    {
        "role": "user",
        "content": "Create a funny joke about chickens."
    }
]
output = pipe(messages, do_sample=True, top_p=1)
output




In [13]:
output[0]

{'generated_text': ' Why do chickens have great grammar? Wish they could write better, because they always think *eggs* are a great beginning to their sentences!'}

In [14]:
output[0]["generated_text"]

' Why do chickens have great grammar? Wish they could write better, because they always think *eggs* are a great beginning to their sentences!'
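As a companion to the `top_p` setting used above, the following sketch shows the idea behind nucleus filtering: keep the most likely tokens until their cumulative probability reaches `top_p`, then sample only among those. The probabilities are invented for illustration and `top_p_filter` is a hypothetical helper, not the library's implementation.

```python
def top_p_filter(probs, top_p=1.0):
    """Return the indices kept by nucleus filtering, most likely first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        # Stop once the kept tokens cover at least top_p of the probability mass
        if cumulative >= top_p:
            break
    return kept


probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token probabilities
print(top_p_filter(probs, top_p=0.7))  # [0, 1]
print(top_p_filter(probs, top_p=1.0))  # [0, 1, 2, 3]
```

With `top_p=1`, as in the cell above, no token is filtered out, so every candidate remains available to the sampler.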

# Intro to Prompt Engineering

## The Basic Ingredients of a Prompt

# Advanced Prompt Engineering

## Complex Prompt

In [15]:
# Text to summarize which we stole from https://jalammar.github.io/illustrated-transformer/ ;)
text = """In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. It is in fact Google Cloud’s recommendation to use The Transformer as a reference model to use their Cloud TPU offering. So let’s try to break the model apart and look at how it functions.
The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as a part of the Tensor2Tensor package. Harvard’s NLP group created a guide annotating the paper with PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make it easier to understand to people without in-depth knowledge of the subject matter.
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.
Popping open that Optimus Prime goodness, we see an encoding component, a decoding component, and connections between them.
The encoding component is a stack of encoders (the paper stacks six of them on top of each other – there’s nothing magical about the number six, one can definitely experiment with other arrangements). The decoding component is a stack of decoders of the same number.
The encoders are all identical in structure (yet they do not share weights). Each one is broken down into two sub-layers:
The encoder’s inputs first flow through a self-attention layer – a layer that helps the encoder look at other words in the input sentence as it encodes a specific word. We’ll look closer at self-attention later in the post.
The outputs of the self-attention layer are fed to a feed-forward neural network. The exact same feed-forward network is independently applied to each position.
The decoder has both those layers, but between them is an attention layer that helps the decoder focus on relevant parts of the input sentence (similar what attention does in seq2seq models).
Now that we’ve seen the major components of the model, let’s start to look at the various vectors/tensors and how they flow between these components to turn the input of a trained model into an output.
As is the case in NLP applications in general, we begin by turning each input word into a vector using an embedding algorithm.
Each word is embedded into a vector of size 512. We'll represent those vectors with these simple boxes.
The embedding only happens in the bottom-most encoder. The abstraction that is common to all the encoders is that they receive a list of vectors each of the size 512 – In the bottom encoder that would be the word embeddings, but in other encoders, it would be the output of the encoder that’s directly below. The size of this list is hyperparameter we can set – basically it would be the length of the longest sentence in our training dataset.
After embedding the words in our input sequence, each of them flows through each of the two layers of the encoder.
Here we begin to see one key property of the Transformer, which is that the word in each position flows through its own path in the encoder. There are dependencies between these paths in the self-attention layer. The feed-forward layer does not have those dependencies, however, and thus the various paths can be executed in parallel while flowing through the feed-forward layer.
Next, we’ll switch up the example to a shorter sentence and we’ll look at what happens in each sub-layer of the encoder.
Now We’re Encoding!
As we’ve mentioned already, an encoder receives a list of vectors as input. It processes this list by passing these vectors into a ‘self-attention’ layer, then into a feed-forward neural network, then sends out the output upwards to the next encoder.
"""

# Prompt components
# Persona: the role the model should adopt
persona = "You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\n"
# Instruction: the task to carry out
instruction = "Summarize the key findings of the paper provided.\n"
# Context: what the summary should prioritize and why
context = "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\n"
# Data format: the structure the output should follow
data_format = "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\n"
# Audience: who the summary is written for
audience = "The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\n"
# Tone: the voice the output should use
tone = "The tone should be professional and clear.\n"

# text = "MY TEXT TO SUMMARIZE"  # Replace with your own text to summarize
data = f"Text to summarize: {text}"

# The full prompt - remove and add pieces to view its impact on the generated output
query = persona + instruction + context + data_format + audience + tone + data
query

"You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\nSummarize the key findings of the paper provided.\nYour summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\nCreate a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\nThe summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\nThe tone should be professional and clear.\nText to summarize: In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural

In [16]:
messages = [
    {"role": "user", "content": query}
]

# prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
prompt

"<|user|>\nYou are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\nSummarize the key findings of the paper provided.\nYour summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\nCreate a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\nThe summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\nThe tone should be professional and clear.\nText to summarize: In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Goo

In [17]:
# Generate the output
outputs = pipe(messages)
outputs

[{'generated_text': ' - The Transformer model, introduced in the paper "Attention is All You Need," utilizes attention mechanisms to enhance the training speed of deep learning models, particularly in neural machine translation.\n- The model consists of an encoding component and a decoding component, each with multiple identical sub-layers. The encoding component includes self-attention layers and feed-forward neural networks, while the decoding component incorporates attention layers that focus on relevant parts of the input sentence.\n- The Transformer model is highly parallelizable, making it a preferred choice for Google Cloud\'s TPU offering.\n- The model processes input sequences by converting each word into a vector using an embedding algorithm, with the size of the vector being a hyperparameter.\n- The input vectors flow through the encoder\'s sub-layers, with dependencies between paths in the self-attention layer and independent execution in the feed-forward layer.\n- The Tran

In [18]:
outputs[0]

{'generated_text': ' - The Transformer model, introduced in the paper "Attention is All You Need," utilizes attention mechanisms to enhance the training speed of deep learning models, particularly in neural machine translation.\n- The model consists of an encoding component and a decoding component, each with multiple identical sub-layers. The encoding component includes self-attention layers and feed-forward neural networks, while the decoding component incorporates attention layers that focus on relevant parts of the input sentence.\n- The Transformer model is highly parallelizable, making it a preferred choice for Google Cloud\'s TPU offering.\n- The model processes input sequences by converting each word into a vector using an embedding algorithm, with the size of the vector being a hyperparameter.\n- The input vectors flow through the encoder\'s sub-layers, with dependencies between paths in the self-attention layer and independent execution in the feed-forward layer.\n- The Trans

In [19]:
outputs[0]["generated_text"]

' - The Transformer model, introduced in the paper "Attention is All You Need," utilizes attention mechanisms to enhance the training speed of deep learning models, particularly in neural machine translation.\n- The model consists of an encoding component and a decoding component, each with multiple identical sub-layers. The encoding component includes self-attention layers and feed-forward neural networks, while the decoding component incorporates attention layers that focus on relevant parts of the input sentence.\n- The Transformer model is highly parallelizable, making it a preferred choice for Google Cloud\'s TPU offering.\n- The model processes input sequences by converting each word into a vector using an embedding algorithm, with the size of the vector being a hyperparameter.\n- The input vectors flow through the encoder\'s sub-layers, with dependencies between paths in the self-attention layer and independent execution in the feed-forward layer.\n- The Transformer model has sh
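The comment in the prompt-building cell suggests removing and adding pieces to see their impact. One way to make that experiment easier is sketched below: keep the components in a dictionary and join only the ones you name. `build_query` and the shortened component strings are illustrative assumptions, not code from the chapter.

```python
def build_query(components, include=None):
    """Join the named prompt pieces, optionally keeping only some of them."""
    names = list(components) if include is None else include
    return "".join(components[name] for name in names)


# Shortened stand-ins for the components defined in this section
components = {
    "persona": "You are an expert in Large Language models.\n",
    "instruction": "Summarize the key findings of the paper provided.\n",
    "tone": "The tone should be professional and clear.\n",
    "data": "Text to summarize: MY TEXT TO SUMMARIZE",
}

full_query = build_query(components)
no_persona = build_query(components, include=["instruction", "tone", "data"])
print(no_persona)
```

Each variant can then be wrapped in a `{"role": "user", "content": ...}` message and passed to the pipeline to compare outputs side by side.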

## In-Context Learning: Providing Examples

In [20]:
# Use a single example of using the made-up word in a sentence
one_shot_prompt = [
    {
        "role": "user",
        # First turn: define a made-up word and ask for an example sentence
        "content": "A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:"
    },
    {
        # The assistant turn supplies the example answer (the "one shot")
        "role": "assistant",
        "content": "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home."
    },
    {
        "role": "user",
        # Final turn: the new made-up word we want the model to use
        "content": "To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:"
    }
]
prompt = tokenizer.apply_chat_template(one_shot_prompt, tokenize=False)
prompt

"<|user|>\nA 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:<|end|>\n<|assistant|>\nI have a Gigamuru that my uncle gave me as a gift. I love to play it at home.<|end|>\n<|user|>\nTo 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:<|end|>\n<|endoftext|>"

In [21]:
# Generate the output
outputs = pipe(one_shot_prompt)
outputs

[{'generated_text': ' During the medieval reenactment, the knight skillfully screeged the wooden target, impressing the onlookers with his prowess.'}]

In [22]:
outputs[0]

{'generated_text': ' During the medieval reenactment, the knight skillfully screeged the wooden target, impressing the onlookers with his prowess.'}

In [23]:
outputs[0]["generated_text"]

' During the medieval reenactment, the knight skillfully screeged the wooden target, impressing the onlookers with his prowess.'
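The one-shot prompt above alternates user and assistant turns by hand. For more than one example, a small helper can build the same structure from question and answer pairs. This is a sketch under that assumption; `build_few_shot_messages` is a hypothetical name, not a library function.

```python
def build_few_shot_messages(examples, final_question):
    """Turn (question, answer) pairs plus a final question into chat messages."""
    messages = []
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    # The last user turn is the one the model is expected to answer
    messages.append({"role": "user", "content": final_question})
    return messages


examples = [
    (
        "A 'Gigamuru' is a type of Japanese musical instrument. "
        "An example of a sentence that uses the word Gigamuru is:",
        "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.",
    ),
]
few_shot_prompt = build_few_shot_messages(
    examples,
    "To 'screeg' something is to swing a sword at it. "
    "An example of a sentence that uses the word screeg is:",
)
print(len(few_shot_prompt))
```

The resulting list has the same shape as `one_shot_prompt`, so it can be passed to the pipeline in the same way.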

## Chain Prompting: Breaking up the Problem


In [24]:
# Create name and slogan for a product
product_prompt = [
    {
        "role": "user",
        # Ask for a name and slogan for an LLM-powered chatbot
        "content": "Create a name and slogan for a chatbot that leverages LLMs."
    }
]
outputs = pipe(product_prompt)
outputs

[{'generated_text': ' Name: ChatSage\nSlogan: "Unleashing the power of AI to enhance your conversations."'}]

In [25]:
outputs[0]

{'generated_text': ' Name: ChatSage\nSlogan: "Unleashing the power of AI to enhance your conversations."'}

In [26]:
outputs[0]["generated_text"]

' Name: ChatSage\nSlogan: "Unleashing the power of AI to enhance your conversations."'

In [27]:
product_description = outputs[0]["generated_text"]

In [28]:
# Based on a name and slogan for a product, generate a sales pitch
sales_prompt = [
    {
        "role": "user",
        # Feed the previous output into a request for a very short sales pitch
        "content": f"Generate a very short sales pitch for the following product: '{product_description}'"
    }
]
outputs = pipe(sales_prompt)
outputs

[{'generated_text': ' Introducing ChatSage, the revolutionary AI-powered tool designed to elevate your conversations to new heights. With our cutting-edge technology, we unleash the power of AI to enhance your interactions, making every conversation more engaging, insightful, and meaningful. Experience the future of communication with ChatSage today!'}]

In [29]:
outputs[0]

{'generated_text': ' Introducing ChatSage, the revolutionary AI-powered tool designed to elevate your conversations to new heights. With our cutting-edge technology, we unleash the power of AI to enhance your interactions, making every conversation more engaging, insightful, and meaningful. Experience the future of communication with ChatSage today!'}

In [30]:
outputs[0]["generated_text"]

' Introducing ChatSage, the revolutionary AI-powered tool designed to elevate your conversations to new heights. With our cutting-edge technology, we unleash the power of AI to enhance your interactions, making every conversation more engaging, insightful, and meaningful. Experience the future of communication with ChatSage today!'
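The two steps in this section, generating a name and slogan and then feeding them into a sales-pitch prompt, can be wrapped in one function. The sketch below assumes a `generate` callable that maps a prompt string to generated text; `run_chain` and `fake_generate` are hypothetical, and the stand-in generator only echoes its input so the example runs without a model.

```python
def run_chain(generate, first_prompt, followup_template):
    """Run two prompts in sequence, inserting the first output into the second."""
    first_output = generate(first_prompt).strip()
    second_prompt = followup_template.format(previous=first_output)
    second_output = generate(second_prompt).strip()
    return first_output, second_output


def fake_generate(prompt):
    """Stand-in for a model call so the sketch runs without a GPU."""
    return f" [model output for: {prompt}]"


description, pitch = run_chain(
    fake_generate,
    "Create a name and slogan for a chatbot that leverages LLMs.",
    "Generate a very short sales pitch for the following product: '{previous}'",
)
print(pitch)
```

With the pipeline from this notebook, `generate` could be something like `lambda p: pipe([{"role": "user", "content": p}])[0]["generated_text"]`, which follows the calling pattern used in the cells above.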

In [31]:
# sales_pitch = outputs[0]["generated_text"]
# print(sales_pitch)

# Reasoning with Generative Models

## Chain-of-Thought: Think Before Answering

In [32]:
# Answering without explicit reasoning
standard_prompt = [
    # Example question
    {"role": "user", "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"},
    # Example answer, given without any reasoning
    {"role": "assistant", "content": "11"},
    # The question we actually want answered
    {"role": "user", "content": "The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"}
]
# Run generative model
outputs = pipe(standard_prompt)
outputs

[{'generated_text': ' The cafeteria started with 25 apples. They used 20 apples to make lunch, so they had 25 - 20 = 5 apples left. After buying 6 more apples, they now have 5 + 6 = 11 apples.'}]

In [33]:
outputs[0]

{'generated_text': ' The cafeteria started with 25 apples. They used 20 apples to make lunch, so they had 25 - 20 = 5 apples left. After buying 6 more apples, they now have 5 + 6 = 11 apples.'}

In [34]:
outputs[0]["generated_text"]

' The cafeteria started with 25 apples. They used 20 apples to make lunch, so they had 25 - 20 = 5 apples left. After buying 6 more apples, they now have 5 + 6 = 11 apples.'

In [35]:
# Answering with chain-of-thought
cot_prompt = [
    # Example question
    {"role": "user", "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"},
    # Example answer that spells out the reasoning before the final result
    {"role": "assistant", "content": "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11."},
    # The question we actually want answered
    {"role": "user", "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"}
]
# Generate the output
outputs = pipe(cot_prompt)
outputs

[{'generated_text': ' The cafeteria started with 23 apples. They used 20 apples for lunch, so they had 23 - 20 = 3 apples left. After buying 6 more apples, they now have 3 + 6 = 9 apples. The answer is 9.'}]

In [36]:
outputs[0]

{'generated_text': ' The cafeteria started with 23 apples. They used 20 apples for lunch, so they had 23 - 20 = 3 apples left. After buying 6 more apples, they now have 3 + 6 = 9 apples. The answer is 9.'}

In [37]:
outputs[0]["generated_text"]

' The cafeteria started with 23 apples. They used 20 apples for lunch, so they had 23 - 20 = 3 apples left. After buying 6 more apples, they now have 3 + 6 = 9 apples. The answer is 9.'
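Because the chain-of-thought example ends with "The answer is 11.", the model tends to close its own response the same way, which makes the final number easy to pull out. The sketch below shows one possible way to do that with a regular expression. `extract_final_answer` is a hypothetical helper and assumes the response contains that exact phrase.

```python
import re


def extract_final_answer(generated_text):
    """Return the number following 'The answer is', or None if it is absent."""
    match = re.search(r"The answer is\s+(-?\d+(?:\.\d+)?)", generated_text)
    return match.group(1) if match else None


response = (
    " The cafeteria started with 23 apples. They used 20 apples for lunch, "
    "so they had 23 - 20 = 3 apples left. After buying 6 more apples, "
    "they now have 3 + 6 = 9 apples. The answer is 9."
)
answer = extract_final_answer(response)
print(answer)  # 9

# Compare against the arithmetic worked out by hand
print(int(answer) == 23 - 20 + 6)  # True
```

Responses that do not follow the example's closing phrase return `None`, which is a useful signal that the format was not followed.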

## Zero-shot Chain-of-Thought

In [38]:
# Zero-shot Chain-of-Thought
zeroshot_cot_prompt = [
    # Append "Let's think step-by-step." to trigger reasoning without giving any examples
    {"role": "user", "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step-by-step."}
]

# Generate the output
outputs = pipe(zeroshot_cot_prompt)
outputs

[{'generated_text': ' Step 1: Start with the initial number of apples in the cafeteria, which is 23.\n\nStep 2: Subtract the number of apples used to make lunch, which is 20.\n23 - 20 = 3 apples remaining.\n\nStep 3: Add the number of apples bought, which is 6.\n3 + 6 = 9 apples.\n\nSo, the cafeteria now has 9 apples.'}]

In [39]:
outputs[0]

{'generated_text': ' Step 1: Start with the initial number of apples in the cafeteria, which is 23.\n\nStep 2: Subtract the number of apples used to make lunch, which is 20.\n23 - 20 = 3 apples remaining.\n\nStep 3: Add the number of apples bought, which is 6.\n3 + 6 = 9 apples.\n\nSo, the cafeteria now has 9 apples.'}

In [40]:
outputs[0]["generated_text"]

' Step 1: Start with the initial number of apples in the cafeteria, which is 23.\n\nStep 2: Subtract the number of apples used to make lunch, which is 20.\n23 - 20 = 3 apples remaining.\n\nStep 3: Add the number of apples bought, which is 6.\n3 + 6 = 9 apples.\n\nSo, the cafeteria now has 9 apples.'

## Tree-of-Thought: Exploring Intermediate Steps

In [41]:
# Zero-shot Tree-of-Thought
zeroshot_tot_prompt = [
    # Ask the model to simulate three experts who reason one step at a time, drop out when wrong, and discuss the results
    {"role": "user", "content": "Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is 'The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?' Make sure to discuss the results."}
]

In [42]:
# Generate the output
outputs = pipe(zeroshot_tot_prompt)
outputs

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


[{'generated_text': ' Expert 1:\nStep 1: Start with the initial number of apples, which is 23.\n\nExpert 2:\nStep 1: Subtract the number of apples used for lunch, which is 20.\nStep 2: Add the number of apples bought, which is 6.\n\nExpert 3:\nStep 1: Start with the initial number of apples, which is 23.\nStep 2: Subtract the number of apples used for lunch, which is 20.\nStep 3: Add the number of apples bought, which is 6.\n\nResults:\nAll three experts arrived at the same answer:\n\nExpert 1: 23 - 20 + 6 = 9 apples\nExpert 2: (23 - 20) + 6 = 9 apples\nExpert 3: (23 - 20) + 6 = 9 apples\n\nAll three experts agree that the cafeteria has 9 apples left.'}]

In [43]:
outputs[0]

{'generated_text': ' Expert 1:\nStep 1: Start with the initial number of apples, which is 23.\n\nExpert 2:\nStep 1: Subtract the number of apples used for lunch, which is 20.\nStep 2: Add the number of apples bought, which is 6.\n\nExpert 3:\nStep 1: Start with the initial number of apples, which is 23.\nStep 2: Subtract the number of apples used for lunch, which is 20.\nStep 3: Add the number of apples bought, which is 6.\n\nResults:\nAll three experts arrived at the same answer:\n\nExpert 1: 23 - 20 + 6 = 9 apples\nExpert 2: (23 - 20) + 6 = 9 apples\nExpert 3: (23 - 20) + 6 = 9 apples\n\nAll three experts agree that the cafeteria has 9 apples left.'}

In [44]:
outputs[0]["generated_text"]

' Expert 1:\nStep 1: Start with the initial number of apples, which is 23.\n\nExpert 2:\nStep 1: Subtract the number of apples used for lunch, which is 20.\nStep 2: Add the number of apples bought, which is 6.\n\nExpert 3:\nStep 1: Start with the initial number of apples, which is 23.\nStep 2: Subtract the number of apples used for lunch, which is 20.\nStep 3: Add the number of apples bought, which is 6.\n\nResults:\nAll three experts arrived at the same answer:\n\nExpert 1: 23 - 20 + 6 = 9 apples\nExpert 2: (23 - 20) + 6 = 9 apples\nExpert 3: (23 - 20) + 6 = 9 apples\n\nAll three experts agree that the cafeteria has 9 apples left.'
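The tree-of-thought prompt above hard-codes the question inside the instructions. To reuse the same wording for other questions, the text can be turned into a template. This is a small sketch; `TOT_TEMPLATE` and `build_tot_prompt` are hypothetical names, and the wording is copied from the prompt used in this section.

```python
TOT_TEMPLATE = (
    "Imagine three different experts are answering this question. "
    "All experts will write down 1 step of their thinking, then share it with the group. "
    "Then all experts will go on to the next step, etc. "
    "If any expert realises they're wrong at any point then they leave. "
    "The question is '{question}' Make sure to discuss the results."
)


def build_tot_prompt(question):
    """Wrap a question in the tree-of-thought instructions as a chat message list."""
    return [{"role": "user", "content": TOT_TEMPLATE.format(question=question)}]


tot_prompt = build_tot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch "
    "and bought 6 more, how many apples do they have?"
)
print(tot_prompt[0]["content"])
```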

# Output Verification

## Providing Examples

In [45]:
# Zero-shot learning: Providing no examples
zeroshot_prompt = [
    # Ask for an RPG character profile in JSON format, without showing the expected structure
    {"role": "user", "content": "Create a character profile for an RPG game in JSON format."}
]

# Generate the output
outputs = pipe(zeroshot_prompt)
outputs

[{'generated_text': ' ```json\n{\n  "name": "Eldrin the Wise",\n  "race": "Elf",\n  "class": "Wizard",\n  "level": 10,\n  "alignment": "Chaotic Good",\n  "strength": 8,\n  "dexterity": 14,\n  "constitution": 12,\n  "intelligence": 18,\n  "wisdom": 16,\n  "charisma": 10,\n  "weapon_skill": "Magic",\n  "armor_skill": "Light",\n  "spell_slots": {\n    "cantrips": ["Mage Hand", "Detect Magic", "Mage Armor", "Prestidigitation", "Identify", "Invisibility"],\n    "1st level": ["Fireball", "Magic Missile", "Shield", "Cure Wounds", "Detect Thoughts", "Charm Person"],\n    "2nd level": ["Light", "Hold Person", "Sleep", "Committee", "Enlarge Person", "Teleport"],\n    "3rd level": ["Frostbite", "Fog Cloud", "Disintegrate", "Dimension Door", "Mirror Image", "Misty Step"]\n  },\n  "equipment": {\n    "weapon": "Staff of the Ancients",\n    "armor": "Leather Armor",\n    "accessories": ["Staff of Power", "Ring of Protection", "Boots of Speed"]\n  },\n  "background": "Adept",\n  "personality": "Curio

In [46]:
outputs[0]

{'generated_text': ' ```json\n{\n  "name": "Eldrin the Wise",\n  "race": "Elf",\n  "class": "Wizard",\n  "level": 10,\n  "alignment": "Chaotic Good",\n  "strength": 8,\n  "dexterity": 14,\n  "constitution": 12,\n  "intelligence": 18,\n  "wisdom": 16,\n  "charisma": 10,\n  "weapon_skill": "Magic",\n  "armor_skill": "Light",\n  "spell_slots": {\n    "cantrips": ["Mage Hand", "Detect Magic", "Mage Armor", "Prestidigitation", "Identify", "Invisibility"],\n    "1st level": ["Fireball", "Magic Missile", "Shield", "Cure Wounds", "Detect Thoughts", "Charm Person"],\n    "2nd level": ["Light", "Hold Person", "Sleep", "Committee", "Enlarge Person", "Teleport"],\n    "3rd level": ["Frostbite", "Fog Cloud", "Disintegrate", "Dimension Door", "Mirror Image", "Misty Step"]\n  },\n  "equipment": {\n    "weapon": "Staff of the Ancients",\n    "armor": "Leather Armor",\n    "accessories": ["Staff of Power", "Ring of Protection", "Boots of Speed"]\n  },\n  "background": "Adept",\n  "personality": "Curiou

In [47]:
outputs[0]["generated_text"]

' ```json\n{\n  "name": "Eldrin the Wise",\n  "race": "Elf",\n  "class": "Wizard",\n  "level": 10,\n  "alignment": "Chaotic Good",\n  "strength": 8,\n  "dexterity": 14,\n  "constitution": 12,\n  "intelligence": 18,\n  "wisdom": 16,\n  "charisma": 10,\n  "weapon_skill": "Magic",\n  "armor_skill": "Light",\n  "spell_slots": {\n    "cantrips": ["Mage Hand", "Detect Magic", "Mage Armor", "Prestidigitation", "Identify", "Invisibility"],\n    "1st level": ["Fireball", "Magic Missile", "Shield", "Cure Wounds", "Detect Thoughts", "Charm Person"],\n    "2nd level": ["Light", "Hold Person", "Sleep", "Committee", "Enlarge Person", "Teleport"],\n    "3rd level": ["Frostbite", "Fog Cloud", "Disintegrate", "Dimension Door", "Mirror Image", "Misty Step"]\n  },\n  "equipment": {\n    "weapon": "Staff of the Ancients",\n    "armor": "Leather Armor",\n    "accessories": ["Staff of Power", "Ring of Protection", "Boots of Speed"]\n  },\n  "background": "Adept",\n  "personality": "Curious and inventive, El
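The zero-shot response wraps its JSON in a Markdown code fence, so it cannot be passed to `json.loads` directly. The following is a minimal sketch of one way to strip the fence first. The helper name `extract_json_block` and the shortened `sample` string are assumptions made for illustration, not part of the original notebook.

```python
import json
import re

# Three backticks, built indirectly so the snippet itself contains no literal fence
FENCE = "`" * 3


def extract_json_block(text: str) -> str:
    """Return the body of the first fenced block, or the stripped text if there is no fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else text.strip()


# Hypothetical, shortened response in the same shape as the zero-shot output above
sample = " " + FENCE + 'json\n{\n  "name": "Eldrin the Wise",\n  "class": "Wizard",\n  "level": 10\n}\n' + FENCE

profile = json.loads(extract_json_block(sample))
print(profile["name"])
```

If generation stops before the closing fence (for example, when the token limit is reached), the pattern does not match and `json.loads` raises `json.JSONDecodeError`, which is a useful signal that the output is incomplete.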

In [48]:
# One-shot learning: Providing an example of the output structure
# The template spells out the exact JSON layout the model should follow
one_shot_template = """Create a short character profile for an RPG game. Make sure to only use this format:

{
  "description": "A SHORT DESCRIPTION",
  "name": "THE CHARACTER'S NAME",
  "armor": "ONE PIECE OF ARMOR",
  "weapon": "ONE OR MORE WEAPONS"
}
"""
one_shot_prompt = [
    {"role": "user", "content": one_shot_template}
]

# Generate the output
outputs = pipe(one_shot_prompt)
outputs

[{'generated_text': ' {\n  "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",\n  "name": "Shadowcloak",\n  "armor": "Leather Hood",\n  "weapon": "Dagger"\n}'}]

In [49]:
outputs[0]

{'generated_text': ' {\n  "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",\n  "name": "Shadowcloak",\n  "armor": "Leather Hood",\n  "weapon": "Dagger"\n}'}

In [50]:
outputs[0]["generated_text"]

' {\n  "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",\n  "name": "Shadowcloak",\n  "armor": "Leather Hood",\n  "weapon": "Dagger"\n}'
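The one-shot prompt asks for a fixed set of keys, so the response can be checked mechanically. Below is a minimal sketch of such a check, assuming the output is bare JSON as in the cell above. The helper name `validate_profile` and the constant `EXPECTED_KEYS` are hypothetical names introduced here.

```python
import json

# Keys requested by the one-shot template
EXPECTED_KEYS = {"description", "name", "armor", "weapon"}


def validate_profile(text: str) -> dict:
    """Parse the generated text and check that it has exactly the requested keys."""
    profile = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(profile, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - set(profile)
    extra = set(profile) - EXPECTED_KEYS
    if missing or extra:
        raise ValueError(f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}")
    return profile


# The generated text from the one-shot cell above
generated = ' {\n  "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",\n  "name": "Shadowcloak",\n  "armor": "Leather Hood",\n  "weapon": "Dagger"\n}'

profile = validate_profile(generated)
print(profile["name"])
```

A check like this only verifies structure. It says nothing about whether the values make sense, which is why constrained sampling is a useful complement.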

## Grammar: Constrained Sampling

In [51]:
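# GPU (CUDA) builds; the CMake flag depends on the llama-cpp-python version:
# !CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python  # older releases
# !CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python  # newer releases
# Default build (no CUDA flags)
!pip install llama-cpp-python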

Collecting llama-cpp-python
  Using cached llama_cpp_python-0.3.7.tar.gz (66.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Using cached diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Using cached diskcache-5.6.3-py3-none-any.whl (45 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.3.7-cp311-cp311-linux_x86_64.whl size=4601121 sha256=65234ff3db97d6b0f4e076e1eddbd6856df127cc09b564b1f792adaddd363292
  Stored in directory: /root/.cache/pip/wheels/eb/82/79/ac77fcd49324b75ae6aa18e63a87cf9da4371a57e2cdc8dc03
Successfully built llama-cpp-python
Installing collected packages: diskcache, llama-cpp-python
S

In [52]:
import gc
import torch

# Drop the earlier model, tokenizer, and pipeline so their memory can be reclaimed
del model, tokenizer, pipe

# Flush memory
gc.collect()
torch.cuda.empty_cache()

In [53]:
from llama_cpp.llama import Llama

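# Load Phi-3
llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="*fp16.gguf",  # glob pattern matching the fp16 GGUF file in the repo
    n_gpu_layers=-1,  # offload all layers to the GPU (requires a GPU-enabled build)
    n_ctx=2048,  # context window in tokens; the model was trained with 4096
    verbose=False
)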

Phi-3-mini-4k-instruct-fp16.gguf:   0%|          | 0.00/7.64G [00:00<?, ?B/s]

llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


In [54]:
# Generate output
output = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Create a warrior for an RPG in JSON format."},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)
output

{'id': 'chatcmpl-ad531da9-7171-498d-8941-94510bc23408',
 'object': 'chat.completion',
 'created': 1740896914,
 'model': '/root/.cache/huggingface/hub/models--microsoft--Phi-3-mini-4k-instruct-gguf/snapshots/999f761fe19e26cf1a339a5ec5f9f201301cbb83/./Phi-3-mini-4k-instruct-fp16.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': '{\n  "warrior": {\n    "name": "Eldric Stormbringer",\n    "class": "Warrior",\n    "level": 5,\n    "attributes": {\n      "strength": 18,\n      "dexterity": 10,\n      "constitution": 16,\n      "intelligence": 8,\n      "wisdom": 10,\n      "charisma": 12\n    },\n    "skills": [\n      {\n        "name": "Martial Arts",\n        "proficiency": 20,\n        "description": "Expert in hand-to-hand combat and weapon handling."\n      },\n      {\n        "name": "Shield Block",\n        "proficiency": 18,\n        "description": "Highly skilled at deflecting attacks with a shield."\n      },\n      {\n        "name": "Heavy Armo
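With `{"type": "json_object"}` the model is constrained to valid JSON but still chooses its own keys, as the nested `"warrior"` object above shows. The sketch below narrows this further, assuming the installed llama-cpp-python version accepts a `schema` entry inside `response_format` (its README describes this as JSON Schema mode). The schema `WARRIOR_SCHEMA` and the helper `build_request` are hypothetical and only assemble the arguments. The actual call is left as a comment because it needs the loaded model.

```python
# Hypothetical schema for illustration; the field names are assumptions
WARRIOR_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "level": {"type": "integer"},
        "weapon": {"type": "string"},
    },
    "required": ["name", "level", "weapon"],
}


def build_request(prompt: str, schema: dict) -> dict:
    """Assemble keyword arguments for llm.create_chat_completion."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: this llama-cpp-python version honors the "schema" key
        "response_format": {"type": "json_object", "schema": schema},
        "temperature": 0,
    }


request = build_request("Create a warrior for an RPG in JSON format.", WARRIOR_SCHEMA)

# With the model loaded as above, the call would be:
# output = llm.create_chat_completion(**request)
print(request["response_format"]["type"])
```

Keeping the request as a plain dictionary makes it easy to inspect or log the exact constraints before spending time on generation.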

In [55]:
output['choices']

[{'index': 0,
  'message': {'role': 'assistant',
   'content': '{\n  "warrior": {\n    "name": "Eldric Stormbringer",\n    "class": "Warrior",\n    "level": 5,\n    "attributes": {\n      "strength": 18,\n      "dexterity": 10,\n      "constitution": 16,\n      "intelligence": 8,\n      "wisdom": 10,\n      "charisma": 12\n    },\n    "skills": [\n      {\n        "name": "Martial Arts",\n        "proficiency": 20,\n        "description": "Expert in hand-to-hand combat and weapon handling."\n      },\n      {\n        "name": "Shield Block",\n        "proficiency": 18,\n        "description": "Highly skilled at deflecting attacks with a shield."\n      },\n      {\n        "name": "Heavy Armor",\n        "proficiency": 16,\n        "description": "Expertly equipped with heavy armor for protection."\n      },\n      {\n        "name": "Survival",\n        "proficiency": 14,\n        "description": "Adept at finding food, water, and shelter in the wilderness."\n      }\n    ],\n    "equi

In [56]:
output['choices'][0]

{'index': 0,
 'message': {'role': 'assistant',
  'content': '{\n  "warrior": {\n    "name": "Eldric Stormbringer",\n    "class": "Warrior",\n    "level": 5,\n    "attributes": {\n      "strength": 18,\n      "dexterity": 10,\n      "constitution": 16,\n      "intelligence": 8,\n      "wisdom": 10,\n      "charisma": 12\n    },\n    "skills": [\n      {\n        "name": "Martial Arts",\n        "proficiency": 20,\n        "description": "Expert in hand-to-hand combat and weapon handling."\n      },\n      {\n        "name": "Shield Block",\n        "proficiency": 18,\n        "description": "Highly skilled at deflecting attacks with a shield."\n      },\n      {\n        "name": "Heavy Armor",\n        "proficiency": 16,\n        "description": "Expertly equipped with heavy armor for protection."\n      },\n      {\n        "name": "Survival",\n        "proficiency": 14,\n        "description": "Adept at finding food, water, and shelter in the wilderness."\n      }\n    ],\n    "equipme

In [57]:
output['choices'][0]['message']

{'role': 'assistant',
 'content': '{\n  "warrior": {\n    "name": "Eldric Stormbringer",\n    "class": "Warrior",\n    "level": 5,\n    "attributes": {\n      "strength": 18,\n      "dexterity": 10,\n      "constitution": 16,\n      "intelligence": 8,\n      "wisdom": 10,\n      "charisma": 12\n    },\n    "skills": [\n      {\n        "name": "Martial Arts",\n        "proficiency": 20,\n        "description": "Expert in hand-to-hand combat and weapon handling."\n      },\n      {\n        "name": "Shield Block",\n        "proficiency": 18,\n        "description": "Highly skilled at deflecting attacks with a shield."\n      },\n      {\n        "name": "Heavy Armor",\n        "proficiency": 16,\n        "description": "Expertly equipped with heavy armor for protection."\n      },\n      {\n        "name": "Survival",\n        "proficiency": 14,\n        "description": "Adept at finding food, water, and shelter in the wilderness."\n      }\n    ],\n    "equipment": [\n      {\n        "

In [58]:
output['choices'][0]['message']["content"]

'{\n  "warrior": {\n    "name": "Eldric Stormbringer",\n    "class": "Warrior",\n    "level": 5,\n    "attributes": {\n      "strength": 18,\n      "dexterity": 10,\n      "constitution": 16,\n      "intelligence": 8,\n      "wisdom": 10,\n      "charisma": 12\n    },\n    "skills": [\n      {\n        "name": "Martial Arts",\n        "proficiency": 20,\n        "description": "Expert in hand-to-hand combat and weapon handling."\n      },\n      {\n        "name": "Shield Block",\n        "proficiency": 18,\n        "description": "Highly skilled at deflecting attacks with a shield."\n      },\n      {\n        "name": "Heavy Armor",\n        "proficiency": 16,\n        "description": "Expertly equipped with heavy armor for protection."\n      },\n      {\n        "name": "Survival",\n        "proficiency": 14,\n        "description": "Adept at finding food, water, and shelter in the wilderness."\n      }\n    ],\n    "equipment": [\n      {\n        "name": "Iron Sword",\n        "typ

In [59]:
output = output['choices'][0]['message']["content"]

In [60]:
import json

# Parse the string to confirm it is valid JSON, then pretty-print it
json_output = json.dumps(json.loads(output), indent=4)
print(json_output)

{
    "warrior": {
        "name": "Eldric Stormbringer",
        "class": "Warrior",
        "level": 5,
        "attributes": {
            "strength": 18,
            "dexterity": 10,
            "constitution": 16,
            "intelligence": 8,
            "wisdom": 10,
            "charisma": 12
        },
        "skills": [
            {
                "name": "Martial Arts",
                "proficiency": 20,
                "description": "Expert in hand-to-hand combat and weapon handling."
            },
            {
                "name": "Shield Block",
                "proficiency": 18,
                "description": "Highly skilled at deflecting attacks with a shield."
            },
            {
                "name": "Heavy Armor",
                "proficiency": 16,
                "description": "Expertly equipped with heavy armor for protection."
            },
            {
                "name": "Survival",
                "proficiency": 14,
                "

In [61]:
!pip freeze > colab_requirements.txt