<h1>Chapter 6 - Prompt Engineering</h1>
<i>Methods for improving the output through prompt engineering.</i>

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>
<a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>
<a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter06/Chapter%206%20-%20Prompt%20Engineering.ipynb)

---

This notebook is for Chapter 6 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following codeblock to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---


In [None]:
# %%capture
# !pip install "langchain>=0.1.17" "openai>=1.13.3" "langchain_openai>=0.1.6" "transformers>=4.40.1" "datasets>=2.18.0" "accelerate>=0.27.2" "sentence-transformers>=2.5.1" "duckduckgo-search>=5.2.2"
# !CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

The main goal of Chapter 6 is to explore how to use generative models, particularly GPT-style decoder-only models, and how prompt engineering can guide and control the text they generate.

## Loading our model

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=False,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Create a pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False,
)

Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
    https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [2]:
# Prompt
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate the output
output = pipe(messages)
print(output[0]["generated_text"])

You are not running the flash-attention implementation, expect numerical differences.


 Why did the chicken join the band? Because it had the drumsticks!


In [3]:
# Apply prompt template
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)  # Apply the tokenizer's chat template
print(prompt)

<|user|>
Create a funny joke about chickens.<|end|>
<|endoftext|>
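
To make the printed layout easier to read, here is a minimal sketch that rebuilds the same string by hand. The function name `render_phi3_style` is hypothetical, and the sketch only mimics the output shown above; it is not the tokenizer's actual template, which may differ in details.

```python
def render_phi3_style(messages):
    """Rebuild the chat layout printed above (illustrative, not the real template)."""
    parts = []
    for message in messages:
        # Each turn: a role tag on its own line, then the content closed by <|end|>
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # The printed output finishes with the end-of-text token
    parts.append("<|endoftext|>")
    return "".join(parts)


example = [{"role": "user", "content": "Create a funny joke about chickens."}]
rendered = render_phi3_style(example)
print(rendered)
```

In practice `apply_chat_template` should be preferred, since each model ships its own template.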


In [4]:
# Using a high temperature
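output = pipe(messages, do_sample=True, temperature=1)  # Generate with sampling enabled at a high temperature
# temperature controls how random or creative the output is; a high value (1) lets the model pick more low-probability tokens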
print(output[0]["generated_text"])

 Why did the chicken win an award at the animal Olympics?
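
The effect of temperature can be sketched on a toy distribution. This is an illustration with made-up logits, not the pipeline's internal code: dividing the logits by the temperature before the softmax sharpens the distribution when the temperature is low and flattens it when the temperature is high.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(toy_logits, 0.2)
neutral = softmax_with_temperature(toy_logits, 1.0)
flat = softmax_with_temperature(toy_logits, 2.0)

for name, probs in [("T=0.2", sharp), ("T=1.0", neutral), ("T=2.0", flat)]:
    print(name, [round(p, 3) for p in probs])
```

The top token keeps most of the probability mass at a low temperature, while a higher temperature gives the less likely tokens a better chance of being sampled.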


In [5]:
# Using a high top_p
output = pipe(messages, do_sample=True, top_p=1)  # Generate with nucleus sampling enabled
# top_p (nucleus sampling) restricts the LLM to the smallest set of tokens whose cumulative probability reaches top_p; 1 keeps every token
print(output[0]["generated_text"])

 Why do chickens always cross the road? To get to the other side where the corn is in a different position...
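
Nucleus sampling can be sketched in a few lines. The probabilities below are invented for illustration and the helper `top_p_filter` is hypothetical; it keeps the most likely tokens until their cumulative probability reaches `top_p`, then renormalizes.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest high-probability set whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    # Renormalize so the surviving tokens form a valid distribution
    return {token: p / total for token, p in kept}


toy_probs = {"road": 0.5, "band": 0.3, "farm": 0.15, "moon": 0.05}
nucleus = top_p_filter(toy_probs, 0.75)    # only the head of the distribution
everything = top_p_filter(toy_probs, 1.0)  # top_p=1 keeps every token
print(nucleus)
print(everything)
```

With `top_p=1`, as in the cell above, no token is filtered out, so sampling draws from the full distribution.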


# **Intro to Prompt Engineering**


## The Basic Ingredients of a Prompt


# **Advanced Prompt Engineering**


## Complex Prompt

In [6]:
# Text to summarize which we stole from https://jalammar.github.io/illustrated-transformer/ ;)
text = """In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. It is in fact Google Cloud’s recommendation to use The Transformer as a reference model to use their Cloud TPU offering. So let’s try to break the model apart and look at how it functions.
The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as a part of the Tensor2Tensor package. Harvard’s NLP group created a guide annotating the paper with PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make it easier to understand to people without in-depth knowledge of the subject matter.
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.
Popping open that Optimus Prime goodness, we see an encoding component, a decoding component, and connections between them.
The encoding component is a stack of encoders (the paper stacks six of them on top of each other – there’s nothing magical about the number six, one can definitely experiment with other arrangements). The decoding component is a stack of decoders of the same number.
The encoders are all identical in structure (yet they do not share weights). Each one is broken down into two sub-layers:
The encoder’s inputs first flow through a self-attention layer – a layer that helps the encoder look at other words in the input sentence as it encodes a specific word. We’ll look closer at self-attention later in the post.
The outputs of the self-attention layer are fed to a feed-forward neural network. The exact same feed-forward network is independently applied to each position.
The decoder has both those layers, but between them is an attention layer that helps the decoder focus on relevant parts of the input sentence (similar what attention does in seq2seq models).
Now that we’ve seen the major components of the model, let’s start to look at the various vectors/tensors and how they flow between these components to turn the input of a trained model into an output.
As is the case in NLP applications in general, we begin by turning each input word into a vector using an embedding algorithm.
Each word is embedded into a vector of size 512. We'll represent those vectors with these simple boxes.
The embedding only happens in the bottom-most encoder. The abstraction that is common to all the encoders is that they receive a list of vectors each of the size 512 – In the bottom encoder that would be the word embeddings, but in other encoders, it would be the output of the encoder that’s directly below. The size of this list is hyperparameter we can set – basically it would be the length of the longest sentence in our training dataset.
After embedding the words in our input sequence, each of them flows through each of the two layers of the encoder.
Here we begin to see one key property of the Transformer, which is that the word in each position flows through its own path in the encoder. There are dependencies between these paths in the self-attention layer. The feed-forward layer does not have those dependencies, however, and thus the various paths can be executed in parallel while flowing through the feed-forward layer.
Next, we’ll switch up the example to a shorter sentence and we’ll look at what happens in each sub-layer of the encoder.
Now We’re Encoding!
As we’ve mentioned already, an encoder receives a list of vectors as input. It processes this list by passing these vectors into a ‘self-attention’ layer, then into a feed-forward neural network, then sends out the output upwards to the next encoder.
"""

# Prompt components
persona = "You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\n"
instruction = "Summarize the key findings of the paper provided.\n"
context = "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\n"
data_format = "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\n"
audience = "The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\n"
tone = "The tone should be professional and clear.\n"
text = "MY TEXT TO SUMMARIZE"  # Placeholder: this overwrites the article text defined above; replace it with your own text, or remove this line to summarize the article
data = f"Text to summarize: {text}"

# The full prompt - remove and add pieces to view its impact on the generated output
query = persona + instruction + context + data_format + audience + tone + data

In [7]:
messages = [
    {"role": "user", "content": query}
]
print(tokenizer.apply_chat_template(messages, tokenize=False))

<|user|>
You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.
Summarize the key findings of the paper provided.
Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.
Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.
The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.
The tone should be professional and clear.
Text to summarize: MY TEXT TO SUMMARIZE<|end|>
<|endoftext|>


In [8]:
# Generate the output
outputs = pipe(messages)
print(outputs[0]["generated_text"])

 - The paper investigates the impact of pre-training data size on the performance of Large Language Models (LLMs).

- It compares models trained on different volumes of data, ranging from smaller datasets to those equivalent to the entirety of human-written text.

- The study finds that models trained on larger datasets generally perform better on a variety of tasks, including language understanding and generation.

- However, the improvement plateaus beyond a certain point, indicating diminishing returns on performance with increasing data size.

- The paper also discusses the challenges of training LLMs on vast datasets, such as computational costs and potential biases in the data.

- It suggests that future research should focus on optimizing data usage and exploring alternative training methodologies to improve efficiency.


The paper presents a comprehensive analysis of how the size of pre-training data affects the performance of Large Language Models. It demonstrates that while l
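
The comment in the prompt cell suggests removing and adding pieces to see their impact. One way to do that systematically is sketched below; the helper `build_query` and the shortened component strings are assumptions for illustration, not part of the book's code.

```python
def build_query(components, skip=()):
    """Concatenate prompt components in order, leaving out any named in `skip`."""
    return "".join(text for name, text in components.items() if name not in skip)


# Shortened stand-ins for the components defined in the notebook cell
components = {
    "persona": "You are an expert in Large Language models.\n",
    "instruction": "Summarize the key findings of the paper provided.\n",
    "tone": "The tone should be professional and clear.\n",
    "data": "Text to summarize: MY TEXT TO SUMMARIZE",
}

full_query = build_query(components)
no_persona = build_query(components, skip={"persona"})
print(no_persona)
```

Each variant can then be wrapped in a `{"role": "user", ...}` message and passed to `pipe` to compare the outputs side by side.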

## In-Context Learning: Providing Examples

This part of the code demonstrates one-shot prompting: the model learns the format and style of the task from the example provided.

In [None]:
# Use a single example of using the made-up word in a sentence
# Define a list containing several user/assistant turns.
# The example conveys the expected behavior or format to the model without updating its weights
one_shot_prompt = [
    {
        "role": "user",
        "content": "A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:"
    },
    {
        "role": "assistant",
        "content": "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home."
    },
    {
        "role": "user",
        "content": "To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:"
    }
]
print(tokenizer.apply_chat_template(one_shot_prompt, tokenize=False))
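# Show the full prompt text, including the example.
# The template keeps the demonstration (assistant turn) clearly separate from the actual question (user turn)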

<|user|>
A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:<|end|>
<|assistant|>
I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.<|end|>
<|user|>
To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:<|end|>
<|endoftext|>


In [12]:
# Generate the output
outputs = pipe(one_shot_prompt)
print(outputs[0]["generated_text"])

 During the medieval reenactment, the knight skillfully screeg the wooden target, impressing the onlookers with his prowess.
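
The same pattern extends from one example to several. Below is a small sketch, with a hypothetical helper name, that turns a list of (question, answer) demonstrations into the alternating user/assistant message list used above.

```python
def build_few_shot_messages(examples, question):
    """Interleave demonstration pairs as user/assistant turns, then add the real question."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


demos = [
    (
        "A 'Gigamuru' is a type of Japanese musical instrument. "
        "An example of a sentence that uses the word Gigamuru is:",
        "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.",
    ),
]
few_shot_messages = build_few_shot_messages(
    demos,
    "To 'screeg' something is to swing a sword at it. "
    "An example of a sentence that uses the word screeg is:",
)
print(len(few_shot_messages))
```

Adding more tuples to `demos` turns the one-shot prompt into a few-shot prompt without changing the rest of the code.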


## Chain Prompting: Breaking up the Problem


In [13]:
# Create name and slogan for a product
product_prompt = [
    {"role": "user", "content": "Create a name and slogan for a chatbot that leverages LLMs."}
]
outputs = pipe(product_prompt)
product_description = outputs[0]["generated_text"]
print(product_description)

 Name: ChatSage
Slogan: "Unleashing the power of AI to enhance your conversations."


In [14]:
# Based on a name and slogan for a product, generate a sales pitch
sales_prompt = [
    {"role": "user", "content": f"Generate a very short sales pitch for the following product: '{product_description}'"}
]
outputs = pipe(sales_prompt)
sales_pitch = outputs[0]["generated_text"]
print(sales_pitch)

 Introducing ChatSage, the ultimate AI companion that revolutionizes your conversations. With our cutting-edge technology, we unleash the power of AI to enhance your interactions, making every conversation more engaging and meaningful. Experience the future of communication with ChatSage today!
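
The two cells above follow a reusable pattern: the output of one call becomes part of the next prompt. A minimal sketch of that pattern follows. `run_chain` and `fake_generate` are hypothetical names, and the stub stands in for the model so the example runs without loading Phi-3.

```python
def run_chain(generate, first_prompt, follow_up_template):
    """Call `generate` twice, feeding the first output into the second prompt."""
    first_output = generate(first_prompt)
    second_prompt = follow_up_template.format(previous=first_output)
    return first_output, generate(second_prompt)


def fake_generate(prompt):
    # Stand-in for the model: echoes the prompt so the data flow is visible
    return f"[reply to: {prompt}]"


name_and_slogan, pitch = run_chain(
    fake_generate,
    "Create a name and slogan for a chatbot that leverages LLMs.",
    "Generate a very short sales pitch for the following product: '{previous}'",
)
print(pitch)
```

To use the real model, `generate` could be a small function that wraps the prompt in a user message, calls `pipe`, and returns `outputs[0]["generated_text"]`.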


# **Reasoning with Generative Models**


This part demonstrates two main reasoning techniques: chain-of-thought (CoT) and zero-shot chain-of-thought (zero-shot CoT).

## Chain-of-Thought: Think Before Answering


Chain-of-thought lets the model spread its computation over multiple tokens, which improves accuracy on complex problems such as math.

In [None]:
# # Answering without explicit reasoning
# standard_prompt = [
#     {"role": "user", "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"},
#     {"role": "assistant", "content": "11"},
#     {"role": "user", "content": "The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"}
# ]

# # Run generative model
# outputs = pipe(standard_prompt)
# print(outputs[0]["generated_text"])

 The cafeteria started with 25 apples. They used 20 apples to make lunch, so they had:

25 - 20 = 5 apples left.

Then they bought 6 more apples, so they now have:

5 + 6 = 11 apples.


In [None]:
# Answering with chain-of-thought
# Define a prompt that contains a math problem together with its step-by-step solution

cot_prompt = [
    {"role": "user", "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"},
    {"role": "assistant", "content": "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11."},
    {"role": "user", "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"}
]

# Generate the output
outputs = pipe(cot_prompt)
print(outputs[0]["generated_text"])

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


 The cafeteria started with 23 apples. They used 20 apples, so they had 23 - 20 = 3 apples left. Then they bought 6 more apples, so they now have 3 + 6 = 9 apples. The answer is 9.
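
Because the demonstration ends with "The answer is 11.", the model tends to close its own reasoning the same way, which makes the final answer easy to pull out. The sketch below assumes that closing phrase is present; the helper name is hypothetical.

```python
import re


def extract_final_answer(text):
    """Return the number after 'The answer is', or None if the phrase is missing."""
    match = re.search(r"The answer is\s+(-?\d+(?:\.\d+)?)", text)
    return match.group(1) if match else None


generated = (
    " The cafeteria started with 23 apples. They used 20 apples, so they had "
    "23 - 20 = 3 apples left. Then they bought 6 more apples, so they now have "
    "3 + 6 = 9 apples. The answer is 9."
)
final_answer = extract_final_answer(generated)
print(final_answer)
```

If the model phrases its conclusion differently, the function returns `None`, which is a useful signal to re-prompt or inspect the output manually.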


## Zero-shot Chain-of-Thought


In [None]:
# Zero-shot Chain-of-Thought
zeroshot_cot_prompt = [
    {"role": "user", "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step-by-step."}
]

# Generate the output
outputs = pipe(zeroshot_cot_prompt)
print(outputs[0]["generated_text"])

 Step 1: Start with the initial number of apples, which is 23.
Step 2: Subtract the number of apples used to make lunch, which is 20. So, 23 - 20 = 3 apples remaining.
Step 3: Add the number of apples bought, which is 6. So, 3 + 6 = 9 apples.

The cafeteria now has 9 apples.
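
Since zero-shot chain-of-thought only changes the end of the prompt, it can be applied to any existing message list. Here is a small sketch with a hypothetical helper that appends the trigger phrase to the last user turn without modifying the original list.

```python
def add_step_by_step(messages, trigger="Let's think step-by-step."):
    """Return a copy of `messages` with the trigger appended to the last user turn."""
    updated = [dict(message) for message in messages]
    for message in reversed(updated):
        if message["role"] == "user":
            message["content"] = f"{message['content'].rstrip()} {trigger}"
            break
    return updated


plain_prompt = [
    {
        "role": "user",
        "content": "The cafeteria had 23 apples. If they used 20 to make lunch "
                   "and bought 6 more, how many apples do they have?",
    }
]
cot_version = add_step_by_step(plain_prompt)
print(cot_version[0]["content"])
```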


## Tree-of-Thought: Exploring Intermediate Steps


Here, prompt engineering is used to simulate a discussion among several experts.

In [None]:
# Zero-shot Tree-of-Thought
zeroshot_tot_prompt = [
    {"role": "user", "content": "Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is 'The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?' Make sure to discuss the results."}
]

In [None]:
# Generate the output
outputs = pipe(zeroshot_tot_prompt)
print(outputs[0]["generated_text"])

 Expert 1: Step 1 - Start with the initial number of apples: 23 apples.

Expert 2: Step 1 - Subtract the apples used for lunch: 23 - 20 = 3 apples remaining.

Expert 3: Step 1 - Add the newly bought apples: 3 + 6 = 9 apples.


Expert 1: Step 2 - Confirm the final count: The cafeteria has 9 apples.

Expert 2: Step 2 - Review the calculations: 23 - 20 = 3, then 3 + 6 = 9. The calculations are correct.

Expert 3: Step 2 - Agree with the result: The cafeteria indeed has 9 apples.


All experts agree on the final count: The cafeteria has 9 apples.


# **Output Verification**

## Providing Examples

In [None]:
# Zero-shot learning: Providing no examples
zeroshot_prompt = [
    {"role": "user", "content": "Create a character profile for an RPG game in JSON format."}
]

# Generate the output
outputs = pipe(zeroshot_prompt)
print(outputs[0]["generated_text"])

 {
  "character_profile": {
    "name": "Eldrin Stormbringer",
    "race": "Human",
    "class": "Warlock",
    "level": 5,
    "alignment": "Chaotic Good",
    "attributes": {
      "strength": 8,
      "dexterity": 14,
      "constitution": 10,
      "intelligence": 12,
      "wisdom": 10,
      "charisma": 16
    },
    "skills": [
      {
        "name": "Fireball",
        "proficiency": 18
      },
      {
        "name": "Shadowstep",
        "proficiency": 16
      },
      {
        "name": "Charm Person",
        "proficiency": 14
      }
    ],
    "equipment": [
      {
        "name": "Stormbringer Staff",
        "type": "Magic Weapon",
        "damage": 15
      },
      {
        "name": "Leather Armor",
        "type": "Light Armor",
        "defense": 10
      },
      {
        "name": "Ring of Protection",
        "type": "Magic Armor",
        "defense": 5
      }
    ],
    "background": "Eldrin grew up in a small village, where he was known for his mischievous na

In [None]:
# One-shot learning: Providing an example of the output structure
one_shot_template = """Create a short character profile for an RPG game. Make sure to only use this format:

{
  "description": "A SHORT DESCRIPTION",
  "name": "THE CHARACTER'S NAME",
  "armor": "ONE PIECE OF ARMOR",
  "weapon": "ONE OR MORE WEAPONS"
}
"""
one_shot_prompt = [
    {"role": "user", "content": one_shot_template}
]

# Generate the output
outputs = pipe(one_shot_prompt)
print(outputs[0]["generated_text"])

 {
  "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",
  "name": "Lysandra Shadowstep",
  "armor": "Leather Cloak of the Night",
  "weapon": "Dagger of Whispers, Throwing Knives"
}
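
Prompting for a format does not guarantee it, so it can help to check the result programmatically. The sketch below uses only the standard library; the helper name and the strict key check are assumptions for illustration.

```python
import json

REQUIRED_KEYS = {"description", "name", "armor", "weapon"}


def validate_profile(raw_text):
    """Check that the text is a JSON object with exactly the requested keys."""
    try:
        profile = json.loads(raw_text)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON: {err.msg}"
    if not isinstance(profile, dict):
        return False, "top-level value is not an object"
    missing = REQUIRED_KEYS - profile.keys()
    extra = profile.keys() - REQUIRED_KEYS
    if missing or extra:
        return False, f"missing={sorted(missing)} extra={sorted(extra)}"
    return True, "ok"


good_output = """ {
  "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",
  "name": "Lysandra Shadowstep",
  "armor": "Leather Cloak of the Night",
  "weapon": "Dagger of Whispers, Throwing Knives"
}"""
truncated_output = '{"name": "Eldrin Stormbringer", "background": "Eldrin grew up in'

print(validate_profile(good_output))
print(validate_profile(truncated_output)[0])
```

A truncated generation, like the zero-shot output earlier in this section, fails the check because it is not complete JSON.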


## Grammar: Constrained Sampling


This part shows how constrained sampling can force the model to output a specific format (JSON) at run time.
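
The core idea behind constrained sampling can be sketched with a toy vocabulary. This is a simplified illustration with made-up scores: real grammar-based samplers work out which tokens are valid at every step, whereas here the allowed set is fixed by hand.

```python
import math


def constrained_greedy_pick(logits, allowed_tokens):
    """Pick the highest-scoring token among those the format allows."""
    masked = {
        token: (score if token in allowed_tokens else -math.inf)
        for token, score in logits.items()
    }
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("no allowed token is present in the vocabulary")
    return best


# Made-up next-token scores at the very start of a response
toy_logits = {"Sure": 3.2, "Here": 2.5, "{": 1.1, "[": 0.4}

unconstrained = max(toy_logits, key=toy_logits.get)
constrained = constrained_greedy_pick(toy_logits, allowed_tokens={"{", "["})
print(unconstrained, constrained)
```

Without the mask, the model would open with a chatty word; with it, only tokens that can start valid JSON remain candidates.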

In [None]:
# Free GPU resources

import gc
import torch
del model, tokenizer, pipe

# Flush memory
gc.collect()
torch.cuda.empty_cache()

In [None]:
from llama_cpp.llama import Llama

# Load Phi-3
llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="*fp16.gguf",
    n_gpu_layers=-1,
    n_ctx=2048,
    verbose=False
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
# Generate output
output = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Create a warrior for an RPG in JSON format."},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)['choices'][0]['message']["content"]


In [None]:
import json

# Format as json
json_output = json.dumps(json.loads(output), indent=4)
print(json_output)

{
    "name": "Eldrin Stormbringer",
    "class": "Ranger",
    "level": 5,
    "attributes": {
        "strength": 14,
        "dexterity": 18,
        "constitution": 12,
        "intelligence": 10,
        "wisdom": 13,
        "charisma": 9
    },
    "skills": {
        "archery": {
            "proficiency": 20,
            "critical_hit_chance": 5,
            "damage_range": [
                8,
                14
            ]
        },
        "stealth": {
            "proficiency": 17,
            "critical_hit_chance": 3,
            "damage_range": [
                2,
                6
            ]
        },
        "nature_magic": {
            "proficiency": 15,
            "critical_hit_chance": 4,
            "healing_range": [
                3,
                7
            ],
            "damage_range": [
                -2,
                2
            ]
        }
    },
    "equipment": {
        "weapons": [
            "Longbow",
            "Dagger"
      

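# Summary and What Comes Next
Chapter 6 explored in depth how to get the most out of generative models by precisely controlling their input (prompt engineering) and their output (sampling parameters and constrained sampling).
* Model choice: Phi-3-mini (a decoder-only model) is used for most of the text-generation demos because it balances performance against hardware requirements. In the output verification part, the GGUF version of Phi-3 is used to demonstrate constrained sampling.
* Capabilities covered: instruction following, randomness control (temperature/top_p), in-context learning (few-shot), and step-by-step reasoning (chain-of-thought).
* Why it helps: prompt engineering lets a model adapt to new task requirements, improves output accuracy, and keeps the output format stable, all without retraining the model.
* What comes next: Chapter 7 builds on this foundation with more advanced text-generation techniques and tools, such as the LangChain framework, agents (which can reason and act), and memory.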

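Zero-shot and few-shot learning are closely related, and several similar concepts are worth comparing alongside them. All of them fall under **prompt engineering** and **in-context learning** for large language models (LLMs).

The sections below explain each concept, how the concepts relate, and how they compare: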

---

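## Relationship between Zero-shot and Few-shot

Zero-shot and few-shot prompting (as well as one-shot prompting) all belong to a prompt engineering technique called **in-context learning**. What they share is that a carefully designed **prompt** **guides** the generative model to produce the desired answer or follow an instruction, **without updating or retraining the model's weights**.

They differ mainly in how many **examples** the prompt provides:

| Concept | Number of examples (in the prompt) | Purpose |
| :--- | :--- | :--- |
| **Zero-shot** | **Zero** (no examples) | The model relies only on the **instruction**, that is, the **prompt itself**, to answer. |
| **One-shot** | **One** example | A single correct example shows the model the expected **format** or **style**. |
| **Few-shot** | **Two or more** examples | Several correct examples improve the **quality, structure**, and **consistency** of the output. |

**In short:** Zero-shot, one-shot, and few-shot prompting are all forms of **in-context learning** that tell the model what goal to reach and how to reach it. One-shot prompting is the single-example counterpart of few-shot prompting, which uses two or more examples.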
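
The distinction in the table comes down to counting demonstrations. A small sketch of that counting follows; it assumes every assistant turn in the message list is a hand-written example, and the helper name is hypothetical.

```python
def prompting_regime(messages):
    """Label a chat prompt by the number of assistant-turn demonstrations it contains."""
    n_examples = sum(1 for message in messages if message["role"] == "assistant")
    if n_examples == 0:
        return "zero-shot"
    if n_examples == 1:
        return "one-shot"
    return "few-shot"


zero = [{"role": "user", "content": "Create a funny joke about chickens."}]
one = [
    {"role": "user", "content": "Use 'Gigamuru' in a sentence."},
    {"role": "assistant", "content": "I love to play my Gigamuru at home."},
    {"role": "user", "content": "Use 'screeg' in a sentence."},
]
print(prompting_regime(zero), prompting_regime(one))
```

This heuristic does not see examples embedded in the text of a user message, such as a format template, so those prompts would need to be labeled by hand.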

---

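## Comparable Concepts Related to Zero-shot and Few-shot

Several concepts are closely related to zero-shot and few-shot prompting and help distinguish what an LLM can do on different tasks: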

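### 1. In-Context Learning

In-context learning is an **umbrella concept** that describes guiding the model's expected behavior by providing examples in the input prompt.

*   **Relationship:** Zero-shot, one-shot, and few-shot prompting are all ways of practicing **in-context learning**. Its advantage is that the expected behavior or format can be conveyed to the model **without updating its weights**.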

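### 2. Instruction-Based Prompting

Instruction-based prompting is a broader category that focuses on having the LLM follow a specific **instruction** to solve a problem or complete a task.

*   **Relationship:** Zero-shot and few-shot prompting are both ways of carrying out **instruction-based prompting**. For example, asking the model to "classify the sentence as positive or negative" is instruction-based prompting whether or not examples are included.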

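### 3. Chain-of-Thought (CoT)

Chain-of-thought is an advanced prompting technique that has the model show its **step-by-step reasoning** before giving the final answer. This mimics the human "System 2 thinking" mode, a conscious, slow, and logical process.

*   **Zero-shot CoT:** This combines the zero-shot idea with chain-of-thought. It **needs no examples**; a sentence such as "**Let's think step-by-step.**" at the end of the prompt is enough to lead the model to produce its reasoning.
    *   **Compared with zero-shot prompting:** Zero-shot prompting relies on the instruction alone to get an answer, whereas zero-shot CoT adds a **reasoning-trigger phrase** to the instruction to improve accuracy on complex problems such as math.
*   **Few-shot CoT:** The prompt provides examples that include the **step-by-step reasoning**, teaching the model how to work through the logic.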

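### 4. Supervised Classification

Supervised classification is the traditional machine learning approach: the model learns a specific task (for example, text classification) from a **large amount of labeled training data**, usually through **fine-tuning**.

*   **Compared with zero-shot/few-shot:**
    *   **Goal:** Supervised classification aims for the best performance by changing the model's weights, whereas zero-shot and few-shot approaches aim to draw on the model's general knowledge through the prompt **without changing its weights**.
    *   **Data requirements:** Supervised classification needs a large amount of labeled data; few-shot classification needs only **a small amount of labeled data**, and zero-shot classification needs **no labeled data at all**.
    *   **Application:** When compute or labeled data is costly, zero-shot and few-shot techniques offer the **flexibility** to solve a task without expensive fine-tuning.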

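### 5. Few-shot Classification Framework (SetFit)

SetFit is a **framework** for few-shot text classification. Although it works under the "few-shot" data constraint, its underlying implementation involves **fine-tuning** the model.

*   **Compared with few-shot prompting:**
    *   **Few-shot prompting** is an **inference-time** technique that guides generation only through the examples in the prompt.
    *   **SetFit** is a **training** framework. It uses a small amount of labeled data to generate positive/negative sample pairs and then fine-tunes a pretrained embedding model with **contrastive learning** to optimize it for the classification task. SetFit performs well with few examples (for instance, reaching an F1 score of 0.85 with only 32 labeled documents), but it does change the model's weights. SetFit also supports zero-shot classification by generating synthetic examples from the label names.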
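
The pair-generation step described for SetFit can be sketched in plain Python. This is an illustration of the idea only, not the SetFit library's API: texts that share a label form a positive pair, and texts with different labels form a negative pair. The helper name and the tiny dataset are assumptions.

```python
from itertools import combinations


def make_contrastive_pairs(labeled_texts):
    """Build (text_a, text_b, target) triples: 1 for same label, 0 for different labels."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(labeled_texts, 2):
        pairs.append((text_a, text_b, 1 if label_a == label_b else 0))
    return pairs


few_labeled = [
    ("A wonderful, moving film.", "positive"),
    ("I loved every minute.", "positive"),
    ("Dull and far too long.", "negative"),
    ("A complete waste of time.", "negative"),
]
pairs = make_contrastive_pairs(few_labeled)
print(len(pairs), sum(target for _, _, target in pairs))
```

Four labeled texts already yield six training pairs, which is how a handful of examples can provide enough signal to fine-tune an embedding model. A real implementation would typically sample pairs rather than enumerate all of them.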