diff --git a/docs/prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md b/docs/prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md
index c43891bde..4e207758e 100644
--- a/docs/prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md
+++ b/docs/prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md
@@ -1,7 +1,101 @@
---
-title: ""
-description: ""
-keywords: ""
+description: "Active prompting is a method used to identify the most effective examples for human annotation. "
---
-[wip]
+When we have a large pool of unlabeled examples that could be used in a prompt, how should we decide which examples to manually label?
+
+Active prompting is a method used to identify the most effective examples for human annotation. The process involves four key steps:
+
+1. **Uncertainty Estimation**: Assess the uncertainty of the LLM's predictions on each possible example
+2. **Selection**: Choose the most uncertain examples for human annotation
+3. **Annotation**: Have humans label the selected examples
+4. **Inference**: Use the newly labeled data to improve the LLM's performance
+
+## Uncertainty Estimation
+
+In this step, we define an unsupervised method to measure the uncertainty of an LLM in answering a given example.
+
+!!! example "Uncertainty Estimation Example"
+
+ Let's say we ask an LLM the following query:
+ >query = "Classify the sentiment of this sentence as positive or negative: I am very excited today."
+
+ and the LLM returns:
+ >response = "positive"
+
+ The goal of uncertainty estimation is to answer: **How sure is the LLM in this response?**
+
+To do this, we query the LLM with the same example _k_ times. Then, we use the _k_ responses to determine how dissimilar they are. Three possible metrics¹ are:
+
+1. **Disagreement**: Ratio of unique responses to total responses.
+2. **Entropy**: Entropy of the empirical response distribution (higher means more uncertain).
+3. **Variance**: Spread of the numerical responses around their mean.
+
+Below is an example of uncertainty estimation for a single input example using the disagreement metric; sketches of the entropy and variance metrics follow the example.
+
+```python
+import instructor
+from pydantic import BaseModel
+from openai import OpenAI
+
+
+class Response(BaseModel):
+ height: int
+
+
+client = instructor.from_openai(OpenAI())
+
+
+def query_llm():
+ return client.chat.completions.create(
+ model="gpt-4o",
+ response_model=Response,
+ messages=[
+ {
+ "role": "user",
+ "content": "How tall is the Empire State Building in meters?",
+ }
+ ],
+ )
+
+
+def calculate_disagreement(responses):
+    unique_responses = set(responses)  # h unique answers among the k responses
+    return len(unique_responses) / len(responses)
+
+
+if __name__ == "__main__":
+ k = 5 # (1)!
+ responses = [query_llm() for _ in range(k)] # Query the LLM k times
+ for response in responses:
+ print(response)
+ #> height=443
+ #> height=443
+ #> height=443
+ #> height=443
+ #> height=381
+
+ print(
+ calculate_disagreement([response.height for response in responses])
+ ) # Calculate the uncertainty metric
+ #> 0.4
+```
+
+1. _k_ is the number of times to query the LLM with a single unlabeled example
+
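+The entropy and variance metrics could be computed along these lines. This is a minimal sketch (not taken from the paper's implementation), reusing the heights returned by the repeated queries above:
+
+```python
+import math
+from collections import Counter
+
+
+def calculate_entropy(responses):
+    # Entropy of the empirical distribution over the k responses (natural log)
+    counts = Counter(responses)
+    total = len(responses)
+    return -sum((c / total) * math.log(c / total) for c in counts.values())
+
+
+def calculate_variance(responses):
+    # Spread of the numerical responses around their mean
+    mean = sum(responses) / len(responses)
+    return sum((r - mean) ** 2 for r in responses) / len(responses)
+
+
+responses = [443, 443, 443, 443, 381]
+print(round(calculate_entropy(responses), 4))
+#> 0.5004
+print(round(calculate_variance(responses), 2))
+#> 615.04
+```
+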
+This process is then repeated for all unlabeled examples, giving an uncertainty score for each.
+
+## Selection & Annotation
+
+Once we have uncertainty scores for all examples, we can select _n_ of them to be annotated by humans. Here, we choose the _n_ examples with the highest uncertainty, as sketched below.
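+
+As a minimal sketch, assuming a hypothetical `uncertainty_by_example` mapping from each unlabeled example to the uncertainty score computed in the previous step:
+
+```python
+n = 2  # Number of examples to send for human annotation
+
+# Hypothetical uncertainty scores from the estimation step
+uncertainty_by_example = {
+    "How tall is the Empire State Building in meters?": 0.4,
+    "How tall is the Eiffel Tower in meters?": 0.2,
+    "How deep is the Mariana Trench in meters?": 0.8,
+    "How long is the Golden Gate Bridge in meters?": 0.6,
+}
+
+# Choose the n examples the LLM is most uncertain about
+to_annotate = sorted(
+    uncertainty_by_example, key=uncertainty_by_example.get, reverse=True
+)[:n]
+print(to_annotate)
+#> ['How deep is the Mariana Trench in meters?', 'How long is the Golden Gate Bridge in meters?']
+```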
+
+## Inference
+
+Now, each time the LLM is prompted, we can include the newly annotated examples as few-shot demonstrations in the prompt.
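+
+Below is a minimal sketch, reusing the `client` and `Response` model defined in the uncertainty estimation example and assuming a hypothetical, human-written `annotated_examples` list:
+
+```python
+# Hypothetical human-written annotations for the selected examples
+annotated_examples = [
+    {
+        "question": "How long is the Golden Gate Bridge in meters?",
+        "answer": "The bridge's total length is about 2,737 meters. Answer: 2737",
+    },
+]
+
+few_shot = "\n\n".join(
+    f"Q: {example['question']}\nA: {example['answer']}"
+    for example in annotated_examples
+)
+
+response = client.chat.completions.create(
+    model="gpt-4o",
+    response_model=Response,
+    messages=[
+        {
+            "role": "user",
+            "content": f"{few_shot}\n\nQ: How tall is the Empire State Building in meters?\nA:",
+        }
+    ],
+)
+print(response.height)
+```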
+
+## References
+
+1: [Active Prompting with Chain-of-Thought for Large Language Models](https://arxiv.org/abs/2302.12246)
+
+\*: [The Prompt Report: A Systematic Survey of Prompting Techniques](https://arxiv.org/abs/2406.06608)
diff --git a/docs/prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md b/docs/prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md
index c43891bde..738d42d08 100644
--- a/docs/prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md
+++ b/docs/prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md
@@ -1,7 +1,150 @@
---
-title: ""
-description: ""
-keywords: ""
+description: "Automate few-shot chain of thought to choose diverse examples"
---
-[wip]
+How can we improve the performance of few-shot CoT?
+
+While few-shot CoT is effective, it relies on manually crafted examples. Further, choosing diverse examples has been shown to reduce reasoning errors from CoT.
+
+Here, we automate the construction of CoT prompts by choosing diverse examples. Given a list of potential examples:
+
+1. **Cluster**: Cluster potential examples
+2. **Sample**: For each cluster,
+    1. Sort examples by distance from the cluster center
+    2. Select the first example that meets a predefined selection criterion
+3. **Prompt**: Incorporate the chosen questions from each cluster as examples in the LLM prompt
+
+!!! info
+
+    A sample selection criterion could be limiting the number of reasoning steps to a maximum of 5, which encourages sampling examples with simpler rationales.
+
+```python hl_lines="72 75 106"
+import instructor
+import numpy as np
+from openai import OpenAI
+from pydantic import BaseModel
+from sklearn.cluster import KMeans
+from sentence_transformers import SentenceTransformer
+
+client = instructor.from_openai(OpenAI())
+NUM_CLUSTERS = 2
+
+
+class Example(BaseModel):
+ question: str
+ reasoning_steps: list[str]
+
+
+class FinalAnswer(BaseModel):
+ reasoning_steps: list[str]
+ answer: int
+
+
+def cluster_and_sort(questions, n_clusters=NUM_CLUSTERS):
+ # Cluster
+ embeddings = SentenceTransformer('all-MiniLM-L6-v2').encode(questions)
+ kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10).fit(embeddings)
+
+ # Sort
+ sorted_clusters = [[] for _ in range(kmeans.n_clusters)]
+ for question, embedding, label in zip(questions, embeddings, kmeans.labels_):
+ center = kmeans.cluster_centers_[label]
+ distance = np.linalg.norm(embedding - center)
+ sorted_clusters[label].append((distance, question))
+ for cluster in sorted_clusters:
+ cluster.sort() # Sort by distance
+
+ return sorted_clusters
+
+
+def sample(cluster):
+    for _, question in cluster:  # Each cluster entry is a (distance, question) tuple
+ response = client.chat.completions.create(
+ model="gpt-4o",
+ response_model=Example,
+ messages=[
+ {
+ "role": "system",
+ "content": "You are an AI assistant that generates step-by-step reasoning for mathematical questions.",
+ },
+ {
+ "role": "user",
+ "content": f"Q: {question}\nA: Let's think step by step.",
+ },
+ ],
+ )
+        # If the selection criterion is satisfied, we've found our question
+        # for this cluster
+        if len(response.reasoning_steps) <= 5:
+            return response
+
+
+if __name__ == "__main__":
+ questions = [
+ "How many apples are left if you have 10 apples and eat 3?",
+ "What's the sum of 5 and 7?",
+ "If you have 15 candies and give 6 to your friend, how many do you have left?",
+ "What's 8 plus 4?",
+ "You start with 20 stickers and use 8. How many stickers remain?",
+ "Calculate 6 added to 9.",
+ ]
+
+ # Cluster and sort the questions
+ sorted_clusters = cluster_and_sort(questions)
+
+ # Sample questions that match selection criteria for each cluster
+ selected_examples = [sample(cluster) for cluster in sorted_clusters]
+ print(selected_examples)
+ """
+ [
+ Example(
+ question='If you have 15 candies and give 6 to your friend, how many do you have left?',
+ reasoning_steps=[
+ 'Start with the total number of candies you have, which is 15.',
+ 'Subtract the number of candies you give to your friend, which is 6, from the total candies.',
+ '15 - 6 = 9, so you are left with 9 candies.',
+ ],
+ ),
+ Example(
+ question="What's the sum of 5 and 7?",
+ reasoning_steps=[
+ 'Identify the numbers to be added: 5 and 7.',
+ 'Perform the addition: 5 + 7.',
+ 'The sum is 12.',
+ ],
+ ),
+ ]
+ """
+
+ # Use selected questions as examples for the LLM
+ response = client.chat.completions.create(
+ model="gpt-4o",
+ response_model=FinalAnswer,
+ messages=[
+ {
+ "role": "user",
+ "content": f"""
+ {selected_examples}
+                If there are 10 books in my bag and I read 8 of them, how many books do I have left? Let's think step by step.
+ """,
+ }
+ ],
+ )
+
+ print(response.reasoning_steps)
+ """
+ [
+ 'Start with the total number of books in the bag, which is 10.',
+ "Subtract the number of books you've read, which is 8, from the total books.",
+ '10 - 8 = 2, so you have 2 books left.',
+ ]
+ """
+ print(response.answer)
+ #> 2
+```
+
+### References
+
+1: [Automatic Chain of Thought Prompting in Large Language Models](https://arxiv.org/abs/2210.03493)
+
+\*: [The Prompt Report: A Systematic Survey of Prompting Techniques](https://arxiv.org/abs/2406.06608)
diff --git a/docs/prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md b/docs/prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md
index c43891bde..18acfc069 100644
--- a/docs/prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md
+++ b/docs/prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md
@@ -1,7 +1,149 @@
---
-title: ""
-description: ""
-keywords: ""
+description: "Step-back prompting is a two-step prompting technique that asks the LLM a step-back question to gather context for the query"
---
-[wip]
+How can we encourage an LLM to think through any high-level context required to answer a query? Step-back prompting encourages this in two steps:
+
+1. **Abstraction**: Ask the LLM a generic, higher-level question about the underlying concept. This question is generally topic-specific and is known as the _step-back question_.
+2. **Reasoning**: Ask the LLM the original question, given its answer to the step-back question. This is known as _abstraction-grounded reasoning_.
+
+!!! example "Step-Back Prompting Example"
+
+ **Original Question**: What happens to the pressure of an ideal gas when temperature and volume are increased?
+
+ **Step-Back Question**: What are the physics concepts associated with this question?
+
+ **Reasoning Prompt**: {step-back response} {original question}
+
+Note that the step-back question is also generated using an LLM query.
+
+Step-back prompting has been shown to improve scores on reasoning benchmarks for PaLM-2L and GPT-4.\*
+
+```python
+import openai
+import instructor
+from pydantic import BaseModel
+from typing import Iterable, Literal
+
+client = instructor.from_openai(openai.OpenAI())
+
+
+class Stepback(BaseModel):
+ original_question: str
+ abstract_question: str
+
+
+class Education(BaseModel):
+ degree: Literal["Bachelors", "Masters", "PhD"]
+ school: str
+ topic: str
+ year: int
+
+
+class Response(BaseModel):
+ school: str
+
+
+def generate_stepback_question():
+ return client.chat.completions.create(
+ model="gpt-4o",
+ response_model=Stepback,
+ messages=[
+ {
+ "role": "user",
+                "content": """
+ You are an expert at world knowledge. Your task is to step back
+ and paraphrase a question to a more generic step-back question,
+ which is easier to answer.
+
+ Here are a few examples:
+ Original Question: Which position did Knox Cunningham hold from
+ May 1955 to Apr 1956?
+ Step-back Question: Which positions has Knox Cunningham held in
+ his career?
+ Original Question: Who was the spouse of Anna Karina from 1968
+ to 1974?
+ Step-back Question: Who were the spouses of Anna Karina?
+ Original Question: Which team did Thierry Audel play for from
+ 2007 to 2008?
+ Step-back Question: Which teams did Thierry Audel play for in
+ his career?
+
+ Now, generate the step-back question for the following question:
+ Estella Leopold went to which school between Aug 1954 and
+ Nov 1954?
+ """,
+ },
+ ],
+ )
+
+
+def ask_stepback_question(stepback):
+ return client.chat.completions.create(
+ model="gpt-4o",
+ response_model=Iterable[Education],
+ messages=[
+ {"role": "user", "content": stepback.abstract_question},
+ ],
+ )
+
+
+def get_final_response(stepback, stepback_response):
+ return client.chat.completions.create(
+ model="gpt-4o",
+ response_model=Response,
+ messages=[
+ {
+ "role": "user",
+ "content": f"""
+                    Q: {stepback.abstract_question}
+ A: {stepback_response}
+ Q: {stepback.original_question}
+ A:
+ """,
+ },
+ ],
+ )
+
+
+if __name__ == "__main__":
+ # Generate the step-back question
+ stepback = generate_stepback_question()
+ print(stepback.original_question)
+ #> Estella Leopold went to which school between Aug 1954 and Nov 1954?
+ print(stepback.abstract_question)
+ #> Which schools did Estella Leopold attend in her life?
+
+ # Ask the step-back question
+ stepback_response = ask_stepback_question(stepback)
+ for item in stepback_response:
+ print(item)
+ """
+ degree='Bachelors'
+ school='University of Wisconsin-Madison'
+ topic='Botany'
+ year=1948
+ """
+ """
+ degree='Masters'
+ school='University of California, Berkeley'
+ topic='Botany and Paleobotany'
+ year=1950
+ """
+ """
+ degree='PhD'
+ school='Yale University'
+ topic='Botany and Paleobotany'
+ year=1955
+ """
+
+ # Ask the original question, appended with context from the stepback response
+ print(get_final_response(stepback, stepback_response))
+ #> school='Yale University'
+```
+
+### References
+
+1: [Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models](https://arxiv.org/abs/2310.06117)
+
+\*: [The Prompt Report: A Systematic Survey of Prompting Techniques](https://arxiv.org/abs/2406.06608)
diff --git a/docs/prompting/zero_shot/role_prompting.md b/docs/prompting/zero_shot/role_prompting.md
index a5632058b..141e774f7 100644
--- a/docs/prompting/zero_shot/role_prompting.md
+++ b/docs/prompting/zero_shot/role_prompting.md
@@ -33,7 +33,7 @@ def classify(support_ticket_title: str):
messages=[
{
"role": "system",
- "content": f"""You are a support agent at a tech company.
+ "content": """You are a support agent at a tech company.
You will be assigned a support ticket to classify.
Make sure to only select the label that applies to
the support ticket.""",
diff --git a/mkdocs.yml b/mkdocs.yml
index eb14c5195..538a4ab11 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -226,12 +226,12 @@ nav:
- Thought Generation:
- Chain-Of-Thought (Zero-Shot):
- Analogical Prompting: 'prompting/thought_generation/chain_of_thought_zero_shot/analogical_prompting.md'
- - Step-Back Prompting: 'prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md'
+ - Consider Higher-Level Context: 'prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md'
- Examine The Context: 'prompting/thought_generation/chain_of_thought_zero_shot/thread_of_thought.md'
- Structure The Reasoning: 'prompting/thought_generation/chain_of_thought_zero_shot/tab_cot.md'
- Chain-Of-Thought (Few-Shot):
- - Active-Prompt: 'prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md'
- - Auto-CoT: 'prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md'
+ - Prioritize Uncertain Examples: 'prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md'
+ - Automate Example Selection: 'prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md'
- Prioritize Complex Examples: 'prompting/thought_generation/chain_of_thought_few_shot/complexity_based.md'
- Include Incorrect Examples: 'prompting/thought_generation/chain_of_thought_few_shot/contrastive.md'
- Memory-of-Thought: 'prompting/thought_generation/chain_of_thought_few_shot/memory_of_thought.md'