diff --git a/docs/prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md b/docs/prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md
index c43891bde..4e207758e 100644
--- a/docs/prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md
+++ b/docs/prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md
@@ -1,7 +1,101 @@
---
-title: ""
-description: ""
-keywords: ""
+description: "Active prompting is a method used to identify the most effective examples for human annotation."
---
-[wip]

When we have a large pool of unlabeled examples that could be used in a prompt, how should we decide which examples to manually label?

Active prompting is a method used to identify the most effective examples for human annotation. The process involves four key steps:

1. **Uncertainty Estimation**: Assess the uncertainty of the LLM's predictions on each candidate example
2. **Selection**: Choose the most uncertain examples for human annotation
3. **Annotation**: Have humans label the selected examples
4. **Inference**: Use the newly labeled data to improve the LLM's performance

## Uncertainty Estimation

In this step, we define an unsupervised method to measure the uncertainty of an LLM in answering a given example.

!!! example "Uncertainty Estimation Example"

    Let's say we ask an LLM the following query:
    >query = "Classify the sentiment of this sentence as positive or negative: I am very excited today."

    and the LLM returns:
    >response = "positive"

    The goal of uncertainty estimation is to answer: **How sure is the LLM in this response?**

To estimate this, we query the LLM with the same example _k_ times. Then, we measure how dissimilar the _k_ responses are. Three possible metrics1 are:

1. **Disagreement**: Ratio of unique responses to total responses.
2. **Entropy**: Shannon entropy of the frequency distribution of the responses (higher entropy indicates higher uncertainty).
3. **Variance**: Spread of numerical responses around their mean.

Below is an example of uncertainty estimation for a single input example using the disagreement uncertainty metric.

```python
import instructor
from pydantic import BaseModel
from openai import OpenAI


class Response(BaseModel):
    height: int


client = instructor.from_openai(OpenAI())


def query_llm():
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Response,
        messages=[
            {
                "role": "user",
                "content": "How tall is the Empire State Building in meters?",
            }
        ],
    )


def calculate_disagreement(responses):
    unique_responses = set(responses)
    h = len(unique_responses)
    return h / len(responses)


if __name__ == "__main__":
    k = 5  # (1)!
    responses = [query_llm() for _ in range(k)]  # Query the LLM k times
    for response in responses:
        print(response)
        #> height=443
        #> height=443
        #> height=443
        #> height=443
        #> height=381

    print(
        calculate_disagreement([response.height for response in responses])
    )  # Calculate the uncertainty metric
    #> 0.4
```

1. _k_ is the number of times to query the LLM with a single unlabeled example

This process is then repeated for all unlabeled examples.

## Selection & Annotation

Once we have a set of examples and their uncertainties, we select the _n_ examples with the highest uncertainty to be annotated by humans.

## Inference

Now, each time the LLM is prompted, we can include the newly annotated examples, as in the sketch below.
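
To make the selection and inference steps concrete, here is a minimal sketch that assumes an uncertainty score has already been computed for every unlabeled question (for example, with `calculate_disagreement` above). The `uncertainties` values, the cutoff `n`, the sample questions, and the hand-written rationales standing in for human annotation are illustrative placeholders rather than part of the original method.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())


class Answer(BaseModel):
    chain_of_thought: str
    answer: float


# Placeholder uncertainty scores, e.g. computed with the disagreement metric above
uncertainties = {
    "If a pizza has 8 slices and you eat 3, how many slices are left?": 0.2,
    "A train travels 60 km in 1.5 hours. What is its speed in km/h?": 0.8,
    "What is 15% of 240?": 0.6,
}

# Selection: pick the n most uncertain questions for human annotation
n = 2
selected = sorted(uncertainties, key=uncertainties.get, reverse=True)[:n]

# Annotation: a human writes a rationale and answer for each selected question
# (hand-written placeholders shown here)
annotations = {
    "A train travels 60 km in 1.5 hours. What is its speed in km/h?": "Speed is distance divided by time: 60 / 1.5 = 40. The answer is 40.",
    "What is 15% of 240?": "15% as a decimal is 0.15, and 0.15 * 240 = 36. The answer is 36.",
}

# Inference: include the annotated examples as few-shot CoT examples in the prompt
few_shot = "\n\n".join(f"Q: {q}\nA: {annotations[q]}" for q in selected)

if __name__ == "__main__":
    response = client.chat.completions.create(
        model="gpt-4o",
        response_model=Answer,
        messages=[
            {
                "role": "user",
                "content": f"{few_shot}\n\nQ: A car travels 150 km in 2.5 hours. What is its speed in km/h?\nA:",
            }
        ],
    )
    print(response.answer)
    #> 60.0
```

In practice, the uncertainty scores would come from running the estimation step over the full pool of unlabeled questions, and the rationales would be written by human annotators.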

## References

1: [Active Prompting with Chain-of-Thought for Large Language Models](https://arxiv.org/abs/2302.12246)

\*: [The Prompt Report: A Systematic Survey of Prompting Techniques](https://arxiv.org/abs/2406.06608)

diff --git a/docs/prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md b/docs/prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md
index c43891bde..738d42d08 100644
--- a/docs/prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md
+++ b/docs/prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md
@@ -1,7 +1,150 @@
---
-title: ""
-description: ""
-keywords: ""
+description: "Automate few-shot chain of thought to choose diverse examples"
---
-[wip]

How can we improve the performance of few-shot CoT?

While few-shot CoT reasoning is effective, it relies on manually crafted examples. Further, choosing diverse examples has been shown to reduce reasoning errors in CoT. Here, we automate the selection of diverse examples. Given a list of potential examples:

1. **Cluster**: Cluster the potential examples
2. **Sample**: For each cluster,
    1. Sort examples by distance from the cluster center
    2. Select the first example that meets a predefined selection criterion
3. **Prompt**: Incorporate the chosen questions from each cluster as examples in the LLM prompt

!!! info

    A sample selection criterion could be limiting the number of reasoning steps to a maximum of 5, to encourage sampling examples with simpler rationales.

```python hl_lines="72 75 106"
import instructor
import numpy as np
from openai import OpenAI
from pydantic import BaseModel
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

client = instructor.from_openai(OpenAI())
NUM_CLUSTERS = 2


class Example(BaseModel):
    question: str
    reasoning_steps: list[str]


class FinalAnswer(BaseModel):
    reasoning_steps: list[str]
    answer: int


def cluster_and_sort(questions, n_clusters=NUM_CLUSTERS):
    # Cluster
    embeddings = SentenceTransformer('all-MiniLM-L6-v2').encode(questions)
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10).fit(embeddings)

    # Sort
    sorted_clusters = [[] for _ in range(kmeans.n_clusters)]
    for question, embedding, label in zip(questions, embeddings, kmeans.labels_):
        center = kmeans.cluster_centers_[label]
        distance = np.linalg.norm(embedding - center)
        sorted_clusters[label].append((distance, question))
    for cluster in sorted_clusters:
        cluster.sort()  # Sort by distance

    return sorted_clusters


def sample(cluster):
    for _, question in cluster:
        response = client.chat.completions.create(
            model="gpt-4o",
            response_model=Example,
            messages=[
                {
                    "role": "system",
                    "content": "You are an AI assistant that generates step-by-step reasoning for mathematical questions.",
                },
                {
                    "role": "user",
                    "content": f"Q: {question}\nA: Let's think step by step.",
                },
            ],
        )
        if (
            len(response.reasoning_steps) <= 5
        ):  # If we satisfy the selection criterion, we've found our question for this cluster
            return response


if __name__ == "__main__":
    questions = [
        "How many apples are left if you have 10 apples and eat 3?",
        "What's the sum of 5 and 7?",
        "If you have 15 candies and give 6 to your friend, how many do you have left?",
        "What's 8 plus 4?",
        "You start with 20 stickers and use 8. How many stickers remain?",
        "Calculate 6 added to 9.",
    ]

    # Cluster and sort the questions
    sorted_clusters = cluster_and_sort(questions)

    # Sample questions that match the selection criterion for each cluster
    selected_examples = [sample(cluster) for cluster in sorted_clusters]
    print(selected_examples)
    """
    [
        Example(
            question='If you have 15 candies and give 6 to your friend, how many do you have left?',
            reasoning_steps=[
                'Start with the total number of candies you have, which is 15.',
                'Subtract the number of candies you give to your friend, which is 6, from the total candies.',
                '15 - 6 = 9, so you are left with 9 candies.',
            ],
        ),
        Example(
            question="What's the sum of 5 and 7?",
            reasoning_steps=[
                'Identify the numbers to be added: 5 and 7.',
                'Perform the addition: 5 + 7.',
                'The sum is 12.',
            ],
        ),
    ]
    """

    # Use the selected questions as examples for the LLM
    response = client.chat.completions.create(
        model="gpt-4o",
        response_model=FinalAnswer,
        messages=[
            {
                "role": "user",
                "content": f"""
                {selected_examples}
                If there are 10 books in my bag and I read 8 of them, how many books do I have left? Let's think step by step.
                """,
            }
        ],
    )

    print(response.reasoning_steps)
    """
    [
        'Start with the total number of books in the bag, which is 10.',
        "Subtract the number of books you've read, which is 8, from the total books.",
        '10 - 8 = 2, so you have 2 books left.',
    ]
    """
    print(response.answer)
    #> 2
```

### References

1: [Automatic Chain of Thought Prompting in Large Language Models](https://arxiv.org/abs/2210.03493)

\*: [The Prompt Report: A Systematic Survey of Prompting Techniques](https://arxiv.org/abs/2406.06608)

diff --git a/docs/prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md b/docs/prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md
index c43891bde..18acfc069 100644
--- a/docs/prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md
+++ b/docs/prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md
@@ -1,7 +1,149 @@
---
-title: ""
-description: ""
-keywords: ""
+description: "Step-back prompting is a two-step prompting technique that asks the LLM a step-back question to gather context for the query"
---
-[wip]

How can we encourage an LLM to think through the high-level context needed to answer a query? Step-back prompting encourages this in two steps:

1. **Abstraction**: Ask the LLM a generic, higher-level question about the concepts behind the query. This question is generally topic-specific and is known as the _step-back question_.
2. **Reasoning**: Ask the LLM the original question, given its answer to the abstract question. This is known as _abstraction-grounded reasoning_.

!!! example "Step-Back Prompting Example"

    **Original Question**: What happens to the pressure of an ideal gas when temperature and volume are increased?

    **Step-Back Question**: What are the physics concepts associated with this question?

    **Reasoning Prompt**: {step-back response} {original question}

Note that the step-back question is also generated using an LLM query.

Step-back prompting has been shown to improve scores on reasoning benchmarks for PaLM-2L and GPT-4.\*

```python
import openai
import instructor
from pydantic import BaseModel
from typing import Iterable, Literal

client = instructor.from_openai(openai.OpenAI())


class Stepback(BaseModel):
    original_question: str
    abstract_question: str


class Education(BaseModel):
    degree: Literal["Bachelors", "Masters", "PhD"]
    school: str
    topic: str
    year: int


class Response(BaseModel):
    school: str


def generate_stepback_question():
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Stepback,
        messages=[
            {
                "role": "user",
                "content": """
                You are an expert at world knowledge. Your task is to step back
                and paraphrase a question to a more generic step-back question,
                which is easier to answer.

                Here are a few examples:
                Original Question: Which position did Knox Cunningham hold from
                May 1955 to Apr 1956?
                Step-back Question: Which positions has Knox Cunningham held in
                his career?
                Original Question: Who was the spouse of Anna Karina from 1968
                to 1974?
                Step-back Question: Who were the spouses of Anna Karina?
                Original Question: Which team did Thierry Audel play for from
                2007 to 2008?
                Step-back Question: Which teams did Thierry Audel play for in
                his career?

                Now, generate the step-back question for the following question:
                Estella Leopold went to which school between Aug 1954 and
                Nov 1954?
                """,
            },
        ],
    )


def ask_stepback_question(stepback):
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Iterable[Education],
        messages=[
            {"role": "user", "content": stepback.abstract_question},
        ],
    )


def get_final_response(stepback, stepback_response):
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Response,
        messages=[
            {
                "role": "user",
                "content": f"""
                Q: {stepback.abstract_question}
                A: {stepback_response}
                Q: {stepback.original_question}
                A:
                """,
            },
        ],
    )


if __name__ == "__main__":
    # Generate the step-back question
    stepback = generate_stepback_question()
    print(stepback.original_question)
    #> Estella Leopold went to which school between Aug 1954 and Nov 1954?
    print(stepback.abstract_question)
    #> Which schools did Estella Leopold attend in her life?

    # Ask the step-back question
    stepback_response = ask_stepback_question(stepback)
    for item in stepback_response:
        print(item)
        """
        degree='Bachelors'
        school='University of Wisconsin-Madison'
        topic='Botany'
        year=1948
        """
        """
        degree='Masters'
        school='University of California, Berkeley'
        topic='Botany and Paleobotany'
        year=1950
        """
        """
        degree='PhD'
        school='Yale University'
        topic='Botany and Paleobotany'
        year=1955
        """

    # Ask the original question with context from the step-back response
    print(get_final_response(stepback, stepback_response))
    #> school='Yale University'
```
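
The example above hard-codes the target question inside `generate_stepback_question`. As a design note, the two steps can be wrapped into a single reusable helper so the technique can be applied to arbitrary questions; the sketch below is our own illustrative refactor (the `step_back` helper, the `Answer` model, and the prompt wording are assumptions, not from the paper).

```python
import instructor
import openai
from pydantic import BaseModel

client = instructor.from_openai(openai.OpenAI())


class Stepback(BaseModel):
    original_question: str
    abstract_question: str


class Answer(BaseModel):
    answer: str


def step_back(question: str) -> Answer:
    # Abstraction: generate a step-back question for the given question
    stepback = client.chat.completions.create(
        model="gpt-4o",
        response_model=Stepback,
        messages=[
            {
                "role": "user",
                "content": f"""
                Step back and paraphrase the following question as a more
                generic question that is easier to answer.
                Original Question: {question}
                """,
            }
        ],
    )

    # Gather context by answering the step-back question
    context = client.chat.completions.create(
        model="gpt-4o",
        response_model=Answer,
        messages=[{"role": "user", "content": stepback.abstract_question}],
    )

    # Abstraction-grounded reasoning: answer the original question given the context
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Answer,
        messages=[
            {
                "role": "user",
                "content": f"""
                Q: {stepback.abstract_question}
                A: {context.answer}
                Q: {question}
                A:
                """,
            }
        ],
    )


if __name__ == "__main__":
    print(step_back("Which team did Thierry Audel play for from 2007 to 2008?").answer)
```

The flow is unchanged: abstraction, answering the step-back question, then abstraction-grounded reasoning over the original question.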

### References

1: [Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models](https://arxiv.org/abs/2310.06117)

\*: [The Prompt Report: A Systematic Survey of Prompting Techniques](https://arxiv.org/abs/2406.06608)

diff --git a/docs/prompting/zero_shot/role_prompting.md b/docs/prompting/zero_shot/role_prompting.md
index a5632058b..141e774f7 100644
--- a/docs/prompting/zero_shot/role_prompting.md
+++ b/docs/prompting/zero_shot/role_prompting.md
@@ -33,7 +33,7 @@ def classify(support_ticket_title: str):
         messages=[
             {
                 "role": "system",
-                "content": f"""You are a support agent at a tech company.
+                "content": """You are a support agent at a tech company.
                     You will be assigned a support ticket to classify.
                     Make sure to only select the label that applies to the
                     support ticket.""",

diff --git a/mkdocs.yml b/mkdocs.yml
index eb14c5195..538a4ab11 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -226,12 +226,12 @@ nav:
       - Thought Generation:
           - Chain-Of-Thought (Zero-Shot):
               - Analogical Prompting: 'prompting/thought_generation/chain_of_thought_zero_shot/analogical_prompting.md'
-              - Step-Back Prompting: 'prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md'
+              - Consider Higher-Level Context: 'prompting/thought_generation/chain_of_thought_zero_shot/step_back_prompting.md'
              - Examine The Context: 'prompting/thought_generation/chain_of_thought_zero_shot/thread_of_thought.md'
              - Structure The Reasoning: 'prompting/thought_generation/chain_of_thought_zero_shot/tab_cot.md'
          - Chain-Of-Thought (Few-Shot):
-              - Active-Prompt: 'prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md'
-              - Auto-CoT: 'prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md'
+              - Prioritize Uncertain Examples: 'prompting/thought_generation/chain_of_thought_few_shot/active_prompt.md'
+              - Automate Example Selection: 'prompting/thought_generation/chain_of_thought_few_shot/auto_cot.md'
              - Prioritize Complex Examples: 'prompting/thought_generation/chain_of_thought_few_shot/complexity_based.md'
              - Include Incorrect Examples: 'prompting/thought_generation/chain_of_thought_few_shot/contrastive.md'
              - Memory-of-Thought: 'prompting/thought_generation/chain_of_thought_few_shot/memory_of_thought.md'