### Prompt Optimization Techniques
#### Overview
This tutorial explores advanced techniques for optimizing prompts when working with large language models. We focus on two key strategies: A/B testing prompts and iterative refinement. These methods are crucial for improving the effectiveness and efficiency of AI-driven applications.

#### Motivation
As AI language models become more sophisticated, the quality of prompts used to interact with them becomes increasingly important. Optimized prompts can lead to more accurate, relevant, and useful responses, enhancing the overall performance of AI applications. This tutorial aims to equip learners with practical techniques to systematically improve their prompts.

#### Key Components
**A/B Testing Prompts**: A method to compare the effectiveness of different prompt variations.

**Iterative Refinement**: A strategy for gradually improving prompts based on feedback and results.

**Performance Metrics**: Ways to measure and compare the quality of responses from different prompts.

**Practical Implementation**: Hands-on examples using OpenAI's GPT-4o model and LangChain.

In [1]:
import os
import re

from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
import numpy as np

from dotenv import load_dotenv
load_dotenv()

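# load_dotenv() has already loaded OPENAI_API_KEY from .env; fail early if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your environment or .env file.")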

# Initialize the language model
llm = ChatOpenAI(model="gpt-4o")

# Define a helper function to generate responses
def generate_response(prompt):
    """Generate a response using the language model.

    Args:
        prompt (str): The input prompt.

    Returns:
        str: The generated response.
    """
    return llm.invoke(prompt).content

### A/B Testing Prompts

In [3]:
# Define prompt variations
prompt_a = PromptTemplate(
    input_variables=["topic"],
    template="Provide a beginner-friendly explanation of {topic}, including key concepts and an example."
)

prompt_b = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in simple terms."
)

# Define a function to evaluate response quality
def evaluate_response(response, criteria):
    """Evaluate the quality of a response based on given criteria.

    Args:
        response (str): The generated response.
        criteria (list): List of criteria to evaluate.

    Returns:
        float: The average score across all criteria.
    """
    scores = []
    for criterion in criteria:
        print(f"Evaluating response based on {criterion}...")
        prompt = f"On a scale of 1-10, rate the following response on {criterion}. Start your response with the numeric score:\n\n{response}"
        # Keep the rating in its own variable so the response being evaluated
        # is not overwritten before the next criterion is scored
        rating = generate_response(prompt)
        # Use regex to find the first number in the rating
        score_match = re.search(r'\d+', rating)
        if score_match:
            score = int(score_match.group())
            scores.append(min(score, 10))  # Ensure score is not greater than 10
        else:
            print(f"Warning: Could not extract numeric score for {criterion}. Using default score of 5.")
            scores.append(5)  # Default score if no number is found
    return np.mean(scores)

# Perform A/B test
topic = "machine learning"
response_a = generate_response(prompt_a.format(topic=topic))
response_b = generate_response(prompt_b.format(topic=topic))

criteria = ["clarity", "informativeness", "engagement"]
score_a = evaluate_response(response_a, criteria)
score_b = evaluate_response(response_b, criteria)

print(f"Prompt A score: {score_a:.2f}")
print(f"Prompt B score: {score_b:.2f}")
print(f"Winning prompt: {'A' if score_a > score_b else 'B'}")

Evaluating response based on clarity...
Evaluating response based on informativeness...
Evaluating response based on engagement...
Evaluating response based on clarity...
Evaluating response based on informativeness...
Evaluating response based on engagement...
Prompt A score: 9.00
Prompt B score: 8.33
Winning prompt: A
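
A single comparison like this one can be noisy, because both the generated responses and the LLM-assigned scores vary from run to run. The following is a minimal sketch of repeating the test and comparing mean scores. It assumes a hypothetical `score_once` callable that generates one response for a prompt and returns one score (for example, a wrapper around `generate_response` and `evaluate_response`). The names `run_ab_trials` and `min_margin` are illustrative and not part of the notebook.

```python
from statistics import mean, stdev
from typing import Callable, Dict


def run_ab_trials(
    prompts: Dict[str, object],
    score_once: Callable[[object], float],
    trials: int = 5,
    min_margin: float = 0.5,
) -> dict:
    """Score each prompt several times and compare mean scores.

    `score_once` is a hypothetical callable: it takes a prompt, produces
    one response, and returns one numeric score for that response.
    """
    if trials < 2:
        raise ValueError("trials must be at least 2 to estimate spread")
    if len(prompts) < 2:
        raise ValueError("need at least two prompts to compare")

    summary = {}
    for name, prompt in prompts.items():
        scores = [score_once(prompt) for _ in range(trials)]
        summary[name] = {"mean": mean(scores), "stdev": stdev(scores)}

    # Rank prompt names by mean score, highest first
    ranked = sorted(summary, key=lambda n: summary[n]["mean"], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    margin = summary[best]["mean"] - summary[runner_up]["mean"]

    # Declare a winner only when the gap reaches the chosen margin
    winner = best if margin >= min_margin else None
    return {"summary": summary, "winner": winner, "margin": margin}
```

With the notebook's helpers, `score_once` could be `lambda p: evaluate_response(generate_response(p.format(topic=topic)), criteria)`. The `min_margin` threshold is an arbitrary choice; a returned winner of `None` means the prompts were too close to call at that threshold.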


### Iterative Refinement

In [4]:
def refine_prompt(initial_prompt, topic, iterations=3):
    """Refine a prompt through multiple iterations.

    Args:
        initial_prompt (PromptTemplate): The starting prompt template.
        topic (str): The topic to explain.
        iterations (int): Number of refinement iterations.

    Returns:
        PromptTemplate: The final refined prompt template.
    """
    current_prompt = initial_prompt
    for i in range(iterations):
        try:
            response = generate_response(current_prompt.format(topic=topic))
        except KeyError as e:
            print(f"Error in iteration {i+1}: Missing key {e}. Adjusting prompt...")
            # Remove the problematic placeholder
            current_prompt.template = current_prompt.template.replace(f"{{{e.args[0]}}}", "relevant example")
            response = generate_response(current_prompt.format(topic=topic))
        
        # Generate feedback and suggestions for improvement
        feedback_prompt = f"Analyze the following explanation of {topic} and suggest improvements to the prompt that generated it:\n\n{response}"
        feedback = generate_response(feedback_prompt)
        
        # Use the feedback to refine the prompt
        # (named refinement_request so it does not shadow the enclosing function)
        refinement_request = f"Based on this feedback: '{feedback}', improve the following prompt template. Ensure to only use the variable {{topic}} in your template:\n\n{current_prompt.template}"
        refined_template = generate_response(refinement_request)
        
        current_prompt = PromptTemplate(
            input_variables=["topic"],
            template=refined_template
        )
        
        print(f"Iteration {i+1} prompt: {current_prompt.template}")
    
    return current_prompt

# Re-run the A/B test (LLM-judged scores vary between runs, so the winner can differ)
topic = "machine learning"
response_a = generate_response(prompt_a.format(topic=topic))
response_b = generate_response(prompt_b.format(topic=topic))

criteria = ["clarity", "informativeness", "engagement"]
score_a = evaluate_response(response_a, criteria)
score_b = evaluate_response(response_b, criteria)

print(f"Prompt A score: {score_a:.2f}")
print(f"Prompt B score: {score_b:.2f}")
print(f"Winning prompt: {'A' if score_a > score_b else 'B'}")

# Start with the winning prompt from A/B testing
initial_prompt = prompt_b if score_b > score_a else prompt_a
refined_prompt = refine_prompt(initial_prompt, "machine learning")

print("\nFinal refined prompt:")
print(refined_prompt.template)

Evaluating response based on clarity...
Evaluating response based on informativeness...
Evaluating response based on engagement...
Evaluating response based on clarity...
Evaluating response based on informativeness...
Evaluating response based on engagement...
Prompt A score: 8.33
Prompt B score: 9.00
Winning prompt: B
Iteration 1 prompt: Explain {topic} in simple terms by using relatable analogies. To make the explanation more comprehensive and engaging, consider the following elements:

1. **Real-world Applications:** Provide diverse and specific examples of {topic} in action, such as in recommendation systems, fraud detection, autonomous vehicles, or personalized marketing, to demonstrate its practical uses.

2. **Types of Machine Learning:** Briefly introduce the different types, such as supervised learning, unsupervised learning, and reinforcement learning, with examples for each to offer a broader understanding of the field.

3. **Key Concepts:** Explain essential concepts like 
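
Because the refined template is written by the model, it may contain placeholders other than `{topic}`, which is the situation the `KeyError` handler in `refine_prompt` guards against. The following is a minimal sketch of checking a candidate template before using it, based on Python's standard `string.Formatter`. The helper name `unexpected_placeholders` is hypothetical and not part of the notebook.

```python
from string import Formatter
from typing import Iterable, Set


def unexpected_placeholders(
    template: str, allowed: Iterable[str] = ("topic",)
) -> Set[str]:
    """Return placeholder names in `template` that are not in `allowed`."""
    allowed_set = set(allowed)
    found = set()
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion)
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name is None:
            continue  # literal text only, no placeholder in this chunk
        # Reduce "item.attr" or "item[0]" to the base name "item"
        base = field_name.split(".")[0].split("[")[0]
        if base not in allowed_set:
            found.add(base)
    return found
```

A non-empty result could be used to reject the candidate template or to ask the model for another revision. Templates with unbalanced braces make `Formatter().parse` raise `ValueError`, so a caller may want to catch that as well.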

### Comparing Original and Refined Prompts

In [5]:
original_response = generate_response(initial_prompt.format(topic="machine learning"))
refined_response = generate_response(refined_prompt.format(topic="machine learning"))

original_score = evaluate_response(original_response, criteria)
refined_score = evaluate_response(refined_response, criteria)

print(f"Original prompt score: {original_score:.2f}")
print(f"Refined prompt score: {refined_score:.2f}")
print(f"Improvement: {(refined_score - original_score):.2f} points")

Evaluating response based on clarity...
Evaluating response based on informativeness...
Evaluating response based on engagement...
Evaluating response based on clarity...
Evaluating response based on informativeness...
Evaluating response based on engagement...
Original prompt score: 9.00
Refined prompt score: 8.33
Improvement: -0.67 points
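
In this run the refined prompt scored 0.67 points lower than the original, so refinement is not guaranteed to help. One way to guard against such regressions is to keep a candidate only when it beats the current best. The following is a minimal sketch under that assumption; `select_best_prompt` and its `score_prompt` argument are hypothetical names, with `score_prompt` standing in for any function that generates a response from a template and returns its score.

```python
from typing import Callable, Iterable, Tuple


def select_best_prompt(
    baseline: str,
    candidates: Iterable[str],
    score_prompt: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the highest-scoring template, keeping the baseline unless beaten."""
    best_template = baseline
    best_score = score_prompt(baseline)
    for candidate in candidates:
        candidate_score = score_prompt(candidate)
        # Strict comparison: on a tie, the earlier template is kept
        if candidate_score > best_score:
            best_template, best_score = candidate, candidate_score
    return best_template, best_score
```

Each template is scored only once here, so the choice inherits the run-to-run noise of the scores; averaging several scores per template inside `score_prompt` would make the selection more stable.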
