Setup & Configuration

In [1]:
# Import required libraries
import os
from openai import OpenAI
import json
from collections import Counter
import time
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Set default model
MODEL = "gpt-4o-mini"

print("✅ Setup complete! OpenAI client initialized.")

✅ Setup complete! OpenAI client initialized.


Helper Functions

In [2]:
def call_openai(prompt, system_message="You are a helpful assistant.", temperature=0.7, max_tokens=None, seed=None):
    """Helper function to call OpenAI API with specified parameters"""
    try:
        params = {
            "model": MODEL,
            "messages": [
                {"role": "system", "content": system_message},
                {"role": "user", "content": prompt}
            ],
            "temperature": temperature
        }
        
        if max_tokens is not None:
            params["max_tokens"] = max_tokens
        if seed is not None:
            params["seed"] = seed
            
        response = client.chat.completions.create(**params)
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

def display_response(title, response, params=None):
    """Display a response with formatting"""
    print(f"\n{'='*60}")
    print(f"üìù {title}")
    if params:
        print(f"Parameters: {params}")
    print(f"{'='*60}")
    print(response)
    print(f"{'='*60}\n")

def count_tokens_approx(text):
    """Approximate token count (roughly 4 characters per token for English text)"""
    return len(text) // 4

print("✅ Helper functions loaded!")

✅ Helper functions loaded!

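`call_openai` builds the request and sends it in one step, so the only way to see exactly what was sent is to make a network call. A minimal sketch that moves construction into a pure function, mirroring the dictionary built above and treating both optional fields with `is not None`; `build_chat_params` is a hypothetical name, not part of the OpenAI SDK:

```python
def build_chat_params(prompt, model, system_message="You are a helpful assistant.",
                      temperature=0.7, max_tokens=None, seed=None):
    """Build the keyword arguments for client.chat.completions.create without calling the API."""
    params = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }
    # Only include optional fields the caller actually set; 0 is a valid seed.
    if max_tokens is not None:
        params["max_tokens"] = max_tokens
    if seed is not None:
        params["seed"] = seed
    return params

params = build_chat_params("Hello", model="gpt-4o-mini", seed=0)
print(params["seed"], "max_tokens" in params)  # prints: 0 False
```

`call_openai` could then pass `**build_chat_params(prompt, MODEL, system_message, temperature, max_tokens, seed)` to `client.chat.completions.create` and keep its error handling unchanged.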

Task 1: Sentiment Analysis

In [3]:
# Initial simple prompt for sentiment analysis
sentiment_prompt_v1 = """
Classify this customer message: "I love this product! It's exactly what I needed."
"""

# Test it once
result = call_openai(sentiment_prompt_v1)
print("Sentiment Analysis Result:")
print(result)


Sentiment Analysis Result:
The customer message can be classified as **Positive Feedback** or **Customer Satisfaction**.


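The v1 prompt names no label set, so the model invents its own ("Positive Feedback or Customer Satisfaction"), which downstream code cannot consume. A minimal sketch of a strict check, assuming the `Positive`/`Neutral`/`Negative` labels that Step 6 introduces later; `parse_sentiment_label` is a hypothetical helper:

```python
ALLOWED_LABELS = {"Positive", "Neutral", "Negative"}

def parse_sentiment_label(response):
    """Return the label if the response is exactly one allowed label, else None."""
    # Tolerate surrounding whitespace and a trailing period, nothing more.
    candidate = response.strip().rstrip(".")
    return candidate if candidate in ALLOWED_LABELS else None

print(parse_sentiment_label("The customer message can be classified as **Positive Feedback**."))  # None
print(parse_sentiment_label("Positive\n"))  # Positive
```

The check is deliberately case-sensitive and rejects any extra words, so it measures whether the prompt controls the output format, not just whether the sentiment is right.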
Task 2: Product Description Generation

In [4]:
# Initial simple prompt for product description
product_prompt_v1 = """
Create a product description for a wireless mouse that costs $29.99.
"""

# Test it once
result = call_openai(product_prompt_v1)
print("Product Description Result:")
print(result)


Product Description Result:
**Product Description: Wireless Ergonomic Mouse - $29.99**

Elevate your computing experience with our Wireless Ergonomic Mouse, designed for comfort, precision, and style. Priced at just $29.99, this sleek and modern mouse is perfect for both home and office use.

**Key Features:**

- **Wireless Freedom**: Enjoy the convenience of a clutter-free workspace with reliable wireless connectivity. The advanced 2.4GHz technology ensures a strong, stable connection up to 33 feet away, allowing you to move freely without being tethered to your computer.

- **Ergonomic Design**: Engineered with your comfort in mind, the ergonomic shape fits snugly in your hand, reducing strain during long hours of use. Ideal for both left and right-handed users, this mouse promotes a natural hand position for enhanced comfort.

- **Precision Tracking**: Equipped with a high-precision optical sensor, our wireless mouse delivers smooth, accurate tracking on various surfaces. Whether yo

Task 3: Data Extraction

In [5]:
# Initial simple prompt for data extraction
extraction_prompt_v1 = """
Extract information from this customer feedback: "I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged."
"""

# Test it once
result = call_openai(extraction_prompt_v1)
print("Data Extraction Result:")
print(result)


Data Extraction Result:
Here is the extracted information from the customer feedback:

- **Order Number**: #12345
- **Order Date**: March 15th
- **Delivery Speed**: Fast
- **Issue**: Damaged packaging


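Without a schema the model answers in Markdown bullets, so turning the result into a dictionary means scraping text. A minimal sketch of that scraping, assuming the `- **Key**: value` layout seen in the output above; it illustrates how brittle free-form output is rather than proposing a parser (`parse_bullets` is hypothetical):

```python
import re

# Matches lines like "- **Order Number**: #12345"
BULLET_RE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.+)$")

def parse_bullets(text):
    """Collect '- **Key**: value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        match = BULLET_RE.match(line.strip())
        if match:
            fields[match.group("key").strip()] = match.group("value").strip()
    return fields

sample = """Here is the extracted information from the customer feedback:

- **Order Number**: #12345
- **Order Date**: March 15th
- **Delivery Speed**: Fast
- **Issue**: Damaged packaging"""
print(parse_bullets(sample))
```

If a run puts the colon inside the bold markers (`- **Order Number:** #12345`), the pattern silently skips that line, which is the argument for asking for JSON instead.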
Part 2: Diagnosing Failures - Systematic Testing

Step 3: Run Prompts 5 Times

Prompt = "I love this product! It's exactly what I needed."

In [6]:
# Run the same prompt 5 times with temperature=0.7
prompt = "I love this product! It's exactly what I needed."
temperature = 0.7
num_runs = 5

print(f"Running the same prompt {num_runs} times with temperature={temperature}")
print(f"Prompt: '{prompt}'\n")
print("="*80)

responses = []
for i in range(num_runs):
    response = call_openai(prompt, temperature=temperature)
    responses.append(response)
    print(f"\nüîÑ Run #{i+1}:")
    print(response)
    print("-"*80)
    time.sleep(0.5)  # Small delay to avoid rate limiting

print("\n" + "="*80)
print("üìä Analysis:")
print(f"Total unique responses: {len(set(responses))} out of {num_runs}")
print(f"Average length: {sum(len(r) for r in responses) / len(responses):.0f} characters")
print("\nüí° Key Insight: Even with the same prompt and parameters, we get different outputs!")
print("This is why prompt engineering is an empirical science - you must test and iterate.")

Running the same prompt 5 times with temperature=0.7
Prompt: 'I love this product! It's exactly what I needed.'


🔄 Run #1:
I'm glad to hear that you love the product! What specific features do you appreciate the most?
--------------------------------------------------------------------------------

🔄 Run #2:
That's great to hear! I'm glad you found a product that meets your needs. What specific features do you love about it?
--------------------------------------------------------------------------------

🔄 Run #3:
I'm glad to hear that you love the product! It's always great to find something that meets your needs perfectly. What specifically do you like about it?
--------------------------------------------------------------------------------

🔄 Run #4:
That's great to hear! What specific product are you referring to? I’d love to know more about what you like about it!
--------------------------------------------------------------------------------

🔄 Run #5:
I'm gl

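The collection loop above is copied, with only the prompt and `num_runs` changing, into every cell of Steps 3 and 4, and `Counter` is imported in the setup cell but never used. A minimal refactoring sketch; `run_repeated` and `summarize` are hypothetical names, and `generate` is any callable that takes a prompt and returns text, so it can be stubbed without an API key:

```python
import time
from collections import Counter

def run_repeated(generate, prompt, num_runs, delay=0.5, sleep=time.sleep):
    """Call generate(prompt) num_runs times and collect the results in order."""
    responses = []
    for i in range(num_runs):
        responses.append(generate(prompt))
        if i < num_runs - 1:
            sleep(delay)  # pause between calls only, not after the last one
    return responses

def summarize(responses):
    """Unique count, share of the most common exact response, and mean length in characters."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    _, top_count = counts.most_common(1)[0]
    return {
        "unique": len(counts),
        "top_share": top_count / len(responses),
        "avg_chars": sum(len(r) for r in responses) / len(responses),
    }
```

In the notebook this would be used as `responses = run_repeated(lambda p: call_openai(p, temperature=0.7), prompt, num_runs)` followed by `summarize(responses)`. The `top_share` value adds something the original analysis lacks: how dominant the most frequent exact answer is.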
Prompt = "Create a product description for a wireless mouse that costs $29.99."

In [7]:
# Run the same prompt 5 times with temperature=0.7
prompt = "Create a product description for a wireless mouse that costs $29.99."
temperature = 0.7
num_runs = 5

print(f"Running the same prompt {num_runs} times with temperature={temperature}")
print(f"Prompt: '{prompt}'\n")
print("="*80)

responses = []
for i in range(num_runs):
    response = call_openai(prompt, temperature=temperature)
    responses.append(response)
    print(f"\nüîÑ Run #{i+1}:")
    print(response)
    print("-"*80)
    time.sleep(0.5)  # Small delay to avoid rate limiting

print("\n" + "="*80)
print("üìä Analysis:")
print(f"Total unique responses: {len(set(responses))} out of {num_runs}")
print(f"Average length: {sum(len(r) for r in responses) / len(responses):.0f} characters")
print("\nüí° Key Insight: Even with the same prompt and parameters, we get different outputs!")
print("This is why prompt engineering is an empirical science - you must test and iterate.")

Running the same prompt 5 times with temperature=0.7
Prompt: 'Create a product description for a wireless mouse that costs $29.99.'


🔄 Run #1:
**Product Description: Wireless Comfort Mouse**

Elevate your productivity and style with our Wireless Comfort Mouse, now available for just $29.99! Designed for both work and play, this sleek and ergonomic mouse offers a perfect blend of functionality and comfort, making it an essential addition to your tech arsenal.

**Key Features:**

- **Ergonomic Design:** The contoured shape fits naturally in your hand, reducing wrist strain and providing all-day comfort, whether you're at the office or working from home.

- **Wireless Convenience:** Say goodbye to tangled cords! Our advanced 2.4GHz wireless technology provides a reliable connection with a range of up to 33 feet, allowing you to roam freely without interruptions.

- **High Precision Tracking:** Equipped with an adjustable DPI (800/1200/1600) sensor, this mouse delivers smooth and accur

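For long outputs such as these product descriptions, `len(set(responses))` will nearly always report every run as unique, so it says little about how far apart the runs are. A rough alternative is the mean pairwise similarity from the standard library's `difflib.SequenceMatcher`; this is a character-level measure chosen here for illustration, not a semantic one:

```python
from difflib import SequenceMatcher
from itertools import combinations

def mean_pairwise_similarity(responses):
    """Average SequenceMatcher ratio (0.0 to 1.0) over all pairs of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need at least two responses")
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)
```

A value near 1.0 means the runs are near-copies; lower values mean the wording varies. The number of pairs grows quadratically with the run count, which is fine at 5 to 15 runs.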
Prompt = "I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged."

In [8]:
# Run the same prompt 5 times with temperature=0.7
prompt = "I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged."
temperature = 0.7
num_runs = 5

print(f"Running the same prompt {num_runs} times with temperature={temperature}")
print(f"Prompt: '{prompt}'\n")
print("="*80)

responses = []
for i in range(num_runs):
    response = call_openai(prompt, temperature=temperature)
    responses.append(response)
    print(f"\nüîÑ Run #{i+1}:")
    print(response)
    print("-"*80)
    time.sleep(0.5)  # Small delay to avoid rate limiting

print("\n" + "="*80)
print("üìä Analysis:")
print(f"Total unique responses: {len(set(responses))} out of {num_runs}")
print(f"Average length: {sum(len(r) for r in responses) / len(responses):.0f} characters")
print("\nüí° Key Insight: Even with the same prompt and parameters, we get different outputs!")
print("This is why prompt engineering is an empirical science - you must test and iterate.")

Running the same prompt 5 times with temperature=0.7
Prompt: 'I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged.'


🔄 Run #1:
I'm sorry to hear that your package arrived damaged. If you need assistance with this issue, you may want to contact the seller or the customer service department of the company you ordered from. They can help you with potential returns, exchanges, or refunds. Be sure to have your order details handy, including the order number and any photos of the damage, if possible. Is there anything else you would like to know or need help with?
--------------------------------------------------------------------------------

🔄 Run #2:
I'm sorry to hear that the packaging was damaged. While it's great to know that the delivery was fast, damaged packaging can be concerning. Would you like assistance with how to report this issue to the seller or request a replacement?
---------------------------------------------------------------

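Exact-match counts also say nothing about whether the runs keep the facts from the message: neither of the two replies shown above repeats the order number. A cheap content check, assuming the facts worth tracking can be written as literal strings (`fact_coverage` is a hypothetical helper):

```python
def fact_coverage(responses, facts):
    """For each fact, the fraction of responses that contain it (case-insensitive)."""
    if not responses:
        raise ValueError("need at least one response")
    lowered = [r.lower() for r in responses]
    return {fact: sum(fact.lower() in r for r in lowered) / len(responses) for fact in facts}
```

For this prompt, `fact_coverage(responses, ["12345", "damaged"])` would report how often each run mentions the order ID and the damage.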
Step 4: Run Prompts 10 Times

Prompt: "I love this product! It's exactly what I needed."

In [9]:
# Run the same prompt 10 times with temperature=0.7
prompt = "I love this product! It's exactly what I needed."
temperature = 0.7
num_runs = 10

print(f"Running the same prompt {num_runs} times with temperature={temperature}")
print(f"Prompt: '{prompt}'\n")
print("="*80)

responses = []
for i in range(num_runs):
    response = call_openai(prompt, temperature=temperature)
    responses.append(response)
    print(f"\nüîÑ Run #{i+1}:")
    print(response)
    print("-"*80)
    time.sleep(0.5)  # Small delay to avoid rate limiting

print("\n" + "="*80)
print("üìä Analysis:")
print(f"Total unique responses: {len(set(responses))} out of {num_runs}")
print(f"Average length: {sum(len(r) for r in responses) / len(responses):.0f} characters")
print("\nüí° Key Insight: Even with the same prompt and parameters, we get different outputs!")
print("This is why prompt engineering is an empirical science - you must test and iterate.")

Running the same prompt 10 times with temperature=0.7
Prompt: 'I love this product! It's exactly what I needed.'


🔄 Run #1:
I'm glad to hear that you love the product! What specific features do you appreciate the most?
--------------------------------------------------------------------------------

🔄 Run #2:
I'm glad to hear that you love the product! What specifically do you like about it?
--------------------------------------------------------------------------------

🔄 Run #3:
I'm glad to hear that you love the product! What specific features or aspects do you find most helpful?
--------------------------------------------------------------------------------

🔄 Run #4:
I'm glad to hear that you love the product! It’s always great to find something that meets your needs perfectly. What specific features do you like about it?
--------------------------------------------------------------------------------

🔄 Run #5:
I'm glad to hear that you love the product! What 

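In the 10-run output above, every visible run opens with "I'm glad to hear that you love the product!" even though the full texts differ. Counting first sentences makes that convergence visible; the sentence split below is a naive regex on `.`, `!` and `?` followed by whitespace, an assumption rather than a real sentence tokenizer:

```python
import re
from collections import Counter

def opening_sentence_counts(responses):
    """Count how often each response's first sentence occurs."""
    firsts = []
    for r in responses:
        # Split once at the first sentence-ending punctuation followed by whitespace
        parts = re.split(r"(?<=[.!?])\s+", r.strip(), maxsplit=1)
        firsts.append(parts[0])
    return Counter(firsts)
```

`opening_sentence_counts(responses).most_common(3)` lists the dominant openings, which is a quick way to see that the variation is mostly in the tail of each reply.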
Prompt = "Create a product description for a wireless mouse that costs $29.99."

In [10]:
# Run the same prompt 10 times with temperature=0.7
prompt = "Create a product description for a wireless mouse that costs $29.99."
temperature = 0.7
num_runs = 10

print(f"Running the same prompt {num_runs} times with temperature={temperature}")
print(f"Prompt: '{prompt}'\n")
print("="*80)

responses = []
for i in range(num_runs):
    response = call_openai(prompt, temperature=temperature)
    responses.append(response)
    print(f"\nüîÑ Run #{i+1}:")
    print(response)
    print("-"*80)
    time.sleep(0.5)  # Small delay to avoid rate limiting

print("\n" + "="*80)
print("üìä Analysis:")
print(f"Total unique responses: {len(set(responses))} out of {num_runs}")
print(f"Average length: {sum(len(r) for r in responses) / len(responses):.0f} characters")
print("\nüí° Key Insight: Even with the same prompt and parameters, we get different outputs!")
print("This is why prompt engineering is an empirical science - you must test and iterate.")

Running the same prompt 10 times with temperature=0.7
Prompt: 'Create a product description for a wireless mouse that costs $29.99.'


🔄 Run #1:
**Product Description: Wireless Freedom: Ergonomic Wireless Mouse - $29.99**

Elevate your computing experience with our Ergonomic Wireless Mouse, designed for comfort, precision, and convenience. Priced at just $29.99, this sleek accessory combines functionality and style, making it the perfect companion for both work and play.

**Key Features:**

- **Wireless Convenience:** Say goodbye to tangled cords! Our wireless mouse offers a seamless connection with a reliable USB receiver, allowing you to enjoy the freedom of movement without compromising on performance.

- **Ergonomic Design:** Crafted with your comfort in mind, the ergonomic shape fits naturally in your hand, reducing wrist strain during long hours of use. The textured grip ensures stability, making it ideal for both casual browsing and intensive gaming sessions.

- **High Precis

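The analysis reports only the mean length, which hides how much the runs vary. `statistics.stdev` from the standard library adds the spread; word counts are used here instead of characters, an assumption that they are easier to compare against a target length for marketing copy:

```python
import statistics

def length_stats(responses):
    """Mean, sample standard deviation, min and max word counts across responses."""
    counts = [len(r.split()) for r in responses]
    return {
        "mean_words": statistics.mean(counts),
        # stdev needs at least two data points
        "stdev_words": statistics.stdev(counts) if len(counts) > 1 else 0.0,
        "min_words": min(counts),
        "max_words": max(counts),
    }
```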
Prompt = "I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged."

In [11]:
# Run the same prompt 10 times with temperature=0.7
prompt = "I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged."
temperature = 0.7
num_runs = 10

print(f"Running the same prompt {num_runs} times with temperature={temperature}")
print(f"Prompt: '{prompt}'\n")
print("="*80)

responses = []
for i in range(num_runs):
    response = call_openai(prompt, temperature=temperature)
    responses.append(response)
    print(f"\nüîÑ Run #{i+1}:")
    print(response)
    print("-"*80)
    time.sleep(0.5)  # Small delay to avoid rate limiting

print("\n" + "="*80)
print("üìä Analysis:")
print(f"Total unique responses: {len(set(responses))} out of {num_runs}")
print(f"Average length: {sum(len(r) for r in responses) / len(responses):.0f} characters")
print("\nüí° Key Insight: Even with the same prompt and parameters, we get different outputs!")
print("This is why prompt engineering is an empirical science - you must test and iterate.")

Running the same prompt 10 times with temperature=0.7
Prompt: 'I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged.'


🔄 Run #1:
I'm sorry to hear that the packaging was damaged upon delivery of your order #12345. It's great to know that the delivery was fast, but damaged packaging can be concerning. Would you like assistance in addressing this issue, such as contacting customer service or filing a complaint?
--------------------------------------------------------------------------------

🔄 Run #2:
I'm sorry to hear that the packaging of your order was damaged. While the delivery speed was satisfactory, it's important that items arrive in good condition. I recommend reaching out to the seller or customer service to report the issue with the damaged packaging. They may be able to offer a solution, such as a replacement or a refund, depending on their policy. If you have any specific questions or need assistance with the process, feel free to a

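The loops rely on a fixed `time.sleep(0.5)` to avoid rate limits, and `call_openai` turns any exception into an `"Error: ..."` string that is then counted as an ordinary response. A minimal retry sketch with exponential backoff, assuming the wrapped callable raises on failure (as `client.chat.completions.create` does) instead of returning an error string; `with_retries` and its parameters are hypothetical:

```python
import time

def with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the real error
            sleep(base_delay * 2 ** attempt)
```

For example, `with_retries(lambda: client.chat.completions.create(**params))`. Catching only the SDK's transient errors (such as `openai.RateLimitError` and `openai.APIConnectionError`) instead of bare `Exception` would avoid retrying requests that can never succeed.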
Step 5: Run Prompts 15 Times 

Prompt: "I love this product! It's exactly what I needed."

In [12]:
import pandas as pd

# --- Configuration ---
prompt = "I love this product! It's exactly what I needed."
system_message = "Extract the following information and return as JSON: order_id, order_date, delivery_speed, packaging_condition."
temperature = 0.7
num_runs = 15

responses_data = []

print(f"üöÄ Starting 15-run test for Data Extraction prompt...")
print(f"Prompt: '{prompt}'\n")
print("="*80)

# 1. Collection Loop with Auto-Evaluation
for i in range(num_runs):
    print(f"Executing Run {i+1}/{num_runs}...")
    response = call_openai(prompt, system_message=system_message, temperature=temperature)
    
    # Auto-evaluate (loose): only checks that the field names appear in the text; values are not validated
    is_valid = all(field in response.lower() for field in ['order_id', 'order_date', 'delivery', 'packaging'])
    
    responses_data.append({
        "run_id": i + 1,
        "prompt": prompt,
        "response": response,
        "is_success": is_valid,
        "failure_pattern": "None (Success)" if is_valid else "Missing Fields"
    })
    
    print(f"Response: {response[:100]}...")
    print(f"Valid: {'✅' if is_valid else '❌'}")
    print("-"*40)
    time.sleep(0.5)

# 2. Create DataFrame and Save
df = pd.DataFrame(responses_data)
df.to_json("failure_analysis.json", orient="records", indent=4)
df.to_csv("failure_analysis_table.csv", index=False)

print("\n" + "="*60)
print("üìä FINAL FAILURE ANALYSIS TABLE")
print("="*60)
print(df[['run_id', 'is_success', 'failure_pattern']])

# Summary
print("\n" + "="*60)
print("üìà SUMMARY STATISTICS")
print("="*60)
success_rate = df['is_success'].sum() / len(df) * 100
print(f"Total runs: {len(df)}")
print(f"Successes: {df['is_success'].sum()}")
print(f"Failures: {len(df) - df['is_success'].sum()}")
print(f"Success Rate: {success_rate:.1f}%")

# Failure pattern breakdown
print("\nüìã FAILURE PATTERNS BREAKDOWN:")
print(df['failure_pattern'].value_counts().to_string())

# Display full responses table
print("\n" + "="*60)
print("üìù FULL RESPONSES TABLE")
print("="*60)
pd.set_option('display.max_colwidth', 100)
print(df[['run_id', 'response', 'is_success', 'failure_pattern']].to_string())

print(f"\nüìÅ Results saved to failure_analysis.json and failure_analysis_table.csv")

# Also display as formatted DataFrame (for Jupyter)
df[['run_id', 'response', 'is_success', 'failure_pattern']]

🚀 Starting 15-run test for Data Extraction prompt...
Prompt: 'I love this product! It's exactly what I needed.'

Executing Run 1/15...
Response: {
  "order_id": null,
  "order_date": null,
  "delivery_speed": null,
  "packaging_condition": null
...
Valid: ✅
----------------------------------------
Executing Run 2/15...
Response: {
  "order_id": null,
  "order_date": null,
  "delivery_speed": null,
  "packaging_condition": null
...
Valid: ✅
----------------------------------------
Executing Run 3/15...
Response: ```json
{
  "order_id": null,
  "order_date": null,
  "delivery_speed": null,
  "packaging_condition...
Valid: ✅
----------------------------------------
Executing Run 4/15...
Response: {
  "order_id": null,
  "order_date": null,
  "delivery_speed": null,
  "packaging_condition": null
...
Valid: ✅
----------------------------------------
Executing Run 5/15...
Response: {
  "order_id": null,
  "order_date": null,
  "delivery_speed": null,
  "packaging_condition": null


index,run_id,response,is_success,failure_pattern
0,1,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
1,2,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
2,3,"```json\n{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_...",True,None (Success)
3,4,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
4,5,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
5,6,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
6,7,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
7,8,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
8,9,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
9,10,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)

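The `is_valid` check above only looks for field names as substrings, so an object whose values are all `null`, as in every run for this message with no order in it, scores as a success, and so does a reply wrapped in a Markdown code fence. A stricter sketch that strips an optional fence, parses with `json.loads`, and returns a distinct failure pattern; the pattern names are illustrative, not from the original. Whether "All Fields Null" counts as a failure depends on the input: for a message that contains no order, nulls are arguably the correct answer, so the sketch reports it separately instead of deciding:

```python
import json
import re

EXPECTED_FIELDS = ["order_id", "order_date", "delivery_speed", "packaging_condition"]
# Optional leading ```json (or bare ```) fence and trailing ``` fence
FENCE_RE = re.compile(r"^```(?:json)?\s*|\s*```$")

def classify_extraction(response):
    """Return a failure-pattern label for one extraction response."""
    text = response.strip()
    fenced = text.startswith("```")
    text = FENCE_RE.sub("", text)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return "Invalid JSON"
    if not isinstance(data, dict):
        return "Not a JSON object"
    if any(field not in data for field in EXPECTED_FIELDS):
        return "Missing Fields"
    if all(data[field] is None for field in EXPECTED_FIELDS):
        return "All Fields Null"
    return "OK (code fence)" if fenced else "OK"
```

In the loop, `failure_pattern = classify_extraction(response)` with `is_success = failure_pattern.startswith("OK")` would replace the substring test. The Chat Completions API also accepts `response_format={"type": "json_object"}` (JSON mode), which would address the code-fence problem at the source; the notebook does not use it.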

Prompt = "Create a product description for a wireless mouse that costs $29.99."

In [13]:
import pandas as pd

# --- Configuration ---
prompt = "Create a product description for a wireless mouse that costs $29.99."
system_message = "Extract the following information and return as JSON: order_id, order_date, delivery_speed, packaging_condition."
temperature = 0.7
num_runs = 15

responses_data = []

print(f"üöÄ Starting 15-run test for Data Extraction prompt...")
print(f"Prompt: '{prompt}'\n")
print("="*80)

# 1. Collection Loop with Auto-Evaluation
for i in range(num_runs):
    print(f"Executing Run {i+1}/{num_runs}...")
    response = call_openai(prompt, system_message=system_message, temperature=temperature)
    
    # Auto-evaluate (loose): only checks that the field names appear in the text; values are not validated
    is_valid = all(field in response.lower() for field in ['order_id', 'order_date', 'delivery', 'packaging'])
    
    responses_data.append({
        "run_id": i + 1,
        "prompt": prompt,
        "response": response,
        "is_success": is_valid,
        "failure_pattern": "None (Success)" if is_valid else "Missing Fields"
    })
    
    print(f"Response: {response[:100]}...")
    print(f"Valid: {'✅' if is_valid else '❌'}")
    print("-"*40)
    time.sleep(0.5)

# 2. Create DataFrame and Save
df = pd.DataFrame(responses_data)
df.to_json("failure_analysis.json", orient="records", indent=4)
df.to_csv("failure_analysis_table.csv", index=False)

print("\n" + "="*60)
print("üìä FINAL FAILURE ANALYSIS TABLE")
print("="*60)
print(df[['run_id', 'is_success', 'failure_pattern']])

# Summary
print("\n" + "="*60)
print("üìà SUMMARY STATISTICS")
print("="*60)
success_rate = df['is_success'].sum() / len(df) * 100
print(f"Total runs: {len(df)}")
print(f"Successes: {df['is_success'].sum()}")
print(f"Failures: {len(df) - df['is_success'].sum()}")
print(f"Success Rate: {success_rate:.1f}%")

# Failure pattern breakdown
print("\nüìã FAILURE PATTERNS BREAKDOWN:")
print(df['failure_pattern'].value_counts().to_string())

# Display full responses table
print("\n" + "="*60)
print("üìù FULL RESPONSES TABLE")
print("="*60)
pd.set_option('display.max_colwidth', 100)
print(df[['run_id', 'response', 'is_success', 'failure_pattern']].to_string())

print(f"\nüìÅ Results saved to failure_analysis.json and failure_analysis_table.csv")

# Also display as formatted DataFrame (for Jupyter)
df[['run_id', 'response', 'is_success', 'failure_pattern']]

🚀 Starting 15-run test for Data Extraction prompt...
Prompt: 'Create a product description for a wireless mouse that costs $29.99.'

Executing Run 1/15...
Response: {
  "order_id": null,
  "order_date": null,
  "delivery_speed": null,
  "packaging_condition": null
...
Valid: ✅
----------------------------------------
Executing Run 2/15...
Response: {
  "order_id": null,
  "order_date": null,
  "delivery_speed": null,
  "packaging_condition": null
...
Valid: ✅
----------------------------------------
Executing Run 3/15...
Response: {
  "order_id": "123456",
  "order_date": "2023-10-01",
  "delivery_speed": "Standard",
  "packaging...
Valid: ✅
----------------------------------------
Executing Run 4/15...
Response: {
  "order_id": null,
  "order_date": null,
  "delivery_speed": null,
  "packaging_condition": null
...
Valid: ✅
----------------------------------------
Executing Run 5/15...
Response: ```json
{
  "order_id": "123456",
  "order_date": "2023-10-15",
  "delivery_spee

index,run_id,response,is_success,failure_pattern
0,1,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
1,2,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
2,3,"{\n ""order_id"": ""123456"",\n ""order_date"": ""2023-10-01"",\n ""delivery_speed"": ""Standard"",\n ""p...",True,None (Success)
3,4,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
4,5,"```json\n{\n ""order_id"": ""123456"",\n ""order_date"": ""2023-10-15"",\n ""delivery_speed"": ""Standar...",True,None (Success)
5,6,"{\n ""order_id"": ""123456"",\n ""order_date"": ""2023-10-01"",\n ""delivery_speed"": ""Standard"",\n ""p...",True,None (Success)
6,7,"{\n ""order_id"": ""123456"",\n ""order_date"": ""2023-10-01"",\n ""delivery_speed"": ""Standard Shippin...",True,None (Success)
7,8,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
8,9,"{\n ""order_id"": null,\n ""order_date"": null,\n ""delivery_speed"": null,\n ""packaging_condition...",True,None (Success)
9,10,"{\n ""order_id"": ""123456"",\n ""order_date"": ""2023-10-15"",\n ""delivery_speed"": ""Standard Shippin...",True,None (Success)

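For this product-description input, several runs return values such as `"order_id": "123456"` and `"order_date": "2023-10-01"` that appear nowhere in the message, yet the substring check marks them as successes. A rough grounding check, assuming that every non-null extracted value should occur verbatim (case-insensitively) in the source text; that assumption is too strict for normalized values such as an ISO date derived from "March 15th", so a flag is a prompt for review rather than a verdict (`ungrounded_fields` is hypothetical):

```python
def ungrounded_fields(extracted, source_text):
    """Return, sorted, the fields whose non-null value does not appear in source_text."""
    source = source_text.lower()
    return sorted(
        field
        for field, value in extracted.items()
        if value is not None and str(value).lower() not in source
    )
```

Because the test is a plain substring match, very short values (a single digit, say) can pass by accident; it catches invented IDs and dates, not every fabrication.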

Prompt = "I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged."

In [14]:
import pandas as pd

# --- Configuration ---
prompt = "I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged."
system_message = "Extract the following information and return as JSON: order_id, order_date, delivery_speed, packaging_condition."
temperature = 0.7
num_runs = 15

responses_data = []

print(f"üöÄ Starting 15-run test for Data Extraction prompt...")
print(f"Prompt: '{prompt}'\n")
print("="*80)

# 1. Collection Loop with Auto-Evaluation
for i in range(num_runs):
    print(f"Executing Run {i+1}/{num_runs}...")
    response = call_openai(prompt, system_message=system_message, temperature=temperature)
    
    # Auto-evaluate (loose): only checks that the field names appear in the text; values are not validated
    is_valid = all(field in response.lower() for field in ['order_id', 'order_date', 'delivery', 'packaging'])
    
    responses_data.append({
        "run_id": i + 1,
        "prompt": prompt,
        "response": response,
        "is_success": is_valid,
        "failure_pattern": "None (Success)" if is_valid else "Missing Fields"
    })
    
    print(f"Response: {response[:100]}...")
    print(f"Valid: {'✅' if is_valid else '❌'}")
    print("-"*40)
    time.sleep(0.5)

# 2. Create DataFrame and Save
df = pd.DataFrame(responses_data)
df.to_json("failure_analysis.json", orient="records", indent=4)
df.to_csv("failure_analysis_table.csv", index=False)

print("\n" + "="*60)
print("üìä FINAL FAILURE ANALYSIS TABLE")
print("="*60)
print(df[['run_id', 'is_success', 'failure_pattern']])

# Summary
print("\n" + "="*60)
print("üìà SUMMARY STATISTICS")
print("="*60)
success_rate = df['is_success'].sum() / len(df) * 100
print(f"Total runs: {len(df)}")
print(f"Successes: {df['is_success'].sum()}")
print(f"Failures: {len(df) - df['is_success'].sum()}")
print(f"Success Rate: {success_rate:.1f}%")

# Failure pattern breakdown
print("\nüìã FAILURE PATTERNS BREAKDOWN:")
print(df['failure_pattern'].value_counts().to_string())

# Display full responses table
print("\n" + "="*60)
print("üìù FULL RESPONSES TABLE")
print("="*60)
pd.set_option('display.max_colwidth', 100)
print(df[['run_id', 'response', 'is_success', 'failure_pattern']].to_string())

print(f"\nüìÅ Results saved to failure_analysis.json and failure_analysis_table.csv")

# Also display as formatted DataFrame (for Jupyter)
df[['run_id', 'response', 'is_success', 'failure_pattern']]

🚀 Starting 15-run test for Data Extraction prompt...
Prompt: 'I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged.'

Executing Run 1/15...
Response: ```json
{
  "order_id": "12345",
  "order_date": "March 15th",
  "delivery_speed": "fast",
  "packag...
Valid: ✅
----------------------------------------
Executing Run 2/15...
Response: ```json
{
  "order_id": "12345",
  "order_date": "March 15th",
  "delivery_speed": "fast",
  "packag...
Valid: ✅
----------------------------------------
Executing Run 3/15...
Response: ```json
{
  "order_id": "12345",
  "order_date": "March 15th",
  "delivery_speed": "fast",
  "packag...
Valid: ✅
----------------------------------------
Executing Run 4/15...
Response: ```json
{
  "order_id": "12345",
  "order_date": "2023-03-15",
  "delivery_speed": "fast",
  "packag...
Valid: ✅
----------------------------------------
Executing Run 5/15...
Response: ```json
{
  "order_id": "12345",
  "order_date": "March 15t

index,run_id,response,is_success,failure_pattern
0,1,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""March 15th"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
1,2,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""March 15th"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
2,3,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""March 15th"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
3,4,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""2023-03-15"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
4,5,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""March 15th"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
5,6,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""2023-03-15"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
6,7,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""March 15th"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
7,8,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""March 15th"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
8,9,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""March 15th"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)
9,10,"```json\n{\n ""order_id"": ""12345"",\n ""order_date"": ""2023-03-15"",\n ""delivery_speed"": ""fast"",\n...",True,None (Success)

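Every run for the real order message passes, yet `order_date` comes back as both "March 15th" and "2023-03-15", and the year in the second form is not in the message. Tallying the distinct values each field takes across runs surfaces that kind of drift. This sketch assumes the responses have already been parsed into dicts (for example with `json.loads` after removing any code fence); `value_variants` is a hypothetical helper:

```python
import json
from collections import Counter, defaultdict

def value_variants(records):
    """Map each field to a Counter of the values it took across runs."""
    variants = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            # Lists/dicts are unhashable, so count their JSON text instead
            if isinstance(value, (list, dict)):
                value = json.dumps(value, sort_keys=True)
            variants[field][value] += 1
    return dict(variants)
```

Any field with more than one key in its `Counter` is a candidate for a tighter format instruction, which is the direction Step 8 takes by requiring `YYYY-MM-DD` dates.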

Part 3: Iteration 1 - Rewriting Simple Prompts

Step 6: Improve Sentiment Analysis Prompt

In [15]:
# Enhanced sentiment analysis prompt with clear instructions and constraints
sentiment_prompt_v2 = """
Task: Classify the sentiment of the provided customer message.

Instructions:
- Analyze the tone and intent of the text.
- Use exactly one word for the classification.
- Choose from these labels: [Positive, Neutral, Negative].

Output Format: Provide only the label as a single word. No punctuation or explanation.

Customer Message: "I love this product! It's exactly what I needed."
"""

# Test the improved prompt
result = call_openai(sentiment_prompt_v2)
print("Sentiment Analysis Result:")
print(result)

Sentiment Analysis Result:
Positive

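Step 6 hard-codes the customer message inside the prompt, so the improved wording is tested on a single input. To check whether the stricter instructions hold across many messages, the same text can become a template; `SENTIMENT_TEMPLATE` and `build_sentiment_prompt` are hypothetical names:

```python
SENTIMENT_TEMPLATE = """
Task: Classify the sentiment of the provided customer message.

Instructions:
- Analyze the tone and intent of the text.
- Use exactly one word for the classification.
- Choose from these labels: [Positive, Neutral, Negative].

Output Format: Provide only the label as a single word. No punctuation or explanation.

Customer Message: "{message}"
"""

def build_sentiment_prompt(message):
    """Insert a customer message into the Step 6 prompt wording."""
    # Swap embedded double quotes for single ones so the message cannot close the quoted field
    return SENTIMENT_TEMPLATE.format(message=message.replace('"', "'"))
```

`call_openai(build_sentiment_prompt(m))` can then be run over a list of messages, including neutral and mixed ones, and each result checked against the three labels.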

Step 7: Improve Product Description Prompt

In [16]:
# Enhanced product description prompt with structure and style guidelines
product_prompt_v2 = """
Task: Write a compelling product description for a $29.99 wireless mouse.

Structure Requirements:
1. Catchy Headline: A short, punchy title.
2. Value Proposition: A 2-sentence paragraph focusing on comfort and value.
3. Feature Bullets: Exactly 3 bullet points highlighting technical specs (e.g., battery life, DPI, connectivity).

Style Guidelines:
- Tone: Professional yet approachable (avoid "corporate speak").
- Target Audience: Students and remote office workers.
- Length: Total word count must be under 100 words.

Constraints: 
- Do not mention specific competitor brand names.
- Focus on the "ergonomic feel" as a primary selling point.
"""

# Test the structured prompt
result = call_openai(product_prompt_v2)
print("Product Description Result:")
print(result)

Product Description Result:
### Glide into Comfort: The Perfect Wireless Mouse

Experience the ultimate in comfort and convenience with our wireless mouse, designed for students and remote workers who value productivity. At just $29.99, this sleek accessory combines ergonomic design with exceptional performance, making long hours at your desk feel effortless.

- **Battery Life:** Enjoy up to 12 months of uninterrupted use on a single AA battery.
- **DPI Settings:** Customize your experience with adjustable DPI settings up to 2400 for precision tracking.
- **Connectivity:** Effortlessly connect via a reliable 2.4GHz wireless connection for a clutter-free workspace.
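Two of the v2 prompt's rules can be checked mechanically: exactly 3 feature bullets and a total under 100 words. Below is a rough sketch of that check. It assumes bullets start with "- " or "* " at the beginning of a line. It counts only tokens that contain a letter or digit, so Markdown markers such as `###` and `-` are skipped, but the word count remains approximate. `check_description` is a hypothetical helper, and the sample text is illustrative.

```python
import re

# A bullet is "-" or "*" followed by spaces at the start of a line
BULLET_RE = re.compile(r"^[ \t]*[-*][ \t]+", re.MULTILINE)

def check_description(text, max_words=100, expected_bullets=3):
    """Report word and bullet counts against the v2 prompt's length and structure rules."""
    # Skip pure-punctuation tokens such as ### or - so Markdown syntax is not counted
    words = [tok for tok in text.split() if any(ch.isalnum() for ch in tok)]
    bullets = BULLET_RE.findall(text)
    return {
        "word_count": len(words),
        "under_word_limit": len(words) < max_words,
        "bullet_count": len(bullets),
        "bullets_ok": len(bullets) == expected_bullets,
    }

sample_description = (
    "### Glide On\n\nComfort all day. Great value.\n\n"
    "- **Battery:** 12 months\n- **DPI:** up to 2400\n- **Link:** 2.4GHz wireless"
)
print(check_description(sample_description))
```

Running `check_description(result)` right after the cell above would show whether the model actually followed the length and bullet-count constraints.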


Step 8: Improve Data Extraction Prompt

In [18]:
# Enhanced data extraction prompt with strict formatting and schema
extraction_prompt_v2 = """
Task: Extract specific data points from the provided customer feedback into a structured format.

Fields to Extract:
1. Order ID: The numerical identifier for the purchase.
2. Date: The date of the order (formatted as YYYY-MM-DD).
3. Sentiment: Overall sentiment (Positive/Neutral/Negative).
4. Reported Issues: List any problems mentioned (e.g., shipping, product quality).

Output Format: 
Return the data as a clean Python Dictionary. Do not include any introductory text, pleasantries, or markdown blocks (like ```python). 

Customer Feedback: "I ordered item #12345 on March 15th. The delivery was fast but the packaging was damaged."
"""

# Test the structured extraction
result = call_openai(extraction_prompt_v2)
print("Data Extraction Result:")
print(result)

Data Extraction Result:
{
    "Order ID": 12345,
    "Date": "2023-03-15",
    "Sentiment": "Neutral",
    "Reported Issues": ["packaging was damaged"]
}
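The reply above uses double-quoted keys, so it parses as a Python dict literal. Note one caveat: the feedback only mentions "March 15th", so the year in "2023-03-15" was supplied by the model rather than extracted from the text. Below is a minimal validation sketch, assuming the reply is a dict literal with the four requested keys. `parse_extraction` is hypothetical. It uses `ast.literal_eval` instead of `eval` so the model's text is never executed as code. A reply containing JSON literals such as `true`, `false`, or `null` would need `json.loads` instead.

```python
import ast
from datetime import datetime

REQUIRED_KEYS = {"Order ID", "Date", "Sentiment", "Reported Issues"}
SENTIMENTS = {"Positive", "Neutral", "Negative"}

def parse_extraction(reply):
    """Safely parse the model's dict literal and return (data, list_of_problems)."""
    try:
        data = ast.literal_eval(reply.strip())
    except (ValueError, SyntaxError, TypeError):
        return None, ["reply is not a Python literal"]
    if not isinstance(data, dict):
        return None, ["reply is not a dictionary"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(data))]
    # The prompt asked for YYYY-MM-DD dates
    try:
        datetime.strptime(str(data.get("Date", "")), "%Y-%m-%d")
    except ValueError:
        problems.append("Date is not YYYY-MM-DD")
    if data.get("Sentiment") not in SENTIMENTS:
        problems.append("Sentiment is not an allowed label")
    if not isinstance(data.get("Reported Issues"), list):
        problems.append("Reported Issues is not a list")
    return data, problems

# The notebook's output from the cell above
sample_reply = """{
    "Order ID": 12345,
    "Date": "2023-03-15",
    "Sentiment": "Neutral",
    "Reported Issues": ["packaging was damaged"]
}"""
data, problems = parse_extraction(sample_reply)
print(data["Order ID"], problems)
```

The format check passes here, but it cannot catch the inferred year. Verifying that would need a rule that compares extracted values against the source text.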


Compare consistency

In [25]:
# Generate multiple responses to evaluate
prompt = """
Create a product description for a wireless mouse that costs $29.99.
"""

responses_to_judge1 = []
for i in range(3):
    response = call_openai(prompt, temperature=0.8)
    responses_to_judge1.append(response)
    print(f"\nResponse {i+1}:")
    print(response)
    print("-"*80)
    time.sleep(0.5)


Response 1:
**Product Name:** SwiftClick Wireless Mouse

**Price:** $29.99

**Product Description:**

Elevate your computing experience with the SwiftClick Wireless Mouse, designed for seamless performance and ultimate comfort. Whether you're working on important projects, gaming, or browsing the web, this mouse delivers precision and responsiveness at your fingertips.

**Key Features:**

- **Ergonomic Design:** The SwiftClick features a sleek, contoured shape that fits comfortably in your hand, reducing strain during extended use. Its lightweight design allows for easy portability, making it the perfect companion for both home and on-the-go.

- **Reliable Wireless Connection:** Say goodbye to tangled cords and enjoy the freedom of wireless technology. The SwiftClick utilizes advanced 2.4GHz wireless technology for a stable connection with a range of up to 33 feet, ensuring you stay connected without interruptions.

- **High Precision Tracking:** Equipped with an optical sensor, this 
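Reading three samples side by side is subjective. One rough, model-free way to quantify how much the temperature-0.8 samples differ is to compute pairwise text similarity with the standard library's `difflib.SequenceMatcher`. The helper `mean_pairwise_similarity` is hypothetical. The ratio measures character-level overlap, not whether two descriptions make the same claims. `autojunk=False` is set because the default heuristic discards frequent characters in long strings, which distorts the ratio for prose.

```python
from difflib import SequenceMatcher
from itertools import combinations

def mean_pairwise_similarity(texts):
    """Average SequenceMatcher ratio (0 = nothing shared, 1 = identical) over all pairs."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 1.0  # fewer than two texts: nothing to disagree with
    total = sum(SequenceMatcher(None, a, b, autojunk=False).ratio() for a, b in pairs)
    return total / len(pairs)

print(mean_pairwise_similarity(["Glide into comfort", "Glide into comfort today"]))

# With the notebook's lists (after running both generation cells):
# print(mean_pairwise_similarity(responses_to_judge1))
# print(mean_pairwise_similarity(responses_to_judge))
```

A higher score for the structured-prompt samples than for the simple-prompt samples would suggest that the added structure narrows the output. It says nothing about which version reads better.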

LLM as judge

In [26]:
# Use LLM as judge to evaluate and rank them
judge_prompt = f"""Evaluate these 3 product descriptions. Rate each on:
1. Clarity (1-10)
2. Accuracy (1-10)
3. Accessibility for beginners (1-10)

Response 1: {responses_to_judge1[0]}

Response 2: {responses_to_judge1[1]}

Response 3: {responses_to_judge1[2]}

Provide scores for each response and recommend the best one. Format your response clearly."""

judgment = call_openai(judge_prompt, temperature=0)
display_response("LLM Judge Evaluation", judgment)



📝 LLM Judge Evaluation
Here are the evaluations for each product description:

### Response 1: SwiftClick Wireless Mouse
1. **Clarity:** 9  
   - The description is clear and well-structured, making it easy to understand the product's features and benefits.
   
2. **Accuracy:** 9  
   - The features described are typical for a wireless mouse, and the specifications seem accurate based on common industry standards.
   
3. **Accessibility for beginners:** 8  
   - The language is straightforward, but some technical terms (like DPI settings) may require a bit of prior knowledge for complete beginners.

### Response 2: Wireless Comfort Mouse
1. **Clarity:** 8  
   - The description is mostly clear, but it could be slightly more concise in some areas. The flow is good, but some sentences are a bit lengthy.
   
2. **Accuracy:** 8  
   - The features are accurate and relevant, but the term "advanced optical sensor" could be more specific to enhance understanding.
   
3. **Accessibility fo
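The judge above replies in free-form text, which has to be read by hand and, in this export, is cut off. Below is a sketch of a more machine-readable variant, assuming the judge can be told to reply with JSON only. The prompt wording and the helpers `build_json_judge_prompt` and `best_response` are hypothetical, and the reply string is illustrative, not a real judge output. The Chat Completions API also accepts `response_format={"type": "json_object"}` to request JSON output, but the notebook's `call_openai` helper does not pass that parameter through.

```python
import json

CRITERIA = ["clarity", "accuracy", "accessibility"]

def build_json_judge_prompt(responses):
    """Hypothetical judge prompt that asks for machine-readable scores only."""
    blocks = "\n\n".join(f"Response {i + 1}: {r}" for i, r in enumerate(responses))
    return (
        "Evaluate these product descriptions. Rate each on clarity, accuracy, "
        "and accessibility for beginners (1-10).\n\n"
        f"{blocks}\n\n"
        'Reply with JSON only, in the form {"scores": [{"response": 1, '
        '"clarity": 0, "accuracy": 0, "accessibility": 0}, ...]}'
    )

def best_response(judge_reply):
    """Sum the criteria per response and return (best_response_number, totals)."""
    scores = json.loads(judge_reply)["scores"]
    totals = {s["response"]: sum(s[c] for c in CRITERIA) for s in scores}
    return max(totals, key=totals.get), totals

# Illustrative reply; in the notebook it would come from
# call_openai(build_json_judge_prompt(responses_to_judge1), temperature=0)
illustrative_reply = (
    '{"scores": [{"response": 1, "clarity": 7, "accuracy": 8, "accessibility": 6}, '
    '{"response": 2, "clarity": 9, "accuracy": 7, "accessibility": 8}]}'
)
print(best_response(illustrative_reply))
```

Summing the criteria gives them equal weight. A weighted sum would be an equally valid choice if, for example, accuracy matters more than accessibility.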

In [27]:
# Generate multiple responses to evaluate
prompt = """
Task: Write a compelling product description for a $29.99 wireless mouse.

Structure Requirements:
1. Catchy Headline: A short, punchy title.
2. Value Proposition: A 2-sentence paragraph focusing on comfort and value.
3. Feature Bullets: Exactly 3 bullet points highlighting technical specs (e.g., battery life, DPI, connectivity).

Style Guidelines:
- Tone: Professional yet approachable (avoid "corporate speak").
- Target Audience: Students and remote office workers.
- Length: Total word count must be under 100 words.

Constraints: 
- Do not mention specific competitor brand names.
- Focus on the "ergonomic feel" as a primary selling point.
"""

responses_to_judge = []
for i in range(3):
    response = call_openai(prompt, temperature=0.8)
    responses_to_judge.append(response)
    print(f"\nResponse {i+1}:")
    print(response)
    print("-"*80)
    time.sleep(0.5)


Response 1:
### Glide into Comfort: Your Perfect Wireless Mouse

Experience ultimate comfort and seamless productivity with our wireless mouse, designed for students and remote workers alike. At just $29.99, it combines ergonomic design with exceptional value, making long hours at your desk a breeze.

- **Long-lasting Battery**: Enjoy up to 12 months of usage on a single AA battery, keeping you focused on your tasks.
- **Precision Control**: With adjustable DPI settings up to 2400, customize your tracking speed for any project.
- **Effortless Connectivity**: Connect easily via 2.4GHz wireless technology for a reliable and interference-free experience.
--------------------------------------------------------------------------------

Response 2:
### Elevate Your Productivity with Comfort!

Discover the perfect blend of comfort and functionality with our wireless mouse, designed for students and remote workers alike. Enjoy seamless control and an ergonomic feel that keeps your hand happy

Use LLM as judge

In [28]:
# Use LLM as judge to evaluate and rank them
judge_prompt = f"""Evaluate these 3 product descriptions. Rate each on:
1. Clarity (1-10)
2. Accuracy (1-10)
3. Accessibility for beginners (1-10)

Response 1: {responses_to_judge[0]}

Response 2: {responses_to_judge[1]}

Response 3: {responses_to_judge[2]}

Provide scores for each response and recommend the best one. Format your response clearly."""

judgment = call_openai(judge_prompt, temperature=0)
display_response("LLM Judge Evaluation", judgment)



📝 LLM Judge Evaluation
Here are the evaluations for each product description:

### Response 1: Glide into Comfort: Your Perfect Wireless Mouse
1. **Clarity**: 9  
   - The description is clear and straightforward, effectively communicating the product's features and benefits.
   
2. **Accuracy**: 10  
   - The information provided appears accurate and aligns with common features of wireless mice.
   
3. **Accessibility for beginners**: 8  
   - The language is mostly accessible, but terms like "DPI" may require some explanation for complete beginners.

### Response 2: Elevate Your Productivity with Comfort!
1. **Clarity**: 8  
   - The description is clear, but the phrase "perfect blend of comfort and functionality" could be more specific about what that entails.
   
2. **Accuracy**: 10  
   - The features listed are accurate and relevant to the product.
   
3. **Accessibility for beginners**: 9  
   - The language is simple and easy to understand, making it accessible for beginner
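The two judge runs score different sets of responses in separate calls, so their numbers are not directly comparable across runs. LLM judges can also be sensitive to the order in which candidates are presented. One common mitigation, sketched here as an assumption rather than the notebook's method, is to compare a pair of responses in both orders and count a win only when both orderings agree. `pairwise_prompt` and `combine_verdicts` are hypothetical helpers, and the judge is assumed to answer with just "A" or "B".

```python
def pairwise_prompt(first, second):
    """Ask the judge to pick the better of two descriptions, answering A or B only."""
    return (
        "Which product description is clearer and more accurate?\n\n"
        f"A: {first}\n\nB: {second}\n\n"
        "Answer with exactly one letter: A or B."
    )

def _letter(verdict):
    """Return "A" or "B" if the reply is exactly that letter (ignoring case/period), else None."""
    cleaned = verdict.strip().strip(".").upper()
    return cleaned if cleaned in ("A", "B") else None

def combine_verdicts(verdict_xy, verdict_yx):
    """Combine the (x as A, y as B) and (y as A, x as B) verdicts.

    Returns "x" or "y" when both orderings agree, otherwise "tie"
    (disagreement between orderings is a sign of position bias).
    """
    first, second = _letter(verdict_xy), _letter(verdict_yx)
    if first == "A" and second == "B":
        return "x"
    if first == "B" and second == "A":
        return "y"
    return "tie"

# Usage with the notebook's helper (not executed here):
# v1 = call_openai(pairwise_prompt(responses_to_judge1[0], responses_to_judge[0]), temperature=0)
# v2 = call_openai(pairwise_prompt(responses_to_judge[0], responses_to_judge1[0]), temperature=0)
# print(combine_verdicts(v1, v2))  # "x" favors the simple-prompt sample, "y" the structured one
```

Replies that are not a bare letter count as a tie rather than being guessed at. This keeps the comparison conservative at the cost of some discarded judgments.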

Part 4: Iteration 2 - Adding Structure & Constraints

Step 9: Add Few-Shot Examples to Sentiment Analysis