## A-B Testing using an LLM

### Setup

In [1]:
from openai import OpenAI
import os
import base64
import requests

from collections import Counter

from IPython.display import display, Markdown, Image


from dotenv import load_dotenv
# Load API key
_ = load_dotenv()




In [2]:

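def query_LLM(messages=None, prompt=None, model='gpt-4o'):
    """Send a chat request and return the reply text, or None if there is no input.

    Relies on the module-level `client`, which must exist before the first call.
    """
    # Wrap a bare prompt as a single user message
    if messages is None and prompt:
        messages = [{"role": "user", "content": prompt}]
    elif messages is None:
        return None

    completion = client.chat.completions.create(
        model=model,
        messages=messages
    )

    # Return only the text of the first choice
    return completion.choices[0].message.content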

In [9]:
client = OpenAI()
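
The helper above reads the global `client`, so it only works once this cell has run. A minimal sketch of an alternative, assuming the client is passed in as an argument, lets the same logic be exercised offline. The names `query_llm`, `FakeCompletions`, and `fake_client` are hypothetical and exist only for this illustration; the fake mimics just the attributes the helper touches.

```python
from types import SimpleNamespace


def query_llm(client, messages=None, prompt=None, model="gpt-4o"):
    """Same flow as the notebook helper, but with the client injected."""
    if messages is None:
        if not prompt:
            return None
        messages = [{"role": "user", "content": prompt}]
    completion = client.chat.completions.create(model=model, messages=messages)
    return completion.choices[0].message.content


class FakeCompletions:
    """Hypothetical stand-in that echoes the last message instead of calling an API."""

    def create(self, model, messages):
        text = f"echo: {messages[-1]['content']}"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# Mirrors the client.chat.completions.create(...) attribute path used by the helper
fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

print(query_llm(fake_client, prompt="hello"))
```

Passing a real `OpenAI()` instance in place of `fake_client` should behave like the original helper, since the call path is unchanged.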

# Product Websites 

* **We used ChatGPT 4o to generate the product webpage markdown file.**

*  **Prompt**
> Generate a product webpage for a new product called [NAME]. [DESCRIPTION]. Include a description and features and testimonials.

* **Product Name & Descriptions**
    1. ProductA
    2. ProductB

It doesn’t have to be a full working website or even a web page; you can use a notebook and Markdown.
But the goal is to have all the components for a product marketing webpage:

* Product Name
* Logo
* Slogan/descriptives
* Features
* Maybe reviews or endorsements etc.
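
The generation prompt quoted above has two slots, `[NAME]` and `[DESCRIPTION]`. A minimal sketch of filling them programmatically follows; `PAGE_PROMPT` and `build_page_prompt` are hypothetical names, and the description passed in the example is a placeholder, not the text actually used for either product.

```python
# Template text copied from the prompt above, with the slots as format fields
PAGE_PROMPT = (
    "Generate a product webpage for a new product called {name}. "
    "{description}. Include a description and features and testimonials."
)


def build_page_prompt(name, description):
    """Fill the template; a trailing period is trimmed so it is not doubled."""
    cleaned = description.strip().rstrip(".")
    return PAGE_PROMPT.format(name=name, description=cleaned)


# Placeholder description, for illustration only
prompt = build_page_prompt("ProductA", "A placeholder description.")
print(prompt)
```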

In [3]:
with open('productA.md', encoding='utf-8') as f:
    productA = f.read()
with open('productB.md', encoding='utf-8') as f:
    productB = f.read()

In [4]:
display(Markdown(productA))



In [5]:
display(Markdown(productB))



# A-B Testing

## (1) Compare Descriptions 

In [13]:
system_prompt = "You are a consumer ({description}) who needs an easy way to prepare your meals that is affordable, creative, and customizable."

rating_prompt = f'''
   Read the websites for each of these two products
   1. HarmonyBites
   2. Bytes

   {productA}

   ----

   {productB}

   ----
   
   Pay careful attention to the features, descriptions and user reviews.

   Rate each product on a scale of 0 (never buy) to 10 (buy immediately).
   Show just your ratings and which product you prefer.
   Then briefly explain your ratings.
'''
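
The prompt asks for numeric ratings, but the replies come back as free text, so comparing personas takes some parsing. The following is a rough sketch, assuming replies contain rating lines shaped like `1. NeverHitATree: 7` or `**TreeAlert AR: 9/10**`, the two forms that appear in this notebook's printed output. `parse_ratings` and `preferred` are hypothetical helpers, and replies in any other shape would simply yield no ratings.

```python
import re
from collections import Counter

# Optional list number, a name, a colon, then a 1-2 digit score with optional "/10"
RATING_LINE = re.compile(
    r"^(?:\d+\.\s*)?(?P<name>[^:]+?):\s*(?P<score>\d{1,2})(?:\s*/\s*10)?$"
)


def parse_ratings(reply):
    """Return {product name: score} for lines that look like ratings."""
    ratings = {}
    for line in reply.splitlines():
        cleaned = line.replace("*", "").strip()  # drop Markdown bold markers
        match = RATING_LINE.match(cleaned)
        if match:
            score = int(match.group("score"))
            if 0 <= score <= 10:
                # Keep the first score seen for each product
                ratings.setdefault(match.group("name").strip(), score)
    return ratings


def preferred(ratings):
    """Return the top-rated product, 'tie' on equal top scores, None if empty."""
    if not ratings:
        return None
    best = max(ratings.values())
    winners = [name for name, score in ratings.items() if score == best]
    return winners[0] if len(winners) == 1 else "tie"


# Excerpts from the replies printed in this notebook
replies = [
    "1. NeverHitATree: 7\n2. TreeAlert AR: 8\n\nI prefer the TreeAlert AR.",
    "**NeverHitATree: 7/10**  \n**TreeAlert AR: 9/10**\n\n**Preferred Product: TreeAlert AR**",
]
tally = Counter(preferred(parse_ratings(reply)) for reply in replies)
print(tally)
```

Asking the model for a fixed output format (for example JSON) would likely be more robust than a regular expression, at the cost of changing the prompt.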

In [15]:
consumers = [
    "cyclist, male, 50s, with low vision",
    "cyclist, female, 30s",
    "non-cyclist, male, 50s",
    "non-cyclist, female, 30s"
]
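
Each description is meant to be merged into the system prompt with `str.format`. That substitution only happens where the template contains a matching `{description}` field; a template without one is returned unchanged and the persona is silently dropped. A small sketch of the difference, where `SYSTEM_TEMPLATE` and `build_messages` are hypothetical names and the template wording is an assumption:

```python
# Hypothetical template with an explicit field for the persona
SYSTEM_TEMPLATE = "You are a consumer with this profile: {description}."


def build_messages(description, rating_prompt):
    """Pair a persona-specific system message with the rating request."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(description=description)},
        {"role": "user", "content": rating_prompt},
    ]


# Without a {description} field, format() leaves the string as it was
no_field = "You are a consumer."
unchanged = no_field.format(description="cyclist, female, 30s")

messages = build_messages("cyclist, female, 30s", "Rate each product.")
print(unchanged)
print(messages[0]["content"])
```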

In [16]:
for consumer in consumers:
    print(f"\n-----\n{consumer}")

    messages = [
       {"role": "system", "content": system_prompt.format(description=consumer)},
       {"role": "user", "content": rating_prompt}
    ]

    response = query_LLM(messages=messages)

    print(response,'\n')


-----
cyclist, male, 50s, with low vision
1. NeverHitATree: 7
2. TreeAlert AR: 8

I prefer the TreeAlert AR.

Explanation:

- NeverHitATree: This product provides basic alert functionality for avoiding trees and other large obstacles while cycling. It has decent user reviews but lacks advanced features like augmented reality integration, which may limit its efficacy for someone with low vision.

- TreeAlert AR: This product incorporates augmented reality features, providing a more immersive and intuitive navigation experience. It has strong user reviews, indicating its usefulness in real-life cycling scenarios. The augmented reality technology is particularly beneficial for individuals with low vision by offering clearer visual cues. 


-----
cyclist, female, 30s
**NeverHitATree: 7/10**  
**TreeAlert AR: 9/10**

**Preferred Product: TreeAlert AR**

**Explanation:**
1. **NeverHitATree: 7/10**
   - **Features:** Focuses on using sensors to detect trees and potentially hazardous obstacle