<a href="https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/rag/SwigMenu_Playwright_OpenAI_MongoDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![View Article](https://img.shields.io/badge/View%20Article-blue)](https://www.mongodb.com/developer/products/atlas/playwright-structured-outputs-atlas-search/)

## Overview

In this tutorial, we will scrape the popular Utah "dirty" soda website, Swig, using Playwright. We will then feed the drinks into OpenAI with a prompt and use structured outputs to sort the menu by season, with a reason for each choice. Finally, we will save this information in MongoDB Atlas so we can use Atlas Search to find drinks that suit the fall season and the ingredients we are craving.

## Step 1: Scrape all menu items from Swig website

Let's first scrape all our menu items from the Swig website. We need to install Playwright and then build out our function.

In [None]:
!pip install playwright
!playwright install

Collecting playwright
  Downloading playwright-1.47.0-py3-none-manylinux1_x86_64.whl.metadata (3.5 kB)
Collecting greenlet==3.0.3 (from playwright)
  Downloading greenlet-3.0.3-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (3.8 kB)
Collecting pyee==12.0.0 (from playwright)
  Downloading pyee-12.0.0-py3-none-any.whl.metadata (2.8 kB)
Downloading playwright-1.47.0-py3-none-manylinux1_x86_64.whl (38.1 MB)
Downloading greenlet-3.0.3-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (616 kB)
Downloading pyee-12.0.0-py3-none-any.whl (14 kB)
Installing collected packages: pyee, greenlet, playwright
  Attempting uninstall: greenlet
    Found existing installation: greenlet 3.1.1
    Uninstalling greenlet-3.1.1:

We have to use Playwright's async API because Google Colab already runs an asyncio event loop. If you're not using a notebook, you can use the sync API instead. Please refer to the accompanying article to understand where our selectors came from.
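
The underlying constraint is that a notebook cell executes inside an event loop that is already running, so a blocking entry point cannot start a second one. The sketch below illustrates this with the standard library only. `fetch_menu_stub` and `inside_running_loop` are hypothetical stand-ins (no browser is involved), and the snippet is meant to be run as a plain script, since the final `asyncio.run` call would itself fail inside a notebook.

```python
import asyncio


async def fetch_menu_stub():
    # Hypothetical stand-in for a scraper coroutine; it does no real work.
    await asyncio.sleep(0)
    return [{"name": "Dr Spice", "description": "placeholder"}]


async def inside_running_loop():
    # At this point a loop is running, much like inside a notebook cell.
    coro = fetch_menu_stub()
    try:
        asyncio.run(coro)  # tries to start a second event loop
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        error_message = str(exc)
    else:
        error_message = None

    # Awaiting works because it reuses the loop that is already running.
    menu = await fetch_menu_stub()
    return error_message, menu


error_message, menu = asyncio.run(inside_running_loop())
print(error_message)
print(menu)
```

This is why the scraper in this notebook is called with `await swigScraper()` rather than through a blocking wrapper.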

In [None]:
from playwright.async_api import async_playwright

We are using the URL that is inside the website's iframe, and we are using selectors to make sure we wait for the information we want to load. We want to grab the name of each menu item along with its description. Please refer to the written article to understand this function better if necessary!

In [None]:
async def swigScraper():
    async with async_playwright() as playwright:
        # use headless mode since we are using Colab
        browser = await playwright.chromium.launch(headless=True)
        page = await browser.new_page()

        # make sure to have the correct URL
        await page.goto("https://swig-orders.crispnow.com/tabs/locations/menu")

        # let page load
        await page.wait_for_selector(
            "ion-card-content", state="attached", timeout=60000
        )

        # ion-card-content has all of our names and descriptions
        items = await page.query_selector_all("ion-card-content")

        menu = []

        # loop through the html and take what we need
        for item in items:
            name = await item.query_selector("p.text-h3")
            description = await item.query_selector("p.text-b2")

            # just get the inner text
            if name and description:
                result = {}
                result["name"] = await name.inner_text()
                result["description"] = await description.inner_text()
                menu.append(result)

        for item in menu:
            print(f"Name: {item['name']}, Description: {item['description']}")

        await browser.close()
        return menu


scraped_menu = await swigScraper()

print(scraped_menu)

Name: , Description: 
Name: Soda, Description: Create Your Own Dirty Soda:
Soda + Flavors, Fruits, & Creams
Name: DDD, Description: Diet Dr Pepper + Coconut (25 - 70 Calories)
Name: Dirty Dr Pepper, Description: Dr Pepper + Coconut (120 - 440 Calories)
Name: Dirty S.O.P, Description: Dr Pepper + Coconut + Peach (120 - 440 Calories)
Name: Dr Spice, Description: Dr Pepper + Cinnamon + Coconut + Cinnamon Stick + Half & Half (140 - 490 Calories)
Name: Life's a Peach, Description: Dr Pepper + Vanilla + Peach + Half & Half (130 - 480 Calories)
Name: Naughty & Nice, Description: Dr Pepper + English Toffee + Half & Half (130 - 470 Calories)
Name: Princess Peach, Description: Dr Pepper + Peach + Coconut Cream (140 - 510 Calories)
Name: Raspberry Dream, Description: Dr Pepper + Raspberry Puree + Coconut Cream (150 - 550 Calories)
Name: Save Me Jade, Description: Diet Dr Pepper + Sugar Free Vanilla + Sugar Free Coconut (0  Calories)
Name: Spring Fling, Description: Dr Pepper + Vanilla + Strawberr
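
The output above contains a blank entry and a description that spans two lines. If you would like to tidy the list before sending it to the model, a minimal sketch might look like the following. `clean_menu` is a hypothetical helper that is not part of the original notebook, and the sample rows are copied from the output above.

```python
# Sample rows copied from the scraper output above.
sample_menu = [
    {"name": "", "description": ""},
    {
        "name": "Soda",
        "description": "Create Your Own Dirty Soda:\nSoda + Flavors, Fruits, & Creams",
    },
    {
        "name": "Dirty Dr Pepper",
        "description": "Dr Pepper + Coconut (120 - 440 Calories)",
    },
    {
        "name": "Save Me Jade",
        "description": "Diet Dr Pepper + Sugar Free Vanilla + Sugar Free Coconut (0  Calories)",
    },
]


def clean_menu(menu):
    """Drop blank entries and collapse runs of whitespace (including newlines)."""
    cleaned = []
    for item in menu:
        name = " ".join(item["name"].split())
        description = " ".join(item["description"].split())
        if name and description:
            cleaned.append({"name": name, "description": description})
    return cleaned


cleaned_menu = clean_menu(sample_menu)
for item in cleaned_menu:
    print(item)
```

In the notebook you could call `clean_menu(scraped_menu)` and pass the result on to the next step instead of the raw list.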

Now that we have all of our menu options, let's use OpenAI to tell us which drinks are best for each season, fall included, based on their descriptions.

## Step 2: OpenAI Structured Schema Outputs
Please refer to the documentation to understand OpenAI's structured schema outputs. We want to emulate the section where they are extracting structured data from unstructured data.


In [None]:
!pip install openai

Collecting openai
  Downloading openai-1.48.0-py3-none-any.whl.metadata (24 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading openai-1.48.0-py3-none-any.whl (376 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)
Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)

In [None]:
import getpass
import json

import openai

In [None]:
# put in your OpenAI API key here
openai_api_key = getpass.getpass(prompt="Put in OpenAI API Key here")

Put in OpenAI API Key here··········


Here we are formatting our menu from when we scraped it, putting everything into a single string for OpenAI to understand, and then creating a prompt helping our model understand what we are hoping to achieve.

In [None]:
def swigJoined(scraped_menu):
    drink_list = []

    # just formatting our menu from above
    for drink in scraped_menu:
        drink_format = f"{drink['name']}: {drink['description']}"
        drink_list.append(drink_format)

    # put all the drinks into a single string for OpenAI to understand it
    drink_string = "\n".join(drink_list)

    # we have to tell OpenAI which drinks/combinations are available
    prompt = (
        "You are the best soda mixologist Utah has ever seen! This is a list of sodas and their descriptions, or ingredients:\n"
        f"{drink_string}\n\n Please sort each and every drink provided into spring, summer, fall, or winter seasons based on their ingredients\n"
        "and give me reasonings as to why by stating which ingredients make it best for each season. For example, cinnamon is more fall, but peach\n"
        "is more summer."
    )

    return prompt
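
To see what the model receives, here is a small sketch of the joining step on two drinks taken from the scraper output. `format_drinks` is a hypothetical helper that mirrors only the formatting part of `swigJoined`; it is shown for illustration and is not required by the rest of the notebook.

```python
# Two drinks copied from the scraper output earlier in the notebook.
sample_menu = [
    {
        "name": "Dirty Dr Pepper",
        "description": "Dr Pepper + Coconut (120 - 440 Calories)",
    },
    {
        "name": "Dr Spice",
        "description": "Dr Pepper + Cinnamon + Coconut + Cinnamon Stick + Half & Half (140 - 490 Calories)",
    },
]


def format_drinks(menu):
    """Render each drink as 'name: description', one drink per line."""
    return "\n".join(f"{drink['name']}: {drink['description']}" for drink in menu)


drink_string = format_drinks(sample_menu)
print(drink_string)
```

Each drink ends up on its own line, which makes it easier for the model to treat every entry as a separate item to classify.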

In [None]:
# generate our prompt using the menu we scraped
my_prompt = swigJoined(scraped_menu)

openai.api_key = openai_api_key

In [None]:
# now we are doing our structured call (taken from the documentation)
response = openai.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "You are the best soda mixologist Utah has ever seen!",
        },
        {"role": "user", "content": my_prompt},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "drink_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "seasonal_drinks": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "drink": {"type": "string"},
                                "reason": {"type": "string"},
                            },
                            "required": ["drink", "reason"],
                            "additionalProperties": False,
                        },
                    }
                },
                "required": ["seasonal_drinks"],
                "additionalProperties": False,
            },
        },
    },
)

Let's check and see our full response and see if it's structured the way we want.

In [None]:
# full response
print(json.dumps(response.model_dump(), indent=2))

{
  "id": "chatcmpl-ABj64ekfxsyu1n7kY1q73GK3reMFT",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{\"seasonal_drinks\":[{\"drink\":\"Dirty S.O.P: Dr Pepper + Coconut + Peach\",\"reason\":\"The inclusion of peach makes this drink more suited for summer, as peach is typically associated with warm weather and summer harvests.\"},{\"drink\":\"Dr Spice: Dr Pepper + Cinnamon + Coconut + Cinnamon Stick + Half & Half\",\"reason\":\"Cinnamon and cinnamon stick are warm spices typically associated with fall and winter, making this drink best suited for chillier weather.\"},{\"drink\":\"Life's a Peach: Dr Pepper + Vanilla + Peach + Half & Half\",\"reason\":\"The peach flavor suggests a summer drink, as peach is a classic summer fruit. The vanilla adds a creamy note that works well in warmer temperatures.\"},{\"drink\":\"Naughty & Nice: Dr Pepper + English Toffee + Half & Half\",\"reason\":\"The English toffee b

It is structured nicely, but all our drinks and their reasonings are packed into the "content" field as a single JSON string. Let's pull that field out so we can read it more easily.

In [None]:
# content only response
content = response.model_dump()["choices"][0]["message"]["content"]
print(content)

{"seasonal_drinks":[{"drink":"Dirty S.O.P: Dr Pepper + Coconut + Peach","reason":"The inclusion of peach makes this drink more suited for summer, as peach is typically associated with warm weather and summer harvests."},{"drink":"Dr Spice: Dr Pepper + Cinnamon + Coconut + Cinnamon Stick + Half & Half","reason":"Cinnamon and cinnamon stick are warm spices typically associated with fall and winter, making this drink best suited for chillier weather."},{"drink":"Life's a Peach: Dr Pepper + Vanilla + Peach + Half & Half","reason":"The peach flavor suggests a summer drink, as peach is a classic summer fruit. The vanilla adds a creamy note that works well in warmer temperatures."},{"drink":"Naughty & Nice: Dr Pepper + English Toffee + Half & Half","reason":"The English toffee brings a rich, dessert-like quality suitable for winter, when people tend to crave warmer, indulgent flavors."},{"drink":"Princess Peach: Dr Pepper + Peach + Coconut Cream","reason":"The tropical flavors of peach and co
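
Before parsing the content, it can be worth checking that the reply is complete. The sketch below assumes the dictionary shape shown in the full response above, plus a `refusal` key on the message; `extract_seasonal_drinks` is a hypothetical helper and `fake_response` is hand-built for illustration, not a real API reply.

```python
import json


def extract_seasonal_drinks(response_dict):
    """Return the parsed drink list, or raise if the reply looks unusable."""
    choice = response_dict["choices"][0]
    message = choice["message"]

    # Assumption: a refusal, if any, is reported in message["refusal"].
    if message.get("refusal"):
        raise ValueError(f"Model refused: {message['refusal']}")

    # A finish_reason other than "stop" (for example "length") suggests
    # the JSON string may have been cut off.
    if choice["finish_reason"] != "stop":
        raise ValueError(f"Incomplete response: {choice['finish_reason']}")

    return json.loads(message["content"])["seasonal_drinks"]


# Hand-built dict shaped like response.model_dump(), using one drink
# from the output above.
fake_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "message": {
                "refusal": None,
                "content": json.dumps(
                    {
                        "seasonal_drinks": [
                            {
                                "drink": "Dr Spice: Dr Pepper + Cinnamon + Coconut + Cinnamon Stick + Half & Half",
                                "reason": "Cinnamon and cinnamon stick are warm spices typically associated with fall and winter, making this drink best suited for chillier weather.",
                            }
                        ]
                    }
                ),
            },
        }
    ]
}

drinks = extract_seasonal_drinks(fake_response)
print(drinks[0]["drink"])
```

In the notebook, the equivalent call would be `extract_seasonal_drinks(response.model_dump())`.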

It's still all on one line. Let's parse it and print the drinks out nicely for better readability, and so that each drink becomes its own document when we insert it into MongoDB Atlas.

In [None]:
# print the drinks out nicely for Atlas
parsed_drinks = json.loads(content)
seasonal_drinks_pretty = parsed_drinks["seasonal_drinks"]
print(json.dumps(seasonal_drinks_pretty, indent=2))

[
  {
    "drink": "Dirty S.O.P: Dr Pepper + Coconut + Peach",
    "reason": "The inclusion of peach makes this drink more suited for summer, as peach is typically associated with warm weather and summer harvests."
  },
  {
    "drink": "Dr Spice: Dr Pepper + Cinnamon + Coconut + Cinnamon Stick + Half & Half",
    "reason": "Cinnamon and cinnamon stick are warm spices typically associated with fall and winter, making this drink best suited for chillier weather."
  },
  {
    "drink": "Life's a Peach: Dr Pepper + Vanilla + Peach + Half & Half",
    "reason": "The peach flavor suggests a summer drink, as peach is a classic summer fruit. The vanilla adds a creamy note that works well in warmer temperatures."
  },
  {
    "drink": "Naughty & Nice: Dr Pepper + English Toffee + Half & Half",
    "reason": "The English toffee brings a rich, dessert-like quality suitable for winter, when people tend to crave warmer, indulgent flavors."
  },
  {
    "drink": "Princess Peach: Dr Pepper + Peach +

Now that our drinks and their reasonings are printed out nicely, let's upload them into MongoDB Atlas so we can use Atlas Search and look up drinks based on their ingredients!
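
Optionally, since each "drink" value combines the name and the ingredients in one string, you could split them into separate fields before uploading. This is a minimal sketch that assumes the model kept the "Name: A + B" pattern seen in the output above, which the schema itself does not enforce. `split_drink` is a hypothetical helper, and the "name" and "ingredients" fields are not part of the original notebook.

```python
def split_drink(entry):
    """Split 'Name: A + B + C' into a name and a list of ingredients."""
    # partition() splits on the first colon only; if there is no colon,
    # the recipe part is empty and the ingredient list stays empty.
    name, _, recipe = entry["drink"].partition(":")
    ingredients = [part.strip() for part in recipe.split("+") if part.strip()]
    return {
        "name": name.strip(),
        "ingredients": ingredients,
        "reason": entry["reason"],
    }


# One entry copied from the parsed output above.
sample_entry = {
    "drink": "Life's a Peach: Dr Pepper + Vanilla + Peach + Half & Half",
    "reason": "The peach flavor suggests a summer drink, as peach is a classic summer fruit. The vanilla adds a creamy note that works well in warmer temperatures.",
}

document = split_drink(sample_entry)
print(document["name"])
print(document["ingredients"])
```

The "reason" field is left unchanged, so queries that search on "reason" behave the same either way.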

## Step 3: Store into MongoDB and use Atlas Search

For this section a MongoDB Atlas cluster is required. Please make sure you have your connection string saved somewhere safe.

First install PyMongo to make things easier for ourselves.

In [None]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-4.9.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.6.1-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.9.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.6.1 pymongo-4.9.1


Set up our MongoDB connection, name your database and collection, and insert your documents into your cluster.

In [None]:
from pymongo import MongoClient

# set up your MongoDB connection
connection_string = getpass.getpass(
    prompt="Enter connection string WITH USER + PASS here"
)
client = MongoClient(connection_string, appname="devrel.showcase.swig_menu")


# name your database and collection anything you want since it will be created when you enter your data
database = client["swig_menu"]
collection = database["seasonal_drinks"]

# insert our seasonal drinks (one document per drink)
collection.insert_many(seasonal_drinks_pretty)

Enter connection string WITH USER + PASS here··········


InsertManyResult([ObjectId('66f567a76d78f892e158abed'), ObjectId('66f567a76d78f892e158abee'), ObjectId('66f567a76d78f892e158abef'), ObjectId('66f567a76d78f892e158abf0'), ObjectId('66f567a76d78f892e158abf1'), ObjectId('66f567a76d78f892e158abf2'), ObjectId('66f567a76d78f892e158abf3'), ObjectId('66f567a76d78f892e158abf4'), ObjectId('66f567a76d78f892e158abf5'), ObjectId('66f567a76d78f892e158abf6'), ObjectId('66f567a76d78f892e158abf7'), ObjectId('66f567a76d78f892e158abf8'), ObjectId('66f567a76d78f892e158abf9'), ObjectId('66f567a76d78f892e158abfa'), ObjectId('66f567a76d78f892e158abfb'), ObjectId('66f567a76d78f892e158abfc'), ObjectId('66f567a76d78f892e158abfd'), ObjectId('66f567a76d78f892e158abfe'), ObjectId('66f567a76d78f892e158abff'), ObjectId('66f567a76d78f892e158ac00'), ObjectId('66f567a76d78f892e158ac01'), ObjectId('66f567a76d78f892e158ac02'), ObjectId('66f567a76d78f892e158ac03'), ObjectId('66f567a76d78f892e158ac04'), ObjectId('66f567a76d78f892e158ac05'), ObjectId('66f567a76d78f892e158ac
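
One thing to keep in mind: PyMongo adds an `_id` key to each dictionary it inserts, so running the insert cell a second time with the same list tries to reuse those `_id` values and fails with a duplicate key error. A minimal sketch of one workaround is to insert copies without the `_id` key. `without_ids` is a hypothetical helper, and the `_id` value below is a placeholder rather than a real ObjectId.

```python
def without_ids(documents):
    """Return shallow copies of the documents with any '_id' key removed."""
    return [
        {key: value for key, value in doc.items() if key != "_id"}
        for doc in documents
    ]


# Simulates a list that has already been passed to insert_many once.
already_inserted = [
    {
        "_id": "placeholder-id",
        "drink": "Dirty S.O.P: Dr Pepper + Coconut + Peach",
        "reason": "The inclusion of peach makes this drink more suited for summer, as peach is typically associated with warm weather and summer harvests.",
    }
]

fresh_documents = without_ids(already_inserted)
print(fresh_documents)

# With a live connection you could then run:
# collection.insert_many(fresh_documents)
```

Note that inserting the copies adds the drinks a second time. If you want a clean reload, clear the collection first, for example with `collection.delete_many({})`.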

Create an Atlas Search index on your collection and create an aggregation pipeline. We are using the $search stage.

Do NOT run this part in your notebook. This is done in the Atlas UI.



This finds drinks whose reason mentions "fall":

In [None]:
{"text": {"query": "fall", "path": "reason"}}

This finds drinks whose reason mentions fall AND apple:

In [None]:
{
    "compound": {
        "must": [
            {"text": {"query": "fall", "path": "reason"}},
            {"text": {"query": "apple", "path": "reason"}},
        ],
    }
}
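
The same queries can also be expressed as an aggregation pipeline and run from Python, as an alternative to the Atlas UI. The sketch below only builds the pipeline. `build_search_pipeline` is a hypothetical helper, and the index name "default" is an assumption, so substitute whatever name you gave your search index.

```python
def build_search_pipeline(season, ingredient=None, index_name="default"):
    """Build a $search pipeline that matches terms in the 'reason' field."""
    # Every clause under "must" has to match for a document to be returned.
    must = [{"text": {"query": season, "path": "reason"}}]
    if ingredient:
        must.append({"text": {"query": ingredient, "path": "reason"}})

    return [
        {"$search": {"index": index_name, "compound": {"must": must}}},
        # Keep only the fields we care about in the results.
        {"$project": {"_id": 0, "drink": 1, "reason": 1}},
    ]


pipeline = build_search_pipeline("fall", "apple")
print(pipeline)

# With a live cluster and an existing Atlas Search index:
# for doc in collection.aggregate(pipeline):
#     print(doc)
```

Calling `build_search_pipeline("fall")` without an ingredient gives the single-term query, so one helper covers both of the searches above.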

Now you can find fall-themed drinks that match any ingredients you want!