# Sommelier Agent Demo Setup

To keep the demo simple, it has been split into "setup" and "demo" notebooks. This is the Setup notebook.

> **Note:** This demo has been created using Python 3.11.4.

## Setup - Third Party Accounts

### OpenAI API Account
You will need an API account with OpenAI; see [https://platform.openai.com/](https://platform.openai.com/). Once you have an account,
create an [API key](https://platform.openai.com/account/api-keys).

### Astra Database
You should have created an Astra Database with Vector Search. If you are unfamiliar with Astra, follow the
"Getting Started with Vector Search" quick-start guide through to the point where you have a working Python application.

To proceed with this exercise, you'll need:

1. Your Secure Connect Bundle (`.zip` file)
2. Your Client ID
3. Your Client Secret
4. A keyspace named `vsearch` (or any name you prefer; you will set it in the `.env` file later)

## Setup - File Downloads

The following files are in [WineAgent.zip](https://drive.google.com/file/d/1wWganXifTIxgPF-7b6fW0fQm36FAMcqn/view?usp=sharing), which 
you should download and unzip locally into the root of this project directory.

### `wine_data.db` : Wine Tasting Notes Download

Wine Spectator publishes [Daily Picks](https://www.winespectator.com/dailypicks), in which tasting notes of a recent review are highlighted
in one of three different price categories. As this is nicely structured, it can serve as a simulation for what a restaurant or other 
food-related retailer might be able to access from their own wine catalogue.

Using [download.py](download.py), a number of reviews have been downloaded from Wine Spectator and stored in a sqlite3 database. 
This database holds the raw HTML response text.

**Please do not run `download.py`**: this data has already been downloaded, and there is no need to request it from winespectator.com 
directly. You can view the content with any sqlite3 browser.
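
If you would rather inspect the database from Python than from a sqlite3 browser, the following is a minimal sketch using only the standard library. The helper name `summarize_tables` is hypothetical, and no assumption is made about the table names inside `wine_data.db`: they are read from SQLite's own `sqlite_master` catalogue. The file is opened read-only so that a mistyped path raises an error instead of silently creating an empty database.

```python
import sqlite3
from pathlib import Path


def summarize_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Open read-only via a file: URI so a missing file is an error, not a new empty DB.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Names come from the catalogue itself, so double-quoting them is sufficient.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `summarize_tables("wine_data.db")` from the project root should list each table with its row count, assuming the zip file was unzipped there.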

### `wines.parquet` : Scraped HTML
Using [scrape.py](scrape.py), the data within `wine_data.db` is parsed into multiple records and written into a Parquet file `wines.parquet`. 
For example, the following Wine Spectator review:

![Wine Review](review-snapshot.png)

contains a winery name (Bodegas Emilio Moro), a wine name (Tempranillo Ribera Del Duero Finca Resalso 2021), a review date (Jun. 19, 2023),
a price ($19), a score (89 points), the reviewer (Alison Napjus), and the tasting notes along with an indication of the drinkability
of the wine and relative rarity.
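
As an illustration of the shape of one parsed record, the sketch below models the review above as a Python dataclass. The class name and all field names are hypothetical; the actual column names written by `scrape.py` may differ. The drinkability and rarity values are not quoted in the text, so they are left unset here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WineReview:
    # Field names are illustrative only, not the columns of wines.parquet.
    winery: str
    wine: str
    review_date: date
    price_usd: int
    score: int
    reviewer: str
    tasting_notes: str
    drinkability: Optional[str] = None  # present in a review, value not shown here
    rarity: Optional[str] = None        # present in a review, value not shown here


example = WineReview(
    winery="Bodegas Emilio Moro",
    wine="Tempranillo Ribera Del Duero Finca Resalso 2021",
    review_date=date(2023, 6, 19),
    price_usd=19,
    score=89,
    reviewer="Alison Napjus",
    tasting_notes="<tasting notes from the review>",
)
```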

### `wines-embeddings.parquet` : OpenAI Embeddings
Using [embed.py](embed.py), the parsed entries in `wines.parquet` are embedded with OpenAI's `text-embedding-ada-002` text embedding 
model. These are saved in another Parquet file `wines-embeddings.parquet`. You can certainly run `embed.py` if you'd like to pay
to have these re-generated, or if you would like to use a different embedding model.
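
If you do re-generate the embeddings, sending the texts in batches keeps the number of API requests down. The following is a minimal sketch of that idea, not the code in `embed.py`: the function name `embed_in_batches`, the `create` parameter, and the batch size are assumptions for illustration. The `create` callable is expected to accept `model` and `input` keyword arguments and to return a dict-like response whose `data` items each carry an `index` and an `embedding`.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    create: Callable[..., dict],
    model: str = "text-embedding-ada-002",
    batch_size: int = 100,
) -> List[List[float]]:
    """Embed texts batch by batch, returning vectors in the same order as texts."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        response = create(model=model, input=batch)
        # Each item has an "index" into the batch; sort on it to keep input order.
        items = sorted(response["data"], key=lambda item: item["index"])
        vectors.extend(item["embedding"] for item in items)
    return vectors
```

With the `openai==0.27.7` package pinned in this notebook, `openai.Embedding.create` can be passed as `create`, for example `embed_in_batches(notes, openai.Embedding.create)` where `notes` is a list of strings. Passing a different `model` is where you would swap in another embedding model.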

## Create a `.env` File

The example code uses `python-dotenv` to load variables into the environment. The `.env` file resides in the root of the project and should contain:
```
OPENAI_API_KEY="<Your OpenAI API key>"
ASTRA_SECUREBUNDLE_PATH="<your/path/to/secure-connect-database.zip>"
ASTRA_CLIENT_ID="<Your Client ID>"
ASTRA_CLIENT_SECRET="<Your Client Secret>"
ASTRA_KEYSPACE="vsearch"
ASTRA_TABLE="winespectator"
```
Note that the secure connect bundle path is relative to the root of this project directory. The keyspace and table names should be amended to match your keyspace and table names.
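
Because a wrong bundle path only surfaces later as a connection failure, it can help to check it up front. This is a minimal sketch under the assumption that relative paths are resolved against the project root; the helper name `resolve_bundle` and its parameters are hypothetical and not part of the demo code.

```python
import os
from pathlib import Path


def resolve_bundle(project_root, env=os.environ):
    """Resolve ASTRA_SECUREBUNDLE_PATH against the project root and check it exists."""
    path = Path(env["ASTRA_SECUREBUNDLE_PATH"])
    if not path.is_absolute():
        # Relative paths are interpreted relative to the project root.
        path = Path(project_root) / path
    if not path.is_file():
        raise FileNotFoundError(f"Secure connect bundle not found: {path}")
    return path
```

After the `.env` file has been loaded, `resolve_bundle(".")` run from the project root returns the bundle path or raises `FileNotFoundError` naming the path it looked for.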

## Python Module Installs
Install the following Python modules and versions for this project:

In [None]:
%pip install -qU \
    "cassandra-driver>=3.28.0" \
    "openai==0.27.7" \
    "tiktoken==0.4.0" \
    "langchain>=0.0.218" \
    "cassio==0.0.7" \
    "python-dotenv" \
    pandas \
    pyarrow \
    tqdm \
    ipywidgets \
    bs4 \
    streamlit

## Environment Validation
This code validates the environment is ready to go: it can connect to OpenAI API as well as Astra.

### Verify `.env` file is set up

In [None]:
import os
from dotenv import load_dotenv
if not load_dotenv('.env',override=True):
    raise Exception("Couldn't load .env file")

envVars = ['OPENAI_API_KEY','ASTRA_SECUREBUNDLE_PATH','ASTRA_CLIENT_ID','ASTRA_CLIENT_SECRET','ASTRA_KEYSPACE', 'ASTRA_TABLE']
missing = []

for var in envVars:
    if var not in os.environ:
        missing.append(var)

if missing:
    raise EnvironmentError(f'These environment variables are missing: {missing}')

### Verify OpenAI API Key

In [None]:
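import os
import openai
openai.api_key = os.environ['OPENAI_API_KEY']
try:
    # An invalid key raises AuthenticationError instead of returning an empty list
    models = openai.Model.list()
except openai.error.AuthenticationError as e:
    raise Exception("Your OpenAI API key does not appear to be valid. Please check it and try again.") from e
# The response is a dict-like object; the models are under its 'data' key
if len(models['data']) == 0:
    raise Exception("No models are available to this OpenAI API key. Please check it and try again.")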

### Verify Astra Database Connection

In [None]:
from cassandra.cluster import Cluster
from cassandra.cluster import NoHostAvailable
from cassandra.auth import PlainTextAuthProvider

cloud_config = {'secure_connect_bundle': os.environ['ASTRA_SECUREBUNDLE_PATH']}
auth_provider = PlainTextAuthProvider(os.environ['ASTRA_CLIENT_ID'], os.environ['ASTRA_CLIENT_SECRET'])
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)

keyspace = os.environ['ASTRA_KEYSPACE']
try:
    session = cluster.connect()
    print("Successfully connected to the cluster.")

    # Bind the keyspace name as a parameter rather than formatting it into the CQL string
    rows = session.execute("SELECT keyspace_name FROM system_schema.keyspaces WHERE keyspace_name = %s", (keyspace,))
    if rows.one() is not None:
        print(f"Keyspace '{keyspace}' exists.")
    else:
        raise EnvironmentError(f"Keyspace '{keyspace}' does not exist.")
 
    session.shutdown()
except NoHostAvailable as e:
    print("Connection failed, please check your secure connect bundle, credentials, and network connection.")
    print(f"Exception: {e}")


## Load Embeddings to Astra
Load `wines-embeddings.parquet` into Astra.

The load takes approximately 2 minutes. The tqdm progress bar advances unevenly ("lumpy") because it is not well configured for the multi-threaded load. You may also see timeout error messages, but a small number of reported `Error rows` will not affect the demonstration.
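
To illustrate why a multi-threaded load produces bursty progress and can tolerate a few failed rows, here is a minimal sketch using the standard library's thread pool. It is not the implementation in `loadAstra.py`: the function name `load_rows`, the `insert_row` callable, and the worker count are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_rows(rows, insert_row, max_workers=8):
    """Insert rows concurrently; return (loaded_count, error_rows) instead of failing fast."""
    loaded, error_rows = 0, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(insert_row, row): row for row in rows}
        # Futures complete in bursts as worker threads finish, so a progress
        # bar updated from this loop advances unevenly.
        for future in as_completed(futures):
            try:
                future.result()
                loaded += 1
            except Exception:
                # For example a write timeout: record the row and carry on.
                error_rows.append(futures[future])
    return loaded, error_rows
```

Collecting failures rather than raising on the first one means a handful of timeouts leaves the rest of the data loaded, and the returned `error_rows` could be retried if needed.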

In [None]:
%run -i loadAstra.py

Look at some of the data that has been loaded:

In [None]:
session = cluster.connect()

rows = session.execute(f"SELECT document_id, document, metadata_blob FROM {os.environ['ASTRA_KEYSPACE']}.{os.environ['ASTRA_TABLE']} LIMIT 5;")
for row in rows:
    print(f"document_id: {row.document_id}, document: {row.document}, metadata_blob: {row.metadata_blob}")

session.shutdown()

## Setup Complete
You are now ready to run the demo notebook [demo.ipynb](demo.ipynb).

You may also wish to run the UI demo, which you can do via the command line (being sure to have activated the correct Python environment):
```
streamlit run demo-ui.py
```