# Voice of Customer Demo Setup

To keep the demo focused, it has been split into "setup" and "demo" notebooks. This is the setup notebook.

> **Note:** This demo has been created using Python 3.11.4.

# Setup - Third Party Accounts

## OpenAI API Account

You will need an OpenAI API account; see [https://platform.openai.com/](https://platform.openai.com/). From there,
create an [API key](https://platform.openai.com/account/api-keys).

To proceed with this exercise, you'll need:

1. Your OpenAI API key

## Astra Database

You should have created an Astra database with Vector Search. If you are unfamiliar with Astra, follow the
"Getting Started with Vector Search" quick-start guide, including getting a working Python application running.

To proceed with this exercise, you'll need:

1. Your Secure Connect Bundle (`.zip` file)
2. Your Client ID
3. Your Client Secret
4. A keyspace named `vsearch` (or any other name you prefer; you can set it in `CONSTANTS.py` later)

## Send Email - Zapier Toolkit

While there is a GmailToolkit available within LangChain, it was found to be too unreliable for demo purposes.
As an alternative, we can use the ZapierNLAToolkit, which is a simple wrapper around the Zapier API.

Follow the instructions [here](https://nla.zapier.com/start/), specifically:

1. Choose API Key authentication
2. Create an NLA Development Action, for example "Gmail: Send Email"

To use the "send email" portion of this exercise, you'll need:

1. Your Zapier NLA API Key

# Setup - File Downloads

The following files are in [VoiceOfCustomer.zip](https://drive.google.com/file/d/1oReQlv8kAWXyLgNSmqBdwm3aNgByQK8_/view?usp=sharing), which 
you should download and unzip locally into the root of this project directory.

## `metadata.parquet.gz`

This is product metadata, which has been preprocessed by `cutdownReviews.ipynb`. This is needed if you want to run `createEmbeddings.ipynb`.

## `reviews.parquet.gz`

These are the reference Amazon customer reviews, which have been preprocessed by `cutdownReviews.ipynb`. This file is needed if you want to run `createEmbeddings.ipynb`.

## `B0015UC17E.reviews-embeddings-text-embedding-ada-002.parquet.gz`

These are the embeddings for the reviews of a single product, `B0015UC17E`, which have been generated by `embedReviews.ipynb`. With this file,
you do not have to wait (or pay) for embedding generation.
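Once the archive is unzipped, a quick check along these lines can confirm that the three files landed in the project root. This is a minimal sketch: the helper name `find_missing_files` is hypothetical (it is not part of the demo code), and it assumes the notebook is run from the project root.

```python
from pathlib import Path

# File names as listed in the sections above.
EXPECTED_FILES = [
    "metadata.parquet.gz",
    "reviews.parquet.gz",
    "B0015UC17E.reviews-embeddings-text-embedding-ada-002.parquet.gz",
]

def find_missing_files(root=".", names=EXPECTED_FILES):
    """Return the expected file names that are not present under `root`."""
    root_path = Path(root)
    return [name for name in names if not (root_path / name).is_file()]

# Check the current working directory (assumed to be the project root).
missing_files = find_missing_files()
if missing_files:
    print(f"Missing files: {missing_files}")
else:
    print("All demo data files are present.")
```

Only the embeddings file is required if you intend to skip `createEmbeddings.ipynb`, so a partial result is not necessarily a problem.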

# Python Module Installs

In [1]:
%pip install -qU \
    tiktoken \
    ipywidgets \
    pandas \
    pyarrow \
    langchain \
    openai \
    python_dotenv \
    aiometer \
    cassandra-driver \
    tqdm \
    streamlit    

Note: you may need to restart the kernel to use updated packages.


# Set up a `.env` File

The example code uses `python-dotenv` to load variables into the environment. The `.env` file resides in the root of the project and should contain:
```
OPENAI_API_KEY="<Your OpenAI API key>"
ASTRA_SECUREBUNDLE_PATH="<your/path/to/secure-connect-database.zip>"
ASTRA_CLIENT_ID="<Your Client ID>"
ASTRA_CLIENT_SECRET="<Your Client Secret>"
ZAPIER_NLA_API_KEY="<Your Zapier NLA API Key>"
EMAIL_RECIPIENT="<Your email address>"
```
Note that the secure connect bundle path is relative to the root of this project directory. The keyspace and table names are set in `CONSTANTS.py`, not in `.env`, and should be amended there to match your own.
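Because the bundle path is relative, it only resolves correctly when the working directory is the project root. The sketch below shows one way to resolve it explicitly and fail early if the file is absent. The helper name `resolve_bundle_path` and its `project_root` parameter are illustrative and not part of the demo code.

```python
from pathlib import Path

def resolve_bundle_path(bundle_path, project_root="."):
    """Resolve a (possibly relative) bundle path against the project root.

    Raises FileNotFoundError if no file exists at the resolved location.
    """
    path = Path(bundle_path)
    if not path.is_absolute():
        path = Path(project_root) / path
    path = path.resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Secure connect bundle not found: {path}")
    return path
```

After the `.env` file has been loaded, this could be called as `resolve_bundle_path(os.environ['ASTRA_SECUREBUNDLE_PATH'])` to catch a mistyped path before attempting a connection.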

# Review/Modify `CONSTANTS.py`

`CONSTANTS.py` contains a number of constants that are used throughout the demo. You should review these and modify as necessary.

| Constant | Description |
| --- | --- |
| `embed_model` | The name of the OpenAI embedding model to use. |
| `embed_dimensions` | The number of dimensions of the embedding model. |
| `chat_model_name` | The name of the OpenAI chat model to use. |
| `keyspace_name` | The name of the keyspace in the Astra database. This keyspace must exist. |
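| `table_name` | The name of the table in the Astra database. It will be created by `loadAstra.py`. |
| `example_asin` | The ASIN of the example product. It is used to build the embeddings filename and to query the loaded reviews. |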

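For orientation, a `CONSTANTS.py` consistent with this setup might look like the sketch below. The embedding model and ASIN are taken from the provided embeddings filename, and `vsearch` from the Astra section above; the chat model and table name are assumptions chosen for illustration, so check every value against your own copy of the file.

```python
# CONSTANTS.py (illustrative sketch, not the shipped file)

# Embedding model; this name appears in the provided embeddings filename.
embed_model = "text-embedding-ada-002"
# text-embedding-ada-002 produces 1536-dimensional vectors.
embed_dimensions = 1536
# Assumption: any chat model your OpenAI account can access.
chat_model_name = "gpt-3.5-turbo"
# Keyspace created during the Astra setup; it must already exist.
keyspace_name = "vsearch"
# Assumption: hypothetical table name; loadAstra.py creates the table.
table_name = "reviews"
# ASIN of the product covered by the provided embeddings file.
example_asin = "B0015UC17E"
```

If you switch to a different embedding model, `embed_dimensions` has to change with it, and the provided embeddings file can no longer be used as-is.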

# Environment Validation
The following cells validate that the environment is ready to go: the `.env` file is complete, and both the OpenAI API and Astra can be reached.

## Verify `.env` file is set up

In [2]:
import os
from dotenv import load_dotenv

# load_dotenv returns False when the file is missing or sets no variables.
if not load_dotenv('.env', override=True):
    raise Exception("Couldn't load .env file")

# Variables the demo cannot run without.
envVars = ['OPENAI_API_KEY', 'ASTRA_SECUREBUNDLE_PATH', 'ASTRA_CLIENT_ID', 'ASTRA_CLIENT_SECRET']
missing = [var for var in envVars if var not in os.environ]

if missing:
    raise EnvironmentError(f'These environment variables are missing: {missing}')

# The Zapier variables are optional; they are only needed for sending email.
if 'ZAPIER_NLA_API_KEY' not in os.environ:
    print("This demo will be unable to send emails.")
elif 'EMAIL_RECIPIENT' not in os.environ:
    print("The demo.ipynb notebook will not be able to send email without modification.")

## Verify OpenAI API Key

In [3]:
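import os
import openai

openai.api_key = os.environ['OPENAI_API_KEY']
# This call uses the pre-1.0 `openai` SDK interface; an invalid key raises an error here.
models = openai.Model.list()
# The response is a dict-like object; the models themselves are under its "data" key.
if len(models['data']) == 0:
    raise Exception("Your OpenAI API key does not appear to be valid. Please check it and try again.")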

## Verify Astra Database Connection

In [4]:
import os
from cassandra.cluster import Cluster, NoHostAvailable
from cassandra.auth import PlainTextAuthProvider
from CONSTANTS import keyspace_name

# Astra connections use the secure connect bundle plus the client ID and secret.
cloud_config = {'secure_connect_bundle': os.environ['ASTRA_SECUREBUNDLE_PATH']}
auth_provider = PlainTextAuthProvider(os.environ['ASTRA_CLIENT_ID'], os.environ['ASTRA_CLIENT_SECRET'])
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)

try:
    session = cluster.connect()
    print("Successfully connected to the cluster.")

    # Bind the keyspace name as a parameter instead of formatting it into the query.
    rows = session.execute(
        "SELECT keyspace_name FROM system_schema.keyspaces WHERE keyspace_name = %s",
        (keyspace_name,),
    )
    if rows.one() is not None:
        print(f"Keyspace '{keyspace_name}' exists.")
    else:
        raise EnvironmentError(f"Keyspace '{keyspace_name}' does not exist.")

    session.shutdown()
except NoHostAvailable as e:
    print("Connection failed, please check your secure connect bundle, credentials, and network connection.")
    print(f"Exception: {e}")

Successfully connected to the cluster.
Keyspace 'vsearch' exists.


## Verifying Zapier Integration

We will forgo this, as there is no simple way to verify the Zapier setup. If sending an email does not work later in the demo, you may have a setup problem!

# Load Embeddings to Astra

The embeddings `.parquet` file is loaded into Astra. The filename format is based on values in `CONSTANTS.py`:

```
embed_file = f"{example_asin}.reviews-embeddings-{embed_model}.parquet.gz"
```
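To make the format concrete, the sketch below fills in the template with the ASIN and embedding model that appear in the provided embeddings filename. The literal values are assumptions about what `CONSTANTS.py` holds; in the demo they are imported rather than hard-coded.

```python
# Values taken from the provided embeddings filename; in the demo these
# come from CONSTANTS.py, so treat the literals here as assumptions.
example_asin = "B0015UC17E"
embed_model = "text-embedding-ada-002"

embed_file = f"{example_asin}.reviews-embeddings-{embed_model}.parquet.gz"
print(embed_file)
# B0015UC17E.reviews-embeddings-text-embedding-ada-002.parquet.gz
```

If the resulting name does not match a file in the project root, the load step has nothing to read, so a mismatch here is worth checking first.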

In [9]:
%run -i loadAstra.py

Thread Initialization:   0%|          | 0/25 [00:00<?, ?it/s]

Record Loading Progress:   0%|          | 0/11110 [00:00<?, ?it/s]

Total rows processed: 11110
Retries: 0
Error rows: 0


Then validate that some data was loaded:

In [10]:
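from CONSTANTS import keyspace_name, table_name, example_asin

# Reuses the `cluster` object created in the connection check above.
session = cluster.connect()

# Keyspace and table names cannot be bound as parameters, so they are formatted in;
# the ASIN value is bound.
rows = session.execute(
    f"SELECT reviewer_name, truncated_review_text FROM {keyspace_name}.{table_name} WHERE asin = %s LIMIT 5",
    (example_asin,),
)
for row in rows:
    print(f"reviewer_name: {row.reviewer_name}, truncated_review_text: {row.truncated_review_text}")

session.shutdown()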

reviewer_name: left272, truncated_review_text: I read other reviews on this light and brought two of them. The light is bright and the size fits in shirt pocket just like an ink pen. I use mine to trace conduit and pipes in 10 to 16 foot ceiling with amazement. I do wish it had an adjustable beam but still a super light. Coworkers are also impressed with the light and then again when I tell them the price.
reviewer_name: Momraj, truncated_review_text: I bought this for my husband and he LOVES it!!!  It's so handy that we ordered 2 more soon after receiving the first one!  It's very bright and is easy to keep in a pocket or purse.
reviewer_name: tbettenbrock, truncated_review_text: Great pen light, very bright.  Takes 2 AAA batteries, which we always have on hand, no running out for specialty button batteries.
reviewer_name: Tammer Ghaly, truncated_review_text: not much to say about this, other than the fact that it's really bright. it does have a click-on feature, but it's pretty tough