<a href="https://colab.research.google.com/github/Redislabs-Solution-Architects/financial-vss/blob/main/OpenAI_RedisVL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Document Question Answering with OpenAI and RedisVL

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

This notebook uses OpenAI, Redis with Vector Similarity Search, and RedisVL to answer questions about the information contained in a document.

### Install Dependencies


In [None]:
!pip install -q redis redisvl openai 

## Initialize OpenAI

You need to supply the OpenAI API key (starts with `sk-...`) when prompted. You can find your API key at https://platform.openai.com/account/api-keys

In [None]:
import openai
import os
import getpass

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY","")
if OPENAI_API_KEY == "":
    key=getpass.getpass(prompt='OpenAI Key: ', stream=None)
    os.environ['OPENAI_API_KEY']=key

openai.api_key = os.getenv("OPENAI_API_KEY")

### Install Redis Stack (OPTIONAL)

Redis Search will be used as the vector similarity search engine behind RedisVL.

Instead of the in-notebook Redis Stack (https://redis.io/docs/getting-started/install-stack/), you can provision your own free Redis instance in the cloud. Get a free Redis Cloud instance at https://redis.com/try-free/

In [None]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install -y redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

### Connect to Redis

By default, this notebook connects to the local instance of Redis Stack. If you have your own Redis Cloud instance, replace the REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [None]:
import redis
import os


REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#Replace values above with your own if using Redis Cloud instance
#REDIS_HOST="redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
#REDIS_PORT=18374
#REDIS_PASSWORD="1TNxTEdYRDgIDKM2gDfasupCADXXXX"

#shortcut for redis-cli $REDIS_CONN command
if REDIS_PASSWORD!="":
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT}"

REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
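
If a Redis Cloud password contains characters such as `@` or `:`, interpolating it directly into `REDIS_URL` can yield a URL that does not parse as intended. The sketch below shows one way to guard against that. `build_redis_url` is a hypothetical helper (not part of redis-py or RedisVL), and it assumes the client decodes percent-encoded credentials when it parses the URL.

```python
from urllib.parse import quote


def build_redis_url(host: str, port, password: str = "") -> str:
    """Hypothetical helper: build a redis:// URL, percent-encoding the password."""
    if password:
        # safe="" so that reserved characters such as '@', ':' and '/' are encoded too
        return f"redis://:{quote(password, safe='')}@{host}:{port}"
    # No password: omit the credentials part entirely
    return f"redis://{host}:{port}"


print(build_redis_url("localhost", "6379"))
print(build_redis_url("example.com", 18374, "p@ss:word"))
```

The host, port and password in the usage lines are made-up values for illustration only.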


### Create the index from the schema

In [None]:
from redisvl.index import SearchIndex

INDEX_NAME = "rvlsec_idx"

schema = {
  "index": {
    "name": INDEX_NAME,
    "prefix": "sec"
  },
  "fields": {
    # both text fields go in one list; a repeated "text" key would silently drop the first
    "text": [{"name": "chunk"}, {"name": "name"}],
    "tag": [{"name": "symbol"}],
    "numeric": [{"name": "year"}],
    "vector": [{
                "name": "embedding",
                "dims": 1536,
                "distance_metric": "cosine",
                "algorithm": "hnsw",
                "datatype": "float32"}
        ]
  },
}

# construct a search index from the schema
index = SearchIndex.from_dict(schema) # or SearchIndex.from_yaml("schema.yaml") for yaml files

# connect to the Redis instance at REDIS_URL (local by default)
index.connect(REDIS_URL)

# create the index (no data yet)
index.create(overwrite=True)

In [None]:
# use the CLI to see the created index
!rvl index listall

### Load text and split it into manageable chunks

Without this step, any large body of text would exceed the token limit of the LLM.

In [None]:
# Add your own document paths here
docs = [
    "/resources/aapl-10k-2023.pdf",
    "/resources/amzn-10k-2023.pdf"
]

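# TODO: extract the text from each document in `docs` and split it into chunks.
# `texts` is a placeholder here and must be populated with the chunk strings.
texts = []

# Optionally examine the result of text load+splitting
for text in texts:
  print(text)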
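
The splitting step is left open in the cell above. As a minimal sketch, assuming the document text has already been extracted into a plain string, overlapping character windows can stand in as a rough proxy for token-based chunks. `split_text`, its `chunk_size` and `overlap` parameters, and the sample string are all hypothetical and are not part of RedisVL or OpenAI.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Hypothetical helper: split text into overlapping character windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters standing in for extracted PDF text
sample_chunks = split_text(sample, chunk_size=12, overlap=2)
for chunk in sample_chunks:
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary partially visible in both neighbouring chunks, at the cost of storing some text twice.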

### Initialize embeddings engine

Get the vectorizer and create the embeddings

In [None]:
import os
import openai

from redisvl.vectorize.text import OpenAITextVectorizer

# create a vectorizer
oai = OpenAITextVectorizer(
    model="text-embedding-ada-002",
    api_config={"api_key": openai.api_key},
)


### Create vector store from the documents using Redis as Vector Database

In [None]:
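# load expects an iterable of dictionaries whose keys match the schema fields
# NOTE: `data` is not defined above; build it from the text chunks and their embeddings first
index.load(data)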
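
The records passed to `index.load` are not constructed anywhere in this notebook. The sketch below shows one plausible shape, assuming each chunk becomes a dictionary keyed by the schema's field names and that the vector is stored as raw `float32` bytes. `build_records` and `to_float32_bytes` are hypothetical helpers, and the company name, symbol and toy vectors are made up for illustration.

```python
import struct


def to_float32_bytes(vector):
    """Pack a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def build_records(chunks, embeddings, name, symbol, year):
    """Hypothetical helper: one dict per chunk, keyed by the schema's field names."""
    if len(chunks) != len(embeddings):
        raise ValueError("need exactly one embedding per chunk")
    return [
        {
            "chunk": chunk,
            "name": name,
            "symbol": symbol,
            "year": year,
            "embedding": to_float32_bytes(embedding),
        }
        for chunk, embedding in zip(chunks, embeddings)
    ]


# Toy 3-dimensional vectors; the schema above expects 1536 dimensions.
sample_data = build_records(
    ["first chunk", "second chunk"],
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    name="Example Corp 10-K",
    symbol="EXMP",
    year=2023,
)
print(len(sample_data), len(sample_data[0]["embedding"]))
```

In the notebook itself, the embeddings would come from the vectorizer created earlier rather than from hand-written lists.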

### Query the database
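
This section has no query code yet. To illustrate what a nearest-neighbour query over the `embedding` field computes with the `cosine` metric from the schema, here is a brute-force sketch in plain Python. It is not the RedisVL query API: `cosine_distance`, `top_k` and the toy records are hypothetical. The `hnsw` index in Redis returns approximate neighbours, whereas this loop is exact.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vector, records, k=3):
    """Hypothetical brute-force KNN: return the k (distance, chunk) pairs closest to the query."""
    scored = [(cosine_distance(query_vector, r["embedding"]), r["chunk"]) for r in records]
    return sorted(scored)[:k]


toy_records = [
    {"chunk": "revenue discussion", "embedding": [1.0, 0.0]},
    {"chunk": "risk factors", "embedding": [0.0, 1.0]},
    {"chunk": "segment results", "embedding": [1.0, 1.0]},
]
for distance, chunk in top_k([1.0, 0.1], toy_records, k=2):
    print(f"{distance:.3f}  {chunk}")
```

Smaller distances mean closer matches, so the chunks at the top of the list are the ones that would be passed on to the LLM as context.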

### Debugging Redis

The code block below is an example of how you can interact with the Redis database.

In [None]:
#!redis-cli $REDIS_CONN keys "*"
#!redis-cli $REDIS_CONN HGETALL "doc:qna:idx:063955c855a7436fbf9829821332ed2a"

###-- FLUSHDB will wipe out the entire database!!! Use with caution --###
#!redis-cli $REDIS_CONN flushdb


## Finally - let's ask questions!

Examples:
- What did the president say about Ketanji Brown Jackson?
- Did he mention Stephen Breyer?
- What was his stance on Ukraine?

In [None]:
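query = "What did the president say about Ketanji Brown Jackson?"
# NOTE: `qa` is not defined in this notebook; it is assumed to be a callable that returns a dict with a 'result' key
res = qa(query)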
res['result']
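
The `qa` callable is not defined in this notebook. One step it would plausibly need is combining the retrieved chunks with the question before calling the LLM. The sketch below shows only that step, under stated assumptions: `build_prompt`, the `max_chars` budget and the prompt wording are hypothetical and are not taken from RedisVL or OpenAI.

```python
def build_prompt(question, chunks, max_chars=6000):
    """Hypothetical helper: combine retrieved chunks and a question into one prompt."""
    context_parts = []
    used = 0
    for chunk in chunks:
        # Stop adding context once the character budget would be exceeded
        if used + len(chunk) > max_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What does the filing say about risk factors?",
    ["Chunk one text.", "Chunk two text."],
)
print(prompt)
```

The character budget is a crude stand-in for a token limit. A fuller version would count tokens with the model's tokenizer.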