<a href="https://colab.research.google.com/github/kkrueger/Redis-Workshops/blob/main/06-LlamaIndex_Redis%20/06.1_OpenAI_LlamaIndex_Redis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Document Question Answering with LlamaIndex, OpenAI and Redis

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

This notebook would use OpenAI, Redis with Vector Similarity Search and LlamaIndex to answer questions about the information contained in a document.

In [20]:
!pip install -q llama_index redis html2text trafilatura

In [21]:
from llama_index import (
      TrafilaturaWebReader,
      GPTVectorStoreIndex,
      StorageContext,
      ServiceContext
    )
from llama_index.vector_stores import RedisVectorStore



In [22]:
import sys

import logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

Initialize OpenAI. You need to supply the OpenAI API key (starts with `sk-...`) when prompted. You can find your API key at https://platform.openai.com/account/api-keys

In [23]:
import openai
import os
import getpass

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY","")
if OPENAI_API_KEY == "":
    key=getpass.getpass(prompt='OpenAI Key: ', stream=None)
    os.environ['OPENAI_API_KEY']=key

openai.api_key = os.getenv("OPENAI_API_KEY")

### Install Redis Stack

Redis Search will be used as Vector Similarity Search engine for LangChain. Instead of using in-notebook Redis Stack https://redis.io/docs/getting-started/install-stack/ you can provision your own free instance of Redis in the cloud. Get your own Free Redis Cloud instance at https://redis.com/try-free/

In [24]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


gpg: cannot open '/dev/tty': No such device or address
curl: (23) Failed writing body


### Connect to Redis

By default this notebook would connect to the local instance of Redis Stack. If you have your own Redis Cloud instance - replace REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [25]:
import redis
import os


REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#Replace values above with your own if using Redis Cloud instance
#REDIS_HOST="redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
#REDIS_PORT=18374
#REDIS_PASSWORD="1TNxTEdYRDgIDKM2gDfasupCADXXXX"

#shortcut for redis-cli $REDIS_CONN command
if REDIS_PASSWORD!="":
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT}"

REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"



### Load web documents

Load web documents that would be used to answer questions. Feel free to replace the links with the ones you would like to use.

In [26]:
documents = TrafilaturaWebReader().load_data(
    [
        "https://www.cnn.com/2023/05/18/media/disney-florida-desantis/index.html",
        "https://www.cnn.com/2022/11/12/business/disney-hiring-freeze-job-cuts/index.html"
        ]
)


In [27]:
# optionally examine the retrieved documents
#documents

### Create vector store using Redis as Vector Database

In [28]:
print(f"Using Redis address: {REDIS_URL}")
vector_store = RedisVectorStore(
    index_name="news",
    index_prefix="cnn",
    redis_url=REDIS_URL,
    overwrite=True
)
vector_store.client.ping()

Using Redis address: redis://:@localhost:6379


True

In [29]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(chunk_size=100, chunk_overlap=20)
index = GPTVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

## Finally - let's ask questions!

Examples:
- What plans is Disney cancelling?
- Who is Bob Chapek?
- Why Disney cancelling the plans?

In [32]:
query_engine = index.as_query_engine()
response = query_engine.query("who is Bob Chapek?")
print(response)

Bob Chapek is the Chief Executive of Disney.
