# Using Redis and OpenAI to chat with PDF documents

This notebook demonstrates how to use Redis and (Azure) OpenAI to chat with PDF documents. The PDF included is
an informational brochure about the Chevy Colorado pickup truck.

In this notebook, we will use LlamaIndex to chunk, vectorize, and store the PDF document in Redis as vectors
alongside the associated text. The query interface provided by LlamaIndex will be used to search for relevant
information given queries from the user.
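
To make the chunking step concrete, here is a minimal sketch of splitting text into overlapping chunks. It is a simplification and not the library's implementation: it splits on whitespace rather than on tokens, and the `chunk_text` helper and its parameters are hypothetical names used only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # stop once the current window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


# Toy input: ten "words", windows of four words with one word of overlap
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means that a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of embedding a few words twice.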

In [None]:
# Install the requirements
%pip install redis pypdf PyPDF2 python-dotenv transformers tiktoken ipywidgets llama_index==0.8.26

In [18]:
import os
import textwrap
import openai
from langchain.llms import AzureOpenAI, OpenAI
from langchain.embeddings import OpenAIEmbeddings
from llama_index.vector_stores import RedisVectorStore
from llama_index import LangchainEmbedding
from llama_index import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    StorageContext
)
import sys

import logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


In [2]:
# load the .env file in the current directory into the current environment
from dotenv import load_dotenv
load_dotenv('./.env')

True

# Azure OpenAI and OpenAI

The notebook allows the user to choose between the OpenAI and Azure OpenAI endpoints. Make sure to follow the instructions in the README and set the values in `.env` according to whichever API you are using.

NOTE: ONLY ONE API CAN BE USED AT A TIME.
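
Before running either set of cells, it can help to check that the variables for the chosen API are actually set. The sketch below is one way to do that. The variable names are the ones read by the cells in this notebook, but the grouping into required sets is an assumption, and `missing_env_vars` is a hypothetical helper.

```python
import os

# Names taken from the os.getenv calls in this notebook.
# Which ones are strictly required for each API is an assumption.
REQUIRED_VARS = {
    "azure": [
        "AZURE_OPENAI_API_BASE",
        "OPENAI_API_VERSION",
        "OPENAI_API_KEY",
        "OPENAI_EMBEDDING_MODEL",
        "OPENAI_TEXT_MODEL",
        "AZURE_EMBED_MODEL_DEPLOYMENT_NAME",
        "AZURE_TEXT_MODEL_DEPLOYMENT_NAME",
    ],
    "openai": [
        "OPENAI_API_KEY",
        "OPENAI_EMBEDDING_MODEL",
        "OPENAI_TEXT_MODEL",
    ],
}


def missing_env_vars(api: str, env=None) -> list[str]:
    """Return the required variable names that are unset or empty for `api`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[api] if not env.get(name)]


# Demonstration with an explicit mapping instead of the real environment
example_env = {"OPENAI_API_KEY": "sk-placeholder", "OPENAI_TEXT_MODEL": "text-davinci-003"}
print(missing_env_vars("openai", example_env))
```

Calling `missing_env_vars("azure")` with no second argument checks the real process environment after `load_dotenv` has run.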

## Azure OpenAI 

Here we set up the Azure OpenAI models and API keys by reading the values loaded into the environment above. The ``PromptHelper`` sets the parameters for the OpenAI model. The classes defined here are used together to provide a QnA interface between the user and the LLM.

In [3]:
# set up LlamaIndex to use Azure OpenAI
openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION")
openai.api_key = os.getenv("OPENAI_API_KEY")

# Get the OpenAI model names ex. "text-embedding-ada-002"
embedding_model = os.getenv("OPENAI_EMBEDDING_MODEL")
text_model = os.getenv("OPENAI_TEXT_MODEL")


print(f"Using models: {embedding_model} and {text_model}")

# get the Azure Deployment name for the model
embedding_model_deployment = os.getenv("AZURE_EMBED_MODEL_DEPLOYMENT_NAME")
text_model_deployment = os.getenv("AZURE_TEXT_MODEL_DEPLOYMENT_NAME")

print(f"Using deployments: {embedding_model_deployment} and {text_model_deployment}")


Using models: text-embedding-ada-002 and text-davinci-003
Using deployments: embed and textgen


In [4]:

llm = AzureOpenAI(deployment_name=text_model_deployment, model_kwargs={
    "api_key": openai.api_key,
    "api_base": openai.api_base,
    "api_type": openai.api_type,
    "api_version": openai.api_version,
})
llm_predictor = LLMPredictor(llm=llm)

embedding_llm = LangchainEmbedding(
    OpenAIEmbeddings(
        model=embedding_model,
        deployment=embedding_model_deployment,
        openai_api_key= openai.api_key,
        openai_api_base=openai.api_base,
        openai_api_type=openai.api_type,
        openai_api_version=openai.api_version,
    ),
    embed_batch_size=1,
)

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


## OpenAI

The ``OpenAI`` class provides a simple interface to the OpenAI API.


In [3]:
# set up LlamaIndex to use OpenAI
openai.api_type = "openai"
openai.api_version = os.getenv("OPENAI_API_VERSION")
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_key = os.getenv("OPENAI_API_KEY")

# Get the OpenAI model names ex. "text-embedding-ada-002"
embedding_model = os.getenv("OPENAI_EMBEDDING_MODEL")
text_model = os.getenv("OPENAI_TEXT_MODEL")


print(f"Using models: {embedding_model} and {text_model}")

Using models: text-embedding-ada-002 and text-davinci-003


In [19]:


llm = OpenAI(model_kwargs={
    "api_key": openai.api_key,
    "api_base": openai.api_base,
    "api_type": openai.api_type,
    "api_version" : openai.api_version,

})
llm_predictor = LLMPredictor(llm=llm)

embedding_llm = LangchainEmbedding(
    OpenAIEmbeddings(
        model=embedding_model,
        openai_api_version=openai.api_version,
        openai_api_key= openai.api_key,
        openai_api_base=openai.api_base,
        openai_api_type=openai.api_type,
    ),
    embed_batch_size=1,
)

### LlamaIndex

[LlamaIndex](https://github.com/jerryjliu/llama_index) (formerly GPT Index) is a project that provides a central interface to connect your LLMs with external data sources. It provides a simple interface to vectorize and store embeddings in Redis, create search indices using Redis, and perform vector search to find context for generative models like GPT.

Here we will use it to load in the documents (Chevy Colorado Brochure).

In [20]:
# load documents
documents = SimpleDirectoryReader('./docs').load_data()
print('Document ID:', documents[0].doc_id)

Document ID: cff917df-07b0-4d64-b363-5bc165d455e1


LlamaIndex also works with frameworks like LangChain to make prompting and other aspects of a chat-based application easier. Here we can use the ``PromptHelper`` class to help us generate prompts for the (Azure) OpenAI model. It is off by default, as it can be tricky to set up correctly.

In [6]:
# set number of output tokens
num_output = int(os.getenv("OPENAI_MAX_TOKENS"))
# max LLM token input size
max_input_size = int(os.getenv("CHUNK_SIZE"))
# set maximum chunk overlap
max_chunk_overlap = float(os.getenv("CHUNK_OVERLAP"))

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
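
The three values above interact: the tokens reserved for the answer and the tokens used by the prompt template both reduce the room left for retrieved text. The arithmetic can be sketched as follows. This is a simplified model and not the library's exact calculation; `available_chunk_tokens` is a hypothetical helper and the numbers are illustrative, not taken from the notebook's `.env`.

```python
def available_chunk_tokens(context_window: int, num_output: int,
                           prompt_tokens: int, num_chunks: int = 1) -> int:
    """Tokens left for each retrieved chunk after reserving room for
    the prompt template and the model's answer (simplified model)."""
    available = context_window - num_output - prompt_tokens
    if available <= 0:
        raise ValueError("prompt and output leave no room for context")
    return available // num_chunks


# Illustrative numbers only
budget = available_chunk_tokens(
    context_window=4096,  # maximum tokens the model accepts
    num_output=256,       # tokens reserved for the answer
    prompt_tokens=100,    # tokens used by the prompt template and question
    num_chunks=2,         # retrieved chunks packed into one prompt
)
overlap = int(0.1 * budget)  # an overlap ratio of 0.1 applied to that budget
print(budget, overlap)
```

A budget that comes out too small is one reason a misconfigured ``PromptHelper`` can be tricky: the retrieved chunks no longer fit in the prompt.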

In [21]:

# define the service we will use to answer questions
# if you executed the Azure OpenAI code above, your Azure models and credentials will be used; likewise for OpenAI
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embedding_llm,
#    prompt_helper=prompt_helper # uncomment to use prompt_helper.
)

## Initialize Redis as a Vector Database

Now that our documents are loaded, we can initialize the ``RedisVectorStore``. This will allow us to store our vectors in Redis and create an index.

The ``GPTVectorStoreIndex`` will then create the embeddings from the text chunks by calling out to OpenAI's API. The embeddings will be stored in Redis and an index will be created.

NOTE: If you didn't set the ``OPENAI_API_KEY`` environment variable, you will get an error here.

In [22]:
def format_redis_conn_from_env(using_ssl=False):
    start = "rediss://" if using_ssl else "redis://"
    # if using RBAC
    password = os.getenv("REDIS_PASSWORD", None)
    username = os.getenv("REDIS_USERNAME", "default")
    if password is not None:
        start += f"{username}:{password}@"

    return start + f"{os.getenv('REDIS_ADDRESS')}:{os.getenv('REDIS_PORT')}"


# set using_ssl=True to use SSL with ACRE (Azure Cache for Redis Enterprise)
redis_address = format_redis_conn_from_env(using_ssl=False)

print(f"Using Redis address: {redis_address}")
vector_store = RedisVectorStore(
    index_name="chevy_docs",
    index_prefix="blog",
    redis_url=redis_address,
    overwrite=True
)

# access the underlying client in the RedisVectorStore implementation to ping the redis instance
vector_store.client.ping()

Using Redis address: redis://localhost:6379


True
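
One caveat with building the connection string by hand: a password containing characters such as `@` or `:` can make the URL ambiguous. A possible variant that percent-encodes the credentials is sketched below. `build_redis_url` is a hypothetical helper that takes explicit arguments instead of reading the environment, and it assumes the client decodes percent-encoded credentials when parsing the URL.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: int, username: str = "default",
                    password=None, using_ssl: bool = False) -> str:
    """Build a Redis connection URL, percent-encoding the credentials."""
    scheme = "rediss" if using_ssl else "redis"
    auth = ""
    if password is not None:
        # safe="" also encodes "/", "@" and ":" inside the credentials
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", 6379))
# The host name and password below are placeholders
print(build_redis_url("example.redis.host", 10000, password="p@ss:word", using_ssl=True))
```

With no password the result matches the address printed above (`redis://localhost:6379`).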

In [23]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context
)

INFO:llama_index.vector_stores.redis:Deleting index chevy_docs
Deleting index chevy_docs
Deleting index chevy_docs
INFO:llama_index.vector_stores.redis:Creating index chevy_docs
Creating index chevy_docs
Creating index chevy_docs
INFO:llama_index.vector_stores.redis:Added 27 documents to index chevy_docs
Added 27 documents to index chevy_docs
Added 27 documents to index chevy_docs
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 14521 tokens
> [build_index_from_nodes] Total embedding token usage: 14521 tokens
> [build_index_from_nodes] Total embedding token usage: 14521 tokens


## Start Querying information from the Document

Now that we have our document stored in the index, we can ask questions against it. The query engine retrieves the most relevant chunks from the index and passes them to the LLM as the knowledge base for its answer.
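
Conceptually, answering a question involves embedding the query, finding the nearest stored chunks, and placing them in the prompt. The sketch below illustrates that retrieval step in plain Python with made-up three-dimensional vectors. In the notebook itself the search runs inside Redis and the vectors come from the embedding model; `cosine_similarity` and `top_k` are hypothetical helpers, and `k=2` is only chosen to mirror the "Found 2 results" lines in the logs.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k: int = 2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "index": (text, embedding) pairs with invented vectors
chunks = [
    ("towing and payload figures", [0.9, 0.1, 0.0]),
    ("trim levels and special editions", [0.1, 0.9, 0.1]),
    ("interior colors", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.3, 0.0]  # stands in for the embedded question

context = top_k(query_vec, chunks, k=2)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: <user question>"
print(prompt)
```

Only the retrieved chunks reach the model, which is why the answers below are grounded in the brochure rather than in the model's general knowledge.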

In [24]:
query_engine = index.as_query_engine()
response = query_engine.query("What types of variants are available for the Chevrolet Colorado?")
print("\n", textwrap.fill(str(response), 100))

INFO:llama_index.vector_stores.redis:Querying index chevy_docs
Querying index chevy_docs
Querying index chevy_docs
INFO:llama_index.vector_stores.redis:Found 2 results for query with id ['blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99', 'blog_67cc5f3b-9aca-4582-8f9d-4fc9d5c1ddac']
Found 2 results for query with id ['blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99', 'blog_67cc5f3b-9aca-4582-8f9d-4fc9d5c1ddac']
Found 2 results for query with id ['blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99', 'blog_67cc5f3b-9aca-4582-8f9d-4fc9d5c1ddac']
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 11 tokens
> [retrieve] Total embedding token usage: 11 tokens
> [retrieve] Total embedding token usage: 11 tokens


Token indices sequence length is longer than the specified maximum sequence length for this model (1449 > 1024). Running this sequence through the model will result in indexing errors


INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1502 tokens
> [get_response] Total LLM token usage: 1502 tokens
> [get_response] Total LLM token usage: 1502 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens

  The Chevrolet Colorado is available in four models: WT, LT, Z71, and ZR2. There are also special
editions available, including the ZR2 Bison Edition, ZR2 Dusk Special Edition, and ZR2 Midnight
Special Edition.


In [25]:
response = query_engine.query("What is the maximum towing capacity of the chevy colorado?")
print("\n", textwrap.fill(str(response), 100))

INFO:llama_index.vector_stores.redis:Querying index chevy_docs
Querying index chevy_docs
Querying index chevy_docs
INFO:llama_index.vector_stores.redis:Found 2 results for query with id ['blog_28b35a8b-ac42-44a2-9830-a1a7c4770bbe', 'blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99']
Found 2 results for query with id ['blog_28b35a8b-ac42-44a2-9830-a1a7c4770bbe', 'blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99']
Found 2 results for query with id ['blog_28b35a8b-ac42-44a2-9830-a1a7c4770bbe', 'blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99']
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 14 tokens
> [retrieve] Total embedding token usage: 14 tokens
> [retrieve] Total embedding token usage: 14 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 983 tok

In [26]:
response = query_engine.query("What are the main differences between the three engine types available for the Chevy Colorado?")
print("\n", textwrap.fill(str(response), 100))

INFO:llama_index.vector_stores.redis:Querying index chevy_docs
Querying index chevy_docs
Querying index chevy_docs
INFO:llama_index.vector_stores.redis:Found 2 results for query with id ['blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99', 'blog_67cc5f3b-9aca-4582-8f9d-4fc9d5c1ddac']
Found 2 results for query with id ['blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99', 'blog_67cc5f3b-9aca-4582-8f9d-4fc9d5c1ddac']
Found 2 results for query with id ['blog_52cde0a6-10fb-4ecd-8619-f0cf3ef74e99', 'blog_67cc5f3b-9aca-4582-8f9d-4fc9d5c1ddac']
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 16 tokens
> [retrieve] Total embedding token usage: 16 tokens
> [retrieve] Total embedding token usage: 16 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1583 to