# Using Redis and Azure OpenAI to chat with PDF documents

This notebook demonstrates how to use Redis and (Azure) OpenAI to chat with PDF documents. The included PDF is
an informational brochure about the Chevy Colorado pickup truck.

In this notebook, we will use LlamaIndex to chunk, vectorize, and store the PDF document in Redis as vectors
alongside the associated text. The query interface provided by LlamaIndex will be used to search for relevant
information given queries from the user.

In [None]:
# Install the Python requirements
%pip install -r requirements.txt --no-cache-dir

In [None]:
import os
import sys
import logging

logging.basicConfig(
    stream=sys.stdout, level=logging.WARNING
) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

import textwrap
import openai

from llama_index.llms import AzureOpenAI
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.vector_stores import RedisVectorStore

from llama_index import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    StorageContext
)
from dotenv import load_dotenv

## Azure OpenAI

Here we set up the Azure OpenAI models and API keys by reading them from the environment. The ``PromptHelper``, configured further below, sets the prompt sizing parameters for the OpenAI model. The classes defined here are used together to provide a Q&A interface between the user and the LLM.

In [None]:
load_dotenv(override=True)

In [None]:
api_base = os.getenv("AZURE_API_BASE")
api_version = os.getenv("AZURE_API_VERSION") 
api_key = os.getenv("AZURE_API_KEY")


# Get the OpenAI model names, e.g. "text-embedding-ada-002"
embedding_model = os.getenv("AZURE_EMBEDDING_MODEL")
text_model = os.getenv("AZURE_TEXT_MODEL")
# Get the Azure deployment name for each model
embedding_model_deployment = os.getenv("AZURE_EMBED_MODEL_DEPLOYMENT_NAME")
text_model_deployment = os.getenv("AZURE_TEXT_MODEL_DEPLOYMENT_NAME")

print(f"Using OpenAI models: {embedding_model} and {text_model}")
print(f"Using Azure deployments: {embedding_model_deployment} and {text_model_deployment}")
print(f"Using Azure OpenAI API version: {api_version}")
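
If any of these variables is missing, `os.getenv` silently returns `None` and the failure only surfaces later, when a model is constructed. The following is a minimal sketch of an early check. The helper name `missing_env_vars` is hypothetical and not part of any library; the variable names are the ones read in the cells of this notebook.

```python
import os

# Variable names taken from the cells in this notebook.
REQUIRED_VARS = [
    "AZURE_API_BASE",
    "AZURE_API_VERSION",
    "AZURE_API_KEY",
    "AZURE_EMBEDDING_MODEL",
    "AZURE_TEXT_MODEL",
    "AZURE_EMBED_MODEL_DEPLOYMENT_NAME",
    "AZURE_TEXT_MODEL_DEPLOYMENT_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical usage: report problems before any model is created.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.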


In [None]:
llm = AzureOpenAI(
    model=text_model,
    deployment_name=text_model_deployment,
    api_key=api_key,
    azure_endpoint=api_base,
    api_version=api_version,
)

llm_predictor = LLMPredictor(llm=llm)

embedding_llm = AzureOpenAIEmbedding(
    model=embedding_model,
    deployment_name=embedding_model_deployment,
    api_key=api_key,
    azure_endpoint=api_base,
    api_version=api_version,
)

### LlamaIndex

[LlamaIndex](https://github.com/jerryjliu/llama_index) (formerly GPT Index) is a project that provides a central interface to connect your LLMs with external data sources. It provides a simple interface to vectorize and store embeddings in Redis, create search indices using Redis, and perform vector search to find context for generative models like GPT.

Here we will use it to load the documents (the Chevy Colorado brochure).

In [None]:
# load documents
documents = SimpleDirectoryReader('./docs').load_data()
print('Document ID:', documents[0].doc_id)

LlamaIndex also works with frameworks like LangChain to make prompting and other aspects of a chat-based application easier. Here we can use the ``PromptHelper`` class to control how prompts are sized for the (Azure) OpenAI model. It is optional and can be tricky to set up correctly.

In [None]:
# maximum number of output tokens
num_output = int(os.getenv("OPENAI_MAX_TOKENS"))
# maximum LLM input size, in tokens
max_input_size = int(os.getenv("CHUNK_SIZE"))
# chunk overlap, read as a float (a ratio of the chunk size)
max_chunk_overlap = float(os.getenv("CHUNK_OVERLAP"))

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
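
To build some intuition for what a chunk size and an overlap ratio mean, here is a small sketch that splits a list of tokens into overlapping windows. It is an illustration only and not LlamaIndex's actual splitting logic; the function name `chunk_with_overlap` is hypothetical, and whitespace-separated words stand in for real model tokens.

```python
def chunk_with_overlap(tokens, chunk_size, overlap_ratio):
    """Split `tokens` into windows of `chunk_size` items.

    Each window shares int(chunk_size * overlap_ratio) items with the
    window before it.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Words stand in for tokens in this toy example.
words = "the quick brown fox jumps over the lazy dog".split()
for chunk in chunk_with_overlap(words, chunk_size=4, overlap_ratio=0.5):
    print(chunk)
```

A larger overlap repeats more text between neighbouring chunks, which helps keep sentences that straddle a boundary intact at the cost of storing and embedding more tokens.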

In [None]:
# define the service we will use to answer questions
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embedding_llm,
    prompt_helper=prompt_helper,  # optional: omit to use LlamaIndex's default prompt helper
)

## Initialize Redis as a Vector Database

Now that we have read in our documents, we can initialize the ``RedisVectorStore``. This will allow us to store our vectors in Redis and create an index.

The ``GPTVectorStoreIndex`` will then create the embeddings from the text chunks by calling the (Azure) OpenAI API. The embeddings will be stored in Redis and an index will be created.

In [None]:
def format_redis_conn_from_env(using_ssl=False):
    start = "rediss://" if using_ssl else "redis://"
    # Include credentials if a password is set (e.g. when using RBAC)
    password = os.getenv("REDIS_PASSWORD", None)
    username = os.getenv("REDIS_USERNAME", "")
    if password is not None:
        start += f"{username}:{password}@"

    return start + f"{os.getenv('REDIS_HOST')}:{os.getenv('REDIS_PORT')}"

# set using_ssl=True to use SSL with ACRE (Azure Cache for Redis Enterprise)
redis_url = format_redis_conn_from_env(using_ssl=False)
print(f"Using Redis address: {redis_url}")
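
One caveat with building the URL by string concatenation is that a password containing characters such as `@`, `:` or `/` produces an ambiguous URL. The sketch below percent-encodes the credentials with the standard library before assembling the URL. The function name `build_redis_url` and its parameters are hypothetical, and it assumes the Redis client decodes percent-encoded credentials when parsing the URL, which is worth verifying for the client version in use.

```python
from urllib.parse import quote


def build_redis_url(host, port, username="", password=None, using_ssl=False):
    """Assemble a redis:// or rediss:// URL with percent-encoded credentials."""
    scheme = "rediss" if using_ssl else "redis"
    auth = ""
    if password is not None:
        # safe="" encodes every reserved character, so "@" or ":" inside
        # the credentials cannot be mistaken for URL delimiters.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"{scheme}://{auth}{host}:{port}"


# Hypothetical values for illustration only.
print(build_redis_url("localhost", 6379))
print(build_redis_url("example-host", 6380, "user", "p@ss:w/rd", using_ssl=True))
```

Printing a URL that contains a real password also exposes it in the notebook output, so it may be preferable to print only the host and port.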


In [None]:
# Create VectorStore
vector_store = RedisVectorStore(
    index_name="chevy_docs",
    index_prefix="blog",
    redis_url=redis_url,
    overwrite=True
)

# Ping the Redis instance through the underlying client to confirm the connection
vector_store.client.ping()

In [None]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context
)

## Test the RAG pipeline!

Now that our document is stored in the index, we can ask questions against it. The query engine retrieves the most relevant chunks from Redis and passes them to the LLM as context for the answer.
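
Conceptually, the retrieval step compares the embedding of the question with the embedding of every stored chunk and keeps the closest matches. The following toy sketch ranks chunks by cosine similarity in plain Python. It is not how Redis performs the search internally; the function names and the three-dimensional vectors are made up for illustration, and cosine similarity is assumed here as one common choice of metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query.

    `chunks` is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up 3-dimensional "embeddings"; real embeddings have far more dimensions.
toy_chunks = [
    ("towing capacity", [0.9, 0.1, 0.0]),
    ("engine options", [0.1, 0.9, 0.0]),
    ("paint colors", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.2, 0.0], toy_chunks, k=2))
```

The texts returned by a step like this are what the LLM sees as context when it writes the answer.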

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("What types of variants are available for the Chevrolet Colorado?")
print("\n", textwrap.fill(str(response), 100))

In [None]:
response = query_engine.query("What is the maximum towing capacity of the chevy colorado?")
print("\n", textwrap.fill(str(response), 100))

In [None]:
response = query_engine.query("What are the main differences between the three engine types available for the Chevy Colorado?")
print("\n", textwrap.fill(str(response), 100))

## Feedback functions

Use the [TruLens RAG Triad](https://www.trulens.org/trulens_eval/core_concepts_rag_triad/) to check for context relevance, groundedness, and answer relevance.

In [None]:
from trulens_eval import Tru

tru = Tru()
tru.reset_database()

In [None]:
import nest_asyncio
from trulens_eval.feedback.provider.openai import AzureOpenAI as fAzureOpenAI

nest_asyncio.apply()

provider = fAzureOpenAI(
    deployment_name=text_model_deployment,
    api_key=api_key,
    api_version=api_version,
    azure_endpoint=api_base,
)

### 1. Answer Relevance

In [None]:
from trulens_eval import Feedback

f_qa_relevance = Feedback(
    provider.relevance_with_cot_reasons,
    name="Answer Relevance"
).on_input_output()

### 2. Context Relevance

In [None]:
from trulens_eval import TruLlama

context_selection = TruLlama.select_source_nodes().node.text

In [None]:
import numpy as np

f_qs_relevance = (
    Feedback(provider.qs_relevance,
             name="Context Relevance")
    .on_input()
    .on(context_selection)
    .aggregate(np.mean)
)

In [None]:
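# Variant with chain-of-thought reasons; running this cell overrides
# the f_qs_relevance defined in the previous cell.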

f_qs_relevance = (
    Feedback(provider.qs_relevance_with_cot_reasons,
             name="Context Relevance")
    .on_input()
    .on(context_selection)
    .aggregate(np.mean)
)
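
The `.aggregate(np.mean)` step reduces the per-chunk relevance scores of one query to a single number. The sketch below shows that reduction in plain Python, using `statistics.mean` in place of `np.mean`. The function name and the scores are hypothetical; in the notebook the scores come from the feedback provider.

```python
from statistics import mean


def aggregate_context_relevance(chunk_scores):
    """Average the relevance scores of the chunks retrieved for one query."""
    if not chunk_scores:
        return None  # nothing was retrieved, so there is nothing to average
    return mean(chunk_scores)


# Hypothetical scores for three retrieved chunks.
print(aggregate_context_relevance([1.0, 0.5, 0.0]))
```

Because the result is an average, one irrelevant chunk lowers the score even when the other chunks are on topic, which is useful for spotting noisy retrieval.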

### 3. Groundedness

In [None]:
from trulens_eval.feedback import Groundedness

grounded = Groundedness(groundedness_provider=provider)

In [None]:
f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons,
             name="Groundedness"
            )
    .on(context_selection)
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)

## Evaluation of the RAG application

In [None]:
from trulens_eval import TruLlama
from trulens_eval import FeedbackMode

tru_recorder = TruLlama(
    query_engine,
    app_id="Redis_Azure_OpenAI",
    feedbacks=[
        f_qa_relevance,
        f_qs_relevance,
        f_groundedness
    ]
)

In [None]:
eval_questions = [
    "What types of variants are available for the Chevrolet Colorado?",
    "What are the main differences between the three engine types available for the Chevy Colorado?",
    "What is the maximum towing capacity of the chevy colorado?",
]

In [None]:
for question in eval_questions:
    with tru_recorder as recording:
        query_engine.query(question)

In [None]:
records, feedback = tru.get_records_and_feedback(app_ids=[])
records.head()

In [None]:
import pandas as pd

pd.set_option("display.max_colwidth", None)
records[["input", "output"] + feedback]

In [None]:
tru.get_leaderboard(app_ids=[])

In [None]:
tru.run_dashboard()