![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# Advanced RAG example

Now that you have a good foundation in Redis data structures, search capabilities, and basic RAG with the redisvl client from [/getting_started/02_redisvl](../getting_started/02_redisvl.ipynb), you are ready to explore some more advanced techniques.

We will extend the basic RAG example with a few special topics/techniques:
- Dense content representation
- Query rewriting / expansion
- Semantic caching
- Conversational memory persistence

## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/RAG/04_advanced_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Improve accuracy with dense content representations
In the basic example, we took raw chunks of text from our PDF documents and generated embeddings for them to store in the vector database. That works, but one way to improve retrieval quality is to use an LLM (here, from OpenAI) during ETL. We will prompt the LLM to summarize and decompose the raw PDF text into discrete propositional phrases. This makes the text clearer and can improve semantic retrieval for RAG.

The goal is to utilize a preprocessing technique similar to what's outlined here:
https://github.com/langchain-ai/langchain/blob/master/templates/propositional-retrieval/propositional_retrieval/proposal_chain.py

If you already have a redis-stack instance running locally from before, feel free to jump ahead. If not, execute the following commands to set up the environment.

## Environment Setup

### Pull Github Materials
Because you are likely running this notebook in **Google Colab**, we need to first
pull the necessary dataset and materials directly from GitHub.

**If you are running this notebook locally**, FYI you may not need to perform this
step at all.

In [4]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/RAG/resources .
!rm -rf temp_repo

Cloning into 'temp_repo'...
remote: Enumerating objects: 138, done.
remote: Counting objects: 100% (138/138), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 138 (delta 68), reused 91 (delta 35), pack-reused 0
Receiving objects: 100% (138/138), 7.19 MiB | 7.61 MiB/s, done.
Resolving deltas: 100% (68/68), done.


### Install Python Dependencies

In [1]:
# NBVAL_SKIP
!pip install -q redis redisvl pandas "unstructured[pdf]" sentence-transformers langchain langchain-community openai tqdm



### Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from PDF document chunks. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly
from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


#### For Alternative Environments
There are many ways to get the necessary redis-stack instance running:
1. On cloud, deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your
own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default this notebook connects to the local instance of Redis Stack. **If you have your own Redis Enterprise instance** - replace REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [3]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
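
The comment above notes that TLS-enabled endpoints use the `rediss://` scheme. A minimal sketch of a helper that builds either form is shown below. `build_redis_url` and its `use_tls` flag are hypothetical names introduced here for illustration; they are not part of this notebook or of redis-py. The sketch also percent-encodes the password, which matters when it contains characters such as `@` or `:`.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: str, password: str = "", use_tls: bool = False) -> str:
    """Build a Redis connection URL (hypothetical helper, for illustration only)."""
    scheme = "rediss" if use_tls else "redis"
    # Percent-encode the password so characters like "@" or ":" do not break URL parsing
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", "6379"))
print(build_redis_url("example.host", "18374", password="p@ss", use_tls=True))
```

With no password the helper omits the credentials part entirely, which is an alternative to the empty `:@` prefix produced by the f-string above.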

### Now that our environment is set up, we can again load our financial documents

### Dataset Preparation (PDF Documents)

To best demonstrate Redis as a vector database layer, we will load a single
financial (10k filings) doc and preprocess it using some helpers from LangChain:

- `UnstructuredFileLoader` is not the only document loader type that LangChain provides. Docs: https://python.langchain.com/docs/integrations/document_loaders/unstructured_file
- `RecursiveCharacterTextSplitter` is what we use to create smaller chunks of text from the doc. Docs: https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredFileLoader

# Load list of pdfs from a folder
data_path = "resources/"
docs = [os.path.join(data_path, file) for file in os.listdir(data_path)]

print("Listing available documents ...", docs)

Listing available documents ... ['resources/nke-10k-2023.pdf', 'resources/amzn-10k-2023.pdf', 'resources/jnj-10k-2023.pdf', 'resources/aapl-10k-2023.pdf', 'resources/nvd-10k-2023.pdf', 'resources/msft-10k-2023.pdf']


In [5]:
# pick out the Nike doc for this exercise
doc = [doc for doc in docs if "nke" in doc][0]

# set up the file loader/extractor and text splitter to create chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2500, chunk_overlap=0)
loader = UnstructuredFileLoader(doc, mode="single", strategy="fast")

# extract, load, and make chunks
chunks = loader.load_and_split(text_splitter)

print("Done preprocessing. Created", len(chunks), "chunks of the original pdf", doc)

Done preprocessing. Created 180 chunks of the original pdf resources/nke-10k-2023.pdf
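
To build some intuition for what the text splitter is doing, here is a simplified sketch of recursive character splitting. It is not LangChain's implementation: `split_text` is a hypothetical function, the separator list is an assumption, and overlap is ignored (the notebook sets `chunk_overlap=0`). The idea is to split on the coarsest separator first and fall back to finer ones only for pieces that are still too long.

```python
def split_text(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Simplified recursive splitter (illustrative sketch, not LangChain's code)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard cuts of chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging pieces while the chunk still fits
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: retry with the finer separators
            chunks.extend(split_text(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb\n\nccc ddd eee"
print(split_text(sample, chunk_size=8))
```

Splitting on paragraph breaks before single spaces tends to keep related sentences in the same chunk, which is the main reason to prefer this approach over fixed-width cuts.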


### In the previous example, we would have gone ahead and embedded the chunks as extracted here.

Now we will instead leverage an LLM to create dense content representations to improve our retrieval accuracy.

### Setup OpenAI as LLM

In [6]:
import os
import getpass
import openai

CHAT_MODEL = "gpt-3.5-turbo-0125"


if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY")

In [7]:
import tqdm
import json


def create_dense_props(chunk):
    """Create dense representation of raw text content."""

    # The system message here should be HEAVILY customized for your specific use case
    SYSTEM_PROMPT = """
    You are a helpful PDF extractor tool. You will be presented with segments from
    raw PDF documents composed of 10k SEC filings information about public companies.

    Decompose and summarize the raw content into clear and simple propositions,
    ensuring they are interpretable out of context. Consider the following rules:
    1. Split compound sentences into simpler dense phrases that retain existing
    meaning.
    2. Simplify technical jargon or wording if possible while retaining existing
    meaning.
    3. For any named entity that is accompanied by additional descriptive information,
    separate this information into its own distinct proposition.
    4. Decontextualize the proposition by adding necessary modifier to nouns or
    entire sentences and replacing pronouns (e.g., "it", "he", "she", "they", "this", "that")
    with the full name of the entities they refer to.
    5. Present the results as a list of strings, formatted in JSON, under the key "propositions".
    """

    response = openai.OpenAI().chat.completions.create(
        model=CHAT_MODEL,
        response_format={ "type": "json_object" },
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Decompose this raw content using the rules above:\n{chunk.page_content} "}
        ]
    )
    res = response.choices[0].message.content

    try:
        return json.loads(res)["propositions"]
    except Exception as e:
        print("Failed to parse propositions:", str(e), flush=True)
        # Retry
        return create_dense_props(chunk)
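
Note that the retry in `create_dense_props` recurses without a limit, so a chunk that never yields valid JSON would keep calling the API. A minimal sketch of a bounded alternative follows. `parse_propositions` and `generate_with_retries` are hypothetical helpers, and `generate` is a stand-in for whatever function performs the chat completion call and returns the raw response string.

```python
import json


def parse_propositions(raw: str) -> list:
    """Validate that the raw JSON string holds a list of strings under "propositions"."""
    props = json.loads(raw)["propositions"]
    if not isinstance(props, list) or not all(isinstance(p, str) for p in props):
        raise ValueError("'propositions' must be a list of strings")
    return props


def generate_with_retries(generate, max_attempts: int = 3) -> list:
    """Call generate() until its output parses, giving up after max_attempts."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_propositions(generate())
        except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
            # Remember the failure and try again with a fresh response
            last_error = exc
    raise RuntimeError(f"No valid propositions after {max_attempts} attempts") from last_error


# Simulated responses: two malformed payloads followed by a valid one
responses = iter([
    "not json",
    '{"wrong_key": []}',
    '{"propositions": ["NIKE, Inc. is incorporated in Oregon."]}',
])
print(generate_with_retries(lambda: next(responses)))
```

Raising after a fixed number of attempts lets the caller decide whether to skip the chunk or fall back to the raw text.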

In [8]:
from openai import OpenAI

### Create text propositions using OpenAI

In [25]:
# Load from disk to save time or regenerate as needed.
try:
    with open("resources/propositions.json", "r") as f:
        propositions = json.load(f)
except (FileNotFoundError, json.JSONDecodeError):
    # create props
    propositions = [create_dense_props(chunk) for chunk in tqdm.tqdm(chunks)]

    # Save to disk for faster reload..
    with open("resources/propositions.json", "w") as f:
        json.dump(propositions, f)

### Let's evaluate the proposition vs the raw chunk

In [26]:
propositions[0]

"As of November 30, 2022, the aggregate market value of NIKE, Inc.'s Common Stock held by non-affiliates was $144,299,267,044. NIKE, Inc. filed a Form 10-K with the UNITED STATES SECURITIES AND EXCHANGE COMMISSION. The Form 10-K is an ANNUAL REPORT FOR THE FISCAL YEAR ENDED MAY 31, 2023. NIKE, Inc.'s exact name as specified in its charter is NIKE, Inc. NIKE, Inc. is incorporated in Oregon with an IRS Employer Identification No. of 93-0584541. The principal executive offices of NIKE, Inc. are located at One Bowerman Drive, Beaverton, Oregon 97005-6453. NIKE, Inc.'s telephone number, including area code, is (503) 671-6453. NIKE, Inc. has Class B Common Stock registered on the New York Stock Exchange under the trading symbol NKE. NIKE, Inc. does not have any securities registered pursuant to SECTION 12(G) OF THE ACT."

In [23]:
chunks[0]

Document(page_content="As of November 30, 2022, the aggregate market values of the Registrant's Common Stock held by non-affiliates were:Class A$7,831,564,572 Class B136,467,702,472 $144,299,267,044\n\nTable of ContentsUNITED STATESSECURITIES AND EXCHANGE COMMISSIONWashington, D.C. 20549FORM 10-K(Mark One)☑ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934FOR THE FISCAL YEAR ENDED MAY 31, 2023OR☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934FOR THE TRANSITION PERIOD FROM TO .Commission File No. 1-10635\n\nNIKE, Inc.(Exact name of Registrant as specified in its charter)Oregon93-0584541(State or other jurisdiction of incorporation)(IRS Employer Identification No.)One Bowerman Drive, Beaverton, Oregon 97005-6453(Address of principal executive offices and zip code)(503) 671-6453(Registrant's telephone number, including area code)SECURITIES REGISTERED PURSUANT TO SECTION 12(B) OF THE ACT:Class B Common StockNKENew

### Create embeddings from propositions data

In [27]:
from redisvl.utils.vectorize import HFTextVectorizer

hf = HFTextVectorizer("sentence-transformers/all-MiniLM-L6-v2")
os.environ["TOKENIZERS_PARALLELISM"] = "false"

prop_embeddings = hf.embed_many([
    proposition for proposition in propositions
])

# Check to make sure we've created enough embeddings, 1 per document chunk
len(prop_embeddings) == len(propositions) == len(chunks)

True
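
The vectorizer embeds one string per item. Assuming `create_dense_props` returns a list of proposition strings for each chunk (as the system prompt requests), freshly generated results would need to be joined into one string per chunk before embedding, which matches the shape of the cached `propositions[0]` shown earlier. The sketch below uses a hypothetical helper, `flatten_propositions`, to normalize both shapes.

```python
def flatten_propositions(props_per_chunk):
    """Return one string per chunk, joining lists of propositions with spaces."""
    flattened = []
    for props in props_per_chunk:
        if isinstance(props, list):
            # Freshly generated output: a list of short proposition strings
            flattened.append(" ".join(p.strip() for p in props))
        else:
            # Already a single string (for example, loaded from a cached file)
            flattened.append(props)
    return flattened


mixed = [
    ["NIKE, Inc. is incorporated in Oregon.", "NIKE, Inc. filed a Form 10-K."],
    "NIKE, Inc. has Class B Common Stock.",
]
print(flatten_propositions(mixed))
```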

### Define a schema and create an index

Below we connect to Redis and create an index that contains a text field, tag field, and vector field.

In [28]:
from redis import Redis
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex


index_name = "redisvl"


schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
    "prefix": "chunk"
  },
  "fields": [
    {
        "name": "chunk_id",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "proposition",
        "type": "text"
    },
    {
        "name": "text_embedding",
        "type": "vector",
        "attrs": {
            "dims": hf.dims,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
})


In [None]:
# connect to redis
client = Redis.from_url(REDIS_URL)

# create an index from schema and the client
index = SearchIndex(schema, client)
index.create(overwrite=True, drop=True)

In [None]:
# get info about the index
# NBVAL_SKIP
!rvl index info -i redisvl



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ redisvl      │ HASH           │ ['chunk']  │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮
│ Name           │ Attribute      │ Type   │ Field Option   │ Option Value   │ Field Option   │ Option Value   │ Field Option   │   Option Value │ Field Option    │ Option Value   │ Field Option   │   Option Value │ Field Option    │   Option Value │
├────────────────┼────────────────┼────────┼────────────────┼────────────

### Process and load dataset
Below we use the RedisVL index to load the list of propositions and their embeddings into Redis.

In [None]:
# load expects an iterable of dictionaries
from redisvl.redis.utils import array_to_buffer

data = [
    {
        'chunk_id': f'{i}',
        'proposition': proposition,
        # For HASH -- must convert embeddings to bytes
        'text_embedding': array_to_buffer(prop_embeddings[i])
    } for i, proposition in enumerate(propositions)
]

# RedisVL handles batching automatically
keys = index.load(data, id_field="chunk_id")
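
Redis hashes store field values as byte strings, which is why each embedding is converted to bytes before loading. As a rough illustration of what that conversion amounts to for `float32` data, the sketch below packs a list of floats into 4-byte little-endian values with the standard library. The function names are hypothetical, and this is not the code of `array_to_buffer` itself.

```python
import struct


def floats_to_float32_bytes(vector):
    """Pack floats as consecutive little-endian 32-bit values."""
    return struct.pack(f"<{len(vector)}f", *vector)


def float32_bytes_to_floats(buffer):
    """Inverse of floats_to_float32_bytes: 4 bytes per value."""
    count = len(buffer) // 4
    return list(struct.unpack(f"<{count}f", buffer))


vector = [0.5, -1.0, 2.0]
buffer = floats_to_float32_bytes(vector)
print(len(buffer), float32_bytes_to_floats(buffer))
```

The byte length is always four times the number of dimensions, so it should agree with the `dims` and `datatype` declared in the schema.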

### Setup RedisVL AsyncSearchIndex

In [None]:
from redis.asyncio import Redis
from redisvl.index import AsyncSearchIndex

client = Redis.from_url(REDIS_URL)
index = AsyncSearchIndex(index.schema, client)

#### Test the updated RAG workflow

In [None]:
from redisvl.query import VectorQuery
from redisvl.index import AsyncSearchIndex


def promptify(query: str, context: str) -> str:
    return f'''Use the provided context below derived from public financial
    documents to answer the user's question. If you can't answer the user's
    question based on the context, do not guess. If there is no context at all,
    respond with "I don't know".

    User question:

    {query}

    Helpful context:

    {context}

    Answer:
    '''

# Update the retrieval helper to use propositions
async def retrieve_context(index: AsyncSearchIndex, query_vector) -> str:
    """Fetch the relevant context from Redis using vector search"""
    print("Using dense content representation", flush=True)
    results = await index.query(
        VectorQuery(
            vector=query_vector,
            vector_field_name="text_embedding",
            return_fields=["proposition"],
            num_results=3
        )
    )
    content = "\n".join([result["proposition"] for result in results])
    return content

# Update the answer_question method
async def answer_question(index: AsyncSearchIndex, query: str):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users questions about company
    performance, ethics, characteristics, and core information.
    """

    query_vector = hf.embed(query)
    # Fetch context from Redis using vector search
    context = await retrieve_context(index, query_vector)
    # Generate contextualized prompt and feed to OpenAI
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    return response.choices[0].message.content
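
Conceptually, the `VectorQuery` in `retrieve_context` asks for the three stored propositions whose embeddings are closest to the query embedding under cosine distance. The sketch below shows the same idea as an exact, brute-force search in plain Python. It is illustrative only: the index above uses HNSW, which is an approximate method, and `cosine_distance` and `top_k` are hypothetical helpers.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vector, records, k=3):
    """Return the propositions of the k records nearest to query_vector."""
    ranked = sorted(
        records,
        key=lambda record: cosine_distance(query_vector, record["text_embedding"]),
    )
    return [record["proposition"] for record in ranked[:k]]


# Toy 2-dimensional "embeddings" for illustration
records = [
    {"proposition": "A", "text_embedding": [1.0, 0.0]},
    {"proposition": "B", "text_embedding": [0.0, 1.0]},
    {"proposition": "C", "text_embedding": [1.0, 1.0]},
]
print(top_k([1.0, 0.1], records, k=2))
```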

In [None]:
# Generate a list of questions
questions = [
    "What is the trend in the company's revenue and profit over the past few years?",
    "What are the company's primary revenue sources?",
    "How much debt does the company have, and what are its capital expenditure plans?",
    "What does the company say about its environmental, social, and governance (ESG) practices?",
    "What is the company's strategy for growth?"
]

In [None]:
# NBVAL_SKIP
import asyncio
import pandas as pd

results = await asyncio.gather(*[
    answer_question(index, question) for question in questions
])

pd.DataFrame(columns=["question", "answer"], data=list(zip(questions, results)))

Using dense content representation
Using dense content representation
Using dense content representation
Using dense content representation
Using dense content representation


Unnamed: 0,question,answer
0,What is the trend in the company's revenue and...,"Based on the provided context, the trend in th..."
1,What are the company's primary revenue sources?,The company's primary revenue sources are as f...
2,"How much debt does the company have, and what ...","The company has a total long-term debt of $8,9..."
3,What does the company say about its environmen...,"Based on the provided context, the company ack..."
4,What is the company's strategy for growth?,The company's strategy for growth includes emp...


### Improve accuracy with query rewriting / expansion

We can also use the power of an LLM to rewrite or expand an input question.

Example: https://github.com/langchain-ai/langchain/blob/master/templates/rewrite-retrieve-read/rewrite_retrieve_read/chain.py

In [None]:
# NBVAL_SKIP
# An example question that is a bit simplistic...
await answer_question(index, "How big is the company?")

Using dense content representation


"Based on the information provided, we can see that NIKE, Inc. is a large company with multiple subsidiaries operating in various jurisdictions such as the United States, Netherlands, China, Mexico, Japan, and Korea. The company's revenues are significant, with detailed breakdowns of revenue streams from footwear, apparel, equipment, and other sources. Additionally, the company has a diverse customer base, including wholesale customers and direct-to-consumer sales channels. NIKE, Inc. also has a strong presence in international markets, with operations in North America, Western Europe, and Asia.\n\nOverall, based on the extensive information provided about the company's operations, revenue streams, and global presence, we can conclude that NIKE, Inc. is a large and well-established company in the sports apparel and footwear industry."

In [None]:
# NBVAL_SKIP
async def rewrite_query(query: str, prompt: str = None):
    """Rewrite the user's original query"""

    SYSTEM_PROMPT = prompt if prompt else """Given the user's input question below, find a better or
    more complete way to phrase this question in order to improve semantic search
    engine retrieval quality over a set of SEC 10K PDF docs. Return the rephrased
    question as a string in a JSON response under the key "query"."""

    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        response_format={ "type": "json_object" },
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Original input question from user: {query}"}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    rewritten_query = json.loads(response.choices[0].message.content)["query"]
    return rewritten_query
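
`rewrite_query` assumes the model always returns valid JSON with a `"query"` key. A small defensive sketch is shown below: if the response cannot be parsed or the key is missing, it falls back to the user's original question instead of raising. `extract_rewritten_query` is a hypothetical helper, not part of the notebook.

```python
import json


def extract_rewritten_query(raw_response: str, original_query: str) -> str:
    """Return the rewritten query, or the original query if parsing fails."""
    try:
        candidate = json.loads(raw_response).get("query")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON at all, or JSON that is not an object
        return original_query
    if isinstance(candidate, str) and candidate.strip():
        return candidate.strip()
    return original_query


print(extract_rewritten_query('{"query": "What is the company revenue?"}', "How big?"))
print(extract_rewritten_query("sorry, I cannot help", "How big?"))
```

Falling back to the original question keeps the RAG pipeline working even when the rewriting step misbehaves.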

In [None]:
# NBVAL_SKIP
# Example Simple Query Rewritten
await rewrite_query("How big is the company?")

'What is the size of the company in terms of revenue, number of employees, and market share?'

In [None]:
async def answer_question(index: AsyncSearchIndex, query: str, **kwargs):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users questions about company
    performance, ethics, characteristics, and core information.
    """

    # Rewrite the query using an LLM
    rewritten_query = await rewrite_query(query, **kwargs)
    print("User query updated to:\n", rewritten_query, flush=True)

    query_vector = hf.embed(rewritten_query)
    # Fetch context from Redis using vector search
    context = await retrieve_context(index, query_vector)
    print("Context retrieved", flush=True)

    # Generate contextualized prompt and feed to OpenAI
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(rewritten_query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    return response.choices[0].message.content

In [None]:
# NBVAL_SKIP
# Now try again with query re-writing enabled
await answer_question(index, "How big is the company?")

User query updated to:
 What is the size of the company in terms of revenue, number of employees, and market share?
Using dense content representation
Context retrieved


"Based on the provided context, the company in question is Nike, Inc. Here is the breakdown of the company's size based on the information provided:\n\n1. Revenue:\n   - Total revenues for Nike, Inc. amount to $51.2 billion.\n   - Revenues in North America total $21.6 billion.\n   - Revenues in Europe, Middle East & Africa total $13.4 billion.\n   - Revenues in Greater China total $7.2 billion.\n   - Revenues in Asia Pacific & Latin America total $6.4 billion.\n   - Revenues from footwear total $35.3 billion.\n   - Revenues from apparel total $13.8 billion.\n   - Revenues from equipment total $2.4 billion.\n   - Revenues from other sources total $27 million.\n\n2. Number of Employees:\n   The number of employees is not explicitly provided in the context. This information may be available in the company's annual report or other filings.\n\n3. Market Share:\n   Market share information is not directly provided in the context. Market share data is typically not disclosed in financial docu

### Improve performance and cut costs with LLM caching

In [None]:
from redisvl.extensions.llmcache import SemanticCache

llmcache = SemanticCache(
    name="llmcache",
    vectorizer=hf,
    redis_url=REDIS_URL,
    ttl=120,
    distance_threshold=0.2
)
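
The idea behind a semantic cache is to reuse a stored LLM response when a new prompt's embedding is within `distance_threshold` of a previously cached prompt. The in-memory sketch below illustrates that check under stated assumptions: `InMemorySemanticCache` is a hypothetical class, it uses brute-force cosine distance, and it omits the expiry that `ttl=120` (seconds) provides in the real cache.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class InMemorySemanticCache:
    """Toy stand-in for a semantic cache (illustration only, no TTL)."""

    def __init__(self, distance_threshold=0.2):
        self.distance_threshold = distance_threshold
        self._entries = []  # list of (prompt_vector, response) pairs

    def store(self, vector, response):
        self._entries.append((vector, response))

    def check(self, vector):
        """Return the closest cached response within the threshold, else None."""
        best = None
        for stored_vector, response in self._entries:
            distance = cosine_distance(vector, stored_vector)
            if distance <= self.distance_threshold and (best is None or distance < best[0]):
                best = (distance, response)
        return best[1] if best is not None else None


cache = InMemorySemanticCache(distance_threshold=0.2)
cache.store([1.0, 0.0], "cached answer")
print(cache.check([1.0, 0.1]))  # close to the stored prompt vector
print(cache.check([0.0, 1.0]))  # far from the stored prompt vector
```

A lower threshold makes the cache stricter (fewer hits, fewer wrong reuses), while a higher one trades accuracy for more hits.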

In [None]:
from functools import wraps

# Create an LLM caching decorator
def cache(func):
    @wraps(func)
    async def wrapper(index, query_text, *args, **kwargs):
        query_vector = llmcache._vectorizer.embed(query_text)

        # Check the cache with the vector
        if result := llmcache.check(vector=query_vector):
            return result[0]['response']

        response = await func(index, query_text, query_vector=query_vector)
        llmcache.store(query_text, response, query_vector)
        return response
    return wrapper


@cache
async def answer_question(index: AsyncSearchIndex, query: str, **kwargs):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users questions about company
    performance, ethics, characteristics, and core information.
    """

    context = await retrieve_context(index, kwargs["query_vector"])
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by GPT-3.5
    return response.choices[0].message.content

In [None]:
# NBVAL_SKIP
query = "What was Nike's revenue last year compared to this year??"

await answer_question(index, query)

Using dense content representation


"Nike's total revenue for the fiscal year 2023 was $51.2 billion, which represents a 10% increase compared to the previous fiscal year. The revenue growth was mainly driven by higher revenues in North America, Europe, Middle East & Africa, Asia Pacific & Latin America, and Greater China."

In [None]:
# NBVAL_SKIP
query = "What was Nike's total revenue in the last year compared to now??"

await answer_question(index, query)

# notice no HTTP request to OpenAI since this question is "close enough" to the last one

"Nike's total revenue for the fiscal year 2023 was $51.2 billion, which represents a 10% increase compared to the previous fiscal year. The revenue growth was mainly driven by higher revenues in North America, Europe, Middle East & Africa, Asia Pacific & Latin America, and Greater China."

### Improve personalization by including chat session history

To preserve state in the conversation, it's imperative to offload conversation history to a database that can handle high transaction throughput for reads and writes to limit system latency.

We can store message history for a particular user session in a Redis List data type.
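
Because each message is stored as a JSON string in a list, long sessions can grow without bound. One option, sketched below under stated assumptions, is to send only the most recent messages to the LLM. `recent_messages` is a hypothetical helper operating on a plain Python list of serialized messages; against Redis, the equivalent read would be `LRANGE key -N -1`, which returns the last N elements of a list.

```python
import json


def recent_messages(serialized_history, max_messages):
    """Deserialize only the last max_messages entries of a message history."""
    if max_messages <= 0:
        return []
    return [json.loads(item) for item in serialized_history[-max_messages:]]


# A plain list standing in for the Redis List "messages:<user>"
history = [
    json.dumps({"role": "user", "content": "Hello."}),
    json.dumps({"role": "assistant", "content": "Hi, how can I help?"}),
    json.dumps({"role": "user", "content": "How many employees does Nike have?"}),
]
print(recent_messages(history, max_messages=2))
```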


In [None]:
import json


class ChatBot:
    def __init__(self, index: AsyncSearchIndex, user: str):
        self.index = index
        self.user = user

    async def get_messages(self) -> list:
        """Get all messages associated with a session"""
        return [
            json.loads(msg) for msg in await self.index.client.lrange(f"messages:{self.user}", 0, -1)
        ]

    async def add_messages(self, messages: list):
        """Add chat messages to a Redis List"""
        return await self.index.client.rpush(
            f"messages:{self.user}", *[json.dumps(msg) for msg in messages]
        )

    async def clear_history(self):
        """Clear session chat"""
        await self.index.client.delete(f"messages:{self.user}")

    @staticmethod
    def promptify(query: str, context: str) -> str:
        return f'''Use the provided context below derived from public financial
        documents to answer the user's question. If you can't answer the user's
        question based on the context, do not guess. If there is no context at all,
        respond with "I don't know".

        User question:

        {query}

        Helpful context:

        {context}

        Answer:
        '''

    async def retrieve_context(self, query_vector) -> str:
        """Fetch the relevant context from Redis using vector search"""
        results = await self.index.query(
            VectorQuery(
                vector=query_vector,
                vector_field_name="text_embedding",
                return_fields=["proposition"],
                num_results=3
            )
        )
        content = "\n".join([result["proposition"] for result in results])
        return content

    async def answer_question(self, query: str):
        """Answer the user's question with historical context and caching baked-in"""

        SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
        to public financial 10k documents in order to answer users questions about company
        performance, ethics, characteristics, and core information.
        """

        # Create query vector
        query_vector = llmcache._vectorizer.embed(query)

        # TODO - implement semantic guardrails?

        # Check the cache with the vector
        if result := llmcache.check(vector=query_vector):
            answer = result[0]['response']
        else:
            # TODO - implement query rewriting?
            context = await self.retrieve_context(query_vector)
            session = await self.get_messages()
            # TODO - implement session summarization?
            messages = (
                [{"role": "system", "content": SYSTEM_PROMPT}] +
                session +
                [{"role": "user", "content": self.promptify(query, context)}]
            )
            # Response provided by GPT-3.5
            response = await openai.AsyncClient().chat.completions.create(
                model=CHAT_MODEL,
                messages=messages,
                temperature=0.1,
                seed=42
            )
            answer = response.choices[0].message.content
            llmcache.store(query, answer, query_vector)

        # Add message history
        await self.add_messages([
            {"role": "user", "content": query},
            {"role": "assistant", "content": answer}
        ])

        return answer

## Test the entire RAG workflow

In [None]:
# Setup Session
chat = ChatBot(index, "tyler")
await chat.clear_history()

In [None]:
# Run a simple chat
stopterms = ["exit", "quit", "end", "cancel"]

# Simple Chat
# NBVAL_SKIP
while True:
    user_query = input()
    if user_query.lower() in stopterms:
        break
    answer = await chat.answer_question(user_query)
    print(answer, flush=True)


In [None]:
# NBVAL_SKIP
await chat.get_messages()

[{'role': 'user', 'content': 'Hello. How many employees does Nike have?'},
 {'role': 'assistant',
  'content': 'NIKE had about 83,700 employees globally as of May 31, 2023, including retail and part-time workers, as well as independent contractors and temporary personnel.'},
 {'role': 'user',
  'content': 'Ok. What about the different products Nike sells. What kinds of products does it sell?'},
 {'role': 'assistant',
  'content': "NIKE sells a wide range of products including athletic footwear, apparel, accessories, and equipment under the NIKE Brand, Jordan Brand, and Converse. The NIKE Brand offers performance athletic products for Men's, Women's, and Kids' categories, including sport-inspired lifestyle items. The Jordan Brand focuses on athletic and casual products with a basketball focus, while Converse sells casual products under various trademarks. Additionally, NIKE sells licensed apparel with team logos, performance equipment, and accessories such as bags, socks, and eyewear."}

## Your Next Steps

While a good start, there is still more to do. **For example**:
- we could utilize message history to generate an updated and contextualized query to use for retrieval and answer generation (with an LLM). Otherwise, there can be a disconnect between what a user is asking (in context) and what they are asking in isolation.
- we could utilize an LLM to summarize conversation history to use as context instead of passing the whole slew of messages to the Chat endpoint.
- we could utilize semantic properties of the message history (or summaries) in order to fetch only relevant conversation bits (vector search).
- we could utilize a technique like HyDE (a form of query rewriting) to improve the retrieval quality from raw user input to source documents OR try to break down user questions into sub questions and fetch / join context based on the different searches.
- we could incorporate semantic routing to take a broken down question and route to different data sources, indices, or query types (etc).
- we could add semantic guardrails on the front end or back end of the conversation I/O to ensure we are within bounds of approved topics.
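
As a starting point for the first idea in the list above, the sketch below builds the chat messages for a history-aware query rewrite. It only assembles the prompt; sending it to an LLM is left out. The function name, the prompt wording, and the `max_turns` parameter are all assumptions made for illustration.

```python
def build_contextual_rewrite_messages(history, query, max_turns=4):
    """Build chat messages asking an LLM to rewrite query as a standalone question."""
    recent = history[-max_turns:] if max_turns > 0 else []
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
    system = (
        "Rewrite the user's latest question as a standalone question that can be "
        "understood without the conversation. Return JSON under the key \"query\"."
    )
    user = f"Conversation so far:\n{transcript}\n\nLatest question: {query}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


history = [
    {"role": "user", "content": "How many employees does Nike have?"},
    {"role": "assistant", "content": "NIKE had about 83,700 employees globally."},
]
for message in build_contextual_rewrite_messages(history, "What about its products?"):
    print(message["role"], "->", message["content"])
```

The rewritten question would then replace the raw user input for both retrieval and answer generation.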

## Cleanup

Clean up the database.

In [None]:
# NBVAL_SKIP
await index.client.flushall()

True