![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

# RAG from scratch with the Redis Vector Library


Now that you have a good foundation in Redis data structures and basic search capabilities, this notebook builds on that path. It introduces [redisvl](https://redisvl.com), a dedicated Python client library that streamlines GenAI application development.

We will go through the same initial setup and data prep stage, then dive into building an **end-to-end RAG system from scratch**, including a few special topics/techniques:
- Dense content representation
- Query rewriting / expansion
- Semantic caching
- Conversational memory persistence


## Let's Begin!
<a href="https://colab.research.google.com/github/Redislabs-Solution-Architects/financial-vss/blob/main/redisvl-02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Environment Setup

### Pull Github Materials
Because you are likely running this notebook in **Google Colab**, we first need to
pull the necessary dataset and materials directly from GitHub.

**If you are running this notebook locally**, you may not need to perform this
step at all.

In [None]:
# This clones the supporting git repository into a directory named 'temp_repo'.
!git clone https://github.com/redis-developer/financial-vss.git temp_repo

# This command moves the 'resources' directory from 'temp_repo' to your current directory.
!mv temp_repo/resources .
!mv temp_repo/requirements.txt .

# This deletes the 'temp_repo' directory, cleaning up the unwanted files.
!rm -rf temp_repo

### Install Python Dependencies

In [1]:
!pip install -r requirements.txt



In [2]:
import warnings

warnings.filterwarnings("ignore")

### Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from PDF document chunks. **We need to make sure we have a Redis
instance available.**

#### Method 1: Local Redis Stack
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly
from the Redis package archive.

In [None]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


#### Method 2: Redis Cloud
Instead of using the local, in-notebook Redis Stack, you can quickly deploy a
[FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your
own version of Redis Enterprise running, that works too!

### Define the Redis Connection URL

By default, this notebook connects to the local instance of Redis Stack. **If you have your own Redis Enterprise instance**, replace the `REDIS_PASSWORD`, `REDIS_HOST`, and `REDIS_PORT` values with your own.

In [55]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#REDIS_HOST="redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
#REDIS_PORT=18374
#REDIS_PASSWORD="1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
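
The comment above mentions `rediss://` for SSL-enabled endpoints. A minimal sketch of building the URL either way is shown here. The helper `build_redis_url` and its `use_ssl` flag are hypothetical names introduced for illustration; they are not part of the notebook or of RedisVL.

```python
from urllib.parse import quote


def build_redis_url(host: str, port, password: str = "", use_ssl: bool = False) -> str:
    """Assemble a Redis connection URL (hypothetical helper, not a RedisVL API)."""
    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so characters such as '@' or ':' cannot break the URL
    return f"{scheme}://:{quote(password, safe='')}@{host}:{port}"


local_url = build_redis_url("localhost", 6379)
cloud_url = build_redis_url("example.com", 18374, "p@ss", use_ssl=True)
print(local_url)  # redis://:@localhost:6379
print(cloud_url)  # rediss://:p%40ss@example.com:18374
```

Encoding the password is a defensive choice for credentials that contain URL-reserved characters; with an empty password the result matches the `REDIS_URL` built above.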

## Simplified Vector Search with RedisVL

### Dataset Preparation (PDF Documents)

To best demonstrate Redis as a vector database layer, we will load a single
financial document (a 10-K filing) and preprocess it using some helpers from LangChain:

- `UnstructuredFileLoader` extracts the raw text from the PDF; it is one of several document loaders that LangChain provides. Docs: https://python.langchain.com/docs/integrations/document_loaders/unstructured_file
- `RecursiveCharacterTextSplitter` is what we use to create smaller chunks of text from the doc. Docs: https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredFileLoader

# Load list of pdfs from a folder
data_path = "resources/"
docs = [os.path.join(data_path, file) for file in os.listdir(data_path)]

print("Listing available documents ...", docs)

Listing available documents ... ['resources/nke-10k-2023.pdf', 'resources/amzn-10k-2023.pdf', 'resources/jnj-10k-2023.pdf', 'resources/aapl-10k-2023.pdf', 'resources/nvd-10k-2023.pdf', 'resources/msft-10k-2023.pdf']


In [5]:
# pick out the Nike doc for this exercise
doc = [doc for doc in docs if "nke" in doc][0]

# set up the file loader/extractor and text splitter to create chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2500, chunk_overlap=0
)
loader = UnstructuredFileLoader(
    doc, mode="single", strategy="fast"
)

# extract, load, and make chunks
chunks = loader.load_and_split(text_splitter)

print("Done preprocessing. Created", len(chunks), "chunks of the original pdf", doc)

Done preprocessing. Created 179 chunks of the original pdf resources/nke-10k-2023.pdf
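
To build intuition for the `chunk_size` and `chunk_overlap` parameters used above, here is a deliberately simplified fixed-window splitter. It is a sketch, not LangChain's algorithm: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this hypothetical `split_text` helper ignores.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list:
    """Cut text into windows of at most chunk_size characters (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


no_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=0)
with_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'defg', 'ghij']
```

With `chunk_overlap=0`, as in this notebook, no text is shared between neighboring chunks; a positive overlap repeats the tail of one chunk at the head of the next.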


### Text embedding generation with RedisVL
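RedisVL has built-in extensions and utilities to aid the GenAI development process. Here we use its `HFTextVectorizer` to create one embedding per document chunk with a Hugging Face sentence-transformers model.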

In [6]:
from redisvl.utils.vectorize import HFTextVectorizer

hf = HFTextVectorizer("sentence-transformers/all-MiniLM-L6-v2")
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Embed each chunk content
embeddings = hf.embed_many([chunk.page_content for chunk in chunks])

# Check to make sure we've created enough embeddings, 1 per document chunk
len(embeddings) == len(chunks)

True

### Define a schema and create an index

Below we connect to Redis and create an index that contains a text field, tag field, and vector field.

In [56]:
from redis import Redis
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex


index_name = "redisvl"


schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
    "prefix": "chunk"
  },
  "fields": [
    {
        "name": "doc_id",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "content",
        "type": "text"
    },
    {
        "name": "text_embedding",
        "type": "vector",
        "attrs": {
            "dims": hf.dims,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
})

In [57]:
# connect to redis
client = Redis.from_url(REDIS_URL)

# create an index from schema and the client
index = SearchIndex(schema, client)
index.create(overwrite=True, drop=True)

In [58]:
# use the RedisVL CLI tool to list all indices
!rvl index listall

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


10:23:36 [RedisVL] INFO   Indices:
10:23:36 [RedisVL] INFO   1. redisvl


In [10]:
# get info about the index
!rvl index info -i redisvl

Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ redisvl      │ HASH           │ ['chunk']  │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type   │ Field Option   │ Option Value   │
├────────────────┼────────────────┼────────┼────────────────┼────────────────┤
│ doc_id         │ doc_id         │ TAG    │ SEPARATOR      │ ,              │
│ content        │ content        │ TEXT   │ WEIGHT         │ 1              │
│ text_embedding │ text_embedding │ VECTOR │                │                │
╰────────────────┴────────────────┴────────┴────────────────┴────────────────╯


### Process and load dataset
Below we use the RedisVL index to load the list of document chunks into Redis.

In [59]:
# load expects an iterable of dictionaries
from redisvl.redis.utils import array_to_buffer

data = [
    {
        'doc_id': f'{i}',
        'content': chunk.page_content,
        # For HASH -- must convert embeddings to bytes
        'text_embedding': array_to_buffer(embeddings[i])
    } for i, chunk in enumerate(chunks)
]

# RedisVL handles batching automatically
keys = index.load(data, id_field="doc_id")
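
The "convert embeddings to bytes" step exists because a Redis hash stores a vector field as a raw binary string. The sketch below shows what such a conversion looks like using only the standard library. It assumes a little-endian float32 layout (matching the `float32` datatype in the schema on typical hardware); the helper names are hypothetical and this is not the implementation of `array_to_buffer`.

```python
import struct


def to_float32_bytes(vector) -> bytes:
    """Pack a list of floats as consecutive little-endian 32-bit floats."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(buffer: bytes) -> list:
    """Inverse of to_float32_bytes: 4 bytes per value."""
    count = len(buffer) // 4
    return list(struct.unpack(f"<{count}f", buffer))


vec = [0.5, -1.0, 2.25]  # values chosen to be exactly representable in float32
buf = to_float32_bytes(vec)
print(len(buf))                 # 12
print(from_float32_bytes(buf))  # [0.5, -1.0, 2.25]
```

The buffer length is always `4 * dims`, which is why the `dims` and `datatype` attributes in the schema have to agree with the embedding model.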

### Query the database
Now we can use the RedisVL index to perform similarity search operations with Redis.

In [12]:
from redisvl.query import VectorQuery

query = "Nike profit margins and company performance"

query_embedding = hf.embed(query)

vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=3,
    return_fields=["doc_id", "content"],
    return_score=True
)

# show the raw redis query
str(vector_query)

'*=>[KNN 3 @text_embedding $vector AS vector_distance] RETURN 3 doc_id content vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 3'

In [13]:
# execute the query with RedisVL
index.query(vector_query)

[{'id': 'chunk:84',
  'vector_distance': '0.321347296238',
  'doc_id': '84',
  'content': 'TOTAL NIKE BRAND Converse\n\n$\n\n1,932 (4,841)\n\n8,359 676\n\n$\n\n1,896 (4,262)\n\n8,406 669\n\n2 % -14 %\n\n1 % $ 1 %\n\n1,530 (3,656)\n\n8,641 543\n\nCorporate TOTAL NIKE, INC. EARNINGS BEFORE INTEREST AND TAXES\n\n(1)\n\n$\n\n(2,840)\n\n6,195\n\n$\n\n(2,219)\n\n6,856\n\n28 %\n\n10 % $\n\n(2,261)\n\n6,923\n\nEBIT margin\n\n(1)\n\n12.1 %\n\n14.7 %\n\n15.5 %\n\nInterest expense (income), net\n\n(6)\n\n205\n\n—\n\n262\n\nTOTAL NIKE, INC. INCOME BEFORE INCOME TAXES\n\n$\n\n6,201\n\n$\n\n6,651\n\n7 % $\n\n6,661\n\n(1) Total NIKE Brand EBIT, Total NIKE, Inc. EBIT and EBIT Margin represent non-GAAP financial measures. See "Use of Non-GAAP Financial Measures" for further information.\n\n2023 FORM 10-K 36\n\n% CHANGE EXCLUDING CURRENCY (1) CHANGES\n\n7 % 12 % -13 %\n\n16 % 302 %\n\n6 % 7 %\n\n— 6 %\n\n% CHANGE\n\n0 % 35 % -27 %\n\n24 % -17 %\n\n3 % 23 % 2 %\n\n1 %\n\n—\n\n0 %\n\nTable of Contents\n\n

In [14]:
# paginate through results
for result in index.paginate(vector_query, page_size=1):
    print(result[0]["doc_id"], result[0]["vector_distance"], flush=True)

84 0.321347296238
83 0.328034043312
117 0.358749747276
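
The `vector_distance` values above are cosine distances, since the schema sets `distance_metric` to `cosine`; smaller means more similar. The sketch below shows the underlying arithmetic with an exhaustive top-k search in plain Python. It is an illustration only: the helper names and toy vectors are made up here, and the notebook's index uses HNSW, which approximates this search rather than scanning every vector.

```python
import math


def cosine_distance(a, b) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(query, vectors: dict, k: int) -> list:
    """Return the k (doc_id, distance) pairs closest to the query."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = knn([1.0, 0.0], toy_vectors, k=2)
print([doc_id for doc_id, _ in top])  # ['a', 'c']
```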


### Sort by alternative fields

In [15]:
# Sort by doc_id field after vector search limits to topK
vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["doc_id"],
    return_score=True
)

# Decompose vector_query into the core query and the params
query = vector_query.query
params = vector_query.params

# Pass query and params direct to index.search()
result = index.search(
    query.sort_by("doc_id", asc=True),
    params
)

[doc.__dict__ for doc in result.docs]

[{'id': 'chunk:117',
  'payload': None,
  'vector_distance': '0.358749747276',
  'doc_id': '117'},
 {'id': 'chunk:157',
  'payload': None,
  'vector_distance': '0.360825419426',
  'doc_id': '157'},
 {'id': 'chunk:83',
  'payload': None,
  'vector_distance': '0.328034043312',
  'doc_id': '83'},
 {'id': 'chunk:84',
  'payload': None,
  'vector_distance': '0.321347296238',
  'doc_id': '84'}]
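
Note the order of the results above: 117, 157, 83, 84. That is consistent with `doc_id` being compared as a string rather than as a number, which is what one would expect for a field declared as `tag`. The same effect can be reproduced in plain Python:

```python
doc_ids = ["84", "83", "117", "157"]

as_strings = sorted(doc_ids)           # character-by-character comparison
as_numbers = sorted(doc_ids, key=int)  # numeric comparison

print(as_strings)  # ['117', '157', '83', '84']
print(as_numbers)  # ['83', '84', '117', '157']
```

If numeric ordering matters, one option (an assumption, not something this notebook does) is to store the identifier in a numeric field, or to zero-pad the string values so that both orderings agree.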

### Add filters to vector queries

In [16]:
from redisvl.query.filter import Text

vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["content"],
    return_score=True
)

# Set a text filter
text_filter = Text("content") % "profit"

vector_query.set_filter(text_filter)

index.query(vector_query)

[{'id': 'chunk:80',
  'vector_distance': '0.36378967762',
  'content': 'NIKE Brand revenues, which represented over 90% of NIKE, Inc. Revenues, increased 10% and 16% on a reported and currency-neutral basis, respectively. This increase was primarily due to higher revenues in Men\'s, the Jordan Brand, Women\'s and Kids\' which grew 17%, 35%,11% and 10%, respectively, on a wholesale equivalent basis.\n\nNIKE Brand footwear revenues increased 20% on a currency-neutral basis, due to higher revenues in Men\'s, the Jordan Brand, Women\'s and Kids\'. Unit sales of footwear increased 13%, while higher average selling price ("ASP") per pair contributed approximately 7 percentage points of footwear revenue growth. Higher ASP was primarily due to higher full-price ASP, net of discounts, on a wholesale equivalent basis, and growth in the size of our NIKE Direct business, partially offset by lower NIKE Direct ASP.\n\nNIKE Brand apparel revenues increased 8% on a currency-neutral basis, primarily du

### Range queries in RedisVL

In [17]:
from redisvl.query import RangeQuery

range_query = RangeQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["content"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)
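
A range query differs from a KNN query in what bounds the result set: instead of "the k nearest", it asks for "everything within a distance threshold", with `num_results` acting only as a cap. The plain-Python sketch below illustrates that semantics on toy data; the helper names and vectors are hypothetical, and the boundary handling (`<=`) is an assumption of this sketch.

```python
import math


def cosine_distance(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def range_search(query, vectors: dict, distance_threshold: float, num_results: int) -> list:
    """Keep only items within the threshold, closest first, capped at num_results."""
    hits = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in vectors.items()]
    hits = [hit for hit in hits if hit[1] <= distance_threshold]
    hits.sort(key=lambda hit: hit[1])
    return hits[:num_results]


toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
loose = range_search([1.0, 0.0], toy_vectors, distance_threshold=0.5, num_results=4)
strict = range_search([1.0, 0.0], toy_vectors, distance_threshold=0.1, num_results=4)
print([doc_id for doc_id, _ in loose])   # ['a', 'c']
print([doc_id for doc_id, _ in strict])  # ['a']
```

A tighter threshold can return fewer than `num_results` items, or none at all, which a KNN query never does as long as the index holds enough documents.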

In [18]:
index.query(range_query)

[{'id': 'chunk:84',
  'vector_distance': '0.321347296238',
  'content': 'TOTAL NIKE BRAND Converse\n\n$\n\n1,932 (4,841)\n\n8,359 676\n\n$\n\n1,896 (4,262)\n\n8,406 669\n\n2 % -14 %\n\n1 % $ 1 %\n\n1,530 (3,656)\n\n8,641 543\n\nCorporate TOTAL NIKE, INC. EARNINGS BEFORE INTEREST AND TAXES\n\n(1)\n\n$\n\n(2,840)\n\n6,195\n\n$\n\n(2,219)\n\n6,856\n\n28 %\n\n10 % $\n\n(2,261)\n\n6,923\n\nEBIT margin\n\n(1)\n\n12.1 %\n\n14.7 %\n\n15.5 %\n\nInterest expense (income), net\n\n(6)\n\n205\n\n—\n\n262\n\nTOTAL NIKE, INC. INCOME BEFORE INCOME TAXES\n\n$\n\n6,201\n\n$\n\n6,651\n\n7 % $\n\n6,661\n\n(1) Total NIKE Brand EBIT, Total NIKE, Inc. EBIT and EBIT Margin represent non-GAAP financial measures. See "Use of Non-GAAP Financial Measures" for further information.\n\n2023 FORM 10-K 36\n\n% CHANGE EXCLUDING CURRENCY (1) CHANGES\n\n7 % 12 % -13 %\n\n16 % 302 %\n\n6 % 7 %\n\n— 6 %\n\n% CHANGE\n\n0 % 35 % -27 %\n\n24 % -17 %\n\n3 % 23 % 2 %\n\n1 %\n\n—\n\n0 %\n\nTable of Contents\n\nNORTH AMERICA\n\n(

In [19]:
# Add filter to range query
range_query.set_filter(text_filter)

index.query(range_query)

[{'id': 'chunk:80',
  'vector_distance': '0.36378967762',
  'content': 'NIKE Brand revenues, which represented over 90% of NIKE, Inc. Revenues, increased 10% and 16% on a reported and currency-neutral basis, respectively. This increase was primarily due to higher revenues in Men\'s, the Jordan Brand, Women\'s and Kids\' which grew 17%, 35%,11% and 10%, respectively, on a wholesale equivalent basis.\n\nNIKE Brand footwear revenues increased 20% on a currency-neutral basis, due to higher revenues in Men\'s, the Jordan Brand, Women\'s and Kids\'. Unit sales of footwear increased 13%, while higher average selling price ("ASP") per pair contributed approximately 7 percentage points of footwear revenue growth. Higher ASP was primarily due to higher full-price ASP, net of discounts, on a wholesale equivalent basis, and growth in the size of our NIKE Direct business, partially offset by lower NIKE Direct ASP.\n\nNIKE Brand apparel revenues increased 8% on a currency-neutral basis, primarily du

## Building a RAG Pipeline from Scratch
We're going to build a complete RAG pipeline from scratch incorporating the following components:

- Standard retrieval and chat completion
- Dense content representation to improve accuracy
- Query re-writing to improve accuracy
- Semantic caching to improve performance
- Conversational session history to improve personalization

### Setup RedisVL AsyncSearchIndex

In [60]:
from redis.asyncio import Redis
from redisvl.index import AsyncSearchIndex

client = Redis.from_url(REDIS_URL)
index = AsyncSearchIndex(index.schema, client)

### Setup OpenAI API

In [61]:
import openai
import os
import getpass


CHAT_MODEL = "gpt-3.5-turbo-0125"


if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY")


### Baseline Retrieval Augmented Generation

In [62]:

async def answer_question(index: AsyncSearchIndex, query: str):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users' questions about company
    performance, ethics, characteristics, and core information.
    """

    query_vector = hf.embed(query)
    # Fetch context from Redis using vector search
    context = await retrieve_context(index, query_vector)
    # Generate contextualized prompt and feed to OpenAI
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    return response.choices[0].message.content


async def retrieve_context(index: AsyncSearchIndex, query_vector) -> str:
    """Fetch the relevant context from Redis using vector search"""
    results = await index.query(
        VectorQuery(
            vector=query_vector,
            vector_field_name="text_embedding",
            return_fields=["content"],
            num_results=3
        )
    )
    content = "\n".join([result["content"] for result in results])
    return content


def promptify(query: str, context: str) -> str:
    return f'''Use the provided context below derived from public financial
    documents to answer the user's question. If you can't answer the user's
    question based on the context, do not guess. If there is no context at all,
    respond with "I don't know".

    User question:

    {query}

    Helpful context:

    {context}

    Answer:
    '''

### Let's test it out...

In [23]:
# Generate a list of questions
questions = [
    "What is the trend in the company's revenue and profit over the past few years?",
    "What are the company's primary revenue sources?",
    "How much debt does the company have, and what are its capital expenditure plans?",
    "What does the company say about its environmental, social, and governance (ESG) practices?",
    "What is the company's strategy for growth?"
]

In [24]:
import asyncio

results = await asyncio.gather(*[
    answer_question(index, question) for question in questions
])

In [25]:
import pandas as pd

pd.DataFrame(columns=["question", "answer"], data=list(zip(questions, results)))

|   | question | answer |
|---|----------|--------|
| 0 | What is the trend in the company's revenue and... | The trend in the company's revenue and profit ... |
| 1 | What are the company's primary revenue sources? | The company's primary revenue sources are as f... |
| 2 | How much debt does the company have, and what ... | The company has a total long-term debt of $8,9... |
| 3 | What does the company say about its environmen... | The company acknowledges the importance of env... |
| 4 | What is the company's strategy for growth? | Based on the provided context, the company's s... |
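
The questions above were answered concurrently with `asyncio.gather`, and zipping `questions` with `results` works because `gather` returns results in the order the awaitables were passed in, not the order they finish. A small sketch with a stand-in coroutine shows this; `fake_answer` and its delays are hypothetical and make no network calls.

```python
import asyncio


async def fake_answer(question: str, delay: float) -> str:
    """Stand-in for answer_question: waits, then returns a canned string."""
    await asyncio.sleep(delay)
    return f"answer to: {question}"


async def main() -> list:
    questions = ["first", "second", "third"]
    delays = [0.03, 0.01, 0.02]  # "second" finishes first, "first" finishes last
    return await asyncio.gather(
        *[fake_answer(q, d) for q, d in zip(questions, delays)]
    )


# In a notebook cell, use `results = await main()` instead of asyncio.run
results = asyncio.run(main())
print(results)  # ['answer to: first', 'answer to: second', 'answer to: third']
```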


### Improve accuracy with dense content representations

One technique we can use to improve the quality of retrieval is to leverage an LLM from OpenAI during ETL. We will prompt the LLM to summarize and decompose the raw pdf text into more discrete propositional phrases. This will enhance the clarity of the text and improve semantic retrieval for RAG.

The goal is to utilize a preprocessing technique similar to what's outlined here:
https://github.com/langchain-ai/langchain/blob/master/templates/propositional-retrieval/propositional_retrieval/proposal_chain.py

In [26]:
import tqdm
import json


def create_dense_props(chunk):
    """Create dense representation of raw text content."""

    # The system message here should be HEAVILY customized for your specific use case
    SYSTEM_PROMPT = """
    You are a helpful PDF extractor tool. You will be presented with segments from
    raw PDF documents composed of 10k SEC filings information about public companies.

    Decompose and summarize the raw content into clear and simple propositions,
    ensuring they are interpretable out of context. Consider the following rules:
    1. Split compound sentences into simpler dense phrases that retain existing
    meaning.
    2. Simplify technical jargon or wording if possible while retaining existing
    meaning.
    3. For any named entity that is accompanied by additional descriptive information,
    separate this information into its own distinct proposition.
    4. Decontextualize the proposition by adding necessary modifiers to nouns or
    entire sentences and replacing pronouns (e.g., "it", "he", "she", "they", "this", "that")
    with the full name of the entities they refer to.
    5. Present the results as a list of strings, formatted in JSON, under the key "propositions".
    """

    response = openai.OpenAI().chat.completions.create(
        model=CHAT_MODEL,
        response_format={ "type": "json_object" },
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Decompose this raw content using the rules above:\n{chunk.page_content} "}
        ]
    )
    res = response.choices[0].message.content

    try:
        return json.loads(res)["propositions"]
    except Exception as e:
        print("Failed to parse propositions:", e, flush=True)
        # Retry (note: this recursion is unbounded if the LLM keeps returning invalid JSON)
        return create_dense_props(chunk)
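
The function above retries by calling itself whenever parsing fails. One way to cap the number of attempts is sketched below. This is an alternative under stated assumptions, not the notebook's method: `parse_propositions`, `call_with_retries`, and the canned responses are hypothetical, and the `generate` callable stands in for the OpenAI request.

```python
import json


def parse_propositions(raw: str) -> list:
    """Extract the 'propositions' list from a JSON string."""
    return json.loads(raw)["propositions"]


def call_with_retries(generate, max_attempts: int = 3) -> list:
    """Call generate() until its output parses, giving up after max_attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            return parse_propositions(raw)
        except (json.JSONDecodeError, KeyError, TypeError) as err:
            last_error = err
            print(f"Attempt {attempt} failed to parse propositions: {err!r}", flush=True)
    raise RuntimeError(f"No valid propositions after {max_attempts} attempts") from last_error


# Stand-in for the LLM: two bad responses, then a good one
canned = iter([
    "not json",
    '{"wrong_key": []}',
    '{"propositions": ["NIKE sells athletic footwear."]}',
])
parsed = call_with_retries(lambda: next(canned))
print(parsed)  # ['NIKE sells athletic footwear.']
```

Raising after a fixed number of attempts surfaces a persistent failure instead of looping, which matters when each attempt is a paid API call.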

#### Create text propositions using OpenAI

In [27]:
propositions = [create_dense_props(chunk) for chunk in tqdm.tqdm(chunks)]

100%|██████████| 179/179 [16:55<00:00,  5.67s/it]


In [28]:
propositions = [" ".join(prop) for prop in propositions]

In [29]:
propositions[5:10]

["NIKE Brand operations are reported based on internal geographic organization. Each NIKE Brand geographic segment focuses on designing, developing, marketing, and selling athletic footwear, apparel, and equipment. The reportable operating segments for the NIKE Brand include North America, Europe, Middle East & Africa (EMEA), Greater China, and Asia Pacific & Latin America (APLA), and encompass the NIKE and Jordan brands. Sales through NIKE Direct operations are managed within each geographic operating segment. Converse is a separate reportable operating segment specializing in designing, marketing, licensing, and selling casual sneakers, apparel, and accessories. Converse's direct to consumer operations, including digital commerce, are included in the Converse operating segment results. In the United States, NIKE Brand and Converse sales contributed about 43% of total revenues for fiscal 2023, 40% for fiscal 2022, and 39% for fiscal 2021. Products are sold to various retail accounts i

In [30]:
# Save to disk for faster reload..
with open("propositions.json", "w") as f:
    json.dump(propositions, f)

In [63]:
# Load from disk... if possible.
with open("propositions.json", "r") as f:
    propositions = json.load(f)
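
Since generating the propositions took roughly 17 minutes above, the save and load cells can be folded into a single "load if present, otherwise compute and save" step. A minimal sketch, assuming a hypothetical `load_or_compute` helper and a stand-in for the expensive LLM pass:

```python
import json
import os
import tempfile


def load_or_compute(path: str, compute):
    """Return cached JSON from path if it exists; otherwise compute, save, and return."""
    if os.path.exists(path):
        with open(path, "r") as f:
            return json.load(f)
    result = compute()
    with open(path, "w") as f:
        json.dump(result, f)
    return result


calls = []


def expensive():
    """Stand-in for the slow proposition-generation loop."""
    calls.append(1)
    return ["prop one", "prop two"]


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "propositions.json")
    first = load_or_compute(cache_path, expensive)   # computes and writes the file
    second = load_or_compute(cache_path, expensive)  # reads the file, no recompute

print(first == second, len(calls))  # True 1
```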

#### Create embeddings from propositions data

In [64]:
prop_embeddings = hf.embed_many([
    proposition for proposition in propositions
])

# Check to make sure we've created enough embeddings, 1 per document chunk
len(prop_embeddings) == len(propositions) == len(chunks)

True

#### Upsert Redis

In [65]:
keys = await index.load([
    {
        "doc_id": f"{i}",
        "propositions": proposition,
        # For HASH -- must convert embeddings to bytes
        "text_embedding": array_to_buffer(prop_embeddings[i])
    } for i, proposition in enumerate(propositions)],
    id_field="doc_id"
)
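
Because the same `doc_id` values are reused, these records land on the same keys (for example `chunk:0`) as the original chunks. Assuming the load writes hash fields the way Redis `HSET` does, fields are merged per key: `content` is kept, `propositions` is added, and `text_embedding` is overwritten by the proposition embedding. The dictionary-based sketch below models that assumed behavior; `hset` here is a toy function, not a Redis client call.

```python
store = {}


def hset(key: str, mapping: dict) -> None:
    """Toy model of HSET: set the given fields, leave other fields untouched."""
    store.setdefault(key, {}).update(mapping)


# First load: raw chunk content with its embedding
hset("chunk:0", {"doc_id": "0", "content": "raw text", "text_embedding": b"old"})
# Second load: propositions with a new embedding, same key
hset("chunk:0", {"doc_id": "0", "propositions": "dense text", "text_embedding": b"new"})

print(sorted(store["chunk:0"]))
# ['content', 'doc_id', 'propositions', 'text_embedding']
```

One consequence worth keeping in mind: after this step, vector search ranks documents by the proposition embeddings, even when the `content` field is what gets returned.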

#### Adjust the schema and recreate

In [66]:
index.schema.add_field({"name": "propositions", "type": "text"})

# create new index on same underlying data
await index.create(overwrite=True, drop=False)

10:24:54 redisvl.index.index INFO   Index already exists, overwriting.


#### Test the updated RAG workflow

In [39]:
# Update the retrieval helper to use propositions
async def retrieve_context(index: AsyncSearchIndex, query_vector) -> str:
    """Fetch the relevant context from Redis using vector search"""
    print("Using dense content representation", flush=True)
    results = await index.query(
        VectorQuery(
            vector=query_vector,
            vector_field_name="text_embedding",
            return_fields=["propositions"],
            num_results=3
        )
    )
    content = "\n".join([result["propositions"] for result in results])
    return content

# Update the answer_question method
async def answer_question(index: AsyncSearchIndex, query: str):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users' questions about company
    performance, ethics, characteristics, and core information.
    """

    query_vector = hf.embed(query)
    # Fetch context from Redis using vector search
    context = await retrieve_context(index, query_vector)
    # Generate contextualized prompt and feed to OpenAI
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    return response.choices[0].message.content

In [40]:
results = await asyncio.gather(*[
    answer_question(index, question) for question in questions
])

pd.DataFrame(columns=["question", "answer"], data=list(zip(questions, results)))

Using dense content representation
Using dense content representation
Using dense content representation
Using dense content representation
Using dense content representation


|   | question | answer |
|---|----------|--------|
| 0 | What is the trend in the company's revenue and... | The trend in the company's revenue and profit ... |
| 1 | What are the company's primary revenue sources? | The company's primary revenue sources are as f... |
| 2 | How much debt does the company have, and what ... | As of May 31, 2023, the company had approximat... |
| 3 | What does the company say about its environmen... | Based on the provided context, the company ack... |
| 4 | What is the company's strategy for growth? | Based on the provided context, the company's s... |


### Improve accuracy with query rewriting / expansion

We can also use the power of an LLM to rewrite or expand an input question.

Example: https://github.com/langchain-ai/langchain/blob/master/templates/rewrite-retrieve-read/rewrite_retrieve_read/chain.py

In [41]:
# An example question that is a bit simplistic...
await answer_question(index, "How big is the company?")

Using dense content representation


'Based on the information provided, we can see that NIKE, Inc. has multiple subsidiaries operating in various jurisdictions such as the United States, Netherlands, China, Mexico, Japan, and Korea. The company also has significant investments in technology, business infrastructure, new businesses, product offerings, and manufacturing. Additionally, the company had bank guarantees and letters of credit totaling $588 million in 2023, primarily for real estate agreements, self-insurance programs, and legal matters.\n\nWhile the specific revenue or market capitalization figures are not provided in the context, the presence of multiple subsidiaries, significant investments, and financial commitments indicate that NIKE, Inc. is a large and globally diversified company.'

In [42]:
async def rewrite_query(query: str, prompt: str = None):
    """Rewrite the user's original query"""

    SYSTEM_PROMPT = prompt if prompt else """Given the user's input question below, find a better or
    more complete way to phrase this question in order to improve semantic search
    engine retrieval quality over a set of SEC 10K PDF docs. Return the rephrased
    question as a string in a JSON response under the key "query"."""

    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        response_format={ "type": "json_object" },
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Original input question from user: {query}"}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    rewritten_query = json.loads(response.choices[0].message.content)["query"]
    return rewritten_query

In [43]:
# Example of a simple query being rewritten
await rewrite_query("How big is the company?")

'What is the size of the company in terms of revenue, assets, and market capitalization?'
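
`rewrite_query` assumes the model always returns valid JSON with a `"query"` key. A defensive variation is to fall back to the user's original question when that assumption fails, so a bad rewrite never blocks retrieval. The helper below is a hypothetical sketch of that parsing step only; it makes no API calls.

```python
import json


def extract_rewritten_query(raw, original: str) -> str:
    """Return the 'query' value from a JSON string, or the original question on any problem."""
    try:
        candidate = json.loads(raw).get("query")
    except (json.JSONDecodeError, AttributeError, TypeError):
        # Invalid JSON, a non-object payload, or a non-string input
        return original
    if isinstance(candidate, str) and candidate.strip():
        return candidate.strip()
    return original


original = "How big is the company?"
good = extract_rewritten_query('{"query": "What is the size of the company?"}', original)
bad = extract_rewritten_query("not json", original)
print(good)  # What is the size of the company?
print(bad)   # How big is the company?
```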

In [44]:
async def answer_question(index: AsyncSearchIndex, query: str, **kwargs):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users' questions about company
    performance, ethics, characteristics, and core information.
    """

    # Rewrite the query using an LLM
    rewritten_query = await rewrite_query(query, **kwargs)
    print("User query updated to:\n", rewritten_query, flush=True)

    query_vector = hf.embed(rewritten_query)
    # Fetch context from Redis using vector search
    context = await retrieve_context(index, query_vector)
    print("Context retrieved", flush=True)

    # Generate contextualized prompt and feed to OpenAI
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(rewritten_query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    return response.choices[0].message.content

In [45]:
# Now try again with query re-writing enabled
await answer_question(index, "How big is the company?")

User query updated to:
 What is the size of the company in terms of revenue, assets, and market capitalization?
Using dense content representation
Context retrieved


"Based on the provided context from the financial documents, we can determine the size of the company in terms of revenue, assets, and market capitalization as follows:\n\n1. Revenue:\n   - The company's total revenues for fiscal year 2023 were $51,217 million, representing a 10% increase from fiscal year 2022.\n   - The breakdown of revenues includes NIKE Brand revenues from Footwear, Apparel, Global Brand Divisions, and Converse, among others.\n\n2. Assets:\n   - The company's assets at fair value as of May 31, 2023, included various categories such as cash and equivalents, short-term investments, U.S. Treasury securities, commercial paper, bonds, money market funds, and more.\n   - Specific asset values were provided for cash, U.S. Treasury securities, commercial paper and bonds, money market funds, and available-for-sale debt securities.\n\n3. Market Capitalization:\n   - Market capitalization is not explicitly provided in the context. Market capitalization is calculated by multipl

### Improve performance and cut costs with LLM caching

In [46]:
from redisvl.extensions.llmcache import SemanticCache

llmcache = SemanticCache(
    name="llmcache",
    vectorizer=hf,
    redis_url=REDIS_URL,
    ttl=120,
    distance_threshold=0.2
)

In [47]:
from functools import wraps

# Create an LLM caching decorator
def cache(func):
    @wraps(func)
    async def wrapper(index, query_text, *args, **kwargs):
        # Embed the query with the same vectorizer the cache was configured with
        query_vector = hf.embed(query_text)

        # Check the cache with the vector
        if result := llmcache.check(vector=query_vector):
            return result[0]['response']

        response = await func(index, query_text, query_vector=query_vector)
        llmcache.store(query_text, response, query_vector)
        return response
    return wrapper


@cache
async def answer_question(index: AsyncSearchIndex, query: str, **kwargs):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users questions about company
    performance, ethics, characteristics, and core information.
    """

    context = await retrieve_context(index, kwargs["query_vector"])
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by the LLM configured in CHAT_MODEL
    return response.choices[0].message.content

In [48]:
query = "What was Nike's revenue last year compared to this year??"

await answer_question(index, query)

Using dense content representation


"Nike's revenue last year was $1,932 million, and this year it was $1,896 million. This indicates a decrease in revenue of 4.841% compared to the previous period."

In [49]:
query = "What was Nike's total revenue in the last year compared to now??"

await answer_question(index, query)

# notice no HTTP request to OpenAI since this question is "close enough" to the last one

"Nike's revenue last year was $1,932 million, and this year it was $1,896 million. This indicates a decrease in revenue of 4.841% compared to the previous period."

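The cache hit above comes down to a distance comparison between the embeddings of the two prompts. The sketch below illustrates that decision in plain Python. It assumes cosine distance and reuses the `distance_threshold=0.2` configured earlier. The three-dimensional vectors are made up for illustration, and `is_cache_hit` is a hypothetical helper, not part of redisvl.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity, so 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_cache_hit(cached_vec, query_vec, distance_threshold: float = 0.2) -> bool:
    """Hypothetical helper: treat the query as a hit when it is close enough."""
    return cosine_distance(cached_vec, query_vec) <= distance_threshold


# Toy embeddings (illustrative values only, not real model output)
cached_prompt_vec = [0.9, 0.1, 0.3]
paraphrase_vec = [0.85, 0.15, 0.35]   # similar wording -> nearby vector
unrelated_vec = [0.05, 0.95, 0.1]     # different topic -> distant vector

print(is_cache_hit(cached_prompt_vec, paraphrase_vec))
print(is_cache_hit(cached_prompt_vec, unrelated_vec))
```

A lower threshold makes the cache stricter (fewer hits, fewer wrong answers reused), while a higher one trades accuracy for fewer LLM calls.
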
### Improve personalization by including chat session history

To preserve state across the conversation, it's imperative to offload the conversation history to a database that can sustain high read/write throughput, which keeps system latency low.

We can store message history for a particular user session in a Redis List data type.


In [50]:
import json


class ChatBot:
    def __init__(self, index: AsyncSearchIndex, user: str):
        self.index = index
        self.user = user

    async def get_messages(self) -> list:
        """Get all messages associated with a session"""
        return [
            json.loads(msg) for msg in await self.index.client.lrange(f"messages:{self.user}", 0, -1)
        ]

    async def add_messages(self, messages: list):
        """Add chat messages to a Redis List"""
        return await self.index.client.rpush(
            f"messages:{self.user}", *[json.dumps(msg) for msg in messages]
        )

    async def clear_history(self):
        """Clear session chat"""
        # Use the instance's index rather than the notebook-level global
        await self.index.client.delete(f"messages:{self.user}")

    @staticmethod
    def promptify(query: str, context: str) -> str:
        return f'''Use the provided context below derived from public financial
        documents to answer the user's question. If you can't answer the user's
        question, based on the context; do not guess. If there is no context at all,
        respond with "I don't know".

        User question:

        {query}

        Helpful context:

        {context}

        Answer:
        '''

    async def retrieve_context(self, query_vector) -> str:
        """Fetch the relevant context from Redis using vector search"""
        results = await self.index.query(
            VectorQuery(
                vector=query_vector,
                vector_field_name="text_embedding",
                return_fields=["propositions"],
                num_results=3
            )
        )
        content = "\n".join([result["propositions"] for result in results])
        return content

    async def answer_question(self, query: str):
        """Answer the user's question with historical context and caching baked-in"""

        SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
        to public financial 10k documents in order to answer users questions about company
        performance, ethics, characteristics, and core information.
        """

        # Create query vector
        query_vector = hf.embed(query)

        # TODO - implement semantic guardrails?

        # Check the cache with the vector
        if result := llmcache.check(vector=query_vector):
            answer = result[0]['response']
        else:
            # TODO - implement query rewriting?
            context = await self.retrieve_context(query_vector)
            session = await self.get_messages()
            # TODO - implement session summarization?
            messages = (
                [{"role": "system", "content": SYSTEM_PROMPT}] +
                session +
                [{"role": "user", "content": self.promptify(query, context)}]
            )
            # Response provided by the LLM configured in CHAT_MODEL
            response = await openai.AsyncClient().chat.completions.create(
                model=CHAT_MODEL,
                messages=messages,
                temperature=0.1,
                seed=42
            )
            answer = response.choices[0].message.content
            llmcache.store(query, answer, query_vector)

        # Add message history
        await self.add_messages([
            {"role": "user", "content": query},
            {"role": "assistant", "content": answer}
        ])

        return answer

## Test the entire RAG workflow

In [51]:
# Setup Session
chat = ChatBot(index, "tyler")
await chat.clear_history()

In [52]:
# Run a simple chat
stopterms = ["exit", "quit", "end", "cancel"]

# Simple Chat
while True:
    user_query = input()
    if user_query.lower() in stopterms:
        break
    answer = await chat.answer_question(user_query)
    print(answer, flush=True)

NIKE had about 83,700 employees globally as of May 31, 2023, including retail and part-time workers, as well as independent contractors and temporary personnel.
NIKE sells a wide range of products including athletic footwear, apparel, accessories, and equipment under the NIKE Brand, Jordan Brand, and Converse. The NIKE Brand offers performance athletic products for Men's, Women's, and Kids' categories, including sport-inspired lifestyle items. The Jordan Brand focuses on athletic and casual products with a basketball focus, while Converse sells casual products under various trademarks. Additionally, NIKE sells licensed apparel with team logos, performance equipment, and accessories such as bags, socks, and eyewear.
Based on the provided context, it appears that footwear is the most profitable product category for NIKE. Footwear revenues for NIKE were $14,897, which is significantly higher than revenues from apparel ($5,947), equipment ($764), and other sources ($0). Additionally, the c

In [53]:
await chat.get_messages()

[{'role': 'user', 'content': 'Hello. How many employees does Nike have?'},
 {'role': 'assistant',
  'content': 'NIKE had about 83,700 employees globally as of May 31, 2023, including retail and part-time workers, as well as independent contractors and temporary personnel.'},
 {'role': 'user',
  'content': 'Ok. What about the different products Nike sells. What kinds of products does it sell?'},
 {'role': 'assistant',
  'content': "NIKE sells a wide range of products including athletic footwear, apparel, accessories, and equipment under the NIKE Brand, Jordan Brand, and Converse. The NIKE Brand offers performance athletic products for Men's, Women's, and Kids' categories, including sport-inspired lifestyle items. The Jordan Brand focuses on athletic and casual products with a basketball focus, while Converse sells casual products under various trademarks. Additionally, NIKE sells licensed apparel with team logos, performance equipment, and accessories such as bags, socks, and eyewear."}

## Your Next Steps

While a good start, there is still more to do. **For example**:
- we could utilize message history to generate an updated and contextualized query to use for retrieval and answer generation (with an LLM). Otherwise, there can be a disconnect between what a user is asking (in context) and what they are asking in isolation.
- we could utilize an LLM to summarize conversation history to use as context instead of passing the whole slew of messages to the Chat endpoint.
- we could utilize semantic properties of the message history (or summaries) in order to fetch only relevant conversation bits (vector search).
- we could utilize a technique like HyDE (a form of query rewriting) to improve retrieval quality from raw user input to source documents, OR try to break down user questions into sub-questions and fetch / join context based on the different searches.
- we could incorporate semantic routing to take a broken-down question and route it to different data sources, indices, or query types (etc).
- we could add semantic guardrails on the front end or back end of the conversation I/O to ensure we are within bounds of approved topics.
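
As a starting point for the first idea above, here is a minimal sketch of condensing a follow-up question into a standalone one using the message history. Everything in it is an assumption for illustration: `CONDENSE_TEMPLATE`, `format_history`, and `condense_query` are hypothetical names, `llm` stands for any callable that maps a prompt string to a completion string, and the code is synchronous for brevity even though the notebook uses async calls.

```python
from typing import Callable

# Hypothetical prompt template for turning a follow-up into a standalone question
CONDENSE_TEMPLATE = """Given the conversation so far and a follow-up question,
rewrite the follow-up as a standalone question.

Conversation:
{history}

Follow-up question: {question}

Standalone question:"""


def format_history(messages: list[dict], max_messages: int = 6) -> str:
    """Render the most recent messages as 'role: content' lines."""
    recent = messages[-max_messages:]
    return "\n".join(f"{m['role']}: {m['content']}" for m in recent)


def condense_query(question: str, messages: list[dict], llm: Callable[[str], str]) -> str:
    """Ask an LLM for a standalone version of the question.

    With no history, the question is returned unchanged and no call is made.
    """
    if not messages:
        return question
    prompt = CONDENSE_TEMPLATE.format(
        history=format_history(messages), question=question
    )
    return llm(prompt).strip()


# Stand-in for a real chat completion call (assumption: swap in your own client)
def fake_llm(prompt: str) -> str:
    return " What kinds of products does Nike sell? "


history = [
    {"role": "user", "content": "How many employees does Nike have?"},
    {"role": "assistant", "content": "About 83,700 as of May 31, 2023."},
]
print(condense_query("What about its products?", history, fake_llm))
```

The standalone question returned here would then be embedded and used for retrieval in place of the raw user input, so that a follow-up like "What about its products?" still matches the right document chunks.
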

## Cleanup

Clean up the database.

In [54]:
await index.client.flushall()

True

Now that you have tried the easy-to-use RedisVL client, try your hand at LangChain -- the highest level of abstraction for using and integrating Redis as a vector database.


<a href="https://colab.research.google.com/github/redis-developer/financial-vss/blob/main/langchain-03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>