![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# RAG from scratch with the Redis Vector Library


In this recipe we will cover the basics of the Redis Vector Library and build a basic RAG app from scratch.

## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/RAG/01_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Environment Setup

### Pull GitHub Materials
Because you are likely running this notebook in **Google Colab**, we first need to
pull the necessary dataset and materials directly from GitHub.

**If you are running this notebook locally**, you may not need to perform this
step at all.

In [8]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/RAG/resources .
!rm -rf temp_repo

Cloning into 'temp_repo'...
remote: Enumerating objects: 679, done.
remote: Counting objects: 100% (330/330), done.
remote: Compressing objects: 100% (214/214), done.
remote: Total 679 (delta 227), reused 148 (delta 115), pack-reused 349 (from 2)
Receiving objects: 100% (679/679), 57.80 MiB | 11.09 MiB/s, done.
Resolving deltas: 100% (295/295), done.
mv: rename temp_repo/python-recipes/RAG/resources to ./resources: Directory not empty


### Install Python Dependencies

In [9]:
%pip install -q "redisvl>=0.6.0" langchain-community pypdf sentence-transformers langchain openai pandas


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


### Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from PDF document chunks. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

#### For Alternative Environments
There are many ways to get the necessary Redis Stack instance running:
1. On cloud, deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your
own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default, this notebook connects to a local instance of Redis Stack. **If you have your own Redis Enterprise instance**, replace the `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` values with your own.

In [3]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
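The comment above mentions `rediss://` for SSL-enabled endpoints. A minimal sketch of assembling the URL with that in mind is shown here. `build_redis_url` and its `use_ssl` flag are hypothetical helpers, not part of RedisVL, and the password is percent-encoded on the assumption that it may contain reserved characters such as `@` or `:`.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: int, password: str = "", use_ssl: bool = False) -> str:
    """Hypothetical helper: assemble a Redis connection URL."""
    # rediss:// (two s's) signals a TLS connection
    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so reserved characters do not break URL parsing
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", 6379))
print(build_redis_url("example-host", 18374, password="p@ss", use_ssl=True))
```

When no password is set, the sketch omits the `:@` part entirely rather than emitting an empty credential.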

## Simplified Vector Search with RedisVL

### Dataset Preparation (PDF Documents)

To best demonstrate Redis as a vector database layer, we will load a single
financial document (a 10-K filing) and preprocess it using some helpers from LangChain:

- `PyPDFLoader` is not the only document loader type that LangChain provides. Docs: https://python.langchain.com/docs/integrations/document_loaders/pypdfloader/
- `RecursiveCharacterTextSplitter` is what we use to create smaller chunks of text from the doc. Docs: https://python.langchain.com/docs/how_to/recursive_text_splitter/

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

# Load list of pdfs from a folder
data_path = "resources/"
docs = [os.path.join(data_path, file) for file in os.listdir(data_path)]

print("Listing available documents ...", docs)

Listing available documents ... ['resources/nke-10k-2023.pdf', 'resources/amzn-10k-2023.pdf', 'resources/jnj-10k-2023.pdf', 'resources/aapl-10k-2023.pdf', 'resources/testset_15.csv', 'resources/retrieval_basic_rag_test.csv', 'resources/2022-chevy-colorado-ebrochure.pdf', 'resources/nvd-10k-2023.pdf', 'resources/testset.csv', 'resources/msft-10k-2023.pdf', 'resources/propositions.json', 'resources/generation_basic_rag_test.csv']


In [5]:
# pick out the Nike doc for this exercise
doc = [doc for doc in docs if "nke" in doc][0]

# set up the file loader/extractor and text splitter to create chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2500, chunk_overlap=0
)
loader = PyPDFLoader(doc, headers=None)

# extract, load, and make chunks
chunks = loader.load_and_split(text_splitter)

print("Done preprocessing. Created", len(chunks), "chunks of the original pdf", doc)

Done preprocessing. Created 211 chunks of the original pdf resources/nke-10k-2023.pdf
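To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sketch that cuts text into fixed-size character windows. `split_fixed` is a hypothetical stand-in: `RecursiveCharacterTextSplitter` instead tries to split on separators (paragraphs, lines, words) and treats `chunk_size` as an upper bound, so its chunks will not match these exactly.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Simplified stand-in for a text splitter: fixed-size character windows."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each window starts (chunk_size - chunk_overlap) characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
print(split_fixed(sample, chunk_size=4))                   # no shared characters
print(split_fixed(sample, chunk_size=4, chunk_overlap=2))  # neighbours share 2 characters
```

With `chunk_overlap=0`, as used in this notebook, no text is duplicated across chunks; a positive overlap trades extra storage for context that spans chunk boundaries.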


### Text embedding generation with RedisVL
RedisVL has built-in extensions and utilities to aid the GenAI development process. In the following snippet we use the RedisVL `HFTextVectorizer` class in tandem with the **all-MiniLM-L6-v2** model to generate vector embeddings for the chunks created above. These embeddings capture the "meaning" of the text so that we can retrieve the relevant chunks later when a user's query is semantically related.

In [6]:
import warnings
import pandas as pd
from redisvl.utils.vectorize import HFTextVectorizer, BaseVectorizer
from redisvl.extensions.cache.embeddings import EmbeddingsCache

warnings.filterwarnings("ignore")
os.environ["TOKENIZERS_PARALLELISM"] = "false"

hf = HFTextVectorizer(
    model="sentence-transformers/all-MiniLM-L6-v2",
    cache=EmbeddingsCache(
        name="embedcache",
        ttl=600,
        redis_url=REDIS_URL,
    )
)

# Embed each chunk content
embeddings = hf.embed_many([chunk.page_content for chunk in chunks])

# Check to make sure we've created enough embeddings, 1 per document chunk
len(embeddings) == len(chunks)

True

### Define a schema and create an index

Below we connect to Redis and create an index that contains a text field, tag field, and vector field.

In [7]:
from redisvl.index import SearchIndex


index_name = "redisvl"

schema = {
  "index": {
    "name": index_name,
    "prefix": "chunk"
  },
  "fields": [
    {
        "name": "chunk_id",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "content",
        "type": "text"
    },
    {
        "name": "text_embedding",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
}
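The schema sets `"distance_metric": "cosine"`, so query scores are cosine distances: 1 minus the cosine similarity of two vectors. The pure-Python sketch below illustrates the arithmetic on toy 2-dimensional vectors; `cosine_distance` is a hypothetical helper for illustration, not how Redis computes it internally.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Smaller values mean the texts are semantically closer, which is why query results are sorted by ascending `vector_distance`.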

In [8]:
# create an index from schema and the client
index = SearchIndex.from_dict(schema, redis_url=REDIS_URL)
index.create(overwrite=True, drop=True)

09:46:55 redisvl.index.index INFO   Index already exists, overwriting.


In [None]:
# use the RedisVL CLI tool to list all indices
!rvl index listall

In [10]:
# get info about the index
!rvl index info -i redisvl



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ redisvl      │ HASH           │ ['chunk']  │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮
│ Name           │ Attribute      │ Type   │ Field Option   │ Option Value   │ Field Option   │ Option Value   │ Field Option   │   Option Value │ Field Option    │ Option Value   │ Field Option   │   Option Value │ Field Option    │   Option Value │
├────────────────┼────────────────┼────────┼────────────────┼─

### Process and load dataset
Below we use the RedisVL index to load the list of document chunks into Redis.

In [11]:
# load expects an iterable of dictionaries
from redisvl.redis.utils import array_to_buffer

data = [
    {
        'chunk_id': i,
        'content': chunk.page_content,
        # For HASH -- must convert embeddings to bytes
        'text_embedding': array_to_buffer(embeddings[i], dtype='float32')
    } for i, chunk in enumerate(chunks)
]

# RedisVL handles batching automatically
keys = index.load(data, id_field="chunk_id")
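The comment in the cell above notes that HASH storage needs embeddings as bytes. As a rough illustration of what that byte layout looks like, the sketch below packs floats as float32 values with the standard library. `floats_to_bytes` and `bytes_to_floats` are hypothetical helpers, the little-endian byte order is an assumption, and this is not RedisVL's `array_to_buffer` implementation.

```python
import struct


def floats_to_bytes(vector: list[float]) -> bytes:
    """Pack floats as little-endian float32 values (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def bytes_to_floats(buffer: bytes) -> list[float]:
    """Inverse of floats_to_bytes."""
    return list(struct.unpack(f"<{len(buffer) // 4}f", buffer))


vec = [0.5, -1.25, 2.0]  # values chosen to be exactly representable in float32
buf = floats_to_bytes(vec)
print(len(buf))
print(bytes_to_floats(buf))
print(len(floats_to_bytes([0.0] * 384)))  # size of one 384-dimensional embedding
```

At 4 bytes per dimension, each 384-dimensional `float32` embedding in this index occupies 1,536 bytes.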

### Query the database
Now we can use the RedisVL index to perform similarity search operations with Redis.

In [12]:
from redisvl.query import VectorQuery

query = "Nike profit margins and company performance"

query_embedding = hf.embed(query)

vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=3,
    return_fields=["chunk_id", "content"],
    return_score=True
)

# show the raw redis query
str(vector_query)

'*=>[KNN 3 @text_embedding $vector AS vector_distance] RETURN 3 chunk_id content vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 3'

In [13]:
# execute the query with RedisVL
result=index.query(vector_query)

# view the results
pd.DataFrame(result)

| | id | vector_distance | chunk_id | content |
|---|---|---|---|---|
| 0 | chunk:88 | 0.337694525719 | 88 | Asia Pacific & Latin America 1,932 1,896 2 % 1... |
| 1 | chunk:80 | 0.34205275774 | 80 | Table of Contents\nCONSOLIDATED OPERATING RESU... |
| 2 | chunk:87 | 0.357761025429 | 87 | Table of Contents\nOPERATING SEGMENTS\nAs disc... |


In [14]:
# paginate through results
for result in index.paginate(vector_query, page_size=1):
    print(result[0]["chunk_id"], result[0]["vector_distance"], flush=True)

88 0.337694525719
80 0.34205275774
87 0.357761025429


### Sort by alternative fields

In [15]:
# Sort by chunk_id field after vector search limits to topK
vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["chunk_id"],
    return_score=True
)

# Decompose vector_query into the core query and the params
query = vector_query.query
params = vector_query.params

# Pass query and params direct to index.search()
result = index.search(
    query.sort_by("chunk_id", asc=True),
    params
)

pd.DataFrame([doc.__dict__ for doc in result.docs])


| | id | payload | vector_distance | chunk_id |
|---|---|---|---|---|
| 0 | chunk:80 | | 0.34205275774 | 80 |
| 1 | chunk:83 | | 0.378765881062 | 83 |
| 2 | chunk:87 | | 0.357761025429 | 87 |
| 3 | chunk:88 | | 0.337694525719 | 88 |
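The output above is ordered by `chunk_id`, not by distance: the vector search first narrows the candidates to the top K, and only then is the sort applied. The sketch below mimics that two-step behavior in plain Python, using `(chunk_id, vector_distance)` pairs copied from the query outputs in this notebook. `top_k_then_sort` is a hypothetical helper for illustration, and the seven pairs are only the chunks that appear in those outputs, not the full index.

```python
# (chunk_id, vector_distance) pairs copied from the query outputs in this notebook
scored = [
    (88, 0.337694525719),
    (80, 0.34205275774),
    (87, 0.357761025429),
    (83, 0.378765881062),
    (129, 0.418757200241),
    (73, 0.465415120125),
    (63, 0.49339401722),
]


def top_k_then_sort(pairs, k):
    """Keep the k nearest pairs, then re-order them by chunk_id."""
    nearest = sorted(pairs, key=lambda p: p[1])[:k]  # KNN step: smallest distance first
    return sorted(nearest, key=lambda p: p[0])       # sort step: ascending chunk_id


for chunk_id, distance in top_k_then_sort(scored, k=4):
    print(chunk_id, distance)
```

The sort changes only the presentation order; it does not change which four chunks are returned.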


### Add filters to vector queries

In [16]:
from redisvl.query.filter import Text

vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["content"],
    return_score=True
)

# Set a text filter
text_filter = Text("content") % "profit"

vector_query.set_filter(text_filter)

result=index.query(vector_query)
pd.DataFrame(result)

| | id | vector_distance | content |
|---|---|---|---|
| 0 | chunk:83 | 0.378765881062 | Table of Contents\nGROSS MARGIN\nFISCAL 2023 C... |
| 1 | chunk:129 | 0.418757200241 | Table of Contents\nNIKE, INC.\nCONSOLIDATED ST... |
| 2 | chunk:73 | 0.465415120125 | Table of Contents\nITEM 7. MANAGEMENT'S DISCUS... |
| 3 | chunk:63 | 0.49339401722 | existing businesses, such as our NIKE Direct o... |


### Range queries in RedisVL

In [17]:
from redisvl.query import RangeQuery

range_query = RangeQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["content"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

In [18]:
result=index.query(range_query)
pd.DataFrame(result)

| | id | vector_distance | content |
|---|---|---|---|
| 0 | chunk:88 | 0.337694525719 | Asia Pacific & Latin America 1,932 1,896 2 % 1... |
| 1 | chunk:80 | 0.34205275774 | Table of Contents\nCONSOLIDATED OPERATING RESU... |
| 2 | chunk:87 | 0.357761025429 | Table of Contents\nOPERATING SEGMENTS\nAs disc... |
| 3 | chunk:83 | 0.378765881062 | Table of Contents\nGROSS MARGIN\nFISCAL 2023 C... |
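Conceptually, a range query keeps only items within `distance_threshold` and then returns at most `num_results` of them, nearest first. A small pure-Python sketch of that idea follows. `range_search` is a hypothetical helper, the distances are copied from the outputs above, and treating the threshold as inclusive is an assumption of this sketch.

```python
# (chunk_id, vector_distance) pairs copied from the outputs above
scored = [
    (83, 0.378765881062),
    (88, 0.337694525719),
    (87, 0.357761025429),
    (80, 0.34205275774),
]


def range_search(pairs, distance_threshold, num_results):
    """Keep pairs within the threshold, nearest first, capped at num_results."""
    within = [p for p in pairs if p[1] <= distance_threshold]  # inclusive bound (assumption)
    return sorted(within, key=lambda p: p[1])[:num_results]


print(range_search(scored, distance_threshold=0.8, num_results=4))
print(range_search(scored, distance_threshold=0.35, num_results=4))
```

A tighter threshold can return fewer than `num_results` items, which is the main behavioral difference from a KNN query.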


In [19]:
# Add filter to range query
range_query.set_filter(text_filter)

# Assign the new result; otherwise the DataFrame shows the previous, unfiltered result
result = index.query(range_query)
pd.DataFrame(result)


## Building a basic RAG Pipeline from Scratch
We're going to build a basic RAG pipeline from scratch incorporating the following components:

- Standard semantic search
- Integration with OpenAI for LLM
- Chat completion

### Set up RedisVL AsyncSearchIndex

In [20]:
from redisvl.index import AsyncSearchIndex

async_index = AsyncSearchIndex.from_dict(schema, redis_url=REDIS_URL)

### Set up OpenAI API

In [21]:
import openai
import os
import getpass


CHAT_MODEL = "gpt-3.5-turbo-0125"

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY :")


### Baseline Retrieval Augmented Generation
The code below answers a user's questions following this basic flow:

1. Generate a `query_vector` from the user's question with the same embedding model used for the chunks, so the comparison against the vector database is apples-to-apples.
2. Retrieve the chunks from the database that are most semantically relevant to the user's query.
3. Pass the user query and retrieved context to the `promptify` function to generate the final prompt, which is sent to the LLM along with the system prompt and the necessary hyperparameters.
4. Return the LLM's response to the user.

In [22]:

async def answer_question(index: AsyncSearchIndex, query: str):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users questions about company
    performance, ethics, characteristics, and core information.
    """

    query_vector = hf.embed(query)
    # Fetch context from Redis using vector search
    context = await retrieve_context(index, query_vector)
    # Generate contextualized prompt and feed to OpenAI
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    return response.choices[0].message.content


async def retrieve_context(async_index: AsyncSearchIndex, query_vector) -> str:
    """Fetch the relevant context from Redis using vector search"""
    results = await async_index.query(
        VectorQuery(
            vector=query_vector,
            vector_field_name="text_embedding",
            return_fields=["content"],
            num_results=3
        )
    )
    content = "\n".join([result["content"] for result in results])
    return content


def promptify(query: str, context: str) -> str:
    return f'''Use the provided context below derived from public financial
    documents to answer the user's question. If you can't answer the user's
    question, based on the context; do not guess. If there is no context at all,
    respond with "I don't know".

    User question:

    {query}

    Helpful context:

    {context}

    Answer:
    '''

### Let's test it out...

In [23]:
# Generate a list of questions
questions = [
    "What is the trend in the company's revenue and profit over the past few years?",
    "What are the company's primary revenue sources?",
    "How much debt does the company have, and what are its capital expenditure plans?",
    "What does the company say about its environmental, social, and governance (ESG) practices?",
    "What is the company's strategy for growth?"
]

In [24]:
import asyncio

results = await asyncio.gather(*[
    answer_question(async_index, question) for question in questions
])
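`asyncio.gather` runs the coroutines concurrently and returns results in the order the awaitables were passed, regardless of which finishes first. That is why `results[i]` can safely be paired with `questions[i]`. The sketch below demonstrates this with a hypothetical `fake_answer` coroutine standing in for `answer_question`; it uses `asyncio.run` because it is written as a standalone script, whereas inside a notebook you would `await main()` directly.

```python
import asyncio


async def fake_answer(question: str, delay: float) -> str:
    """Hypothetical stand-in for answer_question: waits, then echoes."""
    await asyncio.sleep(delay)
    return f"answer to: {question}"


async def main() -> list[str]:
    questions = ["first", "second", "third"]
    delays = [0.03, 0.01, 0.02]  # the second coroutine finishes before the first
    return await asyncio.gather(*[
        fake_answer(q, d) for q, d in zip(questions, delays)
    ])


results = asyncio.run(main())
print(results)
```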

### Let's view the results

In [25]:
for i, r in enumerate(results):
    print(f"Question: {questions[i]}")
    print(f"Answer: \n {r}", "\n-----------\n")

Question: What is the trend in the company's revenue and profit over the past few years?
Answer: 
 The trend in the company's revenue and profit over the past few years is as follows:

- Revenue:
  - Fiscal Year 2023: Total revenue for Nike, Inc. was $51,217 million, showing a 10% increase from the previous year.
  - Fiscal Year 2022: Total revenue for Nike, Inc. was $46,710 million, showing a 10% increase from the year before.
  - Fiscal Year 2021: Total revenue for Nike, Inc. was $44,538 million.

- Profit (EBIT):
  - Fiscal Year 2023: EBIT for Nike, Inc. was not provided in the context.
  - Fiscal Year 2022: EBIT for Nike, Inc. was not provided in the context.
  - Fiscal Year 2021: EBIT for Nike, Inc. was not provided in the context.

Based on the revenue figures provided, there has been a consistent increase in revenue for Nike, Inc. over the past few years. However, without the EBIT figures, we cannot determine the trend in profit over the same period. 
-----------

Question: What

### Improve performance and cut costs with LLM caching

In [26]:
from redisvl.extensions.llmcache import SemanticCache

llmcache = SemanticCache(
    name="llmcache",
    vectorizer=hf,
    redis_url=REDIS_URL,
    ttl=120,
    distance_threshold=0.2,
    overwrite=True,
)

09:47:20 redisvl.index.index INFO   Index already exists, overwriting.
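The `distance_threshold` passed to `SemanticCache` decides how close a new prompt's embedding must be to a stored one to count as a cache hit. The in-memory sketch below illustrates that decision on toy 2-dimensional vectors. `ToySemanticCache` is a hypothetical class written for this illustration only; it is not RedisVL's `SemanticCache` and ignores TTLs, persistence, and indexing.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


class ToySemanticCache:
    """In-memory illustration only; not RedisVL's SemanticCache."""

    def __init__(self, distance_threshold: float):
        self.distance_threshold = distance_threshold
        self.entries = []  # list of (vector, response) tuples

    def store(self, vector, response):
        self.entries.append((vector, response))

    def check(self, vector):
        """Return the closest cached response within the threshold, else None."""
        best = min(self.entries, key=lambda e: cosine_distance(vector, e[0]), default=None)
        if best is not None and cosine_distance(vector, best[0]) <= self.distance_threshold:
            return best[1]
        return None


cache = ToySemanticCache(distance_threshold=0.2)
cache.store([1.0, 0.0], "cached answer")
print(cache.check([0.9, 0.1]))  # nearly the same direction: hit
print(cache.check([0.0, 1.0]))  # orthogonal direction: miss
```

A lower threshold makes the cache stricter (fewer hits, less risk of returning an answer to a different question); a higher one does the opposite.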


In [27]:
from functools import wraps

# Create an LLM caching decorator
def cache(func):
    @wraps(func)
    async def wrapper(index, query_text, *args, **kwargs):
        query_vector = llmcache._vectorizer.embed(query_text)

        # Check the cache with the vector
        if result := llmcache.check(vector=query_vector):
            print("Cache hit!")
            return result[0]['response']

        response = await func(index, query_text, query_vector=query_vector)
        llmcache.store(query_text, response, query_vector)
        return response
    return wrapper


@cache
async def answer_question(index: AsyncSearchIndex, query: str, **kwargs):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users questions about company
    performance, ethics, characteristics, and core information.
    """

    context = await retrieve_context(index, kwargs["query_vector"])
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by GPT-3.5
    return response.choices[0].message.content

In [28]:
# NBVAL_SKIP
query = "What was Nike's revenue last year compared to this year??"

await answer_question(async_index, query)

"Nike's total revenues were $51.2 billion in fiscal year 2023, compared to $46.7 billion in fiscal year 2022. This represents a 10% increase in revenue from the previous year."

In [29]:
# NBVAL_SKIP
query = "What was Nike's total revenue in the last year compared to now??"

await answer_question(async_index, query)

# notice no HTTP request to OpenAI since this question is "close enough" to the last one

Cache hit!


"Nike's total revenues were $51.2 billion in fiscal year 2023, compared to $46.7 billion in fiscal year 2022. This represents a 10% increase in revenue from the previous year."

### Improve personalization by including chat session history

To preserve conversation state, offload the chat history to a database that can handle high read/write throughput, which keeps system latency low.


In [30]:
from functools import wraps
from redisvl.extensions.session_manager import StandardSessionManager


class ChatBot:
    def __init__(self, index: AsyncSearchIndex, vectorizer: BaseVectorizer, user: str):
        self.index = index
        self.vectorizer = vectorizer
        self.session_manager = StandardSessionManager(
            name=f"chat_session_{user}",
            session_tag=user,
            redis_url=REDIS_URL,
        )

    @staticmethod
    def promptify(query: str, context: str) -> str:
        return f'''Use the provided context below derived from public financial
        documents to answer the user's question. If you can't answer the user's
        question, based on the context; do not guess. If there is no context at all,
        respond with "I don't know".

        User question:

        {query}

        Helpful context:

        {context}

        Answer:
        '''

    async def retrieve_context(self, query_vector) -> str:
        """Fetch the relevant context from Redis using vector search"""
        results = await self.index.query(
            VectorQuery(
                vector=query_vector,
                vector_field_name="text_embedding",
                return_fields=["content"],
                num_results=3
            )
        )
        content = "\n".join([result["content"] for result in results])
        return content

    async def clear_history(self):
        """Clear session chat"""
        self.session_manager.clear()

    async def answer_question(self, query: str):
        """Answer the user's question with historical context and caching baked-in"""

        SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
        to public financial 10k documents in order to answer users questions about company
        performance, ethics, characteristics, and core information.
        """

        # Create query vector
        query_vector = self.vectorizer.embed(query)

        # Check the cache with the vector
        if result := llmcache.check(vector=query_vector):
            answer = result[0]['response']
        else:
            context = await self.retrieve_context(query_vector)
            session = self.session_manager.messages
            messages = (
                [{"role": "system", "content": SYSTEM_PROMPT}] +
                session +
                [{"role": "user", "content": self.promptify(query, context)}]
            )
            # Response provided by GPT-3.5
            response = await openai.AsyncClient().chat.completions.create(
                model=CHAT_MODEL,
                messages=messages,
                temperature=0.1,
                seed=42
            )
            answer = response.choices[0].message.content
            llmcache.store(query, answer, query_vector)

        # Add message history
        self.session_manager.add_messages([
            {"role": "user", "content": query},
            {"role": "assistant", "content": answer}
        ])

        return answer

## Test the entire RAG workflow

In [31]:
# Setup Session
chat = ChatBot(async_index, vectorizer=hf, user="Andrew")
await chat.clear_history()

In [32]:
# Run a simple chat
stopterms = ["exit", "quit", "end", "cancel"]

# Simple Chat
# NBVAL_SKIP
while True:
    user_query = input()
    if user_query.lower() in stopterms or not user_query:
        break
    answer = await chat.answer_question(user_query)
    print(answer, flush=True)

Hi! How can I assist you today?


In [33]:
# NBVAL_SKIP
chat.session_manager.messages

[{'role': 'user', 'content': 'hi'},
 {'role': 'assistant', 'content': 'Hi! How can I assist you today?'}]
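Stored messages like the ones above are what `ChatBot.answer_question` places between the system prompt and the new user prompt. The sketch below isolates that assembly step with a plain list in place of the session manager. `build_messages` is a hypothetical helper and the prompts are placeholder strings.

```python
def build_messages(system_prompt: str, history: list[dict], user_prompt: str) -> list[dict]:
    """Assemble chat messages: system prompt, prior turns, then the new user prompt."""
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": user_prompt}]
    )


history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hi! How can I assist you today?"},
]
messages = build_messages("You are a helpful assistant.", history, "What was Nike's revenue?")
for m in messages:
    print(m["role"], "->", m["content"])
```

Because list concatenation creates a new list, the stored history is left unchanged until the new turn is explicitly added to it.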

# You now have a working RAG pipeline!

As you can see, it is easy to get started with RAG, and this simple setup already produces decent chat results. To go beyond the basic example, see the [advanced_rag](./04_advanced_redisvl.ipynb) notebook.

That notebook covers:

- **Improving accuracy** with dense content representations and query rewriting/expansion
- **Improving performance and optimizing cost** with semantic caching
- **Improving personalization** with chat session memory.


## Cleanup

Clean up the database.

In [37]:
await async_index.client.flushall()

True