![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# RAG from scratch with the Redis Vector Library


In this recipe we will cover the basic of the Redis Vector Library and build a basic RAG app from scratch.

## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/RAG/01_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Environment Setup

### Pull Github Materials
Because you are likely running this notebook in **Google Colab**, we need to first
pull the necessary dataset and materials directly from GitHub.

**If you are running this notebook locally**, FYI you may not need to perform this
step at all.

In [1]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/RAG/resources .
!rm -rf temp_repo

Cloning into 'temp_repo'...
remote: Enumerating objects: 384, done.[K
remote: Counting objects: 100% (247/247), done.[K
remote: Compressing objects: 100% (159/159), done.[K
remote: Total 384 (delta 135), reused 151 (delta 74), pack-reused 137 (from 1)[K
Receiving objects: 100% (384/384), 64.50 MiB | 7.44 MiB/s, done.
Resolving deltas: 100% (159/159), done.


### Install Python Dependencies

In [2]:
# NBVAL_SKIP
!pip install -q redis redisvl langchain_community pypdf sentence-transformers langchain openai

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/261.4 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m261.4/261.4 kB[0m [31m16.5 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/95.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m95.6/95.6 kB[0m [31m8.9 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/2.4 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━[0m [32m2.2/2.4 MB[0m [31m81.1 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m2.4/2.4 MB[0m [31m35.4 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.4/2.4 MB[0m [31m27.1 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━

### Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from PDF document chunks. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [3]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


#### For Alternative Environments
There are many ways to get the necessary redis-stack instance running
1. On cloud, deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your
own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default this notebook connects to the local instance of Redis Stack. **If you have your own Redis Enterprise instance** - replace REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [4]:
import os
import warnings
#warnings.filterwarnings('ignore')

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"

## Simplified Vector Search with RedisVL

### Dataset Preparation (PDF Documents)

To best demonstrate Redis as a vector database layer, we will load a single
financial (10k filings) doc and preprocess it using some helpers from LangChain:

- `UnstructuredFileLoader` is not the only document loader type that LangChain provides. Docs: https://python.langchain.com/docs/integrations/document_loaders/unstructured_file
- `RecursiveCharacterTextSplitter` is what we use to create smaller chunks of text from the doc. Docs: https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
#from langchain_unstructured import UnstructuredLoader
from langchain_community.document_loaders import PyPDFLoader

# Load list of pdfs from a folder
data_path = "resources/"
docs = [os.path.join(data_path, file) for file in os.listdir(data_path)]

print("Listing available documents ...", docs)

Listing available documents ... ['resources/nke-10k-2023.pdf', 'resources/propositions.json', 'resources/nvd-10k-2023.pdf', 'resources/retrieval_basic_rag_test.csv', 'resources/msft-10k-2023.pdf', 'resources/generation_basic_rag_test.csv', 'resources/testset_15.csv', 'resources/jnj-10k-2023.pdf', 'resources/testset.csv', 'resources/amzn-10k-2023.pdf', 'resources/aapl-10k-2023.pdf']


In [6]:
# pick out the Nike doc for this exercise
doc = [doc for doc in docs if "nke" in doc][0]

# set up the file loader/extractor and text splitter to create chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2500, chunk_overlap=0
)
loader = PyPDFLoader(doc, headers = None)

# extract, load, and make chunks
chunks = loader.load_and_split(text_splitter)

print("Done preprocessing. Created", len(chunks), "chunks of the original pdf", doc)

Done preprocessing. Created 211 chunks of the original pdf resources/nke-10k-2023.pdf


### Text embedding generation with RedisVL
RedisVL has built-in extensions and utilities to aid the GenAI development process. In the following snipit we utilize the HFTextVectorizer redisvl in tandem with the **all-MiniLM-L6-v2** class to generate vector embeddings for the chunks created above. These embeddings capture the "meaning" of the text so that we can retrieve the relevant chunks later when a user's query is semantically related.

In [7]:
from redisvl.utils.vectorize import HFTextVectorizer
import pandas as pd
from tqdm.auto import tqdm

hf = HFTextVectorizer("sentence-transformers/all-MiniLM-L6-v2")
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Embed each chunk content
embeddings = hf.embed_many([chunk.page_content for chunk in chunks])

# Check to make sure we've created enough embeddings, 1 per document chunk
len(embeddings) == len(chunks)

21:26:21 numexpr.utils INFO   NumExpr defaulting to 2 threads.
21:26:38 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: cuda
21:26:38 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]



1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/7 [00:00<?, ?it/s]

True

### Define a schema and create an index

Below we connect to Redis and create an index that contains a text field, tag field, and vector field.

In [8]:
from redis import Redis
from redisvl.index import SearchIndex


index_name = "redisvl"


schema = {
  "index": {
    "name": index_name,
    "prefix": "chunk"
  },
  "fields": [
    {
        "name": "chunk_id",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "content",
        "type": "text"
    },
    {
        "name": "text_embedding",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
}

In [9]:
# connect to redis
client = Redis.from_url(REDIS_URL)

# create an index from schema and the client
index = SearchIndex.from_dict(schema)
index.set_client(client)
index.create(overwrite=True, drop=True)

In [10]:
# use the RedisVL CLI tool to list all indices
!rvl index listall

[32m21:26:52[0m [34m[RedisVL][0m [1;30mINFO[0m   Indices:
[32m21:26:52[0m [34m[RedisVL][0m [1;30mINFO[0m   1. redisvl


In [11]:
# get info about the index
!rvl index info -i redisvl



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ redisvl      │ HASH           │ ['chunk']  │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮
│ Name           │ Attribute      │ Type   │ Field Option   │ Option Value   │ Field Option   │ Option Value   │ Field Option   │   Option Value │ Field Option    │ Option Value   │ Field Option   │   Option Value │ Field Option    │   Option Value │
├────────────────┼────────────────┼────────┼────────────────┼────────────

### Process and load dataset
Below we use the RedisVL index to simply load the list of document chunks to Redis db.

In [12]:
# load expects an iterable of dictionaries
from redisvl.redis.utils import array_to_buffer

data = [
    {
        'chunk_id': i,
        'content': chunk.page_content,
        # For HASH -- must convert embeddings to bytes
        'text_embedding': array_to_buffer(embeddings[i], dtype='float32')
    } for i, chunk in enumerate(chunks)
]

# RedisVL handles batching automatically
keys = index.load(data, id_field="chunk_id")

### Query the database
Now we can use the RedisVL index to perform similarity search operations with Redis

In [13]:
from redisvl.query import VectorQuery

query = "Nike profit margins and company performance"

query_embedding = hf.embed(query)

vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=3,
    return_fields=["chunk_id", "content"],
    return_score=True
)

# show the raw redis query
str(vector_query)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

'*=>[KNN 3 @text_embedding $vector AS vector_distance] RETURN 3 chunk_id content vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 3'

In [14]:
# execute the query with RedisVL
result=index.query(vector_query)

# view the results
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,chunk_id,content
0,chunk:88,0.337694585323,88,"Asia Pacific & Latin America 1,932 1,896 2 % 1..."
1,chunk:80,0.342052936554,80,Table of Contents\nCONSOLIDATED OPERATING RESU...
2,chunk:87,0.357760846615,87,Table of Contents\nOPERATING SEGMENTS\nAs disc...


In [15]:
# paginate through results
for result in index.paginate(vector_query, page_size=1):
    print(result[0]["chunk_id"], result[0]["vector_distance"], flush=True)

88 0.337694585323
80 0.342052936554
87 0.357760846615


### Sort by alternative fields

In [16]:
# Sort by chunk_id field after vector search limits to topK
vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["chunk_id"],
    return_score=True
)

# Decompose vector_query into the core query and the params
query = vector_query.query
params = vector_query.params

# Pass query and params direct to index.search()
result = index.search(
    query.sort_by("chunk_id", asc=True),
    params
)

pd.DataFrame([doc.__dict__ for doc in result.docs])


Unnamed: 0,id,payload,vector_distance,chunk_id
0,chunk:80,,0.342052936554,80
1,chunk:83,,0.37876611948,83
2,chunk:87,,0.357760846615,87
3,chunk:88,,0.337694585323,88


### Add filters to vector queries

In [17]:
from redisvl.query.filter import Text

vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["content"],
    return_score=True
)

# Set a text filter
text_filter = Text("content") % "profit"

vector_query.set_filter(text_filter)

result=index.query(vector_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,content
0,chunk:83,0.37876611948,Table of Contents\nGROSS MARGIN\nFISCAL 2023 C...
1,chunk:129,0.41875743866,"Table of Contents\nNIKE, INC.\nCONSOLIDATED ST..."
2,chunk:168,0.657553255558,Table of Contents\nNOTE 10 — EARNINGS PER SHAR...
3,chunk:39,0.683842360973,"manner. However, lead times for many of our pr..."


### Range queries in RedisVL

In [18]:
from redisvl.query import RangeQuery

range_query = RangeQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["content"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

In [19]:
result=index.query(range_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,content
0,chunk:88,0.337694585323,"Asia Pacific & Latin America 1,932 1,896 2 % 1..."
1,chunk:80,0.342052936554,Table of Contents\nCONSOLIDATED OPERATING RESU...
2,chunk:87,0.357760846615,Table of Contents\nOPERATING SEGMENTS\nAs disc...
3,chunk:83,0.37876611948,Table of Contents\nGROSS MARGIN\nFISCAL 2023 C...


In [20]:
# Add filter to range query
range_query.set_filter(text_filter)

index.query(range_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,content
0,chunk:88,0.337694585323,"Asia Pacific & Latin America 1,932 1,896 2 % 1..."
1,chunk:80,0.342052936554,Table of Contents\nCONSOLIDATED OPERATING RESU...
2,chunk:87,0.357760846615,Table of Contents\nOPERATING SEGMENTS\nAs disc...
3,chunk:83,0.37876611948,Table of Contents\nGROSS MARGIN\nFISCAL 2023 C...


## Building a basic RAG Pipeline from Scratch
We're going to build a basic RAG pipeline from scratch incorporating the following components:

- Standard semantic search
- Integration with OpenAI for LLM
- Chat completion

### Setup RedisVL AsyncSearchIndex

In [21]:
from redis.asyncio import Redis as AsyncRedis
from redisvl.index import AsyncSearchIndex

client = AsyncRedis.from_url(REDIS_URL)
async_index = AsyncSearchIndex.from_dict(schema)
await async_index.set_client(client)

<redisvl.index.index.AsyncSearchIndex at 0x7db84cf86740>

### Setup OpenAI API

In [22]:
import openai
import os
import getpass


CHAT_MODEL = "gpt-3.5-turbo-0125"

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY :")


OPENAI_API_KEY :··········


### Baseline Retrieval Augmented Generation
The code below answers a user's questions following this basic flow:

1. Generate a query_vector from the user's chat question to have an apples to apples comparison against the vector database.
2. Retrieve the most semantically relevant chunks to the user's query from the database.
3. Pass the user query and retrieved context to the `promptify` function to generate the final prompt to be sent to the LLM along with the system prompt and necessary hyperparameters.
4. Return the LLMs response to the user.

In [23]:

async def answer_question(index: AsyncSearchIndex, query: str):
    """Answer the user's question"""

    SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
    to public financial 10k documents in order to answer users questions about company
    performance, ethics, characteristics, and core information.
    """

    query_vector = hf.embed(query)
    # Fetch context from Redis using vector search
    context = await retrieve_context(index, query_vector)
    # Generate contextualized prompt and feed to OpenAI
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by LLM
    return response.choices[0].message.content


async def retrieve_context(async_index: AsyncSearchIndex, query_vector) -> str:
    """Fetch the relevant context from Redis using vector search"""
    results = await async_index.query(
        VectorQuery(
            vector=query_vector,
            vector_field_name="text_embedding",
            return_fields=["content"],
            num_results=3
        )
    )
    content = "\n".join([result["content"] for result in results])
    return content


def promptify(query: str, context: str) -> str:
    return f'''Use the provided context below derived from public financial
    documents to answer the user's question. If you can't answer the user's
    question, based on the context; do not guess. If there is no context at all,
    respond with "I don't know".

    User question:

    {query}

    Helpful context:

    {context}

    Answer:
    '''

### Let's test it out...

In [24]:
# Generate a list of questions
questions = [
    "What is the trend in the company's revenue and profit over the past few years?",
    "What are the company's primary revenue sources?",
    "How much debt does the company have, and what are its capital expenditure plans?",
    "What does the company say about its environmental, social, and governance (ESG) practices?",
    "What is the company's strategy for growth?"
]

In [25]:
import asyncio

results = await asyncio.gather(*[
    answer_question(async_index, question) for question in questions
])

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

21:27:00 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
21:27:01 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
21:27:01 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
21:27:01 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
21:27:01 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


### Let's view the results

In [26]:
for i, r in enumerate(results):
    print(f"Question: {questions[i]}")
    print(f"Answer: \n {r}", "\n-----------\n")

Question: What is the trend in the company's revenue and profit over the past few years?
Answer: 
 Based on the provided financial data, we can observe the trend in the company's revenue and profit over the past few years as follows:

1. Revenue Trend:
- In fiscal year 2021, the total revenue for Nike, Inc. was $44,538 million.
- In fiscal year 2022, the total revenue increased to $46,710 million, showing a growth of 5%.
- In fiscal year 2023, the total revenue further increased to $51,217 million, indicating a growth of 10%.

2. Profit Trend (EBIT):
- In fiscal year 2021, the Earnings Before Interest and Taxes (EBIT) for Nike, Inc. was not provided in the context.
- In fiscal year 2022, the EBIT increased to $2,346 million.
- In fiscal year 2023, the EBIT further increased to $2,427 million.

Therefore, based on the data provided, we can see a positive trend in both revenue and profit for Nike, Inc. over the past few years. 
-----------

Question: What are the company's primary revenu

# You now have a working RAG pipeline!

As you can see, it is easy to get started with RAG and we were able to get decent chat results from this simple setup. To go beyond the basic example though see the [advanced_rag](../advanced_capabilities/advanced_RAG.ipynb) notebook.

This notebook covers:

- **Improving accuracy** with dense content representations and query rewriting/expansion
- **Improving performance and optimizing cost** with semantic caching
- **Improving personalization** with chat session memory.


## Cleanup

Clean up the database.

In [27]:
# await async_index.client.flushall()