<a href="https://colab.research.google.com/github/Redislabs-Solution-Architects/financial-vss/blob/main/redisvl-02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vector Search with RedisVL

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

This notebook uses [redisvl](https://redisvl.com), a dedicated Python client library for using Redis as a vector database, to index documents with their embeddings and run semantic search.

## Setup and Data Prep

### Pull GitHub Materials
We need to clone the supporting materials from GitHub.

In [None]:
# This clones your git repository into a directory named 'temp_repo'.
!git clone https://github.com/Redislabs-Solution-Architects/financial-vss.git temp_repo

# These commands move the 'resources' directory and 'requirements.txt' from 'temp_repo' to your current directory.
!mv temp_repo/resources .
!mv temp_repo/requirements.txt .

# This deletes the 'temp_repo' directory, cleaning up the unwanted files.
!rm -rf temp_repo


### Install Python Dependencies

In [None]:
!pip install -q -r requirements.txt

In [None]:
import warnings

warnings.filterwarnings("ignore")

### Preprocess PDF Doc(s)

Now we load a single financial document (a 10-K filing) and preprocess it with some LangChain helpers.

In [None]:
import os

# Load list of pdfs
data_path = "resources/"
docs = [os.path.join(data_path, file) for file in os.listdir(data_path)]

print("Listing available documents ...", docs)

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredFileLoader

# For simplicity, we work with just one of the 10-K files. This still takes some time.
# Note: UnstructuredFileLoader is not the only document loader that LangChain provides.
# Note: RecursiveCharacterTextSplitter splits the doc into smaller chunks of text.
# Docs: https://python.langchain.com/docs/integrations/document_loaders/unstructured_file
# Docs: https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter
doc = [doc for doc in docs if "nke" in doc][0]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True)
loader = UnstructuredFileLoader(doc, mode="single", strategy="fast")
chunks = loader.load_and_split(text_splitter)

print("Done preprocessing. Created", len(chunks), "chunks of the original pdf", doc)
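
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this sketch does not do. The helper name `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        # Record the start offset, similar in spirit to add_start_index=True
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # made-up input
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece["start_index"], piece["text"])
```

Each window repeats the last three characters of the previous one, so text near a boundary appears in both neighboring chunks.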

### Create document chunk embeddings

In [None]:
from redisvl.utils.vectorize import HFTextVectorizer

hf = HFTextVectorizer("sentence-transformers/all-MiniLM-L6-v2")

# Embed each page_content from the document chunks
chunk_embeddings = hf.embed_many([chunk.page_content for chunk in chunks])

# Check to make sure we've created enough embeddings, 1 per document chunk
len(chunk_embeddings) == len(chunks)

### Run Localized Redis Stack

If you don't have a remote Redis instance, you can run [Redis Stack](https://redis.io/docs/getting-started/install-stack/) inside the notebook or provision a free [Redis Cloud](https://redis.com/try-free/) instance.

The code below downloads and runs a local Redis Stack server here in the notebook.

In [None]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

### Connect to Redis

By default, this notebook connects to the local Redis Stack instance. If you have your own Redis Cloud instance, replace the REDIS_PASSWORD, REDIS_HOST, and REDIS_PORT values with your own.

In [None]:
# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# Construct URL
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
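
If a password contains characters such as `@`, `:` or `/`, interpolating it directly into the URL can make the URL ambiguous. The sketch below percent-encodes the password first. The helper `build_redis_url` and the sample password are hypothetical, and the sketch assumes the client decodes percent-encoded credentials when it parses the URL.

```python
from urllib.parse import quote, urlsplit


def build_redis_url(host: str, port: str, password: str = "") -> str:
    """Build a redis:// URL, percent-encoding the password if one is given."""
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}"


# Made-up password with characters that are reserved in URLs
url = build_redis_url("localhost", "6379", "p@ss:word/1")
parts = urlsplit(url)
print(url)
print(parts.hostname, parts.port)
```

The host and port still parse cleanly even though the raw password contains `@` and `:`.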

## Getting Started with RedisVL

### Create an index from schema
Below we connect to Redis and create an index for vector search that contains a text field, a tag field, and a vector field.

In [None]:
from redis import Redis
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex

index_name = "redisvl"

schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
    "prefix": "chunk"
  },
  "fields": [
    {"name": "content", "type": "text"},
    {
        "name": "label",
        "type": "tag",
        "attrs": {
            "sortable": True
          }
      },
    {
      "name": "chunk_vector",
      "type": "vector",
      "attrs": {
        "dims": hf.dims,
        "distance_metric": "cosine",
        "algorithm": "hnsw",
        "datatype": "float32"
      }
    }
  ]
})

# connect to redis
client = Redis.from_url(REDIS_URL)

# create an index
index = SearchIndex(schema, client)
index.create(overwrite=True)

In [None]:
# use the CLI to see the created index
!rvl index listall

In [None]:
!rvl index info -i redisvl
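
The schema above sets `distance_metric` to `cosine`. As a rough illustration of what that metric measures, the sketch below computes cosine distance as one minus cosine similarity in plain Python. The function name and the toy vectors are made up for illustration, and the real computation happens inside Redis.

```python
import math


def cosine_distance(a, b) -> float:
    """Return 1 - cosine similarity: near 0 for same direction, near 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings have hf.dims dimensions
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # same direction, different length
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Because only direction matters, scaling a vector does not change its cosine distance to another vector.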

### Process and load data using RedisVL
Below we use the RedisVL index to load the list of document chunks into Redis.

In [None]:
# load expects an iterable of dictionaries
from redisvl.redis.utils import array_to_buffer

data = [
    {
        'label': f'ID-{i}',
        'content': chunk.page_content,
        # For HASH -- must convert embeddings to bytes
        'chunk_vector': array_to_buffer(chunk_embeddings[i])
    } for i, chunk in enumerate(chunks)
]

# RedisVL handles batching automatically
keys = index.load(data)
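
The comment about converting embeddings to bytes can be illustrated with the standard library alone. This is a sketch of the idea, not the `array_to_buffer` implementation: it packs a list of floats as little-endian float32 values, which is an assumption about the byte layout. The helper names are hypothetical.

```python
import struct


def floats_to_float32_bytes(vector) -> bytes:
    """Pack floats as consecutive little-endian float32 values (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def float32_bytes_to_floats(buffer: bytes) -> list:
    """Unpack a float32 byte buffer back into a list of Python floats."""
    count = len(buffer) // 4
    return list(struct.unpack(f"<{count}f", buffer))


vec = [0.5, -1.25, 3.0]  # values chosen to be exactly representable in float32
buf = floats_to_float32_bytes(vec)
print(len(buf))
print(float32_bytes_to_floats(buf))
```

A vector with `dims` entries therefore occupies `4 * dims` bytes in the hash field when the datatype is `float32`.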

### Query the database
Now we can use the RedisVL index to run similarity searches against Redis.

In [None]:
from redisvl.query import VectorQuery

query = "Nike profit margins and company performance"

vector_query = VectorQuery(
    vector=hf.embed(query),
    vector_field_name="chunk_vector",
    num_results=4,
    return_fields=["label", "content"],
    return_score=True
)

# show the raw redis query
str(vector_query)

In [None]:
# execute the query with RedisVL
index.query(vector_query)

In [None]:
# paginate through results
for result in index.paginate(vector_query, page_size=1):
    print(result[0]["label"], result[0]["vector_distance"], flush=True)

### Sort by alternative fields

In [None]:
# Sort by label field after vector search limits to topK
vector_query = VectorQuery(
    vector=hf.embed("Nike profit margins and company performance"),
    vector_field_name="chunk_vector",
    num_results=4,
    return_fields=["label"],
    return_score=True
)

# Decompose vector_query into the core query and the params
query = vector_query.query
params = vector_query.params

# Pass query and params direct to index.search()
result = index.search(
    query.sort_by("label", asc=True),
    params
  )

[doc.__dict__ for doc in result.docs]
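
One thing to watch when sorting on `label`: the values are strings such as `ID-10`, so a string sort does not follow numeric order. The sketch below shows the difference on made-up results in plain Python. It assumes the field is compared as text, and the distances are invented.

```python
# Hypothetical top-4 hits; labels and distances are made up for illustration
hits = [
    {"label": "ID-10", "vector_distance": 0.21},
    {"label": "ID-2", "vector_distance": 0.25},
    {"label": "ID-31", "vector_distance": 0.19},
    {"label": "ID-7", "vector_distance": 0.30},
]

# Sorting on the raw string compares character by character
by_label_text = sorted(hits, key=lambda hit: hit["label"])

# Sorting on the numeric suffix gives the order most readers expect
by_label_number = sorted(hits, key=lambda hit: int(hit["label"].split("-")[1]))

print([hit["label"] for hit in by_label_text])
print([hit["label"] for hit in by_label_number])
```

If numeric ordering matters, one option is to zero-pad the label when loading data (for example `ID-0002`) so that both orders agree.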

### Add filters to vector queries

In [None]:
from redisvl.query.filter import Text

vector_query = VectorQuery(
    vector=hf.embed("Nike profit margins and company performance"),
    vector_field_name="chunk_vector",
    num_results=4,
    return_fields=["content"],
    return_score=True
)

# Set a text filter
text_filter = Text("content") % "profit"

vector_query.set_filter(text_filter)

index.query(vector_query)
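
Conceptually, a filtered vector query returns the nearest neighbors among the documents that match the filter. The sketch below mimics that behavior on an in-memory list. It is an illustration under that assumption, not how Redis executes the query, and the records, distances, and the `filtered_knn` helper are made up.

```python
# Made-up records with precomputed distances to some query vector
records = [
    {"content": "Gross profit rose on higher margins", "distance": 0.18},
    {"content": "Store openings in new regions", "distance": 0.12},
    {"content": "Net profit and revenue outlook", "distance": 0.27},
    {"content": "Supply chain and logistics costs", "distance": 0.22},
]


def filtered_knn(records, keyword: str, k: int) -> list:
    """Keep records whose content contains the keyword, then take the k nearest."""
    matches = [r for r in records if keyword.lower() in r["content"].lower().split()]
    return sorted(matches, key=lambda r: r["distance"])[:k]


top = filtered_knn(records, "profit", k=2)
for record in top:
    print(record["distance"], record["content"])
```

Note that the closest record overall (distance 0.12) is excluded because it does not mention the keyword.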

### Range queries in RedisVL

In [None]:
from redisvl.query import RangeQuery

range_query = RangeQuery(
    vector=hf.embed("Nike profit margins and company performance"),
    vector_field_name="chunk_vector",
    num_results=4,
    return_fields=["content"],
    return_score=True,
    distance_threshold=0.5  # find all items with a semantic distance of less than 0.5
)

In [None]:
index.query(range_query)

In [None]:
# Add filter to range query
range_query.set_filter(text_filter)

index.query(range_query)
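
The difference between a top-k vector query and a range query can be sketched on a few made-up distances. A top-k query always returns `k` items if enough exist, while a range query returns only items within the threshold, up to `num_results`. Whether the boundary value itself is included is an assumption of this sketch.

```python
# Made-up distances from a query vector to four stored chunks
distances = {"ID-0": 0.12, "ID-1": 0.48, "ID-2": 0.55, "ID-3": 0.91}


def knn(distances: dict, k: int) -> list:
    """Return the k closest labels, regardless of how far away they are."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return [label for label, _ in ranked[:k]]


def within_range(distances: dict, threshold: float, num_results: int) -> list:
    """Return up to num_results labels whose distance is within the threshold."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return [label for label, dist in ranked if dist <= threshold][:num_results]


print(knn(distances, k=3))
print(within_range(distances, threshold=0.5, num_results=4))
```

Here the top-3 query includes `ID-2` even though it lies beyond 0.5, while the range query drops it.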

## Building a RAG Pipeline with RedisVL

### Use AsyncSearchIndex

In [None]:
from redis.asyncio import Redis
from redisvl.index import AsyncSearchIndex

client = Redis.from_url(REDIS_URL)
index = AsyncSearchIndex(index.schema, client)

### Prep OpenAI Helpers & Prompts

In [None]:
import openai
import os
import getpass


if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY")


CHAT_MODEL = "gpt-3.5-turbo"


SYSTEM_PROMPT = """You are a helpful financial analyst assistant that has access
to public financial 10k documents in order to answer users' questions about company
performance, ethics, characteristics, and core information.
"""

In [None]:

async def answer_question(index: AsyncSearchIndex, query: str):
    """Answer the user's question"""
    query_vector = hf.embed(query)
    context = await retrieve_context(index, query_vector)
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by GPT-3.5
    return response.choices[0].message.content


async def retrieve_context(index: AsyncSearchIndex, query_vector) -> str:
    """Fetch the relevant context from Redis using vector search"""
    results = await index.query(
        VectorQuery(
            vector=query_vector,
            vector_field_name="chunk_vector",
            return_fields=["content"],
            num_results=3
        )
    )
    content = "\n".join([result["content"] for result in results])
    return content


def promptify(query: str, context: str) -> str:
    return f'''Use the provided context below derived from public financial
    documents to answer the user's question. If you can't answer the user's
    question based on the context, do not guess. If there is no context at all,
    respond with "I don't know".

    User question:

    {query}

    Helpful context:

    {context}

    Answer:
    '''

### Vanilla Async RAG

In [None]:
# Generate a list of questions
questions = [
    "What is the trend in the company's revenue and profit over the past few years?",
    "What are the company's primary revenue sources?",
    "How much debt does the company have, and what are its capital expenditure plans?",
    "What does the company say about its environmental, social, and governance (ESG) practices?",
    "What is the company's strategy for growth?"
]

In [None]:
import asyncio

results = await asyncio.gather(*[
    answer_question(index, question) for question in questions
])
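
`asyncio.gather` starts every request at once. With a longer list of questions, it may be worth capping how many run concurrently, for example to stay under an API rate limit. The sketch below does this with `asyncio.Semaphore`. The helper `gather_limited` and the stand-in worker `fake_answer` are hypothetical.

```python
import asyncio


async def gather_limited(worker, items, limit: int) -> list:
    """Run worker(item) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item):
        async with semaphore:  # wait here while `limit` workers are already active
            return await worker(item)

    # gather preserves the order of the inputs in its results
    return await asyncio.gather(*(run(item) for item in items))


async def fake_answer(question: str) -> str:
    """Stand-in for a slow network call such as answer_question."""
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


# In a notebook cell, call it with top-level await, for example:
#   results = await gather_limited(lambda q: answer_question(index, q), questions, limit=2)
```

Results come back in the same order as the questions, so the pairing with `questions` in a DataFrame still works.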

In [None]:
import pandas as pd

pd.DataFrame(columns=["question", "answer"], data=list(zip(questions, results)))

### Improve performance and cut costs with LLM caching

In [None]:
from redisvl.extensions.llmcache import SemanticCache

llmcache = SemanticCache(
    name="llmcache",
    vectorizer=hf,
    redis_url=REDIS_URL,
    ttl=120,
    distance_threshold=0.2
)

In [None]:
from functools import wraps

# Create an LLM caching decorator
def cache(func):
    @wraps(func)
    async def wrapper(index, query_text, *args, **kwargs):
        # Use the same vectorizer that was passed to the cache
        query_vector = hf.embed(query_text)

        # Check the cache with the vector
        if result := llmcache.check(vector=query_vector):
            return result[0]['response']

        response = await func(index, query_text, query_vector=query_vector)
        llmcache.store(query_text, response, query_vector)
        return response
    return wrapper


@cache
async def answer_question(index: AsyncSearchIndex, query: str, **kwargs):
    """Answer the user's question"""
    context = await retrieve_context(index, kwargs["query_vector"])
    response = await openai.AsyncClient().chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": promptify(query, context)}
        ],
        temperature=0.1,
        seed=42
    )
    # Response provided by GPT-3.5
    return response.choices[0].message.content

In [None]:
query = "What was Nike's revenue last year compared to this year??"

await answer_question(index, query)

In [None]:
query = "What was Nike's total revenue in the last year compared to now??"

await answer_question(index, query)
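
The second question is a paraphrase of the first, so it should be served from the cache if its embedding falls within `distance_threshold` of the stored one. The toy class below sketches that idea in memory, with a distance threshold and a TTL. It is not the `SemanticCache` implementation, and the class name, vectors, and responses are made up.

```python
import math
import time


def cosine_distance(a, b) -> float:
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToySemanticCache:
    """In-memory stand-in for a semantic cache with a TTL."""

    def __init__(self, distance_threshold: float, ttl: float, clock=time.monotonic):
        self.distance_threshold = distance_threshold
        self.ttl = ttl
        self._clock = clock
        self._entries = []  # list of (vector, response, expires_at)

    def store(self, vector, response: str) -> None:
        self._entries.append((list(vector), response, self._clock() + self.ttl))

    def check(self, vector):
        """Return the closest unexpired response within the threshold, else None."""
        now = self._clock()
        self._entries = [entry for entry in self._entries if entry[2] > now]
        best = None
        for stored, response, _ in self._entries:
            dist = cosine_distance(vector, stored)
            if dist <= self.distance_threshold and (best is None or dist < best[0]):
                best = (dist, response)
        return best[1] if best else None


toy_cache = ToySemanticCache(distance_threshold=0.2, ttl=120)
toy_cache.store([1.0, 0.0], "cached answer")
print(toy_cache.check([0.9, 0.1]))  # close to the stored vector
print(toy_cache.check([0.0, 1.0]))  # far from the stored vector
```

A lower threshold makes hits rarer but safer, while a higher one saves more LLM calls at the risk of returning an answer to a different question.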

### Offload session history to Redis

To preserve state in the conversation, it's imperative to offload conversation history to a database that can handle high read and write throughput, which keeps system latency low.

We can store message history for a particular user session in a Redis List data type.


In [None]:
import json


async def get_messages(index: AsyncSearchIndex, user_id: str) -> list:
    """Get all messages associated with a session"""
    return [
        json.loads(msg) for msg in await index.client.lrange(f"messages:{user_id}", 0, -1)
    ]

async def add_messages(index: AsyncSearchIndex, user_id: str, messages: list):
    """Add chat messages to a Redis List"""
    return await index.client.rpush(f"messages:{user_id}", *[json.dumps(msg) for msg in messages])

async def clear_history(index: AsyncSearchIndex, user_id: str):
    """Clear session chat"""
    await index.client.delete(f"messages:{user_id}")

async def answer_question(index: AsyncSearchIndex, query: str):
    """Answer the user's question with historical context and caching baked-in"""
    query_vector = hf.embed(query)
    # Check the cache with the vector
    if result := llmcache.check(vector=query_vector):
        answer = result[0]['response']
    else:
        context = await retrieve_context(index, query_vector)
        messages = await get_messages(index, "tyler")
        messages += [{"role": "user", "content": promptify(query, context)}]
        # Response provided by GPT-3.5
        response = await openai.AsyncClient().chat.completions.create(
            model=CHAT_MODEL,
            messages=messages,
            temperature=0.1,
            seed=42
        )
        answer = response.choices[0].message.content
    # Add message history
    await add_messages(index, "tyler", [
        {"role": "user", "content": query},
        {"role": "assistant", "content": answer}
    ])
    return answer
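
Because every turn appends to the list, the history sent to the model grows without bound. One possible mitigation, sketched below in plain Python, is to keep the system prompt plus only the most recent messages before calling the model. The helper `trim_history` and the sample messages are hypothetical and are not part of the notebook's pipeline.

```python
def trim_history(messages: list, max_messages: int) -> list:
    """Keep all system messages plus the last max_messages other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Guard against max_messages == 0, since rest[-0:] would return everything
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "second question"},
    {"role": "assistant", "content": "second answer"},
]
for message in trim_history(history, max_messages=2):
    print(message["role"], "-", message["content"])
```

The full history can stay in Redis while only the trimmed view is passed as `messages`.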

In [None]:
# Setup Session
await clear_history(index, "tyler")
await add_messages(index, "tyler", [{"role": "system", "content": SYSTEM_PROMPT}])

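# Simple Chat (submit an empty line to end the session)
while True:
    query = input()
    # input() never returns None, so check for an empty line instead
    if not query.strip():
        break
    answer = await answer_question(index, query)
    print(answer, flush=True)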


## Your Next Steps

While a good start, there is still more to do. For example:
- We could use message history to rewrite the query before retrieval. Otherwise, there can be a disconnect between what a user is asking in context and what the same words mean in isolation.
- We could use semantic properties of the message history to fetch only the relevant parts of the conversation (vector search).
- We could use a technique like HyDE to improve retrieval quality from raw user input to source documents.

## Cleanup

Clean up the database.

In [None]:
await index.client.flushall()

Now that you have tried the easy-to-use RedisVL client, try your hand with LangChain -- the highest level of abstraction for using and integrating Redis as a vector database.


<a href="https://colab.research.google.com/github/Redislabs-Solution-Architects/financial-vss/blob/main/langchain-03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>