# Vector Similarity Search with RedisVL

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

This notebook uses [RedisVL](https://redisvl.com), a dedicated Python client library for using Redis as a vector database, to index documents with their embeddings and to run semantic search tasks.

## Install Python Dependencies

In [None]:
# quote the version specifier so the shell does not treat ">" as an output redirect
!pip install -q redis "redisvl>=0.0.4" numpy sentence-transformers

## Load Document Chunks and Embeddings
**You are expected to have first run the Setup / Data Prep Notebook**

In [2]:
import os
import json

data_path = "notebooks/resources/"

with open(os.path.join(data_path, "embeddings.json"), "r") as f:
    chunk_embeddings = json.load(f)

with open(os.path.join(data_path, "docs.json"), "r") as f:
    chunks = json.load(f)
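
Because the two files are loaded separately, it can help to confirm that they line up before indexing. The following is a minimal sketch, assuming `docs.json` holds a list of chunk dictionaries and `embeddings.json` holds one list of floats per chunk; `check_alignment` and the toy data are hypothetical names used only for illustration.

```python
def check_alignment(chunks, chunk_embeddings):
    """Return the shared embedding dimension, or raise ValueError on a mismatch."""
    if len(chunks) != len(chunk_embeddings):
        raise ValueError(
            f"{len(chunks)} chunks but {len(chunk_embeddings)} embeddings"
        )
    # Every embedding should have the same length
    dims = {len(vec) for vec in chunk_embeddings}
    if len(dims) != 1:
        raise ValueError(f"embeddings have inconsistent dimensions: {sorted(dims)}")
    return dims.pop()


# Toy stand-ins for the contents of docs.json and embeddings.json
toy_chunks = [{"page_content": "a"}, {"page_content": "b"}]
toy_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
toy_dims = check_alignment(toy_chunks, toy_embeddings)
```

In the notebook itself the call would be `check_alignment(chunks, chunk_embeddings)`, and the returned value can be compared with the `dims` entry of the index schema.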

## Install Redis Stack (OPTIONAL)

Redis Search will be used as the vector similarity search engine behind RedisVL.

Instead of running Redis Stack inside the notebook (https://redis.io/docs/getting-started/install-stack/), you can provision your own free Redis instance in the cloud at https://redis.com/try-free/

In [None]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

### Connect to Redis

By default, this notebook connects to the local instance of Redis Stack. If you have your own Redis Cloud instance, replace the REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [1]:
# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#REDIS_HOST="redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
#REDIS_PORT=18374
#REDIS_PASSWORD="1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# shortcut for the redis-cli $REDIS_CONN command
if REDIS_PASSWORD != "":
    os.environ["REDIS_CONN"] = f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
    os.environ["REDIS_CONN"] = f"-h {REDIS_HOST} -p {REDIS_PORT}"

REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
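
The URL above always includes a `:password@` segment, even when the password is empty, and it does not escape characters such as `@` or `/` in a password. A hedged alternative is sketched below; `build_redis_url` is a hypothetical helper, and it assumes the client percent-decodes credentials when it parses the URL.

```python
from urllib.parse import quote


def build_redis_url(host, port, password=""):
    """Build a redis:// URL, adding credentials only when a password is set."""
    if password:
        # Percent-encode the password so reserved characters cannot break the URL
        return f"redis://:{quote(password, safe='')}@{host}:{port}"
    return f"redis://{host}:{port}"


local_url = build_redis_url("localhost", "6379")
cloud_url = build_redis_url("example-host", 18374, "p@ss/word")
```

Here `example-host` and `p@ss/word` are placeholder values, not real connection details.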


## Create the index from schema

In [3]:
from redisvl.index import SearchIndex # can also use AsyncSearchIndex for prod use cases

index_name = "redisvl"

schema = {
    "index": {
        "name": index_name,
        "prefix": f"doc:{index_name}"
    },
    "fields": {
        "text": [{"name": "content"}],
        "vector": [{
            "name": "chunk_vector",
            "dims": 1536,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }]
    },
}

# construct a search index from the schema
index = SearchIndex.from_dict(schema) # or SearchIndex.from_yaml("schema.yaml") for yaml files

# connect to local redis instance
index.connect(REDIS_URL)

# create the index (no data yet)
index.create(overwrite=True)

In [4]:
# use the CLI to see the created index
!rvl index listall

14:03:58 [RedisVL] INFO   Indices:
14:03:58 [RedisVL] INFO   1. langchain
14:03:58 [RedisVL] INFO   2. redisvl


In [5]:
!rvl index info -i redisvl



Index Information:
╭──────────────┬────────────────┬─────────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes        │ Index Options   │   Indexing │
├──────────────┼────────────────┼─────────────────┼─────────────────┼────────────┤
│ redisvl      │ HASH           │ ['doc:redisvl'] │ []              │          0 │
╰──────────────┴────────────────┴─────────────────┴─────────────────┴────────────╯
Index Fields:
╭──────────────┬──────────────┬────────┬────────────────┬────────────────╮
│ Name         │ Attribute    │ Type   │ Field Option   │   Option Value │
├──────────────┼──────────────┼────────┼────────────────┼────────────────┤
│ content      │ content      │ TEXT   │ WEIGHT         │              1 │
│ chunk_vector │ chunk_vector │ VECTOR │                │                │
╰──────────────┴──────────────┴────────┴────────────────┴────────────────╯


### Process and load data using RedisVL

In [7]:
# load expects an iterable of dictionaries
import numpy as np

data = [
    {
        'content': chunk['page_content'],
        'chunk_vector': np.array(chunk_embeddings[i]).astype(np.float32).tobytes()
    } for i, chunk in enumerate(chunks)
]

index.load(data)
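
The `chunk_vector` field is stored as a raw byte string of 32-bit floats, so a 1536-dimensional vector occupies 1536 × 4 = 6144 bytes. The sketch below reproduces that packing with the standard library only, to show what the bytes contain. It assumes little-endian byte order, which is what NumPy's `tobytes()` yields on typical x86 and ARM machines; `to_float32_bytes` and `from_float32_bytes` are hypothetical helper names.

```python
import struct


def to_float32_bytes(vector):
    """Pack a sequence of floats as little-endian float32 values."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(blob):
    """Unpack a float32 byte string back into a list of Python floats."""
    count = len(blob) // 4  # each float32 takes 4 bytes
    return list(struct.unpack(f"<{count}f", blob))


blob = to_float32_bytes([0.5, -1.25, 2.0])
restored = from_float32_bytes(blob)
```

The same round trip can be used to decode a `chunk_vector` value read back from Redis when debugging.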

In [8]:
# access the underlying client to check the data in Redis
index.client.dbsize()

646

In [12]:
# do NOT run KEYS in production: it walks the entire keyspace and blocks the server while it runs
keys = index.client.keys()

index.client.hgetall(keys[0])

{b'chunk_vector': b'+v\x9e\xba\x8097\xbd\x7fKL<\xaa\xf5 \xbd\xf0i\x88\xbc\x188\x07\xbc\x93%~\xbcu\xd1e\xbcY\xe2-\xbaT\xbb\x92\xbb\xa5+`<QV2\xbcL\xa6\x8c<\x8b*\x13<W\xa2\xf2;\xe0\xc8\x86;<\x05\x8b<y\xff\x0b\xbd\xa2\xe0\x9a<\xb0\x93\xb1\xbb6\x8c\x1f\xbdq\x98\xb5:9\x8d\x1a\xbdQ{W<\xb8\xa87;f]\x0f\xbb\xa6X\x0b=A\x19\x16\xbd\xa0V\x15\xbc\xa2*\xe5:Vj=\xbcx\xad\xbb\xbb\xaa\xf5 \xba\xbb\x84\r<\xbc\x97\x9d\xbc\x15\x1d\xf19s\xe3\xfa<Z\xbd\x08<\x031\xaa<Q\xa0\xfc\xbb)\x88\xb3<.\x9c>\xbb\x1e\xd6\x97\xbb\x0c4\x9b\xbc\x8d\xb4\x18;\xc6\x11\x84<\x89a\xcd<\x96\xdc.\xbc\x00\xf1n<Ky\xe1<\xdb=\x06\xbc\x9c\x03J=\xe7\xb0\xe1\xbct\x10\xa6\xbb\xb3o\x07\xbc\xaal\x96\xbc\xbeF\xc89l\xa9O<.w\x99\xbc\x15\x1dq;\x08\xbc\xaa\xbb\xc5\x9a\x0e\xbb\xb4\x0b"\xbdl_\x05=<*0\xbd(6\xe3\xbb\xef<]\xbc\xe0Q\x11;\xa7\xf4%\xba\xac\x7f\xa6<\x9f\x04E=\xd4\xe9\xbf<\xafAa<\xa3|\xb5\xb7\xe0\xed\xab<\x7f&\'\xbc\xeb\xde\x07\xbcU\xf3G;!\xfc\xb7<\xa8\x076<\xac@\xe6:\x9bg/\xbb\x7f&\xa7\xbc\x1f\x97W<k\x96\xbf<z7\xc1<\x01\x1e\x9a\xbc\\\xe3(;\
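
For larger databases, keys are usually listed with the incremental `SCAN` command instead of `KEYS`. The sketch below assumes a client that exposes `scan_iter(match=..., count=...)`, as redis-py clients do; `iter_keys` is a hypothetical helper, and `FakeClient` is an in-memory stand-in so that the example runs without a Redis server.

```python
import fnmatch


def iter_keys(client, prefix, batch_size=100):
    """Yield keys that start with `prefix` using incremental SCAN."""
    yield from client.scan_iter(match=f"{prefix}*", count=batch_size)


class FakeClient:
    """Hypothetical stand-in that mimics scan_iter over an in-memory key list."""

    def __init__(self, keys):
        self._keys = list(keys)

    def scan_iter(self, match=None, count=None):
        for key in self._keys:
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key


fake = FakeClient(["doc:redisvl:a", "doc:redisvl:b", "other:1"])
found = sorted(iter_keys(fake, "doc:redisvl"))
```

Against the notebook's index, the equivalent call would be `iter_keys(index.client, "doc:redisvl")`.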

### Initialize embeddings engine

Use a RedisVL vectorizer to create embeddings for the query text.

In [11]:
from redisvl.vectorize.text import HFTextVectorizer

# create a vectorizer
hf = HFTextVectorizer(model="sentence-transformers/all-MiniLM-L6-v2")
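
A query vector must have the same number of dimensions as the indexed field. The schema above declares 1536 dimensions for `chunk_vector`, while the model card for `all-MiniLM-L6-v2` lists 384-dimensional output, so it may be worth confirming that the vectorizer matches the model used in the data prep notebook. The check below is a minimal sketch; `assert_dims_match` and `toy_schema` are hypothetical and only mirror the shape of the notebook's schema dictionary.

```python
def assert_dims_match(schema, vector, field_name):
    """Raise ValueError if `vector` lacks the dims declared for `field_name`."""
    for field in schema["fields"]["vector"]:
        if field["name"] == field_name:
            if field["dims"] != len(vector):
                raise ValueError(
                    f"{field_name} expects {field['dims']} dims, got {len(vector)}"
                )
            return field["dims"]
    raise KeyError(f"no vector field named {field_name!r}")


# Toy schema with the same structure as the notebook's schema dict
toy_schema = {"fields": {"vector": [{"name": "chunk_vector", "dims": 4}]}}
matched_dims = assert_dims_match(toy_schema, [0.0, 0.1, 0.2, 0.3], "chunk_vector")
```

In the notebook, the call would look like `assert_dims_match(schema, hf.embed("test"), "chunk_vector")`.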

### Query the database
Now we can use the RedisVL index to perform similarity search operations with Redis.

In [21]:
from redisvl.query import (
    VectorQuery, RangeQuery, FilterQuery
)
from redisvl.query.filter import Text


query = "Profit margins"
v = VectorQuery(
    vector=hf.embed(query),
    vector_field_name="chunk_vector",
    num_results=4,
    return_fields=["content"],
    return_score=True
)

# show the raw redis query
str(v)

'*=>[KNN 4 @chunk_vector $vector AS vector_distance] RETURN 2 content vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 4'

In [16]:
# execute the query with RedisVL
index.query(v)

[{'id': 'doc:redisvl:864c193c06b04f089f7de0280b1a061e',
  'vector_distance': '0.178667545319',
  'content': 'Inventories as of May 31, 2023 were $8.5 billion, flat compared to the prior year, driven by the actions we took throughout fiscal 2023 to manage inventory levels\n\nWe returned $7.5 billion to our shareholders in fiscal 2023 through share repurchases and dividends\n\nReturn on Invested Capital ("ROIC") as of May 31, 2023 was 31.5% compared to 46.5% as of May 31, 2022. ROIC is considered a non-GAAP financial measure, see "Use of Non-GAAP Financial Measures" for further information.\n\nFor discussion related to the results of operations and changes in financial condition for fiscal 2022 compared to fiscal 2021 refer to Part II, Item 7. Management\'s Discussion and Analysis of Financial Condition and Results of Operations in our fiscal 2022 Form 10-K, which was filed with the United States Securities and Exchange Commission on July 21, 2022.\n\nCURRENT ECONOMIC CONDITIONS AND MARK
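
Note that `vector_distance` comes back as a string. With the `cosine` metric, Redis reports distance as 1 minus cosine similarity, so smaller values mean closer matches. The sketch below converts the field to a float and derives a similarity score; `with_similarity` and the sample row are hypothetical and not taken from the actual results.

```python
def with_similarity(results):
    """Return copies of result rows with a float distance and a similarity score."""
    enriched = []
    for row in results:
        distance = float(row["vector_distance"])
        enriched.append(
            {**row, "vector_distance": distance, "similarity": 1.0 - distance}
        )
    return enriched


# Hypothetical row shaped like the output of index.query(v)
sample = [{"id": "doc:redisvl:example", "vector_distance": "0.25", "content": "..."}]
parsed = with_similarity(sample)
```

In the notebook this would be applied as `with_similarity(index.query(v))`.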

In [19]:
# or use the search API directly from Redis
# NOTE: the .query() syntax handles results parsing on your behalf

result = index.search(v.query, v.params)

[doc.__dict__ for doc in result.docs]

Result{4 total, docs: [Document {'id': 'doc:redisvl:864c193c06b04f089f7de0280b1a061e', 'payload': None, 'vector_distance': '0.178667545319', 'content': 'Inventories as of May 31, 2023 were $8.5 billion, flat compared to the prior year, driven by the actions we took throughout fiscal 2023 to manage inventory levels\n\nWe returned $7.5 billion to our shareholders in fiscal 2023 through share repurchases and dividends\n\nReturn on Invested Capital ("ROIC") as of May 31, 2023 was 31.5% compared to 46.5% as of May 31, 2022. ROIC is considered a non-GAAP financial measure, see "Use of Non-GAAP Financial Measures" for further information.\n\nFor discussion related to the results of operations and changes in financial condition for fiscal 2022 compared to fiscal 2021 refer to Part II, Item 7. Management\'s Discussion and Analysis of Financial Condition and Results of Operations in our fiscal 2022 Form 10-K, which was filed with the United States Securities and Exchange Commission on July 21, 2

In [20]:
# vector search with metadata filtering

f = Text("content") % "profit"
v.set_filter(f)

index.query(v)

[{'id': 'doc:redisvl:5bd0ffbc3ecb4ce990c367650938e7dd',
  'vector_distance': '0.184492409229',
  'content': 'NIKE Brand apparel revenues increased 8% on a currency-neutral basis, primarily due to higher revenues in Men\'s. Unit sales of apparel increased 4%, while higher ASP per unit contributed approximately 4 percentage points of apparel revenue growth. Higher ASP was primarily due to higher full-price ASP and growth in the size of our NIKE Direct business, partially offset by lower NIKE Direct ASP, reflecting higher promotional activity.\n\nNIKE Direct revenues increased 14% from $18.7 billion in fiscal 2022 to $21.3 billion in fiscal 2023. On a currency-neutral basis, NIKE Direct revenues increased 20% primarily driven by NIKE Brand Digital sales growth of 24%, comparable store sales growth of 14% and the addition of new stores. For further information regarding comparable store sales, including the definition, see "Comparable Store Sales". NIKE Brand Digital sales were $12.6 billi

In [22]:
# Perform a standard text (lexical) search

fq = FilterQuery(return_fields=["content"], filter_expression=f, num_results=4)

# inspect raw redis query
str(fq)

'@content:profit RETURN 1 content DIALECT 2 LIMIT 0 4'

In [23]:
index.query(fq)

[{'id': 'doc:redisvl:cc1c27a4542a491da977012192411a0e',
  'content': 'Proposals to reform U.S. and foreign tax laws could significantly impact how U.S. multinational corporations are taxed on global earnings and could increase the U.S. corporate tax rate. For example, the Organization for Economic Co-operation and Development (OECD) and the G20 Inclusive Framework on Base Erosion and Profit Shifting (the "Inclusive Framework") has put forth two proposals—Pillar One and Pillar Two—that revise the existing profit allocation and nexus rules and ensure a minimal level of taxation, respectively. On December 12, 2022, the European Union member states agreed to implement the Inclusive Framework\'s global corporate minimum tax rate of 15%. Other countries are also actively considering changes to their tax laws to adopt certain parts of the Inclusive Framework\'s proposals. Although we cannot predict whether or in what form these proposals will be enacted into law, these changes, if enacted int

In [30]:
# Perform a Range Query!

rq = RangeQuery(
    vector=hf.embed(query),
    vector_field_name="chunk_vector",
    num_results=4,
    return_fields=["content"],
    return_score=True,
    distance_threshold=0.18  # find all items with a semantic distance of less than 0.18
)


# inspect query
str(rq)

'@chunk_vector:[VECTOR_RANGE $distance_threshold $vector]=>{$yield_distance_as: vector_distance} RETURN 2 content vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 4'
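
The difference between the two query types is that a KNN query always returns the `num_results` closest items, while a range query returns only items inside the distance threshold, up to `num_results`. The brute-force sketch below illustrates that difference on toy vectors. It is not how Redis executes these queries (the index here uses HNSW), and whether an item at exactly the threshold is included is an assumption.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn(query, docs, k):
    """Return the k nearest (distance, doc_id) pairs, closest first."""
    scored = sorted((cosine_distance(query, vec), doc_id) for doc_id, vec in docs.items())
    return scored[:k]


def within_range(query, docs, threshold, limit):
    """Return up to `limit` pairs whose distance is below `threshold`."""
    scored = sorted((cosine_distance(query, vec), doc_id) for doc_id, vec in docs.items())
    return [(dist, doc_id) for dist, doc_id in scored if dist < threshold][:limit]


toy_docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
nearest = knn([1.0, 0.0], toy_docs, k=2)
close = within_range([1.0, 0.0], toy_docs, threshold=0.5, limit=2)
```

With these toy vectors, `nearest` contains two items even though the second is far from the query, while `close` contains only the one item inside the threshold.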

In [31]:
index.query(rq)

[{'id': 'doc:redisvl:864c193c06b04f089f7de0280b1a061e',
  'vector_distance': '0.178760826588',
  'content': 'Inventories as of May 31, 2023 were $8.5 billion, flat compared to the prior year, driven by the actions we took throughout fiscal 2023 to manage inventory levels\n\nWe returned $7.5 billion to our shareholders in fiscal 2023 through share repurchases and dividends\n\nReturn on Invested Capital ("ROIC") as of May 31, 2023 was 31.5% compared to 46.5% as of May 31, 2022. ROIC is considered a non-GAAP financial measure, see "Use of Non-GAAP Financial Measures" for further information.\n\nFor discussion related to the results of operations and changes in financial condition for fiscal 2022 compared to fiscal 2021 refer to Part II, Item 7. Management\'s Discussion and Analysis of Financial Condition and Results of Operations in our fiscal 2022 Form 10-K, which was filed with the United States Securities and Exchange Commission on July 21, 2022.\n\nCURRENT ECONOMIC CONDITIONS AND MARK

In [37]:
# Add filter to range query

rq.set_filter(f)
rq.set_distance_threshold(0.19)

index.query(rq)

[{'id': 'doc:redisvl:5bd0ffbc3ecb4ce990c367650938e7dd',
  'vector_distance': '0.184509694576',
  'content': 'NIKE Brand apparel revenues increased 8% on a currency-neutral basis, primarily due to higher revenues in Men\'s. Unit sales of apparel increased 4%, while higher ASP per unit contributed approximately 4 percentage points of apparel revenue growth. Higher ASP was primarily due to higher full-price ASP and growth in the size of our NIKE Direct business, partially offset by lower NIKE Direct ASP, reflecting higher promotional activity.\n\nNIKE Direct revenues increased 14% from $18.7 billion in fiscal 2022 to $21.3 billion in fiscal 2023. On a currency-neutral basis, NIKE Direct revenues increased 20% primarily driven by NIKE Brand Digital sales growth of 24%, comparable store sales growth of 14% and the addition of new stores. For further information regarding comparable store sales, including the definition, see "Comparable Store Sales". NIKE Brand Digital sales were $12.6 billi

## Cleanup

Delete the index, along with the documents stored under it.

In [38]:
index.delete(drop=True)