![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Vector Search with RedisVL
## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/01_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Prepare data

In this example we will load a list of movie objects with the following attributes: `title`, `rating`, `description`, and `genre`.

For the vector part of our vector search, we will embed the description so that users can search for movies that best match what they're looking for.

**If you are running this notebook locally**, you may not need to perform this step at all.

In [1]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/vector-search/resources .
!rm -rf temp_repo

Cloning into 'temp_repo'...
remote: Enumerating objects: 384, done.
remote: Counting objects: 100% (247/247), done.
remote: Compressing objects: 100% (159/159), done.
remote: Total 384 (delta 135), reused 153 (delta 74), pack-reused 137 (from 1)
Receiving objects: 100% (384/384), 64.50 MiB | 15.56 MiB/s, done.
Resolving deltas: 100% (159/159), done.


## Packages

In [2]:
%pip install -q redis "redisvl>=0.4.1" numpy sentence-transformers pandas

## Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from the movie descriptions. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [3]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


#### For Alternative Environments
There are several ways to get the necessary Redis Stack instance running:
1. In the cloud, deploy a [free instance of Redis](https://redis.com/try-free/). Or, if you have your
own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default, this notebook connects to a local instance of Redis Stack. **If you have your own Redis Enterprise instance**, replace the `REDIS_PASSWORD`, `REDIS_HOST`, and `REDIS_PORT` values with your own.

In [2]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
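
The URL above assumes a plain `redis://` endpoint and a password without special characters. As a minimal sketch, a small helper (hypothetical, `build_redis_url` is not part of redis-py or RedisVL) could switch to `rediss://` for SSL endpoints and percent-encode the password. It assumes the client decodes percent-encoded credentials when it parses the URL.

```python
from urllib.parse import quote


def build_redis_url(host, port, password="", ssl=False):
    """Hypothetical helper: assemble a Redis connection URL from its parts."""
    # rediss:// (two s's) is the scheme for SSL/TLS endpoints
    scheme = "rediss" if ssl else "redis"
    # Percent-encode the password so characters such as "@" or ":" do not break the URL
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", 6379))
print(build_redis_url("example.host", 18374, password="p@ss:word", ssl=True))
```

The result can be passed to `Redis.from_url` in the same way as `REDIS_URL`.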

### Create Redis client

In [5]:
from redis import Redis

client = Redis.from_url(REDIS_URL)

### Load Data

In [5]:
import pandas as pd
import numpy as np
import json

df = pd.read_json("resources/movies.json")
df.head()


Unnamed: 0,title,genre,rating,description
0,Explosive Pursuit,action,7,A daring cop chases a notorious criminal acros...
1,Skyfall,action,8,James Bond returns to track down a dangerous n...
2,Fast & Furious 9,action,6,Dom and his crew face off against a high-tech ...
3,Black Widow,action,7,Natasha Romanoff confronts her dark past and f...
4,John Wick,action,8,A retired hitman seeks vengeance against those...


In [3]:
from redisvl.utils.vectorize import HFTextVectorizer

hf = HFTextVectorizer("sentence-transformers/all-MiniLM-L6-v2")
os.environ["TOKENIZERS_PARALLELISM"] = "false"


In [6]:

def texts_to_embeddings(texts):
  return [np.array(embedding, dtype=np.float32).tobytes() for embedding in hf.embed_many(texts)]

# Generate vector embeddings
df["vector"] = texts_to_embeddings(df["description"].tolist())
df.head()

Unnamed: 0,title,genre,rating,description,vector
0,Explosive Pursuit,action,7,A daring cop chases a notorious criminal acros...,b'\x9bf|=\xe8`\n;\x17\x92\xb7;7\xcb~\xbd\xf5d\...
1,Skyfall,action,8,James Bond returns to track down a dangerous n...,b'\x9eD\x9e\xbd]\x9b\x89\xbc\xb4\x16\x95\xbc\x...
2,Fast & Furious 9,action,6,Dom and his crew face off against a high-tech ...,"b'\x18\xa5\xc7\xbc\xfb,\xa2=&\x19H\xbcK\xc6t\x..."
3,Black Widow,action,7,Natasha Romanoff confronts her dark past and f...,b's\xeb\x85\xbd\x0e\xcdo\xbd\x1f\xe9\xc2\xbbF\...
4,John Wick,action,8,A retired hitman seeks vengeance against those...,b'2<x\xbb\xf7.\xc5=\x19\x87:;\xc1\xd0\x94<\x8a...
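
The `vector` column holds raw bytes because each embedding is stored as a packed array of 32-bit floats. The sketch below illustrates that encoding and its round trip using only the standard library `struct` module instead of NumPy. The little-endian byte order and the helper names are assumptions made for illustration.

```python
import struct

DIMS = 384  # same value as the "dims" attribute used for the index in this notebook


def to_float32_bytes(vector):
    """Pack a list of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(blob):
    """Unpack little-endian float32 bytes back into a list of floats."""
    count = len(blob) // 4  # each float32 occupies 4 bytes
    return list(struct.unpack(f"<{count}f", blob))


vector = [0.5] * DIMS  # placeholder values, not a real embedding
blob = to_float32_bytes(vector)
restored = from_float32_bytes(blob)

print(len(blob))      # 384 floats * 4 bytes each
print(restored[:3])
```

A 384-dimensional float32 vector therefore takes 1536 bytes per document, which is useful when estimating memory for larger datasets.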


## Define Redis index schema

In [9]:
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex

index_name = "movies"

schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
  },
  "fields": [
    {
        "name": "title",
        "type": "text",
    },
    {
        "name": "description",
        "type": "text",
    },
    {
        "name": "genre",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "rating",
        "type": "numeric",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "vector",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
})


index = SearchIndex(schema, client)
index.create(overwrite=True, drop=True)
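
The `dims` value in the schema has to agree with the length of the embeddings produced earlier (384 for `all-MiniLM-L6-v2`). As a rough sanity check before loading data, the byte length of a stored vector can be compared against the schema. The helpers below are hypothetical and operate on a plain dictionary shaped like the one passed to `IndexSchema.from_dict`.

```python
FLOAT32_SIZE = 4  # bytes per float32 value


def vector_field_dims(schema_dict, field_name="vector"):
    """Return the declared dimensionality of a vector field in a schema dict."""
    for field in schema_dict["fields"]:
        if field["name"] == field_name:
            return field["attrs"]["dims"]
    raise KeyError(f"no field named {field_name!r} in schema")


def blob_matches_schema(schema_dict, blob, field_name="vector"):
    """Check that a float32 byte blob has the length the schema expects."""
    return len(blob) == vector_field_dims(schema_dict, field_name) * FLOAT32_SIZE


schema_dict = {
    "index": {"name": "movies"},
    "fields": [
        {"name": "title", "type": "text"},
        {
            "name": "vector",
            "type": "vector",
            "attrs": {
                "dims": 384,
                "distance_metric": "cosine",
                "algorithm": "hnsw",
                "datatype": "float32",
            },
        },
    ],
}

print(blob_matches_schema(schema_dict, bytes(384 * 4)))  # blob of the expected size
print(blob_matches_schema(schema_dict, bytes(768 * 4)))  # blob from a larger model
```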

## Populate index

In [10]:
index.load(df.to_dict(orient="records"))

['rvl:c2f305b7836a40bdafb7223184e26b5b',
 'rvl:59d2b0bd431148e9b2c5516b8417eb75',
 'rvl:299b8027b07446fda39ca8bf37789776',
 'rvl:6406f177887f401b9ab4d45956e324ae',
 'rvl:d051b98b40f64279b9dad13f7d4666ce',
 'rvl:55a20aec9cb74e33ae9e2d8952bcde25',
 'rvl:a5ef870f9a5b41b38e950ea0fa7a9f83',
 'rvl:7dd3e91a505a40b8bd1cf50613e4179c',
 'rvl:4545e847159f441d843611f05df01983',
 'rvl:646f249d9ff646e7ae5b757cd6caad0a',
 'rvl:5cbf956aa543497c8e0a4e2a08e7fa53',
 'rvl:e3d4efbae7bd49e49300d0c0e1e54426',
 'rvl:2dbf334c02854aaab87c196f47cb2729',
 'rvl:1dfb97323ddc47fdbc07aaa299658c99',
 'rvl:ca2c31d6ce8740ca9e2ce711c0d9ef56',
 'rvl:d9ba373d3a174ed1a08d096de012f56d',
 'rvl:1e52ad75779046578e9b05499f78ec92',
 'rvl:19415f0c3b2646b3abbf994762486e4f',
 'rvl:ac862266170c4ee8942bf76791056295',
 'rvl:9161346ce4b645dca6833386cc05ade4']

In [11]:
index.info()

{'index_name': 'movies',
 'index_options': [],
 'index_definition': ['key_type',
  'HASH',
  'prefixes',
  ['rvl'],
  'default_score',
  '1'],
 'attributes': [['identifier',
   'title',
   'attribute',
   'title',
   'type',
   'TEXT',
   'WEIGHT',
   '1'],
  ['identifier',
   'description',
   'attribute',
   'description',
   'type',
   'TEXT',
   'WEIGHT',
   '1'],
  ['identifier',
   'genre',
   'attribute',
   'genre',
   'type',
   'TAG',
   'SEPARATOR',
   ',',
   'SORTABLE'],
  ['identifier',
   'rating',
   'attribute',
   'rating',
   'type',
   'NUMERIC',
   'SORTABLE',
   'UNF'],
  ['identifier',
   'vector',
   'attribute',
   'vector',
   'type',
   'VECTOR',
   'algorithm',
   'HNSW',
   'data_type',
   'FLOAT32',
   'dim',
   384,
   'distance_metric',
   'COSINE',
   'M',
   16,
   'ef_construction',
   200]],
 'num_docs': 20,
 'max_doc_id': 20,
 'num_terms': 432,
 'num_records': 583,
 'inverted_sz_mb': '0.043068885803222656',
 'vector_index_sz_mb': '1.7178497314453125

## Index loaded; now we can perform vector search

### Basic vector search

In [12]:
from redisvl.query import VectorQuery

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True
)

result = index.query(vec_query)
pd.DataFrame(result)



Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:299b8027b07446fda39ca8bf37789776,0.685773432255,Fast & Furious 9,6,action
1,rvl:19415f0c3b2646b3abbf994762486e4f,0.801602959633,Despicable Me,7,comedy
2,rvl:d9ba373d3a174ed1a08d096de012f56d,0.812341988087,The Incredibles,8,comedy
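
The `vector_distance` column is a cosine distance, that is, one minus the cosine similarity, so lower values mean closer matches. To make the number concrete, here is a brute-force sketch in plain Python. It is exact and only meant for illustration, whereas the index above uses HNSW, which is approximate. The toy vectors and helper names are assumptions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, docs, k):
    """Score every document against the query and keep the k closest."""
    scored = [(title, cosine_distance(query, vec)) for title, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-dimensional "embeddings" chosen so the distances are easy to verify by hand
docs = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}
query = [1.0, 0.0]

for title, distance in top_k(query, docs, k=3):
    print(f"{title}: {distance:.1f}")
```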


### Hybrid filter vector search

Redis lets you combine vector search with filters on other indexed fields (tags, numeric values, and full text), which makes searches more specific.

In [13]:
# Search for top 3 movies specifically in the action genre

from redisvl.query.filter import Tag

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True
)

tag_filter = Tag("genre") == "action"

vec_query.set_filter(tag_filter)

result=index.query(vec_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:299b8027b07446fda39ca8bf37789776,0.685773432255,Fast & Furious 9,6,action
1,rvl:55a20aec9cb74e33ae9e2d8952bcde25,0.820429563522,Mad Max: Fury Road,8,action
2,rvl:c2f305b7836a40bdafb7223184e26b5b,0.851705253124,Explosive Pursuit,7,action


In [14]:
# Search for top 3 movies specifically in the action genre with ratings at or above a 7

from redisvl.query.filter import Num

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

tag_filter = Tag("genre") == "action"
num_filter = Num("rating") >= 7
combined_filter = tag_filter & num_filter

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True,
    filter_expression=combined_filter
)

result = index.query(vec_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:55a20aec9cb74e33ae9e2d8952bcde25,0.820429563522,Mad Max: Fury Road,8,action
1,rvl:c2f305b7836a40bdafb7223184e26b5b,0.851705253124,Explosive Pursuit,7,action
2,rvl:646f249d9ff646e7ae5b757cd6caad0a,0.856359839439,The Avengers,8,action
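
Under the hood, a filtered vector query is expressed in the Redis query syntax as a filter clause followed by a KNN clause. The sketch below builds such a string by hand to show its general shape. The helper functions are hypothetical, and the exact string RedisVL generates may differ in details.

```python
def tag_eq(field, value):
    """Tag filter: exact match on a TAG field."""
    return f"@{field}:{{{value}}}"


def num_gte(field, value):
    """Numeric filter: field >= value, written as a range up to +inf."""
    return f"@{field}:[{value} +inf]"


def all_of(*clauses):
    """Combine clauses with AND (a space between clauses means intersection)."""
    return "(" + " ".join(clauses) + ")"


def knn(filter_clause, k, vector_field, param="vector", alias="vector_distance"):
    """Append a KNN clause that ranks the filtered documents by vector distance."""
    return f"{filter_clause}=>[KNN {k} @{vector_field} ${param} AS {alias}]"


query_string = knn(
    all_of(tag_eq("genre", "action"), num_gte("rating", 7)),
    k=3,
    vector_field="vector",
)
print(query_string)
```

The `$vector` placeholder is a query parameter: the embedding bytes are sent separately rather than being embedded in the query string.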


In [15]:
# Search with full text search for movies that directly mention "criminal mastermind" in the description

from redisvl.query.filter import Text

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

text_filter = Text("description") == "criminal mastermind"

vec_query.set_filter(text_filter)

result = index.query(vec_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre,description
0,rvl:19415f0c3b2646b3abbf994762486e4f,0.801602959633,Despicable Me,7,comedy,When a criminal mastermind uses a trio of orph...
1,rvl:a5ef870f9a5b41b38e950ea0fa7a9f83,0.982345640659,The Dark Knight,9,action,"Batman faces off against the Joker, a criminal..."


In [16]:
# Vector search with wildcard match

from redisvl.query.filter import Text

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

text_filter = Text("description") % "crim*"

vec_query.set_filter(text_filter)

result = index.query(vec_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre,description
0,rvl:19415f0c3b2646b3abbf994762486e4f,0.801602959633,Despicable Me,7,comedy,When a criminal mastermind uses a trio of orph...
1,rvl:d9ba373d3a174ed1a08d096de012f56d,0.812341988087,The Incredibles,8,comedy,"A family of undercover superheroes, while tryi..."
2,rvl:c2f305b7836a40bdafb7223184e26b5b,0.851705253124,Explosive Pursuit,7,action,A daring cop chases a notorious criminal acros...


In [17]:
# Vector search with fuzzy match filter

from redisvl.query.filter import Text

user_query = "Movies with central main character"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

# Note: fuzzy matching is based on Levenshtein distance, so a search for "hero" may also match terms such as "her".
# See docs for more info https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/query_syntax/
text_filter = Text("description") % "%hero%"

vec_query.set_filter(text_filter)

result = index.query(vec_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre,description
0,rvl:646f249d9ff646e7ae5b757cd6caad0a,0.737778425217,The Avengers,8,action,Earth's mightiest heroes come together to stop...
1,rvl:6406f177887f401b9ab4d45956e324ae,0.768839836121,Black Widow,7,action,Natasha Romanoff confronts her dark past and f...
2,rvl:9161346ce4b645dca6833386cc05ade4,0.897787928581,The Princess Diaries,6,comedy,Mia Thermopolis has just found out that she is...
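
Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one term into another. In the Redis query syntax, one pair of `%` signs around a term allows a distance of 1. The following is a plain-Python sketch of the metric, with a made-up word list, to show why "her" can match a fuzzy search for "hero".

```python
def levenshtein(a, b):
    """Edit distance between two strings using the classic dynamic program."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


# Illustrative terms only; these are not taken from the movie descriptions
for term in ["hero", "her", "zero", "heart"]:
    print(term, levenshtein("hero", term))
```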


## Range queries

Range queries let you set a predefined distance threshold and return every document within it. This is helpful when you only want documents within a certain distance of the search query, rather than a fixed number of nearest neighbors.

In [18]:
from redisvl.query import RangeQuery

user_query = "Family friendly fantasy movies"

embedded_user_query = hf.embed(user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

result = index.query(range_query)
pd.DataFrame(result)



Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:d9ba373d3a174ed1a08d096de012f56d,0.644702494144,The Incredibles,8,comedy
1,rvl:6406f177887f401b9ab4d45956e324ae,0.747987031937,Black Widow,7,action
2,rvl:19415f0c3b2646b3abbf994762486e4f,0.750915527344,Despicable Me,7,comedy
3,rvl:ca2c31d6ce8740ca9e2ce711c0d9ef56,0.751298844814,Shrek,8,comedy
4,rvl:1e52ad75779046578e9b05499f78ec92,0.761669397354,"Monsters, Inc.",8,comedy
5,rvl:2dbf334c02854aaab87c196f47cb2729,0.778580188751,Aladdin,8,comedy
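
To illustrate how a threshold differs from a fixed result count, the sketch below reuses the titles and distances from the output above as plain Python data. It does not query Redis, and the helper names are hypothetical.

```python
# (title, vector_distance) pairs copied from the range query output above
results = [
    ("The Incredibles", 0.644702494144),
    ("Black Widow", 0.747987031937),
    ("Despicable Me", 0.750915527344),
    ("Shrek", 0.751298844814),
    ("Monsters, Inc.", 0.761669397354),
    ("Aladdin", 0.778580188751),
]


def within_threshold(scored, threshold):
    """Range-style selection: keep everything at or below the distance threshold."""
    return [(title, dist) for title, dist in scored if dist <= threshold]


def top_k(scored, k):
    """KNN-style selection: keep the k closest items regardless of distance."""
    return sorted(scored, key=lambda pair: pair[1])[:k]


print(len(within_threshold(results, 0.8)))                   # all six rows qualify
print([title for title, _ in within_threshold(results, 0.75)])
print([title for title, _ in top_k(results, 3)])
```

Tightening the threshold shrinks the result set, whereas a top-k query always returns `k` rows as long as enough documents exist.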


As with the queries above, we can chain additional filters and conditional operators onto range queries. The following adds an `and` condition so that the query returns only documents within the defined range that also have a rating at or above 8.

In [19]:
user_query = "Family friendly fantasy movies"

embedded_user_query = hf.embed(user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

numeric_filter = Num("rating") >= 8

range_query.set_filter(numeric_filter)

# here the numeric filter is applied together with the vector range in a single query
result = index.query(range_query)
pd.DataFrame(result)



Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:d9ba373d3a174ed1a08d096de012f56d,0.644702494144,The Incredibles,8,comedy
1,rvl:ca2c31d6ce8740ca9e2ce711c0d9ef56,0.751298844814,Shrek,8,comedy
2,rvl:1e52ad75779046578e9b05499f78ec92,0.761669397354,"Monsters, Inc.",8,comedy
3,rvl:2dbf334c02854aaab87c196f47cb2729,0.778580188751,Aladdin,8,comedy


### Next steps

For more query examples with RedisVL, [see the hybrid queries user guide](https://github.com/redis/redis-vl-python/blob/main/docs/user_guide/hybrid_queries_02.ipynb).

In [None]:
# clean up!
index.delete()