![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Vector Search with RedisVL
## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/01_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Prepare data

In this example, we will load a list of movie objects with the following attributes: `title`, `rating`, `description`, and `genre`.

For the vector part of our vector search, we will embed the description so that users can search for movies that best match what they're looking for.

**If you are running this notebook locally**, you may not need to perform this step at all.

In [1]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/vector-search/resources .
!rm -rf temp_repo

Cloning into 'temp_repo'...
remote: Enumerating objects: 473, done.
remote: Counting objects: 100% (225/225), done.
remote: Compressing objects: 100% (141/141), done.
remote: Total 473 (delta 146), reused 114 (delta 79), pack-reused 248 (from 2)
Receiving objects: 100% (473/473), 25.68 MiB | 7.02 MiB/s, done.
Resolving deltas: 100% (212/212), done.


## Packages

In [2]:
# NBVAL_SKIP
%pip install -q redis redisvl numpy sentence-transformers


## Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from the movie descriptions. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [3]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


#### For Alternative Environments
There are many ways to get the necessary Redis Stack instance running:
1. In the cloud, deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your
own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default, this notebook connects to the local instance of Redis Stack. **If you have your own Redis Enterprise instance**, replace the REDIS_PASSWORD, REDIS_HOST, and REDIS_PORT values with your own.

In [4]:
import os

# Replace the values below with your own if using a Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
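
The comment above mentions switching to `rediss://` when SSL is enabled. As a minimal sketch of that idea, the helper below assembles the URL from its parts. `build_redis_url` and its `use_ssl` flag are hypothetical names introduced here for illustration, not part of redis-py, and percent-encoding the password is a precaution for passwords that contain characters such as `@` or `/`.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: str, password: str = "", use_ssl: bool = False) -> str:
    """Assemble a Redis connection URL (hypothetical helper, not part of redis-py)."""
    # rediss:// (double "s") selects a TLS connection; redis:// is plain TCP.
    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so reserved characters do not break URL parsing.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", "6379"))
print(build_redis_url("example-host", "18374", "p@ss", use_ssl=True))
```

The first call omits the credentials section entirely when no password is set, which keeps the local default URL easy to read.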

### Create redis client

In [5]:
from redis import Redis

client = Redis.from_url(REDIS_URL)
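
Before loading data, it can help to confirm that a server is actually listening. With redis-py the usual check is `client.ping()`. The sketch below does a comparable check using only the standard library, so it can run before any packages are installed. `redis_is_reachable` is a hypothetical helper, and treating a `-NOAUTH` reply as "reachable" is an assumption made here so that password-protected servers still count as running.

```python
import socket


def redis_is_reachable(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Send an inline PING and report whether something Redis-like answers."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")   # Redis accepts plain inline commands
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors
        return False
    # +PONG: open server; -NOAUTH: server is up but expects a password first
    return reply.startswith((b"+PONG", b"-NOAUTH"))


print(redis_is_reachable())
```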

### Load Data

In [6]:
import pandas as pd
import numpy as np
import json

df = pd.read_json("resources/movies.json")
df.head()

Unnamed: 0,title,genre,rating,description
0,Explosive Pursuit,action,7,A daring cop chases a notorious criminal acros...
1,Skyfall,action,8,James Bond returns to track down a dangerous n...
2,Fast & Furious 9,action,6,Dom and his crew face off against a high-tech ...
3,Black Widow,action,7,Natasha Romanoff confronts her dark past and f...
4,John Wick,action,8,A retired hitman seeks vengeance against those...


In [7]:
from redisvl.utils.vectorize import HFTextVectorizer
from tqdm.auto import tqdm


hf = HFTextVectorizer("sentence-transformers/all-MiniLM-L6-v2")
os.environ["TOKENIZERS_PARALLELISM"] = "false"

20:39:18 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: cuda
20:39:18 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2


In [9]:
# Generate vector embeddings
df["vector"] = hf.embed_many(df["description"].tolist(), as_buffer=True)
df.head()

Unnamed: 0,title,genre,rating,description,vector
0,Explosive Pursuit,action,7,A daring cop chases a notorious criminal acros...,b'\x99f|=2a\n;\x1d\x92\xb7;\x1b\xcb~\xbd\xb4d\...
1,Skyfall,action,8,James Bond returns to track down a dangerous n...,b'\x99D\x9e\xbd-\x9b\x89\xbc\xe0\x16\x95\xbc\x...
2,Fast & Furious 9,action,6,Dom and his crew face off against a high-tech ...,"b'\x1c\xa5\xc7\xbc\xfa,\xa2=*\x19H\xbcB\xc6t\x..."
3,Black Widow,action,7,Natasha Romanoff confronts her dark past and f...,b'r\xeb\x85\xbd\x1c\xcdo\xbd\x80\xe8\xc2\xbb4\...
4,John Wick,action,8,A retired hitman seeks vengeance against those...,b'\n=x\xbb\xf7.\xc5=\x83\x85:;\xd6\xd0\x94<\xc...
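
The `vector` column holds raw bytes because `as_buffer=True` was passed. As a rough illustration of what such a buffer contains, the sketch below packs a toy list of floats into float32 bytes and reads them back with the standard library. The helper names are hypothetical, and little-endian byte order is an assumption that matches what common x86 and ARM machines produce.

```python
import struct


def floats_to_buffer(values):
    """Pack a list of floats as little-endian float32 bytes (4 bytes per value)."""
    return struct.pack(f"<{len(values)}f", *values)


def buffer_to_floats(buffer):
    """Decode little-endian float32 bytes back into a list of Python floats."""
    count = len(buffer) // 4
    return list(struct.unpack(f"<{count}f", buffer))


embedding = [0.5, -1.0, 0.25]          # toy values, exactly representable in float32
buf = floats_to_buffer(embedding)

print(len(buf))                         # 12 bytes: 3 values x 4 bytes each
print(buffer_to_floats(buf))            # [0.5, -1.0, 0.25]
print(len(floats_to_buffer([0.0] * 384)))  # 1536 bytes for a 384-dimensional vector
```

The last line shows why each stored embedding occupies 1536 bytes when the model emits 384 float32 values.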


## Define Redis index schema

In [10]:
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex

index_name = "movies"

schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
  },
  "fields": [
    {
        "name": "title",
        "type": "text",
    },
    {
        "name": "description",
        "type": "text",
    },
    {
        "name": "genre",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "rating",
        "type": "numeric",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "vector",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
})


index = SearchIndex(schema, client)
index.create(overwrite=True, drop=True)

## Populate index

In [11]:
index.load(df.to_dict(orient="records"))

['rvl:3c5ec3a5c0914226bdff769b5e96015b',
 'rvl:f6c5664f1c934853be145579cd8d249a',
 'rvl:c9ce763525a54d44ae67b939d412d891',
 'rvl:ff47ff2213df4925b6d82cb89854d6a0',
 'rvl:477c90328622446fb00d867dedb6175b',
 'rvl:91351d436e514dc8928fea80509c0f45',
 'rvl:db6d7a8e046544c89d6fe2376333755e',
 'rvl:5c907ff3f0c54d4994b26938ae94b5c0',
 'rvl:4ef4bf5c2f8740988317a8de092bfe41',
 'rvl:cf87eebcceb14696b95a80b3604d8ec5',
 'rvl:57586058885443dcabe5f7f381cd919a',
 'rvl:c9e18df50207497abd24c69650c2e85f',
 'rvl:cd96c1da464445de8701f8f916e61376',
 'rvl:0adf434abb5f430391c281535d67a7ba',
 'rvl:0dc1586f37814a139a8e0e6e03ee6b7c',
 'rvl:9e25163279b74777928ce01e9854b513',
 'rvl:b4a467e7567545b3ba2aa6866e758e12',
 'rvl:98352515c71646cebbf7fc4e46c55889',
 'rvl:216b578b86644ed5b153d67e4fa6b990',
 'rvl:ca08822cc23a44d0b9fd984c2703c21c']

In [16]:
!rvl index info -i movies



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ movies       │ HASH           │ ['rvl']    │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭─────────────┬─────────────┬─────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮
│ Name        │ Attribute   │ Type    │ Field Option   │ Option Value   │ Field Option   │ Option Value   │ Field Option   │   Option Value │ Field Option    │ Option Value   │ Field Option   │   Option Value │ Field Option    │   Option Value │
├─────────────┼─────────────┼─────────┼────────────────┼────────────────┼──────────

## Index loaded, now we can perform vector search

### Basic vector search

In [17]:
from redisvl.query import VectorQuery

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True
)

result = index.query(vec_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:c9ce763525a54d44ae67b939d412d891,0.685773432255,Fast & Furious 9,6,action
1,rvl:98352515c71646cebbf7fc4e46c55889,0.801602959633,Despicable Me,7,comedy
2,rvl:9e25163279b74777928ce01e9854b513,0.812341988087,The Incredibles,8,comedy
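
The `vector_distance` column reports cosine distance, because the schema sets `distance_metric` to `cosine`. Redis defines this as one minus the cosine similarity, so smaller values mean closer matches and the range runs from 0 to 2. A minimal pure-Python sketch with toy two-dimensional vectors (the function name is chosen here for illustration):

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))    # 0.0 -> same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))    # 1.0 -> orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))   # 2.0 -> opposite direction
```

Read this way, the top hit at roughly 0.69 is a fairly loose match, which is expected for a short query against a small catalog.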


### Hybrid filter vector search

Redis lets you combine vector search with filters on other fields in the index, which makes searches more specific.

In [18]:
# Search for top 3 movies specifically in the action genre

from redisvl.query.filter import Tag

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True
)

tag_filter = Tag("genre") == "action"

vec_query.set_filter(tag_filter)

result=index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:c9ce763525a54d44ae67b939d412d891,0.685773432255,Fast & Furious 9,6,action
1,rvl:91351d436e514dc8928fea80509c0f45,0.820429563522,Mad Max: Fury Road,8,action
2,rvl:3c5ec3a5c0914226bdff769b5e96015b,0.851705253124,Explosive Pursuit,7,action


In [19]:
# Search for top 3 movies specifically in the action genre with ratings at or above a 7

from redisvl.query.filter import Num

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

tag_filter = Tag("genre") == "action"
num_filter = Num("rating") >= 7
combined_filter = tag_filter & num_filter

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True,
    filter_expression=combined_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:91351d436e514dc8928fea80509c0f45,0.820429563522,Mad Max: Fury Road,8,action
1,rvl:3c5ec3a5c0914226bdff769b5e96015b,0.851705253124,Explosive Pursuit,7,action
2,rvl:cf87eebcceb14696b95a80b3604d8ec5,0.856359839439,The Avengers,8,action
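
A filtered vector query reaches Redis as a single query string: the filter clauses come first, and the KNN clause follows the `=>` arrow. The sketch below assembles such a string by hand to show the shape. `build_filtered_knn_query` is a hypothetical helper, the `$vector` parameter name is an assumption, and the exact string RedisVL generates may differ in detail.

```python
def build_filtered_knn_query(filters, k, vector_field="vector", score_alias="vector_distance"):
    """Compose a Redis query string: pre-filter clauses, then a KNN clause."""
    # Clauses separated by a space are intersected (logical AND); "*" means no filter.
    prefilter = "(" + " ".join(filters) + ")" if filters else "*"
    return f"{prefilter}=>[KNN {k} @{vector_field} $vector AS {score_alias}]"


tag_clause = "@genre:{action}"      # tag field equals "action"
num_clause = "@rating:[7 +inf]"     # numeric field is at least 7

print(build_filtered_knn_query([tag_clause, num_clause], k=3))
# (@genre:{action} @rating:[7 +inf])=>[KNN 3 @vector $vector AS vector_distance]
```

Because the filter narrows the candidate set before the nearest-neighbor step, "Fast & Furious 9" (rating 6) drops out of the results above even though it was the closest match without the rating filter.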


In [20]:
# Search with full text search for movies that directly mention "criminal mastermind" in the description

from redisvl.query.filter import Text

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

text_filter = Text("description") == "criminal mastermind"

vec_query.set_filter(text_filter)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre,description
0,rvl:98352515c71646cebbf7fc4e46c55889,0.801602959633,Despicable Me,7,comedy,When a criminal mastermind uses a trio of orph...
1,rvl:db6d7a8e046544c89d6fe2376333755e,0.982345640659,The Dark Knight,9,action,"Batman faces off against the Joker, a criminal..."


In [21]:
# Vector search with wildcard match

from redisvl.query.filter import Text

user_query = "High tech movies"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

text_filter = Text("description") % "crim*"

vec_query.set_filter(text_filter)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre,description
0,rvl:98352515c71646cebbf7fc4e46c55889,0.801602959633,Despicable Me,7,comedy,When a criminal mastermind uses a trio of orph...
1,rvl:9e25163279b74777928ce01e9854b513,0.812341988087,The Incredibles,8,comedy,"A family of undercover superheroes, while tryi..."
2,rvl:3c5ec3a5c0914226bdff769b5e96015b,0.851705253124,Explosive Pursuit,7,action,A daring cop chases a notorious criminal acros...


In [22]:
# Vector search with fuzzy match filter

from redisvl.query.filter import Text

user_query = "Movies with central main character"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

# Note: fuzzy match is based on Levenshtein distance, so a query for "hero" might also return results for "her", for example.
# See docs for more info https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/query_syntax/
text_filter = Text("description") % "%hero%"

vec_query.set_filter(text_filter)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre,description
0,rvl:cf87eebcceb14696b95a80b3604d8ec5,0.737778425217,The Avengers,8,action,Earth's mightiest heroes come together to stop...
1,rvl:ff47ff2213df4925b6d82cb89854d6a0,0.768839836121,Black Widow,7,action,Natasha Romanoff confronts her dark past and f...
2,rvl:ca08822cc23a44d0b9fd984c2703c21c,0.897787928581,The Princess Diaries,6,comedy,Mia Thermopolis has just found out that she is...
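
To see why a fuzzy query for "hero" can match "her" (as in the Black Widow description), it helps to compute the Levenshtein distance directly. In Redis query syntax, one pair of `%` signs allows an edit distance of 1. The function below is a textbook dynamic-programming implementation written for illustration; it is not the code Redis runs internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]                    # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


print(levenshtein("hero", "her"))      # 1 -> within reach of %hero%
print(levenshtein("hero", "heroes"))   # 2
```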


## Range queries

Range queries let you set a predefined distance threshold and return every document within it. This is helpful when you only want documents within a certain distance of the search query.

In [23]:
from redisvl.query import RangeQuery

user_query = "Family friendly fantasy movies"

embedded_user_query = hf.embed(user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

result = index.query(range_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:9e25163279b74777928ce01e9854b513,0.644702553749,The Incredibles,8,comedy
1,rvl:ff47ff2213df4925b6d82cb89854d6a0,0.747987031937,Black Widow,7,action
2,rvl:98352515c71646cebbf7fc4e46c55889,0.750915527344,Despicable Me,7,comedy
3,rvl:0dc1586f37814a139a8e0e6e03ee6b7c,0.751298844814,Shrek,8,comedy
4,rvl:b4a467e7567545b3ba2aa6866e758e12,0.761669397354,"Monsters, Inc.",8,comedy
5,rvl:cd96c1da464445de8701f8f916e61376,0.778580188751,Aladdin,8,comedy
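
The difference from a top-k query is that the number of results is not fixed: everything inside the threshold comes back. The sketch below replays that logic in plain Python over the distances shown in the table above. The two helper functions are hypothetical and only mimic the selection rule, not how Redis searches the index.

```python
# (title, vector_distance) pairs copied from the range query output above
results = [
    ("The Incredibles", 0.644702553749),
    ("Black Widow", 0.747987031937),
    ("Despicable Me", 0.750915527344),
    ("Shrek", 0.751298844814),
    ("Monsters, Inc.", 0.761669397354),
    ("Aladdin", 0.778580188751),
]


def top_k(scored, k):
    """Keep the k closest items, however far away they are."""
    return sorted(scored, key=lambda item: item[1])[:k]


def within_range(scored, threshold):
    """Keep every item whose distance does not exceed the threshold."""
    ranked = sorted(scored, key=lambda item: item[1])
    return [item for item in ranked if item[1] <= threshold]


print([title for title, _ in top_k(results, 3)])
# ['The Incredibles', 'Black Widow', 'Despicable Me']
print([title for title, _ in within_range(results, 0.75)])
# ['The Incredibles', 'Black Widow']
```

Tightening the threshold from 0.8 to 0.75 would shrink the result set from six movies to two, which is the kind of control a fixed `num_results` cannot give.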


Like the queries above, range queries can be combined with additional filters and conditional operators. The following adds an `and` condition so that results must fall within the defined range and have a rating at or above 8.

In [24]:
user_query = "Family friendly fantasy movies"

embedded_user_query = hf.embed(user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

numeric_filter = Num("rating") >= 8

range_query.set_filter(numeric_filter)

# here the numeric filter and the vector range are applied together in a single query
result = index.query(range_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,rvl:9e25163279b74777928ce01e9854b513,0.644702553749,The Incredibles,8,comedy
1,rvl:0dc1586f37814a139a8e0e6e03ee6b7c,0.751298844814,Shrek,8,comedy
2,rvl:b4a467e7567545b3ba2aa6866e758e12,0.761669397354,"Monsters, Inc.",8,comedy
3,rvl:cd96c1da464445de8701f8f916e61376,0.778580188751,Aladdin,8,comedy


### Next steps

For more query examples with redisvl: [see here](https://github.com/redis/redis-vl-python/blob/main/docs/user_guide/hybrid_queries_02.ipynb)

In [None]:
# clean up!
# client.flushall()