![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Vector Search with RedisVL

## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/01_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Prepare data

In this example we will load a list of movies with the following attributes: `title`, `rating`, `description`, and `genre`.

We will embed each movie description so that users can search for movies that best match what they're looking for.

**If you are running this notebook locally**, you may not need to perform the clone step below at all.

In [1]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/vector-search/resources .
!rm -rf temp_repo

Cloning into 'temp_repo'...
remote: Enumerating objects: 669, done.
remote: Counting objects: 100% (320/320), done.
remote: Compressing objects: 100% (207/207), done.
remote: Total 669 (delta 219), reused 141 (delta 112), pack-reused 349 (from 2)
Receiving objects: 100% (669/669), 57.77 MiB | 20.61 MiB/s, done.
Resolving deltas: 100% (287/287), done.


## Packages

In [None]:
%pip install -q "redisvl==0.5.2" sentence-transformers pandas nltk

## Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from the movie descriptions. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

#### For Alternative Environments
There are many ways to get the necessary Redis Stack instance running:
1. In the cloud: deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). If you already have your own Redis Enterprise deployment running, that works too.
2. Per OS: [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With Docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default, this notebook connects to a local instance of Redis Stack. **If you have your own Redis Enterprise instance**, replace the `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` values with your own.

In [1]:
import os
import warnings

warnings.filterwarnings('ignore')

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
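The URL format used above can be wrapped in a small helper. The sketch below is only an illustration: `build_redis_url` is a hypothetical function (it is not part of redis-py or redisvl), and it assumes that a password containing characters such as `@` or `:` should be percent-encoded before being placed in the URL.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: int, password: str = "", use_ssl: bool = False) -> str:
    """Assemble a Redis connection URL (hypothetical helper for illustration)."""
    # rediss:// (with a double "s") selects a TLS/SSL connection.
    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so characters like "@" or ":" cannot break URL parsing.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


local_url = build_redis_url("localhost", 6379)
secure_url = build_redis_url("example-host", 18374, password="p@ss", use_ssl=True)
print(local_url)
print(secure_url)
```

The resulting string can be passed to `Redis.from_url` in the same way as `REDIS_URL`.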

### Create redis client

In [48]:
from redis import Redis

client = Redis.from_url(REDIS_URL)
client.ping()

True

In [4]:
#client.flushall()

True

### Load Movies Dataset

In [49]:
import pandas as pd
import numpy as np
import json

df = pd.read_json("resources/movies.json")
print("Loaded", len(df), "movie entries")

df.head()

Loaded 20 movie entries


Unnamed: 0,title,genre,rating,description
0,Explosive Pursuit,action,7,A daring cop chases a notorious criminal acros...
1,Skyfall,action,8,James Bond returns to track down a dangerous n...
2,Fast & Furious 9,action,6,Dom and his crew face off against a high-tech ...
3,Black Widow,action,7,Natasha Romanoff confronts her dark past and f...
4,John Wick,action,8,A retired hitman seeks vengeance against those...


In [50]:
from redisvl.utils.vectorize import HFTextVectorizer

os.environ["TOKENIZERS_PARALLELISM"] = "false"


hf = HFTextVectorizer("sentence-transformers/all-MiniLM-L6-v2")

In [51]:
# Generate vectors
df["vector"] = hf.embed_many(df["description"].tolist(), as_buffer=True)

df.head()

Unnamed: 0,title,genre,rating,description,vector
0,Explosive Pursuit,action,7,A daring cop chases a notorious criminal acros...,b'\x9bf|=\na\n;\xbf\x91\xb7;\x19\xcb~\xbd\xd9d...
1,Skyfall,action,8,James Bond returns to track down a dangerous n...,b'\x9aD\x9e\xbd0\x9b\x89\xbc\xc3\x16\x95\xbc\x...
2,Fast & Furious 9,action,6,Dom and his crew face off against a high-tech ...,"b'*\xa5\xc7\xbc\xf6,\xa2=?\x19H\xbcK\xc6t\xbd\..."
3,Black Widow,action,7,Natasha Romanoff confronts her dark past and f...,b'u\xeb\x85\xbd\x0e\xcdo\xbd&\xe8\xc2\xbb6\xcf...
4,John Wick,action,8,A retired hitman seeks vengeance against those...,b'\xaf<x\xbb\xfb.\xc5=B\x86:;\xce\xd0\x94<\xf9...
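The `vector` column above holds raw bytes because `as_buffer=True` asks the vectorizer to return each embedding in a binary form suitable for a Redis hash field. As a rough illustration of what such a buffer contains, the sketch below packs a short list of floats as 32-bit values with the standard library. The helper names are hypothetical, and little-endian byte order is an assumption (it is the native order on most machines).

```python
import struct


def to_float32_buffer(values):
    """Pack floats into a bytes buffer of little-endian float32 values."""
    return struct.pack(f"<{len(values)}f", *values)


def from_float32_buffer(buffer):
    """Decode a float32 buffer back into a list of Python floats."""
    count = len(buffer) // 4  # each float32 occupies 4 bytes
    return list(struct.unpack(f"<{count}f", buffer))


embedding = [0.25, -1.5, 3.0]  # toy values, not a real embedding
buffer = to_float32_buffer(embedding)
print(len(buffer), "bytes")
print(from_float32_buffer(buffer))
```

By the same arithmetic, a 384-dimensional `float32` embedding occupies 384 × 4 = 1536 bytes.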


## Define Redis index schema

In [52]:
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex


index_name = "movies"

schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
    "prefix": index_name,
    "storage_type": "hash"
  },
  "fields": [
    {
        "name": "title",
        "type": "text",
    },
    {
        "name": "description",
        "type": "text",
    },
    {
        "name": "genre",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "rating",
        "type": "numeric",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "vector",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "flat",
            "datatype": "float32"
        }
    }
  ]
})


index = SearchIndex(schema, client)
index.create(overwrite=True, drop=True)

In [53]:
!rvl index info -i movies -u {REDIS_URL}



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ movies       │ HASH           │ ['movies'] │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭─────────────┬─────────────┬─────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮
│ Name        │ Attribute   │ Type    │ Field Option   │ Option Value   │ Field Option   │ Option Value   │ Field Option   │   Option Value │ Field Option    │ Option Value   │
├─────────────┼─────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼────────────────┤
│ title       │ title       │ TEXT    │ WEIG

## Populate index

In [54]:
index.load(df.to_dict(orient="records"))

['movies:01JSHDN7Q4AHFG0J8D7Q8QS1BG',
 'movies:01JSHDN7Q40Z6PQ3VQDBY80Q7A',
 'movies:01JSHDN7Q4GG029M45HQY8Q5T2',
 'movies:01JSHDN7Q4QFMK2MGNKJHTFT9E',
 'movies:01JSHDN7Q4E7S8094JET0DHC5P',
 'movies:01JSHDN7Q40QYH6Q6TD7ES4TSG',
 'movies:01JSHDN7Q4ZG1SBF02A9SMV6DB',
 'movies:01JSHDN7Q4YW9PSZ346M3N9JQ0',
 'movies:01JSHDN7Q481PGAEDBX0QG75RP',
 'movies:01JSHDN7Q4031K8AS44WJ3J9ZR',
 'movies:01JSHDN7Q4FW8FEX1RTR92QSF7',
 'movies:01JSHDN7Q4AS7C9VT582PWK14J',
 'movies:01JSHDN7Q4H6JSC5Y2FKT7SWJ8',
 'movies:01JSHDN7Q4W57N6NMRBG9FZY4E',
 'movies:01JSHDN7Q4Y009NXBPM25YDZDV',
 'movies:01JSHDN7Q4JARJS5Q4HQZ90RRX',
 'movies:01JSHDN7Q4W21S96KMMBZ2X6KY',
 'movies:01JSHDN7Q4W1NYVVAEBTV91X9X',
 'movies:01JSHDN7Q4T84F0E8XQX82VY4Q',
 'movies:01JSHDN7Q4W3ZWAJGWT7YQPKVR']

## Search techniques

### Standard vector search

In [55]:
from redisvl.query import VectorQuery

user_query = "High tech and action packed movie"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "genre"],
    return_score=True,
)

result = index.query(vec_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,genre
0,movies:01JSHDN7Q4GG029M45HQY8Q5T2,0.64973795414,Fast & Furious 9,action
1,movies:01JSHDN7Q40QYH6Q6TD7ES4TSG,0.763235092163,Mad Max: Fury Road,action
2,movies:01JSHDN7Q4AS7C9VT582PWK14J,0.792449712753,The Lego Movie,comedy
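The `vector_distance` column is a cosine distance (the schema sets `distance_metric` to `cosine`), so smaller values mean a closer match. With the `flat` algorithm, the search is conceptually a brute-force comparison against every stored vector. A minimal pure-Python sketch of that idea, using made-up two-dimensional vectors and hypothetical helper names rather than anything from redisvl:

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flat_knn(query, docs, k):
    """Score every document against the query and keep the k smallest distances."""
    scored = [(cosine_distance(query, vector), name) for name, vector in docs.items()]
    return sorted(scored)[:k]


# Toy 2-dimensional "embeddings"; real ones in this notebook have 384 dimensions.
toy_docs = {
    "car chase": [1.0, 0.0],
    "space heist": [0.6, 0.8],
    "romance": [0.0, 1.0],
}
top = flat_knn([1.0, 0.0], toy_docs, k=2)
for distance, name in top:
    print(f"{name}: {distance:.3f}")
```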


### Vector search with filters

Redis lets you combine vector search with filters on other fields in the index, which makes it possible to run more specific searches.

Search for top 3 movies specifically in the action genre:


In [56]:
from redisvl.query.filter import Tag

tag_filter = Tag("genre") == "action"

vec_query.set_filter(tag_filter)

result=index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,genre
0,movies:01JSHDN7Q4GG029M45HQY8Q5T2,0.64973795414,Fast & Furious 9,action
1,movies:01JSHDN7Q40QYH6Q6TD7ES4TSG,0.763235092163,Mad Max: Fury Road,action
2,movies:01JSHDN7Q4AHFG0J8D7Q8QS1BG,0.796153008938,Explosive Pursuit,action


Search for top 3 movies specifically in the action genre with ratings at or above a 7:


In [57]:
from redisvl.query.filter import Num

# build combined filter expressions
tag_filter = Tag("genre") == "action"
num_filter = Num("rating") >= 7
combined_filter = tag_filter & num_filter

# build vector query
vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True,
    filter_expression=combined_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre
0,movies:01JSHDN7Q40QYH6Q6TD7ES4TSG,0.763235092163,Mad Max: Fury Road,8,action
1,movies:01JSHDN7Q4AHFG0J8D7Q8QS1BG,0.796153008938,Explosive Pursuit,7,action
2,movies:01JSHDN7Q481PGAEDBX0QG75RP,0.87649422884,Inception,9,action
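One way to think about a filtered vector search is "keep only the documents that satisfy the filter, then rank what is left by distance". The sketch below mimics that with plain Python, using titles, ratings, and distances copied from outputs in this notebook. It is a conceptual model only: `filtered_top_k` is a hypothetical helper, and how Redis actually schedules the filter and the vector comparison internally is not something this sketch tries to reproduce.

```python
def filtered_top_k(records, predicate, k):
    """Keep records matching the predicate, then return the k closest by distance."""
    candidates = [record for record in records if predicate(record)]
    return sorted(candidates, key=lambda record: record["vector_distance"])[:k]


# Values copied from the query outputs shown in this notebook.
records = [
    {"title": "Fast & Furious 9", "genre": "action", "rating": 6, "vector_distance": 0.64973795414},
    {"title": "Mad Max: Fury Road", "genre": "action", "rating": 8, "vector_distance": 0.763235092163},
    {"title": "Explosive Pursuit", "genre": "action", "rating": 7, "vector_distance": 0.796153008938},
    {"title": "The Incredibles", "genre": "comedy", "rating": 8, "vector_distance": 0.807471334934},
    {"title": "Inception", "genre": "action", "rating": 9, "vector_distance": 0.87649422884},
]


def is_well_rated_action(record):
    # Equivalent in spirit to (Tag("genre") == "action") & (Num("rating") >= 7).
    return record["genre"] == "action" and record["rating"] >= 7


matches = filtered_top_k(records, is_well_rated_action, k=3)
print([record["title"] for record in matches])
```

Fast & Furious 9 has the smallest distance overall but drops out because its rating is 6, which is the same effect seen in the query result above.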


Use a full-text filter to find movies whose description directly mentions "criminal mastermind":


In [58]:
from redisvl.query.filter import Text

text_filter = Text("description") % "criminal mastermind"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre,description
0,movies:01JSHDN7Q4W1NYVVAEBTV91X9X,0.827254056931,Despicable Me,7,comedy,When a criminal mastermind uses a trio of orph...
1,movies:01JSHDN7Q4ZG1SBF02A9SMV6DB,0.990856587887,The Dark Knight,9,action,"Batman faces off against the Joker, a criminal..."


Vector search with wildcard text match:


In [59]:
text_filter = Text("description") % "crim*"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre,description
0,movies:01JSHDN7Q4AHFG0J8D7Q8QS1BG,0.796153008938,Explosive Pursuit,7,action,A daring cop chases a notorious criminal acros...
1,movies:01JSHDN7Q4JARJS5Q4HQZ90RRX,0.807471334934,The Incredibles,8,comedy,"A family of undercover superheroes, while tryi..."
2,movies:01JSHDN7Q4W1NYVVAEBTV91X9X,0.827254056931,Despicable Me,7,comedy,When a criminal mastermind uses a trio of orph...


Vector search with fuzzy match filter:

> Note: fuzzy matching is based on Levenshtein distance, so a search for "hero" might also return results for "her", for example.

See the docs for more info: https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/query_syntax/
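To make the note about Levenshtein distance concrete, here is a small, standard dynamic-programming implementation of edit distance. It is a generic sketch for illustration, not the code Redis uses internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


for term in ["hero", "her", "heroes"]:
    print(term, levenshtein("hero", term))
```

"her" is one edit away from "hero", which is why a fuzzy search for `%hero%` can match descriptions that only contain "her".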


In [60]:

text_filter = Text("description") % "%hero%"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre,description
0,movies:01JSHDN7Q4QFMK2MGNKJHTFT9E,0.889985799789,Black Widow,7,action,Natasha Romanoff confronts her dark past and f...
1,movies:01JSHDN7Q4031K8AS44WJ3J9ZR,0.893866717815,The Avengers,8,action,Earth's mightiest heroes come together to stop...
2,movies:01JSHDN7Q4W3ZWAJGWT7YQPKVR,0.943198204041,The Princess Diaries,6,comedy,Mia Thermopolis has just found out that she is...


### Range queries

Range queries let you set a predefined distance "threshold" and return every document whose distance falls within it. This is helpful when you only want documents within a certain "radius" of the search query.

In [61]:
from redisvl.query import RangeQuery

user_query = "Family friendly fantasy movies"

embedded_user_query = hf.embed(user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

result = index.query(range_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,movies:01JSHDN7Q4JARJS5Q4HQZ90RRX,0.644702494144,The Incredibles,8,comedy
1,movies:01JSHDN7Q4QFMK2MGNKJHTFT9E,0.747987031937,Black Widow,7,action
2,movies:01JSHDN7Q4W1NYVVAEBTV91X9X,0.750915527344,Despicable Me,7,comedy
3,movies:01JSHDN7Q4Y009NXBPM25YDZDV,0.751298904419,Shrek,8,comedy
4,movies:01JSHDN7Q4W21S96KMMBZ2X6KY,0.761669397354,"Monsters, Inc.",8,comedy
5,movies:01JSHDN7Q4H6JSC5Y2FKT7SWJ8,0.778580129147,Aladdin,8,comedy
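Unlike a top-k query, the number of results from a range query depends on the threshold rather than on `num_results`. The sketch below replays the distances from the output above with two different thresholds. `range_search` is a hypothetical helper, and treating the boundary as inclusive is an assumption of this sketch.

```python
def range_search(distances, threshold):
    """Return (distance, title) pairs within the threshold, closest first."""
    within = [(distance, title) for title, distance in distances.items() if distance <= threshold]
    return sorted(within)


# Distances copied from the range query output above.
distances = {
    "The Incredibles": 0.644702494144,
    "Black Widow": 0.747987031937,
    "Despicable Me": 0.750915527344,
    "Shrek": 0.751298904419,
    "Monsters, Inc.": 0.761669397354,
    "Aladdin": 0.778580129147,
}

for threshold in (0.8, 0.75):
    titles = [title for _, title in range_search(distances, threshold)]
    print(threshold, len(titles), titles)
```

Tightening the threshold from 0.8 to 0.75 shrinks the result set from six movies to two.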


Like the queries above, range queries can be combined with additional filters and conditional operators. The following adds an `and` condition that returns vector search results within the defined range and with a rating at or above 8.

In [62]:
range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    distance_threshold=0.8
)

numeric_filter = Num("rating") >= 8

range_query.set_filter(numeric_filter)

# here the numeric filter is applied together with the vector range search, so we execute the query directly
result = index.query(range_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,movies:01JSHDN7Q4JARJS5Q4HQZ90RRX,0.644702494144,The Incredibles,8,comedy
1,movies:01JSHDN7Q4Y009NXBPM25YDZDV,0.751298904419,Shrek,8,comedy
2,movies:01JSHDN7Q4W21S96KMMBZ2X6KY,0.761669397354,"Monsters, Inc.",8,comedy
3,movies:01JSHDN7Q4H6JSC5Y2FKT7SWJ8,0.778580129147,Aladdin,8,comedy


### Full text search

In [74]:
from redisvl.query import TextQuery

user_query = "High tech, action packed, superheros fight scenes"

text_query = TextQuery(
    text=user_query,
    text_field_name="description",
    text_scorer="BM25STD",
    num_results=20,
    return_fields=["title", "description"],
)

result = index.query(text_query)[:4]
pd.DataFrame(result)[["title", "score"]]

Unnamed: 0,title,score
0,Fast & Furious 9,5.157032
1,The Incredibles,4.022877
2,Explosive Pursuit,2.335427
3,Toy Story,1.630097


### Hybrid search

In [77]:
from redisvl.query import HybridQuery

hybrid_query = HybridQuery(
    text=user_query,
    text_field_name="description",
    text_scorer="BM25",
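    # NOTE: embedded_user_query was last computed for "Family friendly fantasy movies";
    # call hf.embed(user_query) first if the vector should match the text query.
    vector=embedded_user_query,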
    vector_field_name="vector",
    alpha=0.7,
    num_results=20,
    return_fields=["title", "description"],
)

result = index.query(hybrid_query)[:4]
pd.DataFrame(result)[["title", "vector_similarity", "text_score", "hybrid_score"]]

Unnamed: 0,title,vector_similarity,text_score,hybrid_score
0,The Incredibles,0.677648752928,0.398671082609,0.593955451832
1,Fast & Furious 9,0.537397742271,0.498220622181,0.525644606244
2,Toy Story,0.553009659052,0.213523123792,0.451163698474
3,Black Widow,0.626006484032,0.0,0.438204538822
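The three score columns above are related by simple arithmetic. The numbers in the table are consistent with `vector_similarity = (2 - cosine_distance) / 2` and `hybrid_score = alpha * vector_similarity + (1 - alpha) * text_score`. The sketch below checks this against the first row with `alpha=0.7`. The helper names are hypothetical and the formulas are inferred from the displayed values, so treat this as a worked example rather than a specification.

```python
def vector_similarity(cosine_distance: float) -> float:
    """Map a cosine distance in [0, 2] to a similarity in [0, 1]."""
    return (2 - cosine_distance) / 2


def hybrid_score(text_score: float, vec_similarity: float, alpha: float) -> float:
    """Weighted blend: alpha weights the vector side, (1 - alpha) the text side."""
    return alpha * vec_similarity + (1 - alpha) * text_score


# "The Incredibles" row: text_score and vector_similarity copied from the table above.
incredibles = hybrid_score(text_score=0.398671082609, vec_similarity=0.677648752928, alpha=0.7)
print(round(incredibles, 6))

# The same movie's cosine distance from the earlier range query output.
print(round(vector_similarity(0.644702494144), 6))
```

With `alpha=0.7`, a document with no text match at all (such as Black Widow, whose `text_score` is 0.0) can still rank well on vector similarity alone.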


### Next steps

For more query examples with redisvl: [see here](https://github.com/redis/redis-vl-python/blob/main/docs/user_guide/hybrid_queries_02.ipynb)

In [78]:
# clean up!
index.delete()