![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Vector Search with RedisVL
## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/01_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Prepare data

In this example, we will load a list of movie objects with the following attributes: `title`, `rating`, `description`, and `genre`.

For the vector part of our vector search, we will embed the description so that users can search for movies that best match what they're looking for.
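
Before embedding anything, it can help to confirm that each record carries the four attributes listed above. The sketch below is a minimal check and not part of the original notebook; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the sample record is copied from the first movie printed later in this notebook.

```python
# Hypothetical helper: these names are illustrative, not part of redisvl.
REQUIRED_FIELDS = ("title", "rating", "description", "genre")


def missing_fields(movie: dict) -> list:
    """Return the required attributes that are absent from a movie record."""
    return [field for field in REQUIRED_FIELDS if field not in movie]


# Sample record copied from the first movie shown in this notebook's output.
sample_movie = {
    "title": "Explosive Pursuit",
    "genre": "action",
    "rating": 7,
    "description": (
        "A daring cop chases a notorious criminal across the city "
        "in a high-stakes game of cat and mouse."
    ),
}

print(missing_fields(sample_movie))           # []
print(missing_fields({"title": "Untitled"}))  # ['rating', 'description', 'genre']
```

Running a check like this over the whole list before embedding would surface incomplete records early, rather than as a `KeyError` during embedding.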

**If you are running this notebook locally**, you may not need to perform this step at all, since the `resources` directory may already be present.

In [None]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/vector-search/resources .
!rm -rf temp_repo

## Packages

In [1]:
# NBVAL_SKIP
%pip install -q redis redisvl numpy sentence-transformers


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from movie descriptions. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

#### For Alternative Environments
There are many ways to get the necessary Redis Stack instance running:
1. In the cloud, deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your
own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With Docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default, this notebook connects to a local instance of Redis Stack. **If you have your own Redis Cloud or Redis Enterprise instance**, replace the REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD values with your own.

In [1]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
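
A password that contains characters such as `@` or `/` can break a hand-assembled URL. The sketch below shows one way to build the same kind of URL more defensively; `build_redis_url` and its `use_tls` parameter are hypothetical names introduced for illustration and are not part of redis-py, and the host and password values are made up.

```python
from urllib.parse import quote


def build_redis_url(host, port, password="", use_tls=False):
    """Assemble a Redis connection URL (hypothetical helper, not a redis-py API)."""
    # rediss:// is the scheme for TLS-enabled endpoints, redis:// otherwise.
    scheme = "rediss" if use_tls else "redis"
    # Percent-encode the password so characters such as "@" or "/"
    # cannot be mistaken for URL delimiters.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", 6379))
# redis://localhost:6379
print(build_redis_url("example-host", 18374, "p@ss/word", use_tls=True))
# rediss://:p%40ss%2Fword@example-host:18374
```

This assumes the client decodes percent-encoded credentials when parsing the URL; if in doubt, passing the host, port, and password as separate arguments avoids the question entirely.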

### Create Redis client

In [2]:
from redis import Redis

client = Redis.from_url(REDIS_URL)

In [3]:
import json

with open("resources/movies.json", 'r') as file:
    movies = json.load(file)

In [4]:
import numpy as np
from sentence_transformers import SentenceTransformer

# load model for embedding our movie descriptions
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

def embed_text(model, text):
    return np.array(model.encode(text)).astype(np.float32).tobytes()



In [5]:
# Note: convert embedding array to bytes for storage in Redis Hash data type
movie_data = [
    {
        **movie,
        "vector": embed_text(model, movie["description"])
    } for movie in movies
]
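
To make the byte conversion concrete, here is a standard-library sketch of what packing a vector as float32 bytes amounts to. It is an illustration under stated assumptions rather than the notebook's method: `floats_to_bytes` and `bytes_to_floats` are hypothetical helpers, the vector holds stand-in values instead of a real embedding, and little-endian byte order is assumed (which is what NumPy's `tobytes()` produces on little-endian machines).

```python
import struct

DIMS = 384  # embedding size, matching the "dims" value used in the index schema


def floats_to_bytes(values):
    """Pack floats as little-endian float32, 4 bytes per value."""
    return struct.pack(f"<{len(values)}f", *values)


def bytes_to_floats(blob):
    """Inverse of floats_to_bytes: unpack little-endian float32 values."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


vector = [0.5] * DIMS  # stand-in values, not a real embedding
blob = floats_to_bytes(vector)

print(len(blob))                        # 1536 bytes = 384 dims * 4 bytes
print(bytes_to_floats(blob) == vector)  # True: 0.5 is exactly representable in float32
```

The length check is a useful sanity test: a stored vector whose byte length is not `dims * 4` cannot match a `float32` field with that dimensionality.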

In [6]:
movie_data[0]

{'title': 'Explosive Pursuit',
 'genre': 'action',
 'rating': 7,
 'description': 'A daring cop chases a notorious criminal across the city in a high-stakes game of cat and mouse.',
 'vector': b'\x9bf|=\x0e`\n;"\x92\xb7;<\xcb~\xbd\xfad\xce\xbb\xc3\x16J=V\xa7?=\xedv\x95<d\xfa\x06\xbe\x14Y\xcf=(\x07p=?\xdb\r\xbd\x95\xf2H\xbdje\xc6<E\xdfa=z8\x16\xbc\x00\xd4\x13<>\xaa\x1c=\xfd\xee\x89<\xbd\xb0-<\x82\xb2\x9f\xbc[\x0b\xc3\xbd\x98NR=xl\xf7\xbcN>\x17\xbe#\x12\x05\xb99u\xbf<\xb0\xe0b\xba\xd3\xa6\xa8\xbdx\xdc\xec\xbcRc%=\xe4\xe7r\xbb\x1eOG=?(\x85=o@\xa2\xbc2Z\xd0\xbdC%K\xbd\xb9\xed\x94\xbcR\xddH=\x92&F<\xc6*\xec<\x90\xd8\x8d\xbd\xcbZ\x98<\t\xa3\xa3=>g3\xbd&\xcd\xbd\xbd\x95$\xf7;\xfd\xf4z=\xfc\xb4\x8c=\x85\x0e\xc6\xbdnI\x90\xbdJ\x16\xbd;s\xe7\x0c\xbd 3\xc9\xbc\x85\xf8\xbb\xbc\xbf&u\xbb5\x8f\xca<\x05\x80J=\x0f\xaf*=\x8bOU\xbd\xc8\xf0\x95\xbc\x1d\x02\x19=)\xf4K<\xcb\xc2\t=F\x83\xac=\x9f\xd7\xb8\xbd\xf2\xb5\x9c\xbdB\x85\x18=\x96d&=-3\xf8<\xfa\xf7\x88<\x16v\xf2\xbb-=[\xbd\xf7\xac\xee\xbb5:A\xbd\xd9d\x

## Define Redis index schema

In [7]:
# from redis.commands.search.field import VectorField, TagField, NumericField, TextField
# from redis.commands.search.indexDefinition import IndexDefinition, IndexType

from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex

index_name = "movies"

schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
  },
  "fields": [
    {
        "name": "title",
        "type": "text",
    },
    {
        "name": "description",
        "type": "text",
    },
    {
        "name": "genre",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "rating",
        "type": "numeric",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "vector",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
})


index = SearchIndex(schema, client)
index.create(overwrite=True, drop=True)

In [8]:
index.info()

{'index_name': 'movies',
 'index_options': [],
 'index_definition': ['key_type',
  'HASH',
  'prefixes',
  ['rvl'],
  'default_score',
  '1'],
 'attributes': [['identifier',
   'title',
   'attribute',
   'title',
   'type',
   'TEXT',
   'WEIGHT',
   '1'],
  ['identifier',
   'description',
   'attribute',
   'description',
   'type',
   'TEXT',
   'WEIGHT',
   '1'],
  ['identifier',
   'genre',
   'attribute',
   'genre',
   'type',
   'TAG',
   'SEPARATOR',
   ',',
   'SORTABLE'],
  ['identifier',
   'rating',
   'attribute',
   'rating',
   'type',
   'NUMERIC',
   'SORTABLE',
   'UNF'],
  ['identifier',
   'vector',
   'attribute',
   'vector',
   'type',
   'VECTOR',
   'algorithm',
   'HNSW',
   'data_type',
   'FLOAT32',
   'dim',
   384,
   'distance_metric',
   'COSINE',
   'M',
   16,
   'ef_construction',
   200]],
 'num_docs': '0',
 'max_doc_id': '0',
 'num_terms': '0',
 'num_records': '0',
 'inverted_sz_mb': '0',
 'vector_index_sz_mb': '0.02034759521484375',
 'total_inver

## Populate index

In [9]:
index.load(movie_data)

['rvl:e9162a98bf994df98488f4ee30d89199',
 'rvl:9bc1dcaac4494209bcf589b079537ee8',
 'rvl:a7e4659b78434c41876c59ad805f78ec',
 'rvl:51fb1e3f7ba84dce9ad57769603db570',
 'rvl:7bb78e0c1e584dfb9035a21f4d9c2ce0',
 'rvl:791f63febdcb4c3fa0c42d8724f3e0af',
 'rvl:4a01ec29fd5c4a42b221343a0d4b639b',
 'rvl:c1264c843de74ac0b97550855c55e3dc',
 'rvl:0ca2418fda8e49908fb459822b3b5822',
 'rvl:fad59caeb3f747ba94c46498244f50fa',
 'rvl:0a0b1d9ae9dc47cb80a102f95a052082',
 'rvl:117c4c3a9c904ff1af3dd9fbe1693a7f',
 'rvl:be13171682ba4f1295309806ff3b2d61',
 'rvl:b0c676781d3f422e9868b6d38793f784',
 'rvl:592e0c2b352d4c62ba49ff9756fd2f8b',
 'rvl:5480ee42dbc34199b85ea354cfdb67d6',
 'rvl:6dc3408885164f6d87cfe55ccf5bd351',
 'rvl:30b61cc184bf4bbe89361fe93d677cc7',
 'rvl:653ffcadeee74d8cb2c439d1c8a289e0',
 'rvl:2ddc3460066e4ebbb80d1ad37e148642']

## Index loaded: now we can perform vector search

### Basic vector search

In [10]:
from redisvl.query import VectorQuery

user_query = "High tech movies"

embedded_user_query = embed_text(model, user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True
)

index.query(vec_query)


[{'id': 'rvl:a7e4659b78434c41876c59ad805f78ec',
  'vector_distance': '0.685773253441',
  'title': 'Fast & Furious 9',
  'rating': '6',
  'genre': 'action'},
 {'id': 'rvl:30b61cc184bf4bbe89361fe93d677cc7',
  'vector_distance': '0.801603078842',
  'title': 'Despicable Me',
  'rating': '7',
  'genre': 'comedy'},
 {'id': 'rvl:5480ee42dbc34199b85ea354cfdb67d6',
  'vector_distance': '0.812341928482',
  'title': 'The Incredibles',
  'rating': '8',
  'genre': 'comedy'}]
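
The `vector_distance` values come from the `cosine` distance metric set in the schema, where distance is 1 minus cosine similarity, so smaller values mean closer matches. The following is a minimal pure-Python sketch of that formula on toy 2-D vectors; `cosine_distance` is a hypothetical helper and the vectors are illustrative, not real embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0: same direction, 2: opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-D vectors (illustrative values only)
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0
```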

### Hybrid filter vector search

Redis lets you combine vector search with filters on other fields in the index, which makes searches more specific.

In [11]:
# Search for top 3 movies specifically in the action genre

from redisvl.query.filter import Tag

user_query = "High tech movies"

embedded_user_query = embed_text(model, user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True
)

tag_filter = Tag("genre") == "action"

vec_query.set_filter(tag_filter)

index.query(vec_query)

[{'id': 'rvl:a7e4659b78434c41876c59ad805f78ec',
  'vector_distance': '0.685773253441',
  'title': 'Fast & Furious 9',
  'rating': '6',
  'genre': 'action'},
 {'id': 'rvl:791f63febdcb4c3fa0c42d8724f3e0af',
  'vector_distance': '0.820429563522',
  'title': 'Mad Max: Fury Road',
  'rating': '8',
  'genre': 'action'},
 {'id': 'rvl:e9162a98bf994df98488f4ee30d89199',
  'vector_distance': '0.851705312729',
  'title': 'Explosive Pursuit',
  'rating': '7',
  'genre': 'action'}]

In [12]:
# Search for top 3 movies specifically in the action genre with ratings at or above a 7

from redisvl.query.filter import Num

user_query = "High tech movies"

embedded_user_query = embed_text(model, user_query)

tag_filter = Tag("genre") == "action"
num_filter = Num("rating") >= 7
combined_filter = tag_filter & num_filter

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True,
    filter_expression=combined_filter
)

index.query(vec_query)

[{'id': 'rvl:791f63febdcb4c3fa0c42d8724f3e0af',
  'vector_distance': '0.820429563522',
  'title': 'Mad Max: Fury Road',
  'rating': '8',
  'genre': 'action'},
 {'id': 'rvl:e9162a98bf994df98488f4ee30d89199',
  'vector_distance': '0.851705312729',
  'title': 'Explosive Pursuit',
  'rating': '7',
  'genre': 'action'},
 {'id': 'rvl:fad59caeb3f747ba94c46498244f50fa',
  'vector_distance': '0.856359839439',
  'title': 'The Avengers',
  'rating': '8',
  'genre': 'action'}]
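
For readers curious about what such a combined filter looks like in Redis's own query syntax, the sketch below composes an equivalent query string by hand. The helper `knn_query_string` and its parameters are hypothetical, and redisvl may format the string it actually sends differently; treat this as an illustration of the syntax, not of the library's internals.

```python
def knn_query_string(filter_expression, k, vector_field="vector", param_name="vector"):
    """Compose a filtered KNN query in Redis query syntax (hypothetical helper)."""
    return f"({filter_expression})=>[KNN {k} @{vector_field} ${param_name} AS vector_distance]"


tag_clause = "@genre:{action}"   # tag field equals "action"
num_clause = "@rating:[7 +inf]"  # numeric field >= 7
combined = f"{tag_clause} {num_clause}"  # a space between clauses means AND

print(knn_query_string(combined, k=3))
# (@genre:{action} @rating:[7 +inf])=>[KNN 3 @vector $vector AS vector_distance]
```

The part before `=>` narrows the candidate documents, and the `KNN` clause then ranks those candidates by distance to the query vector supplied as the `$vector` parameter.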

In [19]:
# Search with full text search for movies that directly mention "criminal mastermind" in the description

from redisvl.query.filter import Text

user_query = "High tech movies"

embedded_user_query = embed_text(model, user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

text_filter = Text("description") == "criminal mastermind"

vec_query.set_filter(text_filter)

index.query(vec_query)

[{'id': 'rvl:30b61cc184bf4bbe89361fe93d677cc7',
  'vector_distance': '0.801603078842',
  'title': 'Despicable Me',
  'rating': '7',
  'genre': 'comedy',
  'description': 'When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.'},
 {'id': 'rvl:4a01ec29fd5c4a42b221343a0d4b639b',
  'vector_distance': '0.982345581055',
  'title': 'The Dark Knight',
  'rating': '9',
  'genre': 'action',
  'description': 'Batman faces off against the Joker, a criminal mastermind who threatens to plunge Gotham into chaos.'}]

In [18]:
# Vector search with wildcard match

from redisvl.query.filter import Text

user_query = "High tech movies"

embedded_user_query = embed_text(model, user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

text_filter = Text("description") % "crim*"

vec_query.set_filter(text_filter)

index.query(vec_query)

[{'id': 'rvl:30b61cc184bf4bbe89361fe93d677cc7',
  'vector_distance': '0.801603078842',
  'title': 'Despicable Me',
  'rating': '7',
  'genre': 'comedy',
  'description': 'When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.'},
 {'id': 'rvl:5480ee42dbc34199b85ea354cfdb67d6',
  'vector_distance': '0.812341928482',
  'title': 'The Incredibles',
  'rating': '8',
  'genre': 'comedy',
  'description': "A family of undercover superheroes, while trying to live the quiet suburban life, are forced into action to save the world. Bob Parr (Mr. Incredible) and his wife Helen (Elastigirl) were among the world's greatest crime fighters, but now they must assume civilian identities and retreat to the suburbs to live a 'normal' life with their three children. However, the family's desire to help the world pulls them back into action when they face a new and dangerous enemy."},
 {'id': 'rvl:e9162a98bf994df98488

In [42]:
# Vector search with fuzzy match filter

from redisvl.query.filter import Text

user_query = "Movies with central main character"

embedded_user_query = embed_text(model, user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True
)

# Note: fuzzy match is based on Levenshtein distance. Therefore, "hero" might return result for "her" as an example.
# See docs for more info https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/query_syntax/
text_filter = Text("description") % "%hero%"

vec_query.set_filter(text_filter)

index.query(vec_query)

[{'id': '9',
  'vector_distance': '0.737778306007',
  'title': 'The Avengers',
  'rating': '8',
  'genre': 'action',
  'description': "Earth's mightiest heroes come together to stop an alien invasion that threatens the entire planet."},
 {'id': '3',
  'vector_distance': '0.768839895725',
  'title': 'Black Widow',
  'rating': '7',
  'genre': 'action',
  'description': 'Natasha Romanoff confronts her dark past and family ties as she battles a new enemy.'},
 {'id': '19',
  'vector_distance': '0.897787809372',
  'title': 'The Princess Diaries',
  'rating': '6',
  'genre': 'comedy',
  'description': 'Mia Thermopolis has just found out that she is the heir apparent to the throne of Genovia. With her friends Lilly and Michael Moscovitz in tow, she tries to navigate through the rest of her sixteenth year.'}]
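
The note about Levenshtein distance explains why a description containing "her" can match a fuzzy search for "hero": in Redis query syntax, a single pair of `%` signs allows an edit distance of 1. The sketch below is a plain-Python implementation of Levenshtein distance for illustration; it is not how Redis computes matches internally, and `levenshtein` is a hypothetical helper name.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("hero", "her"))   # 1
print(levenshtein("hero", "hero"))  # 0
print(levenshtein("hero", "heir"))  # 2
```

With a distance of 1, "her" qualifies as a fuzzy match for "hero", while "heir" (distance 2) would not.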

## Range queries

Range queries let you set a predefined distance threshold and return every document that falls within it. This is helpful when you only want documents within a certain distance of the search query, rather than a fixed number of results.

In [14]:
from redisvl.query import RangeQuery

user_query = "Family friendly fantasy movies"

embedded_user_query = embed_text(model, user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

index.query(range_query)


[{'id': 'rvl:5480ee42dbc34199b85ea354cfdb67d6',
  'vector_distance': '0.644702494144',
  'title': 'The Incredibles',
  'rating': '8',
  'genre': 'comedy'},
 {'id': 'rvl:51fb1e3f7ba84dce9ad57769603db570',
  'vector_distance': '0.747987031937',
  'title': 'Black Widow',
  'rating': '7',
  'genre': 'action'},
 {'id': 'rvl:30b61cc184bf4bbe89361fe93d677cc7',
  'vector_distance': '0.750915467739',
  'title': 'Despicable Me',
  'rating': '7',
  'genre': 'comedy'},
 {'id': 'rvl:592e0c2b352d4c62ba49ff9756fd2f8b',
  'vector_distance': '0.751298904419',
  'title': 'Shrek',
  'rating': '8',
  'genre': 'comedy'},
 {'id': 'rvl:6dc3408885164f6d87cfe55ccf5bd351',
  'vector_distance': '0.761669456959',
  'title': 'Monsters, Inc.',
  'rating': '8',
  'genre': 'comedy'},
 {'id': 'rvl:be13171682ba4f1295309806ff3b2d61',
  'vector_distance': '0.778580069542',
  'title': 'Aladdin',
  'rating': '8',
  'genre': 'comedy'}]
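
The difference between a top-k vector query and a range query can be sketched in a few lines of plain Python. This is a conceptual illustration only: `top_k` and `within_range` are hypothetical helpers, the (title, distance) pairs are made-up values, and the strict "less than" comparison follows the wording of the code comment above.

```python
def top_k(scored, k):
    """KNN-style: always return the k closest items, however far away they are."""
    return sorted(scored, key=lambda item: item[1])[:k]


def within_range(scored, threshold):
    """Range-style: return every item whose distance is under the threshold."""
    hits = (item for item in scored if item[1] < threshold)
    return sorted(hits, key=lambda item: item[1])


# Hypothetical (title, distance) pairs for illustration only
scored = [("A", 0.64), ("B", 0.75), ("C", 0.78), ("D", 0.91), ("E", 1.05)]

print(top_k(scored, 4))           # includes "D" even though it is beyond 0.8
print(within_range(scored, 0.8))  # only "A", "B", and "C"
```

A top-k query fixes the result count and lets the distances vary, while a range query fixes the maximum distance and lets the result count vary.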

Like the queries above, we can also chain additional filters and conditional operators with range queries. The following adds an `and` condition that returns vector search results within the defined range and with a rating at or above 8.

In [15]:
from redisvl.query import RangeQuery
from redisvl.query.filter import Num

user_query = "Family friendly fantasy movies"

embedded_user_query = embed_text(model, user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

numeric_filter = Num("rating") >= 8

range_query.set_filter(numeric_filter)

# the numeric filter and the vector range condition are applied together in a single query
res = index.query(range_query)

res


[{'id': 'rvl:5480ee42dbc34199b85ea354cfdb67d6',
  'vector_distance': '0.644702494144',
  'title': 'The Incredibles',
  'rating': '8',
  'genre': 'comedy'},
 {'id': 'rvl:592e0c2b352d4c62ba49ff9756fd2f8b',
  'vector_distance': '0.751298904419',
  'title': 'Shrek',
  'rating': '8',
  'genre': 'comedy'},
 {'id': 'rvl:6dc3408885164f6d87cfe55ccf5bd351',
  'vector_distance': '0.761669456959',
  'title': 'Monsters, Inc.',
  'rating': '8',
  'genre': 'comedy'},
 {'id': 'rvl:be13171682ba4f1295309806ff3b2d61',
  'vector_distance': '0.778580069542',
  'title': 'Aladdin',
  'rating': '8',
  'genre': 'comedy'}]

### Next steps

For more query examples with redisvl: [see here](https://github.com/redis/redis-vl-python/blob/main/docs/user_guide/hybrid_queries_02.ipynb)

In [16]:
# clean up!
client.flushall()