![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Vector Search with RedisVL

## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/01_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Prepare data

In this example we will load a list of movies with the following attributes: `title`, `rating`, `description`, and `genre`.

We will embed each movie description so that users can search for movies that best match the kind of movie they're looking for.

**If you are running this notebook locally**, you may not need to perform this step at all.

In [None]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/vector-search/resources .
!rm -rf temp_repo

## Packages

In [1]:
%pip install -q "redisvl>=0.6.0" "redis>=5.3.0" sentence-transformers pandas nltk

Note: you may need to restart the kernel to use updated packages.


In [None]:
import redis
redis.__version__

In [None]:
import redisvl
redisvl.__version__

### Define the Redis Connection URL

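This notebook reads its connection settings from the `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` environment variables and falls back to the defaults in the cell below. **If you have your own Redis instance** (local Redis Stack, Redis Cloud, or Redis Enterprise), replace these values with your own.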

In [None]:
import os
import warnings

warnings.filterwarnings('ignore')

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "redis-15951.c81.us-east-1-2.ec2.redns.redis-cloud.com") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "15951")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "training")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
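
The URL above can also be assembled by a small helper that switches between `redis://` and `rediss://` and percent-encodes the password. This is only a sketch: `build_redis_url` and its `ssl` flag are hypothetical names introduced for illustration and are not part of redis-py or redisvl.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: int, password: str = "", ssl: bool = False) -> str:
    """Assemble a Redis connection URL (hypothetical helper, not a library function)."""
    scheme = "rediss" if ssl else "redis"  # rediss:// is the prefix for SSL/TLS endpoints
    # Percent-encode the password so characters such as '@' or ':' cannot break URL parsing
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", 6379))
print(build_redis_url("localhost", 6379, password="p@ss", ssl=True))
```

The resulting string can be passed to `Redis.from_url` in the same way as `REDIS_URL`.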

### Create redis client

In [None]:
from redis import Redis
client = Redis.from_url(REDIS_URL)
client.ping()

In [None]:
#client.flushall()

### Load Movies Dataset

In [None]:
import pandas as pd
import numpy as np
import json

df = pd.read_json("resources/movies.json")
print("Loaded", len(df), "movie entries")

df.head()

In [None]:
from redisvl.utils.vectorize import HFTextVectorizer
from redisvl.extensions.cache.embeddings import EmbeddingsCache

os.environ["TOKENIZERS_PARALLELISM"] = "false"


hf = HFTextVectorizer(
    model="sentence-transformers/all-MiniLM-L6-v2",
    cache=EmbeddingsCache(
        name="embedcache",
        ttl=600,
        redis_client=client,
    )
)

In [None]:
# Generate vectors
df["vector"] = hf.embed_many(df["description"].tolist(), as_buffer=True)

df.head()
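
With `as_buffer=True`, each embedding is returned as raw bytes rather than a list of floats, which is the form a hash field stores. The sketch below approximates that conversion with the standard library, assuming float32 values in little-endian byte order; `to_float32_buffer` and `from_float32_buffer` are hypothetical helpers, not redisvl functions.

```python
import struct


def to_float32_buffer(values):
    """Pack floats as little-endian float32 bytes (illustrative stand-in for as_buffer=True)."""
    return struct.pack(f"<{len(values)}f", *values)


def from_float32_buffer(buffer):
    """Unpack little-endian float32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(buffer) // 4}f", buffer))


embedding = [0.0] * 384  # placeholder vector with the same dimensionality as the model output
buffer = to_float32_buffer(embedding)
print(len(buffer))  # 384 floats at 4 bytes each
```

The byte length (dimensions times 4 for float32) is a quick sanity check against the `dims` and `datatype` declared in the index schema.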

## Define Redis index schema

In [None]:
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex


index_name = "movies"

schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
    "prefix": index_name,
    "storage_type": "hash"
  },
  "fields": [
    {
        "name": "title",
        "type": "text",
    },
    {
        "name": "description",
        "type": "text",
    },
    {
        "name": "genre",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "rating",
        "type": "numeric",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "vector",
        "type": "vector",
        "attrs": {
            "dims": 384,
            "distance_metric": "cosine",
            "algorithm": "flat",
            "datatype": "float32"
        }
    }
  ]
})


index = SearchIndex(schema, client)
index.create(overwrite=True, drop=True)

In [None]:
!rvl index info -i movies -u {REDIS_URL}

## Populate index

In [None]:
index.load(df.to_dict(orient="records"))

## Search techniques

### Standard vector search

In [None]:
from redisvl.query import VectorQuery

user_query = "High tech and action packed movie"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "genre"],
    return_score=True,
)

result = index.query(vec_query)
pd.DataFrame(result)
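
Conceptually, a query against a `flat` index with the `cosine` metric compares the query vector with every stored vector and keeps the closest ones. The following plain-Python sketch illustrates that idea only; it is not how Redis implements the search, and the two-dimensional vectors and the `knn` helper are made up for the example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means the vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(query, docs, k):
    """Brute-force search: score every document, then keep the k closest."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Made-up 2-dimensional vectors, for illustration only
docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
top = knn([1.0, 0.0], docs, k=2)
print(top)
```

Lower distances mean closer matches, which is why the results of a vector query are ordered by ascending distance.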


### Vector search with filters

Redis lets you combine vector search with filters on other fields in the index, which allows for more specific searches.

Search for top 3 movies specifically in the action genre:


In [None]:
from redisvl.query.filter import Tag

tag_filter = Tag("genre") == "action"

vec_query.set_filter(tag_filter)

result=index.query(vec_query)
pd.DataFrame(result)

Search for the top 3 movies specifically in the action genre with a rating of 7 or higher:


In [None]:
from redisvl.query.filter import Num

# build combined filter expressions
tag_filter = Tag("genre") == "action"
num_filter = Num("rating") >= 7
combined_filter = tag_filter & num_filter

# build vector query
vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True,
    filter_expression=combined_filter
)

result = index.query(vec_query)
pd.DataFrame(result)
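
The `&` operator requires both conditions to hold, and the vector ranking is then applied only to documents that pass the filter. As a rough plain-Python analogy (the records and the `matches` helper are made up for illustration and do not involve Redis):

```python
# Made-up records, for illustration only
records = [
    {"title": "Movie A", "genre": "action", "rating": 8},
    {"title": "Movie B", "genre": "comedy", "rating": 9},
    {"title": "Movie C", "genre": "action", "rating": 6},
]


def matches(record):
    """Plain-Python analogue of (Tag("genre") == "action") & (Num("rating") >= 7)."""
    return record["genre"] == "action" and record["rating"] >= 7


# Only these candidates would go on to be ranked by vector distance
candidates = [r["title"] for r in records if matches(r)]
print(candidates)
```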

Use a full-text filter to restrict results to movies that directly mention "criminal mastermind" in the description:


In [None]:
from redisvl.query.filter import Text

text_filter = Text("description") % "criminal mastermind"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Vector search with wildcard text match:


In [None]:
text_filter = Text("description") % "crim*"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Vector search with fuzzy match filter:

> Note: fuzzy match is based on Levenshtein distance, so a search for "hero" might also return results for "her".

See the [query syntax docs](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/query_syntax/) for more info.
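
To see why "her" can match "hero", here is a minimal sketch of the Levenshtein (edit) distance in plain Python. It is a textbook dynamic-programming version written for illustration, not the implementation Redis uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


print(levenshtein("hero", "her"))     # one deletion apart
print(levenshtein("hero", "heroes"))  # two insertions apart
```

A single pair of `%` signs around a term, as in `%hero%`, allows matches within an edit distance of 1, so "her" qualifies while "heroes" does not.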


In [None]:

text_filter = Text("description") % "%hero%"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

### Range queries

Range queries let you set a predefined distance threshold and return every document whose vector distance from the query falls within it. This is helpful when you only want documents within a certain "radius" of the search query.

In [None]:
from redisvl.query import RangeQuery

user_query = "Family friendly fantasy movies"

embedded_user_query = hf.embed(user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

result = index.query(range_query)
pd.DataFrame(result)
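
Unlike a top-k query, a range query returns however many documents fall inside the threshold. A minimal sketch of that selection step, using made-up precomputed distances and a hypothetical `within_range` helper (treating the boundary as inclusive is an assumption here):

```python
def within_range(distances, threshold):
    """Keep every document whose distance is within the threshold, closest first."""
    hits = [(name, d) for name, d in distances.items() if d <= threshold]
    return sorted(hits, key=lambda pair: pair[1])


# Made-up distances, for illustration only
distances = {"doc_a": 0.35, "doc_b": 0.92, "doc_c": 0.61}
hits = within_range(distances, threshold=0.8)
print(hits)
```

Tightening the threshold shrinks the result set and loosening it grows the set, so the right value depends on the embedding model and the data.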


As with the queries above, we can chain additional filters and conditional operators onto range queries. The following adds an `and` condition so that results must fall within the defined range and have a rating at or above 8.

In [None]:
range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    distance_threshold=0.8
)

numeric_filter = Num("rating") >= 8

range_query.set_filter(numeric_filter)

# the numeric filter and the vector range condition are applied together in a single query
result = index.query(range_query)
pd.DataFrame(result)


### Full text search

In [None]:
from redisvl.query import TextQuery

user_query = "High tech, action packed, superheros fight scenes"

text_query = TextQuery(
    text=user_query,
    text_field_name="description",
    text_scorer="BM25STD",
    num_results=20,
    return_fields=["title", "description"],
)

result = index.query(text_query)[:4]
pd.DataFrame(result)[["title", "score"]]

### Hybrid search

In [None]:
from redisvl.query import HybridQuery

hybrid_query = HybridQuery(
    text=user_query,
    text_field_name="description",
    text_scorer="BM25",
    vector=hf.embed(user_query),  # embed the same query text used for the text side
    vector_field_name="vector",
    alpha=0.7,
    num_results=20,
    return_fields=["title", "description"],
)

result = index.query(hybrid_query)[:4]
pd.DataFrame(result)[["title", "vector_similarity", "text_score", "hybrid_score"]]
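
The `alpha` parameter controls how the two signals are blended. The sketch below assumes the weighting described in the redisvl documentation, where `alpha` weights the vector similarity and `1 - alpha` weights the text score, and where cosine distance is mapped to a similarity as `(2 - distance) / 2`. The function itself is a hypothetical illustration; check the documentation for your installed version before relying on it.

```python
def hybrid_score(vector_distance, text_score, alpha=0.7):
    """Blend vector similarity and text score (assumed formula, for illustration)."""
    # Map cosine distance in [0, 2] onto a similarity in [1, 0]
    vector_similarity = (2 - vector_distance) / 2
    return alpha * vector_similarity + (1 - alpha) * text_score


# Made-up inputs: a cosine distance of 0.4 and a text score of 2.0
print(hybrid_score(0.4, 2.0, alpha=0.7))
```

With `alpha=1.0` the ranking depends on vector similarity alone, and with `alpha=0.0` it depends on the text score alone.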

### Next steps

For more query examples with redisvl, see the [hybrid queries user guide](https://github.com/redis/redis-vl-python/blob/main/docs/user_guide/02_hybrid_queries.ipynb).

In [None]:
# clean up!
index.delete()