![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)
# Vector Search with RedisVL

## Let's Begin!
<a href="https://colab.research.google.com/github/redis-developer/redis-ai-resources/blob/main/python-recipes/vector-search/01_redisvl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Prepare data

In this example we will load a list of movies with the following attributes: `title`, `rating`, `description`, and `genre`.

We will embed the movie descriptions so that users can search for movies that best match what they're looking for.

**If you are running this notebook locally**, you may not need to perform this step at all.

In [1]:
# NBVAL_SKIP
!git clone https://github.com/redis-developer/redis-ai-resources.git temp_repo
!mv temp_repo/python-recipes/vector-search/resources .
!rm -rf temp_repo

Cloning into 'temp_repo'...
remote: Enumerating objects: 669, done.
remote: Counting objects: 100% (320/320), done.
remote: Compressing objects: 100% (207/207), done.
remote: Total 669 (delta 219), reused 141 (delta 112), pack-reused 349 (from 2)
Receiving objects: 100% (669/669), 57.77 MiB | 20.61 MiB/s, done.
Resolving deltas: 100% (287/287), done.


## Packages

In [None]:
%pip install -q "redisvl>=0.6.0" sentence-transformers pandas nltk

## Install Redis Stack

Later in this tutorial, Redis will be used to store, index, and query vector
embeddings created from the movie descriptions. **We need to make sure we have a Redis
instance available.**

#### For Colab
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly from the Redis package archive.

In [None]:
# NBVAL_SKIP
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

#### For Alternative Environments
There are many ways to get the necessary redis-stack instance running:
1. On cloud, deploy a [FREE instance of Redis in the cloud](https://redis.com/try-free/). Or, if you have your own version of Redis Enterprise running, that works too!
2. Per OS, [see the docs](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/)
3. With docker: `docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest`

### Define the Redis Connection URL

By default this notebook connects to the local instance of Redis Stack. **If you have your own Redis Cloud or Redis Enterprise instance**, replace the `REDIS_PASSWORD`, `REDIS_HOST`, and `REDIS_PORT` values with your own.

In [27]:
import os
import warnings

warnings.filterwarnings('ignore')

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") # ex: "redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      # ex: 18374
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")  # ex: "1TNxTEdYRDgIDKM2gDfasupCADXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
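
The URL assembly above can be wrapped in a small helper that also switches to `rediss://` for SSL endpoints. This is a minimal sketch: `build_redis_url` and its `use_ssl` flag are hypothetical names, not part of redis-py, and the host and password values are placeholders. It percent-encodes the password on the assumption that the client decodes it when parsing the URL.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: str, password: str = "", use_ssl: bool = False) -> str:
    """Assemble a Redis connection URL (hypothetical helper for illustration)."""
    # rediss:// tells the client to use TLS; redis:// is a plain connection.
    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so characters such as "@" or ":" do not
    # get confused with the URL's own separators.
    return f"{scheme}://:{quote(password, safe='')}@{host}:{port}"


local_url = build_redis_url("localhost", "6379")
cloud_url = build_redis_url("example-host", "18374", password="p@ss:word", use_ssl=True)
print(local_url)
print(cloud_url)
```

With an empty password the helper yields the same URL shape as the f-string above.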

### Create redis client

In [28]:
from redis import Redis

client = Redis.from_url(REDIS_URL)
client.ping()

True

In [4]:
#client.flushall()

True

### Load Movies Dataset

In [29]:
import pandas as pd
import numpy as np
import json

df = pd.read_json("resources/movies.json")
print("Loaded", len(df), "movie entries")

df.head()

Loaded 20 movie entries


Unnamed: 0,id,title,genre,rating,description
0,1,Explosive Pursuit,action,7,A daring cop chases a notorious criminal acros...
1,2,Skyfall,action,8,James Bond returns to track down a dangerous n...
2,3,Fast & Furious 9,action,6,Dom and his crew face off against a high-tech ...
3,4,Black Widow,action,7,Natasha Romanoff confronts her dark past and f...
4,5,John Wick,action,8,A retired hitman seeks vengeance against those...


In [30]:
from redisvl.utils.vectorize import HFTextVectorizer
from redisvl.extensions.cache.embeddings import EmbeddingsCache

os.environ["TOKENIZERS_PARALLELISM"] = "false"


hf = HFTextVectorizer(
    model="sentence-transformers/all-MiniLM-L6-v2",
    cache=EmbeddingsCache(
        name="embedcache",
        ttl=600,
        redis_client=client,
    )
)
"""
Embedding Cache:
- Stores embeddings in Redis so you don't have to regenerate them for the same text
- When you embed text, it first checks if that exact text has been embedded before
- If found (cache hit), it returns the cached embedding instantly
- If not found (cache miss), it generates the embedding and stores it for future use
- Uses a hash of text + model_name as the key to ensure uniqueness

SO here:
If we embed the same movie description twice, the second call will be nearly instant because it retrieves from Redis instead of running the model again
"""


# Example: OpenAI Vectorizer
# ---------------------------
# from redisvl.utils.vectorize import OpenAITextVectorizer
#
# oai = OpenAITextVectorizer(
#     model="text-embedding-3-small",
#     api_config={"api_key": "your_api_key"},  # OR set OPENAI_API_KEY env variable
#     cache=EmbeddingsCache(
#         name="openai_embedcache",
#         ttl=600,
#         redis_client=client,
#     )
# )
#
# # Generate embeddings
# embedding = oai.embed("Hello, world!")
# embeddings = oai.embed_many(["text1", "text2"], batch_size=10)

# Example: Custom Vectorizer
# ---------------------------
# from redisvl.utils.vectorize import CustomTextVectorizer
#
# # Define your custom embedding function
# def my_embed_function(text: str) -> list[float]:
#     # Your custom logic here
#     # Must return a list of floats
#     return [0.1, 0.2, 0.3, ...]  # Example: 768-dimensional vector
#
# # Optional: Define batch embedding function for better performance
# def my_embed_many_function(texts: list[str]) -> list[list[float]]:
#     # Your custom batch logic here
#     # Must return a list of lists of floats
#     return [[0.1, 0.2, ...] for _ in texts]
#
# custom = CustomTextVectorizer(
#     embed=my_embed_function,
#     embed_many=my_embed_many_function,  # Optional
#     cache=EmbeddingsCache(
#         name="custom_embedcache",
#         ttl=600,
#         redis_client=client,
#     )
# )
#
# # Generate embeddings
# embedding = custom.embed("Hello, world!")
# embeddings = custom.embed_many(["text1", "text2"])


15:20:54 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps
15:20:54 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
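
The cache behaviour described in the cell above (check for an exact text match, return on a hit, compute and store on a miss) can be sketched with a plain dictionary. This is only an illustration of the idea, not how `EmbeddingsCache` is implemented: `ToyEmbeddingCache` and `fake_embed` are made-up names, the key format is an assumption, and TTL expiry is left out.

```python
import hashlib
from typing import Callable, Dict, List


class ToyEmbeddingCache:
    """In-memory stand-in for an embeddings cache (illustration only)."""

    def __init__(self, model_name: str, embed_fn: Callable[[str], List[float]]):
        self.model_name = model_name
        self.embed_fn = embed_fn
        self.store: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, text: str) -> str:
        # Hash text and model name together so the same text embedded by
        # two different models never shares a cache entry.
        raw = f"{text}:{self.model_name}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def embed(self, text: str) -> List[float]:
        key = self._key(text)
        if key in self.store:
            self.hits += 1  # cache hit: skip the model entirely
            return self.store[key]
        self.misses += 1  # cache miss: compute once, then remember
        vector = self.embed_fn(text)
        self.store[key] = vector
        return vector


def fake_embed(text: str) -> List[float]:
    # Placeholder for a real model call; returns a tiny deterministic vector.
    return [float(len(text)), float(text.count(" "))]


cache = ToyEmbeddingCache("toy-model", fake_embed)
first = cache.embed("A retired hitman seeks vengeance")
second = cache.embed("A retired hitman seeks vengeance")
print(cache.hits, cache.misses)
```

The second call never reaches `fake_embed`, which is the saving the real cache provides when the embedding function is an expensive model.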

In [31]:
df["vector"] = hf.embed_many(df["description"].tolist(), as_buffer=True)
# as_buffer=True returns each embedding as raw bytes instead of a list of floats.
# Redis offers two storage types: hash (flat, no nested objects) and JSON (supports nesting).
# Hash storage is more memory efficient and faster, but embeddings must be stored as bytes;
# the byte array is compact, which saves memory and makes retrieval faster.
df.head()

Unnamed: 0,id,title,genre,rating,description,vector
0,1,Explosive Pursuit,action,7,A daring cop chases a notorious criminal acros...,b'\x9ef|=g`\n;I\x92\xb7;*\xcb~\xbd\xe4d\xce\xb...
1,2,Skyfall,action,8,James Bond returns to track down a dangerous n...,b'\x9eD\x9e\xbdO\x9b\x89\xbc\xc2\x16\x95\xbc\x...
2,3,Fast & Furious 9,action,6,Dom and his crew face off against a high-tech ...,"b'+\xa5\xc7\xbc\xfa,\xa2=\x82\x19H\xbcI\xc6t\x..."
3,4,Black Widow,action,7,Natasha Romanoff confronts her dark past and f...,b's\xeb\x85\xbd\xfd\xcco\xbd\xdc\xe8\xc2\xbb?\...
4,5,John Wick,action,8,A retired hitman seeks vengeance against those...,b'M;x\xbb\x02/\xc5=\x94\x85:;\xc6\xd0\x94<p)w;...
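
To make the byte strings in the `vector` column less opaque, here is a minimal sketch of packing a list of floats into a float32 buffer and back using only the standard library. The helper names are hypothetical, and little-endian byte order is an assumption (it is what NumPy produces on most machines); RedisVL handles this conversion itself when `as_buffer=True`.

```python
import struct
from typing import List


def to_float32_bytes(vector: List[float]) -> bytes:
    # Pack each value as a little-endian 32-bit float (4 bytes per dimension).
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(buffer: bytes) -> List[float]:
    # Every 4 bytes decode back into one float.
    count = len(buffer) // 4
    return list(struct.unpack(f"<{count}f", buffer))


vector = [0.25, -0.5, 1.0]
buffer = to_float32_bytes(vector)
restored = from_float32_bytes(buffer)
print(len(buffer), restored)

# A 384-dimensional float32 embedding occupies 384 * 4 bytes.
full_size = len(to_float32_bytes([0.0] * 384))
print(full_size)
```

The fixed size per vector is why the index schema has to declare both `dims` and `datatype`.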


## Define Redis index schema

In [32]:
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex


index_name = "movies"

# Redis supports 5 main field types for indexing:
#
# 1. TEXT - Full-text search with stemming, tokenization, and phonetic matching
#    Use for: Article content, descriptions, reviews, any searchable text
#    Attributes: weight, no_stem, phonetic_matcher, sortable, index_empty
#
# 2. TAG - Exact-match categorical data (like SQL ENUM or categories)
#    Use for: Categories, genres, status, IDs, tags, filters
#    Attributes: separator (default ","), case_sensitive, sortable, index_empty
#
# 3. NUMERIC - Numeric values for range queries and sorting
#    Use for: Prices, ratings, counts, timestamps, ages, scores
#    Attributes: sortable, index_missing, no_index
#
# 4. GEO - Geographic coordinates for location-based search
#    Use for: Latitude/longitude pairs, store locations, delivery zones
#    Format: "longitude,latitude" (e.g., "-122.4194,37.7749")
#    Attributes: sortable, index_missing
#
# 5. VECTOR - Vector embeddings for semantic similarity search
#    Use for: Text embeddings, image embeddings, recommendation systems
#    Algorithms:
#      - FLAT: Exact search (100% recall, slower for large datasets)
#      - HNSW: Approximate nearest neighbor (fast, high recall ~95-99%)
#      - SVS-VAMANA: Compressed vectors (memory efficient, good recall)
#    Distance Metrics: COSINE, L2 (Euclidean), IP (Inner Product)
#    Data Types: float16, float32, float64, bfloat16, int8, uint8
#    Attributes: dims, algorithm, distance_metric, datatype, initial_cap

schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
    "prefix": index_name,
    "storage_type": "hash"  # or "json" for nested data structures
  },
  "fields": [
    {
        "name": "title",
        "type": "text",  # Full-text search field
    },
    {
        "name": "description",
        "type": "text",  # Full-text search field
    },
    {
        "name": "genre",
        "type": "tag",  # Exact-match categorical field
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "rating",
        "type": "numeric",  # Numeric range queries and sorting
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "vector",
        "type": "vector",  # Semantic similarity search
        "attrs": {
            "dims": 384,                    # Vector dimensions (model-specific)
            "distance_metric": "cosine",    # COSINE, L2, or IP
            "algorithm": "flat",            # FLAT, HNSW, or SVS-VAMANA
            "datatype": "float32"           # float16, float32, float64, bfloat16
        }
    }
    # Example: GEO field (commented out)
    # {
    #     "name": "location",
    #     "type": "geo",
    #     "attrs": {
    #         "sortable": False
    #     }
    # }
  ]
})


index = SearchIndex(schema, client)
index.create(overwrite=True, drop=True)

15:23:12 redisvl.index.index INFO   Index already exists, overwriting.


In [33]:
!rvl index info -i movies -u {REDIS_URL}



Index Information:
╭───────────────┬───────────────┬───────────────┬───────────────┬───────────────╮
│ Index Name    │ Storage Type  │ Prefixes      │ Index Options │ Indexing      │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
| movies        | HASH          | ['movies']    | []            | 0             |
╰───────────────┴───────────────┴───────────────┴───────────────┴───────────────╯
Index Fields:
╭─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────╮
│ Name            │ Attribute       │ Type            │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │
├─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────

## Populate index

In [34]:
index.load(df.to_dict(orient="records"))

['movies:01K8V96NBV88RP76DHYNAHK4T2',
 'movies:01K8V96NBV01PXFNSNC8K2JQZP',
 'movies:01K8V96NBVHKA428B4YBCRNXB1',
 'movies:01K8V96NBVFD3S1DCVPDV0BE3W',
 'movies:01K8V96NBVZ64218T1PG7SE7PB',
 'movies:01K8V96NBV13WZJVFDFBET0K5N',
 'movies:01K8V96NBV3N8WDXZ10BQ8QVTM',
 'movies:01K8V96NBVNKF14S0AW75DJDF7',
 'movies:01K8V96NBV23MRYV2QRN7JV5YA',
 'movies:01K8V96NBV8KAR2ZQ13404TH2B',
 'movies:01K8V96NBVS3NH038K2YAZSHAW',
 'movies:01K8V96NBVQA4DA457PS4PX67W',
 'movies:01K8V96NBVK2RATV8KC5NBXJSJ',
 'movies:01K8V96NBVBFT2EA5TNW7SV2X6',
 'movies:01K8V96NBV85BE9MNEFBV60PHP',
 'movies:01K8V96NBV4DQ0P3V61SB2X9DS',
 'movies:01K8V96NBV1MSCHVJ5RY81Q6AM',
 'movies:01K8V96NBVD2BZJDTSV31S7DG6',
 'movies:01K8V96NBVHSERTAZTPBCXY2JV',
 'movies:01K8V96NBV6V1Z83D2Z9K1S3QX']

## Search techniques

### Standard vector search

In [36]:
from redisvl.query import VectorQuery

user_query = "High tech and action packed movie"

embedded_user_query = hf.embed(user_query)

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "genre", "description"],
    return_score=True,
)

result = index.query(vec_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,genre,description
0,movies:01K8V96NBVHKA428B4YBCRNXB1,0.64973795414,Fast & Furious 9,action,Dom and his crew face off against a high-tech ...
1,movies:01K8V96NBV13WZJVFDFBET0K5N,0.763235211372,Mad Max: Fury Road,action,"In a post-apocalyptic wasteland, Max teams up ..."
2,movies:01K8V96NBVQA4DA457PS4PX67W,0.792449593544,The Lego Movie,comedy,"An ordinary Lego construction worker, thought ..."
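
Conceptually, a `flat` index with the `cosine` metric compares the query against every stored vector and keeps the closest ones. The sketch below shows that brute-force idea in plain Python on made-up two-dimensional vectors; `cosine_distance`, `flat_knn`, and `toy_docs` are illustrative names, and Redis performs the real computation server side.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    # Cosine distance = 1 - cosine similarity; 0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flat_knn(query: List[float], docs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    # Exact search: score every document, then keep the k smallest distances.
    scored = [(name, cosine_distance(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


toy_docs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}
top = flat_knn([1.0, 0.1], toy_docs, k=2)
print(top)
```

As in the results table above, a lower `vector_distance` means a closer match.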


### Vector search with filters

Redis lets you combine vector search with filters on other fields in the index, which makes searches more specific.

Search for top 3 movies specifically in the action genre:


In [37]:
from redisvl.query.filter import Tag

tag_filter = Tag("genre") == "action"

vec_query.set_filter(tag_filter)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,genre,description
0,movies:01K8V96NBVHKA428B4YBCRNXB1,0.64973795414,Fast & Furious 9,action,Dom and his crew face off against a high-tech ...
1,movies:01K8V96NBV13WZJVFDFBET0K5N,0.763235211372,Mad Max: Fury Road,action,"In a post-apocalyptic wasteland, Max teams up ..."
2,movies:01K8V96NBV88RP76DHYNAHK4T2,0.796153008938,Explosive Pursuit,action,A daring cop chases a notorious criminal acros...


Search for top 3 movies specifically in the action genre with ratings at or above a 7:


In [38]:
from redisvl.query.filter import Num

# build combined filter expressions
tag_filter = Tag("genre") == "action"
num_filter = Num("rating") >= 7
combined_filter = tag_filter & num_filter

# build vector query
vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre"],
    return_score=True,
    filter_expression=combined_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre
0,movies:01K8V96NBV13WZJVFDFBET0K5N,0.763235211372,Mad Max: Fury Road,8,action
1,movies:01K8V96NBV88RP76DHYNAHK4T2,0.796153008938,Explosive Pursuit,7,action
2,movies:01K8V96NBV23MRYV2QRN7JV5YA,0.876494169235,Inception,9,action
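
The effect of the combined filter can be reproduced in plain Python: keep only rows that pass both conditions, then rank the survivors by distance. The rows below reuse titles, ratings, and distances printed in this notebook's outputs; `filtered_top_k` is a hypothetical helper, and Redis applies the filter inside the query rather than after the fact.

```python
from typing import Dict, List

rows: List[Dict] = [
    {"title": "Fast & Furious 9", "genre": "action", "rating": 6, "vector_distance": 0.64973795414},
    {"title": "Mad Max: Fury Road", "genre": "action", "rating": 8, "vector_distance": 0.763235211372},
    {"title": "Explosive Pursuit", "genre": "action", "rating": 7, "vector_distance": 0.796153008938},
    {"title": "The Incredibles", "genre": "comedy", "rating": 8, "vector_distance": 0.807471394539},
    {"title": "Inception", "genre": "action", "rating": 9, "vector_distance": 0.876494169235},
]


def filtered_top_k(records: List[Dict], genre: str, min_rating: int, k: int) -> List[Dict]:
    # Equivalent of (Tag("genre") == genre) & (Num("rating") >= min_rating).
    matches = [r for r in records if r["genre"] == genre and r["rating"] >= min_rating]
    # Rank the remaining candidates by distance, closest first.
    return sorted(matches, key=lambda r: r["vector_distance"])[:k]


titles = [r["title"] for r in filtered_top_k(rows, "action", 7, k=3)]
print(titles)
```

Fast & Furious 9 is the closest match by distance but drops out because its rating is 6.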


Use a full-text filter to find movies that directly mention "criminal mastermind" in the description:


In [41]:
from redisvl.query.filter import Text

text_filter = Text("description") % "criminal mastermind"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)['description'][1]

'Batman faces off against the Joker, a criminal mastermind who threatens to plunge Gotham into chaos.'

Vector search with wildcard text match:


In [15]:
text_filter = Text("description") % "crim*"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre,description
0,movies:01K8TWFA576NJD4BY9DKHWRZZY,0.796153008938,Explosive Pursuit,7,action,A daring cop chases a notorious criminal acros...
1,movies:01K8TWFA57RB003JFMYF3N6PNM,0.807471394539,The Incredibles,8,comedy,"A family of undercover superheroes, while tryi..."
2,movies:01K8TWFA57SX8Y09NVMN4EEW6C,0.827253937721,Despicable Me,7,comedy,When a criminal mastermind uses a trio of orph...


Vector search with fuzzy match filter:

> Note: fuzzy match is based on Levenshtein distance. Therefore, "hero" might also return results for "her", for example.

See the docs for more info: https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/query_syntax/
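
To see why "hero" can match "her", here is a short, standard dynamic-programming implementation of Levenshtein distance. It is a sketch for intuition only; Redis computes fuzzy matches internally, and in its query syntax a single pair of `%` signs around a term allows a distance of 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("hero", "her"))
print(levenshtein("hero", "heroes"))
```

"her" is one deletion away from "hero", so it falls within a fuzzy distance of 1.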


In [16]:

text_filter = Text("description") % "%hero%"

vec_query = VectorQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    num_results=3,
    return_fields=["title", "rating", "genre", "description"],
    return_score=True,
    filter_expression=text_filter
)

result = index.query(vec_query)
pd.DataFrame(result)

Unnamed: 0,id,vector_distance,title,rating,genre,description
0,movies:01K8TWFA571WT01N51DC2098SB,0.889985799789,Black Widow,7,action,Natasha Romanoff confronts her dark past and f...
1,movies:01K8TWFA57CQNKWQGFRTTB6VBM,0.89386677742,The Avengers,8,action,Earth's mightiest heroes come together to stop...
2,movies:01K8TWFA578W3EAAGD9SBF1YNP,0.943198144436,The Princess Diaries,6,comedy,Mia Thermopolis has just found out that she is...


### Range queries

Range queries allow you to set a predefined distance "threshold" within which documents are returned. This is helpful when you only want documents within a certain "radius" of the search query.

In [43]:
from redisvl.query import RangeQuery

user_query = "Family friendly fantasy movies"

embedded_user_query = hf.embed(user_query)

range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    return_score=True,
    distance_threshold=0.8  # find all items with a semantic distance of less than 0.8
)

result = index.query(range_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,movies:01K8V96NBV4DQ0P3V61SB2X9DS,0.644702553749,The Incredibles,8,comedy
1,movies:01K8V96NBVFD3S1DCVPDV0BE3W,0.747986972332,Black Widow,7,action
2,movies:01K8V96NBVD2BZJDTSV31S7DG6,0.750915408134,Despicable Me,7,comedy
3,movies:01K8V96NBV85BE9MNEFBV60PHP,0.751298904419,Shrek,8,comedy
4,movies:01K8V96NBV1MSCHVJ5RY81Q6AM,0.761669397354,"Monsters, Inc.",8,comedy
5,movies:01K8V96NBVK2RATV8KC5NBXJSJ,0.778580188751,Aladdin,8,comedy
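
The difference between a top-k query and a range query can be sketched on a handful of made-up distances. `top_k`, `within_radius`, and `toy_scores` are illustrative names, and treating the threshold as a strict "less than" follows the comment in the cell above; how Redis handles the exact boundary is not something this sketch asserts.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]


def top_k(scored: Scored, k: int) -> Scored:
    # Always returns k items (if available), however far away they are.
    return sorted(scored, key=lambda pair: pair[1])[:k]


def within_radius(scored: Scored, threshold: float) -> Scored:
    # Returns every item closer than the threshold, which may be none or many.
    hits = [pair for pair in scored if pair[1] < threshold]
    return sorted(hits, key=lambda pair: pair[1])


toy_scores = [("near", 0.40), ("far", 0.95), ("mid", 0.78)]
knn_hits = top_k(toy_scores, k=3)
range_hits = within_radius(toy_scores, threshold=0.8)
print(knn_hits)
print(range_hits)
```

The range version drops "far" even though three results were available, which is useful when weak matches are worse than no matches.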


Like the queries above, we can also chain additional filters and conditional operators with range queries. The following adds an `and` condition that returns vector search results within the defined range and with a rating at or above 8.

In [18]:
range_query = RangeQuery(
    vector=embedded_user_query,
    vector_field_name="vector",
    return_fields=["title", "rating", "genre"],
    distance_threshold=0.8
)

numeric_filter = Num("rating") >= 8

range_query.set_filter(numeric_filter)

# the filter is simple, so it runs together with the vector range search as a single joint query
result = index.query(range_query)
pd.DataFrame(result)


Unnamed: 0,id,vector_distance,title,rating,genre
0,movies:01K8TWFA57RB003JFMYF3N6PNM,0.644702553749,The Incredibles,8,comedy
1,movies:01K8TWFA577WVQYQZ5MNDFS083,0.751298904419,Shrek,8,comedy
2,movies:01K8TWFA579R1H9TZ65QPSF3S2,0.761669397354,"Monsters, Inc.",8,comedy
3,movies:01K8TWFA57Z8MY5X741J4K1MTS,0.778580188751,Aladdin,8,comedy


### Full text search

In [19]:
from redisvl.query import TextQuery

user_query = "High tech, action packed, superheros fight scenes"

text_query = TextQuery(
    text=user_query,
    text_field_name="description",
    text_scorer="BM25STD",
    num_results=20,
    return_fields=["title", "description"],
)

result = index.query(text_query)[:4]
pd.DataFrame(result)[["title", "score"]]

Unnamed: 0,title,score
0,Fast & Furious 9,5.157032
1,The Incredibles,4.022877
2,Explosive Pursuit,2.335427
3,Toy Story,1.630097
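
For intuition about the scores above, here is a compact textbook BM25 in plain Python. It is a sketch under stated assumptions: whitespace tokenization, common default parameters `k1=1.2` and `b=0.75`, and toy documents invented for the example. Redis's `BM25STD` scorer has its own tokenization, stemming, and normalization, so the numbers will not match.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    query_terms = set(query.lower().split())
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Number of documents containing the term at least once.
            doc_freq = sum(1 for other in tokenized if term in other)
            if doc_freq == 0:
                continue
            # Rare terms get a larger inverse document frequency weight.
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
            tf = counts[term]
            # Term frequency saturates via k1; b penalizes long documents.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


toy_docs = [
    "a high tech heist with action",
    "a quiet family drama",
    "action packed fight scenes with high stakes",
]
scores = bm25_scores("high tech action fight", toy_docs)
print(scores)
```

The second document shares no terms with the query and scores zero; the first edges out the third because it matches equally rare terms in fewer words.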


### Stop Words Example with English and German

Stop words are common words (like "the", "is", "at") that are often filtered out before text processing because they don't carry much semantic meaning. RedisVL uses NLTK stopwords and supports multiple languages.


In [42]:
# Example 1: English Hybrid Search with Stop Words
import nltk
nltk.download('stopwords', quiet=True)

from redisvl.query import HybridQuery

# English query
query_en = "action packed superhero movie with great fight scenes"
embedded_query_en = hf.embed(query_en)

hybrid_query_en = HybridQuery(
    text=query_en,
    text_field_name="description",
    text_scorer="BM25",
    vector=embedded_query_en,
    vector_field_name="vector",
    alpha=0.7,
    num_results=3,
    return_fields=["title", "description"],
    stopwords="english"  # Automatically removes English stop words using NLTK
)

print("English Query:", query_en)
print("After stop word removal:", hybrid_query_en._build_query_string())
print("\nResults:")
result_en = index.query(hybrid_query_en)
pd.DataFrame(result_en)[["title", "hybrid_score"]]


English Query: action packed superhero movie with great fight scenes
After stop word removal: (~@description:(action | packed | superhero | movie | great | fight | scenes))=>[KNN 3 @vector $vector AS vector_distance]

Results:


Unnamed: 0,title,hybrid_score
0,The Incredibles,0.688284047681
1,Fast & Furious 9,0.465631234646
2,The Dark Knight,0.463765496016


In [26]:
# Example 2: German Hybrid Search with Stop Words
# (Note: This example shows the syntax - actual German movie data would be needed for real results)

query_de = "spannender Action Film mit tollen Kampfszenen und Helden"
# Translation: "exciting action movie with great fight scenes and heroes"

# For demonstration, we'll embed the German text
embedded_query_de = hf.embed(query_de)

hybrid_query_de = HybridQuery(
    text=query_de,
    text_field_name="description",
    text_scorer="BM25",
    vector=embedded_query_de,
    vector_field_name="vector",
    alpha=0.7,
    num_results=3,
    return_fields=["title", "description"],
    stopwords="german"  # Automatically removes German stop words using NLTK
)

print("German Query:", query_de)
print("After stop word removal:", hybrid_query_de._build_query_string())
print("\nStop words removed: 'mit', 'und' (with, and)")

# Supported languages: 'english', 'german', 'french', 'spanish', 'italian',
# 'portuguese', 'russian', 'arabic', 'dutch', 'swedish', and more


German Query: spannender Action Film mit tollen Kampfszenen und Helden
After stop word removal: (~@description:(spannender | action | film | tollen | kampfszenen | helden))=>[KNN 3 @vector $vector AS vector_distance]

Stop words removed: 'mit', 'und' (with, and)
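
The text clause printed in both examples can be approximated in a few lines: lowercase the query, drop stop words, and join what remains with `|`. This is a simplified sketch. `TOY_STOPWORDS` is a tiny hand-picked set standing in for NLTK's much longer lists, `build_text_clause` is a hypothetical helper, and punctuation stripping and escaping are skipped here.

```python
from typing import Dict, Set

# Tiny hand-picked stop word sets for illustration; NLTK's lists are far longer.
TOY_STOPWORDS: Dict[str, Set[str]] = {
    "english": {"with", "the", "a", "and"},
    "german": {"mit", "und", "der", "die", "das"},
}


def build_text_clause(text: str, field: str, language: str) -> str:
    stop = TOY_STOPWORDS[language]
    # Lowercase, split on whitespace, and drop stop words.
    tokens = [token for token in text.lower().split() if token not in stop]
    # OR the remaining terms together against a single text field.
    return f"@{field}:({' | '.join(tokens)})"


clause_en = build_text_clause(
    "action packed superhero movie with great fight scenes", "description", "english"
)
clause_de = build_text_clause(
    "spannender Action Film mit tollen Kampfszenen und Helden", "description", "german"
)
print(clause_en)
print(clause_de)
```

Both clauses line up with the `@description:(...)` portion of the query strings printed above.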


### Hybrid search

In [None]:
from redisvl.query import HybridQuery

hybrid_query = HybridQuery(
    text=user_query,
    text_field_name="description",
    text_scorer="BM25",
    vector=embedded_user_query,
    vector_field_name="vector",
    alpha=0.7,
    num_results=20,
    return_fields=["title", "description"],
)

result = index.query(hybrid_query)[:4]
pd.DataFrame(result)[["title", "vector_similarity", "text_score", "hybrid_score"]]


In [None]:
# Redis Query Language Translation
# =================================
# The HybridQuery above translates to this Redis FT.AGGREGATE command:

print("Original query:", user_query)
print("After stop word removal:", hybrid_query._build_query_string())

redis_query = """
FT.AGGREGATE movies
  "(@description:(high | tech | action | packed | superheros | fight | scenes))=>{$yield_distance_as: vector_distance; $vector: <vector_blob>; $vector_field: vector}"
  LOAD 2 @title @description
  SCORER BM25
  APPLY "(2 - @vector_distance)/2" AS vector_similarity
  APPLY "@__score" AS text_score
  APPLY "(0.7 * @vector_similarity) + (0.3 * @text_score)" AS hybrid_score
  SORTBY 2 @hybrid_score DESC
  LIMIT 0 20

Breakdown:
----------
@description:(high | tech | action | ...)  - Full-text search with OR logic (stop words removed)
=>{$yield_distance_as: vector_distance}    - Vector similarity search parameters
LOAD 2 @title @description                 - Load these fields from documents
SCORER BM25                                 - Use BM25 algorithm for text scoring
APPLY "(2 - @vector_distance)/2"           - Convert distance to similarity (0-1)
APPLY "@__score" AS text_score             - Get BM25 text relevance score
APPLY "(0.7 * vector) + (0.3 * text)"      - Weighted hybrid score (alpha=0.7)
SORTBY @hybrid_score DESC                  - Sort by combined score
LIMIT 0 20                                 - Return top 20 results
"""

print(redis_query)
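
The two `APPLY` steps in the breakdown amount to simple arithmetic, sketched here in Python. The formulas come straight from the query text above; the function names and the sample input values are made up for illustration.

```python
import math


def vector_similarity(cosine_distance: float) -> float:
    # Cosine distance ranges from 0 to 2, so this maps it onto 0..1.
    return (2.0 - cosine_distance) / 2.0


def hybrid_score(cosine_distance: float, text_score: float, alpha: float = 0.7) -> float:
    # alpha weights the vector side; (1 - alpha) weights the text side.
    return alpha * vector_similarity(cosine_distance) + (1.0 - alpha) * text_score


score = hybrid_score(cosine_distance=0.5, text_score=1.0)
vector_only = hybrid_score(cosine_distance=0.5, text_score=1.0, alpha=1.0)
print(score, vector_only)
```

Raising `alpha` toward 1 makes the ranking depend almost entirely on vector similarity, while lowering it gives the BM25 text score more say.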

### Next steps

For more query examples with redisvl: [see here](https://github.com/redis/redis-vl-python/blob/main/docs/user_guide/02_hybrid_queries.ipynb)

In [78]:
# clean up!
index.delete()