From 4acc3314a72bc4c116c0bff67b365d545458c1bd Mon Sep 17 00:00:00 2001 From: "redisdocsapp[bot]" <177626021+redisdocsapp[bot]@users.noreply.github.com> Date: Sat, 22 Nov 2025 00:23:17 +0000 Subject: [PATCH] Update for redisvl 0.12.0 --- content/develop/ai/redisvl/api/_index.md | 1 + content/develop/ai/redisvl/api/query.md | 30 ++- content/develop/ai/redisvl/api/schema.md | 39 +++ .../ai/redisvl/user_guide/advanced_queries.md | 251 ++++++++++-------- .../ai/redisvl/user_guide/hash_vs_json.md | 2 +- 5 files changed, 212 insertions(+), 111 deletions(-) diff --git a/content/develop/ai/redisvl/api/_index.md b/content/develop/ai/redisvl/api/_index.md index 7b8e8250ac..0261fb4a86 100644 --- a/content/develop/ai/redisvl/api/_index.md +++ b/content/develop/ai/redisvl/api/_index.md @@ -14,6 +14,7 @@ Reference documentation for the RedisVL API. * [Schema](schema/) * [IndexSchema](schema/#indexschema) + * [Index-Level Stopwords Configuration](schema/#index-level-stopwords-configuration) * [Defining Fields](schema/#defining-fields) * [Basic Field Types](schema/#basic-field-types) * [Vector Field Types](schema/#vector-field-types) diff --git a/content/develop/ai/redisvl/api/query.md b/content/develop/ai/redisvl/api/query.md index 412435209f..55002adf73 100644 --- a/content/develop/ai/redisvl/api/query.md +++ b/content/develop/ai/redisvl/api/query.md @@ -67,7 +67,7 @@ expression. **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression #### `NOTE` -Learn more about vector queries in Redis: [https://redis.io/docs/interact/search-and-query/search/vectors/#knn-search](https://redis.io/docs/interact/search-and-query/search/vectors/#knn-search) +Learn more about vector queries in Redis: [https://redis.io/docs/latest/develop/ai/search-and-query/vectors/#knn-vector-search](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/#knn-vector-search) #### `dialect(dialect)` @@ -758,11 +758,17 @@ Instantiates a AggregateHybridQuery object. 
* **dtype** (*str* *,* *optional*) – The data type of the vector. Defaults to "float32". * **num_results** (*int* *,* *optional*) – The number of results to return. Defaults to 10. * **return_fields** (*Optional* *[* *List* *[* *str* *]* *]* *,* *optional*) – The fields to return. Defaults to None. - * **stopwords** (*Optional* *[* *Union* *[* *str* *,* *Set* *[* *str* *]* *]* *]* *,* *optional*) – The stopwords to remove from the + * **stopwords** (*Optional* *[* *Union* *[* *str* *,* *Set* *[* *str* *]* *]* *]* *,* *optional*) – + + The stopwords to remove from the provided text prior to searchuse. If a string such as "english" "german" is provided then a default set of stopwords for that language will be used. if a list, set, or tuple of strings is provided then those will be used as stopwords. Defaults to "english". if set to "None" then no stopwords will be removed. + + Note: This parameter controls query-time stopword filtering (client-side). + For index-level stopwords configuration (server-side), see IndexInfo.stopwords. + Using query-time stopwords with index-level STOPWORDS 0 is counterproductive. * **dialect** (*int* *,* *optional*) – The Redis dialect version. Defaults to 2. * **text_weights** (*Optional* *[* *Dict* *[* *str* *,* *float* *]* *]*) – The importance weighting of individual words within the query text. Defaults to None, as no modifications will be made to the @@ -974,6 +980,11 @@ Get the text weights. * **Return type:** Dictionary of word +#### `NOTE` +The `stopwords` parameter in [HybridQuery](#hybridquery) (and `AggregateHybridQuery`) controls query-time stopword filtering (client-side). +For index-level stopwords configuration (server-side), see `redisvl.schema.IndexInfo.stopwords`. +Using query-time stopwords with index-level `STOPWORDS 0` is counterproductive. 
+ ## TextQuery ### `class TextQuery(text, text_field_name, text_scorer='BM25STD', filter_expression=None, return_fields=None, num_results=10, return_score=True, dialect=2, sort_by=None, in_order=False, params=None, stopwords='english', text_weights=None)` @@ -1032,11 +1043,17 @@ A query for running a full text search, along with an optional filter expression the offsets between them. Defaults to False. * **params** (*Optional* *[* *Dict* *[* *str* *,* *Any* *]* *]* *,* *optional*) – The parameters for the query. Defaults to None. - * **stopwords** (*Optional* *[* *Union* *[* *str* *,* *Set* *[* *str* *]* *]*) – The set of stop words to remove - from the query text. If a language like ‘english’ or ‘spanish’ is provided + * **stopwords** (*Optional* *[* *Union* *[* *str* *,* *Set* *[* *str* *]* *]*) – + + The set of stop words to remove + from the query text (client-side filtering). If a language like ‘english’ or ‘spanish’ is provided a default set of stopwords for that language will be used. Users may specify their own stop words by providing a List or Set of words. if set to None, then no words will be removed. Defaults to ‘english’. + + Note: This parameter controls query-time stopword filtering (client-side). + For index-level stopwords configuration (server-side), see IndexInfo.stopwords. + Using query-time stopwords with index-level STOPWORDS 0 is counterproductive. * **text_weights** (*Optional* *[* *Dict* *[* *str* *,* *float* *]* *]*) – The importance weighting of individual words within the query text. Defaults to None, as no modifications will be made to the text_scorer score. @@ -1308,6 +1325,11 @@ Get the text weights. * **Return type:** Dictionary of word +#### `NOTE` +The `stopwords` parameter in [TextQuery](#textquery) controls query-time stopword filtering (client-side). +For index-level stopwords configuration (server-side), see `redisvl.schema.IndexInfo.stopwords`. +Using query-time stopwords with index-level `STOPWORDS 0` is counterproductive. 
+
 ## FilterQuery
 
 ### `class FilterQuery(filter_expression=None, return_fields=None, num_results=10, dialect=2, sort_by=None, in_order=False, params=None)`
diff --git a/content/develop/ai/redisvl/api/schema.md b/content/develop/ai/redisvl/api/schema.md
index a7b811dfae..ac0ea1ce91 100644
--- a/content/develop/ai/redisvl/api/schema.md
+++ b/content/develop/ai/redisvl/api/schema.md
@@ -278,6 +278,45 @@ Configuration for the model, should be a dictionary conforming to [ConfigDict][p
 
 Version of the underlying index schema.
 
+## Index-Level Stopwords Configuration
+
+The `IndexInfo` class supports index-level stopwords configuration through
+the `stopwords` field. This controls which words are filtered during indexing
+(server-side), as opposed to query-time filtering (client-side).
+
+**Configuration Options:**
+
+- `None` (default): Use Redis default stopwords (~300 common words)
+- `[]` (empty list): Disable stopwords completely (`STOPWORDS 0`)
+- Custom list: Specify your own stopwords (e.g., `["the", "a", "an"]`)
+
+**Example:**
+
+```python
+from redisvl.schema import IndexSchema
+
+# Disable stopwords to search for phrases like "Bank of Glasberliner"
+schema = IndexSchema.from_dict({
+    "index": {
+        "name": "company-idx",
+        "prefix": "company",
+        "stopwords": []  # STOPWORDS 0
+    },
+    "fields": [
+        {"name": "name", "type": "text"}
+    ]
+})
+```
+
+**Important Notes:**
+
+- Index-level stopwords affect what gets indexed (server-side)
+- Query-time stopwords (in `TextQuery` and `AggregateHybridQuery`) affect what gets searched (client-side)
+- Using query-time stopwords with index-level `STOPWORDS 0` is counterproductive
+
+For detailed information about stopwords configuration and best practices, see the
+Advanced Queries user guide (`docs/user_guide/11_advanced_queries.ipynb`).
+
 ## Defining Fields
 
 Fields in the schema can be defined in YAML format or as a Python dictionary, specifying a name, type, an optional path, and attributes for customization.
diff --git a/content/develop/ai/redisvl/user_guide/advanced_queries.md b/content/develop/ai/redisvl/user_guide/advanced_queries.md index e7a01bd039..6a60fb6810 100644 --- a/content/develop/ai/redisvl/user_guide/advanced_queries.md +++ b/content/develop/ai/redisvl/user_guide/advanced_queries.md @@ -157,9 +157,6 @@ keys = index.load(data) print(f"Loaded {len(keys)} products into the index") ``` - Loaded 6 products into the index - - ## 1. TextQuery: Full Text Search The `TextQuery` class enables full text search with advanced scoring algorithms. It's ideal for keyword-based search with relevance ranking. @@ -184,10 +181,6 @@ results = index.query(text_query) result_print(results) ``` - -
| score | product_id | brief_description | category | price |
| --- | --- | --- | --- | --- |
| 5.953989333038773 | prod_1 | comfortable running shoes for athletes | footwear | 89.99 |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 |
| 2.0410082774474088 | prod_2 | lightweight running jacket with water resistance | outerwear | 129.99 |
- - ### Text Search with Different Scoring Algorithms RedisVL supports multiple text scoring algorithms. Let's compare `BM25STD` and `TFIDF`: @@ -208,13 +201,6 @@ results = index.query(bm25_query) result_print(results) ``` - Results with BM25 scoring: - - - -
| score | product_id | brief_description | price |
| --- | --- | --- | --- |
| 6.031534703977659 | prod_1 | comfortable running shoes for athletes | 89.99 |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support | 139.99 |
| 1.5268074873573214 | prod_4 | yoga mat with extra cushioning for comfort | 39.99 |
- - ```python # TFIDF scoring @@ -231,13 +217,6 @@ results = index.query(tfidf_query) result_print(results) ``` - Results with TFIDF scoring: - - - -
| score | product_id | brief_description | price |
| --- | --- | --- | --- |
| 2.3333333333333335 | prod_1 | comfortable running shoes for athletes | 89.99 |
| 2.0 | prod_5 | basketball shoes with excellent ankle support | 139.99 |
| 1.0 | prod_4 | yoga mat with extra cushioning for comfort | 39.99 |
- - ### Text Search with Filters Combine text search with filters to narrow results: @@ -260,10 +239,6 @@ result_print(results) ``` -
| score | product_id | brief_description | category | price |
| --- | --- | --- | --- | --- |
| 3.9314935770863046 | prod_1 | comfortable running shoes for athletes | footwear | 89.99 |
| 3.1279733904413027 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 |
- - - ```python # Search for products under $100 price_filtered_query = TextQuery( @@ -278,10 +253,6 @@ results = index.query(price_filtered_query) result_print(results) ``` - -
| score | product_id | brief_description | price |
| --- | --- | --- | --- |
| 3.1541404034996914 | prod_1 | comfortable running shoes for athletes | 89.99 |
| 1.5268074873573214 | prod_4 | yoga mat with extra cushioning for comfort | 39.99 |
- - ### Text Search with Multiple Fields and Weights You can search across multiple text fields with different weights to prioritize certain fields. @@ -300,10 +271,6 @@ results = index.query(weighted_query) result_print(results) ``` - -
| score | product_id | brief_description |
| --- | --- | --- |
| 5.035440025836444 | prod_1 | comfortable running shoes for athletes |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support |
- - ### Text Search with Custom Stopwords Stopwords are common words that are filtered out before processing the query. You can specify which language's default stopwords should be filtered out, like `english`, `french`, or `german`. You can also define your own list of stopwords: @@ -324,10 +291,6 @@ result_print(results) ``` -
| score | product_id | brief_description |
| --- | --- | --- |
| 5.953989333038773 | prod_1 | comfortable running shoes for athletes |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support |
| 2.0410082774474088 | prod_2 | lightweight running jacket with water resistance |
- - - ```python # Use custom stopwords custom_stopwords_query = TextQuery( @@ -343,10 +306,6 @@ result_print(results) ``` -
| score | product_id | brief_description |
| --- | --- | --- |
| 3.1541404034996914 | prod_1 | comfortable running shoes for athletes |
| 3.0864038416103 | prod_3 | professional tennis racket for competitive players |
- - - ```python # No stopwords no_stopwords_query = TextQuery( @@ -361,10 +320,6 @@ results = index.query(no_stopwords_query) result_print(results) ``` - -
| score | product_id | brief_description |
| --- | --- | --- |
| 5.953989333038773 | prod_1 | comfortable running shoes for athletes |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support |
| 2.0410082774474088 | prod_2 | lightweight running jacket with water resistance |
-
-
 ## 2. AggregateHybridQuery: Combining Text and Vector Search
 
 The `AggregateHybridQuery` combines text search and vector similarity to provide the best of both worlds:
@@ -379,6 +334,151 @@ hybrid_score = (alpha) * vector_score + (1 - alpha) * text_score
 
 Where `alpha` controls the balance between vector and text search (default: 0.7).
 
+### Index-Level Stopwords Configuration
+
+The previous example showed **query-time stopwords** using `TextQuery.stopwords`, which filters words from the query before searching. RedisVL also supports **index-level stopwords** configuration, which determines which words are indexed in the first place.
+
+**Key Difference:**
+- **Query-time stopwords** (`TextQuery.stopwords`): Filters words from your search query (client-side)
+- **Index-level stopwords** (`IndexInfo.stopwords`): Controls which words get indexed in Redis (server-side)
+
+**Three Configuration Modes:**
+
+1. **`None` (default)**: Use Redis's default stopwords list
+2. **`[]` (empty list)**: Disable stopwords completely (`STOPWORDS 0` in FT.CREATE)
+3. **`["the", "a", "an"]`**: Use a custom stopwords list
+
+**When to use `STOPWORDS 0`:**
+- When you need to search for common words like "of", "at", "the"
+- For entity names containing stopwords (e.g., "Bank of Glasberliner", "University of Glasberliner")
+- When working with structured data where every word matters
+
+
+```python
+# Create a schema with index-level stopwords disabled
+from redisvl.index import SearchIndex
+
+stopwords_schema = {
+    "index": {
+        "name": "company_index",
+        "prefix": "company:",
+        "storage_type": "hash",
+        "stopwords": []  # STOPWORDS 0 - disable stopwords completely
+    },
+    "fields": [
+        {"name": "company_name", "type": "text"},
+        {"name": "description", "type": "text"}
+    ]
+}
+
+# Create index using from_dict (handles schema creation internally)
+company_index = SearchIndex.from_dict(stopwords_schema, redis_url="redis://localhost:6379")
+company_index.create(overwrite=True, drop=True)
+
+print(f"Index created with STOPWORDS 0: {company_index}")
+```
+
+
+```python
+# Load sample data with company names containing common stopwords
+companies = [
+    {"company_name": "Bank of Glasberliner", "description": "Major financial institution"},
+    {"company_name": "University of Glasberliner", "description": "Public university system"},
+    {"company_name": "Department of Glasberliner Affairs", "description": "A government agency"},
+    {"company_name": "Glasberliner FC", "description": "Football Club"},
+    {"company_name": "The Home Market", "description": "Home improvement retailer"},
+]
+
+for i, company in enumerate(companies):
+    company_index.load([company], keys=[f"company:{i}"])
+
+print(f"✓ Loaded {len(companies)} companies")
+```
+
+
+```python
+# Search for "Bank of Glasberliner" - with STOPWORDS 0, "of" is indexed and searchable
+from redisvl.query import FilterQuery
+
+query = FilterQuery(
+    filter_expression='@company_name:(Bank of Glasberliner)',
+    return_fields=["company_name", "description"],
+)
+
+results = company_index.search(query.query, query_params=query.params)
+
+print(f"Found {len(results.docs)} results for 'Bank of Glasberliner':")
+for doc in results.docs:
+    print(f" - {doc.company_name}: {doc.description}")
+```
+
+**Comparison: With vs Without Stopwords**
+
+If we had used the default stopwords (not specifying `stopwords` in the schema), the word "of" would be filtered out during indexing. This means:
+
+- ❌ Searching for `"Bank of Glasberliner"` might not find exact matches
+- ❌ The phrase would be indexed as `"Bank Glasberliner"` (without "of")
+- ✅ With `STOPWORDS 0`, all words including "of" are indexed
+
+**Custom Stopwords Example:**
+
+You can also provide a custom list of stopwords:
+
+
+```python
+# Example: Create index with custom stopwords
+custom_stopwords_schema = {
+    "index": {
+        "name": "custom_stopwords_index",
+        "prefix": "custom:",
+        "stopwords": ["inc", "llc", "corp"]  # Filter out legal entity suffixes
+    },
+    "fields": [
+        {"name": "name", "type": "text"}
+    ]
+}
+
+# This would create an index where "inc", "llc", "corp" are not indexed
+print("Custom stopwords:", custom_stopwords_schema["index"]["stopwords"])
+```
+
+**YAML Format:**
+
+You can also define stopwords in YAML schema files:
+
+```yaml
+version: '0.1.0'
+
+index:
+  name: company_index
+  prefix: "company:"  # quoted: a plain scalar ending in ":" does not parse as YAML
+  storage_type: hash
+  stopwords: []  # Disable stopwords (STOPWORDS 0)
+
+fields:
+  - name: company_name
+    type: text
+  - name: description
+    type: text
+```
+
+Or with custom stopwords:
+
+```yaml
+index:
+  stopwords:
+    - the
+    - a
+    - an
+```
+
+
+```python
+# Cleanup
+company_index.delete(drop=True)
+print("✓ Cleaned up company_index")
+```
+
 ### Basic Aggregate Hybrid Query
 
 Let's search for "running" with both text and semantic search:
@@ -401,10 +501,6 @@ results = index.query(hybrid_query)
 result_print(results)
 ```
 
-
-
| vector_distance | product_id | brief_description | category | price | vector_similarity | text_score | hybrid_score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | footwear | 89.99 | 0.999999970198 | 5.95398933304 | 2.48619677905 |
| 0.00985252857208 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 | 0.995073735714 | 2.08531559363 | 1.32214629309 |
| 0.00985252857208 | prod_2 | lightweight running jacket with water resistance | outerwear | 129.99 | 0.995073735714 | 2.04100827745 | 1.30885409823 |
| 0.0038834810257 | prod_4 | yoga mat with extra cushioning for comfort | accessories | 39.99 | 0.998058259487 | 0 | 0.698640781641 |
| 0.236237406731 | prod_6 | swimming goggles with anti-fog coating | accessories | 24.99 | 0.881881296635 | 0 | 0.617316907644 |
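
As a sanity check on the output rows above, the `hybrid_score` column can be reproduced from the formula `hybrid_score = (alpha) * vector_score + (1 - alpha) * text_score` with the default `alpha` of 0.7. The sketch below is illustrative only: the helper name `hybrid_score` is hypothetical and not part of RedisVL, and it assumes the `vector_similarity` column is the `vector_score` term in the formula.

```python
def hybrid_score(vector_similarity: float, text_score: float, alpha: float = 0.7) -> float:
    """Blend the two scores as alpha * vector + (1 - alpha) * text.

    Hypothetical helper for illustration; RedisVL computes this server-side.
    """
    return alpha * vector_similarity + (1 - alpha) * text_score


# Inputs copied from the prod_1 and prod_5 rows of the output above.
prod_1 = hybrid_score(0.999999970198, 5.95398933304)  # output reports 2.48619677905
prod_5 = hybrid_score(0.995073735714, 2.08531559363)  # output reports 1.32214629309

print(f"prod_1: {prod_1:.6f}")
print(f"prod_5: {prod_5:.6f}")
```

Rows with a `text_score` of 0 (no keyword match) reduce to `alpha * vector_similarity`, which is why they cluster just below 0.7 at the default setting.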
- - ### Adjusting the Alpha Parameter The `alpha` parameter controls the weight between vector and text search: @@ -430,13 +526,6 @@ results = index.query(vector_heavy_query) result_print(results) ``` - Results with alpha=0.9 (vector-heavy): - - - -
| vector_distance | product_id | brief_description | vector_similarity | text_score | hybrid_score |
| --- | --- | --- | --- | --- | --- |
| -1.19209289551e-07 | prod_4 | yoga mat with extra cushioning for comfort | 1.0000000596 | 1.52680748736 | 1.05268080238 |
| 0.00136888027191 | prod_5 | basketball shoes with excellent ankle support | 0.999315559864 | 0 | 0.899384003878 |
| 0.00136888027191 | prod_2 | lightweight running jacket with water resistance | 0.999315559864 | 0 | 0.899384003878 |
- - ### Aggregate Hybrid Query with Filters You can also combine hybrid search with filters: @@ -458,10 +547,6 @@ results = index.query(filtered_hybrid_query) result_print(results) ``` - -
| vector_distance | product_id | brief_description | category | price | vector_similarity | text_score | hybrid_score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| -1.19209289551e-07 | prod_3 | professional tennis racket for competitive players | equipment | 199.99 | 1.0000000596 | 3.08640384161 | 1.62592119421 |
| 0.411657452583 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 | 0.794171273708 | 0 | 0.555919891596 |
| 0.411657452583 | prod_2 | lightweight running jacket with water resistance | outerwear | 129.99 | 0.794171273708 | 0 | 0.555919891596 |
- - ### Using Different Text Scorers AggregateHybridQuery supports the same text scoring algorithms as TextQuery: @@ -483,10 +568,6 @@ results = index.query(hybrid_tfidf) result_print(results) ``` - -
| vector_distance | product_id | brief_description | vector_similarity | text_score | hybrid_score |
| --- | --- | --- | --- | --- | --- |
| 0 | prod_5 | basketball shoes with excellent ankle support | 1 | 5 | 2.2 |
| 0 | prod_2 | lightweight running jacket with water resistance | 1 | 0 | 0.7 |
| 0.00136888027191 | prod_4 | yoga mat with extra cushioning for comfort | 0.999315559864 | 0 | 0.699520891905 |
- - ## 3. MultiVectorQuery: Multi-Vector Search The `MultiVectorQuery` allows you to search over multiple vector fields simultaneously. This is useful when you have different types of embeddings (e.g., text and image embeddings) and want to find results that match across multiple modalities. @@ -531,10 +612,6 @@ results = index.query(multi_vector_query) result_print(results) ``` - -
| distance_0 | distance_1 | product_id | brief_description | category | score_0 | score_1 | combined_score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5.96046447754e-08 | 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | footwear | 0.999999970198 | 0.999999970198 | 0.999999970198 |
| 0.00985252857208 | 0.00266629457474 | prod_5 | basketball shoes with excellent ankle support | footwear | 0.995073735714 | 0.998666852713 | 0.996151670814 |
| 0.00985252857208 | 0.0118260979652 | prod_2 | lightweight running jacket with water resistance | outerwear | 0.995073735714 | 0.994086951017 | 0.994777700305 |
| 0.0038834810257 | 0.210647821426 | prod_4 | yoga mat with extra cushioning for comfort | accessories | 0.998058259487 | 0.894676089287 | 0.967043608427 |
| 0.236237406731 | 0.639005899429 | prod_6 | swimming goggles with anti-fog coating | accessories | 0.881881296635 | 0.680497050285 | 0.82146602273 |
- - ### Adjusting Vector Weights You can adjust the weights to prioritize different vector fields: @@ -567,13 +644,6 @@ results = index.query(image_heavy_query) result_print(results) ``` - Results with emphasis on image similarity: - - - -
| distance_0 | distance_1 | product_id | brief_description | category | score_0 | score_1 | combined_score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| -1.19209289551e-07 | 0 | prod_3 | professional tennis racket for competitive players | equipment | 1.0000000596 | 1 | 1.00000001192 |
| 0.14539372921 | 0.00900757312775 | prod_6 | swimming goggles with anti-fog coating | accessories | 0.927303135395 | 0.995496213436 | 0.981857597828 |
| 0.436696171761 | 0.219131231308 | prod_4 | yoga mat with extra cushioning for comfort | accessories | 0.78165191412 | 0.890434384346 | 0.868677890301 |
- - ### Multi-Vector Query with Filters Combine multi-vector search with filters to narrow results: @@ -606,10 +676,6 @@ results = index.query(filtered_multi_query) result_print(results) ``` - -
| distance_0 | distance_1 | product_id | brief_description | category | price | score_0 | score_1 | combined_score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5.96046447754e-08 | 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | footwear | 89.99 | 0.999999970198 | 0.999999970198 | 0.999999970198 |
| 0.00985252857208 | 0.00266629457474 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 | 0.995073735714 | 0.998666852713 | 0.996510982513 |
- - ## Comparing Query Types Let's compare the three query types side by side: @@ -629,16 +695,6 @@ result_print(index.query(text_q)) print() ``` - TextQuery Results (keyword-based): - - - -
| score | product_id | brief_description |
| --- | --- | --- |
| 2.8773943004779676 | prod_1 | comfortable running shoes for athletes |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support |
- - - - - ```python # AggregateHybridQuery - combines text and vector search @@ -656,16 +712,6 @@ result_print(index.query(hybrid_q)) print() ``` - AggregateHybridQuery Results (text + vector): - - - -
| vector_distance | product_id | brief_description | vector_similarity | text_score | hybrid_score |
| --- | --- | --- | --- | --- | --- |
| 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | 0.999999970198 | 2.87739430048 | 1.56321826928 |
| 0.0038834810257 | prod_4 | yoga mat with extra cushioning for comfort | 0.998058259487 | 0 | 0.698640781641 |
| 0.00985252857208 | prod_2 | lightweight running jacket with water resistance | 0.995073735714 | 0 | 0.696551615 |
- - - - - ```python # MultiVectorQuery - searches multiple vector fields @@ -693,13 +739,6 @@ print("MultiVectorQuery Results (multiple vectors):") result_print(index.query(multi_q)) ``` - MultiVectorQuery Results (multiple vectors): - - - -
| distance_0 | distance_1 | product_id | brief_description | score_0 | score_1 | combined_score |
| --- | --- | --- | --- | --- | --- | --- |
| 5.96046447754e-08 | 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | 0.999999970198 | 0.999999970198 | 0.999999970198 |
| 0.00985252857208 | 0.00266629457474 | prod_5 | basketball shoes with excellent ankle support | 0.995073735714 | 0.998666852713 | 0.996870294213 |
| 0.00985252857208 | 0.0118260979652 | prod_2 | lightweight running jacket with water resistance | 0.995073735714 | 0.994086951017 | 0.994580343366 |
- - ## Best Practices ### When to Use Each Query Type: diff --git a/content/develop/ai/redisvl/user_guide/hash_vs_json.md b/content/develop/ai/redisvl/user_guide/hash_vs_json.md index 6ba665a209..9c0859325f 100644 --- a/content/develop/ai/redisvl/user_guide/hash_vs_json.md +++ b/content/develop/ai/redisvl/user_guide/hash_vs_json.md @@ -9,7 +9,7 @@ weight: 05 Out of the box, Redis provides a [variety of data structures](https://redis.com/redis-enterprise/data-structures/) that can adapt to your domain specific applications and use cases. -In this notebook, we will demonstrate how to use RedisVL with both [Hash](https://redis.io/docs/data-types/hashes/) and [JSON](https://redis.io/docs/data-types/json/) data. +In this notebook, we will demonstrate how to use RedisVL with both [Hash](https://redis.io/docs/latest/develop/data-types/#hashes) and [JSON](https://redis.io/docs/latest/develop/data-types/json/) data. Before running this notebook, be sure to