Description
Summary
RedisVL currently has no way to configure index-level STOPWORDS for FT.CREATE, even though redis-py’s create_index() and cluster_create_index() APIs support it. As a result, all indices are created with Redis’s default stopwords list, and users cannot faithfully reproduce commands like:
FT.CREATE testidx ON HASH STOPWORDS 0 SCHEMA work_experience_summary TEXT INDEXMISSING SORTABLE UNF
In many text-search workflows, you may want to explicitly disable stopwords at index time so that every token of a phrase is indexed. For example, for the phrase "Worked at Bank of America":
- With the default stopwords list, it is indexed as: Worked Bank America
- With STOPWORDS 0, it is indexed as: Worked at Bank of America
Disabling stopwords at indexing time cannot currently be expressed via RedisVL’s IndexSchema / SearchIndex APIs.
This is distinct from TextQuery’s client-side stopwords parameter, which only filters query tokens and does not control what gets indexed.
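To make the index-time effect concrete, here is a minimal sketch that does not call Redis at all. The whitespace tokenizer is a simplification, and SAMPLE_STOPWORDS is an illustrative subset chosen for this example, not Redis’s complete default list.

```python
# Toy model of index-time stopword filtering; this is not how Redis tokenizes text.
# SAMPLE_STOPWORDS is an illustrative subset, not Redis's complete default list.
SAMPLE_STOPWORDS = {"a", "an", "the", "at", "of"}


def indexed_tokens(text, stopwords):
    """Return the tokens that would be kept for indexing under `stopwords`."""
    return [tok for tok in text.split() if tok.lower() not in stopwords]


phrase = "Worked at Bank of America"

# Default-like behavior: stopwords are dropped before indexing.
with_defaults = indexed_tokens(phrase, SAMPLE_STOPWORDS)

# STOPWORDS 0: the stopword set is empty, so every token is kept.
with_none = indexed_tokens(phrase, set())
```

Under these assumptions, with_defaults keeps only Worked, Bank, and America, while with_none keeps all five tokens.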
Current Behavior
- The index metadata model (IndexInfo in redisvl/schema/schema.py) does not expose any stopwords configuration:
  - It has fields like name, prefix, key_separator, and storage_type, but nothing for index-level stopwords.
- SearchIndex.create() (in redisvl/index/index.py) builds an IndexDefinition and then calls redis-py’s create_index() / cluster_create_index() without passing any stopwords argument.
- There are explicit TODO comments around the index creation calls:
  - Line ~605: # TODO Nitin: Add stopwords to definition and remove as a parameter
  - Line ~613: # TODO Nitin: Add stopwords to definition and remove as a parameter
- As a consequence, every index created via RedisVL uses Redis’s default stopwords list, and there is no official way to:
  - Disable stopwords entirely (STOPWORDS 0), or
  - Provide a custom stopword list (STOPWORDS <N> <w1> <w2> ...).
Reproduction Steps
- Define a simple schema for a hash index with a text field:
  - Use IndexSchema.from_dict() or IndexSchema.from_yaml() with something like:
    - index.name = "testidx"
    - index.prefix = "test"
    - index.storage_type = "hash"
    - fields = [{"name": "desc", "type": "text"}]
- Create the index via SearchIndex:
  - Initialize SearchIndex(schema=schema, redis_url="redis://localhost:6379").
  - Call index.create().
- Observe:
  - There is no field or argument anywhere in the RedisVL API to specify index-level stopwords.
  - The resulting FT.CREATE command that Redis receives does not contain a STOPWORDS clause.
- Compare this to the desired Redis command:
FT.CREATE testidx ON HASH STOPWORDS 0 SCHEMA desc TEXT INDEXMISSING SORTABLE UNF
which disables stopwords at the index level. This cannot be expressed via the current RedisVL API.
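The reproduction steps can be collected into a short script. This is a sketch: it assumes redisvl is installed and a Redis server with search support is reachable at the given URL, and the function name reproduce is made up for this example.

```python
# Schema from the reproduction steps, written as a plain dict.
SCHEMA = {
    "index": {"name": "testidx", "prefix": "test", "storage_type": "hash"},
    "fields": [{"name": "desc", "type": "text"}],
}


def reproduce(redis_url="redis://localhost:6379"):
    """Create the index via RedisVL; needs redisvl and a running Redis."""
    # Imported lazily so this module loads even when redisvl is not installed.
    from redisvl.index import SearchIndex
    from redisvl.schema import IndexSchema

    schema = IndexSchema.from_dict(SCHEMA)
    index = SearchIndex(schema=schema, redis_url=redis_url)
    index.create()
    return index
```

After calling reproduce(), running MONITOR in redis-cli while the index is created, or FT.INFO testidx afterwards, is one way to check whether a STOPWORDS clause was sent.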
Expected Behavior
- Users should be able to configure index-level stopwords behavior directly via the IndexSchema / SearchIndex APIs, including:
  - Disable stopwords completely:
    - Equivalent to STOPWORDS 0 in FT.CREATE.
  - Provide a custom stopword list:
    - Equivalent to STOPWORDS <N> <w1> <w2> ....
  - Use default behavior:
    - No explicit STOPWORDS clause, letting Redis fall back to its default list.
- This configuration should be:
  - Representable in YAML/dict schemas.
  - Plumbed through to both single-node and clustered clients.
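The three modes above map onto FT.CREATE tokens roughly as follows. This is only a sketch of the intended mapping: stopwords_clause is a hypothetical helper, not part of RedisVL or redis-py, and treating an empty list the same as 0 is an assumption.

```python
def stopwords_clause(stopwords):
    """Translate a hypothetical schema-level `stopwords` value into FT.CREATE tokens.

    None              -> []                          (no clause; Redis default list)
    0 or []           -> ["STOPWORDS", "0"]          (stopwords disabled)
    ["a", "an"]       -> ["STOPWORDS", "2", "a", "an"]
    """
    if stopwords is None:
        return []
    if isinstance(stopwords, (bool, str)):
        # Reject values that would otherwise be silently misinterpreted.
        raise TypeError("stopwords must be None, 0, or a list of strings")
    if isinstance(stopwords, int):
        if stopwords != 0:
            raise ValueError("the only supported integer value is 0")
        return ["STOPWORDS", "0"]
    words = [str(w) for w in stopwords]
    return ["STOPWORDS", str(len(words)), *words]
```

For example, stopwords_clause(["a", "an", "the"]) yields the tokens for STOPWORDS 3 a an the, and stopwords_clause(None) yields no clause at all.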
Actual Behavior
- IndexSchema has no field for stopwords.
- SearchIndex.create() passes only fields and definition to the underlying create_index() / cluster_create_index() calls, with no stopwords argument.
- All indices use Redis’s default stopwords list.
- There is no way to reproduce STOPWORDS 0 or custom stopwords in FT.CREATE via RedisVL.
Notes / Clarifications
- TextQuery.stopwords is query-time, client-side behavior:
  - It controls which tokens from the user query are included in the FT.SEARCH query string.
  - It does not affect what tokens were indexed.
- Index-level STOPWORDS is an indexing-time FT.CREATE directive:
  - It determines which tokens get added to the inverted index.
  - Once the index is created without STOPWORDS 0, you cannot later “undo” this via FilterQuery or client-side token changes.
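A toy inverted index shows why query-side changes cannot recover tokens that were dropped at indexing time. This is a simplified model, not Redis’s actual implementation; build_index, search, and the stopword subset are all made up for this example.

```python
from collections import defaultdict

# Illustrative subset only; not Redis's complete default stopword list.
SAMPLE_STOPWORDS = {"a", "an", "the", "at", "of"}


def build_index(docs, stopwords):
    """Build a toy inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in text.lower().split():
            if tok not in stopwords:
                index[tok].add(doc_id)
    return index


def search(index, query, query_stopwords=frozenset()):
    """Return ids of documents containing every query token that survives
    the client-side (query-time) stopword filter."""
    tokens = [t for t in query.lower().split() if t not in query_stopwords]
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


docs = {"doc1": "Worked at Bank of America"}
default_index = build_index(docs, SAMPLE_STOPWORDS)  # stopwords dropped
full_index = build_index(docs, set())                # like STOPWORDS 0

# "of" was never indexed, so no query-side setting can find it.
hits_default = search(default_index, "bank of", query_stopwords=set())
hits_full = search(full_index, "bank of", query_stopwords=set())
```

In this model hits_default is empty while hits_full contains doc1, even though the query and the query-time filter are identical in both calls.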
Proposed Solution
- Extend IndexInfo / IndexSchema
  - Add an optional stopwords field to IndexInfo, supporting something like:
    - stopwords: null or omitted → default Redis behavior (no STOPWORDS clause).
    - stopwords: 0 (or a special value like "none") → emit STOPWORDS 0.
    - stopwords: ["a", "an", "the"] → emit STOPWORDS 3 a an the.
  - Ensure this works in both YAML and dict-based schema definitions.
- Plumb through SearchIndex.create()
  - Derive a stopwords argument from self.schema.index.stopwords.
  - Pass it through to:
    - cluster_create_index(index_name, client, fields, definition, stopwords=...), and
    - self._redis_client.ft(self.name).create_index(fields=redis_fields, definition=definition, stopwords=...)

    according to redis-py’s API.
- Documentation
  - Document how index-level stopwords differ from TextQuery.stopwords.
  - Show examples for:
    - Default behavior.
    - STOPWORDS 0.
    - Custom stopword list.
- Testing
  - Unit tests for:
    - Parsing various stopwords configurations in IndexSchema.from_dict() / .from_yaml().
    - Ensuring SearchIndex.create() calls the underlying client with the expected stopwords argument.
  - Integration test (with Redis):
    - Create an index with STOPWORDS 0 via RedisVL.
    - Confirm via behavior (e.g., ability to search on common stopwords) or FT.INFO that stopwords are disabled.
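The plumbing and unit-test items above could be sketched as follows. create_index_kwargs is a hypothetical helper, and the mock stands in for a redis-py client. The sketch assumes that an empty list passed as stopwords makes redis-py emit STOPWORDS 0, which should be verified against the installed redis-py version.

```python
from unittest.mock import MagicMock


def create_index_kwargs(fields, definition, stopwords=None):
    """Build keyword arguments for a create_index-style call.

    `stopwords` is only included when explicitly configured, so the default
    case leaves the underlying client call unchanged.
    """
    kwargs = {"fields": fields, "definition": definition}
    if stopwords is not None:
        # Assumption: 0 and [] both mean "disable stopwords" (STOPWORDS 0).
        kwargs["stopwords"] = [] if stopwords == 0 else list(stopwords)
    return kwargs


# Unit-test style check with a mocked client instead of a real Redis connection.
client = MagicMock()
kwargs = create_index_kwargs(
    fields=["desc"], definition="definition-placeholder", stopwords=0
)
client.ft("testidx").create_index(**kwargs)

client.ft.return_value.create_index.assert_called_once_with(
    fields=["desc"], definition="definition-placeholder", stopwords=[]
)
```

Leaving the stopwords key out entirely when nothing is configured keeps existing behavior unchanged for current users.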
References
- Relevant code (current behavior):
  - redisvl/schema/schema.py – IndexInfo definition.
  - redisvl/index/index.py – SearchIndex.create() and associated TODO comments for stopwords.
- Example commands:
  - FT.CREATE testidx ON HASH STOPWORDS 0 SCHEMA desc TEXT INDEXMISSING SORTABLE UNF
- Example behavior:
  - With the default stopwords list, "Worked at Bank of Redis" is indexed as "Worked Bank Redis"; with STOPWORDS 0, it is indexed as "Worked at Bank of Redis".