Add index-level STOPWORDS configuration support in IndexSchema and SearchIndex.create() #432

@nkanu17

Summary

RedisVL currently has no way to configure index-level STOPWORDS for FT.CREATE, even though redis-py’s create_index() and cluster_create_index() APIs support it. As a result, all indices are created with Redis’s default stopwords list, and users cannot faithfully reproduce commands like:

  • FT.CREATE testidx ON HASH STOPWORDS 0 SCHEMA work_experience_summary TEXT INDEXMISSING SORTABLE UNF

In many text-search workflows, you may want to explicitly disable stopwords at index time so that common words stay searchable. For example, for the phrase "Worked at Bank of America":

  • With the default stopwords list, the indexed terms are: Worked Bank America
  • With STOPWORDS 0, the indexed terms are: Worked at Bank of America

This index-level behavior (disabling stopwords at indexing time) cannot currently be expressed via RedisVL’s IndexSchema / SearchIndex APIs.
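
To make the difference concrete, here is a minimal pure-Python sketch of stopword filtering at indexing time. It is only an illustration: the helper name indexed_tokens is hypothetical, the tokenizer is a plain whitespace split rather than Redis's own, and the stopword set is a small subset of Redis's default list.

```python
# Small subset of Redis's default stopword list, for illustration only.
DEFAULT_STOPWORDS_SUBSET = {"a", "an", "and", "at", "of", "the", "to"}


def indexed_tokens(text, stopwords):
    """Return the lowercased tokens that would survive stopword filtering."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


phrase = "Worked at Bank of America"

# Default-like behavior: "at" and "of" are dropped before indexing.
print(indexed_tokens(phrase, DEFAULT_STOPWORDS_SUBSET))

# STOPWORDS 0 behavior: the stopword set is empty, so every token is kept.
print(indexed_tokens(phrase, set()))
```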

This is distinct from TextQuery’s client-side stopwords parameter, which only filters query tokens and does not control what gets indexed.


Current Behavior

  1. The index metadata model (IndexInfo in redisvl/schema/schema.py) does not expose any stopwords configuration:

    • It has fields like name, prefix, key_separator, and storage_type, but nothing for index-level stopwords.
  2. SearchIndex.create() (in redisvl/index/index.py) builds an IndexDefinition and then calls redis-py’s create_index() / cluster_create_index() without passing any stopwords argument.

  3. There are explicit TODO comments around the index creation calls:

    • Line ~605: # TODO Nitin: Add stopwords to definition and remove as a parameter
    • Line ~613: # TODO Nitin: Add stopwords to definition and remove as a parameter
  4. As a consequence, every index created via RedisVL uses Redis’s default stopwords list, and there is no official way to:

    • Disable stopwords entirely (STOPWORDS 0), or
    • Provide a custom stopword list (STOPWORDS <N> <w1> <w2> ...).

Reproduction Steps

  1. Define a simple schema for a hash index with a text field:

    • Use IndexSchema.from_dict() or IndexSchema.from_yaml() with something like:

      • index.name = "testidx"
      • index.prefix = "test"
      • index.storage_type = "hash"
      • fields = [{"name": "desc", "type": "text"}]
  2. Create the index via SearchIndex:

    • Initialize SearchIndex(schema=schema, redis_url="redis://localhost:6379").
    • Call index.create().
  3. Observe:

    • There is no field or argument anywhere in the RedisVL API to specify index-level stopwords.
    • The resulting FT.CREATE command that Redis receives does not contain a STOPWORDS clause.
  4. Compare this to the desired Redis command:

    • FT.CREATE testidx ON HASH STOPWORDS 0 SCHEMA desc TEXT INDEXMISSING SORTABLE UNF

    which disables stopwords at the index level. This cannot be expressed via the current RedisVL API.
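
The steps above could be scripted roughly as follows. This is a sketch, not a verified reproduction: it assumes redisvl is installed and a Redis instance with search support is listening on localhost:6379, and the function name reproduce is made up for this example.

```python
# Schema from the reproduction steps. Note that the "index" section
# currently has no key for index-level stopwords.
schema_dict = {
    "index": {"name": "testidx", "prefix": "test", "storage_type": "hash"},
    "fields": [{"name": "desc", "type": "text"}],
}


def reproduce(redis_url: str = "redis://localhost:6379") -> dict:
    """Create the index and return its FT.INFO output.

    Requires redisvl and a running Redis instance, so the imports are
    deferred until the function is actually called.
    """
    from redisvl.index import SearchIndex
    from redisvl.schema import IndexSchema

    schema = IndexSchema.from_dict(schema_dict)
    index = SearchIndex(schema=schema, redis_url=redis_url)
    index.create(overwrite=True)
    return index.info()
```

Running MONITOR in redis-cli while calling reproduce() should show the FT.CREATE command that Redis actually receives, which makes the missing STOPWORDS clause easy to confirm.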


Expected Behavior

  • Users should be able to configure index-level stopwords behavior directly via the IndexSchema / SearchIndex APIs, including:

    1. Disable stopwords completely:
      • Equivalent to STOPWORDS 0 in FT.CREATE.
    2. Provide a custom stopword list:
      • Equivalent to STOPWORDS <N> <w1> <w2> ....
    3. Use default behavior:
      • No explicit STOPWORDS clause, letting Redis fall back to its default list.
  • This configuration should be:

    • Representable in YAML/dict schemas.
    • Plumbed through to both single-node and clustered clients.
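
The three cases above map onto the FT.CREATE syntax in a simple way. The sketch below shows that mapping as a pure function. The name stopwords_clause and the convention of using None for "default" and an empty list for "disabled" are assumptions for illustration, not existing RedisVL API.

```python
from typing import List, Optional, Sequence, Union


def stopwords_clause(stopwords: Optional[Sequence[str]]) -> List[Union[str, int]]:
    """Render the STOPWORDS portion of an FT.CREATE argument list.

    None            -> no clause (Redis falls back to its default list)
    [] (empty)      -> STOPWORDS 0 (stopwords disabled)
    ["a", "an", ..] -> STOPWORDS <N> <w1> <w2> ...
    """
    if stopwords is None:
        return []
    if isinstance(stopwords, str):
        # A bare string would otherwise be split into single characters.
        raise TypeError("stopwords must be a sequence of words, not a string")
    words = list(stopwords)
    return ["STOPWORDS", len(words), *words]


print(stopwords_clause(None))
print(stopwords_clause([]))
print(stopwords_clause(["a", "an", "the"]))
```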

Actual Behavior

  • IndexSchema has no field for stopwords.
  • SearchIndex.create() passes only fields and definition to the underlying create_index() / cluster_create_index() calls, with no stopwords argument.
  • All indices use Redis’s default stopwords list.
  • There is no way to reproduce STOPWORDS 0 or custom stopwords in FT.CREATE via RedisVL.

Notes / Clarifications

  • TextQuery.stopwords is query-time, client-side behavior:

    • It controls which tokens from the user query are included in the FT.SEARCH query string.
    • It does not affect what tokens were indexed.
  • Index-level STOPWORDS is an indexing-time FT.CREATE directive:

    • It determines which tokens get added to the inverted index.
    • Once the index is created without STOPWORDS 0, you cannot later “undo” this via FilterQuery or client-side token changes.
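
A toy inverted index shows why query-time changes cannot recover terms that were dropped at indexing time. This is a deliberately simplified model written for this issue (whitespace tokenization, AND-only matching, hypothetical helper names), not a description of how Redis implements search.

```python
from collections import defaultdict


def build_index(docs, stopwords):
    """Map each non-stopword token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token not in stopwords:
                index[token].add(doc_id)
    return index


def search(index, query_tokens):
    """Return the ids of documents that contain every query token."""
    postings = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*postings) if postings else set()


docs = {"doc:1": "Worked at Bank of America"}

default_like = build_index(docs, stopwords={"at", "of"})  # default-style index
no_stopwords = build_index(docs, stopwords=set())         # STOPWORDS 0 style

# "of" was never stored in the first index, so no query can match on it.
print(search(default_like, ["bank", "of"]))
print(search(no_stopwords, ["bank", "of"]))
```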

Proposed Solution

  1. Extend IndexInfo / IndexSchema

    • Add an optional stopwords field to IndexInfo, supporting something like:

      • stopwords: null or omitted → default Redis behavior (no STOPWORDS clause).
      • stopwords: 0 (or a special value like "none") → emit STOPWORDS 0.
      • stopwords: ["a", "an", "the"] → emit STOPWORDS 3 a an the.
    • Ensure this works in both YAML and dict-based schema definitions.

  2. Plumb through SearchIndex.create()

    • Derive a stopwords argument from self.schema.index.stopwords.

    • Pass it through to:

      • cluster_create_index(index_name, client, fields, definition, stopwords=...), and
      • self._redis_client.ft(self.name).create_index(fields=redis_fields, definition=definition, stopwords=...)

      according to redis-py’s API.

  3. Documentation

    • Document how index-level stopwords differ from TextQuery.stopwords.
    • Show examples for:
      • Default behavior.
      • STOPWORDS 0.
      • Custom stopword list.
  4. Testing

    • Unit tests for:

      • Parsing various stopwords configurations in IndexSchema.from_dict() / .from_yaml().
      • Ensuring SearchIndex.create() calls the underlying client with the expected stopwords argument.
    • Integration test (with Redis):

      • Create an index with STOPWORDS 0 via RedisVL.
      • Confirm via behavior (e.g., ability to search on common stopwords) or FT.INFO that stopwords are disabled.
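
The proposal can be sketched end to end as below. Everything here is hypothetical: IndexInfoSketch is a plain dataclass stand-in (not the real IndexInfo model), normalize_stopwords and create_index_sketch are made-up helpers, and the client is a unittest.mock.MagicMock so the example runs without Redis. It also assumes that redis-py's create_index() renders an empty stopwords list as STOPWORDS 0, which should be checked against the installed redis-py version.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Union
from unittest.mock import MagicMock

StopwordsConfig = Optional[Union[int, str, List[str]]]


@dataclass
class IndexInfoSketch:
    """Hypothetical stand-in for IndexInfo with the proposed stopwords field."""
    name: str
    prefix: str = "rvl"
    stopwords: StopwordsConfig = None


def normalize_stopwords(value: StopwordsConfig) -> Optional[List[str]]:
    """Convert the schema value into the argument handed to create_index()."""
    if value is None:
        return None  # omit the clause; Redis uses its default list
    if isinstance(value, bool):
        raise TypeError("stopwords must be None, 0, 'none', or a list of words")
    if isinstance(value, int):
        if value == 0:
            return []  # assumed to be rendered as STOPWORDS 0
        raise ValueError("the only supported integer value is 0")
    if isinstance(value, str):
        if value.lower() == "none":
            return []
        raise ValueError("the only supported string value is 'none'")
    return [str(word) for word in value]


def create_index_sketch(client: Any, info: IndexInfoSketch,
                        fields: list, definition: Any) -> None:
    """Pass the normalized stopwords through to the underlying client call."""
    client.ft(info.name).create_index(
        fields=fields,
        definition=definition,
        stopwords=normalize_stopwords(info.stopwords),
    )


# Unit-test style check with a mocked client instead of a real connection.
client = MagicMock()
info = IndexInfoSketch(name="testidx", prefix="test", stopwords=0)
create_index_sketch(client, info, fields=["desc"], definition=None)
client.ft.assert_called_once_with("testidx")
client.ft.return_value.create_index.assert_called_once_with(
    fields=["desc"], definition=None, stopwords=[]
)
```

Keeping the normalization in one small function would let the single-node and cluster code paths share it, and gives the unit tests a pure function to target.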

References

  • Relevant code (current behavior):
    • redisvl/schema/schema.py: IndexInfo definition.
    • redisvl/index/index.py: SearchIndex.create() and associated TODO comments for stopwords.
  • Example commands:
    • FT.CREATE testidx ON HASH STOPWORDS 0 SCHEMA desc TEXT INDEXMISSING SORTABLE UNF.
  • Example behavior:
    • "Worked at Bank of Redis" is indexed as "Worked Bank Redis" with the default stopwords list, and as "Worked at Bank of Redis" with STOPWORDS 0.
