Skip to content

Conversation

@nkanu17
Copy link
Collaborator

@nkanu17 nkanu17 commented Nov 21, 2025

Add Index-Level STOPWORDS Configuration Support

Summary

This PR adds support for configuring stopwords at index creation time through the IndexInfo.stopwords field, enabling users to control which words are filtered during indexing rather than only at query time.
Related to issue: #432

Motivation

Previously, RedisVL had no way to configure index-level STOPWORDS for FT.CREATE, even though redis-py's create_index() API supports it. Users could only control stopwords at query time via TextQuery.stopwords, which is insufficient for use cases like:

  • Searching for phrases containing common words (e.g., "Bank of Glasberliner")
  • Custom stopwords lists for domain-specific applications
  • Disabling stopwords entirely for precise text matching

Changes

Core Implementation

  • redisvl/schema/schema.py: Added stopwords: Optional[List[str]] field to IndexInfo class

    • None (default): Use Redis default stopwords (~300 words)
    • [] (empty list): Disable stopwords completely (STOPWORDS 0)
    • Custom list: Use specified stopwords
  • redisvl/index/index.py: Updated SearchIndex.create() and AsyncSearchIndex.create() to extract and pass stopwords from schema to create_index()

  • redisvl/redis/connection.py: Updated convert_index_info_to_schema() to parse stopwords from FT.INFO output, enabling from_existing() to preserve stopwords configuration

  • redisvl/redis/utils.py: Updated cluster_create_index() and async_cluster_create_index() to accept stopwords parameter

Warning System

Added validation in BaseSearchIndex._validate_query() to warn users when using query-time stopwords with index-level STOPWORDS 0, as this is counterproductive (indexes all words but filters them at query time).

Tests

  • 22 new tests (8 unit tests for schema, 3 for FT.INFO parsing, 11 integration tests)
  • All tests passing (981/981 unit tests, 1003/1003 integration tests)
  • Comprehensive coverage of all three stopwords modes and edge cases

Documentation

  • docs/user_guide/11_advanced_queries.ipynb: Added 8 cells (22-29) with examples and explanations
  • docs/stopwords_interaction_guide.md: Comprehensive guide explaining index-level vs query-time stopwords interaction
  • Updated docstrings in IndexInfo class with YAML and dict examples

Usage Examples

Disable stopwords (STOPWORDS 0)

schema = IndexSchema.from_dict({
    "index": {
        "name": "company-idx",
        "prefix": "company",
        "stopwords": []  # Disable stopwords
    },
    "fields": [{"name": "name", "type": "text"}]
})
index = SearchIndex(schema=schema)
index.create()

# Now "Bank of Glasberliner" searches work correctly
query = TextQuery(text="Bank of Glasberliner", text_field_name="name", stopwords=None)
results = index.query(query)

Copilot AI review requested due to automatic review settings November 21, 2025 00:59
@nkanu17 nkanu17 requested review from abrookins, bsbodden, justin-cechmanek and rbs333 and removed request for Copilot November 21, 2025 00:59
Copilot finished reviewing on behalf of nkanu17 November 21, 2025 01:02
Copilot AI review requested due to automatic review settings November 21, 2025 01:10
Copilot finished reviewing on behalf of nkanu17 November 21, 2025 01:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds index-level stopwords configuration support to RedisVL, enabling users to control which words are filtered during indexing rather than only at query time. This addresses the limitation where phrases like "Bank of America" couldn't be properly searched due to stopwords being filtered at index time.

Key Changes:

  • Added stopwords field to IndexInfo class supporting three modes: None (default), [] (STOPWORDS 0), or custom list
  • Updated index creation logic to pass stopwords configuration to Redis FT.CREATE command
  • Added parsing of stopwords from FT.INFO output to support from_existing() reconstruction
  • Implemented warning system for counterproductive query-time stopwords with STOPWORDS 0

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
redisvl/schema/schema.py Added stopwords field to IndexInfo class with documentation
redisvl/schema/fields.py Refactored field modifier ordering with _normalize_field_modifiers helper
redisvl/redis/connection.py Added stopwords parsing in convert_index_info_to_schema()
redisvl/redis/utils.py Updated cluster_create_index functions to accept stopwords parameter
redisvl/index/index.py Updated create() methods to extract and pass stopwords; added validation warning
redisvl/query/query.py Enhanced TextQuery.stopwords docstring with interaction notes
tests/unit/test_stopwords_schema.py 8 unit tests for schema stopwords configuration
tests/unit/test_convert_index_info.py 3 tests for FT.INFO stopwords parsing
tests/unit/test_field_modifier_ordering.py Comprehensive tests for field modifier ordering fix
tests/integration/test_stopwords_integration.py 11 integration tests for stopwords functionality
tests/integration/test_field_modifier_ordering_integration.py Integration tests against live Redis
docs/user_guide/11_advanced_queries.ipynb Added 8 cells demonstrating stopwords usage
docs/api/schema.rst Added documentation for index-level stopwords configuration
docs/api/query.rst Added note about stopwords interaction

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator

@justin-cechmanek justin-cechmanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good. Just a few questions, and suggest we include AggregateHybridQuery to the stopwords modifications.

Copilot AI review requested due to automatic review settings November 21, 2025 01:47
Copilot finished reviewing on behalf of nkanu17 November 21, 2025 01:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

docs/user_guide/11_advanced_queries.ipynb:1

  • The example uses SearchIndex.from_dict() to create an index, but based on the codebase patterns, from_dict() is typically a method on IndexSchema, not SearchIndex. The correct pattern should likely be: schema = IndexSchema.from_dict(stopwords_schema); company_index = SearchIndex(schema=schema, redis_url=...). This could confuse users trying to follow the example.
{

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
"pygments_lexer": "ipython3"
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version field was removed from the notebook metadata's language_info section. This is generally acceptable as notebook executors can infer the version, but be aware this might cause version ambiguity in some environments. Consider documenting the required Python version elsewhere if it's critical.

Suggested change
"pygments_lexer": "ipython3"
"pygments_lexer": "ipython3",
"version": "3.10.12"

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings November 21, 2025 03:43
Copilot finished reviewing on behalf of nkanu17 November 21, 2025 03:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings November 21, 2025 03:56
Copilot finished reviewing on behalf of nkanu17 November 21, 2025 03:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings November 21, 2025 15:05
Copilot finished reviewing on behalf of nkanu17 November 21, 2025 15:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

tests/unit/test_field_modifier_ordering.py:12

  • Import of 'pytest' is not used.
import pytest

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator

@rbs333 rbs333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also solid however I wouldn't use "Bank of America" as the primary example because it could be a bit of a faux pas. It can be any other phrase of that kind but if I were investigating a repo like this and saw "Bank of America" as the marquee I would deduct open source cred if that makes sense.

@nkanu17
Copy link
Collaborator Author

nkanu17 commented Nov 21, 2025

Looking really good. Just a few questions, and suggest we include AggregateHybridQuery to the stopwords modifications.

Bank of America
@rbs333
Actually I removed references to it and should have been pushed but noted! Thanks!

@nkanu17 nkanu17 force-pushed the RAAE-1200-add-index-level-stopwords-support-in-configuration-in-index-schema branch from 31e1237 to 3b0c8fb Compare November 21, 2025 15:22
Copilot AI review requested due to automatic review settings November 21, 2025 15:38
Copilot finished reviewing on behalf of nkanu17 November 21, 2025 15:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nkanu17 nkanu17 force-pushed the RAAE-1200-add-index-level-stopwords-support-in-configuration-in-index-schema branch from 5c64a4d to 7cb8a16 Compare November 21, 2025 15:46
Add support for configuring stopwords at index creation time via IndexInfo.stopwords field.

- Add stopwords field to IndexInfo class (None/[]/custom list)
- Update SearchIndex.create() and AsyncSearchIndex.create() to pass stopwords
- Update convert_index_info_to_schema() to parse stopwords from FT.INFO
- Update cluster_create_index() functions to accept stopwords parameter
- Add warning when using query-time stopwords with index-level STOPWORDS 0
- Add comprehensive documentation in 11_advanced_queries.ipynb
- Create stopwords_interaction_guide.md explaining best practices
- Add note to TextQuery docstring about index-level vs query-time stopwords
- Add stopwords section to schema.rst with configuration examples
- Add note to query.rst about stopwords interaction
- Link stopwords_interaction_guide.md from 11_advanced_queries.ipynb

These updates improve discoverability and help users understand the
interaction between index-level and query-time stopwords configuration.
- Clear notebook outputs and Python version metadata to reduce git noise
- Remove unused 'info' variable in test_create_index_with_default_stopwords
- Add explanatory comments to all exception handlers in cleanup blocks

These changes improve code quality and maintainability without affecting functionality.
Remove broad 'except Exception: pass' blocks in test cleanup that were
hiding real failures. Instead:

- test_stopwords_integration.py: Created pytest fixtures for each index
  type (stopwords_disabled_index, custom_stopwords_index,
  default_stopwords_index) that handle cleanup automatically

- test_field_modifier_ordering_integration.py: Removed try/finally/except
  blocks and replaced with direct cleanup calls that will fail if there's
  a real problem (connection issues, permissions, etc.)

This matches the pattern used in most other integration tests and ensures
that cleanup failures are visible rather than silently ignored.
Three improvements:

1. Rename test class for clarity
   - TestMultipleCommandsScenarioIntegration → TestFieldModifierIntegration
   - test_mlp_commands_index_creation → test_index_creation_with_multiple_modifiers
   - Use general naming instead of MLP-specific terminology for open-source

2. Move imports to file level
   - Moved FilterQuery import from inside test function to top of file
   - Follows standard Python convention

3. Extend stopwords validation to AggregateHybridQuery
   - Added AggregateHybridQuery to stopwords warning in _validate_query()
   - Both TextQuery and AggregateHybridQuery now warn when using query-time
     stopwords with index-level STOPWORDS 0
   - Warning message now includes the specific query type for clarity
Add documentation about index-level vs query-time stopwords interaction
for AggregateHybridQuery:

1. Updated AggregateHybridQuery.__init__() docstring
   - Added note about query-time vs index-level stopwords
   - References stopwords_interaction_guide.md for details

2. Updated docs/api/query.rst
   - Added note for HybridQuery/AggregateHybridQuery section
   - Matches the note already present for TextQuery
   - Links to Stopwords Interaction Guide

This completes the documentation updates for stopwords support across
all query types that use stopwords (TextQuery, AggregateHybridQuery).
…mment

Two improvements:

1. Remove all references to docs/stopwords_interaction_guide.md
   - Removed from docs/api/query.rst (HybridQuery and TextQuery notes)
   - Removed from docs/api/schema.rst (replaced with reference to user guide)
   - Removed from redisvl/query/query.py (TextQuery docstring)
   - Removed from redisvl/query/aggregate.py (AggregateHybridQuery docstring)
   - Removed from redisvl/index/index.py (warning message)
   - Removed from docs/user_guide/11_advanced_queries.ipynb
   - The notebook already has comprehensive stopwords documentation

2. Fix inaccurate comment about stopwords=None
   - Updated comment in redisvl/index/index.py to clarify that ANY falsy
     value (None, False, '', 0, [], etc.) results in an empty set, not just None
   - This matches the actual implementation in _set_stopwords() which uses
     'if not stopwords:' check
   - Updated warning message to mention 'stopwords=None (or any falsy value)'
GitHub Copilot correctly identified that markdown cells should not have
execution metadata. This metadata is an artifact from running the notebook
and can cause issues with notebook validation or rendering in some tools.

Cleaned 9 markdown cells by removing their 'execution' metadata blocks.
Only code cells should have execution metadata.
Changed all references from 'Bank of America' to 'Bank of Berlin' for
consistency throughout the stopwords documentation in the notebook.

Updated 6 occurrences across:
- Example data (company_name field)
- Search query and comment
- Print statement
- Markdown explanation text

This addresses the GitHub Copilot feedback about inconsistent entity names.
Updated the explanation text to use 'Bank Berlin' instead of 'Bank America'
for consistency. The phrase now correctly shows that 'Bank of Berlin' would
be indexed as 'Bank Berlin' (without 'of') when using default stopwords.
@nkanu17 nkanu17 force-pushed the RAAE-1200-add-index-level-stopwords-support-in-configuration-in-index-schema branch from 7cb8a16 to 8f8a86c Compare November 21, 2025 16:35
Copy link
Collaborator

@bsbodden bsbodden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done @nkanu17 !

@nkanu17 nkanu17 merged commit d1f4e76 into main Nov 21, 2025
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants