Skip to content

Add show_progress config to DocumentProcessor#97

Merged
tisnik merged 1 commit intolightspeed-core:mainfrom
thibaultmg:add-progress-bar
Mar 10, 2026
Merged

Add show_progress config to DocumentProcessor#97
tisnik merged 1 commit intolightspeed-core:mainfrom
thibaultmg:add-progress-bar

Conversation

@thibaultmg
Copy link
Copy Markdown
Contributor

@thibaultmg thibaultmg commented Mar 10, 2026

Description

Generating embeddings can take time and the process can appear to be hanging, causing confusion. This PR exposes the show_progress parameter in the DocumentProcessor wrapper (defaulting to False) to allow enabling the progress bar on the VectorStoreIndex.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Gemini

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features
    • Added optional show_progress parameter to DocumentProcessor initialization to enable progress reporting during vector store index creation. Defaults to disabled for backward compatibility.

Expose the show_progress parameter in the DocumentProcessor wrapper and
propagate it to VectorStoreIndex. Update existing tests to verify the
new configuration field.

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 10, 2026

Walkthrough

A new optional show_progress parameter (default False) is added to the DocumentProcessor class constructor. The parameter is stored in the internal configuration and passed to VectorStoreIndex creation within _LlamaIndexDB._save_index for optional progress reporting during index creation.

Changes

Cohort / File(s) Summary
Vector Store Progress Configuration
src/lightspeed_rag_content/document_processor.py
Added show_progress: bool = False parameter to DocumentProcessor.__init__, stored in internal _Config, and passed to VectorStoreIndex construction in _LlamaIndexDB._save_index.
Test Updates
tests/test_document_processor.py
Updated three test cases to include show_progress=False in DocumentProcessor initialization expectations, reflecting the new configuration parameter.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a show_progress configuration option to DocumentProcessor.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Collaborator

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/test_document_processor.py (1)

66-201: Consider adding test coverage for show_progress=True.

The current tests only verify the default show_progress=False behavior. Consider adding a test case that:

  1. Initializes DocumentProcessor with show_progress=True.
  2. Verifies the value is correctly stored in config.
  3. Optionally, verifies that VectorStoreIndex is called with show_progress=True during save.
💡 Example test case
def test_init_with_show_progress_enabled(self, mock_processor):
    """Test DocumentProcessor initialization with show_progress=True."""
    params = mock_processor["params"].copy()
    params["show_progress"] = True

    doc_processor = document_processor.DocumentProcessor(**params)
    
    assert doc_processor.config.show_progress is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_document_processor.py` around lines 66 - 201, Add a test that
covers initializing DocumentProcessor with show_progress=True and asserting it
is persisted on the config and used during save: create a test (e.g.,
test_init_with_show_progress_enabled) that copies mock_processor["params"], sets
params["show_progress"]=True, constructs
document_processor.DocumentProcessor(**params), asserts
doc_processor.config.show_progress is True, and then calls
doc_processor.save(...) while asserting the vector store constructor/save path
(mock_processor["indexdb"] or mock_processor["llamadb"]/doc_processor.db) is
invoked with show_progress=True or that save forwards show_progress to the
VectorStoreIndex call; reference DocumentProcessor, .config, save, and the
mocked indexdb/llamadb/db to locate related code.
src/lightspeed_rag_content/document_processor.py (1)

188-401: Note: show_progress is not utilized by _LlamaStackDB.

The show_progress parameter is only effective when using faiss or postgres vector store types (handled by _LlamaIndexDB). When using llamastack-faiss or llamastack-sqlite-vec, the parameter will be silently ignored.

Consider either:

  1. Documenting this limitation in the show_progress parameter docstring.
  2. Logging a debug/info message when show_progress=True is used with an unsupported vector store type.

This is not blocking since the primary use case targets LlamaIndex backends.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_rag_content/document_processor.py` around lines 188 - 401, The
review points out that _LlamaStackDB silently ignores the show_progress flag for
llamastack-faiss/llamastack-sqlite-vec; update _LlamaStackDB to detect
config.show_progress and either (a) emit a LOG.debug/LOG.info message when
show_progress is True and self.config.vector_store_type startswith "llamastack-"
informing users that progress is not supported, or (b) add a brief docstring
note on the class/method (e.g., in _LlamaStackDB.__init__ or save) documenting
that show_progress is unsupported for llama-stack backends; place the check in
save (before write_yaml_config/_start_llama_stack) so the message runs on save
operations and reference the class and method names (_LlamaStackDB, save,
__init__) when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/lightspeed_rag_content/document_processor.py`:
- Around line 188-401: The review points out that _LlamaStackDB silently ignores
the show_progress flag for llamastack-faiss/llamastack-sqlite-vec; update
_LlamaStackDB to detect config.show_progress and either (a) emit a
LOG.debug/LOG.info message when show_progress is True and
self.config.vector_store_type startswith "llamastack-" informing users that
progress is not supported, or (b) add a brief docstring note on the class/method
(e.g., in _LlamaStackDB.__init__ or save) documenting that show_progress is
unsupported for llama-stack backends; place the check in save (before
write_yaml_config/_start_llama_stack) so the message runs on save operations and
reference the class and method names (_LlamaStackDB, save, __init__) when making
the change.

In `@tests/test_document_processor.py`:
- Around line 66-201: Add a test that covers initializing DocumentProcessor with
show_progress=True and asserting it is persisted on the config and used during
save: create a test (e.g., test_init_with_show_progress_enabled) that copies
mock_processor["params"], sets params["show_progress"]=True, constructs
document_processor.DocumentProcessor(**params), asserts
doc_processor.config.show_progress is True, and then calls
doc_processor.save(...) while asserting the vector store constructor/save path
(mock_processor["indexdb"] or mock_processor["llamadb"]/doc_processor.db) is
invoked with show_progress=True or that save forwards show_progress to the
VectorStoreIndex call; reference DocumentProcessor, .config, save, and the
mocked indexdb/llamadb/db to locate related code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: be81c0ff-3534-4b65-b33a-7062be31c87c

📥 Commits

Reviewing files that changed from the base of the PR and between 56f67f7 and 364282f.

📒 Files selected for processing (2)
  • src/lightspeed_rag_content/document_processor.py
  • tests/test_document_processor.py

@tisnik tisnik merged commit a3862ee into lightspeed-core:main Mar 10, 2026
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants