Add show_progress config to DocumentProcessor #97
Expose the show_progress parameter in the DocumentProcessor wrapper and propagate it to VectorStoreIndex. Update existing tests to verify the new configuration field. Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 3 passed
🧹 Nitpick comments (2)
tests/test_document_processor.py (1)
66-201: Consider adding test coverage for `show_progress=True`.
The current tests only verify the default `show_progress=False` behavior. Consider adding a test case that:
- Initializes `DocumentProcessor` with `show_progress=True`.
- Verifies the value is correctly stored in config.
- Optionally, verifies that `VectorStoreIndex` is called with `show_progress=True` during save.
💡 Example test case

    def test_init_with_show_progress_enabled(self, mock_processor):
        """Test DocumentProcessor initialization with show_progress=True."""
        params = mock_processor["params"].copy()
        params["show_progress"] = True
        doc_processor = document_processor.DocumentProcessor(**params)
        assert doc_processor.config.show_progress is True

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_document_processor.py` around lines 66 - 201, Add a test that covers initializing DocumentProcessor with show_progress=True and asserting it is persisted on the config and used during save: create a test (e.g., test_init_with_show_progress_enabled) that copies mock_processor["params"], sets params["show_progress"]=True, constructs document_processor.DocumentProcessor(**params), asserts doc_processor.config.show_progress is True, and then calls doc_processor.save(...) while asserting the vector store constructor/save path (mock_processor["indexdb"] or mock_processor["llamadb"]/doc_processor.db) is invoked with show_progress=True or that save forwards show_progress to the VectorStoreIndex call; reference DocumentProcessor, .config, save, and the mocked indexdb/llamadb/db to locate related code.
src/lightspeed_rag_content/document_processor.py (1)
188-401: Note: `show_progress` is not utilized by `_LlamaStackDB`.
The `show_progress` parameter is only effective when using `faiss` or `postgres` vector store types (handled by `_LlamaIndexDB`). When using `llamastack-faiss` or `llamastack-sqlite-vec`, the parameter will be silently ignored.
Consider either:
- Documenting this limitation in the `show_progress` parameter docstring.
- Logging a debug/info message when `show_progress=True` is used with an unsupported vector store type.
This is not blocking since the primary use case targets LlamaIndex backends.
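The logging option could look roughly like the sketch below. It is not taken from the PR: `Config` and `note_unsupported_progress` are hypothetical stand-ins, and the only detail borrowed from the review is that backends whose type starts with `llamastack-` ignore the flag.

```python
import logging
from dataclasses import dataclass

LOG = logging.getLogger(__name__)


@dataclass
class Config:
    """Hypothetical stand-in for the processor's config object."""

    vector_store_type: str
    show_progress: bool = False


def note_unsupported_progress(config: Config) -> bool:
    """Log a message if show_progress is set for a backend that ignores it.

    Returns True when a message was logged, False otherwise.
    """
    if config.show_progress and config.vector_store_type.startswith("llamastack-"):
        LOG.info(
            "show_progress is ignored for vector store type %r",
            config.vector_store_type,
        )
        return True
    return False
```

Returning a boolean is only a convenience for testing the sketch; in the real class the check would presumably sit at the start of `save` and return nothing.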
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_rag_content/document_processor.py` around lines 188 - 401, The review points out that _LlamaStackDB silently ignores the show_progress flag for llamastack-faiss/llamastack-sqlite-vec; update _LlamaStackDB to detect config.show_progress and either (a) emit a LOG.debug/LOG.info message when show_progress is True and self.config.vector_store_type startswith "llamastack-" informing users that progress is not supported, or (b) add a brief docstring note on the class/method (e.g., in _LlamaStackDB.__init__ or save) documenting that show_progress is unsupported for llama-stack backends; place the check in save (before write_yaml_config/_start_llama_stack) so the message runs on save operations and reference the class and method names (_LlamaStackDB, save, __init__) when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/lightspeed_rag_content/document_processor.py`:
- Around line 188-401: The review points out that _LlamaStackDB silently ignores
the show_progress flag for llamastack-faiss/llamastack-sqlite-vec; update
_LlamaStackDB to detect config.show_progress and either (a) emit a
LOG.debug/LOG.info message when show_progress is True and
self.config.vector_store_type startswith "llamastack-" informing users that
progress is not supported, or (b) add a brief docstring note on the class/method
(e.g., in _LlamaStackDB.__init__ or save) documenting that show_progress is
unsupported for llama-stack backends; place the check in save (before
write_yaml_config/_start_llama_stack) so the message runs on save operations and
reference the class and method names (_LlamaStackDB, save, __init__) when making
the change.
In `@tests/test_document_processor.py`:
- Around line 66-201: Add a test that covers initializing DocumentProcessor with
show_progress=True and asserting it is persisted on the config and used during
save: create a test (e.g., test_init_with_show_progress_enabled) that copies
mock_processor["params"], sets params["show_progress"]=True, constructs
document_processor.DocumentProcessor(**params), asserts
doc_processor.config.show_progress is True, and then calls
doc_processor.save(...) while asserting the vector store constructor/save path
(mock_processor["indexdb"] or mock_processor["llamadb"]/doc_processor.db) is
invoked with show_progress=True or that save forwards show_progress to the
VectorStoreIndex call; reference DocumentProcessor, .config, save, and the
mocked indexdb/llamadb/db to locate related code.
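The forwarding assertion described in the test prompt can be sketched with `unittest.mock`. This is an illustration under stated assumptions, not the project's test: `FakeProcessor` is a hypothetical stand-in for `DocumentProcessor`, and the mocked `index_cls` plays the role of `VectorStoreIndex`. The real fixtures (`mock_processor`, `indexdb`, `llamadb`) are not reproduced here.

```python
from unittest import mock


class FakeProcessor:
    """Hypothetical stand-in: stores show_progress and forwards it on save."""

    def __init__(self, index_cls, show_progress=False):
        self.index_cls = index_cls
        self.show_progress = show_progress

    def save(self, nodes):
        # Forward the stored flag to the index constructor.
        return self.index_cls(nodes, show_progress=self.show_progress)


def check_forwarding(show_progress):
    """Return the keyword arguments the mocked index constructor received."""
    index_cls = mock.MagicMock(name="VectorStoreIndex")
    FakeProcessor(index_cls, show_progress=show_progress).save(["node"])
    # Fails with AssertionError if the flag was dropped or changed on the way.
    index_cls.assert_called_once_with(["node"], show_progress=show_progress)
    return index_cls.call_args.kwargs
```

In the actual test suite the same `assert_called_once_with` style of check would be applied to whichever mock the fixtures patch in for the index constructor.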
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: be81c0ff-3534-4b65-b33a-7062be31c87c
📒 Files selected for processing (2)
- src/lightspeed_rag_content/document_processor.py
- tests/test_document_processor.py
Description
Generating embeddings can take time, and the process can appear to hang, which causes confusion. This PR exposes the `show_progress` parameter in the `DocumentProcessor` wrapper (defaulting to `False`) to allow enabling the progress bar on the `VectorStoreIndex`.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
- Added a `show_progress` parameter to `DocumentProcessor` initialization to enable progress reporting during vector store index creation. Defaults to disabled for backward compatibility.