LCORE-1402: OKP Index name missing in chunk metadata (tool RAG)#69#1322
LCORE-1402: OKP Index name missing in chunk metadata (tool RAG)#69#1322tisnik merged 1 commit intolightspeed-core:mainfrom
Conversation
WalkthroughThe PR extends the RAG ID mapping functionality to support both BYOK and OKP configurations. The Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can customize the high-level summary generated by CodeRabbit.Configure the |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unit/test_configuration.py (1)
1035-1068: Add a regression test forSOLR_DEFAULT_VECTOR_STORE_IDcollision.Given the merged mapping behavior, it’s worth adding one case where BYOK uses
constants.SOLR_DEFAULT_VECTOR_STORE_IDto lock expected behavior (error or deterministic precedence) and avoid future source mislabeling regressions.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_configuration.py` around lines 1035 - 1068, Add a regression test that verifies a collision when a BYOK entry uses constants.SOLR_DEFAULT_VECTOR_STORE_ID; specifically, call AppConfig().init_from_dict with a byok_rag entry whose vector_db_id is constants.SOLR_DEFAULT_VECTOR_STORE_ID and assert that AppConfig.init_from_dict raises a ValueError (or whichever deterministic failure your config validation uses) to lock the expected behavior and prevent silent overwrites of constants.SOLR_DEFAULT_VECTOR_STORE_ID in cfg.rag_id_mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/configuration.py`:
- Around line 402-405: The combined mapping at the end of rag_id_mapping
currently does {**byok_mapping, **okp_mapping} which silently lets okp_mapping
override BYOK when okp key equals constants.SOLR_DEFAULT_VECTOR_STORE_ID; change
the merge logic in the function to preserve BYOK entries (e.g., combine by
iterating okp_mapping keys and only insert if the key is not already present in
byok_mapping, or merge with byok taking precedence) so that byok_mapping wins
and non-OKP chunks are not misclassified; update the return that builds the
final mapping (refer to byok_mapping, okp_mapping,
constants.SOLR_DEFAULT_VECTOR_STORE_ID) accordingly and add a short comment
explaining the precedence choice.
---
Nitpick comments:
In `@tests/unit/test_configuration.py`:
- Around line 1035-1068: Add a regression test that verifies a collision when a
BYOK entry uses constants.SOLR_DEFAULT_VECTOR_STORE_ID; specifically, call
AppConfig().init_from_dict with a byok_rag entry whose vector_db_id is
constants.SOLR_DEFAULT_VECTOR_STORE_ID and assert that AppConfig.init_from_dict
raises a ValueError (or whichever deterministic failure your config validation
uses) to lock the expected behavior and prevent silent overwrites of
constants.SOLR_DEFAULT_VECTOR_STORE_ID in cfg.rag_id_mapping.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4a8083c3-295f-49f4-80ef-8ca7a8cbf469
📒 Files selected for processing (2)
src/configuration.pytests/unit/test_configuration.py
| okp_mapping = ( | ||
| {constants.SOLR_DEFAULT_VECTOR_STORE_ID: okp_id} if okp_enabled else {} | ||
| ) | ||
| return {**byok_mapping, **okp_mapping} |
There was a problem hiding this comment.
Prevent silent BYOK/OKP key collision override in rag_id_mapping.
At Line 405, {**byok_mapping, **okp_mapping} silently overwrites BYOK when a BYOK vector_db_id equals constants.SOLR_DEFAULT_VECTOR_STORE_ID. That can return the wrong source ("okp") for non-OKP chunks.
💡 Proposed fix
rag = self._configuration.rag
okp_id = constants.OKP_RAG_ID
okp_enabled = okp_id in (rag.inline or []) or okp_id in (rag.tool or [])
- okp_mapping = (
- {constants.SOLR_DEFAULT_VECTOR_STORE_ID: okp_id} if okp_enabled else {}
- )
- return {**byok_mapping, **okp_mapping}
+ if okp_enabled:
+ solr_store_id = constants.SOLR_DEFAULT_VECTOR_STORE_ID
+ if (
+ solr_store_id in byok_mapping
+ and byok_mapping[solr_store_id] != okp_id
+ ):
+ raise LogicError(
+ "configuration error: BYOK vector_db_id conflicts with OKP "
+ f"vector store id '{solr_store_id}'"
+ )
+ byok_mapping[solr_store_id] = okp_id
+
+ return byok_mapping🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/configuration.py` around lines 402 - 405, The combined mapping at the end
of rag_id_mapping currently does {**byok_mapping, **okp_mapping} which silently
lets okp_mapping override BYOK when okp key equals
constants.SOLR_DEFAULT_VECTOR_STORE_ID; change the merge logic in the function
to preserve BYOK entries (e.g., combine by iterating okp_mapping keys and only
insert if the key is not already present in byok_mapping, or merge with byok
taking precedence) so that byok_mapping wins and non-OKP chunks are not
misclassified; update the return that builds the final mapping (refer to
byok_mapping, okp_mapping, constants.SOLR_DEFAULT_VECTOR_STORE_ID) accordingly
and add a short comment explaining the precedence choice.
Description
This is a workaround to the llama-stack issue of unknown source in chunks returned from the RAG tool.
Changes:
rag_id_mappingChanges in
lightspeed-providershere(Continuation of #1135, #1208, #1248, #1300)
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
NA
Related Tickets & Documents
Checklist before requesting a review
Testing
Set up OKP in inline and tool, check that the source attribute is "okp".
Summary by CodeRabbit
Bug Fixes
Tests