
LCORE-1402: OKP Index name missing in chunk metadata (tool RAG)#69#1322

Merged
tisnik merged 1 commit into lightspeed-core:main from are-ces:okp-source-attribute
Mar 15, 2026

@are-ces are-ces commented Mar 15, 2026

Description

This is a workaround for the llama-stack issue where chunks returned from the RAG tool have an unknown source.

Changes:

  • Added okp mapping to rag_id_mapping

Changes in lightspeed-providers here

(Continuation of #1135, #1208, #1248, #1300)

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

NA

Related Tickets & Documents

  • Related Issue # LCORE-1402
  • Closes # LCORE-1402

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Set up OKP in both inline and tool RAG, then check that the source attribute of the returned chunks is "okp".

Summary by CodeRabbit

  • Bug Fixes

    • RAG configuration now merges vector store ID mappings from BYOK and OKP when OKP is configured in rag.inline or rag.tool, so chunks from the OKP index are no longer reported with an unknown source.
  • Tests

    • Added comprehensive test coverage for RAG configuration mapping behavior across all supported backend combinations and configuration states.


coderabbitai bot commented Mar 15, 2026

Walkthrough

The PR extends the RAG ID mapping functionality to support both BYOK and OKP configurations. The rag_id_mapping property now merges vector store ID mappings from two sources: existing BYOK mappings and OKP-derived mappings when OKP is enabled. OKP enablement is determined by checking for OKP_RAG_ID in RAG inline or tool configurations.
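
The merge described in the walkthrough can be sketched roughly as follows. This is a minimal, standalone illustration rather than the actual code in src/configuration.py: the constant values, the `ByokRag` and `RagConfig` classes, and the `build_rag_id_mapping` helper are hypothetical stand-ins, and mapping each BYOK `vector_db_id` to its `rag_id` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; the real constants are defined in the project's constants module.
OKP_RAG_ID = "okp"
SOLR_DEFAULT_VECTOR_STORE_ID = "solr-default-store"


@dataclass
class ByokRag:
    """Hypothetical stand-in for one BYOK RAG entry."""

    rag_id: str
    vector_db_id: str


@dataclass
class RagConfig:
    """Hypothetical stand-in for the rag.inline / rag.tool lists."""

    inline: Optional[list[str]] = None
    tool: Optional[list[str]] = None


def build_rag_id_mapping(byok_rag: list[ByokRag], rag: RagConfig) -> dict[str, str]:
    """Return a vector store ID -> RAG ID mapping covering BYOK and OKP."""
    # Assumption: BYOK entries are keyed by vector_db_id and map to rag_id.
    byok_mapping = {entry.vector_db_id: entry.rag_id for entry in byok_rag}

    # OKP counts as enabled when its ID appears in either the inline or the tool list.
    okp_enabled = OKP_RAG_ID in (rag.inline or []) or OKP_RAG_ID in (rag.tool or [])
    okp_mapping = {SOLR_DEFAULT_VECTOR_STORE_ID: OKP_RAG_ID} if okp_enabled else {}

    # Same merge expression as in the reviewed diff: later entries win on key collisions.
    return {**byok_mapping, **okp_mapping}


mapping = build_rag_id_mapping(
    [ByokRag(rag_id="customer-docs", vector_db_id="vs-123")],
    RagConfig(tool=[OKP_RAG_ID]),
)
```

With OKP listed under `tool`, `mapping` holds one BYOK entry and one OKP entry; with neither list containing the OKP ID, only the BYOK entry remains.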

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| RAG ID Mapping Logic (src/configuration.py) | Modified rag_id_mapping property to merge BYOK and OKP-derived mappings. When OKP is enabled (OKP_RAG_ID present in rag.inline or rag.tool), the returned mapping includes entries from both sources using vector_db_id and SOLR_DEFAULT_VECTOR_STORE_ID as keys. Updated docstring to reference both BYOK and OKP patterns. |
| Test Coverage (tests/unit/test_configuration.py) | Added comprehensive test cases for rag_id_mapping behavior: cases for OKP exclusion when not configured, OKP inclusion when present in inline/tool, and combined BYOK+OKP scenarios. Imported Generator from collections.abc and reorganized test structure to validate new mapping logic. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly references the issue (LCORE-1402) and the core change: adding OKP index name to chunk metadata for tool RAG, which aligns with the PR's objective of fixing missing source attributes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/unit/test_configuration.py (1)

1035-1068: Add a regression test for SOLR_DEFAULT_VECTOR_STORE_ID collision.

Given the merged mapping behavior, it’s worth adding one case where BYOK uses constants.SOLR_DEFAULT_VECTOR_STORE_ID to lock expected behavior (error or deterministic precedence) and avoid future source mislabeling regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_configuration.py` around lines 1035 - 1068, Add a regression
test that verifies a collision when a BYOK entry uses
constants.SOLR_DEFAULT_VECTOR_STORE_ID; specifically, call
AppConfig().init_from_dict with a byok_rag entry whose vector_db_id is
constants.SOLR_DEFAULT_VECTOR_STORE_ID and assert that AppConfig.init_from_dict
raises a ValueError (or whichever deterministic failure your config validation
uses) to lock the expected behavior and prevent silent overwrites of
constants.SOLR_DEFAULT_VECTOR_STORE_ID in cfg.rag_id_mapping.
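
A regression test along these lines might look like the sketch below. It is deliberately standalone: instead of calling AppConfig().init_from_dict (whose configuration schema is not shown in this thread), it exercises a hypothetical `merge_rag_id_mappings` helper, and both the constant values and the choice of ValueError are assumptions.

```python
# Hypothetical values; the real constants are defined in the project's constants module.
OKP_RAG_ID = "okp"
SOLR_DEFAULT_VECTOR_STORE_ID = "solr-default-store"


def merge_rag_id_mappings(byok_mapping: dict[str, str], okp_enabled: bool) -> dict[str, str]:
    """Merge BYOK and OKP mappings, rejecting a BYOK entry that reuses the OKP key."""
    merged = dict(byok_mapping)  # copy so the caller's mapping is not mutated
    if okp_enabled:
        existing = merged.get(SOLR_DEFAULT_VECTOR_STORE_ID)
        if existing is not None and existing != OKP_RAG_ID:
            raise ValueError(
                "BYOK vector_db_id conflicts with OKP vector store id "
                f"'{SOLR_DEFAULT_VECTOR_STORE_ID}'"
            )
        merged[SOLR_DEFAULT_VECTOR_STORE_ID] = OKP_RAG_ID
    return merged


def test_byok_collision_with_solr_store_id_raises() -> None:
    """A BYOK entry keyed by the Solr default store ID must fail loudly."""
    try:
        merge_rag_id_mappings({SOLR_DEFAULT_VECTOR_STORE_ID: "customer-docs"}, okp_enabled=True)
    except ValueError as exc:
        assert SOLR_DEFAULT_VECTOR_STORE_ID in str(exc)
    else:
        raise AssertionError("expected ValueError on key collision")


def test_no_collision_keeps_both_sources() -> None:
    """Without a collision, BYOK and OKP entries coexist in the merged mapping."""
    merged = merge_rag_id_mappings({"vs-123": "customer-docs"}, okp_enabled=True)
    assert merged == {"vs-123": "customer-docs", SOLR_DEFAULT_VECTOR_STORE_ID: OKP_RAG_ID}
```

Both functions follow the pytest naming convention, so they would be collected automatically if dropped into a test module; the plain try/except keeps the sketch free of any test-framework import.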

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4a8083c3-295f-49f4-80ef-8ca7a8cbf469

📥 Commits

Reviewing files that changed from the base of the PR and between 89cac0f and e3d0931.

📒 Files selected for processing (2)
  • src/configuration.py
  • tests/unit/test_configuration.py

Comment on lines +402 to +405
        okp_mapping = (
            {constants.SOLR_DEFAULT_VECTOR_STORE_ID: okp_id} if okp_enabled else {}
        )
        return {**byok_mapping, **okp_mapping}

⚠️ Potential issue | 🟠 Major

Prevent silent BYOK/OKP key collision override in rag_id_mapping.

At Line 405, {**byok_mapping, **okp_mapping} silently overwrites BYOK when a BYOK vector_db_id equals constants.SOLR_DEFAULT_VECTOR_STORE_ID. That can return the wrong source ("okp") for non-OKP chunks.
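
The override comes from standard Python dict-unpacking semantics: when the same key appears in several unpacked dicts, the rightmost value is kept. A small illustration, using made-up IDs rather than the project's real constant values:

```python
# Made-up IDs for illustration; not the project's real constant values.
solr_store_id = "solr-default-store"

byok_mapping = {solr_store_id: "customer-docs"}  # BYOK entry that happens to reuse the key
okp_mapping = {solr_store_id: "okp"}

# The rightmost dict wins on duplicate keys, so the BYOK value is silently replaced.
merged = {**byok_mapping, **okp_mapping}
```

Here `merged` contains a single entry whose value is "okp", so a chunk that actually came from the BYOK store would be labelled as OKP.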

💡 Proposed fix
         rag = self._configuration.rag
         okp_id = constants.OKP_RAG_ID
         okp_enabled = okp_id in (rag.inline or []) or okp_id in (rag.tool or [])
-        okp_mapping = (
-            {constants.SOLR_DEFAULT_VECTOR_STORE_ID: okp_id} if okp_enabled else {}
-        )
-        return {**byok_mapping, **okp_mapping}
+        if okp_enabled:
+            solr_store_id = constants.SOLR_DEFAULT_VECTOR_STORE_ID
+            if (
+                solr_store_id in byok_mapping
+                and byok_mapping[solr_store_id] != okp_id
+            ):
+                raise LogicError(
+                    "configuration error: BYOK vector_db_id conflicts with OKP "
+                    f"vector store id '{solr_store_id}'"
+                )
+            byok_mapping[solr_store_id] = okp_id
+
+        return byok_mapping
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/configuration.py` around lines 402 - 405, The combined mapping at the end
of rag_id_mapping currently does {**byok_mapping, **okp_mapping} which silently
lets okp_mapping override BYOK when okp key equals
constants.SOLR_DEFAULT_VECTOR_STORE_ID; change the merge logic in the function
to preserve BYOK entries (e.g., combine by iterating okp_mapping keys and only
insert if the key is not already present in byok_mapping, or merge with byok
taking precedence) so that byok_mapping wins and non-OKP chunks are not
misclassified; update the return that builds the final mapping (refer to
byok_mapping, okp_mapping, constants.SOLR_DEFAULT_VECTOR_STORE_ID) accordingly
and add a short comment explaining the precedence choice.
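
The BYOK-takes-precedence alternative described in this prompt could be sketched as below. The `merge_with_byok_precedence` helper and the IDs are hypothetical, and this differs from the proposed diff, which raises an error on collision instead.

```python
def merge_with_byok_precedence(
    byok_mapping: dict[str, str], okp_mapping: dict[str, str]
) -> dict[str, str]:
    """Merge two vector store ID mappings, keeping BYOK entries on key collisions."""
    merged = dict(byok_mapping)  # copy so the input mapping is left untouched
    for store_id, rag_id in okp_mapping.items():
        # Precedence choice: an explicit BYOK entry is never overwritten by OKP.
        merged.setdefault(store_id, rag_id)
    return merged


# Made-up IDs for illustration only.
colliding = merge_with_byok_precedence(
    {"solr-default-store": "customer-docs"}, {"solr-default-store": "okp"}
)
disjoint = merge_with_byok_precedence({"vs-123": "customer-docs"}, {"solr-default-store": "okp"})
```

The trade-off is that a collision is resolved quietly in favour of BYOK, so OKP chunks sharing that store ID would then carry the BYOK source; raising an error makes the misconfiguration visible instead.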


@tisnik tisnik left a comment

LGTM

@tisnik tisnik merged commit 11832f8 into lightspeed-core:main Mar 15, 2026
24 checks passed