
Conversation

Contributor

@bsatapat-jpg commented Sep 17, 2025

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • End-of-stream payload now includes RAG chunks and synthesized referenced documents for richer final responses.
    • Final response summary accumulates tool usage and tool-call context for clearer attribution.
  • Improvements

    • Stronger detection and conditional enabling of knowledge-search/tool groups; responses are steered to use knowledge search when available.
    • Safer streaming completion with fallbacks to avoid missing content.
    • Transcripts now store retrieved RAG excerpts for later review.
  • Documentation

    • Added a BYOK guide detailing setup and configuration for using external knowledge stores.

Contributor

coderabbitai bot commented Sep 17, 2025

Walkthrough

The streaming endpoint now builds a TurnSummary while generating responses, aggregates tool calls and RAG context, and emits an enhanced end-of-stream event containing rag_chunks and synthesized referenced_documents. Transcripts store serialized rag_chunks when enabled, and retrieve_response detects and composes RAG/toolgroups more robustly, optionally forcing RAG usage in prompts.

Changes

Cohort / File(s) Summary of Changes
Streaming endpoint (core changes)
src/app/endpoints/streaming_query.py
Added TurnSummary creation and accumulation during streaming; stream_end_event signature changed to accept summary: TurnSummary and now returns end payload including rag_chunks and referenced_documents. End-of-stream event is yielded with RAG data after chunk processing.
Response generation & transcript wiring
src/app/endpoints/streaming_query.py
response_generator initializes complete_response and TurnSummary, guards missing final content, updates summary.llm_response, accumulates tool_calls from step events, and serializes rag_chunks into rag_chunks_for_transcript when storing transcripts.
RAG/tool detection and prompting
src/app/endpoints/streaming_query.py
retrieve_response enumerates vector DBs and toolgroups more robustly (handles multiple ID attribute names), enables rag_toolgroups only if builtin::rag/knowledge_search present, logs results, falls back gracefully; augments system prompt and rewrites user query to force knowledge_search when applicable; toolgroups composed with MCP servers or None.
Minor refactors & typing
src/app/endpoints/streaming_query.py
Replaced pattern matching with explicit if/elif in stream_build_event; added/adjusted imports for RAGChunk, ReferencedDocument, and AgentTurnResponseStreamChunk type hints.
Documentation (new BYOK guide)
docs/byok_guide.md
Added comprehensive BYOK documentation covering architecture, vector DB setup (FAISS, pgvector), embedding/model config, Llama Stack BYOK integration, RAG tooling examples, YAML config snippets, and troubleshooting.
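The enhanced end-of-stream payload described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: RAGChunk, TurnSummary, and stream_end_event here are simplified stand-ins for the real types in src/utils/types.py and src/app/endpoints/streaming_query.py.

```python
from dataclasses import dataclass, field


# Simplified stand-ins; the real TurnSummary carries more fields (tool calls, etc.).
@dataclass
class RAGChunk:
    content: str
    source: str
    score: float


@dataclass
class TurnSummary:
    llm_response: str = ""
    rag_chunks: list = field(default_factory=list)


def stream_end_event(summary: TurnSummary) -> dict:
    """Build the enhanced end-of-stream payload with RAG data."""
    rag_chunks = [
        {"content": c.content, "source": c.source, "score": c.score}
        for c in summary.rag_chunks
    ]
    # Synthesize referenced documents, one per distinct chunk source.
    seen: set = set()
    referenced_documents = []
    for c in summary.rag_chunks:
        if c.source and c.source not in seen:
            seen.add(c.source)
            referenced_documents.append({
                "doc_url": c.source if c.source.startswith("http") else None,
                "doc_title": c.source.rsplit("/", 1)[-1],
            })
    return {
        "event": "end",
        "data": {
            "rag_chunks": rag_chunks,
            "referenced_documents": referenced_documents,
        },
    }
```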

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant S as response_generator
    participant A as Agent/LLM
    participant T as Tools (knowledge_search)
    participant X as Transcript Store

    Note over S: Init TurnSummary & complete_response

    C->>S: Start streaming request
    S->>A: Build prompt & toolgroups (detect RAG)
    alt RAG available
        A->>T: Invoke knowledge_search
        T-->>A: Return rag_chunks & docs
        S->>S: summary.append_tool_calls_from_llama(...)
    else No RAG
        Note over S: Proceed without RAG
    end

    A-->>S: Stream chunks (content / tool_execution / step_complete)
    S->>S: Accumulate content & tool calls into summary
    S-->>C: Streamed chunks

    S->>S: Prepare end-of-stream payload (rag_chunks, referenced_documents)
    opt Transcripts enabled
        S->>X: store_transcript(..., rag_chunks_for_transcript)
    end
    S-->>C: End event with rag_chunks & referenced_documents
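The flow in the diagram can be sketched as an async generator. The chunk shapes and the summary interface below are assumptions for illustration, not the actual streaming_query.py code.

```python
import asyncio
from typing import Any, AsyncIterator


# Illustrative skeleton: accumulate content and tool calls into the summary
# while streaming, then emit an end event with the aggregated RAG data.
async def response_generator(
    chunks: AsyncIterator[dict], summary: Any
) -> AsyncIterator[dict]:
    async for chunk in chunks:
        if chunk["type"] == "content":
            # Accumulate content so the summary has a fallback response.
            summary.llm_response += chunk["text"]
            yield {"event": "token", "data": chunk["text"]}
        elif chunk["type"] == "step_complete":
            # Fold tool calls / RAG context into the summary.
            summary.append_tool_calls_from_llama(chunk["step"])
    # End-of-stream event carries the aggregated RAG data.
    yield {"event": "end", "rag_chunks": list(summary.rag_chunks)}
```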

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • manstis
  • tisnik
  • umago

Poem

I twitch my whiskers at the stream’s bright end,
RAG chunks gathered, references to lend.
I hop through steps and tool-call trails,
A transcript burrow keeps the details.
Thump-thump—summary snug, the query avails. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Added RAG chunks in streaming_query response" is concise and accurately summarizes the primary change in the PR by indicating RAG chunks were added to the streaming_query response; it is specific to the modified file and clearly communicates the main intent without extraneous detail.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.


@bsatapat-jpg
Contributor Author

The following PR needs to merge first:
#550

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
src/app/endpoints/streaming_query.py (3)

824-856: Harden toolgroup/vector‑DB detection; add fallbacks and avoid contradictory logging.

  • vector_db.identifier may not always exist; add fallbacks to .id/.name.
  • The log says “skipping RAG” then still builds RAG toolgroups; either skip or downgrade to debug and keep the attempt.
-        vector_dbs = await client.vector_dbs.list()
-        vector_db_ids = [vector_db.identifier for vector_db in vector_dbs]
+        vector_dbs = await client.vector_dbs.list()
+        vector_db_ids: list[str] = []
+        for vdb in vector_dbs:
+            if hasattr(vdb, "identifier"):
+                vector_db_ids.append(getattr(vdb, "identifier"))
+            elif hasattr(vdb, "id"):
+                vector_db_ids.append(getattr(vdb, "id"))
+            elif hasattr(vdb, "name"):
+                vector_db_ids.append(getattr(vdb, "name"))
@@
-                if "builtin::rag" in available_toolgroup_ids:
-                    rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
-                else:
-                    logger.warning("builtin::rag tool group not available, skipping RAG functionality")
-                    # Still try to create RAG toolgroups as they might work anyway
-                    rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+                if "builtin::rag" in available_toolgroup_ids:
+                    rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+                else:
+                    logger.debug("builtin::rag tool group not listed; attempting best‑effort RAG toolgroups")
+                    rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
@@
-            except Exception as e:
-                logger.warning("Failed to check available toolgroups, skipping RAG: %s", e)
-                # Still try to create RAG toolgroups as they might work anyway
-                rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+            except Exception as e:
+                logger.debug("Failed to list toolgroups, proceeding best‑effort with RAG: %s", e)
+                rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []

865-868: Make knowledge_search detection robust (don’t rely on str(tg)).

Current check may miss true positives or yield false positives. Inspect typical attributes.

-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
+    def _has_knowledge_search(tg: object) -> bool:
+        name = getattr(tg, "name", "")
+        ident = getattr(tg, "identifier", "")
+        tools = getattr(tg, "tools", None)
+        if "builtin::rag" in ident or "knowledge_search" in name:
+            return True
+        if tools and any(getattr(t, "tool_name", getattr(t, "name", "")) == "knowledge_search" for t in tools):
+            return True
+        return "knowledge_search" in str(tg)
+
+    if toolgroups and any(_has_knowledge_search(tg) for tg in toolgroups):
         system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
         logger.info("Enhanced system prompt to encourage RAG tool usage")

631-631: Dead fallback variable (complete_response)—either populate or drop.

You never build complete_response; fallback won’t help if turn.output_message is missing. Either collect tokens as you stream, or remove the fallback.
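One way to make the fallback meaningful is to accumulate tokens while streaming. This is a sketch with illustrative chunk shapes, not the endpoint's actual event types.

```python
# Sketch: build complete_response incrementally so it is a real fallback
# when the final turn message is missing. Chunk shapes are illustrative.
def drain_stream(chunks):
    complete_response = ""
    final_message = None
    for kind, payload in chunks:
        if kind == "delta":            # incremental text token
            complete_response += payload
        elif kind == "turn_complete":  # authoritative final message
            final_message = payload
    # Prefer the final message; fall back to the accumulated tokens.
    return final_message if final_message is not None else complete_response
```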

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae0c2e6 and 584f692.

📒 Files selected for processing (1)
  • src/app/endpoints/streaming_query.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (6)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • get_documents (197-204)
src/utils/endpoints.py (5)
  • check_configuration_loaded (62-68)
  • get_agent (138-187)
  • get_system_prompt (71-111)
  • validate_conversation_ownership (39-59)
  • validate_model_provider_override (114-134)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/query.py (2)
  • evaluate_model_hints (110-149)
  • get_rag_toolgroups (573-600)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
🔇 Additional comments (1)
src/app/endpoints/streaming_query.py (1)

146-147: API contract change: end event now includes rag_chunks and referenced_documents.

Confirm downstream consumers (web/client) tolerate the new fields and the referenced_documents shape (possibly doc_url=None). If strict schemas exist, bump version or behind a flag.
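A tolerant client-side parse might look like this. The payload shape is hypothetical, with field names taken from the comment above.

```python
import json


def parse_end_event(payload: str) -> dict:
    """Parse an end event, tolerating old payloads (fields absent) and
    new ones (fields present, doc_url possibly null)."""
    data = json.loads(payload)
    # Drop entries that carry neither a URL nor a title.
    docs = [
        d for d in data.get("referenced_documents", [])
        if d.get("doc_url") or d.get("doc_title")
    ]
    return {"rag_chunks": data.get("rag_chunks", []),
            "referenced_documents": docs}
```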

Comment on lines +110 to +118
    # Process RAG chunks
    rag_chunks = [
        {
            "content": chunk.content,
            "source": chunk.source,
            "score": chunk.score
        }
        for chunk in summary.rag_chunks
    ]
Contributor


⚠️ Potential issue

🧩 Analysis chain

Fix: Avoid AttributeError on TurnSummary.rag_chunks and make logging safe.

Code assumes summary.rag_chunks exists, but TurnSummary (utils/types.py) in provided context doesn’t define it. This will crash during streaming end and transcript storage. Also, log lines reference “RAG chunks” counters and will fail if the attribute is missing.

Immediate defensive fix (keeps behavior if/when rag chunks are added):

@@
-    rag_chunks = [
-        {
-            "content": chunk.content,
-            "source": chunk.source,
-            "score": chunk.score
-        }
-        for chunk in summary.rag_chunks
-    ]
+    rag_chunks_src = getattr(summary, "rag_chunks", [])
+    rag_chunks = [
+        {
+            "content": chunk.content,
+            "source": chunk.source,
+            "score": chunk.score,
+        }
+        for chunk in rag_chunks_src
+    ]
@@
-    referenced_docs = []
-    doc_sources = set()
-    for chunk in summary.rag_chunks:
+    referenced_docs = []
+    doc_sources = set()
+    for chunk in rag_chunks_src:
         if chunk.source and chunk.source not in doc_sources:
             doc_sources.add(chunk.source)
             referenced_docs.append({
                 "doc_url": chunk.source if chunk.source.startswith("http") else None,
                 "doc_title": chunk.source.split("/")[-1] if chunk.source else None,
             })
@@
-            logger.info("RAG chunks before: %d", len(summary.rag_chunks))
+            logger.debug("RAG chunks before: %d", len(getattr(summary, "rag_chunks", [])))
             summary.append_tool_calls_from_llama(step)
-            logger.info("RAG chunks after: %d", len(summary.rag_chunks))
+            logger.debug("RAG chunks after: %d", len(getattr(summary, "rag_chunks", [])))
@@
-                        logger.info("RAG chunks before: %d", len(summary.rag_chunks))
+                        logger.debug("RAG chunks before: %d", len(getattr(summary, "rag_chunks", [])))
                         summary.append_tool_calls_from_llama(p.step_details)
-                        logger.info("RAG chunks after: %d", len(summary.rag_chunks))
+                        logger.debug("RAG chunks after: %d", len(getattr(summary, "rag_chunks", [])))
@@
-                rag_chunks_for_transcript = [
+                rag_chunks_src = getattr(summary, "rag_chunks", [])
+                rag_chunks_for_transcript = [
                     {
                         "content": chunk.content,
                         "source": chunk.source,
                         "score": chunk.score
                     }
-                    for chunk in summary.rag_chunks
+                    for chunk in rag_chunks_src
                 ]

Follow‑up (preferred): add rag_chunks: list[RAGChunk] = [] to TurnSummary and populate it where knowledge_search responses are parsed, then revert the getattr guards.

Quick check script:

Also, confirm client-side consumers can handle referenced_documents entries with doc_url=None. If not, gate emission or normalize to metadata docs only.

Also applies to: 121-141, 146-147, 632-635, 657-665, 666-672, 682-691


🏁 Script executed:

#!/bin/bash
# Verify whether TurnSummary defines rag_chunks
rg -n -C2 -t python $'class\\s+TurnSummary\\b' src/utils/types.py
rg -n -C2 -t python '\\brag_chunks\\b' src/utils/types.py

Length of output: 61


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "SEARCH: class TurnSummary"
rg -n --hidden -S 'class\s+TurnSummary\b' src || true

echo
echo "SEARCH: rag_chunks occurrences"
rg -n --hidden -S '\brag_chunks\b' src || true

echo
# If TurnSummary found, print its file head
ts_file=$(rg -l --hidden -S 'class\s+TurnSummary\b' src || true)
if [ -n "$ts_file" ]; then
  echo "Found TurnSummary in: $ts_file"
  echo "---- $ts_file (lines 1-400) ----"
  nl -ba "$ts_file" | sed -n '1,400p'
fi

echo
# Print relevant parts of streaming_query.py
sq_path="src/app/endpoints/streaming_query.py"
if [ -f "$sq_path" ]; then
  echo "---- $sq_path (lines 1-240) ----"
  nl -ba "$sq_path" | sed -n '1,240p'
  echo "---- $sq_path (lines 600-760) ----"
  nl -ba "$sq_path" | sed -n '600,760p' || true
else
  echo "$sq_path not found; searching for streaming_query.py"
  rg -n --hidden -S 'streaming_query.py' src || true
fi

Length of output: 1546


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- src/utils/types.py (1-240) ----"
sed -n '1,240p' src/utils/types.py || true

echo
echo "---- src/utils/transcripts.py (1-240) ----"
sed -n '1,240p' src/utils/transcripts.py || true

echo
echo "---- src/app/endpoints/streaming_query.py (1-260) ----"
sed -n '1,260p' src/app/endpoints/streaming_query.py || true

echo
echo "---- src/app/endpoints/streaming_query.py (640-720) ----"
sed -n '640,720p' src/app/endpoints/streaming_query.py || true

Length of output: 18546


Fix: Guard summary.rag_chunks access (TurnSummary missing rag_chunks) — critical runtime bug

TurnSummary (src/utils/types.py) does not define rag_chunks; streaming_query.py reads summary.rag_chunks in multiple places and will raise AttributeError at runtime. Apply the defensive guards from the diff in the analysis chain above (or add rag_chunks to TurnSummary and populate it), and ensure referenced_documents entries with doc_url=None are acceptable to clients.


Follow-up (preferred): add rag_chunks: list[RAGChunk] = [] to TurnSummary (src/utils/types.py) and populate at knowledge_search parsing, then remove the getattr guards.

Affected: src/app/endpoints/streaming_query.py (multiple sites), src/utils/types.py (class TurnSummary), src/utils/transcripts.py (store_transcript signature), src/app/endpoints/query.py (passes rag_chunks=[]).
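The preferred follow-up could look like this. It is a sketch of the suggested field only; the real TurnSummary and RAGChunk in src/utils/types.py differ.

```python
from dataclasses import dataclass, field


# Hypothetical shapes for the preferred fix: give TurnSummary a first-class
# rag_chunks field so call sites need no getattr guards.
@dataclass
class RAGChunk:
    content: str
    source: str
    score: float


@dataclass
class TurnSummary:
    llm_response: str = ""
    tool_calls: list = field(default_factory=list)
    # First-class field with a safe default.
    rag_chunks: list[RAGChunk] = field(default_factory=list)

    def add_rag_chunks(self, chunks: list[RAGChunk]) -> None:
        # Populate wherever knowledge_search responses are parsed.
        self.rag_chunks.extend(chunks)
```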

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 110 to 118, the code
assumes summary.rag_chunks exists which causes AttributeError when TurnSummary
lacks that attribute; update the code to read rag_chunks via a safe getter
(e.g., rag_chunks = getattr(summary, "rag_chunks", [])) and ensure any entries
with referenced_documents that have doc_url=None are handled/filtered or
formatted so clients accept them; as a follow-up (preferred) add rag_chunks:
list[RAGChunk] = [] to TurnSummary in src/utils/types.py, populate it in the
knowledge_search parsing and update signatures in src/utils/transcripts.py and
any call sites (e.g., src/app/endpoints/query.py) so you can remove the getattr
guards.

Comment on lines +865 to +868
    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
        system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
        logger.info("Enhanced system prompt to encourage RAG tool usage")

Contributor


⚠️ Potential issue

System prompt enhancement is currently ignored; compute toolgroups first, then call get_agent with the enhanced prompt.

You build toolgroups and then mutate system_prompt after the agent has already been created, so the enhanced prompt never reaches the session.

Minimal refactor: move the get_agent(...) call to after toolgroup detection and prompt augmentation, right before create_turn(...).

@@
-    agent, conversation_id, session_id = await get_agent(
-        client,
-        model_id,
-        system_prompt,
-        available_input_shields,
-        available_output_shields,
-        query_request.conversation_id,
-        query_request.no_tools or False,
-    )
@@
-    # Enhance system prompt to encourage tool usage when RAG is available
-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
+    # Enhance system prompt to encourage tool usage when RAG is available
+    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
         system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
         logger.info("Enhanced system prompt to encourage RAG tool usage")
@@
-    response = await agent.create_turn(
+    # Create the agent after finalizing system_prompt/toolgroups
+    agent, conversation_id, session_id = await get_agent(
+        client,
+        model_id,
+        system_prompt,
+        available_input_shields,
+        available_output_shields,
+        query_request.conversation_id,
+        query_request.no_tools or False,
+    )
+    response = await agent.create_turn(
         messages=[UserMessage(role="user", content=user_query)],
         session_id=session_id,
         documents=query_request.get_documents(),
         stream=True,
         toolgroups=toolgroups,
     )

If API supports overriding the system prompt per turn, that would be even cleaner; otherwise, reordering is required.

Also applies to: 787-795, 875-881

Comment on lines +870 to +874
    user_query = query_request.query
    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
        user_query = f"Please use the knowledge_search tool to find relevant information about: {query_request.query}"
        logger.info("Modified query to force RAG usage: %s", user_query)

Contributor


⚠️ Potential issue

Don’t rewrite the user’s query to “force” RAG; rely on system prompt or tool routing.

Mutating the user’s text can degrade UX, leak implementation details into transcripts, and alter semantics. The system prompt (or tool routing) is the right lever.

-    # Force RAG usage by modifying the query when toolgroups are available
-    user_query = query_request.query
-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
-        user_query = f"Please use the knowledge_search tool to find relevant information about: {query_request.query}"
-        logger.info("Modified query to force RAG usage: %s", user_query)
+    # Keep the original user query; rely on the enhanced system prompt/tool routing
+    user_query = query_request.query

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
docs/byok_guide.md (3)

66-67: Fix markdownlint MD034: convert bare URL to a markdown link

Replace the bare URL with a proper link.

-  - Repository: https://github.com/lightspeed-core/rag-content
+  - Repository: [rag-content repository](https://github.com/lightspeed-core/rag-content)

91-93: Fix markdownlint MD034 and phrasing

Avoid the bare URL and improve grammar.

-Use the `rag-content` tool to create a compatible vector database:
-Please refer https://github.com/lightspeed-core/rag-content to create your vector database
+Use the `rag-content` tool to create a compatible vector database.
+Refer to the [rag-content repository](https://github.com/lightspeed-core/rag-content) to create your vector database.

98-102: Terminology consistency: “Llama Stack” naming

Minor phrasing inconsistency (“Llama-Stack Faiss Vector-IO”, “Llama Stack”). Standardize to “Llama Stack” for clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 584f692 and 5197e69.

📒 Files selected for processing (1)
  • docs/byok_guide.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/byok_guide.md

66-66: Bare URL used

(MD034, no-bare-urls)


92-92: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🔇 Additional comments (2)
docs/byok_guide.md (2)

1-23: Well-scoped, practical BYOK guide

Good structure, clear ToC, and actionable examples. Nice addition.


169-188: Path naming consistency (ollama directories)

Examples store agent/response SQLite DBs under “.../ollama/...”, even when using OpenAI/vLLM. If this is a generic default path, ignore; otherwise, consider neutral paths to avoid confusion.

Comment on lines +145 to +201
providers:
  inference:
  # Embedding model provider
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}

  # LLM provider (example: OpenAI)
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  # Vector database configuration
  vector_io:
  - provider_id: your-knowledge-base
    provider_type: inline::faiss  # or remote::pgvector
    config:
      kvstore:
        type: sqlite
        db_path: /path/to/vector_db/faiss_store.db
        namespace: null

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}

# Enable RAG tools
tool_groups:
- provider_id: rag-runtime
  toolgroup_id: builtin::rag

# Vector database configuration
vector_dbs:
- embedding_dimension: 768
  embedding_model: sentence-transformers/all-mpnet-base-v2
  provider_id: your-knowledge-base
  vector_db_id: your-index-id  # ID used during index generation
Contributor


⚠️ Potential issue

YAML indentation bug: provider lists not nested under category keys

The lists under providers (inference, agents, safety, vector_io, tool_runtime) are mis-indented, making the YAML invalid. Indent list items under their category keys.

 providers:
-  inference:
-  # Embedding model provider
-  - provider_id: sentence-transformers
-    provider_type: inline::sentence-transformers
-    config: {}
-
-  # LLM provider (example: OpenAI)
-  - provider_id: openai
-    provider_type: remote::openai
-    config:
-      api_key: ${env.OPENAI_API_KEY}
-
-  agents:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      persistence_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/agents_store.db
-      responses_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/responses_store.db
-
-  safety:
-  - provider_id: llama-guard
-    provider_type: inline::llama-guard
-    config:
-      excluded_categories: []
-
-  # Vector database configuration
-  vector_io:
-  - provider_id: your-knowledge-base
-    provider_type: inline::faiss  # or remote::pgvector
-    config:
-      kvstore:
-        type: sqlite
-        db_path: /path/to/vector_db/faiss_store.db
-        namespace: null
-
-  tool_runtime:
-  - provider_id: rag-runtime
-    provider_type: inline::rag-runtime
-    config: {}
+  inference:
+    # Embedding model provider
+    - provider_id: sentence-transformers
+      provider_type: inline::sentence-transformers
+      config: {}
+
+    # LLM provider (example: OpenAI)
+    - provider_id: openai
+      provider_type: remote::openai
+      config:
+        api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+    - provider_id: meta-reference
+      provider_type: inline::meta-reference
+      config:
+        persistence_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/agents_store.db
+        responses_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+    - provider_id: llama-guard
+      provider_type: inline::llama-guard
+      config:
+        excluded_categories: []
+
+  # Vector database configuration
+  vector_io:
+    - provider_id: your-knowledge-base
+      provider_type: inline::faiss  # or remote::pgvector
+      config:
+        kvstore:
+          type: sqlite
+          db_path: /path/to/vector_db/faiss_store.db
+          namespace: null
+
+  tool_runtime:
+    - provider_id: rag-runtime
+      provider_type: inline::rag-runtime
+      config: {}
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 145 to 201 the provider lists are
mis-indented and not nested under their category keys (inference, agents,
safety, vector_io, tool_runtime), which makes the YAML invalid; fix by indenting
each list item (the lines starting with "- provider_id: ...") two spaces further
so they are children of their respective category keys, ensure the comment lines
remain aligned, and validate the file with a YAML linter to confirm proper
nesting.
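The nesting rule the reviewer describes can be sanity-checked mechanically. Below is a minimal stdlib-only sketch (the `check_provider_nesting` helper is hypothetical, not part of this repo) that flags `- provider_id` list items sitting at or above the indent of the category key preceding them; a real linter such as yamllint remains the robust choice.

```python
# Hypothetical helper: flag provider list items that are not indented
# deeper than the category key (inference, agents, ...) above them.
def check_provider_nesting(yaml_text: str) -> list[str]:
    problems = []
    category_indent = None
    for lineno, line in enumerate(yaml_text.splitlines(), 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        indent = len(line) - len(line.lstrip(" "))
        if stripped.endswith(":") and not stripped.startswith("-"):
            category_indent = indent  # remember the enclosing mapping key
        elif stripped.startswith("- provider_id") and category_indent is not None:
            if indent <= category_indent:
                problems.append(
                    f"line {lineno}: list item not nested under its category key"
                )
    return problems

bad = """providers:
  inference:
  - provider_id: openai
"""
good = """providers:
  inference:
    - provider_id: openai
"""
print(check_provider_nesting(bad))   # → ['line 3: list item not nested under its category key']
print(check_provider_nesting(good))  # → []
```

This only checks relative indentation, which is the specific defect called out in the review; it does not validate the YAML as a whole.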

Comment on lines +289 to +329
```
providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  vector_io:
  - provider_id: company-docs
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: /home/user/vector_dbs/company_docs/faiss_store.db
        namespace: null

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
```
⚠️ Potential issue

YAML indentation bug in Example 1 providers block

Same nesting issue as above; indent lists under each provider category.

 providers:
-  inference:
-  - provider_id: sentence-transformers
-    provider_type: inline::sentence-transformers
-    config: {}
-  - provider_id: openai
-    provider_type: remote::openai
-    config:
-      api_key: ${env.OPENAI_API_KEY}
-
-  agents:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      persistence_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/agents_store.db
-      responses_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/responses_store.db
-
-  safety:
-  - provider_id: llama-guard
-    provider_type: inline::llama-guard
-    config:
-      excluded_categories: []
-
-  vector_io:
-  - provider_id: company-docs
-    provider_type: inline::faiss
-    config:
-      kvstore:
-        type: sqlite
-        db_path: /home/user/vector_dbs/company_docs/faiss_store.db
-        namespace: null
-
-  tool_runtime:
-  - provider_id: rag-runtime
-    provider_type: inline::rag-runtime
-    config: {}
+  inference:
+    - provider_id: sentence-transformers
+      provider_type: inline::sentence-transformers
+      config: {}
+    - provider_id: openai
+      provider_type: remote::openai
+      config:
+        api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+    - provider_id: meta-reference
+      provider_type: inline::meta-reference
+      config:
+        persistence_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/agents_store.db
+        responses_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+    - provider_id: llama-guard
+      provider_type: inline::llama-guard
+      config:
+        excluded_categories: []
+
+  vector_io:
+    - provider_id: company-docs
+      provider_type: inline::faiss
+      config:
+        kvstore:
+          type: sqlite
+          db_path: /home/user/vector_dbs/company_docs/faiss_store.db
+          namespace: null
+
+  tool_runtime:
+    - provider_id: rag-runtime
+      provider_type: inline::rag-runtime
+      config: {}
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 289 to 329, the providers block has incorrect
YAML nesting: the provider category keys (inference, agents, safety, vector_io,
tool_runtime) need their list items indented under them rather than at the same
level; fix by moving each provider list (the lines starting with "-
provider_id:") to be nested under the respective category key with consistent
indentation (e.g., two spaces per level) so the lists are children of
inference/agents/etc., and optionally run a YAML linter to verify validity.

Comment on lines +368 to +413
```
providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1/
      api_token: your-token-here

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  vector_io:
  - provider_id: enterprise-knowledge
    provider_type: remote::pgvector
    config:
      host: postgres.company.com
      port: 5432
      db: enterprise_kb
      user: rag_user
      password: ${env.POSTGRES_PASSWORD}
      kvstore:
        type: sqlite
        db_path: .llama/distributions/pgvector/registry.db

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
```
⚠️ Potential issue

YAML indentation bug in Example 2 providers block

Indent provider lists under their category keys.

 providers:
-  inference:
-  - provider_id: sentence-transformers
-    provider_type: inline::sentence-transformers
-    config: {}
-  - provider_id: vllm
-    provider_type: remote::vllm
-    config:
-      url: http://localhost:8000/v1/
-      api_token: your-token-here
-
-  agents:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      persistence_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/agents_store.db
-      responses_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/responses_store.db
-
-  safety:
-  - provider_id: llama-guard
-    provider_type: inline::llama-guard
-    config:
-      excluded_categories: []
-
-  vector_io:
-  - provider_id: enterprise-knowledge
-    provider_type: remote::pgvector
-    config:
-      host: postgres.company.com
-      port: 5432
-      db: enterprise_kb
-      user: rag_user
-      password: ${env.POSTGRES_PASSWORD}
-      kvstore:
-        type: sqlite
-        db_path: .llama/distributions/pgvector/registry.db
-
-  tool_runtime:
-  - provider_id: rag-runtime
-    provider_type: inline::rag-runtime
-    config: {}
+  inference:
+    - provider_id: sentence-transformers
+      provider_type: inline::sentence-transformers
+      config: {}
+    - provider_id: vllm
+      provider_type: remote::vllm
+      config:
+        url: http://localhost:8000/v1/
+        api_token: your-token-here
+
+  agents:
+    - provider_id: meta-reference
+      provider_type: inline::meta-reference
+      config:
+        persistence_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/agents_store.db
+        responses_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+    - provider_id: llama-guard
+      provider_type: inline::llama-guard
+      config:
+        excluded_categories: []
+
+  vector_io:
+    - provider_id: enterprise-knowledge
+      provider_type: remote::pgvector
+      config:
+        host: postgres.company.com
+        port: 5432
+        db: enterprise_kb
+        user: rag_user
+        password: ${env.POSTGRES_PASSWORD}
+        kvstore:
+          type: sqlite
+          db_path: .llama/distributions/pgvector/registry.db
+
+  tool_runtime:
+    - provider_id: rag-runtime
+      provider_type: inline::rag-runtime
+      config: {}
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 368 to 413 the provider list items are not
correctly indented under their category keys (inference, agents, safety,
vector_io, tool_runtime); fix by indenting each provider list (the lines
starting with "- provider_id: ...") two additional spaces so they are nested
under the corresponding category key, and ensure all nested config mappings
(provider_type, config, and nested config fields like persistence_store/kvstore)
are consistently indented to maintain valid YAML structure.

@bsatapat-jpg bsatapat-jpg deleted the dev branch September 25, 2025 09:01