
Conversation

Contributor

@bsatapat-jpg commented Sep 17, 2025

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • End-of-stream payload now includes RAG chunks and synthesized referenced documents for richer final responses.
    • Final response summary accumulates tool usage and tool-call context for clearer attribution.
  • Improvements

    • Stronger detection and conditional enabling of knowledge-search/tool groups; responses are steered to use knowledge search when available.
    • Safer streaming completion with fallbacks to avoid missing content.
    • Transcripts now store retrieved RAG excerpts for later review.
  • Documentation

    • Added a BYOK guide detailing setup and configuration for using external knowledge stores.

Contributor

coderabbitai bot commented Sep 17, 2025

Walkthrough

The streaming endpoint now builds a TurnSummary while generating responses, aggregates tool calls and RAG context, and emits an enhanced end-of-stream event containing rag_chunks and synthesized referenced_documents. Transcripts store serialized rag_chunks when enabled, and retrieve_response detects and composes RAG/toolgroups more robustly, optionally forcing RAG usage in prompts.

Changes

Cohort / File(s) Summary of Changes
Streaming endpoint (core changes)
src/app/endpoints/streaming_query.py
Added TurnSummary creation and accumulation during streaming; stream_end_event signature changed to accept summary: TurnSummary and now returns end payload including rag_chunks and referenced_documents. End-of-stream event is yielded with RAG data after chunk processing.
Response generation & transcript wiring
src/app/endpoints/streaming_query.py
response_generator initializes complete_response and TurnSummary, guards missing final content, updates summary.llm_response, accumulates tool_calls from step events, and serializes rag_chunks into rag_chunks_for_transcript when storing transcripts.
RAG/tool detection and prompting
src/app/endpoints/streaming_query.py
retrieve_response enumerates vector DBs and toolgroups more robustly (handles multiple ID attribute names), enables rag_toolgroups only if builtin::rag/knowledge_search present, logs results, falls back gracefully; augments system prompt and rewrites user query to force knowledge_search when applicable; toolgroups composed with MCP servers or None.
Minor refactors & typing
src/app/endpoints/streaming_query.py
Replaced pattern matching with explicit if/elif in stream_build_event; added/adjusted imports for RAGChunk, ReferencedDocument, and AgentTurnResponseStreamChunk type hints.
Documentation (new BYOK guide)
docs/byok_guide.md
Added comprehensive BYOK documentation covering architecture, vector DB setup (FAISS, pgvector), embedding/model config, Llama Stack BYOK integration, RAG tooling examples, YAML config snippets, and troubleshooting.
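The enhanced end-of-stream payload described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: RAGChunk, TurnSummary, and stream_end_event here are simplified stand-ins for the real types in src/utils/types.py and src/app/endpoints/streaming_query.py.

```python
from dataclasses import dataclass, field


# Simplified stand-ins; the real TurnSummary carries more fields (tool calls, etc.).
@dataclass
class RAGChunk:
    content: str
    source: str
    score: float


@dataclass
class TurnSummary:
    llm_response: str = ""
    rag_chunks: list = field(default_factory=list)


def stream_end_event(summary: TurnSummary) -> dict:
    """Build the enhanced end-of-stream payload with RAG data."""
    rag_chunks = [
        {"content": c.content, "source": c.source, "score": c.score}
        for c in summary.rag_chunks
    ]
    # Synthesize referenced documents, one per distinct chunk source.
    seen: set = set()
    referenced_documents = []
    for c in summary.rag_chunks:
        if c.source and c.source not in seen:
            seen.add(c.source)
            referenced_documents.append({
                "doc_url": c.source if c.source.startswith("http") else None,
                "doc_title": c.source.rsplit("/", 1)[-1],
            })
    return {
        "event": "end",
        "data": {
            "rag_chunks": rag_chunks,
            "referenced_documents": referenced_documents,
        },
    }
```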

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant S as response_generator
    participant A as Agent/LLM
    participant T as Tools (knowledge_search)
    participant X as Transcript Store

    Note over S: Init TurnSummary & complete_response

    C->>S: Start streaming request
    S->>A: Build prompt & toolgroups (detect RAG)
    alt RAG available
        A->>T: Invoke knowledge_search
        T-->>A: Return rag_chunks & docs
        S->>S: summary.append_tool_calls_from_llama(...)
    else No RAG
        Note over S: Proceed without RAG
    end

    A-->>S: Stream chunks (content / tool_execution / step_complete)
    S->>S: Accumulate content & tool calls into summary
    S-->>C: Streamed chunks

    S->>S: Prepare end-of-stream payload (rag_chunks, referenced_documents)
    opt Transcripts enabled
        S->>X: store_transcript(..., rag_chunks_for_transcript)
    end
    S-->>C: End event with rag_chunks & referenced_documents
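The flow in the diagram can be sketched as an async generator. The chunk shapes and the summary interface below are assumptions for illustration, not the actual streaming_query.py code.

```python
import asyncio
from typing import Any, AsyncIterator


# Illustrative skeleton: accumulate content and tool calls into the summary
# while streaming, then emit an end event with the aggregated RAG data.
async def response_generator(
    chunks: AsyncIterator[dict], summary: Any
) -> AsyncIterator[dict]:
    async for chunk in chunks:
        if chunk["type"] == "content":
            # Accumulate content so the summary has a fallback response.
            summary.llm_response += chunk["text"]
            yield {"event": "token", "data": chunk["text"]}
        elif chunk["type"] == "step_complete":
            # Fold tool calls / RAG context into the summary.
            summary.append_tool_calls_from_llama(chunk["step"])
    # End-of-stream event carries the aggregated RAG data.
    yield {"event": "end", "rag_chunks": list(summary.rag_chunks)}
```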

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • manstis
  • tisnik
  • umago

Poem

I twitch my whiskers at the stream’s bright end,
RAG chunks gathered, references to lend.
I hop through steps and tool-call trails,
A transcript burrow keeps the details.
Thump-thump—summary snug, the query avails. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Added RAG chunks in streaming_query response" is concise and accurately summarizes the primary change in the PR by indicating RAG chunks were added to the streaming_query response; it is specific to the modified file and clearly communicates the main intent without extraneous detail.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.


@bsatapat-jpg
Contributor Author

The following PR needs to merge first:
#550

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
src/app/endpoints/streaming_query.py (3)

824-856: Harden toolgroup/vector‑DB detection; add fallbacks and avoid contradictory logging.

  • vector_db.identifier may not always exist; add fallbacks to .id/.name.
  • The log says “skipping RAG” then still builds RAG toolgroups; either skip or downgrade to debug and keep the attempt.
-        vector_dbs = await client.vector_dbs.list()
-        vector_db_ids = [vector_db.identifier for vector_db in vector_dbs]
+        vector_dbs = await client.vector_dbs.list()
+        vector_db_ids: list[str] = []
+        for vdb in vector_dbs:
+            if hasattr(vdb, "identifier"):
+                vector_db_ids.append(getattr(vdb, "identifier"))
+            elif hasattr(vdb, "id"):
+                vector_db_ids.append(getattr(vdb, "id"))
+            elif hasattr(vdb, "name"):
+                vector_db_ids.append(getattr(vdb, "name"))
@@
-                if "builtin::rag" in available_toolgroup_ids:
-                    rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
-                else:
-                    logger.warning("builtin::rag tool group not available, skipping RAG functionality")
-                    # Still try to create RAG toolgroups as they might work anyway
-                    rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+                if "builtin::rag" in available_toolgroup_ids:
+                    rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+                else:
+                    logger.debug("builtin::rag tool group not listed; attempting best‑effort RAG toolgroups")
+                    rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
@@
-            except Exception as e:
-                logger.warning("Failed to check available toolgroups, skipping RAG: %s", e)
-                # Still try to create RAG toolgroups as they might work anyway
-                rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+            except Exception as e:
+                logger.debug("Failed to list toolgroups, proceeding best‑effort with RAG: %s", e)
+                rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []

865-868: Make knowledge_search detection robust (don’t rely on str(tg)).

Current check may miss true positives or yield false positives. Inspect typical attributes.

-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
+    def _has_knowledge_search(tg: object) -> bool:
+        name = getattr(tg, "name", "")
+        ident = getattr(tg, "identifier", "")
+        tools = getattr(tg, "tools", None)
+        if "builtin::rag" in ident or "knowledge_search" in name:
+            return True
+        if tools and any(getattr(t, "tool_name", getattr(t, "name", "")) == "knowledge_search" for t in tools):
+            return True
+        return "knowledge_search" in str(tg)
+
+    if toolgroups and any(_has_knowledge_search(tg) for tg in toolgroups):
         system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
         logger.info("Enhanced system prompt to encourage RAG tool usage")

631-631: Dead fallback variable (complete_response)—either populate or drop.

You never build complete_response; fallback won’t help if turn.output_message is missing. Either collect tokens as you stream, or remove the fallback.
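One way to make the fallback meaningful is to accumulate tokens while streaming. This is a sketch with illustrative chunk shapes, not the endpoint's actual event types.

```python
# Sketch: build complete_response incrementally so it is a real fallback
# when the final turn message is missing. Chunk shapes are illustrative.
def drain_stream(chunks):
    complete_response = ""
    final_message = None
    for kind, payload in chunks:
        if kind == "delta":            # incremental text token
            complete_response += payload
        elif kind == "turn_complete":  # authoritative final message
            final_message = payload
    # Prefer the final message; fall back to the accumulated tokens.
    return final_message if final_message is not None else complete_response
```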

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae0c2e6 and 584f692.

📒 Files selected for processing (1)
  • src/app/endpoints/streaming_query.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (6)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • get_documents (197-204)
src/utils/endpoints.py (5)
  • check_configuration_loaded (62-68)
  • get_agent (138-187)
  • get_system_prompt (71-111)
  • validate_conversation_ownership (39-59)
  • validate_model_provider_override (114-134)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/query.py (2)
  • evaluate_model_hints (110-149)
  • get_rag_toolgroups (573-600)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
🔇 Additional comments (1)
src/app/endpoints/streaming_query.py (1)

146-147: API contract change: end event now includes rag_chunks and referenced_documents.

Confirm downstream consumers (web/client) tolerate the new fields and the referenced_documents shape (possibly doc_url=None). If strict schemas exist, bump version or behind a flag.
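A tolerant client-side parse might look like this. The payload shape is hypothetical, with field names taken from the comment above.

```python
import json


def parse_end_event(payload: str) -> dict:
    """Parse an end event, tolerating old payloads (fields absent) and
    new ones (fields present, doc_url possibly null)."""
    data = json.loads(payload)
    # Drop entries that carry neither a URL nor a title.
    docs = [
        d for d in data.get("referenced_documents", [])
        if d.get("doc_url") or d.get("doc_title")
    ]
    return {"rag_chunks": data.get("rag_chunks", []),
            "referenced_documents": docs}
```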

Comment on lines +110 to +118
    # Process RAG chunks
    rag_chunks = [
        {
            "content": chunk.content,
            "source": chunk.source,
            "score": chunk.score
        }
        for chunk in summary.rag_chunks
    ]
Contributor


⚠️ Potential issue

🧩 Analysis chain

Fix: Avoid AttributeError on TurnSummary.rag_chunks and make logging safe.

Code assumes summary.rag_chunks exists, but TurnSummary (utils/types.py) in provided context doesn’t define it. This will crash during streaming end and transcript storage. Also, log lines reference “RAG chunks” counters and will fail if the attribute is missing.

Immediate defensive fix (keeps behavior if/when rag chunks are added):

@@
-    rag_chunks = [
-        {
-            "content": chunk.content,
-            "source": chunk.source,
-            "score": chunk.score
-        }
-        for chunk in summary.rag_chunks
-    ]
+    rag_chunks_src = getattr(summary, "rag_chunks", [])
+    rag_chunks = [
+        {
+            "content": chunk.content,
+            "source": chunk.source,
+            "score": chunk.score,
+        }
+        for chunk in rag_chunks_src
+    ]
@@
-    referenced_docs = []
-    doc_sources = set()
-    for chunk in summary.rag_chunks:
+    referenced_docs = []
+    doc_sources = set()
+    for chunk in rag_chunks_src:
         if chunk.source and chunk.source not in doc_sources:
             doc_sources.add(chunk.source)
             referenced_docs.append({
                 "doc_url": chunk.source if chunk.source.startswith("http") else None,
                 "doc_title": chunk.source.split("/")[-1] if chunk.source else None,
             })
@@
-            logger.info("RAG chunks before: %d", len(summary.rag_chunks))
+            logger.debug("RAG chunks before: %d", len(getattr(summary, "rag_chunks", [])))
             summary.append_tool_calls_from_llama(step)
-            logger.info("RAG chunks after: %d", len(summary.rag_chunks))
+            logger.debug("RAG chunks after: %d", len(getattr(summary, "rag_chunks", [])))
@@
-                        logger.info("RAG chunks before: %d", len(summary.rag_chunks))
+                        logger.debug("RAG chunks before: %d", len(getattr(summary, "rag_chunks", [])))
                         summary.append_tool_calls_from_llama(p.step_details)
-                        logger.info("RAG chunks after: %d", len(summary.rag_chunks))
+                        logger.debug("RAG chunks after: %d", len(getattr(summary, "rag_chunks", [])))
@@
-                rag_chunks_for_transcript = [
+                rag_chunks_src = getattr(summary, "rag_chunks", [])
+                rag_chunks_for_transcript = [
                     {
                         "content": chunk.content,
                         "source": chunk.source,
                         "score": chunk.score
                     }
-                    for chunk in summary.rag_chunks
+                    for chunk in rag_chunks_src
                 ]

Follow‑up (preferred): add rag_chunks: list[RAGChunk] = [] to TurnSummary and populate it where knowledge_search responses are parsed, then revert the getattr guards.

Quick check script:

Also, confirm client-side consumers can handle referenced_documents entries with doc_url=None. If not, gate emission or normalize to metadata docs only.

Also applies to: 121-141, 146-147, 632-635, 657-665, 666-672, 682-691


🏁 Script executed:

#!/bin/bash
# Verify whether TurnSummary defines rag_chunks
rg -n -C2 -t python $'class\\s+TurnSummary\\b' src/utils/types.py
rg -n -C2 -t python '\\brag_chunks\\b' src/utils/types.py

Length of output: 61


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "SEARCH: class TurnSummary"
rg -n --hidden -S 'class\s+TurnSummary\b' src || true

echo
echo "SEARCH: rag_chunks occurrences"
rg -n --hidden -S '\brag_chunks\b' src || true

echo
# If TurnSummary found, print its file head
ts_file=$(rg -l --hidden -S 'class\s+TurnSummary\b' src || true)
if [ -n "$ts_file" ]; then
  echo "Found TurnSummary in: $ts_file"
  echo "---- $ts_file (lines 1-400) ----"
  nl -ba "$ts_file" | sed -n '1,400p'
fi

echo
# Print relevant parts of streaming_query.py
sq_path="src/app/endpoints/streaming_query.py"
if [ -f "$sq_path" ]; then
  echo "---- $sq_path (lines 1-240) ----"
  nl -ba "$sq_path" | sed -n '1,240p'
  echo "---- $sq_path (lines 600-760) ----"
  nl -ba "$sq_path" | sed -n '600,760p' || true
else
  echo "$sq_path not found; searching for streaming_query.py"
  rg -n --hidden -S 'streaming_query.py' src || true
fi

Length of output: 1546


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- src/utils/types.py (1-240) ----"
sed -n '1,240p' src/utils/types.py || true

echo
echo "---- src/utils/transcripts.py (1-240) ----"
sed -n '1,240p' src/utils/transcripts.py || true

echo
echo "---- src/app/endpoints/streaming_query.py (1-260) ----"
sed -n '1,260p' src/app/endpoints/streaming_query.py || true

echo
echo "---- src/app/endpoints/streaming_query.py (640-720) ----"
sed -n '640,720p' src/app/endpoints/streaming_query.py || true

Length of output: 18546


Fix: Guard summary.rag_chunks access (TurnSummary missing rag_chunks) — critical runtime bug

TurnSummary (src/utils/types.py) does not define rag_chunks; streaming_query.py reads summary.rag_chunks in multiple places and will raise AttributeError at runtime. Apply the defensive guards from the diff in the analysis chain above (or add rag_chunks to TurnSummary and populate it), and ensure referenced_documents entries with doc_url=None are acceptable to clients.


Follow-up (preferred): add rag_chunks: list[RAGChunk] = [] to TurnSummary (src/utils/types.py) and populate at knowledge_search parsing, then remove the getattr guards.

Affected: src/app/endpoints/streaming_query.py (multiple sites), src/utils/types.py (class TurnSummary), src/utils/transcripts.py (store_transcript signature), src/app/endpoints/query.py (passes rag_chunks=[]).
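The preferred follow-up could look like this. It is a sketch of the suggested field only; the real TurnSummary and RAGChunk in src/utils/types.py differ.

```python
from dataclasses import dataclass, field


# Hypothetical shapes for the preferred fix: give TurnSummary a first-class
# rag_chunks field so call sites need no getattr guards.
@dataclass
class RAGChunk:
    content: str
    source: str
    score: float


@dataclass
class TurnSummary:
    llm_response: str = ""
    tool_calls: list = field(default_factory=list)
    # First-class field with a safe default.
    rag_chunks: list[RAGChunk] = field(default_factory=list)

    def add_rag_chunks(self, chunks: list[RAGChunk]) -> None:
        # Populate wherever knowledge_search responses are parsed.
        self.rag_chunks.extend(chunks)
```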

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 110 to 118, the code
assumes summary.rag_chunks exists which causes AttributeError when TurnSummary
lacks that attribute; update the code to read rag_chunks via a safe getter
(e.g., rag_chunks = getattr(summary, "rag_chunks", [])) and ensure any entries
with referenced_documents that have doc_url=None are handled/filtered or
formatted so clients accept them; as a follow-up (preferred) add rag_chunks:
list[RAGChunk] = [] to TurnSummary in src/utils/types.py, populate it in the
knowledge_search parsing and update signatures in src/utils/transcripts.py and
any call sites (e.g., src/app/endpoints/query.py) so you can remove the getattr
guards.

Comment on lines +865 to +868
    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
        system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
        logger.info("Enhanced system prompt to encourage RAG tool usage")

Contributor


⚠️ Potential issue

System prompt enhancement is currently ignored; compute toolgroups first, then call get_agent with the enhanced prompt.

You build toolgroups and then mutate system_prompt after the agent has already been created, so the enhanced prompt never reaches the session.

Minimal refactor: move the get_agent(...) call to after toolgroup detection and prompt augmentation, right before create_turn(...).

@@
-    agent, conversation_id, session_id = await get_agent(
-        client,
-        model_id,
-        system_prompt,
-        available_input_shields,
-        available_output_shields,
-        query_request.conversation_id,
-        query_request.no_tools or False,
-    )
@@
-    # Enhance system prompt to encourage tool usage when RAG is available
-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
+    # Enhance system prompt to encourage tool usage when RAG is available
+    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
         system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
         logger.info("Enhanced system prompt to encourage RAG tool usage")
@@
-    response = await agent.create_turn(
+    # Create the agent after finalizing system_prompt/toolgroups
+    agent, conversation_id, session_id = await get_agent(
+        client,
+        model_id,
+        system_prompt,
+        available_input_shields,
+        available_output_shields,
+        query_request.conversation_id,
+        query_request.no_tools or False,
+    )
+    response = await agent.create_turn(
         messages=[UserMessage(role="user", content=user_query)],
         session_id=session_id,
         documents=query_request.get_documents(),
         stream=True,
         toolgroups=toolgroups,
     )

If API supports overriding the system prompt per turn, that would be even cleaner; otherwise, reordering is required.

Also applies to: 787-795, 875-881

Comment on lines +870 to +874
    user_query = query_request.query
    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
        user_query = f"Please use the knowledge_search tool to find relevant information about: {query_request.query}"
        logger.info("Modified query to force RAG usage: %s", user_query)

Contributor


⚠️ Potential issue

Don’t rewrite the user’s query to “force” RAG; rely on system prompt or tool routing.

Mutating the user’s text can degrade UX, leak implementation details into transcripts, and alter semantics. The system prompt (or tool routing) is the right lever.

-    # Force RAG usage by modifying the query when toolgroups are available
-    user_query = query_request.query
-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
-        user_query = f"Please use the knowledge_search tool to find relevant information about: {query_request.query}"
-        logger.info("Modified query to force RAG usage: %s", user_query)
+    # Keep the original user query; rely on the enhanced system prompt/tool routing
+    user_query = query_request.query

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
docs/byok_guide.md (3)

66-67: Fix markdownlint MD034: convert bare URL to a markdown link

Replace the bare URL with a proper link.

-  - Repository: https://github.com/lightspeed-core/rag-content
+  - Repository: [rag-content repository](https://github.com/lightspeed-core/rag-content)

91-93: Fix markdownlint MD034 and phrasing

Avoid the bare URL and improve grammar.

-Use the `rag-content` tool to create a compatible vector database:
-Please refer https://github.com/lightspeed-core/rag-content to create your vector database
+Use the `rag-content` tool to create a compatible vector database.
+Refer to the [rag-content repository](https://github.com/lightspeed-core/rag-content) to create your vector database.

98-102: Terminology consistency: “Llama Stack” naming

Minor phrasing inconsistency (“Llama-Stack Faiss Vector-IO”, “Llama Stack”). Standardize to “Llama Stack” for clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 584f692 and 5197e69.

📒 Files selected for processing (1)
  • docs/byok_guide.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/byok_guide.md

66-66: Bare URL used

(MD034, no-bare-urls)


92-92: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🔇 Additional comments (2)
docs/byok_guide.md (2)

1-23: Well-scoped, practical BYOK guide

Good structure, clear ToC, and actionable examples. Nice addition.


169-188: Path naming consistency (ollama directories)

Examples store agent/response SQLite DBs under “.../ollama/...”, even when using OpenAI/vLLM. If this is a generic default path, ignore; otherwise, consider neutral paths to avoid confusion.

Comment on lines +145 to +201
providers:
  inference:
  # Embedding model provider
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}

  # LLM provider (example: OpenAI)
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  # Vector database configuration
  vector_io:
  - provider_id: your-knowledge-base
    provider_type: inline::faiss  # or remote::pgvector
    config:
      kvstore:
        type: sqlite
        db_path: /path/to/vector_db/faiss_store.db
        namespace: null

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}

# Enable RAG tools
tool_groups:
- provider_id: rag-runtime
  toolgroup_id: builtin::rag

# Vector database configuration
vector_dbs:
- embedding_dimension: 768
  embedding_model: sentence-transformers/all-mpnet-base-v2
  provider_id: your-knowledge-base
  vector_db_id: your-index-id  # ID used during index generation
Contributor


⚠️ Potential issue

YAML indentation bug: provider lists not nested under category keys

The lists under providers (inference, agents, safety, vector_io, tool_runtime) are mis-indented, making the YAML invalid. Indent list items under their category keys.

 providers:
-  inference:
-  # Embedding model provider
-  - provider_id: sentence-transformers
-    provider_type: inline::sentence-transformers
-    config: {}
-
-  # LLM provider (example: OpenAI)
-  - provider_id: openai
-    provider_type: remote::openai
-    config:
-      api_key: ${env.OPENAI_API_KEY}
-
-  agents:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      persistence_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/agents_store.db
-      responses_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/responses_store.db
-
-  safety:
-  - provider_id: llama-guard
-    provider_type: inline::llama-guard
-    config:
-      excluded_categories: []
-
-  # Vector database configuration
-  vector_io:
-  - provider_id: your-knowledge-base
-    provider_type: inline::faiss  # or remote::pgvector
-    config:
-      kvstore:
-        type: sqlite
-        db_path: /path/to/vector_db/faiss_store.db
-        namespace: null
-
-  tool_runtime:
-  - provider_id: rag-runtime
-    provider_type: inline::rag-runtime
-    config: {}
+  inference:
+    # Embedding model provider
+    - provider_id: sentence-transformers
+      provider_type: inline::sentence-transformers
+      config: {}
+
+    # LLM provider (example: OpenAI)
+    - provider_id: openai
+      provider_type: remote::openai
+      config:
+        api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+    - provider_id: meta-reference
+      provider_type: inline::meta-reference
+      config:
+        persistence_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/agents_store.db
+        responses_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+    - provider_id: llama-guard
+      provider_type: inline::llama-guard
+      config:
+        excluded_categories: []
+
+  # Vector database configuration
+  vector_io:
+    - provider_id: your-knowledge-base
+      provider_type: inline::faiss  # or remote::pgvector
+      config:
+        kvstore:
+          type: sqlite
+          db_path: /path/to/vector_db/faiss_store.db
+          namespace: null
+
+  tool_runtime:
+    - provider_id: rag-runtime
+      provider_type: inline::rag-runtime
+      config: {}
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 145 to 201 the provider lists are
mis-indented and not nested under their category keys (inference, agents,
safety, vector_io, tool_runtime), which makes the YAML invalid; fix by indenting
each list item (the lines starting with "- provider_id: ...") two spaces further
so they are children of their respective category keys, ensure the comment lines
remain aligned, and validate the file with a YAML linter to confirm proper
nesting.
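The nesting rule the reviewer describes can be sanity-checked mechanically. Below is a minimal stdlib-only sketch (the `check_provider_nesting` helper is hypothetical, not part of this repo) that flags `- provider_id` list items sitting at or above the indent of the category key preceding them; a real linter such as yamllint remains the robust choice.

```python
# Hypothetical helper: flag provider list items that are not indented
# deeper than the category key (inference, agents, ...) above them.
def check_provider_nesting(yaml_text: str) -> list[str]:
    problems = []
    category_indent = None
    for lineno, line in enumerate(yaml_text.splitlines(), 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        indent = len(line) - len(line.lstrip(" "))
        if stripped.endswith(":") and not stripped.startswith("-"):
            category_indent = indent  # remember the enclosing mapping key
        elif stripped.startswith("- provider_id") and category_indent is not None:
            if indent <= category_indent:
                problems.append(
                    f"line {lineno}: list item not nested under its category key"
                )
    return problems

bad = """providers:
  inference:
  - provider_id: openai
"""
good = """providers:
  inference:
    - provider_id: openai
"""
print(check_provider_nesting(bad))   # → ['line 3: list item not nested under its category key']
print(check_provider_nesting(good))  # → []
```

This only checks relative indentation, which is the specific defect called out in the review; it does not validate the YAML as a whole.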

Comment on lines +289 to +329
```
providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  vector_io:
  - provider_id: company-docs
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: /home/user/vector_dbs/company_docs/faiss_store.db
        namespace: null

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
```
⚠️ Potential issue

YAML indentation bug in Example 1 providers block

Same nesting issue as above; indent lists under each provider category.

 providers:
-  inference:
-  - provider_id: sentence-transformers
-    provider_type: inline::sentence-transformers
-    config: {}
-  - provider_id: openai
-    provider_type: remote::openai
-    config:
-      api_key: ${env.OPENAI_API_KEY}
-
-  agents:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      persistence_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/agents_store.db
-      responses_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/responses_store.db
-
-  safety:
-  - provider_id: llama-guard
-    provider_type: inline::llama-guard
-    config:
-      excluded_categories: []
-
-  vector_io:
-  - provider_id: company-docs
-    provider_type: inline::faiss
-    config:
-      kvstore:
-        type: sqlite
-        db_path: /home/user/vector_dbs/company_docs/faiss_store.db
-        namespace: null
-
-  tool_runtime:
-  - provider_id: rag-runtime
-    provider_type: inline::rag-runtime
-    config: {}
+  inference:
+    - provider_id: sentence-transformers
+      provider_type: inline::sentence-transformers
+      config: {}
+    - provider_id: openai
+      provider_type: remote::openai
+      config:
+        api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+    - provider_id: meta-reference
+      provider_type: inline::meta-reference
+      config:
+        persistence_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/agents_store.db
+        responses_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+    - provider_id: llama-guard
+      provider_type: inline::llama-guard
+      config:
+        excluded_categories: []
+
+  vector_io:
+    - provider_id: company-docs
+      provider_type: inline::faiss
+      config:
+        kvstore:
+          type: sqlite
+          db_path: /home/user/vector_dbs/company_docs/faiss_store.db
+          namespace: null
+
+  tool_runtime:
+    - provider_id: rag-runtime
+      provider_type: inline::rag-runtime
+      config: {}
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 289 to 329, the providers block has incorrect
YAML nesting: the provider category keys (inference, agents, safety, vector_io,
tool_runtime) need their list items indented under them rather than at the same
level; fix by moving each provider list (the lines starting with "-
provider_id:") to be nested under the respective category key with consistent
indentation (e.g., two spaces per level) so the lists are children of
inference/agents/etc., and optionally run a YAML linter to verify validity.

Comment on lines +368 to +413
```
providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1/
      api_token: your-token-here

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  vector_io:
  - provider_id: enterprise-knowledge
    provider_type: remote::pgvector
    config:
      host: postgres.company.com
      port: 5432
      db: enterprise_kb
      user: rag_user
      password: ${env.POSTGRES_PASSWORD}
      kvstore:
        type: sqlite
        db_path: .llama/distributions/pgvector/registry.db

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
```
⚠️ Potential issue

YAML indentation bug in Example 2 providers block

Indent provider lists under their category keys.

 providers:
-  inference:
-  - provider_id: sentence-transformers
-    provider_type: inline::sentence-transformers
-    config: {}
-  - provider_id: vllm
-    provider_type: remote::vllm
-    config:
-      url: http://localhost:8000/v1/
-      api_token: your-token-here
-
-  agents:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      persistence_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/agents_store.db
-      responses_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/responses_store.db
-
-  safety:
-  - provider_id: llama-guard
-    provider_type: inline::llama-guard
-    config:
-      excluded_categories: []
-
-  vector_io:
-  - provider_id: enterprise-knowledge
-    provider_type: remote::pgvector
-    config:
-      host: postgres.company.com
-      port: 5432
-      db: enterprise_kb
-      user: rag_user
-      password: ${env.POSTGRES_PASSWORD}
-      kvstore:
-        type: sqlite
-        db_path: .llama/distributions/pgvector/registry.db
-
-  tool_runtime:
-  - provider_id: rag-runtime
-    provider_type: inline::rag-runtime
-    config: {}
+  inference:
+    - provider_id: sentence-transformers
+      provider_type: inline::sentence-transformers
+      config: {}
+    - provider_id: vllm
+      provider_type: remote::vllm
+      config:
+        url: http://localhost:8000/v1/
+        api_token: your-token-here
+
+  agents:
+    - provider_id: meta-reference
+      provider_type: inline::meta-reference
+      config:
+        persistence_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/agents_store.db
+        responses_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+    - provider_id: llama-guard
+      provider_type: inline::llama-guard
+      config:
+        excluded_categories: []
+
+  vector_io:
+    - provider_id: enterprise-knowledge
+      provider_type: remote::pgvector
+      config:
+        host: postgres.company.com
+        port: 5432
+        db: enterprise_kb
+        user: rag_user
+        password: ${env.POSTGRES_PASSWORD}
+        kvstore:
+          type: sqlite
+          db_path: .llama/distributions/pgvector/registry.db
+
+  tool_runtime:
+    - provider_id: rag-runtime
+      provider_type: inline::rag-runtime
+      config: {}
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 368 to 413 the provider list items are not
correctly indented under their category keys (inference, agents, safety,
vector_io, tool_runtime); fix by indenting each provider list (the lines
starting with "- provider_id: ...") two additional spaces so they are nested
under the corresponding category key, and ensure all nested config mappings
(provider_type, config, and nested config fields like persistence_store/kvstore)
are consistently indented to maintain valid YAML structure.

@bsatapat-jpg bsatapat-jpg deleted the dev branch September 25, 2025 09:01