Added RAG chunks in streaming_query response #558
Conversation
Walkthrough

The streaming endpoint now builds a TurnSummary while generating responses, aggregates tool calls and RAG context, and emits an enhanced end-of-stream event containing rag_chunks and synthesized referenced_documents; transcripts store serialized rag_chunks when enabled, and retrieve_response robustly detects and composes RAG toolgroups, optionally forcing RAG usage in prompts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant S as response_generator
    participant A as Agent/LLM
    participant T as Tools (knowledge_search)
    participant X as Transcript Store
    Note over S: Init TurnSummary & complete_response
    C->>S: Start streaming request
    S->>A: Build prompt & toolgroups (detect RAG)
    alt RAG available
        A->>T: Invoke knowledge_search
        T-->>A: Return rag_chunks & docs
        S->>S: summary.append_tool_calls_from_llama(...)
    else No RAG
        Note over S: Proceed without RAG
    end
    A-->>S: Stream chunks (content / tool_execution / step_complete)
    S->>S: Accumulate content & tool calls into summary
    S-->>C: Streamed chunks
    S->>S: Prepare end-of-stream payload (rag_chunks, referenced_documents)
    opt Transcripts enabled
        S->>X: store_transcript(..., rag_chunks_for_transcript)
    end
    S-->>C: End event with rag_chunks & referenced_documents
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Need the below mentioned PR to merge first:
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/app/endpoints/streaming_query.py (3)
824-856: Harden toolgroup/vector-DB detection; add fallbacks and avoid contradictory logging.

- `vector_db.identifier` may not always exist; add fallbacks to `.id`/`.name`.
- The log says "skipping RAG" but then still builds RAG toolgroups; either actually skip, or downgrade the message to debug and keep the attempt.
```diff
-    vector_dbs = await client.vector_dbs.list()
-    vector_db_ids = [vector_db.identifier for vector_db in vector_dbs]
+    vector_dbs = await client.vector_dbs.list()
+    vector_db_ids: list[str] = []
+    for vdb in vector_dbs:
+        if hasattr(vdb, "identifier"):
+            vector_db_ids.append(getattr(vdb, "identifier"))
+        elif hasattr(vdb, "id"):
+            vector_db_ids.append(getattr(vdb, "id"))
+        elif hasattr(vdb, "name"):
+            vector_db_ids.append(getattr(vdb, "name"))
@@
-    if "builtin::rag" in available_toolgroup_ids:
-        rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
-    else:
-        logger.warning("builtin::rag tool group not available, skipping RAG functionality")
-        # Still try to create RAG toolgroups as they might work anyway
-        rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+    if "builtin::rag" in available_toolgroup_ids:
+        rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+    else:
+        logger.debug("builtin::rag tool group not listed; attempting best-effort RAG toolgroups")
+        rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
@@
-    except Exception as e:
-        logger.warning("Failed to check available toolgroups, skipping RAG: %s", e)
-        # Still try to create RAG toolgroups as they might work anyway
-        rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+    except Exception as e:
+        logger.debug("Failed to list toolgroups, proceeding best-effort with RAG: %s", e)
+        rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
```
865-868: Make knowledge_search detection robust (don't rely on `str(tg)`).

The current check may miss true positives or yield false positives. Inspect the toolgroup's typical attributes instead.
```diff
-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
+    def _has_knowledge_search(tg: object) -> bool:
+        name = getattr(tg, "name", "")
+        ident = getattr(tg, "identifier", "")
+        tools = getattr(tg, "tools", None)
+        if "builtin::rag" in ident or "knowledge_search" in name:
+            return True
+        if tools and any(getattr(t, "tool_name", getattr(t, "name", "")) == "knowledge_search" for t in tools):
+            return True
+        return "knowledge_search" in str(tg)
+
+    if toolgroups and any(_has_knowledge_search(tg) for tg in toolgroups):
         system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
         logger.info("Enhanced system prompt to encourage RAG tool usage")
```
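The attribute-based check suggested here can be exercised without a live Llama Stack client; a minimal sketch, using `SimpleNamespace` objects as stand-ins for real toolgroup objects (the attribute names `identifier`, `name`, and `tools` mirror the review's assumptions):

```python
from types import SimpleNamespace


def has_knowledge_search(tg: object) -> bool:
    """Best-effort check for a knowledge_search-capable toolgroup."""
    name = getattr(tg, "name", "") or ""
    ident = getattr(tg, "identifier", "") or ""
    tools = getattr(tg, "tools", None)
    if "builtin::rag" in ident or "knowledge_search" in name:
        return True
    if tools and any(
        getattr(t, "tool_name", getattr(t, "name", "")) == "knowledge_search"
        for t in tools
    ):
        return True
    # Last resort: fall back to the original string scan
    return "knowledge_search" in str(tg)


rag_tg = SimpleNamespace(identifier="builtin::rag", name="", tools=None)
web_tg = SimpleNamespace(identifier="builtin::websearch", name="", tools=None)
print(has_knowledge_search(rag_tg))  # True
print(has_knowledge_search(web_tg))  # False
```

The helper degrades gracefully: structured attributes are preferred, and the original `str(tg)` scan only runs when nothing else matches.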
631-631: Dead fallback variable (`complete_response`): either populate or drop.

You never build `complete_response`, so the fallback won't help if `turn.output_message` is missing. Either collect tokens as you stream, or remove the fallback.
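If the fallback is kept, the accumulator has to be fed inside the streaming loop; a rough sketch (the dict-based event shape here is illustrative, not the actual Llama Stack event types):

```python
def drain_stream(chunks):
    """Accumulate streamed text deltas so a fallback response exists
    even when turn.output_message is missing at end of stream."""
    complete_response = []
    for chunk in chunks:
        delta = chunk.get("delta")  # illustrative: text delta events only
        if delta:
            complete_response.append(delta)
    return "".join(complete_response)


# Non-text events (e.g. step_complete) are simply skipped.
events = [{"delta": "Hel"}, {"delta": "lo"}, {"event": "step_complete"}, {"delta": "!"}]
print(drain_stream(events))  # Hello!
```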
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/app/endpoints/streaming_query.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (6)
- src/models/requests.py (2): QueryRequest (72-222), get_documents (197-204)
- src/utils/endpoints.py (5): check_configuration_loaded (62-68), get_agent (138-187), get_system_prompt (71-111), validate_conversation_ownership (39-59), validate_model_provider_override (114-134)
- src/utils/types.py (2): TurnSummary (59-78), append_tool_calls_from_llama (65-78)
- src/metrics/utils.py (1): update_llm_token_count_from_turn (60-77)
- src/app/endpoints/query.py (2): evaluate_model_hints (110-149), get_rag_toolgroups (573-600)
- src/utils/transcripts.py (1): store_transcript (33-86)
🔇 Additional comments (1)
src/app/endpoints/streaming_query.py (1)
146-147: API contract change: the end event now includes rag_chunks and referenced_documents.

Confirm downstream consumers (web/client) tolerate the new fields and the `referenced_documents` shape (possibly `doc_url=None`). If strict schemas exist, bump the version or put the new fields behind a flag.
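On the consumer side, tolerating the new fields is straightforward if the end event is read defensively; a minimal sketch (the field names come from this PR, but the plain-dict event shape is an assumption):

```python
def parse_end_event(event: dict) -> tuple:
    """Read the enhanced end-of-stream payload without assuming the
    new keys exist (older servers omit them)."""
    rag_chunks = event.get("rag_chunks", [])
    referenced = [
        d
        for d in event.get("referenced_documents", [])
        if d.get("doc_url") is not None  # drop entries a strict UI cannot render
    ]
    return rag_chunks, referenced


old_event = {"finish_reason": "stop"}
new_event = {
    "finish_reason": "stop",
    "rag_chunks": [{"content": "...", "source": "docs/a.md", "score": 0.9}],
    "referenced_documents": [{"doc_url": None, "doc_title": "a.md"}],
}
print(parse_end_event(old_event))  # ([], [])
```

Whether to drop or keep `doc_url=None` entries is a product decision; the filter above is just one option.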
```python
# Process RAG chunks
rag_chunks = [
    {
        "content": chunk.content,
        "source": chunk.source,
        "score": chunk.score
    }
    for chunk in summary.rag_chunks
]
```
🧩 Analysis chain
Fix: Avoid AttributeError on TurnSummary.rag_chunks and make logging safe.
Code assumes summary.rag_chunks exists, but TurnSummary (utils/types.py) in provided context doesn’t define it. This will crash during streaming end and transcript storage. Also, log lines reference “RAG chunks” counters and will fail if the attribute is missing.
Immediate defensive fix (keeps behavior if/when rag chunks are added):
@@
- rag_chunks = [
- {
- "content": chunk.content,
- "source": chunk.source,
- "score": chunk.score
- }
- for chunk in summary.rag_chunks
- ]
+ rag_chunks_src = getattr(summary, "rag_chunks", [])
+ rag_chunks = [
+ {
+ "content": chunk.content,
+ "source": chunk.source,
+ "score": chunk.score,
+ }
+ for chunk in rag_chunks_src
+ ]
@@
- referenced_docs = []
- doc_sources = set()
- for chunk in summary.rag_chunks:
+ referenced_docs = []
+ doc_sources = set()
+ for chunk in rag_chunks_src:
if chunk.source and chunk.source not in doc_sources:
doc_sources.add(chunk.source)
referenced_docs.append({
"doc_url": chunk.source if chunk.source.startswith("http") else None,
"doc_title": chunk.source.split("/")[-1] if chunk.source else None,
})
@@
- logger.info("RAG chunks before: %d", len(summary.rag_chunks))
+ logger.debug("RAG chunks before: %d", len(getattr(summary, "rag_chunks", [])))
summary.append_tool_calls_from_llama(step)
- logger.info("RAG chunks after: %d", len(summary.rag_chunks))
+ logger.debug("RAG chunks after: %d", len(getattr(summary, "rag_chunks", [])))
@@
- logger.info("RAG chunks before: %d", len(summary.rag_chunks))
+ logger.debug("RAG chunks before: %d", len(getattr(summary, "rag_chunks", [])))
summary.append_tool_calls_from_llama(p.step_details)
- logger.info("RAG chunks after: %d", len(summary.rag_chunks))
+ logger.debug("RAG chunks after: %d", len(getattr(summary, "rag_chunks", [])))
@@
- rag_chunks_for_transcript = [
+ rag_chunks_src = getattr(summary, "rag_chunks", [])
+ rag_chunks_for_transcript = [
{
"content": chunk.content,
"source": chunk.source,
"score": chunk.score
}
- for chunk in summary.rag_chunks
+ for chunk in rag_chunks_src
     ]

Follow-up (preferred): add `rag_chunks: list[RAGChunk] = []` to TurnSummary and populate it where knowledge_search responses are parsed, then revert the getattr guards.
Quick check script:
Also, confirm client-side consumers can handle referenced_documents entries with doc_url=None. If not, gate emission or normalize to metadata docs only.
Also applies to: 121-141, 146-147, 632-635, 657-665, 666-672, 682-691
🏁 Script executed:
```shell
#!/bin/bash
# Verify whether TurnSummary defines rag_chunks
rg -n -C2 -t python $'class\\s+TurnSummary\\b' src/utils/types.py
rg -n -C2 -t python '\\brag_chunks\\b' src/utils/types.py
```

Length of output: 61
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "SEARCH: class TurnSummary"
rg -n --hidden -S 'class\s+TurnSummary\b' src || true
echo
echo "SEARCH: rag_chunks occurrences"
rg -n --hidden -S '\brag_chunks\b' src || true
echo
# If TurnSummary found, print its file head
ts_file=$(rg -l --hidden -S 'class\s+TurnSummary\b' src || true)
if [ -n "$ts_file" ]; then
  echo "Found TurnSummary in: $ts_file"
  echo "---- $ts_file (lines 1-400) ----"
  nl -ba "$ts_file" | sed -n '1,400p'
fi
echo
# Print relevant parts of streaming_query.py
sq_path="src/app/endpoints/streaming_query.py"
if [ -f "$sq_path" ]; then
  echo "---- $sq_path (lines 1-240) ----"
  nl -ba "$sq_path" | sed -n '1,240p'
  echo "---- $sq_path (lines 600-760) ----"
  nl -ba "$sq_path" | sed -n '600,760p' || true
else
  echo "$sq_path not found; searching for streaming_query.py"
  rg -n --hidden -S 'streaming_query.py' src || true
fi
```

Length of output: 1546
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "---- src/utils/types.py (1-240) ----"
sed -n '1,240p' src/utils/types.py || true
echo
echo "---- src/utils/transcripts.py (1-240) ----"
sed -n '1,240p' src/utils/transcripts.py || true
echo
echo "---- src/app/endpoints/streaming_query.py (1-260) ----"
sed -n '1,260p' src/app/endpoints/streaming_query.py || true
echo
echo "---- src/app/endpoints/streaming_query.py (640-720) ----"
sed -n '640,720p' src/app/endpoints/streaming_query.py || true
```

Length of output: 18546
Fix: Guard summary.rag_chunks access (TurnSummary missing rag_chunks): critical runtime bug.

TurnSummary (src/utils/types.py) does not define rag_chunks; streaming_query.py reads summary.rag_chunks in multiple places and will raise AttributeError at runtime. Apply the defensive guards from the analysis chain above (or add rag_chunks to TurnSummary and populate it), and ensure referenced_documents entries with doc_url=None are acceptable to clients.

Follow-up (preferred): add `rag_chunks: list[RAGChunk] = []` to TurnSummary (src/utils/types.py) and populate it at knowledge_search parsing, then remove the getattr guards.

Affected: src/app/endpoints/streaming_query.py (multiple sites), src/utils/types.py (class TurnSummary), src/utils/transcripts.py (store_transcript signature), src/app/endpoints/query.py (passes rag_chunks=[]).
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 110 to 118, the code
assumes summary.rag_chunks exists which causes AttributeError when TurnSummary
lacks that attribute; update the code to read rag_chunks via a safe getter
(e.g., rag_chunks = getattr(summary, "rag_chunks", [])) and ensure any entries
with referenced_documents that have doc_url=None are handled/filtered or
formatted so clients accept them; as a follow-up (preferred) add rag_chunks:
list[RAGChunk] = [] to TurnSummary in src/utils/types.py, populate it in the
knowledge_search parsing and update signatures in src/utils/transcripts.py and
any call sites (e.g., src/app/endpoints/query.py) so you can remove the getattr
guards.
```python
if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
    system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
    logger.info("Enhanced system prompt to encourage RAG tool usage")
```
System prompt enhancement is currently ignored; compute toolgroups first, then call get_agent with the enhanced prompt.
You build toolgroups and then mutate system_prompt after the agent has already been created, so the enhanced prompt never reaches the session.
Minimal refactor: move the get_agent(...) call to after toolgroup detection and prompt augmentation, right before create_turn(...).
```diff
@@
-    agent, conversation_id, session_id = await get_agent(
-        client,
-        model_id,
-        system_prompt,
-        available_input_shields,
-        available_output_shields,
-        query_request.conversation_id,
-        query_request.no_tools or False,
-    )
@@
-    # Enhance system prompt to encourage tool usage when RAG is available
-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
+    # Enhance system prompt to encourage tool usage when RAG is available
+    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
         system_prompt += "\n\nIMPORTANT: When answering questions, you MUST use the knowledge_search tool to find the most accurate and up-to-date information from the knowledge base. Always search for relevant information before providing your answer."
         logger.info("Enhanced system prompt to encourage RAG tool usage")
@@
-    response = await agent.create_turn(
+    # Create the agent after finalizing system_prompt/toolgroups
+    agent, conversation_id, session_id = await get_agent(
+        client,
+        model_id,
+        system_prompt,
+        available_input_shields,
+        available_output_shields,
+        query_request.conversation_id,
+        query_request.no_tools or False,
+    )
+    response = await agent.create_turn(
         messages=[UserMessage(role="user", content=user_query)],
         session_id=session_id,
         documents=query_request.get_documents(),
         stream=True,
         toolgroups=toolgroups,
     )
```

If the API supports overriding the system prompt per turn, that would be even cleaner; otherwise, reordering is required.
Also applies to: 787-795, 875-881
```python
user_query = query_request.query
if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
    user_query = f"Please use the knowledge_search tool to find relevant information about: {query_request.query}"
    logger.info("Modified query to force RAG usage: %s", user_query)
```
Don’t rewrite the user’s query to “force” RAG; rely on system prompt or tool routing.
Mutating the user’s text can degrade UX, leak implementation details into transcripts, and alter semantics. The system prompt (or tool routing) is the right lever.
```diff
-    # Force RAG usage by modifying the query when toolgroups are available
-    user_query = query_request.query
-    if toolgroups and any("knowledge_search" in str(tg) for tg in toolgroups):
-        user_query = f"Please use the knowledge_search tool to find relevant information about: {query_request.query}"
-        logger.info("Modified query to force RAG usage: %s", user_query)
+    # Keep the original user query; rely on the enhanced system prompt/tool routing
+    user_query = query_request.query
```
Actionable comments posted: 3
🧹 Nitpick comments (3)
docs/byok_guide.md (3)
66-67: Fix markdownlint MD034: convert bare URL to a markdown link.

Replace the bare URL with a proper link.
```diff
-  - Repository: https://github.com/lightspeed-core/rag-content
+  - Repository: [rag-content repository](https://github.com/lightspeed-core/rag-content)
```
91-93: Fix markdownlint MD034 and phrasing.

Avoid the bare URL and improve the grammar.
```diff
-Use the `rag-content` tool to create a compatible vector database:
-Please refer https://github.com/lightspeed-core/rag-content to create your vector database
+Use the `rag-content` tool to create a compatible vector database.
+Refer to the [rag-content repository](https://github.com/lightspeed-core/rag-content) to create your vector database.
```
98-102: Terminology consistency: "Llama Stack" naming.

Minor phrasing inconsistency ("Llama-Stack Faiss Vector-IO", "Llama Stack"). Standardize on "Llama Stack" for clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/byok_guide.md (1 hunk)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/byok_guide.md
66-66: Bare URL used
(MD034, no-bare-urls)
92-92: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: e2e_tests
🔇 Additional comments (2)
docs/byok_guide.md (2)
1-23: Well-scoped, practical BYOK guide.

Good structure, clear ToC, and actionable examples. Nice addition.
169-188: Path naming consistency (ollama directories).

Examples store agent/response SQLite DBs under ".../ollama/..." even when using OpenAI/vLLM. If this is a generic default path, ignore; otherwise, consider neutral paths to avoid confusion.
```yaml
providers:
  inference:
  # Embedding model provider
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}

  # LLM provider (example: OpenAI)
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  # Vector database configuration
  vector_io:
  - provider_id: your-knowledge-base
    provider_type: inline::faiss  # or remote::pgvector
    config:
      kvstore:
        type: sqlite
        db_path: /path/to/vector_db/faiss_store.db
        namespace: null

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}

# Enable RAG tools
tool_groups:
- provider_id: rag-runtime
  toolgroup_id: builtin::rag

# Vector database configuration
vector_dbs:
- embedding_dimension: 768
  embedding_model: sentence-transformers/all-mpnet-base-v2
  provider_id: your-knowledge-base
  vector_db_id: your-index-id  # ID used during index generation
```
YAML indentation bug: provider lists not nested under category keys
The lists under providers (inference, agents, safety, vector_io, tool_runtime) are mis-indented, making the YAML invalid. Indent list items under their category keys.
providers:
- inference:
- # Embedding model provider
- - provider_id: sentence-transformers
- provider_type: inline::sentence-transformers
- config: {}
-
- # LLM provider (example: OpenAI)
- - provider_id: openai
- provider_type: remote::openai
- config:
- api_key: ${env.OPENAI_API_KEY}
-
- agents:
- - provider_id: meta-reference
- provider_type: inline::meta-reference
- config:
- persistence_store:
- type: sqlite
- db_path: .llama/distributions/ollama/agents_store.db
- responses_store:
- type: sqlite
- db_path: .llama/distributions/ollama/responses_store.db
-
- safety:
- - provider_id: llama-guard
- provider_type: inline::llama-guard
- config:
- excluded_categories: []
-
- # Vector database configuration
- vector_io:
- - provider_id: your-knowledge-base
- provider_type: inline::faiss # or remote::pgvector
- config:
- kvstore:
- type: sqlite
- db_path: /path/to/vector_db/faiss_store.db
- namespace: null
-
- tool_runtime:
- - provider_id: rag-runtime
- provider_type: inline::rag-runtime
- config: {}
+ inference:
+ # Embedding model provider
+ - provider_id: sentence-transformers
+ provider_type: inline::sentence-transformers
+ config: {}
+
+ # LLM provider (example: OpenAI)
+ - provider_id: openai
+ provider_type: remote::openai
+ config:
+ api_key: ${env.OPENAI_API_KEY}
+
+ agents:
+ - provider_id: meta-reference
+ provider_type: inline::meta-reference
+ config:
+ persistence_store:
+ type: sqlite
+ db_path: .llama/distributions/ollama/agents_store.db
+ responses_store:
+ type: sqlite
+ db_path: .llama/distributions/ollama/responses_store.db
+
+ safety:
+ - provider_id: llama-guard
+ provider_type: inline::llama-guard
+ config:
+ excluded_categories: []
+
+ # Vector database configuration
+ vector_io:
+ - provider_id: your-knowledge-base
+ provider_type: inline::faiss # or remote::pgvector
+ config:
+ kvstore:
+ type: sqlite
+ db_path: /path/to/vector_db/faiss_store.db
+ namespace: null
+
+ tool_runtime:
+ - provider_id: rag-runtime
+ provider_type: inline::rag-runtime
+  config: {}
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 145 to 201 the provider lists are
mis-indented and not nested under their category keys (inference, agents,
safety, vector_io, tool_runtime), which makes the YAML invalid; fix by indenting
each list item (the lines starting with "- provider_id: ...") two spaces further
so they are children of their respective category keys, ensure the comment lines
remain aligned, and validate the file with a YAML linter to confirm proper
nesting.
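The nesting rule this comment describes can also be checked mechanically; a rough stdlib-only sketch (real projects would simply run a YAML parser or linter, and the `list_items_nested` helper is purely illustrative):

```python
def list_items_nested(yaml_text: str, category: str) -> bool:
    """Return True if the first '- ' list item following the given
    category key is indented deeper than the key itself."""
    lines = yaml_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == f"{category}:":
            key_indent = len(line) - len(line.lstrip())
            for nxt in lines[i + 1:]:
                # Skip blanks and comments between the key and its list
                if not nxt.strip() or nxt.lstrip().startswith("#"):
                    continue
                if nxt.lstrip().startswith("- "):
                    return (len(nxt) - len(nxt.lstrip())) > key_indent
                break
    return True  # category not found or no list follows: nothing to flag


bad = "providers:\n  inference:\n  - provider_id: openai\n"
good = "providers:\n  inference:\n    - provider_id: openai\n"
print(list_items_nested(bad, "inference"))   # False
print(list_items_nested(good, "inference"))  # True
```

This catches exactly the pattern flagged in the review: list items sitting at the same indentation as their category key.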
```yaml
providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}
  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db
  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []
  vector_io:
  - provider_id: company-docs
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: /home/user/vector_dbs/company_docs/faiss_store.db
        namespace: null
  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
```
YAML indentation bug in Example 1 providers block
Same nesting issue as above; indent lists under each provider category.
```diff
 providers:
-  inference:
-  - provider_id: sentence-transformers
-    provider_type: inline::sentence-transformers
-    config: {}
-  - provider_id: openai
-    provider_type: remote::openai
-    config:
-      api_key: ${env.OPENAI_API_KEY}
-
-  agents:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      persistence_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/agents_store.db
-      responses_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/responses_store.db
-
-  safety:
-  - provider_id: llama-guard
-    provider_type: inline::llama-guard
-    config:
-      excluded_categories: []
-
-  vector_io:
-  - provider_id: company-docs
-    provider_type: inline::faiss
-    config:
-      kvstore:
-        type: sqlite
-        db_path: /home/user/vector_dbs/company_docs/faiss_store.db
-        namespace: null
-
-  tool_runtime:
-  - provider_id: rag-runtime
-    provider_type: inline::rag-runtime
-    config: {}
+  inference:
+    - provider_id: sentence-transformers
+      provider_type: inline::sentence-transformers
+      config: {}
+    - provider_id: openai
+      provider_type: remote::openai
+      config:
+        api_key: ${env.OPENAI_API_KEY}
+
+  agents:
+    - provider_id: meta-reference
+      provider_type: inline::meta-reference
+      config:
+        persistence_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/agents_store.db
+        responses_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+    - provider_id: llama-guard
+      provider_type: inline::llama-guard
+      config:
+        excluded_categories: []
+
+  vector_io:
+    - provider_id: company-docs
+      provider_type: inline::faiss
+      config:
+        kvstore:
+          type: sqlite
+          db_path: /home/user/vector_dbs/company_docs/faiss_store.db
+          namespace: null
+
+  tool_runtime:
+    - provider_id: rag-runtime
+      provider_type: inline::rag-runtime
+      config: {}
```
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 289 to 329, the providers block has incorrect
YAML nesting: the provider category keys (inference, agents, safety, vector_io,
tool_runtime) need their list items indented under them rather than at the same
level; fix by moving each provider list (the lines starting with "-
provider_id:") to be nested under the respective category key with consistent
indentation (e.g., two spaces per level) so the lists are children of
inference/agents/etc., and optionally run a YAML linter to verify validity.
```yaml
providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1/
      api_token: your-token-here
  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db
  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []
  vector_io:
  - provider_id: enterprise-knowledge
    provider_type: remote::pgvector
    config:
      host: postgres.company.com
      port: 5432
      db: enterprise_kb
      user: rag_user
      password: ${env.POSTGRES_PASSWORD}
      kvstore:
        type: sqlite
        db_path: .llama/distributions/pgvector/registry.db
  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
```
YAML indentation bug in Example 2 providers block
Indent provider lists under their category keys.
```diff
 providers:
-  inference:
-  - provider_id: sentence-transformers
-    provider_type: inline::sentence-transformers
-    config: {}
-  - provider_id: vllm
-    provider_type: remote::vllm
-    config:
-      url: http://localhost:8000/v1/
-      api_token: your-token-here
-
-  agents:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      persistence_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/agents_store.db
-      responses_store:
-        type: sqlite
-        db_path: .llama/distributions/ollama/responses_store.db
-
-  safety:
-  - provider_id: llama-guard
-    provider_type: inline::llama-guard
-    config:
-      excluded_categories: []
-
-  vector_io:
-  - provider_id: enterprise-knowledge
-    provider_type: remote::pgvector
-    config:
-      host: postgres.company.com
-      port: 5432
-      db: enterprise_kb
-      user: rag_user
-      password: ${env.POSTGRES_PASSWORD}
-      kvstore:
-        type: sqlite
-        db_path: .llama/distributions/pgvector/registry.db
-
-  tool_runtime:
-  - provider_id: rag-runtime
-    provider_type: inline::rag-runtime
-    config: {}
+  inference:
+    - provider_id: sentence-transformers
+      provider_type: inline::sentence-transformers
+      config: {}
+    - provider_id: vllm
+      provider_type: remote::vllm
+      config:
+        url: http://localhost:8000/v1/
+        api_token: your-token-here
+
+  agents:
+    - provider_id: meta-reference
+      provider_type: inline::meta-reference
+      config:
+        persistence_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/agents_store.db
+        responses_store:
+          type: sqlite
+          db_path: .llama/distributions/ollama/responses_store.db
+
+  safety:
+    - provider_id: llama-guard
+      provider_type: inline::llama-guard
+      config:
+        excluded_categories: []
+
+  vector_io:
+    - provider_id: enterprise-knowledge
+      provider_type: remote::pgvector
+      config:
+        host: postgres.company.com
+        port: 5432
+        db: enterprise_kb
+        user: rag_user
+        password: ${env.POSTGRES_PASSWORD}
+        kvstore:
+          type: sqlite
+          db_path: .llama/distributions/pgvector/registry.db
+
+  tool_runtime:
+    - provider_id: rag-runtime
+      provider_type: inline::rag-runtime
+      config: {}
```
🤖 Prompt for AI Agents
In docs/byok_guide.md around lines 368 to 413 the provider list items are not
correctly indented under their category keys (inference, agents, safety,
vector_io, tool_runtime); fix by indenting each provider list (the lines
starting with "- provider_id: ...") two additional spaces so they are nested
under the corresponding category key, and ensure all nested config mappings
(provider_type, config, and nested config fields like persistence_store/kvstore)
are consistently indented to maintain valid YAML structure.
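Beyond fixing the indentation by hand, the structure can be checked mechanically. The sketch below is a hypothetical helper (not part of this repository) that loads a config with PyYAML and verifies each provider category used in the guide's examples maps to a list of entries:

```python
import yaml  # PyYAML, assumed installed: pip install pyyaml

# Category keys taken from the examples in the guide; adjust as needed.
REQUIRED_CATEGORIES = ("inference", "agents", "safety", "vector_io", "tool_runtime")

def lint_providers(yaml_text: str) -> list[str]:
    """Return a list of problems with the providers block; empty means it looks sane."""
    cfg = yaml.safe_load(yaml_text) or {}
    providers = cfg.get("providers")
    if not isinstance(providers, dict):
        return ["'providers' is missing or not a mapping (check indentation)"]
    problems = []
    for category in REQUIRED_CATEGORIES:
        entries = providers.get(category)
        if not isinstance(entries, list):
            problems.append(f"providers.{category}: expected a list of provider entries")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(f"providers.{category}[{i}]: expected a mapping")
                continue
            for key in ("provider_id", "provider_type"):
                if key not in entry:
                    problems.append(f"providers.{category}[{i}]: missing '{key}'")
    return problems

# A flattened block is caught immediately:
print(lint_providers("providers:\ninference:\n- provider_id: x\n"))
# → ["'providers' is missing or not a mapping (check indentation)"]
```

Wiring a check like this into CI would catch the nesting regression flagged in both review comments before it lands.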