LCORE-1037: Update the BYOK guide to use Lightspeed stack config instead of run.yaml by are-ces · Pull Request #1838 · lightspeed-core/lightspeed-stack

are-ces · 2026-06-02T11:03:52Z

Description

Update the BYOK and RAG guides so users configure BYOK knowledge sources via the byok_rag section in lightspeed-stack.yaml instead of editing run.yaml directly. The required configuration is now auto-generated at startup by make run, make run-stack, docker-compose, and library mode.

Key changes:

- Replace all run.yaml BYOK/RAG snippets with byok_rag entries in lightspeed-stack.yaml
- Add field reference table for all byok_rag options
- Clarify that pgvector is not yet supported via byok_rag (see LCORE-2437) and must be configured directly in the Llama Stack config
- Clarify that models and inference providers still go in run.yaml
- Remove sqlite-vec references (not supported)
- Update example config with embedding_model field

Type of change

Tools used to create PR

Assisted-by: Claude Opus 4.6
Generated by: N/A

Related Tickets & Documents

Closes https://redhat.atlassian.net/browse/LCORE-1037
Related https://redhat.atlassian.net/browse/LCORE-836

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

- Review the rendered markdown in docs/byok_guide.md and docs/rag_guide.md
- Verify all byok_rag field names match src/models/config.py:ByokRag
- Confirm no remaining incorrect run.yaml references for BYOK configuration

Summary by CodeRabbit

Documentation
- Enhanced BYOK configuration documentation with improved embedding model per-source setup and vector database guidance
- Updated RAG configuration guide with clearer examples, provider-specific setup instructions, and annotation behavior details
- Refined configuration examples for knowledge source integration with FAISS and pgvector support

Replace all run.yaml references with lightspeed-stack.yaml byok_rag configuration. Users should no longer edit run.yaml directly: the Lightspeed Stack service auto-generates Llama Stack config at startup from the byok_rag and rag sections in lightspeed-stack.yaml.

- Rewrite BYOK guide Step 4 to use byok_rag as primary config path
- Add field reference table for all byok_rag options
- Rewrite RAG guide vector store sections with byok_rag examples
- Replace full run.yaml config examples with lightspeed-stack.yaml format
- Add embedding_model field to BYOK example config
- Update Step 3 to reference byok_rag embedding_model field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-06-02T11:04:05Z

Walkthrough

This PR updates BYOK and RAG configuration documentation to align with the new byok_rag configuration approach in lightspeed-stack.yaml. Two guides are restructured to document embedding model settings per knowledge source, and an example configuration file is updated with explicit per-RAG embedding and vector store parameters.

Changes

BYOK and RAG Configuration Documentation Update

| Layer / File(s) | Summary |
|---|---|
| BYOK Knowledge Source Configuration Guide `docs/byok_guide.md` | Restructured to document `byok_rag` configuration in `lightspeed-stack.yaml`. TOC updated to reference "Configure BYOK Knowledge Sources". Step 3 and Step 4 sections rewritten with `byok_rag` YAML example, field-reference table, `score_multiplier` guidance, and `vector_db_id` correctness warning. "Supported Vector Database Types" FAISS section replaced with `byok_rag`-based snippet. pgvector section adds NOTE that it is not supported via `byok_rag`. Password environment variable updated from `DB_PASSWORD` to `POSTGRES_PASSWORD`. "Configuration Examples" section replaced with new FAISS and FAISS+pgvector examples. Conclusion references updated. |
| RAG Configuration Guide Alignment `docs/rag_guide.md` | Updated to align with newer Lightspeed/Llama Stack guidance. Embedding model download flow rewritten to use `byok_rag.embedding_model` with automatic startup download. pgvector examples updated with new `vector_stores` block structure. vLLM and OpenAI provider instructions clarified. Note added about unregistering old resources when experimenting. Ollama limitations text shortened. OKP enrichment note updated to reference startup configuration. "Complete Configuration Reference" replaced with minimal `lightspeed-stack.yaml` example. RAG annotations section updated with `vector_stores.annotation_prompt_params` behavior and annotation default disabling. |
| Example Configuration Update `examples/lightspeed-stack-byok-okp-rag.yaml` | `byok_rag` list entries updated with explicit per-store FAISS parameters including `embedding_model`, `embedding_dimension`, `vector_db_id`, and `db_path` settings for `ocp-docs` and `knowledge-base` RAG IDs. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title accurately describes the main change: updating BYOK documentation to use lightspeed-stack.yaml config instead of run.yaml, which aligns with the substantial revisions across docs/byok_guide.md, docs/rag_guide.md, and the example configuration file. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/byok_guide.md`:
- Around line 397-400: The blockquote note starting with the marker "[!NOTE]"
has an extra blank line inside the block which breaks markdownlint MD028; remove
the empty line so all lines in the note are contiguous '>'-prefixed lines (i.e.,
keep the "[!NOTE]" marker and the following lines like "For pgvector, ensure
your PostgreSQL credentials..." and "(e.g., `POSTGRES_PASSWORD`)." without any
blank line between them) to satisfy the linter.

In `@docs/rag_guide.md`:
- Line 227: Replace the phrase "OpenAI compatible" with the hyphenated compound
adjective "OpenAI-compatible" in the sentence that reads "While Ollama also
exposes an OpenAI compatible endpoint..." so the docs use "OpenAI-compatible"
for correct grammar and consistency; update that exact occurrence in
docs/rag_guide.md.
- Line 385: The heading "RAG annotations" lacks a preceding blank line (MD022);
update the markdown so there is one empty line immediately before the line
starting with "# RAG annotations" to satisfy the MD022 rule and ensure the
heading is properly separated from the prior content.
- Around line 91-94: The NOTE block beginning with "[!NOTE]" that mentions
pgvector and LCORE-2437 contains an extra blank quoted line; edit that block
(the "[!NOTE]" block containing the sentence "pgvector is not yet supported via
`byok_rag`..." and the following "It must be configured directly...") and remove
the empty/blank line between the quoted lines so the blockquote has no blank
quoted line (MD028).

In `@examples/lightspeed-stack-byok-okp-rag.yaml`:
- Around line 40-42: The embedding_dimension for entries using embedding_model
"sentence-transformers/all-mpnet-base-v2" in the byok_rag configuration is
incorrect (set to 1024 and 384); update both BYOK FAISS stores (ocp-docs and
knowledge-base) to use embedding_dimension 768 to match the model's hidden size
and avoid incompatible vector sizes at runtime—search for occurrences of
embedding_model: sentence-transformers/all-mpnet-base-v2 and set the
corresponding embedding_dimension fields to 768.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d90172e1-09b6-46bc-8307-b3a42a1da0b4

📥 Commits

Reviewing files that changed from the base of the PR and between e5baf4a and 105e002.

📒 Files selected for processing (3)

docs/byok_guide.md
docs/rag_guide.md
examples/lightspeed-stack-byok-okp-rag.yaml

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: unit_tests (3.12)
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: unit_tests (3.13)
GitHub Check: Pylinter
GitHub Check: build-pr

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2026-05-20T08:09:30.641Z

Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

examples/lightspeed-stack-byok-okp-rag.yaml

🪛 LanguageTool

docs/rag_guide.md

[grammar] ~227-~227: Use a hyphen to join words.
Context: ...G. While Ollama also exposes an OpenAI compatible endpoint that supports tool c...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.22.1)

docs/byok_guide.md

[warning] 400-400: Blank line inside blockquote

(MD028, no-blanks-blockquote)

docs/rag_guide.md

[warning] 94-94: Blank line inside blockquote

(MD028, no-blanks-blockquote)

[warning] 385-385: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)
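
The two MD028 warnings above describe the same pattern: a blank line sitting between two `>`-prefixed lines of one note. A minimal Python sketch of how that pattern could be spotted locally is shown below. It is a simplified approximation, not markdownlint's implementation: the function name and sample text are hypothetical, and it only catches a single blank line between two quoted lines.

```python
def find_blank_lines_in_blockquotes(text):
    """Return 1-based line numbers of blank lines between two '>' lines.

    Simplified approximation of the MD028 pattern; it does not handle
    runs of several blank lines or blockquotes nested inside lists.
    """
    lines = text.splitlines()
    hits = []
    for i in range(1, len(lines) - 1):
        is_blank = lines[i].strip() == ""
        quoted_before = lines[i - 1].lstrip().startswith(">")
        quoted_after = lines[i + 1].lstrip().startswith(">")
        if is_blank and quoted_before and quoted_after:
            hits.append(i + 1)
    return hits


# Hypothetical sample modelled on the NOTE block flagged in the review.
broken_note = "\n".join([
    "> [!NOTE]",
    "> pgvector is not yet supported via `byok_rag`.",
    "",
    "> It must be configured directly.",
])

# The same note with the blank line removed, as the review suggests.
fixed_note = "\n".join([
    "> [!NOTE]",
    "> pgvector is not yet supported via `byok_rag`.",
    "> It must be configured directly.",
])

if __name__ == "__main__":
    print(find_blank_lines_in_blockquotes(broken_note))
    print(find_blank_lines_in_blockquotes(fixed_note))
```

Running markdownlint itself remains the authoritative check; the sketch only illustrates what the warning is reacting to.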

coderabbitai · 2026-06-02T11:07:47Z


```diff
 The `remote::ollama` provider can be used for inference. However, it does not support tool calling, including RAG.
-While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot be used with `llama-stack` due to current limitations in the `remote::openai` provider.
+While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.
```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use hyphenated compound adjective (“OpenAI-compatible”).

This improves grammar and consistency in user-facing docs.

Suggested fix

```diff
-While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.
+While Ollama also exposes an OpenAI-compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.
```

coderabbitai · 2026-06-02T11:07:47Z

```diff
+    embedding_model: sentence-transformers/all-mpnet-base-v2
     embedding_dimension: 1024
-    vector_db_id: vs_123       # Llama-stack vector_store_id
+    vector_db_id: vs_123       # Vector store ID (from index generation)
```


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# 1) Confirm declared dimensions in repo examples/docs
rg -n "embedding_model:\s*sentence-transformers/all-mpnet-base-v2|embedding_dimension:" -C1
# 2) Check model config dimension from Hugging Face
curl -fsSL https://huggingface.co/sentence-transformers/all-mpnet-base-v2/raw/main/config.json | jq '.hidden_size'
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 11601

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# Show exact YAML block for the affected entries
sed -n '30,60p' examples/lightspeed-stack-byok-okp-rag.yaml | nl -ba
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 119

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
awk 'NR>=30 && NR<=60 {printf "%5d\t%s\n", NR, $0}' examples/lightspeed-stack-byok-okp-rag.yaml
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 1413

Fix embedding_dimension mismatch for sentence-transformers/all-mpnet-base-v2 in byok_rag.

In examples/lightspeed-stack-byok-okp-rag.yaml, both BYOK FAISS stores use sentence-transformers/all-mpnet-base-v2 but set incompatible dimensions (1024 for ocp-docs and 384 for knowledge-base). The model’s hidden size is 768, and the repo/docs use embedding_dimension: 768 for this model—otherwise stored/query vectors can be incompatible at runtime.

```yaml
embedding_model: sentence-transformers/all-mpnet-base-v2
embedding_dimension: 1024
```

Suggested fix

```diff
 byok_rag:
   - rag_id: ocp-docs
     rag_type: inline::faiss
     embedding_model: sentence-transformers/all-mpnet-base-v2
-    embedding_dimension: 1024
+    embedding_dimension: 768
     vector_db_id: vs_123
     db_path: /tmp/ocp.faiss
@@
   - rag_id: knowledge-base
     rag_type: inline::faiss
     embedding_model: sentence-transformers/all-mpnet-base-v2
-    embedding_dimension: 384
+    embedding_dimension: 768
     vector_db_id: vs_456
     db_path: /tmp/kb.faiss
```

@are-ces, sure! Planning the fix to correct both embedding_dimension values to 768 (the actual output dimension of sentence-transformers/all-mpnet-base-v2).

Update embedding_dimension from 1024 → 768 for the ocp-docs entry

Update embedding_dimension from 384 → 768 for the knowledge-base entry
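
A mismatch like this could in principle be caught before review with a small consistency check. The following is a minimal sketch, not part of the PR or of lightspeed-stack: the `KNOWN_DIMENSIONS` table and the function name are hypothetical, the entries are plain dicts mirroring the `byok_rag` YAML as it stood before the fix, and the only dimension taken from the review is 768 for `sentence-transformers/all-mpnet-base-v2`.

```python
# Hypothetical lookup table; only the 768 value comes from the review above.
KNOWN_DIMENSIONS = {
    "sentence-transformers/all-mpnet-base-v2": 768,
}

# Plain-dict mirror of the byok_rag entries as they stood before the fix.
byok_rag = [
    {
        "rag_id": "ocp-docs",
        "rag_type": "inline::faiss",
        "embedding_model": "sentence-transformers/all-mpnet-base-v2",
        "embedding_dimension": 1024,
        "vector_db_id": "vs_123",
        "db_path": "/tmp/ocp.faiss",
    },
    {
        "rag_id": "knowledge-base",
        "rag_type": "inline::faiss",
        "embedding_model": "sentence-transformers/all-mpnet-base-v2",
        "embedding_dimension": 384,
        "vector_db_id": "vs_456",
        "db_path": "/tmp/kb.faiss",
    },
]


def find_dimension_mismatches(entries, known=KNOWN_DIMENSIONS):
    """Return (rag_id, declared, expected) for entries whose dimension disagrees.

    Entries whose model is not in the lookup table are skipped, since the
    expected dimension is unknown.
    """
    mismatches = []
    for entry in entries:
        expected = known.get(entry.get("embedding_model"))
        declared = entry.get("embedding_dimension")
        if expected is not None and declared != expected:
            mismatches.append((entry["rag_id"], declared, expected))
    return mismatches


if __name__ == "__main__":
    for rag_id, declared, expected in find_dimension_mismatches(byok_rag):
        print(f"{rag_id}: embedding_dimension {declared} should be {expected}")
```

With the two corrections listed above applied, the same check would report no mismatches.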

Replace remaining Llama Stack mentions with generic terms — users should not need to know about the underlying Llama Stack layer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

tisnik

LGTM

syedriko · 2026-06-02T16:27:26Z


```bash
mkdir ./embeddings_model
pdm run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
```


syedriko · 2026-06-02T16:38:08Z

```diff
+
+> [!NOTE]
+> Your LLM inference provider (e.g., OpenAI, vLLM) must also be configured in your `run.yaml`.
+> For OpenAI, set the `OPENAI_API_KEY` environment variable.
```


This doesn't seem relevant to BYOK...

syedriko · 2026-06-02T16:42:35Z

```diff
+> pgvector is not yet supported via `byok_rag` in `lightspeed-stack.yaml` (see [LCORE-2437](https://redhat.atlassian.net/browse/LCORE-2437)).
+> It must be configured directly in the Llama Stack configuration file.
+
 > You will need to install PostgreSQL with a matching version to pgvector, then log in with `psql` and enable the extension with:
```


matching version of pgvector?

syedriko · 2026-06-02T16:49:52Z

```diff
+While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.

-There is an [ongoing discussion](https://github.com/meta-llama/llama-stack/discussions/3034) about enabling tool calling with Ollama.
+Tool calling with Ollama is not yet supported.
```


By now it's pretty clear that Ollama doesn't support tool calling :)

syedriko · 2026-06-02T17:20:09Z


```diff
 ---

 # References
```


are-ces requested a review from syedriko June 2, 2026 11:06

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

LCORE-1037: remove all llama-stack references from BYOK and RAG guides

82ddcf0

Replace remaining Llama Stack mentions with generic terms — users should not need to know about the underlying Llama Stack layer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

are-ces force-pushed the lcore-1037-update-byok-guide-stack-config branch from 105e002 to 82ddcf0 Compare June 2, 2026 11:10

tisnik approved these changes Jun 2, 2026

View reviewed changes

syedriko reviewed Jun 2, 2026

View reviewed changes

Comment thread docs/rag_guide.md

```diff
 ---

 # References
```

syedriko Jun 2, 2026

drop?

syedriko approved these changes Jun 2, 2026

View reviewed changes

tisnik merged commit 1e70e55 into lightspeed-core:main Jun 2, 2026
31 checks passed
