
LCORE-1037: Update the BYOK guide to use Lightspeed stack config instead of run.yaml #1838

Merged
tisnik merged 2 commits into lightspeed-core:main from are-ces:lcore-1037-update-byok-guide-stack-config on Jun 2, 2026

are-ces (Contributor) commented Jun 2, 2026

Description

Update the BYOK and RAG guides so users configure BYOK knowledge sources via the byok_rag section in lightspeed-stack.yaml instead of editing run.yaml directly. The required configuration is now auto-generated at startup by make run, make run-stack, docker-compose, and library mode.

Key changes:

  • Replace all run.yaml BYOK/RAG snippets with byok_rag entries in lightspeed-stack.yaml
  • Add field reference table for all byok_rag options
  • Clarify that pgvector is not yet supported via byok_rag (see LCORE-2437) and must be configured directly in the Llama Stack config
  • Clarify that models and inference providers still go in run.yaml
  • Remove sqlite-vec references (not supported)
  • Update example config with embedding_model field

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude Opus 4.6
  • Generated by: N/A

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Review the rendered markdown in docs/byok_guide.md and docs/rag_guide.md
  • Verify all byok_rag field names match src/models/config.py:ByokRag
  • Confirm no remaining incorrect run.yaml references for BYOK configuration
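
The last check could be partly automated. The following is a minimal sketch, assuming the guides are read as plain Markdown strings; the function name `find_run_yaml_mentions` is hypothetical and not part of the repository. Because some `run.yaml` mentions stay legitimate (models and inference providers are still configured there), the result is a list to review by hand rather than a pass/fail gate.

```python
import re

# Matches the literal file name "run.yaml" as a standalone token.
RUN_YAML = re.compile(r"\brun\.yaml\b")


def find_run_yaml_mentions(markdown_text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line mentioning run.yaml."""
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if RUN_YAML.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Illustrative input only; real usage would read docs/byok_guide.md.
    sample = "Configure byok_rag in lightspeed-stack.yaml.\nThen edit `run.yaml`."
    for number, line in find_run_yaml_mentions(sample):
        print(f"{number}: {line}")
```

Each hit still needs a human decision about whether it refers to BYOK configuration or to model and provider setup.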

Summary by CodeRabbit

  • Documentation
    • Enhanced BYOK configuration documentation with improved embedding model per-source setup and vector database guidance
    • Updated RAG configuration guide with clearer examples, provider-specific setup instructions, and annotation behavior details
    • Refined configuration examples for knowledge source integration with FAISS and pgvector support

Replace all run.yaml references with lightspeed-stack.yaml byok_rag
configuration. Users should no longer edit run.yaml directly — the
Lightspeed Stack service auto-generates Llama Stack config at startup
from the byok_rag and rag sections in lightspeed-stack.yaml.

- Rewrite BYOK guide Step 4 to use byok_rag as primary config path
- Add field reference table for all byok_rag options
- Rewrite RAG guide vector store sections with byok_rag examples
- Replace full run.yaml config examples with lightspeed-stack.yaml format
- Add embedding_model field to BYOK example config
- Update Step 3 to reference byok_rag embedding_model field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai Bot (Contributor) commented Jun 2, 2026

Walkthrough

This PR updates BYOK and RAG configuration documentation to align with the new byok_rag configuration approach in lightspeed-stack.yaml. Two guides are restructured to document embedding model settings per knowledge source, and an example configuration file is updated with explicit per-RAG embedding and vector store parameters.

Changes

BYOK and RAG Configuration Documentation Update

| Layer / File(s) | Summary |
|---|---|
| BYOK Knowledge Source Configuration Guide (`docs/byok_guide.md`) | Restructured to document byok_rag configuration in lightspeed-stack.yaml. TOC updated to reference "Configure BYOK Knowledge Sources". Step 3 and Step 4 sections rewritten with byok_rag YAML example, field-reference table, score_multiplier guidance, and vector_db_id correctness warning. "Supported Vector Database Types" FAISS section replaced with byok_rag-based snippet. pgvector section adds NOTE that it is not supported via byok_rag. Password environment variable updated from DB_PASSWORD to POSTGRES_PASSWORD. "Configuration Examples" section replaced with new FAISS and FAISS+pgvector examples. Conclusion references updated. |
| RAG Configuration Guide Alignment (`docs/rag_guide.md`) | Updated to align with newer Lightspeed/Llama Stack guidance. Embedding model download flow rewritten to use byok_rag.embedding_model with automatic startup download. pgvector examples updated with new vector_stores block structure. vLLM and OpenAI provider instructions clarified. Note added about unregistering old resources when experimenting. Ollama limitations text shortened. OKP enrichment note updated to reference startup configuration. "Complete Configuration Reference" replaced with minimal lightspeed-stack.yaml example. RAG annotations section updated with vector_stores.annotation_prompt_params behavior and annotation default disabling. |
| Example Configuration Update (`examples/lightspeed-stack-byok-okp-rag.yaml`) | byok_rag list entries updated with explicit per-store FAISS parameters including embedding_model, embedding_dimension, vector_db_id, and db_path settings for ocp-docs and knowledge-base RAG IDs. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title accurately describes the main change: updating BYOK documentation to use lightspeed-stack.yaml config instead of run.yaml, which aligns with the substantial revisions across docs/byok_guide.md, docs/rag_guide.md, and the example configuration file. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


are-ces requested a review from syedriko June 2, 2026 11:06

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/byok_guide.md`:
- Around line 397-400: The blockquote note starting with the marker "[!NOTE]"
has an extra blank line inside the block which breaks markdownlint MD028; remove
the empty line so all lines in the note are contiguous '>'-prefixed lines (i.e.,
keep the "[!NOTE]" marker and the following lines like "For pgvector, ensure
your PostgreSQL credentials..." and "(e.g., `POSTGRES_PASSWORD`)." without any
blank line between them) to satisfy the linter.

In `@docs/rag_guide.md`:
- Line 227: Replace the phrase "OpenAI compatible" with the hyphenated compound
adjective "OpenAI-compatible" in the sentence that reads "While Ollama also
exposes an OpenAI compatible endpoint..." so the docs use "OpenAI-compatible"
for correct grammar and consistency; update that exact occurrence in
docs/rag_guide.md.
- Line 385: The heading "RAG annotations" lacks a preceding blank line (MD022);
update the markdown so there is one empty line immediately before the line
starting with "# RAG annotations" to satisfy the MD022 rule and ensure the
heading is properly separated from the prior content.
- Around line 91-94: The NOTE block beginning with "[!NOTE]" that mentions
pgvector and LCORE-2437 contains an extra blank quoted line; edit that block
(the "[!NOTE]" block containing the sentence "pgvector is not yet supported via
`byok_rag`..." and the following "It must be configured directly...") and remove
the empty/blank line between the quoted lines so the blockquote has no blank
quoted line (MD028).

In `@examples/lightspeed-stack-byok-okp-rag.yaml`:
- Around line 40-42: The embedding_dimension for entries using embedding_model
"sentence-transformers/all-mpnet-base-v2" in the byok_rag configuration is
incorrect (set to 1024 and 384); update both BYOK FAISS stores (ocp-docs and
knowledge-base) to use embedding_dimension 768 to match the model's hidden size
and avoid incompatible vector sizes at runtime—search for occurrences of
embedding_model: sentence-transformers/all-mpnet-base-v2 and set the
corresponding embedding_dimension fields to 768.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d90172e1-09b6-46bc-8307-b3a42a1da0b4

📥 Commits

Reviewing files that changed from the base of the PR and between e5baf4a and 105e002.

📒 Files selected for processing (3)
  • docs/byok_guide.md
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • examples/lightspeed-stack-byok-okp-rag.yaml
🪛 LanguageTool
docs/rag_guide.md

[grammar] ~227-~227: Use a hyphen to join words.
Context: ...G. While Ollama also exposes an OpenAI compatible endpoint that supports tool c...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.22.1)
docs/byok_guide.md

[warning] 400-400: Blank line inside blockquote

(MD028, no-blanks-blockquote)

docs/rag_guide.md

[warning] 94-94: Blank line inside blockquote

(MD028, no-blanks-blockquote)


[warning] 385-385: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)
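
The two MD028 warnings above share one pattern: a blank line sitting between two `>`-prefixed lines. The sketch below approximates that condition in plain Python so the pattern can be spotted before the linter runs. It is a simplification and not markdownlint's implementation: it only looks at single blank lines and does not skip fenced code blocks. The function name is hypothetical.

```python
def blank_lines_inside_blockquotes(markdown_text: str) -> list[int]:
    """Return 1-based line numbers of blank lines between two '>' lines."""
    lines = markdown_text.splitlines()
    flagged = []
    for index in range(1, len(lines) - 1):
        is_blank = lines[index].strip() == ""
        prev_quoted = lines[index - 1].lstrip().startswith(">")
        next_quoted = lines[index + 1].lstrip().startswith(">")
        if is_blank and prev_quoted and next_quoted:
            flagged.append(index + 1)  # convert 0-based index to line number
    return flagged


if __name__ == "__main__":
    # Illustrative note block with the kind of gap the linter reports.
    note = "> [!NOTE]\n> First sentence.\n\n> Second sentence."
    print(blank_lines_inside_blockquotes(note))
```

Removing each flagged line, or prefixing it with `>`, makes the quoted lines contiguous again.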

Comment thread docs/byok_guide.md
Comment thread docs/rag_guide.md
Comment thread docs/rag_guide.md

The `remote::ollama` provider can be used for inference. However, it does not support tool calling, including RAG.
-While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot be used with `llama-stack` due to current limitations in the `remote::openai` provider.
+While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use hyphenated compound adjective (“OpenAI-compatible”).

This improves grammar and consistency in user-facing docs.

Suggested fix
-While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.
+While Ollama also exposes an OpenAI-compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.

Comment thread docs/rag_guide.md
Comment on lines +40 to +42
embedding_model: sentence-transformers/all-mpnet-base-v2
embedding_dimension: 1024
vector_db_id: vs_123 # Llama-stack vector_store_id
vector_db_id: vs_123 # Vector store ID (from index generation)
coderabbitai Bot (Contributor) commented Jun 2, 2026

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Confirm declared dimensions in repo examples/docs
rg -n "embedding_model:\s*sentence-transformers/all-mpnet-base-v2|embedding_dimension:" -C1

# 2) Check model config dimension from Hugging Face
curl -fsSL https://huggingface.co/sentence-transformers/all-mpnet-base-v2/raw/main/config.json | jq '.hidden_size'

Repository: lightspeed-core/lightspeed-stack

Length of output: 11601


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Show exact YAML block for the affected entries
sed -n '30,60p' examples/lightspeed-stack-byok-okp-rag.yaml | nl -ba

Repository: lightspeed-core/lightspeed-stack

Length of output: 119


🏁 Script executed:

#!/bin/bash
set -euo pipefail
awk 'NR>=30 && NR<=60 {printf "%5d\t%s\n", NR, $0}' examples/lightspeed-stack-byok-okp-rag.yaml

Repository: lightspeed-core/lightspeed-stack

Length of output: 1413


Fix embedding_dimension mismatch for sentence-transformers/all-mpnet-base-v2 in byok_rag.

In examples/lightspeed-stack-byok-okp-rag.yaml, both BYOK FAISS stores use sentence-transformers/all-mpnet-base-v2 but set incompatible dimensions (1024 for ocp-docs and 384 for knowledge-base). The model’s hidden size is 768, and the repo/docs use embedding_dimension: 768 for this model—otherwise stored/query vectors can be incompatible at runtime.

embedding_model: sentence-transformers/all-mpnet-base-v2
embedding_dimension: 1024
Suggested fix
 byok_rag:
   - rag_id: ocp-docs
     rag_type: inline::faiss
     embedding_model: sentence-transformers/all-mpnet-base-v2
-    embedding_dimension: 1024
+    embedding_dimension: 768
     vector_db_id: vs_123
     db_path: /tmp/ocp.faiss
@@
   - rag_id: knowledge-base
     rag_type: inline::faiss
     embedding_model: sentence-transformers/all-mpnet-base-v2
-    embedding_dimension: 384
+    embedding_dimension: 768
     vector_db_id: vs_456
     db_path: /tmp/kb.faiss
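
A mismatch like this can be caught before startup with a small lookup. The following is a minimal sketch, assuming the `byok_rag` section has already been parsed into a list of dictionaries; the table `KNOWN_DIMENSIONS` and the function `dimension_mismatches` are hypothetical helpers, not part of lightspeed-stack. The only dimension listed is the 768 value stated in this review.

```python
# Expected output dimension per embedding model (assumption: extend as needed).
KNOWN_DIMENSIONS = {
    "sentence-transformers/all-mpnet-base-v2": 768,
}


def dimension_mismatches(byok_rag: list[dict]) -> list[str]:
    """Describe entries whose embedding_dimension disagrees with the table."""
    problems = []
    for entry in byok_rag:
        expected = KNOWN_DIMENSIONS.get(entry.get("embedding_model"))
        actual = entry.get("embedding_dimension")
        # Models missing from the table are skipped rather than guessed at.
        if expected is not None and actual != expected:
            problems.append(
                f"{entry.get('rag_id')}: embedding_dimension is {actual}, "
                f"expected {expected}"
            )
    return problems


if __name__ == "__main__":
    model = "sentence-transformers/all-mpnet-base-v2"
    entries = [
        {"rag_id": "ocp-docs", "embedding_model": model, "embedding_dimension": 1024},
        {"rag_id": "knowledge-base", "embedding_model": model, "embedding_dimension": 384},
    ]
    for problem in dimension_mismatches(entries):
        print(problem)
```

A static table is used here instead of querying the model at runtime so the check stays offline and dependency-free.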


@are-ces, sure! Planning the fix to correct both embedding_dimension values to 768 (the actual output dimension of sentence-transformers/all-mpnet-base-v2).

  • Update embedding_dimension from 1024 to 768 for the ocp-docs entry
  • Update embedding_dimension from 384 to 768 for the knowledge-base entry


Replace remaining Llama Stack mentions with generic terms — users
should not need to know about the underlying Llama Stack layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
are-ces force-pushed the lcore-1037-update-byok-guide-stack-config branch from 105e002 to 82ddcf0 June 2, 2026 11:10

tisnik (Contributor) left a comment

LGTM

Comment thread docs/byok_guide.md

```bash
mkdir ./embeddings_model
pdm run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
```


uv?

Comment thread docs/byok_guide.md

> [!NOTE]
> Your LLM inference provider (e.g., OpenAI, vLLM) must also be configured in your `run.yaml`.
> For OpenAI, set the `OPENAI_API_KEY` environment variable.

This doesn't seem relevant to BYOK...

Comment thread docs/rag_guide.md
> pgvector is not yet supported via `byok_rag` in `lightspeed-stack.yaml` (see [LCORE-2437](https://redhat.atlassian.net/browse/LCORE-2437)).
> It must be configured directly in the Llama Stack configuration file.

> You will need to install PostgreSQL with a matching version to pgvector, then log in with `psql` and enable the extension with:

matching version of pgvector?

Comment thread docs/rag_guide.md
While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.

There is an [ongoing discussion](https://github.com/meta-llama/llama-stack/discussions/3034) about enabling tool calling with Ollama.
Tool calling with Ollama is not yet supported.

By now it's pretty clear that Ollama doesn't support tool calling :)

Comment thread docs/rag_guide.md

---

# References

drop?

tisnik merged commit 1e70e55 into lightspeed-core:main Jun 2, 2026
31 checks passed