LCORE-1592: Design doc #1522
**Walkthrough**

Replaced the single "Solution" section with a decision-structured design: added "Available solutions" (S1-S5), expanded Requirements (R3-R10) and Use Cases (U8-U12), added "Chosen approach and configuration" and "Implementation details" (D1-D5), and introduced a startup/teardown sequence diagram. (Documentation-only.)
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant Admin as Administrator
    participant LCORE as Lightspeed Core Runner
    participant LSRunner as Llama Stack Runner
    participant LSService as Llama Stack Service
    participant CoreService as Lightspeed Core Service
    rect rgba(200,220,255,0.5)
        Admin->>LCORE: trigger startup
        LCORE->>LCORE: read lightspeed-stack.yaml
        LCORE->>LSRunner: generate/start Llama Stack runner
        LSRunner->>LSService: start service / ping
        LSService-->>LSRunner: status OK
        LSRunner-->>LCORE: service started/status
        LCORE->>CoreService: start core service / ping
        CoreService-->>LCORE: status OK
        LCORE-->>Admin: notify services started
    end
    rect rgba(255,220,220,0.5)
        Admin->>LCORE: trigger teardown
        LCORE->>LSRunner: stop Llama Stack runner
        LSRunner->>LSService: stop service
        LSService-->>LSRunner: stopped
        LSRunner-->>LCORE: runner stopped/status
        LCORE->>CoreService: stop core service
        CoreService-->>LCORE: stopped
        LCORE-->>Admin: notify services stopped
    end
```
Actionable comments posted: 1
Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/design/low-overhead-deployment-for-server-mode/low-overhead-deployment-for-server-mode.md (2)
**Line 10: ⚠️ Potential issue | 🟡 Minor — Fix the document link path.**

Line 10 points to `docs/low-overhead-deployment-for-server-mode`, but this design doc lives under `docs/design/...` and includes a markdown filename. Please update this reference to the actual path to avoid dead links. Proposed doc fix:

```diff
-| **Links** | Spike doc: `docs/low-overhead-deployment-for-server-mode` |
+| **Links** | Spike doc: `docs/design/low-overhead-deployment-for-server-mode/low-overhead-deployment-for-server-mode.md` |
```
**Lines 470-480: ⚠️ Potential issue | 🟡 Minor — Replace placeholders with tracked work items before merge.**

The `LCORE-????` placeholder and empty Stories table leave execution ownership undefined. Fill these with actual epic/story IDs (or remove the tables until IDs exist) so implementation and rollout are traceable from the design doc. If helpful, I can draft a minimal "tracking" template for these sections that is merge-safe even before all tickets are created.

Also applies to: lines 494-500.
**Inline comment — around lines 412-457:**

Update the document to clearly state this is proposed/target-state behavior (not currently implemented): change phrasing like "will support"/"schema: include"/"LCORE reads" to "proposed" or "target-state", and add concrete implementation tracking IDs/milestones that map to the actual work items. Also reference the current code areas where runtime-mode selection is missing (`src/models/config.py`, `src/configuration.py`, `src/lightspeed_stack.py`) so readers understand these are design goals, not live features.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7d634bbb-1b4e-4781-9479-197200f58817
📒 Files selected for processing (1)
docs/design/low-overhead-deployment-for-server-mode/low-overhead-deployment-for-server-mode.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: check_dependencies
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
🧰 Additional context used
🪛 LanguageTool
docs/design/low-overhead-deployment-for-server-mode/low-overhead-deployment-for-server-mode.md
[style] ~253-~253: This phrase is redundant. Consider writing “Stack”.
Context: ...evelopers run Lightspeed Core and Llama Stack together locally with a single command. ## U2...
(SPLICE_TOGETHER)
Force-pushed: 13a8916 to ea9409f
Actionable comments posted: 3
Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/design/low-overhead-deployment-for-server-mode/low-overhead-deployment-for-server-mode.md (1)
**Lines 468-472: 🧹 Nitpick | 🔵 Trivial — Consider creating the implementation epic or adding a note.**

The "Epics created" section contains a placeholder `LCORE-????`. Since this design doc is being submitted for review, consider either:

- Creating the implementation epic in Jira and updating the reference, or
- Adding a note explaining the epic will be created pending design approval

This helps track the progression from design to implementation and provides a clear signal about the document's status. Example note if the epic is pending approval:

```diff
 # Epics created
+
+_Implementation epic(s) will be created following design approval._
 
 | Epic       | Description                             | Link                                           |
 |------------|-----------------------------------------|------------------------------------------------|
-| LCORE-???? |                                         | https://redhat.atlassian.net/browse/LCORE-???? |
+| TBD        | Low-overhead deployment for server mode | To be created following design approval        |
```
**Inline comment — around lines 485-486:**

Expand the "Which architectures must be supported?" unknowns section with additional open questions to surface risks and dependencies: performance/overhead impact of an LCORE-managed lifecycle vs. separate deployments, migration strategy for teams running standalone Llama Stack, backward compatibility guarantees for existing configurations, resource requirements and scaling characteristics for each runtime mode, and a testing strategy for parity between containerized and local modes. This lets reviewers evaluate trade-offs and required investigations.

**Inline comment — line 47:**

The Table of Contents entry `* [Chosen approach and configuration](#chosen-approach-and-configuration)` has a broken fragment because the actual heading includes "(target state)". Either update the link fragment to `#chosen-approach-and-configuration-target-state`, or rename the heading to remove "(target state)", so the fragment exactly matches the rendered heading.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5ed77ec2-0600-4be0-b8ea-35f6362782b7
⛔ Files ignored due to path filters (1)
`docs/design/low-overhead-deployment-for-server-mode/sequence_diagram.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (2)
- docs/design/low-overhead-deployment-for-server-mode/low-overhead-deployment-for-server-mode.md
- docs/design/low-overhead-deployment-for-server-mode/sequence_diagram.puml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🪛 markdownlint-cli2 (0.22.0)
docs/design/low-overhead-deployment-for-server-mode/low-overhead-deployment-for-server-mode.md
[warning] 47-47: Link fragments should be valid
(MD051, link-fragments)
🔇 Additional comments (5)
docs/design/low-overhead-deployment-for-server-mode/sequence_diagram.puml (1)

**Lines 1-36: Well-structured sequence diagram for proposed architecture.**

The diagram clearly documents the proposed Startup and Teardown flows for the lightspeed core runner orchestration pattern. The participant separation (configuration loader, configuration generator, runners, services) provides good architectural clarity for the target state.

However, based on the relevant code snippet from `src/lightspeed_stack.py:95-165`, this runner-based orchestration pattern does not currently exist in the codebase. The current `main()` function directly calls configuration loading and service startup without the abstracted runner layer shown here.

💡 Suggestion: add a PlantUML note at the top of the diagram indicating this is proposed architecture:

```diff
 @startuml
+note top
+  **Proposed Architecture (Target State)**
+  This diagram documents the planned orchestration pattern
+  for LCORE-1592, not the current implementation.
+end note
 actor admin as "Administrator"
```

This will help readers immediately understand they're viewing a design proposal rather than existing functionality.
docs/design/low-overhead-deployment-for-server-mode/low-overhead-deployment-for-server-mode.md (4)

**Lines 67-83: Clear articulation of design goals.**

The "What" section effectively establishes the core objective: abstracting platform complexity behind stable interfaces. The emphasis on reducing cognitive load, onboarding time, and error modes provides strong justification for the proposed architectural changes.

**Lines 90-104: Strong business justification.**

The "Why" section clearly explains the operational burden teams currently face and aligns the solution with the Llama Stack team's preferred server mode approach. This provides solid justification for the complexity of the proposed changes.

**Lines 114-243: Comprehensive and well-structured requirements.**

Requirements R1-R10 provide excellent coverage of the feature's scope, including:

- Automated startup and teardown (R1-R2)
- Abstraction and developer experience (R3, R5-R6)
- Distribution strategy (R4)
- Platform support considerations (R7-R8)
- Integration impact analysis (R9-R10)

The explicit marking of R7 as "Optional" helps with prioritization. The requirements strike a good balance between technical detail and high-level goals.
**Lines 387-406: Clarify the Llama Stack invocation command in S5.**

Solution S5 mentions invoking "uv (or equivalent)" to start a local Llama Stack process, but should be more specific about the actual command structure. Based on Llama Stack's documented patterns, the server is invoked as:

- `uvx --from 'llama-stack[starter]' llama stack run starter` (ephemeral, no install)
- `uv run llama stack run <config>` (after adding llama-stack to project dependencies)
- `llama stack run <config>` (if globally installed)

Specify which approach LCORE will use, provide the actual command template including how configuration is passed, and document any environment variables or flags needed for readiness signal monitoring (e.g., port binding, health endpoint).
The reviewed hunk, quoted from the design doc:

```markdown
# Implementation details

## D1

`lightspeed-stack.yaml` schema: include a top-level runtime field with values
like `container` or `local`, plus runtime-specific sections:
- `container`: image, runtime (docker|podman), ports, volumes, env, imagePullPolicy, resource limits.
- `local`: command (e.g., uv), args, env, workingDir, socket/path settings, resource hints.

## D2

Defaults and overrides: sensible defaults for dev and prod profiles; support
per-environment overrides and CLI flags.

## D3

Distribution: publish a Lightspeed-tailored Llama Stack OCI image
(lightspeed-providers included) and make it the default container image in
configs.

## D4

Lifecycle management: LCORE reads `lightspeed-stack.yaml`, instantiates either
container or local process, performs readiness checks, and ensures graceful
teardown (log collection, cleanup).

## D5

Testing & CI: include CI jobs that validate both container and local modes
using Compose/Kind or local-process harnesses to ensure parity.
```
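To make D1 concrete, here is one possible shape for the proposed schema. This is an illustrative sketch only: the top-level `runtime` field and the field names listed in D1 come from the design doc, while every concrete value (image name, ports, model name) is an invented example, not a committed format.

```yaml
# Illustrative only: a possible shape for the proposed top-level runtime field.
# Only the fields named in D1 come from the design doc; all values are examples.
runtime: container   # or: local

container:
  image: quay.io/lightspeed/llama-stack:latest   # assumed default image name
  runtime: podman                                # docker | podman
  ports:
    - "8321:8321"
  volumes:
    - ./run.yaml:/app/run.yaml:Z
  env:
    INFERENCE_MODEL: example-model
  imagePullPolicy: IfNotPresent
  resources:
    memory: 2Gi

local:
  command: uv
  args: ["run", "llama", "stack", "run", "run.yaml"]
  env:
    LLAMA_STACK_PORT: "8321"
  workingDir: .
```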
**🧹 Nitpick | 🔵 Trivial — Implementation details language could be clearer about target state.**

While the section heading now includes "(target state)" from line 410 and line 412 uses "We propose supporting" (addressing the past review comment), the implementation details subsections D1-D5 still use present-tense and imperative language that could be misinterpreted as committed implementation or current behavior. For example:

- D1 (line 423): "`lightspeed-stack.yaml` schema: include..." reads as an instruction or current state
- D4 (line 447): "LCORE reads `lightspeed-stack.yaml`..." describes behavior as if implemented

Based on the relevant code snippets from `src/models/config.py`, `src/configuration.py`, and `src/lightspeed_stack.py`, these runtime-mode selection and lifecycle management features do not yet exist in the codebase.

📝 Suggestion: add a brief preamble before D1 to clarify these are proposed implementation steps:

```diff
 # Implementation details
+
+The following sections outline the proposed implementation approach for supporting both containerized and local Llama Stack startup modes. These details represent the target architecture and are subject to refinement during development.
 
 ## D1
 
-`lightspeed-stack.yaml` schema: include a top-level runtime field with values
+Proposed `lightspeed-stack.yaml` schema changes: include a top-level runtime field with values
```

This reinforces that D1-D5 are design proposals, not documentation of existing features.
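The graceful-teardown step D4 proposes can be sketched as terminate-then-kill escalation. This is an illustrative sketch only; the grace period and escalation policy are assumptions, not details specified in the design doc.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Ask the child to exit (SIGTERM on POSIX), escalating to SIGKILL after a grace period."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort: forceful stop
            proc.wait()
    # In a real runner, log collection and temp-file cleanup would happen here (per D4).
    return proc.returncode


# Stand-in for a local Llama Stack process: a child that would otherwise run 60s.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
rc = stop_gracefully(child, grace_seconds=5.0)
```

The same escalation logic maps to the container mode as `docker stop`/`podman stop`, which already apply a SIGTERM-then-SIGKILL timeout.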
**jrobertboos** left a comment:

Overall LGTM. I think having Lightspeed Stack in charge of the lifecycle of Llama Stack is cool, but I'm not exactly clear on why we would implement both S4 and S5. Could you give an example of when using containers might not be ideal? :)
On the `# Available solutions` section:
Would we implement multiple of these solutions? Because if not, would S2 work, since it would require OpenShift, which not all Lightspeed deployments use (afaik)?
Well, S4/S5 are for different use cases. For deployment into a real environment, Docker images are better, of course. For developing and finding bugs, local deployment is better (you might also notice that logging and debugging in Docker is not ideal, and sometimes it simply gets stuck).
ahhhh ok that makes sense
Description
LCORE-1592: Design doc
Type of change
Tools used to create PR
Related Tickets & Documents
Summary by CodeRabbit