LCORE-1872: Fix llama-stack container startup issues #1800
Walkthrough
Rebase the test container onto the Red Hat UBI Python 3.12 image with dnf-installed build tools, change the entrypoint to write the enriched config to /tmp/enriched-run.yaml, and update the test/prow manifests to invoke the enrichment script using the virtualenv Python.
Changes: Infrastructure and Deployment Updates
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/llama-stack-entrypoint.sh`:
- Line 8: The ENRICHED_CONFIG path is inconsistent: the script sets
ENRICHED_CONFIG="/tmp/enriched-run.yaml" but the E2E manifests still hardcode
the old "/opt/app-root/run.yaml"; update the manifests to use
ENRICHED_CONFIG="/tmp/enriched-run.yaml" to match the script (or refactor to a
single shared source for the enrichment path) so the enrichment behavior cannot
diverge; look for the ENRICHED_CONFIG variable and any hardcoded
"/opt/app-root/run.yaml" occurrences in the llama-stack entrypoint and the E2E
manifest templates and make them use the same "/tmp/enriched-run.yaml" value (or
reference the centralized variable).
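The consistency requirement above could be checked mechanically. The following is a minimal sketch, not code from this PR: the helper names `enriched_config_values` and `is_consistent` and the sample strings are hypothetical, and it assumes the path is set through a shell-style `ENRICHED_CONFIG=...` assignment in both the entrypoint and the manifests.

```python
import re

# Matches shell-style assignments such as ENRICHED_CONFIG="/tmp/enriched-run.yaml".
# Plain uses like "$ENRICHED_CONFIG" are ignored because no "=" follows the name.
ASSIGNMENT = re.compile(r'ENRICHED_CONFIG="?([^"\s]+)"?')


def enriched_config_values(text: str) -> set[str]:
    """Return every distinct value assigned to ENRICHED_CONFIG in `text`."""
    return set(ASSIGNMENT.findall(text))


def is_consistent(*texts: str) -> bool:
    """True when all given texts agree on at most one ENRICHED_CONFIG value."""
    values: set[str] = set()
    for text in texts:
        values |= enriched_config_values(text)
    return len(values) <= 1


# Hypothetical snippets standing in for the entrypoint and an outdated manifest.
entrypoint = 'ENRICHED_CONFIG="/tmp/enriched-run.yaml"\n'
stale_manifest = 'ENRICHED_CONFIG="/opt/app-root/run.yaml"\n'

print(is_consistent(entrypoint, entrypoint))
print(is_consistent(entrypoint, stale_manifest))
```

A check like this only compares assignments; it deliberately does not flag every mention of /opt/app-root/run.yaml, since that path may still be the legitimate read-only input.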
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0ab071ab-5175-4264-aa92-1c69444cb139
📒 Files selected for processing (2)
deploy/llama-stack/test.containerfile
scripts/llama-stack-entrypoint.sh
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-05-12T15:14:34.788Z
Learnt from: syedriko
Repo: lightspeed-core/lightspeed-stack PR: 1727
File: scripts/konflux_requirements.sh:9-15
Timestamp: 2026-05-12T15:14:34.788Z
Learning: In this repo, the `.konflux/` directory is committed/tracked and is guaranteed to exist in a fresh clone. Therefore, shell scripts that write output under `.konflux/` (e.g., create files like `.konflux/<...>`) should not waste effort by calling `mkdir -p .konflux` first. Only add directory-creation logic if the script may run in an environment/repo state where `.konflux/` might not be present.
Applied to files:
scripts/llama-stack-entrypoint.sh
🔇 Additional comments (2)
deploy/llama-stack/test.containerfile (2)
1-2: Clarify SQLite compatibility claim for the `ubi9/python-312` base image
Red Hat docs I found don’t state the bundled SQLite version or guarantee the `sqlite3_deserialize` capability, so the “>= 3.30.0 with `sqlite3_deserialize` support” PR objective needs an in-container check for the exact image tag used at `deploy/llama-stack/test.containerfile` (lines 1-2):
python -c "import sqlite3; print(sqlite3.sqlite_version)"
python -c "import sqlite3; print(hasattr(sqlite3.Connection, 'deserialize'))"
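The two one-liners above can be combined into a single report that is easier to run inside the built image. This is a sketch only: `sqlite_report` and `MIN_SQLITE` are illustrative names, and the (3, 30, 0) threshold is simply the figure quoted in the PR description, not a value confirmed against Red Hat documentation.

```python
import sqlite3

# Threshold quoted in the PR description; treated here as an assumption.
MIN_SQLITE = (3, 30, 0)


def sqlite_report(minimum: tuple[int, int, int] = MIN_SQLITE) -> dict:
    """Summarize the SQLite library linked into this Python interpreter."""
    return {
        # Version string of the SQLite library loaded at runtime.
        "version": sqlite3.sqlite_version,
        # Tuple comparison against the minimum version.
        "meets_minimum": sqlite3.sqlite_version_info >= minimum,
        # Connection.deserialize exists on Python 3.11+ only when the
        # underlying SQLite library exposes the deserialize API.
        "has_deserialize": hasattr(sqlite3.Connection, "deserialize"),
    }


print(sqlite_report())
```

Running it with the same interpreter the container uses at startup matters, because the result describes the SQLite library that interpreter is linked against.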
6-9: ⚡ Quick win: `llslibdev` dependencies likely won’t need `cmake`/`cargo` in `deploy/llama-stack/test.containerfile` (lines 6-9)
The `llslibdev` dependency group in `pyproject.toml` doesn’t directly include `fastuuid`/`maturin`/`cargo`, and `uv.lock` provides `manylinux` wheels for `fastuuid` (so `uv sync --locked --group llslibdev` should not require a Rust toolchain on typical Linux platforms). Only targets without matching wheels would fall back to the `fastuuid` sdist and need `cargo`/build tooling.
9c1b725 to da6a5cf
Fixes two issues preventing llama-stack container from starting:
1. SQLite compatibility error - Switch base image from ubi9/ubi-minimal to ubi9/python-312. The minimal image ships with SQLite < 3.30.0 which lacks sqlite3_deserialize support required by Python 3.12's _sqlite3 module. The python-312 base image includes a compatible SQLite version and properly configured Python dependencies.
2. Read-only filesystem error - Config enrichment script now writes to /tmp/enriched-run.yaml instead of /opt/app-root/run.yaml. The latter is mounted read-only from the host, causing OSError when the enrichment script attempts to write the enriched configuration.
Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
da6a5cf to 3b50327
The enrichment script was failing with `ModuleNotFoundError: No module named 'yaml'` because it was invoked with system Python instead of virtualenv Python where dependencies are installed.
Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
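A quick way to confirm which interpreter is in play is to print it alongside an import check. The snippet below is a diagnostic sketch, not part of the PR; `module_available` is a hypothetical helper built on the standard `importlib.util.find_spec` call.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


# The interpreter path shows whether system Python or the virtualenv is running.
print("interpreter:", sys.executable)
print("yaml importable:", module_available("yaml"))
```

Run once with the system Python and once with the virtualenv's interpreter, the two outputs should differ in exactly the way the commit message describes.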
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deploy/llama-stack/test.containerfile`:
- Line 2: The Dockerfile uses the mutable tag
"registry.access.redhat.com/ubi9/python-312" in the FROM instruction; replace
that tag with the corresponding immutable digest by finding the correct sha256
for the image and updating the FROM line to use the digest form (e.g., FROM
registry.access.redhat.com/ubi9/python-312@sha256:<digest>), ensuring you pick
the exact digest that matches the desired image variant/architecture for
reproducible builds and supply-chain traceability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8b7f2dad-9e5e-4d8d-97a8-6475fba17b5c
📒 Files selected for processing (4)
deploy/llama-stack/test.containerfile
scripts/llama-stack-entrypoint.sh
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: unit_tests (3.12)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.13)
- GitHub Check: Pylinter
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-02-19T10:06:50.647Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:50.647Z
Learning: In the rhoai tests under tests/e2e-prow/rhoai/manifests, avoid static ConfigMap definitions for mock-jwks-script and mcp-mock-server-script since these ConfigMaps are created dynamically by the pipeline.sh deployment script using 'oc create configmap'. Ensure there are no static ConfigMap resources for these names in the manifests. If such ConfigMaps are added in the future, coordinate with the pipeline to reflect dynamic creation or adjust tests to rely on the dynamic provisioning.
Applied to files:
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
📚 Learning: 2026-05-12T15:14:34.788Z
Learnt from: syedriko
Repo: lightspeed-core/lightspeed-stack PR: 1727
File: scripts/konflux_requirements.sh:9-15
Timestamp: 2026-05-12T15:14:34.788Z
Learning: In this repo, the `.konflux/` directory is committed/tracked and is guaranteed to exist in a fresh clone. Therefore, shell scripts that write output under `.konflux/` (e.g., create files like `.konflux/<...>`) should not waste effort by calling `mkdir -p .konflux` first. Only add directory-creation logic if the script may run in an environment/repo state where `.konflux/` might not be present.
Applied to files:
scripts/llama-stack-entrypoint.sh
🪛 Trivy (0.69.3)
deploy/llama-stack/test.containerfile
[warning] 2-2: ':latest' tag used
Specify a tag in the 'FROM' statement for image 'registry.access.redhat.com/ubi9/python-312'
Rule: DS-0001
(IaC/Dockerfile)
[info] 1-1: No HEALTHCHECK defined
Add HEALTHCHECK instruction in your Dockerfile
Rule: DS-0026
(IaC/Dockerfile)
🔇 Additional comments (4)
deploy/llama-stack/test.containerfile (1)
7-9: LGTM!
scripts/llama-stack-entrypoint.sh (1)
8-8: LGTM! Also applies to: 15-15
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml (1)
160-160: LGTM!
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml (1)
150-150: LGTM!
-# Upstream llama-stack built from Red Hat UBI
-FROM registry.access.redhat.com/ubi9/ubi-minimal
+# Upstream llama-stack built from Red Hat UBI Python 3.12 image
+FROM registry.access.redhat.com/ubi9/python-312
🧩 Analysis chain
🌐 Web query:
What is the current sha256 digest for registry.access.redhat.com/ubi9/python-312, and does Red Hat recommend digest pinning for reproducible builds?
💡 Result:
The container image registry.access.redhat.com/ubi9/python-312 is a mutable tag that points to updated image versions over time [1][2]. Consequently, it does not have a single, static SHA256 digest. Because Red Hat frequently updates these images to address security vulnerabilities and bug fixes, the digest associated with the ubi9/python-312 tag changes whenever a new image is pushed to the registry [1][3].
To find the current digest for your specific environment, you can use the command:
podman inspect registry.access.redhat.com/ubi9/python-312 --format '{{.Digest}}'
Red Hat explicitly recommends digest pinning for scenarios requiring reproducible builds and robust supply chain security [4][2][5]. While tags are convenient for general development, they can change at any time, which introduces ambiguity and prevents reproducibility [2]. By referencing an image by its immutable SHA256 digest, you ensure that your builds consistently use the exact same image content, effectively eliminating discrepancies caused by tag updates [2][5]. Furthermore, for certain Red Hat ecosystem requirements, such as Operator bundle certification, using image digests instead of tags is mandatory [5].
Citations:
- 1: https://access.redhat.com/errata/RHBA-2024:5682
- 2: https://developers.redhat.com/articles/2025/01/28/how-name-version-and-reference-container-images
- 3: https://access.redhat.com/errata/RHBA-2026:16041
- 4: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html/building_running_and_managing_containers/introduction-to-reproducible-container-builds
- 5: https://github.com/redhat-openshift-ecosystem/certification-releases/blob/main/4.9/ga/troubleshooting.md
Pin the base image to an immutable digest (deploy/llama-stack/test.containerfile:2).
registry.access.redhat.com/ubi9/python-312 is a mutable tag, so the image contents/digest can change over time, hurting reproducibility and supply-chain traceability. Red Hat recommends digest pinning for reproducible builds and stronger supply-chain security; replace the tag with the appropriate sha256 digest.
Suggested change
-FROM registry.access.redhat.com/ubi9/python-312
+FROM registry.access.redhat.com/ubi9/python-312@sha256:<resolved_digest>
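To keep the pin from regressing, a small lint over the containerfile could flag FROM lines that lack a digest. This is a rough sketch under stated assumptions: `unpinned_from_lines` is a hypothetical helper, the 64-character digest in the demo is a placeholder rather than a real image digest, and references to earlier build stages (for example `FROM builder`) would be reported as unpinned by this simple version.

```python
import re

# A digest-pinned reference ends with "@sha256:" followed by 64 hex characters.
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_from_lines(containerfile: str) -> list[str]:
    """Return image references in FROM instructions that lack a sha256 digest."""
    unpinned = []
    for line in containerfile.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            # Skip optional flags such as --platform=... before the image name.
            refs = [p for p in parts[1:] if not p.startswith("--")]
            if refs and refs[0].lower() != "scratch" and not DIGEST.search(refs[0]):
                unpinned.append(refs[0])
    return unpinned


tagged = "# base image\nFROM registry.access.redhat.com/ubi9/python-312\n"
# Placeholder digest for illustration only; resolve the real one from the registry.
pinned = "FROM registry.access.redhat.com/ubi9/python-312@sha256:" + "a" * 64 + "\n"

print(unpinned_from_lines(tagged))
print(unpinned_from_lines(pinned))
```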
Description
Fixes two issues preventing llama-stack container from starting:
1. SQLite compatibility error - Switch base image from ubi9/ubi-minimal to ubi9/python-312. The minimal image ships with SQLite < 3.30.0 which lacks sqlite3_deserialize support required by Python 3.12's _sqlite3 module. The python-312 base image includes a compatible SQLite version and properly configured Python dependencies.
2. Read-only filesystem error - Config enrichment script now writes to /tmp/enriched-run.yaml instead of /opt/app-root/run.yaml. The latter is mounted read-only from the host, causing OSError when the enrichment script attempts to write the enriched configuration.
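The read-only failure mode can be illustrated with a writer that checks its target directory before opening the file. This is a sketch of the idea only, not the enrichment script from this PR: `write_enriched` is a hypothetical function, the YAML content is a placeholder, and a temporary directory stands in for /tmp.

```python
import os
import tempfile


def write_enriched(content: str, out_path: str) -> str:
    """Write `content` to `out_path`, failing early if its directory is not writable."""
    out_dir = os.path.dirname(out_path) or "."
    # os.access reports False for read-only mounts and for missing directories.
    if not os.access(out_dir, os.W_OK):
        raise PermissionError(f"{out_dir} is not writable; choose a path such as /tmp")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return out_path


# Demo in a throwaway directory standing in for /tmp; the content is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    target = write_enriched("placeholder: true\n", os.path.join(tmp, "enriched-run.yaml"))
    with open(target, encoding="utf-8") as fh:
        print(fh.read())
```

Failing with an explicit message before the write makes the cause clearer than the bare OSError described above, though the PR itself resolves the problem simply by changing the output path.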