LCORE-1221: Added tekton file for konflux e2e test run #1403
radofuchs merged 5 commits into lightspeed-core:main
Conversation
Compare: ca66810 to 56c614a
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings, use the following commands to manage reviews, or use the checkboxes below for quick actions.
Walkthrough

Adds Konflux/OpenShift integration test infrastructure for Lightspeed Stack, including a new Tekton pipeline that orchestrates EAAS space provisioning, cluster provisioning, image extraction, and e2e test execution. Introduces Konflux-specific pipeline scripts for deployment and testing, new MCP configuration variants, updated Kubernetes manifests with security hardening and parameterization, and enhanced test utilities with improved pod restart/restore logic.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Tekton as Tekton Pipeline
    participant EAAS as EAAS Service
    participant Cluster as AWS Cluster<br/>(Hypershift)
    participant Repo as Lightspeed Repo
    participant K8s as Kubernetes<br/>(Target)
    participant Tests as E2E Tests
    Tekton->>EAAS: Provision space
    EAAS-->>Tekton: Space ready
    Tekton->>Cluster: Provision ephemeral<br/>OpenShift 4.19.x
    Cluster-->>Tekton: Cluster ready<br/>(kubeconfig)
    Tekton->>Repo: Extract snapshot image<br/>& commit
    Repo-->>Tekton: Image & revision
    Tekton->>K8s: Get kubeconfig
    Tekton->>K8s: Create namespace
    Tekton->>K8s: Deploy services
    K8s-->>Tekton: Services ready
    Tekton->>Repo: Clone at commit
    Repo-->>Tekton: Code ready
    Tekton->>Tests: Run e2e tests<br/>(with kubeconfig,<br/>snapshot env)
    Tests->>K8s: Access cluster
    Tests-->>Tekton: Test results
```
```mermaid
sequenceDiagram
    participant Pipeline as pipeline-konflux.sh
    participant Services as pipeline-services-konflux.sh
    participant K8s as Kubernetes
    participant MockJWKS as Mock JWKS Pod
    participant MockMCP as Mock MCP Pod
    participant LlamaStack as Llama Stack Pod
    participant Lightspeed as Lightspeed Pod
    participant PortFwd as Port Forwards<br/>(8080, 8321, 8000)
    participant Tests as run-tests.sh
    Pipeline->>K8s: Create namespace
    Pipeline->>K8s: Create secrets<br/>(OpenAI, Quay, MCP tokens)
    Pipeline->>K8s: Deploy mock JWKS
    K8s->>MockJWKS: Start pod
    MockJWKS-->>K8s: Ready
    Pipeline->>MockJWKS: Wait for ready
    Pipeline->>K8s: Deploy mock MCP
    K8s->>MockMCP: Start pod
    MockMCP-->>K8s: Ready
    Pipeline->>MockMCP: Wait for ready
    Pipeline->>Services: Call services script
    Services->>K8s: Create ConfigMaps<br/>(run.yaml, RAG data)
    Services->>K8s: Deploy Llama Stack
    K8s->>LlamaStack: Start pod
    LlamaStack-->>K8s: Ready
    Services->>K8s: Deploy Lightspeed
    K8s->>Lightspeed: Start pod
    Lightspeed-->>K8s: Ready
    Pipeline->>PortFwd: Start port-forwards<br/>(8080→8080, 8321→8321)
    PortFwd-->>Pipeline: Forwards established
    Pipeline->>Pipeline: Verify health<br/>(curl polling)
    Pipeline->>Tests: Run tests
    Tests->>PortFwd: Access via localhost
    PortFwd->>Lightspeed: Forward requests
    Tests-->>Pipeline: Test exit code
    Pipeline->>PortFwd: Terminate all forwards
    Pipeline->>K8s: Cleanup resources
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 12
🧹 Nitpick comments (5)
tests/e2e-prow/rhoai/pipeline-services-konflux.sh (1)
Lines 9-19: Consider adding logging for secret creation consistency.

Unlike pipeline-konflux.sh, which logs success/warning messages when creating these secrets, this script is silent. Adding similar logging would improve debuggability and maintain consistency across pipeline scripts.

📝 Optional: Add logging for consistency with pipeline-konflux.sh

```diff
+log() { echo "[pipeline-services-konflux] $*"; }
+
 if [ -f "$REPO_ROOT/tests/e2e/secrets/mcp-token" ]; then
   oc create secret generic mcp-file-auth-token -n "$NAMESPACE" \
     --from-file=token="$REPO_ROOT/tests/e2e/secrets/mcp-token" \
     --dry-run=client -o yaml | oc apply -f -
+  log "✅ mcp-file-auth-token secret applied"
+else
+  log "⚠️ mcp-token missing — MCPFileAuth may fail"
 fi
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `tests/e2e-prow/rhoai/pipeline-services-konflux.sh` around lines 9-19: Add explicit success/warning log messages after each secret creation in pipeline-services-konflux.sh to match pipeline-konflux.sh behavior: after the oc create ... | oc apply -f - invocation for mcp-file-auth-token and mcp-invalid-file-auth-token, check the command exit status and echo a clear success message (or a warning/error message on failure) referencing the secret names (mcp-file-auth-token, mcp-invalid-file-auth-token) and the NAMESPACE variable so operators can see whether each secret was created/applied successfully.

tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1)
Lines 6-18: Good security hardening, but consider read-only root filesystem.

The added security context settings (seccompProfile, dropped capabilities, non-root execution) are excellent hardening measures.

Regarding the Trivy hint about readOnlyRootFilesystem: since the application writes feedback and transcripts to /tmp/data/, enabling read-only root would require adding an emptyDir volume for /tmp. This is a recommended improvement but not critical for E2E test infrastructure.

🔒 Optional: Enable read-only root filesystem with emptyDir for /tmp

```diff
 securityContext:
   allowPrivilegeEscalation: false
   capabilities:
     drop: ["ALL"]
   runAsNonRoot: true
   runAsUser: 1000
   seccompProfile:
     type: RuntimeDefault
+  readOnlyRootFilesystem: true
 # ... existing volumeMounts ...
+- name: tmp-data
+  mountPath: /tmp
 volumes:
 # ... existing volumes ...
+- name: tmp-data
+  emptyDir: {}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml` around lines 6-18, Add a readOnlyRootFilesystem flag and an emptyDir volume for /tmp to allow the container to run with a read-only root while still permitting writes to /tmp/data; update the securityContext for the container named lightspeed-stack-container to include readOnlyRootFilesystem: true, add a volume (type emptyDir) at the pod spec level, and add a matching volumeMount in containers (mountPath: /tmp or /tmp/data) so the app's writes go to the writable emptyDir while the rest of the filesystem remains read-only.

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml (2)
Lines 86-89: Overly permissive chmod 777 on data directories.

While this is test infrastructure and the volume is an emptyDir, using chmod 777 is broader than necessary. Consider using 775 or 770 to limit world-write access.

```diff
 command:
   - /bin/sh
   - -c
   - |
     mkdir -p /data/src/.llama/storage/rag /data/src/.llama/storage/files
-    chmod -R 777 /data
+    chmod -R 775 /data
     gunzip -c /rag-data/kv_store.db.gz > /data/src/.llama/storage/rag/kv_store.db
-    chmod -R 777 /data
+    chmod -R 775 /data
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml` around lines 86-89, The two chmod commands that set permissions to 777 on the data volume (the lines with "chmod -R 777 /data") are overly permissive; update both occurrences to a more restrictive mode such as 775 or 770 (e.g., "chmod -R 775 /data") so the test pod retains necessary access without world-write permission, and verify ownership/UID if needed to ensure the process can still write to /data/src/.llama/storage/rag and /data/src/.llama/storage/files after the change.
Line 73: Use a pinned image tag instead of busybox:latest.

The busybox:latest tag is mutable and can lead to non-reproducible builds or unexpected behavior if the upstream image changes.

```diff
-  image: busybox:latest
+  image: busybox:1.36
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml` at line 73, Replace the mutable image reference "image: busybox:latest" with a pinned immutable tag or digest (e.g., a specific version like busybox:1.36.1 or a sha256 digest) so the manifest is reproducible; update the "image: busybox:latest" entry in the manifest (the line containing image: busybox:latest) to use the chosen versioned tag or digest and, if applicable, set imagePullPolicy to IfNotPresent.

.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml (1)
Line 2: Use Tekton API v1 instead of v1beta1.

The tekton.dev/v1beta1 API version is deprecated as of Tekton v0.50.0 (2023). Migrate to the stable tekton.dev/v1 version, which is recommended for all pipelines.

Suggested change

```diff
-apiVersion: tekton.dev/v1beta1
+apiVersion: tekton.dev/v1
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml at line 2, Update the Tekton manifest's apiVersion from the deprecated value "tekton.dev/v1beta1" to the stable "tekton.dev/v1" by replacing the top-level apiVersion entry (currently "apiVersion: tekton.dev/v1beta1") with "apiVersion: tekton.dev/v1"; after changing, run a quick validation/apply of the manifest to confirm no fields used by the pipeline require additional migration for the v1 API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml:
- Around line 201-203: The TEST_OUTPUT pipeline result is declared but never
written; update the run-e2e-tests step (or the step that produces Enterprise
Contract output) to write the standardized JSON into the TASK_RESULT named
TEST_OUTPUT (e.g., echo the JSON to $(results.TEST_OUTPUT.path) or use tkn-style
result write) so downstream consumers receive the output, or remove the
TEST_OUTPUT result declaration if no step will produce that JSON; locate
references to TEST_OUTPUT and the run-e2e-tests step in the task spec and add a
command that writes the JSON to the results file path.
- Around line 299-300: The CI step currently hardcodes
REPO_URL="https://github.com/radofuchs/lightspeed-stack.git" and
REPO_REV="lcore-1221-konflux-e2e-tests", which will break after merge; change
the step to pull repository and revision dynamically (e.g., read from the
SNAPSHOT payload or accept a pipeline parameter) and replace the hardcoded
REPO_URL/REPO_REV with those variables so the pipeline uses the canonical repo
and the correct revision supplied by the SNAPSHOT or pipeline input (update
references to REPO_URL and REPO_REV in the step to use the new variables).
- Around line 246-252: The echo-kubeconfig pipeline step currently prints the
entire KUBECONFIG (script block that calls cat "$KUBECONFIG"), which can leak
credentials; update the echo-kubeconfig script to stop printing file contents
and instead emit only non-sensitive metadata (KUBECONFIG path, existence, file
size/permissions/timestamp, and a safe summary such as context names or a
hash/fingerprint of the file) so logs remain useful for debugging without
exposing secrets; locate the script block in the echo-kubeconfig step and
replace the cat "$KUBECONFIG" line with metadata-only output and/or a safe
fingerprinting command.
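The metadata-only logging described above can be sketched as follows. The demo writes a throwaway kubeconfig so the snippet runs anywhere; in the real step, KUBECONFIG would already point at the provisioned cluster's file, and this is only a sketch of the approach, not the pipeline's actual script:

```shell
# Sketch: log kubeconfig metadata (path, size, fingerprint) without leaking
# its contents. The file created here is a placeholder for illustration.
KUBECONFIG="${KUBECONFIG:-/tmp/demo-kubeconfig}"
printf 'apiVersion: v1\nkind: Config\n' > "$KUBECONFIG"

if [ -f "$KUBECONFIG" ]; then
  size=$(wc -c < "$KUBECONFIG")
  fingerprint=$(sha256sum "$KUBECONFIG" | cut -d' ' -f1)
  echo "kubeconfig: $KUBECONFIG (${size} bytes, sha256 ${fingerprint})"
else
  echo "kubeconfig missing: $KUBECONFIG"
fi
```

The sha256 fingerprint lets a debugger confirm two runs saw the same file without ever printing credentials.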
In `@tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml`:
- Around line 60-65: The secret volumes mcp-file-auth-token and
mcp-invalid-file-auth-token in lightspeed-stack.yaml are mandatory and will
cause pod startup failures if those secrets are absent; update each volume
definition (the entries named mcp-file-auth-token and
mcp-invalid-file-auth-token) to mark the secret as optional by adding
secret.optional: true so Kubernetes will allow the pod to start when those
secrets are missing.
In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml`:
- Around line 49-50: The default REPO_URL in the manifest points to a personal
fork
(REPO_URL="${REPO_URL:-https://github.com/radofuchs/lightspeed-stack.git}");
update that fallback to the canonical upstream repository URL (replace the
radofuchs URL with the upstream repo) so collaborators and CI use the official
source, leaving REPO_REVISION handling (REPO_REVISION="${REPO_REVISION:-main}")
unchanged.
In `@tests/e2e-prow/rhoai/pipeline-konflux.sh`:
- Around line 190-191: The default fallback for REPO_URL currently points to a
personal fork; change the fallback assignment for REPO_URL to the upstream
repository URL (replace the current
'https://github.com/radofuchs/lightspeed-stack.git' value) while keeping the
existing fallback logic for REPO_REVISION (default 'main') intact so the lines
setting REPO_URL and REPO_REVISION preserve the same pattern.
- Line 147: The command creating the ConfigMap uses a relative path
"configs/lightspeed-stack.yaml" which may fail if the script's CWD is not the
repo root; update the invocation that runs oc create configmap (the line
containing oc create configmap lightspeed-stack-config ...) to reference the
absolute repo path (e.g., $REPO_ROOT/configs/lightspeed-stack.yaml) or ensure
the script sets/changes to $REPO_ROOT first so the file is resolved
consistently.
- Around line 245-249: The oc expose call for pod lightspeed-stack-service will
fail on re-runs if the service already exists; update the pipeline step that
runs the oc expose pod lightspeed-stack-service
--name=lightspeed-stack-service-svc --port=8080 --type=ClusterIP -n $NAMESPACE
command to either check for the existing service before exposing (e.g., test for
a service named lightspeed-stack-service-svc and skip if present) or replace the
direct expose with the idempotent pattern using --dry-run=client -o yaml piped
into oc apply -f - so the lightspeed-stack-service exposure is created or
updated safely on reruns.
In `@tests/e2e-prow/rhoai/pipeline-services-konflux.sh`:
- Around line 25-26: The oc expose call for the service is not idempotent and
will fail on re-runs; make creation of the service named
"llama-stack-service-svc" for pod "llama-stack-service" idempotent by first
checking for the service’s existence (e.g., use oc get svc for
"llama-stack-service-svc" in the target namespace and only run the expose step
if it does not exist) or replace the one-line expose with a declarative Service
manifest and use oc apply to create/update the service atomically.
In `@tests/e2e-prow/rhoai/scripts/e2e-ops.sh`:
- Around line 283-288: The sed fallback that substitutes ${LLAMA_STACK_IMAGE}
can produce an empty image field if LLAMA_STACK_IMAGE is unset; update the
branch that uses sed (the block checking command -v envsubst and the sed
pipeline) to either validate LLAMA_STACK_IMAGE before templating and exit with
an error if empty, or apply a safe default via shell parameter expansion
(reference LLAMA_STACK_IMAGE and the sed substitution of
"$MANIFEST_DIR/llama-stack.yaml"), so the oc apply receives a valid manifest;
ensure the same behavior is used as the envsubst branch to avoid creating pods
with an empty image.
In `@tests/e2e/features/environment.py`:
- Around line 17-20: The import block is mis-ordered per ruff; move the two
names restore_llama_stack_pod and restart_lightspeed_stack_only from the current
top-level import into the local/project imports group (after standard library
and third-party imports) so they are grouped with other local imports, or simply
run `ruff check --fix` to auto-sort; ensure the symbols restore_llama_stack_pod
and restart_lightspeed_stack_only remain imported from
tests.e2e.utils.prow_utils and that import ordering follows stdlib, third-party,
local convention.
In `@tests/e2e/mock_mcp_server/server.py`:
- Around line 12-14: Imports in tests/e2e/mock_mcp_server/server.py are not
sorted per ruff; reorder the stdlib imports so the from-import comes first and
names are alphabetized (e.g., from http.server import BaseHTTPRequestHandler,
HTTPServer) and then import json and import sys, or simply run ruff --fix to
auto-correct the import ordering.
---
Nitpick comments:
In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml:
- Line 2: Update the Tekton manifest's apiVersion from the deprecated value
"tekton.dev/v1beta1" to the stable "tekton.dev/v1" by replacing the top-level
apiVersion entry (currently "apiVersion: tekton.dev/v1beta1") with "apiVersion:
tekton.dev/v1"; after changing, run a quick validation/apply of the manifest to
confirm no fields used by the pipeline require additional migration for the v1
API.
In `@tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml`:
- Around line 6-18: Add a readOnlyRootFilesystem flag and an emptyDir volume for
/tmp to allow the container to run with a read-only root while still permitting
writes to /tmp/data; update the securityContext for the container named
lightspeed-stack-container to include readOnlyRootFilesystem: true, add a volume
(type emptyDir) at the pod spec level, and add a matching volumeMount in
containers (mountPath: /tmp or /tmp/data) so the app's writes go to the writable
emptyDir while the rest of the filesystem remains read-only.
In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml`:
- Around line 86-89: The two chmod commands that set permissions to 777 on the
data volume (the lines with "chmod -R 777 /data") are overly permissive; update
both occurrences to a more restrictive mode such as 775 or 770 (e.g., "chmod -R
775 /data") so the test pod retains necessary access without world-write
permission, and verify ownership/UID if needed to ensure the process can still
write to /data/src/.llama/storage/rag and /data/src/.llama/storage/files after
the change.
- Line 73: Replace the mutable image reference "image: busybox:latest" with a
pinned immutable tag or digest (e.g., a specific version like busybox:1.36.1 or
a sha256 digest) so the manifest is reproducible; update the "image:
busybox:latest" entry in the manifest (the line containing image:
busybox:latest) to use the chosen versioned tag or digest and, if applicable,
set imagePullPolicy to IfNotPresent.
In `@tests/e2e-prow/rhoai/pipeline-services-konflux.sh`:
- Around line 9-19: Add explicit success/warning log messages after each secret
creation in pipeline-services-konflux.sh to match pipeline-konflux.sh behavior:
after the oc create ... | oc apply -f - invocation for mcp-file-auth-token and
mcp-invalid-file-auth-token, check the command exit status and echo a clear
success message (or a warning/error message on failure) referencing the secret
names (mcp-file-auth-token, mcp-invalid-file-auth-token) and the NAMESPACE
variable so operators can see whether each secret was created/applied
successfully.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 903972b4-ecc6-4fbf-933f-f499ba2131ee
📒 Files selected for processing (22)
- .tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml
- docs/e2e_testing.md
- tests/e2e-prow/rhoai/configs/lightspeed-stack-invalid-mcp-file-auth.yaml
- tests/e2e-prow/rhoai/configs/lightspeed-stack-mcp-client-auth.yaml
- tests/e2e-prow/rhoai/configs/lightspeed-stack-mcp-file-auth.yaml
- tests/e2e-prow/rhoai/configs/lightspeed-stack-mcp-kubernetes-auth.yaml
- tests/e2e-prow/rhoai/configs/lightspeed-stack-mcp-oauth-auth.yaml
- tests/e2e-prow/rhoai/configs/lightspeed-stack-mcp.yaml
- tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml
- tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
- tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml
- tests/e2e-prow/rhoai/manifests/lightspeed/mcp-mock-server.yaml
- tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml
- tests/e2e-prow/rhoai/pipeline-konflux.sh
- tests/e2e-prow/rhoai/pipeline-services-konflux.sh
- tests/e2e-prow/rhoai/pipeline.sh
- tests/e2e-prow/rhoai/run-tests.sh
- tests/e2e-prow/rhoai/scripts/e2e-ops.sh
- tests/e2e/features/environment.py
- tests/e2e/mock_mcp_server/server.py
- tests/e2e/test_list.txt
- tests/e2e/utils/prow_utils.py
💤 Files with no reviewable changes (1)
- tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml
```yaml
results:
  - name: TEST_OUTPUT
    description: Standardized JSON output for Enterprise Contract
```
TEST_OUTPUT result is declared but never populated.
The TEST_OUTPUT result is defined for "Standardized JSON output for Enterprise Contract" but no step writes to it. This will cause the result to be empty, potentially breaking downstream consumers.
Either populate the result in the run-e2e-tests step or remove the declaration if it's not needed yet.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml
around lines 201 - 203, The TEST_OUTPUT pipeline result is declared but never
written; update the run-e2e-tests step (or the step that produces Enterprise
Contract output) to write the standardized JSON into the TASK_RESULT named
TEST_OUTPUT (e.g., echo the JSON to $(results.TEST_OUTPUT.path) or use tkn-style
result write) so downstream consumers receive the output, or remove the
TEST_OUTPUT result declaration if no step will produce that JSON; locate
references to TEST_OUTPUT and the run-e2e-tests step in the task spec and add a
command that writes the JSON to the results file path.
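The suggested fix can be sketched as a final step appended to run-e2e-tests. RESULT_PATH and TEST_EXIT below are stand-ins for Tekton's $(results.TEST_OUTPUT.path) and the captured test exit code, and the JSON field names follow the common Konflux TEST_OUTPUT convention; treat this as a sketch under those assumptions, not the pipeline's actual step:

```shell
# Sketch: write the standardized TEST_OUTPUT JSON at the end of run-e2e-tests.
# RESULT_PATH stands in for $(results.TEST_OUTPUT.path); TEST_EXIT for the
# exit code captured from the e2e test run.
RESULT_PATH="${RESULT_PATH:-/tmp/test_output.json}"
TEST_EXIT="${TEST_EXIT:-0}"

if [ "$TEST_EXIT" -eq 0 ]; then RESULT="SUCCESS"; else RESULT="FAILURE"; fi
printf '{"result":"%s","timestamp":"%s","successes":0,"failures":0,"warnings":0}\n' \
  "$RESULT" "$(date -u +%s)" > "$RESULT_PATH"
cat "$RESULT_PATH"
```

Writing the file unconditionally (success or failure) keeps the declared result populated for downstream Enterprise Contract consumers.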
```yaml
- name: mcp-file-auth-token
  secret:
    secretName: mcp-file-auth-token
- name: mcp-invalid-file-auth-token
  secret:
    secretName: mcp-invalid-file-auth-token
```
Secrets may not exist, causing pod startup failure.
The mcp-file-auth-token and mcp-invalid-file-auth-token secrets are created conditionally in pipeline-konflux.sh (only if fixture files exist). If the secrets don't exist, the pod will fail to start with a "secret not found" error.
Consider making these volumes optional to allow graceful degradation when MCP file-auth tests aren't needed:
🛡️ Proposed fix to make secret volumes optional
```diff
 - name: mcp-file-auth-token
   secret:
     secretName: mcp-file-auth-token
+    optional: true
 - name: mcp-invalid-file-auth-token
   secret:
     secretName: mcp-invalid-file-auth-token
+    optional: true
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- name: mcp-file-auth-token
  secret:
    secretName: mcp-file-auth-token
    optional: true
- name: mcp-invalid-file-auth-token
  secret:
    secretName: mcp-invalid-file-auth-token
    optional: true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml` around lines
60 - 65, The secret volumes mcp-file-auth-token and mcp-invalid-file-auth-token
in lightspeed-stack.yaml are mandatory and will cause pod startup failures if
those secrets are absent; update each volume definition (the entries named
mcp-file-auth-token and mcp-invalid-file-auth-token) to mark the secret as
optional by adding secret.optional: true so Kubernetes will allow the pod to
start when those secrets are missing.
```shell
oc expose pod lightspeed-stack-service \
  --name=lightspeed-stack-service-svc \
  --port=8080 \
  --type=ClusterIP \
  -n $NAMESPACE
```
oc expose may fail on pipeline re-runs if service already exists.
Unlike oc apply, oc expose will fail if the service already exists. Consider using --dry-run=client -o yaml | oc apply -f - pattern or checking existence first:
```diff
-oc expose pod lightspeed-stack-service \
-  --name=lightspeed-stack-service-svc \
-  --port=8080 \
-  --type=ClusterIP \
-  -n $NAMESPACE
+oc expose pod lightspeed-stack-service \
+  --name=lightspeed-stack-service-svc \
+  --port=8080 \
+  --type=ClusterIP \
+  -n $NAMESPACE 2>/dev/null || log "Service lightspeed-stack-service-svc already exists"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
oc expose pod lightspeed-stack-service \
  --name=lightspeed-stack-service-svc \
  --port=8080 \
  --type=ClusterIP \
  -n $NAMESPACE 2>/dev/null || log "Service lightspeed-stack-service-svc already exists"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/pipeline-konflux.sh` around lines 245 - 249, The oc
expose call for pod lightspeed-stack-service will fail on re-runs if the service
already exists; update the pipeline step that runs the oc expose pod
lightspeed-stack-service --name=lightspeed-stack-service-svc --port=8080
--type=ClusterIP -n $NAMESPACE command to either check for the existing service
before exposing (e.g., test for a service named lightspeed-stack-service-svc and
skip if present) or replace the direct expose with the idempotent pattern using
--dry-run=client -o yaml piped into oc apply -f - so the
lightspeed-stack-service exposure is created or updated safely on reruns.
```shell
oc label pod llama-stack-service pod=llama-stack-service -n "$NAMESPACE"
oc expose pod llama-stack-service --name=llama-stack-service-svc --port=8321 --type=ClusterIP -n "$NAMESPACE"
```
oc expose is not idempotent and will fail on re-runs.
If the script is executed multiple times (e.g., during debugging or pipeline retries), line 26 will fail because the service already exists. Consider making this idempotent like the secret creation on line 29.
🔧 Proposed fix for idempotent service creation
```diff
 oc label pod llama-stack-service pod=llama-stack-service -n "$NAMESPACE"
-oc expose pod llama-stack-service --name=llama-stack-service-svc --port=8321 --type=ClusterIP -n "$NAMESPACE"
+oc expose pod llama-stack-service --name=llama-stack-service-svc --port=8321 --type=ClusterIP -n "$NAMESPACE" 2>/dev/null || true
```

Alternatively, use oc get svc to check existence first, or use a declarative Service manifest.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
oc label pod llama-stack-service pod=llama-stack-service -n "$NAMESPACE"
oc expose pod llama-stack-service --name=llama-stack-service-svc --port=8321 --type=ClusterIP -n "$NAMESPACE" 2>/dev/null || true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/pipeline-services-konflux.sh` around lines 25 - 26, The
oc expose call for the service is not idempotent and will fail on re-runs; make
creation of the service named "llama-stack-service-svc" for pod
"llama-stack-service" idempotent by first checking for the service’s existence
(e.g., use oc get svc for "llama-stack-service-svc" in the target namespace and
only run the expose step if it does not exist) or replace the one-line expose
with a declarative Service manifest and use oc apply to create/update the
service atomically.
```shell
if command -v envsubst >/dev/null 2>&1; then
  envsubst < "$MANIFEST_DIR/llama-stack.yaml" | oc apply -n "$NAMESPACE" -f -
else
  sed "s|\${LLAMA_STACK_IMAGE}|${LLAMA_STACK_IMAGE:-}|g" "$MANIFEST_DIR/llama-stack.yaml" |
    oc apply -n "$NAMESPACE" -f -
fi
```
Empty LLAMA_STACK_IMAGE may produce invalid manifest.
When envsubst is unavailable, the sed fallback substitutes ${LLAMA_STACK_IMAGE} with the variable value. If LLAMA_STACK_IMAGE is unset/empty, this produces an empty image: field which will fail pod creation.
Consider adding a default or validation:
```diff
 else
-  sed "s|\${LLAMA_STACK_IMAGE}|${LLAMA_STACK_IMAGE:-}|g" "$MANIFEST_DIR/llama-stack.yaml" |
+  sed "s|\${LLAMA_STACK_IMAGE}|${LLAMA_STACK_IMAGE:?LLAMA_STACK_IMAGE must be set for non-Konflux Prow}|g" "$MANIFEST_DIR/llama-stack.yaml" |
     oc apply -n "$NAMESPACE" -f -
 fi
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
if command -v envsubst >/dev/null 2>&1; then
  envsubst < "$MANIFEST_DIR/llama-stack.yaml" | oc apply -n "$NAMESPACE" -f -
else
  sed "s|\${LLAMA_STACK_IMAGE}|${LLAMA_STACK_IMAGE:?LLAMA_STACK_IMAGE must be set for non-Konflux Prow}|g" "$MANIFEST_DIR/llama-stack.yaml" |
    oc apply -n "$NAMESPACE" -f -
fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/scripts/e2e-ops.sh` around lines 283 - 288, The sed
fallback that substitutes ${LLAMA_STACK_IMAGE} can produce an empty image field
if LLAMA_STACK_IMAGE is unset; update the branch that uses sed (the block
checking command -v envsubst and the sed pipeline) to either validate
LLAMA_STACK_IMAGE before templating and exit with an error if empty, or apply a
safe default via shell parameter expansion (reference LLAMA_STACK_IMAGE and the
sed substitution of "$MANIFEST_DIR/llama-stack.yaml"), so the oc apply receives
a valid manifest; ensure the same behavior is used as the envsubst branch to
avoid creating pods with an empty image.
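The `${var:?message}` guard at the heart of this fix behaves as below. The subshell parentheses keep the failed expansion from aborting the whole demo, and the quay.io image reference is a made-up placeholder:

```shell
# Demo of the ${VAR:?message} guard: expansion fails loudly when the variable
# is unset, instead of templating an empty image field into the manifest.
unset LLAMA_STACK_IMAGE
if ( : "${LLAMA_STACK_IMAGE:?must be set}" ) 2>/dev/null; then
  echo "image set"
else
  echo "guard tripped"          # printed on the first pass
fi

LLAMA_STACK_IMAGE=quay.io/example/llama-stack:latest   # hypothetical image ref
if ( : "${LLAMA_STACK_IMAGE:?must be set}" ) 2>/dev/null; then
  echo "image set: $LLAMA_STACK_IMAGE"
fi
```

In the real script the guard sits inside the sed command line, so an unset variable stops the script before oc apply ever runs.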
🧹 Nitpick comments (2)
tests/e2e/mock_mcp_server/server.py (1)
Lines 119-122: Consider adding basic error handling for port argument.

If an invalid value is passed as sys.argv[1] (e.g., non-numeric), the server crashes with an unhelpful ValueError. Since this is test infrastructure with controlled inputs from the deployment manifest, this is low priority.

♻️ Optional: Add graceful error handling

```diff
 if __name__ == "__main__":
-    port = int(sys.argv[1]) if len(sys.argv) > 1 else 3001
+    try:
+        port = int(sys.argv[1]) if len(sys.argv) > 1 else 3001
+    except ValueError:
+        sys.exit(f"Invalid port: {sys.argv[1]}")
     server = HTTPServer(("0.0.0.0", port), Handler)
     print(f"Mock MCP server on :{port}")
     server.serve_forever()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `tests/e2e/mock_mcp_server/server.py` around lines 119-122, Wrap the port parsing in the __main__ block (where sys.argv is used to set port and HTTPServer(("0.0.0.0", port), Handler) is constructed) with a try/except that catches ValueError when converting sys.argv[1] to int, print a clear error message indicating the invalid port value and either fall back to the default port (3001) or exit non-zero; ensure the variable name port remains used for HTTPServer initialization so no other changes are needed.

tests/e2e/features/environment.py (1)
465-467: Redundant `mode_dir` recalculation.
`mode_dir` is already computed at line 441 at the start of `before_feature`. The recalculation here is unnecessary.
♻️ Proposed fix

```diff
 if "MCP" in feature.tags:
-    mode_dir = "library-mode" if context.is_library_mode else "server-mode"
     context.feature_config = _get_config_path("mcp", mode_dir)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/environment.py` around lines 465 - 467, The block in before_feature redundantly recalculates mode_dir; remove the duplicate computation and use the already-computed mode_dir from earlier in before_feature when setting context.feature_config for MCP (i.e., keep the context.feature_config = _get_config_path("mcp", mode_dir) line but drop the preceding mode_dir = ... recalculation and rely on the existing mode_dir variable that was set at the top of before_feature using context.is_library_mode).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/e2e/features/environment.py`:
- Around line 465-467: The block in before_feature redundantly recalculates
mode_dir; remove the duplicate computation and use the already-computed mode_dir
from earlier in before_feature when setting context.feature_config for MCP
(i.e., keep the context.feature_config = _get_config_path("mcp", mode_dir) line
but drop the preceding mode_dir = ... recalculation and rely on the existing
mode_dir variable that was set at the top of before_feature using
context.is_library_mode).
In `@tests/e2e/mock_mcp_server/server.py`:
- Around line 119-122: Wrap the port parsing in the __main__ block (where
sys.argv is used to set port and HTTPServer(("0.0.0.0", port), Handler) is
constructed) with a try/except that catches ValueError when converting
sys.argv[1] to int, print a clear error message indicating the invalid port
value and either fall back to the default port (3001) or exit non‑zero; ensure
the variable name port remains used for HTTPServer initialization so no other
changes are needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f5ce253d-c480-42fb-aa63-5c19e8438d79
📒 Files selected for processing (2)
tests/e2e/features/environment.py
tests/e2e/mock_mcp_server/server.py
```yaml
kind: Pipeline
metadata:
  annotations:
    pipelinesascode.tekton.dev/task: "[ols-installer, ols-e2e-task]"
```

```yaml
        secretName: quay-aipcc-password
    - name: ols-konflux-artifacts-bot-creds
      secret:
        secretName: ols-konflux-artifacts-bot
```
```shell
echo "[e2e] 4/8 SNAPSHOT (length ${#SNAPSHOT} chars; clone URL fixed — SNAPSHOT is main/upstream, not fork)..."
# Fixed fork + branch: Konflux SNAPSHOT points at main repo/rev, not the PR/fork under test.
REPO_URL="https://github.com/radofuchs/lightspeed-stack.git"
REPO_REV="lcore-1221-konflux-e2e-tests"
```
coderabbit seems to be right ;)
Actionable comments posted: 2
♻️ Duplicate comments (3)
.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml (3)
199-201: ⚠️ Potential issue | 🟡 Minor
`TEST_OUTPUT` is still never populated. No active step writes `$(results.TEST_OUTPUT.path)`, so the declared task result stays empty/missing. Either emit the Enterprise Contract JSON from `run-e2e-tests` or remove this result until it is actually produced.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml around lines 199 - 201, The TASK declares a result named TEST_OUTPUT but nothing writes to $(results.TEST_OUTPUT.path); update the pipeline so the step run-e2e-tests produces the Enterprise Contract JSON to that path (e.g., write the JSON file to the path referenced by results.TEST_OUTPUT.path inside the run-e2e-tests step) or if you do not plan to produce it yet, remove the TEST_OUTPUT result declaration; search for the result name TEST_OUTPUT and the step run-e2e-tests to implement the write (or remove the unused result).
294-295: ⚠️ Potential issue | 🔴 Critical
The repo/branch is still hardcoded to a personal fork. This bypasses the snapshot under test and will break as soon as `lcore-1221-konflux-e2e-tests` is deleted. Pull both `.source.git.url` and `.source.git.revision` from `SNAPSHOT` (or pipeline params) so the checkout stays aligned with the built image.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml around lines 294 - 295, The pipeline currently hardcodes REPO_URL and REPO_REV (REPO_URL="https://github.com/radofuchs/lightspeed-stack.git" and REPO_REV="lcore-1221-konflux-e2e-tests"); replace those hardcoded values by reading the repository URL and revision from the SNAPSHOT (or pipeline params) — e.g. set REPO_URL to the snapshot's .source.git.url and REPO_REV to .source.git.revision (or corresponding pipeline parameter names) so the checkout uses the snapshot under test instead of a personal fork/branch.
242-246: ⚠️ Potential issue | 🟠 Major
Stop printing the full kubeconfig.
This still dumps cluster credentials straight into the task log. Keep the path/existence checks, but limit the output to non-sensitive metadata or context names.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml around lines 242 - 246, Remove the cat "$KUBECONFIG" that dumps credentials and instead output only non-sensitive metadata; keep the existing KUBECONFIG path and existence checks, then replace the full kubeconfig print with a safe metadata summary (for example, print available context names and the current context using kubectl config view/query commands rather than dumping file contents) so credentials are not exposed in task logs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml:
- Line 120: The task currently writes an empty string when jq fails to find the
"lightspeed-stack" component in SNAPSHOT which allows downstream tests to
silently use a fallback image; update the step to fail-fast: after computing the
image value with the existing jq expression (the part using --arg n
"lightspeed-stack" '.components[] | select(.name == $n) | .containerImage //
""'), check if the result is empty and if so print a clear error and exit
non-zero, otherwise write the image into
step.results.lightspeed-stack-image.path so the pipeline aborts when the
component is missing instead of running with the wrong image.
- Around line 248-249: The pipeline step named "run-e2e-tests" currently has
onError: continue which masks failures; remove the onError: continue from the
run-e2e-tests step (or alternatively re-enable the follow-up
"fail-if-any-step-failed" step) so that a non-zero exit from pipeline-konflux.sh
causes the TaskRun to fail; update the task YAML around the run-e2e-tests step
and ensure pipeline-konflux.sh remains the sole active test step or that
fail-if-any-step-failed is uncommented to enforce failure propagation.
---
Duplicate comments:
In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml:
- Around line 199-201: The TASK declares a result named TEST_OUTPUT but nothing
writes to $(results.TEST_OUTPUT.path); update the pipeline so the step
run-e2e-tests produces the Enterprise Contract JSON to that path (e.g., write
the JSON file to the path referenced by results.TEST_OUTPUT.path inside the
run-e2e-tests step) or if you do not plan to produce it yet, remove the
TEST_OUTPUT result declaration; search for the result name TEST_OUTPUT and the
step run-e2e-tests to implement the write (or remove the unused result).
- Around line 294-295: The pipeline currently hardcodes REPO_URL and REPO_REV
(REPO_URL="https://github.com/radofuchs/lightspeed-stack.git" and
REPO_REV="lcore-1221-konflux-e2e-tests"); replace those hardcoded values by
reading the repository URL and revision from the SNAPSHOT (or pipeline params) —
e.g. set REPO_URL to the snapshot's .source.git.url and REPO_REV to
.source.git.revision (or corresponding pipeline parameter names) so the checkout
uses the snapshot under test instead of a personal fork/branch.
- Around line 242-246: Remove the cat "$KUBECONFIG" that dumps credentials and
instead output only non-sensitive metadata; keep the existing KUBECONFIG path
and existence checks, then replace the full kubeconfig print with a safe
metadata summary (for example, print available context names and the current
context using kubectl config view/query commands rather than dumping file
contents) so credentials are not exposed in task logs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cff1d330-a1a4-4c25-b890-82f2add05e00
📒 Files selected for processing (1)
.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml
```yaml
        description: "commit sha to be used to store artifacts"
    script: |
      dnf -y install jq
      echo -n "$(jq -r --arg n "lightspeed-stack" '.components[] | select(.name == $n) | .containerImage // ""' <<< "$SNAPSHOT")" > $(step.results.lightspeed-stack-image.path)
```
Fail fast if lightspeed-stack is missing from SNAPSHOT.
Line 120 currently writes "" when the component lookup misses. tests/e2e-prow/rhoai/pipeline-konflux.sh:14-26 then falls back to quay.io/lightspeed-core/lightspeed-stack:dev-latest, so this task can silently test the wrong image instead of the snapshot under test.
Suggested change

```diff
-echo -n "$(jq -r --arg n "lightspeed-stack" '.components[] | select(.name == $n) | .containerImage // ""' <<< "$SNAPSHOT")" > $(step.results.lightspeed-stack-image.path)
+image="$(jq -er --arg n "lightspeed-stack" '.components[] | select(.name == $n) | .containerImage' <<< "$SNAPSHOT")" || {
+  echo "lightspeed-stack image missing from SNAPSHOT" >&2
+  exit 1
+}
+printf '%s' "$image" > "$(step.results.lightspeed-stack-image.path)"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml at
line 120, The task currently writes an empty string when jq fails to find the
"lightspeed-stack" component in SNAPSHOT which allows downstream tests to
silently use a fallback image; update the step to fail-fast: after computing the
image value with the existing jq expression (the part using --arg n
"lightspeed-stack" '.components[] | select(.name == $n) | .containerImage //
""'), check if the result is empty and if so print a clear error and exit
non-zero, otherwise write the image into
step.results.lightspeed-stack-image.path so the pipeline aborts when the
component is missing instead of running with the wrong image.
```yaml
    - name: run-e2e-tests
      onError: continue
```
🧩 Analysis chain
🌐 Web query:
In Tekton Pipelines, if a step uses onError: continue and exits non-zero, does the enclosing TaskRun/PipelineRun still succeed when there are no later failing steps?
💡 Result:
Yes, if a step in a Tekton Task uses onError: continue and exits with a non-zero code, but all subsequent steps in the Task succeed, the enclosing TaskRun succeeds (SUCCEEDED condition True with reason Succeeded). The PipelineRun will also succeed if all its TaskRuns succeed and there are no later failing steps or PipelineTasks with onError: stopAndFail.
Citations:
- 1: TEP-0040 implementation - specifying `onError` in a step (tektoncd/pipeline#4106)
- 2: https://tekton.dev/docs/pipelines/tasks
- 3: https://tekton.dev/docs/pipelines/taskruns/
🏁 Script executed:
cat -n .tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml | sed -n '240,260p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 1159
🏁 Script executed:
wc -l .tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml
Repository: lightspeed-core/lightspeed-stack
Length of output: 153
🏁 Script executed:
rg 'PIPELINE_EXIT' .tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml -B 5 -A 5
Repository: lightspeed-core/lightspeed-stack
Length of output: 767
🏁 Script executed:
# Check if there are any steps defined after the run-e2e-tests step
cat -n .tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml | sed -n '245,280p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 1790
🏁 Script executed:
cat -n .tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml | sed -n '280,359p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 4753
`onError: continue` makes failing E2E runs non-blocking and allows the task to succeed despite test failures.
This is the only active step in the task; all subsequent steps are commented out. With `onError: continue`, even if `pipeline-konflux.sh` exits with a non-zero code, the TaskRun will still report success, so broken E2E tests will not fail CI. Remove `onError: continue` or uncomment the `fail-if-any-step-failed` step before merge.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml
around lines 248 - 249, The pipeline step named "run-e2e-tests" currently has
onError: continue which masks failures; remove the onError: continue from the
run-e2e-tests step (or alternatively re-enable the follow-up
"fail-if-any-step-failed" step) so that a non-zero exit from pipeline-konflux.sh
causes the TaskRun to fail; update the task YAML around the run-e2e-tests step
and ensure pipeline-konflux.sh remains the sole active test step or that
fail-if-any-step-failed is uncommented to enforce failure propagation.
♻️ Duplicate comments (5)
.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml (3)
233-234: ⚠️ Potential issue | 🔴 Critical
`onError: continue` masks test failures - the pipeline will succeed even when E2E tests fail. With `onError: continue` and the `fail-if-any-step-failed` step commented out (line 336), failing E2E tests won't fail the pipeline. This defeats the purpose of CI integration testing. Remove `onError: continue` or uncomment the `fail-if-any-step-failed` step before merge.
🛠️ Suggested fix

```diff
     - name: run-e2e-tests
-      onError: continue
       resources:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml around lines 233 - 234, The pipeline step "run-e2e-tests" currently sets onError: continue which masks failures; remove the onError: continue entry from the "run-e2e-tests" step (or alternatively uncomment and enable the "fail-if-any-step-failed" step) so that failing E2E tests cause the pipeline to fail; ensure "run-e2e-tests" no longer swallows errors and that the "fail-if-any-step-failed" step (if used) is active and runs after the E2E steps to aggregate and fail the pipeline when any step failed.
199-201: ⚠️ Potential issue | 🟡 Minor
The `TEST_OUTPUT` result is declared but never populated. The `TEST_OUTPUT` result for Enterprise Contract is defined but no step writes to it. This will leave the result empty, potentially breaking downstream consumers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml around lines 199 - 201, The TEST_OUTPUT result is declared but never populated; update the Enterprise Contract task/step that runs the contract verification to write its JSON output into the declared result by redirecting or echoing the verifier's JSON to the results path (use the Tekton variable $(results.TEST_OUTPUT.path)). Locate the Enterprise Contract step (the step that runs the contract check/verification) and ensure its command or post-step writes the standardized JSON to $(results.TEST_OUTPUT.path) so downstream tasks can consume it.
118-121: ⚠️ Potential issue | 🟠 Major
Fail fast if the `lightspeed-stack` component is missing from SNAPSHOT. The jq extraction writes an empty string when the component isn't found, which causes silent fallback to `quay.io/lightspeed-core/lightspeed-stack:dev-latest` in downstream scripts. This means the pipeline could test the wrong image without warning.
🛠️ Suggested fix to fail fast

```diff
   script: |
     dnf -y install jq
-    echo -n "$(jq -r --arg n "lightspeed-stack" '.components[] | select(.name == $n) | .containerImage // ""' <<< "$SNAPSHOT")" > $(step.results.lightspeed-stack-image.path)
+    image="$(jq -er --arg n "lightspeed-stack" '.components[] | select(.name == $n) | .containerImage' <<< "$SNAPSHOT")" || {
+      echo "ERROR: lightspeed-stack component missing from SNAPSHOT" >&2
+      exit 1
+    }
+    printf '%s' "$image" > "$(step.results.lightspeed-stack-image.path)"
     echo -n "$(jq -r --arg n "lightspeed-stack" '.components[] | select(.name == $n) | .source.git.revision // "latest"' <<< "$SNAPSHOT")" > $(step.results.commit.path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml around lines 118 - 121, The jq commands that write to $(step.results.lightspeed-stack-image.path) and $(step.results.commit.path) currently produce empty strings when the "lightspeed-stack" component is missing from SNAPSHOT, allowing downstream steps to silently use a default image; update the script to extract each field into temporary variables (using the same jq queries against $SNAPSHOT), check if the resulting values are empty (e.g., test -z), and if either is empty log a clear error and exit non‑zero so the pipeline fails fast instead of writing empty results; if both are present, then echo the values into the respective step.results files as before.

tests/e2e-prow/rhoai/pipeline-konflux.sh (2)
147-147: ⚠️ Potential issue | 🟡 Minor
Relative path `configs/lightspeed-stack.yaml` may not resolve correctly. The working directory isn't explicitly set at this point. Use `$REPO_ROOT` for consistency with other ConfigMap creations in this script.
🛠️ Suggested fix

```diff
-oc create configmap lightspeed-stack-config -n "$NAMESPACE" --from-file=configs/lightspeed-stack.yaml --dry-run=client -o yaml | oc apply -f -
+oc create configmap lightspeed-stack-config -n "$NAMESPACE" --from-file="$REPO_ROOT/configs/lightspeed-stack.yaml" --dry-run=client -o yaml | oc apply -f -
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/pipeline-konflux.sh` at line 147, The ConfigMap creation uses a relative path that may fail; update the command that creates lightspeed-stack-config to reference the repo root variable like the others by replacing --from-file=configs/lightspeed-stack.yaml with --from-file="$REPO_ROOT/configs/lightspeed-stack.yaml" in the oc create/apply pipeline (the command that currently pipes oc create ... --from-file=configs/lightspeed-stack.yaml -o yaml | oc apply -f -), ensuring $REPO_ROOT is quoted.
245-249: ⚠️ Potential issue | 🟡 Minor
`oc expose` may fail on pipeline re-runs if the service already exists. Unlike `oc apply`, `oc expose` is not idempotent. Consider handling the case where the service already exists.
🛠️ Suggested fix

```diff
-oc expose pod lightspeed-stack-service \
-  --name=lightspeed-stack-service-svc \
-  --port=8080 \
-  --type=ClusterIP \
-  -n $NAMESPACE
+oc expose pod lightspeed-stack-service \
+  --name=lightspeed-stack-service-svc \
+  --port=8080 \
+  --type=ClusterIP \
+  -n "$NAMESPACE" 2>/dev/null || log "Service lightspeed-stack-service-svc already exists"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/pipeline-konflux.sh` around lines 245 - 249, The oc expose command is not idempotent and will fail if the service already exists; update the pipeline-konflux.sh snippet around the oc expose invocation so it first checks for the service (e.g., run oc get svc lightspeed-stack-service-svc -n $NAMESPACE >/dev/null 2>&1) and only runs oc expose pod lightspeed-stack-service --name=lightspeed-stack-service-svc --port=8080 --type=ClusterIP -n $NAMESPACE when the get returns non-zero, or alternatively delete and re-create the service if you prefer recreation; ensure you reference the exact service name lightspeed-stack-service-svc and the oc expose command when making the change.
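The check-then-create pattern the prompt suggests can be sketched generically; `ensure_resource` and its two command-string arguments are hypothetical stand-ins for `oc get svc lightspeed-stack-service-svc -n "$NAMESPACE"` and the `oc expose pod ...` invocation:

```shell
#!/bin/sh
# Generic check-then-create wrapper for non-idempotent commands such as
# `oc expose`: run the create command only when the existence check fails.
ensure_resource() {
  check_cmd="$1"
  create_cmd="$2"
  if eval "$check_cmd" >/dev/null 2>&1; then
    echo "already exists, skipping create"
  else
    eval "$create_cmd"
  fi
}
```

On re-runs the check succeeds and the create step is skipped, so the pipeline stays idempotent without deleting and recreating the service.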
🧹 Nitpick comments (3)
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml (1)
72-95: Good security hardening on the `setup-rag-data` container. The security context is well-configured with `runAsNonRoot`, dropped capabilities, and a seccomp profile.
Minor: The `chmod -R 777 /data` runs twice (lines 87 and 89) - the second call after extraction is sufficient.
♻️ Remove redundant chmod

```diff
 mkdir -p /data/src/.llama/storage/rag /data/src/.llama/storage/files
-chmod -R 777 /data
 gunzip -c /rag-data/kv_store.db.gz > /data/src/.llama/storage/rag/kv_store.db
 chmod -R 777 /data
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml` around lines 72 - 95, In the setup-rag-data container's startup command (container name: setup-rag-data), remove the first redundant chmod invocation so only the chmod -R 777 /data that runs after the gunzip extraction remains; update the command block inside the container spec (the multi-line shell under command: - |) to delete the earlier chmod -R 777 /data line and keep the post-extraction chmod to ensure correct permissions without duplication.

tests/e2e-prow/rhoai/pipeline-konflux.sh (2)
198-206: Minor redundancy: double-wait for `llama-stack-service`.
`pipeline-services-konflux.sh` already waits for `llama-stack-service` readiness (per context snippet 4), then this script waits again. The second wait should return immediately if the pod is ready, so this is not harmful, but it's redundant.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/pipeline-konflux.sh` around lines 198 - 206, The script redundantly waits for the same pod twice; remove the duplicate wait for llama-stack-service in this file by changing the oc wait call in the current script to only wait for the service(s) not already waited for in pipeline-services-konflux.sh (e.g., keep pod/lightspeed-stack-service only), updating the oc wait invocation that currently references pod/lightspeed-stack-service and pod/llama-stack-service so it no longer includes pod/llama-stack-service.
156-169: Consider passing RAG_DB_PATH as an argument rather than relying on shell interpolation.
The Python script embeds `$RAG_DB_PATH` via shell string interpolation. This works but could break if the path contains special characters. Consider passing it as a command-line argument.
♻️ Safer approach using sys.argv

```diff
-export FAISS_VECTOR_STORE_ID=$(python3 -c "
+export FAISS_VECTOR_STORE_ID=$(python3 - "$RAG_DB_PATH" <<'PYEOF'
 import sqlite3
 import re
-conn = sqlite3.connect('$RAG_DB_PATH')
+import sys
+conn = sqlite3.connect(sys.argv[1])
 cursor = conn.cursor()
 cursor.execute(\"SELECT key FROM kvstore WHERE key LIKE 'vector_stores:v%::%' LIMIT 1\")
 row = cursor.fetchone()
 if row:
-    # Extract the vs_xxx ID from the key
     match = re.search(r'(vs_[a-f0-9-]+)', row[0])
     if match:
         print(match.group(1))
 conn.close()
-" 2>/dev/null || echo "")
+PYEOF
+2>/dev/null || echo "")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/pipeline-konflux.sh` around lines 156 - 169, The inline Python snippet that computes FAISS_VECTOR_STORE_ID currently injects $RAG_DB_PATH via shell interpolation which can break on special characters; modify the invocation so the shell passes RAG_DB_PATH as a command-line argument (e.g., python3 script.py "$RAG_DB_PATH") and update the Python snippet to read the path from sys.argv (use sys.argv[1]) instead of relying on shell expansion; ensure the export FAISS_VECTOR_STORE_ID assignment still captures stdout or falls back to empty string on error, and reference the FAISS_VECTOR_STORE_ID variable and the RAG_DB_PATH argument usage when making the change.
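The argv-based variant described above can be exercised end to end; the table name, key format, and `vs_` regex are taken from the snippet in the comment, while the function name and sample data are illustrative:

```shell
#!/bin/sh
# Extract the vector-store ID from a kvstore SQLite DB, passing the path as
# argv[1] to Python instead of interpolating it into the script body.
get_vector_store_id() {
  python3 - "$1" <<'PYEOF'
import re, sqlite3, sys

# The DB path arrives via sys.argv, so special characters in it are safe.
conn = sqlite3.connect(sys.argv[1])
row = conn.execute(
    "SELECT key FROM kvstore WHERE key LIKE 'vector_stores:v%::%' LIMIT 1"
).fetchone()
if row:
    m = re.search(r"(vs_[a-f0-9-]+)", row[0])
    if m:
        print(m.group(1))
conn.close()
PYEOF
}
```

Because the quoted heredoc (`<<'PYEOF'`) disables shell expansion, the Python source is passed to the interpreter verbatim and only the path crosses the shell/Python boundary.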
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml:
- Around line 233-234: The pipeline step "run-e2e-tests" currently sets onError:
continue which masks failures; remove the onError: continue entry from the
"run-e2e-tests" step (or alternatively uncomment and enable the
"fail-if-any-step-failed" step) so that failing E2E tests cause the pipeline to
fail; ensure "run-e2e-tests" no longer swallows errors and that the
"fail-if-any-step-failed" step (if used) is active and runs after the E2E steps
to aggregate and fail the pipeline when any step failed.
- Around line 199-201: The TEST_OUTPUT result is declared but never populated;
update the Enterprise Contract task/step that runs the contract verification to
write its JSON output into the declared result by redirecting or echoing the
verifier's JSON to the results path (use the Tekton variable
$(results.TEST_OUTPUT.path)). Locate the Enterprise Contract step (the step that
runs the contract check/verification) and ensure its command or post-step writes
the standardized JSON to $(results.TEST_OUTPUT.path) so downstream tasks can
consume it.
- Around line 118-121: The jq commands that write to
$(step.results.lightspeed-stack-image.path) and $(step.results.commit.path)
currently produce empty strings when the "lightspeed-stack" component is missing
from SNAPSHOT, allowing downstream steps to silently use a default image; update
the script to extract each field into temporary variables (using the same jq
queries against $SNAPSHOT), check if the resulting values are empty (e.g., test
-z), and if either is empty log a clear error and exit non‑zero so the pipeline
fails fast instead of writing empty results; if both are present, then echo the
values into the respective step.results files as before.
In `@tests/e2e-prow/rhoai/pipeline-konflux.sh`:
- Line 147: The ConfigMap creation uses a relative path that may fail; update
the command that creates lightspeed-stack-config to reference the repo root
variable like the others by replacing --from-file=configs/lightspeed-stack.yaml
with --from-file="$REPO_ROOT/configs/lightspeed-stack.yaml" in the oc
create/apply pipeline (the command that currently pipes oc create ...
--from-file=configs/lightspeed-stack.yaml -o yaml | oc apply -f -), ensuring
$REPO_ROOT is quoted.
- Around line 245-249: The oc expose command is not idempotent and will fail if
the service already exists; update the pipeline-konflux.sh snippet around the oc
expose invocation so it first checks for the service (e.g., run oc get svc
lightspeed-stack-service-svc -n $NAMESPACE >/dev/null 2>&1) and only runs oc
expose pod lightspeed-stack-service --name=lightspeed-stack-service-svc
--port=8080 --type=ClusterIP -n $NAMESPACE when the get returns non-zero, or
alternatively delete and re-create the service if you prefer recreation; ensure
you reference the exact service name lightspeed-stack-service-svc and the oc
expose command when making the change.
---
Nitpick comments:
In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml`:
- Around line 72-95: In the setup-rag-data container's startup command
(container name: setup-rag-data), remove the first redundant chmod invocation so
only the chmod -R 777 /data that runs after the gunzip extraction remains;
update the command block inside the container spec (the multi-line shell under
command: - |) to delete the earlier chmod -R 777 /data line and keep the
post-extraction chmod to ensure correct permissions without duplication.
In `@tests/e2e-prow/rhoai/pipeline-konflux.sh`:
- Around line 198-206: The script redundantly waits for the same pod twice;
remove the duplicate wait for llama-stack-service in this file by changing the
oc wait call in the current script to only wait for the service(s) not already
waited for in pipeline-services-konflux.sh (e.g., keep
pod/lightspeed-stack-service only), updating the oc wait invocation that
currently references pod/lightspeed-stack-service and pod/llama-stack-service so
it no longer includes pod/llama-stack-service.
- Around line 156-169: The inline Python snippet that computes
FAISS_VECTOR_STORE_ID currently injects $RAG_DB_PATH via shell interpolation
which can break on special characters; modify the invocation so the shell passes
RAG_DB_PATH as a command-line argument (e.g., python3 script.py "$RAG_DB_PATH")
and update the Python snippet to read the path from sys.argv (use sys.argv[1])
instead of relying on shell expansion; ensure the export FAISS_VECTOR_STORE_ID
assignment still captures stdout or falls back to empty string on error, and
reference the FAISS_VECTOR_STORE_ID variable and the RAG_DB_PATH argument usage
when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fc32ef73-c6f8-4ab5-a602-9b4f2cf34d56
📒 Files selected for processing (3)
.tekton/integration-tests/pipeline/lightspeed-stack-integration-test.yaml
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
tests/e2e-prow/rhoai/pipeline-konflux.sh
🧹 Nitpick comments (1)
tests/e2e/features/environment.py (1)
473-478: LGTM! Using `_get_config_path("mcp", mode_dir)` aligns with the centralized config path pattern.
Minor nit: `mode_dir` is recomputed on line 474, but it was already computed on line 449 and is available in the function scope. Consider removing the redundant computation.
♻️ Optional: remove redundant mode_dir

```diff
 if "MCP" in feature.tags:
-    mode_dir = "library-mode" if context.is_library_mode else "server-mode"
     context.feature_config = _get_config_path("mcp", mode_dir)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/environment.py` around lines 473 - 478, The variable mode_dir is being recomputed unnecessarily where MCP is handled; remove the redundant assignment and reuse the existing mode_dir computed earlier in the function (the variable named mode_dir and the boolean context.is_library_mode are already in scope), so simply call context.feature_config = _get_config_path("mcp", mode_dir) without reassigning mode_dir, then leave the rest (create_config_backup, switch_config, restart_container) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/e2e/features/environment.py`:
- Around line 473-478: The variable mode_dir is being recomputed unnecessarily
where MCP is handled; remove the redundant assignment and reuse the existing
mode_dir computed earlier in the function (the variable named mode_dir and the
boolean context.is_library_mode are already in scope), so simply call
context.feature_config = _get_config_path("mcp", mode_dir) without reassigning
mode_dir, then leave the rest (create_config_backup, switch_config,
restart_container) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4e85a774-6725-423c-8d50-023b25b4fc78
📒 Files selected for processing (1)
tests/e2e/features/environment.py
Description
Added tekton file for konflux e2e test run
Note: I will update the repo paths after the review to be able to apply fixes
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Tests
Documentation
Chores