
Conversation

Contributor

@are-ces are-ces commented Oct 24, 2025

Description

Added e2e tests for RHEL AI (1.5.2) and updated the docs.

The e2e tests are scheduled to run daily at midnight UTC.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue # LCORE-333
  • Closes # LCORE-333

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
    The E2E test for RHEL AI verifies that inference works, confirming the provider is supported.

Summary by CodeRabbit

  • New Features

    • Added RHEL AI (vLLM) support with example config, model entries, and environment variables for endpoint, port, API key, and model.
    • Added end-to-end RHEL AI test configuration to validate deployments.
  • Documentation

    • Updated provider docs and README to include RHEL AI (vLLM) compatibility and example config.
  • Chores

    • Added a daily/on-demand CI workflow for RHEL AI E2E tests and tightened connectivity checks to fail on HTTP errors.


coderabbitai bot commented Oct 24, 2025

Walkthrough

Adds RHEL AI (vLLM) support: a new daily/on-demand GitHub Actions workflow for RHEL AI e2e tests, a curl change to fail on HTTP errors in the existing RHAIIS workflow, docker-compose env vars for RHEL AI, docs updates, and a new e2e config tests/e2e/configs/run-rhelai.yaml.
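The daily / on-demand trigger pair described here is conventionally expressed like this in a GitHub Actions workflow (a minimal sketch; the cron expression assumes the "midnight UTC" schedule stated in the PR description):

```yaml
# Sketch of the workflow trigger section; "0 0 * * *" fires daily at
# 00:00 UTC, and workflow_dispatch allows manual runs from the Actions tab.
on:
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:
```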

Changes

Cohort / File(s) Summary
GitHub Actions Workflows
.github/workflows/e2e_tests_rhaiis.yaml, .github/workflows/e2e_tests_rhelai.yaml
e2e_tests_rhaiis.yaml: curl calls updated to use -f so steps fail on HTTP error statuses. e2e_tests_rhelai.yaml (new): daily / workflow_dispatch RHEL AI E2E workflow with checkout, config generation/selection, connectivity checks, docker-compose service start, waits/health checks, e2e test execution, and failure log dumps.
Docker Compose
docker-compose.yaml
Added environment variables to the llama-stack service: RHEL_AI_URL, RHEL_AI_PORT, RHEL_AI_API_KEY, RHEL_AI_MODEL.
Documentation
README.md, docs/providers.md
Added "RHEL AI (vLLM)" entry to README compatibility table and added a Red Hat provider row to docs/providers.md (remote/vllm details).
E2E Test Configuration
tests/e2e/configs/run-rhelai.yaml
New end-to-end RHEL AI test configuration YAML defining APIs, storage backends, provider registry (including vllm), server settings, models, telemetry, and runtime mappings.
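As a sketch, environment pass-through of this kind typically looks as follows in a compose file (the variable names come from the summary above; the rest of the service shape is illustrative):

```yaml
# Illustrative fragment: forward RHEL AI connection settings from the
# host environment into the llama-stack container.
services:
  llama-stack:
    environment:
      - RHEL_AI_URL=${RHEL_AI_URL}
      - RHEL_AI_PORT=${RHEL_AI_PORT}
      - RHEL_AI_API_KEY=${RHEL_AI_API_KEY}
      - RHEL_AI_MODEL=${RHEL_AI_MODEL}
```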

Sequence Diagram(s)

sequenceDiagram
    actor GitHub as GitHub Actions
    participant Workflow as RHEL AI Workflow
    participant Repo as Repository
    participant Config as Config Generation
    participant Endpoint as RHEL AI Endpoint
    participant Docker as Docker Compose
    participant Tests as E2E Tests

    GitHub->>Workflow: Trigger (daily / workflow_dispatch)
    Workflow->>Repo: Checkout code
    Workflow->>Config: Generate/select `lightspeed-stack.yaml` / `run-rhelai.yaml`
    Workflow->>Endpoint: Connectivity check (curl -f) %% explicit HTTP failure on non-2xx
    alt endpoint reachable
        Workflow->>Docker: docker-compose up (start services)
        Docker->>Docker: wait / health checks
        Workflow->>Endpoint: Quick local connectivity test
        Workflow->>Tests: Run e2e tests (uv sync, make test-e2e)
        Tests-->>Workflow: Test results
    else endpoint/error
        Workflow->>Workflow: Fail early and dump logs
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas to pay attention to:

  • .github/workflows/e2e_tests_rhelai.yaml: secrets wiring, generated config contents, path substitutions, and docker-compose orchestration.
  • .github/workflows/e2e_tests_rhaiis.yaml: curl -f change that alters failure semantics.
  • docker-compose.yaml: new llama-stack env vars and any downstream assumptions.
  • tests/e2e/configs/run-rhelai.yaml: provider/model entries, file paths, and credential placeholders.

Suggested reviewers

  • tisnik
  • radofuchs

Poem

🐇 I hopped through YAML and CI,
I nudged a curl to fail and try,
RHEL AI wakes and answers near,
Compose spins services with a cheer,
Logs and carrots — tests are clear! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The PR title "LCORE-333: Lightspeed core needs to fully support RHEL AI LLM provider" accurately captures the main objective of the changeset, which is to add comprehensive RHEL AI support. The changes include a new E2E test workflow for RHEL AI, configuration files (run-rhelai.yaml), environment variables in docker-compose.yaml, and documentation updates in README.md and docs/providers.md. The title is concise, clear, and uses a standard issue tracker reference format that teammates would understand when scanning history. It directly reflects the primary change without unnecessary noise or vague terminology.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a66916 and 37142bd.

📒 Files selected for processing (2)
  • .github/workflows/e2e_tests_rhaiis.yaml (1 hunks)
  • docker-compose.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • docker-compose.yaml
  • .github/workflows/e2e_tests_rhaiis.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)


@are-ces are-ces marked this pull request as ready for review October 24, 2025 08:23

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
.github/workflows/e2e_tests_rhaiis.yaml (1)

119-125: Port handling inconsistency with RHEL AI workflow.

The RHAIIS connectivity test hardcodes port 8000, while the new RHEL AI workflow (line 126 in .github/workflows/e2e_tests_rhelai.yaml) uses a parameterized ${RHEL_AI_PORT}. For consistency and flexibility, consider parameterizing RHAIIS_PORT similarly.

Additionally, both workflows should validate that secrets are set before attempting curl requests to provide clearer error messages if credentials are missing.

docker-compose.yaml (1)

20-22: Environment variables added correctly, but port handling differs from RHAIIS pattern.

The RHEL_AI environment variables are properly added. However, note that RHAIIS omits RHAIIS_PORT (hardcoded as 8000 in workflows), while RHEL_AI includes RHEL_AI_PORT. Consider whether RHAIIS should also have a parameterized port for consistency and future flexibility.

.github/workflows/e2e_tests_rhelai.yaml (2)

80-119: Run.yaml configuration step is solid, but sed patterns may be brittle.

The script correctly selects and copies the environment-specific config file. However, the sed patterns (lines 114-115) assume specific path formats. If paths in configuration files vary, these patterns might fail silently or miss updates.

Consider making path normalization more robust (e.g., validate the result or use a dedicated tool).


120-127: Connectivity test properly uses parameterized port but lacks fallback messaging.

The curl connectivity test correctly uses -f flag and parameterized ${RHEL_AI_PORT}, improving on the RHAIIS workflow. However, if the external endpoint is unreachable, the entire job fails without clear messaging about which secret/endpoint was unavailable.

Consider adding explicit error checking with descriptive output.

Apply a diff to improve error messaging:

- name: Test RHEL_AI connectivity
  env: 
      RHEL_AI_URL: ${{ secrets.RHEL_AI_URL }}
      RHEL_AI_PORT: ${{ secrets.RHEL_AI_PORT }}
      RHEL_AI_API_KEY: ${{ secrets.RHEL_AI_API_KEY }}
  run: |  
+   if [ -z "$RHEL_AI_URL" ] || [ -z "$RHEL_AI_PORT" ] || [ -z "$RHEL_AI_API_KEY" ]; then
+     echo "❌ Missing required RHEL_AI secrets (RHEL_AI_URL, RHEL_AI_PORT, RHEL_AI_API_KEY)"
+     exit 1
+   fi
    curl -f ${RHEL_AI_URL}:${RHEL_AI_PORT}/v1/models   -H "Authorization: Bearer ${RHEL_AI_API_KEY}"
tests/e2e/configs/run-rhelai.yaml (1)

75-81: RHEL AI vLLM provider configuration looks structurally sound, but review security and token settings.

The provider configuration correctly uses environment variable substitution (${env.RHEL_AI_URL}, etc.). A few observations:

  • Line 80: tls_verify: false disables TLS certificate verification—acceptable for E2E tests against internal endpoints, but ensure this isn't propagated to production configurations.
  • Line 81: max_tokens: 2048 is hardcoded. Verify this aligns with the model's capabilities or consider making it configurable for future flexibility.

Apply a diff to document the security consideration:

  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: http://${env.RHEL_AI_URL}:${env.RHEL_AI_PORT}/v1/
      api_token: ${env.RHEL_AI_API_KEY}
-     tls_verify: false
+     tls_verify: false  # E2E testing only; use true in production
      max_tokens: 2048
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 45c7482 and f1674d9.

📒 Files selected for processing (6)
  • .github/workflows/e2e_tests_rhaiis.yaml (1 hunks)
  • .github/workflows/e2e_tests_rhelai.yaml (1 hunks)
  • README.md (1 hunks)
  • docker-compose.yaml (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e/configs/run-rhelai.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (5)
docs/providers.md (1)

62-65: Provider documentation is consistent and complete.

The new RHEL AI (vLLM) entry is properly formatted and aligns with the version tested (1.5.2) mentioned in the PR description. The dependency on openai is correct for the OpenAI-compatible vLLM API.

README.md (1)

123-129: LLM Compatibility table entry is consistent and correctly linked.

The new RHEL AI (vLLM) entry properly references the test configuration at tests/e2e/configs/run-rhelai.yaml and maintains consistency with the existing RHAIIS entry and documentation in docs/providers.md.

tests/e2e/configs/run-rhelai.yaml (2)

130-140: Model definitions align with documentation and are properly wired to providers.

The embedding and LLM models correctly reference their respective providers (sentence-transformers and vllm). The model IDs and provider model IDs match documentation and support the test scenario.


1-27: Configuration metadata and API list are comprehensive.

The file properly declares all required APIs for RHEL AI E2E testing. Path normalization in the workflow handles the relative paths correctly.

.github/workflows/e2e_tests_rhelai.yaml (1)

160-167: Review comment is incorrect.

The concern assumes run.yaml must be loaded by the test-e2e target, but this misunderstands the architecture. GitHub Actions does NOT automatically "source" a run.yaml from your repository root, and in this case, run.yaml doesn't need to be. The workflow correctly:

  1. Copies run.yaml from tests/e2e/configs/run-$ENVIRONMENT.yaml to be used by docker-compose (for llama-stack service configuration)
  2. E2E tests connect to services via environment variables (E2E_LSC_HOSTNAME, E2E_LSC_PORT), not by loading run.yaml

The make test-e2e target uses behave and environment variables—no explicit run.yaml reference is needed or expected.

Likely an incorrect or invalid review comment.
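A minimal sketch of the environment-driven wiring described above (the E2E_LSC_HOSTNAME and E2E_LSC_PORT names come from this comment; the default values are illustrative):

```shell
# The e2e tests locate the running service via environment variables
# rather than by parsing run.yaml; assembling the target URL is plain
# shell parameter expansion with fallback defaults.
E2E_LSC_HOSTNAME="${E2E_LSC_HOSTNAME:-localhost}"
E2E_LSC_PORT="${E2E_LSC_PORT:-8080}"
BASE_URL="http://${E2E_LSC_HOSTNAME}:${E2E_LSC_PORT}"
echo "${BASE_URL}/v1/models"
```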

@are-ces are-ces marked this pull request as draft October 24, 2025 08:59
@are-ces are-ces marked this pull request as ready for review October 24, 2025 09:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
.github/workflows/e2e_tests_rhelai.yaml (2)

146-159: Consider replacing fixed sleep with health checks or retry logic.

The 20-second fixed sleep (line 149) is a blunt instrument. Services may take longer under load or in CI runners, causing flaky tests. A better approach would use health checks or retry logic with exponential backoff.

Optional improvement (if flakiness is observed):

       - name: Wait for services
         run: |
           echo "Waiting for services to be healthy..."
-          sleep 20  # adjust depending on boot time
+          max_retries=30
+          retry_count=0
+          while [ $retry_count -lt $max_retries ]; do
+            if curl -f http://localhost:8080/v1/models >/dev/null 2>&1 && \
+               curl -f http://llama-stack:8321/v1/models >/dev/null 2>&1; then
+              echo "✓ Services are healthy"
+              exit 0
+            fi
+            retry_count=$((retry_count + 1))
+            echo "Attempt $retry_count/$max_retries: services not ready, waiting..."
+            sleep 2
+          done
+          echo "✗ Services failed to become healthy after $((max_retries * 2))s"
+          exit 1

This avoids unnecessary delays when services are ready quickly and provides faster feedback when they're not.


16-19: Consolidate redundant environment variable declarations.

The RHEL AI secrets are declared in three separate places (job-level env, test connectivity step env, and run service step env). While functional, this violates DRY and creates maintenance risk. Consider declaring once at the job level since they're used consistently.

     env:
       RHEL_AI_URL: ${{ secrets.RHEL_AI_URL }}
       RHEL_AI_PORT: ${{ secrets.RHEL_AI_PORT }}
       RHEL_AI_API_KEY: ${{ secrets.RHEL_AI_API_KEY }}
     
     steps:
       # ... checkout steps ...
       - name: Test RHEL_AI connectivity
-        env: 
-            RHEL_AI_URL: ${{ secrets.RHEL_AI_URL }}
-            RHEL_AI_PORT: ${{ secrets.RHEL_AI_PORT }}
-            RHEL_AI_API_KEY: ${{ secrets.RHEL_AI_API_KEY }}
         run: | 
           # ... uses inherited env vars ...
       
       - name: Run service manually
-        env: 
-            RHEL_AI_URL: ${{ secrets.RHEL_AI_URL }}
-            RHEL_AI_PORT: ${{ secrets.RHEL_AI_PORT }}
-            RHEL_AI_API_KEY: ${{ secrets.RHEL_AI_API_KEY }}
         run: |
           # ... uses inherited env vars ...

Also applies to: 121-124, 130-132

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between f1674d9 and f90ef53.

📒 Files selected for processing (6)
  • .github/workflows/e2e_tests_rhaiis.yaml (1 hunks)
  • .github/workflows/e2e_tests_rhelai.yaml (1 hunks)
  • README.md (1 hunks)
  • docker-compose.yaml (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e/configs/run-rhelai.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • .github/workflows/e2e_tests_rhaiis.yaml
  • docs/providers.md
  • README.md
  • docker-compose.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (8)
.github/workflows/e2e_tests_rhelai.yaml (4)

1-33: Workflow structure and trigger configuration look solid.

The daily schedule, on-demand dispatch option, and PR-aware checkout with proper credential handling follow GitHub Actions security best practices.


48-79: Lightspeed-stack service configuration is appropriate for E2E testing.

The remote llama-stack client setup and test authentication are suitable for this use case.


160-178: Test execution and failure diagnostics are well-structured.

The test invocation and conditional logging on failure are appropriate. The dual logging of both llama-stack and lightspeed-stack containers will aid in debugging.


113-115: Review comment is incorrect and should be disregarded.

The sed patterns are appropriate for the actual configuration file. Analysis of run-rhelai.yaml shows:

  • The first sed pattern db_path: \.llama/distributions correctly matches all 8 existing db_path lines that use this relative path
  • The second sed pattern db_path: tmp/ does not match any lines in the config—the only "tmp/" in the file is storage_dir: /tmp/llama-stack-files, which doesn't match the pattern. The second sed is a harmless no-op
  • The pattern includes the key name "db_path:" which provides adequate specificity to prevent false matches in different contexts

The suggested fixes in the review would actually break the intended replacements by searching for non-existent patterns like db_path: ../llama/distributions.

Likely an incorrect or invalid review comment.
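For reference, the first pattern's behavior can be reproduced in isolation (the sample line and the replacement path are hypothetical; only the db_path: \.llama/distributions search pattern comes from the workflow under review):

```shell
# Apply the first sed pattern to a sample config line. The escaped dot
# makes "." literal, so only the ".llama/distributions" prefix after
# "db_path: " is rewritten; non-matching lines pass through unchanged.
line='db_path: .llama/distributions/demo/agents_store.db'
echo "$line" | sed 's|db_path: \.llama/distributions|db_path: /tmp/.llama/distributions|'
```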

tests/e2e/configs/run-rhelai.yaml (4)

1-16: API configuration is appropriate for comprehensive E2E testing.

The selected APIs cover inference, evaluation, safety, and operational concerns needed for meaningful RHEL AI integration testing.


29-117: Provider configuration is comprehensive and appropriate.

The mix of inline (local) and remote providers with proper configuration for each use case (agents, datasets, evaluation, safety, scoring, telemetry) provides good E2E test coverage. The Braintrust placeholder credentials are acceptable for test configuration.


118-140: Server, shields, and model configuration are well-structured.

The minimal server setup, Llama-guard shield integration, and selection of sentence-transformers for embeddings plus Llama-3.1-8B for inference via vLLM provide solid test coverage for the RHEL AI integration.


1-140: No issues found. The environment variable syntax ${env.VARIABLE_NAME} is the standard pattern used consistently across all llama-stack configuration files in the codebase (run-azure.yaml, run-ci.yaml, run-rhaiis.yaml, and run-rhelai.yaml). This confirms the syntax is fully supported and compatible with llama-stack's YAML parser.

Comment on lines 120 to 125:

      - name: Test RHEL_AI connectivity
        env:
            RHEL_AI_URL: ${{ secrets.RHEL_AI_URL }}
            RHEL_AI_PORT: ${{ secrets.RHEL_AI_PORT }}
            RHEL_AI_API_KEY: ${{ secrets.RHEL_AI_API_KEY }}
        run: |
          curl -f ${RHEL_AI_URL}:${RHEL_AI_PORT}/v1/models   -H "Authorization: Bearer ${RHEL_AI_API_KEY}"


⚠️ Potential issue | 🔴 Critical

Add quoting and error handling to curl command.

Environment variables used in the curl command (line 126) are unquoted and lack null-value checks. If RHEL_AI_URL or RHEL_AI_PORT are empty (e.g., secrets not configured), the command will silently construct an invalid URL. The curl output also lacks error reporting.

Apply this diff:

       - name: Test RHEL_AI connectivity
         env: 
             RHEL_AI_URL: ${{ secrets.RHEL_AI_URL }}
             RHEL_AI_PORT: ${{ secrets.RHEL_AI_PORT }}
             RHEL_AI_API_KEY: ${{ secrets.RHEL_AI_API_KEY }}
         run: |  
-          curl -f ${RHEL_AI_URL}:${RHEL_AI_PORT}/v1/models   -H "Authorization: Bearer ${RHEL_AI_API_KEY}"
+          if [ -z "$RHEL_AI_URL" ] || [ -z "$RHEL_AI_PORT" ] || [ -z "$RHEL_AI_API_KEY" ]; then
+            echo "Error: RHEL_AI_URL, RHEL_AI_PORT, or RHEL_AI_API_KEY not set"
+            exit 1
+          fi
+          curl -f "${RHEL_AI_URL}:${RHEL_AI_PORT}/v1/models" \
+            -H "Authorization: Bearer ${RHEL_AI_API_KEY}" || {
+            echo "Connectivity test failed - RHEL AI endpoint unreachable"
+            exit 1
+          }

@are-ces are-ces force-pushed the rhelai-support branch 2 times, most recently from 00c28d2 to 7391953 Compare October 29, 2025 13:43

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/e2e_tests_rhelai.yaml (1)

122-125: Add quoting and validation to curl command (duplicate of previous review).

The curl command uses unquoted environment variables that may be empty or unset. If RHEL_AI_URL, RHEL_AI_PORT, or RHEL_AI_API_KEY are not configured as secrets, the command will fail silently or with a cryptic error message. Add explicit validation and proper quoting.

       - name: Test RHEL_AI connectivity
         run: |  
+          if [ -z "$RHEL_AI_URL" ] || [ -z "$RHEL_AI_PORT" ] || [ -z "$RHEL_AI_API_KEY" ]; then
+            echo "Error: RHEL_AI_URL, RHEL_AI_PORT, or RHEL_AI_API_KEY not set"
+            exit 1
+          fi
-          curl -f ${RHEL_AI_URL}:${RHEL_AI_PORT}/v1/models   -H "Authorization: Bearer ${RHEL_AI_API_KEY}"  
+          curl -f "${RHEL_AI_URL}:${RHEL_AI_PORT}/v1/models" \
+            -H "Authorization: Bearer ${RHEL_AI_API_KEY}" || {
+            echo "Connectivity test failed - RHEL AI endpoint unreachable"
+            exit 1
+          }
🧹 Nitpick comments (1)
.github/workflows/e2e_tests_rhelai.yaml (1)

140-143: Consider making the service wait time configurable.

The hardcoded 20-second sleep is reasonable for typical startup, but services (llama-stack, lightspeed-stack) may have variable startup times depending on model loading and system resources. Consider adding a health check loop or allowing this to be overridden via an environment variable or job parameter for better resilience.

       - name: Wait for services
         run: |
           echo "Waiting for services to be healthy..."
-          sleep 20  # adjust depending on boot time
+          WAIT_TIME=${SERVICE_WAIT_TIME:-20}
+          echo "Waiting ${WAIT_TIME} seconds for services to stabilize..."
+          sleep "$WAIT_TIME"
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 7391953 and 660f45f.

📒 Files selected for processing (1)
  • .github/workflows/e2e_tests_rhelai.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
tests/e2e/configs/run-rhelai.yaml (1)

75-81: TLS verification should be configurable via environment variable.

Line 80 hardcodes tls_verify: false, which disables TLS certificate validation. While acceptable for development/test environments, this should be configurable to support different security postures.

Following the pattern used elsewhere in the codebase (e.g., examples/vertexai-run.yaml), make this environment-driven:

     - provider_id: vllm
       provider_type: remote::vllm
       config:
         url: http://${env.RHEL_AI_URL}:${env.RHEL_AI_PORT}/v1/
         api_token: ${env.RHEL_AI_API_KEY}
-        tls_verify: false
+        tls_verify: ${env.RHEL_AI_TLS_VERIFY:=false}
         max_tokens: 2048

Then add RHEL_AI_TLS_VERIFY to .github/workflows/e2e_tests_rhelai.yaml (env section) and docker-compose.yaml so it can be controlled at runtime.

.github/workflows/e2e_tests_rhelai.yaml (1)

121-125: Add null checks and quote variables in connectivity test.

The curl command lacks validation that required secrets are set, and uses unquoted variables which could be problematic if they contain special characters.

       - name: Test RHEL_AI connectivity
+        env:
+          RHEL_AI_URL: ${{ secrets.RHEL_AI_URL }}
+          RHEL_AI_PORT: ${{ secrets.RHEL_AI_PORT }}
+          RHEL_AI_API_KEY: ${{ secrets.RHEL_AI_API_KEY }}
         run: |  
-          echo $RHEL_AI_MODEL
-          curl -f ${RHEL_AI_URL}:${RHEL_AI_PORT}/v1/models   -H "Authorization: Bearer ${RHEL_AI_API_KEY}"
+          if [ -z "$RHEL_AI_URL" ] || [ -z "$RHEL_AI_PORT" ] || [ -z "$RHEL_AI_API_KEY" ]; then
+            echo "Error: RHEL_AI_URL, RHEL_AI_PORT, or RHEL_AI_API_KEY not set"
+            exit 1
+          fi
+          curl -f "${RHEL_AI_URL}:${RHEL_AI_PORT}/v1/models" \
+            -H "Authorization: Bearer ${RHEL_AI_API_KEY}" || {
+            echo "Connectivity test failed - RHEL AI endpoint unreachable"
+            exit 1
+          }
🧹 Nitpick comments (1)
.github/workflows/e2e_tests_rhaiis.yaml (1)

124-124: Quote environment variables in curl command for robustness.

While the -f flag correctly fails on HTTP errors, unquoted variables should be quoted to prevent shell expansion issues if the values contain special characters or spaces.

-          curl -f ${RHAIIS_URL}:${RHAIIS_PORT}/v1/models   -H "Authorization: Bearer ${RHAIIS_API_KEY}"
+          curl -f "${RHAIIS_URL}:${RHAIIS_PORT}/v1/models" \
+            -H "Authorization: Bearer ${RHAIIS_API_KEY}"
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 660f45f and 5a66916.

📒 Files selected for processing (6)
  • .github/workflows/e2e_tests_rhaiis.yaml (1 hunks)
  • .github/workflows/e2e_tests_rhelai.yaml (1 hunks)
  • README.md (1 hunks)
  • docker-compose.yaml (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e/configs/run-rhelai.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • docs/providers.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (4)
docker-compose.yaml (1)

20-23: Environment variables for RHEL AI added cleanly.

The addition of RHEL_AI_URL, RHEL_AI_PORT, RHEL_AI_API_KEY, and RHEL_AI_MODEL follows the existing pattern and naming conventions. Good alignment with the new RHEL AI provider integration.

.github/workflows/e2e_tests_rhelai.yaml (3)

81-119: Configuration setup logic is sound.

The configuration selection and path transformation steps include good defensive checks (directory exists, file found, readable alternatives on failure) and clear diagnostics.


126-152: Service startup and health check sequence is well-structured.

The workflow includes appropriate checks (services started without exit codes, wait period, connectivity test) with good error diagnostics (docker logs on failure). This progressive verification approach is sound.


154-175: Test execution and failure logging are well-organized.

The workflow installs dependencies efficiently (uv sync), runs the standard test target, and provides targeted service logs on failure. The conditional logging step (if: failure()) is appropriate and keeps logs focused.


@radofuchs radofuchs left a comment


LGTM


@tisnik tisnik left a comment


LGTM

@tisnik tisnik merged commit cd3291c into lightspeed-core:main Oct 31, 2025
20 of 22 checks passed