WIP: Add k3s-based CAPZ e2e test (no IPI cluster needed) #79611

Draft
marek-veber wants to merge 5 commits into openshift:main from marek-veber:capz-k3s-e2e

@marek-veber

@marek-veber marek-veber commented May 21, 2026

Summary

  • Adds a new capz-k3s-e2e Prow job that uses k3s as the management cluster instead of provisioning a full IPI OpenShift cluster
  • k3s runs directly inside the test pod via nested_podman capabilities (SYS_ADMIN + NET_ADMIN)
  • Saves ~40 minutes of IPI provisioning time and reduces Azure compute costs

Purpose

Draft PR to validate whether nested_podman provides enough capabilities for k3s to start inside a Prow test pod.

New files

  • ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-commands.sh — starts k3s, deploys controllers, runs e2e tests
  • ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-ref.yaml — step ref with nested_podman: true
  • ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-workflow.yaml — workflow without IPI

Why k3s instead of Kind?

Kind-in-Podman fails because containerd inside the Kind node can't perform overlay mounts three levels deep (Docker → Podman → containerd → overlay); the mount fails with "failed to mount rootfs component: invalid argument". k3s runs everything as a single process with --snapshotter=native, which avoids the nested-container problem entirely.

🤖 Generated with Claude Code

Summary by CodeRabbit

This PR adds a lightweight Prow CI job, capz-k3s-e2e, to stolostron/capi-tests. Instead of provisioning a full IPI OpenShift cluster to act as the management cluster, the job starts a k3s management cluster inside the test pod (requires nested_podman with SYS_ADMIN + NET_ADMIN), deploys the CAPI/CAPZ/ASO controllers into it, and runs CAPZ e2e phases 01–08. The goal is to skip IPI provisioning (~40 minutes saved) and reduce Azure compute costs. The PR is a draft whose purpose is to validate whether nested_podman provides sufficient capabilities for k3s to start reliably inside Prow test pods.

Practical impact / what changed

  • Registers a new Prow job capz-k3s-e2e in stolostron/capi-tests (cluster_profile: openshift-org-azure, WORKLOAD_CLUSTER_NAMESPACE: capz-test-prow) with a 4h job timeout.
  • Adds a step script that:
    • installs and starts k3s (default K3S_VERSION v1.31.6+k3s1) with --snapshotter=native, traefik disabled, kubeconfig written to /tmp (mode 644), and kubelet eviction args;
    • manages lifecycle with a cleanup trap, waits up to ~300s for the k3s node Ready state, and exits on failure;
    • configures the test suite to use k3s (DEPLOY_CHARTS=true, USE_K8S=false), installs gotestsum (pinned), and runs the e2e suite via gotestsum, producing ${ARTIFACT_DIR}/junit-e2e.xml with a 150m test timeout;
    • (follow-up fix commit) downloads k3s to /tmp instead of /usr/local/bin, because CI pods run as non-root and /usr/local/bin is read-only.
  • Adds a step reference capz-test-k3s-e2e that enables nested_podman, requests 4 CPU and 8Gi memory, sets a 3h step timeout, exposes K3S_VERSION and WORKLOAD_CLUSTER_NAMESPACE env defaults, and documents the nested-podman requirement.
  • Adds a workflow capz-test-k3s-e2e that runs the step and posts capz-test-summary; workflow docs note the ~40m savings and nested-podman requirement.
  • Adds OWNERS and metadata JSON files for the new step/workflow.
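
The cleanup-trap lifecycle mentioned in the step script summary can be sketched roughly as follows. This is an illustration only, not the script from the PR: start_server and stop_server are hypothetical helper names, and sleep stands in for the k3s server process so the sketch has no side effects.

```shell
#!/usr/bin/env bash
set -euo pipefail

SERVER_PID=""

# Hypothetical helper: start a long-running process in the background and
# remember its PID. The real step would launch `k3s server ...` here.
start_server() {
  "$@" &
  SERVER_PID=$!
}

# Hypothetical helper: stop the background process if it is still running,
# then reap it so no zombie is left behind.
stop_server() {
  if [ -n "${SERVER_PID}" ] && kill -0 "${SERVER_PID}" 2>/dev/null; then
    kill "${SERVER_PID}" 2>/dev/null || true
    wait "${SERVER_PID}" 2>/dev/null || true
  fi
  SERVER_PID=""
}

# Run the cleanup on every exit path, including failures under `set -e`.
trap stop_server EXIT

start_server sleep 300   # stand-in for the k3s server process
echo "server running with PID ${SERVER_PID}"
```

Registering the trap before the server starts means a failure in any later command still tears the server down.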

Repositories / CI surface affected

  • ci-operator configuration and step-registry for stolostron/capi-tests: new job, step script, step ref, workflow, OWNERS, and metadata.

Rationale

  • Kind-in-Podman fails due to nested overlay mount issues when containerd runs inside Kind nodes; k3s runs as a single process with --snapshotter=native, avoiding nested overlay problems and making an in-pod management cluster feasible.

Status / notes

  • Draft PR for validating nested_podman capability in Prow pods.
  • Step timeout: 3h; job timeout: 4h. Requested resources: 4 CPU, 8Gi memory. gotestsum pinned in the step (per script).

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a k3s-based CAPZ e2e test: a command script to bootstrap/run k3s and tests, a step reference with nested-podman and resources, a workflow to run/post the step, CI job registration in stolostron config, and an OWNERS file for the step registry.

Changes

CAPZ k3s e2e testing

Layer / File(s) Summary
k3s e2e step reference and docs
ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-ref.yaml
Defines capz-test-k3s-e2e step ref with nested_podman: true, resource requests (4 CPU / 8Gi), env defaults (K3S_VERSION, WORKLOAD_CLUSTER_NAMESPACE), timeout, and capability notes (SYS_ADMIN/NET_ADMIN).
k3s e2e test step (commands)
ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-commands.sh
Bootstraps an in-pod k3s server (start flags, kubeconfig), polls for node readiness, cleans up the server on exit, installs gotestsum, and runs the e2e suite producing a JUnit XML artifact.
Workflow orchestration
ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-workflow.yaml
Adds capz-test-k3s-e2e workflow that runs the step and posts capz-test-summary, with docs describing k3s-based no-IPI execution and ~40-minute saving vs IPI.
CAPZ CI job registration
ci-operator/config/stolostron/capi-tests/stolostron-capi-tests-main.yaml
Registers capz-k3s-e2e job with cluster_profile: openshift-org-azure, WORKLOAD_CLUSTER_NAMESPACE: capz-test-prow, build_root.project_image.dockerfile_path: Dockerfile.prow, release targets for OCP 4.19, default resource requests, and a 4h timeout.
OWNERS and metadata
ci-operator/step-registry/capz/test/k3s-e2e/OWNERS, ci-operator/step-registry/capz/test/k3s-e2e/*.metadata.json
Adds approvers and reviewers: radekcap, marek-veber, and mzazrivec, and metadata files linking owners to the step and workflow.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels: ok-to-test, rehearsals-ack

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (11 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding a k3s-based CAPZ e2e test as an alternative to IPI cluster provisioning.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | This PR adds CI/prow configuration and shell scripts only; no Ginkgo test definitions with dynamic names. Test code is sourced externally from CAPZ repository.
Test Structure And Quality | ✅ Passed | PR adds Prow CI configuration and setup scripts, not Ginkgo test code. Custom check for Ginkgo test quality is not applicable as no test code is present.
Microshift Test Compatibility | ✅ Passed | PR adds only CI/Prow infrastructure configuration and shell scripts, not Ginkgo e2e tests. The custom check applies only when Ginkgo tests are added.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds CI infrastructure for k3s-based CAPZ tests but no new Ginkgo e2e tests. Test code sourced from capi-tests repository, not added here.
Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds only CI/test infrastructure files, not deployment manifests or operators. Check applies to production deployments, not ephemeral CI infrastructure.
Ote Binary Stdout Contract | ✅ Passed | The OTE Binary Stdout Contract check applies only to Go test binaries. This PR contains only YAML configuration, shell scripts, JSON metadata, and OWNERS files; no Go test code.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR adds only CI/infrastructure configuration files and setup scripts, not new Ginkgo e2e test code. No It()/Describe() test definitions are added. The actual tests run from capi-tests repo.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2026
@marek-veber
Author

/test all

@openshift-merge-bot
Contributor

@marek-veber, pj-rehearse: unable to determine affected jobs. This could be due to a branch that needs to be rebased. ERROR:

could not determine changed registry steps: could not load step registry: Workflow capz-test-k3s-e2e contains non-existent reference capz-test-teardown
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-commands.sh (1)

52-64: ⚡ Quick win

Surface k3s diagnostics on readiness timeout.

When k3s fails to become ready, the script exits with only a fatal message and no diagnostics, which will make debugging the very thing this PR is validating much harder. Consider capturing k3s server output to a file and dumping it (plus kubectl get nodes/kubectl get pods -A) into ${ARTIFACT_DIR} before exiting.

🔧 Proposed change
 k3s server \
   --disable=traefik \
   --snapshotter=native \
   --write-kubeconfig="${K3S_KUBECONFIG}" \
   --write-kubeconfig-mode=644 \
   --kubelet-arg="eviction-hard=imagefs.available<1%,nodefs.available<1%" \
   --kubelet-arg="eviction-minimum-reclaim=imagefs.available=1%,nodefs.available=1%" \
-  &
+  >"${ARTIFACT_DIR}/k3s-server.log" 2>&1 &
 K3S_PID=$!
...
 if [ "${READY}" != true ]; then
   echo "[k3s] FATAL: k3s did not become ready within 300s"
+  KUBECONFIG="${K3S_KUBECONFIG}" kubectl get nodes -o wide >"${ARTIFACT_DIR}/k3s-nodes.txt" 2>&1 || true
+  KUBECONFIG="${K3S_KUBECONFIG}" kubectl get pods -A -o wide >"${ARTIFACT_DIR}/k3s-pods.txt" 2>&1 || true
   exit 1
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-commands.sh`
around lines 52 - 64, When the readiness loop using
KUBECONFIG="${K3S_KUBECONFIG}" and the READY flag times out, capture diagnostics
before exiting: redirect the k3s server logs into a file under ${ARTIFACT_DIR}
(e.g., k3s-server.log), and run KUBECONFIG="${K3S_KUBECONFIG}" kubectl get nodes
--no-headers and kubectl get pods -A, saving their outputs into separate files
in ${ARTIFACT_DIR}; place this dump in the failure branch where the script
currently checks if [ "${READY}" != true ] and before the echo/exit so artifacts
are preserved for debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci-operator/config/stolostron/capi-tests/stolostron-capi-tests-main.yaml`:
- Around line 19-57: The CI config added two new test entries ("capz-e2e" and
"capz-k3s-e2e") under tests which requires regenerating downstream ProwJob
artifacts; run the repository's generation target (make update) to regenerate
files under ci-operator/jobs/, review and stage the changed generated artifacts,
and commit those updated artifacts alongside the change that introduced the new
tests so the new capz-e2e and capz-k3s-e2e jobs are present in the committed
outputs.

In `@ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-commands.sh`:
- Line 16: The script currently enables xtrace via the literal line "set -o
xtrace"; remove that default and ensure the script starts with the required safe
shell flags "set -euo pipefail" instead, and if you need -x for debugging, gate
enabling xtrace behind a DEBUG (or CI_DEBUG) environment variable (e.g., test if
DEBUG is set and then call "set -o xtrace"), making sure kubeconfig handling and
kubectl/k3s calls no longer run with -x by default.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2b9b4e37-2673-4efd-a25f-80be41b10a9e

📥 Commits

Reviewing files that changed from the base of the PR and between 5403786 and b283938.

📒 Files selected for processing (4)
  • ci-operator/config/stolostron/capi-tests/stolostron-capi-tests-main.yaml
  • ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-commands.sh
  • ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-ref.yaml
  • ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-workflow.yaml

export USE_KUBECONFIG="${K3S_KUBECONFIG}"

source openshift-ci/capz-test-env.sh
set -o xtrace
Contributor

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid enabling xtrace by default.

Per coding guidelines, step-registry command scripts should run with set -euo pipefail and only enable -x (here set -o xtrace) when actively debugging. Since this script also exports/handles a k3s kubeconfig path and runs kubectl/k3s commands, leaving xtrace on by default risks leaking command arguments into job logs and increases log noise. Consider removing it (or gating it behind a debug flag) before merging out of WIP.

🔧 Proposed change
-source openshift-ci/capz-test-env.sh
-set -o xtrace
+source openshift-ci/capz-test-env.sh
+# Enable tracing only when debugging:
+# set -o xtrace

As per coding guidelines: "Step registry script files must use set -euo pipefail (without -x) as default and only enable -x when actively debugging".
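
One way to gate tracing behind a debug flag, as the review suggests, is sketched below. DEBUG is an assumed variable name (the review also mentions CI_DEBUG as an option), and enable_xtrace_if_debug is a hypothetical helper, not something defined in the PR.

```shell
#!/usr/bin/env bash
# Safe defaults for a step-registry command script: no -x.
set -euo pipefail

# Hypothetical helper: turn on command tracing only when DEBUG=true.
# DEBUG is an assumed name; any opt-in variable would work the same way.
enable_xtrace_if_debug() {
  if [ "${DEBUG:-false}" = "true" ]; then
    set -o xtrace
  fi
}

enable_xtrace_if_debug
```

With this in place, tracing stays off in normal runs and can be switched on for a single rehearsal by exporting DEBUG=true.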


@marek-veber
Author

/test all

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marek-veber

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot
Contributor

@marek-veber, pj-rehearse: unable to determine affected jobs. This could be due to a branch that needs to be rebased. ERROR:

couldn't prepare candidate: couldn't checkout base SHA 96c469107b7ac535f2a4021db6a44ef6e14b0ec2: error checking out "96c469107b7ac535f2a4021db6a44ef6e14b0ec2": exit status 128 fatal: unable to read tree (96c469107b7ac535f2a4021db6a44ef6e14b0ec2)

@marek-veber
Author

/test all

@marek-veber
Author

/retest

@marek-veber
Author

/pj-rehearse capz-k3s-e2e

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

@marek-veber: job(s): capz-k3s-e2e either don't exist or were not found to be affected, and cannot be rehearsed

@marek-veber
Author

/pj-rehearse pull-ci-stolostron-capi-tests-main-capz-k3s-e2e

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

marek-veber and others added 3 commits May 21, 2026 19:20
… teardown ref

- Add missing OWNERS file for k3s-e2e step registry
- Add grace_period to ref (required when using trap)
- Remove non-existent capz-test-teardown reference
- Remove non-existent capz-e2e job (missing step registrations)
- Fix cluster profile: azure4 -> openshift-org-azure
- Remove undeclared DEPLOY_CHARTS env override (set in script)
- Regenerate ci-operator config and job configs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@marek-veber
Author

/pj-rehearse pull-ci-stolostron-capi-tests-main-capz-k3s-e2e

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@marek-veber
Author

/pj-rehearse pull-ci-stolostron-capi-tests-main-capz-k3s-e2e

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@marek-veber
Author

/pj-rehearse pull-ci-stolostron-capi-tests-main-capz-k3s-e2e

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

CI pods run as non-root with read-only /usr/local/bin, causing curl
write error (exit 23). Use /tmp and prepend to PATH.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-commands.sh`:
- Around line 32-34: Replace downloading to and prepending the world-writable
/tmp with a secure dedicated temp directory and make the download fail fast:
create a private dir via mktemp -d, download K3S from K3S_URL into that dir
using curl with fail-on-error flags (e.g. -f/--fail and -L), make the downloaded
binary executable (the file referenced as /tmp/k3s in the diff), and update PATH
to include only that temp dir (not /tmp) so the temp dir precedes system bins;
ensure you also clean up the temp dir on exit or error.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d62306fe-879d-47bc-8ee4-b5a8d49cf0f6

📥 Commits

Reviewing files that changed from the base of the PR and between 66db1f3 and c78533b.

📒 Files selected for processing (1)
  • ci-operator/step-registry/capz/test/k3s-e2e/capz-test-k3s-e2e-commands.sh

Comment on lines +32 to +34
curl -sLo /tmp/k3s "${K3S_URL}"
chmod +x /tmp/k3s
export PATH="/tmp:${PATH}"
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid prepending world-writable /tmp to PATH.

On line 34, putting /tmp first in PATH creates a command-shadowing risk: any executable dropped into /tmp would take precedence over tools invoked later, such as kubectl or go. Use a dedicated temp dir and fail fast on HTTP errors.

🔧 Proposed change
-curl -sLo /tmp/k3s "${K3S_URL}"
-chmod +x /tmp/k3s
-export PATH="/tmp:${PATH}"
+K3S_BIN_DIR="$(mktemp -d)"
+K3S_BIN="${K3S_BIN_DIR}/k3s"
+curl -fsSLo "${K3S_BIN}" "${K3S_URL}"
+chmod 0755 "${K3S_BIN}"
+export PATH="${K3S_BIN_DIR}:${PATH}"
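
The review also asks for the temp directory to be cleaned up on exit or error, which the diff above does not show. A minimal sketch of that idea follows; download_to_private_dir is a hypothetical helper name, and the wiring at the bottom is commented out so the sketch performs no network access.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: download a file into a fresh private directory and
# print the directory path. On a failed download the directory is removed.
download_to_private_dir() {
  local url="$1" name="$2" dir
  dir="$(mktemp -d)"                 # private directory, not shared /tmp itself
  # -f makes curl return non-zero on HTTP errors instead of saving an error page.
  if ! curl -fsSLo "${dir}/${name}" "${url}"; then
    rm -rf "${dir}"
    return 1
  fi
  chmod 0755 "${dir}/${name}"
  printf '%s\n' "${dir}"
}

# Example wiring (assumes K3S_URL is already set, as in the step script):
# K3S_BIN_DIR="$(download_to_private_dir "${K3S_URL}" k3s)"
# trap 'rm -rf "${K3S_BIN_DIR}"' EXIT
# export PATH="${K3S_BIN_DIR}:${PATH}"
```

If the script already installs an EXIT trap for stopping k3s, the rm -rf would need to be folded into that existing handler, since a second trap on EXIT replaces the first.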

@marek-veber
Author

/pj-rehearse pull-ci-stolostron-capi-tests-main-capz-k3s-e2e

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

CI pods run as random non-root UID. k3s requires --rootless flag
in this case. Also install slirp4netns dependency, use writable
data-dir, and detect early if the k3s process dies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
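
Detecting an early process death during the readiness wait, as this commit message describes, could look roughly like the sketch below. It is not the code from the commit: wait_ready_or_dead and its return codes are hypothetical, and the readiness probe is passed in as an arbitrary command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: poll a readiness command until it succeeds, the
# timeout expires, or the watched process exits.
# Usage: wait_ready_or_dead PID TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Returns 0 = ready, 1 = timed out, 2 = watched process died.
wait_ready_or_dead() {
  local pid="$1" timeout_s="$2" interval_s="$3"
  shift 3
  local waited=0
  while [ "${waited}" -lt "${timeout_s}" ]; do
    # Stop immediately if the server is gone; waiting longer cannot help.
    if ! kill -0 "${pid}" 2>/dev/null; then
      return 2
    fi
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "${interval_s}"
    waited=$(( waited + interval_s ))
  done
  return 1
}

# Example wiring (node_is_ready is a placeholder for the real probe):
# wait_ready_or_dead "${K3S_PID}" 300 5 node_is_ready || exit 1
```

Separating "died" from "timed out" lets the failing job report which of the two happened without waiting out the full 300 seconds.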
@marek-veber
Author

/pj-rehearse pull-ci-stolostron-capi-tests-main-capz-k3s-e2e

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@marek-veber: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name | Repo | Type | Reason
pull-ci-stolostron-capi-tests-main-capz-k3s-e2e | stolostron/capi-tests | presubmit | Presubmit changed
@marek-veber
Author

/pj-rehearse abort

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@marek-veber
Author

/pj-rehearse pull-ci-stolostron-capi-tests-main-capz-k3s-e2e

@openshift-merge-bot
Contributor

@marek-veber: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci
Contributor

openshift-ci Bot commented May 22, 2026

@marek-veber: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/rehearse/stolostron/capi-tests/main/capz-k3s-e2e | 921cac9 | link | unknown | /pj-rehearse pull-ci-stolostron-capi-tests-main-capz-k3s-e2e

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
