Skip to content

STOR-3013: Add e2e test for SELinuxMount upgrade readiness#715

Draft
rvagner78 wants to merge 4 commits into
openshift:mainfrom
rvagner78:stor-3013
Draft

STOR-3013: Add e2e test for SELinuxMount upgrade readiness#715
rvagner78 wants to merge 4 commits into
openshift:mainfrom
rvagner78:stor-3013

Conversation

@rvagner78

@rvagner78 rvagner78 commented Jun 26, 2026

Copy link
Copy Markdown

Summary by CodeRabbit

  • Tests
    • Added an end-to-end readiness test covering SELinux mount conflicts for pods sharing an RWO PVC.
    • Verifies upgrade readiness is blocked when two pods use the same PVC with conflicting SELinux levels.
    • Confirms the system emits the expected warning metric and alert state while blocked.
    • Ensures readiness returns once the conflicting pod is removed, and remains healthy when pods use compatible SELinux labels.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 26, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2026
@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: This pull request references STOR-3013 which is a valid jira issue.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Walkthrough

Adds a new e2e test suite that creates SELinux-labeled pods on a shared PVC, checks the upgrade readiness condition, and queries Thanos/Prometheus metrics and alerts during conflicting and non-conflicting pod scenarios.

Changes

SELinux mount readiness test

Layer / File(s) Summary
Test setup and pod helpers
test/e2e/selinux_mount_readiness.go
Creates the namespace and PVC, gates execution to CSI platforms, builds SELinux-labeled pause pods, waits for pod phase transitions, and polls the Storage condition.
Monitoring route and query access
test/e2e/selinux_mount_readiness.go
Discovers the Thanos Querier URL, obtains a bearer token, sends authenticated monitoring requests, parses query totals, and reads alert states.
Conflicting pod readiness flow
test/e2e/selinux_mount_readiness.go
Creates compatible and conflicting pods, waits for the readiness condition to turn false, checks the conflict metric and alert state, then deletes the conflicting pod and waits for recovery.
Compatible pod steady state
test/e2e/selinux_mount_readiness.go
Verifies the readiness condition stays true when two pods share the same SELinux label on the same PVC.
Recover after pod recreation
test/e2e/selinux_mount_readiness.go
Verifies the readiness condition returns to true after a conflicting pod is recreated with a compatible SELinux label.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 4

❌ Failed checks (4 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Shared-PVC tests never co-locate pod2 with pod1, so they can hit Multi-Attach instead of SELinux conflict and violate existing e2e patterns. Pin pod2 to pod1's node (or equivalent node affinity) in all shared-PVC cases; consider splitting the first It into narrower checks if needed.
Microshift Test Compatibility ⚠️ Warning New e2e test lacks any MicroShift skip/tag and directly uses Storage CR plus openshift-monitoring/thanos-querier/prometheus-k8s, all unsupported on MicroShift. Add a [Skipped:MicroShift] or relevant [apigroup:...] tag, or guard with exutil.IsMicroShiftCluster(), and avoid storage/operator and monitoring-stack dependencies.
Ipv6 And Disconnected Network Test Compatibility ⚠️ Warning The test uses public k8s.gcr.io/pause:3.2 via PauseImage, which can fail in disconnected environments; no IPv4-only code was found. Use a mirrored/internal pause image or an image stream already available in-cluster; otherwise mark the test [Skipped:Disconnected].
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding an e2e test for SELinuxMount upgrade readiness.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All Ginkgo titles are fixed string literals; no dynamic values appear in Describe/It names.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No multi-node or HA assumption found: the new e2e test has no SNO skip guard, anti-affinity, node scaling, or distinct-node requirement; multiple pods on one node are fine.
Topology-Aware Scheduling Compatibility ✅ Passed The PR only adds e2e test pods; it introduces no node selectors, anti-affinity, spread constraints, or control-plane-targeted scheduling logic.
Ote Binary Stdout Contract ✅ Passed The new e2e file has no process-level stdout writes; it only uses fmt.Sprintf and Ginkgo logging inside test blocks/helpers.
No-Weak-Crypto ✅ Passed The only changed file is an e2e test; it contains no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, no custom crypto, and no secret/token comparisons.
Container-Privileges ✅ Passed The new test pod explicitly runs non-root, drops ALL caps, and sets allowPrivilegeEscalation=false; no host namespaces or privileged flags were added.
No-Sensitive-Data-In-Logs ✅ Passed Log calls only emit pod/namespace names and static labels; no tokens, passwords, API keys, or hostnames are logged.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.23.0)
test/e2e/selinux_mount_readiness.go

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.13][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rvagner78
Once this PR has been reviewed and has the lgtm label, please assign dobsonj for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
test/e2e/selinux_mount_readiness.go (1)

84-89: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

Cleanup uses the timeout-bound testContext.

All deferred cleanups (namespace, pod1, pod2) use testContext, which is derived from ctx with testTimeout. If the test body times out, testContext is already canceled/expired and the cleanup deletes will fail immediately, leaking the namespace/PVC/pods. Consider a fresh context.Background() (with its own short timeout) for teardown.

Also applies to: 119-123, 132-136

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/selinux_mount_readiness.go` around lines 84 - 89, The deferred
cleanup paths in selinux_mount_readiness.go are using the timeout-bound
testContext, so teardown can be canceled before namespace/pod deletion runs.
Update the deferred cleanup blocks for the namespace and the pod cleanup helpers
to use a fresh background context, ideally with a short dedicated timeout for
teardown instead of reusing testContext. Make the change in the cleanup closures
that call cleanupNamespace and the pod cleanup functions so they can still run
after the main test context expires.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/selinux_mount_readiness.go`:
- Around line 132-136: The deferred cleanup for pod2 in the
selinux_mount_readiness test is using pod2.Name after it has been cleared, so it
may try to delete an empty pod name. Update the cleanup logic around the
deferred deletePodWithWait closure and the later pod2 deletion path to track
“already deleted” with a separate boolean or guard instead of mutating
pod2.Name. Keep the deferred cleanup using the real pod2.Name only when the pod
still exists, and ensure the explicit delete path sets the flag rather than
blanking the name.
- Around line 65-72: The platform check in the selinux mount readiness test can
panic because infra.Status.PlatformStatus may be nil before accessing Type.
Update the infrastructure lookup/skip logic in the test function to guard for a
nil PlatformStatus before the switch, and only read PlatformStatus.Type after
confirming the pointer exists; if it is nil, skip or fail the test with a clear
message using the existing infra.Status.PlatformStatus access path.
- Around line 330-340: The value parsing in the query response aggregation
silently ignores malformed samples because the error from fmt.Sscanf is dropped.
In the helper that loops over queryResp.Data.Result, replace the Sscanf-based
conversion with explicit parsing in that block, handle the parse error
immediately, and propagate it instead of adding a default 0 to total. Keep the
fix localized to the result-processing logic so the calling path can surface bad
sample values clearly.

---

Nitpick comments:
In `@test/e2e/selinux_mount_readiness.go`:
- Around line 84-89: The deferred cleanup paths in selinux_mount_readiness.go
are using the timeout-bound testContext, so teardown can be canceled before
namespace/pod deletion runs. Update the deferred cleanup blocks for the
namespace and the pod cleanup helpers to use a fresh background context, ideally
with a short dedicated timeout for teardown instead of reusing testContext. Make
the change in the cleanup closures that call cleanupNamespace and the pod
cleanup functions so they can still run after the main test context expires.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b3f00f5-6920-4de8-983c-302066099c87

📥 Commits

Reviewing files that changed from the base of the PR and between 1715686 and 3c0cbd4.

📒 Files selected for processing (1)
  • test/e2e/selinux_mount_readiness.go

Comment thread test/e2e/selinux_mount_readiness.go Outdated
Comment on lines +65 to +72
infra, err := configClient.ConfigV1().Infrastructures().Get(testContext, "cluster", metav1.GetOptions{})
if err != nil {
g.Fail(fmt.Sprintf("Failed to get infrastructure: %v", err))
}
switch infra.Status.PlatformStatus.Type {
case configv1.BareMetalPlatformType, configv1.NonePlatformType:
g.Skip(fmt.Sprintf("Test not applicable for platform %s (no default CSI storage)", infra.Status.PlatformStatus.Type))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Possible nil-pointer dereference on PlatformStatus.

infra.Status.PlatformStatus is a pointer and can be nil. Accessing .Type on Line 69/71 would panic before any platform skip logic runs. Guard for nil first.

🛡️ Proposed guard
+			if infra.Status.PlatformStatus == nil {
+				g.Skip("Test not applicable: infrastructure PlatformStatus is not set")
+			}
 			switch infra.Status.PlatformStatus.Type {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
infra, err := configClient.ConfigV1().Infrastructures().Get(testContext, "cluster", metav1.GetOptions{})
if err != nil {
g.Fail(fmt.Sprintf("Failed to get infrastructure: %v", err))
}
switch infra.Status.PlatformStatus.Type {
case configv1.BareMetalPlatformType, configv1.NonePlatformType:
g.Skip(fmt.Sprintf("Test not applicable for platform %s (no default CSI storage)", infra.Status.PlatformStatus.Type))
}
infra, err := configClient.ConfigV1().Infrastructures().Get(testContext, "cluster", metav1.GetOptions{})
if err != nil {
g.Fail(fmt.Sprintf("Failed to get infrastructure: %v", err))
}
if infra.Status.PlatformStatus == nil {
g.Skip("Test not applicable: infrastructure PlatformStatus is not set")
}
switch infra.Status.PlatformStatus.Type {
case configv1.BareMetalPlatformType, configv1.NonePlatformType:
g.Skip(fmt.Sprintf("Test not applicable for platform %s (no default CSI storage)", infra.Status.PlatformStatus.Type))
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/selinux_mount_readiness.go` around lines 65 - 72, The platform check
in the selinux mount readiness test can panic because
infra.Status.PlatformStatus may be nil before accessing Type. Update the
infrastructure lookup/skip logic in the test function to guard for a nil
PlatformStatus before the switch, and only read PlatformStatus.Type after
confirming the pointer exists; if it is nil, skip or fail the test with a clear
message using the existing infra.Status.PlatformStatus access path.

Source: Linters/SAST tools

Comment thread test/e2e/selinux_mount_readiness.go
Comment on lines +330 to +340
var total float64
for _, result := range queryResp.Data.Result {
if len(result.Value) >= 2 {
if valStr, ok := result.Value[1].(string); ok {
var val float64
fmt.Sscanf(valStr, "%f", &val)
total += val
}
}
}
return total, nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Silent parse failure from unchecked fmt.Sscanf.

The fmt.Sscanf error is ignored (Line 335), so a malformed value silently contributes 0 to total rather than surfacing a parse problem. Prefer strconv.ParseFloat with explicit error handling for the sample value.

♻️ Proposed change
-			if valStr, ok := result.Value[1].(string); ok {
-				var val float64
-				fmt.Sscanf(valStr, "%f", &val)
-				total += val
-			}
+			if valStr, ok := result.Value[1].(string); ok {
+				val, err := strconv.ParseFloat(valStr, 64)
+				if err != nil {
+					return 0, fmt.Errorf("failed to parse metric value %q: %w", valStr, err)
+				}
+				total += val
+			}

As per path instructions: "Never ignore error returns".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var total float64
for _, result := range queryResp.Data.Result {
if len(result.Value) >= 2 {
if valStr, ok := result.Value[1].(string); ok {
var val float64
fmt.Sscanf(valStr, "%f", &val)
total += val
}
}
}
return total, nil
var total float64
for _, result := range queryResp.Data.Result {
if len(result.Value) >= 2 {
if valStr, ok := result.Value[1].(string); ok {
val, err := strconv.ParseFloat(valStr, 64)
if err != nil {
return 0, fmt.Errorf("failed to parse metric value %q: %w", valStr, err)
}
total += val
}
}
}
return total, nil
🧰 Tools
🪛 golangci-lint (2.12.2)

[error] 335-335: Error return value of fmt.Sscanf is not checked

(errcheck)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/selinux_mount_readiness.go` around lines 330 - 340, The value
parsing in the query response aggregation silently ignores malformed samples
because the error from fmt.Sscanf is dropped. In the helper that loops over
queryResp.Data.Result, replace the Sscanf-based conversion with explicit parsing
in that block, handle the parse error immediately, and propagate it instead of
adding a default 0 to total. Keep the fix localized to the result-processing
logic so the calling path can surface bad sample values clearly.

Sources: Path instructions, Linters/SAST tools

@rvagner78

Copy link
Copy Markdown
Author

/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview openshift/api#2882 openshift/kubernetes#2671 #714 #715

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7b96e9d0-7156-11f1-9c62-ad103feffb0c-0

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: This PR was included in a payload test run from #715
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7b96e9d0-7156-11f1-9c62-ad103feffb0c-0

@jsafrane

Copy link
Copy Markdown
Contributor

/payload-job periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview

Running the test without implementation to see how it fails. It should fail nicely.

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/943f7110-7164-11f1-93a6-16440f096f18-0

kubeConfig *rest.Config
)

g.BeforeEach(func() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hink the platfom check + namespace creation should be in BeforeEach + their cleanup in AfterEach.
Maybe even PVC creation.

rvagner78 and others added 3 commits June 29, 2026 13:44
Ginkgo cancels the BeforeEach context when the hook returns, so derive
the shared test context from context.Background() instead.

Co-authored-by: Cursor <cursoragent@cursor.com>
The thanos-querier route rejects unauthenticated requests with 401.
Use a bearer token from kubeconfig or the prometheus-k8s service account.

Co-authored-by: Cursor <cursoragent@cursor.com>
@rvagner78

Copy link
Copy Markdown
Author

/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview openshift/kubernetes#2671 #714 #715

@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/selinux_mount_readiness.go`:
- Around line 71-73: The standalone timeout context creation in the BeforeEach
hook is triggering the fatcontext lint warning. Move the context setup out of
the BeforeEach function literal into a small helper (for example, a dedicated
function used by selinux_mount_readiness.go) that returns the context and cancel
pair, then call that helper from BeforeEach so the lifetime stays standalone
while satisfying golangci-lint.
- Around line 109-111: The shared-PVC SELinux test currently creates pod2
without ensuring it lands on the same node as pod1, so it can fail for
Multi-Attach instead of validating the SELinux-label behavior. In the SELinux
readiness test around newSelinuxTestPod and the pod2 creation flow, wait for
pod1 to reach Running, read pod1.Spec.NodeName, and force pod2 onto that node
(or use equivalent node affinity) before calling CoreV1().Pods(ns.Name).Create
for each shared-PVC case.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5dafda6e-c1eb-4a8a-b4b7-1540707759e1

📥 Commits

Reviewing files that changed from the base of the PR and between 3c0cbd4 and 4f50231.

📒 Files selected for processing (1)
  • test/e2e/selinux_mount_readiness.go

Comment on lines +71 to +73
// Use a standalone context: Ginkgo cancels the context passed to
// BeforeEach when the hook returns, before the It block runs.
testContext, cancel = context.WithTimeout(context.Background(), testTimeout)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Resolve the fatcontext lint failure before merge.

context.WithTimeout(context.Background(), ...) inside the BeforeEach function literal is currently reported by golangci-lint. Move the context construction into a small helper to preserve the standalone lifetime while satisfying the linter.

Proposed fix
+func newSelinuxReadinessContext() (context.Context, context.CancelFunc) {
+	return context.WithTimeout(context.Background(), testTimeout)
+}
+
...
-			testContext, cancel = context.WithTimeout(context.Background(), testTimeout)
+			testContext, cancel = newSelinuxReadinessContext()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Use a standalone context: Ginkgo cancels the context passed to
// BeforeEach when the hook returns, before the It block runs.
testContext, cancel = context.WithTimeout(context.Background(), testTimeout)
func newSelinuxReadinessContext() (context.Context, context.CancelFunc) {
return context.WithTimeout(context.Background(), testTimeout)
}
...
testContext, cancel = newSelinuxReadinessContext()
🧰 Tools
🪛 golangci-lint (2.12.2)

[error] 73-73: nested context in function literal

(fatcontext)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/selinux_mount_readiness.go` around lines 71 - 73, The standalone
timeout context creation in the BeforeEach hook is triggering the fatcontext
lint warning. Move the context setup out of the BeforeEach function literal into
a small helper (for example, a dedicated function used by
selinux_mount_readiness.go) that returns the context and cancel pair, then call
that helper from BeforeEach so the lifetime stays standalone while satisfying
golangci-lint.

Source: Linters/SAST tools

Comment on lines +109 to +111
g.By(fmt.Sprintf("Creating pod with SELinux level %s", selinuxTestLevelConflicting))
pod2 := newSelinuxTestPod(ns.Name, "selinux-test-pod-2", pvc.Name, selinuxTestLevelConflicting)
pod2, err = kubeClient.CoreV1().Pods(ns.Name).Create(testContext, pod2, metav1.CreateOptions{})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the shared-PVC pod creation paths and whether pod2 is pinned to pod1's node.
rg -n -C3 'ReadWriteOnce|newSelinuxTestPod|NodeName|PodAffinity|selinux-test-pod-2' test/e2e/selinux_mount_readiness.go

Repository: openshift/cluster-storage-operator

Length of output: 3834


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the relevant test flow and pod template helper.
sed -n '80,140p' test/e2e/selinux_mount_readiness.go
printf '\n---\n'
sed -n '180,280p' test/e2e/selinux_mount_readiness.go
printf '\n---\n'
sed -n '280,420p' test/e2e/selinux_mount_readiness.go

Repository: openshift/cluster-storage-operator

Length of output: 12197


Pin pod2 to pod1’s node for the shared PVC tests.
RWO alone doesn’t guarantee co-location, so pod2 can land on another node and get stuck Pending/Multi-Attach instead of exercising the SELinux-label path. Wait for pod1 to be Running, read pod1.Spec.NodeName, and reuse it (or equivalent node affinity) in each shared-PVC case.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/selinux_mount_readiness.go` around lines 109 - 111, The shared-PVC
SELinux test currently creates pod2 without ensuring it lands on the same node
as pod1, so it can fail for Multi-Attach instead of validating the SELinux-label
behavior. In the SELinux readiness test around newSelinuxTestPod and the pod2
creation flow, wait for pod1 to reach Running, read pod1.Spec.NodeName, and
force pod2 onto that node (or use equivalent node affinity) before calling
CoreV1().Pods(ns.Name).Create for each shared-PVC case.

@rvagner78

Copy link
Copy Markdown
Author

/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview openshift/kubernetes#2671 #714 #715

@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ecea6200-73be-11f1-80a5-d953eec5d313-0

@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: This PR was included in a payload test run from #715
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ecea6200-73be-11f1-80a5-d953eec5d313-0

@rvagner78

Copy link
Copy Markdown
Author

/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview openshift/kubernetes#2671 #714 #715

@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ff343440-7490-11f1-893c-a6a79a083dc4-0

@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: This PR was included in a payload test run from #715
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ff343440-7490-11f1-893c-a6a79a083dc4-0

@jsafrane

jsafrane commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial openshift/kubernetes#2671 #714 #715

this should succeed

@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@jsafrane: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@jsafrane

jsafrane commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial openshift/kubernetes#2671 #714

@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@jsafrane: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c7f7fb70-752a-11f1-8353-8747cb10a2fe-0

@jsafrane

jsafrane commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

/payload-job periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial

this should fail, as the implementation is missing in KCM

@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@jsafrane: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/db69db60-752a-11f1-8455-70456461eb30-0

@rvagner78

Copy link
Copy Markdown
Author

/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial openshift/kubernetes#2671 #714 #715

@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5b0f6ec0-7544-11f1-86d8-78f0905bdb7b-0

@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@rvagner78: This PR was included in a payload test run from #715
trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-techpreview-serial-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5b0f6ec0-7544-11f1-86d8-78f0905bdb7b-0

g.By("Creating test namespace")
ns := &v1.Namespace{
ObjectMeta: metav1.ObjectMeta{
GenerateName: "selinux-readiness-test-",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the namespace name should start with e2e-. Not all test namespaces have it, but it's a nice habbit.


// Use a standalone context: Ginkgo cancels the context passed to
// BeforeEach when the hook returns, before the It block runs.
testContext, cancel = context.WithTimeout(context.Background(), testTimeout)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the test steps dependent on each other. If CSO takes 9 minutes to mark a cluster un-upgradeable, then it has remaining 1 minute to check the metrics + alert + wait for cluster upgradeable.

It's better to split them to individual timeouts - ClusterOperator change timeout say 5 minutes, alert 2 minutes, prometheus 2 minutes. Both Prometheus and alert should be already available when CSO marks the cluster upradeable=false, but let's give it some wiggle space.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases, just replace PollUntilContextCancel with PollUntilContextTimeout with the right timeout

return ns
}

func cleanupSelinuxReadinessNamespace(ctx context.Context, kubeClient kubernetes.Interface, namespace string) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func cleanupSelinuxReadinessNamespace(ctx context.Context, kubeClient kubernetes.Interface, namespace string) {
func cleanupSELinuxReadinessNamespace(ctx context.Context, kubeClient kubernetes.Interface, namespace string) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}

func createSelinuxReadinessPVC(ctx context.Context, kubeClient kubernetes.Interface, namespace string) *v1.PersistentVolumeClaim {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func createSelinuxReadinessPVC(ctx context.Context, kubeClient kubernetes.Interface, namespace string) *v1.PersistentVolumeClaim {
func createSELinuxReadinessPVC(ctx context.Context, kubeClient kubernetes.Interface, namespace string) *v1.PersistentVolumeClaim {


func waitForOperatorCondition(ctx context.Context, opClient opclient.Interface, conditionType string, expectedStatus operatorapi.ConditionStatus) error {
return wait.PollUntilContextCancel(ctx, waitPollInterval, true, func(ctx context.Context) (bool, error) {
storage, err := opClient.OperatorV1().Storages().Get(ctx, "cluster", metav1.GetOptions{})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test the overall output, i.e. that ClusterOperator storage has Upgradeable: False.
Testing Storage CR cluster is good to have too.

g.Fail(fmt.Sprintf("Failed to get infrastructure: %v", err))
}
switch infra.Status.PlatformStatus.Type {
case configv1.BareMetalPlatformType, configv1.NonePlatformType:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test can run on any platform, but it must have the default StorageClass with a CSI driver that has SELinuxMount: true.

I.e. list all default storage classes (they have a distinct annotation), in each of them find provisioner: field, and for each provisioner find CSIDriver instance with the same name and its SELinuxMount fields. In case there are multiple default storage classes, all of them must match.

Skip the test if the CSIDriver does not exist or there is no default StorageClass. Throw an error on any other error.

}
})

g.It("should become upgradeable again when a conflicting pod is recreated with a compatible SELinux label", func() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a similar test, but instead of setting the same SELinuxLabel, change SELinuxChangePolicy to Recursive in both Pods. The cluster should get upgradeable too.

And you can easily notice there is a lot of code shared among all e2e tests - always create 2 pods and wait for upgradeable=false, change one or both pods and wait for something. Make a function for this.

It might be better to create Deployments and then edit the Deployments to make the cluster upgradeable. It will reflect better what users do. They don't create Pods.

}
}

func createSelinuxReadinessNamespace(ctx context.Context, kubeClient kubernetes.Interface) *v1.Namespace {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func createSelinuxReadinessNamespace(ctx context.Context, kubeClient kubernetes.Interface) *v1.Namespace {
func createSELinuxReadinessNamespace(ctx context.Context, kubeClient kubernetes.Interface) *v1.Namespace {

Comment on lines +501 to +505
spec, ok := route.Object["spec"].(map[string]interface{})
if !ok {
return "", fmt.Errorf("thanos-querier route has no spec")
}
host, ok := spec["host"].(string)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some nested helpers, try unstructured.NestedString(route.Object, "spec", "host") and unstructured.NestedString(route.Object, "status", "ingress", "0", "host") (AI generated, I am not sure about "0" there)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants