
OCPEDGE-2116: feat: support MAC-address based fencing credentials lookup#1600

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from fracappa:fca/support-mac-based-fencing-credentials on May 28, 2026

Conversation

@fracappa
Contributor

@fracappa fracappa commented Apr 27, 2026

When the installer creates fencing credentials using MAC addresses instead of hostnames, the secrets are named with a SHA256 hash of the normalized MAC (e.g. fencing-credentials-{hash}).

This PR adds a multi-phase fencing secret resolution to GetFencingSecrets:

  1. Hostname: try fencing-credentials-{nodeName} directly
  2. MAC hash: read MAC addresses from node annotation (tnf.openshift.io/mac-addresses), hash each, try fencing-credentials-{hash} by name
  3. Redfish UUID: query each unclaimed fencing secret's Redfish endpoint for the system UUID, match against node.status.nodeInfo.systemUUID

The auth job (per-node) discovers all non-loopback MAC addresses via nsenter and annotates the node, so they are available when the setup/fencing/update-setup jobs resolve secrets.

It also removes the unused machines and baremetalhosts RBAC rules from the TNF clusterrole and adds the patch verb on nodes so the jobs can write the annotation.

Summary by CodeRabbit

  • New Features

    • Improved cluster fencing: multi-node credential resolution and host-to-node matching for more reliable STONITH configuration.
  • Chores

    • Updated cluster RBAC to allow retrieval/listing of machine and baremetalhost resources to support fencing workflows.
  • Tests

    • Added unit tests covering multi-node credential resolution and host-to-node matching.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 27, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 27, 2026

@fracappa: This pull request references OCPEDGE-2116 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

When the installer creates fencing credentials using MAC addresses instead of hostnames, the secrets are named with a SHA256 hash of the normalized MAC (e.g. fencing-credentials-11aa22bb). Add a fallback to GetFencingSecrets that resolves the node's boot MAC from BareMetalHost CRs and computes the matching hash when the hostname-based secret is not found.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2026
@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a dynamic Kubernetes client, replaces per-node fencing secret lookups with a bulk GetFencingSecrets resolver that matches secrets by address/hostname and BMH references, updates ConfigureFencing to accept a dynamic client, wires the client through runners, updates RBAC, and adds tests for multi-node resolution.

Changes

Bulk Fencing Credential Resolution

| Layer | File(s) | Summary |
|---|---|---|
| RBAC permissions and constants | bindata/tnfdeployment/clusterrole.yaml, pkg/tnf/pkg/tools/secrets.go | ClusterRole gains get/list for machine.openshift.io machines and metal3.io baremetalhosts; imports and GVR/constants for dynamic BMH/Machine lookups are added. |
| Runner wiring: dynamic client creation | pkg/tnf/fencing/runner.go, pkg/tnf/setup/runner.go, pkg/tnf/update-setup/runner.go | Each runner creates a dynamic client via dynamic.NewForConfig(clientConfig), returns on error, and passes dyClient into pcs.ConfigureFencing. |
| ConfigureFencing update | pkg/tnf/pkg/pcs/fencing.go | Adds a dynamic.Interface parameter; calls tools.GetFencingSecrets once to obtain secretMap and builds per-node fencing configs from that map instead of making per-node secret calls. |
| Multi-node secret resolver | pkg/tnf/pkg/tools/secrets.go | Replaces GetFencingSecret with GetFencingSecrets, a two-phase lookup: direct per-node Secret fetch (deferring not-found), list unclaimed fencing Secrets, match Secret data["address"] ↔ BMH spec.bmc.address, map BMH→node via Machine.status.nodeRef or node annotation, assign secrets, and error if unresolved nodes remain. |
| Secret helper utilities | pkg/tnf/pkg/tools/secrets.go | Adds listUnclaimedFencingSecrets, matchBMHsToNodes, buildBMHToNodeMap, buildMachineToNodeMap, and buildMachineToNodeMapFromAnnotations to support bulk resolution. |
| Tests for resolver | pkg/tnf/pkg/tools/secrets_test.go | Adds TestGetFencingSecrets with helpers for Secrets, unstructured BMH/Machine, and a fake dynamic client; covers hostname match, BMH match, Machine- and annotation-based mappings, failure cases, and skipping empty node names. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • ardaguclu
  • atiratree
  • fonta-rh
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.25%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Topology-Aware Scheduling Compatibility | ⚠️ Warning | PR adds CronJob with nodeSelector targeting control-plane nodes without topology-aware checks, causing Pending pods on HyperShift. | Add topology checks before applying nodeSelector, or remove master nodeSelector and use only tolerations for compatibility with SNO, HyperShift, and TNF topologies. |

✅ Passed checks (10 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding support for MAC-address based fencing credentials lookup, which is the core purpose of the refactoring across multiple files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR uses standard Go testing, not Ginkgo. No Ginkgo test names found. All test names are static and descriptive with no dynamic content. |
| Test Structure And Quality | ✅ Passed | The test uses standard Go testing with table-driven tests, not Ginkgo. The custom check explicitly targets Ginkgo tests with BeforeEach/AfterEach patterns, so it is not applicable here. |
| Microshift Test Compatibility | ✅ Passed | This PR contains no new Ginkgo e2e tests. The only test file added (secrets_test.go) contains standard Go unit tests using func TestGetFencingSecrets, not Ginkgo patterns. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added in this PR. Only standard Go unit tests (testing.T) were added to pkg/tnf/pkg/tools/secrets_test.go. The check is not applicable. |
| Ote Binary Stdout Contract | ✅ Passed | PR modifies library functions only (no entry points). No direct stdout writes (fmt.Print*, os.Stdout) found. klog is properly configured via k8s/component-base/logs in entry points. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Custom check does not apply: PR adds no Ginkgo e2e tests, only standard Go unit tests. IPv4 addresses in the test are mock data, not e2e test assertions or external connectivity. |
| Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |


@openshift-ci
Contributor

openshift-ci Bot commented Apr 27, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@fracappa fracappa force-pushed the fca/support-mac-based-fencing-credentials branch from cd4d1fd to 90ccef0 Compare April 27, 2026 15:05
@fracappa fracappa force-pushed the fca/support-mac-based-fencing-credentials branch 3 times, most recently from e043c1d to 417dce3 Compare May 14, 2026 17:28

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tnf/pkg/tools/mac.go`:
- Around line 130-166: The loop over bmhList currently returns an error as soon
as it finds a matching BareMetalHost (consumerRef name == machineName) that has
no MACs; change the logic so that when len(macs) == 0 you continue the loop
instead of returning (i.e., replace the early return in the block checking "if
len(macs) == 0 { ... }" with continue) so other matching BMHs are checked; keep
the final return after the loop that errors when no matching BMHs with MACs were
found (preserve the existing error message).
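A stdlib sketch of the suggested fix, using a trimmed-down, hypothetical view of a BareMetalHost. The struct, `macsForMachine`, and the error text are illustrative only; the actual loop and message in mac.go are not shown here.

```go
package main

import "fmt"

// bareMetalHost is a hypothetical, trimmed-down view of a BMH.
type bareMetalHost struct {
	Name        string
	ConsumerRef string   // spec.consumerRef.name
	MACs        []string // MAC addresses reported for the host
}

// macsForMachine returns the MACs of the first BMH consumed by machineName
// that actually has MACs, instead of failing on the first match without any.
func macsForMachine(bmhs []bareMetalHost, machineName string) ([]string, error) {
	for _, h := range bmhs {
		if h.ConsumerRef != machineName {
			continue
		}
		if len(h.MACs) == 0 {
			// The suggested fix: keep scanning rather than returning an error.
			continue
		}
		return h.MACs, nil
	}
	// Only fail once every matching BMH has been checked.
	return nil, fmt.Errorf("no BareMetalHost with MAC addresses found for machine %q", machineName)
}
```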

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1e802cb5-28b6-4f07-929a-e66973b06ec5

📥 Commits

Reviewing files that changed from the base of the PR and between c0614ca and e043c1d.

📒 Files selected for processing (9)
  • bindata/tnfdeployment/clusterrole.yaml
  • pkg/tnf/fencing/runner.go
  • pkg/tnf/pkg/pcs/fencing.go
  • pkg/tnf/pkg/tools/mac.go
  • pkg/tnf/pkg/tools/mac_test.go
  • pkg/tnf/pkg/tools/secrets.go
  • pkg/tnf/pkg/tools/secrets_test.go
  • pkg/tnf/setup/runner.go
  • pkg/tnf/update-setup/runner.go

Comment thread pkg/tnf/pkg/tools/mac.go Outdated
@fracappa fracappa force-pushed the fca/support-mac-based-fencing-credentials branch 2 times, most recently from eb87e9a to 7c7bcba Compare May 19, 2026 08:31
@fracappa fracappa marked this pull request as ready for review May 20, 2026 13:38
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2026
@openshift-ci openshift-ci Bot requested review from clobrano and fonta-rh May 20, 2026 13:38

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/tnf/pkg/tools/secrets_test.go (1)

90-217: ⚡ Quick win

Add a mixed mapping test (partial Machine CRs + node-annotation fallback).

Please add a case where one node resolves via Machine status.nodeRef and another resolves only via machine.openshift.io/machine annotation. That guards the intended “all available signals” behavior from regressing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tnf/pkg/tools/secrets_test.go` around lines 90 - 217, Add a new test case
to the existing tests slice that verifies mixed resolution: one node should be
matched to a secret via a Machine CR exposing status.nodeRef (create a Machine
object whose status.nodeRef.Name equals the node name using the helper
newMachine or a variant that sets status.nodeRef) and the other node should be
matched only via the node annotation "machine.openshift.io/machine" (add the
annotation to the Node object). Use newFencingSecret for secrets and
newBMHWithConsumer to link BMHs to machines, set nodeNames to include both
nodes, and set wantMap to assert each node maps to the correct secret; set
wantErr=false. Ensure the test includes bmhs, machines, nodes, and secrets
entries to exercise both code paths (Machine.status.nodeRef and node annotation
fallback).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tnf/pkg/tools/secrets.go`:
- Around line 185-188: The current logic replaces machineToNode with
annotation-derived map only when buildMachineToNodeMap(ctx, dyClient) returns
empty; instead, call both buildMachineToNodeMap(ctx, dyClient) and
buildMachineToNodeMapFromAnnotations(ctx, kubeClient, nodeNames) and merge them
so any missing node entries in the Machine-CR map are filled from the
annotations. Update the code around machineToNode so that you iterate over the
annotation map and add entries only for keys not present in the Machine-CR map
(preserving Machine-CR values when present) to produce a combined mapping.
- Around line 253-255: The code currently splits annotation into parts using
strings.SplitN(annotation, "/", 2) and only checks parts length and empty name;
update the logic to also validate the annotation namespace (parts[0]) before
using the machine name so you don't accept annotations from other namespaces. In
the block around parts := strings.SplitN(annotation, "/", 2) (where you
currently check len(parts) != 2 || parts[1] == ""), add a check that parts[0]
equals the expected namespace string (e.g., "machine.openshift.io") and continue
if it does not match, ensuring any subsequent use of parts[1] (the machine name)
only occurs when the namespace is verified.
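The two inline suggestions above, together with the nil-map concern raised in a later review, can be sketched in stdlib Go. One caveat: the value of the `machine.openshift.io/machine` node annotation has the form `<namespace>/<machine-name>`, so the namespace worth validating is the one holding Machine objects (assumed here to be `openshift-machine-api`), not the `machine.openshift.io` key prefix offered as an example in the comment. `machineFromAnnotation` and `mergeMachineToNode` are hypothetical names; the PR's real helpers are `buildMachineToNodeMap` and `buildMachineToNodeMapFromAnnotations`.

```go
package main

import "strings"

// machineNamespace is ASSUMED to be the namespace holding Machine objects.
const machineNamespace = "openshift-machine-api"

// machineFromAnnotation parses a "<namespace>/<machine-name>" value from the
// machine.openshift.io/machine node annotation and returns the machine name
// only when the namespace matches machineNamespace.
func machineFromAnnotation(value string) (string, bool) {
	parts := strings.SplitN(value, "/", 2)
	if len(parts) != 2 || parts[1] == "" || parts[0] != machineNamespace {
		return "", false
	}
	return parts[1], true
}

// mergeMachineToNode fills gaps in the Machine-CR map (machine -> node) with
// annotation-derived entries; Machine-CR values win on conflict. It writes
// into a freshly allocated map, so a nil input map cannot cause a panic.
func mergeMachineToNode(fromMachines, fromAnnotations map[string]string) map[string]string {
	merged := make(map[string]string, len(fromMachines)+len(fromAnnotations))
	for machine, node := range fromMachines {
		merged[machine] = node
	}
	for machine, node := range fromAnnotations {
		if _, ok := merged[machine]; !ok {
			merged[machine] = node
		}
	}
	return merged
}
```

Allocating a new map instead of writing into the Machine-CR result avoids the nil-map write entirely, and the mixed case from the nitpick (one node resolved via `status.nodeRef`, another only via the annotation) falls out naturally.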


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 711be4e4-7a94-453a-9b86-a5c15ee74d6f

📥 Commits

Reviewing files that changed from the base of the PR and between 8ede4ee and 7c7bcba.

📒 Files selected for processing (3)
  • pkg/tnf/pkg/pcs/fencing.go
  • pkg/tnf/pkg/tools/secrets.go
  • pkg/tnf/pkg/tools/secrets_test.go

Comment thread pkg/tnf/pkg/tools/secrets.go Outdated
Comment thread pkg/tnf/pkg/tools/secrets.go Outdated
@fracappa fracappa force-pushed the fca/support-mac-based-fencing-credentials branch from 3ed6430 to fd1dfe2 Compare May 20, 2026 15:21

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tnf/pkg/tools/secrets.go`:
- Around line 185-190: The variable machineToNode may be nil if
buildMachineToNodeMap(ctx, dyClient) returns nil, causing a panic when assigning
entries from annotationMachineToNode; before the merge loop (which references
machineToNode and annotationMachineToNode), ensure machineToNode is initialized
(e.g., if machineToNode == nil then allocate a new map[string]string) so that
the subsequent loop that writes into machineToNode (the for loop using
machineName, nodeName) is safe.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f4544ae5-31cd-4131-b069-adf783a51b33

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7bcba and fd1dfe2.

📒 Files selected for processing (7)
  • bindata/tnfdeployment/clusterrole.yaml
  • pkg/tnf/fencing/runner.go
  • pkg/tnf/pkg/pcs/fencing.go
  • pkg/tnf/pkg/tools/secrets.go
  • pkg/tnf/pkg/tools/secrets_test.go
  • pkg/tnf/setup/runner.go
  • pkg/tnf/update-setup/runner.go

Comment thread pkg/tnf/pkg/tools/secrets.go Outdated
@fracappa fracappa force-pushed the fca/support-mac-based-fencing-credentials branch 3 times, most recently from f407e2a to b93b47a Compare May 25, 2026 15:18
@fonta-rh
Contributor

Test suggestion: cover the IsNotFound vs API-error branch in Phase 1 and Phase 2

GetFencingSecrets distinguishes errors.IsNotFound (fall through to next phase) from other API errors (fail immediately) in both Phase 1 (secrets.go:651) and Phase 2's matchMACHashToSecrets (secrets.go:743). This routing logic is load-bearing but currently untested — the fake client returns NotFound for missing objects by default, so only the happy fallthrough path is exercised.

A PrependReactor injecting a non-404 error (e.g., 403 Forbidden) on secret Get would cover the hard-failure branch in each phase. One test case per phase, asserting GetFencingSecrets returns the API error immediately rather than falling through.

Not blocking — the pattern is standard k8s and unlikely to regress accidentally. But it documents the contract and catches future refactors that might accidentally change the error routing.

When the installer creates fencing credentials using MAC addresses
instead of hostnames, the secrets are named with a SHA256 hash of the
normalized MAC (e.g. fencing-credentials-11aa22bb). If neither the
hostname nor the MAC address is provided (or either is provided
incorrectly), the system falls back to matching fencing-credentials
secrets to Nodes by system UUID, queried via Redfish.
@fracappa fracappa force-pushed the fca/support-mac-based-fencing-credentials branch from 0a60beb to 636bb69 Compare May 27, 2026 13:36
@fonta-rh
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2026
@vimauro
Contributor

vimauro commented May 27, 2026

/lgtm

@openshift-ci
Contributor

openshift-ci Bot commented May 27, 2026

@fracappa: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@fonta-rh
Contributor

/approve

@openshift-ci
Contributor

openshift-ci Bot commented May 28, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fonta-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 28, 2026
@fracappa
Contributor Author

/verified by @fracappa

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 28, 2026
@openshift-ci-robot

@fracappa: This PR has been marked as verified by @fracappa.

Details

In response to this:

/verified by @fracappa

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot Bot merged commit 484906e into openshift:main May 28, 2026
17 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants