OCPEDGE-2116: feat: support MAC-address based fencing credentials lookup by fracappa · Pull Request #1600 · openshift/cluster-etcd-operator

fracappa · 2026-04-27T15:04:09Z

When the installer creates fencing credentials using MAC addresses instead of hostnames, the secrets are named with a SHA256 hash of the normalized MAC (e.g. fencing-credentials-{hash}).

This PR adds multi-phase fencing secret resolution to GetFencingSecrets:

1. Hostname: try fencing-credentials-{nodeName} directly.
2. MAC hash: read the MAC addresses from the node annotation (tnf.openshift.io/mac-addresses), hash each one, and try fencing-credentials-{hash} by name.
3. Redfish UUID: query each unclaimed fencing secret's Redfish endpoint for the system UUID and match it against node.status.nodeInfo.systemUUID.
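The MAC-hash phase can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the normalization rule (trimmed, lowercase, colon-separated) and the use of the full hex digest as the suffix are assumptions, and `normalizeMAC` and `secretNameForMAC` are hypothetical names. The installer's actual normalization and any truncation of the hash may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// normalizeMAC is a hypothetical normalization: trim whitespace, lowercase,
// and use ':' as the separator. The installer's real rule may differ.
func normalizeMAC(mac string) string {
	mac = strings.ToLower(strings.TrimSpace(mac))
	return strings.ReplaceAll(mac, "-", ":")
}

// secretNameForMAC returns the candidate secret name for one MAC address.
// Assumption: the full SHA-256 hex digest is used as the name suffix.
func secretNameForMAC(mac string) string {
	sum := sha256.Sum256([]byte(normalizeMAC(mac)))
	return "fencing-credentials-" + hex.EncodeToString(sum[:])
}

func main() {
	// Both spellings normalize to the same string, so they print the same name.
	for _, mac := range []string{"AA-BB-CC-DD-EE-FF", "aa:bb:cc:dd:ee:ff"} {
		fmt.Println(secretNameForMAC(mac))
	}
}
```

Hashing a normalized form means the lookup does not depend on how a MAC address was written at install time, provided both sides apply the same normalization.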

The auth job (per-node) discovers all non-loopback MAC addresses via nsenter and annotates the node, so they are available when the setup/fencing/update-setup jobs resolve secrets.
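A rough sketch of the discovery step, for illustration only. The PR runs the discovery through nsenter; this example instead reads interfaces from the current network namespace with Go's `net.Interfaces`, and the `nic` type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// nic is a minimal, hypothetical view of a network interface.
type nic struct {
	Name     string
	MAC      string
	Loopback bool
}

// hostNICs lists the interfaces visible in the current network namespace.
func hostNICs() ([]nic, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	nics := make([]nic, 0, len(ifaces))
	for _, i := range ifaces {
		nics = append(nics, nic{
			Name:     i.Name,
			MAC:      i.HardwareAddr.String(), // empty string when there is no MAC
			Loopback: i.Flags&net.FlagLoopback != 0,
		})
	}
	return nics, nil
}

// nonLoopbackMACs keeps the MAC of every interface that is not a loopback
// device and actually has a hardware address.
func nonLoopbackMACs(nics []nic) []string {
	var macs []string
	for _, n := range nics {
		if n.Loopback || n.MAC == "" {
			continue
		}
		macs = append(macs, n.MAC)
	}
	return macs
}

func main() {
	nics, err := hostNICs()
	if err != nil {
		fmt.Println("listing interfaces failed:", err)
		return
	}
	fmt.Println(nonLoopbackMACs(nics))
}
```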

It also removes the unused machines and baremetalhosts RBAC rules from the TNF ClusterRole and adds the patch verb on nodes so the annotation can be written.

Summary by CodeRabbit

New Features
- Improved cluster fencing: multi-node credential resolution and host-to-node matching for more reliable STONITH configuration.
Chores
- Updated cluster RBAC to allow retrieval/listing of machine and baremetalhost resources to support fencing workflows.
Tests
- Added unit tests covering multi-node credential resolution and host-to-node matching.

openshift-ci-robot · 2026-04-27T15:04:13Z

@fracappa: This pull request references OCPEDGE-2116 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

When the installer creates fencing credentials using MAC addresses instead of hostnames, the secrets are named with a SHA256 hash of the normalized MAC (e.g. fencing-credentials-11aa22bb). Add a fallback to GetFencingSecrets that resolves the node's boot MAC from BareMetalHost CRs and computes the matching hash when the hostname-based secret is not found.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-04-27T15:04:25Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds a dynamic Kubernetes client, replaces per-node fencing secret lookups with a bulk GetFencingSecrets resolver that matches secrets by address/hostname and BMH references, updates ConfigureFencing to accept a dynamic client, wires the client through runners, updates RBAC, and adds tests for multi-node resolution.

Changes

Bulk Fencing Credential Resolution

| Layer | File(s) | Summary |
|---|---|---|
| RBAC permissions and constants | `bindata/tnfdeployment/clusterrole.yaml`, `pkg/tnf/pkg/tools/secrets.go` | ClusterRole gains `get`/`list` for `machine.openshift.io` `machines` and `metal3.io` `baremetalhosts`; imports and GVR/constants for dynamic BMH/Machine lookups are added. |
| Runner wiring: dynamic client creation | `pkg/tnf/fencing/runner.go`, `pkg/tnf/setup/runner.go`, `pkg/tnf/update-setup/runner.go` | Each runner creates a dynamic client via `dynamic.NewForConfig(clientConfig)`, returns on error, and passes `dyClient` into `pcs.ConfigureFencing`. |
| ConfigureFencing update | `pkg/tnf/pkg/pcs/fencing.go` | Adds `dynamic.Interface` parameter; calls `tools.GetFencingSecrets` once to obtain `secretMap` and builds per-node fencing configs from that map instead of per-node secret calls. |
| Multi-node secret resolver | `pkg/tnf/pkg/tools/secrets.go` | Replaces `GetFencingSecret` with `GetFencingSecrets`, a two-phase lookup: direct per-node Secret fetch (defer not-found), list unclaimed fencing Secrets, match by Secret `data["address"]` ↔ BMH `spec.bmc.address`, map BMH→node via Machine.status.nodeRef or node annotation, assign secrets, and error if unresolved nodes remain. |
| Secret helper utilities | `pkg/tnf/pkg/tools/secrets.go` | Adds `listUnclaimedFencingSecrets`, `matchBMHsToNodes`, `buildBMHToNodeMap`, `buildMachineToNodeMap`, and `buildMachineToNodeMapFromAnnotations` to support bulk resolution. |
| Tests for resolver | `pkg/tnf/pkg/tools/secrets_test.go` | Adds `TestGetFencingSecrets` with helpers for Secrets, unstructured BMH/Machine, fake dynamic client; covers hostname-match, BMH-match, machine- and annotation-based mappings, failure cases, and skipping empty node names. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

ardaguclu
atiratree
fonta-rh

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 31.25% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Topology-Aware Scheduling Compatibility	⚠️ Warning	PR adds CronJob with nodeSelector targeting control-plane nodes without topology-aware checks, causing Pending pods on HyperShift.	Add topology checks before applying nodeSelector, or remove master nodeSelector and use only tolerations for compatibility with SNO, HyperShift, and TNF topologies.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: adding support for MAC-address based fencing credentials lookup, which is the core purpose of the refactoring across multiple files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR uses standard Go testing, not Ginkgo. No Ginkgo test names found. All test names are static and descriptive with no dynamic content.
Test Structure And Quality	✅ Passed	The test uses standard Go testing with table-driven tests, not Ginkgo. Custom check explicitly targets Ginkgo tests with BeforeEach/AfterEach patterns, so not applicable here.
Microshift Test Compatibility	✅ Passed	This PR contains no new Ginkgo e2e tests. The only test file added (secrets_test.go) contains standard Go unit tests using func TestGetFencingSecrets, not Ginkgo patterns.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added in this PR. Only standard Go unit tests (testing.T) were added to pkg/tnf/pkg/tools/secrets_test.go. The check is not applicable.
Ote Binary Stdout Contract	✅ Passed	PR modifies library functions only (no entry points). No direct stdout writes (fmt.Print*, os.Stdout) found. klog properly configured via k8s/component-base/logs in entry points.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Custom check does not apply: PR adds no Ginkgo e2e tests, only standard Go unit tests. IPv4 addresses in test are mock data, not e2e test assertions or external connectivity.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

openshift-ci · 2026-04-27T15:04:26Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tnf/pkg/tools/mac.go`:
- Around line 130-166: The loop over bmhList currently returns an error as soon
as it finds a matching BareMetalHost (consumerRef name == machineName) that has
no MACs; change the logic so that when len(macs) == 0 you continue the loop
instead of returning (i.e., replace the early return in the block checking "if
len(macs) == 0 { ... }" with continue) so other matching BMHs are checked; keep
the final return after the loop that errors when no matching BMHs with MACs were
found (preserve the existing error message).
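
The control flow this comment asks for can be sketched as below. It is an illustration under assumptions, not the code in mac.go: the `bareMetalHost` struct is a hypothetical stand-in for the unstructured BareMetalHost objects, and the error text is invented for the example.

```go
package main

import "fmt"

// bareMetalHost is a hypothetical, simplified stand-in for a BareMetalHost CR.
type bareMetalHost struct {
	Name         string
	ConsumerName string   // corresponds to the consumerRef name in the comment
	MACs         []string // MAC addresses known for this host
}

// macsForMachine returns the MACs of the first matching host that has any.
// A matching host without MACs is skipped rather than treated as fatal.
func macsForMachine(hosts []bareMetalHost, machineName string) ([]string, error) {
	for _, h := range hosts {
		if h.ConsumerName != machineName {
			continue
		}
		if len(h.MACs) == 0 {
			continue // keep looking: another matching host may carry MACs
		}
		return h.MACs, nil
	}
	// Reached only when no matching host with MACs exists.
	return nil, fmt.Errorf("no BareMetalHost with MAC addresses found for machine %q", machineName)
}

func main() {
	hosts := []bareMetalHost{
		{Name: "bmh-a", ConsumerName: "machine-0"},
		{Name: "bmh-b", ConsumerName: "machine-0", MACs: []string{"aa:bb:cc:dd:ee:ff"}},
	}
	fmt.Println(macsForMachine(hosts, "machine-0"))
}
```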

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1e802cb5-28b6-4f07-929a-e66973b06ec5

📥 Commits

Reviewing files that changed from the base of the PR and between c0614ca and e043c1d.

📒 Files selected for processing (9)

bindata/tnfdeployment/clusterrole.yaml
pkg/tnf/fencing/runner.go
pkg/tnf/pkg/pcs/fencing.go
pkg/tnf/pkg/tools/mac.go
pkg/tnf/pkg/tools/mac_test.go
pkg/tnf/pkg/tools/secrets.go
pkg/tnf/pkg/tools/secrets_test.go
pkg/tnf/setup/runner.go
pkg/tnf/update-setup/runner.go

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

pkg/tnf/pkg/tools/secrets_test.go (1)

90-217: ⚡ Quick win

Add a mixed mapping test (partial Machine CRs + node-annotation fallback).

Please add a case where one node resolves via Machine status.nodeRef and another resolves only via machine.openshift.io/machine annotation. That guards the intended “all available signals” behavior from regressing.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tnf/pkg/tools/secrets_test.go` around lines 90 - 217, Add a new test case
to the existing tests slice that verifies mixed resolution: one node should be
matched to a secret via a Machine CR exposing status.nodeRef (create a Machine
object whose status.nodeRef.Name equals the node name using the helper
newMachine or a variant that sets status.nodeRef) and the other node should be
matched only via the node annotation "machine.openshift.io/machine" (add the
annotation to the Node object). Use newFencingSecret for secrets and
newBMHWithConsumer to link BMHs to machines, set nodeNames to include both
nodes, and set wantMap to assert each node maps to the correct secret; set
wantErr=false. Ensure the test includes bmhs, machines, nodes, and secrets
entries to exercise both code paths (Machine.status.nodeRef and node annotation
fallback).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tnf/pkg/tools/secrets.go`:
- Around line 185-188: The current logic replaces machineToNode with
annotation-derived map only when buildMachineToNodeMap(ctx, dyClient) returns
empty; instead, call both buildMachineToNodeMap(ctx, dyClient) and
buildMachineToNodeMapFromAnnotations(ctx, kubeClient, nodeNames) and merge them
so any missing node entries in the Machine-CR map are filled from the
annotations. Update the code around machineToNode so that you iterate over the
annotation map and add entries only for keys not present in the Machine-CR map
(preserving Machine-CR values when present) to produce a combined mapping.
- Around line 253-255: The code currently splits annotation into parts using
strings.SplitN(annotation, "/", 2) and only checks parts length and empty name;
update the logic to also validate the annotation namespace (parts[0]) before
using the machine name so you don't accept annotations from other namespaces. In
the block around parts := strings.SplitN(annotation, "/", 2) (where you
currently check len(parts) != 2 || parts[1] == ""), add a check that parts[0]
equals the expected namespace string (e.g., "machine.openshift.io") and continue
if it does not match, ensuring any subsequent use of parts[1] (the machine name)
only occurs when the namespace is verified.
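
The namespace check from the second comment could look roughly like this. It is a sketch under assumptions: `machineNameFromAnnotation` is a hypothetical helper, the expected namespace is passed in as a parameter, and the `openshift-machine-api` value in the usage is an assumption about where Machine objects live, to be replaced with whatever the real code expects.

```go
package main

import (
	"fmt"
	"strings"
)

// machineNameFromAnnotation parses a "<namespace>/<name>" annotation value and
// returns the machine name only when the namespace matches wantNamespace.
func machineNameFromAnnotation(value, wantNamespace string) (string, bool) {
	parts := strings.SplitN(value, "/", 2)
	if len(parts) != 2 || parts[1] == "" {
		return "", false // malformed value or missing machine name
	}
	if parts[0] != wantNamespace {
		return "", false // annotation points at a different namespace
	}
	return parts[1], true
}

func main() {
	// The namespace below is an assumed example value.
	name, ok := machineNameFromAnnotation("openshift-machine-api/master-0", "openshift-machine-api")
	fmt.Println(name, ok)
}
```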

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 711be4e4-7a94-453a-9b86-a5c15ee74d6f

📥 Commits

Reviewing files that changed from the base of the PR and between 8ede4ee and 7c7bcba.

📒 Files selected for processing (3)

pkg/tnf/pkg/pcs/fencing.go
pkg/tnf/pkg/tools/secrets.go
pkg/tnf/pkg/tools/secrets_test.go

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tnf/pkg/tools/secrets.go`:
- Around line 185-190: The variable machineToNode may be nil if
buildMachineToNodeMap(ctx, dyClient) returns nil, causing a panic when assigning
entries from annotationMachineToNode; before the merge loop (which references
machineToNode and annotationMachineToNode), ensure machineToNode is initialized
(e.g., if machineToNode == nil then allocate a new map[string]string) so that
the subsequent loop that writes into machineToNode (the for loop using
machineName, nodeName) is safe.
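
Taken together with the earlier merge suggestion, a nil-safe merge could be sketched as follows. `mergeMachineToNode` is a hypothetical helper written for illustration; the real code merges inline around `machineToNode` and `annotationMachineToNode`.

```go
package main

import "fmt"

// mergeMachineToNode fills gaps in the Machine-CR map with annotation-derived
// entries. Existing Machine-CR values win; a nil first map is allocated first
// so that writing into it cannot panic.
func mergeMachineToNode(fromMachines, fromAnnotations map[string]string) map[string]string {
	if fromMachines == nil {
		fromMachines = make(map[string]string)
	}
	for machineName, nodeName := range fromAnnotations {
		if _, exists := fromMachines[machineName]; !exists {
			fromMachines[machineName] = nodeName
		}
	}
	return fromMachines
}

func main() {
	var fromMachines map[string]string // nil, as when no Machine CRs were found
	merged := mergeMachineToNode(fromMachines, map[string]string{"machine-0": "node-0"})
	fmt.Println(merged)
}
```

Writing to a nil map panics in Go while reading from one does not, which is why the allocation has to happen before the loop.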

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f4544ae5-31cd-4131-b069-adf783a51b33

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7bcba and fd1dfe2.

📒 Files selected for processing (7)

bindata/tnfdeployment/clusterrole.yaml
pkg/tnf/fencing/runner.go
pkg/tnf/pkg/pcs/fencing.go
pkg/tnf/pkg/tools/secrets.go
pkg/tnf/pkg/tools/secrets_test.go
pkg/tnf/setup/runner.go
pkg/tnf/update-setup/runner.go

fonta-rh · 2026-05-27T12:54:23Z

Test suggestion: cover the IsNotFound vs API-error branch in Phase 1 and Phase 2

GetFencingSecrets distinguishes errors.IsNotFound (fall through to next phase) from other API errors (fail immediately) in both Phase 1 (secrets.go:651) and Phase 2's matchMACHashToSecrets (secrets.go:743). This routing logic is load-bearing but currently untested — the fake client returns NotFound for missing objects by default, so only the happy fallthrough path is exercised.

A PrependReactor injecting a non-404 error (e.g., 403 Forbidden) on secret Get would cover the hard-failure branch in each phase. One test case per phase, asserting GetFencingSecrets returns the API error immediately rather than falling through.

Not blocking — the pattern is standard k8s and unlikely to regress accidentally. But it documents the contract and catches future refactors that might accidentally change the error routing.
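
The contract described here can be illustrated with a small standard-library sketch. It is not the suggested test itself: instead of a fake Kubernetes client with PrependReactor, it uses a function-valued `getter`, and `errNotFound` and `errForbidden` are stand-ins for the Kubernetes NotFound and Forbidden API errors. All names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for Kubernetes API errors (assumptions for this sketch).
var (
	errNotFound  = errors.New("not found")
	errForbidden = errors.New("forbidden")
)

// getter fetches a secret by name; a hypothetical seam for injecting errors.
type getter func(name string) (string, error)

func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

// resolve tries each candidate name in order. NotFound falls through to the
// next candidate; any other error aborts immediately.
func resolve(get getter, candidates []string) (string, error) {
	for _, name := range candidates {
		secret, err := get(name)
		if err == nil {
			return secret, nil
		}
		if isNotFound(err) {
			continue // fall through to the next phase
		}
		return "", fmt.Errorf("get secret %q: %w", name, err)
	}
	return "", errNotFound
}

func main() {
	forbidden := func(name string) (string, error) { return "", errForbidden }
	_, err := resolve(forbidden, []string{"by-hostname", "by-mac-hash"})
	fmt.Println(err)
}
```

A test built on this shape asserts two things per phase: the non-NotFound error is returned, and no later candidate is attempted.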

When the installer creates fencing credentials using MAC addresses instead of hostnames, the secrets are named with a SHA256 hash of the normalized MAC (e.g. fencing-credentials-11aa22bb). If neither a hostname nor a MAC address is provided (or either is provided incorrectly), the system falls back to matching fencing credentials to Nodes by UUID, using Redfish.

fonta-rh · 2026-05-27T13:43:34Z

/lgtm

vimauro · 2026-05-27T14:24:03Z

/lgtm

openshift-ci · 2026-05-27T18:35:06Z

@fracappa: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

fonta-rh · 2026-05-28T10:58:50Z

/approve

openshift-ci · 2026-05-28T10:59:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fonta-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~bindata/tnfdeployment/OWNERS~~ [fonta-rh]
~~pkg/tnf/OWNERS~~ [fonta-rh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fracappa · 2026-05-28T11:47:37Z

/verified by @fracappa

openshift-ci-robot · 2026-05-28T11:47:53Z

@fracappa: This PR has been marked as verified by @fracappa.

openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Apr 27, 2026

openshift-ci Bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Apr 27, 2026

fracappa force-pushed the fca/support-mac-based-fencing-credentials branch from cd4d1fd to 90ccef0 on April 27, 2026 15:05

fracappa force-pushed the fca/support-mac-based-fencing-credentials branch 3 times, most recently from e043c1d to 417dce3, on May 14, 2026 17:28

coderabbitai Bot reviewed on May 14, 2026 (comment thread on pkg/tnf/pkg/tools/mac.go, now outdated)

fracappa force-pushed the fca/support-mac-based-fencing-credentials branch 2 times, most recently from eb87e9a to 7c7bcba, on May 19, 2026 08:31

fracappa marked this pull request as ready for review on May 20, 2026 13:38

openshift-ci Bot removed the do-not-merge/work-in-progress label on May 20, 2026

openshift-ci Bot requested review from clobrano and fonta-rh on May 20, 2026 13:38

coderabbitai Bot reviewed on May 20, 2026 (two comment threads on pkg/tnf/pkg/tools/secrets.go, now outdated)

fracappa force-pushed the fca/support-mac-based-fencing-credentials branch from 3ed6430 to fd1dfe2 on May 20, 2026 15:21

coderabbitai Bot reviewed on May 20, 2026 (comment thread on pkg/tnf/pkg/tools/secrets.go, now outdated)

fracappa force-pushed the fca/support-mac-based-fencing-credentials branch 3 times, most recently from f407e2a to b93b47a, on May 25, 2026 15:18

fracappa force-pushed the fca/support-mac-based-fencing-credentials branch from 0a60beb to 636bb69 on May 27, 2026 13:36

openshift-ci Bot assigned fonta-rh on May 27, 2026

openshift-ci Bot added the lgtm label (indicates that a PR is ready to be merged) on May 27, 2026

openshift-ci Bot assigned vimauro on May 27, 2026

openshift-ci Bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on May 28, 2026

openshift-ci-robot added the verified label (signifies that the PR passed pre-merge verification criteria) on May 28, 2026

openshift-merge-bot Bot merged commit 484906e into openshift:main on May 28, 2026
17 checks passed
