
Conversation


@jaypoulz jaypoulz commented Oct 21, 2025

Introduces tnf.etcd.openshift.io/v1alpha1 API group with PacemakerStatus custom resource. This provides visibility into Pacemaker cluster health for dual-replica etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. Not gated because it's only used by CEO when two-node has transitioned.

Works in conjunction with openshift/cluster-etcd-operator#1487

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 21, 2025

openshift-ci-robot commented Oct 21, 2025

@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Introduces tnf.etcd.openshift.io/v1alpha1 API group with PacemakerStatus custom resource. This provides visibility into Pacemaker cluster health for dual-replica etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. Gated by DualReplica feature and managed by two-node-fencing component.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


openshift-ci bot commented Oct 21, 2025

Hello @jaypoulz! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 21, 2025
@openshift-ci openshift-ci bot added the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Oct 21, 2025

openshift-ci-robot commented Oct 21, 2025

@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Introduces tnf.etcd.openshift.io/v1alpha1 API group with PacemakerStatus custom resource. This provides visibility into Pacemaker cluster health for dual-replica etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. Gated by DualReplica feature and managed by two-node-fencing component.

Works in conjunction with openshift/cluster-etcd-operator#1487


@openshift-ci openshift-ci bot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Oct 21, 2025
@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 4 times, most recently from 2ba442d to 29b9fec Compare October 21, 2025 23:56
@saschagrunert
Member

@jaypoulz thank you for the PR, do you mind making the CI happy?

@jaypoulz
Author

Hi @saschagrunert :) Working on it! :D
New to this repo so working through beginner challenges 😸

@jaypoulz
Author

A few open questions I have:

  1. This is a config object of a sort. It's created by cluster-etcd-operator only when you have a two-node cluster and only for the purposes of gathering information about the health of pacemaker (our HA tool) from the nodes. I put it in etcd/tnf (two node fencing) because it seemed sensible. But I'm not sure if it needs to be in config.

That said, it doesn't work like a normal config - there's no spec and it shouldn't be created during bootstrap. The CRD just needs to be present when the CEO runs a cronjob to post an update to it.

  2. bash hack/update-protobuf.sh failed for me because it expected the repo to be under my GOPATH. That said, Cursor happily runs it and copies over the files without issue. I'm just skeptical of the zz_generated files, but I assume those are verified by CI?

  3. For the non-boolean enum fields: should I be creating static string definitions that can be exported to CEO? How do I generate those?

@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 2 times, most recently from b0ff230 to 1b57b09 Compare October 22, 2025 16:59
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 22, 2025
@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 4 times, most recently from b9b727f to fdd53e9 Compare October 22, 2025 20:37

saschagrunert commented Oct 23, 2025

Yeah, I'll ignore the CI failures for now, running ./hack/update-codegen.sh locally also gives me a diff in openapi/generated_openapi/zz_generated.openapi.go. 🙃

A few open questions I have:

  1. This is a config object of a sort. It's created by cluster-etcd-operator only when you have a two-node cluster and only for the purposes of gathering information about the health of pacemaker (our HA tool) from the nodes. I put it in etcd/tnf (two node fencing) because it seemed sensible. But I'm not sure if it needs to be in config.

I'm new to API review, but my gut feeling tells me that a dedicated etcd API group sounds fine for that purpose.

That said, it doesn't work like a normal config - there's no spec and it shouldn't be created during bootstrap. The CRD just needs to be present when the CEO runs a cronjob to post an update to it.

  2. bash hack/update-protobuf.sh failed for me because it expected the repo to be under my GOPATH. That said, Cursor happily runs it and copies over the files without issue. I'm just skeptical of the zz_generated files, but I assume those are verified by CI?

You can also try to run it in a container via make verify-with-container.

  3. For the non-boolean enum fields: should I be creating static string definitions that can be exported to CEO? How do I generate those?

Do you mind elaborating on that? Do you mean generating the code for the unions?

API docs ref: https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#writing-a-union-in-go


@jaypoulz is there an OpenShift enhancement available for this change?

@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 3 times, most recently from 3f45017 to 2fb0282 Compare October 24, 2025 21:15
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 24, 2025
@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 5 times, most recently from 8979f47 to 6ca958d Compare October 28, 2025 00:42
@jaypoulz
Author

@saschagrunert I think I hit all of your comments. I've also asked pacemaker expert CLumens from the RHEL team to make sure I wasn't misrepresenting anything in the new spec.

@saschagrunert
Member

/retest

Comment on lines 289 to 302
// ipv4Address is the IPv4 address of the node, if registered via IPv4
// +kubebuilder:validation:MinLength=7
// +kubebuilder:validation:MaxLength=15
// +kubebuilder:validation:Pattern="^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
// +optional
IPv4Address string `json:"ipv4Address,omitempty"`

// ipv6Address is the IPv6 address of the node, if registered via IPv6
// +kubebuilder:validation:MinLength=2
// +kubebuilder:validation:MaxLength=39
// +kubebuilder:validation:Format=ipv6
// +kubebuilder:validation:Pattern=`^(([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$`
// +optional
IPv6Address string `json:"ipv6Address,omitempty"`

@saschagrunert saschagrunert Oct 28, 2025


CEL has IP validations that support both IPv4 and IPv6. It would be better to combine them and use the CEL validations instead, sorry for the back and forth here:

https://github.com/kubernetes/kubernetes/blob/f0ed028e753f97f8b74044c75b8d746e1dce00c6/staging/src/k8s.io/apiserver/pkg/cel/library/ip.go#L30-L125

Author


no no I appreciate this :D
I can see the API getting better with each revision 🥂

Author


Based on what I saw, "canonical" seems to be the way to test for a valid IPv4 or IPv6 address.
I've added validation based on what I saw elsewhere in the API.

Author


Actually no, this needs further work. Canonical is a useful check, but it doesn't guarantee that you have a usable individual IP. Adding more checks.

Author


@saschagrunert so it turns out the version of the schema checker is too old to support ip(self).isCanonical().
I've added parsing for the IP in the code that invokes the API, so I think it's overkill to update the schema checker just for this, but I wanted to explain why it's no longer in the diff.
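That client-side parsing can be sketched with Go's net/netip, where the canonical-form check falls out of a round-trip comparison (the function name and exact policy here are illustrative, not the PR's code):

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateNodeIP parses an address and checks that it round-trips to the
// same string, i.e. that it was supplied in canonical form.
func validateNodeIP(s string) (netip.Addr, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return netip.Addr{}, fmt.Errorf("invalid IP %q: %w", s, err)
	}
	if addr.String() != s {
		return netip.Addr{}, fmt.Errorf("IP %q is not in canonical form (want %q)", s, addr)
	}
	return addr, nil
}

func main() {
	// "2001:0db8::1" parses, but its canonical form is "2001:db8::1",
	// so the round-trip check rejects it.
	for _, s := range []string{"192.168.1.10", "2001:db8::1", "2001:0db8::1"} {
		if _, err := validateNodeIP(s); err != nil {
			fmt.Println("reject:", err)
		} else {
			fmt.Println("accept:", s)
		}
	}
}
```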

Author


In terms of why isIP() is not sufficient:
It depends how strict we want to be in this API. The IPs we use are expected to identify the nodes as their endpoint-identities for etcd. So they should be unique, they should (ideally) be in their canonical form, and they should be the kinds of IPs that are not reserved for special cases.

I defaulted to stricter validation because I've never written one of these, so I decided to err on the side of adding the restrictions that made sense to me.

Author


I'll add it back and I'll defer to your guidance on how to proceed :)

Contributor


Just double checking the docs https://github.com/kubernetes/kubernetes/blob/3daf280c464c712f38fe2a24d9434fcf2670c251/staging/src/k8s.io/apiserver/pkg/cel/library/ip.go#L76

Looks like ip.isCanonical(self) might be the right incantation

Author


Strange o.O
I'll try it 😺

@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 3 times, most recently from 3e02535 to e6b5c99 Compare October 28, 2025 17:20

// nodeHistory provides recent operation history for troubleshooting
// When present, it must be a list of 1 or more PacemakerNodeHistoryEntry objects.
// When not present, the node history is not available. This is the expected status for a healthy cluster.


I wouldn't think that an empty node history is expected status. Assuming that this is basically <node_history/> from crm_mon, you should have a tree with at least start and monitor operations for every resource on each node. If there's no history, I would assume there's no running resources.

Author

@jaypoulz jaypoulz Oct 28, 2025


The reason we allow this to be empty is that we only push up "recent" information. Basically, we are trying to collect the information from pacemaker that would indicate that we've gone off the rails. So before this API is invoked, we gather all of the information, then filter out any node history event that isn't within the last 5 minutes.

For fencing history, we carry failures longer: a 24-hour context window.

So it's not a full 1-1 mapping. :) I'll make a note of these in the API.
This information is used for event records only. I don't think we need to be exhaustive about all events; just a warning that something happened within the last n minutes or hours is all that's needed for the event record.

Author


Also, this check runs every 30 seconds, and events get reported exactly once (deduplication is done on the client side).

// nodeHistory provides recent operation history for troubleshooting
// When present, it must be a list of 1 or more PacemakerNodeHistoryEntry objects.
// When not present, the node history is not available. This is the expected status for a healthy cluster.
// Node history being capped at 16 is a reasonable limit to prevent abuse of the API, since the action history reported by the cluster


Depending on the number of resources you've got running, 16 may be too low. On my test cluster, each resource has two history entries just from starting up.

Author


We have 6 (2 kubelet, 2 etcd, 2 fencing agents).
I can bump it to 32, but I'd have the same concern either way, given that we only show node history for the last 5 minutes.

Author

@jaypoulz jaypoulz Oct 28, 2025


More specifically:
(pre-API) Events reported = events that occurred in the last 5 minutes, running every 30s
(post-API) Events presented to user = events that occurred in the last 5 minutes minus events already reported, running every 30s

// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=16
// +optional
ResourcesTotal *int32 `json:"resourcesTotal,omitempty"`


Do you care about maintenance mode or Pacemaker Remote nodes?

Author


TNF doesn't use either of these. If we end up needing to introduce maintenance mode for whatever reason, some extensions to the API would be needed. Likewise, I don't see us ever supporting remote nodes.

That said, is there a specific reason you highlighted this concern for resourcesTotal? Or was it a general question about why we don't check for this when we gather node info?

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=256
// +optional
Node string `json:"node,omitempty"`


Do you support clone resources? If so, those can run on multiple nodes at the same time in which case making this some sort of list type would make more sense to me. Also if you care about clones, keep in mind that the name of the primitive resource being cloned is not unique.

Author


We do support clone resources. Both etcd and kubelet run as clone resources. This is why the expected number of resources is 6 (clones for etcd and kubelet, plus unique fencing agents for both nodes).

Currently, when we build out the error message, we go through them all individually. Grouping them is an interesting idea. It could improve visual clarity, but it seems like something we can do during rendering. Treating each resource as unique feels simpler.
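A rendering-time grouping of clone instances could be sketched like this (the types and names are hypothetical, not from the PR):

```go
package main

import (
	"fmt"
	"sort"
)

// resourceStatus is a simplified stand-in for a per-resource status entry.
// For clones, Name (the primitive's name) is not unique across entries.
type resourceStatus struct {
	Name string
	Node string
}

// groupByName collapses clone instances into one entry per resource name,
// listing every node the resource is active on. The API keeps one entry per
// instance; this grouping happens only when rendering the error message.
func groupByName(resources []resourceStatus) map[string][]string {
	grouped := make(map[string][]string)
	for _, r := range resources {
		grouped[r.Name] = append(grouped[r.Name], r.Node)
	}
	for _, nodes := range grouped {
		sort.Strings(nodes) // deterministic output for rendering
	}
	return grouped
}

func main() {
	rs := []resourceStatus{
		{Name: "etcd", Node: "node-0"},
		{Name: "etcd", Node: "node-1"},
		{Name: "kubelet", Node: "node-0"},
	}
	for name, nodes := range groupByName(rs) {
		fmt.Println(name, nodes)
	}
}
```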

@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 3 times, most recently from d29f516 to cf53006 Compare October 28, 2025 23:11

@saschagrunert saschagrunert left a comment


LGTM from an API Shadow review perspective.


openshift-ci bot commented Oct 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: saschagrunert
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saschagrunert
Member

/retest


// PacemakerDaemonStateType represents the state of the pacemaker daemon
// +kubebuilder:validation:Enum=Running;KnownNotRunning
type PacemakerDaemonStateType string
Member


We may need to add docs about the possible values here as well. If so, then the same would apply to QuorumStatusType, NodeOnlineStatusType, NodeModeType, ResourceRoleType, ResourceActiveStatusType, FencingActionType and FencingStatusType

Author


I'll add them just for completeness :)
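Documenting the possible values usually means doc comments on exported constants for each enum member; a sketch for one of the types (the constant names and wording below are illustrative, not the PR's final text):

```go
package main

import "fmt"

// PacemakerDaemonStateType represents the observed state of the pacemaker daemon.
// +kubebuilder:validation:Enum=Running;KnownNotRunning
type PacemakerDaemonStateType string

const (
	// DaemonStateRunning means the pacemaker daemon responded and reported
	// itself as active on the node.
	DaemonStateRunning PacemakerDaemonStateType = "Running"
	// DaemonStateKnownNotRunning means the daemon was queried and confirmed
	// not running, as opposed to simply being unreachable.
	DaemonStateKnownNotRunning PacemakerDaemonStateType = "KnownNotRunning"
)

func main() {
	fmt.Println(DaemonStateRunning, DaemonStateKnownNotRunning)
}
```

The same pattern would repeat for QuorumStatusType, NodeOnlineStatusType, and the other enum types named above.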

@JoelSpeed
Contributor

Since @saschagrunert has said this is good from his side, I'll now take over the API review. Since it's shift week, I'm not expecting to pick this up until Monday

@jaypoulz
Author

Sounds good to me! :)

Introduces etcd.openshift.io/v1alpha1 API group with a PacemakerCluster
custom resource. This provides visibility into Pacemaker cluster health for
Two Node Fencing (TNF) etcd deployments. The status-only resource is populated by a
privileged controller and consumed by the cluster-etcd-operator healthcheck
controller. This API is not explicitly gated because it's only created by CEO
once the transition to an ExternalEtcd has occurred. This means that it is
naturally gated by the TNF topology.

openshift-ci bot commented Oct 29, 2025

@jaypoulz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn df97bb6 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
