-
Couldn't load subscription status.
- Fork 576
OCPEDGE-2084: Add PacemakerStatus CRD for two-node fencing #2544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Hello @jaypoulz! Some important instructions when contributing to openshift/api: |
|
@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
58218ce to
96e327f
Compare
2ba442d to
29b9fec
Compare
|
@jaypoulz thank you for the PR, do you mind making the CI happy? |
29b9fec to
26f7821
Compare
|
Hi @saschagrunert :) Working on it! :D |
|
A few open questions I have:
That said, it doesn't work like a normal config - there's no spec and it shouldn't be created during bootstrap. The CRD just needs to be present when the CEO runs an cronjob to post an update to it.
|
b0ff230 to
1b57b09
Compare
b9b727f to
fdd53e9
Compare
|
Yeah, I'll ignore the CI failures for now, running
I'm new to API review, but my gut feeling tells me that a dedicated
You can also try to run it in a container by
Do you mind elaborating on that? Do you mean generating the code for the unions? API docs ref: https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#writing-a-union-in-go @jaypoulz is there an OpenShift enhancement available for this change? |
etcd/tnf/v1alpha1/tests/pacemakerstatuses.tnf.etcd.openshift.io/DualReplica.yaml
Outdated
Show resolved
Hide resolved
3f45017 to
2fb0282
Compare
8979f47 to
6ca958d
Compare
|
@saschagrunert I think I hit all of your comments. I've also asked pacemaker expert CLumens from the RHEL team to make sure I wasn't misrepresenting anything in the new spec. |
|
/retest |
| // ipv4Address is the IPv4 address of the node, if registered via IPv4 | ||
| // +kubebuilder:validation:MinLength=7 | ||
| // +kubebuilder:validation:MaxLength=15 | ||
| // +kubebuilder:validation:Pattern="^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$" | ||
| // +optional | ||
| IPv4Address string `json:"ipv4Address,omitempty"` | ||
|
|
||
| // ipv6Address is the IPv6 address of the node, if registered via IPv6 | ||
| // +kubebuilder:validation:MinLength=2 | ||
| // +kubebuilder:validation:MaxLength=39 | ||
| // +kubebuilder:validation:Format=ipv6 | ||
| // +kubebuilder:validation:Pattern=`^(([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$` | ||
| // +optional | ||
| IPv6Address string `json:"ipv6Address,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CEL has IP validations that support both IPv4 and IPv6. It would be better to combine them and use the CEL validations instead, sorry for the back and forth here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no no I appreciate this :D
I can see the API getting better with each revision 🥂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on what I saw "canonical" seems to be the way to test for a valid IPv4 or IPv6 address.
Added validation based on what I saw elsewhere in API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no, this needs further work. Canonical is a useful check but it doesn't guarantee that you have a usable individual IP. Adding more checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saschagrunert so it turns out the version of schema checker is too old to support ip(self).isCanonical()
I've added parsing for the IP in the code that invokes the API, so I think it's overkill to update schema checker just for this, but I wanted to explain why it's no longer in the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the second variation I tried. The validation error I was referring to is this one:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_api/2544/pull-ci-openshift-api-master-verify-crd-schema/1983194099787763712
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of why isIP() is not sufficient:
It depends how strict we want to be in this API. The IPs we use are expected to identify the nodes as their endpoint-identities for etcd. So they should be unique, they should (ideally) be in their canonical form, and they should be the kinds of IPs that are not reserved for special cases.
I defaulted to more strict because I've never written one of these so I decided to err on the side of adding the restrictions that made sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it back and we I'll defer to your guidance on how to proceed :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking the docs https://github.com/kubernetes/kubernetes/blob/3daf280c464c712f38fe2a24d9434fcf2670c251/staging/src/k8s.io/apiserver/pkg/cel/library/ip.go#L76
Looks like ip.isCanonical(self) might be the right incantation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange o.O
I'll try it 😺
3e02535 to
e6b5c99
Compare
|
|
||
| // nodeHistory provides recent operation history for troubleshooting | ||
| // When present, it must be a list of 1 or more PacemakerNodeHistoryEntry objects. | ||
| // When not present, the node history is not available. This is the expected status for a healthy cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't think that an empty node history is expected status. Assuming that this is basically <node_history/> from crm_mon, you should have a tree with at least start and monitor operations for every resource on each node. If there's no history, I would assume there's no running resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we allow this to be empty is that we only push up "recent" information. Basically, we are trying to collect the information from pacemaker that would indicate that we've gone off the rails. So before this API is invoked, we gather all of the information, then filter out any node history event that isn't within the last 5 minutes.
For fencing history, we carry failures longer - a 24 hour long context window.
So it's not a full 1-1 mapping. :) I'll make a note of these in the API.
This information is used for event records only. I don't think we need to be exhaustive about all events, just warning that something happened within the last n minutes or hours is needed for the event record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this check runs every 30 seconds, and events get reported exactly once (deduplication is done on the client side).
| // nodeHistory provides recent operation history for troubleshooting | ||
| // When present, it must be a list of 1 or more PacemakerNodeHistoryEntry objects. | ||
| // When not present, the node history is not available. This is the expected status for a healthy cluster. | ||
| // Node history being capped at 16 is a reasonable limit to prevent abuse of the API, since the action history reported by the cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the number of resources you've got running, 16 may be too low. On my test cluster, each resource has two history entries just from starting up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 6 (2 kubelet, 2 etcd, 2 fencing-agents)
I can bump it to 32, but would have the same concern given the context that we only show node history for the last 5 minutes of history?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More specfically:
(pre-API) Events reported = events that occured in the last 5 minutes running every 30s
(post-API) Events presented to user = events that occured in the last 5 minutes - events already reported running every 30s
| // +kubebuilder:validation:Minimum=0 | ||
| // +kubebuilder:validation:Maximum=16 | ||
| // +optional | ||
| ResourcesTotal *int32 `json:"resourcesTotal,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you care about maintenance mode or Pacemaker Remote nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TNF doesn't use either of these. If we end up needing to introduce maintenance mode for whatever reason, some extensions to the API would be needed. Likewise, I don't see us ever supporting remote nodes.
That said, is there a specific reason you highlighted this concern for resourcesTotal? Or was this a general question for why we don't check for this when we gather node info?
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=256 | ||
| // +optional | ||
| Node string `json:"node,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you support clone resources? If so, those can run on multiple nodes at the same time in which case making this some sort of list type would make more sense to me. Also if you care about clones, keep in mind that the name of the primitive resource being cloned is not unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do support clone resources. Both etcd and kublet rune as clone resources. This is why the expected number of resources is 6 (clones for etcd and kubelet), and unique fencing agents for both nodes.
Currently when we build out the error message, we go through them all individually. Grouping them is an interesting idea. It could improve visual clarity, But it's seems like something we can do during rendering. Treating each resource as unique feels simpler.
d29f516 to
cf53006
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from an API Shadow review perspective.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: saschagrunert The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
|
||
| // PacemakerDaemonStateType represents the state of the pacemaker daemon | ||
| // +kubebuilder:validation:Enum=Running;KnownNotRunning | ||
| type PacemakerDaemonStateType string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to add docs about the possible values here as well. If so, then the same would apply to QuorumStatusType, NodeOnlineStatusType, NodeModeType, ResourceRoleType, ResourceActiveStatusType, FencingActionType and FencingStatusType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add them just for completeness :)
|
Since @saschagrunert has said this is good from his side, I'll now take over the API review. Since it's shift week, I'm not expecting to pick this up until Monday |
|
Sounds good to me! :) |
Introduces etcd.openshift.io/v1alpha1 API group with a PacemakerCluster custom resource. This provides visibility into Pacemaker cluster health for Two Node Fencing (TNF) etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. This API is not explicitly gated because it's only created by CEO once the transition to an ExternalEtcd has occured. This means that it is naturally gated by the TNF topology.
cf53006 to
df97bb6
Compare
|
@jaypoulz: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Introduces tnf.etcd.openshift.io/v1alpha1 API group with PacemakerStatus custom resource. This provides visibility into Pacemaker cluster health for dual-replica etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. Not gated because it's only used by CEO when two-node has transitioned.
Works in conjunction with openshift/cluster-etcd-operator#1487