📖 Proposal: Improving status in CAPI resources #10897
+1
I can take some AI / PR changes when / if needed.
I did a first pass. Thanks a lot @fabriziopandini, this is an excellent write-up!
I might have missed it, but I didn't see anything about the meaning of an absent condition. Do we want to state that our core conditions must always be set to true/false/unknown, so that absence indicates a controller operational issue? Thoughts?
First off, @fabriziopandini I gotta say this is FANTASTIC and AMAZING work, and thank you so much for starting this.
Seriously, it shows care for our users, and the proposal is a great read.
K8s guidelines state that:
I personally think that once we abide by the first guideline above, the absence of conditions will signal that a controller has not yet reconciled an object for the first time. It won't, however, surface controller operational issues that happen after the first reconciliation (ObservedGeneration is probably a better signal for this).
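To make the distinction concrete, here is a minimal Go sketch of how a client might read the two signals. The `Condition` type is a simplified stand-in for `metav1.Condition`, and the `Signal` names and `interpret` helper are hypothetical, introduced only for illustration; whether ObservedGeneration is tracked per condition or on the status as a whole is also an assumption here.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition (assumption:
// only the fields needed for this illustration are modelled).
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	ObservedGeneration int64
}

// Signal is a hypothetical classification of what a client can infer.
type Signal string

const (
	NotYetReconciled Signal = "NotYetReconciled" // condition is absent
	Stale            Signal = "Stale"            // condition predates the current spec
	Current          Signal = "Current"          // condition reflects the current spec
)

// interpret classifies one condition type against the object's
// metadata.generation.
func interpret(conds []Condition, condType string, generation int64) Signal {
	for _, c := range conds {
		if c.Type != condType {
			continue
		}
		if c.ObservedGeneration < generation {
			return Stale
		}
		return Current
	}
	return NotYetReconciled
}

func main() {
	conds := []Condition{{Type: "Ready", Status: "True", ObservedGeneration: 2}}
	fmt.Println(interpret(nil, "Ready", 1))   // absent: never reconciled
	fmt.Println(interpret(conds, "Ready", 3)) // present but behind the spec
	fmt.Println(interpret(conds, "Ready", 2)) // present and up to date
}
```

In this sketch, absence only tells you about the first reconciliation, while a lagging ObservedGeneration is what hints at a controller that has stopped keeping up.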
@enxebre @JoelSpeed @vincepri, thanks for the first round of comments, that's awesome. I will allow some more time for others' comments to come in and keep the discussion threads going async; then we should probably have a discussion (maybe in the office hours) to address a few key points.
Co-authored-by: Stefan Büringer <buringerst@vmware.com>
@enxebre, @vincepri, @sbueringer, @chrischdi, @killianmuldoon, @neolit123, @JoelSpeed, @peterochodo, @zjs, @nrb + everyone who provided feedback on this proposal: Thanks! The doc is ready for a final pass, looking forward to starting the implementation!
As discussed in the Sep 4th office hours meeting, the lazy consensus deadline is set for Sep 13th.
Generally +1 to go ahead, don't think I've left any blocking comments but have left a few nits that can be sorted during implementation.
I did have one question about the readiness gates, though: why haven't we allowed negative polarity conditions there?
// Conditions represent the observations of a Machine's current state.
// +optional
// +listType=map
// +listMapKey=type
Conditions []metav1.Condition `json:"conditions,omitempty"`
Nits that don't necessarily need to be fixed here.
- Idiomatically, in Kube API types we always put the Conditions as the first field in the struct; it may be nice to follow this convention.
- It's also typical to list the known condition types in the field comment. When we implement this, could we add that list? Something like "Known condition types are Foo, Bar and Baz."
+1
noted in the tracking issue for the implementation
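For readers less familiar with the `+listType=map` / `+listMapKey=type` markers on the quoted field: they declare that the list is keyed by `type`, so there is at most one entry per condition type. A minimal sketch of that behaviour follows, using a simplified stand-in for `metav1.Condition` and a hypothetical `setCondition` helper. In real code the apimachinery helper `meta.SetStatusCondition` plays this role; the version below is only an illustration, not that implementation.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition (assumption:
// only a few fields are modelled).
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
	Reason string
}

// setCondition inserts c, or replaces the existing entry with the same
// Type, so that Type stays unique within the list (map-list semantics).
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False", Reason: "Provisioning"})
	conds = setCondition(conds, Condition{Type: "Available", Status: "True", Reason: "Available"})
	// Setting "Ready" again replaces the first entry instead of appending.
	conds = setCondition(conds, Condition{Type: "Ready", Status: "True", Reason: "Ready"})
	fmt.Println(len(conds), conds[0].Status)
}
```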
- `HealthCheckSucceeded` and `OwnerRemediated` (or `ExternalRemediationRequestAvailable`) conditions are set by the
MachineHealthCheck controller in case a MachineHealthCheck targets the machine.
- KubeadmControlPlane also adds additional conditions to Machines, but those conditions are not included in the table above
for sake of simplicity (however they are documented in the KubeadmControlPlane paragraph).
Have we thought about what might happen if an MHC adds this condition, but then is modified/removed and no longer targets the Machine? Is there something to GC these?
This is MHC behaviour not impacted by this proposal, but I think it is not an issue.
- If MHC is removed after marking a machine as unhealthy, the machine gets remediated
- If MHC is removed after marking a machine healthy, the machine is considered healthy until another MHC instance takes over
Notes:
- Both `MinReadySeconds` and `ReadinessGates` should be treated as other in-place propagated fields (changing them should not trigger rollouts).
- Similarly to Pod's `ReadinessGates`, also Machine's `ReadinessGates` accept only conditions with positive polarity;
The Cluster API project might revisit this in the future to stay aligned with Kubernetes or if there are use cases justifying this change.
So "a machine is ready when it is not paused" cannot currently be represented, then?
Adding support for negative polarity conditions is tracked in #11105 as a follow-up, once we have a first implementation up and running.
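The limitation discussed in this thread can be sketched as follows. This assumes, as with Pod readiness gates, that a gate is satisfied only when the named condition has status `True`; the `ReadinessGate` shape, the `gatesSatisfied` helper, and the use of a `Paused` condition as a gate are hypothetical simplifications for illustration.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// ReadinessGate names a condition that must be True (hypothetical,
// simplified shape).
type ReadinessGate struct {
	ConditionType string
}

// gatesSatisfied reports whether every gate's condition is present with
// status "True". A missing condition, "False" or "Unknown" fails the gate.
func gatesSatisfied(conds []Condition, gates []ReadinessGate) bool {
	status := make(map[string]string, len(conds))
	for _, c := range conds {
		status[c.Type] = c.Status
	}
	for _, g := range gates {
		if status[g.ConditionType] != "True" {
			return false
		}
	}
	return true
}

func main() {
	gates := []ReadinessGate{{ConditionType: "Paused"}}
	// Paused=False means "not paused", which is the healthy state for a
	// negative polarity condition, yet a positive-only gate rejects it.
	fmt.Println(gatesSatisfied([]Condition{{Type: "Paused", Status: "False"}}, gates))
	// Paused=True satisfies the gate, which is the opposite of the intent.
	fmt.Println(gatesSatisfied([]Condition{{Type: "Paused", Status: "True"}}, gates))
}
```

With positive-only gates, the only way to express "ready when not paused" in this sketch would be a separate condition with inverted meaning, which is the gap the follow-up issue is meant to address.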
// A Cluster is available if:
// * Cluster's `RemoteConnectionProbe` and `TopologyReconciled` conditions are true and
// * the control plane `Available` condition is true and
// * all worker resource's `Available` conditions are true and
// * all conditions defined in AvailabilityGates are true as well.
Nit: formatting like this won't carry over when you generate it into a CRD schema, and it will look awkward when you use kubectl explain. Better to use complete sentences.
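Independently of how the comment ends up being worded, the rule quoted above reads as a conjunction of four checks. A minimal sketch follows; the `clusterAvailable` and `isTrue` helpers are hypothetical, `Condition` is a simplified stand-in for `metav1.Condition`, and looking up availability gates among the Cluster's own conditions is an assumption rather than something stated in the quoted text.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// isTrue reports whether the named condition is present with status "True".
func isTrue(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// clusterAvailable applies the four rules from the quoted comment.
// Assumption: gate conditions are read from the Cluster's own conditions.
func clusterAvailable(cluster, controlPlane []Condition, workers [][]Condition, gates []string) bool {
	if !isTrue(cluster, "RemoteConnectionProbe") || !isTrue(cluster, "TopologyReconciled") {
		return false
	}
	if !isTrue(controlPlane, "Available") {
		return false
	}
	for _, w := range workers {
		if !isTrue(w, "Available") {
			return false
		}
	}
	for _, g := range gates {
		if !isTrue(cluster, g) {
			return false
		}
	}
	return true
}

func main() {
	cluster := []Condition{
		{Type: "RemoteConnectionProbe", Status: "True"},
		{Type: "TopologyReconciled", Status: "True"},
	}
	controlPlane := []Condition{{Type: "Available", Status: "True"}}
	workers := [][]Condition{{{Type: "Available", Status: "True"}}}

	fmt.Println(clusterAvailable(cluster, controlPlane, workers, nil))
	// "MyGate" is a hypothetical gate that is not set, so the Cluster
	// is not considered available.
	fmt.Println(clusterAvailable(cluster, controlPlane, workers, []string{"MyGate"}))
}
```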
Great work, thx!! /lgtm
LGTM label has been added. Git tree hash: c908bd77c697fa157becaaa374e6376ad7ce0cf3
/lgtm
/lgtm 🎉 Thanks!
/lgtm /hold
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: sbueringer
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Lazy consensus deadline passed
/lgtm
What this PR does / why we need it:
This is a proposal on how we can improve status in v1Beta2 Cluster API resources, addressing several pieces of feedback in our backlog (see #10852 for a great wrap-up) and taking an important step towards the v1 API.
@enxebre, @vincepri, @sbueringer, @chrischdi, @killianmuldoon what I'm looking for at this stage is feedback on the general direction.
/area api