[OCPCLOUD-907] Add remediationsAllowed field to MHC status #652
Force-pushed from f7ada32 to 726603f
Title changed: "allowedRemediations field to MHC status" to "remediationsAllowed field to MHC status"
/hold cancel
I think things have stabilised on this feature upstream. This PR now matches what is being done upstream, so I think it is good to go into 4.7.
/retest
this generally makes sense to me, i just had a minor question
// ANCHOR: Condition

// Condition defines an observation of a Cluster API resource operational state.
i know this shares some ancestry with upstream, should we say "Machine API" in our version here?
// ANCHOR: Conditions

// Conditions provide observations of the operational state of a Cluster API resource.
same thing here, re: Machine API
Force-pushed from 726603f to e6613a5
@elmiko I addressed your comments and fixed a couple of other issues, please take another look.
thanks Joel, i went over the code again and just had a question about something that didn't make immediate sense to me.
)

// Remediation not allowed, the number of not started or unhealthy machines exceeds maxUnhealthy
mhc.Status.RemediationsAllowed = 0
does this get updated on the object in etcd at some point?
Good catch. I've moved the status reconcile so that it happens for all code paths. Ideally I would spend time on a bigger refactor of this logic so it's more obvious what's going on here.
I will make sure to update the tests before we merge this.
Force-pushed from f659c2f to 164e0e9
/retest
this continues to make sense for me, i just had a small question
// unhealthyMachineCount calculates the number of presently unhealthy or missing machines
// ie the delta between the expected number of machines and the current number deemed healthy
func unhealthyMachineCount(mhc *mapiv1.MachineHealthCheck) int {
	return derefInt(mhc.Status.ExpectedMachines) - derefInt(mhc.Status.CurrentHealthy)
}
is it possible that either of these derefInt calls could fail (eg the value does not exist)?
I think it will default to 0 if the deref fails (I think that's the point of the function), but no matter the behaviour, this hasn't actually changed from before this PR, just moved
ok perfect. thanks!
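For readers following the exchange above, here is a minimal sketch of the behaviour being described. The body of derefInt is not shown in this PR, so the nil-to-zero fallback below is an assumption based on the reply ("I think it will default to 0"), and the status type is a hypothetical stand-in for the relevant MachineHealthCheck status fields rather than the real mapiv1 type.

```go
package main

import "fmt"

// derefInt is an assumed reconstruction, not the code from the PR:
// it returns the pointed-to value, or 0 when the pointer is nil.
func derefInt(i *int) int {
	if i != nil {
		return *i
	}
	return 0
}

// status is a hypothetical stand-in for the two status fields used
// by unhealthyMachineCount in the snippet under review.
type status struct {
	ExpectedMachines *int
	CurrentHealthy   *int
}

// unhealthyMachineCount mirrors the reviewed function: the delta between
// the expected number of machines and the number deemed healthy.
func unhealthyMachineCount(s status) int {
	return derefInt(s.ExpectedMachines) - derefInt(s.CurrentHealthy)
}

func main() {
	expected, healthy := 5, 3
	populated := status{ExpectedMachines: &expected, CurrentHealthy: &healthy}
	fmt.Println(unhealthyMachineCount(populated))

	// With both fields unset, neither deref panics and the delta is zero.
	fmt.Println(unhealthyMachineCount(status{}))
}
```

Under this assumption an unset field simply counts as zero, so the subtraction never dereferences a nil pointer.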
Force-pushed from 164e0e9 to 5431a50
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: elmiko
/retest
/lgtm
/retest
/retest
/retest
This adds a `remediationsAllowed` field to the MHC status that tells a user how many more remediations will be allowed before the controller starts short-circuiting remediations.

For example, if the value of the field is 1, one more machine can be remediated before `maxUnhealthy` blocks remediation. If the value is 0, no more remediations are allowed because the limit of `maxUnhealthy` has been reached.

This also adds a condition that shows whether remediation is currently considered to be allowed or not.
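The relationship described above can be sketched as follows. This is an illustration only, not the controller code from this PR: the function name is hypothetical, and it assumes `maxUnhealthy` has already been resolved to an absolute machine count (the real field may also be expressed as a percentage).

```go
package main

import "fmt"

// remediationsAllowed is a hypothetical helper: it returns how many more
// machines could become unhealthy and still be remediated before
// maxUnhealthy blocks remediation. Assumes maxUnhealthy is an absolute count.
func remediationsAllowed(maxUnhealthy, unhealthy int) int {
	if unhealthy >= maxUnhealthy {
		// Limit reached or exceeded: no further remediations are allowed.
		return 0
	}
	return maxUnhealthy - unhealthy
}

func main() {
	// One unhealthy machine against a limit of two leaves one remediation.
	fmt.Println(remediationsAllowed(2, 1))
	// At the limit, the field reads zero.
	fmt.Println(remediationsAllowed(2, 2))
	// Past the limit, the value is clamped at zero rather than going negative.
	fmt.Println(remediationsAllowed(2, 3))
}
```

Clamping at zero matches the diff snippet earlier in the review, where `mhc.Status.RemediationsAllowed` is set to 0 once the number of unhealthy machines exceeds `maxUnhealthy`.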

This is the OpenShift counterpart to kubernetes-sigs/cluster-api#3372

/hold
Want to make sure we agree on the naming upstream before merging this.