OCPBUGS-4820: Controller version mismatch causing degradation during upgrades #3738
Conversation
@djoshy: This pull request references Jira Issue OCPBUGS-4820, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@djoshy: This pull request references Jira Issue OCPBUGS-4820, which is valid. 3 validation(s) were run on this bug.
Requesting review from QA contact.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Skipping CI for Draft Pull Request.
/test e2e-gcp-op
Force-pushed: 2dc8aa3 → 22ceec8
/test e2e-gcp-op
Force-pushed: 22ceec8 → 5d51044
/test e2e-gcp-op
Force-pushed: 5d51044 → b39b8c4
/retest-required
Force-pushed: b39b8c4 → 2fd7124
/test verify
Force-pushed: 2fd7124 → ad1f537
/retest-required
```diff
-func (ctrl *Controller) getCurrentEtcdLeader(candidates []*corev1.Node) (*corev1.Node, error) {
-	return nil, nil
+// getCurrentEtcdLeaderName fetches the name of the current node running the machine-config-operator pod
+func (ctrl *Controller) getCurrentEtcdLeaderName() (string, error) {
```
But if it's about the MCO pod now, why not rename the function too? The etcd leader can be on a different node than the operator pod, unless I'm missing something...
There is an interesting topic here, though: it might actually be advantageous for us to keep the etcd leader and the operator and controller pods co-located, at least in a steady state, because that node should be upgraded last.
Hmm, yeah, I can see why a name change to just get the operator pod node might be more accurate! The way I understood the etcd landscape is: there are two leader elections at startup, one for the operator and one for the controller, and therefore there are two etcd leaders (just in two different elections). Is that not the case?
I like the idea of co-locating them too; I'm not sure why they were split up, but I can dig into the costs/benefits of that!
Ah... so, very briefly: etcd is a replicated key-value store that is the data storage heart of Kubernetes. Underneath, etcd uses raft, which is a generic consensus protocol and has the concept of a "leader".
I think what you're conflating with that is that our operator and controller pods use a "leader election" protocol too, via the apiserver (which uses etcd, which uses raft) - but this is at an entirely different level.
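For background, this component-level "leader election" is typically implemented with client-go's lease-based helper: the "lock" is an ordinary API object, entirely unrelated to etcd's internal raft leader. A minimal sketch, with illustrative names, namespace, and timings (not the MCO's actual configuration):

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The "lock" is a coordination.k8s.io Lease object. Whichever replica
	// holds it is the leader of *this component only*; it has no
	// relationship to etcd's internal raft leader.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{Name: "example-controller", Namespace: "example-ns"},
		Client:    client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: os.Getenv("POD_NAME"), // this replica's identity
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { klog.Info("became leader, starting work") },
			OnStoppedLeading: func() { klog.Info("lost lease, shutting down") },
		},
	})
}
```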
Gotcha, makes sense - I'll update the function name to be more accurate. I'll also make a follow-up spike/task to explore putting the pods and the etcd-leader on the same node. Thanks!
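A hypothetical sketch of what such a renamed helper could look like (the namespace and label selector below are assumptions for illustration, not the PR's actual code):

```go
import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getOperatorNodeName returns the name of the node hosting the
// machine-config-operator pod, found via the pod's label selector.
func getOperatorNodeName(ctx context.Context, client kubernetes.Interface) (string, error) {
	pods, err := client.CoreV1().Pods("openshift-machine-config-operator").List(ctx, metav1.ListOptions{
		LabelSelector: "k8s-app=machine-config-operator", // assumed label
	})
	if err != nil {
		return "", err
	}
	for _, p := range pods.Items {
		if p.Spec.NodeName != "" {
			return p.Spec.NodeName, nil // node the operator pod is scheduled on
		}
	}
	return "", fmt.Errorf("no scheduled machine-config-operator pod found")
}
```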
Force-pushed: ad1f537 → ff4ec93
Verified using IPI on AWS. To verify the order used to update the master nodes, we used the cordon command and deleted the machine-config-operator pod so that it was rescheduled onto each master node in turn, then forced an update of the master MCP by applying a config. The expected result is that the master nodes are updated alphabetically by the zone they are in, us-east-2a -> us-east-2b -> us-east-2c, except that the node hosting the machine-config-operator pod should be the last one updated. These were the results:
The results match the expected behavior. We have run the following tests to make sure that the order used to update MCPs is not broken by this PR:
They both passed. We add the qe-approved label. /label qe-approved Thank you very much!
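A hypothetical sketch of that placement-and-update check (node names, label selector, and namespace are illustrative assumptions):

```sh
# Cordon a master so the operator pod cannot land there, then delete the pod
# so the scheduler re-places it on an uncordoned master.
oc adm cordon <master-node-in-us-east-2a>
oc -n openshift-machine-config-operator delete pod -l k8s-app=machine-config-operator
oc adm uncordon <master-node-in-us-east-2a>

# Force a master MCP update by applying a trivial MachineConfig, then watch
# the order in which the master nodes go through cordon/drain/update.
oc get nodes -l node-role.kubernetes.io/master --watch
```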
Force-pushed: 072b13a → bf9d7be
/retest-required
/retest
/retest-required
/retest-required
```diff
 // nolint:unparam
 func (ctrl *Controller) filterControlPlaneCandidateNodes(pool *mcfgv1.MachineConfigPool, candidates []*corev1.Node, capacity uint) ([]*corev1.Node, uint, error) {
 	if len(candidates) <= 1 {
 		return candidates, capacity, nil
 	}
-	etcdLeader, err := ctrl.getCurrentEtcdLeader(candidates)
+	operatorNodeName, err := ctrl.getOperatorNodeName()
 	if err != nil {
 		glog.Warningf("Failed to find current etcd leader (continuing anyways): %v", err)
```
I think the warning message needs to be updated to mention the operator pod.
glog.Infof("Deferring update of etcd leader: %s", node.Name) | ||
if node.Name == operatorNodeName { | ||
ctrl.eventRecorder.Eventf(pool, corev1.EventTypeNormal, "DeferringOperatorNodeUpdate", "Deferring update of machine config operator node %s", node.Name) | ||
glog.Infof("Deferring update of machine config operator node: %s", node.Name) |
Let's move to using klog, as that is what we will be using from now on in the MCO for logging; see #3734.
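For illustration, the swap is mechanical; a minimal runnable sketch, assuming the standard k8s.io/klog/v2 package (whose Infof mirrors glog's):

```go
package main

import "k8s.io/klog/v2"

func main() {
	nodeName := "example-node" // illustrative value
	// glog.Infof(...) becomes klog.Infof(...) with the same format-string API.
	klog.Infof("Deferring update of machine config operator node: %s", nodeName)
	klog.Flush() // ensure buffered log lines are written before exit
}
```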
pkg/operator/sync.go (outdated diff)

```diff
@@ -812,7 +827,7 @@ func (optr *Operator) syncRequiredMachineConfigPools(_ *renderConfig) error {
 	}
 	// If we don't account for pause here, we will spin in this loop until we hit the 10 minute timeout because paused pools can't sync.
 	if pool.Spec.Paused {
-		return false, fmt.Errorf("Required MachineConfigPool '%s' is paused and can not sync until it is unpaused", pool.Name)
+		return false, fmt.Errorf("error required pool %s is paused and cannot sync until it is unpaused", pool.Name)
```
s/pool/MachineConfigPool/ for verbose messaging
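That is, the suggested wording would be something like (a sketch of the proposed message, not the final committed line):

```go
return false, fmt.Errorf("error required MachineConfigPool %s is paused and cannot sync until it is unpaused", pool.Name)
```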
Force-pushed: bf9d7be → 192b802
Rebased and made suggested changes @sinnykumari, thanks! (:
Nice work David!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: djoshy, sinnykumari. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@djoshy: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Merged fc79ca7 into openshift:master
@djoshy: Jira Issue OCPBUGS-4820: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-4820 has been moved to the MODIFIED state.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
- Added a new function called getOperatorNodeName()
- Fleshed out the findEtcdLeader stub in the controller that helps us queue control plane nodes for updates; a sketch of the queuing idea follows below. I'll do a subsequent commit for bumping the timeout, but want to run a few tests with the queuing in place first.
Description for the changelog:
controller: defer update of node running operator
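For illustration, a minimal sketch of that queuing idea (a hypothetical helper, not the PR's exact code): reorder the update candidates so the node hosting the operator pod goes last.

```go
import corev1 "k8s.io/api/core/v1"

// moveOperatorNodeLast reorders candidates so that the node currently hosting
// the operator pod is updated after every other control plane node.
func moveOperatorNodeLast(candidates []*corev1.Node, operatorNodeName string) []*corev1.Node {
	ordered := make([]*corev1.Node, 0, len(candidates))
	var deferred *corev1.Node
	for _, node := range candidates {
		if node.Name == operatorNodeName {
			deferred = node // hold back the operator's node
			continue
		}
		ordered = append(ordered, node)
	}
	if deferred != nil {
		ordered = append(ordered, deferred) // operator's node updates last
	}
	return ordered
}
```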