test: Every control plane component should have PDB #26160
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: ravisantoshgudimetla. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from 8274722 to 089b74e.
}
var inHAMode bool
if infra.Status.InfrastructureTopology == oapi.HighlyAvailableTopologyMode {
	inHAMode = true
This seems like it should be orthogonal. Can't we get at this just by ignoring replicas < 2
workloads? Because even if they allow surging, we don't surge on drains.
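The reviewer's suggestion above can be sketched as a tiny predicate: rather than branching on the cluster's infrastructure topology, skip any workload running fewer than two replicas, since a single-replica workload is interruptible by definition. `needsPDB` is a hypothetical helper, not the PR's actual code.

```go
package main

import "fmt"

// needsPDB returns true only for workloads that run multiple replicas.
// A workload with replicas < 2 cannot be made drain-safe by a PDB anyway,
// so topology mode never needs to enter the decision.
func needsPDB(replicas int32) bool {
	return replicas >= 2
}

func main() {
	for _, r := range []int32{1, 2, 3} {
		fmt.Printf("replicas=%d needsPDB=%v\n", r, needsPDB(r))
	}
}
```

This keeps the check orthogonal to topology: on single-node or compact clusters the replica counts are already low, so the same rule applies without a special case.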
	inHAMode = true
}
//controlPlaneStaticPodNames := sets.NewString("etcd", "kube-apiserver", "kube-controller-manager",
//	"openshift-kube-scheduler")
nit: did you mean to drop this comment now that you have `getAllControlPlaneStaticPods`?
Of course :(
selectorLabels = t.Spec.Selector
replicas = t.Status.DesiredNumberScheduled
default:
	panic("not an object")
nit: `panic(fmt.Sprintf("unrecognized workload type %T", workload))` or some such to give more context?
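The `%T` verb the reviewer suggests formats a value's dynamic type, so the panic message names exactly what fell through the type switch. A minimal sketch (the `deployment`/`statefulSet` types are stand-ins for the real `appsv1` types):

```go
package main

import "fmt"

// Stand-in workload types; the real code switches over *appsv1.Deployment,
// *appsv1.StatefulSet, etc.
type deployment struct{}
type statefulSet struct{}

// replicasFor panics with the dynamic type of any unrecognized workload,
// which is far easier to debug than a bare "not an object".
func replicasFor(workload interface{}) int32 {
	switch workload.(type) {
	case deployment, statefulSet:
		return 3 // illustrative fixed value
	default:
		panic(fmt.Sprintf("unrecognized workload type %T", workload))
	}
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	replicasFor("not a workload")
	// prints: recovered: unrecognized workload type string
}
```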
if !strings.Contains(meta.Namespace, "openshift-") {
	// Not every component is following the convention of -operator at the end, there are some namespaces
	// like openshift-monitoring and openshift-network-diagnostics and some of them run on worker nodes
	// as well. So, need to figure out how to exclude operator components and components running on worker node
I don't understand the need to exclude operators from this. If we have an operator with replicas > 1, can't we apply the same PDB considerations to it as well?
The goal here is to get control plane components most of which run on master nodes.
OpenShift-core operators will also run on the control-plane nodes. If any of them have replicas > 1 (maybe none of them), why wouldn't we want them covered by PDBs?
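One subtlety in the quoted namespace filter: `strings.Contains` matches the substring anywhere in the name, while `strings.HasPrefix` expresses the actual intent (namespaces in the `openshift-` prefix) and cannot match a name that merely embeds the substring. A small sketch, with `isOpenShiftNamespace` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// isOpenShiftNamespace reports whether a namespace belongs to the
// openshift- prefix. HasPrefix avoids false positives that Contains
// would produce for names like "my-openshift-copy".
func isOpenShiftNamespace(ns string) bool {
	return strings.HasPrefix(ns, "openshift-")
}

func main() {
	for _, ns := range []string{"openshift-monitoring", "my-openshift-copy", "default"} {
		fmt.Printf("%s -> %v\n", ns, isOpenShiftNamespace(ns))
	}
	// prints:
	// openshift-monitoring -> true
	// my-openshift-copy -> false
	// default -> false
}
```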
	e2e.Failf("unable to list statefulsets: %v", err)
}
// ovn-kube, openshift-controller-manager are running as DS.
daemonsets, err := kubeClient.AppsV1().DaemonSets("").List(context.Background(), metav1.ListOptions{})
DaemonSets aren't drained. There may still be caps around how many we want to disrupt by shutting down nodes, but I dunno if PDBs actually protect them from this today.
Many things don't require PDBs, like the machine-api. That runs with 1 replica, short periods of unavailability while it is rescheduled to another host are fine.
daemonsets can't be evicted today via drain, so enforcing PDBs without upstream changes on these would not be useful.
Fair enough. I think we have a flag which evicts DS pods too, but I can keep DS out of the PR for now.
if inHAMode {
	key = "HA/" + key
if !isPDBConfigured {
	pdbMissingWorkloads = append(pdbMissingWorkloads, key)
I don't think we need to complain about missing PDBs on replicas=1 workloads; they're clearly interruptible. Can we restructure this like:

	if replicas > 1 { // no reason to set this unless we are trying to be HA, right?
		if currentPDB == nil { // you'd need to make currentPDB a pointer for this to work
			pdbMissingWorkloads = append(pdbMissingWorkloads, key)
		} else {
			// checks to see if we were comfortable with the PDB configuration
		}
	}
I can change it. In fact, I can ignore it before the loop
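The restructure discussed in this thread can be sketched as a self-contained program. All names here (`classify`, `pdbSpec`) are illustrative, not the PR's code; `pdb == nil` models "no PDB selects this workload":

```go
package main

import "fmt"

// pdbSpec is a simplified stand-in for policyv1.PodDisruptionBudgetSpec.
type pdbSpec struct {
	maxUnavailable int // 0 would block all evictions
}

// classify ignores single-replica workloads entirely, then sorts the rest
// into "missing a PDB" versus "PDB present but misconfigured".
func classify(key string, replicas int32, pdb *pdbSpec, missing, misconfigured *[]string) {
	if replicas <= 1 {
		return // interruptible by definition; nothing to check
	}
	if pdb == nil {
		*missing = append(*missing, key)
	} else if pdb.maxUnavailable < 1 {
		*misconfigured = append(*misconfigured, key)
	}
}

func main() {
	var missing, misconfigured []string
	classify("openshift-apiserver/apiserver", 3, nil, &missing, &misconfigured)
	classify("openshift-etcd/etcd-guard", 3, &pdbSpec{maxUnavailable: 0}, &missing, &misconfigured)
	classify("openshift-machine-api/machine-api", 1, nil, &missing, &misconfigured)
	fmt.Println("missing:", missing)
	fmt.Println("misconfigured:", misconfigured)
	// prints:
	// missing: [openshift-apiserver/apiserver]
	// misconfigured: [openshift-etcd/etcd-guard]
}
```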
if currentPDB.Spec.MaxUnavailable != nil && currentPDB.Spec.MaxUnavailable.IntValue() < 1 {
	pdbMisconfiguredWorkloads = append(pdbMisconfiguredWorkloads, key)
} else if currentPDB.Spec.MinAvailable != nil && currentPDB.Spec.MinAvailable.IntValue() < 1 {
	pdbMisconfiguredWorkloads = append(pdbMisconfiguredWorkloads, key)
We should probably add a message about why we are mad at this key, by either making pdbMisconfiguredWorkloads a map from workload key to string:

	pdbMisconfiguredWorkloads[key] = fmt.Sprintf("sets minAvailable %s, but $WHY_WE_WANT_IT_UNSET_OR_GREATER_THAN_ZERO", currentPDB.Spec.MinAvailable)

or a map from a misconfigured-reason to a slice of keys:

	pdbMisconfiguredWorkloads[pdbMaxUnavailableTooSmall] = append(pdbMisconfiguredWorkloads[pdbMaxUnavailableTooSmall], key)
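The second option the reviewer describes, keyed by reason, can be sketched like this (reason strings are illustrative):

```go
package main

import "fmt"

// Illustrative misconfiguration reasons; each maps to the workloads that
// tripped it, so the final failure message can explain every flagged key.
const (
	pdbMaxUnavailableTooSmall = "maxUnavailable blocks all evictions"
	pdbMinAvailableTooSmall   = "minAvailable is set but less than 1"
)

func main() {
	pdbMisconfiguredWorkloads := map[string][]string{}
	pdbMisconfiguredWorkloads[pdbMaxUnavailableTooSmall] = append(
		pdbMisconfiguredWorkloads[pdbMaxUnavailableTooSmall],
		"openshift-etcd/etcd-guard")

	for reason, keys := range pdbMisconfiguredWorkloads {
		fmt.Printf("%s: %v\n", reason, keys)
	}
	// prints: maxUnavailable blocks all evictions: [openshift-etcd/etcd-guard]
}
```

Grouping by reason keeps the e2e failure output short when many workloads share the same defect.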
operatorsHavingPDBS = append(operatorsHavingPDBS, key)
// The convention we came up with for static pod control plane components is to have a deployment
// doing health check for the corresponding component. This can be changed to labels later.
// As of now, I am assuming if the deployment name has `-guard` in the name and it runs in
this seems reasonable
FYI, if we're interested in draining daemonsets, I have a patch upstream ready to go: kubernetes/kubernetes#88345. This would allow us to put PDBs that cover daemonsets to add back pressure to rolling nodes. However, we would be exposed to endless looping of the drain if any daemonset takes longer to stop than the timeout, as a new pod would be present when we requeue the drain operation. We would need to implement some kind of filtering mechanism. We added custom pod filters for draining, so it's possible to do that today; we just need to work out what that might look like if we're interested in pursuing this.
@ravisantoshgudimetla: The following tests failed, say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This ensures every component has a PDB associated with it. Need to figure out how to ensure only operands have the PDB, not the operators. One way to do it is to exclude -operator namespaces, but not all components follow that convention. It is currently missing static pods; will add them soon.
cc @smarterclayton @wking @deads2k @soltysh