ETCD-479: add node selectors and taints to etcdbackup #1604
base: master
Conversation
@tjungblu: This pull request references ETCD-479 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hello @tjungblu! Some important instructions when contributing to openshift/api:
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: tjungblu. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
    // nodeSelector is the node selector applied to the backup pods.
    //
    // If empty, the cluster-etcd-operator sets a node selector for the
    // "node-role.kubernetes.io/master" label. This default is subject to
    // change.
    //
    // +optional
    NodeSelector map[string]string `json:"nodeSelector,omitempty"`

    // tolerations is a list of tolerations applied to the backup pods.
    //
    // If empty, the cluster-etcd-operator sets a toleration for the
    // "node-role.kubernetes.io/master" taint. This default is subject to
    // change.
    //
    // +optional
    Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
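For illustration, a one-time backup CR using the two new fields might look like the sketch below. This is an assumption-laden example: the group/version and the pvcName field are taken from the existing EtcdBackup API as I understand it, and the metadata name, PVC name, and hostname value are placeholders.

    # Hypothetical EtcdBackup CR exercising the new nodeSelector/tolerations fields
    apiVersion: operator.openshift.io/v1alpha1   # assumed group/version
    kind: EtcdBackup
    metadata:
      name: backup-example                       # placeholder name
    spec:
      pvcName: etcd-backup-pvc                   # assumed pre-existing field
      nodeSelector:                              # new field from this PR
        kubernetes.io/hostname: "master-0"       # placeholder hostname
      tolerations:                               # new field from this PR
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule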
Reading through https://issues.redhat.com/browse/ETCD-479 it seems like the primary rationale is to allow us to force the CronJob's retention pod and the EtcdBackup's backup pod to run on the same master node.
Can you clarify the intended workflow here? I'm guessing we will have the periodic backup controller set the node selector on the CronJob to one of the hostnames, e.g.

    nodeSelector:
      kubernetes.io/hostname: "<hostname>"

and then have the CronJob pod always create EtcdBackup CRs with that node selector set via this field.
My only concern would be how this plays out with local volumes, where the PV is tied to a specific node. It seems like volumeBindingMode: WaitForFirstConsumer should account for node selectors in that case.
https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode
Also, I notice we don't have the NodeSelector and Tolerations spec on the config.openshift.io/v1alpha1 Backup config CR. If this is intended to give the admin more control over where the backups are executed, won't we need that config in the scheduled backups as well, and not just the one-time backups?
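For reference, the local-volume pattern under discussion relies on a StorageClass like the following (standard Kubernetes fields; the class name is illustrative). With WaitForFirstConsumer, PV binding is delayed until a pod is scheduled, so the backup pod's node selector is factored into which local PV gets bound.

    # Illustrative StorageClass for node-local backup volumes
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage                     # illustrative name
    provisioner: kubernetes.io/no-provisioner # local PVs are pre-provisioned
    volumeBindingMode: WaitForFirstConsumer   # defer binding until pod scheduling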
I think this was also a feature that @deads2k wanted: a "load"-balanced hostPath distribution of the snapshots. There are several challenges in exposing it as an API, though; you've highlighted some. Let's see how we can find a solution.
Can you clarify the intended workflow here? I'm guessing we will have the periodic backup controller set the node selector on the CronJob as one of the hostnames ... And then have the CronJob pod always create EtcdBackup CRs with that node selector set via this field.
I believe @deads2k has pitched a "pick a random node from the control plane" before running the backup. We obviously can't constantly patch the cronjob with a new node selector every hour. So it makes sense to have that control on the etcdbackup invocation, where we can also ensure the node placement via downward API. The placement of the cron would always be left to the scheduler, but on any control plane node.
If this is intended to give the admin more control over where the backups are executed, won't we need that config in the scheduled backups as well and not just the one time backups?
That makes sense to me, but I didn't want to create more debatable changes for now. We can follow-up on this.
Only concern would be on how this plays out with local volumes where the PV is tied to a specific node. Seems like the volumeBindingMode: WaitForFirstConsumer should account for node selectors in that case.
Here's another catch: while we can take snapshots from any node in the cluster that can reach etcd, the static pod YAMLs are always on a control plane node :) While you don't strictly need them for all recovery scenarios, it's still useful to have them for a faster recovery (the operators will eventually kick in and recreate them anyway).
If we feel it makes little sense to give placement control to customers at all (I would agree), then we still need to solve the retention issue somehow. One option would be to run the backup directly from wherever the CronJob pod runs, so we don't have another level of indirection spawning jobs/pods. It's not as nice architecturally, but it at least ensures the retention can work correctly.
@deads2k would be great to hear your opinion on that, too
We obviously can't constantly patch the cronjob with a new node selector every hour.
How are the pods getting scheduled in other scenarios?
I wouldn't expect a user to get to (or have to) specify node selection criteria and tolerations for etcd backups, since we know exactly which hosts such pods should run on.
@tjungblu: The following tests failed.
PR needs rebase.
Adding the usual node selector/taint pair to the etcdbackups. This enables us and customers to control more precisely where the backup is taken and where it is stored.
The primary use case is to ensure the retention always runs on the same node where the backup is stored (e.g. in hostPath scenarios).