
ETCD-479: add node selectors and taints to etcdbackup #1604

Open
wants to merge 1 commit into base: master

Conversation


@tjungblu tjungblu commented Sep 29, 2023

Adding the usual node selector/taint pair to the etcdbackups. This enables us and customers to control more precisely where the backup is taken from and where it is stored.

The primary use case is to ensure that retention always runs on the same node where the backup is stored (e.g. in hostPath scenarios).
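
For illustration, a rough sketch of what an EtcdBackup CR could look like with the proposed fields set. The nodeSelector/tolerations values below are examples only, and the pvcName field is assumed from the existing operator.openshift.io/v1alpha1 EtcdBackup spec:

apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
  name: backup-example
spec:
  # existing field (assumed): PVC that receives the snapshot
  pvcName: etcd-backup-pvc
  # proposed field: pin the backup pod to a specific node
  nodeSelector:
    kubernetes.io/hostname: master-0
  # proposed field: tolerate the control-plane taint
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule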

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 29, 2023

openshift-ci-robot commented Sep 29, 2023

@tjungblu: This pull request references ETCD-479 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


openshift-ci bot commented Sep 29, 2023

Hello @tjungblu! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 29, 2023


openshift-ci bot commented Sep 29, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tjungblu
Once this PR has been reviewed and has the lgtm label, please assign spadgett for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Comment on lines +42 to +58
// nodeSelector is the node selector applied to the backup pods.
//
// If empty, the cluster-etcd-operator sets a node selector for the
// "node-role.kubernetes.io/master" label. This default is subject to
// change.
//
// +optional
NodeSelector map[string]string `json:"nodeSelector,omitempty"`

// tolerations is a list of tolerations applied to the backup pods.
//
// If empty, the cluster-etcd-operator sets a toleration for the
// "node-role.kubernetes.io/master" taint. This default is subject to
// change.
//
// +optional
Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
Contributor

Reading through https://issues.redhat.com/browse/ETCD-479, it seems like the primary rationale is to allow us to force the CronJob's retention pod and the EtcdBackup's backup pod to run on the same master node.

Can you clarify the intended workflow here? I'm guessing we will have the periodic backup controller set the node selector on the CronJob as one of the hostnames, e.g.:

nodeSelector:
  kubernetes.io/hostname: "<hostname>"

And then have the CronJob pod always create EtcdBackup CRs with that node selector set via this field.

My only concern is how this plays out with local volumes where the PV is tied to a specific node. It seems like volumeBindingMode: WaitForFirstConsumer should account for node selectors in that case.
https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode
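
For reference, a minimal sketch (not part of this PR) of the kind of local StorageClass and PersistentVolume being discussed; with WaitForFirstConsumer, binding is delayed until a pod is scheduled, so the PV's nodeAffinity and the backup pod's nodeSelector have to agree on a node:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-backups                        # hypothetical name
provisioner: kubernetes.io/no-provisioner    # local PVs are statically provisioned
volumeBindingMode: WaitForFirstConsumer      # delay binding until a consuming pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-backup-master-0                 # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-backups
  local:
    path: /var/lib/etcd-backups              # hypothetical host path
  nodeAffinity:                              # ties this volume to a single node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["master-0"]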

Also, I notice we don't have the NodeSelector and Tolerations fields on the config.openshift.io/v1alpha1 Backup config CR. If this is intended to give the admin more control over where the backups are executed, won't we need that config in the scheduled backups as well and not just the one time backups?

Author

@tjungblu tjungblu Oct 2, 2023

I think this also ties into a feature @deads2k wanted: a load-balanced hostPath distribution of the snapshots. There are several challenges in exposing it as an API though; you've highlighted some of them. Let's see how we can find a solution.

Can you clarify the intended workflow here? I'm guessing we will have the periodic backup controller set the node selector on the CronJob as one of the hostnames ... And then have the CronJob pod always create EtcdBackup CRs with that node selector set via this field.

I believe @deads2k has pitched picking a random node from the control plane before running the backup. We obviously can't constantly patch the cronjob with a new node selector every hour. So it makes sense to have that control on the etcdbackup invocation, where we can also ensure the node placement via the downward API. The placement of the cron pod would always be left to the scheduler, but restricted to control plane nodes.
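
As a sketch of the downward API idea (illustrative only, not the operator's actual manifest), the pod that creates the EtcdBackup CR can learn which node it was scheduled to via a fieldRef and propagate that into the node selector it sets:

# hypothetical container snippet for the pod creating the EtcdBackup CR
env:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName   # downward API: the node this pod landed on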

If this is intended to give the admin more control over where the backups are executed, won't we need that config in the scheduled backups as well and not just the one time backups?

That makes sense to me, but I didn't want to create more debatable changes for now. We can follow up on this.
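
Purely as a hypothetical sketch of that follow-up (field names mirror this PR; the schedule and pvcName fields are assumed from the existing config.openshift.io/v1alpha1 Backup spec), the scheduled-backup config could carry the same knobs:

apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: default
spec:
  etcd:
    schedule: "0 */2 * * *"      # existing field (assumed)
    pvcName: etcd-backup-pvc     # existing field (assumed)
    # hypothetical additions mirroring the EtcdBackup fields from this PR
    nodeSelector:
      node-role.kubernetes.io/master: ""
    tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule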

My only concern is how this plays out with local volumes where the PV is tied to a specific node. It seems like volumeBindingMode: WaitForFirstConsumer should account for node selectors in that case.

Here's another catch: while we can take snapshots from any node in the cluster that can reach etcd, the static pod YAMLs are always on a control plane node :) While you don't strictly need them for all recovery scenarios, it's still useful to have them for a faster recovery (the operators will eventually kick in and recreate them anyway).


If we feel it makes little sense to give placement control to customers at all (I would agree), then we still need to solve the retention issue somehow. One option would be to run the backup directly from wherever the cronjob pod runs, so we don't have the extra indirection of spawning jobs/pods. It's not as nice architecturally, but it at least ensures that retention can work correctly.

Author

@deads2k it would be great to hear your opinion on this, too.

Contributor

We obviously can't constantly patch the cronjob with a new node selector every hour.

How are the pods getting scheduled in other scenarios?

I wouldn't expect a user to be able to (or have to) specify node selection criteria and tolerations for etcd backups, since we know exactly which hosts such pods should run on.


openshift-ci bot commented Mar 19, 2024

@tjungblu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/verify-crd-schema | c6f605b | link | true | /test verify-crd-schema
ci/prow/e2e-aws-serial | c6f605b | link | true | /test e2e-aws-serial
ci/prow/e2e-upgrade-minor | c6f605b | link | true | /test e2e-upgrade-minor

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2024
@openshift-merge-robot

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
