Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deploy velero into user clusters. #12827

Merged
merged 19 commits into from
Nov 23, 2023

Conversation

moelsayed
Copy link
Contributor

@moelsayed moelsayed commented Nov 8, 2023

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #12828

What type of PR is this?
/kind feature
This PR is the first part of the velero user-cluster integration. It adds an EE controller that mainly deploys necessary components to automatically deploy velero in the user cluster scope.

The following components are deployed in the user-cluster:

  • velero CRDs
  • velero namespace
  • velero service account
  • velero clusterrolebinding. This is bound to the cluster-admin account, which is required to be able to do a full backup/restore procedure.
  • We need to deploy a velero CR, and we don't have a reconciler for it, so I implemented a custom function to deploy it.

The following components are deployed on the seed cluster, in the user-cluster namespace:

  • velero deployment
  • velero cloud credentials secret, which is basically a copy of the secret referenced in the backup destination configured in the seed.

Special notes for your reviewer:

  • The static CRDs are generated by velero and can be safely skipped during review
  • This is an early stage implementation of the feature. Several parts will be refactored and changed in following PRs.

Does this PR introduce a user-facing change? Then add your Release Note here:

NONE

Documentation:

TBD

@kubermatic-bot kubermatic-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. docs/tbd Denotes a PR that needs documentation (change) that will be done later. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. do-not-merge/code-freeze Indicates that a PR should not merge because it has not been approved for code freeze yet. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 8, 2023
@moelsayed moelsayed force-pushed the feature/backup_integration branch 5 times, most recently from 7d016ef to 6a458d2 Compare November 14, 2023 10:17
@@ -0,0 +1,122 @@
apiVersion: apiextensions.k8s.io/v1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do these CRDs come from? How are they kept up-to-date? For cert-manager and the Velero chart, we have scripts in hack/ to fetch them. These however.. ?

A script could also add a comment to these files, like https://github.com/kubermatic/kubermatic/blob/main/charts/backup/velero/crd/backuprepositories.yaml#L1, then you don't need an extra exclusion rule on the boilerplate checker anymore and the next guy after you will know how to update the CRDs.

Comment on lines 1049 to 1053
ClusterbackupKubeconfigSecretName = "velero-kubeconfig"
ClusterbackupUsername = "velero"
ClusterBackupServiceAccountName = "velero"
ClusterBackupNamespaceName = "velero"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some are Clusterbackup and some are ClusterBackup? Why?

@@ -0,0 +1,91 @@
/*
Copyright 2022 The Kubermatic Kubernetes Platform contributors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the year to 2023.

destinations := seed.Spec.EtcdBackupRestore.Destinations
defaultDestination := seed.Spec.EtcdBackupRestore.DefaultDestination
if len(destinations) == 0 || defaultDestination == "" {
log.Infof("seed [%s] has no backup destinations or no default backup destinations defined. Skipping cluster backup config for cluster [%s]", seed.Name, cluster.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we really want to log here. This function might be called many times when reconciling and this could spam the controller's log. Unconditional logs, especially more than debug statements, can quickly become a pain in the rear.

Also, if we're logging, we're doing structured logging (Infow), not printf-style logging (Infof).

Comment on lines 72 to 74
if dest.BucketName == "" || dest.Endpoint == "" || dest.Credentials == nil {
return nil, fmt.Errorf("failed to validate backup destination configuration: bucketName, endpoint or credentials are not valid")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should [also] be validated in the seed validation webhook.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed in the next PR, as we need a dedicated CR for the backup destinations.

return nil, fmt.Errorf("failed to validate backup destination configuration: bucketName, endpoint or credentials are not valid")
}
return &resources.ClusterBackupConfig{
Enabled: cluster.Spec.Features[kubermaticv1.ClusterFeatureUserClusterBackup],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You checked this earlier, you could just return Enabled: true here.

Comment on lines 81 to 91
func extractClusterSeedName(clusterName, clusterURL string) (string, error) {
u, err := url.Parse(clusterURL)
if err != nil {
return "", fmt.Errorf("failed to parse cluster URL: %w", err)
}
parts := strings.Split(u.Host, ".")
if len(parts) < 4 || clusterName != parts[0] { // at least a cluster name, seed name and a base domain.
return "", fmt.Errorf("invalid cluster URL: %s", u.Host)
}
return parts[1], nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not allowed. The DNS name for seeds can be overwritten and we must never attempt to deduce anything from it. If you need the Seed name, you need the Seed object's real (Kubernetes) name, not some URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored.

Copy link
Contributor

@xrstf xrstf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RC

@xrstf
Copy link
Contributor

xrstf commented Nov 14, 2023

Also, with our cool new hip Application feature, why don't we install Velero using that? We already use it for Cilium (i.e. critical components). -- is it because of the "spli installation", where half of Velero runs on the seed and the rest in the user-cluster, and apps are only installed into user-clusters?

Yeah that's probably it. Scratch that question.

@moelsayed
Copy link
Contributor Author

@xrstf Thank you for the review! If you could please take another look!

@moelsayed moelsayed force-pushed the feature/backup_integration branch 2 times, most recently from e900db8 to 934bb2e Compare November 17, 2023 17:06
@moelsayed moelsayed requested a review from xrstf November 20, 2023 14:10
@embik embik removed the do-not-merge/code-freeze Indicates that a PR should not merge because it has not been approved for code freeze yet. label Nov 22, 2023
@xrstf
Copy link
Contributor

xrstf commented Nov 23, 2023

/approve
/lgtm

@kubermatic-bot kubermatic-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2023
@kubermatic-bot
Copy link
Contributor

LGTM label has been added.

Git tree hash: a2a94a0febf55de52f7a7302fe825179d5319cfa

@kubermatic-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xrstf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubermatic-bot kubermatic-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 23, 2023
@kubermatic-triage-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs

Review the full test history

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@kubermatic-triage-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs

Review the full test history

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubermatic-bot kubermatic-bot merged commit 4f25673 into kubermatic:main Nov 23, 2023
20 checks passed
@kubermatic-bot kubermatic-bot added this to the KKP 2.25 milestone Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. docs/tbd Denotes a PR that needs documentation (change) that will be done later. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deploy velero for user-clusters
5 participants