Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added role, rolebinding, serviceaccount, and environment variables. #233

Merged
merged 4 commits into from
Nov 5, 2021

Conversation

aaronfern
Copy link
Contributor

@aaronfern aaronfern commented Sep 20, 2021

Added role, rolebinding, serviceaccount, and environment variables to allow etcd-backup-restore sidecar to become kubernetes aware

How to categorize this PR?

/area backup
/area disaster-recovery
/area high-availability
/kind enhancement

What this PR does / why we need it:
As part of making the backup-restore sidecar kubernetes aware, this PR does the following

  • Allows druid to now create serviceaccount, role, and rolebinding and associates them to the etcd pods, to authenticate it to the API server.
  • Passes the pod name and current namespace to the backup-restore pod as environment variables
  • Removes logic for cronjob as it is to be replaced with trigger based compaction jobs instead

Which issue(s) this PR fixes:
Partially fixes gardener/etcd-backup-restore#333

Special notes for your reviewer:

Release note:

`druid` will now also reconcile a `serviceaccount`, a `role`, and a `rolebinding` as part of it's `etcd` reconcile flow and associate it with the etcd pod

@aaronfern aaronfern requested a review from a team as a code owner September 20, 2021 15:29
@gardener-robot gardener-robot added area/backup Backup related area/disaster-recovery Disaster recovery related area/high-availability High availability related kind/enhancement Enhancement, improvement, extension needs/review Needs review size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else labels Sep 20, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Sep 20, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 22, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 22, 2021
@timuthy
Copy link
Member

timuthy commented Sep 29, 2021

/assign

Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I left a few comments and questions.

charts/etcd/templates/etcd-role.yaml Show resolved Hide resolved
charts/etcd/templates/etcd-role.yaml Outdated Show resolved Hide resolved
charts/etcd/templates/etcd-rolebinding.yaml Show resolved Hide resolved
controllers/controller_ref_manager.go Outdated Show resolved Hide resolved
@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Sep 29, 2021
Copy link
Contributor

@shreyas-s-rao shreyas-s-rao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronfern Thanks for the PR. This PR will now need to be modified to remove cronjob, after out-of-band discussions to introduce trigger-based compaction jobs. Also, please remove the claim logic for the three resources you've introduced, as this is not a necessity, as mentioned here. Thanks.

@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 6, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 6, 2021
@abdasgupta abdasgupta changed the title Added role, rolebinding, serviceaccount, and environment variables to… Added role, rolebinding, serviceaccount, and environment variables. Oct 6, 2021
Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the feedback @aaronfern. After having a second look, I added some additional comments.

charts/etcd/templates/etcd-role.yaml Show resolved Hide resolved
charts/etcd/templates/etcd-statefulset.yaml Outdated Show resolved Hide resolved
config/crd/bases/druid.gardener.cloud_etcds.yaml Outdated Show resolved Hide resolved
controllers/etcd_controller.go Outdated Show resolved Hide resolved

if !backupCompactionScheduleFound {
if err == nil {
err = r.Delete(ctx, &cronJob, client.PropagationPolicy(metav1.DeletePropagationForeground))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we can simply remove the deletion of cron jobs because people might have already enabled that feature, so these objects will remain in the cluster after switching to Job in #235.
In general I have to admit, I don't like that this PR tries to include both: Adding new resource for Backup-Restore and removing the CronJob behavior. Why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timuthy Unfortunately since we saw the cost impact of cron-schedule-based compaction only after making the release v0.6.0, we came up with the on-demand compaction job approach and decided to release it as a patch on v0.6.0 rather than a new release v0.7.0, to avoid making v0.6.0 a "dead" release (because v0.6.0, though technically usable, is not practically a viable solution).

I agree, we also need to include code to remove any possible cronjob resources in the cluster created previously, which should be ideally handled by #235 .

Agreed that it would make more sense that the switch from cronjob to job happens fully in #235 , while this PR only takes care of adding the RBAC required for backup sidecar's lease updation feature.

@abdasgupta WDYT? Can you add the cronjob removal code and removal of BackupCompactionSchedule in your PR with a breaking change release note?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I already took care of cron job related codes in my PR. Would add a release note. Thanks.

@@ -141,9 +141,6 @@ type BackupSpec struct {
// EnableProfiling defines if profiling should be enabled for the etcd-backup-restore-sidecar
// +optional
EnableProfiling *bool `json:"enableProfiling,omitempty"`
// BackupCompactionSchedule defines the cron standard for compacting the snapstore
// +optional
BackupCompactionSchedule *string `json:"backupCompactionSchedule,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a breaking change release note for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo this deletion. See previous comment about separation of duties between this PR and #235

@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 12, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 12, 2021
Copy link
Contributor

@shreyas-s-rao shreyas-s-rao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronfern Please remove the cronjob-removal related changes from your PR, since @abdasgupta will be including those changes in #235 with a breaking change release note. Let's keep this PR only for adding RBAC for backup sidecar to renew snapshot and member leases.

Please also add the relevant tests for sa, role, rb, like ensuring that the sa has read/write permissions on leases but not on other objects, and also ensuring that if the role/rb are not deployed, then sa cannot access leases as well.

@@ -1,241 +0,0 @@
apiVersion: batch/v1beta1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should ideally be removed in #235. Please undo this deletion here, since we don't want to mix the RBAC for backup sidecar along with cronjob removal. Those are two separate changes.
cc @abdasgupta

@@ -87,11 +87,6 @@ func (in *BackupSpec) DeepCopyInto(out *BackupSpec) {
*out = new(bool)
**out = **in
}
if in.BackupCompactionSchedule != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment.

@@ -141,9 +141,6 @@ type BackupSpec struct {
// EnableProfiling defines if profiling should be enabled for the etcd-backup-restore-sidecar
// +optional
EnableProfiling *bool `json:"enableProfiling,omitempty"`
// BackupCompactionSchedule defines the cron standard for compacting the snapstore
// +optional
BackupCompactionSchedule *string `json:"backupCompactionSchedule,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo this deletion. See previous comment about separation of duties between this PR and #235

@@ -46,12 +46,6 @@ spec:
backup:
description: BackupSpec defines parametes associated with the full and delta snapshots of etcd
properties:
backupCompactionSchedule:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment about separation of tasks between this PR and #235

@@ -31,8 +31,8 @@ import (
"k8s.io/client-go/tools/cache"

appsv1 "k8s.io/api/apps/v1"
batchv1 "k8s.io/api/batch/v1beta1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my previous comment about separation of tasks between this PR and #235. Also applies to the other cronjob related changes in this file as well.

@@ -104,7 +104,7 @@ var _ = BeforeSuite(func(done Done) {
Expect(err).NotTo(HaveOccurred())

Expect(err).NotTo(HaveOccurred())
er, err := NewEtcdReconcilerWithImageVector(mgr, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment about separation of tasks between this PR and #235

@@ -791,142 +797,129 @@ func (r *EtcdReconciler) getStatefulSetFromEtcd(etcd *druidv1alpha1.Etcd, values
return decoded, nil
}

func (r *EtcdReconciler) reconcileCronJob(ctx context.Context, logger logr.Logger, etcd *druidv1alpha1.Etcd, values map[string]interface{}) (*batchv1.CronJob, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment about separation of tasks between this PR and #235. Also applies to other cronjob-related changes in this file.

@shreyas-s-rao
Copy link
Contributor

/needs rebase

@gardener-robot gardener-robot added needs/rebase Needs git rebase size/l Size of pull request is large (see gardener-robot robot/bots/size.py) and removed size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) labels Oct 22, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 22, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 22, 2021
@gardener-robot gardener-robot added size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) and removed size/l Size of pull request is large (see gardener-robot robot/bots/size.py) labels Oct 22, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 22, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 22, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 22, 2021
Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit, otherwise it looks good.

@@ -777,8 +804,90 @@ var _ = Describe("Druid", func() {
})
})

Describe("with etcd resource", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to create a separate test case for the additional resources? I would suggest to add the validation to existing tests, e.g. DescribeTable("when etcd resource is created with backupCompactionSchedule field". This will save some testing time. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense
Updated to add role validations to existing tests

@gardener-robot gardener-robot added size/l Size of pull request is large (see gardener-robot robot/bots/size.py) and removed size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) labels Oct 26, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 26, 2021
Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/changes Needs (more) changes needs/rebase Needs git rebase needs/review Needs review needs/second-opinion Needs second review by someone else labels Nov 5, 2021
@timuthy timuthy merged commit 222a1cc into gardener:master Nov 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backup Backup related area/disaster-recovery Disaster recovery related area/high-availability High availability related kind/enhancement Enhancement, improvement, extension needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/l Size of pull request is large (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Make etcd-backup-restore Kubernetes-aware keep member and backup status up-to-date
8 participants