[Feature] Make etcd-backup-restore Kubernetes-aware to keep member and backup status up-to-date #333

Closed
Tracked by #107
amshuman-kr opened this issue May 17, 2021 · 5 comments · Fixed by gardener/etcd-druid#233 or #382
Labels: kind/enhancement (Enhancement, improvement, extension), release/beta (Planned for Beta release of the Feature)

amshuman-kr (Collaborator) commented on May 17, 2021:

Feature (What you would like to be added):

Make etcd-backup-restore Kubernetes-aware so that it keeps the corresponding member status and the BackupReady condition up-to-date in the Etcd resource status.

Motivation (Why is this needed?):

etcd-backup-restore is closest to the source of this information, so making it responsible for updating these status sections solves the problem at the lowest viable level and avoids etcd-druid becoming a bottleneck.

Approach/Hint to implement the solution (optional):

  1. Member status can be obtained via a heartbeat on the corresponding etcd member's endpoint status (see the sketch after this list). The same heartbeat can be used to decide backup leader election in "[Feature] Add leader-election to the backup-restore sidecar according to the multi-node ETCD proposal" #321.
  2. The BackupReady condition could be determined based on whether the latest backup upload (full or incremental) was successful.
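
A minimal sketch of such a heartbeat probe using the etcd clientv3 API follows; the helper name probeMember, the endpoint handling, and the timeouts are illustrative assumptions, not part of this proposal:

	import (
		"context"
		"time"

		clientv3 "go.etcd.io/etcd/client/v3"
	)

	// probeMember checks the endpoint status of this member's etcd process.
	// A successful response can drive the member-status heartbeat, and the
	// same call can feed backup leader election: e.g. the backup leader is
	// the member whose endpoint reports itself as the current etcd leader.
	func probeMember(ctx context.Context, endpoint string) (isLeader bool, err error) {
		cli, err := clientv3.New(clientv3.Config{
			Endpoints:   []string{endpoint},
			DialTimeout: 5 * time.Second, // illustrative timeout
		})
		if err != nil {
			return false, err
		}
		defer cli.Close()

		ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
		defer cancel()

		// StatusResponse carries the member's own ID and the current leader ID,
		// so a single call answers both "am I healthy?" and "am I the leader?".
		resp, err := cli.Status(ctx, endpoint)
		if err != nil {
			return false, err
		}
		return resp.Header.MemberId == resp.Leader, nil
	}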

Also, it would be preferable to use StatusWriter.Patch() to avoid race conditions.

timuthy (Member) commented on May 20, 2021:

> Also, it would be preferable to use StatusWriter.Patch() to avoid race conditions.

Please see gardener/gardener#4083. As soon as all seeds run Kubernetes 1.18+ and Etcd-Druid supports it, we can use Server-Side Apply to let each etcd-backup-restore actor update only the relevant portion of the Etcd resource, namely status.members[id=<cluster-member-id>].

This will spare us potential conflicts and save time and network costs.

Here is a simple example of how to use SSA:

	import (
		"context"
		"time"

		// import path may vary by etcd-druid version
		druidv1alpha1 "github.com/gardener/etcd-druid/api/v1alpha1"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"sigs.k8s.io/controller-runtime/pkg/client"
	)

	// TypeMeta must be set explicitly because client.Apply serializes the
	// object as-is for the Server-Side Apply patch.
	etcd := &druidv1alpha1.Etcd{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "druid.gardener.cloud/v1alpha1",
			Kind:       "Etcd",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name:      <name>,      // the Etcd resource's name
			Namespace: <namespace>, // the Etcd resource's namespace
		},
		// Only the fields this actor owns are set; SSA merges them into the
		// existing status without touching other members' entries.
		Status: druidv1alpha1.EtcdStatus{
			Members: []druidv1alpha1.EtcdMemberStatus{
				{
					ID:                 "1",
					Name:               "member1",
					Role:               druidv1alpha1.EtcdRoleMember,
					Status:             druidv1alpha1.EtcdMemberStatusReady,
					Reason:             "up and running",
					LastUpdateTime:     metav1.NewTime(time.Now()),
					LastTransitionTime: metav1.NewTime(time.Now()),
				},
			},
		},
	}

	// ec is the controller-runtime client. The field owner identifies this
	// sidecar in managedFields; ForceOwnership resolves conflicts in its favour.
	if err := ec.Status().Patch(ctx, etcd, client.Apply, client.FieldOwner("etcdbr1"), client.ForceOwnership); err != nil {
		return err
	}

shreyas-s-rao (Collaborator) commented on Jul 20, 2021:

The backup sidecar will now have to adhere to gardener/etcd-druid#206 by creating and maintaining the EtcdMember resource, either if the corresponding CRD exists or if a flag passed to the etcdbrctl command tells the backup sidecar to work in a Kubernetes-aware environment. It will still maintain the BackupReady condition on the Etcd status according to the original proposal.
@amshuman-kr who is responsible for the deletion of the EtcdMember resource? Ideally, upon a termination request, the backup sidecar should delete its own EtcdMember before terminating, but there are various situations in which that might not be possible. In that case, would etcd-druid wait for a failed heartbeat and then decide whether to set status: Unready or delete the resource itself? I'm a bit unclear about this.

amshuman-kr (Collaborator, Author) commented:

> who is responsible for the deletion of the EtcdMember resource? Ideally, upon a termination request, the backup sidecar should delete its own EtcdMember before terminating, but there are various situations in which that might not be possible. In that case, would etcd-druid wait for a failed heartbeat and then decide whether to set status: Unready or delete the resource itself?

The same component that was responsible for removing the member entry from the status in the earlier SSA-based design: basically, the custodian controller.

There is still the case of superfluous member entries in the etcd cluster itself, which are to be cleaned up by the leading sidecar.
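
A hedged sketch of what such a cleanup by the leading sidecar could look like, using the etcd clientv3 cluster API; the helper name removeSuperfluousMembers and the desiredMembers set are assumptions for illustration:

	import (
		"context"

		clientv3 "go.etcd.io/etcd/client/v3"
	)

	// removeSuperfluousMembers prunes etcd members that no longer correspond
	// to a desired replica of the cluster. Only the leading sidecar should
	// run this, to avoid concurrent removals.
	func removeSuperfluousMembers(ctx context.Context, cli *clientv3.Client, desiredMembers map[string]struct{}) error {
		resp, err := cli.MemberList(ctx)
		if err != nil {
			return err
		}
		for _, m := range resp.Members {
			if _, ok := desiredMembers[m.Name]; !ok {
				// MemberRemove evicts the superfluous member from the etcd cluster.
				if _, err := cli.MemberRemove(ctx, m.ID); err != nil {
					return err
				}
			}
		}
		return nil
	}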

amshuman-kr (Collaborator, Author) commented:

Given the limitations of Server-Side Apply (please see gardener/etcd-druid#179), gardener/etcd-druid#207 describes the alternative of etcd-backup-restore continually renewing the corresponding Lease object as a way to inform etcd-druid about the health/status of the corresponding member.
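
A minimal sketch of such a lease renewal, assuming a controller-runtime client and a coordination.k8s.io/v1 Lease named after the member; the helper name renewMemberLease, the namespace/memberName parameters, and the MergeFrom patch strategy are illustrative:

	import (
		"context"
		"time"

		coordinationv1 "k8s.io/api/coordination/v1"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"sigs.k8s.io/controller-runtime/pkg/client"
	)

	// renewMemberLease bumps spec.renewTime on the member's Lease so that
	// etcd-druid can treat a recently renewed Lease as a healthy heartbeat
	// and a stale one as a failed member.
	func renewMemberLease(ctx context.Context, c client.Client, namespace, memberName string) error {
		lease := &coordinationv1.Lease{}
		if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: memberName}, lease); err != nil {
			return err
		}
		patch := client.MergeFrom(lease.DeepCopy())
		now := metav1.NewMicroTime(time.Now())
		lease.Spec.RenewTime = &now
		return c.Patch(ctx, lease, patch)
	}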

amshuman-kr (Collaborator, Author) commented:

🎉 ❤️

ashwani2k added the release/beta label (Planned for Beta release of the Feature) on Jul 6, 2022