Add a new configurable field `fullSnapshotLeaseUpdateInterval` in spec.backup section of Etcd CR #764

anveshreddy18 · 2024-02-08T11:51:28Z

How to categorize this PR?

/area usability
/kind enhancement

What this PR does / why we need it:

This PR adds a new field `fullSnapshotLeaseUpdateInterval` to the `spec.backup` section of the Etcd resource and makes the necessary supporting changes. The field sets the `full-snapshot-lease-update-interval` parameter, which controls the interval at which updating the full snapshot lease is retried.

The backup-restore PR #711 introduces a new flag `full-snapshot-lease-update-interval` that configures the retry interval for updating the full snapshot lease. Adding the `fullSnapshotLeaseUpdateInterval` field to the Etcd CR lets users control this retry behaviour.

Note: This is an optional field. When it is not set, backup-restore applies its own default value.

Which issue(s) this PR fixes:
Fixes #763

Special notes for your reviewer:

Release note:

The `full-snapshot-lease-update-interval` flag can now be configured through the Etcd resource spec at `.spec.backup.fullSnapshotLeaseUpdateInterval`.

shreyas-s-rao

Thanks for the PR @anveshreddy18 !

A couple of comments from my side:

- PTAL at the points I mentioned in #763 (comment).
- Please use a PR-built image of etcdbr in this PR (in charts/images.yaml) so that it can be tested easily. You can obtain the PR-built image from the concourse publish step of etcd-backup-restore#711 (Full snapshot lease update retry on failure).

ishan16696

LGTM!!

shreyas-s-rao · 2024-03-21T05:51:11Z

/hold until PR for #728 gets merged, since that will bring changes in the component model for resources deployed by druid.

unmarshall · 2024-06-27T05:40:51Z

api/v1alpha1/etcd.go

@@ -159,6 +159,9 @@ type BackupSpec struct {
 	// All full snapshots beyond this limit will be garbage collected.
 	// +optional
 	MaxBackupsLimitBasedGC *int32 `json:"maxBackupsLimitBasedGC,omitempty"`
+	// FullSnapshotLeaseUpdateInterval defines the interval for retrying to update the full snapshot lease.
+	// +optional
+	FullSnapshotLeaseUpdateInterval *metav1.Duration `json:"fullSnapshotLeaseUpdateInterval,omitempty"`


Why is an API change required, especially when we will no longer use snapshot leases once we work on steward? Introducing something in the API now that is going to be removed anyway is not so nice.
Also, the original issue was to retry the update of the full snapshot lease from backup-restore if the first attempt failed. Why does this now result in an Etcd API change?

anveshreddy18 · 2024-06-27

I agree that etcdbr has a default interval for this, but we can also provide an option to configure it from druid, giving users a choice when the default value doesn't suit their purposes.

Regarding the API change: since we have already made a lot of API changes in 777, maybe we can push everything together and remove this later when steward comes, as the changes are very minimal. I say this because I presume we don't have a near-future plan for steward, as priorities have shifted to improving druid and addressing other security issues. So I'm guessing it will take some time to get steward running; until then we can provide this option.

Also, if snapshot leases will be removed in steward, we can say the same for the PR #820 which decouples the ready condition for snapshot leases. But I think it's important to let users configure this and have better knowledge of the conditions, even if they'll only be present for a few months. WDYT?

unmarshall · 2024-06-27

Can you explain why we need a consumer to define when, and at what frequency, a retry to update a snapshot lease should be done? Using snapshot leases is an implementation detail and should not be exposed as part of the druid API anyway. Implementations change, and one cannot keep changing APIs with every change in the implementation.

anveshreddy18 · 2024-06-27

Sorry, by user I meant Gardener here. The plan was to enable Gardener to modify this as needed: if they feel the 3 min default is not suitable (as it fills the logs with retries) and want to increase it, they have the option to. But yes, I agree it's not a necessary configuration that justifies an API change. I'll close this.

unmarshall · 2024-06-27

> we can say the same for the PR #820 which decouples the ready condition for snapshot leases.

These are not the same things. What #820 does is quite different: it changes what goes into the status, which cannot be influenced by any consumer of druid anyway. The intent of #820 is to improve monitoring of the full and delta snapshots taken by an etcd cluster. We do not leak any implementation detail into the API.

anveshreddy18 requested a review from a team as a code owner February 8, 2024 11:51

gardener-robot added the labels area/usability (Usability related), kind/enhancement (Enhancement, improvement, extension), needs/review (Needs review), and size/s (Size of pull request is small) Feb 8, 2024

anveshreddy18 self-assigned this Feb 8, 2024

shreyas-s-rao requested changes Feb 8, 2024

gardener-robot added the needs/changes (Needs (more) changes) label Feb 8, 2024

gardener-robot-ci-3 added and then removed the reviewed/ok-to-test label (Has approval for testing; check PR in detail before setting this label because PR is run on CI/CD) Feb 9, 2024

anveshreddy18 changed the title ~~Add a new configurable field fullsnapLeaseUpdateRetryInterval in spec.backup section of Etcd CR~~ Add a new configurable field fullSnapshotLeaseUpdateInterval in spec.backup section of Etcd CR Feb 9, 2024

anveshreddy18 mentioned this pull request Feb 9, 2024

Full snapshot lease update retry on failure gardener/etcd-backup-restore#711

Merged

ishan16696 approved these changes Mar 15, 2024

gardener-robot added the reviewed/do-not-merge label (Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies) Mar 21, 2024

Add a new configurable field fullsnapshotLeaseUpdateRetryInterval in spec.backup section of etcd yaml

fefae40

anveshreddy18 force-pushed the configure/fullsnapshot-lease-update-retry-interval branch from 7c4f934 to fefae40 June 24, 2024 11:52

gardener-robot added the size/xs label (Size of pull request is tiny) and removed the size/s label (Size of pull request is small) Jun 24, 2024

gardener-robot-ci-2 added and then removed the reviewed/ok-to-test label (Has approval for testing; check PR in detail before setting this label because PR is run on CI/CD) Jun 24, 2024

unmarshall requested changes Jun 27, 2024

anveshreddy18 closed this Jun 27, 2024

gardener-robot added the status/closed label (Issue is closed, either delivered or triaged) Jun 27, 2024
