Add a new configurable field fullSnapshotLeaseUpdateInterval in spec.backup section of Etcd CR #764
Thanks for the PR @anveshreddy18!
A couple of comments from my side:
- PTAL at the points I have mentioned in "Add a new configurable field fullSnapshotLeaseUpdateInterval in spec.backup section of Etcd CR" #763 (comment).
- Please use a PR-built image of etcdbr in this PR (at charts/images.yaml) so that this PR can be easily tested. You can obtain the PR-built image from the concourse publish step of "Full snapshot lease update retry on failure" etcd-backup-restore#711.
LGTM!!
/hold until the PR for #728 gets merged, since that will bring changes in the component model for resources deployed by druid.
…spec.backup section of etcd yaml
7c4f934
to
fefae40
@@ -159,6 +159,9 @@ type BackupSpec struct {
	// All full snapshots beyond this limit will be garbage collected.
	// +optional
	MaxBackupsLimitBasedGC *int32 `json:"maxBackupsLimitBasedGC,omitempty"`
	// FullSnapshotLeaseUpdateInterval defines the interval for retrying to update the full snapshot lease.
	// +optional
	FullSnapshotLeaseUpdateInterval *metav1.Duration `json:"fullSnapshotLeaseUpdateInterval,omitempty"`
Why is a change in the API required, especially when we are no longer going to use snapshot leases once we work on steward? Introducing something in the API now that is going to be removed anyway is not so nice.
Also, the original issue was to retry the update of the full snapshot lease from backup-restore if the first attempt failed. Why does this now result in an Etcd API change?
I agree etcdbr has a default interval for doing this, but we can also provide an option to configure it from druid, giving users a choice when the default value doesn't suit their purposes.
Regarding the API change: since we have already made a lot of API changes in 777, maybe we can push everything together and remove this later when steward comes, as the changes are very minimal. I say this because I presume steward is not planned for the near future, as priorities have changed to improving druid and other security issues. So I'm guessing it will take some time to get steward running; until then we can provide this option.
Also, if snapshot leases will be removed in steward, we can say the same for PR #820, which decouples the ready condition for snapshot leases. But I think it's important to let users configure the conditions and understand them better, even if they will only be present for a few months. WDYT?
Can you explain why we need a consumer to define when and at what frequency a retry to update a snapshot lease should be done? Using snapshot leases is an implementation detail and should not be exposed as part of the druid API anyway. Implementations change, and one cannot keep changing APIs with every change in the implementation.
Sorry, by user I meant Gardener here. The plan was to enable Gardener to modify this as per its needs: if they feel the 3 min default is not what they need (as it fills the logs with retries) and want to increase it, they have the option to. But yes, I agree it's not a necessary configuration that requires a change to the API. I'll close this.
"we can say the same for the PR #820 which decouples the ready condition for snapshot leases."
These are not the same things. What #820 does is quite different. It changes what goes into the status, which cannot be influenced by any consumer of druid anyway. The intent of #820 is to improve monitoring of the full and delta snapshots taken by an etcd cluster. We do not leak any implementation detail into the API.
How to categorize this PR?
/area usability
/kind enhancement
What this PR does / why we need it:
This PR adds a new field fullSnapshotLeaseUpdateInterval in the spec.backup section of the Etcd yaml and makes the necessary changes, which allows configuring the full-snapshot-lease-update-interval parameter, i.e. the interval at which updating the full snapshot lease is retried. Adding this new fullSnapshotLeaseUpdateInterval field to the Etcd CR allows the user to control the behaviour of retrying to update the full snapshot lease.
Note: It will be an optional field, and when not set, backup-restore takes care of setting a default value for it.
Which issue(s) this PR fixes:
Fixes #763
Special notes for your reviewer:
Release note: