Add a new configurable field fullSnapshotLeaseUpdateInterval in spec.backup section of Etcd CR #764

Closed
3 changes: 3 additions & 0 deletions api/v1alpha1/etcd.go
@@ -159,6 +159,9 @@ type BackupSpec struct {
 	// All full snapshots beyond this limit will be garbage collected.
 	// +optional
 	MaxBackupsLimitBasedGC *int32 `json:"maxBackupsLimitBasedGC,omitempty"`
+	// FullSnapshotLeaseUpdateInterval defines the interval for retrying to update the full snapshot lease.
+	// +optional
+	FullSnapshotLeaseUpdateInterval *metav1.Duration `json:"fullSnapshotLeaseUpdateInterval,omitempty"`
Contributor

Why is an API change required, especially since we will no longer use snapshot leases once we work on steward? Introducing something into the API now that is going to be removed anyway is not ideal.
Also, the original issue was to retry the full snapshot lease update from backup-restore if the first attempt failed. Why does this now result in an Etcd API change?

Contributor Author

I agree that etcdbr has a default interval for this, but we can also make it configurable from druid, so that users have a choice when the default value doesn't suit their purposes.

Regarding the API change: since we have made a lot of API changes in 777, maybe we can push everything together and remove this later when steward arrives, as the changes are very minimal. I say this because I presume steward is not planned for the near future, as priorities have shifted to improving druid and addressing security issues. So I'm guessing it will take some time to get steward running; until then we can provide this option.

Also, if snapshot leases will be removed in steward, we can say the same for the PR #820 which decouples the ready condition for snapshot leases. But I think it's important to let users configure this and have better knowledge of the conditions, even if they will only be present for a few months. WDYT?

Contributor

@unmarshall Jun 27, 2024

Can you explain why a consumer needs to define when and how often a retry to update a snapshot lease should be done? Using snapshot leases is an implementation detail and should not be exposed as part of the druid API anyway. Implementations change, and one cannot keep changing APIs with every change in the implementation.

Contributor Author

Sorry, by user I meant Gardener here. The plan was to enable Gardener to modify this as needed. If they feel the 3 min default is not suitable (as it fills the logs with retries) and want to increase it, they have the option to. But yes, I agree it's not a necessary configuration that warrants an API change. I'll close this.

Contributor

> we can say the same for the PR #820 which decouples the ready condition for snapshot leases.

These are not the same thing. What #820 does is quite different: it changes what goes into the status, which cannot be influenced by any consumer of druid anyway. The intent of #820 is to improve monitoring of the full and delta snapshots taken by an etcd cluster. We do not leak any implementation detail into the API.

 	// GarbageCollectionPeriod defines the period for garbage collecting old backups
 	// +optional
 	GarbageCollectionPeriod *metav1.Duration `json:"garbageCollectionPeriod,omitempty"`
5 changes: 5 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

@@ -162,6 +162,10 @@ spec:
   description: EtcdSnapshotTimeout defines the timeout duration
     for etcd FullSnapshot operation
   type: string
+fullSnapshotLeaseUpdateInterval:
+  description: FullSnapshotLeaseUpdateInterval defines the interval
+    for retrying to update the full snapshot lease.
+  type: string
 fullSnapshotSchedule:
   description: FullSnapshotSchedule defines the cron standard schedule
     for full snapshots.
4 changes: 4 additions & 0 deletions config/crd/bases/crd-druid.gardener.cloud_etcds.yaml
@@ -162,6 +162,10 @@ spec:
   description: EtcdSnapshotTimeout defines the timeout duration
     for etcd FullSnapshot operation
   type: string
+fullSnapshotLeaseUpdateInterval:
+  description: FullSnapshotLeaseUpdateInterval defines the interval
+    for retrying to update the full snapshot lease.
+  type: string
 fullSnapshotSchedule:
   description: FullSnapshotSchedule defines the cron standard schedule
     for full snapshots.
3 changes: 3 additions & 0 deletions internal/component/statefulset/builder.go
@@ -501,6 +501,9 @@ func (b *stsBuilder) getBackupStoreCommandArgs() []string {
 	if b.etcd.Spec.Backup.FullSnapshotSchedule != nil {
 		commandArgs = append(commandArgs, fmt.Sprintf("--schedule=%s", *b.etcd.Spec.Backup.FullSnapshotSchedule))
 	}
+	if b.etcd.Spec.Backup.FullSnapshotLeaseUpdateInterval != nil {
+		commandArgs = append(commandArgs, fmt.Sprintf("--full-snapshot-lease-update-interval=%s", b.etcd.Spec.Backup.FullSnapshotLeaseUpdateInterval.Duration.String()))
+	}
 
 	// Delta snapshot command line args
 	// -----------------------------------------------------------------------------------------------------------------