
Compacting Incremental Snapshot Files #88

Closed
afritzler opened this issue Oct 29, 2018 · 21 comments · Fixed by gardener/etcd-backup-restore#301
Assignees
Labels
area/backup Backup related component/etcd-druid ETCD Druid kind/roadmap Roadmap BLI priority/2 Priority (lower number equals higher priority) status/accepted Issue was accepted as something we need to work on
Milestone

Comments

@afritzler (Member)

Question: How are you guys dealing with the incremental backup files when restoring a cluster? I am asking because in Kubify we expect one full snapshot file to trigger a restore operation. What is the best way to compact all the incremental backup files into one? Do you already have something handy?

@georgekuruvillak (Contributor)

We haven't planned on compacting the incremental backup files into one as of now. But if you want, we can create a subcommand that takes the incremental snapshots, compacts them, and pushes the result at command execution.

@afritzler (Member, Author)

afritzler commented Oct 30, 2018

We just need this for our restore process, which is manual at the moment. It would be enough to have a subcommand that works on a folder containing the full backup and the incremental updates (e.g. downloaded from the corresponding S3 bucket folder) and produces a compacted full backup at the end. We could run this locally and then easily feed the result in when we restore the cluster.

@georgekuruvillak (Contributor)

georgekuruvillak commented Nov 2, 2018

There is one way to do this indirectly. If you delete the member directory and do a data directory initialization, the data directory will be restored from the latest full and incremental snapshot. Can you check if this is sufficient?

@swapnilgm
swapnilgm commented Nov 2, 2018

You can use the etcdbrctl restore subcommand to restore from a full snapshot plus a set of incremental snapshots. This will give you a working <etcd-data-directory>. If this is not your requirement and you only want a single full-snapshot file to feed into Kubify, you can use the db file from <etcd-data-directory>/member/snap/db.
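A sketch of the invocation described above. The flag names and values here are assumptions for illustration (check `etcdbrctl restore --help` for the exact names in your version); provider, bucket, and paths must be adapted:

```sh
# Restore a working etcd data directory from the latest full snapshot
# plus all subsequent incremental snapshots.
etcdbrctl restore \
  --storage-provider=S3 \
  --store-container=my-backup-bucket \
  --store-prefix=etcd-main/v1 \
  --data-dir=/var/etcd/data/new.etcd

# If only a single full-snapshot file is needed (e.g. for Kubify),
# take the compacted db file out of the restored directory:
cp /var/etcd/data/new.etcd/member/snap/db ./compacted-full-snapshot.db
```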

@afritzler (Member, Author)

Thanks guys! I will have a look into it.

@rfranzke (Member)

rfranzke commented Mar 6, 2019

Hey guys,
is the compaction still not planned? We take an incremental snapshot every 5 minutes, so for one day we end up with 288 incremental snapshots (~10 MB). Restoring from these takes ~30-45 min, which is totally impracticable. We will not be able to implement a control-plane move to other seeds with such huge restore times.

So, please think about compacting the incremental snapshots so that we can significantly speed up the restoration process.
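The arithmetic behind the 288 figure above, as a quick sketch (the 5-minute period and 24-hour window are taken from the comment):

```python
def delta_snapshots_per_day(period_minutes: int) -> int:
    """Number of incremental snapshots accumulated over 24 hours
    when one is taken every `period_minutes` minutes."""
    return (24 * 60) // period_minutes

# With one incremental snapshot every 5 minutes:
print(delta_snapshots_per_day(5))  # 288
```

Each of those 288 snapshots must be fetched and replayed sequentially during a restore, which is why compaction shortens restore time so dramatically.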

@rfranzke rfranzke reopened this Mar 6, 2019
@amshuman-kr (Collaborator)

Yes. This is needed.

@amshuman-kr (Collaborator)

We had listed 4 topics for optimization for etcd-backup-restore.

  1. Delta snapshot - Completed (except for the large value case)
  2. Full snapshot - Completed
  3. Database verification - ongoing
  4. Database restoration - next up

I think this requirement for compaction of delta snapshots comes under the database restoration optimization and should be picked up along with it. We can even pick it up in parallel to the database verification topic if we have the bandwidth.

@swapnilgm swapnilgm self-assigned this Mar 13, 2019
@swapnilgm swapnilgm removed their assignment Aug 13, 2019
@vlerenc (Member)

vlerenc commented Aug 13, 2020

@shreyas-s-rao @swapnilgm Should this ticket be rather moved to https://github.com/gardener/etcd-druid/issues?

While we are at it, maybe there are more issues in this repo that should go there?

@swapnilgm
Makes sense. These issues were created prior to introducing etcd-druid.

@swapnilgm swapnilgm transferred this issue from gardener/etcd-backup-restore Aug 14, 2020
@vlerenc vlerenc added area/backup Backup related component/etcd-druid ETCD Druid priority/critical Needs to be resolved soon, because it impacts users negatively labels Sep 30, 2020
@majst01 (Contributor)

majst01 commented Sep 30, 2020

I think there is no real benefit in doing incrementals at all. Looking at our environment, a full backup is ~100 MB and an increment is ~30-40 MB uncompressed. I would propose switching to full backups only, but compressing them with lz4; this leads to smaller full backups than the current incremental files. Decompression adds ~0.2 s per file before the actual restore can start.

But overall the restoration time will decrease by factors.

related to: gardener/etcd-backup-restore#263
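A minimal sketch of the compression trade-off described above. It uses Python's standard-library gzip purely for illustration (lz4, as proposed in the comment, needs a third-party binding), and the snapshot content is synthetic:

```python
import gzip

# Synthetic, highly repetitive payload standing in for an etcd full
# snapshot; real snapshots full of similar Kubernetes objects also
# compress well, though the exact ratio varies by workload.
snapshot = b'{"kind":"Pod","metadata":{"namespace":"default"}}' * 20000

compressed = gzip.compress(snapshot)
ratio = len(compressed) / len(snapshot)
print(f"original: {len(snapshot)} bytes, "
      f"compressed: {len(compressed)} bytes, ratio: {ratio:.3f}")
```

The point of the comment stands independent of the codec: if compressed full snapshots end up smaller than the accumulated deltas, restoring from a single compressed full snapshot avoids the sequential delta replay entirely.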

@shreyas-s-rao (Contributor)

@majst01 We require incremental snapshots in the general case to avoid frequent large full snapshots, which incur network costs. If you do want to avoid delta snapshots, you can configure chart/etcd-backup-restore/values.yaml: set backup.deltaSnapshotPeriod to anything less than 1s to completely disable delta snapshots, and set backup.schedule to a higher full-snapshot frequency.

Regarding backup compression, we have opened gardener/etcd-backup-restore#255 and it's on the project roadmap.

@majst01 (Contributor)

majst01 commented Sep 30, 2020

Thanks @shreyas-s-rao for the hints.

We tried ourselves to set the backup to do full backups only, but in our experience setting backup.schedule had no effect.
Maybe @Gerrit91 can give you more information on that.
We instead modified the generated Etcd resource manually, which is not the way to go.

@amshuman-kr (Collaborator)

amshuman-kr commented Oct 1, 2020

We tried ourselves to set the backup to do full backups only, but from our experience setting backup.schedule had no effect.

@majst01 I just tried the following, and delta snapshots were disabled with only full snapshots enabled.

    backup:
        ...
        fullSnapshotSchedule: "* * * * *"
        ...
        deltaSnapshotPeriod: 0s
        ...
    time="2020-10-01T07:09:19Z" level=info msg="Taking scheduled snapshot for time: 2020-10-01 07:09:19.1179478 +0000 UTC" actor=snapshotter
    {"level":"warn","ts":"2020-10-01T07:09:19.126Z","caller":"clientv3/retry_interceptor.go:116","msg":"retry stream intercept"}
    time="2020-10-01T07:09:19Z" level=info msg="Successfully opened snapshot reader on etcd" actor=snapshotter
    time="2020-10-01T07:09:19Z" level=info msg="Total time to save snapshot: 0.004196 seconds."
    time="2020-10-01T07:09:19Z" level=info msg="Successfully saved full snapshot at: Backup-1601536159/Full-00000000-00000001-1601536159" actor=snapshotter
    time="2020-10-01T07:09:19Z" level=info msg="Will take next full snapshot at time: 2020-10-01 07:10:00 +0000 UTC" actor=snapshotter
    time="2020-10-01T07:09:19Z" level=info msg="Setting status to : 200" actor=backup-restore-server
    time="2020-10-01T07:09:19Z" level=info msg="Starting snapshotter..." actor=backup-restore-server
    time="2020-10-01T07:09:19Z" level=info msg="Taking scheduled snapshot for time: 2020-10-01 07:09:19.132893 +0000 UTC" actor=snapshotter
    time="2020-10-01T07:09:19Z" level=info msg="There are no updates since last snapshot, skipping full snapshot." actor=snapshotter
    time="2020-10-01T07:09:19Z" level=info msg="Stopping full snapshot..." actor=snapshotter
    time="2020-10-01T07:09:19Z" level=info msg="Resetting full snapshot to run after 40.8607142s" actor=snapshotter

We instead modified the generated etcd resource manually which is not the way to go.

Did you mean the above?

@amshuman-kr (Collaborator)

I think there is no real benefit of doing incrementals at all, when looking at our environment, full backup is ~100MB

It really depends on the workload pattern in the cluster. In most clusters the delta snapshots are much smaller than 100M, but in the seed and garden control planes they are quite high. As a consequence, the full snapshot size is also quite high (2-3G).

We have an issue to compress/decompress snapshots (full as well as delta) gardener/etcd-backup-restore#255.

We also have this issue to compact the delta snapshots in the background.

In the special case where the full snapshots are small and comparable to the delta snapshots, it might still make sense to do only full snapshots, as you mentioned. Would it make sense to make this configurable in Gardener?

@majst01 (Contributor)

majst01 commented Oct 1, 2020

@amshuman-kr in which resource did you set these settings?

    backup:
        ...
        fullSnapshotSchedule: "* * * * *"
        ...
        deltaSnapshotPeriod: 0s
        ...

@Gerrit91
Gerrit91 commented Oct 1, 2020

The Etcd resource is deployed by Gardener, and I think only the extension providers modify the values via webhooks. The gardener-extension-provider-gcp has a Schedule field in the deployment config, but as far as I can see from the code, this config flag is not used.

@amshuman-kr (Collaborator)

amshuman-kr commented Oct 1, 2020

@amshuman-kr in which resource did you set these setting ?

The Etcd resource of etcd-druid.

@vlerenc vlerenc added this to the 2020-Q4 milestone Oct 18, 2020
@gardener-robot gardener-robot removed this from the 2020-Q4 milestone Oct 22, 2020
@gardener-robot gardener-robot added the status/accepted Issue was accepted as something we need to work on label Oct 22, 2020
@vlerenc vlerenc added this to the 2020-Q4 milestone Nov 10, 2020
@abdasgupta (Contributor)

/assign

@amshuman-kr (Collaborator)

This issue was partially addressed in gardener/etcd-backup-restore#301. The functionality will be complete once #191 is completed.

@amshuman-kr amshuman-kr reopened this Jun 14, 2021
@amshuman-kr amshuman-kr modified the milestones: 2021-Q2, 2021-Q3 Jul 12, 2021
@shreyas-s-rao (Contributor)

This issue is now fully addressed with #197. Hence closing it.
The snapshot compaction feature will be available in etcd-druid v0.6.0 release shortly.
