[architecture] Performance of `restic snapshots` with high-latency remote #523

Open
mappu opened this Issue May 19, 2016 · 5 comments

mappu commented May 19, 2016

Output of restic version

Hi,

Restic currently performs sequential round trips in some operations, one of which is listing snapshots. With a high-latency remote server (e.g. restic-server over a 150ms transatlantic link) and unbounded repository growth (e.g. hourly backups for about a month = 600 snapshots), restic needs ~2 minutes just to list the available snapshots.

I have a patch that parallelizes the requests in restic.LoadAllSnapshots(), plus a separate patch so that this code can actually be used inside cmd_snapshots.go, which currently reimplements it. But I think that's a stopgap measure that doesn't really address the underlying issue. In the long term, would it be architecturally feasible to build a kind of packfile system for the snapshots?
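For illustration, the parallelization idea can be sketched roughly as below. This is not the actual patch and does not use restic's real API: `fetchFunc` and `loadAll` are hypothetical names, and the simulated 10ms delay stands in for a remote round trip.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchFunc is a hypothetical stand-in for one remote round trip that loads a
// single snapshot by ID. It is an assumption, not restic's actual API.
type fetchFunc func(id string) (string, error)

// loadAll fetches every ID with at most `workers` requests in flight and
// returns the results in the same order as ids.
func loadAll(ids []string, workers int, fetch fetchFunc) ([]string, error) {
	if workers < 1 {
		workers = 1 // a zero-capacity semaphore would block forever
	}
	results := make([]string, len(ids))
	errs := make([]error, len(ids))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup

	for i, id := range ids {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` requests are in flight
		go func(i int, id string) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i], errs[i] = fetch(id)
		}(i, id)
	}
	wg.Wait()

	// Report the first error in ID order, if any.
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	ids := make([]string, 40)
	for i := range ids {
		ids[i] = fmt.Sprintf("snap-%02d", i)
	}
	// Simulate a 10ms round trip per snapshot.
	slow := func(id string) (string, error) {
		time.Sleep(10 * time.Millisecond)
		return "loaded " + id, nil
	}
	start := time.Now()
	out, err := loadAll(ids, 8, slow)
	fmt.Println(len(out), err, time.Since(start))
}
```

Even with this in place, the total time still grows with the number of snapshots, only with a smaller constant, which is why it is a stopgap.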

Expected behavior

restic snapshots should run in time not proportional to latency*numSnapshots

Actual behavior

restic snapshots runs in time proportional to latency*numSnapshots

Steps to reproduce the behavior

  1. Make 600 or more snapshots (the content doesn't matter) to a high-latency server
  2. Observe the time needed to run restic snapshots

mappu commented May 19, 2016

fd0 commented May 21, 2016

Hey, thanks for raising this issue. As you already discovered, this patch will only divide the time needed to list all snapshots by four, so it's not a long-term fix. If you want to create a PR to mitigate the issue for some time, please go ahead. In order to get that PR merged, please have a look at the worker pool implementation in restic/worker godoc and check if you can use it. You can see an example for its usage here: https://github.com/restic/restic/blob/master/src/cmds/restic/cmd_rebuild_index.go#L31-L66
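As a rough back-of-the-envelope model of why a fixed number of workers only divides the time by a constant: assuming one round trip per snapshot and that latency dominates (transfer time and server work ignored), the listing time is about ceil(n / workers) × latency. The function name `listTime` is made up for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// listTime estimates the wall-clock time to load n snapshots when each one
// costs a single round trip and `workers` requests run concurrently.
// Assumption: latency dominates; transfer time and server work are ignored.
func listTime(n, workers int, latency time.Duration) time.Duration {
	if workers < 1 {
		workers = 1
	}
	rounds := (n + workers - 1) / workers // ceil(n / workers)
	return time.Duration(rounds) * latency
}

func main() {
	const latency = 150 * time.Millisecond
	fmt.Println(listTime(600, 1, latency)) // 1m30s
	fmt.Println(listTime(600, 4, latency)) // 22.5s
}
```

The estimate is a lower bound under these assumptions; it is still linear in the number of snapshots for any fixed worker count.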

For a long-term fix, I plan to make a change to the repository layout and allow packed snapshots. We already have the infrastructure for packing files (and it works really well), so when the prune/optimize command (whatever it will be called) runs, it will also pack snapshots. Since the data stored for each snapshot is really tiny it allows us to pack A LOT of snapshots into one packfile.

What do you think?

fd0 added the enhancement label May 21, 2016

mappu commented Jul 28, 2016

> For a long-term fix, I plan to make a change to the repository layout and allow packed snapshots. We already have the infrastructure for packing files (and it works really well), so when the prune/optimize command (whatever it will be called) runs, it will also pack snapshots. Since the data stored for each snapshot is really tiny it allows us to pack A LOT of snapshots into one packfile.

It's an interesting idea. I like the reuse of the same pack system.

But it's only efficient as long as all the snapshots end up in the same pack and the whole pack is downloaded once and cached. If prune/optimize runs multiple times, the snapshots will end up scattered across many packs. In the worst case (optimize after each backup) it's no more efficient than the current system.

I think optimize after each backup is a normal use case (e.g. backup; delete snapshots older than 90 days; optimize), so this worst case would be hit often.
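To make the worst case concrete, here is a toy model. It assumes that each prune/optimize run packs all currently loose snapshots into one new pack and never merges existing snapshot packs; the actual design may well behave differently. `roundtrips` is a made-up helper that counts how many files a client would have to fetch to list everything.

```go
package main

import "fmt"

// roundtrips counts the files a client must fetch to list n snapshots when
// prune runs once every k backups.
// Assumptions (not from restic's design): each prune packs all loose
// snapshots into one new pack, existing packs are never merged, and every
// pack or loose snapshot costs one round trip.
func roundtrips(n, k int) int {
	if k < 1 {
		k = 1
	}
	packs := n / k // one pack per completed prune cycle
	loose := n % k // snapshots not yet packed
	return packs + loose
}

func main() {
	fmt.Println(roundtrips(600, 1))  // 600: prune after every backup
	fmt.Println(roundtrips(600, 24)) // 25: prune once a day with hourly backups
}
```

Under these assumptions, pruning after every backup gives one pack per snapshot, i.e. the same number of round trips as today.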

Maybe it would be better to use the "supersedes" concept from the index system instead?

fd0 commented Jul 28, 2016

Ah, maybe I should've made that clearer: In my plan, the pack files containing snapshots will live below the snapshots directory in the repository, and pack files there will only contain snapshots and no other data, so it's easy to cache them.
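A minimal sketch of how such caching could work on the client side, assuming snapshot packs are immutable and identified by a stable ID: list the directory once, then download only the packs that are not in the local cache. The names `toDownload` and `cached` are hypothetical and not part of restic.

```go
package main

import "fmt"

// toDownload returns the remote pack IDs that are missing from the local
// cache, preserving the remote listing order.
// Assumption: packs are immutable, so a cached ID never needs re-fetching.
func toDownload(remote []string, cached map[string]bool) []string {
	var missing []string
	for _, id := range remote {
		if !cached[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	remote := []string{"pack-a", "pack-b", "pack-c"} // one directory listing
	cached := map[string]bool{"pack-a": true, "pack-b": true}
	fmt.Println(toDownload(remote, cached)) // [pack-c]
}
```

With a warm cache, listing snapshots would then cost one directory listing plus one download per new pack.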

fd0 commented Oct 3, 2018

Still relevant.
