Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new optimized multithreaded restore implementation #1719

Merged
merged 13 commits into from Oct 14, 2018

Conversation

Projects
None yet
@ifedorenko
Copy link
Contributor

ifedorenko commented Apr 13, 2018

What is the purpose of this change? What does it change?

New multithreaded restore implementation with drastically reduced number of backend calls.

This pull request supersedes #1672

Was the change discussed in an issue or in the forum before?

Yes, it resolves #1605.

Checklist

  • I have read the Contribution Guidelines
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@ifedorenko ifedorenko referenced this pull request Apr 13, 2018

Closed

faster restorer musings, very much WIP #1672

2 of 7 tasks complete
@ifedorenko

This comment has been minimized.

Copy link
Contributor Author

ifedorenko commented Apr 13, 2018

@fd0 there are still few TODO's I plan to work on over the weekend, but I feel the code is mostly in its final shape and I'd appreciate your feedback on overall structure and readability, additional tests you think are worth writing, missing or unclear comments and anything else you find wrong. Thank you in advance.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Apr 13, 2018

Codecov Report

Merging #1719 into master will decrease coverage by 3.09%.
The diff coverage is 82.66%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #1719     +/-   ##
=========================================
- Coverage   49.95%   46.85%   -3.1%     
=========================================
  Files         169      176      +7     
  Lines       13568    14060    +492     
=========================================
- Hits         6778     6588    -190     
- Misses       5773     6479    +706     
+ Partials     1017      993     -24
Impacted Files Coverage Δ
cmd/restic/cmd_restore.go 53.65% <0%> (ø) ⬆️
internal/restorer/packcache.go 100% <100%> (ø)
internal/restorer/restorer.go 43.52% <50.9%> (-1.86%) ⬇️
internal/restic/node.go 40.83% <66.66%> (+0.93%) ⬆️
internal/restorer/filerestorer.go 76.05% <76.05%> (ø)
internal/restorer/fileswriter.go 79.48% <79.48%> (ø)
internal/restorer/filepacktraverser.go 84.61% <84.61%> (ø)
internal/restorer/packqueue.go 92.07% <92.07%> (ø)
internal/restorer/packheap.go 92.3% <92.3%> (ø)
internal/backend/b2/b2.go 0% <0%> (-80.69%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41a4d67...21a3486. Read the comment docs.

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch from 832efc5 to f240481 Apr 16, 2018

files []*fileInfo
}

func NewFileRestorer(repo restic.Repository) restic.FileRestorer {

This comment has been minimized.

@houndci-bot

houndci-bot Apr 16, 2018

exported function NewFileRestorer should have comment or be unexported

@ifedorenko

This comment has been minimized.

Copy link
Contributor Author

ifedorenko commented Apr 16, 2018

@fd0 I made all changes I wanted to make and believe this is now ready for review and testing. I may still do few "finishing touches", like rereading/reediting comments and such, but do not plan any further work until you have a chance to review this PR (or somebody finds/reports bugs).

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch from e0e112c to 5fddc4e Apr 20, 2018

@ifedorenko ifedorenko referenced this pull request Apr 26, 2018

Closed

Slow restore #1605

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch from 5fddc4e to b092df8 Apr 30, 2018

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch 3 times, most recently from e56071c to a4f2de0 Apr 30, 2018

@fd0
Copy link
Member

fd0 left a comment

So, I finally found some time to look through your PoC implementation, after finishing #1494 (and having a newborn baby). I really like it already (e.g. the treeVisitor struct)!

Things we need to do in order to get it merged:

  • Move the restorer to its own package, internal/restorer, similar to the archiver. That it was still in the internal/restic package was just a sign that it hasn't been refactored yet.
  • Remove the legacy implementation
  • I'd rather have the option to verify files as a separate PR, as it's a different feature, and leave it out for now. What are your thoughts on this?
  • The comment lines (with //////) need to be removed :) I'd rather like to improve the documentation in the source that godoc generates (you're aware of godoc, aren't you?), so we can read it there.
  • Once you move to the separate package, you can create smaller files for each component, so the comment line shouldn't be necessary any more.

Once we've reworked this PR a bit, we can add a user interface similar to what the new archiver (in master) has. If you like, I can work on that.

@ifedorenko

This comment has been minimized.

Copy link
Contributor Author

ifedorenko commented May 11, 2018

First, congrats with the baby!

Distant second, agree with all your suggestions.

Not sure if you've noticed, but the changes in this PR are already grouped into separate commits which I plan to have reviewed and merged separately. Unfortunately github sucks at managing multiple dependent changes, so I created this all-in-one PR (and github shows order of commits wrong... grrrr).

I don't mind at all if you add user interface to the restorer, but probably makes sense to wait until I move the restorer to the new package. Give me few days.

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch from a4f2de0 to d6883f5 May 11, 2018

@ifedorenko ifedorenko referenced this pull request May 11, 2018

Merged

Restore verify #1772

5 of 7 tasks complete

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch from d6883f5 to 64b842b May 11, 2018

@ifedorenko

This comment has been minimized.

Copy link
Contributor Author

ifedorenko commented May 11, 2018

@fd0 I've made all requested changes, so please have another look when you have time.

I decided I would prefer to keep --verify. The changes to restore logic are significant and I think it is important to have extra validation. I did move --verify to separate PR #1772, which is a prerequisite of this one.

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch 2 times, most recently from 61badb1 to 6c5e9a1 May 11, 2018

@fbarbeira

This comment has been minimized.

Copy link

fbarbeira commented May 22, 2018

Please go ahead with this one! 👍

Slow restorations is a blocking item for us.

@ifedorenko

This comment has been minimized.

Copy link
Contributor Author

ifedorenko commented May 22, 2018

@fbarbeira have you checked this change improves restore performance for you? if not, can you?

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch from 6c5e9a1 to 3af68d9 May 22, 2018

@skriss

This comment has been minimized.

Copy link
Contributor

skriss commented May 22, 2018

Looking forward to this PR merging, the improvements are great from testing I've done so far :)

@fbarbeira

This comment has been minimized.

Copy link

fbarbeira commented May 25, 2018

Great! I will try as soon as I can.

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented May 25, 2018

FYI, I'm sick and swamped with other work.

@ifedorenko ifedorenko force-pushed the ifedorenko:mt-restorer branch from 3af68d9 to fa8854c May 26, 2018

@by-cx

This comment has been minimized.

Copy link

by-cx commented May 27, 2018

I have check this pull request on our servers and the result is amazing. With this patch, restic is finally usable in our infrastructure and we will be able to kill whole physical server for backuping. Restore via DigitalOcean Spaces backend was fast as hell.

@mholt

This comment has been minimized.

Copy link
Contributor

mholt commented Oct 4, 2018

That's awesome. Is it possible to make the intensive memory use optional? Can't tell if it's the UI or the concurrency that is causing that.

@ifedorenko

This comment has been minimized.

Copy link
Contributor Author

ifedorenko commented Oct 4, 2018

I am confident it is possible to reduce memory footprint of the multithreaded restorer, but I suggest to deal with memory usage optimization after this pull request is merged.

I think it is reasonable to keep both the old and the new restorer implementations so the users can choose, but this is up to fd0 to decide (he did explicitly ask to remove the old restorer as part of this PR, fwiw).

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Oct 4, 2018

I'm very sorry, I hit a bit of a rough spot in terms of "free time" the last couple of weeks. I'd like to get this merged as soon as possible :)

Next steps for me:

  • Release 0.9.3
  • Get this PR merged (together with the UI PR)
  • Then iterate and improve it :)
@fd0

fd0 approved these changes Oct 14, 2018

ifedorenko and others added some commits Apr 8, 2018

Cleanup: more realistic restorer test data setup
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
restore: New optimized multithreaded implementation
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
restore: Removed legacy restore implementation
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
go mod vendor
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
restore: Fix small memory leak in filesWriter, add tests
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
restore: Fix packcache capacity math with failed downloads
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
restore: Chang fileInfo to use snapshot location instead of target path
* uses less memory as common prefix is only stored once
* stepping stone for simpler error callback api, which
  will allow further memory footprint reduction

Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
restore: significantly reduce memory footprint
reworked restore error callback to use file location
path instead of much heavier Node. this reduced restore
memory usage by as much as 50% in some of my tests.

Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
restorer: Optimize empty file restore
don't create fileInfo structs for empty files. this saves memory.
this also avoids extra serial scan of all fileInfo, which should
make restore faster and more consistent.

Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
restorer: Add a note on hardlink metadata
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>

@fd0 fd0 force-pushed the ifedorenko:mt-restorer branch from ccf8953 to 21a3486 Oct 14, 2018

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Oct 14, 2018

I've had another look and decided to merge this PR, thank you very much for your patience! :)

As the last step, I've slightly reworded the commit message titles to match the rest of the project and rebased it on master, we'll have one final pass for Travis/Appveyor and then I'll merge it.

@fd0 fd0 merged commit 21a3486 into restic:master Oct 14, 2018

2 checks passed

Hound No violations found. Woof!
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Oct 14, 2018

And it's in! Thanks @ifedorenko!

@ifedorenko ifedorenko deleted the ifedorenko:mt-restorer branch Oct 15, 2018

@pmkane

This comment has been minimized.

Copy link
Contributor

pmkane commented Nov 6, 2018

Hey all:

This PR was what let us move to restic for a backup of a large database system.

With several terabytes of data, being able to do a restore daily, so that we know that it's possible, was non-negotiable.

Prior to this PR, even though backups using restic were lightning fast, restoring took so long that the backups weren't useful.

With this PR merged, we can do restores in a reasonable amount of time. There's still room to make it even faster -- it doesn't max out S3 or our target disk subsystem during restores -- but this is a gigantic improvement. If we can help out with real world instrumentation, let us know.

Thank you!

@ifedorenko

This comment has been minimized.

Copy link
Contributor Author

ifedorenko commented Nov 6, 2018

See how fast rclone or any other tool of your choice can copy your repo from S3 to your target system. If that is significantly faster than restic, I suggest you open new issue where we can discuss what can be done about it.

@pmkane

This comment has been minimized.

Copy link
Contributor

pmkane commented Nov 6, 2018

I will do that -- it's not as fast as it could be because 8 workers isn't enough to keep the network busy -- but this wasn't meant as a back handed compliment, just a big thanks!

@ifedorenko

This comment has been minimized.

Copy link
Contributor Author

ifedorenko commented Nov 7, 2018

If you suspect the number of concurrent downloads is the issue, just bump it up in restic and see what restore throughput you get. https://github.com/restic/restic/blob/master/internal/restorer/filerestorer.go#L24.

@Miouge1

This comment has been minimized.

Copy link

Miouge1 commented Nov 20, 2018

Wow. I did a test restore from a 6GB OpenStack Swift repository:

  • v0.9.3: 8min
  • master with #1719: 2min30s restore time
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.