Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restic takes 10-15 minutes to do anything (eg list snaphots) #1959

Closed
steve-tregidgo opened this issue Aug 17, 2018 · 5 comments

Comments

Projects
None yet
4 participants
@steve-tregidgo
Copy link

commented Aug 17, 2018

restic version

  • Usually running...
    • restic 0.9.1 (v0.9.1-71-g8c124a2) compiled with go1.10.3 on linux/arm
  • Debug logs generated with...
    • restic 0.9.1 (v0.9.1-0-ge4c0d77) compiled with go1.10.3 on linux/arm

Background

I installed restic a couple of weeks ago on my NAS (ReadyNas 104 -- installed by compiling go 1.4 with gcc, then using go 1.4 to compile go 1.10.3, and finally using go 1.10.3 to compile restic).

I've been uploading data to Backblaze B2 successfully, with a few hundred GB uploaded over the last fortnight. (Aside: I haven't checked if that's fast or slow but it was running non-stop and continuously showed updates in the console).

In that final upload, I saw a lot of errors like this:

Remove(<lock/57541badb8>) returned error, retrying after 2.984548884s: Delete: b2_download_file_by_name: 404: bucket does not have file: restic/locks/57541badb8193f70391d5d681ef1d88b087f810b0b41a0bb7a1660159dbd3e2b

My actual problem

After the last upload, yesterday I ran restic snapshots and restic seemed to hang. After 10 minutes or so, it completed and printed 16 snapshots. I tried a few times (with snapshots, list snapshots and ls SNAPSHOTID) and each one took several minutes to complete; each time it tells me the repo was opened successfully after about 10 seconds, then has no output for minutes before printing its final output). Before yesterday I was able to run those commands relatively quickly (still tens of seconds though, which may or may not be expected).

I've compiled a debug binary and generated the following (redacted) log for restic snapshots: restic-snapshots-20180817-1223.tar.gz. (Aside: passwords and keys are already in the environment at this point while I'm debugging.)

Reproducing

I can reproduce this easily. I imagine it's not going to be easy for anybody else since I'm just running vanilla commands! I'm hoping there will be something in those logs which indicate what's changed in my installation or environment, or perhaps in my backup repo, that has made this slow down so much. I'm happy to run this again with different flags, or to drill down through the file system on B2 to look for things -- whatever will help.

Thanks in advance! I'm looking forward to getting this up and running fully.

Cheers,
Steve

@fd0

This comment has been minimized.

Copy link
Member

commented Aug 17, 2018

Hm, interesting! restic spends the 15 minutes checking lock files in the repo. I'm guessing you're located in Europe, so each operation which needs a response from the B2 API will have a very high latency. That's a common problem with B2.

So, for each of the 618(!!!) lock files in the repo, restic issues a download request, parses the lock file, then moves on to the next lock file. That's what takes so much time!

You can remove stale lock files by calling restic unlock, that should speed up restic's operation enormously.

Is it maybe possible that you're using credentials for accessing B2 which don't have the permission to remove files? It looks like restic's lock files stay in the repo, and since a lock file is replaced every five minutes or so, that left many files there...

@steve-tregidgo

This comment has been minimized.

Copy link
Author

commented Aug 18, 2018

Thanks for that diagnosis. So the failure to delete lock files was relevant after all! I didn't see that message the first few times I ran restic, nor did I see it for the first half of the days-long upload. I've run restic unlock and that has fixed my symptoms.

I authenticate with my master account details (with the intention of switching to app-specific keys shortly), so I would hope it's not permissions problems, assuming everything is working correctly at B2.

I was away for a week, and left restic doing its thing. While I was gone my Internet connection died for some number of days. Could that have accounted for the lock files still being there? If so, is there anything I can do to guard against this in future? Since I've got just the one box backing up, I'm thinking I could run restic unlock as part of the nightly script. I suppose the danger is that if I do that, but the nightly script (midnight) runs long and the hourly script kicks in (every hour from 0800-2000) then I could cause some corruption?

Thanks for your help!

@mholt

This comment has been minimized.

Copy link
Contributor

commented Aug 18, 2018

Since I've got just the one box backing up, I'm thinking I could run restic unlock as part of the nightly script. I suppose the danger is that if I do that, but the nightly script (midnight) runs long and the hourly script kicks in (every hour from 0800-2000) then I could cause some corruption?

My understanding is that restic unlock is safe to use any time, since only stale locks (locks which haven't been refreshed in about a half hour or so) are removed. (But restic unlock --remove-all is NOT safe.) I'm writing a web UI over restic and recently added restic unlock to be run before almost every restic invocation like to do a new backup or to do maintenance or a restore, since it seems like a good idea in case links get dropped suddenly.

I might be wrong, and I don't know the answers to your first questions. I'm also curious how restic unlock would fix the situation where B2 already fails to delete locks that don't exist in the first place... 🤔

@fd0 fd0 removed the feedback needed label Aug 18, 2018

@fd0

This comment has been minimized.

Copy link
Member

commented Aug 18, 2018

Cool, thanks for the feedback!

I was away for a week, and left restic doing its thing. While I was gone my Internet connection died for some number of days. Could that have accounted for the lock files still being there?

No, I don't think that was the issue of restic not being able to delete lock files. Sometimes B2 is a bit unstable, maybe that was one of the reasons. In your debug log, restic managed to delete the lock file it created just fine (right at the end of the log).

If so, is there anything I can do to guard against this in future?

Apart from using restic unlock from time to time I don't have any idea what to do here.

@mholt is indeed correct: By default, restic unlock will only remove stale locks. I'm thought about making restic ignore stale locks alltogether (it seems many people run into stale locks), but haven't done so for safety reasons.

Ideally, we come up with an algorithm which removes the need for lock files.

I'm going to close this issue for now, let us know if you see the error messages again (so then we can maybe debug it). Please feel free to add further comments. Thanks!

@fd0 fd0 closed this Aug 18, 2018

@DurvalMenezes

This comment has been minimized.

Copy link

commented May 20, 2019

@mholt, everyone; not wanting to be OOT nor 'necroposting' here, but just to serve as a warning for other folks who may come a-googling in the future (which is how I found this issue, and the advice below):

My understanding is that restic unlock is safe to use any time, since only stale locks (locks which haven't been refreshed in about a half hour or so) are removed.
This is not what happened when I tried iit with a running restic backup: it started outputting a lot of messages like

unable to refresh lock: <lock/0de3eb8974> does not exist
Remove(<lock/0de3eb8974>) returned error, retrying after 591.544762ms: <lock/0de3eb8974> does not exist                                                                                              
Remove(<lock/0de3eb8974>) returned error, retrying after 490.997928ms: <lock/0de3eb8974> does not exist

That backup took a few more hours than usual, and finally aborted without writing a snapshot, wasting almost a full day of running :-/
So, I can bear painful witness that plain restic unlock is not safe, at least not when you have a long-running restic backup :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.