Inconsistent duration #1290
Ah, interesting, that's a corner case restic doesn't consider yet. I need to have an in-depth look.
The issue I described above was observed on my employer's computer. Today I ran some more tests with my personal notebook. Output of …
I investigated a little further. I made a debug build and tried to identify differences between the slow case and the fast case. Here it goes; the first case is the fast one:
As can be seen restic correctly determines that the file is unchanged.
I have no idea why restic performs differently in these cases. In the mounted snapshots no difference is visible between each of the files backed up.
How do you start restic exactly? Can you paste the command you're running? Something odd is going on here. Can you try minimizing the observed behavior, so I can try to reproduce it?
This is the script with which I created the log files (I replaced my username by …):
Now I'm going to trim my setup ...
By not stating any excludes, the backup is performed with the expected speed! And I can see in the log that restic correctly determines the files as being unchanged.

EDIT: I'm sorry, but this is not the whole story. Even not using …

UPDATE: I made some more tests, but I don't get consistent results. Now all of my backups are performed with the expected speed! I can't explain it. I did not change anything in my setup. I did not change any file in the directories which were slow to back up before. So something else must have changed, but what?
Hmmm, this is odd. Thanks for trying to dig into it! What file system do the source files (that you're saving with restic) reside on? Is it possible that maybe …?

One thing I haven't told you yet: when restic is unable to find a so-called "parent snapshot" (a snapshot of exactly the same set of directories on the same host), it will re-read all files, but not store any duplicate blobs.
No, … But right now I have a new idea about the cause: I performed a …

After thinking about it: might it be a problem to interrupt restic using Ctrl-C? I used Ctrl-C to break restic a few times in my early tests. Perhaps restic gets into a problematic state which results in the behaviour I observed. Would it be possible for restic to detect such a degraded repository? At least restic operates differently when using a corrupted or corrected repository.

To cut a long story short, I think we can close this issue. I am sorry I wasted your time (and mine). If the problems return I will happily reopen the bug ;) And perhaps you can enhance the detection of a corrupted repository.
Ok, then we'll close this issue for now. Please feel free to add further comments, and thanks for the feedback!
I'm sorry I have to reopen the issue. [Since I'm not allowed to reopen the ticket, please fd0 do it for me. Thanks.] Today I started from scratch with
The backup of one directory yields:
Everything looks ok; the second backup is performed as fast as expected. The backup of the second single directory is also successful:
But combining both directories in one backup is very slow and I can see that the source disk is read very heavily – not from the beginning but after 5 minutes or so:
Hm, this is a new special case: is /home on the same partition as /?
No, sorry, I forgot to mention. Home is on a different partition. |
I just started to learn the programming language Go and the source code of restic. I studied the source code and looked again at the log listed in one of the comments above. I wonder why restic reacts differently in the fast case and in the slow case. It's striking that in the slow case there is no output of the kind … And it just caught my eye that in my logs rows of the kind …

What else can I do to analyse the situation?
Thanks for having an in-depth look, I appreciate that! :) I don't have any idea what you can do right now, and I'm sorry that the archiver code (which is the oldest part of restic) is so convoluted. I'm currently rewriting that part of restic. I'll try to find time this weekend to have a look myself.
I think I just reproduced the issue.
Bleh, you've found a very interesting corner case, thanks for being persistent about it! The basic idea of the current archiver code is as follows:
What you observed comes from the (rather strange) restic behavior of retaining only the last basedir of items (tracked in #549). The special case here is that 1) the root fs is saved (…) …

Let's say restic is called as follows: …

Due to #549, a snapshot will then contain the following sorted top-level entries: …

The archiver will, before attempting to traverse the directories to be saved, sort the list of directories by their basename, so that the order is correct. However, in this case the … The matching logic above then does not work correctly, and most files are read again. Meh.

So much for the description of the cause.
Isn't it the other way around? As far as I understand, …
Yeah, that's possible. But you get the idea ;)
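However the sort order comes about on each side, the two orderings genuinely diverge for these targets. A minimal illustration (not restic's code) of basename order vs. full-path order:

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// sortedByBase returns the targets ordered by their last path
// element (the basename restic retains per #549).
func sortedByBase(targets []string) []string {
	out := append([]string(nil), targets...)
	sort.Slice(out, func(i, j int) bool {
		return filepath.Base(out[i]) < filepath.Base(out[j])
	})
	return out
}

// sortedByPath returns the targets ordered by their full path.
func sortedByPath(targets []string) []string {
	out := append([]string(nil), targets...)
	sort.Strings(out)
	return out
}

func main() {
	targets := []string{"/", "/home", "/home/dottoreg/Private"}
	// filepath.Base("/") is "/", and "/" and "P" sort before "h",
	// so the two orders disagree about where /home belongs.
	fmt.Println(sortedByBase(targets)) // [/ /home/dottoreg/Private /home]
	fmt.Println(sortedByPath(targets)) // [/ /home /home/dottoreg/Private]
}
```

A positional comparison of the traversal order against the parent snapshot's entries then pairs up the wrong directories, so their contents look "changed" and get re-read.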
Congratulations on finding the problem! I really don't want to rush you, but what are your plans for a corrected version of restic?
Yes, that works! I've just verified it. The culprit here is really the expansion of …

Can you try that and report back? I'm currently rewriting the archiver code; this issue will be fixed there (in addition to #1290).
Yes, it works! It's astonishing how fast restic can be ;) I'm eager to see the problem resolved. It is awkward that all the directories to be backed up end up in the root directory of the snapshot, so instead of seeing …
I fully agree, I'm currently reworking the archiver code (which turns out to be surprisingly hard).
I, too, am experiencing these same performance issues with my backups of multiple paths under 0.7.3. I wonder if there's an issue with the logic of using a previous snapshot as a parent. It seems that a tuple <host, fullpath> could be used to uniquely identify a file's metadata (size, mtime, snapshots, etc.) and that in turn could be used to determine whether a file should be re-read. Does using a parent snapshot buy you something special over keeping track of individual files and the snapshots they appear in?
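The suggested <host, fullpath> lookup could be sketched like this (a hypothetical design, all names made up; not restic's implementation):

```go
package main

import (
	"fmt"
	"time"
)

// key identifies a file by host and full path, independent of any
// particular snapshot (hypothetical design, not restic's).
type key struct {
	Host string
	Path string
}

// meta is the per-file metadata such an index would track.
type meta struct {
	Size    int64
	ModTime time.Time
}

func main() {
	index := map[key]meta{
		{Host: "nb", Path: "/home/a.txt"}: {Size: 5, ModTime: time.Unix(100, 0)},
	}

	// Change detection becomes a direct lookup; no parent snapshot
	// needs to be selected first.
	m, seen := index[key{"nb", "/home/a.txt"}]
	fmt.Println(seen, m.Size) // true 5
	_, seen = index[key{"nb", "/home/b.txt"}]
	fmt.Println(seen) // false
}
```

The trade-off, as the next comment explains, is that such an index is a new global data structure the repository format would have to store and keep consistent, whereas the parent snapshot reuses a structure that already exists.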
In order to find something within the repo, we need an anchor point. In the current implementation, that's the parent snapshot, and I agree the implementation is not the best; I'm already working on a new one. If you want to understand what the data structures look like and what's stored in the repo, I recommend having a look at the repo specification here: https://restic.readthedocs.io/en/latest/100_references.html#design
I should also say that, as I think was mentioned above, if I split up my backup instead of having multiple paths stored as a single snapshot AND if I use appropriate parent snapshots, backups happen rather quickly. So for now, I plan on doing that.
This issue is resolved with the new archiver code contained in 0.9.0 (PR #1494), so I'm closing this issue for now. Please feel free to add further comments. Thanks!
Output of `restic version`:

```
restic 0.7.1
compiled with go1.8.3 on linux/amd64
```

How did you run restic exactly?

```
restic backup -x --exclude-file /tmp/restic-exclude /
restic backup -x --exclude-file /tmp/restic-exclude /home
restic backup -x --exclude-file /tmp/restic-exclude /home /home/dottoreg/Private
# now combine the backup of the above sources in one backup:
restic backup -x --exclude-file /tmp/restic-exclude / /home /home/dottoreg/Private
```
What backend/server/service did you use?
File-based, to a USB-attached hard disk. The repository path is stored in the environment variable (as is the password).
Expected behavior
I would expect a consistent amount and duration of the backup processes.
Actual behavior
The combined run takes much more time to scan and much more time for the backup; the throughput is much lower.
My problem: the sum of the durations, 3:08 + 11:24 + 21:00, is much lower than 1:17:59. (It's a factor of two!)
The sum of the scan times, 0:22 + 0:14 + 12:18, is much lower than 20:20.
Steps to reproduce the behavior
See above
Do you have any idea what may have caused this?
No.
Do you have an idea how to solve the issue?
Perhaps backing up the items in isolation is not the same as backing up all three items in one run. Should I stick with three isolated backups?