Crazy slow incremental backup over WAN #1420
Comments
Thanks for the report, that is really odd. Could you paste the complete output of the backup run? Do you observe high CPU and/or disk IO during the backup? Which version of restic did you use to create the initial snapshot?
The cores bounce around between 7% and 35%, and iotop shows the disks are about idle before restic starts. Once it's running, the read speeds average 115 MB/s; earlier I observed spikes up to 145 MB/s and dips as low as 75 MB/s.
Ok, I think you've run into a version of the bug described in #1290. I'll try to reproduce it.
Hm, I'm not able to reproduce it. What should happen is that restic finds a parent (= previous) snapshot of the same set of directories. It should then load the metadata from that snapshot, detect that the files haven't been modified since, and only read and save new data. The metadata should also be cached automatically on the local machine.

All of that should be pretty robust, apart from bugs such as #1290. It looks like restic is reading all files locally again (which is why it takes so long), but after reading the data it detects that the data is already stored in the repo, so it continues with the next file. That's obviously not efficient.

Can you please rebuild restic, create a debug log (instructions) of restic running for a couple of minutes, and attach it to the issue? Please make sure to replace any sensitive details included in there (authentication tokens, passwords, etc.) before posting it.
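The parent-snapshot logic described above (load the parent's metadata, skip files whose metadata is unchanged, read only the rest) can be sketched roughly as follows. This is an illustrative simplification, not restic's actual code: the `FileMeta` structure and the `unchanged` check are hypothetical stand-ins (restic also tracks inode, ctime, and more).

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class FileMeta:
    # Hypothetical stand-in for the per-file metadata a snapshot stores.
    size: int
    mtime_ns: int

def unchanged(path: str, parent_meta: dict[str, FileMeta]) -> bool:
    """True if the file's metadata matches the parent snapshot,
    meaning its content does not need to be re-read from disk."""
    old = parent_meta.get(path)
    if old is None:
        return False  # new file: must be read and chunked
    st = os.stat(path)
    return st.st_size == old.size and st.st_mtime_ns == old.mtime_ns

def files_to_read(paths: list[str], parent_meta: dict[str, FileMeta]) -> list[str]:
    # Only files that fail the metadata check get read from disk;
    # everything else is referenced from the parent snapshot as-is.
    return [p for p in paths if not unchanged(p, parent_meta)]
```

If parent detection fails, `parent_meta` is effectively empty and every file is re-read, which would match the disk-bound, 10-hour runs reported here even though almost nothing is uploaded.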
Can you paste the output of …?
Sorry, I was away from my machine all weekend. Before I left I started a backup, since I'd been cancelling all the runs after that first 10-hour incremental finished, and since I'd dumped some new stuff on the drives. I ran it with the same command posted in the last screenshot, where I specified that flag.

This morning I had about 10 GB of new stuff to dump on the drives, so I thought I'd try once more, but without specifying that flag. For kicks I then started a second backup, since nothing new had been added. It also displayed the "using parent ..." line, and it completed in 9:57 (m:ss) at an average speed of 5983 MiB/s!

Looks like your hypothesis was right, in that it needed to reconcile all the files again for some strange reason. Once it did that, it picks up the parent correctly and the backups are wicked fast. I guess let's close this issue, and if this crops up again I will get the debug log and pass that along. Thanks!
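The reported numbers are at least self-consistent with the parent snapshot being used: at the claimed 5983 MiB/s average over 9:57, restic accounts for roughly 3.4 TiB, close to the ~4 TB dataset. In other words, the "speed" reflects files skipped via parent metadata, not bytes read from disk or sent over the network. A quick check, using only the figures quoted above:

```python
# Sanity-check the reported average speed against the dataset size.
avg_mib_s = 5983                  # average speed reported by restic (MiB/s)
duration_s = 9 * 60 + 57          # 9:57 runtime in seconds

processed_tib = avg_mib_s * duration_s / 1024**2
print(f"data accounted for: {processed_tib:.1f} TiB")  # prints: data accounted for: 3.4 TiB
```

That is far beyond what the disks can physically deliver (~115 MB/s per the iotop observation earlier in the thread), so almost all of it must have been skipped via metadata rather than read.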
Nice, so it looks like there's additional room for improvement in the parent detection code. I'm going to close this issue for now, please report back when it recurs! |
restic version:
I launched restic with the following:
Contents of .restic.conf:
Output at start of app:
Issue:
Restic is a sweet app. I've been using it for local backups for about 2 months. Since CrashPlan announced they are getting rid of their home plan, I wanted to switch to restic, since it's been working so well for local backups.

I have approx 4 TB of data, the majority of which is videos and pictures, so very static data. I'm setting up a backup to a remote server, and my home connection is 100 Mbps down / 10 Mbps up, so I took my machine to the server's location and did the initial backup there, so that my remote incremental backups would be substantially faster since I'm just sending the changes. BTW, the remote server's WAN connection is 400 Mbps down/up. The initial backup took ~12 hours to complete over a 1 Gbps LAN connection. The following day I ran one incremental manually just to be sure everything was backed up. The incremental took 32 minutes. It was slower than I was expecting for a net data change of zero, but I didn't really think twice about it since my backups will run while I am sleeping.
So I took my machine home and set up the cron job. Its first automated run, and its first run over my 10 Mbps connection, was last night. Since returning home, my data drives have only had 600 MB of data added, so I was interested to compare that 32-minute incremental with the extra time the 600 MB of data would take, which should have only added a few minutes. Well, it took a little over 10 hours to run :-(

I loaded up htop and nload in a tmux session so I could see whether I would saturate that 10 Mbps limit (CrashPlan never did, but restic did :-). Once I saw restic kick off in htop, I headed to bed. I checked my screen this morning, and when I saw it still running I thought it was uploading everything again, so I looked at nload, and it was hardly showing any I/O. I checked about every half hour until restic was gone from the processes and I had a new snapshot in the repo. The nload inbound/outbound totals were 701 MB / 2.17 GB, respectively. There was hardly anything sent out, and certainly not enough to justify a 10-hour run time.

Other than the 600 MB of data and the slower WAN connection, nothing has changed on my machine. Since it's sending so little data to the remote, it seems silly that the WAN speed would really be the cause of going from 32 minutes to 10 hours. What's even more bizarre is that I started a manual backup about 2 hours ago, just to confirm what I saw last night, and it's on target to hit 10 hours again, this time with zero changes to the data; so far the nload totals have only gone up a few MB. I also checked SMART on my disks since they were transported around, and everything is fine; the system temps were only slightly above idle. I really can't explain it.
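The reasoning above can be made explicit with some back-of-the-envelope arithmetic using the numbers from this report: 600 MB over a 10 Mbps uplink should take on the order of eight minutes, so the link alone cannot explain a 10-hour run.

```python
# Expected upload time for the new data over the home uplink.
new_data_mb = 600          # MB of new data since the initial backup
uplink_mbps = 10           # home upload bandwidth, megabits per second

seconds = new_data_mb * 8 / uplink_mbps   # 600 MB = 4800 megabits
print(f"ideal transfer time: {seconds / 60:.0f} minutes")  # prints: ideal transfer time: 8 minutes

observed_hours = 10
slowdown = observed_hours * 3600 / seconds
print(f"observed run was ~{slowdown:.0f}x longer than the link explains")  # ~75x
```

Even allowing generous protocol and repository overhead, a ~75x gap points at local work (re-reading all files), not the WAN.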
I saw that issue #386 led to the inclusion of metadata caching, which was the result of a similar slow-WAN issue. Am I mistaken in comparing restic's incremental backup to CrashPlan's, in that it should have completed much faster than it did?
Thanks for any suggestions!