Performance tuning for restic check --read-data (speed improvement for spinning discs) #4537
Comments
You could try to increase or decrease restic/internal/repository/repository.go Line 31 in 27ec320
Is the script reading the files sequentially? How much CPU is restic using? How fast (MB/s) is the HDD able to read files? How large are the pack files in your repository?
That's not entirely surprising. With one worker, it has to alternate between reading data from disk and processing it, whereas with two workers it is much more likely that one of them can process data while the other one waits for data from disk. You could also set It might also be worthwhile to pass
For checking local repos I always passed The system I am currently running the checks on is a RAID5 array consisting of eight 2.5" SATA discs. On the other system, with an external 3.5" USB 3.0 drive, I noticed a lot of seeking noise and also about half the speed with
I noticed a strange thing. On this system the total speed increases when running the different checks on 3 repositories: Using Using 4 threads for parallel reading with Python gives almost exactly 300 MB/s. This seems to be the optimum for this system: 2 threads reach ~270 MB/s, 3 threads ~260 MB/s, 5 threads ~285 MB/s, and 6 threads ~290 MB/s.
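A minimal sketch of what such a multi-threaded reader could look like. This is not the script used for the measurements above; the function names and the 4 MB chunk size are assumptions made for illustration, and it assumes each pack file is named after the hex SHA-256 of its content.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha256_of_file(path, chunk_size=4 * 1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Collect all files below root in a stable, sorted order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes os.walk descend in sorted order
        paths.extend(os.path.join(dirpath, name) for name in sorted(filenames))
    return paths


def find_mismatches_parallel(root, threads=4):
    """Hash files with a thread pool; return paths whose name differs from the digest."""
    paths = list_files(root)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as the input paths
        digests = list(pool.map(sha256_of_file, paths))
    return [p for p, d in zip(paths, digests) if os.path.basename(p) != d]
```

Threads are enough here because `hashlib` releases the GIL while hashing larger buffers, so one thread can hash while another waits on the disk. The `threads` argument is the knob to vary when repeating the comparison between 1, 2, 4 and more readers.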
Now different results with restic check:
Using For the external spinning 3.5" disc, I'll try next week reducing with If there is no new buffer size recommendation, I'll try the default with 4 MB and the version with 1 MB.
As announced, I did some testing with an external 3.5" HDD. TL;DR: changing the cache size doesn't improve speed; larger sizes even slow it down. Increasing the number of local connections, however, does increase speed, even though there is much more seek noise. Default (I think 2 connections) Still, the simple Python script is fastest at only 31.42 s, more than twice as fast as the default restic check, and even restic check with 4 connections is 50 % slower than the Python script. Methods: restic repository with 32 MB target pack size, compression max, 6765 MB in size, CPU Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
128KB, 1MB and 32M represent the restic/internal/repository/repository.go Line 31 in 27ec320
So in conclusion, we should increase the local connections for restic check to 4, since even on a spinning disc this is much faster than the default 2 connections. Python slows down with 2 threads, as seek noise then becomes audible.
Very strange: with the same settings, on the same machine and the same SSD, I am currently checking a ~8.8 TB repo. After 11.25 hours, ~25 % has been checked. That works out to only about 55 MB/s, whereas the test repo reached 134 MB/s with these settings, more than twice the speed. The only difference is that this repo uses a 128 MB target pack size instead of 32 MB.
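The ~55 MB/s figure can be reproduced with a short calculation. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the helper name is made up for illustration.

```python
def throughput_mb_per_s(total_bytes, fraction_done, elapsed_hours):
    """Average read rate in MB/s, using decimal units (1 MB = 10**6 bytes)."""
    bytes_read = total_bytes * fraction_done
    return bytes_read / (elapsed_hours * 3600) / 1e6


# ~8.8 TB repo, ~25 % checked after 11.25 hours
large_repo = throughput_mb_per_s(8.8e12, 0.25, 11.25)
print(f"large repo: {large_repo:.1f} MB/s")

# Ratio against the 134 MB/s measured on the small test repo
print(f"test repo is {134 / large_repo:.1f}x faster")
```

With binary units (TiB and MiB) the result shifts by a few percent, which does not change the conclusion that the large repo is checked at less than half the speed of the test repo.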
@MichaelEischer, I'm not sure it's worth creating a separate issue, so I'll ask the question here:
There is no network activity (<10 Mbit/s); according to htop, only one thread is running, using ~100% of one core.
@JsBergbau Thanks for the extensive tests. I agree that check should be faster when verifying large numbers of pack files. However, using different connection counts based on the command doesn't sound particularly thrilling to me. I'm wondering whether we could parallelize @LuckyFrost That question is best suited for the forum and concerns a different part of the
Output of
restic version
0.16.1
What should restic do differently? Which functionality do you think we should add?
When doing
restic check --read-data
restic seems to read multiple files at once. This is quite useful for remote storage and SSDs, but not for spinning discs, since positioning the head takes time and produces a lot of overhead. When you do a
restic check --read-data
on a spinning disc, you hear a lot of noise from seeking.
What are you trying to do? What problem would this solve?
I am trying to make restic check faster; this is especially necessary because of bug #4523.
A simple Python script that just calculates the SHA256 checksum of each file and compares it with the filename is about twice as fast as
restic check --read-data
The spike occurs when using the Python script to check the pack files, which, however, is not sufficient in the case of this bug; see #4523 (comment)
When using the python script there is almost no seeking noise.
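For reference, a minimal sketch of such a checksum script. It is not the script used for the measurement; the function names and the 1 MB read size are illustrative assumptions. It reads one file at a time, in sorted order, and assumes each file name is the hex SHA-256 of the file's content.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(root):
    """Walk root one file at a time; return paths whose name differs from the digest."""
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if sha256_of_file(path) != name:
                mismatches.append(path)
    return mismatches
```

Calling `find_mismatches` on the repository's data directory returns an empty list when every file matches its name. Because only one file is open at any time, the access pattern stays close to sequential, which fits the low seeking noise described above.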
I've already tried to change
restic/internal/checker/checker.go
Line 672 in 27ec320
to
workerCount := 1
but that even slowed things down.
Now I don't know where else to look to speed up checking.
Did restic help you today? Did it make you happy in any way?
Restic is my favorite backup program by far. It makes me relaxed, because I know my data is safe.