
Performance tuning for restic check --read-data (speed improvement for spinning discs) #4537

Open
JsBergbau opened this issue Oct 25, 2023 · 6 comments


@JsBergbau
Contributor

Output of restic version

0.16.1

What should restic do differently? Which functionality do you think we should add?

When doing restic check --read-data, restic seems to read multiple files at once. This is quite useful for remote storage and SSDs, but not for spinning discs, since repositioning the head takes time and produces a lot of overhead.

When you do a restic check --read-data on a spinning disc, you hear a lot of noise from seeking.

What are you trying to do? What problem would this solve?

I am trying to make restic check faster; especially with bug #4523 this is necessary.
A simple Python script that just calculates the SHA256 checksum of each file and compares it with the filename is about twice as fast as restic check --read-data.

[screenshot: throughput graph]
The spike is when using the Python script to check the pack files, which however is not sufficient in the case of this bug, see #4523 (comment)

When using the python script there is almost no seeking noise.

I've already tried to change

workerCount := int(c.repo.Connections())

to

workerCount := 1

but that slowed things down even further.
Now I don't know where else to look to speed up checking.

Did restic help you today? Did it make you happy in any way?

Restic is my favorite backup program by far. It makes me relaxed, because I know my data is safe.

@MichaelEischer
Member

MichaelEischer commented Oct 25, 2023

You could try to increase or decrease MaxStreamBufferSize

const MaxStreamBufferSize = 4 * 1024 * 1024
(I can think of a reason why either direction could improve the throughput).
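Which buffer size wins is workload-dependent and can be probed outside restic before patching the constant. A minimal standalone sketch (Python, hypothetical helper name, not restic's code) that times the same sequential hash with different read buffer sizes:

```python
import hashlib
import os
import tempfile
import time

def hash_with_chunk_size(path, chunk_size):
    """Hash a file sequentially, reading chunk_size bytes per read() call."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            h.update(data)
    return h.hexdigest()

# Time a few candidate buffer sizes on a throwaway 8 MiB test file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    test_path = tmp.name

try:
    for size in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
        start = time.perf_counter()
        hash_with_chunk_size(test_path, size)
        print(f"{size >> 10:5d} KiB buffer: {time.perf_counter() - start:.3f}s")
finally:
    os.unlink(test_path)
```

For a meaningful HDD measurement the page cache would need to be dropped between runs; on a cached small file the differences mostly reflect syscall overhead.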

Using a simple python script, that just calculates the SHA256 checksum of files and compares it with the filename is about twice as fast as restic check --read-data

Is the script reading the files sequentially?

How much CPU is restic using? How fast (MB/s) is the HDD able to read files? How large are the pack files in your repository?

workerCount := 1
but it even slowed the speed down.

That's not entirely surprising. With one worker, restic has to alternate between reading data from disk and processing it, whereas with two workers it is much more likely that one of them can process data while the other one waits for data from disk. You could also set -o local.connections=1 instead of patching the source code to reduce the worker count to 1.
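That read/process overlap can be illustrated outside restic. A hypothetical Python sketch of one reader thread feeding one hasher through a bounded queue (not restic's implementation, just the pipelining idea):

```python
import hashlib
import queue
import threading

def hash_pipelined(path, chunk_size=1024 * 1024, depth=4):
    """Overlap disk reads with hashing: a reader thread fills a small
    bounded queue while the main thread drains it and hashes.
    An empty bytes object on the queue signals end-of-file."""
    chunks = queue.Queue(maxsize=depth)

    def reader():
        with open(path, "rb") as f:
            while True:
                data = f.read(chunk_size)
                chunks.put(data)
                if not data:
                    return

    t = threading.Thread(target=reader, daemon=True)
    t.start()

    h = hashlib.sha256()
    while True:
        data = chunks.get()
        if not data:
            break
        h.update(data)
    t.join()
    return h.hexdigest()
```

The bounded queue keeps read-ahead limited, so the disk still sees one mostly sequential stream while hashing proceeds in parallel.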

It might also be worthwhile to pass --no-cache to restic as there's no need for a cache if the repository is stored on a local disk.

@MichaelEischer MichaelEischer added the state: need feedback waiting for feedback, e.g. from the submitter label Oct 25, 2023
@JsBergbau
Contributor Author

JsBergbau commented Oct 25, 2023

Is the script reading the files sequentially?

Yes. That's the function; the files are passed to it one after another.

import hashlib
import os
import sys

def calculateSHA256(file_path):
    try:
        sha256_hash = hashlib.sha256()
        with open(file_path, 'rb') as f:
            while True:
                data = f.read(65536)  # read in 64 KB chunks
                if not data:
                    break
                sha256_hash.update(data)
        checksum = sha256_hash.hexdigest()
        filename = os.path.basename(file_path)
        if checksum != filename:
            sys.stderr.write(f"Error: Checksum mismatch for file '{file_path}'\n")
        return file_path
    except Exception as e:
        sys.stderr.write(f"Error processing file '{file_path}': {str(e)}\n")
        return None
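The surrounding driver loop isn't shown in the thread; given that restic stores pack files under data/<first two hex digits>/<sha256-name>, a sequential walk over a local repository could look like this (hypothetical sketch with its own inlined verify helper, mirroring the function above):

```python
import hashlib
import os
import sys

def verify_pack(file_path):
    """Recompute the SHA-256 of a pack file and compare it to its
    filename, mirroring the calculateSHA256 function above."""
    h = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    ok = h.hexdigest() == os.path.basename(file_path)
    if not ok:
        sys.stderr.write(f"Checksum mismatch: {file_path}\n")
    return ok

def check_repo_packs(repo_path):
    """Sequentially verify every pack under <repo>/data/<xx>/<sha256-name>.
    Returns the list of files whose content does not match their name."""
    bad = []
    for root, _dirs, files in os.walk(os.path.join(repo_path, "data")):
        for name in sorted(files):
            path = os.path.join(root, name)
            if not verify_pack(path):
                bad.append(path)
    return bad
```

As discussed in #4523, this only proves the pack files are intact on disk, not that the repository's index and blob structure are consistent.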

It might also be worthwhile to pass --no-cache to restic as there's no need for a cache if the repository is stored on a local disk.

For checking local repos I always pass --no-cache; I just didn't mention it for simplicity.

The system I am currently running the checks on is a RAID5 array consisting of 8 2.5" SATA discs. On the other system, with an external 3.5" USB 3.0 drive, I noticed a lot of seeking noise and also about half the speed with restic check --read-data --no-cache compared to the Python script.

How fast (MB/s) is the HDD able to read files?
Bypassing the file system (ext4), it reaches ~1 GB/s:

dd if=/dev/sda of=/dev/null bs=1M status=progress
28177334272 bytes (28 GB, 26 GiB) copied, 28 s, 1,0 GB/s^C
27672+0 records in
27671+0 records out
29015146496 bytes (29 GB, 27 GiB) copied, 28,7665 s, 1,0 GB/s

I noticed a strange thing: on this system, the total speed increases when running the different checks on 3 repositories in parallel:

[screenshot: throughput graph of the three parallel checks]

Using local.connections=6 I get about the same speed, as expected, when running 3 checks in parallel. With 3 (yes, 3) local.connections, restic used 220 % CPU (12-core / 24-thread Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz).

Using 4 threads for parallel reading with Python gives almost exactly 300 MB/s; this seems to be the optimum for this system. 2 threads reach ~270 MB/s, 3 threads ~260 MB/s, 5 threads ~285 MB/s, 6 threads ~290 MB/s.
On a single 3.5" HDD, more than 1 thread cut the speed roughly in half.

How large are the pack files in your repository?
Target size is 128 MB, so ~100-140 MB from a quick look.

Now for different results with restic check:

- const MaxStreamBufferSize = 65536 with -o local.connections=1 gives ~73 MB/s, whereas with the default buffer size it was only about 65 MB/s.
- const MaxStreamBufferSize = 65536 and default local.connections (I think this is 2) gives about 120 MB/s, so the smaller buffer makes no real difference compared to the original buffer size.
- const MaxStreamBufferSize = 65536 * 2 and default local.connections also gives about 120 MB/s.
- const MaxStreamBufferSize = 4 * 8 * 1024 * 1024 and default local.connections also gives about 120 MB/s, so an 8-times-bigger buffer makes no real difference either.
- const MaxStreamBufferSize = 1024 * 1024 and default local.connections gives about 138 MB/s, so a 1 MB buffer seems to speed things up at least a bit.

Using const MaxStreamBufferSize = 1024 * 1024 and -o local.connections=6 is slightly faster (~2 %) than 3 parallel restic checks with default buffer size and default local connections, so that could also be within noise.

For the external spinning 3.5" disc, I'll try next week with -o local.connections=1 and compare it to the Python script.

If there is no new buffer size recommendation, I'll try the default with 4 MB and the version with 1 MB.

@JsBergbau
Contributor Author

JsBergbau commented Oct 26, 2023

As announced, I did some testing with an external 3.5" HDD.

TL;DR: changing the buffer size doesn't improve speed; larger sizes even slow things down. But increasing the number of local connections increases speed, even though there is much more seek noise. With the default (I think 2 connections), restic check --read-data --no-cache needs 1m5s; with 4 connections it needs 0m50s. So the default with 2 connections is about 30 % slower than 4 connections.

Still, the simple Python script is fastest at only 31.42 s, so more than twice as fast as the default restic check; even restic check with 4 connections is 50 % slower than the Python script.

Methods: restic repository with 32 MB target pack size, maximum compression, 6765 MB total size, CPU Intel(R) Core(TM) i5-4590 @ 3.30GHz.
Between each run I executed echo 3 > /proc/sys/vm/drop_caches. Full protocol: checkSpeedProtokoll.txt

| restic | MaxStreamBufferSize | options | time | MB/s | seconds |
|---|---|---|---|---|---|
| 0.16.1 | default (4 MB) | none | 1m5.612s | 103.106139 | 65.612 |
| 0.16.1 | default (4 MB) | local.connections=1 | 1m49.948s | 61.5290865 | 109.948 |
| 0.16.1 | default (4 MB) | local.connections=3 | 0m52.43s | 129.029182 | 52.43 |
| 0.16.1 | default (4 MB) | local.connections=4 | 0m50.120s | 134.976057 | 50.12 |
| 0.16.1 | default (4 MB) | local.connections=5 | 0m53.022s | 127.588548 | 53.022 |
| 0.16.1 | default (4 MB) | local.connections=6 | 0m53.772s | 125.808971 | 53.772 |
| 0.16.1 | 128KB | none | 1m8.392s | 98.9150778 | 68.392 |
| 0.16.1 | 128KB | local.connections=1 | 1m51.185s | 60.8445384 | 111.185 |
| 0.16.1 | 1MB | none | 1m8.048s | 99.4151187 | 68.048 |
| 0.16.1 | 1MB | local.connections=1 | 1m51.034s | 60.9272835 | 111.034 |
| 0.16.1 | 32MB | none | 1m32.703s | 72.9749846 | 92.703 |
| 0.16.1 | 32MB | local.connections=1 | 2m3.154s | 54.9312243 | 123.154 |
| Python script | n/a | 1 thread | 0m31.424s | 215.281314 | 31.424 |
| Python script | n/a | 2 threads | 0m34.325s | 197.086672 | 34.325 |

128KB, 1MB and 32MB refer to MaxStreamBufferSize; the default is

const MaxStreamBufferSize = 4 * 1024 * 1024

So in conclusion, we should increase the local connections for restic check to 4, since even on a spinning disc this is much faster than the default of 2 connections.

Python slows down with 2 threads; seek noise then becomes audible.
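The multi-threaded Python measurements above were presumably obtained with a thread pool; a hypothetical sketch of such a variant (not the original script):

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def sha256_of(path):
    """Sequentially hash one file in 64 KB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return path, h.hexdigest()

def check_parallel(paths, workers=2):
    """Verify filename == SHA-256 for many files with a thread pool.
    hashlib releases the GIL while hashing, so threads genuinely
    overlap I/O and CPU; on a single spinning disc, though, the extra
    seeking between files can outweigh that gain."""
    mismatches = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, digest in pool.map(sha256_of, paths):
            if digest != os.path.basename(path):
                mismatches.append(path)
    return mismatches
```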

@JsBergbau JsBergbau changed the title Add read-concurrency for restic check command (speed improvement for spinning discs) Performance tuning for restic check --read-data (speed improvement for spinning discs) Oct 26, 2023
@JsBergbau
Contributor Author

Very strange: with the same settings on the same machine, on the same SSD, I am currently checking a ~8.8 TB repo. After 11.25 hours, ~25 % is checked. This works out to only about 55 MB/s, whereas the test repo with these settings reached 134 MB/s, more than twice the speed. The only difference is that this repo uses a 128 MB target pack size instead of 32 MB.

@LuckyFrost

LuckyFrost commented Nov 4, 2023

@MichaelEischer, I'm not sure it's worth creating a separate issue, so I'll ask the question here:
Are there any ways to speed up restic check?
s3, high latency, -o s3.connections=48

check snapshots, trees and blobs
[52:20] 30.65%  38 / 124 snapshots

There is no network activity (<10 Mbit/s); according to htop, only one thread is running and it uses ~100 % of one core.
And this is not a large repository yet, only 1.5 TB.
restic 0.16.2

@MichaelEischer
Member

So in conclusion we should increase the local connections for restic check to 4, since this is even on a spinning disc much faster than default 2 connections.

@JsBergbau Thanks for the extensive tests. I agree that check should be faster when verifying large amounts of pack files. However, using different connection counts based on the command doesn't sound particularly thrilling to me.

I'm wondering whether we could parallelize check so it is not limited to processing each pack file on a single CPU core. Currently, for each pack file, restic has to read it, check the overall SHA-256 hash, then decrypt, decompress, and hash each contained blob. Everything blob-related could in theory be parallelized; the main question is how to implement that without creating a total mess and without further complicating the pack file checking.
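One possible shape for that parallelization, sketched here in Python with plain SHA-256 standing in for restic's per-blob decrypt/decompress/hash step (hypothetical, not restic's code): read each pack once sequentially, then fan the blob work out to a worker pool.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def check_pack(path, blobs, workers=4):
    """Check one pack: a single sequential read of the whole file, the
    whole-pack hash, then per-blob verification in a worker pool.
    blobs is a list of (offset, length, expected_sha256) tuples; real
    restic blobs would additionally need decryption and decompression
    before hashing, which is exactly the CPU-bound part worth
    parallelizing."""
    with open(path, "rb") as f:
        data = f.read()  # one sequential read keeps disk seeks minimal
    pack_hash = hashlib.sha256(data).hexdigest()

    def blob_ok(blob):
        offset, length, expected = blob
        return hashlib.sha256(data[offset:offset + length]).hexdigest() == expected

    with ThreadPoolExecutor(max_workers=workers) as pool:
        all_blobs_ok = all(pool.map(blob_ok, blobs))
    return pack_hash, all_blobs_ok
```

This keeps the disk access pattern of a single-worker check while letting the CPU-bound blob work scale across cores.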

@LuckyFrost That question is best suited for the forum and concerns a different part of the check command than discussed here. Please open a new issue/forum topic and include the output of restic stats --mode debug there. (requires at least restic 0.16.0)

@MichaelEischer MichaelEischer added type: feature enhancement improving existing features category: optimization category: check and removed state: need feedback waiting for feedback, e.g. from the submitter labels Nov 4, 2023