Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify uploaded files #130

Merged
merged 3 commits into from Aug 11, 2021
Merged

Verify uploaded files #130

merged 3 commits into from Aug 11, 2021

Conversation

MichaelEischer
Copy link
Member

@MichaelEischer MichaelEischer commented Dec 19, 2020

What is the purpose of this change? What does it change?

The restic uses the sha256 hash to calculate filenames based on the file content. Check on the rest-server side that the uploaded file is intact and reject it otherwise.

This PR complements restic/restic#3178 . However, this PR does not depend on the mentioned restic PR and should work with every restic version.

Was the change discussed in an issue or in the forum before?

Fixes #122

Checklist

  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • [ ] I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@wojas
Copy link
Contributor

wojas commented Jan 4, 2021

Great work!

This does miss one important scenario: since the file is still written in place, you can still end up with corrupt files:

  • The server could crash while writing the file
  • A client controlled by an attacker can keep the connection open and never actually finish writing the file.

In order to actually guarantee integrity, data should always be written to a temporary file, which is atomically moved in place once the entire file has been written and its hash verified.

Additionally, I suggest adding a flag to disable this feature or making it optional. While sha256 is really fast on modern systems, there are plenty of cheap and old NAS servers where this feature would kill performance, like an old Synology DS213j NAS where rest-server is CPU limited to 20MB/s even without this feature.

@MichaelEischer
Copy link
Member Author

MichaelEischer commented Jan 4, 2021

This does miss one important scenario: since the file is still written in place, you can still end up with corrupt files:

  • The server could crash while writing the file
  • A client controlled by an attacker can keep the connection open and never actually finish writing the file.

I'd like to handle atomic uploads in a separate PR. The idea for this PR was to ensure that successfully uploaded files are guaranteed to be intact.

Additionally, I suggest adding a flag to disable this feature or making it optional. While sha256 is really fast on modern systems, there are plenty of cheap and old NAS servers where this feature would kill performance, like an old Synology DS213j NAS where rest-server is CPU limited to 20MB/s even without this feature.

My raspberrypi 4 agrees:

go test -bench . crypto/sha256
goos: linux
goarch: arm
pkg: crypto/sha256
BenchmarkHash8Bytes-4             459962              2347 ns/op           3.41 MB/s
BenchmarkHash1K-4                  48517             25143 ns/op          40.73 MB/s
BenchmarkHash8K-4                   6669            177801 ns/op          46.07 MB/s
PASS
ok      crypto/sha256   3.802s

It would also be possible to let restic provide a md5 hash (see restic/restic#3178) which is about 4 times faster to verify on my raspberrypi.

@wojas wojas mentioned this pull request Jan 4, 2021
@wojas
Copy link
Contributor

wojas commented Jan 4, 2021

I'd like to handle atomic uploads in a separate PR. The idea for this PR was to ensure that successfully uploaded files are guaranteed to be intact.

Great!

It would also be possible to let restic provide a md5 hash (see restic/restic#3178) which is about 4 times faster to verify on my raspberrypi.

That would add an extra overhead on the client side without actually providing the integrity benefits that checking the object sha256 would give.

It would detect errors introduced during the network transfer, but these are I believe fairly unlikely and protected against by checksums on the datas link layer (e.g. CRC32 in Ethernet). TLS may provide extra protection here, but perhaps you could still end up with garbage as long as the right number of bytes were written. But I am not sure about this part.

@wojas
Copy link
Contributor

wojas commented Jan 4, 2021

BTW, if we want to add some hash to protect against data link layer errors, it may make more sense to pick a fast modern non-crypto hash like FNV instead of something like MD5, which is neither fast nor secure.

@MichaelEischer
Copy link
Member Author

BTW, if we want to add some hash to protect against data link layer errors, it may make more sense to pick a fast modern non-crypto hash like FNV instead of something like MD5, which is neither fast nor secure.

The idea behind restic/restic#3178 is to verify the integrity from the time the pack file is assembled (at which point also the sha256/(md5) hashes are calculated) all the way until it is received at the server-side. That would detect all sorts of bit flips while uploading a pack file. A non-crypto hash would thus actually be enough.

On the other hand using the sha256 hash would make it trivial to avoid all sorts of (possible) problems with mismatches between hashes etc. Could you test how slow the sha256 hash is on the NAS you've mentioned?

It would detect errors introduced during the network transfer, but these are I believe fairly unlikely and protected against by checksums on the datas link layer (e.g. CRC32 in Ethernet). TLS may provide extra protection here, but perhaps you could still end up with garbage as long as the right number of bytes were written. But I am not sure about this part.

CRC32 will only filter out most, but by far not all damaged ethernet packets. As the checksum calculation is usually offloaded to the network card/adapter it won't detect bugs, like garbage data in otherwise valid network packets after waking up from standby (I have one network adapter which exhibits that exact problem.). TLS will detect such transmission errors as the MAC for the packets will be invalid. The MAC there is also long enough to reliably filter out corrupted network packets even when transferring TBs of data.

On a large enough scale, we'll also see bitflips in the CPU / memory (just take a look at some of the prune failures in the restic bugtracker). Therefore the idea behind this PR is to lay the foundation for end-to-end checksums. The most important part is to ensure that the upload works correctly as we'd otherwise store broken data. Therefore the goal is ensure the integrity of the pack files from as early as possible (effectively the moment it was assembled) up to the moment when it is stored. That also allows distinguishing whether data corruption happened on the client or the rest-server side.

@wojas
Copy link
Contributor

wojas commented Jan 6, 2021

On a large enough scale, we'll also see bitflips in the CPU / memory (just take a look at some of the prune failures in the restic bugtracker). Therefore the idea behind this PR is to lay the foundation for end-to-end checksums.

OK, this makes sense. Handling bitflips in CPU/memory is a hard one though, there is always some part of the process that you cannot check.

Could you test how slow the sha256 hash is on the NAS you've mentioned?

I retired this NAS a few weeks ago and can no longer test, but a few datapoints:

  • The maximum rate I got out of rest-server was about 20 MB/s, where it was CPU bound on moving bits around without doing anything interesting with them.
  • When I tried running restic on this NAS, it took more than 15 seconds just to derive the restic key and open the repo.
  • I had to reconfigure sshd to only use old insecure algorithms, or my ssh session (scp/sftp/rsync) would be CPU limited to 5-10 MB/s.

So likely very very slow.

@wojas
Copy link
Contributor

wojas commented Jan 6, 2021

In the end I think it's also fine for rest-server to just not offer verification on this kind of hardware. Bitflips in the CPU, non-ECC memory and a filesystem without data checksums and scrubbing are probably more likely to introduce errors.

And since these kind of devices tend to run locally, it is doable to run restic verify once in a while.

@MichaelEischer MichaelEischer force-pushed the verify-upload branch 2 times, most recently from d121f92 to 9a14875 Compare January 6, 2021 15:20
@MichaelEischer
Copy link
Member Author

I had to reconfigure sshd to only use old insecure algorithms, or my ssh session (scp/sftp/rsync) would be CPU limited to 5-10 MB/s.

That looks like ~20MB/s SHA256 could be possible, such that the end result would also be 5-10MB/s.

In the end I think it's also fine for rest-server to just not offer verification on this kind of hardware. Bitflips in the CPU, non-ECC memory and a filesystem without data checksums and scrubbing are probably more likely to introduce errors.

It's definitely a much better user experience when at least some of these bitflips are already detected during a backup such that restic can retry saving data in case of a temporary bitflip (during transmission) or fail the backup run in case of a permanent one (the pack file content was already corrupted at the client). The nice part about having that verification is that it's much easier to pinpoint the component which is at fault: if the packfile content matches its hash but is invalid anyways, then the client is to blame. If the packfile content doesn't match its hash, then the error happened at the rest-server, as it only stores valid packfiles. (Ignoring that the save operation is currently not atomic).

But for low-power hardware it's probably better to offer a way to skip the verification. I've added an option "--skip-upload-verification".

@buschjost
Copy link
Contributor

I really like this feature.
Regarding the performance considerations: I would prefer to use the optimised SHA256 library built by Minio as it was recently introduced in Restic as well: restic/restic#2709 . What do you think about this?

I could also create a separate PR for this change of library once this is merged.

@MichaelEischer
Copy link
Member Author

@buschjost I think a separate PR would be the way to go. It will probably help already fast servers the most. The biggest speed improvement I've seen is due to support for the x86 SHA extension (2.3 GB/s vs. 0.6GB/s with the standard library). For hosts without special hardware support, the performance advantage seems to be shrinking with recent go versions.

@MichaelEischer
Copy link
Member Author

After talking to @rawtaz I've renamed the option to --no-verify-upload to stay consistent with the option names in restic.

cmd/rest-server/main.go Outdated Show resolved Hide resolved
@JsBergbau
Copy link
Contributor

This pull request looks like it has been finished so far. Whats preventing it from beeing merged?

@MichaelEischer
Copy link
Member Author

The PR still needs a review and the plan was to merge it after #112 lands.

Restic uses the sha256 hash to calculate filenames based on the file
content. Check on the rest-server side that the uploaded file is intact
and reject it otherwise.
@fd0
Copy link
Member

fd0 commented Aug 9, 2021

I've ported the code on the master branch (with #112 included), please have a look!

From my side, we still need to decide whether we will just use the file name (which does not work for config files) or if we maybe add a header with the sha256 sum (which won't work for older versions of restic). Or both?

@wojas
Copy link
Contributor

wojas commented Aug 9, 2021

I've ported the code on the master branch (with #112 included), please have a look!

Look good to me!

Note that my concerns about this still being written in-place are still true, but this can be solved in a different PR (added issue #152).

If --append-only is used, it could even result in a broken repo where we cannot reupload the partial file.

From my side, we still need to decide whether we will just use the file name (which does not work for config files) or if we maybe add a header with the sha256 sum (which won't work for older versions of restic). Or both?

For the data files, it should really use the filename, or you could end up with a repo where the contents do not match the filename.

Passing an sha256sum for the config file would be nice to have, but since this is only written once I think it's fine not to check it. There is a small chance of a repo becoming unusable due to a partial config write in combination with --append-only.

@wojas wojas mentioned this pull request Aug 9, 2021
@fd0
Copy link
Member

fd0 commented Aug 9, 2021

If --append-only is used, it could even result in a broken repo where we cannot reupload the partial file.

Huh? How can that happen?

If the hash is checked and there's an error, the file is removed directly afterwards (independent of whether or not append-only mode is enabled), here

	[...]
		// calculate hash for current request
		hasher := sha256.New()
		written, err = io.Copy(outFile, io.TeeReader(r.Body, hasher))

		// reject if file content doesn't match file name
		if err == nil && hex.EncodeToString(hasher.Sum(nil)) != objectID {
			err = fmt.Errorf("file content does not match hash")
		}
	}

	if err != nil {
		_ = tf.Close()
		_ = os.Remove(path)
		[...]
	}

@wojas
Copy link
Contributor

wojas commented Aug 9, 2021

Huh? How can that happen?

If the hash is checked and there's an error, the file is removed directly afterwards (independent of whether or not append-only mode is enabled), here

It can happen if the server crashes in the middle of a write.

@fd0
Copy link
Member

fd0 commented Aug 9, 2021

Ok, good point. We'll address it in the other PR.

@MichaelEischer
Copy link
Member Author

Passing an sha256sum for the config file would be nice to have, but since this is only written once I think it's fine not to check it. There is a small chance of a repo becoming unusable due to a partial config write in combination with --append-only.

If a config file is only written partially, then a user won't be able to upload any data to the repository. It might not be the optimal user experience, but it shouldn't be much of a problem either.

An additional header would either be used exclusively for the config file or add a few dozen bytes of overhead to each request. In total, I think we should just stick with the filename.

@fd0
Copy link
Member

fd0 commented Aug 11, 2021

Fine with me, thanks for looking at the code again @MichaelEischer!

@fd0 fd0 merged commit 5e71f61 into restic:master Aug 11, 2021
@MichaelEischer MichaelEischer deleted the verify-upload branch August 11, 2021 19:01
@buschjost buschjost mentioned this pull request Aug 27, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Verify files during upload
6 participants