Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BackupEngine supports custom file checksums #7085

Closed
wants to merge 7 commits into from

Conversation

gg814
Copy link
Contributor

@gg814 gg814 commented Jul 6, 2020

Summary:
A new option std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory is added to BackupableDBOptions. This allows custom checksum functions to be used for creating, verifying, or restoring backups.

Tests are added.

Test plan:
Passed make check

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. Re-import the pull request

@gg814 gg814 requested a review from zhichao-cao July 6, 2020 18:57
@pdillinger
Copy link
Contributor

I have thought some more about this approach. Schema changes are scary, so what about this approach:

Setting backup_checksum_gen_factory (maybe rename to file_checksum_gen_factory) in BackupableDBOptions means that we essentially treat the backed-up DB manifest as an extension of the backup manifest, and use both crc32(c) checksums and the custom checksums to validate backups.

Event          | crc32c action | custom checksum action
-------------------------------------------------------
Create backup  | compute       | validate
Verify backup  | validate      | validate
Restore backup | validate      | validate

This does not enable performance improvements like atomic copy / hard link for creating backups, but that might be rarely used anyway so that we can verify db manifest checksums on copy.

Thoughts?

@gg814
Copy link
Contributor Author

gg814 commented Jul 8, 2020

Sounds good. In your proposed approach, if I understand correctly, you advocate users to adopt the same custom checksum function used in DB for backup engine. In this way, we have a meaningful extra guard of data integrity (crc32c and custom checksum method in DB).

To extend this a little bit, we could always compute the default crc32c checksum for backup. Moreover, if there is custom checksum function (different from the one used in DB), we would also use it to compute another checksum. Both checksums (crc32c and backup engine custom checksum) will be used for validation. This also provides an extra protection for data corruption. But an obvious drawback is the additional computation costs of the custom checksum.

@pdillinger
Copy link
Contributor

If we are actually copying files, rather than taking advantage of an atomic copy or hard link, I think CPU is relatively cheap here compared to I/O. Backups will often be on a remote filesystem, for example.

While the independent crc32c checksum does in theory improve data integrity, in my mind the primary motivation is forward compatibility (old RocksDB version restoring backups taken with a newer version). Besides, the weakest link has the biggest impact on overall data integrity. (Related to the cache-friendly Bloom filter, https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#the-math)

Moreover, if there is custom checksum function (different from the one used in DB), we would also use it to compute another checksum.

The problem is that (I believe) we would need a backup schema change (break forward compatibility) to store checksum information separate from what's already in the backup manifest schema and from the DB manifest file being backed up. Even if we added the information in an extra file, IIRC older version would see that as a "garbage" file and delete it. That's actually not terrible as it's like an automatic "downgrade", but if we go that far, we should probably do the schema change to reap the benefits.

I expect most users will want to use the same file checksum for backups as for DB, so I wouldn't expect it to be a big limitation. If Zhichao or anyone believes differently, speak up. :)

@gg814
Copy link
Contributor Author

gg814 commented Jul 8, 2020

The problem is that (I believe) we would need a backup schema change (break forward compatibility) to store checksum information separate from what's already in the backup manifest schema and from the DB manifest file being backed up.

I think we could store the custom checksum information together with the default crc32c information in the backup meta/manifest.

Currently, it is stored as
<file1> <crc32(literal string)> <crc32c_value>
we could extend it to
<file1> <crc32(literal string)> <crc32c_value> <custom(literal string)> <checksum_name>:<checksum_value>

The issue for this that I can think of now is that for forward compatibility, the file checksum info string above can be rather long with many custom checksum names and values that are no longer used.

@pdillinger
Copy link
Contributor

Extra info on the line would be reported as corruption by an older version: https://github.com/facebook/rocksdb/blob/master/utilities/backupable/backupable_db.cc#L2135

But actually, the error message for using different/unknown checksum is better than I remember: https://github.com/facebook/rocksdb/blob/master/utilities/backupable/backupable_db.cc#L2144

That makes me more comfortable with this change as it is (broken forward compatibility on custom checksum; use custom checksum instead of crc32c) because attempting to restore backup with an older version will at least give a somewhat useful message "Unknown checksum type ..."

I could foresee someone needing a tool to "downgrade" a backup (manifest) though. Since ldb has backup and restore options already, this seems like a logical place. And actually, can ldb even operate on backups using custom checksum yet? (without a custom build of ldb, would need dynamic linking, right?) CC @mrambacher

@gg814
Copy link
Contributor Author

gg814 commented Jul 8, 2020

Extra info on the line would be reported as corruption by an older version: https://github.com/facebook/rocksdb/blob/master/utilities/backupable/backupable_db.cc#L2135

If one opens a new version meta file with an old version backup engine then yes. With a new version backup engine, it should be ok. In this PR:
https://github.com/gg814/rocksdb/blob/backup_custom_checksum/utilities/backupable/backupable_db.cc#L2279

HISTORY.md Outdated Show resolved Hide resolved
@pdillinger
Copy link
Contributor

That makes me more comfortable with this change as it is (broken forward compatibility on custom checksum; use custom checksum instead of crc32c)

Actually, I think that the "emergency" ability to restore from backup with ldb, with no custom code in ldb, is important. It doesn't scare me too much if this restore skips checksum verification, because checksums should be verified at read time in the DB, but I suspect that users (esp users inside FB) will not want to be cornered into skipping checksum verification in any step. (I think they'll be OK with skipping a second layer of checksum verification in a step--we don't even have to make that obvious to the user--but a copying step with no checksum verification is scary for users.)

So that is more weight in favor of keeping the existing schema (backup manifest always includes crc32c checksums only) and extend the checking/verification when a custom checksum is configured in BackupEngine and DB (checksums in DB manifest).

@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

@gg814
Copy link
Contributor Author

gg814 commented Jul 20, 2020

03152e0 uses the backuped db MANIFEST as an extension of the backup meta. Specifically, this means when creating, verifying, or restoring backups, in addition to its default crc32c, backup engine will try to use the custom checksums stored in the corresponding db MANIFEST, and it will use them if the corresponding checksum functions are available in the custom checksum factory passed in to backup engine.

This additional checksum computation does not incur additional I/O but additional CPU usage.

If custom checksums are available, the checksum value in the table filenames will be the custom checksum value.

The checksum values will be changed to hex encoded later as in #7110.

Something to be decided:

  • if there is custom checksum function (different from the one used in DB), we would also use it to compute another checksum.

  • if verify_with_checksum is true, VerifyBackup only compares crc32c but not the custom checksums? (and add another option for verifying with custom checksums?)

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to be decided:

  • if there is custom checksum function (different from the one used in DB), we would also use it to compute another checksum.

I'd defer this flexibility until we have a use case that needs different custom checksum functions for backup vs. DB. I'm still hopeful that users can pick a single checksum function for all their file transfer data integrity needs.

  • if verify_with_checksum is true, VerifyBackup only compares crc32c but not the custom checksums? (and add another option for verifying with custom checksums?)

It seems fair to change the meaning of verify_with_checksum=true to also verify custom checksum when BackupableDBOptions::backup_checksum_gen_factory is set .

HISTORY.md Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

@gg814
Copy link
Contributor Author

gg814 commented Jul 22, 2020

Rebase and some small changes (use hex-encoded checksums, update comments, refactor helper functions, etc). Should be ready for another look :)

@zhichao-cao
Copy link
Contributor

Rebase and some small changes (use hex-encoded checksums, update comments, refactor helper functions, etc). Should be ready for another look :)

Thanks for the updates and dressing the comments, I will take a look today.

@@ -181,6 +197,8 @@ class BackupEngineImpl : public BackupEngine {
const std::string filename;
const uint64_t size;
const std::string checksum_hex;
const std::string custom_checksum_hex;
const std::string checksum_func_name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we need to use custom_checksum_func_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checksum_func_name always gives priority to custom checksum function names. That is, when a custom checksum function is used, checksum_func_name is always the name of the custom checksum function. Otherwise, it is the name of the backup default. Since there is only one backup default and if it is used, checksum_func will be null, we do not need another variable for the name here.

A related question might be: do we need checksum_func_name?
I think could remove it (although I prefer keeping it) because it mainly serves as an in-memory sanity-check (or for the purpose of debugging) since we do not write checksum_func_name to the backup meta file. If the checking of checksum_func_name failed, something seriously wrong should have happened.

This also reminds me another question I had in mind. Currently, when LoadFromFile() is called, backup engine only reads the backup meta files as before and it does not read the backed up db manifests. (Because this will make the initialization of this version's backup engine slower, which might not be a good idea.) Backup engine will read the corresponding db manifest for a specific backup id when it is needed for verification/restoration.

If we choose to read the backed up db manifests when opening the backup engine, then the usage of checksum_func_name will be more than in-memory sanity-check. It may help detect corruption of shared table files that have been backed up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, better to keep the checksum_func_name for now. Not a high overhead.

@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

utilities/backupable/backupable_db_test.cc Outdated Show resolved Hide resolved
utilities/backupable/backupable_db_test.cc Show resolved Hide resolved
include/rocksdb/file_checksum.h Outdated Show resolved Hide resolved

struct FileChecksumGenContext {
std::string file_name;
// The name of the requested checksum generator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either here or at CreateFileChecksumGenerator, we have to say something about expectations around
(a) backward compatibility, including
(b) if requested_checksum_func_name is empty
(c) if requested_checksum_func_name is not a recognized function that can be provided. Should requested_checksum_func_name return null unique_ptr?

cc @zhichao-cao

@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

utilities/backupable/backupable_db_test.cc Show resolved Hide resolved

struct FileChecksumGenContext {
std::string file_name;
// The name of the requested checksum generator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still TODO

… cannot find the custom checksum generator in its factory
@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gg814 has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good now. :)

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@gg814 merged this pull request in b578ca2.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups.

Tests are added.

Pull Request resolved: facebook#7085

Test Plan: Passed make check

Reviewed By: pdillinger

Differential Revision: D22390756

Pulled By: gg814

fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f
levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 18, 2021
Summary:
A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups.

Tests are added.

Pull Request resolved: facebook/rocksdb#7085

Test Plan: Passed make check

Reviewed By: pdillinger

Differential Revision: D22390756

Pulled By: gg814

fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f
Signed-off-by: Changlong Chen <levisonchen@live.cn>
levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 22, 2021
Summary:
A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups.

Tests are added.

Pull Request resolved: facebook/rocksdb#7085

Test Plan: Passed make check

Reviewed By: pdillinger

Differential Revision: D22390756

Pulled By: gg814

fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f
Signed-off-by: Changlong Chen <levisonchen@live.cn>
mm304321141 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 23, 2021
Summary:
A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups.

Tests are added.

Pull Request resolved: facebook/rocksdb#7085

Test Plan: Passed make check

Reviewed By: pdillinger

Differential Revision: D22390756

Pulled By: gg814

fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f
Signed-off-by: Changlong Chen <levisonchen@live.cn>
mm304321141 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 23, 2021
Summary:
A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups.

Tests are added.

Pull Request resolved: facebook/rocksdb#7085

Test Plan: Passed make check

Reviewed By: pdillinger

Differential Revision: D22390756

Pulled By: gg814

fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f
Signed-off-by: Changlong Chen <levisonchen@live.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants