New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BackupEngine supports custom file checksums #7085
Conversation
7ed0009
to
1c78a5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
1c78a5a
to
de61dad
Compare
@gg814 has updated the pull request. Re-import the pull request |
I have thought some more about this approach. Schema changes are scary, so what about this approach: Setting backup_checksum_gen_factory (maybe rename to file_checksum_gen_factory) in BackupableDBOptions means that we essentially treat the backed-up DB manifest as an extension of the backup manifest, and use both crc32(c) checksums and the custom checksums to validate backups.
This does not enable performance improvements like atomic copy / hard link for creating backups, but that might be rarely used anyway so that we can verify db manifest checksums on copy. Thoughts? |
Sounds good. In your proposed approach, if I understand correctly, you advocate users to adopt the same custom checksum function used in DB for backup engine. In this way, we have a meaningful extra guard of data integrity (crc32c and custom checksum method in DB). To extend this a little bit, we could always compute the default crc32c checksum for backup. Moreover, if there is custom checksum function (different from the one used in DB), we would also use it to compute another checksum. Both checksums (crc32c and backup engine custom checksum) will be used for validation. This also provides an extra protection for data corruption. But an obvious drawback is the additional computation costs of the custom checksum. |
If we are actually copying files, rather than taking advantage of an atomic copy or hard link, I think CPU is relatively cheap here compared to I/O. Backups will often be on a remote filesystem, for example. While the independent crc32c checksum does in theory improve data integrity, in my mind the primary motivation is forward compatibility (old RocksDB version restoring backups taken with a newer version). Besides, the weakest link has the biggest impact on overall data integrity. (Related to the cache-friendly Bloom filter, https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#the-math)
The problem is that (I believe) we would need a backup schema change (break forward compatibility) to store checksum information separate from what's already in the backup manifest schema and from the DB manifest file being backed up. Even if we added the information in an extra file, IIRC older version would see that as a "garbage" file and delete it. That's actually not terrible as it's like an automatic "downgrade", but if we go that far, we should probably do the schema change to reap the benefits. I expect most users will want to use the same file checksum for backups as for DB, so I wouldn't expect it to be a big limitation. If Zhichao or anyone believes differently, speak up. :) |
I think we could store the custom checksum information together with the default crc32c information in the backup meta/manifest. Currently, it is stored as The issue for this that I can think of now is that for forward compatibility, the file checksum info string above can be rather long with many custom checksum names and values that are no longer used. |
Extra info on the line would be reported as corruption by an older version: https://github.com/facebook/rocksdb/blob/master/utilities/backupable/backupable_db.cc#L2135 But actually, the error message for using different/unknown checksum is better than I remember: https://github.com/facebook/rocksdb/blob/master/utilities/backupable/backupable_db.cc#L2144 That makes me more comfortable with this change as it is (broken forward compatibility on custom checksum; use custom checksum instead of crc32c) because attempting to restore backup with an older version will at least give a somewhat useful message "Unknown checksum type ..." I could foresee someone needing a tool to "downgrade" a backup (manifest) though. Since ldb has backup and restore options already, this seems like a logical place. And actually, can ldb even operate on backups using custom checksum yet? (without a custom build of ldb, would need dynamic linking, right?) CC @mrambacher |
If one opens a new version meta file with an old version backup engine then yes. With a new version backup engine, it should be ok. In this PR: |
Actually, I think that the "emergency" ability to restore from backup with ldb, with no custom code in ldb, is important. It doesn't scare me too much if this restore skips checksum verification, because checksums should be verified at read time in the DB, but I suspect that users (esp users inside FB) will not want to be cornered into skipping checksum verification in any step. (I think they'll be OK with skipping a second layer of checksum verification in a step--we don't even have to make that obvious to the user--but a copying step with no checksum verification is scary for users.) So that is more weight in favor of keeping the existing schema (backup manifest always includes crc32c checksums only) and extend the checking/verification when a custom checksum is configured in BackupEngine and DB (checksums in DB manifest). |
@gg814 has updated the pull request. You must reimport the pull request before landing. |
03152e0 uses the backuped db MANIFEST as an extension of the backup meta. Specifically, this means when creating, verifying, or restoring backups, in addition to its default crc32c, backup engine will try to use the custom checksums stored in the corresponding db MANIFEST, and it will use them if the corresponding checksum functions are available in the custom checksum factory passed in to backup engine. This additional checksum computation does not incur additional I/O but additional CPU usage. If custom checksums are available, the checksum value in the table filenames will be the custom checksum value. The checksum values will be changed to hex encoded later as in #7110. Something to be decided:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to be decided:
if there is custom checksum function (different from the one used in DB), we would also use it to compute another checksum.
I'd defer this flexibility until we have a use case that needs different custom checksum functions for backup vs. DB. I'm still hopeful that users can pick a single checksum function for all their file transfer data integrity needs.
- if
verify_with_checksum
istrue
,VerifyBackup
only compares crc32c but not the custom checksums? (and add another option for verifying with custom checksums?)
It seems fair to change the meaning of verify_with_checksum=true
to also verify custom checksum when BackupableDBOptions::backup_checksum_gen_factory
is set .
03152e0
to
d3b2731
Compare
@gg814 has updated the pull request. You must reimport the pull request before landing. |
d3b2731
to
4878a65
Compare
@gg814 has updated the pull request. You must reimport the pull request before landing. |
Rebase and some small changes (use hex-encoded checksums, update comments, refactor helper functions, etc). Should be ready for another look :) |
Thanks for the updates and dressing the comments, I will take a look today. |
@@ -181,6 +197,8 @@ class BackupEngineImpl : public BackupEngine { | |||
const std::string filename; | |||
const uint64_t size; | |||
const std::string checksum_hex; | |||
const std::string custom_checksum_hex; | |||
const std::string checksum_func_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we need to use custom_checksum_func_name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checksum_func_name
always gives priority to custom checksum function names. That is, when a custom checksum function is used, checksum_func_name
is always the name of the custom checksum function. Otherwise, it is the name of the backup default. Since there is only one backup default and if it is used, checksum_func
will be null, we do not need another variable for the name here.
A related question might be: do we need checksum_func_name
?
I think could remove it (although I prefer keeping it) because it mainly serves as an in-memory sanity-check (or for the purpose of debugging) since we do not write checksum_func_name
to the backup meta file. If the checking of checksum_func_name
failed, something seriously wrong should have happened.
This also reminds me another question I had in mind. Currently, when LoadFromFile()
is called, backup engine only reads the backup meta files as before and it does not read the backed up db manifests. (Because this will make the initialization of this version's backup engine slower, which might not be a good idea.) Backup engine will read the corresponding db manifest for a specific backup id when it is needed for verification/restoration.
If we choose to read the backed up db manifests when opening the backup engine, then the usage of checksum_func_name
will be more than in-memory sanity-check. It may help detect corruption of shared table files that have been backed up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, better to keep the checksum_func_name
for now. Not a high overhead.
2f43c03
to
95eed0c
Compare
@gg814 has updated the pull request. You must reimport the pull request before landing. |
95eed0c
to
9cc6fe7
Compare
@gg814 has updated the pull request. You must reimport the pull request before landing. |
include/rocksdb/file_checksum.h
Outdated
|
||
struct FileChecksumGenContext { | ||
std::string file_name; | ||
// The name of the requested checksum generator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either here or at CreateFileChecksumGenerator, we have to say something about expectations around
(a) backward compatibility, including
(b) if requested_checksum_func_name is empty
(c) if requested_checksum_func_name is not a recognized function that can be provided. Should requested_checksum_func_name return null unique_ptr?
cc @zhichao-cao
9cc6fe7
to
1f23a98
Compare
@gg814 has updated the pull request. You must reimport the pull request before landing. |
include/rocksdb/file_checksum.h
Outdated
|
||
struct FileChecksumGenContext { | ||
std::string file_name; | ||
// The name of the requested checksum generator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still TODO
… cannot find the custom checksum generator in its factory
@gg814 has updated the pull request. You must reimport the pull request before landing. |
@gg814 has updated the pull request. You must reimport the pull request before landing. |
@gg814 has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good now. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gg814 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups. Tests are added. Pull Request resolved: facebook#7085 Test Plan: Passed make check Reviewed By: pdillinger Differential Revision: D22390756 Pulled By: gg814 fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f
Summary: A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups. Tests are added. Pull Request resolved: facebook/rocksdb#7085 Test Plan: Passed make check Reviewed By: pdillinger Differential Revision: D22390756 Pulled By: gg814 fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f Signed-off-by: Changlong Chen <levisonchen@live.cn>
Summary: A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups. Tests are added. Pull Request resolved: facebook/rocksdb#7085 Test Plan: Passed make check Reviewed By: pdillinger Differential Revision: D22390756 Pulled By: gg814 fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f Signed-off-by: Changlong Chen <levisonchen@live.cn>
Summary: A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups. Tests are added. Pull Request resolved: facebook/rocksdb#7085 Test Plan: Passed make check Reviewed By: pdillinger Differential Revision: D22390756 Pulled By: gg814 fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f Signed-off-by: Changlong Chen <levisonchen@live.cn>
Summary: A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups. Tests are added. Pull Request resolved: facebook/rocksdb#7085 Test Plan: Passed make check Reviewed By: pdillinger Differential Revision: D22390756 Pulled By: gg814 fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f Signed-off-by: Changlong Chen <levisonchen@live.cn>
Summary:
A new option
std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory
is added toBackupableDBOptions
. This allows custom checksum functions to be used for creating, verifying, or restoring backups.Tests are added.
Test plan:
Passed make check