-
Notifications
You must be signed in to change notification settings - Fork 6.7k
[Draft] Introduce Checksum handoff APIs and Control Options #7409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
62ef70f to
a4e989f
Compare
mrambacher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of the implementations of the new APIs default to using the old API, which means 1) the requested new functionality is not there and is silently ignored and 2) the new API might not really be needed.
I do not understand exactly what the change is hoping to accomplish and do not see any tests that validate it is doing the right thing, or that it is handling any of the error cases.
I am unsure where the resulting checksum is used.
env/composite_env_wrapper.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add the new API rather than expand the IOOptions to have a DataVerificationInfo element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IOOptions suppose to bring the hints down to the FS, might not be a appropriate place to bring the verification information.
include/rocksdb/options.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo "storage"
options/options_settable_test.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you adding an option to the DBOptions that is not settable as an option? You should be able to set this using the options code.
db/builder.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a method somewhere? There is almost the same code block in many locations
file/writable_file_writer.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bad as it silently drops the fact that the user requested data validation. Why can't this be done at a higher level during options validation?
include/rocksdb/env.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the checksum method name part of the Env? Shouldn't it be coming from the Options?
include/rocksdb/options.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use FileType? Wouldn't this end up there anyway (but with better-but-different names)?
include/rocksdb/options.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option doesn't make sense to me as written. Like Mark pointed out elsewhere, this makes it easy for the user not to get handoff checksums as expected because there are a minimum of two things to turn on to get it. Why?
At first I thought this didn't make sense at all as a DB option, but it seems plausible that DBs would have different integrity requirements for different file types. For example, some applications might only rely on WAL for fast recovery, but have an alternate method of recovery in case of corrupt WAL.
But shouldn't we have a sensible set of defaults and let Env decide whether to enable checksum handoff? If a DB option is always needed to enable it, that seems like a bad (distant, unnecessary) and dangerous (not getting what you think you're getting) abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger My consideration is, FileSystem gives the capability to support checksum handoff. So I plan to introduce a API to FileSystem to return the ChecksumHandoffFuncName (or similar). If FileSystem does not support it, or does not want to enable checksum handoff in its layer, ChecksumHandoffFuncName will return an empty string. Also, if the checksum function is different from the one in the RocksDB, we do not need to even calculate the checksum and send it. So, that part is like the mechanism.
On the other side, I think we should give user/applications the option to decide, even FileSystem support it, but user/application do not want to use it, or partial use it. So I introduce the checksum_handoff_file_types to achieve the control. Otherwise, how can we deliver such flexibility?
pdillinger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, since nothing but wrappers call *AppendWithVerify, what would be accomplished by picking this into 6.13?
HISTORY.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this way, the customized Env or FileSystem is able to verify the correctness of data being written to the storage on time.
Except this doesn't actually work yet, right? So why is this stated in HISTORY.md?
include/rocksdb/env.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to define these new interfaces in WritableFile? Since we want to deprecate the storage part of Env eventually, why not just define them in file_system.h and redirect in LegacyWritableFileWrapper. Users like Zippy can work with the FileSystem directly or call env->GetFileSystem()->AppendWithVerify().
Hi Mark, for the review. This PR is only the API change of the checksum handoff. The motivation of this project is to pass the data verification information (e.g., checksum for now. Maybe offset in the file and others) together with the data from RocksDB to the storage layer. In this way, a customized env/fs which has the capability to verify the data can use these information to check the correctness of the data and catch the potential corruptions. This is only the first PR and the code actually does not execute. So I did not include the testing case. The checksum will be generated in WritableFileWriter for the Slice data and be verified in the Env/FS WritableFile implementation. |
a4e989f to
9cd8af2
Compare
| IOSTATS_CPU_TIMER_GUARD(cpu_write_nanos, env_); | ||
| s = writable_file_->Append(Slice(src, allowed), IOOptions(), nullptr); | ||
| if (perform_data_verification_) { | ||
| std::string checksum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse it for the whole file, rather than recreating a string for every write?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
| void WritableFileWriter::DataChecksumCalculation(const char* data, size_t size, | ||
| std::string* checksum) { | ||
| uint32_t v_crc32c = crc32c::Extend(0, data, size); | ||
| PutFixed32(checksum, EndianSwapValue(v_crc32c)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why endian swap?
| s = writable_file_->PositionedAppend(Slice(src, size), write_offset, | ||
| IOOptions(), nullptr); | ||
| if (perform_data_verification_) { | ||
| std::string checksum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Can we reuse this string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
Add new AppendWithVerify and PositionedAppendWithVerify APIs to Env and FileSystem to bring the data verification information (data checksum information) from upper layer (e.g., WritableFileWriter) to the storage layer. In this way, the customized Env or FileSystem is able to verify the correctness of data being written to the storage on time. Add checksum_handoff_file_types to DBOptions. User can use this option to control which file types (e.g., SST, WAL, Manifest) should use the new AppendWithVerify and PositionedAppendWithVerfiy APIs to handoff the verification information.
test plan: make asan_check/ make check