Use FileChecksumGenFactory for SST file checksum #6600
We need to update HISTORY.md for public interface change.
// Return a processed value of the checksum for store in somewhere
virtual std::string ProcessChecksum(const std::string& checksum) = 0;
// Get the checksum
virtual std::string GetChecksum() = 0;

// Returns a name that identifies the current file checksum function.
virtual const char* Name() const = 0;
Do we need this? Or is it enough to only have it in FileChecksumGenFactory?
It might be clearer to use different names for FileChecksumGenerator and the factory. Users can decide whether to use the same name or not, I think.
Is it used in code for now? Where do you plan to use this Name()?
Each time an SST file is generated, the table builder calls the generator to get the checksum generator name and the checksum. The name of the FileChecksumGenFactory is not used yet.
util/file_checksum_helper.h
Outdated
assert(data != nullptr);
return Uint32ToString(crc32c::Value(data, n));
if (is_inintilized_ == false) {
  checksum_ = crc32c::Value(data, n);
Here is the implementation of crc32c::Value():
inline uint32_t Value(const char* data, size_t n) {
  return Extend(0, data, n);
}
I don't think we need this if statement or is_inintilized_.
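To illustrate the point: because Value() is just Extend() starting from 0, an accumulator whose checksum starts at 0 can call Extend() on every chunk and needs no "first call" branch. The sketch below is not RocksDB's crc32c code. It is a slow bitwise CRC-32C written only for illustration, and the names in the sketch namespace (including Crc32cAccumulator) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// Illustration only; a real implementation would use tables or hardware.
inline uint32_t Extend(uint32_t crc, const char* data, size_t n) {
  crc ^= 0xFFFFFFFFu;  // undo the final XOR of the previous call
  for (size_t i = 0; i < n; ++i) {
    crc ^= static_cast<unsigned char>(data[i]);
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Same shape as the Value() quoted in the comment above.
inline uint32_t Value(const char* data, size_t n) { return Extend(0, data, n); }

// Hypothetical accumulator: starting at 0 makes the first Update()
// equivalent to Value(), so no initialization flag is required.
class Crc32cAccumulator {
 public:
  void Update(const char* data, size_t n) {
    checksum_ = Extend(checksum_, data, n);
  }
  uint32_t value() const { return checksum_; }

 private:
  uint32_t checksum_ = 0;
};

}  // namespace sketch
```

Feeding the data in two chunks through Update() yields the same result as one Value() call over the whole buffer, which is why the flag is redundant.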
@siying Thanks for pointing that out. I will change accordingly.
util/file_checksum_helper.h
Outdated
private:
  uint32_t checksum_;
  bool is_inintilized_;
  std::string file_name_;
What is it used for?
@siying Will remove.
Sure.

  // Returns a name that identifies the current file checksum function.
  virtual const char* Name() const = 0;
};

// Create the FileChecksumGenerator object for each SST file.
class FileChecksumGenFactory {
 public:
Nit: explicitly write the default constructor.
if (checksum_generator_ != nullptr) {
  return checksum_generator_->GetChecksum();
} else {
  return kUnknownFileChecksum;
I guess it's not a part of the PR, but does it make more sense to call it kNoFileChecksum? Similarly for kUnknownFileChecksumFuncName.
@siying So change to kNoFileChecksum and kNoFileChecksumGenName?
Yes.
@siying. I will change them in the next PR.
include/rocksdb/file_checksum.h
Outdated
// Return a processed value of the checksum for store in somewhere
virtual std::string ProcessChecksum(const std::string& checksum) = 0;
// Get the checksum
virtual std::string GetChecksum() = 0;
I thought you would add a Finalize() function?
@siying Yes. Just pushed the updates.
Force-pushed: 41179a7 to afa8f0e
file/writable_file_writer.cc
Outdated
@@ -216,9 +216,18 @@ Status WritableFileWriter::Flush() {
  return s;
}

std::string WritableFileWriter::GetFileChecksum() {
  if (checksum_generator_ != nullptr) {
    checksum_generator_->Finalize();
Finalize() will be called each time we call GetFileChecksum(). This does not sound like the intended usage pattern, in which I expect Finalize() to be called only once.
Maybe consider adding a member bool finalized_?
@riversand963 Good suggestion! I had not considered this situation. The PR is updated accordingly.
  uint32_t checksum_value = StringToUint32(checksum);
  return Uint32ToString(crc32c::Mask(checksum_value));
}
void Finalize() override { checksum_str_ = Uint32ToString(checksum_); }
We should avoid/disallow calling Finalize multiple times. Currently it's possible due to the implementation of GetFileChecksum().
Force-pushed: 78413ac to fa33d35
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I don't have any blocking comments for now. I'll defer to @riversand963 to approve.
file/writable_file_writer.cc
Outdated
std::string WritableFileWriter::GetFileChecksum() {
  if (checksum_generator_ != nullptr) {
    if (!checksum_finalized_) {
      checksum_generator_->Finalize();
Only calling Finalize() inside GetFileChecksum() defeats the purpose of Finalize(). The application can do that itself if it wants to. For Finalize() to make sense, we should call it when we close the file. Otherwise, there is no point in having such a function.
@siying I see the logic here. Will move it to Close and update the GetChecksum order in builder.cc.
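A minimal sketch of the ordering agreed on in this thread, assuming a simplified writer: Finalize() runs exactly once, from Close(), and GetFileChecksum() only reads the stored result. Everything in the sketch namespace is a hypothetical stand-in rather than the actual WritableFileWriter code. The toy generator just counts bytes, and returning the "unknown" value before Close() is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Stand-in for the "no checksum available" value used in the diff above.
const std::string kUnknownFileChecksum = "";

// Hypothetical generator that records how often Finalize() is called.
class CountingGenerator {
 public:
  void Update(const char* /*data*/, size_t n) { bytes_ += n; }
  void Finalize() {
    ++finalize_calls_;
    result_ = std::to_string(bytes_);  // toy "checksum": total byte count
  }
  std::string GetChecksum() const { return result_; }
  int finalize_calls() const { return finalize_calls_; }

 private:
  size_t bytes_ = 0;
  int finalize_calls_ = 0;
  std::string result_;
};

// Hypothetical writer: the checksum is finalized when the file is closed.
class FileWriter {
 public:
  explicit FileWriter(CountingGenerator* gen) : gen_(gen) {}

  void Append(const std::string& data) {
    if (gen_ != nullptr && !closed_) gen_->Update(data.data(), data.size());
  }

  void Close() {
    if (closed_) return;  // a second Close() must not finalize again
    closed_ = true;
    if (gen_ != nullptr) gen_->Finalize();
  }

  // Read-only accessor; never calls Finalize().
  std::string GetFileChecksum() const {
    if (gen_ == nullptr || !closed_) return kUnknownFileChecksum;
    return gen_->GetChecksum();
  }

 private:
  CountingGenerator* gen_;
  bool closed_ = false;
};

}  // namespace sketch
```

With this shape a caller such as the table builder has to read the checksum after closing the file, which matches the reordering mentioned for builder.cc.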
@zhichao-cao has updated the pull request. Re-import the pull request
The change looks good to me. Defer to @riversand963 to approve.
LGTM. Thanks @zhichao-cao for working on this.
Thanks for the review and comments! Will change accordingly.
Thanks for the review and comments!
Force-pushed: d58e66e to 6934013
@zhichao-cao merged this pull request in e8d332d.
Summary: In the current implementation, the SST file checksum is calculated by a shared checksum function object, which makes some checksum functions, such as SHA1, hard to apply. In this implementation, each SST file has its own checksum generator object, created by FileChecksumGenFactory. Users need to implement their own FileChecksumGenerator and factory to plug in their checksum calculation method.
Pull Request resolved: facebook/rocksdb#6600
Test Plan: tested with make asan_check
Reviewed By: riversand963
Differential Revision: D20717670
Pulled By: zhichao-cao
fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
Signed-off-by: Changlong Chen <levisonchen@live.cn>
In the current implementation, the SST file checksum is calculated by a shared checksum function object, which makes some checksum functions, such as SHA1, hard to apply. In this implementation, each SST file has its own checksum generator object, created by FileChecksumGenFactory. Users need to implement their own FileChecksumGenerator and factory to plug in their checksum calculation method.
Test plan: tested with make asan_check
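As a rough illustration of the design described above, the sketch below shows a factory handing out one generator per file, so that streaming state (which a hash such as SHA1 needs) is never shared between files. The interfaces are simplified stand-ins for the classes discussed in this PR, not copies of the real headers: the Update() signature, the Create() method, and the helper ChecksumOneFile() are assumptions made for the example, and the byte-sum "checksum" is a toy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

namespace sketch {

// Simplified stand-in for the per-file generator interface.
class FileChecksumGenerator {
 public:
  virtual ~FileChecksumGenerator() = default;
  virtual void Update(const char* data, size_t n) = 0;  // assumed signature
  virtual void Finalize() = 0;
  virtual std::string GetChecksum() const = 0;
  virtual const char* Name() const = 0;
};

// Simplified stand-in for the factory; Create() is a hypothetical name.
class FileChecksumGenFactory {
 public:
  FileChecksumGenFactory() = default;
  virtual ~FileChecksumGenFactory() = default;
  virtual std::unique_ptr<FileChecksumGenerator> Create() = 0;
  virtual const char* Name() const = 0;
};

// Toy generator: sums the bytes it sees. Not a real checksum.
class ByteSumGenerator : public FileChecksumGenerator {
 public:
  void Update(const char* data, size_t n) override {
    for (size_t i = 0; i < n; ++i) sum_ += static_cast<unsigned char>(data[i]);
  }
  void Finalize() override { result_ = std::to_string(sum_); }
  std::string GetChecksum() const override { return result_; }
  const char* Name() const override { return "ByteSumGenerator"; }

 private:
  uint32_t sum_ = 0;
  std::string result_;
};

class ByteSumFactory : public FileChecksumGenFactory {
 public:
  std::unique_ptr<FileChecksumGenerator> Create() override {
    return std::unique_ptr<FileChecksumGenerator>(new ByteSumGenerator());
  }
  const char* Name() const override { return "ByteSumFactory"; }
};

// Hypothetical helper: one fresh generator per file.
inline std::string ChecksumOneFile(FileChecksumGenFactory& factory,
                                   const std::string& contents) {
  std::unique_ptr<FileChecksumGenerator> gen = factory.Create();
  gen->Update(contents.data(), contents.size());
  gen->Finalize();
  return gen->GetChecksum();
}

}  // namespace sketch
```

Because each call to ChecksumOneFile() gets its own generator, the result for one file does not depend on any file processed before it. The generator and the factory also report different names here, in line with the naming discussion earlier in the thread.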