
Handoff checksum Implementation #7523

Closed
wants to merge 7 commits

Conversation

zhichao-cao (Contributor)

In PR #7419, we introduced the new Append and PositionedAppend APIs to WritableFile at the FileSystem layer, which enable RocksDB to pass data verification information (e.g., a checksum of the data) to the lower layer. In this PR, we use the new APIs in WritableFileWriter, so that files created via WritableFileWriter can pass their checksums to the storage layer. To control which file types the checksum handoff applies to, we add checksum_handoff_file_types to DBOptions. Users can use this option to control which file types (currently supported: kLogFile, kTableFile, kDescriptorFile) should use the new Append and PositionedAppend APIs to hand off the verification information.

Test plan: added new unit tests; passes make check / make asan_check.
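For illustration, a minimal sketch of enabling the option from application code. The FileTypeSet::Add call and enum spellings are assumptions based on this description and the later kWalFile rename (#7580); check your version's headers.

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/types.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Assumption: checksum_handoff_file_types is a FileTypeSet populated with
  // the file types that should use the verification-aware Append APIs.
  // kLogFile was later renamed kWalFile in #7580.
  options.checksum_handoff_file_types.Add(rocksdb::FileType::kWalFile);
  options.checksum_handoff_file_types.Add(rocksdb::FileType::kDescriptorFile);

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/handoff_demo", &db);
  assert(s.ok());
  delete db;
  return 0;
}
```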

facebook-github-bot (Contributor)

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zhichao-cao requested review from pdillinger and removed the request for siying on October 16, 2020 at 21:24.
pdillinger (Contributor)

We seem to have disparate configurations that need to work together to make this work. If I set checksum_handoff_file_types, that does nothing on its own. If I support the DVI versions of FSWritableFile functions, that does nothing on its own. Why should I have to add something on both ends?

Echoing some comments on the design document, I think the DB and/or column families should only have a notion of which files are considered "critical" and let the storage layer handle that appropriately, when there is a distinction to be made in handling them. It then makes sense for there to be a reasonable set of defaults considered "critical."

(Perhaps revisiting old debates.) If handoff checksums are computed in the storage layer anyway, why can't the Env / FileSystem provide a function to generate the DataVerificationInfo it expects from data. And if we do that, why not let DataVerificationInfo live entirely under Env / FileSystem (in appropriate implementations)?
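(For readers following along: the handoff surface from #7419 that this discussion refers to looks roughly like the sketch below. Only the DataVerificationInfo-taking Append overload comes from #7419; the wrapper subclass and the VerifyAgainstBackend() hook are hypothetical stand-ins for a real storage integration.)

```cpp
#include "rocksdb/file_system.h"
#include "rocksdb/io_status.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch of a FileSystem-side consumer of the handed-off checksum.
class HandoffWritableFile : public FSWritableFileWrapper {
 public:
  explicit HandoffWritableFile(FSWritableFile* base)
      : FSWritableFileWrapper(base), base_(base) {}

  IOStatus Append(const Slice& data, const IOOptions& options,
                  const DataVerificationInfo& verification_info,
                  IODebugContext* dbg) override {
    // verification_info.checksum carries the crc32c computed above the FS
    // layer; a real implementation would hand it to its storage client
    // rather than recomputing its own.
    if (!VerifyAgainstBackend(data, verification_info.checksum)) {
      return IOStatus::Corruption("handoff checksum mismatch");
    }
    return base_->Append(data, options, dbg);
  }

 private:
  bool VerifyAgainstBackend(const Slice& /*data*/, const Slice& /*checksum*/) {
    return true;  // placeholder: backend-specific verification goes here
  }

  FSWritableFile* base_;
};
```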

(Resolved review threads on include/rocksdb/file_system.h and file/writable_file_writer.cc.)
zhichao-cao (Contributor, Author)

Thanks for the review!

> We seem to have disparate configurations that need to work together to make this work. If I set checksum_handoff_file_types, that does nothing on its own. If I support the DVI versions of FSWritableFile functions, that does nothing on its own. Why should I have to add something on both ends?

My point is, this is a three-way contract: 1) the user decides which file types to hand off, 2) RocksDB provides the path, and 3) the customized FileSystem supports it. From RocksDB's view, it does not know whether the FileSystem supports it. From the user's view, they will set the option only when they know the customized FileSystem will do the job.

> Echoing some comments on the design document, I think the DB and/or column families should only have a notion of which files are considered "critical" and let the storage layer handle that appropriately, when there is a distinction to be made in handling them. It then makes sense for there to be a reasonable set of defaults considered "critical."

In our case, we hope to calculate the checksum as early as possible, so that the customized FileSystem can use, verify, or pass it along. It is therefore better to calculate the checksum in WritableFileWriter, gated by the control option.

> (Perhaps revisiting old debates.) If handoff checksums are computed in the storage layer anyway, why can't the Env / FileSystem provide a function to generate the DataVerificationInfo it expects from data. And if we do that, why not let DataVerificationInfo live entirely under Env / FileSystem (in appropriate implementations)?

The motivation of this PR is to move the checksum calculation up from the Env/FileSystem and do it as early as possible. Even in WritableFileWriter, we do a memory copy. There will be a follow-up PR to bridge the gap, covering checksum verification from the source buffer to the place where the handoff checksum is calculated (due to buffered writes and the write rate limiter, we are not able to calculate the checksum for the data being appended that early).
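(To make "caller computes, callee consumes" concrete, a minimal sketch of the per-Append handoff path. The crc32c and coding helpers are RocksDB-internal headers, and whether the crc32c is masked before encoding is an assumption; check WritableFileWriter for the exact encoding.)

```cpp
#include <cstdint>

#include "rocksdb/file_system.h"
#include "util/coding.h"   // EncodeFixed32 (RocksDB-internal header)
#include "util/crc32c.h"   // crc32c::Value (RocksDB-internal header)

using namespace ROCKSDB_NAMESPACE;

// Compute the handoff checksum in the caller, immediately before the
// verification-aware Append, roughly as WritableFileWriter does in this PR.
IOStatus AppendWithHandoff(FSWritableFile* file, const Slice& data,
                           const IOOptions& opts, IODebugContext* dbg) {
  char checksum_buf[sizeof(uint32_t)];
  uint32_t crc = crc32c::Value(data.data(), data.size());
  EncodeFixed32(checksum_buf, crc);

  DataVerificationInfo v_info;
  v_info.checksum = Slice(checksum_buf, sizeof(checksum_buf));
  return file->Append(data, opts, v_info, dbg);
}
```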

facebook-github-bot pushed a commit that referenced this pull request Oct 23, 2020
Summary:
As suggested by pdillinger, the name kLogFile is misleading; in some tests, kLogFile is defined as the info log. Replace it with kWalFile and move it to public, which will be used in #7523.

Pull Request resolved: #7580

Test Plan: make check

Reviewed By: riversand963

Differential Revision: D24485420

Pulled By: zhichao-cao

fbshipit-source-id: 955e3dacc1021bb590fde93b0a568ffe9ad80799
facebook-github-bot (Contributor)

@zhichao-cao has updated the pull request. You must reimport the pull request before landing.

(1 similar comment)

pdillinger (Contributor)

> There will be a follow-up PR to bridge the gap, covering checksum verification from the source buffer to the place where the handoff checksum is calculated (due to buffered writes and the write rate limiter, we are not able to calculate the checksum for the data being appended that early).

I'm finding it difficult to approve this complication of the Env interface without actually seeing a data integrity improvement that is not possible with the existing interface. Computing checksum on the buffer passed to Append in caller rather than callee to me is not a meaningful data integrity improvement.

zhichao-cao (Contributor, Author)

> There will be a follow-up PR to bridge the gap, covering checksum verification from the source buffer to the place where the handoff checksum is calculated (due to buffered writes and the write rate limiter, we are not able to calculate the checksum for the data being appended that early).

> I'm finding it difficult to approve this complication of the Env interface without actually seeing a data integrity improvement that is not possible with the existing interface. Computing checksum on the buffer passed to Append in caller rather than callee to me is not a meaningful data integrity improvement.

You mean there is nearly no difference between Append itself doing the checksum calculation for the buffer and the caller of Append doing it?

pdillinger (Contributor)

> You mean there is nearly no difference between Append itself doing the checksum calculation for the buffer and the caller of Append doing it?

If it's the same buffer, I don't see a difference between computing the checksum right before the call vs. right at the beginning of the call.

zhichao-cao (Contributor, Author)

> You mean there is nearly no difference between Append itself doing the checksum calculation for the buffer and the caller of Append doing it?

> If it's the same buffer, I don't see a difference between computing the checksum right before the call vs. right at the beginning of the call.

Due to the rate limiter, yes, the buffer/size is the same as what we pass down. But based on the WS and Zippy requirements, if RocksDB provides the calculation in a higher layer instead of the FileSystem, they can avoid recalculating it. With the current implementation plus the follow-up one, we can cover the path from WritableFileWriter::Append(data) -> buffer for WritableFile::Append -> WS client validation. Without this interface, even if we verify the buffer in WritableFileWriter, the FileSystem still needs to calculate it again, which is one more calculation.

ajkr (Contributor)

ajkr commented Oct 28, 2020

> I'm finding it difficult to approve this complication of the Env interface without actually seeing a data integrity improvement that is not possible with the existing interface. Computing checksum on the buffer passed to Append in caller rather than callee to me is not a meaningful data integrity improvement.

Just to clarify, is the Env interface change you're referring to the one that landed in #7419?

> If it's the same buffer, I don't see a difference between computing the checksum right before the call vs. right at the beginning of the call.

I tend to think it's ok to trust the memory's reliability features for data at rest, so computing in callee would work for me. However, any gap in the software-level protection feels like somewhere users will place the blame in corruption investigations, so I could go either way.

> Due to the rate limiter, yes, the buffer/size is the same as what we pass down. But based on the WS and Zippy requirements, if RocksDB provides the calculation in a higher layer instead of the FileSystem, they can avoid recalculating it. With the current implementation plus the follow-up one, we can cover the path from WritableFileWriter::Append(data) -> buffer for WritableFile::Append -> WS client validation. Without this interface, even if we verify the buffer in WritableFileWriter, the FileSystem still needs to calculate it again, which is one more calculation.

Do you mean in the future there'll be integrity protection computed in WritableFileWriter::Append() regardless of this interface, and its checksum will be usable by the underlying WritableFile::Append()? I am not quite able to imagine how it'll work. Would we need to compute a cumulative file checksum using all the chunks passed to WritableFile::Append() and verify against the cumulative file checksum using all data received in WritableFileWriter::Append()?

edit: s/rolling checksum/cumulative file checksum/g

zhichao-cao (Contributor, Author)

> Do you mean in the future there'll be integrity protection computed in WritableFileWriter::Append() regardless of this interface, and its checksum will be usable by the underlying WritableFile::Append()? I am not quite able to imagine how it'll work. Would we need to compute a rolling checksum over all the chunks passed to WritableFile::Append() and verify against the rolling checksum of all data received in WritableFileWriter::Append()?

Yes. The motivation of this project is to handle potential corruption of the WAL and Manifest. Based on the discussion with Zippy, they will use the new FileSystem Append with verification to handle the checksum handoff for their log, and they also want the handoff to be usable for other files like SST files. The protection works this way: by using the rolling checksum of the data in the buffer of WritableFileWriter::Append(), we have protection until we calculate the checksum of the piece of data being passed to WritableFile::Append(). In the WS use case, WS will skip its current checksum calculation of the data in WritableFile::Append() and directly use the handed-off checksum for further verification (e.g., on the WS client side or even the server side). There are some challenges in closing the integrity gap in WritableFileWriter::Append(), especially those caused by buffered writes and the rate limiter. Therefore, in this PR, the first step is to move the buffer checksum calculation up from customized WritableFile::Append() implementations to the logic inside WritableFileWriter::Append().

ajkr (Contributor)

ajkr commented Oct 28, 2020

> Do you mean in the future there'll be integrity protection computed in WritableFileWriter::Append() regardless of this interface, and its checksum will be usable by the underlying WritableFile::Append()? I am not quite able to imagine how it'll work. Would we need to compute a rolling checksum over all the chunks passed to WritableFile::Append() and verify against the rolling checksum of all data received in WritableFileWriter::Append()?

> Yes. The motivation of this project is to handle potential corruption of the WAL and Manifest. Based on the discussion with Zippy, they will use the new FileSystem Append with verification to handle the checksum handoff for their log, and they also want the handoff to be usable for other files like SST files. The protection works this way: by using the rolling checksum of the data in the buffer of WritableFileWriter::Append(), we have protection until we calculate the checksum of the piece of data being passed to WritableFile::Append(). In the WS use case, WS will skip its current checksum calculation of the data in WritableFile::Append() and directly use the handed-off checksum for further verification (e.g., on the WS client side or even the server side). There are some challenges in closing the integrity gap in WritableFileWriter::Append(), especially those caused by buffered writes and the rate limiter. Therefore, in this PR, the first step is to move the buffer checksum calculation up from customized WritableFile::Append() implementations to the logic inside WritableFileWriter::Append().

Understood, thanks for elaborating. In this PR, it looks like the checksums passed to WritableFile::Append() are computed with an empty initial state each time, not a cumulative file checksum. Won't we need the cumulative file checksum covering all the chunks to verify against any checksum at a higher layer than the rate limiter? If so, should WritableFile::Append()'s API provide a cumulative file checksum? Sorry if I misunderstood something.

edit: s/rolling checksum/cumulative file checksum/g

zhichao-cao (Contributor, Author)

> Do you mean in the future there'll be integrity protection computed in WritableFileWriter::Append() regardless of this interface, and its checksum will be usable by the underlying WritableFile::Append()? I am not quite able to imagine how it'll work. Would we need to compute a rolling checksum over all the chunks passed to WritableFile::Append() and verify against the rolling checksum of all data received in WritableFileWriter::Append()?

> Yes. The motivation of this project is to handle potential corruption of the WAL and Manifest. Based on the discussion with Zippy, they will use the new FileSystem Append with verification to handle the checksum handoff for their log, and they also want the handoff to be usable for other files like SST files. The protection works this way: by using the rolling checksum of the data in the buffer of WritableFileWriter::Append(), we have protection until we calculate the checksum of the piece of data being passed to WritableFile::Append(). In the WS use case, WS will skip its current checksum calculation of the data in WritableFile::Append() and directly use the handed-off checksum for further verification (e.g., on the WS client side or even the server side). There are some challenges in closing the integrity gap in WritableFileWriter::Append(), especially those caused by buffered writes and the rate limiter. Therefore, in this PR, the first step is to move the buffer checksum calculation up from customized WritableFile::Append() implementations to the logic inside WritableFileWriter::Append().

> Understood, thanks for elaborating. In this PR, it looks like the checksums passed to WritableFile::Append() are computed with an empty initial state each time, not a rolling checksum. Won't we need the rolling checksum of all the chunks to verify against any checksum at a higher layer than the rate limiter? If so, should WritableFile::Append()'s API provide rolling checksums? Sorry if I misunderstood something.

Based on the current information from WS, they can only use checksum information aligned with the data being passed to WritableFile::Append(); therefore, it is a one-time checksum being passed. The rolling checksum I mentioned before (also reusing the record checksums in the WAL and Manifest where possible) is to bridge the gap between the data being added to the write buffer in WritableFileWriter::Append() and the time we calculate the one-time checksum. For example, consider the workflow for small writes. The data passed in by WritableFileWriter::Append(data) is memcpy'd into the write buffer, and we calculate the rolling checksum of that data. When the buffer is sealed to be written to the lower level via WritableFile::Append(), we verify the write buffer's data; this establishes that the content of the write buffer can be trusted. In the next step, with the rate limiter, this buffer might be passed down via multiple WritableFile::Append() calls. At that time, we calculate the one-time checksum and pass it down with each data piece. In this PR, it is the one-time checksum, aligned with the data being passed down through WritableFile::Append().
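(A sketch of the two checksums described above, using RocksDB's crc32c helpers. BufferedWriter is an illustrative model under the stated assumptions, not WritableFileWriter itself; only the checksum bookkeeping is the point.)

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

#include "util/crc32c.h"  // crc32c::Extend / crc32c::Value (internal header)

using namespace ROCKSDB_NAMESPACE;

struct BufferedWriter {
  std::string buf;           // stands in for WritableFileWriter's buffer
  uint32_t rolling_crc = 0;  // extended as each small write is copied in

  void Append(const char* data, size_t n) {
    // The rolling checksum covers the data from the moment it enters the
    // buffer (crc32c::Extend chains: Extend(Extend(0, a), b) == Value(a+b)).
    rolling_crc = crc32c::Extend(rolling_crc, data, n);
    buf.append(data, n);
  }

  bool SealAndVerify() const {
    // Before the sealed buffer goes down, verify it against the rolling
    // checksum so the buffer content can be trusted.
    return crc32c::Value(buf.data(), buf.size()) == rolling_crc;
  }

  uint32_t OneTimeChecksum(size_t offset, size_t n) const {
    // With the rate limiter, the buffer may go down in several
    // WritableFile::Append() calls; each piece gets its own one-time crc32c,
    // which is what DataVerificationInfo carries in this PR.
    return crc32c::Value(buf.data() + offset, n);
  }
};
```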

ajkr (Contributor)

ajkr commented Oct 29, 2020

Got it, thanks for the explanation.

> When the buffer is sealed to be written to the lower level via WritableFile::Append(), we verify the write buffer's data.

My understanding is that a WritableFile::Append() API taking a cumulative file checksum would be beneficial in eliminating the checksum calculation for the above verification, but it is not feasible due to a WS system limitation. I wonder if it is a fundamental limitation that'll be common to other systems, or just something they chose not to implement in H2 (we'll have a lot of difficulty changing the API once the limitation goes away).

edit: s/rolling checksum/cumulative file checksum/g

zhichao-cao (Contributor, Author)

> Got it, thanks for the explanation.

> When the buffer is sealed to be written to the lower level via WritableFile::Append(), we verify the write buffer's data.

> My understanding is that a WritableFile::Append() API taking a cumulative file checksum would be beneficial in eliminating the checksum calculation for the above verification, but it is not feasible due to a WS system limitation. I wonder if it is a fundamental limitation that'll be common to other systems, or just something they chose not to implement in H2 (we'll have a lot of difficulty changing the API once the limitation goes away).

> edit: s/rolling checksum/cumulative file checksum/g

My personal consideration is that, in the API, we pass a data structure, DataVerificationInfo, which can carry any verification information we want. First, it is natural to carry the checksum aligned with the data in WritableFile::Append(); that is what I implemented and what WS supports at the current stage. In the future, as mentioned in the discussion with WS and Zippy, we can bring more information down to the FS, such as the offset of the data in the file, the rolling checksum of the whole file, or others. Therefore, I think we have the flexibility to extend it. Users can choose the one-time checksum or other verification information in WritableFile::Append() based on what their customized FileSystem implementation supports.

pdillinger (Contributor)

> Users can choose the one-time checksum or other verification information in WritableFile::Append() based on what their customized FileSystem implementation supports.

Except not really, because right now the generation of DataVerificationInfo is fixed in the implementation. We are directly tied up in any updates or evolution.

Wouldn't it be easier for both sides if we just provide a way for WritableFileWriter to do no buffering (IMHO at the choice of Env/FS), and pass original data directly to FSWritableFile for each Append? Then the FS/Env implementation can worry about its own buffering and coordinating that with hand-off checksums. The fact that we provide buffering above the FS layer seems to me a convenience to FS implementors, and should be optional so that our implementation does not become intertwined with all possible concerns of buffering for all possible FS implementations. Re-implementing a buffering layer in an already complex, niche FS implementation should not be complicated, and not nearly complicated enough to justify intertwining our code and releases with this problem.

zhichao-cao (Contributor, Author)

> Except not really, because right now the generation of DataVerificationInfo is fixed in the implementation. We are directly tied up in any updates or evolution.

> Wouldn't it be easier for both sides if we just provide a way for WritableFileWriter to do no buffering (IMHO at the choice of Env/FS), and pass original data directly to FSWritableFile for each Append? Then the FS/Env implementation can worry about its own buffering and coordinating that with hand-off checksums. The fact that we provide buffering above the FS layer seems to me a convenience to FS implementors, and should be optional so that our implementation does not become intertwined with all possible concerns of buffering for all possible FS implementations. Re-implementing a buffering layer in an already complex, niche FS implementation should not be complicated, and not nearly complicated enough to justify intertwining our code and releases with this problem.

It is a possible direction, but maybe not for this PR. If we let WritableFileWriter do no buffering and directly pass whatever is appended to the FS, we will not be able to control the write rate limiter, and the FS needs to deal with the small writes. A future implementation could add an option controlling whether the user wants the FS to fully control the write process of Append. But here, since Zippy and WS rely on the current flow in WritableFileWriter, it is better to follow the current buffering and rate limit control in WritableFileWriter.

pdillinger (Contributor)

> If we let WritableFileWriter do no buffering and directly pass whatever is appended to the FS, we will not be able to control the write rate limiter, and the FS needs to deal with the small writes.

This is already handled in WritableFileWriter::WriteBuffered (which I think is supposed to mean "rely on layer below for any buffering"), so I propose we just need another hook for calling it from Append.

And I propose that the theoretical problem of data being corrupted while waiting for the rate limiter be left for future work. (AFAICT, my proposal allows an FS implementation to cover more of the pipeline with checksums sooner than this PR.)

zhichao-cao (Contributor, Author)

> If we let WritableFileWriter do no buffering and directly pass whatever is appended to the FS, we will not be able to control the write rate limiter, and the FS needs to deal with the small writes.

> This is already handled in WritableFileWriter::WriteBuffered (which I think is supposed to mean "rely on layer below for any buffering"), so I propose we just need another hook for calling it from Append.

> And I propose that the theoretical problem of data being corrupted while waiting for the rate limiter be left for future work. (AFAICT, my proposal allows an FS implementation to cover more of the pipeline with checksums sooner than this PR.)

Can you explain your proposal in more detail, or write a design doc for it? It is hard for me to piece together the overall picture from the comments here.

facebook-github-bot (Contributor)

@zhichao-cao has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot (Contributor)

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@zhichao-cao has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot (Contributor)

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@zhichao-cao merged this pull request in d1c510b.

facebook-github-bot pushed a commit that referenced this pull request Jun 25, 2021
…AL (#8412)

Summary:
In PR #7523, checksum handoff was introduced in RocksDB for WAL, Manifest, and SST files. When the user enables checksum handoff for a certain type of file, before the data is written to the lower-layer storage system, we calculate the checksum (crc32c) of each piece of data and pass it down with the data, so that data verification can be done by the lower-layer storage system if it has the capability. However, this cannot cover the whole lifetime of the data in memory, and it potentially introduces extra checksum calculation overhead.

In this PR, we introduce a new interface in WritableFileWriter::Append which allows the caller to pass the data and its checksum (crc32c) together. In this way, WritableFileWriter can directly use the passed-in checksum (crc32c) to generate the checksum of the data being passed down to the storage system. It saves the calculation overhead and achieves higher protection coverage. When new data is appended with its checksum, we use Crc32cCombine (#8305) to combine the existing checksum and the new one. To avoid the rate limiter segmenting the data before it is stored, the rate limiter is called enough times to accumulate sufficient credits for a given write. This design currently supports only the Manifest and WAL, which use log_writer.

Pull Request resolved: #8412

Test Plan: make check, add new testing cases.

Reviewed By: anand1976

Differential Revision: D29151545

Pulled By: zhichao-cao

fbshipit-source-id: 75e2278c5126cfd58393c67b1efd18dcc7a30772
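The combine step mentioned in the summary can be sketched as follows. Crc32cCombine comes from #8305; treat the exact signature as an assumption if you are on a different version.

```cpp
#include <cstddef>
#include <cstdint>

#include "util/crc32c.h"  // crc32c::Crc32cCombine (added in #8305, internal)

using namespace ROCKSDB_NAMESPACE;

// Extend a file's cumulative crc32c with the crc32c handed in for a new
// record, without touching the record's bytes again.
uint32_t AppendRecordChecksum(uint32_t file_crc, uint32_t record_crc,
                              size_t record_len) {
  // Crc32cCombine(c1, c2, len2) returns the crc32c of the concatenation of
  // the bytes behind c1 followed by the len2 bytes behind c2.
  return crc32c::Crc32cCombine(file_crc, record_crc, record_len);
}
```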