Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: add LineFileReader and Status::MustCheck #8026

Closed
wants to merge 7 commits into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Mar 4, 2021

Summary: Removed confusing, awkward, and undocumented internal API
ReadOneLine and replaced with very simple LineFileReader.

In refactoring backupable_db.cc, this has the side benefit of
removing the arbitrary cap on the size of backup metadata files.

Also added Status::MustCheck to make it easy to mark a Status as
"must check." Using this, I can ensure that after
LineFileReader::ReadLine returns false the caller checks GetStatus().

Also removed some excessive conditional compilation in status.h

Test Plan: added unit test, and running tests with ASSERT_STATUS_CHECKED

Summary: Removed confusing, awkward, and undocumented internal API
ReadOneLine and replaced with very simple LineFileReader.

In refactoring backupable_db.cc, this has the side benefit of
removing the arbitrary cap on the size of backup metadata files.

Also added Status::MustCheck to make it easy to mark a Status as
"must check." Using this, I can ensure that after
LineFileReader::ReadLine returns false the caller checks GetStatus().

Also removed some excessive conditional compilation in status.h

Test Plan: existing tests, incl playing with ASSERT_STATUS_CHECKED
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

file/line_file_reader.h Outdated Show resolved Hide resolved
file/line_file_reader.h Outdated Show resolved Hide resolved
file/line_file_reader.cc Show resolved Hide resolved
file/line_file_reader.cc Outdated Show resolved Hide resolved
Comment on lines +2176 to +2178
} else {
line.clear();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this else is necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although LineFileReader::ReadLine actually records an empty line on 'false' return, that's not guaranteed by the API comment.

std::vector<std::shared_ptr<FileInfo>> files;
while (backup_meta_reader->ReadLine(&line)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you do not want to loop only up to num_files any longer? It seems a little cleaner to initialize num_files = 0 (or the strtoul) and then loop up to that many entries. If there are still more lines, then return an error at the end. As it is now, if the code has gibberish at the end, you will process the gibberish and return something about that gibberish. If you duplicated the contents of the "files" section, you will read them all, and then say "nope, too many"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need another 'if' and certainly another local variable to do it the way you propose, so this way is simpler.

std::vector<std::shared_ptr<FileInfo>> files;
while (backup_meta_reader->ReadLine(&line)) {
std::vector<std::string> components = StringSplit(line, ' ');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will this do if there are multiple consecutive spaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care

utilities/backupable/backupable_db.cc Show resolved Hide resolved
Comment on lines +2220 to +2223
if (components[2] != ROCKSDB_NAMESPACE::ToString(checksum_value)) {
return Status::Corruption("Invalid checksum value for " + filename +
" in " + meta_filename_);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this happen? I think the original code was really checking to make sure the checksum_value was at EOL. I guess it might happen if the checksum_value had extra characters at the end? But we do not check that condition anywhere else (what if num_files=1234ABCD?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_files is redundant information. We don't need to check its syntax strictly because we have a later check on its value.

checksum should be canonically formatted to be accepted, and that's mostly for debugability.

Besides, all of these parsing details are nits because this is not a user-serviceable data format. It's internal. Rejecting data that wasn't strictly generated by StoreToFile is only "nice to have."

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments, otherwise LGTM

}
ASSERT_OK(reader->GetStatus());
ASSERT_EQ(count, nlines);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a couple more tests:

  • Read after EOF returns false but Status is still OK

}
ASSERT_TRUE(reader->GetStatus().IsCorruption());
ASSERT_LT(count, nlines / 2);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a few more tests:

  • Read after error still returns false and the status is still the same

// the line to `out`, or returning false on failure. You must check
// GetStatus to determine whether the failure was just end-of-file (OK
// status) or an I/O error (another status).
bool ReadLine(std::string* out);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comment to state that out does not contain the newline character.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in 4b18c46.

levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 18, 2021
Summary:
Removed confusing, awkward, and undocumented internal API
ReadOneLine and replaced with very simple LineFileReader.

In refactoring backupable_db.cc, this has the side benefit of
removing the arbitrary cap on the size of backup metadata files.

Also added Status::MustCheck to make it easy to mark a Status as
"must check." Using this, I can ensure that after
LineFileReader::ReadLine returns false the caller checks GetStatus().

Also removed some excessive conditional compilation in status.h

Pull Request resolved: facebook/rocksdb#8026

Test Plan: added unit test, and running tests with ASSERT_STATUS_CHECKED

Reviewed By: mrambacher

Differential Revision: D26831687

Pulled By: pdillinger

fbshipit-source-id: ef749c265a7a26bb13cd44f6f0f97db2955f6f0f
Signed-off-by: Changlong Chen <levisonchen@live.cn>
mm304321141 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 23, 2021
Summary:
Removed confusing, awkward, and undocumented internal API
ReadOneLine and replaced with very simple LineFileReader.

In refactoring backupable_db.cc, this has the side benefit of
removing the arbitrary cap on the size of backup metadata files.

Also added Status::MustCheck to make it easy to mark a Status as
"must check." Using this, I can ensure that after
LineFileReader::ReadLine returns false the caller checks GetStatus().

Also removed some excessive conditional compilation in status.h

Pull Request resolved: facebook/rocksdb#8026

Test Plan: added unit test, and running tests with ASSERT_STATUS_CHECKED

Reviewed By: mrambacher

Differential Revision: D26831687

Pulled By: pdillinger

fbshipit-source-id: ef749c265a7a26bb13cd44f6f0f97db2955f6f0f
Signed-off-by: Changlong Chen <levisonchen@live.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants