Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync ingested files only if reopen is supported by the FS #8296

Closed
wants to merge 2 commits into from

Conversation

anand1976
Copy link
Contributor

Some file systems (especially distributed FS) do not support reopening a file for writing. The ExternalSstFileIngestionJob calls ReopenWritableFile in order to sync the ingested file, which typically makes sense only on a local file system with a page cache (i.e Posix). So this change tries to sync the ingested file only if ReopenWritableFile doesn't return Status::NotSupported().

Tests:
Add a new unit test in external_sst_file_basic_test

@facebook-github-bot
Copy link
Contributor

@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@anand1976 anand1976 requested a review from jay-zhuang May 14, 2021 22:16
Copy link
Contributor

@jay-zhuang jay-zhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Please add a note in History.md

Comment on lines +116 to +119
// Some file systems (especially remote/distributed) don't support
// reopening a file for writing and don't require reopening and
// syncing the file. Ignore the NotSupported error in that case.
if (!s.IsNotSupported()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the user should explicitly return a status for skip sync file, or ignore reopen error. As the default is NotSupported, should the user return a different status for that? or NotSupported with a special SubCode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be odd for the file system implementation to be aware of why the reopen is being called and return a subcode indicating that the file sync can be skipped.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like fsyncing ingested file data was bundled with the solution to #5287, but it feels extraneous and potentially misleading in what scenarios we can handle.

@@ -109,17 +109,26 @@ Status ExternalSstFileIngestionJob::Prepare(
// directory before ingest the file. For integrity of RocksDB we need
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand "It is unsafe to assume application had sync the file". Is it safe to assume the application didn't fail to sync the file? If not, this is essentially allowing ingestion of corrupt data since fsync family functions only return the error once.

IMO the application should sync the file successfully before calling this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its probably easier for the app developer to forget to fsync the file than to forget to handle the fsync error. But yes, I think its reasonable to assume that the app should do whatever's necessary to persist the data before calling this.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer delete the file data sync and document the expectation in IngestExternalFile(). But in any case, skipping this logic in NotSupported case seems like a step forward.

@facebook-github-bot
Copy link
Contributor

@anand1976 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@anand1976 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@anand1976 merged this pull request in 9d61a08.

anand1976 pushed a commit that referenced this pull request May 19, 2021
Summary:
Some file systems (especially distributed FS) do not support reopening a file for writing. The ExternalSstFileIngestionJob calls ReopenWritableFile in order to sync the ingested file, which typically makes sense only on a local file system with a page cache (i.e Posix). So this change tries to sync the ingested file only if ReopenWritableFile doesn't return Status::NotSupported().

Tests:
Add a new unit test in external_sst_file_basic_test

Pull Request resolved: #8296

Reviewed By: jay-zhuang

Differential Revision: D28420865

Pulled By: anand1976

fbshipit-source-id: 380e7f5ff95324997f7a59864a9ac96ebbd0100c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants