-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync ingested files only if reopen is supported by the FS #8296
Conversation
@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Please add a note in History.md
// Some file systems (especially remote/distributed) don't support | ||
// reopening a file for writing and don't require reopening and | ||
// syncing the file. Ignore the NotSupported error in that case. | ||
if (!s.IsNotSupported()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the user should explicitly return a status for skip sync file, or ignore reopen error. As the default is NotSupported
, should the user return a different status for that? or NotSupported with a special SubCode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be odd for the file system implementation to be aware of why the reopen is being called and return a subcode indicating that the file sync can be skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like fsyncing ingested file data was bundled with the solution to #5287, but it feels extraneous and potentially misleading in what scenarios we can handle.
@@ -109,17 +109,26 @@ Status ExternalSstFileIngestionJob::Prepare( | |||
// directory before ingest the file. For integrity of RocksDB we need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand "It is unsafe to assume application had sync the file". Is it safe to assume the application didn't fail to sync the file? If not, this is essentially allowing ingestion of corrupt data since fsync family functions only return the error once.
IMO the application should sync the file successfully before calling this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its probably easier for the app developer to forget to fsync the file than to forget to handle the fsync error. But yes, I think its reasonable to assume that the app should do whatever's necessary to persist the data before calling this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer delete the file data sync and document the expectation in IngestExternalFile()
. But in any case, skipping this logic in NotSupported
case seems like a step forward.
@anand1976 has updated the pull request. You must reimport the pull request before landing. |
@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
d30e459
to
d8703eb
Compare
@anand1976 has updated the pull request. You must reimport the pull request before landing. |
@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@anand1976 merged this pull request in 9d61a08. |
Summary: Some file systems (especially distributed FS) do not support reopening a file for writing. The ExternalSstFileIngestionJob calls ReopenWritableFile in order to sync the ingested file, which typically makes sense only on a local file system with a page cache (i.e Posix). So this change tries to sync the ingested file only if ReopenWritableFile doesn't return Status::NotSupported(). Tests: Add a new unit test in external_sst_file_basic_test Pull Request resolved: #8296 Reviewed By: jay-zhuang Differential Revision: D28420865 Pulled By: anand1976 fbshipit-source-id: 380e7f5ff95324997f7a59864a9ac96ebbd0100c
Some file systems (especially distributed FS) do not support reopening a file for writing. The ExternalSstFileIngestionJob calls ReopenWritableFile in order to sync the ingested file, which typically makes sense only on a local file system with a page cache (i.e Posix). So this change tries to sync the ingested file only if ReopenWritableFile doesn't return Status::NotSupported().
Tests:
Add a new unit test in external_sst_file_basic_test