[RFC] Add new property in IOOptions to skip recursing through directories and list only files during GetChildren. #10668
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Seems like it would make sense to make PosixFileSystem and the other built-in file systems support this property too, and to add associated tests. The tests should also cover setting the property to a value that is not valid ("hello") and checking what happens.
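As a rough illustration of the invalid-value case raised above, here is a minimal sketch of how a string property could be parsed so that a value like "hello" is reported rather than silently accepted. `PropertyBag`, `ParseResult`, and `ParseBoolProperty` are hypothetical names made up for this sketch; they are not RocksDB APIs, and the PR may handle invalid values differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a string-to-string property bag.
using PropertyBag = std::unordered_map<std::string, std::string>;

enum class ParseResult { kTrue, kFalse, kInvalid };

// Assumption: a missing key means "use the default" (false), and any
// value other than "true" or "false" is reported as invalid.
ParseResult ParseBoolProperty(const PropertyBag& bag, const std::string& key) {
  auto it = bag.find(key);
  if (it == bag.end()) return ParseResult::kFalse;
  if (it->second == "true") return ParseResult::kTrue;
  if (it->second == "false") return ParseResult::kFalse;
  return ParseResult::kInvalid;
}
```

A test along the lines requested would set the key to "hello" and check that the file system either rejects it or falls back to the default, whichever behavior is chosen.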
706e43a to 307cf5b
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
307cf5b to f19c9bd
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
Property bag seems a little weird, though if this is just a super-niche optimization, probably OK. Possible alternative: add a GetChildrenForFiles() function that may exclude directories from the list, with default implementation calling GetChildren().
A tricky part for testing (either with my proposal or what you have here) is that implementations are allowed to include directories, and would our unit tests catch a regression where we are confused by a directory being in the list? One possibility is to include them sometimes in DEBUG builds. But it would be weird to decide randomly. How about using one of the variant debug builds, such as ROCKSDB_ASSERT_STATUS_CHECKED or ROCKSDB_MODIFY_NPHASH, to use the old behavior in PosixFileSystem?
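The GetChildrenForFiles() alternative could look roughly like the sketch below. `SketchFileSystem`, `InMemoryFs`, and `FilesOnlyFs` are hypothetical classes written for illustration; the real FileSystem interface takes IOOptions, returns IOStatus, and has many more methods.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified file system interface.
class SketchFileSystem {
 public:
  virtual ~SketchFileSystem() = default;

  // Lists the direct children (files and directories) of `dir`.
  virtual bool GetChildren(const std::string& dir,
                           std::vector<std::string>* result) = 0;

  // May exclude directories. The default keeps the old behavior by
  // delegating to GetChildren(), so existing implementations still work.
  virtual bool GetChildrenForFiles(const std::string& dir,
                                   std::vector<std::string>* result) {
    return GetChildren(dir, result);
  }
};

// An implementation that does not override the new function.
class InMemoryFs : public SketchFileSystem {
 public:
  bool GetChildren(const std::string& /*dir*/,
                   std::vector<std::string>* result) override {
    *result = {"000001.sst", "archive", "CURRENT"};  // "archive" is a directory
    return true;
  }
};

// An implementation that opts in to filtering out directories.
class FilesOnlyFs : public InMemoryFs {
 public:
  bool GetChildrenForFiles(const std::string& /*dir*/,
                           std::vector<std::string>* result) override {
    *result = {"000001.sst", "CURRENT"};
    return true;
  }
};
```

Because the default forwards to GetChildren(), callers of GetChildrenForFiles() still have to tolerate directories in the result, which is exactly the testing concern described above.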
04cab05 to d959655
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
include/rocksdb/file_system.h
Outdated
@@ -119,7 +133,9 @@ struct IOOptions {
 prio(IOPriority::kIOLow),
 rate_limiter_priority(Env::IO_TOTAL),
 type(IOType::kUnknown),
-force_dir_fsync(force_dir_fsync_) {}
+force_dir_fsync(force_dir_fsync_) {
+SetProperty("list_files_only", "false");
Umm, I think this is going to be a non-trivial, unnecessary CPU consumer. Usually the threshold for negligible is .01%, and we already spend well over that in IOOptions::*, mostly in the destructor. And this would give it more work to do. (Internal detail in P533216605)
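To make the cost concern concrete, here is a small sketch contrasting an options struct that populates a map in its constructor with one that carries a plain bool. `BagOptions` and `FlagOptions` are hypothetical types for illustration only; no timings are claimed here, only the structural difference that the map-based version does work on every construction and destruction while the bool version does not.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_map>

// Hypothetical options struct that sets a default property in its
// constructor, similar in spirit to the SetProperty() call in the diff.
struct BagOptions {
  std::unordered_map<std::string, std::string> bag;

  BagOptions() { bag["list_files_only"] = "false"; }  // inserts a map node

  bool ListFilesOnly() const {
    auto it = bag.find("list_files_only");
    return it != bag.end() && it->second == "true";
  }
};

// Hypothetical options struct with a plain field instead.
struct FlagOptions {
  bool list_files_only = false;
};

// The bool version has no destructor work at all; the map version must
// free its node and strings every time an options object goes away.
static_assert(std::is_trivially_destructible<FlagOptions>::value,
              "plain flag needs no cleanup");
static_assert(!std::is_trivially_destructible<BagOptions>::value,
              "property bag needs cleanup on every destruction");
```

Since IOOptions objects are created on very hot paths, even a single map insertion per construction can show up in profiles, which is the point of the comment above.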
d959655 to df18b62
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
df18b62 to 53e733b
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
Sure. Let me try ROCKSDB_ASSERT_STATUS_CHECKED to use the old behavior in PosixFileSystem.
53e733b to eb4e081
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Add new property in IOOptions for underlying file system to skip iteration of directories during DB::Open if there are no subdirectories.
…SERT_STATUS_CHECKED
eb4e081 to b0601dd
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
LGTM
@@ -4588,8 +4592,12 @@ Status DestroyDB(const std::string& dbname, const Options& options,
 // Reset the logger because it holds a handle to the
 // log file and prevents cleanup and directory removal
 soptions.info_log.reset();
+IOOptions io_opts;
Did you intend to set the flag here? It's not clear.
@@ -112,14 +112,19 @@ struct IOOptions {
 // fsync, set this to force the fsync
 bool force_dir_fsync;

+// Can be used by underlying file systems to skip recursing through sub
+// directories and list only files in GetChildren API.
+bool do_not_recurse;
This is a bad name. GetChildren is never recursive. It never returns children of children, only direct children. The question is whether to filter out directories (which would preclude wrapping GetChildren in recursive listing logic, but the name here is "do not recurse").
Also, AFAIK for backward compatibility and generality, this is only intended as a hint, not a strong requirement to filter out directories.
My GetChildrenForFiles() suggestion avoids extra "bloat" on IOOptions (fields only affecting one operation) and should make it easier to "do the right thing" rather than require an explicit IOOptions every time you want just the files in a directory. (See possible omission I pointed out in your code.) Also avoids the problem of "negative naming" on boolean variables. See e.g. https://www.serendipidata.com/posts/naming-guidelines-for-boolean-variables
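Since the option is described as a hint rather than a requirement, a caller has to produce the same result whether or not the backend filters. A minimal sketch of that idea follows; `Entry`, `ListChildren`, and `FileNames` are hypothetical helpers over in-memory data, not RocksDB code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical directory entry with a type tag.
struct Entry {
  std::string name;
  bool is_dir;
};

// Hypothetical backend. When `honors_hint` is false it behaves like an
// older file system that ignores the hint and returns everything.
std::vector<Entry> ListChildren(bool files_only_hint, bool honors_hint) {
  const std::vector<Entry> all = {
      {"CURRENT", false}, {"000007.sst", false}, {"archive", true}};
  if (!(files_only_hint && honors_hint)) return all;
  std::vector<Entry> files;
  for (const Entry& e : all) {
    if (!e.is_dir) files.push_back(e);
  }
  return files;
}

// Caller-side filtering: correct regardless of whether the hint was honored.
std::vector<std::string> FileNames(const std::vector<Entry>& entries) {
  std::vector<std::string> names;
  for (const Entry& e : entries) {
    if (!e.is_dir) names.push_back(e.name);
  }
  return names;
}
```

With this shape, the hint only saves work in the backend; correctness never depends on it.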
Summary: Add a new property, "do_not_recurse", to IOOptions so that the underlying file system can skip iterating over directories and list only files during DB::Open when there are no subdirectories.
The property defaults to false. It is currently set to true only in code paths where RocksDB is sure that only files are needed during DB::Open.
Added support for "do_not_recurse" in PosixFileSystem.
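For readers unfamiliar with the idea, the following sketch shows one way a file system could honor such a flag when listing a directory. It uses C++17 std::filesystem rather than the POSIX calls that PosixFileSystem relies on, and `SketchIOOptions`, `ListChildren`, and `MakeDemoDir` are hypothetical names; this is not the code from the PR.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the relevant part of IOOptions.
struct SketchIOOptions {
  bool do_not_recurse = false;
};

// Lists the direct children of `dir`. When the flag is set, directory
// entries are skipped so that only files are returned.
bool ListChildren(const fs::path& dir, const SketchIOOptions& opts,
                  std::vector<std::string>* out) {
  out->clear();
  std::error_code ec;
  for (fs::directory_iterator it(dir, ec), end; !ec && it != end;
       it.increment(ec)) {
    std::error_code type_ec;
    if (opts.do_not_recurse && it->is_directory(type_ec)) continue;
    out->push_back(it->path().filename().string());
  }
  return !ec;
}

// Builds a scratch directory holding one file and one subdirectory.
fs::path MakeDemoDir() {
  fs::path dir = fs::temp_directory_path() / "do_not_recurse_sketch";
  fs::remove_all(dir);
  fs::create_directories(dir / "sub");
  std::ofstream(dir / "a.txt") << "x";
  return dir;
}
```

Note that the listing is never recursive in either mode; the flag only decides whether directory entries appear in the result.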
TestPlan: