Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add new property in IOOptions to skip recursing through directories and list only files during GetChildren. #10668

Closed

Conversation

akankshamahajan15
Copy link
Contributor

@akankshamahajan15 akankshamahajan15 commented Sep 13, 2022

Summary: Add new property "do_not_recurse" in IOOptions for underlying file system to skip iteration of directories during DB::Open if there are no sub directories and list only files.
By default this property is set to false. This property is set true currently in the code where RocksDB is sure only files are needed during DB::Open.

Provided support in PosixFileSystem to use "do_not_recurse".

TestPlan:

  • Existing tests

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it would make sense to make the PosixFileSystem and other builtin FS also support this property, and to add associated tests. The tests should also support setting the value of the property to something that is not valid ("hello") and see what happens.

db/db_impl/db_impl_files.cc Outdated Show resolved Hide resolved
db/db_impl/db_impl_files.cc Outdated Show resolved Hide resolved
db/db_impl/db_impl_files.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15 akankshamahajan15 changed the title Add new property in IOOptions to skip directories during GetChildren. Add new property in IOOptions to skip directories and list only files during GetChildren. Sep 26, 2022
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Property bag seems a little weird, though if this is just a super-niche optimization, probably OK. Possible alternative: add a GetChildrenForFiles() function that may exclude directories from the list, with default implementation calling GetChildren().

A tricky part for testing (either with my proposal or what you have here) is that implementations are allowed to include directories, and would our unit tests a regression where we are confused by a directory being in the list? One possibility is to include them sometimes in DEBUG builds. But it would be weird to decide randomly. How about using one of the variant debug builds, such as ROCKSDB_ASSERT_STATUS_CHECKED or ROCKSDB_MODIFY_NPHASH, to use the old behavior in PosixFileSystem?

env/fs_posix.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

include/rocksdb/file_system.h Outdated Show resolved Hide resolved
include/rocksdb/file_system.h Outdated Show resolved Hide resolved
include/rocksdb/file_system.h Outdated Show resolved Hide resolved
@@ -119,7 +133,9 @@ struct IOOptions {
prio(IOPriority::kIOLow),
rate_limiter_priority(Env::IO_TOTAL),
type(IOType::kUnknown),
force_dir_fsync(force_dir_fsync_) {}
force_dir_fsync(force_dir_fsync_) {
SetProperty("list_files_only", "false");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, I think this is going to be a non-trivial, unnecessary CPU consumer. Usually the threshold for negligible is .01%, and we already spend well over that in IOOptions::*, mostly in the destructor. And this would give it more work to do. (Internal detail in P533216605)

db/db_impl/db_impl.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15
Copy link
Contributor Author

A tricky part for testing (either with my proposal or what you have here) is that implementations are allowed to include directories, and would our unit tests a regression where we are confused by a directory being in the list? One possibility is to include them sometimes in DEBUG builds. But it would be weird to decide randomly. How about using one of the variant debug builds, such as ROCKSDB_ASSERT_STATUS_CHECKED or ROCKSDB_MODIFY_NPHASH, to use the old behavior in PosixFileSystem?

Sure. Let me try ROCKSDB_ASSERT_STATUS_CHECKED to use the old behavior in PosixFileSystem.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@akankshamahajan15 akankshamahajan15 changed the title Add new property in IOOptions to skip directories and list only files during GetChildren. [RFC] Add new property in IOOptions to skip directories and list only files during GetChildren. Sep 28, 2022
Summary: Add new property in IOOptions for underlying file system to skip iteration of directories
during DB::Open if there are no sub directories.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
…SERT_STATUS_CHECKED

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15 akankshamahajan15 changed the title [RFC] Add new property in IOOptions to skip directories and list only files during GetChildren. [RFC] Add new property in IOOptions to skip recursing through directories and list only files during GetChildren. Oct 2, 2022
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -4588,8 +4592,12 @@ Status DestroyDB(const std::string& dbname, const Options& options,
// Reset the logger because it holds a handle to the
// log file and prevents cleanup and directory removal
soptions.info_log.reset();
IOOptions io_opts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to set the flag here? It's not clear.

@@ -112,14 +112,19 @@ struct IOOptions {
// fsync, set this to force the fsync
bool force_dir_fsync;

// Can be used by underlying file systems to skip recursing through sub
// directories and list only files in GetChildren API.
bool do_not_recurse;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bad name. GetChildren is never recursive. It never returns children of children, only direct children. The question is whether to filter out directories (which would preclude wrapping GetChildren in recursive listing logic, but the name here is "do not recurse").

Also, AFAIK for backward compatibility and generality, this is only intended as a hint, not a strong requirement to filter out directories.

My GetChildrenForFiles() suggestion avoids extra "bloat" on IOOptions (fields only affecting one operation) and should make it easier to "do the right thing" rather than require an explicit IOOptions every time you want just the files in a directory. (See possible omission I pointed out in your code.) Also avoids the problem of "negative naming" on boolean variables. See e.g. https://www.serendipidata.com/posts/naming-guidelines-for-boolean-variables

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants