Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Async Read and Poll APIs in FileSystem #9564

Closed
wants to merge 5 commits into from

Conversation

akankshamahajan15
Copy link
Contributor

@akankshamahajan15 akankshamahajan15 commented Feb 14, 2022

Summary: This PR adds support for new APIs Async Read that reads the data
asynchronously and Poll API that checks if requested read request has
completed or not.

Usage: In RocksDB, we are currently planning to prefetch data
asynchronously during sequential scanning and RocksDB will call these
APIs to prefetch more data in advanced.

Design:

  • ReadAsync API submits the read request to underlying FileSystem in
    order to read data asynchronously. When read request is completed,
    callback function will be called. cb_arg is used by RocksDB to track the
    original request submitted and IOHandle is used by FileSystem to keep track
    of IO requests at their level.

  • The Poll API is added in FileSystem because the call could end up handling
    completions for multiple different files which is not specific to a
    FSRandomAccessFile instance. There could be multiple outstanding file reads
    from different files in future and they can complete in any order.

Test Plan: Test will be added in separate PR.

@akankshamahajan15 akankshamahajan15 changed the title Add Async Read and Poll APIs in FileSystem [WIP] Add Async Read and Poll APIs in FileSystem Feb 14, 2022
@akankshamahajan15 akankshamahajan15 marked this pull request as draft February 14, 2022 22:05
@akankshamahajan15 akankshamahajan15 marked this pull request as ready for review February 14, 2022 23:03
@akankshamahajan15 akankshamahajan15 changed the title [WIP] Add Async Read and Poll APIs in FileSystem Add Async Read and Poll APIs in FileSystem Feb 14, 2022
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

include/rocksdb/file_system.h Outdated Show resolved Hide resolved
include/rocksdb/file_system.h Outdated Show resolved Hide resolved
include/rocksdb/file_system.h Outdated Show resolved Hide resolved
include/rocksdb/file_system.h Outdated Show resolved Hide resolved
Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without an example, I am having trouble following exactly how this API is supposed to function. I was assuming that the caller would invoke ReadAsync, get back something that represented that operation (an IOHandle?) and the callback was invoked when the operation was complete. But who creates and manages the IOHandle is unclear to me (is it the caller or the FileSystem?).

The lifetime of the returned Slice is also unclear (who owns/manages the "scratch" data backing the Slice?).

env/file_system_tracer.cc Outdated Show resolved Hide resolved
// IOStatus::NotSupported() indicates that polling is not supported, and
// RocksDB will be informed of IO completions via the callback on another
// thread.
virtual IOStatus Poll(IOHandle* /*io_handle*/) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the "Poll" off of the FileSystem while the ReadAsync is off the RandomAccessFile? Shouldn't we poll the file?

What is the Status returned if the read has completed?

This also should not take an IOHandle* -- perhaps by reference or const reference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Poll call could end up handling completions for multiple different files, so its not specific to a FSRandomAccessFile instance. There could be multiple outstanding file reads and they can complete in any order.

include/rocksdb/file_system.h Outdated Show resolved Hide resolved
include/rocksdb/file_system.h Outdated Show resolved Hide resolved
@@ -733,6 +747,19 @@ struct FSReadRequest {
IOStatus status;
};

// A read IO response structure for use by asynchronously Read APIs.
struct FSReadResponse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand why you need both the IOHandle and the FSReadResponse. Shouldn't the IOHandle contain everything to identify the request (offset, length, status) and the callback to invoke when the read is complete?

@anand1976
Copy link
Contributor

Without an example, I am having trouble following exactly how this API is supposed to function. I was assuming that the caller would invoke ReadAsync, get back something that represented that operation (an IOHandle?) and the callback was invoked when the operation was complete. But who creates and manages the IOHandle is unclear to me (is it the caller or the FileSystem?).

I agree. An example, or a test with a mock FileSystem implementation, would make it clearer.

The IOHandle would be created by the FileSystem to store whatever context it needs to store. Its lifecycle is managed by RocksDB, hence its defined as a type erased unique_ptr. IOHandle could have been a void*, but managing the lifecycle would have been complicated. Would the FileSystem delete it after calling the callback, or when RocksDB calls Poll?

The lifetime of the returned Slice is also unclear (who owns/manages the "scratch" data backing the Slice?).

The scratch data and returned Slice lifetime works exactly like Read, i.e RocksDB reader owns scratch and Slice refers to either scratch or an mmap'd file.

@mrambacher
Copy link
Contributor

Without an example, I am having trouble following exactly how this API is supposed to function. I was assuming that the caller would invoke ReadAsync, get back something that represented that operation (an IOHandle?) and the callback was invoked when the operation was complete. But who creates and manages the IOHandle is unclear to me (is it the caller or the FileSystem?).

I agree. An example, or a test with a mock FileSystem implementation, would make it clearer.

The IOHandle would be created by the FileSystem to store whatever context it needs to store. Its lifecycle is managed by RocksDB, hence its defined as a type erased unique_ptr. IOHandle could have been a void*, but managing the lifecycle would have been complicated. Would the FileSystem delete it after calling the callback, or when RocksDB calls Poll?

I am not sure why an IOHandle is different than any other unique_ptr we return (like a FSRandomAccessFile). It feels very similar to me and should have functions off of it (for Poll, Complete, etc)

The lifetime of the returned Slice is also unclear (who owns/manages the "scratch" data backing the Slice?).

The scratch data and returned Slice lifetime works exactly like Read, i.e RocksDB reader owns scratch and Slice refers to either scratch or an mmap'd file.

As the current API is proposed, I do not think this is clean. The "scratch" is part of the Request but not part of the Response. So the Async is responsible for maintaining the scratch pointer. It is unclear what its lifetime will be. If there are multiple concurrent Async requests, then how is the caller supposed to manage their scratch buffers? There is no correlation between the request, the callback and the scratch buffer.

@akankshamahajan15
Copy link
Contributor Author

Let me add more details about what's the purpose of this PR, future plan where we will be using these APIs and test units.

@anand1976
Copy link
Contributor

I am not sure why an IOHandle is different than any other unique_ptr we return (like a FSRandomAccessFile). It feels very similar to me and should have functions off of it (for Poll, Complete, etc)

I actually think Poll should be a FileSystem method and should allow the caller to wait for multiple IO completions - IOStatus Poll(std::vector<IOHandle*>& handles, size_t min_completions). It would be inefficient to wait for one IO at a time.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM. Would be good to have a test that gives an example of how the FS should implement ReadAsync.

env/file_system_tracer.cc Outdated Show resolved Hide resolved
include/rocksdb/file_system.h Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The test needs some more clarity. I've left some comments inline.

include/rocksdb/file_system.h Outdated Show resolved Hide resolved
env/env_test.cc Outdated
// 2. Read file.
{
std::unique_ptr<FSRandomAccessFile> file;
ASSERT_OK(Env::Default()->GetFileSystem()->NewRandomAccessFile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping we could have a mock file system (derived from FileSystemWrapper) and a mock random access file (derived from FSRandomAccessFileWrapper). It can be very simple, just override NewRandomAccessFile and ReadAsync respectively.

env/env_test.cc Outdated
req.len = kSectorSize;
req.scratch = NewAligned(kSectorSize * 8, 0).get();
ASSERT_OK(file->ReadAsync(&req, opts, callback, cb_arg,
nullptr /*io_handle*/, nullptr /*dbg*/));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think io_handle should always be non-null. Since we don't know, we should assume that the FS implementation would want to allocate something to store in io_handle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also confused here. I thought the ReadAsync operation was responsible for returning an IOHandle that could later be checked. If not, how would one ever Poll for operations -- how is an IOHandle associated with an outstanding Async request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am updating the test to implement MockFS. That would clear up few things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOHandle will be used by FileSystem to distinguish between different read requests submit at their level. I haven't implemented PosixFileSystem support yet that will for example store some context when it has different read requests in any order.

env/env_test.cc Outdated Show resolved Hide resolved
env/env_test.cc Outdated
req.len = kSectorSize;
req.scratch = NewAligned(kSectorSize * 8, 0).get();
ASSERT_OK(file->ReadAsync(&req, opts, callback, cb_arg,
nullptr /*io_handle*/, nullptr /*dbg*/));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also confused here. I thought the ReadAsync operation was responsible for returning an IOHandle that could later be checked. If not, how would one ever Poll for operations -- how is an IOHandle associated with an outstanding Async request?

include/rocksdb/file_system.h Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15
Copy link
Contributor Author

In Progress: Update test to have a mock file system and submit requests to each thread.

env/env_test.cc Outdated
TEST_F(TestAsyncRead, ReadAsync) {
std::shared_ptr<AsyncReadFS> fs =
std::make_shared<AsyncReadFS>(env_->GetFileSystem());
std::unique_ptr<Env> env(new CompositeEnvWrapper(env_, fs));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The env is not necessary. You can call fs directly.

env/env_test.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: This PR adds support for new APIs Async Read that reads the data
asynchronously and Poll API that checks if requested read request has
completed or not.

Usage: In RocksDB, we are currently plannig to prefetch data
asynchronously during sequential scanning and RocksDB will call these
APIs to prefetch more data in advanced.

Design:
- ReadAsync API submits the read request to underlying FileSystem in
order to read data asynchronously. When read request is completed,
callback function will be called. cb_arg is used by RocksDB to track the
original request submit and IOHandle is used by FileSystem to keep track
of IO requests at their level.

- The Poll API  is added in FileSystem because the call could end up handling
completions for multiple different files which is not specific to a
FSRandomAccessFile instance. There could be multiple outstanding file reads
from different files in future and they can complete in any order.

Test Plan: make check -j64
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pingyu pushed a commit to pingyu/rocksdb that referenced this pull request Oct 22, 2022
Signed-off-by: pingyu <yuping@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants