Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ bindings to the Batch API #220

Merged
merged 12 commits into from
Jun 1, 2023
Merged

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented May 16, 2023

An example use of the Batch API (requires CUDA v12.1+):

    // Here we use the batch API to read "/tmp/test-file" into `b_dev` by
    // submitting 4 batch operations.
    constexpr int num_ops_in_batch = 4;
    constexpr int batchsize        = SIZE / num_ops_in_batch;
    kvikio::DriverProperties props;
    check(num_ops_in_batch < props.get_max_batch_io_size());

    // We open the file as usual.
    kvikio::FileHandle f("/tmp/test-file", "r");

    // Then we create a batch
    auto batch = kvikio::BatchHandle(num_ops_in_batch);

    // And submit 4 operations each with its own offset
    std::vector<kvikio::BatchOp> ops;
    for (int i = 0; i < num_ops_in_batch; ++i) {
      ops.push_back(kvikio::BatchOp{.file_handle   = f,
                                    .devPtr_base   = b_dev,
                                    .file_offset   = i * batchsize,
                                    .devPtr_offset = i * batchsize,
                                    .size          = batchsize,
                                    .opcode        = CUFILE_READ});
    }
    batch.submit(ops);

    // Finally, we wait on all 4 operations to be finished and check the result
    auto statuses = batch.status(num_ops_in_batch, num_ops_in_batch);
    check(statuses.size() == num_ops_in_batch);
    size_t total_read = 0;
    for (auto status : statuses) {
      check(status.status == CUFILE_COMPLETE);
      check(status.ret == batchsize);
      total_read += status.ret;
    }
    check(cudaMemcpy(b, b_dev, SIZE, cudaMemcpyDeviceToHost) == cudaSuccess);
    for (int i = 0; i < NELEM; ++i) {
      check(a[i] == b[i]);
    }
    cout << "Batch read using 4 operations: " << total_read << endl;

@madsbk madsbk added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels May 16, 2023
@madsbk madsbk force-pushed the cufile_batch branch 4 times, most recently from 0aea697 to 0026f80 Compare May 17, 2023 07:38
@KiranModukuri
Copy link

An example use of the Batch API (requires CUDA v12.1+):

  
    // Here we use the batch API to read "/tmp/test-file" into `b_dev` by
    // submitting 4 batch operations.
    constexpr int num_of_batches = 4;

may be it will be clear to say this is "num_ops_in_batch".

@madsbk
Copy link
Member Author

madsbk commented May 22, 2023

An example use of the Batch API (requires CUDA v12.1+):

  
    // Here we use the batch API to read "/tmp/test-file" into `b_dev` by
    // submitting 4 batch operations.
    constexpr int num_of_batches = 4;

may be it will be clear to say this is "num_ops_in_batch".

Good point, fixed

@madsbk madsbk marked this pull request as ready for review May 22, 2023 15:18
@madsbk madsbk self-assigned this May 24, 2023
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few places where we can do error checking I think.

Also, I don't like the name-mangled lookup, is there not a better way?!

Comment on lines +64 to +71
file(READ "${cuFile_INCLUDE_DIRS}/cufile.h" CUFILE_H_STR)
string(FIND "${CUFILE_H_STR}" "cuFileBatchIOSetUp" cuFileBatchIOSetUp_location)
if(cuFileBatchIOSetUp_location EQUAL "-1")
set(cuFile_BATCH_API_FOUND FALSE)
else()
set(cuFile_BATCH_API_FOUND TRUE)
endif()
message(STATUS "Found cuFile's Batch API: ${cuFile_BATCH_API_FOUND}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right thing to do is to attempt to compile and link a call to this function, surely this is something that cmake supports? To ensure that both the header file and the library actually offer the function.

This is fine, I guess, but there must be a canonical way.

Copy link
Member Author

@madsbk madsbk Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using check_symbol_exists() with no luck. Got irritated and just rolled my own :)

This doesn't work for some reason:

  include(CheckSymbolExists)
  check_symbol_exists(
    cuFileBatchIOSetUp
    "${cuFile_INCLUDE_DIRS}/cufile.h"
    cuFile_BATCH_API_FOUND
  )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_symbol_exists will try to compile and link a small executable that verifies that symbol exists. For this to work properly for libcufile you would also need to tell check_symbol_exists all the extra include directories, and link requirements.

The examples in the docs are easy since they are testing for values found in headers provided by the compiler/runtime and therefore no extra information is required.

Using CMake's try_compile would be easier in comparison but would still require setting up a code snippet and specifying the include directory for the libcufile and cuda rutime headers

Comment on lines 128 to 130
CUFILE_TRY(
cuFileAPI::instance().BatchIOSubmit(_handle, io_batch_params.size(), &io_batch_params[0], 0));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will produce a generic error if io_batch_params.size() is larger than the requested max batch size. Does it make more sense to check if operations.size() > _max_num_events and provide a useful message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done.

cpp/include/kvikio/batch.hpp Outdated Show resolved Hide resolved
// both the batch and async API.
try {
void* s{};
get_symbol(s, lib, "_ZTS23cuFileBatchContextState");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check the CTK version somehow? This is unstable wrt compilers (since name mangling is not standardised).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not AFAIK :/
I think this will suffice for now. When CUDA v12.2 is released, I think we should make the Batch API require v12.2 and then use cuFileReadAsync to determine the availability of both the batch and async API.

Comment on lines 39 to 56
typedef enum CUfileOpcode { CUFILE_READ = 0, CUFILE_WRITE } CUfileOpcode_t;

typedef enum CUFILEStatus_enum {
CUFILE_WAITING = 0x000001, /* required value prior to submission */
CUFILE_PENDING = 0x000002, /* once enqueued */
CUFILE_INVALID = 0x000004, /* request was ill-formed or could not be enqueued */
CUFILE_CANCELED = 0x000008, /* request successfully canceled */
CUFILE_COMPLETE = 0x0000010, /* request successfully completed */
CUFILE_TIMEOUT = 0x0000020, /* request timed out */
CUFILE_FAILED = 0x0000040 /* unable to complete */
} CUfileStatus_t;

typedef struct CUfileIOEvents {
void* cookie;
CUfileStatus_t status; /* status of the operation */
size_t ret; /* -ve error or amount of I/O done. */
} CUfileIOEvents_t;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just copied from cufile.h I guess. Do you endeavour to be ABI compatible with the cufile definitions or is this just "anything, as long as it has the appopriate fields/names"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is "anything, as long as it has the appopriate fields/names". Added a comment.

@madsbk
Copy link
Member Author

madsbk commented Jun 1, 2023

Thanks @wence-, I think I have addressed all of your issues?

@madsbk madsbk requested a review from wence- June 1, 2023 08:04
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @madsbk

@madsbk
Copy link
Member Author

madsbk commented Jun 1, 2023

Awesome, thanks for the review @wence-. Just in time for the code freeze :)

@madsbk
Copy link
Member Author

madsbk commented Jun 1, 2023

/merge

@rapids-bot rapids-bot bot merged commit b28298f into rapidsai:branch-23.06 Jun 1, 2023
26 of 31 checks passed
@madsbk madsbk deleted the cufile_batch branch June 1, 2023 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants