Data Plane - Initial P2P and RDMA Get #112

Merged
88 commits merged into from
Jul 26, 2022

Conversation

ryanolson
Contributor

@ryanolson ryanolson commented Jul 4, 2022

This PR is the first stage of bringing the data plane back to the SRF runtime.

This adds the following to the data_plane namespace:

  • Callbacks - a struct of static methods used to handle UCX callbacks on locally initiated UCX transactions, e.g. issuing a tagged send or an RDMA GET.
  • Request - a struct which holds the state of an async transaction. This object holds a bit more data than just a promise/future pair. I figure the API will have two ways to kick off an async transaction: one that takes a ref to a Request and another that returns a Request. The latter requires a heap allocation, so the former could be used as a subtle optimization for structured concurrency.
  • DataPlaneServerWorker - the Runnable that drives the UCX worker's progress method, which ultimately executes the UCX callbacks. More functionality will be added to this component over time, specifically using ucp_tag_probe_nb to match any incoming events whose payloads were larger than the pre-posted buffers.
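
To make the two Request styles concrete, here is a minimal single-threaded sketch. It is not the code in this PR: `FakeWorker`, `async_get`, and the members of this `Request` are hypothetical names invented for illustration, and plain flags replace the promise/future pair and the UCX calls so the example stays self-contained. One plausible reading of the heap-allocation remark is that the completion callback keeps a pointer to the Request, so a returned Request needs a stable address.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for Request: the state of one async transaction.
// The real struct reportedly carries more than a promise/future pair;
// two flags are enough for this sketch.
class Request
{
  public:
    void complete(bool ok)
    {
        m_ok   = ok;
        m_done = true;
    }
    bool is_complete() const { return m_done; }
    bool succeeded() const { return m_done && m_ok; }

  private:
    bool m_done{false};
    bool m_ok{false};
};

// Hypothetical stand-in for a UCX worker: completions are queued and only
// run when progress() is called, much like callbacks fired by a progress loop.
class FakeWorker
{
  public:
    void post(std::function<void()> completion) { m_pending.push(std::move(completion)); }

    // Runs every queued completion callback and returns how many ran.
    std::size_t progress()
    {
        std::size_t count = 0;
        while (!m_pending.empty())
        {
            auto callback = std::move(m_pending.front());
            m_pending.pop();
            callback();
            ++count;
        }
        return count;
    }

  private:
    std::queue<std::function<void()>> m_pending;
};

// Style 1: the caller owns the Request, which can live on the caller's stack.
// The caller must keep it alive until the callback has run.
void async_get(FakeWorker& worker, Request& request)
{
    worker.post([&request] { request.complete(true); });
}

// Style 2: the call allocates the Request and returns it. The heap allocation
// keeps the address captured by the callback valid after the function returns.
std::unique_ptr<Request> async_get(FakeWorker& worker)
{
    auto request = std::make_unique<Request>();
    worker.post([raw = request.get()] { raw->complete(true); });
    return request;
}

// Usage: start a transaction, drive progress, then inspect the Request.
bool demo_by_reference()
{
    FakeWorker worker;
    Request request;  // no heap allocation
    async_get(worker, request);
    worker.progress();  // in the PR, a dedicated Runnable drives this step
    return request.succeeded();
}

bool demo_returned()
{
    FakeWorker worker;
    auto request = async_get(worker);
    worker.progress();
    return request->succeeded();
}
```

In the by-reference style the caller's scope bounds the lifetime of the transaction state, which is why it fits structured concurrency; the returning style is more convenient when the Request must outlive the call site.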

The remaining work in this PR is moving the UCX tests into the internal tests binary and re-enabling the RDMA GET test.

This is not a complete implementation of the UCX Data Plane. #144 was created to address the WIP state.

@ryanolson ryanolson added non-breaking Non-breaking change feature request New feature or request labels Jul 4, 2022
@ryanolson ryanolson added this to the Multi-Node Support milestone Jul 4, 2022
@ryanolson ryanolson self-assigned this Jul 4, 2022
@ryanolson ryanolson requested review from a team as code owners July 4, 2022 17:43
Contributor

@drobison00 drobison00 left a comment

Couple questions, no real issues; will approve once comments are resolved.

src/internal/data_plane/callbacks.cpp
src/internal/data_plane/callbacks.cpp
src/internal/data_plane/callbacks.cpp
src/internal/data_plane/server.cpp
src/internal/system/partition_provider.hpp
src/internal/ucx/resources.cpp
src/internal/data_plane/server.cpp (outdated)
src/internal/data_plane/server.cpp
src/internal/data_plane/server.cpp
src/internal/data_plane/server.cpp
@ryanolson
Contributor Author

@gpucibot merge

@ghost ghost merged commit d1a3569 into nv-morpheus:branch-22.08 Jul 26, 2022
ghost pushed a commit that referenced this pull request Jul 29, 2022
Enables internal members to query the runtime resources from a `static thread_local` member of the `resources::Manager`

This builds on #112

This diff will highlight what's in this PR:
pull-request/112...pull-request/113

Appended to this PR is a repo-wide update that fixes all cpp_checks that are now failing.

Authors:
  - Ryan Olson (https://github.com/ryanolson)
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Devin Robison (https://github.com/drobison00)

URL: #113
This pull request was closed.
Labels
feature request New feature or request non-breaking Non-breaking change
Projects
No open projects
Status: Done
2 participants