Support memory functions for copying between peer devices #225

Closed
eyalroz opened this issue Aug 20, 2020 · 4 comments

eyalroz commented Aug 20, 2020

There are specific functions - at least in the driver API - for copying between peers, including copying of arrays:

CUresult cuMemcpy3DPeer ( const CUDA_MEMCPY3D_PEER* pCopy );
CUresult cuMemcpy3DPeerAsync ( const CUDA_MEMCPY3D_PEER* pCopy, CUstream hStream );
CUresult cuMemcpyPeer ( CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, size_t ByteCount );
CUresult cuMemcpyPeerAsync ( CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, size_t ByteCount, CUstream hStream );

Let's support them. These have existed since at least CUDA 7... probably earlier.
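
For reference, this is roughly how the non-array pair gets used with raw driver-API calls (a minimal sketch; the pointers, contexts, size and stream below are hypothetical placeholders):

// Copy `size` bytes from a buffer allocated in src_context
// to a buffer allocated in dst_context (hypothetical handles).
CUdeviceptr dst_ptr, src_ptr;
CUcontext   dst_context, src_context;
size_t      size;

// Synchronous peer copy
CUresult status = cuMemcpyPeer(dst_ptr, dst_context, src_ptr, src_context, size);

// Asynchronous variant, enqueued on a stream
CUstream stream;
status = cuMemcpyPeerAsync(dst_ptr, dst_context, src_ptr, src_context, size, stream);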

@eyalroz eyalroz self-assigned this Aug 20, 2020
eyalroz commented Dec 29, 2021

So, there are really just two functions and their asynchronous variants.

Well, it seems I've already added the non-array version of this to the driver-wrappers branch, but under cuda::memory::peer_to_peer, which is actually wrong, since it's not necessarily device-to-device. We should put this under inter_context instead.

We may want a structure with a context and a region as a parameter here, e.g. something like:

namespace cuda { namespace memory {
struct contextualized_region_t {
    const context_t& context;
    region_t region;
};
} // namespace memory
} // namespace cuda

and then we could say: cuda::memory::inter_context::copy(my_dest_ctxed_region, my_src_ctxed_region);

We could also perhaps have context_t::copy_to_peer(memory::contextualized_region_t dest, memory::region_t src) for the synchronous version, and stream_t::enqueue_t::intercontext_copy(memory::contextualized_region_t dest, memory::region_t src) for the asynchronous one.
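
To make the intent concrete, here is a minimal sketch of what the synchronous inter_context::copy might do under the hood, on top of cuMemcpyPeer (the handle(), start() and size() accessors and the throw_if_error() helper are assumptions for illustration, not necessarily the library's actual API):

namespace cuda { namespace memory { namespace inter_context {

inline void copy(contextualized_region_t dest, contextualized_region_t src)
{
    // Illustrative only: assumes region_t exposes start()/size(),
    // context_t exposes handle(), and throw_if_error() reports failures.
    auto status = cuMemcpyPeer(
        reinterpret_cast<CUdeviceptr>(dest.region.start()), dest.context.handle(),
        reinterpret_cast<CUdeviceptr>(src.region.start()),  src.context.handle(),
        src.region.size());
    throw_if_error(status);
}

} // namespace inter_context
} // namespace memory
} // namespace cuda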

@eyalroz eyalroz added task and removed enhancement labels Dec 29, 2021
@eyalroz eyalroz added this to the Full CUDA 7 support milestone Dec 29, 2021
eyalroz commented Dec 30, 2021

Hmm... it looks like the CUDA_MEMCPY3D structure is actually a "type-erased" version of the similar inter-context structure, CUDA_MEMCPY3D_PEER. And what this means is... that maybe array copying, or at least 3D array copying, should be inter-context to begin with.

... and this leads me to think that maybe we should just make all memory regions contextualized to begin with, so that cuda::memory::copy will be able to tell which kind of copying we need. Or is that a step too far?
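
For comparison, CUDA_MEMCPY3D_PEER carries source and destination contexts where CUDA_MEMCPY3D has reserved fields; filling one for a device-to-device 3D copy looks roughly like this (a minimal sketch; the pointers, contexts and extents are hypothetical placeholders):

CUDA_MEMCPY3D_PEER params = {}; // zero-initialize; offsets default to 0

params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
params.srcDevice     = src_ptr;        // CUdeviceptr allocated in src_context
params.srcContext    = src_context;
params.srcPitch      = width_in_bytes;
params.srcHeight     = height;

params.dstMemoryType = CU_MEMORYTYPE_DEVICE;
params.dstDevice     = dst_ptr;        // CUdeviceptr allocated in dst_context
params.dstContext    = dst_context;
params.dstPitch      = width_in_bytes;
params.dstHeight     = height;

params.WidthInBytes  = width_in_bytes;
params.Height        = height;
params.Depth         = depth;

CUresult status = cuMemcpy3DPeer(&params);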

eyalroz commented Dec 31, 2021

Ok, fixed on the driver-wrappers branch.

eyalroz commented Jan 1, 2022

... and now I notice that when UVA is available, the "Peer" calls are superfluous: plain copies already work across devices.

And UVA is always available for us, since we rely on it for copying to begin with. So we don't even need this on the non-driver branch.
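
A minimal runtime-API sketch of why (the allocations and size are hypothetical): under UVA every allocation has a process-unique address, so a plain copy with cudaMemcpyDefault handles the inter-device case on its own:

void* src_on_device_0;
void* dst_on_device_1;
size_t size = 1024;

cudaSetDevice(0);
cudaMalloc(&src_on_device_0, size);
cudaSetDevice(1);
cudaMalloc(&dst_on_device_1, size);

// The direction is inferred from the (UVA) addresses themselves
cudaMemcpy(dst_on_device_1, src_on_device_0, size, cudaMemcpyDefault);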

@eyalroz eyalroz closed this as completed Jan 1, 2022