Support memory functions for copying between peer devices #225

Closed
eyalroz opened this issue Aug 20, 2020 · 4 comments

eyalroz commented Aug 20, 2020

There are specific functions - at least in the driver API - for copying between peers, including copying of arrays:

CUresult cuMemcpy3DPeer ( const CUDA_MEMCPY3D_PEER* pCopy );
CUresult cuMemcpy3DPeerAsync ( const CUDA_MEMCPY3D_PEER* pCopy, CUstream hStream );
CUresult cuMemcpyPeer ( CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, size_t ByteCount );
CUresult cuMemcpyPeerAsync ( CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, size_t ByteCount, CUstream hStream );

Let's support them. These have existed since at least CUDA 7... probably earlier.
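
For reference, this is roughly how the non-array pair gets used with raw driver-API calls (a minimal sketch; the pointers, contexts, size and stream below are hypothetical placeholders):

// Copy `size` bytes from a buffer allocated in src_context
// to a buffer allocated in dst_context (hypothetical handles).
CUdeviceptr dst_ptr, src_ptr;
CUcontext   dst_context, src_context;
size_t      size;

// Synchronous peer copy
CUresult status = cuMemcpyPeer(dst_ptr, dst_context, src_ptr, src_context, size);

// Asynchronous variant, enqueued on a stream
CUstream stream;
status = cuMemcpyPeerAsync(dst_ptr, dst_context, src_ptr, src_context, size, stream);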

@eyalroz eyalroz self-assigned this Aug 20, 2020
eyalroz commented Dec 29, 2021

So, there are really just two functions and their asynchronous variants.

Well, it seems I've already added the non-array version of this to the driver-wrappers branch, but under cuda::memory::peer_to_peer, which is actually wrong, since it's not necessarily device-to-device. We should put this under inter_context instead.

We may want a structure with a context and a region as a parameter here, e.g. something like:

namespace cuda { namespace memory {
struct contextualized_region_t {
    const context_t& context;
    region_t region;
};
} // namespace memory
} // namespace cuda

and then we could say: cuda::memory::inter_context::copy(my_dest_ctxed_region, my_src_ctxed_region);

We could also perhaps have context_t::copy_to_peer(memory::contextualized_region_t dest, memory::region_t src) for the synchronous version, and stream_t::enqueue_t::intercontext_copy(memory::contextualized_region_t dest, memory::region_t src) for the asynchronous one.
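
To make the intent concrete, here is a minimal sketch of what the synchronous inter_context::copy might do under the hood, on top of cuMemcpyPeer (the handle(), start() and size() accessors and the throw_if_error() helper are assumptions for illustration, not necessarily the library's actual API):

namespace cuda { namespace memory { namespace inter_context {

inline void copy(contextualized_region_t dest, contextualized_region_t src)
{
    // Illustrative only: assumes region_t exposes start()/size(),
    // context_t exposes handle(), and throw_if_error() reports failures.
    auto status = cuMemcpyPeer(
        reinterpret_cast<CUdeviceptr>(dest.region.start()), dest.context.handle(),
        reinterpret_cast<CUdeviceptr>(src.region.start()),  src.context.handle(),
        src.region.size());
    throw_if_error(status);
}

} // namespace inter_context
} // namespace memory
} // namespace cuda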

@eyalroz eyalroz added task and removed enhancement labels Dec 29, 2021
@eyalroz eyalroz added this to the Full CUDA 7 support milestone Dec 29, 2021
eyalroz commented Dec 30, 2021

Hmm... it looks like the CUDA_MEMCPY3D structure is actually a "type-erased" version of the similar inter-context structure, CUDA_MEMCPY3D_PEER. And what this means is... that maybe array copying, or at least 3D array copying, should be inter-context to begin with.

... and this leads me to think that maybe we should just make all memory regions contextualized to begin with, so that cuda::memory::copy will be able to tell which kind of copying we need. Or is that a step too far?
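
For comparison, CUDA_MEMCPY3D_PEER carries source and destination contexts where CUDA_MEMCPY3D has reserved fields; filling one for a device-to-device 3D copy looks roughly like this (a minimal sketch; the pointers, contexts and extents are hypothetical placeholders):

CUDA_MEMCPY3D_PEER params = {}; // zero-initialize; offsets default to 0

params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
params.srcDevice     = src_ptr;        // CUdeviceptr allocated in src_context
params.srcContext    = src_context;
params.srcPitch      = width_in_bytes;
params.srcHeight     = height;

params.dstMemoryType = CU_MEMORYTYPE_DEVICE;
params.dstDevice     = dst_ptr;        // CUdeviceptr allocated in dst_context
params.dstContext    = dst_context;
params.dstPitch      = width_in_bytes;
params.dstHeight     = height;

params.WidthInBytes  = width_in_bytes;
params.Height        = height;
params.Depth         = depth;

CUresult status = cuMemcpy3DPeer(&params);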

eyalroz commented Dec 31, 2021

Ok, fixed on the driver-wrappers branch.

eyalroz commented Jan 1, 2022

... and now I notice that when UVA is available, the "Peer" calls are superfluous: plain copies already work across devices.

And UVA is always available for us, since we rely on it for copying to begin with. So we don't even need this on the non-driver branch.
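
A minimal runtime-API sketch of why (the allocations and size are hypothetical): under UVA every allocation has a process-unique address, so a plain copy with cudaMemcpyDefault handles the inter-device case on its own:

void* src_on_device_0;
void* dst_on_device_1;
size_t size = 1024;

cudaSetDevice(0);
cudaMalloc(&src_on_device_0, size);
cudaSetDevice(1);
cudaMalloc(&dst_on_device_1, size);

// The direction is inferred from the (UVA) addresses themselves
cudaMemcpy(dst_on_device_1, src_on_device_0, size, cudaMemcpyDefault);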

@eyalroz eyalroz closed this as completed Jan 1, 2022