
Definition peer to peer support #8610

Closed
wzamazon opened this issue Mar 3, 2023 · 9 comments

Comments

@wzamazon
Contributor

wzamazon commented Mar 3, 2023

This came from discussion #8529.

The background is that applications like NCCL need a way to specify that a libfabric endpoint cannot make calls to the CUDA API to support CUDA memory.

@shefty suggested using the FI_OPT_FI_HMEM_P2P endpoint option with the FI_HMEM_P2P_REQUIRED value, which is currently documented as follows:

FI_HMEM_P2P_REQUIRED: Peer to peer support must be used for transfers, transfers that cannot be performed using p2p will be reported as failing.

From https://ofiwg.github.io/libfabric/main/man/fi_endpoint.3.html

However, to use this option for the purpose I described, we need a definition of "peer to peer support", which is lacking in the fi_endpoint document. So I opened this issue to ask whether the libfabric community can agree on a definition of "peer to peer" support.

One thing I want to mention is that NCCL does allow libfabric to use GDRcopy; see this comment from @jdinan. The EFA provider does use GDRcopy when used by NCCL and has found it to be efficient for small messages.

I understand that other providers, like RxM, also want to use GDRcopy to support NCCL.

Therefore, I think it would be ideal if we could define "peer to peer support" in a way that mechanisms like GDRcopy count as "peer to peer" support.
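
For context, here is a minimal sketch of how an application could request this behavior through fi_setopt (assuming an already-opened struct fid_ep *ep; this is just an illustration, not code from the discussion):

```c
/* Hedged sketch: ask the provider to use true peer-to-peer for all HMEM
 * transfers. Assumes "ep" is an already-opened struct fid_ep *; error
 * handling is omitted for brevity. */
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>

static int require_hmem_p2p(struct fid_ep *ep)
{
	int p2p_opt = FI_HMEM_P2P_REQUIRED;

	/* FI_OPT_FI_HMEM_P2P is the endpoint-level option documented in
	 * fi_endpoint(3); a provider that cannot honor the requested mode
	 * is expected to fail the call. */
	return fi_setopt(&ep->fid, FI_OPT_ENDPOINT, FI_OPT_FI_HMEM_P2P,
			 &p2p_opt, sizeof(p2p_opt));
}
```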

@shefty
Member

shefty commented Mar 3, 2023

Peer to peer is meant to describe PCI peer to peer transfers, or device to device transfers that do not require bouncing data through host buffers. This could also apply to other device buses, not just PCI.

@wzamazon
Contributor Author

wzamazon commented Mar 6, 2023

I see.

I think for the case of NCCL, HMEM_P2P_REQUIRED is too strong. Basically, it needs a way to know whether the provider is capable of P2P, not necessarily that all transfers must go through peer to peer.

I am reading the man page for FI_HMEM_P2P_ENABLED. It does not specify what a provider should do if it does not support peer to peer.

Would it be reasonable for a provider to return -FI_EOPNOTSUPP if the user sets FI_HMEM_P2P_ENABLED and the provider is incapable of peer to peer support?
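
If providers agreed to behave that way, an application could probe for P2P capability with a pattern like the following sketch (the -FI_EOPNOTSUPP behavior is the proposal above, not currently documented semantics; assumes an already-opened struct fid_ep *ep):

```c
/* Hedged sketch of the probing pattern discussed above: try to enable P2P
 * and treat -FI_EOPNOTSUPP as "this provider cannot do peer to peer". */
#include <stdbool.h>
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>
#include <rdma/fi_errno.h>

static bool provider_supports_p2p(struct fid_ep *ep)
{
	int p2p_opt = FI_HMEM_P2P_ENABLED;
	int ret;

	ret = fi_setopt(&ep->fid, FI_OPT_ENDPOINT, FI_OPT_FI_HMEM_P2P,
			&p2p_opt, sizeof(p2p_opt));
	return ret != -FI_EOPNOTSUPP;
}
```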

@shefty
Member

shefty commented Mar 6, 2023

Maybe the question is whether HMEM_P2P_REQUIRED is useful? Or is it only useful if it also allows gdrcopy?

Does gdrcopy behave the same as if p2p were used?

@wzamazon
Contributor Author

wzamazon commented Mar 6, 2023

> Maybe the question is whether HMEM_P2P_REQUIRED is useful? Or is it only useful if it also allows gdrcopy?

I think P2P_REQUIRED is still useful if we define P2P support as the NIC accessing HMEM memory directly.

I can think of at least one case where NCCL wants libfabric to only use the NIC to access HMEM memory (i.e., NOT use gdrcopy): when NCCL uses its LL128 protocol.

> Does gdrcopy behave the same as if p2p were used?

I do not think so. gdrcopy basically maps GPU memory into the host's memory address space and then does a memcpy, so the transfer is driven by the CPU.
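
For reference, a rough sketch of that map-then-memcpy flow using the GDRCopy API (the CUDA device pointer d_ptr is assumed to be already allocated, e.g. with cudaMalloc, and error handling is omitted):

```c
/* Hedged sketch of the CPU-driven copy path described above, using the
 * GDRCopy library. In real code the pinned address and size must be
 * GPU-page aligned. */
#include <stddef.h>
#include <gdrapi.h>

void copy_to_gpu_via_gdrcopy(unsigned long d_ptr, const void *host_src,
			     size_t len)
{
	gdr_t g = gdr_open();
	gdr_mh_t mh;
	void *map_va;

	/* Pin the GPU buffer and map it into the host address space. */
	gdr_pin_buffer(g, d_ptr, len, 0, 0, &mh);
	gdr_map(g, mh, &map_va, len);

	/* The data movement itself is a CPU-driven copy into the mapping,
	 * not a NIC-initiated PCIe peer-to-peer transfer. */
	gdr_copy_to_mapping(mh, map_va, host_src, len);

	gdr_unmap(g, mh, map_va, len);
	gdr_unpin_buffer(g, mh);
	gdr_close(g);
}
```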

@shefty
Member

shefty commented Mar 6, 2023

So, it sounds like we need some other option that can be used to query/restrict the type of operations that a provider can undertake. Maybe this is a new HMEM option, or some sort of XPU option. Right now there's no way to convey that P2P is okay, but if you can't use P2P, then only this 'other' mechanism is usable.

That's hard to define generically, however. Maybe it's something like P2P_OR_CPU_ONLY?

@shefty
Member

shefty commented Mar 7, 2023

From the ofiwg call: keep the current FI_HMEM_P2P options restrictive in their definition. May need a CUDA-specific option. NCCL restricts the use of any CUDA call from any lower layer. Proposal: FI_CUDA_API_ENABLED/ALLOWED/DISABLED/PERMITTED? A boolean option is sufficient.

@wzamazon
Contributor Author

wzamazon commented Mar 8, 2023

#8624 introduced FI_CUDA_API_PERMITTED
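
As a rough illustration, a NCCL-style consumer could set the new option like this (a sketch based on the fi_endpoint(3) description of FI_OPT_CUDA_API_PERMITTED; assumes an already-opened struct fid_ep *ep):

```c
/* Hedged sketch: tell the provider it must not call the CUDA API on this
 * endpoint. */
#include <stdbool.h>
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>

static int forbid_cuda_calls(struct fid_ep *ep)
{
	bool permitted = false;

	/* A provider that cannot support CUDA memory without calling the
	 * CUDA API is expected to fail here (e.g. -FI_EOPNOTSUPP), letting
	 * the caller fall back to another provider or to host memory. */
	return fi_setopt(&ep->fid, FI_OPT_ENDPOINT, FI_OPT_CUDA_API_PERMITTED,
			 &permitted, sizeof(permitted));
}
```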

@shefty
Member

shefty commented Jun 5, 2023

Has this issue been resolved with the introduction of FI_CUDA_API_PERMITTED?

@wzamazon
Contributor Author

wzamazon commented Jun 5, 2023

Yes

@wzamazon wzamazon closed this as completed Jun 5, 2023