Update providers to report source addresses #2618
Comments
See ofiwg#2618. Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
We're looking at how to implement this functionality in the GNI provider and have hit a problem. Namely the GNI address can't fit into the 64 bits of the |
There are a couple of ways to handle this, but this is a general workable flow:
I don't see how this would work reliably for |
I'm not sure I understand your concern. If you're concerned about an app reading the CQ from multiple threads, the man pages state:
"The err_data field, if set, will reference an internal buffer owned by the provider. The contents of the buffer will remain valid until a subsequent read call against the CQ."
So, this is something that the app would need to handle, by serializing access to the CQ when handling errors.
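To make the lifetime rule concrete, here is a minimal sketch of what "serializing access" could look like on the application side. It does not use the libfabric API: `mock_cq`, `mock_err_entry`, `mock_cq_readerr` and `app_read_unknown_source` are hypothetical stand-ins that only imitate the property quoted above, namely that `err_data` points into a provider-owned buffer that the next read overwrites. The app takes a lock, reads the error, and copies the address out before releasing the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a provider CQ; NOT the libfabric API. */
#define MOCK_ADDR_MAX 32

struct mock_cq {
    unsigned char err_buf[MOCK_ADDR_MAX]; /* provider-owned, reused by every read */
    size_t err_len;
};

struct mock_err_entry {
    const void *err_data;   /* points into cq->err_buf, not a private copy */
    size_t err_data_size;
};

/* Simulates an error read that reports an unknown source address. */
static void mock_cq_readerr(struct mock_cq *cq, const void *raw_addr,
                            size_t len, struct mock_err_entry *entry)
{
    if (len > MOCK_ADDR_MAX)
        len = MOCK_ADDR_MAX;
    memcpy(cq->err_buf, raw_addr, len); /* clobbers the previous err_data */
    cq->err_len = len;
    entry->err_data = cq->err_buf;
    entry->err_data_size = len;
}

/* One lock per CQ in a real app; a single global lock keeps the sketch short. */
static pthread_mutex_t cq_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Reads one error entry and copies err_data into caller-owned storage
 * while still holding the lock, so no other thread can trigger a read
 * that invalidates the buffer. Returns the number of bytes copied.
 */
static size_t app_read_unknown_source(struct mock_cq *cq, const void *raw_addr,
                                      size_t len, void *out, size_t out_len)
{
    struct mock_err_entry entry;
    size_t n;

    assert(cq != NULL && out != NULL);

    pthread_mutex_lock(&cq_lock);
    mock_cq_readerr(cq, raw_addr, len, &entry);
    n = entry.err_data_size < out_len ? entry.err_data_size : out_len;
    memcpy(out, entry.err_data, n);
    pthread_mutex_unlock(&cq_lock);

    return n;
}
```

The copy is the important part: once the lock is dropped, the pointer in the error entry should be treated as stale, and only the caller's copy used (for example, to insert the address into the AV later).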
On the call today Sean mentioned that BGQ needs to sign off on this. However, I don't think this applies to us, since we've implemented FI_DIRECTED_RECV and we send the source address in the packet header rather than looking it up in the address vector.
The issue is how to handle reporting the source address (FI_SOURCE) in a completion when the address is not in the AV.
What Paul is saying is that the bgq receive protocol does not reference or access the AV at all. The source address is transferred with the packet metadata from origin to target. Error-checking that source address would introduce an array lookup and likely a cache miss. On bgq a user would have to explicitly try to mess this up (mismatched source address and/or missing address vector element), and this will not happen in practice.
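The trade-off described here can be illustrated with a small sketch. This is not bgq provider code: `mock_addr_t`, `mock_pkt_hdr`, `mock_completion`, `complete_from_header` and `complete_checked` are hypothetical names, and the address is assumed to be a plain index into a table-style AV. The fast path copies the source address straight out of the packet header; the checked path adds the array lookup that the comment above wants to avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types; the address is assumed to be an index into the AV. */
typedef uint64_t mock_addr_t;
#define MOCK_ADDR_NOTAVAIL UINT64_MAX

struct mock_pkt_hdr {
    mock_addr_t src_addr;   /* written by the sender, travels with the packet */
    uint64_t tag;
};

struct mock_completion {
    mock_addr_t src_addr;
    uint64_t tag;
};

/* Fast path: the completion is built from packet metadata only, no AV access. */
static void complete_from_header(const struct mock_pkt_hdr *hdr,
                                 struct mock_completion *c)
{
    assert(hdr != NULL && c != NULL);
    c->src_addr = hdr->src_addr;
    c->tag = hdr->tag;
}

/*
 * Checked path: same as above, plus one array access into the AV to confirm
 * the sender is known. That extra access is the potential cache miss.
 * Returns false and reports MOCK_ADDR_NOTAVAIL when the source is unknown.
 */
static bool complete_checked(const struct mock_pkt_hdr *hdr,
                             const bool *av_valid, size_t av_count,
                             struct mock_completion *c)
{
    complete_from_header(hdr, c);
    if (hdr->src_addr >= av_count || !av_valid[hdr->src_addr]) {
        c->src_addr = MOCK_ADDR_NOTAVAIL;
        return false;
    }
    return true;
}
```

When every peer is inserted into the AV up front, as described for bgq, the two paths always report the same address, which is why the check looks like pure overhead for that provider.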
For the original issue, I have no objection to a provider just clicking 'done'. If this means that we document that the provider requires all addresses to be inserted into the AV up front, that's fine. I agree that we don't want to force a lookup just to make this check. We can discuss whether the app should opt into this using some mechanism other than just FI_SOURCE + API version.
Based on ofiwg pie day discussions, further changes will be proposed.
After the call today I just wanted to explain further. Right now we have just one user/app on bgq, which is mpich. For all ranks the provider computes this av in the same way and stores it locally at mpi_init. For any communication subsequently initiated by mpich, say a send-recv in FI_DIRECTED_RECV mode, the src_addr fi_addr_t will have contents that mpich explicitly got from the bgq provider, so the scenario of a bogus source address shouldn't exist. We certainly want an option to NOT check for this, to avoid the overhead.
I don't understand this comment, I think I missed something. I thought this issue only applied to Does the |
No, I guess I thought this was for fi_cq_read also. Anyhow, bgq CQs contain fi_bgq_context's, which also have fi_addr_t src addrs. These src addrs are guaranteed to be valid and already in the av, so this error condition cannot occur. I just don't want any overhead from this mechanism for reporting unknown source data, since that can't happen on bgq.
Providers that will support ABI 1.5 need to be updated to support 461306f.