-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fabric: ABI 1.1 updates - CQ err_data, MMU_NOTIFY #2729
Conversation
The current CQ/EQ document that the err_data referenced by error events points to a provider owned data buffer. This results in serialization issues to the application. Allow the application to provide a data buffer for error data, including the size of the input buffer. Add a domain attribute to report the maximum size of error data that a provider may need. Fixes ofiwg#2720 Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Add a new call, fi_mr_refresh, that must be invoked by the application whenever there is a change to the pages backing a registered memory region. This completes the definition and behavior of what it means when the provider sets FI_MMU_NOTIFY. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Added separate commit for defining how apps must deal with FI_MR_MMU_NOTIFY. @jswaro |
MRs associated with RMA events may require that the app fully configure the MR prior to its use. This handles the case where the provider cannot dynamically assign a counter to a MR while it is actively being used. Such MRs are considered disabled on creation. Once configured they must explicitly be enabled (similar to enabling an EP). Signed-off-by: Sean Hefty <sean.hefty@intel.com>
|
||
The fi_mr_refresh call is only needed if the physical pages might have | ||
been updated after the memory region was created. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, here is a question. Is FI_MR_MMU_NOTIFY exclusive w.r.t. FI_MR_PROV_KEY? I'm asking because a fi_mr_refresh might cause a new registration key to be created by a provider.
Based on my own interpretation of this, if FI_MR_PROV_KEY is returned by the provider, and FI_MR_MMU_NOTIFY is also given -- then after a successful fi_mr_refresh, applications should exchange the new keys because the fi_mr_key() and fi_mr_desc() values may have changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is that only the pages are updated. They key to the region remains unchanged. That is, refresh does not behave like close/register. A possible future 'modify' type operation would more closely align with that.
@sayantansur - Added commit to address your concerns about MRs dynamically binding to counters. |
👍 thank you, looks good to me. |
Moving forward with these proposals. Until we release, we can still correct things if there's an issue. |
The current CQ/EQ document that the err_data referenced by
error events points to a provider owned data buffer. This
results in serialization issues to the application. Allow
the application to provide a data buffer for error data,
including the size of the input buffer. Add a domain
attribute to report the maximum size of error data that a
provider may need.
Fixes #2720
Signed-off-by: Sean Hefty sean.hefty@intel.com