UCP/PROTO: Add option to force ZCOPY #11289
    ucs_memory_type_t local_mem_type, remote_mem_type;

    if (!ep->worker->context->config.ext.rma_force_zcopy ||
        (req->send.rma.rkey == NULL)) {
Actually, it's not needed, as RMA should always have an rkey.

    /* This protocol should not be selected for valid and connected endpoint */
    if (ep->flags & UCP_EP_FLAG_REMOTE_CONNECTED) {
        if (ucp_proto_reconfig_report_rma_force_zcopy_no_proto(req, ep)) {
Should we also abort the request?
It is aborted with canceled status inside this function. Would you prefer to move the abort call out of this function?

    test_ucp_rma_force_zcopy()
    {
        modify_config("RMA_FORCE_ZCOPY", "y");
        modify_config("IB_TX_INLINE_RESP", "0", SETENV_IF_NOT_EXIST);
@brminich, with that configuration cap.get.min_zcopy=1, and multi-rail proto selection makes the smallest size proportional to the number of rails, which is 2 with this IB device (max rails is set to 2 in this test). So for sizes smaller than the number of rails, we would currently need emulation.
I started #11369 to fix it; it should contain all the details.
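
The size limit described above can be sketched in a few lines of C. This is only an illustration under an assumed model, namely that every rail must carry at least min_zcopy bytes, so the smallest multi-rail message is min_zcopy times the number of rails. Both helper names are hypothetical and are not UCX functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a UCX function: smallest message a multi-rail
 * zero-copy protocol can carry, assuming every rail must receive at least
 * min_zcopy bytes. */
static size_t multi_rail_min_length(size_t min_zcopy, size_t num_rails)
{
    assert(num_rails > 0);
    return min_zcopy * num_rails;
}

/* Returns 1 when a message of the given length falls below that minimum,
 * so only an emulated (non-zero-copy) protocol could serve it. */
static int needs_emulation(size_t length, size_t min_zcopy, size_t num_rails)
{
    return length < multi_rail_min_length(min_zcopy, num_rails);
}
```

With min_zcopy=1 and 2 rails, as in the test discussed here, a 1-byte operation is below the minimum of 2 bytes, which matches the case that would need emulation.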

       "lane without waiting for remote completion.",
       ucs_offsetof(ucp_context_config_t, rndv_put_force_flush), UCS_CONFIG_TYPE_BOOL},

      {"RMA_FORCE_ZCOPY", "n",
Maybe we can consider the opposite option, ENABLE_SIMULATION=n, so that we are not bound to zcopy?
Then PROTO_EMULATION_ENABLE, default y?

Can one of the admins verify this patch?

This commit needs #11369 when emulated protocols are not allowed.

UCP/PROTO: Add option to force ZCOPY (#11289)
What?
Add UCX_RMA_FORCE_ZCOPY=y. When set, only zero-copy RMA protocols are allowed, and an explicit error message is printed if no such capability is available.
Why?
Users need to be able to quickly identify when they are not using the optimal zero-copy path.
How?
Disable emulated protocols and print an error message when the reconfig protocol is selected on a connected endpoint.
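
A minimal sketch of the idea, not the code from this PR: a selection loop that skips non-zero-copy protocols when the force option is set and reports an error if nothing is left. The proto_desc_t type, its fields, and select_rma_proto() are hypothetical names used only for illustration; UCX's real protocol selection works on different structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical protocol descriptor; UCX's real structures differ. */
typedef struct {
    const char *name;
    int         is_zcopy;   /* 1 if the protocol sends without copying */
    size_t      min_length; /* smallest message the protocol supports */
} proto_desc_t;

/* Pick the first protocol that can carry `length` bytes. With force_zcopy
 * set, non-zero-copy (for example emulated) protocols are skipped.
 * Returns the index of the chosen protocol, or -1 if none fits; when
 * force_zcopy is set, an error is also printed to stderr. */
static int select_rma_proto(const proto_desc_t *protos, size_t count,
                            size_t length, int force_zcopy)
{
    size_t i;

    assert((protos != NULL) || (count == 0));

    for (i = 0; i < count; ++i) {
        if (force_zcopy && !protos[i].is_zcopy) {
            continue; /* emulated/copy-based protocols are not allowed */
        }
        if (length >= protos[i].min_length) {
            return (int)i;
        }
    }

    if (force_zcopy) {
        fprintf(stderr, "no zero-copy RMA protocol for length %zu\n", length);
    }
    return -1;
}
```

In this sketch the caller decides what to do with the -1 result, for example completing the request with a canceled status as discussed in the review thread.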