Skip to content

UCP/PROTO: Add option to force ZCOPY#11289

Merged
brminich merged 5 commits intoopenucx:masterfrom
tvegas1:ucp_force_zcopy
May 4, 2026
Merged

UCP/PROTO: Add option to force ZCOPY#11289
brminich merged 5 commits intoopenucx:masterfrom
tvegas1:ucp_force_zcopy

Conversation

@tvegas1
Copy link
Copy Markdown
Contributor

@tvegas1 tvegas1 commented Mar 24, 2026

What?

Add UCX_RMA_FORCE_ZCOPY=y, only allow zero-copy RMA protocols when set, and print explicit error message in case missing such capability.

Why?

Users need to be able to quickly identify when they are not using the optimal zero-copy path.

How?

Disable emulated protocols and print error message on connected protocol reconfig selection.

@tvegas1 tvegas1 requested a review from brminich March 25, 2026 14:45
Comment thread src/ucp/proto/proto_reconfig.c Outdated
ucs_memory_type_t local_mem_type, remote_mem_type;

if (!ep->worker->context->config.ext.rma_force_zcopy ||
(req->send.rma.rkey == NULL)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this check needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, it's no needed as rma should always have rkey

Comment thread src/ucp/proto/proto_reconfig.c Outdated

/* This protocol should not be selected for valid and connected endpoint */
if (ep->flags & UCP_EP_FLAG_REMOTE_CONNECTED) {
if (ucp_proto_reconfig_report_rma_force_zcopy_no_proto(req, ep)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also abort request?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is aborted with canceled status inside this function, you prefer to take the abort call out of this function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it outside

Comment thread test/gtest/ucp/test_ucp_rma.cc
test_ucp_rma_force_zcopy()
{
modify_config("RMA_FORCE_ZCOPY", "y");
modify_config("IB_TX_INLINE_RESP", "0", SETENV_IF_NOT_EXIST);
Copy link
Copy Markdown
Contributor Author

@tvegas1 tvegas1 Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brminich, with that configuration, cap.get.min_zcopy=1, and proto selection multi rail makes the smallest size proportionate to the number of rails, so 2 with this IB device (max rail is set to 2 in this test), so for sizes < number of rails, we would need emulation currently.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you pls elaborate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started #11369 to fix it, it should contain all details.

Comment thread src/ucp/core/ucp_context.c Outdated
"lane without waiting for remote completion.",
ucs_offsetof(ucp_context_config_t, rndv_put_force_flush), UCS_CONFIG_TYPE_BOOL},

{"RMA_FORCE_ZCOPY", "n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can consider the opposite option: ENABLE_SIMULATION=n?
So that we are not bound to zcopy

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me
@tvegas1 wdyt?

Copy link
Copy Markdown
Contributor Author

@tvegas1 tvegas1 Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then PROTO_EMULATION_ENABLE, default y?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@svc-nixl
Copy link
Copy Markdown

svc-nixl commented May 4, 2026

Can one of the admins verify this patch?

@tvegas1
Copy link
Copy Markdown
Contributor Author

tvegas1 commented May 4, 2026

this commit needs #11369 when emulated protocols are not allowed.

@brminich brminich merged commit fd3bc3a into openucx:master May 4, 2026
152 checks passed
brminich pushed a commit that referenced this pull request May 4, 2026
UCP/PROTO: Add option to force ZCOPY (#11289)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants