hmem,efa: remove gdrcopy from cuda hmem copy path and make gdrcopy calls explicit #8836
With the new hmem_data MR field, device can only be device id. Therefore it is no longer necessary to validate explicitly. With this change, gdrcopy should be requested by the caller separately. Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
bot:aws:retest
prov/efa/src/rdm/rxr_pkt_type_base.c
Outdated
if (rxe->bytes_copied + data_size == rxe->total_len) {
    ofi_copy_to_hmem_iov(desc->peer.iface, desc->peer.device.reserved,
                         rxe->iov, rxe->iov_count,
                         data_offset + ep->msg_prefix_size,
                         data, data_size);
    ofi_gdrcopy_to_cuda_iov((uint64_t)desc->peer.hmem_data,
                            rxe->iov, rxe->iov_count,
                            data_offset + ep->msg_prefix_size,
                            data, data_size);
    rxr_pkt_handle_data_copied(ep, pkt_entry, data_size);
nit: there were 5 tabs followed by a few spaces; now there are 3 tabs followed by lots of spaces. This should be re-tabbified.
I double-checked in my editor. With my configuration, explicit spaces are the more reliable way to get indentation that renders correctly on GitHub for multi-line function calls.
Other places render correctly on GitHub but show up crooked in my editor.
I am unlikely to be alone, since I see similar indentation choices elsewhere.
d6c3125 to cfcf77f
With the introduction of the hmem_data MR field, the gdrcopy handle should be stored alongside the device id, and gdrcopy should be requested explicitly by the caller. This patch replaces calls to the ofi_copy_from/to_hmem* functions with their ofi_gdrcopy_from/to_cuda* equivalents when a gdrcopy handle is present. Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
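To make the caller-side dispatch described in this commit message concrete, here is a minimal, host-only sketch in C. It is not the libfabric implementation: every `sketch_*` name is hypothetical, the two copy routines are stand-ins that simply record which path was taken, and the only idea borrowed from the patch is that the caller checks for a gdrcopy handle and picks the copy routine itself, rather than the generic hmem copy deciding internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical names throughout; none of these are libfabric APIs. */
enum sketch_path { SKETCH_PATH_NONE, SKETCH_PATH_HMEM, SKETCH_PATH_GDRCOPY };

/* Assumed shape of a memory-registration descriptor: the device id and
 * the gdrcopy handle live side by side, and the handle is NULL when
 * gdrcopy was not set up for the region. */
struct sketch_mr_desc {
    uint64_t device_id;
    void *hmem_data;
};

/* Records which copy path the most recent call used. */
static enum sketch_path sketch_last_path = SKETCH_PATH_NONE;

/* Scatter `size` bytes from `src` into `iov`, starting `offset` bytes in.
 * Returns the number of bytes actually copied. */
static size_t sketch_scatter(const struct iovec *iov, size_t iov_count,
                             size_t offset, const void *src, size_t size)
{
    size_t done = 0;

    for (size_t i = 0; i < iov_count && done < size; i++) {
        if (offset >= iov[i].iov_len) {
            offset -= iov[i].iov_len;   /* skip fully-consumed entries */
            continue;
        }
        size_t room = iov[i].iov_len - offset;
        size_t len = room < size - done ? room : size - done;

        memcpy((char *)iov[i].iov_base + offset,
               (const char *)src + done, len);
        done += len;
        offset = 0;
    }
    return done;
}

/* Stand-in for the generic hmem copy path (plain memcpy here). */
static size_t sketch_hmem_copy_to_iov(uint64_t device_id,
                                      const struct iovec *iov, size_t iov_count,
                                      size_t offset, const void *src, size_t size)
{
    (void)device_id;
    sketch_last_path = SKETCH_PATH_HMEM;
    return sketch_scatter(iov, iov_count, offset, src, size);
}

/* Stand-in for the gdrcopy path (also plain memcpy here). */
static size_t sketch_gdrcopy_to_iov(void *handle,
                                    const struct iovec *iov, size_t iov_count,
                                    size_t offset, const void *src, size_t size)
{
    (void)handle;
    sketch_last_path = SKETCH_PATH_GDRCOPY;
    return sketch_scatter(iov, iov_count, offset, src, size);
}

/* Caller-side dispatch: gdrcopy is requested explicitly, and only when
 * the descriptor carries a handle. */
static size_t sketch_copy_to_user(const struct sketch_mr_desc *desc,
                                  const struct iovec *iov, size_t iov_count,
                                  size_t offset, const void *src, size_t size)
{
    if (desc->hmem_data)
        return sketch_gdrcopy_to_iov(desc->hmem_data, iov, iov_count,
                                     offset, src, size);

    return sketch_hmem_copy_to_iov(desc->device_id, iov, iov_count,
                                   offset, src, size);
}
```

Calling `sketch_copy_to_user` with a descriptor whose `hmem_data` is non-NULL takes the gdrcopy stand-in; with a NULL handle it falls back to the generic stand-in. Keeping the branch in the caller means the generic copy routine no longer needs to interpret the device field as anything other than a device id.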
bot:aws:retest
@shefty Could you kindly check the CI failure? Is it related to this change?
There were a bunch of socket failures, so I re-ran the CI. The latest failure is from our friend:
tcp failure
This should not be related, though, since we are not touching the tcp path...
@shefty I am a bit worried that there is a bug in the rdm_tagged_peek test. If we can get a consistent reproducer, I can take a look at it.
That test must have an issue somewhere, either in the test itself or in its use of common code.
I will try to reproduce it with tcp; efa has not hit this issue so far, AFAIK.
Since we believe the |
Yes, I am looking at #8844 now.
With the introduction of the hmem_data MR field, the gdrcopy handle should be
stored alongside the device id, and gdrcopy should be requested explicitly
by the caller.
This patch replaces calls to the ofi_copy_from/to_hmem* functions with their
ofi_gdrcopy_from/to_cuda* equivalents when a gdrcopy handle is present.