Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support fabtest benchmarks with device memory #6185

Closed
wants to merge 8 commits into from

Conversation

iziemba
Copy link
Contributor

@iziemba iziemba commented Aug 11, 2020

The following patch set enables device memory to be used for the benchmark tests. I have verified that fi_msg_bw, fi_msg_pingpong, fi_rdm_cntr_pingpong, fi_rdm_pingpong, fi_rdm_tagged_bw, fi_rdm_tagged_pingpong, and fi_rma_bw work as expected. I have not verified fi_dgram_pingpong works due to not having a supported provider.

@iziemba iziemba requested a review from shefty August 11, 2020 15:38
fabtests/common/shared.c Outdated Show resolved Hide resolved
fabtests/common/shared.c Show resolved Hide resolved
fabtests/common/shared.c Show resolved Hide resolved
fabtests/common/shared.c Outdated Show resolved Hide resolved
@shefty
Copy link
Member

shefty commented Sep 1, 2020

Please check CI failures

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
This will allow for additional fabtests to define HMEM usage.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
If the user has requested HMEM support, allocate a HMEM bounce buffer.
This bounce buffer is used for data verification.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
If FT_OPT_ENABLE_HMEM is set, all message and tagged injects are done
with fi_sendmsg and fi_tsendmsg. The sendmsg variants are required for
device buffers since they allow users to provided the FI_HMEM_IFACE for
the buffer.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
If FT_OPT_ENABLE_HMEM is set, ft_post_rma_inject() will use fi_writemsg
to implement the inject operation. The fi_writemsg variant is required
for device buffers since it allow users to provided the FI_HMEM_IFACE
for the buffer.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
If the user has specified a destination address, use this to get the
source address to be stored in the fi_info hints structure. This
resolves a bug when HMEM and OOB communication is used with the verbs
provider. This change is needed to support FI_HMEM with fi_msg_bw,
fi_msg_pingpong, and fi_rma_bw.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
void ft_hmem_usage(void)
{
FT_PRINT_OPTS_USAGE("-D <device_iface>", "Specify device interface: eg ze, cuda, or rocr (default: None). "
"Automatically enables FI_HMEM (-H)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is moving, please check line length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

mr = NULL;
}
err_free_bounce_buf:
if (bounce_buf) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check isn't needed. free() can accept null parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove cleanup.


return 0;

err_free_tx_ctx:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most or all of the cleanup from here down isn't needed. There's a bunch of initialization that needs to happen before a test can run, a generic cleanup is done by all tests, even in the case of failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove.

free(tx_ctx_arr);
tx_ctx_arr = NULL;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes above aren't needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove.

ft_close_fids();

if (bounce_buf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check isn't needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove.


msg_index = ((iter++)*INTEG_SEED) % integ_alphabet_length;
recv_data = (char *)buf;
recv_data = ft_check_opts(FT_OPT_ENABLE_HMEM) ?
bounce_buf : (char *) buf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge setting recv_data with the if statement above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix.

FT_POST(fi_sendmsg, ft_progress, txcq, tx_seq,
&tx_cq_cntr, "fi_sendmsg", ep, &msg,
FI_INJECT);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same behavior as the current code. fi_sendmsg(... FI_INJECT) is not the same as fi_inject().

The existing code must be usable as-is, even if FI_HMEM is enabled. If it is not, then the provider has broken the API contract and will not work with existing applications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though fi_sendmsg(... FI_INJECT) and fi_inject() are not the same, this still works since tx_cq_cntr will not be incremented for fi_sendmsg(... FI_INJECT).

Would you rather have a ft_sendmsg_inject() be defined and have the benchmark apps call it if HMEM is enabled instead of ft_inject()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - I'm saying that there should be no reason for this change. If the existing code does not work with a provider, then it's the provider that is broken, not the test.

The fi_inject() call is usable even if the provider has set FI_MR_LOCAL. There's no indication that setting FI_MR_HMEM changes anything, or that fi_inject is restricted to only host memory.

It may be that the API needs to be updated to address this. But for now, I believe if a provider cannot support fi_inject() on device memory, then it needs to set the inject_size to 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I follow the conversation enough to agree with @shefty on this. FI_HMEM as a capability shouldn't influence code changes beyond what is documented in the man pages.

It may be that the API needs to be updated to address this. But for now, I believe if a provider cannot support fi_inject() on device memory, then it needs to set the inject_size to 0.

I cannot think of a case where a provider couldn't support the inject case. However, if it couldn't -- then having the provider set the inject_size to 0 falls in line with other guidance that has been suggested in the past. I don't like it from a performance perspective, but it's not as if I see another alternative at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jswaro - I believe the issue is that the provider does not know where the memory resides and needs to discover it. fi_inject() does not provide any mechanism to convey that information. This comes across as an area that we need further definition around fi_inject() and FI_HMEM, possibly considering some extension, or at least guidance to apps to replace the use of fi_inject() when FI_MR_HMEM is specified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FI_HMEM already requires specialized, out of tree kernel support by both the GPU and NIC for any of this to work. Do the GPU drivers not already support this behavior? I thought they did.

A util provider can still handle this. It just needs to copy into a host buffer and pass that to the underlying provider's inject call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A util provider can still handle this. It just needs to copy into a host buffer and pass that to the underlying provider's inject call.

Sure, that could work.

FI_HMEM already requires specialized, out of tree kernel support by both the GPU and NIC for any of this to work. Do the GPU drivers not already support this behavior?

Well, GPUs like Nvidia's require nv_peer_mem, that much is true. They do require kernel support provide the bridge between the device and the NIC. What I was referring to was support of memory movement between device and host transparently and facilitated by newer kernel features such as HMM. Without HMM or features like HMM, applications would still need to make device specific calls to copy memory.

But from user space, you can't call memcpy on the buffer without something like HMM. You need to call the CudaMemcpy -- or equivalent -- in order to transfer memory from the device to the host's bounce buffer in order to use inject. The other option as described above would be to use a utility provider to do the detection and use a bounce buffer in the utility provider to continue to use the core provider's fi_inject function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@j-xiong @aingerson - Can you confirm that using memcpy on a GPU buffer requires HMM, or can the driver mmap support handle this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GPU buffer pointer itself is not an mmap-ed address. To use it directly in memcpy() requires HMM. If mmap() is called explicitly to created a different address for the buffer, then that can be used in memcpy. Notice that the driver may migrate the buffer when handling the mapping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning to the original problem, for this patch series, please revert the above changes and use the existing fi_inject calls as-is.

default:
FT_ERR("Unknown RMA inject op type\n");
return EXIT_FAILURE;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment for previous patch. This change cannot be required and should be reverted.

@@ -1701,7 +1701,7 @@ int ft_read_addr_opts(char **node, char **service, struct fi_info *hints,
{
int ret;

if (opts->dst_addr && (opts->src_addr || !opts->oob_port)){
if (opts->dst_addr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly sure why, but this change is the reason AWS CI EFA test failed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional checks can result in the code now taking the else path even when we have a dst_addr. That sets FI_SOURCE and changes the behavior of fi_getinfo. Something is off about the logic here.

@shefty
Copy link
Member

shefty commented Apr 2, 2021

No activity for months. If still needed, please update and open new PR.

@shefty shefty closed this Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants