Support fabtest benchmarks with device memory #6185

iziemba · 2020-08-11T15:38:50Z

The following patch set enables device memory to be used for the benchmark tests. I have verified that fi_msg_bw, fi_msg_pingpong, fi_rdm_cntr_pingpong, fi_rdm_pingpong, fi_rdm_tagged_bw, fi_rdm_tagged_pingpong, and fi_rma_bw work as expected. I have not verified fi_dgram_pingpong works due to not having a supported provider.

fabtests/common/shared.c

shefty · 2020-09-01T00:10:21Z

Please check CI failures

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

This will allow for additional fabtests to define HMEM usage. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

If the user has requested HMEM support, allocate a HMEM bounce buffer. This bounce buffer is used for data verification. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

If FT_OPT_ENABLE_HMEM is set, all message and tagged injects are done with fi_sendmsg and fi_tsendmsg. The sendmsg variants are required for device buffers since they allow users to provided the FI_HMEM_IFACE for the buffer. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

If FT_OPT_ENABLE_HMEM is set, ft_post_rma_inject() will use fi_writemsg to implement the inject operation. The fi_writemsg variant is required for device buffers since it allow users to provided the FI_HMEM_IFACE for the buffer. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

If the user has specified a destination address, use this to get the source address to be stored in the fi_info hints structure. This resolves a bug when HMEM and OOB communication is used with the verbs provider. This change is needed to support FI_HMEM with fi_msg_bw, fi_msg_pingpong, and fi_rma_bw. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

shefty · 2020-09-02T17:50:31Z

fabtests/common/shared.c

+void ft_hmem_usage(void)
+{
+	FT_PRINT_OPTS_USAGE("-D <device_iface>", "Specify device interface: eg ze, cuda, or rocr (default: None). "
+			     "Automatically enables FI_HMEM (-H)");


Since this is moving, please check line length

shefty · 2020-09-02T17:52:05Z

fabtests/common/shared.c

+		mr = NULL;
+	}
+err_free_bounce_buf:
+	if (bounce_buf) {


check isn't needed. free() can accept null parameter

Will remove cleanup.

shefty · 2020-09-02T17:53:49Z

fabtests/common/shared.c


 	return 0;
+
+err_free_tx_ctx:


Most or all of the cleanup from here down isn't needed. There's a bunch of initialization that needs to happen before a test can run, a generic cleanup is done by all tests, even in the case of failures.

Will remove.

shefty · 2020-09-02T17:54:47Z

fabtests/common/shared.c

+		free(tx_ctx_arr);
+		tx_ctx_arr = NULL;
+	}
+


The changes above aren't needed

Will remove.

shefty · 2020-09-02T17:54:57Z

fabtests/common/shared.c

 	ft_close_fids();

+	if (bounce_buf)


check isn't needed

Will remove.

shefty · 2020-09-02T18:33:54Z

fabtests/common/shared.c


 	msg_index = ((iter++)*INTEG_SEED) % integ_alphabet_length;
-	recv_data = (char *)buf;
+	recv_data = ft_check_opts(FT_OPT_ENABLE_HMEM) ?
+		bounce_buf : (char *) buf;


We can merge setting recv_data with the if statement above

shefty · 2020-09-02T18:54:54Z

fabtests/common/shared.c

+			FT_POST(fi_sendmsg, ft_progress, txcq, tx_seq,
+				&tx_cq_cntr, "fi_sendmsg", ep, &msg,
+				FI_INJECT);
+		}


This is not the same behavior as the current code. fi_sendmsg(... FI_INJECT) is not the same as fi_inject().

The existing code must be usable as-is, even if FI_HMEM is enabled. If it is not, then the provider has broken the API contract and will not work with existing applications.

Even though fi_sendmsg(... FI_INJECT) and fi_inject() are not the same, this still works since tx_cq_cntr will not be incremented for fi_sendmsg(... FI_INJECT).

Would you rather have a ft_sendmsg_inject() be defined and have the benchmark apps call it if HMEM is enabled instead of ft_inject()?

No - I'm saying that there should be no reason for this change. If the existing code does not work with a provider, then it's the provider that is broken, not the test.

The fi_inject() call is usable even if the provider has set FI_MR_LOCAL. There's no indication that setting FI_MR_HMEM changes anything, or that fi_inject is restricted to only host memory.

It may be that the API needs to be updated to address this. But for now, I believe if a provider cannot support fi_inject() on device memory, then it needs to set the inject_size to 0.

I think I follow the conversation enough to agree with @shefty on this. FI_HMEM as a capability shouldn't influence code changes beyond what is documented in the man pages.

It may be that the API needs to be updated to address this. But for now, I believe if a provider cannot support fi_inject() on device memory, then it needs to set the inject_size to 0.

I cannot think of a case where a provider couldn't support the inject case. However, if it couldn't -- then having the provider set the inject_size to 0 falls in line with other guidance that has been suggested in the past. I don't like it from a performance perspective, but it's not as if I see another alternative at the moment.

@jswaro - I believe the issue is that the provider does not know where the memory resides and needs to discover it. fi_inject() does not provide any mechanism to convey that information. This comes across as an area that we need further definition around fi_inject() and FI_HMEM, possibly considering some extension, or at least guidance to apps to replace the use of fi_inject() when FI_MR_HMEM is specified.

FI_HMEM already requires specialized, out of tree kernel support by both the GPU and NIC for any of this to work. Do the GPU drivers not already support this behavior? I thought they did.

A util provider can still handle this. It just needs to copy into a host buffer and pass that to the underlying provider's inject call.

A util provider can still handle this. It just needs to copy into a host buffer and pass that to the underlying provider's inject call.

Sure, that could work.

FI_HMEM already requires specialized, out of tree kernel support by both the GPU and NIC for any of this to work. Do the GPU drivers not already support this behavior?

Well, GPUs like Nvidia's require nv_peer_mem, that much is true. They do require kernel support provide the bridge between the device and the NIC. What I was referring to was support of memory movement between device and host transparently and facilitated by newer kernel features such as HMM. Without HMM or features like HMM, applications would still need to make device specific calls to copy memory.

But from user space, you can't call memcpy on the buffer without something like HMM. You need to call the CudaMemcpy -- or equivalent -- in order to transfer memory from the device to the host's bounce buffer in order to use inject. The other option as described above would be to use a utility provider to do the detection and use a bounce buffer in the utility provider to continue to use the core provider's fi_inject function.

@j-xiong @aingerson - Can you confirm that using memcpy on a GPU buffer requires HMM, or can the driver mmap support handle this?

The GPU buffer pointer itself is not an mmap-ed address. To use it directly in memcpy() requires HMM. If mmap() is called explicitly to created a different address for the buffer, then that can be used in memcpy. Notice that the driver may migrate the buffer when handling the mapping.

Returning to the original problem, for this patch series, please revert the above changes and use the existing fi_inject calls as-is.

shefty · 2020-09-02T18:55:40Z

fabtests/common/shared.c

+		default:
+			FT_ERR("Unknown RMA inject op type\n");
+			return EXIT_FAILURE;
+		}


See comment for previous patch. This change cannot be required and should be reverted.

wzamazon · 2020-09-24T00:06:00Z

fabtests/common/shared.c

@@ -1701,7 +1701,7 @@ int ft_read_addr_opts(char **node, char **service, struct fi_info *hints,
 {
 	int ret;

-	if (opts->dst_addr && (opts->src_addr || !opts->oob_port)){
+	if (opts->dst_addr) {


Not exactly sure why, but this change is the reason AWS CI EFA test failed.

The additional checks can result in the code now taking the else path even when we have a dst_addr. That sets FI_SOURCE and changes the behavior of fi_getinfo. Something is off about the logic here.

shefty · 2021-04-02T23:29:31Z

No activity for months. If still needed, please update and open new PR.

iziemba requested a review from shefty August 11, 2020 15:38

shefty reviewed Aug 17, 2020

View reviewed changes

fabtests/common/shared.c Outdated Show resolved Hide resolved

fabtests/common/shared.c Show resolved Hide resolved

fabtests/common/shared.c Show resolved Hide resolved

fabtests/common/shared.c Outdated Show resolved Hide resolved

iziemba force-pushed the fabtests_hmem_support branch from 56084f2 to c3087cf Compare August 31, 2020 17:10

iziemba requested a review from shefty August 31, 2020 17:11

iziemba added 8 commits September 2, 2020 07:35

fabtests: Enable using of ROCR and CUDA interfaces

3d36004

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

fabtests: Move HMEM arg usage to common function

9578183

This will allow for additional fabtests to define HMEM usage. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

fabtests: Add in HMEM data verification support

344b89c

If the user has requested HMEM support, allocate a HMEM bounce buffer. This bounce buffer is used for data verification. Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

fabtests: Support key exchange using device memory

b551335

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

fabtests: Add HMEM support to benchmark tests

499e554

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>

iziemba force-pushed the fabtests_hmem_support branch from c3087cf to 499e554 Compare September 2, 2020 13:34

shefty requested changes Sep 2, 2020

View reviewed changes

wzamazon reviewed Sep 24, 2020

View reviewed changes

shefty mentioned this pull request Oct 2, 2020

FI_INJECT support for device memory #5861

Closed

shefty closed this Apr 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support fabtest benchmarks with device memory #6185

Support fabtest benchmarks with device memory #6185

iziemba commented Aug 11, 2020

shefty commented Sep 1, 2020

shefty Sep 2, 2020

iziemba Sep 2, 2020

shefty Sep 2, 2020

iziemba Sep 2, 2020

shefty Sep 2, 2020

iziemba Sep 2, 2020

shefty Sep 2, 2020

iziemba Sep 2, 2020

shefty Sep 2, 2020

iziemba Sep 2, 2020

shefty Sep 2, 2020

iziemba Sep 2, 2020

shefty Sep 2, 2020

iziemba Sep 2, 2020

shefty Sep 2, 2020

jswaro Sep 2, 2020

shefty Sep 2, 2020

shefty Sep 10, 2020

jswaro Sep 10, 2020

shefty Sep 10, 2020

j-xiong Sep 10, 2020

shefty Sep 10, 2020

shefty Sep 2, 2020

wzamazon Sep 24, 2020

shefty Sep 24, 2020

shefty commented Apr 2, 2021


		return 0;

		err_free_tx_ctx:

Support fabtest benchmarks with device memory #6185

Support fabtest benchmarks with device memory #6185

Conversation

iziemba commented Aug 11, 2020

shefty commented Sep 1, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shefty commented Apr 2, 2021