iface_attr.cap.flags value on ib/ud #2817
Comments
Thanks for bringing up this issue. FYI, for code contributions we require a signed contributor agreement, which can be found here: http://www.openucx.org/license/ You can email it to me.
This is intentional; UD supports both connection modes.
@yosefe if users get confused about this, we have to document it explicitly. The flags are not mutually exclusive and some interfaces may support both. @keisukefukuda what kind of error does it cause for your application?
I wrote code like this: https://github.com/keisukefukuda/flucx/blob/master/include/flucx/communicator/communicator.hpp#L82 (Although it was more than a week ago and I don't remember well...) After finding this, I tried changing the order of the Although my code is my own C++ wrapper of uct and it's a bit long, it is easy to check the issue by just changing The original
Good. Then change the code at https://github.com/openucx/ucx/blob/master/test/examples/uct_hello_world.c#L550 .
(just change the order of
If |
@shamisp BTW, about the CLA, I couldn't find your email address. Would you mind sending it to me via Twitter DM (twitter/@keisukefukuda) or at [same account name]@gmail.com?
So the code at https://github.com/openucx/ucx/blob/master/test/examples/uct_hello_world.c#L164 is not entirely correct. UCS_ERR_NO_RESOURCE is actually a valid error code, which means the caller has to do some progress and then retry the operation. I guess uct_ep_connect_to_ep() just avoids this situation in the first place.
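The progress-and-retry pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from uct_hello_world.c: the `demo_*` names, the stand-in status enum, and the fake transport are all hypothetical so the snippet compiles without UCX. In real UCT code the send would be a call such as uct_ep_am_short(), the progress step would be uct_worker_progress(), and the status type would be ucs_status_t.

```c
/* Stand-in status codes for this sketch only (hypothetical). Real UCT code
 * uses ucs_status_t with UCS_OK and UCS_ERR_NO_RESOURCE. */
typedef enum {
    DEMO_OK              = 0,
    DEMO_ERR_NO_RESOURCE = -1,
    DEMO_ERR_FATAL       = -2
} demo_status_t;

/* Hypothetical callbacks standing in for a send call and a progress call. */
typedef demo_status_t (*demo_send_fn_t)(void *ctx);
typedef void (*demo_progress_fn_t)(void *ctx);

/* Try to send; while the transport reports "no resource", make progress and
 * retry, up to max_retries times. Any other status is returned unchanged. */
static demo_status_t demo_send_with_retry(demo_send_fn_t send,
                                          demo_progress_fn_t progress,
                                          void *ctx, unsigned max_retries)
{
    demo_status_t status = send(ctx);
    unsigned attempt;

    for (attempt = 0;
         status == DEMO_ERR_NO_RESOURCE && attempt < max_retries;
         ++attempt) {
        progress(ctx);      /* let the transport free up resources */
        status = send(ctx); /* then retry the same operation */
    }
    return status;
}

/* Fake transport used only to exercise the loop: it stays busy for
 * 'busy_rounds' progress calls and then accepts the send. */
typedef struct {
    int busy_rounds;
    int progress_calls;
} demo_fake_t;

static demo_status_t demo_fake_send(void *ctx)
{
    demo_fake_t *t = ctx;
    return (t->busy_rounds > 0) ? DEMO_ERR_NO_RESOURCE : DEMO_OK;
}

static void demo_fake_progress(void *ctx)
{
    demo_fake_t *t = ctx;
    if (t->busy_rounds > 0) {
        t->busy_rounds--;
    }
    t->progress_calls++;
}
```

The retry bound is a choice made for this sketch; a caller could equally loop until the status changes, as long as it keeps progressing the worker between attempts.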
@keisukefukuda it is pasharesearch at gmail
I checked the documentation and we do not state anywhere that UCT_IFACE_FLAG_CONNECT_TO_EP and UCT_IFACE_FLAG_CONNECT_TO_IFACE are mutually exclusive. As @yosefe mentioned, we should fix the example.
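Since both bits can be set at once, code that branches on them has to pick a mode explicitly instead of assuming only one is present. A minimal sketch of that idea follows. The `DEMO_*` bit values, the enum, and `demo_choose_mode()` are hypothetical stand-ins so the snippet builds without UCX headers; real code would test UCT_IFACE_FLAG_CONNECT_TO_IFACE and UCT_IFACE_FLAG_CONNECT_TO_EP against iface_attr.cap.flags.

```c
#include <stdint.h>

/* Stand-in bit values for illustration only (hypothetical). Real code
 * should use the UCT_IFACE_FLAG_CONNECT_TO_* constants from the UCT API. */
#define DEMO_FLAG_CONNECT_TO_IFACE (UINT64_C(1) << 0)
#define DEMO_FLAG_CONNECT_TO_EP    (UINT64_C(1) << 1)

typedef enum {
    DEMO_CONNECT_NONE,
    DEMO_CONNECT_TO_IFACE,
    DEMO_CONNECT_TO_EP
} demo_connect_mode_t;

/* Test each capability bit independently. When an interface advertises
 * both modes, 'prefer_ep' decides which one the caller uses. */
static demo_connect_mode_t demo_choose_mode(uint64_t cap_flags, int prefer_ep)
{
    int has_iface = (cap_flags & DEMO_FLAG_CONNECT_TO_IFACE) != 0;
    int has_ep    = (cap_flags & DEMO_FLAG_CONNECT_TO_EP) != 0;

    if (has_iface && has_ep) {
        return prefer_ep ? DEMO_CONNECT_TO_EP : DEMO_CONNECT_TO_IFACE;
    }
    if (has_iface) {
        return DEMO_CONNECT_TO_IFACE;
    }
    if (has_ep) {
        return DEMO_CONNECT_TO_EP;
    }
    return DEMO_CONNECT_NONE;
}
```

For example, `demo_choose_mode(DEMO_FLAG_CONNECT_TO_IFACE | DEMO_FLAG_CONNECT_TO_EP, 0)` selects the connect-to-iface path for an interface that reports both modes. Which mode to prefer is left to the caller here; the sketch only shows that the decision must be made explicitly.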
Thanks for the replies, but I'm afraid I'm still confused about this.
Now I understand that
Then, however, I believe that
So the problems are:
(1) The modified code still doesn't work with zcopy. After
(2) The original problem happens for bcopy as well. The same error
So I'm afraid #2820 is not a solution yet.
Keisuke
@keisukefukuda thanks for the details. I'll continue working on #2820 to have all related fixes in a single PR.
@keisukefukuda please let me know if your problem was resolved.
I'm afraid the problem is not solved. My understanding is that I would like to know if the code works for you guys in your environment. If it's just an environment-specific issue, I will investigate on my own (or just avoid the issue) in my code. The actual diff from the master original is
@keisukefukuda please check whether your repo is up to date against master. I applied your patch to master and it works fine on my setup with all AM APIs (short/bcopy/zcopy) and the UD transport.
Oops, my local master was not updated. My apologies.
I'm creating this issue instead of #2816.
I think UCT_IFACE_FLAG_CONNECT_TO_EP and UCT_IFACE_FLAG_CONNECT_TO_IFACE of iface_attr.cap.flags are mutually exclusive.
However, in my environment, which has a Mellanox InfiniBand FDR HCA, both flags of ib/ud/mlx4_0:1 are ON as shown below.
Here's a program to demonstrate the problem, which is a shorter version of test/examples/uct_hello_world.c.
https://gist.github.com/keisukefukuda/5ab5b36d63cdb8a5acdd7428cd380f2a#file-uct_hello_world-c-L467
It checks the value of if_info.attr.cap.flags and prints whether UCT_IFACE_FLAG_CONNECT_TO_EP and UCT_IFACE_FLAG_CONNECT_TO_IFACE are ON.
In my environment,
UCX version is
Thanks.