UCP/WIREUP: Introduce connection-based lanes intersection #9881
src/ucp/wireup/wireup_ep.c
Outdated
uct_ep_h ucp_wireup_ep_get_tl_ep(uct_ep_h uct_ep)
{
    ucp_wireup_ep_t *wireup_ep = ucp_wireup_ep(uct_ep);
    return (wireup_ep == NULL) ? uct_ep : wireup_ep->super.uct_ep;
}
it's used in wireup.c only, maybe define it there as static func?
src/ucp/core/ucp_ep.c
Outdated
return UCP_NULL_LANE;
}

/* Connect to matching lane in case it is not connected yet */
maybe
- /* Connect to matching lane in case it is not connected yet */
+ /* Return matching lane in case it is not connected yet */
src/ucp/wireup/wireup.c
Outdated
ucp_wireup_fill_is_connected_params(uct_ep_is_connected_params_t *params,
                                    const ucp_address_entry_t *addr_entry)
minor:
I'd make the out parameter the last one (for consistency with other places)
} else {
    ucs_assert(addr_index != UINT_MAX);
    ae = &remote_address->address_list[addr_index];
    dst_rsc_index = ae->iface_attr.dst_rsc_index;
seems we do not need the ucp_address_iface_attr->dst_rsc_index field anymore, can remove it
ok, but we still need to keep a placeholder for UCP_ADDRESS_PACK_FLAG_TL_RSC_IDX in the packed address (for wire compat).
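To illustrate the wire-compatibility point above, here is a minimal sketch of a packer that no longer uses a field but still reserves its byte on the wire. All names (`SKETCH_PACK_FLAG_RSC_IDX`, `sketch_pack`, `sketch_unpack`) and the one-byte layout are assumptions made for the example; they are not the actual UCX address format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag, standing in for a "resource index is packed" flag */
#define SKETCH_PACK_FLAG_RSC_IDX 0x1u

/* Pack one payload byte; keep a reserved byte where the old field lived */
static size_t sketch_pack(uint8_t *buf, unsigned flags, uint8_t payload)
{
    size_t offset = 0;

    assert(buf != NULL);
    if (flags & SKETCH_PACK_FLAG_RSC_IDX) {
        buf[offset++] = 0; /* placeholder: value is ignored by new peers */
    }
    buf[offset++] = payload;
    return offset;
}

/* Unpack: skip the reserved byte so offsets match what older peers send */
static size_t sketch_unpack(const uint8_t *buf, unsigned flags,
                            uint8_t *payload)
{
    size_t offset = 0;

    assert((buf != NULL) && (payload != NULL));
    if (flags & SKETCH_PACK_FLAG_RSC_IDX) {
        ++offset; /* field no longer used, but still present on the wire */
    }
    *payload = buf[offset++];
    return offset;
}
```

With the flag set, both sides consume two bytes, so a peer that still writes the old field and a peer that only writes the placeholder agree on where the payload starts.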
@shasson5 as we discussed, please add a bit more detail on the motivation in the PR description.
test/gtest/ucp/test_ucp_wireup.cc
Outdated
for (rsc_index = 0; rsc_index < context->num_tls; ++rsc_index) {
    if ((context->tl_rscs[rsc_index].md_index == ae->md_index) &&
        (context->tl_rscs[rsc_index].tl_name_csum ==
is this enough? shouldn't we check is_connected?
this is a pack/unpack test, not related to EP reconf
IMO md and tl_name check is not enough. should also check is_same_device
test/gtest/ucp/test_ucp_wireup.cc
Outdated
@@ -1769,14 +1769,14 @@ class test_ucp_address_v2 : public test_ucp_wireup {
  ucp_rsc_index_t rsc_index;

  for (rsc_index = 0; rsc_index < context->num_tls; ++rsc_index) {
-     if ((context->tl_rscs[rsc_index].md_index == ae->md_index) &&
+     if ((context->tl_rscs[rsc_index].dev_index == ae->dev_index) &&
dev_index is in a different index space in the context and in the unpacked address, so the two values cannot be compared directly
need to use uct_iface_is_reachable_v2 with SAME_DEVICE scope
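The index-space problem raised above can be shown with a small self-contained sketch. The types and the name comparison are hypothetical stand-ins: comparing device names is used here only to model "ask whether the two entries refer to the same device", which in the PR is done through the reachability query rather than through any string compare.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device record; each table assigns its own indices */
typedef struct {
    const char *dev_name;
} sketch_dev_t;

/* Naive check: compares positions in two unrelated tables (incorrect) */
static int sketch_same_index(unsigned local_idx, unsigned addr_idx)
{
    return local_idx == addr_idx;
}

/* Identity check: compares what the entries describe, not where they sit */
static int sketch_same_device(const sketch_dev_t *local_devs,
                              unsigned local_idx,
                              const sketch_dev_t *addr_devs,
                              unsigned addr_idx)
{
    assert((local_devs != NULL) && (addr_devs != NULL));
    return strcmp(local_devs[local_idx].dev_name,
                  addr_devs[addr_idx].dev_name) == 0;
}
```

If the local table is {dev_a, dev_b} and the unpacked address lists only {dev_b}, index 0 on both sides refers to different devices, so the naive check reports a match that the identity check correctly rejects.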
src/ucp/wireup/wireup.c
Outdated
if (addr_entry->num_ep_addrs == 0) {
    /* Verify this lane is connecting to iface */
    ucs_assertv(!ucp_ep_is_lane_p2p(ep, lane), "ep=%p lane=%u", ep, lane);
- uct_rc_ep_is_connected should check UCT_RC_EP_FLAG_CONNECTED and return 0 if the flag is not present
- here,
if (!ucp_ep_is_lane_p2p(ep, lane)) {
    /* Check if the lane is connected to the remote iface */
    ucs_assertv(addr_entry->num_ep_addrs == 0, "num_ep_addrs=%d",
                addr_entry->num_ep_addrs);
    return uct_ep_is_connected(ucp_wireup_get_tl_ep(uct_ep), &params);
}
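The first point in the comment above (report "not connected" whenever the connected flag is absent, before comparing any address) could look roughly like the sketch below. The struct, the flag value and the `peer_id` field are hypothetical and only model the ordering of the checks, not the real RC endpoint layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag, standing in for an "endpoint is connected" flag */
#define SKETCH_EP_FLAG_CONNECTED 0x1u

/* Hypothetical endpoint: flags plus the id of the peer it was connected to */
typedef struct {
    unsigned flags;
    uint32_t peer_id; /* meaningful only when the connected flag is set */
} sketch_ep_t;

/* Return nonzero only if the endpoint is connected AND to this peer */
static int sketch_ep_is_connected(const sketch_ep_t *ep, uint32_t peer_id)
{
    assert(ep != NULL);
    if (!(ep->flags & SKETCH_EP_FLAG_CONNECTED)) {
        return 0; /* stale peer_id must not produce a false match */
    }
    return ep->peer_id == peer_id;
}
```

Checking the flag first matters because an unconnected endpoint may still hold a leftover or zero-initialized peer id that happens to equal the queried one.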
/* Compare resources by device and transport */
if ((context->tl_rscs[rsc_index].tl_name_csum ==
     ae->tl_name_csum) &&
    uct_iface_is_reachable_v2(wiface->iface, &params)) {
EXPECT_EQ(ae->md_index, context->tl_rscs[rsc_index].md_index)
/azp run
Azure Pipelines successfully started running 3 pipeline(s).
squashed
What
A new method for lanes intersection is introduced here, using the ep_is_connected UCT API.

Why?
To remove the dependency on dst_rsc_index when performing lane intersection. dst_rsc_index is only supported in CM flow, so in order to support EP reconfiguration in non-CM flow (next PRs), we need to replace it with a connection-based method.

Tests are covered by sockaddr gtests, as it is the only affected module currently.
This PR is the first part of the EP reconfiguration feature.
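As a rough illustration of what "connection-based" intersection means in the description above, the sketch below finds a reusable lane by asking each existing lane whether it is already connected to a given remote address, instead of comparing a resource index carried in the address. Every type and function name here is hypothetical and heavily simplified; it is not the code from this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a "no lane" marker */
#define SKETCH_NULL_LANE ((size_t)-1)

/* Hypothetical remote address, reduced to an opaque interface id */
typedef struct {
    unsigned iface_id;
} sketch_remote_addr_t;

/* Hypothetical lane: remembers which remote interface it is connected to */
typedef struct {
    int      connected;     /* nonzero once the connection is established */
    unsigned peer_iface_id; /* valid only when connected is nonzero */
} sketch_lane_t;

/* Models an "is connected" query for one lane and one remote address */
static int sketch_lane_is_connected(const sketch_lane_t *lane,
                                    const sketch_remote_addr_t *addr)
{
    return lane->connected && (lane->peer_iface_id == addr->iface_id);
}

/* Return the first lane already connected to addr, or SKETCH_NULL_LANE */
static size_t sketch_find_matching_lane(const sketch_lane_t *lanes,
                                        size_t num_lanes,
                                        const sketch_remote_addr_t *addr)
{
    size_t lane;

    assert((lanes != NULL) || (num_lanes == 0));
    for (lane = 0; lane < num_lanes; ++lane) {
        if (sketch_lane_is_connected(&lanes[lane], addr)) {
            return lane;
        }
    }
    return SKETCH_NULL_LANE;
}
```

The design point is that the match depends only on state the local endpoint already holds plus the remote address itself, so it does not need any index that only one connection flow provides.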