Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/WIREUP: Introduce connection-based lanes intersection #9881

Merged
merged 1 commit into from
May 28, 2024

Conversation

shasson5
Copy link
Contributor

@shasson5 shasson5 commented May 16, 2024

What

A new method for lanes intersection is introduced here, using ep_is_connected UCT API.

Why ?

To remove dependency on dst_rsc_index when performing lane intersection.
dst_rsc_index is only supported in CM flow, so in order to support EP reconfiguration in non-CM flow (next PRs), we need to replace it with connection-based method.
Tests are covered by sockaddr gtests, as it is the only affected module currently.
This PR is the first part of the EP reconfiguration feature.

@shasson5 shasson5 added the WIP-DNM Work in progress / Do not review label May 16, 2024
Comment on lines 678 to 682
uct_ep_h ucp_wireup_ep_get_tl_ep(uct_ep_h uct_ep)
{
ucp_wireup_ep_t *wireup_ep = ucp_wireup_ep(uct_ep);
return (wireup_ep == NULL) ? uct_ep : wireup_ep->super.uct_ep;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's used in wireup.c only, maybe define it there as static func?

return UCP_NULL_LANE;
}

/* Connect to matching lane in case it is not connected yet */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

Suggested change
/* Connect to matching lane in case it is not connected yet */
/* Return matching lane in case it is not connected yet */

Comment on lines 1717 to 1718
ucp_wireup_fill_is_connected_params(uct_ep_is_connected_params_t *params,
const ucp_address_entry_t *addr_entry)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor:
i'd make the out parameter to be last (for consistency with other places)

src/ucp/wireup/wireup.c Outdated Show resolved Hide resolved
} else {
ucs_assert(addr_index != UINT_MAX);
ae = &remote_address->address_list[addr_index];
dst_rsc_index = ae->iface_attr.dst_rsc_index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we do not need ucp_address_iface_attr->dst_rsc_index field anymore, can remove it

Copy link
Contributor Author

@shasson5 shasson5 May 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok but still need to keep placeholder for UCP_ADDRESS_PACK_FLAG_TL_RSC_IDX in packed address (for wire compat).

@gleon99
Copy link
Contributor

gleon99 commented May 19, 2024

@shasson5 as we discussed, please add a bit more details on the motivation in the PR description.

@shasson5 shasson5 requested review from brminich and gleon99 May 20, 2024 10:27
@shasson5 shasson5 removed the WIP-DNM Work in progress / Do not review label May 20, 2024
@shasson5 shasson5 requested a review from yosefe May 20, 2024 10:27
src/ucp/core/ucp_ep.c Outdated Show resolved Hide resolved
src/ucp/core/ucp_ep.c Outdated Show resolved Hide resolved
src/ucp/wireup/wireup.c Outdated Show resolved Hide resolved
src/ucp/wireup/wireup.c Outdated Show resolved Hide resolved
src/ucp/wireup/wireup.c Outdated Show resolved Hide resolved
src/ucp/wireup/wireup.c Outdated Show resolved Hide resolved

for (rsc_index = 0; rsc_index < context->num_tls; ++rsc_index) {
if ((context->tl_rscs[rsc_index].md_index == ae->md_index) &&
(context->tl_rscs[rsc_index].tl_name_csum ==
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this enough? shouldn't we check is_connected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pack/unpack test, not related to EP reconf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO md and tl_name check is not enough. should also check is_same_device

@shasson5 shasson5 requested a review from yosefe May 20, 2024 12:59
brminich
brminich previously approved these changes May 21, 2024
src/ucp/core/ucp_ep.c Outdated Show resolved Hide resolved
src/ucp/wireup/wireup.c Outdated Show resolved Hide resolved
src/ucp/core/ucp_ep.c Show resolved Hide resolved
src/ucp/wireup/wireup.c Show resolved Hide resolved
src/ucp/wireup/wireup.c Outdated Show resolved Hide resolved
src/ucp/wireup/wireup.c Show resolved Hide resolved

for (rsc_index = 0; rsc_index < context->num_tls; ++rsc_index) {
if ((context->tl_rscs[rsc_index].md_index == ae->md_index) &&
(context->tl_rscs[rsc_index].tl_name_csum ==
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO md and tl_name check is not enough. should also check is_same_device

test/gtest/ucp/test_ucp_wireup.cc Outdated Show resolved Hide resolved
test/gtest/ucp/test_ucp_wireup.cc Outdated Show resolved Hide resolved
@@ -1769,14 +1769,14 @@ class test_ucp_address_v2 : public test_ucp_wireup {
ucp_rsc_index_t rsc_index;

for (rsc_index = 0; rsc_index < context->num_tls; ++rsc_index) {
if ((context->tl_rscs[rsc_index].md_index == ae->md_index) &&
if ((context->tl_rscs[rsc_index].dev_index == ae->dev_index) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dev_iondex in in a different space in context and unpacked address, so cannot compare them
need to use uct_iface_is_reachable_v2 with SAME_DEVICE scope


if (addr_entry->num_ep_addrs == 0) {
/* Verify this lane is connecting to iface */
ucs_assertv(!ucp_ep_is_lane_p2p(ep, lane), "ep=%p lane=%u", ep, lane);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. uct_rc_ep_is_connected should check UCT_RC_EP_FLAG_CONNECTED and return 0 if the flag is not present
  2. here,
if (!ucp_ep_is_lane_p2p(ep, lane)) {
    /* Check if the lane is connected to the remote iface */
    ucs_assertv(addr_entry->num_ep_addrs == 0, "num_ep_addres=%d", 
                addr_entry->num_ep_addrs);
    return uct_ep_is_connected(ucp_wireup_get_tl_ep(uct_ep), &params);
}

@shasson5 shasson5 requested a review from yosefe May 22, 2024 15:44
src/uct/ib/rc/base/rc_ep.c Outdated Show resolved Hide resolved
src/uct/ib/rc/base/rc_ep.c Outdated Show resolved Hide resolved
/* Compare resources by device and transport */
if ((context->tl_rscs[rsc_index].tl_name_csum ==
ae->tl_name_csum) &&
uct_iface_is_reachable_v2(wiface->iface, &params)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXPECT_EQ(ae->md_index, context->tl_rscs[rsc_index].md_index)

@shasson5 shasson5 requested a review from yosefe May 23, 2024 10:20
@shasson5
Copy link
Contributor Author

/azp run

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@shasson5
Copy link
Contributor Author

squashed

@yosefe yosefe enabled auto-merge May 27, 2024 10:09
@yosefe yosefe merged commit a63c53f into openucx:master May 28, 2024
140 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants