UCG needs address lookup callbacks #18

Closed
alex--m opened this issue Oct 15, 2020 · 1 comment

alex--m (Contributor) commented Oct 15, 2020

For reference, UCG parameters require the following:

```c
typedef struct ucg_params {

    /* Callback functions for address lookup, used at connection establishment */
    struct {
        int (*lookup_f)(void *cb_group_context,
                        ucg_group_member_index_t index,
                        ucp_address_t **addr,
                        size_t *addr_len);
        void (*release_f)(ucp_address_t *addr);
    } address;

    /* ... other fields omitted ... */
} ucg_params_t;
```

Currently, the OMPI-based implementation satisfies this requirement as follows:

```c
int mca_coll_ucx_resolve_address(void *cb_group_obj,
                                 ucg_group_member_index_t rank,
                                 ucp_address_t **addr,
                                 size_t *addr_len)
{
    /* Sanity check: UCG should never ask for our own address (loopback) */
    ompi_communicator_t *comm = (ompi_communicator_t*)cb_group_obj;
    if (rank == (ucg_group_member_index_t)comm->c_my_rank) {
        COLL_UCX_ERROR("mca_coll_ucx_resolve_address(rank=%lu) "
                       "shouldn't be called on its own rank (loopback)",
                       (unsigned long)rank);
        return 1;
    }

    /* Check the cache for a previously obtained address of that rank */
    ompi_proc_t *proc_peer = (ompi_proc_t*)ompi_comm_peer_lookup(comm, rank);
    *addr = proc_peer->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_COLL];
    *addr_len = 0; /* UCX doesn't need the length to unpack the address */
    if (*addr) {
        return 0;
    }

    /* Obtain the UCP worker address of the remote peer */
    int ret = mca_coll_ucx_recv_worker_address(proc_peer, addr, addr_len);
    if (ret < 0) {
        COLL_UCX_ERROR("mca_coll_ucx_recv_worker_address(proc=%u rank=%lu) failed",
                       proc_peer->super.proc_name.vpid, (unsigned long)rank);
        return 1;
    }

    /* Cache the address for future invocations with this rank */
    proc_peer->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_COLL] = *addr;
    return 0;
}

void mca_coll_ucx_release_address(ucp_address_t *addr)
{
    /* No need to free: the address is cached in proc_peer->proc_endpoints */
    (void)addr;
}
```
alex--m added this to the UCG support milestone Oct 15, 2020
alex--m closed this as completed Oct 15, 2020
alex--m (Contributor, Author) commented Oct 15, 2020

UCC currently uses `ucc_team_p2p_conn`, which should satisfy this requirement.

artemry-nv pushed a commit to artemry-nv/ucc that referenced this issue Nov 1, 2022