Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TL/UCP: bcast opt #738

Merged
merged 2 commits into from
Mar 15, 2023
Merged

Conversation

Sergei-Lebedev
Copy link
Contributor

What

Pull request contains multiple optimizations for TL/UCP broadcast

How ?

  • Use knomial bcast for team size 2 for all message sizes. It doesn't make sense to use SAG bcast for 2 ranks since we can directly send whole message to single rank
  • Knomial bcast
  1. Wait for recv completion only to continue to the next iteration
  • SAG bcast
  1. Don’t send message to base root rank on each iteration in knomial allgather
    2.Wait for recv completion only to continue to the next iteration

src/components/tl/ucp/allgather/allgather_knomial.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_knomial.c Outdated Show resolved Hide resolved
@@ -146,6 +145,7 @@ void ucc_tl_ucp_scatter_knomial_progress(ucc_coll_task_t *coll_task)
peer_seg_count * dt_size, mem_type, INV_VRANK(peer,
(ucc_rank_t)args->root, size), team, task), task, out);
}
/*TODO: local_seg_index is always zero since rank that sends is base root? */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this TODO still a needed cooment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i think this statement is true and we can clean code a bit

src/components/tl/ucp/scatter/scatter_knomial.c Outdated Show resolved Hide resolved
@@ -165,8 +165,8 @@ void ucc_tl_ucp_scatter_knomial_progress(ucc_coll_task_t *coll_task)
&offset, &local_seg_count);
if (offset != 0) {
status = ucc_mc_memcpy(PTR_OFFSET(args->dst.info.buffer, offset),
PTR_OFFSET(rbuf, task->scatter_kn.send_offset),
local_seg_count * dt_size, mem_type, mem_type);
rbuf, task->scatter_kn.recv_size, mem_type,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why copying from rbuf without send_offset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to copy everything that we receive so far to have less number of communications in allgather step after it

@Sergei-Lebedev Sergei-Lebedev merged commit 45a0b49 into openucx:master Mar 15, 2023
@Sergei-Lebedev Sergei-Lebedev deleted the topic/bcast_sag_opt branch March 15, 2023 17:30
jeniaka pushed a commit to jeniaka/ucc that referenced this pull request Jun 22, 2023
* TL/UCP: bcast sag opt

* REVIEW: fix review comments
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
* TL/UCP: bcast sag opt

* REVIEW: fix review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants