TL/UCP: bcast opt #738
Updated from f730476 to 06a8c8e
@@ -146,6 +145,7 @@ void ucc_tl_ucp_scatter_knomial_progress(ucc_coll_task_t *coll_task)
peer_seg_count * dt_size, mem_type, INV_VRANK(peer,
(ucc_rank_t)args->root, size), team, task), task, out);
}
/*TODO: local_seg_index is always zero since rank that sends is base root? */
Is this TODO comment still needed?
Yes, I think this statement is true, and we can clean up the code a bit.
@@ -165,8 +165,8 @@ void ucc_tl_ucp_scatter_knomial_progress(ucc_coll_task_t *coll_task)
&offset, &local_seg_count);
if (offset != 0) {
status = ucc_mc_memcpy(PTR_OFFSET(args->dst.info.buffer, offset),
PTR_OFFSET(rbuf, task->scatter_kn.send_offset),
local_seg_count * dt_size, mem_type, mem_type);
rbuf, task->scatter_kn.recv_size, mem_type,
Why copy from rbuf without send_offset?
We want to copy everything received so far (recv_size bytes), so that the allgather step that follows needs fewer communications.
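The reasoning in that reply can be illustrated with a toy C model. This is not UCC code: NSEG, SEG_LEN, copy_segments, missing_segments and segments_left are hypothetical names, and the scenario assumes one equal-sized segment per rank. A rank whose scatter receive buffer holds segments 4..7 (its own segment being 4) either copies only its local segment into dst or copies the whole received block, and the model counts how many segments the following allgather would still have to fetch.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical sizes, not UCC parameters. */
#define NSEG    8  /* one segment per rank, 8 ranks */
#define SEG_LEN 4  /* elements per segment */

/* Copy `nseg` consecutive segments, starting at segment `first_seg`,
 * from the scatter receive buffer into dst and mark them as present. */
void copy_segments(int *dst, const int *rbuf, int first_seg, int nseg,
                   int have[NSEG])
{
    assert(first_seg >= 0 && nseg >= 0 && first_seg + nseg <= NSEG);
    memcpy(dst + first_seg * SEG_LEN, rbuf,
           (size_t)nseg * SEG_LEN * sizeof(int));
    for (int i = first_seg; i < first_seg + nseg; i++) {
        have[i] = 1;
    }
}

/* Number of segments the allgather phase would still have to fetch. */
int missing_segments(const int have[NSEG])
{
    int missing = 0;
    for (int i = 0; i < NSEG; i++) {
        missing += !have[i];
    }
    return missing;
}

/* Scenario: the scatter step delivered segments 4..7 to this rank, whose
 * own segment is 4. Returns how many segments remain for the allgather
 * when copying only the local segment (0) or the whole block (1). */
int segments_left(int copy_whole_block)
{
    int rbuf[4 * SEG_LEN];
    int dst[NSEG * SEG_LEN] = {0};
    int have[NSEG] = {0};

    for (int i = 0; i < 4 * SEG_LEN; i++) {
        rbuf[i] = 100 + i; /* arbitrary payload */
    }
    copy_segments(dst, rbuf, 4, copy_whole_block ? 4 : 1, have);
    return missing_segments(have);
}
```

In this model, segments_left(0) leaves 7 segments for the allgather, while segments_left(1) leaves only 4, which is the effect the reply describes: data that has already arrived during the scatter does not need to be exchanged again.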
Updated from 06a8c8e to 129943b
* TL/UCP: bcast sag opt
* REVIEW: fix review comments
What
This pull request contains multiple optimizations for the TL/UCP broadcast.
How?
2. Wait only for receive completion before continuing to the next iteration.
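The second item, waiting only for receive completion before advancing, can be sketched as a progress-policy check. This is a hedged toy model, not the UCC or UCX request API: req_t, req_state_t, can_advance and task_can_complete are hypothetical names. It assumes the next iteration depends only on data received in the current one and that send buffers are not overwritten before the task finishes, so outstanding sends only have to be drained before the whole task completes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request model; not the UCX/UCC request API. */
typedef enum { REQ_INFLIGHT, REQ_DONE } req_state_t;

typedef struct {
    req_state_t state;
} req_t;

static int all_done(const req_t *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].state != REQ_DONE) {
            return 0;
        }
    }
    return 1;
}

/* May the progress function move on to the next iteration?
 * With wait_for_sends == 0 (the optimized policy), only the receive
 * request of the current iteration has to be complete. */
int can_advance(const req_t *recv_req, const req_t *sends, size_t nsends,
                int wait_for_sends)
{
    assert(recv_req != NULL);
    if (recv_req->state != REQ_DONE) {
        return 0;
    }
    return wait_for_sends ? all_done(sends, nsends) : 1;
}

/* The task as a whole may only complete once every outstanding send has
 * finished, so deferred sends are still drained before completion. */
int task_can_complete(const req_t *sends, size_t nsends)
{
    return all_done(sends, nsends);
}
```

With the receive finished and one send still in flight, can_advance returns 1 under the optimized policy and 0 when also waiting for sends; task_can_complete stays 0 until that send finishes. The trade-off is that sends overlap with later iterations, at the cost of keeping their buffers untouched until the end.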