UCP/PROTOV1: Remove redundant fields from request structure #9864
src/ucp/rndv/rndv.c
@@ -529,6 +531,7 @@ ucp_rndv_progress_rma_zcopy_common(ucp_request_t *req, ucp_lane_index_t lane,
     ucp_ep_h ep = req->send.ep;
     uct_ep_h uct_ep = ucp_ep_get_lane(ep, lane);
     ucp_ep_config_t *config = ucp_ep_config(ep);
+    size_t lanes_count = ucs_popcount(req->send.rndv.zcopy.lanes_map_all);
popcount is not converted to a CPU instruction when the --enable-optimizations flag is not passed,
so we should avoid it in the fast path.
Maybe we can keep it in the request; it's not so big anyway.
union {
    struct {
        /* Actual lanes map */
        ucp_lane_map_t lanes_map_all;
        /* Actual lanes count */
        uint8_t        lanes_count;
    } zcopy;
    struct {
        /* Data start offset of this request */
        size_t offset;
    } rtr;
};
That is the part of ucp_request where lanes_count is stored. If the size of ucp_lane_map_t is less than 8 bytes, everything is OK, since the lanes_count and lanes_map_all fields together take no more space than offset, so the union is 8 bytes in size.
But if ucp_lane_map_t is 8 bytes, lanes_count becomes the 9th byte, so the union grows to 16 bytes.
The only option I see is to move that 1-byte field elsewhere inside the ucp_request structure, but all other fields inside struct rndv {...} are 8 bytes in size, so wherever it is placed it would create 7 bytes of padding.
Is it really that significant to have the value prestored in that field instead of calling popcount at that point? I mean, this is the RMA flow, which is mostly used for big messages, and it is only for protov1. WDYT?
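The size argument above can be checked with a small standalone sketch. The type and function names below (lane_map32_t, lane_map64_t, rndv_narrow, rndv_wide, rndv_wide_no_count and the *_size helpers) are hypothetical stand-ins, not UCX definitions, and the sizes in the comments assume a typical LP64 platform such as x86-64 Linux, where size_t and uint64_t are 8 bytes with 8-byte alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ucp_lane_map_t at two widths (not UCX types). */
typedef uint32_t lane_map32_t;
typedef uint64_t lane_map64_t;

/* Narrow lane map: 4 + 1 bytes, padded to 8, fits alongside the 8-byte offset. */
union rndv_narrow {
    struct {
        lane_map32_t lanes_map_all;
        uint8_t      lanes_count;
    } zcopy;
    struct {
        size_t offset;
    } rtr;
};

/* Wide lane map: lanes_count becomes the 9th byte, padded up to 16. */
union rndv_wide {
    struct {
        lane_map64_t lanes_map_all;
        uint8_t      lanes_count;
    } zcopy;
    struct {
        size_t offset;
    } rtr;
};

/* Wide lane map with the count dropped and recomputed by popcount instead. */
union rndv_wide_no_count {
    struct {
        lane_map64_t lanes_map_all;
    } zcopy;
    struct {
        size_t offset;
    } rtr;
};

/* Size helpers; on LP64 these are expected to return 8, 16 and 8. */
static size_t narrow_size(void)        { return sizeof(union rndv_narrow); }
static size_t wide_size(void)          { return sizeof(union rndv_wide); }
static size_t wide_no_count_size(void) { return sizeof(union rndv_wide_no_count); }
```

Printing the three helper results with printf("%zu") is a quick way to confirm the layout on a given compiler and ABI; the exact numbers are ABI-dependent.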
OK, though it may be worth converting ucs_assert_always(lanes_count > 0);
to ucs_assert(ucs_popcount(lanes_map_all) > 0);
and removing the local variable lanes_count, so that in release mode it is calculated only when needed.
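A minimal sketch of why this helps, assuming the debug-only assert is compiled out entirely when assertions are disabled. All names here (SKETCH_ENABLE_ASSERT, sketch_assert, sketch_assert_always, sketch_popcount, popcount_calls) are hypothetical and are not the UCX macros themselves; the counter exists only to show whether the popcount expression was evaluated.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical build switch: 0 models a release build, 1 a debug build. */
#ifndef SKETCH_ENABLE_ASSERT
#define SKETCH_ENABLE_ASSERT 0
#endif

/* Always-on assert: the condition is evaluated in every build. */
#define sketch_assert_always(cond) \
    do { \
        if (!(cond)) { \
            abort(); \
        } \
    } while (0)

/* Debug-only assert: the condition disappears when asserts are disabled. */
#if SKETCH_ENABLE_ASSERT
#define sketch_assert(cond) sketch_assert_always(cond)
#else
#define sketch_assert(cond) ((void)0)
#endif

/* Counts how many times the software popcount actually ran. */
static unsigned popcount_calls = 0;

/* Portable bit count, standing in for a non-intrinsic popcount. */
static unsigned sketch_popcount(uint64_t value)
{
    unsigned count = 0;

    popcount_calls++;
    while (value != 0) {
        value &= value - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Pays for the popcount in every build. */
static void check_always(uint64_t lanes_map)
{
    sketch_assert_always(sketch_popcount(lanes_map) > 0);
}

/* Pays for the popcount only when asserts are enabled. */
static void check_debug_only(uint64_t lanes_map)
{
    (void)lanes_map; /* unused when the assert is compiled out */
    sketch_assert(sketch_popcount(lanes_map) > 0);
}
```

With SKETCH_ENABLE_ASSERT left at 0, calling check_debug_only() leaves popcount_calls unchanged, while check_always() increments it on every call.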
Done.
5dcc938 to 666bcae
src/ucp/rndv/rndv.c
@@ -1844,7 +1845,7 @@ UCS_PROFILE_FUNC(ucs_status_t, ucp_rndv_progress_rma_put_zcopy, (self),
     ucp_request_t *sreq = ucs_container_of(self, ucp_request_t, send.uct);
     uct_rkey_t uct_rkey;

-    ucs_assert_always(sreq->send.rndv.zcopy.lanes_count > 0);
+    ucs_assert_always(ucs_popcount(sreq->send.rndv.zcopy.lanes_map_all) > 0);
Why wasn't this changed to ucs_assert?
Good catch, removed!
c6e3991 to d6bae0a
What
Removes redundant fields from request structure.
Why?
To avoid increasing ucp_request structure size after merging #9814
Notes
Performance was tested on osu_latency, osu_bw and osu_mbw_mr benchmarks