Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/PROTOV1: Remove redundant fieds from request structure #9864

Conversation

ivankochin
Copy link
Contributor

@ivankochin ivankochin commented May 7, 2024

What

Removes redundant fields from request structure.

Why ?

To avoid increasing ucp_request structure size after merging #9814

Notes

Performance was tested on osu_latency, osu_bw and osu_mbw_mr benchmarks

tvegas1
tvegas1 previously approved these changes May 7, 2024
src/ucp/rndv/rndv.c Show resolved Hide resolved
@@ -529,6 +531,7 @@ ucp_rndv_progress_rma_zcopy_common(ucp_request_t *req, ucp_lane_index_t lane,
ucp_ep_h ep = req->send.ep;
uct_ep_h uct_ep = ucp_ep_get_lane(ep, lane);
ucp_ep_config_t *config = ucp_ep_config(ep);
size_t lanes_count = ucs_popcount(req->send.rndv.zcopy.lanes_map_all);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

popcount is not converted to CPU instruction when not passing --enable-optimizations flag
so we should avoid it in fast path
maybe we can keep it in the request, it's not so big anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

union {
    struct {
        /* Actual lanes map */
        ucp_lane_map_t lanes_map_all;

        /* Actual lanes count */
        uint8_t        lanes_count;
    } zcopy;

    struct {
        /* Data start offset of this request */
        size_t offset;
    } rtr;
};

That is the part of ucp_request where lanes_count is stored. If size of ucp_lane_map_t is less than 8, everything is OK since lanes_count + lanes_map_all fields took no more than offset, so union has 8 bytes size.

But if ucp_lane_map_t has 8 bytes, lanes_count became the 9th byte, so the union size became 16 bytes .

The only possible option that I see is to move that 1-byte field to another place inside ucp_request sturcture, but all other fields inside struct rndv {...} have 8-byte size, so on any place it would create 7 byte padding.

Is that so significant to have that value prestored in that field instead of popcount on that place? I mean that RMA flow which is often used for big messages and that's only for protov1, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, though maybe worth converting ucs_assert_always(lanes_count > 0); to ucs_assert(ucs_popcount(lanes_count) > 0); and remove the local var lanes_count, so it will be calculated only when needed in release mode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

tvegas1
tvegas1 previously approved these changes May 8, 2024
yosefe
yosefe previously approved these changes May 8, 2024
@ivankochin ivankochin force-pushed the ucp/avoid-lane-maps-storage-inside-request branch from 5dcc938 to 666bcae Compare May 8, 2024 09:26
@ivankochin ivankochin enabled auto-merge May 8, 2024 09:26
@@ -1844,7 +1845,7 @@ UCS_PROFILE_FUNC(ucs_status_t, ucp_rndv_progress_rma_put_zcopy, (self),
ucp_request_t *sreq = ucs_container_of(self, ucp_request_t, send.uct);
uct_rkey_t uct_rkey;

ucs_assert_always(sreq->send.rndv.zcopy.lanes_count > 0);
ucs_assert_always(ucs_popcount(sreq->send.rndv.zcopy.lanes_map_all) > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not changed to assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, removed!

src/ucp/rndv/rndv.c Outdated Show resolved Hide resolved
@ivankochin ivankochin dismissed stale reviews from tvegas1 and yosefe via c6e3991 May 8, 2024 12:02
@ivankochin ivankochin force-pushed the ucp/avoid-lane-maps-storage-inside-request branch from c6e3991 to d6bae0a Compare May 8, 2024 12:16
@ivankochin ivankochin merged commit 4e10990 into openucx:master May 9, 2024
140 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants