Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/RNDV: Refactor rendezvous protocol to use rma_bw lanes. #2092

Merged
merged 5 commits into from
Dec 21, 2017

Conversation

yosefe
Copy link
Contributor

@yosefe yosefe commented Dec 19, 2017

No description provided.

@yosefe yosefe added the Cleanup label Dec 19, 2017
@jenkinsornl
Copy link

Build finished.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5559/ for details (Mellanox internal link).

@@ -34,6 +34,26 @@ size_t ucp_rkey_packed_size(ucp_context_h context, ucp_md_map_t md_map)
return size;
}

void ucp_rkey_packed_copy(ucp_context_h context, ucp_md_map_t md_map,
void *rkey_buffer, const void* uct_rkeys[])
Copy link
Contributor

@hoopoepg hoopoepg Dec 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const void* uct_rkeys[] is interpreted as void**, as I can see incoming parameter is just non-initialized array, not array of pointers (and coverity agree with me :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be a const array of pointers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I see
the simplest way to suppress error (without hack with suppress coverity by comment) is create separate array of pointers, like

const void *array[2] = {key_buf, NULL};

and pass it to this function (technically compiler makes same things, should not affect performane)


rkey_buffer += md_size;

snprintf(p, endp - p, md_map ? "," : "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

md_map is always NOT NULL here (coverity report)

ucs_for_each_bit(md_index, md_map) {
md_size = *((uint8_t*)rkey_buffer);
rkey_buffer += sizeof(uint8_t);

if (!first) {
snprintf(p, endp - p, ",");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not p+= snprintf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://linux.die.net/man/3/snprintf
The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated. (See also below under NOTES.)

@jenkinsornl
Copy link

Build finished.

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/3423/ for details.

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5563/ for details (Mellanox internal link).

@yosefe
Copy link
Contributor Author

yosefe commented Dec 20, 2017

@hoopoepg @brminich pls take a look

uct_rkey_bundle_t rkey_bundle;
ucp_request_t *sreq; /* send request on the send side */
uint64_t remote_address; /* address of the receiver's data buffer */
uintptr_t remote_request; /* pointer to the receiver's send request */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is receiver's recv request

ucp_lane_index_t lane = ucp_ep_get_tag_lane(ep);
ucp_ep_t *ep = sreq->send.ep;

ucs_assert(sreq->send.lane == ucp_ep_get_tag_lane(ep));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do you set a send.lane?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add comment

@@ -28,7 +23,6 @@ typedef struct {
ucp_request_hdr_t sreq; /* send request on the rndv initiator side */
uint64_t address; /* holds the address of the data buffer on the sender's side */
size_t size; /* size of the data for sending */
uint16_t flags;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we check on receiver whether we have rkeys or not now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to md_map in the packed rkey

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we send no rkey in case of SW rndv, non-contig RNDV, etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, we can tell it by address==NULL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, for SW rndv, we do send rkeys (if the send buffer is contig but large, for example)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aa, right, we check the address. Yes, SW rndv may carry the rkey for large messages, but in other cases it won't.

ucs_assert_always(md_size <= UINT8_MAX);
*((uint8_t*)p++) = md_size;
memcpy(p, *uct_rkeys, md_size);
p += sizeof(uint8_t) + md_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like need to do p += md_size, because you already made p++ before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, it worked because offload currently has only 1 key

@swx-jenkins1
Copy link

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/3435/ for details.

@jenkinsornl
Copy link

Build finished.

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5576/ for details (Mellanox internal link).

@jenkinsornl
Copy link

Build finished.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5577/ for details (Mellanox internal link).

}
}

sreq->send.lane = ucp_ep_get_am_lane(ep);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like no need to set lane here, can do assert that it is AM lane (for consistency with offload branch)

Copy link
Contributor Author

@yosefe yosefe Dec 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but here we set to am_lane, and in offload.c we expect tag_lane

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that request lane is set in ucp_tag_send_req_init(). No need to set it here again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but ucp_tag_send_req_init() always sets it to tag_lane, and in case of rndv we may need to switch to am_lane.. do you think they would have be the same in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!ep->flags & UCP_EP_FLAG_TAG_OFFLOAD_ENABLED), tag_lane should be am_lane

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config->tag.lane = lane;

@yosefe
Copy link
Contributor Author

yosefe commented Dec 21, 2017

bot:retest

@jenkinsornl
Copy link

Build finished.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5582/ for details (Mellanox internal link).

@swx-jenkins1
Copy link

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/3445/ for details.

@jenkinsornl
Copy link

Build finished.

@yosefe
Copy link
Contributor Author

yosefe commented Dec 21, 2017

bot:bgate:retest

@swx-jenkins1
Copy link

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/3451/ for details.

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/3453/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5588/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5591/ for details (Mellanox internal link).

@yosefe yosefe merged commit 5b1aeb3 into openucx:master Dec 21, 2017
@yosefe yosefe deleted the topic/ucp-rndv-refactor branch December 21, 2017 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants