Consolidate all the QOS changes into one clean commit #564
Refer to this link for build results (access rights to CI server needed): Build Log Test FAILed.
bot:retest
orte/mca/qos/base/qos_base_frame.c
Please correct the misprint: mca_base_var_rtegister => mca_base_var_register. The code does not compile with --enable-timing.
@nkogteva I think I have this cleaned up now. I see some significant work still to be done in the code, but the default code path appears to be untouched, as promised.
@rhc54 yes, thanks. @jladd-mlnx The old functionality works as promised in the description. Light tests (such as hello world or the pmix test) work fine. I tried to run the new oob_stress_channel test with oob ud, and it hangs. But I'm sure the problem is in oob ud, not in this PR, because the old oob_stress test also hangs with oob ud. So I think this PR can be merged. I will look at the stress tests with ud and handle the problem in a separate PR.
Summary
WHAT: Add new RML channel APIs (open, send, and close) and a new QoS framework with an ACK QoS component in ORTE.
WHY: To provide a method for specifying end-to-end QoS requirements and for using them when sending RML messages.
WHERE: Various files in RML, OOB and QoS. Please refer to the list of changed files.
IMPACT: No impact on code using existing RML APIs. The QoS framework is only active when the new RML channel APIs are used.
More Details
Intended Use:
Theory of Operation:
Open Channel: The sender creates a channel to the destination by calling open_channel with a list of QoS attributes describing the desired QoS. The RML layer creates an RML channel object and calls the QoS framework to create a matching QoS channel object. The QoS framework calls the QoS component that matches the requested QoS type. The selected QoS component creates a QoS channel object with the requested attributes and returns it to the RML. The RML associates the QoS channel object with the RML channel object and sends an open_channel request, carrying the requested QoS attributes, to the destination. The destination processes the open channel request, creates an RML channel object and the corresponding QoS channel object at its end, and replies to the sender with its RML channel number (reference). The sender processes the response from the peer, stores the peer channel number, and calls the completion callback with the local channel number.
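The open handshake described above can be sketched as a toy model in plain C. This is only an illustration under stated assumptions, not the ORTE code: every type and function name here (`endpoint_t`, `qos_attrs_t`, `open_channel`, `handle_open_request`, `record_channel`) is hypothetical, the two endpoints live in one process, and the "wire" is a direct function call rather than an RML message.

```c
#include <assert.h>

#define MAX_CHANNELS 8
#define NO_CHANNEL  (-1)

/* Hypothetical QoS attribute set; the real attribute list is not shown here. */
typedef struct {
    int type;    /* illustrative: 1 could stand for the ACK QoS */
    int window;  /* illustrative attribute */
} qos_attrs_t;

/* One entry stands in for the RML channel object plus its QoS channel object. */
typedef struct {
    int in_use;
    int peer_num;     /* channel number on the other side */
    qos_attrs_t qos;
} channel_t;

typedef struct {
    channel_t chans[MAX_CHANNELS];
} endpoint_t;

typedef void (*open_cb_t)(int status, int local_num, void *cbdata);

static void endpoint_init(endpoint_t *ep)
{
    for (int i = 0; i < MAX_CHANNELS; i++) {
        ep->chans[i].in_use = 0;
        ep->chans[i].peer_num = NO_CHANNEL;
    }
}

/* Create the local channel objects; the table index is the channel number. */
static int alloc_channel(endpoint_t *ep, const qos_attrs_t *attrs)
{
    for (int i = 0; i < MAX_CHANNELS; i++) {
        if (!ep->chans[i].in_use) {
            ep->chans[i].in_use = 1;
            ep->chans[i].peer_num = NO_CHANNEL;
            ep->chans[i].qos = *attrs;
            return i;
        }
    }
    return NO_CHANNEL;
}

/* Destination side: build matching objects and reply with our channel number. */
static int handle_open_request(endpoint_t *dst, int sender_num,
                               const qos_attrs_t *attrs)
{
    int n = alloc_channel(dst, attrs);
    if (n != NO_CHANNEL) {
        dst->chans[n].peer_num = sender_num;
    }
    return n;
}

/* Sender side: allocate, "send" the request, store the reply, run the callback. */
static int open_channel(endpoint_t *src, endpoint_t *dst,
                        const qos_attrs_t *attrs, open_cb_t cb, void *cbdata)
{
    int local = alloc_channel(src, attrs);
    if (local == NO_CHANNEL) {
        cb(-1, NO_CHANNEL, cbdata);
        return -1;
    }
    int peer = handle_open_request(dst, local, attrs);
    if (peer == NO_CHANNEL) {
        src->chans[local].in_use = 0;   /* roll back the local allocation */
        cb(-1, NO_CHANNEL, cbdata);
        return -1;
    }
    src->chans[local].peer_num = peer;
    cb(0, local, cbdata);               /* callback gets the LOCAL number */
    return 0;
}

/* Example completion callback: remember the channel number we were given. */
static void record_channel(int status, int local_num, void *cbdata)
{
    *(int *)cbdata = (status == 0) ? local_num : NO_CHANNEL;
}
```

A caller would initialize two `endpoint_t` values with `endpoint_init`, then call `open_channel(&a, &b, &attrs, record_channel, &chan)`. The point of the sketch is that each side keeps its own channel number and only learns the peer's number through the handshake.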
Send Channel: rml_send_channel is called with the channel number (instead of the destination process name used by rml_send_nb); the remaining send parameters are similar to those of the existing rml_send API. The RML retrieves the RML channel object corresponding to the channel number and associates it with the send request object. The QoS is then called to prepare for the send: the respective QoS component performs the required bookkeeping and stores that information in the QoS channel object associated with the send request. The required channel information is added to the message, which is then forwarded to the OOB for further send processing. The send completion path is also intercepted by the QoS: the QoS component determines whether the send request can be completed immediately or must wait until some QoS-specific action has occurred. In the case of the ACK QoS, the send request is completed only after an ACK has been received from the destination.
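As a rough illustration of the deferred completion described for the ACK QoS, the sketch below tracks outstanding sends by sequence number and fires the completion callback only when the matching ACK arrives. All names (`ack_qos_channel_t`, `ack_qos_prep_send`, `ack_qos_can_complete_on_send`, `ack_qos_handle_ack`, `count_completion`) and the fixed-size pending table are assumptions made for this example; the actual component's bookkeeping is not shown in this description.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PENDING 4

typedef void (*send_cb_t)(int status, uint32_t seq, void *cbdata);

/* One outstanding send that is waiting for its ACK. */
typedef struct {
    int active;
    uint32_t seq;
    send_cb_t cb;
    void *cbdata;
} pending_send_t;

/* Hypothetical ACK QoS channel object: a sequence counter plus pending sends. */
typedef struct {
    uint32_t next_seq;
    pending_send_t pending[MAX_PENDING];
} ack_qos_channel_t;

/* Channel info added to the outgoing message (illustrative layout). */
typedef struct {
    int channel_num;   /* the PEER's channel number */
    uint32_t seq;
} channel_hdr_t;

static void ack_qos_init(ack_qos_channel_t *ch)
{
    ch->next_seq = 0;
    for (int i = 0; i < MAX_PENDING; i++) {
        ch->pending[i].active = 0;
    }
}

/* Prep step: record the send and fill in the header. Returns -1 if full. */
static int ack_qos_prep_send(ack_qos_channel_t *ch, int peer_channel_num,
                             send_cb_t cb, void *cbdata, channel_hdr_t *hdr)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        pending_send_t *p = &ch->pending[i];
        if (!p->active) {
            p->active = 1;
            p->seq = ch->next_seq++;
            p->cb = cb;
            p->cbdata = cbdata;
            hdr->channel_num = peer_channel_num;
            hdr->seq = p->seq;
            return 0;
        }
    }
    return -1;
}

/* Completion-path hook: 0 means "hold the request, an ACK is still due". */
static int ack_qos_can_complete_on_send(const ack_qos_channel_t *ch, uint32_t seq)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (ch->pending[i].active && ch->pending[i].seq == seq) {
            return 0;
        }
    }
    return 1;
}

/* ACK arrival: complete the matching send. Returns -1 for an unknown seq. */
static int ack_qos_handle_ack(ack_qos_channel_t *ch, uint32_t seq)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        pending_send_t *p = &ch->pending[i];
        if (p->active && p->seq == seq) {
            p->active = 0;
            p->cb(0, seq, p->cbdata);
            return 0;
        }
    }
    return -1;
}

/* Example callback: count successful completions. */
static void count_completion(int status, uint32_t seq, void *cbdata)
{
    (void)seq;
    if (status == 0) {
        (*(int *)cbdata)++;
    }
}
```

In this model the transport finishing its write does not complete the request; only `ack_qos_handle_ack` does. A QoS type with no acknowledgement requirement could instead return 1 from the completion hook and complete immediately.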
Recv Channel: There is no rml_recv_channel API, because the receiving process cannot enforce QoS from its end. However, the RML and QoS components on the receiving process must perform the required post-processing for a message received on a channel. The RML retrieves the channel object using the channel number received in the request header, then calls the QoS to do the required processing on the received request. The QoS component updates its bookkeeping and performs any required operation, such as sending an ACK back to the sender. The recv request is then returned to the RML for further processing.
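The receive-side post-processing can be pictured with the small sketch below: look up the channel from the number in the header, update the bookkeeping, and produce an ACK addressed to the sender's channel. The names (`recv_channel_t`, `channel_hdr_t`, `ack_msg_t`, `lookup_channel`, `ack_qos_recv`, `process_channel_msg`) are hypothetical, and acknowledging every message individually is an assumption of this example rather than something the description specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 8

/* Receiver-side channel state (illustrative). */
typedef struct {
    int in_use;
    int peer_num;            /* sender's channel number, learned at open time */
    uint32_t msgs_received;  /* bookkeeping kept by the QoS component */
    uint32_t last_seq;
} recv_channel_t;

/* Channel info carried in the request header (illustrative layout). */
typedef struct {
    int channel_num;         /* receiver's channel number */
    uint32_t seq;
} channel_hdr_t;

/* ACK to be sent back to the sender. */
typedef struct {
    int channel_num;         /* sender's channel number */
    uint32_t seq;
} ack_msg_t;

/* RML step: map the channel number in the header to a channel object. */
static recv_channel_t *lookup_channel(recv_channel_t *table, int num)
{
    if (num < 0 || num >= MAX_CHANNELS || !table[num].in_use) {
        return NULL;
    }
    return &table[num];
}

/* QoS step: update bookkeeping and build the ACK for this message. */
static void ack_qos_recv(recv_channel_t *ch, const channel_hdr_t *hdr,
                         ack_msg_t *ack)
{
    ch->msgs_received++;
    ch->last_seq = hdr->seq;
    ack->channel_num = ch->peer_num;
    ack->seq = hdr->seq;
}

/* Full receive path: 0 if the message may proceed, -1 for an unknown channel. */
static int process_channel_msg(recv_channel_t *table, const channel_hdr_t *hdr,
                               ack_msg_t *ack)
{
    recv_channel_t *ch = lookup_channel(table, hdr->channel_num);
    if (ch == NULL) {
        return -1;
    }
    ack_qos_recv(ch, hdr, ack);
    return 0;  /* the caller would now hand the message on for normal delivery */
}
```

Note that the header carries the receiver's channel number while the ACK carries the sender's; this is why each side stores the peer's number during the open handshake.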
Close Channel: The sending process is expected to close its channel to the peer after sending all of its messages. Closing the channel sends a close request to the destination process, and the RML and QoS channel objects are released at both the sender's and the receiver's end once the handshake completes.
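A minimal sketch of the close handshake, under the same kind of assumptions as any toy model: both endpoints are in one process, the request and reply are plain function calls, and all names (`endpoint_t`, `close_channel`, `handle_close_request`) are hypothetical. It shows only the ordering: the destination releases its objects when it handles the request, and the sender releases its own after the reply.

```c
#include <assert.h>

#define MAX_CHANNELS 8

/* One entry stands in for the RML channel object plus its QoS channel object. */
typedef struct {
    int in_use;
    int peer_num;   /* channel number on the other side */
} channel_t;

typedef struct {
    channel_t chans[MAX_CHANNELS];
} endpoint_t;

static void endpoint_init(endpoint_t *ep)
{
    for (int i = 0; i < MAX_CHANNELS; i++) {
        ep->chans[i].in_use = 0;
        ep->chans[i].peer_num = -1;
    }
}

static int valid_channel(const endpoint_t *ep, int num)
{
    return num >= 0 && num < MAX_CHANNELS && ep->chans[num].in_use;
}

/* Destination side: release our objects and acknowledge the close (0 = ok). */
static int handle_close_request(endpoint_t *dst, int dst_num)
{
    if (!valid_channel(dst, dst_num)) {
        return -1;
    }
    dst->chans[dst_num].in_use = 0;
    return 0;
}

/* Sender side: ask the peer to close, then release the local objects. */
static int close_channel(endpoint_t *src, endpoint_t *dst, int local_num)
{
    if (!valid_channel(src, local_num)) {
        return -1;
    }
    int rc = handle_close_request(dst, src->chans[local_num].peer_num);
    if (rc != 0) {
        return rc;   /* keep the local channel if the handshake failed */
    }
    src->chans[local_num].in_use = 0;
    return 0;
}
```

Closing the same channel twice fails in this model because the first close already released the sender's entry. How the real code treats a repeated or failed close is not stated in this description.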
Major Code Changes: