Add new rml channel APIs and qos framework #563
…s if the degree" This reverts commit 9c788ff.
… degree of the topology is higher than the communicator size. It is possible to have a topology degree higher than the size of the communicator, for example a periodic Cartesian communicator on MPI_COMM_SELF. This leaves the neighborhood collectives with a request buffer that is too small. This commit introduces a semantic change: from now on, c_topo must be set before invoking coll_select.
…(...) * use accessors to retrieve topo info
…number of bytes remaining to be output or else we will output duplicate bytes when next we are able to write.
@annu13 could you please rebase this PR, and then squash your commits into a single one? Makes the history cleaner.
@annu13 also, I see that there are a fair number of files being touched here that just have whitespace deletions and/or some odd changes that have nothing to do with this PR. Can you please revert those so we only get the changes relative to this PR here? You can submit the whitespace changes separately.
This all looks like whitespace changes in ompi even though this PR is supposed to be for orte. I'd prefer to see this type of change moved to a separate PR.
Refer to this link for build results (access rights to CI server needed): Build Log Test FAILed.
undoing whitespace auto insertions
Refer to this link for build results (access rights to CI server needed): Build Log Test FAILed.
Replaced by PR #564
Remove the orte/qos framework and associated changes that should not …
Defect 231781 - Pull in 3 orte commits, seems to fix orted setup race condition.
WHAT: Add new RML channel APIs (open, send and close) and a new QoS framework with an ACK QoS component in ORTE.
WHY: To provide a way to specify end-to-end QoS requirements and to apply them when sending RML messages.
WHERE: Various files in RML, OOB and QoS. Please refer to the list of changed files.
IMPACT: No impact on code using existing RML APIs. The QoS framework is only active when the new RML channel APIs are used.
More Details:
Intended Use:
rml_open_channel: The sender specifies the desired QoS requirements for the messages it wants to send to a peer by opening a channel to that peer. The desired QoS is specified as an attribute list. A no-op QoS channel can also be opened if there are no QoS requirements. On success, a channel number (reference) is returned to the sender in the completion callback.
rml_send_channel_nb: The sender sends messages to the peer by providing the channel number corresponding to the desired QoS.
rml_close_channel: When the sender is done sending all messages on the channel, it should close the channel to release the resources held for bookkeeping.
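A minimal sketch of the intended open/send/close sequence from the sender's side. The functions sketch_open_channel, sketch_send_channel_nb and sketch_close_channel, the qos_attrs_t struct and the fixed-size channel table are hypothetical stand-ins, not the ORTE API. The completion callback is invoked synchronously here, whereas the real open completes only after the peer has replied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RML channel calls; the real ORTE
 * names and signatures may differ. */
typedef enum { QOS_NOOP = 0, QOS_ACK = 1 } qos_type_t;

typedef struct {
    qos_type_t type;   /* which QoS component should handle the channel */
    int window;        /* example attribute, e.g. an ACK window size */
} qos_attrs_t;

typedef void (*open_cb_fn)(int status, int channel_num, void *cbdata);

#define MAX_CHANNELS 8
static int channel_open[MAX_CHANNELS];   /* 1 if the slot is in use */
static int messages_sent[MAX_CHANNELS];  /* per-channel send count */

/* Open a channel to `peer`; the channel number is reported via `cb`.
 * Called synchronously here; the real open waits for the peer's reply. */
static void sketch_open_channel(int peer, const qos_attrs_t *attrs,
                                open_cb_fn cb, void *cbdata) {
    assert(cb != NULL);
    (void)peer;
    (void)attrs;
    for (int i = 0; i < MAX_CHANNELS; i++) {
        if (!channel_open[i]) {
            channel_open[i] = 1;
            messages_sent[i] = 0;
            cb(0, i, cbdata);
            return;
        }
    }
    cb(-1, -1, cbdata);  /* no free channel */
}

/* Send on an open channel instead of naming the destination process. */
static int sketch_send_channel_nb(int channel_num, const void *buf, size_t len) {
    if (channel_num < 0 || channel_num >= MAX_CHANNELS || !channel_open[channel_num])
        return -1;
    (void)buf;
    (void)len;
    messages_sent[channel_num]++;
    return 0;
}

/* Close the channel and release its bookkeeping slot. */
static int sketch_close_channel(int channel_num) {
    if (channel_num < 0 || channel_num >= MAX_CHANNELS || !channel_open[channel_num])
        return -1;
    channel_open[channel_num] = 0;
    return 0;
}

/* Completion callback: store the channel number (or -1 on failure). */
static void on_open(int status, int channel_num, void *cbdata) {
    *(int *)cbdata = (status == 0) ? channel_num : -1;
}

/* Intended use: open with the desired QoS, send n messages, then close. */
static int sender_workflow(int peer, int nmsgs) {
    qos_attrs_t attrs = { QOS_ACK, 4 };
    int channel = -1;
    sketch_open_channel(peer, &attrs, on_open, &channel);
    if (channel < 0)
        return -1;
    static const char msg[] = "hello";
    int sent = 0;
    for (int i = 0; i < nmsgs; i++)
        if (sketch_send_channel_nb(channel, msg, sizeof msg) == 0)
            sent++;
    sketch_close_channel(channel);
    return sent;
}
```

After the close, any further send on that channel number is rejected, which is the resource-release behaviour the close call is meant to provide.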
Theory of Operation:
Open Channel: The sender creates a channel to the destination by calling open_channel with a list of QoS attributes describing the desired QoS. The RML layer creates an RML channel object and calls the QoS framework to create a QoS channel object matching the specified QoS. The QoS framework calls the QoS component matching the requested QoS type. The selected QoS component creates a QoS channel object with the requested attributes and returns it to the RML. The RML associates the QoS channel object with the RML channel object and sends an open_channel request with the requested QoS attributes to the destination. The destination processes the open_channel request, creates an RML channel object and the corresponding QoS channel object at its end, and replies to the sender with its RML channel number (reference). The sender processes the peer's response, stores the peer channel number, and calls the completion callback with the local channel number.
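The open handshake can be modelled in-process as below. This is an illustrative sketch only: proc_t, rml_channel_t, qos_channel_t, the qos_components dispatch table and handle_open_request are hypothetical names, and the request and reply between the two processes are replaced by a direct function call. It shows the framework selecting a component by QoS type, the destination replying with its own channel number, and the sender reporting its local channel number through the callback.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-process model of the open_channel handshake. */
typedef enum { QOS_NOOP = 0, QOS_ACK = 1, QOS_MAX } qos_type_t;

typedef struct {
    qos_type_t type;
    int window;                /* only meaningful for ACK QoS */
} qos_channel_t;

typedef struct {
    int local_num;             /* this side's channel number */
    int peer_num;              /* peer's channel number, -1 until known */
    qos_channel_t *qos;        /* created by the selected QoS component */
} rml_channel_t;

/* Each QoS component supplies a create hook; the framework picks by type. */
typedef qos_channel_t *(*qos_create_fn)(int window);

static qos_channel_t *noop_create(int window) {
    (void)window;
    qos_channel_t *q = malloc(sizeof *q);
    if (q) { q->type = QOS_NOOP; q->window = 0; }
    return q;
}

static qos_channel_t *ack_create(int window) {
    qos_channel_t *q = malloc(sizeof *q);
    if (q) { q->type = QOS_ACK; q->window = window; }
    return q;
}

static const qos_create_fn qos_components[QOS_MAX] = { noop_create, ack_create };

#define MAX_CH 4
typedef struct {
    rml_channel_t ch[MAX_CH];
    int used[MAX_CH];
} proc_t;

/* RML creates a channel object and asks the QoS framework for its QoS object. */
static rml_channel_t *channel_create(proc_t *p, qos_type_t type, int window) {
    if ((unsigned)type >= QOS_MAX)
        return NULL;
    for (int i = 0; i < MAX_CH; i++) {
        if (!p->used[i]) {
            qos_channel_t *q = qos_components[type](window);
            if (!q)
                return NULL;
            p->used[i] = 1;
            p->ch[i] = (rml_channel_t){ i, -1, q };
            return &p->ch[i];
        }
    }
    return NULL;
}

/* Destination: create matching objects, reply with the local channel number. */
static int handle_open_request(proc_t *dst, int sender_num, qos_type_t type, int window) {
    rml_channel_t *c = channel_create(dst, type, window);
    if (!c)
        return -1;
    c->peer_num = sender_num;
    return c->local_num;
}

typedef void (*open_cb_fn)(int status, int local_num, void *cbdata);

/* Sender: create local objects, "send" the request, process the reply. */
static void open_channel(proc_t *src, proc_t *dst, qos_type_t type, int window,
                         open_cb_fn cb, void *cbdata) {
    assert(cb != NULL);
    rml_channel_t *c = channel_create(src, type, window);
    if (!c) {
        cb(-1, -1, cbdata);
        return;
    }
    int reply = handle_open_request(dst, c->local_num, type, window);
    if (reply < 0) {               /* peer could not open: undo local state */
        free(c->qos);
        src->used[c->local_num] = 0;
        cb(-1, -1, cbdata);
        return;
    }
    c->peer_num = reply;           /* remember the peer's channel number */
    cb(0, c->local_num, cbdata);   /* report the local channel number */
}

/* Example completion callback: store the local channel number. */
static void store_channel(int status, int local_num, void *cbdata) {
    *(int *)cbdata = (status == 0) ? local_num : -1;
}
```

Keeping separate local and peer channel numbers lets each side index its own table directly; the peer number is only needed when building outgoing headers.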
Send Channel: rml_send_channel is called with the channel number (instead of the destination process name used by rml_send_nb) and the remaining send parameters, as in the existing rml_send API. The RML retrieves the RML channel object corresponding to the channel number and associates it with the send request object. The QoS framework is then called to prepare the send: the respective QoS component performs the required bookkeeping and stores that information in the QoS channel object associated with the send request. The required channel information is added to the message, which is then forwarded to OOB for further send processing. The QoS also intercepts the send completion path: the QoS component determines whether the send request can be completed immediately or must wait until some QoS-specific action has occurred. In the case of ACK QoS, the send request is completed only after an ACK is received from the destination.
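A sketch of the send path, assuming a per-channel sequence number as the QoS bookkeeping and a header carrying the peer's channel number. channel_t, chan_hdr_t, send_req_t, oob_transmit and qos_ack_received are hypothetical. What it illustrates is that completion is decided by the QoS hook: a no-op channel completes as soon as OOB finishes, while an ACK channel stays pending until the ACK arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the channel send path. */
typedef enum { QOS_NOOP, QOS_ACK } qos_type_t;

typedef struct {
    uint32_t peer_channel;   /* destination's channel number */
    uint32_t seq;            /* per-channel sequence number set by QoS */
} chan_hdr_t;

typedef struct {
    qos_type_t type;
    uint32_t peer_channel;
    uint32_t next_seq;       /* QoS bookkeeping: next sequence number */
    uint32_t unacked;        /* sends still waiting for an ACK */
} channel_t;

typedef struct {
    chan_hdr_t hdr;          /* channel info added to the message */
    bool completed;          /* reported complete to the caller? */
} send_req_t;

#define NCHAN 4
static channel_t channels[NCHAN];

/* Stand-in for forwarding the message to OOB. */
static void oob_transmit(const send_req_t *req) { (void)req; }

/* QoS hook on the OOB send-completion path. */
static void qos_send_complete(const channel_t *ch, send_req_t *req) {
    if (ch->type == QOS_NOOP)
        req->completed = true;   /* nothing to wait for */
    /* ACK QoS: stays pending until qos_ack_received() */
}

static int send_channel_nb(int channel_num, send_req_t *req) {
    if (channel_num < 0 || channel_num >= NCHAN)
        return -1;
    channel_t *ch = &channels[channel_num];    /* look up the channel object */
    req->hdr.peer_channel = ch->peer_channel;  /* add channel info to the header */
    req->hdr.seq = ch->next_seq++;             /* QoS prep: bookkeeping */
    req->completed = false;
    if (ch->type == QOS_ACK)
        ch->unacked++;
    oob_transmit(req);
    qos_send_complete(ch, req);                /* OOB reports the write done */
    return 0;
}

/* ACK QoS: the destination acknowledged `req`; complete it now. */
static void qos_ack_received(channel_t *ch, send_req_t *req) {
    assert(ch->unacked > 0);
    ch->unacked--;
    req->completed = true;
}
```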
Recv Channel: There is no rml_recv_channel API, as the receiving process cannot enforce QoS from its end. However, the RML and QoS components in the receiving process must perform the required post-processing for a message received on a channel. The RML retrieves the channel object using the channel number received in the request header, then calls the QoS to do the required processing on the received request. The QoS component updates the bookkeeping info and performs any required operation, such as sending an ACK back to the sender. The recv request is then returned to the RML for further processing.
Close Channel: The sending process is expected to close a channel to the peer after sending all its messages. Closing sends a close request to the destination process, and following this handshake the RML and QoS channel objects are released on both the sender's and the receiver's end.
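A sketch of the close handshake, with hypothetical proc_t and channel_t types and a direct function call standing in for the close request and its reply. In this model the sender releases its own RML and QoS objects only after the destination has released its side, so both ends are cleaned up once the exchange completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the close_channel handshake. */
typedef struct { int window; } qos_channel_t;   /* placeholder QoS state */

typedef struct {
    bool open;
    int peer_num;             /* peer's channel number */
    qos_channel_t *qos;       /* QoS channel object owned by this side */
} channel_t;

#define NCHAN 4
typedef struct { channel_t ch[NCHAN]; } proc_t;

/* Release the RML and QoS channel objects on one side. */
static void channel_release(channel_t *c) {
    assert(c->open);
    free(c->qos);
    c->qos = NULL;
    c->open = false;
    c->peer_num = -1;
}

/* Destination: handle a close request that names its channel number. */
static int handle_close_request(proc_t *dst, int channel_num) {
    if (channel_num < 0 || channel_num >= NCHAN || !dst->ch[channel_num].open)
        return -1;
    channel_release(&dst->ch[channel_num]);
    return 0;                 /* close reply */
}

/* Sender: send the close request, then release local objects on the reply. */
static int close_channel(proc_t *src, proc_t *dst, int local_num) {
    if (local_num < 0 || local_num >= NCHAN || !src->ch[local_num].open)
        return -1;
    channel_t *c = &src->ch[local_num];
    if (handle_close_request(dst, c->peer_num) != 0)
        return -1;
    channel_release(c);
    return 0;
}
```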
Major Code Changes:
QoS Framework: A new MCA framework for the QoS feature was added. The framework is called by RML to process the rml_channel_xxx requests.
ACK QoS Component: The ACK QoS component provides windowed ACK functionality. It also supports retry of lost messages (out-of-order ACK and ACK timeout).
Noop QoS Component: A no-op QoS component intended for bookkeeping and as a placeholder.
RML: Three new RML channel APIs, and minor modifications to the send processing path, send completion and recv handling. A new orte_rml_channel object was added, along with additional fields in the send and recv objects to carry channel info.
OOB: Minor modification to send completion, and addition of channel-specific data to message headers.