Ralph Castain edited this page Apr 13, 2016 · 1 revision

PMIx Event Notification Meeting

Date

April 7, 2016

Attendees:

Ralph, David, Aurelien, Josh H., and Annu.

Minutes:

We continued last week’s discussion regarding prioritization of the event callback stack. The primary concern at this time is to fully flesh out the various use cases so that we can ensure the eventual notification APIs cover them, thus avoiding later modifications to those APIs.

Conclusions:

  • reaffirmed last week’s decision that all callbacks registered for a given event will be called, i.e., there will be no “stop” flag to indicate that no further callbacks are to be executed

  • each callback will be non-blocking. The PMIx library will call the event callback and pass it a “completion callback” that the event callback is to execute when it is done. We request that the event callback “thread shift” out of the PMIx progress thread to avoid deadlocks should it need to access the PMIx library. The completion callback will allow the event callback to return an array of pmix_info_t structs describing what it did; these will be concatenated and passed to subsequent callbacks, thus creating a “chain” that describes the overall response

  • the initial event notification will contain a pmix_info_t with a flag value indicating that the event has not been “resolved”. Event callbacks can check the flag to see if a prior event callback “resolved” the indicated problem. If so, no further action should be taken to correct the problem; however, the callback stack will still be traversed to give each callback the opportunity to do any required cleanup or bookkeeping for the event

  • event registrations can be either appended (default) or prepended (indicated by a pmix_info_t key) to the callback stack for a given event. At this time, we do not believe finer-grained prioritization is warranted. However, pmix_info_t keys can provide such capabilities without changing the API should we later find a need for them.

  • a PMIX_ABORT event will be defined so that an event callback can request that the process abort. Should an event callback decide that an abort is necessary, it can signal this by calling PMIx_Notify_event with the PMIX_ABORT status. Once the current event callback stack has completed, the new PMIX_ABORT event will be triggered and the callback stack registered to receive that event will be traversed. This allows someone to register a callback that will actually call PMIx_Abort to abort the process.
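To make the agreed semantics concrete, here is a minimal single-threaded sketch in C. It is a hypothetical model, not the PMIx API. The following names are all invented for illustration: `info_t`, `register_handler`, `notify_event`, `run_demo`, the `EVT_PROC_FAILED`/`EVT_ABORT` codes, and the `"resolved"` key.

The sketch demonstrates the following behaviors:

- Handlers are appended or prepended per event.
- Every registered handler runs, since there is no stop flag.
- Each handler finishes through a completion callback, and its results are concatenated into the chain seen by later handlers.
- The initial notification carries an unresolved flag that a handler can mark as resolved.
- An abort requested mid-stack is deferred until the current stack completes.

The thread shift out of the progress thread is not modeled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the callback-stack rules; NOT the PMIx API. */
enum { EVT_PROC_FAILED = 1, EVT_ABORT = 2 };  /* made-up event codes */
enum { MAX_HANDLERS = 8, MAX_INFO = 16, MAX_PENDING = 4 };
typedef struct { char key[16]; int value; } info_t;  /* stands in for pmix_info_t */
typedef void (*complete_fn)(const info_t *res, size_t nres, void *cbdata);
typedef void (*handler_fn)(int status, const info_t *chain, size_t nchain,
                           complete_fn done, void *cbdata);
typedef struct { int status; handler_fn fn; } reg_t;
typedef struct {
    int status;
    size_t next;             /* next slot of regs[] to examine */
    info_t chain[MAX_INFO];  /* results concatenated from earlier handlers */
    size_t nchain;
} traversal_t;
static reg_t regs[MAX_HANDLERS];
static size_t nregs, npending;
static int pending[MAX_PENDING];  /* events raised while a stack is running */
static int busy;                  /* nonzero while a callback stack is traversed */
static traversal_t cur;
static void notify_event(int status);
static void advance(void);
/* Appended by default; prepended on request (a pmix_info_t key in the minutes). */
static void register_handler(int status, handler_fn fn, int prepend) {
    assert(nregs < MAX_HANDLERS);
    if (prepend) memmove(&regs[1], &regs[0], nregs * sizeof regs[0]);
    regs[prepend ? 0 : nregs] = (reg_t){status, fn};
    nregs++;
}
/* True if any earlier handler reported resolved = 1. */
static int is_resolved(const info_t *chain, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(chain[i].key, "resolved") == 0 && chain[i].value) return 1;
    return 0;
}
/* Completion callback: append this handler's results, then continue. */
static void on_complete(const info_t *res, size_t nres, void *cbdata) {
    traversal_t *t = cbdata;
    for (size_t i = 0; i < nres && t->nchain < MAX_INFO; i++)
        t->chain[t->nchain++] = res[i];
    advance();
}
/* Call every handler registered for the event; there is no "stop" flag. */
static void advance(void) {
    while (cur.next < nregs && regs[cur.next].status != cur.status) cur.next++;
    if (cur.next < nregs) {  /* the handler resumes us through on_complete */
        regs[cur.next++].fn(cur.status, cur.chain, cur.nchain, on_complete, &cur);
        return;
    }
    busy = 0;  /* stack finished: start the oldest deferred event, if any */
    if (npending > 0) {
        int next = pending[0];
        memmove(&pending[0], &pending[1], --npending * sizeof pending[0]);
        notify_event(next);
    }
}
static void notify_event(int status) {
    if (busy) {  /* e.g. an abort requested mid-stack waits for the stack to end */
        assert(npending < MAX_PENDING);
        pending[npending++] = status;
        return;
    }
    busy = 1;
    cur = (traversal_t){ .status = status, .nchain = 1 };
    strcpy(cur.chain[0].key, "resolved");  /* initial flag, value 0: unresolved */
    advance();
}

/* Demo handlers. Real handlers should thread-shift off the PMIx progress
 * thread before working; here `done` is simply called synchronously. */
static int repair_ok;
static char trace[128];
static void h_repair(int st, const info_t *c, size_t n, complete_fn done, void *cb) {
    info_t r = {"resolved", repair_ok};
    strcat(trace, "repair ");
    done(&r, 1, cb);
}
static void h_escalate(int st, const info_t *c, size_t n, complete_fn done, void *cb) {
    int fixed = is_resolved(c, n);  /* still called after a fix, for bookkeeping */
    strcat(trace, fixed ? "noop " : "escalate ");
    if (!fixed) notify_event(EVT_ABORT);  /* deferred until this stack completes */
    done(NULL, 0, cb);
}
static void h_abort(int st, const info_t *c, size_t n, complete_fn done, void *cb) {
    strcat(trace, "abort ");  /* a real handler would call PMIx_Abort here */
    done(NULL, 0, cb);
}
const char *run_demo(int repair_succeeds) {
    nregs = npending = 0; busy = 0; trace[0] = '\0';
    repair_ok = repair_succeeds;
    register_handler(EVT_PROC_FAILED, h_escalate, 0);  /* appended */
    register_handler(EVT_ABORT, h_abort, 0);
    register_handler(EVT_PROC_FAILED, h_repair, 1);    /* prepended: runs first */
    notify_event(EVT_PROC_FAILED);
    return trace;
}
```

Under these assumptions, `run_demo(1)` returns `"repair noop "`. In that run the prepended repair handler resolves the problem, yet the escalation handler still runs for bookkeeping. `run_demo(0)` returns `"repair escalate abort "`. Queuing events raised mid-stack is one simple way to honor the rule that the PMIX_ABORT stack starts only after the current stack finishes; the actual library may well do this differently.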

Ralph will amend the current draft RFC document to reflect these changes and update the code accordingly. As always, participants in the call should feel free to offer corrections and to add anything I missed.

Comments/suggestions are welcome!
