Provide a mechanism for process synchronization in time #829
Comments
Note that "simultaneous" is possible only in Newtonian spacetime; in this universe it is not possible in general. Of course, a single system that defines a clear frame of reference can define "simultaneous" with respect to that rest frame. But an MPI system made of communicating processes without a clear frame of reference (e.g., satellite networks) has no precise definition of what it means to be "at the same time".
I am concerned that this API creates a false promise to the user. In short, because MPI doesn't control OS noise or scheduling, it is not in a position to guarantee any sort of harmony between processes. I added one comment line to the snippet below:

```c
/* check if we are within the time epoch */
if (MPITS_Clocksync_get_time(&cs) > barrier_stamp) {
    *outflag = 0;
    data->sync_failed = 1;
} else {
    *outflag = 1;
    data->sync_failed = 0;
}
/* wait for the epoch to end */
while (MPITS_Clocksync_get_time(&cs) <= barrier_stamp)
    ;
/* OS noise goes crazy here */
return MPI_SUCCESS;
```
Moreover, some networks already have a similar capability built in; this could be a way to expose it to users.
and
I agree that there is no guaranteed perfect harmonization; implementations can only make a best effort. MPI provides abstractions for operations that are commonly used by applications and/or require some engineering to do efficiently. Yes, applications could implement their own broadcast, but there are a handful of different ways to do that, so MPI implementations take on the burden of implementing them and hide the complexities behind a well-defined API. Do implementations always select the fastest algorithm? Probably not, but they make a best effort to provide the best performance.

The quality of process harmonization (i.e., the resulting skew between processes) is a performance property. It can range from what a barrier provides today (many microseconds) down to less than a microsecond if the network provides clock-synchronization capabilities, and probably close to a microsecond with proper software clock synchronization. The application is able to describe its intentions to the MPI implementation: either process synchronization without regard to timing (…)
Problem
MPI distinguishes between synchronizing and non-synchronizing collective operations. MPI_Barrier is a commonly used synchronization primitive in codes that want to synchronize processes before entering a subsequent program region. It is used in most of the commonly used benchmark suites (OSU, IMB, mpiBench) in an attempt to ensure a uniform start of the collective operation under test. However, the MPI standard does not provide any time-synchronization guarantees for MPI_Barrier, leading to arbitrary arrival patterns into the collective under test and thus skewing the results of the benchmark. This issue has been discussed in the literature, but it is likely that the complexities of proper time synchronization have kept benchmark implementors from ensuring it.

Proposal
MPI should provide a mechanism for time synchronization of processes. This differs from exposing clock-synchronization functionality in that MPI will not directly expose a synchronized clock. Instead, MPI will provide a procedure from which processes return at the "same" physical time. Due to the nature of distributed systems, the "same time" can only be an approximation, so implementations will have to make a best effort. We call this mechanism harmonization and the proposed function MPI_Harmonize.

MPI is in a unique place as it has knowledge of the underlying hardware, including the node-local clocks and the network. For example, implementations can employ globally synchronized hardware clocks if available.
Changes to the Text
Introduce MPI_Harmonize that takes the following arguments:

A call to MPI_Harmonize acts like a barrier on comm with the extended requirement that processes return from the call at the same time, based on an internal synchronized virtual clock (without synchronizing the system clock itself). Applications will be able to use this functionality to harmonize process execution and approximate a uniform arrival pattern into program regions.

The flag parameter will be set to 1 upon return if the local process found that its execution was successfully harmonized with the other processes, and 0 otherwise. The value does not represent a global state and thus might differ between processes. It is up to the application to ensure that all processes were harmonized by checking the returned flag at a convenient point in time (ideally without introducing additional skew between processes before entering the region under test). Harmonization may fail spuriously, e.g., due to OS noise, network jitter, and clock drift, so applications must be able to handle these cases.

Impact on Implementations
Implementations should provide an implementation of MPI_Harmonize, potentially based on the reference implementation linked below.

Impact on Users
Applications are provided with a mechanism for approximating uniform execution of processes. Users of benchmarks can rely on benchmark results that are not skewed by the underlying barrier implementation.
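For illustration, a benchmark might use the proposed call as follows. This is a sketch against the semantics described above, not compilable code: MPI_Harmonize is the proposed (not yet existing) procedure, and its assumed signature `MPI_Harmonize(MPI_Comm comm, int *flag)` is inferred from the comm and flag parameters mentioned in this proposal. The flag is checked only after the measurement, as recommended, to avoid introducing skew before the region under test:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    char buf[1024] = {0};
    int flag = 0;
    double elapsed = 0.0;

    do {
        /* proposed call: return at (approximately) the same time */
        MPI_Harmonize(MPI_COMM_WORLD, &flag);

        double start = MPI_Wtime();
        /* region under test, entered immediately after harmonization */
        MPI_Bcast(buf, sizeof(buf), MPI_CHAR, 0, MPI_COMM_WORLD);
        elapsed = MPI_Wtime() - start;

        /* flag is local state only: agree globally whether any
         * process missed the epoch, and retry the measurement if so */
        MPI_Allreduce(MPI_IN_PLACE, &flag, 1, MPI_INT, MPI_MIN,
                      MPI_COMM_WORLD);
    } while (!flag); /* harmonization may fail spuriously */

    printf("bcast took %g s\n", elapsed);
    MPI_Finalize();
    return 0;
}
```

The retry loop and the MPI_Allreduce-based agreement are one possible policy chosen for this sketch; the proposal itself leaves the handling of failed harmonization to the application.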
References and Pull Requests
Reference implementation: https://github.com/devreal/mpix-harmonize
EuroMPI'23 paper: https://dl.acm.org/doi/10.1145/3615318.3615325
PR: https://github.com/mpi-forum/mpi-standard/pull/965
This work was a collaboration with Sascha Hunold (UWien), who does not regularly attend the Forum meetings but should be mentioned here.