What features do users need from an MPI C++ interface? #288

Open
jeffhammond opened this issue Apr 24, 2020 · 61 comments
Labels
mpi-5 (For inclusion in the MPI 5.0 standard), needs guidance (Needs guidance on what chapter committees need to do)

Comments

@jeffhammond
Member

This is a meta-issue, which I am creating to capture user feedback on MPI C++ bindings.

I am moving this over from https://scicomp.stackexchange.com/questions/7978/what-features-do-users-need-from-an-mpi-c-interface, which was extremely well-received despite not complying with the rules of StackExchange.

Original Prompt

The 3.0 version of the MPI standard formally deleted the C++ interface (it was previously deprecated). While implementations may still support it, features that are new in MPI-3 do not have a C++ interface defined in the MPI standard. See http://blogs.cisco.com/performance/the-mpi-c-bindings-what-happened-and-why/ for more information.

The motivation for removing the C++ interface from MPI was that it had no significant value over the C interface. There were very few differences other than "s/_/::/g" and many features that C++ users are accustomed to were not employed (e.g. automatic type determination via templates).

As someone who participates in the MPI Forum and works with a number of C++ projects that have implemented their own C++ interface to the MPI C functions, I would like to know what are the desirable features of a C++ interface to MPI. While I commit to nothing, I would be interested in seeing the implementation of a standalone MPI C++ interface that meets the needs of many users.

And yes, I am familiar with Boost::MPI but it only supports MPI-1 features and the serialization model would be extremely difficult to support for RMA.

One C++ interface to MPI that I like is that of Elemental's MPI wrapper, so perhaps people can provide some pros and cons w.r.t. that approach. In particular, I think MpiMap solves an essential problem.

@jeffhammond jeffhammond self-assigned this Apr 24, 2020
@jeffhammond
Member Author

Wolfgang Bangerth provided the following response (https://scicomp.stackexchange.com/a/7991/150):

Let me first answer why I think C++ interfaces to MPI have generally not been overly successful, having thought about the issue for a good long time when trying to decide whether we should just use the standard C bindings of MPI or build on something at a higher level:

When you look at real-world MPI codes (say, PETSc, or in my case deal.II), one finds that, maybe surprisingly, the number of MPI calls isn't actually very large. For example, in the 500k lines of deal.II, there are only ~100 MPI calls. A consequence of this is that the pain involved in using lower-level interfaces such as the MPI C bindings is not too large. Conversely, one would not gain all that much by using higher-level interfaces.

My second observation is that many systems have multiple MPI libraries installed (different MPI implementations, or different versions). This poses a significant difficulty if you want to use packages such as boost::mpi that don't just consist of header files: either there need to be multiple installations of this package as well, or one needs to build it as part of the project that uses boost::mpi (but that's a problem in itself again, given that boost uses its own build system, which is unlike anything else).

So I think all of this has conspired against the current crop of C++ interfaces to MPI: The old MPI C++ bindings didn't offer any advantage, and external packages had difficulties with the real world.

This all said, here's what I think would be the killer features I would like to have from a higher-level interface:

  • It should be generic. Having to specify the data type of a variable is decidedly not C++-like. Of course, it also leads to errors. Elemental's MpiMap class would already be a nice first step (though I can't figure out why the heck the MpiMap::type variable isn't static const, so that it can be accessed without creating an object).

  • It should have facilities for streaming arbitrary data types.

  • Operations that require an MPI_Op argument (e.g., reductions) should integrate nicely with C++'s std::function interface, so that it's easy to just pass a function pointer (or a lambda!) rather than having to clumsily register something.

boost::mpi actually satisfies all of these. I think if it were a header-only library, it'd be a lot more popular in practice. It would also help if it supported post-MPI 1.0 functions, but let's be honest: this covers most of what we need most of the time.
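To make the genericity point from the first bullet concrete, here is a minimal sketch of an MpiMap-style type-to-datatype mapping (illustrative only, not Elemental's actual code; a static member function is used because MPI_Datatype handles are not guaranteed to be constant expressions across implementations):

#include <mpi.h>
#include <complex>

template <typename T> struct MpiMap;  // left undefined for unsupported types

template <> struct MpiMap<int>    { static MPI_Datatype type() { return MPI_INT; } };
template <> struct MpiMap<float>  { static MPI_Datatype type() { return MPI_FLOAT; } };
template <> struct MpiMap<double> { static MPI_Datatype type() { return MPI_DOUBLE; } };
template <> struct MpiMap<std::complex<double>> {
  static MPI_Datatype type() { return MPI_CXX_DOUBLE_COMPLEX; }
};

// With such a trait, a generic wrapper no longer needs the caller to spell out
// the datatype:
template <typename T>
int typed_send(const T* buf, int count, int dest, int tag, MPI_Comm comm) {
  return MPI_Send(buf, count, MpiMap<T>::type(), dest, tag, comm);
}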

@jeffhammond
Member Author

jeffhammond commented Apr 24, 2020

@gnzlbg provided the following response (https://scicomp.stackexchange.com/a/14640/150):

My list in no particular order of preference. The interface should:

  • be header only, without any dependencies but <mpi.h>, and the standard library,
  • be generic and extensible,
  • be non-blocking only (if you want to block, then block explicitly, not by default),
  • allow continuation-based chaining of non-blocking operations,
  • support extensible and efficient serialization (Boost.Fusion like, such that it works with RMA),
  • have zero abstraction penalty (i.e. be at least as fast as the C interface),
  • be safe (the destructor of a non-ready future is called? -> std::terminate!),
  • have a strong DEBUG mode with tons of assertions,
  • be extremely type-safe (no more ints/void* for everything, heck I want tags to be types!),
  • work with lambdas (e.g. all reduce + lambda),
  • use exceptions consistently as error-reporting and error-handling mechanism (no more error codes! no more function output arguments!),
  • MPI-IO should offer a non-blocking I/O interface in the style of Boost.AFIO,
  • and just follow good modern C++ interface design practices (define regular types, non-member non-friend functions, play well with move semantics, support range operations, ...)

Extras:

  • allow me to choose the executor of the MPI environment, that is, which thread pool it uses. Right now you can have applications with a mix of OpenMP, MPI, CUDA, and TBB... all at the same time, where each runtime thinks it owns the environment and thus asks the operating system for threads every time it feels like it. Seriously?

  • use the STL (and Boost) naming convention. Why? Every C++ programmer knows it.

I want to write code like this:

    auto buffer = some_t{no_ranks};
    auto future = gather(comm, root(comm), my_offsets, buffer)
                  .then([&](){
                    /* when the gather is finished, this lambda will 
                       execute at the root node, and perform an expensive operation
                       there asynchronously (compute data required for load 
                       redistribution) whose result is broadcasted to the rest 
                       of the communicator */
                    return broadcast(comm, root(comm), buffer);
                  }).then([&]() {
                    /* when broadcast is finished, this lambda executes 
                       on all processes in the communicator, performing an expensive
                       operation asynchronously (redistribute the load, 
                       maybe using non-blocking point-to-point communication) */
                     return do_something_with(buffer);
                  }).then([&](auto result) {
                     /* finally perform a reduction on the result to check
                        everything went fine */
                     return all_reduce(comm, root(comm), result, 
                                      [](auto acc, auto v) { return acc && v; }); 
                  }).then([&](auto result) {
                      /* check the result at every process */
                      if (result) { return; /* we are done */ }
                      else {
                        root_only([](){ write_some_error_log(); });
                        throw some_exception;
                      }
                  });

    /* Here nothing has happened yet! */
 
    /* ... lots and lots of unrelated code that can execute concurrently 
       and overlaps with communication ... */

    /* When we now call future.get() we will block 
       on the whole chain (which might have finished by then!).
    */
    
    future.get();

Think how one could chain all these operations using MPI C's requests. You would have to test at multiple (or every single) intermediate steps, scattered through a whole lot of unrelated code, to see if you can advance your chain without blocking.
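For contrast, here is a minimal sketch (not part of the original answer) of what chaining just two dependent collectives looks like with plain MPI requests; every stage has to be polled explicitly before the next one can be posted:

#include <mpi.h>

void gather_then_bcast(const double* sendbuf, double* recvbuf, int n, int nranks,
                       int root, MPI_Comm comm) {
  MPI_Request req;
  int done = 0;

  MPI_Igather(sendbuf, n, MPI_DOUBLE, recvbuf, n, MPI_DOUBLE, root, comm, &req);
  while (!done) {
    // ... interleave unrelated work here to overlap with communication ...
    MPI_Test(&req, &done, MPI_STATUS_IGNORE);
  }

  // Only once the gather has completed can the dependent broadcast be posted,
  // and the same polling pattern repeats for every further stage of the chain.
  MPI_Ibcast(recvbuf, n * nranks, MPI_DOUBLE, root, comm, &req);
  MPI_Wait(&req, MPI_STATUS_IGNORE);
}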

@jeffhammond
Member Author

GradGuy provided the following response (https://scicomp.stackexchange.com/a/8009/150):

Personally, I don't really mind calling long C-style functions for the exact reason Wolfgang mentioned; there are really few places you need to call them and even then, they almost always get wrapped around by some higher-level code.

The only things that really bother me with C-style MPI are custom datatypes and, to a lesser degree, custom operations (because I use them less often). As for custom datatypes, I'd say that a good C++ interface should be able to support a generic and efficient way of handling them, most probably through serialization. This is of course the route that boost.mpi has taken, which, if you are careful, is a big time saver.

As for boost.mpi having extra dependencies (particularly boost.serialization, which itself is not header-only), I've recently come across a header-only C++ serialization library called cereal which seems promising; granted, it requires a C++11-compliant compiler. It might be worth looking into and using as a basis for something similar to boost.mpi.
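For illustration, a small sketch of what that cereal-based route could look like (the Particle type and pack helper are made up for this example; the archive calls are cereal's real API):

#include <cereal/archives/binary.hpp>
#include <cereal/types/vector.hpp>
#include <sstream>
#include <string>
#include <vector>

struct Particle {
  double x, y, z;
  std::vector<int> neighbors;

  // The user annotates a type once; a wrapper library can then pack it into a
  // contiguous buffer and send the bytes with MPI_BYTE.
  template <class Archive>
  void serialize(Archive& ar) { ar(x, y, z, neighbors); }
};

std::string pack(const Particle& p) {
  std::ostringstream os;
  {
    cereal::BinaryOutputArchive ar(os);
    ar(p);
  }
  return os.str();  // bytes suitable for MPI_Send(..., MPI_BYTE, ...)
}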

@jeffhammond
Member Author

Utkarsh Bhardwaj provided the following response (https://scicomp.stackexchange.com/a/25094/150):

The github project easyLambda provides a high level interface to MPI with C++14.

I think the project has similar goals, and it will give some idea of things that can be and are being done in this area using modern C++, guiding other efforts as well as easyLambda itself.

The initial benchmarks on performance and lines of code have shown promising results.


Following is a short description of features and interface it provides.

The interface is based on data-flow programming and functional list operations that provide inherent parallelism. The parallelism is expressed as a property of a task. The process allocation and data distribution for the task can be requested with a .prll() property. There are a good number of examples on the webpage and in the code repository, including LAMMPS molecular dynamics post-processing, an explicit finite difference solution to the heat equation, logistic regression, etc. As an example, the heat diffusion problem discussed in the article "HPC is dying..." can be expressed in ~20 lines of code.

I hope it is fine to give links rather than adding more details and example codes here.

Disclaimer: I am the author of the library. I believe I am not doing any harm in hoping to get constructive feedback on the current interface of easyLambda that might be advantageous to easyLambda and any other project that pursues similar goals.

@mhoemmen

Given how fast the C++ Standard is moving with respect to thread and task parallelism, coroutines, networking, and reflection, it seems premature to standardize a C++ MPI interface now. Why not let all these great libraries first build experience presenting a modern C++ interface to the latest MPI features? Why repeat the mistake of the '90s and rush to standardize? I would love for someone to modernize Boost.MPI, for example; I would be happy to help with that (at least to test changes).

If we want gather(...).then(...).then(...)...., then why not build on the C++ networking TS? If we worry about thread interactions, then why not wait on (or participate in) an executors-networking merger? I can guess some reasons why, but I would expect an MPI proposal to answer questions like that.

Regarding a header-only library: this sounds good if you're starting a new project, but some existing C++ projects that use MPI care a lot about build sizes and times. If we want to put something in the MPI Standard, I'd like to see some build experiments in real applications.

@mhoemmen

Wolfgang Bangerth wrote:

My second observation is that many systems have multiple MPI libraries installed (different MPI implementations, or different versions). This poses a significant difficulty if you want to use packages such as boost::mpi that don't just consist of header files: either there need to be multiple installations of this package as well, or one needs to build it as part of the project that uses boost::mpi (but that's a problem in itself again, given that boost uses its own build system, which is unlike anything else).

We've dealt with this issue of multiple MPI installations by writing an MPI (C binding) library that just calls through to an underlying MPI implementation. Our library dispatches to an underlying MPI implementation at run time via dlopen or the Windows equivalent (it works great on Windows). We don't expose any details of the underlying MPI implementation's ABI, so it's handy for things like Python bindings. Our library takes effort to maintain and incurs function call overhead, but it's been useful enough that we're thinking about open-sourcing it. If you're interested, please let me know.

@omor1
Member

omor1 commented Apr 27, 2020

We've dealt with this issue of multiple MPI installations by writing an MPI (C binding) library that just calls through to an underlying MPI implementation. Our library dispatches to an underlying MPI implementation at run time via dlopen or the Windows equivalent (it works great on Windows). We don't expose any details of the underlying MPI implementation's ABI, so it's handy for things like Python bindings.

Unrelated to the discussion at hand, but I'm curious as to how you deal with the opaque handles (e.g. MPI_Comm, MPI_Request) that are exposed via mpi.h? These are highly implementation-dependent features whose sizes depend on the underlying ABI. There was discussion of exactly this issue in #159. As a concrete example: in Open MPI, handles are pointers, while in MPICH derivatives, they are ints.

@omor1
Member

omor1 commented Apr 27, 2020

Regarding a header-only library: this sounds good if you're starting a new project, but some existing C++ projects that use MPI care a lot about build sizes and times. If we want to put something in the MPI Standard, I'd like to see some build experiments in real applications.

There are both benefits and detriments to defining the MPI C++ interface so that it can be implemented as a header-only library. An obvious benefit is that a single generic implementation may be sufficient for all underlying MPI libraries, which can ease adoption and maintenance burden. The flip side is that there are then severe restrictions on, e.g., the datatypes interface, as it would have to be implemented in terms of the MPI C interface rather than whatever low-level representation the implementation uses.

@mhoemmen

@acdemiralp wrote:

Why not co-develop it along with the C++ standard?

Yes -- let's write a library first, then standardize it. Maybe that means becoming a Boost.MPI developer or taking over Boost.MPI development, or maybe it means starting a new library (if one can make a strong technical argument that Boost.MPI has a fundamentally flawed design).

@sg0

sg0 commented Apr 28, 2020 via email

@mhoemmen

@sg0 wrote:

However, from the example mentioned by Mark H., it seems the return object of the MPI function invocation is a future.

It would be a sender, in P0443R13 terms, not a future. Senders and receivers avoid some of the shared state issues that futures have.

In any case, I'm not necessarily advocating this design. I'm just saying that if people want that kind of design, then it should fit with how modern C++ is doing it. I'd like to see the people doing that design engage with C++ networking and executors experts.

@StellarTodd

We've dealt with this issue of multiple MPI installations by writing an MPI (C binding) library that just calls through to an underlying MPI implementation. Our library dispatches to an underlying MPI implementation at run time via dlopen or the Windows equivalent (it works great on Windows). We don't expose any details of the underlying MPI implementation's ABI, so it's handy for things like Python bindings.

Unrelated to the discussion at hand, but I'm curious as to how you deal with the opaque handles (e.g. MPI_Comm, MPI_Request) that are exposed via mpi.h? These are highly implementation-dependent features whose sizes depend on the underlying ABI. There was discussion of exactly this issue in #159. As a concrete example: in Open MPI, handles are pointers, while in MPICH derivatives, they are ints.

We defined a Handle class that contains a union, and conversion methods for converting back and forth between native handles and our handles. The conversions are done in the plugin portion of the library that is compiled against a specific MPI implementation.

Since this is off topic, I don't want to get into any more details here. Feel free to contact Mark or me for further details.
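For readers following along, a rough sketch of the union-based Handle idea described above (hypothetical names; the real library's details differ):

struct Handle {
  union {
    int   as_int;  // MPICH-style handles are ints
    void* as_ptr;  // Open MPI-style handles are pointers to opaque structs
  } u;
};

// Inside a plugin compiled against Open MPI, for example, the conversions are
// trivial (hypothetical helper names):
//   Handle   wrap(MPI_Comm c) { Handle h; h.u.as_ptr = c; return h; }
//   MPI_Comm unwrap(Handle h) { return static_cast<MPI_Comm>(h.u.as_ptr); }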

@jeffhammond
Member Author

jeffhammond commented May 18, 2020 via email

@raffenet

FYI https://gitlab.com/correaa/boost-mpi3. I don't know any of the details of the implementation, just that it exists and some projects have investigated using it.

@mhoemmen

@acdemiralp wrote:

Can https://www.mpich.org/static/docs/latest/www3/MPI_Type_create_struct.html forward the difficulties of serialization to MPI, and potentially even allow removing the dependency to Boost.Serialization?

If C++ gets actual reflection, that would let us use MPI_Type_create_struct to iterate over the fields of a class and convert them into an MPI_Datatype. Right now, there's no way in standard C++ to do that.
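For reference, this is what users have to write by hand today, absent reflection: describe each struct's layout with offsetof and MPI_Type_create_struct (standard MPI calls; the Particle struct is just an example):

#include <mpi.h>
#include <cstddef>

struct Particle {
  double pos[3];
  int    id;
};

MPI_Datatype make_particle_type() {
  int          blocklens[2] = {3, 1};
  MPI_Aint     displs[2]    = {offsetof(Particle, pos), offsetof(Particle, id)};
  MPI_Datatype types[2]     = {MPI_DOUBLE, MPI_INT};

  MPI_Datatype raw, resized;
  MPI_Type_create_struct(2, blocklens, displs, types, &raw);
  // Resize so arrays of Particle have the right stride (accounts for padding).
  MPI_Type_create_resized(raw, 0, sizeof(Particle), &resized);
  MPI_Type_commit(&resized);
  MPI_Type_free(&raw);
  return resized;
}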

@omor1
Member

omor1 commented May 25, 2020

If C++ gets actual reflection, that would let us use MPI_Type_create_struct to iterate over the fields of a class and convert them into an MPI_Datatype. Right now, there's no way in standard C++ to do that.

This would probably work for most POD / Trivial / StandardLayout types, but isn't portable to types that don't need all members serialized. I think most high-level C++-based APIs (thinking Charm++ and STAPL here, for instance) use user-provided pack/unpack routines to do serialization. If we can find a mechanism that allows users to easily select which fields of a class must be serialized, that would probably be the way to go.

@omor1
Member

omor1 commented May 25, 2020

I believe the best practice solution to such a problem lies on the user's part: Create a smaller struct of things which will actually be serialized, and put it in a struct which also contains other stuff. If you need sequentiality, use pointer to the serialized struct in the larger struct and store them sequentially separately. Decent, intuitive solution in C++ terms.

I agree that this is indeed a nifty solution. Actually, it should be possible to make a template type with a parameter pack that serializes the types in the order given, something similar to std::tuple. That would allow use in current C++.

@mhoemmen

Automagical serialization could be a footgun. I'm already uncomfortable with Boost automatically "taking care of" types that have run-time length, like std::string. It's useful for my current project, but I don't like that there could be multiple messages happening when I only typed one (what does that mean for progress of nonblocking messages, for instance?).

@rabauke

rabauke commented May 28, 2020

@acdemiralp wrote:

Can https://www.mpich.org/static/docs/latest/www3/MPI_Type_create_struct.html forward the difficulties of serialization to MPI, and potentially even allow removing the dependency to Boost.Serialization?

If C++ gets actual reflection, that would let us use MPI_Type_create_struct to iterate over the fields of a class and convert them into an MPI_Datatype. Right now, there's no way in standard C++ to do that.

Actually, one can do a kind of reflection for some generic types such as std::tuple, std::array, etc. to build MPI datatypes at run time fully automatically, not visible to the user. This is the route that I took in MPL. MPL is a C++11 header-only message passing library built around the MPI standard.

@omor1
Member

omor1 commented May 29, 2020

The problem with using std::tuple and std::pair directly is that as far as I know they aren't guaranteed to be standard layout types and don't provide direct access to the underlying storage.

@rabauke

rabauke commented May 29, 2020

@omor1 Not being standard layout types is the reason why reflection via template magic is performed and an MPI datatype is constructed via MPI_Type_create_struct for each std::tuple type. Access to the underlying member storage is gained via std::get and &. To my understanding, a restriction to standard layout types would only be required if one were to send data in a memcpy-like fashion in MPI calls, e.g., by sending blocks of raw memory and using MPI_BYTE.
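A sketch of the technique (not MPL's actual code, and written with C++17 fold expressions for brevity even though MPL targets C++11; MpiMap is the hypothetical type-to-datatype trait sketched earlier in this thread):

#include <mpi.h>
#include <array>
#include <tuple>
#include <utility>

template <typename... Ts, std::size_t... Is>
MPI_Datatype make_tuple_type(const std::tuple<Ts...>& t, std::index_sequence<Is...>) {
  constexpr int n = sizeof...(Ts);
  std::array<int, n> blocklengths;
  blocklengths.fill(1);

  // Take the address of each element via std::get and compute its displacement
  // relative to the tuple object itself.
  MPI_Aint base;
  MPI_Get_address(&t, &base);
  std::array<MPI_Aint, n> displs{};
  (MPI_Get_address(&std::get<Is>(t), &displs[Is]), ...);
  for (auto& d : displs) d = MPI_Aint_diff(d, base);

  std::array<MPI_Datatype, n> types = {MpiMap<Ts>::type()...};

  MPI_Datatype newtype;
  MPI_Type_create_struct(n, blocklengths.data(), displs.data(), types.data(), &newtype);
  MPI_Type_commit(&newtype);
  return newtype;
}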

@omor1
Member

omor1 commented May 29, 2020

Oh, I think I understand—you can get the offset from the base of the tuple and thus construct an MPI type for the tuple itself. Very clever! I'd been playing around for a bit with something similar, but I was recursively constructing structures to ensure they would be standard layout and thus be able to use offsetof, since C++ has no way to expand a parameter pack into a set of variables of those types.

@VictorEijkhout

Well, this discussion went on a long time before anyone mentioned MPL. I've been very impressed with MPL, which, like mpi4py, makes life a lot easier. For instance, data knows which type it is, so for the 99.99 percent of cases where you don't care you don't have to spell it out.

I've started incorporating MPL in my MPI book, hoping that it will find wider adoption.
https://web.corral.tacc.utexas.edu/CompEdu/pdf/pcse/EijkhoutParComp.pdf

@jeffhammond
Member Author

@mhoemmen

why not build on the C++ networking TS?

I tried a few years ago to get the C++ networking people to support semantics other than HTTP and they were rather hostile. I proposed a fabric TS that behaved like OFI/libfabric and was told I just didn't understand what the word "networking" meant.

You may have better luck, but I don't have time to teach SG14 people that Internet Protocol is not the only way to move bytes between computers.

@mhoemmen

@jeffhammond Ugh, sorry to hear that. I wish I had more time to work on this.

@hzhangxyz

With C++ coroutines, maybe we could write something like this?

auto value = MPI::Async::Receive(xxxxxx);
something_else();
use_value(co_await value);
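A rough sketch of the awaitable side, assuming C++20 coroutines (no such MPI::Async API exists today; a real implementation would also need a progress engine that resumes suspended coroutines):

#include <mpi.h>
#include <coroutine>

struct request_awaiter {
  MPI_Request req;

  bool await_ready() {
    int done = 0;
    MPI_Test(&req, &done, MPI_STATUS_IGNORE);  // poll once; ready if already complete
    return done != 0;
  }
  void await_suspend(std::coroutine_handle<> h) {
    // A real implementation would register `h` with a progress loop that calls
    // MPI_Test periodically and resumes the coroutine once the request completes.
    (void)h;
  }
  void await_resume() {}
};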

@correaa

correaa commented Feb 17, 2022

Hi,

When I got my hands on the article "MPI Language Bindings are Holding MPI Back", I wrote a couple of notes (for myself) as a friendly critique of the paper.
I don't disagree with the paper, I just think that there are harder problems than the ones mentioned in it:

I will leave here the link to these notes: https://gitlab.com/correaa/boost-mpi3/-/wikis/A-critique-on-%22MPI-Language-Bindings-are-Holding-MPI-Back%22 .

In addition to that, to add to what @bangerth just wrote,

  1. constexpr. MPI is a runtime system, and even if some things could be declared constexpr, I don't think the system can do much with them in terms of composing more compile-time operations. constexpr Datatypes seem like something useful, although the MPI system has to be able to "compile" or "bless" them at compile time for them to be useful. Also, I would say that covers only a very particular subset of all useful Datatypes (e.g. it excludes arrays of dynamic size).

  2. Continuations: The problem of "continuations" is also very important, and I myself have a pressing need for this in the C++ interface I propose, because except for trivial datatypes and trivial data structures (arrays) I almost always need to attach encoding or decoding tasks to the communication task.
    Sometimes what I need can be regarded as a continuation (like decoding a serialized packet), but sometimes it is something that needs to be executed before the communication task, like packing data asynchronously.
    So generically what I need is to be able to reuse the available MPI threads to piggyback, at the least, some O(N) data manipulation.
    I started doing things in this direction but I left it for lack of time.

A related problem with asynchronous operations is whether there is any idiom available in C++ that allows "marking" data or values as being "locked" into a request, perhaps by some combination of smart pointers (for ranges) or move semantics (for values).
Or, for that matter, anything where a static analyzer, or the compiler, can help (e.g. something similar to "use variable after move", or in this case "use variable after an asynchronous request has started but not finished").

  3. Value semantics: I couldn't agree more. In my library, I experimented with two types of interfaces. One takes iterators generically, and the other deals with values and incidentally defines the concept of a "process", as can be seen in the examples.
    A collection can be sent and received in the canonical form:
std::vector<double> v = ...;
std::vector<double> w = ...;
comm.send(v.begin(), v.end(), 1 );
comm.receive(w.begin(), w.end(), 0);  // see elsewhere the discussions on a less redundant interface comm.receive(w.begin(), 0)

or a value based interface:

comm[1] << v;
comm[0] >> w;

(see details here: https://gitlab.com/correaa/boost-mpi3/-/blob/master/test/process.cpp#L51-58)

Note that I am all for dealing with values, but not necessarily "returning" them from functions.
Returning values is not natural for IO in my opinion, and always tends to generate more allocations than needed.
(Think of the case where w doesn't need to be resized above.)

Interestingly, move semantics can implicitly hint to the library to use asynchronous operations, which would simplify the interface tremendously.
For example (this is not implemented yet):

auto unique_req = (comm[1] << std::move(v));
comm[0] >> w;
... // v cannot (mostly) be used yet, and that is clear to the user (and to a static analyzer)
v = unique_req.get();

This is still not perfect, because std::move still allows operations with no preconditions to be performed on the variable.
In Rust one can "steal" the variable completely, but I am not aware of how to do that in C++, except for the partial solution above.
So, for the idiom to really work and be foolproof, one needs to really move v into unique_req above.

  4. I also agree that error handling should be done via exceptions. The hard part is to write exception-safe code around it, including MPI (or MPI interface) code.
    I also have the view that exceptions should not be the defined behavior for logical errors. (They can be a de facto implementation of undefined behavior, but one shouldn't be forced to handle them.)
    The point is that when I look at the error codes reported by MPI functions, 90% of them are logical errors (for example, an invalid communicator).
    Some basic functions do not report any non-logical errors anyway; even though we all know they can happen, there are no error codes for them, which begs the question: what can we really do from the C++ perspective?
    One would expect to get runtime errors when the network is down or things like that, but they are not reported AFAIK.
    Perhaps I don't know enough to have an opinion about this.

  5. At the time, when asked by LLNL, I contributed my two cents about big count. The main idea I transmitted was that without big count it was impossible to send data structures such as std::deque, and datatypes wouldn't help, because it is not a matter of the number of elements but the size of the gaps between elements coming from independent allocations.
    I started implementing a fallback mechanism for when big "pointer differences" or big "numbers of elements" are implicitly used, but it was a lot of work.

  6. I have strong opinions about serialization; I think it is fundamental. Serialization is an integral part of value semantics and regular types. Datatypes are at best an optimization over serialization and they don't cover all cases.
    Boost.Serialization (what I use) has lots of issues, especially not being header-only and being old, but it is a good canonical model.
    What I am working on is having the option to use different serialization backends, such as Cereal.

@VictorEijkhout

VictorEijkhout commented Feb 17, 2022 via email

@bangerth

bangerth commented Feb 17, 2022 via email

@jacobmerson

As @VictorEijkhout says, in C++ "futures" are a bit of a loaded term; use of std::future causes all sorts of lifetime/state issues and is not particularly performant due to the need for shared state. I think any forward-looking C++ MPI API should consider the async utilities that are coming into the language via coroutines and std::execution/P2300.

@bangerth

bangerth commented Feb 17, 2022 via email

@sg0

sg0 commented Feb 17, 2022

Technical reasons aside, there has to be some dedicated funding for getting this work done, since this is not just forum participation and developing myriad modern C++ language bindings. I contributed to 3 LDRD open calls and one DOE proposal solicitation (jointly with more established/senior scientists in this area) in the last 3 years trying to get some funding for this work - all of them failed (I am still trying, but am mostly pessimistic). I think there is perhaps a limited incentive structure for this work in the minds of the senior people, at least in the US DOE.

@bkmgit

bkmgit commented Feb 18, 2022

US DOE has traditionally been important, but MPI is used in a wide range of codes. An important additional consideration is use in industry. Examining software such as OpenFOAM may be helpful to get some idea of used features. Some C++ applications may also choose to directly build on top of UCX.

@correaa

correaa commented Feb 18, 2022

@bangerth,

The good thing about the word "future" (and continuations) is that many people know what it means, and it is a good initial sketch in principle.

Having said that, it is important to recognize that std::future in its current state might be too general and too heavyweight for some families of basic tasks.
Coincidentally, this family includes things that are very relevant to message passing.

First, std::future is not ideal because it does type erasure on the task (sort of like std::function); it is quite flexible but not the best option in all cases. Second, std::future contemplates the possibility of tasks failing (throwing), and that has a cost. It also typically needs to allocate the return object, which in turn can be a failure point.

What I found in my experiments is that from the outset, before and after sending a message, there is typically a need for encoding and decoding messages (for example [de]serialization).
These are the specific tasks we should consider before going to the more general case of an arbitrary continuation.
In fact, while decoding can be seen as a continuation, encoding is not; it is more like a prologue.

Also, it is interesting to consider that encoding and decoding tasks can be made/programmed in such a way that they cannot fail (and do not throw).
Therefore, in principle, it is possible to disregard exceptions in this context.

Additionally, as I mentioned in other posts, I don't think that returning objects or values is a good idea, and this extends to asynchronous messaging too.
There are several reasons for that, and even a specific reason in this context.
If these future-like requests return iterator-like objects instead of new values, then we don't even need to worry about exceptions thrown during construction.

In summary, for requests or future-likes that do not return values and that are restricted to only doing encoding and decoding (or, more generally, epilogues or prologues that cannot fail and are noexcept), the implementation doesn't need to be as complicated or as heavy as what std::future offers right now.

Feedback on these ideas will be appreciated too.

@correaa

correaa commented Feb 19, 2022

You can use a std::expected instead of throwing. Even nicer is to allow both via macros.

Any problem can be solved by adding a level of indirection, except too many levels of indirection. (std::expected is the indirection here.)

More seriously, I think returning values (or expecteds) does not reflect what MPI communication ultimately is: IO.
In the IO picture, the object exists (maybe in an unspecified but valid state) before communication.

Returning values forces an allocation even in cases where it is obviously not needed. (Think of the case of receiving into a vector that already has enough capacity to hold the number of elements sent.)

I do not understand why you are occupied with the idea of byte-level serialization, which to my knowledge is last resort practice.

I don't know in general, but in my case it is not byte-level serialization; the fundamental blocks of serialization are typed packages of basic types.
I call it encoding for lack of a better word. What I refer to is a standard transformation of a data structure into a packed format that both ends of a message have to agree upon.
Also, byte-level serialization would break endianness compatibility, which (I won't defend it strongly) is a nice-to-have.

If you have proper reflection, or even precise flat reflection like MPL's or Boost.PFR, you often do not need byte-level serialization.

(Static) reflection can only get you so far; it doesn't solve all the problems. Reflection is OK for generating custom datatypes which can be known at compile time, but not much more.
It doesn't help with dynamic data structures (e.g. a multi-block data structure, like std::deque, or a CSR matrix) or with MPI datatypes that in practice would take about the same memory as the size of the message itself (e.g. std::list).

I also do not understand what problem you have between std::future and serialization.

No problem; I am just pointing out that std::future is made to handle almost any kind of task.

And serialization, which is an important example of the need for "continuations", is not a general task, but a simpler one.

If you want one or more intermediate (de)serialization steps that are not async, then make them async compatible via https://en.cppreference.com/w/cpp/experimental/make_ready_future instead of opening callback points for them or using asymmetrical packing and unpacking to confuse the user.

I have to think about that.
Yes, the idea is that generic asynchronous messaging (like in BMPI3) needs preprocessing or postprocessing.

I would like to make this processing 1) asynchronous as well, and 2) use the resources (threads, buffers) already given to MPI optimally.
I don't know exactly how to do this yet.
This part is also work in progress.

Which iterators? Iterators of contiguous sequential containers (span, string, valarray, vector<!bool>)? Or iterators of non-contiguous sequential containers (deque, forward list, list, vector<bool>)? Or iterators of associative containers (map, unordered map, set, unordered set)?

All of the above, depending on the case. They can even be pure input and output iterators (not that I recommend using them).

The BMPI3 "basic" interface is iterator-based, as you indicate.
It also returns other (new) iterators in the cases where the internal computations are hard or impossible to replicate outside the message call.

(The STL is designed with the same philosophy, although it didn't always get it right.)

The asynchronous versions are not different in principle, in the sense that the request could return iterators (e.g. via future::get).
This is work in progress.

The latter two do not ensure contiguity, whereas MPI often prerequisites contiguity.

Sure, low-level interfaces require contiguity (think of memcpy).

High-level interfaces try to take advantage of it through direct or indirect means, even when data is not contiguous.
They do whatever is possible with whatever resources they have available: heuristics, buffers, pinned memory, datatypes, packed-level serialization, byte-level serialization, etc.
And yes, in sufficiently complex situations they can fail to do their job efficiently (while still doing the job correctly).

MPI forces a C mentality: we think about how to use it through contiguous arrays, and that is fine.
BMPI3 has a C++ (or STL) mentality.
It will try to do the best job possible, and the idea is to have a decent base level of quality of implementation, which will be work in progress for a while; any help will be appreciated.

This is also confusing to me in your library Boost.MPI3. What happens when I pass a std::unordered_map::begin() and std::unordered_map::end() to your functions that accept iterators? Does my map get copied to contiguous memory e.g. a std::vector<std::pair> and then transmitted?

Very good question.
(The answer has many corner cases because you didn't say what the element types are, but I am going to ignore this and assume the best possible scenario: that the datatype is a builtin.)

But, yes, broadly speaking, what you describe is a good starting-point solution.
(I will add some levels of detail as we go.)
After all, what is the alternative otherwise? Partition the message into N smaller messages with one element (or pair) each? That is, as you know, unacceptable.

The solution you propose works, and one has to accept that the user had a very good reason to use an unordered_map to begin with. The user has to know the cost of traversal in general, and of communication in particular, for such a specialized data structure.

An important point before continuing is that if you pass a pair of iterators, the library has already lost the information that the container is associative.

The only information it has is that the range is defined by a pair of bidirectional iterators and that the elements are decomposable as pairs.

Where does that std::vector<std::pair> live if the call is immediate?

OK, yes, assuming we are going this route, then the vector lives in some sort of free store. A possible candidate is the default heap (std::allocator), and that would work.

But we can do better: we have access to the MPI system as well, and to the communicator, with all its hypothetical buffers. We also know we are copying into the vector for the sake of communicating, nothing else.

Therefore what the library should do is put the vector in MPI pinned memory, which, if it is available, can make the communication faster.

What if there is not enough pinned memory? Well, then a series of smaller intermediate vectors can be built and sent, one at a time.

If many vectors need to be constructed and destructed, maybe it is also a good idea not to allocate each one: reuse a single one, or use a specialized arena allocator.

So, as you see, it can get intricate internally. There are levels of optimization one can take advantage of.

Is this the only way to do it? No; I could also take advantage of the fact that the elements are pairs and construct two vectors, one for each type. I am not doing this; maybe, if it is proven to work across multiple systems, one can write (inside the library) special code for it. What I am trying to illustrate is that one can optimize to different levels.

What about std::vector::begin() and std::vector::end()? Do you still make a copy like you would in the std::map case or do you somehow detect it and avoid the copy?

No, I don't. First of all, at this point I have a temporary vector and I can send it directly; I know it is a vector.

But anyway, if you were to pass a vector::begin() and vector::end(), the library (not necessarily with your help) detects that these are random-access, contiguous iterators, so it knows how to handle this case without intermediate copies.

I will stop detailing what I am doing internally here.
I hope the idea is clear even if you disagree with it in general or in the details.
The important point is that this is all internal to the library.

You see? Iterators are confusing in this context.

Sorry, no, I don't see.
What is confusing about this? This is work that the library does for you.
If the implementation I described confuses you, that's fine: it is just that, an implementation. It is enough for you to know that an unordered_map has costly traversal and is not contiguous; and if your dataset is small enough, you can even get away with not knowing that.

When you use iterators, do you worry about whether they use memcpy at some point below? Maybe, maybe not. If you don't have many elements you might not care.
Of course, if you want performance you need to know your data structures: do not expect that unordered_map will be able to take much advantage of hardware or low-level MPI primitives.

To finish, the two types of iterators that you mentioned belong to two different iterator categories, and they naturally have different performance guarantees.

In summary, for request or future-likes that do not return values and are that restricted to only do encoding and decoding (or more generally epilogs or prologs that cannot fail and be noexcept) the implementation doesn't need to be as complicated or as heavy as what std::future offers right now.

Yes as you can see in the 89 liner above.

Yes to what exactly? (what is the “89 liner”?)

Yes to the claim that prologues and epilogues need to be handled by things as heavy as futures?

Maybe; I didn't write out all the possible epilogues and prologues that could be necessary, so yes, this is, until proven correct, a guess. The fundamental difference is that prologues and epilogues do not need to return values, as futures are designed to do. My prologues work with elements that are already there in some sense; they do not need to return anything "new".

Thank you for your questions. -- A

@bangerth

bangerth commented Feb 21, 2022 via email

@correaa

correaa commented Feb 21, 2022

On 2/19/22 12:55, Alfredo Correa wrote:

More seriously, i think returning values (or expected) do not reflect what MPI
communication ultimately is, IO.
In the IO picture, object exists (maybe in unspecified but valid state) before
communication.

Just to be clear, this is not what I wanted to advocate for. The actual send
and receive buffers should be allocated by the user. It is things such as the
output integer arguments of MPI_Comm_rank and MPI_Comm_size that would be
nice to return, as well as MPI_Request objects by immediate functions.

Thank you for the very important clarification.

If you are referring to your quote "return whatever they are producing by-value, rather than through arguments; ...", and by values you didn't mean the values of the communicated data, then, yes, we are on the same page.

Maybe @acdemiralp was referring to the same thing as well and I also misinterpreted.

@mhoemmen

I'm all for this kind of stuff. But do you want to standardize on things that are only available in C++23 or C++26?

  1. P2300 won't make C++23, though it has a good chance at C++26.
  2. I've seen plenty of MPI 1.x code in the wild. This suggests that people shouldn't worry about requiring newer versions of a programming language in newer versions of MPI, because users will always be able to fall back to implementations of older MPI versions.
  3. That being said, a standard should standardize existing practice. Thus, I'd rather see one or more examples of a senders/receivers-based C++ MPI interface first, before considering its standardization. P2300 is a library solution with existing implementations, so interested parties should feel welcome to try this. P2300's authors are open to considering more use cases, so now would be a good time to explore using senders/receivers.
  4. I think MPI (2-sided or 1-sided) is a poor match for senders/receivers, but am open to discussion.

@VictorEijkhout

VictorEijkhout commented Feb 21, 2022 via email

@mhoemmen

@VictorEijkhout wrote:

Considering what a terrible mess threading is in C++ (every next standard seems to say “Oh no, we should have done it this way”)....

I'll fight you on that one, my friend Victor : - ) .

  1. std::thread is a perfectly fine wrapper for an operating system thread. It never aimed to be anything more.
  2. Regarding "every next standard seems to say...," the only way in which the Standard has actually changed was in discouraging use of release-consume memory ordering. That came out of some recent academic work. I've never seen code in the wild that uses this ordering.
  3. I've written and used thread-parallel C++ code for over a decade. It works fine and it runs at scale.

You don't have to like C++, but phrases like "terrible mess" just aren't accurate. I would say MPI is a bigger mess; consider, for example, how long it's taking the community of MPI experts to decide what MPI_THREAD_MULTIPLE means.

@ibaned

ibaned commented Apr 5, 2022

Reading through some of this discussion, it strikes me that the primary pitfall is the sheer size and complexity of ISO C++ and the temptation to ask ourselves how an MPI interface might be compatible with every single feature of C++.

Thinking of how an MPI interface could interact with ranges, reflection, threading, executors, etc. is an exciting exercise but seems to lead to an MPI interface that is as large as the ISO C++ standard itself.

My thought is that the C++ interface to MPI should look more like the MPI standard than the ISO C++ standard. By this I mean that it should mainly consist of applying tried-and-true (albeit less exciting) C++ features consistently over the whole interface. I'm convinced enough of this principle of simplicity that I made a C++ interface to MPI that I am using in large projects:

https://github.com/sandialabs/mpicpp

Here are the tried-and-true, non-controversial and non-daunting features of C++ that it applies to MPI so far:

  1. RAII for requests, communicators, etc. with unique ownership and move semantics. This also encompasses non-blocking semantics by having the destructor of a request wait on the request. Ignoring a returned request is equivalent to calling a blocking function.
  2. Exception-based error handling. Throws exceptions everywhere that the C MPI interface returns an error code.
  3. Deduction of MPI_Datatype for C++ types but only for pre-defined MPI_Datatypes

Personally, I don't currently have code that sends user-defined structs or maps of lists that is begging for reflection, nor code that calls MPI from multiple threads that would really benefit from concurrency compatibility.

I think a minimal system like this would be a good starting point, and over time it can add compatibility with more and more C++ features. Adding compatibility with a new feature should consider carefully the maintenance cost of this part of the MPI C++ interface (both standardization and implementation), the stability and user experience of the C++ feature itself, and the clear benefit to existing users of MPI.
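As a concrete illustration of items 1 and 2 above, here is a minimal sketch of a move-only RAII request whose destructor waits (illustrative only, not mpicpp's actual API; the isend() in the usage comment is a hypothetical wrapper):

#include <mpi.h>
#include <stdexcept>
#include <utility>

class request {
  MPI_Request req_ = MPI_REQUEST_NULL;

public:
  explicit request(MPI_Request r) : req_(r) {}
  request(request&& o) noexcept : req_(std::exchange(o.req_, MPI_REQUEST_NULL)) {}
  request& operator=(request&& o) noexcept {
    if (this != &o) { wait_noexcept(); req_ = std::exchange(o.req_, MPI_REQUEST_NULL); }
    return *this;
  }
  request(const request&) = delete;
  request& operator=(const request&) = delete;

  void wait() {  // error codes become exceptions
    if (req_ != MPI_REQUEST_NULL && MPI_Wait(&req_, MPI_STATUS_IGNORE) != MPI_SUCCESS)
      throw std::runtime_error("MPI_Wait failed");
  }
  ~request() { wait_noexcept(); }  // ignoring the returned request makes the call blocking

private:
  void wait_noexcept() noexcept {
    if (req_ != MPI_REQUEST_NULL) MPI_Wait(&req_, MPI_STATUS_IGNORE);
  }
};

// Usage:
//   isend(buf, n, dest, tag, comm);           // request ignored: destructor waits
//   auto r = isend(buf, n, dest, tag, comm);  // kept: call r.wait() explicitly later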

@correaa

correaa commented Apr 5, 2022

Hi @ibaned

My thought is that the C++ interface to MPI should look more like the MPI standard than the ISO C++ standard. By this I mean that it should mainly consist of applying tried-and-true (albeit less exciting) C++ features consistently over the whole interface. I'm convinced enough of this principle of simplicity that I made a C++ interface to MPI that I am using in large projects:

https://github.com/sandialabs/mpicpp

Yes, you leave me with no option other than to agree with you. :)
These are exactly the principles I designed my wrapper https://github.com/LLNL/b-mpi3 around.

The subtitle of the project is "This aims to be an wrapper to C-MPI3 for C++, using the principles of simplicity, STL, RAII and Boost and enforcing type-safety."

I would like to comment some subtleties below.

Here are the tried-and-true, non-controversial and non-daunting features of C++ that it applies to MPI so far:

1. RAII for requests, communicators, etc. with unique ownership and move semantics. This also encompasses non-blocking semantics by having the destructor of a request wait on the request. Ignoring a returned request is equivalent to calling a blocking function.

I couldn't agree more; if I had to choose a single principle, it would be this one.
RAII starts with writing the necessary destructor/constructor pairs, which is more or less mechanical, but it doesn't end there: one has to think about the other fundamental operations and, more importantly, whether they make sense: assignment, move-assignment, copy-construction and move-construction.

RAII also touches the broader topic of "guarantees"; modern C++ is all about guarantees in my opinion: thread safety and exception safety.
Making the code exception safe is the real challenge.

2. Exception-based error handling. Throws exceptions everywhere that the C MPI interface returns an error code.

In principle yes; however, I would like to add that logical errors should not be handled by exceptions at all.
When I read the documentation of MPI, many "returned" errors look like logical errors, so I don't see an urgent need to handle them with exceptions.
In the end, the situation I find myself in is that most of the errors the C MPI interface reports should not even be converted to exceptions.
(We could still throw exceptions, but there is little gain in doing so. I am a fan of the concept of narrow and wide contracts and of not "defining undefined behavior".)

3. Deduction of `MPI_Datatype` for C++ types but only for pre-defined `MPI_Datatype`s

I agree: if something can be mapped to an MPI_Datatype (and the size of the MPI_Datatype is less than O(N)), we should use all the tools at our disposal to achieve that (including dark magic).

Having said that, it is a fact of life that not all value objects have an MPI_Datatype of size less than O(N).
The question is what to do in these cases: should we go further and use magic/user helper code? Or just say that anything like that will not be handled, and the user is responsible for communicating such complicated data structures?

In https://github.com/LLNL/b-mpi3, I went the route of 0) detect basic MPI_Datatypes and basic data structures; if that doesn't work, 1) attempt (at compile time) to construct an MPI_Datatype; if that doesn't work, 2) invoke serialization routines if available; fail (at compile time) otherwise.

(The boundary between 1) and 2) is tricky and I don't have a general way to handle it.)

Step 2) introduces the need for a serialization framework, which may or may not introduce a hard dependency on a third-party serialization library, such as Cereal or Boost.Serialization.
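For concreteness, a self-contained sketch of this kind of layered dispatch, reduced to two levels (not b-mpi3's actual code; a real library would insert the "build an MPI_Datatype at compile time" level and a proper serialization fallback in between):

#include <mpi.h>
#include <type_traits>

template <typename T> struct predefined_datatype;  // level 0: undefined by default
template <> struct predefined_datatype<int>    { static MPI_Datatype get() { return MPI_INT; } };
template <> struct predefined_datatype<double> { static MPI_Datatype get() { return MPI_DOUBLE; } };

template <typename T, typename = void> struct has_predefined : std::false_type {};
template <typename T>
struct has_predefined<T, std::void_t<decltype(predefined_datatype<T>::get())>> : std::true_type {};

template <typename T>
void generic_send(const T& value, int dest, int tag, MPI_Comm comm) {
  if constexpr (has_predefined<T>::value) {
    MPI_Send(&value, 1, predefined_datatype<T>::get(), dest, tag, comm);
  } else {
    // Crude stand-in for the later levels; a real library would try to build a
    // datatype here and only then fall back to serialization.
    static_assert(std::is_trivially_copyable_v<T>, "type cannot be communicated by this sketch");
    MPI_Send(&value, static_cast<int>(sizeof(T)), MPI_BYTE, dest, tag, comm);
  }
}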

Personally, I don't currently have code that sends user-defined structs or maps of lists that is begging for reflection,

Here is an example of a custom class communicated via MPI: https://github.com/LLNL/b-mpi3/blob/master/test/communicator_send_class.cpp

nor code that calls MPI from multiple threads that would really benefit from concurrency compatibility.

I think the library should be thread-compatible, and thread-safe only if the user wants to handle that. I think there are simple rules to achieve this and, at the least, to be transparent about the relation between communication and threads.

I think a minimal system like this would be a good starting point, and over time it can add compatibility with more and more C++ features. Adding compatibility with a new feature should consider carefully the maintenance cost of this part of the MPI C++ interface (both standardization and implementation), the stability and user experience of the C++ feature itself, and the clear benefit to existing users of MPI.

I agree: not every C++ feature should be used, reflected, or taken into account by an MPI C++ interface.
Hopefully most features will be orthogonal to, or simply play nice with, what we achieve.

@tschuett

tschuett commented Apr 10, 2022 via email

@tschuett

tschuett commented Apr 10, 2022 via email

@mhoemmen

@ibaned Hi! : - ) Excellent points y'all! Some thoughts on your list of C++ features:

RAII for requests, communicators, etc. with unique ownership and move semantics.

I'm not actually convinced that code should manage lifetimes of MPI communicators at all. Idiomatic C++ destructors are nonblocking and nonthrowing, while MPI "destructors" are collective and possibly blocking. C++ code paths can diverge, which may break the requirement to call free functions collectively.

I see a callback-based model as more natural. MPI kind of already does this. For example, MPI_Init effectively launches a program that takes "nonowning references" to MPI_COMM_WORLD and MPI_COMM_SELF, and MPI_Comm_split effectively launches some number of programs with a rebinding of the "current communicator." The idea is that the callback would take its "current communicator" as a function argument. The "current communicator"'s lifetime would strictly contain the callback's execution. Users would then not need to think about communicator lifetimes at all. The callback approach would also expose nonblocking execution for communicator creation functions that require communication. One challenge with this technique, though, would be handling more complicated intercommunicator lifetimes.
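A minimal sketch of that callback shape (not a concrete proposal; names are illustrative):

#include <mpi.h>

template <typename F>
void with_split_comm(MPI_Comm parent, int color, int key, F&& body) {
  MPI_Comm sub;
  MPI_Comm_split(parent, color, key, &sub);
  body(sub);            // the callback never stores or frees `sub`
  MPI_Comm_free(&sub);  // the collective free happens at a structurally fixed point
}

// Usage:
//   with_split_comm(MPI_COMM_WORLD, my_color, my_rank, [](MPI_Comm comm) {
//     int rank;
//     MPI_Comm_rank(comm, &rank);
//     // ... work on the sub-communicator ...
//   });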

Exception-based error handling. Throws exceptions everywhere that the C MPI interface returns an error code.

Exceptions are for recoverable errors. How would I write code to recover from MPI_Comm_free returning something other than MPI_SUCCESS?

@ibaned

ibaned commented Apr 11, 2022

Hi @mhoemmen !

I agree that the collective requirement on MPI "constructor/destructor" C APIs is a different requirement than just the scope executed on one rank. The design I landed on is based on the idea of "standardize current practice", where the majority of existing code I see does use C++ scopes to denote these parallel collective lifetimes, so using RAII to do this wouldn't change their mental model. Your callback model sounds interesting; I wonder how much work is involved in building the "runtime" that executes those callbacks and handles communicator lifetimes.

Working on "parameter scans" of many small simulations has really changed my perspective on what is a recoverable error. Even if the simulation code has a bug and MPI basically says "you gave me invalid arguments", we can still terminate the current simulation and just start a new one (even better, if using RAII, its communicators are safely freed). Our users are often upset when one or two out of ten thousand simulations fail and all the data points are lost, not just those two.

The other use case for exceptions is to build up useful debugging information while fatally exiting. Calling code in outer scopes can catch the MPI exception and throw a new exception with further information, like a manual stacktrace with application-specific metadata included.

My new perspective (I definitely didn't think this before) is that it is not up to the point of origin to decide what is recoverable; it is up to the calling code to decide what to do with an exception: catch it and recover, catch it and add useful debug info but don't recover, or don't catch it at all.

@bangerth

bangerth commented Apr 11, 2022 via email

@devreal

devreal commented Apr 11, 2022

Exceptions and RAII for MPI objects with non-local, collective destruction semantics (files and windows, potentially communicators) won't match well. If an exception triggers the destructor of such an object, the application will deadlock, and users won't even see why, because the exception never makes it up the call chain. Such an interface would require littering code with try-catch blocks even if you are perfectly fine with not handling the exception and letting the application die, which is hardly more readable than checking return codes.

Other MPI types work perfectly fine with RAII (data types, ops, info objects). It would be somewhat inconsistent though.

Writing a simple wrapper around the C-API will never go far.
Using unique C++ features to make MPI more ergonomic to use sounds more interesting:

  • RAII
  • coroutines
  • reflection
  • futures

I'd like to see a good use-case for coroutines in MPI. Not everything that is possible is suitable and efficient. Also, there is a proposal for continuations under active discussion, which would (hopefully) work well with future.then() semantics.

@mhoemmen

Hi @ibaned ! Always good to discuss C++ with you! : - )

Your callback model sounds interesting, I wonder how much work is involved in building the "runtime" that executes those callbacks and handles communicator lifetimes.

I built a callback-based interface like this a few years ago for access to a global (distributed) object's local data. It was an almost entirely compile-time wrapper around Kokkos::DualView. I don't think the MPI communicator version would need more run-time tracking than current code already needs to do. Contact me offline if you'd like an overview of the design.

My new perspective (I definitely didn't this think before) is that it is not up to the point of origin to decide what is recoverable, it is up to the calling code what to do with an exception: catch it and recover, catch it and add useful debug info but don't recover, or don't catch it at all.

Sure, I suppose that if MPI_Comm_split fails, the algebraic multigrid library could carefully handle and report this to the library encapsulating linear solvers, and the latter library could then fall back to a slower multigrid hierarchy construction strategy, or even CG + domain decomposition. In practice, though, this requires structuring code as a sequence of transactions. If any MPI process fails to catch an exception that needs to be handled collectively, a crash is the good outcome and a hang is the more likely outcome. I think it's too hard to get a whole team to write a whole code base like this without a helpful programming model, that forces them to write transactional code and doesn't let them own MPI state.

@ibaned

ibaned commented Apr 11, 2022

I think the point being made here about only one rank in a communicator throwing an exception is a good and important one... I agree that some structure or tools to help the user deal with non-collective failures in otherwise collective environments are needed. This might tie into regular MPI standardization efforts around resilience.

@devreal

devreal commented Apr 11, 2022

On second thought, you might be right: maybe we just need guaranteed local destruction semantics for all MPI objects...

@correaa

correaa commented Apr 12, 2022

I am squarely in the camp of the deterministic destruction and release of resources in C++.
If nothing else, at least, because this is what allows (me) to reason about programs and performance.

The blocking aspect is in fact the exact dilemma I have for the destructor of the mpi3::communicator object in my library https://gitlab.com/correaa/boost-mpi3.
Whether to use MPI_Comm_free or MPI_Comm_disconnect for destruction.
After some back and forth, I decided that MPI_Comm_disconnect is more correct and desirable.

Among other things, this fits with the view that pending messages are some sort of dependent resource of the communicator.

@wesbland wesbland added needs guidance Needs guidance on what chapter committees need to do mpi-5 For inclusion in the MPI 5.0 standard labels Jun 14, 2023