
Remove dependency on bindgen #28

Closed · Luthaf opened this issue Nov 6, 2017 · 14 comments

@Luthaf commented Nov 6, 2017

Hi!

I would love to use this crate in my application, but I have a single issue with it: the FFI bindings for the specific MPI implementation in use are generated at compile time using bindgen.

While I like bindgen, it has a lot of issues, especially when used at compile time: the end user needs to have libclang installed, at the right version and in the right place, for code generation to work.

I think this can be a big hurdle when trying to use this crate (see for example #20 and #22).

Why bindgen is used

This is my understanding of why bindgen is used at compile time in this crate; please correct me if I am wrong!

MPI does not have a stable ABI, only a specification of a C API. This crate uses a small C shim to make #defined symbols available to Rust, and bindgen to parse the MPI headers and generate the corresponding FFI declarations for Rust.
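
For reference, the kind of declaration bindgen emits from mpi.h looks roughly like this (a hand-written illustration, not actual bindgen output; the C signature of MPI_Init is int MPI_Init(int *argc, char ***argv), but the generated handle types differ between implementations):

use std::os::raw::{c_char, c_int};

extern "C" {
    // int MPI_Init(int *argc, char ***argv);
    pub fn MPI_Init(argc: *mut c_int, argv: *mut *mut *mut c_char) -> c_int;
}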

Removing bindgen dependency

I think it could be possible to remove the build-time dependency on bindgen by pre-generating the FFI declarations for the different MPI implementations and versions, and then detecting which one to use in build.rs by parsing the output of mpicc.
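
As a rough sketch of what build.rs could do (the detection heuristic below is illustrative only, not a robust probe):

use std::process::Command;

fn main() {
    // Ask the compiler wrapper what it wraps; MPICH understands -show,
    // Open MPI uses -showme. Searching the output for a vendor string
    // is only a guess at a workable heuristic.
    let output = Command::new("mpicc")
        .arg("-show")
        .output()
        .expect("could not run mpicc");
    let stdout = String::from_utf8_lossy(&output.stdout).to_lowercase();
    if stdout.contains("mpich") {
        println!("cargo:rustc-cfg=rsmpi_pregenerated_mpich");
    } else {
        // Unknown implementation: fall back to running bindgen here.
    }
}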

Of course, generating the declarations for every single implementation and every single version is not going to be practical, so one could generate the declarations for a few implementation/version pairs (starting with the latest releases of Open MPI and MPICH, for example), and default to bindgen in all other cases. This would keep the benefit of having an easy way to use this crate, even with exotic MPI implementations, while giving a smaller build time and a simpler build for 80% of the cases.

Please tell me what you think of this! Is there something I overlooked?

If you agree with this proposal, I could try to implement it; I have some experience with bindgen and Rust FFI.

@bsteinb (Collaborator) commented Nov 6, 2017

Hi there!

I understand your concerns about having bindgen as a build dependency. There is indeed some tension between the policies regarding selection of software versions in HPC environments (where often in my experience a somewhat conservative choice is made) and having a quite recent version of Clang as a requirement. I do not think it is quite as bad as you make it seem (I am not convinced #20 or #22 are connected to bindgen), but I concede that not having bindgen as a build dependency would be less painful (for build times alone).

Your observations as to why bindgen is used in rsmpi are correct, as is your conjecture that the FFI declarations could – in principle – be pre-generated and shipped along with rsmpi. The problem with this strategy is precisely as you say:

pre-generating the FFI declaration for all the different MPI implementations and versions

and

generating the declaration for every single implementation and every single version is not going to be practical

I rarely get around to working on rsmpi and I do not think the project is at a point where I should devote my time to simplifying the installation procedure for production users. However, if you offer to make a contribution in this area, I am inclined to accept it, especially since – as it only concerns the build infrastructure – it should not influence refactorings of the library itself.

An acceptable contribution should – I think – contain at least the following:

  1. Both mechanisms (pre-generated FFI declarations and build-time bindgen) should still be available, one as the default (preferably the pre-generated FFI declarations) and the other via a cargo feature (see the sketch after this list).
  2. A well-researched algorithm for inspecting all relevant aspects of the current build environment (this includes finding out which aspects are relevant in the first place: MPI vendor, MPI version, host triple, rustc version, ...) and determining whether an appropriate pre-generated FFI declaration is available.
  3. Pre-generated bindings for at least those MPICH and Open MPI versions that are tested against on Travis.
  4. An easy way of using the build-time bindgen mechanism to add new pre-generated FFI declarations.
  5. Documentation.
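
For point 1, I could imagine something along these lines in build.rs (the feature name and helper functions are hypothetical):

use std::env;

fn main() {
    // Cargo exposes enabled features to build scripts as environment
    // variables named CARGO_FEATURE_<NAME>.
    if env::var_os("CARGO_FEATURE_FORCE_BINDGEN").is_some() {
        generate_bindings_with_bindgen();
    } else {
        // Default path: use pre-generated declarations when available.
        select_pregenerated_bindings();
    }
}

fn generate_bindings_with_bindgen() { /* invoke bindgen */ }
fn select_pregenerated_bindings() { /* inspect the environment, pick a file */ }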

If this list of requirements makes this task too daunting, I completely understand; I already admitted that I am also not willing to work on this (at least for now). However, I feel like anything less would only make this a brittle work-around. If you do still want to work on it, go for it!

Something that could make this task easier is this initiative by the MPICH project, which aims to offer a somewhat stable ABI across certain versions of MPICH and various other MPI libraries based on it: http://www.mpich.org/abi/. However, the information on that page seems to be a bit stale. Similar information for Open MPI can be found here: https://www.open-mpi.org/software/ompi/versions/.

@Luthaf (Author) commented Nov 6, 2017

Thank you for the quick answer!

Both mechanisms (pre-generated FFI declarations and build-time bindgen) should still be available, one as the default (preferably the pre-generated FFI declarations) and the other via a cargo feature.

Yes, this is how I see this implemented too.

A well-researched algorithm for inspecting all relevant aspects of the current build environment (this includes finding out which aspects are relevant in the first place: MPI vendor, MPI version, host triple, rustc version, ...) and determining whether an appropriate pre-generated FFI declaration is available

I feel like this would be the hardest part. I don't really know that much about MPI, so I guess the following would be relevant:

  • MPI vendor;
  • MPI version;
  • host-triple;

I don't see how the rustc version is relevant here; the FFI will always work the same due to backward compatibility requirements. But as we are talking about C software, maybe the C compiler used has an influence? Or maybe some compiler flags too (like how in Fortran you can specify the default integer size on the command line)? Or is this all abstracted by mpicc?

An easy way of using the build-time bindgen mechanism to add new pre-generated FFI declarations

This should be as easy as copying the generated file and adding some lines in build.rs to emit the corresponding cfg.
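
A minimal sketch of what I have in mind (the cfg name and value are just examples):

fn main() {
    // After vendoring bindings/openmpi_3_0.rs into the crate, tell rustc
    // which pre-generated file lib.rs should include!():
    println!("cargo:rustc-cfg=rsmpi_bindings=\"openmpi_3_0\"");
    // and in lib.rs:
    // #[cfg(rsmpi_bindings = "openmpi_3_0")]
    // include!("bindings/openmpi_3_0.rs");
}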

Something that could make this task easier is this initiative by the MPICH project, which aims to offer a somewhat stable ABI across certain versions of MPICH and various other MPI libraries based on it: http://www.mpich.org/abi/. However, the information on that page seems to be a bit stale. Similar information for Open MPI can be found here: https://www.open-mpi.org/software/ompi/versions/.

I did not know about this, it is very nice! Does this mean that all the implementations listed on the MPICH page are ABI compatible? And maybe the compatibility extends to subsequent compatible releases (for some definition of compatible ^^)

@bsteinb (Collaborator) commented Nov 7, 2017

I feel like this would be the hardest part. I don't really know that much about MPI, so I guess the following would be relevant:

I agree this is probably the largest chunk of work. It is not really about MPI though. None of this is specified by the standard.

  • MPI vendor;
  • MPI version;
  • host-triple;

Yes, note that MPI version here means MPI library version (as in Open MPI 1.10.0) not the version of the MPI standard.

I don't see how the rustc version is relevant here; the FFI will always work the same due to backward compatibility requirements.

Yeah, I am pretty sure that it is not of concern at the moment. I do think bindgen can emit things that go beyond just type declarations, like impls for Clone. One could think of a scenario where this extra stuff could in the future come to rely on rustc features that are not backwards compatible. Probably I am just being too paranoid.

But as we are talking about C software, maybe the C compiler used has an influence? Or maybe some compiler flags too (like how in Fortran you can specify the default integer size on the command line)? Or is this all abstracted by mpicc?

Your guess is as good as mine. I would say that if the headers of an MPI library built with two different C compilers are the same, then the compiler does not matter.

One other thing I just thought of: it might not be legal to distribute pre-generated FFI declarations that are based on the header files of some of the commercial MPI libraries. E.g. the mpi.h shipped with Intel MPI contains some wording that makes me very reluctant to distribute anything based on it.

@Luthaf (Author) commented Nov 13, 2017

One other thing I just thought of: it might not be legal to distribute pre-generated FFI declarations that are based on the header files of some of the commercial MPI libraries. E.g. the mpi.h shipped with Intel MPI contains some wording that makes me very reluctant to distribute anything based on it.

I did not think of this =/ Yeah, it might be hard to distribute some of the bindings.

For Intel MPI specifically, it looks like it is based on MPICH, so we might get around the issue by using the same bindings for MPICH and Intel.

But maybe rust-lang/rust-bindgen#918 is a better solution to this problem. Bundling libclang would make most of my initial problems go away. I'll try to investigate both solutions.

@bsteinb (Collaborator) commented Jan 30, 2018

There has been no movement here or over in the bindgen issue (which I have subscribed to) since November. Closing this for now.

bsteinb closed this as completed Jan 30, 2018

@AndrewGaspar (Contributor) commented Mar 1, 2018

I've got an idea for a perhaps more tractable solution to this problem.

We could add a tool that generates a vendored version of mpi-sys (as a tarball or something like that). Then the user can use the [patch] directive to replace mpi-sys with their vendored version. My impression is that HPC systems tend to have a finite set of compiler/MPI-version tuples, so you could easily pre-generate mpi-sys once for each MPI version you need. When new MPI or compiler versions are added, you can just re-vendor.
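
For example, the application's Cargo.toml might contain something like this (the path is hypothetical):

# Replace the crates.io mpi-sys with a locally vendored, pre-generated copy.
[patch.crates-io]
mpi-sys = { path = "vendored/mpi-sys" }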

The benefits of this are:

  • Eliminates dependency on libclang at build time
    • One user would still need libclang to produce the vendor crate
  • Eliminates the largest branch of the dependency tree[0]
  • Sidesteps the copyright issue
  • Sidesteps the version tuple nightmare

Downsides:

  • Pushes management of mpi-sys versions on to the user
  • Still requires libclang to be available at some point.
    • It's possible you could perform the vendoring off-system, though, if this is an issue - just copy the MPI headers off the system.

[0] Produced using cargo graph --optional-deps false (note: libffi also depends on bindgen).

[dependency graph of rsmpi]

@Luthaf (Author) commented Mar 2, 2018

This should work but I am not sure this is the best solution.

Pushes management of mpi-sys versions on to the user
Still requires libclang to be available at some point.

This is a bit of a bummer, because the idea was to completely get rid of the hard-to-install libclang. Plus, pushing management of mpi-sys onto the user is less than ideal if you want to have users who are not Rust developers.

@Luthaf (Author) commented Mar 19, 2018

Just thought of another possible fix for the problem here: shipping an implementation of MPI (possibly MPICH or Open MPI) with rsmpi itself.

This would be under an optional feature, and the MPI implementation would be compiled before compiling rsmpi. This means that we can control and ship the bindgen output for this particular, blessed implementation of MPI.

I am not sure if this could work, or if I am just showing my complete lack of understanding of MPI, but it looks like one can install and use one's own MPI implementation on a cluster, without relying on the one provided by the cluster.

What do you think?

@bsteinb (Collaborator) commented Mar 24, 2018

Well, shipping an open source implementation of MPI with the mpi-sys crate is possible. Although:

  1. I still am not certain that building our own MPI across different platforms will result in the same mpi.h (and thus a stable output of bindgen that we can ship with mpi-sys) every time.
  2. While I have seen people install their own version of MPI on clusters, I think this always requires at least some help from the system administrators. At least your own MPI has to know how to talk to the resource manager (SLURM, Torque, ...) that controls the system. So, on a cluster, I am not convinced that installing your own MPI is really the easier route. On a development workstation, on the other hand, it should be easy enough to get bindgen and its dependencies working.

I will try to do some experiments regarding the first point by installing Open MPI or MPICH on different platforms and seeing whether the resulting mpi.h and bindgen output are compatible.

bsteinb reopened this Mar 24, 2018

@marmistrz (Contributor) commented:

For student cluster competitions we always build Open MPI 3.0 from source, because CentOS ships the ancient 1.x version (and the difference in performance is significant).

On production clusters you often have to build your own version of compilers, etc., because admins don't want to install the newer version, even as a module. I have rebuilt GCC and binutils myself.

@Luthaf (Author) commented Feb 27, 2020

I have another idea to fix the issue here. My understanding of the problem is that some MPI types have a different ABI in different implementations, meaning that it is not possible to assume all mpi.h are equivalent.

A possible way to work around this would be to provide a small shim around the MPI functions, with rsmpi in control of the ABI, which would call into the local MPI installation.

Something like this:

#include <mpi.h>
#include <stdint.h>

// use types with a known size for calling the functions from Rust
typedef int32_t RSMPI_Comm;
typedef int32_t RSMPI_Group;

int RSMPI_Comm_create(RSMPI_Comm comm, RSMPI_Group group, RSMPI_Comm *newcomm) {
    MPI_Comm new;
    int status = MPI_Comm_create((MPI_Comm)(comm), (MPI_Group)(group), &new);
    *newcomm = (RSMPI_Comm)(new);
    return status;
}

// and so on for everything used by rsmpi 

Then, this file would be compiled with the user-provided mpicc, taking care of all the specifics of the current MPI installation. We could also ship the extern function declarations generated by bindgen, since we would control them.
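
The Rust side of the shim might then look something like this (a sketch only; the names mirror the C example above and are not existing rsmpi declarations):

use std::os::raw::c_int;

// Fixed-width handle types, mirroring the C shim above.
pub type RSMPI_Comm = i32;
pub type RSMPI_Group = i32;

extern "C" {
    // Implemented by the shim, which is compiled with the local mpicc.
    pub fn RSMPI_Comm_create(
        comm: RSMPI_Comm,
        group: RSMPI_Group,
        newcomm: *mut RSMPI_Comm,
    ) -> c_int;
}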

The main drawbacks I can see with this approach are:

  1. it is somewhat labor intensive to generate all the wrapper functions; this might be alleviated by the reduced need to support specific MPI implementations.
  2. there is an additional function call overhead, which might be alleviated with LTO.

What do you think?

@AndrewGaspar (Contributor) commented:

I think there's some merit to the idea - I've thought about doing this in the past.

A couple of things:

  1. You'd almost certainly need to make the "portable" handle types 64-bit or larger. There are MPIs I've seen where the MPI handle type is a pointer.
  2. Something to keep in mind is that some routines will have a bit more overhead than just a function call. Any routine which takes a list of MPI handles may have to allocate memory to translate the list if the actual handle type differs in size from the RSMPI handle type, as in the sketch after this list.
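
As a sketch of that translation cost (hypothetical names, assuming an implementation with int-sized handles):

type RsmpiHandle = u64; // portable, fixed-width handle
type NativeHandle = i32; // e.g. an implementation with int-sized handles

fn translate_handles(portable: &[RsmpiHandle]) -> Vec<NativeHandle> {
    // A temporary allocation is needed whenever the two sizes differ.
    portable.iter().map(|&h| h as NativeHandle).collect()
}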

Though a possibility could be that you compile and run a program as part of build.rs that outputs all the handle type sizes.
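
Roughly like this (a sketch only; error handling is elided and only MPI_Comm is probed):

use std::{env, fs, path::PathBuf, process::Command};

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let probe_src = out_dir.join("probe.c");
    fs::write(&probe_src, r#"
        #include <mpi.h>
        #include <stdio.h>
        int main(void) { printf("%zu", sizeof(MPI_Comm)); return 0; }
    "#).unwrap();
    let probe_bin = out_dir.join("probe");
    // Compile the probe with the local MPI compiler wrapper, then run it.
    Command::new("mpicc").arg(&probe_src).arg("-o").arg(&probe_bin)
        .status().expect("failed to run mpicc");
    let out = Command::new(&probe_bin).output().expect("failed to run probe");
    println!("cargo:rustc-env=RSMPI_COMM_SIZE={}",
             String::from_utf8_lossy(&out.stdout));
}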

@Luthaf (Author) commented Feb 28, 2020

Though a possibility could be that you compile and run a program as part of build.rs that outputs all the handle type sizes.

This is another alternative, and it is how the Julia bindings to MPI do it. You may also want to get the alignment of the types right. I don't know how one would create a fully opaque type with a given size and alignment in safe Rust, though.
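
Maybe something along these lines would work (just a guess; the size and alignment values here are placeholders that build.rs would have to fill in):

// Hypothetical values, to be generated at build time.
#[repr(C, align(8))]
pub struct RSMPI_Comm {
    _opaque: [u8; 8], // sizeof(MPI_Comm) as measured by the probe
}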

@jedbrown (Contributor) commented Feb 9, 2022

I think it's more appropriate for a wrapper layer to live outside rsmpi. This project, for example, has been around for a while, but is now nicely licensed.
https://github.com/cea-hpc/wi4mpi

At this point, I think avoiding bindgen has nontrivial maintenance costs and a specialized mpi-sys is the way to go if it's important to you. If you do create a specialized mpi-sys, we can add it to CI. I'll close this issue now, but feel free to reopen if you think that's inappropriate or you would like to put some effort toward a different strategy.
