
packaging ecosystem experiment #44

Closed
gronki opened this issue Dec 23, 2019 · 26 comments

Comments

@gronki

gronki commented Dec 23, 2019

Sorry if this post is irrelevant to this repo. I am posting here because I understand that people here are mostly package maintainers, so I think they are the right audience for this post. If this project takes off, I can move it elsewhere so as not to clutter this place.

I just saw this repo for the first time and my first fear was "it is not going to work". The standard library in C, Python, IDL or other languages I have had experience with holds only the most essential functions. Here, half or more of the proposals are very specialized features. They do not fit in stdlib. These proposals are awesome, but they are more suitable for a packaging ecosystem (which we discussed in the other repo) than for a stdlib.

I am so happy that this movement exists, because ever since I started using Fortran I have felt frustrated and thought nobody shared that frustration. Then I realized some people have the same issues, and the founders of the fortran-lang project have made an amazing effort to organize this chaotic movement into streamlined, targeted action. Since it's Christmas time, I want to express my gratitude for this from the bottom of my heart.

I wanted to ask if anyone is willing to participate in the following experimental project.

How about we take a dozen packages (each of us has created or maintains at least one) and attempt to build an experimental packaging ecosystem that holds all of them? We would use existing, mature tools (make, gcc, gfortran) just to make it work on one platform (I think it should be Linux/Unix, because it's the easiest and free). If that takes off and we reach critical mass, we might expand to cover all needs (Windows, other compilers).

I am willing to put in the effort (as I have a bit of free time now); however, I have no experience with packaging other than PyPI and RPM (which are mostly binary packages). I know some people have mentioned that they have worked with source-based packaging systems that could work for Fortran.

In this issue, instead of discussing whether it's a good idea or not, I would like to collect suggestions and advice of tools and solutions that would make it possible in the fastest time.

We need:

  • people who have experience with packaging systems to share it and/or use their expertise to help set things up
  • package developers/maintainers that are willing to add their packages
  • people who can provide some basic infrastructure for testing (I can share my RPi server for the start)
  • users to test the solutions

Again, despite my personal feelings, I do not want to argue about which solution (stdlib vs packaging ecosystem) is superior. I just want to build a working demonstration in a short time.

@certik
Member

certik commented Dec 23, 2019

@gronki thanks for taking part in our repositories by providing feedback, suggestions and ideas.

Regarding the core of this issue, isn't this a duplicate of j3-fortran/fortran_proposals#55? If so, let's close it and move the discussion there?

For the record, I think we need both a standard library and a packaging ecosystem (like Python, Julia or Matlab have). These are orthogonal efforts.

@gronki
Author

gronki commented Dec 23, 2019

I wanted to make it a separate thread but I might as well continue in the other one. Whatever you decide. :)

@certik
Member

certik commented Dec 23, 2019

We can keep discussing here. Building upon what has been discussed at j3-fortran/fortran_proposals#55, what exactly is your proposal?

Build a new source distribution? How will it differ from Spack?

Build a new binary distribution? How will it differ from Conda?

Or to contribute Fortran packages to either Spack or Conda together with any possible fixes for Spack/Conda to make them work better with Fortran?

Or something like Pip or Cargo or the Julia package manager, which seem to be a mix of source and binary packages?

The issue is that most Fortran codes depend on non-Fortran packages such as Lapack, FFTW and others, and so we have to package those also.

@milancurcic
Member

Great idea, thanks Dominik. I agree with Ondrej it's an orthogonal effort to this (stdlib), but I do see it fitting as a separate project in fortran-lang, alongside stdlib. It's good to discuss it here. Stdlib wouldn't compete with packages provided by the package manager, it would be one of them. See how many packages there are just on GitHub: https://github.com/fortran-lang/stdlib/wiki/List-of-popular-open-source-Fortran-projects.

On the surface, Spack does seem like the best candidate for this, though I haven't used it, so I can't say from experience. From the description of what it does, it seems the most appropriate to me.

@milancurcic
Member

The standard library in C, Python, IDL or other languages I have had experience with holds only the most essential functions. Here, half or more of the proposals are very specialized features. They do not fit in stdlib.

Great! Each proposal issue exists exactly so that we can all say whether we want it in stdlib or not. This is such a young project (8 days old, I believe) that I haven't yet caught my breath to open proposals for what I'd like to see in stdlib (but I will soon :)).

Please join us! We need to hear from everybody what stdlib should look and act like; it is made by the community, for the community. We don't have the answers yet and we're learning along the way.

@certik
Member

certik commented Dec 23, 2019

As I mentioned in j3-fortran/fortran_proposals#55, unfortunately Spack does not run on Windows, which is a deal breaker. However, Spack shows what it takes to make a successful source distribution --- it's not easy at all.

However, Spack, Conda and similar solutions are general solutions that work for any language. I feel that there is still a need for a language specific solution (Pip, Cargo, Julia Pkg, ...). This is something that we have a chance of implementing. Here are some ideas:

  • Many Fortran packages that people create are pure Fortran (including, at the moment, this stdlib). 90% of the time we need to do the same thing: specify Fortran module dependencies (for Make, or let CMake figure them out), build the Fortran files with the given compiler in the correct order and with the correct options (typically either Debug or Release, specific to each compiler), and then typically build a shared or static library or an executable. For example, I think the vast majority of Python packages at https://pypi.org/ are just pure Python packages. So for pure Fortran packages, we can create our own packaging solution: initially it would be just some TOML or YAML file that specifies a list of Fortran files together with some meta information. Then the packaging tool can take this and generate Makefiles, CMake, Conda and Spack packages, etc. All the complex stuff, like locating the proper Fortran compiler on each platform (i.e. depending on the right Spack or Conda compiler package), is the same for all pure Fortran packages and so can be encoded in our "tool". In particular, I am 100% confident we can write a "tool" to get pure Fortran packages building reliably on all platforms, including Windows, and depending on each other.

  • What about the non-pure Fortran packages (the vast majority of codes at https://github.com/fortran-lang/stdlib/wiki/List-of-popular-open-source-Fortran-projects) and what about non-Fortran dependencies (again the vast majority of those codes) such as MPI, FFTW, etc.? That's the hard part. A pragmatic approach would be to use another package manager for these, such as Conda or Spack, or apt-get. Our "tool" can even support many of these. Then in our "tool" TOML description for a non-pure Fortran package, one would specify the dependencies as Conda, Spack or apt-get packages.
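The "specify module dependencies and build in the correct order" step for pure Fortran packages can be made concrete. Below is a minimal Python sketch (Python only for brevity, not a proposal for the tool's implementation language) that scans Fortran sources for `module` and `use` statements and topologically sorts the files with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The function name `build_order` and the simplified regular expressions are assumptions: comments, continuation lines, submodules and preprocessor directives are not handled.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Simplified patterns (assumption): one statement per line, and no handling of
# comments, continuation lines, submodules or preprocessor directives.
MODULE_RE = re.compile(r"^\s*module\s+(\w+)\s*$", re.IGNORECASE | re.MULTILINE)
USE_RE = re.compile(r"^\s*use\b\s*(?:,\s*\w+\s*::\s*|::\s*)?(\w+)",
                    re.IGNORECASE | re.MULTILINE)


def build_order(sources: dict[str, str]) -> list[str]:
    """Order files so each module is compiled before any file that uses it."""
    # Map every module name (lowercased: Fortran is case-insensitive) to its file.
    provider = {}
    for fname, text in sources.items():
        for mod in MODULE_RE.findall(text):
            provider[mod.lower()] = fname

    # Each file depends on the files providing the modules it uses; modules
    # not defined in the package (e.g. iso_fortran_env) are ignored.
    graph = {}
    for fname, text in sources.items():
        deps = {provider[m.lower()] for m in USE_RE.findall(text)
                if m.lower() in provider}
        deps.discard(fname)
        graph[fname] = deps

    # static_order() yields dependencies first and raises CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())


example = {
    "main.f90": "program main\n  use utils\nend program main\n",
    "utils.f90": "module utils\n  use constants\nend module utils\n",
    "constants.f90": "module constants\n  use, intrinsic :: iso_fortran_env\nend module constants\n",
}
print(build_order(example))  # ['constants.f90', 'utils.f90', 'main.f90']
```

A tool that knows this order can emit it directly into a Makefile, whereas a CMake backend can leave the ordering to CMake's own Fortran dependency scanning.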

We need to brainstorm all the details here.

Let's take FFTW as an example. Our "tool" can have a package for FFTW, described as TOML. In the TOML description, we would list how to install the package using all backends that our "tool" supports, that is, it would look like this:

[install]
conda="fftw"
apt-get="libfftw3-dev"
spack="fftw"
yum="..."
apk="..."

and our "tool" would know how to use all these backends to install the package. On each platform the user would select a preferred backend, and our "tool" would know how to either install fftw, or check that it is installed and, if not, tell the user "please install fftw by apt install libfftw3-dev". So this mechanism should ensure fftw is reliably installed on all platforms. I expect that there will be differences in how apt-get and conda install fftw, so the fftw package in our "tool" will need some metadata to correct things for a given backend if needed (for example, if apt-get installs some files in an unexpected location, then our "tool" must know where to find them).
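A sketch of how the "tool" might consume the `[install]` table for FFTW: pick the first backend from the user's preference list that has a recipe and whose executable is on `PATH`, and produce the "please install ..." hint for the user. This is Python for brevity; the dict layout, `choose_backend`, `install_hint` and the command templates are all hypothetical, and a real tool would also have to check whether the package itself is already installed.

```python
import shutil

# The [install] table for FFTW, as a Python dict. The yum/apk entries are
# left out because their values are elided ("...") in the proposal.
FFTW_INSTALL = {
    "conda": "fftw",
    "apt-get": "libfftw3-dev",
    "spack": "fftw",
}

# Assumed command line per backend; a real tool would need more care
# (channels for conda, sudo for apt-get, compiler specs for spack, ...).
INSTALL_TEMPLATES = {
    "conda": ["conda", "install", "{pkg}"],
    "apt-get": ["apt-get", "install", "{pkg}"],
    "spack": ["spack", "install", "{pkg}"],
}


def choose_backend(install_table, preferences, which=shutil.which):
    """Return the first preferred backend that has a recipe and is on PATH."""
    for backend in preferences:
        if backend in install_table and which(backend) is not None:
            return backend
    return None


def install_hint(name, install_table, backend):
    """Message telling the user how to install `name` with `backend`."""
    pkg = install_table[backend]
    cmd = " ".join(part.format(pkg=pkg) for part in INSTALL_TEMPLATES[backend])
    return f"please install {name} by running: {cmd}"


print(install_hint("fftw", FFTW_INSTALL, "apt-get"))
```

Passing `which` as a parameter keeps the lookup testable without any package manager installed.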

Summary of the idea: our "tool" can be a source distribution for pure Fortran packages, and it would piggyback on other general package managers for non-pure Fortran dependencies.

Finally what about most of the codes at https://github.com/fortran-lang/stdlib/wiki/List-of-popular-open-source-Fortran-projects that have both non-pure Fortran dependencies and are themselves a non-pure Fortran package? I don't have an answer right now. Part of the answer is that those codes are typically end applications. If one has a complex Python based application (such as https://www.sagemath.org/), that will be hard to package as a pip package also. So end applications will have their own complicated build system and distribution. They could still use our "tool" to install and manage most of their dependencies, and our "tool" should make that easy.

@gronki
Author

gronki commented Dec 23, 2019

Windows is unfortunate, but Windows users can still install packages the "old way". To use the package manager, it is possible to install one of the few popular Linux distros using the Windows Subsystem for Linux. We should make sure that the package system works with that.

Actually my idea was exactly what @certik described: in the first and foremost step, it should be a tool allowing to manage dependencies. In the second step, it can be used to distribute small apps (in general, apps that can be easily encapsulated and packaged). The biggest packages will certainly be a challenge and at this point it's hard to predict all the issues that will come up.

I would certainly prefer not to reinvent the wheel and to use one of the existing solutions instead. Writing our own tool would be possible, but we would certainly repeat many mistakes that others have already made, and I am not sure we have the human resources for this. Also, as I understand it, Spack already contains lots of typical HPC codes in its repository, which would greatly diminish the problem of having to package LAPACK, MPI, etc.

If we decide to write our own tool, IMO it should be absolutely minimal and only handle things that cannot be done any other way. So generating Makefiles, package information, etc. should be the responsibility of the developer. Again, it's all about human resources and maintenance.

I'm currently torn between the two (spack/conda vs own tool), let's wait for more voices. :)

@certik
Member

certik commented Dec 23, 2019

Re Windows: I am afraid the Windows Subsystem for Linux is not an acceptable solution --- some people cannot use it, and those will be left out. I think the right solution is to build and install natively. It's actually not that hard, and cmake has great support on Windows.

One issue with Conda, being a binary package manager, is that it uses gfortran. So if you want to build with Intel Fortran, you are out of luck: you can't reuse any dependencies from Conda built with gfortran. Spack fixes this issue by allowing you to choose a compiler for all the dependencies. But it doesn't work on Windows.

I personally just use Conda, as at least it works everywhere, even if with just one compiler.

@pdebuyl
Contributor

pdebuyl commented Dec 26, 2019

There is also the use case (not discussed above, if I understand correctly) of packages that are already available. On HPC systems, packages are often made available by loading the compiler environment or through the module system. In that case, the "tool" should make sure that those libraries are found and used for compilation.
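For the HPC case, the tool could first look for a library that a loaded module already provides before trying any backend. A minimal Python sketch, assuming (this is not guaranteed on every system) that the module system exports library directories through `LIBRARY_PATH` or `LD_LIBRARY_PATH`; `find_library` is a hypothetical helper, and the `is_file` parameter only exists so the lookup can be tested without touching the filesystem.

```python
import os

# Assumption: `module load <pkg>` exports the library directory via one of
# these variables. Many module setups do this, but not all of them.
SEARCH_VARS = ("LIBRARY_PATH", "LD_LIBRARY_PATH")


def find_library(name, env=None, is_file=os.path.isfile):
    """Return the path of lib<name>.{so,a,dylib} found via the environment, or None."""
    env = os.environ if env is None else env
    candidates = [f"lib{name}{ext}" for ext in (".so", ".a", ".dylib")]
    for var in SEARCH_VARS:
        for directory in env.get(var, "").split(os.pathsep):
            if not directory:  # skip empty entries, e.g. from a trailing separator
                continue
            for candidate in candidates:
                path = os.path.join(directory, candidate)
                if is_file(path):
                    return path
    return None


# Prints the first matching path, or None if no loaded module provides it.
print(find_library("fftw3"))
```

Only if this lookup fails would the tool fall back to asking a package manager such as Spack or Conda.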

Also, regarding build systems, I wrote about my experience with git and CMake at http://pdebuyl.be/blog/2018/fortran-cmake-git.html and I think it is relevant for packaging in the sense that it should cover all pure Fortran packages, as well as Fortran/C packages.

@certik
Member

certik commented Jan 4, 2020

I would like to move this forward. I think we have a sufficient community around stdlib now that we can pull this off. Here are my current ideas:

  • I will study Rust's Cargo in detail. I think it is the closest to what we want
  • Let's restrict ourselves to, and agree upon, a "standard" for how to write Fortran packages that the "tool" would understand. That's why we need to do it as a community. Part of this is:
    • Set directory layout
    • Restriction that module name is the same as filename and that each file is either a module, submodule or a main program
    • Where tests are and how they are named
    • ... whatever else is needed to reduce complexity, and thus make the "tool" doable
  • Let's start with pure Fortran, then later figure out how to do mixed Fortran / non-Fortran projects
  • Initially only pure Fortran dependencies, later we will figure out how to do non-Fortran deps
  • It must work on Linux, macOS, Windows and HPC (compatible with the typical "module" setup on most HPC systems)
  • The "tool" would understand that it is building a Fortran project and understand the above package "standard" and all the semantics. Just like cmake understands more semantics than make, and thus it is simpler to use, this "tool" would be even higher level than cmake. All the Fortran specific compiler and platform details would be encoded.
  • It would be able to generate a build system as a backend, for example it could generate the current CMake as well as the current manual Makefile system in stdlib. It knows all the semantic information to be able to do that.
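The "module name equals filename, one program unit per file" restriction is simple enough to check mechanically, which is part of what makes it attractive. A hedged Python sketch of such a check; `check_source` and its regular expression are hypothetical and only recognise unit headers written on a single line without trailing comments. Only modules are held to the filename rule here; whether submodules and programs should be too is left open.

```python
import re
from pathlib import PurePath

# One program unit per file: `module X`, `submodule (parent) X` or `program X`.
# Lines such as `end module X` or `module procedure f` do not match.
UNIT_RE = re.compile(
    r"^\s*(module|submodule\s*\([^)]*\)|program)\s+(\w+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def check_source(filename, text):
    """Return a list of violations of the hypothetical layout rules (empty if OK)."""
    units = UNIT_RE.findall(text)
    if len(units) != 1:
        return [f"{filename}: expected exactly one module, submodule or "
                f"program, found {len(units)}"]
    kind, name = units[0]
    path = PurePath(filename)
    # Fortran names are case-insensitive, so compare in lower case.
    if kind.lower() == "module" and name.lower() != path.stem.lower():
        return [f"{filename}: module '{name}' should live in '{name}{path.suffix}'"]
    return []


print(check_source("src/kinds.f90", "module stdlib_kinds\nend module stdlib_kinds\n"))
```

A tool that can rely on this rule knows which file provides a module without parsing the whole project.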

Who would have time to help me brainstorm some of these ideas in more detail and help me with coding and then especially testing?

Here is some FAQ:

  • How is this different to Spack or Conda? None of them know about Fortran, so they do not know the semantics of the Fortran project. That's why Rust has Cargo that knows the semantics. Our "tool" can generate a Spack or Conda package as another backend (besides cmake and manual Make)
  • How is this different to CMake? CMake knows a lot about Fortran, but not enough. It must work with any setup. Our "tool" will restrict how a Fortran package is structured, and take advantage of it. Just like CMake could know about Rust, but Rust still has Cargo, because Cargo knows more, and thus is easier.

@certik
Member

certik commented Jan 5, 2020

Here are some relevant blog posts about package management in Rust vs C++:

https://blog.pierre.marijon.fr/why-i-stopped-c/

http://cliffle.com/blog/m4vga-in-rust/

I only used Rust as a user so far, not a developer, to compile some end applications. And the experience was very smooth. I think we can create something similar for Fortran.

@milancurcic
Member

milancurcic commented Jan 5, 2020

I'm in.

I agree that Cargo is most like what we're looking for.

Additional levels of complexity that we'll have over Cargo:

  • All Rust packages start with Cargo in mind, whereas Fortran packages are all built a bit differently, even within a single build system like CMake or autotools
  • There's one Rust compiler vs several Fortran compilers. Different Fortran packages will have different levels of compiler support

Should fpm (Fortran Package Manager) expect libraries to fit it, or should fpm adapt to different structures and build rules of projects? I think the latter -- more difficult to implement but it would allow us to have a larger ecosystem.

Another consideration: should fpm require packages to maintain a "registry" file (yaml, toml or whatever we choose) or would the complete build rules be maintained solely on fpm end?

Let's set some specs for the MVP. I suggest:

  • Can install stdlib from GitHub
  • Can list available and installed packages (will be only stdlib for start)
  • Can show help/available options
  • Supports one compiler (gfortran) and one build backend (cmake) initially
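A possible command-line surface for this MVP, sketched with Python's `argparse`. The subcommand names (`install`, `list`) and the `--installed`, `--compiler` and `--backend` options are illustrative assumptions, not a settled fpm interface; help output (`-h`/`--help`) comes for free from `argparse`.

```python
import argparse


def make_parser():
    """Hypothetical command-line interface for the fpm MVP."""
    parser = argparse.ArgumentParser(
        prog="fpm", description="Fortran package manager (MVP sketch)")
    # MVP scope: exactly one compiler and one build backend.
    parser.add_argument("--compiler", choices=["gfortran"], default="gfortran")
    parser.add_argument("--backend", choices=["cmake"], default="cmake")
    sub = parser.add_subparsers(dest="command", required=True)

    install = sub.add_parser("install", help="install a package from GitHub")
    install.add_argument("package", help="package name, e.g. stdlib")

    listing = sub.add_parser("list", help="list available or installed packages")
    listing.add_argument("--installed", action="store_true",
                         help="show only installed packages")
    return parser


args = make_parser().parse_args(["install", "stdlib"])
print(args.command, args.package, args.compiler, args.backend)
```

Restricting `--compiler` and `--backend` with `choices` keeps the MVP honest about what it supports while leaving room to add values later.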

@certik
Member

certik commented Jan 5, 2020

Here is what @milancurcic and I agreed on so far:

  • Fpm would be both a package manager and a build system
  • It would use CMake (and possibly also Make) as backends
  • There would be a configure file, let's call it (for now) fpm.toml, where one specifies the Fortran files and any other metadata that fpm needs.
  • CMake would be hidden from the user --- fpm would call appropriate cmake commands, but the user would interact with fpm.
  • fpm would be able to install all dependencies on a per-project basis, like Cargo does
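To illustrate "CMake hidden from the user", here is a minimal Python sketch that renders a `CMakeLists.txt` from an already-parsed manifest. The manifest keys (`name`, `sources`, `executables`) are hypothetical placeholders for whatever `fpm.toml` ends up containing; on Python 3.11+ the dict could come from `tomllib.load`. CMake's Fortran support scans module dependencies itself, so the generated file does not need to order the sources.

```python
def cmake_from_manifest(manifest):
    """Render a minimal CMakeLists.txt from a parsed (hypothetical) fpm.toml."""
    name = manifest["name"]
    lines = [
        "cmake_minimum_required(VERSION 3.12)",
        f"project({name} LANGUAGES Fortran)",
        # CMake scans Fortran sources and works out module build order itself.
        f"add_library({name} {' '.join(manifest['sources'])})",
    ]
    # Each executable links against the package library.
    for exe, main_file in manifest.get("executables", {}).items():
        lines.append(f"add_executable({exe} {main_file})")
        lines.append(f"target_link_libraries({exe} PRIVATE {name})")
    return "\n".join(lines) + "\n"


manifest = {
    "name": "demo",
    "sources": ["src/constants.f90", "src/utils.f90"],
    "executables": {"demo_app": "app/main.f90"},
}
print(cmake_from_manifest(manifest))
```

The user would only ever edit the manifest; the generated file is an implementation detail that fpm can rewrite on every build.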

@certik
Member

certik commented Jan 9, 2020

Ok, I've started playing with Rust, or to be specific, with Cargo, which is their build manager. Here is an overview of what it does:

https://doc.rust-lang.org/cargo/guide/index.html

and in particular, Cargo assumes this layout:

https://doc.rust-lang.org/cargo/guide/project-layout.html

and here are more details about it:

https://doc.rust-lang.org/cargo/reference/manifest.html#the-project-layout

And I suggest we do something very similar (if not exactly the same) for Fortran. Cargo pretty much figures out which files to build and which binaries and libraries and tests and examples all automatically, even for large projects. So fpm would understand the structure, and then it can generate CMake or Make or any other build system to actually build it, and later on it can even build the project itself.
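Cargo infers targets from the layout (`src/lib.rs`, `src/main.rs`, `src/bin/`, `tests/`, `examples/`). A hedged Python sketch of the same idea transplanted to Fortran; the directory conventions (`src/main.f90`, `src/bin/*.f90`, and so on) are assumptions for illustration, not an agreed fpm layout. It takes a list of relative paths so it can be tested without a real project; in practice the list would come from something like `Path.rglob("*")`.

```python
from pathlib import PurePosixPath

FORTRAN_SUFFIXES = {".f90", ".F90", ".f", ".F"}


def classify(paths):
    """Sort relative file paths into build targets by directory convention."""
    targets = {"library": [], "executables": [], "tests": [], "examples": []}
    for raw in sorted(paths):
        p = PurePosixPath(raw)
        if p.suffix not in FORTRAN_SUFFIXES:
            continue  # README, fpm.toml, etc. are not compiled
        top = p.parts[0]
        if top == "src":
            # Mirrors Cargo: src/main.rs and src/bin/*.rs become executables.
            if (p.parent == PurePosixPath("src") and p.stem == "main") \
                    or p.parts[:2] == ("src", "bin"):
                targets["executables"].append(raw)
            else:
                targets["library"].append(raw)
        elif top == "tests":
            targets["tests"].append(raw)
        elif top == "examples":
            targets["examples"].append(raw)
    return targets


files = ["README.md", "fpm.toml", "src/main.f90", "src/utils.f90",
         "src/bin/tool.f90", "tests/test_utils.f90", "examples/demo.f90"]
print(classify(files))
```

Once targets are known this way, no per-file build instructions are needed in the manifest; the layout carries that information.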

I need more time to learn Cargo and get experience using it and also Rust itself, so that I can see how this can be adapted for Fortran. I need some project to learn this on, so I was thinking I'll implement a prototype of fpm in Rust itself. That will allow me to get enough experience with it. Regarding which language to use for the production version of fpm, here are the requirements for fpm:

  • single binary
  • works on Linux, macOS, Windows and HPC machines
  • quick to start (it's a command line tool, so it must be very fast)

So Python is out, because it's hard to create a single binary and to make it start and work quickly, as well as to distribute all the dependencies and ensure they work. I was going to use C++. But if the Rust prototype goes well, Rust would also work I think --- the advantage over C++ is that Rust has all the nice (easy to install!) packages to handle TOML, downloads, filesystem, etc.

@scivision
Member

scivision commented Jan 9, 2020

Requests:

CMake generator selection

allow for different CMake generators that consumers of fpm packages can specify, e.g. Visual Studio, Ninja, GNU Make. With CMake >= 3.15, the end user can set the environment variable CMAKE_GENERATOR. Maybe the CMake generator could be an fpm command-line option.
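How fpm might turn a generator choice into CMake invocations, as a small Python sketch. The explicit option is a hypothetical fpm flag; when it is absent no `-G` is passed, so CMake >= 3.15 picks up `CMAKE_GENERATOR` from the environment on its own. The `-S`/`-B` options need CMake >= 3.13.

```python
def cmake_commands(source_dir, build_dir, generator=None, build_type="Release"):
    """Return (configure, build) command lines for a CMake backend."""
    configure = ["cmake", "-S", source_dir, "-B", build_dir,
                 f"-DCMAKE_BUILD_TYPE={build_type}"]
    if generator is not None:
        # Explicit choice, e.g. from a hypothetical `--generator Ninja` option.
        configure += ["-G", generator]
    # Without -G, CMake >= 3.15 falls back to the CMAKE_GENERATOR environment
    # variable and then to its platform default, so nothing extra is needed.
    # `cmake --build` works with every generator; --config only matters for
    # multi-config generators such as Visual Studio.
    build = ["cmake", "--build", build_dir, "--config", build_type]
    return configure, build


configure, build = cmake_commands(".", "build", generator="Ninja")
print(" ".join(configure))
print(" ".join(build))
# A real tool would then run them, e.g. subprocess.run(configure, check=True).
```

Letting CMake read `CMAKE_GENERATOR` itself means fpm does not have to duplicate CMake's own precedence rules.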

allow non-CMake backends

Some major projects have switched away from autotools/make to meson/ninja. I think it would be good that even if CMake is the primary fpm choice, we don't make fpm so intertwined with CMake that fpm can't build non-CMake projects. This could be done a couple different ways, maybe even via CMake ExternalProject calling Meson.
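One way to keep fpm from being tied to CMake is a small backend interface with one implementation per build system. A Python sketch under that assumption; the `Backend` class, `plan_build` and the registry are hypothetical, and only the emitted `cmake` and `meson` command lines are real tool syntax.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """A build backend turns a source tree into build commands."""

    @abstractmethod
    def commands(self, source_dir: str, build_dir: str) -> list[list[str]]:
        """Return the command lines to configure and build the project."""


class CMakeBackend(Backend):
    def commands(self, source_dir, build_dir):
        return [["cmake", "-S", source_dir, "-B", build_dir],
                ["cmake", "--build", build_dir]]


class MesonBackend(Backend):
    def commands(self, source_dir, build_dir):
        # `meson compile` needs Meson >= 0.54; older setups call `ninja -C`.
        return [["meson", "setup", build_dir, source_dir],
                ["meson", "compile", "-C", build_dir]]


# Registry of known backends; adding a build system means adding one entry.
BACKENDS = {"cmake": CMakeBackend, "meson": MesonBackend}


def plan_build(backend_name, source_dir=".", build_dir="build"):
    """Look up a backend by name and return its command lines."""
    try:
        backend = BACKENDS[backend_name]()
    except KeyError:
        raise ValueError(f"unknown backend {backend_name!r}; "
                         f"choose from {sorted(BACKENDS)}") from None
    return backend.commands(source_dir, build_dir)


for step in plan_build("meson"):
    print(" ".join(step))
```

The rest of fpm would only talk to `Backend`, so CMake stays one option among several rather than a hard dependency.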

@milancurcic
Member

@scivision Yes, good and important point, Ondrej and I discussed this briefly. Eventually, CMake would be one of possible backends, and each could have their own backends (or.... generators? Confusing name IMO). The challenge, of course, would be to design fpm with the possibilities in mind.

@milancurcic
Member

@certik +1 for Rust. Another potential candidate could be Go, which I hear is simple to learn and program in, and has great networking and system facilities built in.

I'll be happy to learn and participate in this project regardless of the implementation language.

@milancurcic
Member

In addition to requirements for fpm that @certik listed, I think it's important to also consider further design and implementation requirements:

  • Networking is built-in and easy to use
  • File system manipulation is easy to use

@certik
Member

certik commented Jan 9, 2020

Yes, I need to look at Go more --- I used it a few times and I wasn't impressed with their dependency management. But as an implementation language for fpm, it would also work and perhaps be even better, because Go is I think much easier to learn than Rust.

@scivision yes, it should be able to work with any backend as you described. However, I want to point out that, just like Cargo, fpm would only be responsible for building the pure Fortran parts of the package / application. For the pure Fortran part, fpm would have a few backends (Make, CMake, Meson, Ninja, ..., as well as its own), but the backend would generate the CMake files to build the project. It would not reuse your own hand-written files.

So for already existing projects, they would have two options:

  • simply continue using their current build system, and from fpm's perspective they would simply look like non Fortran part of the project; and as you said, we should make sure fpm is well aware how to call into CMake automatically, so that in most cases things just work
  • or they would migrate to fpm (by migrating their directory structure and filenames, writing the appropriate fpm.toml and removing their hand-written CMake files), and then fpm would be responsible for building it. In this mode, you would not even know it is using CMake underneath (well, you would see the leftover CMake files); it would be fpm's responsibility that things build, no matter what backend it uses underneath.

I am still figuring out how Cargo handles non-Rust parts, but it seems you are responsible for building them yourself (using any build system you want), and probably you specify how to link them in.

@certik
Member

certik commented Jan 12, 2020

@milancurcic here is a very minimal prototype: https://gitlab.com/certik/fpm. Can you try it out (see README) and let me know what you think? Let's discuss it and brainstorm some more.

Overall, I must say I really like the Rust ecosystem, the package manager (Cargo) etc. There are libraries for anything that we would need. I would like exactly the same for Fortran. Regarding the Rust language itself, for what we need we don't seem to require any advanced features (such as the borrow checker), so it's actually not difficult to learn. And the Rust compiler provides excellent error messages.

@milancurcic
Member

Great! I'll play with it and let you know.

@certik
Member

certik commented Jan 13, 2020

@milancurcic once you have a look at it, I mainly need your feedback on:

  • can we make this work in general?

If the answer is yes, then I suggest we create a new repository fortran-lang/fpm and start brainstorming there, as there are a lot of orthogonal issues that we have to discuss. Until then, I'll use this issue.


  1. Hosting of packages: for now we will not have our own central place such as crates.io (that will come later). For now we will use git repositories (GitHub, GitLab and other places will work) as well as plain URLs to tarballs. That way we don't need to host anything ourselves at first. (Hosting of packages fpm#4)

  2. How to support packages that do not conform to our "standard layout" (to be specified...). Some examples of such packages would be reference Lapack or Arpack. The way to do that is to create a new repository, say certik/lapack.fpm, which will have an fpm.toml that specifies the URL of the actual sources (https://github.com/Reference-LAPACK/lapack) and a build script, which would build the sources (using CMake in this case) and install them into some $PREFIX provided by fpm, and fpm takes it from there. This approach also works for non-Fortran packages: the build script either builds it, or requires it from the system (where it can be provided by, e.g., Spack). Either way, this is a clean way to hook this up into the fpm ecosystem. (How to support packages that do not conform to our "standard layout" (to be specified...) fpm#6)

  3. Naming of fpm.toml. Cargo names Cargo.toml with capital C, and as explained in https://doc.rust-lang.org/cargo/faq.html#why-cargotoml, to "ensure that the manifest was grouped with other similar configuration files in directory listings. Sorting files often puts capital letters before lowercase letters, ensuring files like Makefile and Cargo.toml are placed together." If we want to do the same, the candidates are Fpm.toml and FPM.toml. I think fpm.toml looks better. But using a capital letter would make it similar to CMakeLists.txt also. We might want to devise a different name or naming scheme. Any ideas? (Naming of fpm.toml fpm#5)
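The hosting plan in item 1 means a dependency entry only needs to say where its sources live. A hedged Python sketch of turning such an entry into fetch commands; the entry shape (`git` with an optional `tag`, or `url` for a tarball) is an assumption loosely modelled on Cargo's git dependencies, and `fetch_commands` and the `deps/` directory are hypothetical.

```python
from urllib.parse import urlparse


def fetch_commands(name, spec, dest="deps"):
    """Turn a hypothetical dependency entry into commands that fetch it.

    spec is either {"git": repo_url, "tag": optional_tag}
    or {"url": tarball_url}.
    """
    target = f"{dest}/{name}"
    if "git" in spec:
        cmd = ["git", "clone", "--depth", "1"]
        if "tag" in spec:
            cmd += ["--branch", spec["tag"]]  # git clone --branch also accepts tags
        return [cmd + [spec["git"], target]]
    if "url" in spec:
        if not urlparse(spec["url"]).path.endswith((".tar.gz", ".tgz")):
            raise ValueError(f"{name}: only .tar.gz/.tgz tarballs are handled here")
        archive = f"{dest}/{name}.tar.gz"
        return [["curl", "-L", "-o", archive, spec["url"]],
                ["mkdir", "-p", target],
                ["tar", "-xzf", archive, "-C", target]]
    raise ValueError(f"{name}: dependency needs either 'git' or 'url'")


for cmd in fetch_commands("stdlib", {"git": "https://github.com/fortran-lang/stdlib"}):
    print(" ".join(cmd))
```

Because both source kinds reduce to plain `git`/`curl`/`tar` calls, no central registry is needed until one is worth hosting.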

@milancurcic
Member

I played with it. So far so good. I've also built a few Rust projects in the past, so this experience was similar. One slightly strange thing was that the test examples were included with the package manager, but I understand that this is a minimal proof of concept.

can we make this work in general?

Yes! It will be a steep climb but I don't see why it wouldn't work from a technical point of view. Building a rich ecosystem is a different story but we need to start somewhere. Please go ahead and create fortran-lang/fpm.

Let's discuss your specific questions/issues there. I have some ideas.

@certik
Member

certik commented Jan 14, 2020 via email

@certik
Member

certik commented Jan 14, 2020

Here is the repository: https://github.com/fortran-lang/fpm. I moved from GitLab CI to GitHub CI; for now only Linux and macOS are tested. Tests pass. Let's open issues in that repository and continue the discussion.

@milancurcic
Member

Closing this in favor of fpm. :)


5 participants