RFC: switch to Meson as a build system #13615

Closed
rgommers opened this issue Feb 25, 2021 · 57 comments
Labels
Build issues Issues with building from source, including different choices of architecture, compilers and OS DX Everything related to making the experience of working on SciPy more pleasant enhancement A new feature or improvement Meson Items related to the introduction of Meson as the new build system for SciPy RFC Request for Comments; typically used to gather feedback for a substantial change proposal

@rgommers
Member

rgommers commented Feb 25, 2021

Now that the distutils deprecation in Python 3.10 is around the corner, I've been thinking about moving build systems. This feels like the right time. The thought of doing a lot of work migrating numpy.distutils features to setuptools, basically becoming responsible for Fortran support there, and then still being stuck with such a poor build system isn't giving me warm and fuzzy feelings. So here's an alternative.

tl;dr there are only two candidates for use as a build system, Meson and CMake. Meson + mesonpep517 has more gaps in the short term than CMake + scikit-build, however it's much cleaner (small code base of modern pure Python) than CMake (a ton of C++ code + a weird DSL + scikit-build seems to use legacy CMake constructs which are awful) and has much better documentation. So I'd prefer Meson.

What we need from a build system

Let's first outline everything that we need in terms of build, packaging, dev workflows, etc. And then figure out the projects that implement that.

At the highest level we need the following:

  1. A development build (can be in-place or out-of-place, as long as the workflow is good)

  2. Create an sdist

  3. Create packages from an sdist (creating packages from the git repo is optional):

    • wheels
    • conda packages
    • .deb, .rpm, Homebrew bottles, etc.
  4. Other tools and jobs to invoke (standalone and/or via a runtests.py-like interface):

    • A documentation build
    • Run tests
    • Run benchmarks
    • Measure code coverage
    • Run linters and checkers (pyflakes, mypy, autopep8, etc.)
  5. Interfacing with Python packaging/install tools (e.g., pip install . should work as expected)

The build system itself should handle (note: some of these we don't have today but could have):

  • languages: C, C++, Fortran, Cython
  • compiler support including more niche compilers (e.g., clang-cl, ifort, mingw-w64, xlc)
  • platform support: Windows, Linux, macOS, aarch32/64, AIX, ppc64le, niche Debian architectures
  • support for multiple Python implementations (at least CPython, PyPy)
  • handle code generation, templating and ahead-of-time compilation with Pythran
  • parallel builds
  • fast builds with caching (e.g., ccache), incremental builds including for Cython
  • cross-compilation support
  • good diagnostic output in build log and afterwards (e.g., build settings ending up in __config__.py)
  • debug builds
  • coverage-enabled builds
  • BLAS/LAPACK detection
  • NumPy detection (for include dir)
  • easy control of build flags, ideally not only via CFLAGS et al. but configurable per compiler
  • a way of handling optional dependencies, e.g. PyFFTW, OpenMP
  • a way of special-casing certain situations (e.g., MSVC + gfortran on Windows)
  • CPU feature detection (e.g., SIMD flags) - note: don't need it (yet) for SciPy, but need it for NumPy
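To make the "build settings ending up in __config__.py" point concrete, here is a minimal sketch of a build step that writes detected settings into an importable module. The function name write_config_module and the CONFIG/show() layout are hypothetical, not what numpy.distutils or Meson actually generate; build_info stands in for whatever the build system detected.

```python
import pprint
from pathlib import Path


def write_config_module(path, build_info):
    """Write build settings to a Python module so they can be inspected after install.

    build_info is assumed to hold only plain literals (str, int, bool, list, dict),
    so that pprint.pformat produces valid Python source for it.
    """
    lines = [
        "# Generated at build time; do not edit.",
        "CONFIG = " + pprint.pformat(build_info, sort_dicts=True),
        "",
        "def show():",
        "    for key, value in CONFIG.items():",
        "        print(key, '=', value)",
        "",
    ]
    Path(path).write_text("\n".join(lines))
```

A user could then do something like `import scipy.__config__; scipy.__config__.show()` to see which compilers and BLAS were used, which is the kind of after-the-fact diagnostic the bullet is about.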

Python-specific build features that are necessary but may live outside the main build system:

  • Python extension naming support (e.g., submodulename.cpython-39-x86_64-linux-gnu.so)
  • byte-compiling
  • handle vendoring of dependencies in wheels and name mangling
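For the first two items, the standard library already knows the naming conventions, so a build backend can look them up rather than hardcode them. A minimal sketch; extension_filename and bytecode_path are hypothetical helper names:

```python
import importlib.util
import sysconfig


def extension_filename(module_name):
    """Return the platform-specific filename for a compiled extension module."""
    # EXT_SUFFIX is e.g. ".cpython-39-x86_64-linux-gnu.so" on Linux
    # or ".cp39-win_amd64.pyd" on Windows, for the running interpreter.
    return module_name + sysconfig.get_config_var("EXT_SUFFIX")


def bytecode_path(source_path):
    """Return where this interpreter caches byte-compiled code for a .py file."""
    # e.g. "pkg/mod.py" -> "pkg/__pycache__/mod.cpython-39.pyc"
    return importlib.util.cache_from_source(source_path)
```

For actually byte-compiling an installed tree, the stdlib compileall module (compileall.compile_dir) does the work.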

Moving to Meson

Advantages:

  1. Much faster builds. Right now we don't have parallel builds, setuptools doesn't do incremental builds, and setuptools is slow, while Meson is as fast as it gets (it uses Ninja as backend). On a decent development machine we should be able to get full rebuilds down to ~1 min, and incremental rebuilds much faster than that.
  2. Reliability: any system will have some bugs, however the combination of extensive monkeypatching and almost zero tests in the distutils, numpy.distutils and setuptools combination is particularly fragile.
  3. Support for cross-compiling. Right now we basically say "we don't know, setuptools doesn't support that - let us know if you get anywhere". It's not our own need, but there clearly is demand for it.
  4. Better build logs - clearer configuration and compiler/library detection info, as well as color-coded output which is easier to interpret.
  5. Less to maintain in the long term. If we can move SciPy - as by far the most Fortran-heavy Python library - we may just not add Fortran support to setuptools, which means that that headache just goes away.
  6. Easier to debug build issues. Meson is much better code, both architecturally and code quality-wise, than setuptools.
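To illustrate what "incremental builds" means at its simplest: a target is only rebuilt when it is missing or older than one of its inputs. The sketch below is a simplified, timestamp-only version of that check (needs_rebuild is a hypothetical name). Ninja additionally tracks things like header dependencies and changed compiler command lines, so treat this as an illustration, not as how Ninja is implemented.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Any input modified after the target was produced makes it stale.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```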

Challenges:

  1. It's a lot of work to move, and we may introduce new bugs in the process.

  2. There are missing pieces of the puzzle:

    • BLAS/LAPACK detection needs implementing on Meson.
    • Meson has Cython support and test cases, but it's fairly minimal. Better support, including the kind of caching SciPy's tools/cythonize.py does, would be useful.
    • mesonpep517 builds sdists and wheels, but development builds are missing. Given that PEP 517 itself does not cover development builds, it's unclear if this should be added to mesonpep517 or done as a separate package.
  3. We'll be early adopters in the scientific Python space, so we may run into unforeseen issues.
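On the caching point in (2): a content-hash cache lets a build skip re-running Cython on .pyx files that haven't changed since the last run. A rough sketch of the idea, assuming a JSON file as the cache store; all function names and the file format are hypothetical and not taken from tools/cythonize.py. A real implementation would also need to hash included .pxd files and the Cython version.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path):
    """SHA-256 of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_db(db_path):
    """Load the cache {source path: hash}, or start empty if it doesn't exist yet."""
    db_file = Path(db_path)
    if db_file.exists():
        return json.loads(db_file.read_text())
    return {}


def save_db(db, db_path):
    """Persist the cache for the next build."""
    Path(db_path).write_text(json.dumps(db, indent=2, sort_keys=True))


def is_stale(pyx_path, db):
    """True if pyx_path was never processed or its contents changed since."""
    return db.get(str(pyx_path)) != file_hash(pyx_path)


def record(pyx_path, db):
    """Mark pyx_path as processed; call only after Cython succeeded."""
    db[str(pyx_path)] = file_hash(pyx_path)


# Hypothetical driver loop (run_cython is not defined here):
#   db = load_db("cython_cache.json")
#   for pyx in sources:
#       if is_stale(pyx, db):
#           run_cython(pyx)
#           record(pyx, db)
#   save_db(db, "cython_cache.json")
```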

A potential plan:

  1. Work in a fork, and start with sdist to wheel on one platform for a single SciPy submodule (delete other submodules one commit per submodule, so they're easy to add back).
  2. After 3-4 submodules we should have covered all important cases (Fortran, Cython, templating, codegen), so do only 3-4.
  3. Meson improvements for BLAS and Cython are probably needed for step (2) above, so make those when needed.
  4. Implement a development build - improve runtests.py as needed, and/or add to mesonpep517 or a new package.
  5. Next add other platforms, first Windows, Linux and macOS. Then ping some packagers for, e.g., Debian and AIX to help check.
  6. If that all works well, add back all other submodules.
  7. Merge into SciPy
  8. Switch over a few CI jobs
  9. Update the scipy-wheels repo and test wheels for all platforms
  10. Switch over all other CI jobs
  11. Delete the main setup.py, leave other setup.py files in place for a while (unused, just in case)
  12. Do a release
  13. Delete all remnants of distutils-using code and declare victory

This will be a significant amount of work, which is why I added a GSoC project idea for it: https://github.com/scipy/scipy/wiki/GSoC-2021-project-ideas. A good student should get quite far in ~5 weeks of work.

@rgommers rgommers added enhancement A new feature or improvement Build issues Issues with building from source, including different choices of architecture, compilers and OS labels Feb 25, 2021
@charris
Member

charris commented Feb 25, 2021

I think good documentation is essential. That meson is small and fast is a plus. It looks like meson users currently skew strongly towards Linux, C, and C++, so there will likely be some work adding support for Fortran and perhaps compilers other than gcc. OTOH, they are a high end group of users and have likely made a good choice from among the various options.

Last time I experimented with cmake, admittedly many years ago, it was a mess and installation a hassle. I wanted nothing to do with it.

@ilayn
Member

ilayn commented Feb 25, 2021

I can report from a very recent adventure with it: the situation with CMake has not changed.

@ev-br
Member

ev-br commented Feb 25, 2021

  • Is numpy planning a similar move?
  • timing: is there going to be a window where two build systems are used/supported?
  • before moving scipy, maybe try on a smaller package with cython or c/c++. Are there examples of what this looks like?

@andyfaff
Contributor

  • would the user be interfacing with the external build system, or would the build be called from Python? For example, would scipy still be installable from source via pip in a one-liner, i.e. a pip install .? Or would the build be via a few meson/cmake calls, followed by a Python command?

@rkern
Member

rkern commented Feb 26, 2021

It looks like there is PEP517 support available such that we can tell pip (and other installers) to use meson to build via the pyproject.toml. We should certainly try.

@tylerjereddy
Contributor

FWIW, my preference here is CMake because I have a lot of experience with it--including usage of the Ninja backend, working on large C++/Fortran mixed projects with it, etc., so learning yet another build system is less appealing to me from a general time investment standpoint.

@tylerjereddy
Contributor

I've also submitted patches to CMake in the past for, e.g., handling the XL compiler toolchain with Fortran sources. In any case, it sounds like Ralf may have good reasons for his preference, though I'm less likely to be able to contribute much if we go that route.

@mckib2
Contributor

mckib2 commented Feb 26, 2021

I would love to give meson a try. I will also comment that I have gained a lot of CMake experience at my day job -- it has a lot of widespread industry support, so I don't expect it to ever go away or become unmaintained (in my lifetime at least). It may be ugly, but it certainly works.

@charris
Member

charris commented Feb 26, 2021

There is a short discussion of build software at the Fortran Wiki here. Meson looks pretty cutting edge, but growing. Python was much the same, but there may be some growing pains. Both meson and cmake have 500 results on stackexchange (what is special about 500?) @certik may have some input.

@certik

certik commented Feb 26, 2021

I've heard of Meson, but haven't used it yet. CMake certainly works, it works on Windows, has first-class support for Fortran, and it is my choice for all my projects. But I've heard that Meson might be better.

If you are worried about Fortran, I recommend you start a thread at our Fortran Discourse to discuss:

https://fortran-lang.discourse.group/

We have created a Fortran website: https://fortran-lang.org/

Finally, I would like you to also know that we are working hard on a Fortran Package Manager (fpm): https://github.com/fortran-lang/fpm. We are planning to have CMake backends, and we can also add a Meson backend. fpm will be most useful for Fortran projects. For SciPy, CMake or Meson are probably the two good choices.

@rgommers
Member Author

  • Is numpy planning a similar move?

I don't want to propose the same thing in two places, but I'd say that if this works out well for SciPy then NumPy is likely to follow. NumPy's requirements are a little easier than SciPy's build-wise, but the benefits are similar. Plus there's the bigger "no need for much or all of numpy.distutils anymore" from Python >= 3.12

  • timing: is there going to be a window where two build systems are used/supported?

As short as possible I'd say - max one release in case it turns out packagers need to do work adjusting. Ideally pull the plug as soon as things work well with Meson. That's why the "delete the main setup.py" in the plan - we will only flush out the more obscure issues if people are forced to switch over.

  • before moving scipy, maybe try on a smaller package with cython or c/c++. Are there examples of what this looks like?

I didn't find any Python packages complex enough that we can learn much from. I did give myself one day a few weekends ago to both learn Meson and try to port PyWavelets. I got pretty far and it was a pleasant experience. I didn't completely finish though, because PyWavelets has one very weird thing in its setup where just calling python setup.py build_ext -i doesn't work - you need to explicitly call build_clib first and then build_ext, for obscure reasons that I never agreed with. So I still have some undefined symbols there - but the build already works and is lightning fast (2 seconds, half of which is invoking Cython). See PyWavelets/pywt@master...rgommers:meson

It looks like there is PEP517 support available such that we can tell pip (and other installers) to use meson to build via the pyproject.toml. We should certainly try.

Yes indeed. I edited the issue description to add this explicitly. We do need to make pip install . work, and support whatever packaging standards emerge as new PEPs. However, we should not support python setup.py xxx, I'd be very glad to see that go away.

One thing to note is that there's no PEP yet that specifies how to invoke a development/in-place build. So we should do that in our own way (e.g. runtests.py works, or we can come up with something new - but python setup.py develop and pip install -e . are probably best left out).

If you are worried about Fortran ...

Thanks for the input @certik. I'm not worried about Fortran really. There are non-Python Fortran projects that use Meson. All the main compilers are supported and adding niche ones looks easy: https://github.com/mesonbuild/meson/blob/master/mesonbuild/compilers/fortran.py

@rgommers
Member Author

There are non-Python Fortran projects that use Meson. All the main compilers are supported and adding niche ones looks easy

The kind of thing I'm more worried/curious about is how to do very specific things. For example, if we have test failures in SciPy with Intel Fortran because numpy.distutils uses -O3, we either hardcode a different flag in numpy.distutils or monkeypatch it. Both of those are bad, but I don't know how this would be done with Meson.

@larsoner
Member

The only tweak to the plan I would suggest is moving 8 and 10 (CI migration halves) to before step 7 (merge into SciPy master), or make it so that step 7 is "merge into a new non-master branch in the SciPy repo" and add a step 10.5 which is to "merge non-master branch with master" or something. It seems like we should make all CIs work in the fork or branch before merging with master otherwise we'll be in a crunch to get that working in order to be able to review PRs.

@rgommers
Member Author

This would fix the issue with depending on Boost nicely by the way. Meson has built-in support, so all it takes is documenting the build dependency and doing:

boost_dep = dependency('boost')
exe = executable('myprog', 'file.cc', dependencies : boost_dep)

@jpakkane

jpakkane commented Mar 7, 2021

I'm the project lead of Meson and it's great that you are considering switching to it. Here are a couple of random tidbits related to the discussion here and questions raised.

  • There is AIX support and some people are using it, but only with GCC. No-one has submitted patches for IBM's native compiler yet.
  • Fortran support is fairly extensive and used for real world projects, though I don't have personal experience with it.
  • If new features are needed for Meson, we'd be happy to accept patches and help with the work. Meson's design philosophy has always been that things should be fixed once upstream so that everyone gets them rather than having people write tons of custom boilerplate that gets copypasted between projects.

One thing you probably need to consider is supporting old distro releases. If you need to build all of Scipy on an old RHEL, say, using only distro packages, this gets a bit tricky since Meson versions on those are fairly old. If you can use your own Meson then this is not an issue but there may be people who can't do that due to various policy issues.

@rgommers
Member Author

rgommers commented Mar 7, 2021

If new features are needed for Meson, we'd be happy to accept patches and help with the work.

Thanks @jpakkane!

Fortran support is fairly extensive and used for real world projects, though I don't have personal experience with it.

It looks pretty solid to me indeed. The most tricky bit may be dealing with how we have to glue MSVC and gfortran together, as documented in https://pav.iki.fi/blog/2017-10-08/pywingfortran.html#building-python-wheels-with-fortran-for-windows.

Meson's design philosophy has always been that things should be fixed once upstream so that everyone gets them rather than having people write tons of custom boilerplate that gets copypasted between projects.

That philosophy is quite appealing. Currently our build infrastructure is split over 5 projects with all sorts of monkeypatching going on, so the benefits of having it all in one place are pretty clear to us by now:)

One thing you probably need to consider is supporting old distro releases. If you need to build all of Scipy on an old RHEL, say, using only distro packages, this gets a bit tricky since Meson versions on those are fairly old. If you can use your own Meson then this is not an issue but there may be people who can't do that due to various policy issues.

That shouldn't be much of a problem. Linux packagers seem fine with using recent build dependencies - we regularly require very recent Cython versions for example. As long as a build dependency is easy to install (which is the case for Meson), there's no problem. Runtime dependencies are another story, there we have to be much more conservative.

@thisisshub

I analyzed the repository and identified the submodules that are easiest to port first, based on their independence from other modules. This data might be useful to everyone involved in the project.

Link to the analysis: here

@isuruf
Contributor

isuruf commented Mar 25, 2021

At conda-forge, we frequently run into issues with Meson updates, e.g. for PyPy: mesonbuild/meson#8570, and for cross-compiling: conda-forge/meson-feedstock#47. Maybe that's expected because of frequent Meson updates, which is a good thing compared to the feature-frozen distutils.

@rgommers
Member Author

Maybe that's expected because of frequent meson updates which is a good thing compared to feature freeze distutils.

It probably is (in the short term at least). I think right now there are very few Python packages that use Meson, so it's not surprising to me that there are scenarios like PyPy on macOS that aren't tested much and are less stable.

My view on it is that this will improve once more scientific Python packages adopt it (and if SciPy is successful, I think they will), and that at least Meson has a sane design. Last Sunday I spent a couple of hours fixing a numpy.distutils bug which prevented building on Arch Linux with a basic conda-forge dev env, because numpy.distutils included /usr/include and many other paths as default include paths. Meson doesn't have such basic design flaws.

@rgommers
Member Author

I analyzed the repository and have found out the submodules which are easiest to port first, based upon their independence from other modules.

Thanks, looks good @thisisshub. Some thoughts on the green modules:

  • constants is easiest, no compiled code
  • fft is C++ only, with pybind11 which should work out of the box with Meson
  • ndimage is C + Cython
  • odr is Fortran

That gives a decent coverage of languages. Then linalg would be the big one - that will require Meson improvements to BLAS and LAPACK detection. The dependencies for linalg in your sheet are either in tests and aren't too important for testing, or they're incorrect (you detected a few imports that are only in docstring examples).

@thisisshub

thisisshub commented Mar 26, 2021

@rgommers,
I'll start with constants since it's the easiest one and work my way up. I'll correct the spreadsheet along the way.

@rgommers
Member Author

rgommers commented May 19, 2021

We had a short discussion about the Meson topic in the community meeting. The main questions were:

  1. Does this move to Meson fit with what is going on in the wider Python ecosystem, or is everyone else just staying with setuptools and will we be the odd one out?
  2. What's the status of making a Meson-based build work?
  3. Can we start a long-term branch in this repo to work on it collaboratively?

Where Python packaging is going

distutils & co (setuptools, numpy.distutils) are still used by the vast majority of packages. However, projects with very complex builds have moved away from them or never used them in the first place (e.g., cuDF/RAPIDS uses scikit-build, PyTorch uses CMake with a custom setup.py that invokes it, TensorFlow uses Bazel).

What has been changing in the Python packaging ecosystem itself is a move away from "assume a package uses setuptools" to a standards-based approach with hooks, so any package installer (like pip) can work with any build backend. Some of the most important PEPs and projects:

The tl;dr of all those things is: you can specify all your project metadata and build-time and runtime dependencies in pyproject.toml and you can add a hook there that calls a Python function in a package you specify. When you do pip install ., pip finds the hook and invokes it - no setup.py needed.
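As a rough illustration of the hook mechanism: per PEP 517, the build-backend key of the [build-system] table names a Python object in module:object form, and the frontend imports it and calls hooks such as build_wheel and build_sdist on it (typically after installing everything listed in requires into an isolated environment). The sketch below shows only the resolution step; load_build_backend is a hypothetical name, and the build_system dict stands in for the parsed pyproject.toml table. For meson-python, my understanding is that the table contains requires = ["meson-python"] and build-backend = "mesonpy".

```python
import importlib


def load_build_backend(build_system):
    """Resolve the object named by build-backend, using PEP 517's module:object syntax."""
    spec = build_system["build-backend"]
    module_name, _, object_path = spec.partition(":")
    backend = importlib.import_module(module_name)
    # An optional ":obj.attr" part selects an object inside the module.
    if object_path:
        for attr in object_path.split("."):
            backend = getattr(backend, attr)
    return backend


# Hypothetical frontend usage (not executed here):
#   backend = load_build_backend({"requires": ["meson-python"], "build-backend": "mesonpy"})
#   wheel_basename = backend.build_wheel("dist/")
```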

There are also a number of small projects whose purpose is to be a standalone tool fulfilling one aspect of the build/package/install lifecycle, adhering to the PEPs linked above and not requiring setuptools: build, installer, packaging.

Status of building SciPy with Meson

I have a branch named meson in my fork which makes a start: it builds a couple of modules, which use C, C++, Cython and Fortran code. Look at the meson.build files there to get a sense of what things look like.

To actually use that branch:

The next steps / main blockers are:

  1. We need to implement good BLAS/LAPACK support for Meson, so one can use that as a dependency. I've tried with conda + OpenBLAS, that seems to be straightforward via pkg-config. Complete support, e.g. Windows, MKL, is a fair bit of work. See Feature: MKL dependency mesonbuild/meson#2835
  2. Cython support for Meson needs to be completed.
  3. For codegen targets in combination with Cython, it looks like we need Structured inputs for Rust (and more languages in the future, probably) mesonbuild/meson#8775.
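For (1), the conda + OpenBLAS route works because pkg-config reports the compile and link flags for the library. Meson's dependency() handles this itself, so nobody would write this by hand for a meson.build-based build; the Python sketch below only shows what that route amounts to. parse_pkg_config_flags and query_pkg_config are hypothetical names, the flag parsing is deliberately simplistic, and whether the package is called openblas in pkg-config depends on how OpenBLAS was installed.

```python
import shlex
import shutil
import subprocess


def parse_pkg_config_flags(flags):
    """Split pkg-config --cflags --libs output into include dirs, lib dirs, libs and the rest."""
    result = {"include_dirs": [], "library_dirs": [], "libraries": [], "extra": []}
    for token in shlex.split(flags):
        if token.startswith("-I"):
            result["include_dirs"].append(token[2:])
        elif token.startswith("-L"):
            result["library_dirs"].append(token[2:])
        elif token.startswith("-l"):
            result["libraries"].append(token[2:])
        else:
            result["extra"].append(token)
    return result


def query_pkg_config(package):
    """Return parsed flags for package, or None if pkg-config or the package is unavailable."""
    exe = shutil.which("pkg-config")
    if exe is None:
        return None
    proc = subprocess.run([exe, "--cflags", "--libs", package],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        return None
    return parse_pkg_config_flags(proc.stdout)
```

The hard part of full BLAS/LAPACK support is everything this doesn't cover: MKL, Windows, and installs that ship no .pc file at all.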

We can work around both (2) and (3) I believe, by extensive use of custom_target(), but it'll be super verbose and clunky, so I don't really want to do that.

(2) and (3) are both being implemented by @dcbaker, so a big thank you to him.

We should probably use our own fork of Meson for a bit that is like an integration branch of upstream for 1-3. And then once we're completely happy, contribute the BLAS/LAPACK detection upstream.

Can we start a long-term branch in this repo?

Right now the strategy I used in my fork is:

  • add 17 WIP commits that remove each of the SciPy submodules in one go, one commit per submodule
  • when working on adding Meson support to a new submodule, drop the relevant commit via an interactive rebase
  • add the required meson.build, ensure things build cleanly and tests pass
  • the build log is very clean, so the occasional build warnings can be more easily fixed in SciPy itself (try for: everything except warnings from vendored Fortran code and the deprecated NumPy API warnings)

This rebase strategy obviously doesn't work well when multiple people are collaborating on the same branch, if it's in the main repo (with a few people it does work on a fork). Also, CI runs on the main repo are not all that useful. So my preference would be to push it a little further in my own fork; once we have BLAS/LAPACK working we can add support for more submodules quickly. And then also make CI work on the fork first, so we can deal with Windows, building wheels, etc. Using PRs to my repo should work for that - I'd love some help though.

Once we have most submodules done and some working CI, then I think it's time to move to this repo.

The one thing we should probably do now to make life easier is to create a scipy/meson repo with an integration branch. Making people install code that lives in a WIP PR to Meson isn't ideal.

@rgommers
Member Author

rgommers commented Oct 1, 2021

The 1.8.x branch is still 1.5 months away I think. I'd prefer to not wait that long. We can remove the CI jobs and docs from the 1.8.x branch after it's created - at that point there's really nothing else than a set of meson.build files left (we could even remove those if desired), so there's nothing that can go wrong in the 1.8.0 release.

I don't love the idea of maintaining two separate build systems

Me neither, however the plan was always to do that for one release cycle, since I'd be uncomfortable getting rid of setup.py as a fallback straight away (there's always AIX and niche architectures in Debian that we can't test and may be problematic). So I don't think I'm proposing something new here.

If it turns out to be a regular occurrence that CI doesn't pass (which I don't expect, but you never know) I'm also fine with a rule that it can be ignored - at least there's some signal then that a fix is needed. It's pretty tricky right now to keep catching up with large changes like the recent UNU.RAN and PROPACK PRs.


I think this may finally be very easy to implement in meson, see my latest comment there.

That is great news, exactly what we needed for that - thanks @eli-schwartz!

@rgommers
Member Author

rgommers commented Oct 1, 2021

Also let me add: I plan to use a stable Meson install from either https://github.com/rgommers/meson/tree/scipy or a fixed commit in the Meson repo; and will run Meson master in CI on my own fork. CI on my fork has been quite stable for ~3 months now.

@andyfaff
Contributor

@rgommers, do you have an update on how the Meson build system is coming along?

@rgommers
Copy link
Member Author

The Windows build works as well now, as do wheel builds. It's been pretty much ready to merge into master for a month or two. I just need to find the time to update the PR and iron out the last little issues with CI and with changes to codegen scripts that work across Meson and distutils builds. I plan to do this around Christmas (no meetings for ~2 weeks will be really helpful).

@rgommers rgommers added the Meson Items related to the introduction of Meson as the new build system for SciPy label Jan 3, 2022
@mattip
Contributor

mattip commented Mar 7, 2022

FWIW I tried this out on PyPy + windows, and it failed. I opened mesonbuild/meson#10080. Not a blocker, but more of a heads-up.

@rgommers
Member Author

We have now switched to using Meson by default by using the meson-python build hook in pyproject.toml (see gh-16187). So it's time to close this issue. I will open a new one with follow-up tasks.

Thanks everyone who pitched in!
