Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CMake for Build System #47380

Closed
wants to merge 75 commits into from
Closed

Use CMake for Build System #47380

wants to merge 75 commits into from

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Jun 16, 2022

Proof of concept. I think we can greatly simplify our building system using this. @mroeschke @lithomas1 @Dr-Irv

This is in an intermediate state, but you can do:

python setup.py
cmake .
make -j$(nproc)

make clean will clean up build artifacts, which right now excludes the generated Cython files but should include them in the future.

Have the following TODOs:
[ ]: The CMAKE_SHARED_LIBRARY_SUFFIX is hard-coded to assume Py38 on linux
[ ]: Get rid of python setup.py and have CMake perform cythonization
[ ]: Update CI, documentation, etc...

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jun 16, 2022

I tried this on Windows. Doing python setup.py caused the cythonize step to be done, but it was done serially. Can't that be done in parallel as when we do python setup.py build_ext -j 8 ?

Right now, cmake is not working, but I'm not sure I have the latest build tools installed.

@WillAyd
Copy link
Member Author

WillAyd commented Jun 16, 2022

Yea it could be. I didn't put a lot of time into that yet but if you want to explore would be great. I ideally want CMake to build the Cython extensions. That way running make clean should clean up those generated files, while also would decouple a hard dependency on setuptools.

This is hard coded right now for Linux. For windows / your python version, you likely need to change like 8 of the top level CMake file to match the expected file ending for your platform:

set(CMAKE_SHARED_LIBRARY_SUFFIX .cpython-38-x86_64-linux-gnu.so)

This might be abstracted in some form via the CMake Python variables. If not something we can branch on in our file and upstream:

https://cmake.org/cmake/help/latest/module/FindPython.html#result-variables

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jun 16, 2022

Well, I'm not sure I have the time nor understanding to get this working on Windows!

@jbrockmendel
Copy link
Member

Well, I'm not sure I have the time nor understanding to get this working on Windows!

Yah, i expect this would be easier to get working on Mac, but it still involves a learning curve. I'm fine embracing that for personal edification, curious how others feel

@WillAyd
Copy link
Member Author

WillAyd commented Jun 16, 2022

Well, I'm not sure I have the time nor understanding to get this working on Windows!

Ah sorry...didn't think about make not working on Windows. I think you can do

cmake .
cmake --build .

On all systems to get this to work for an "in source" build

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jun 16, 2022

So now that I'm on Visual Studio 2019 build tools, there is an issue with the cmake version when doing cmake . .

CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.21 or higher is required.  You are running version
  3.15.19080502-MSVC_2

The version of cmake used here is the one from VS 2019.

Secondly, python setup.py clean doesn't work, and it should to make sure I'm doing a build from scratch.

@WillAyd
Copy link
Member Author

WillAyd commented Jun 16, 2022

I think we can drop the minimum CMake version to 3.14 which would help. I just pulled the 3.22 off their documentation, but the NumPy finder was added in 3.14

For cleaning, you could ideally do cmake --clean . That will automatically get rid of files generated via build. In its current form that excludes Cython generated files, but in a future iteration it should do all of it

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jun 16, 2022

I made the following changes to CMakeLists.txt, and got a lot further on Windows:

cmake_minimum_required(VERSION 3.15)

set(CMAKE_SHARED_LIBRARY_SUFFIX .cp38-win_amd64.pyd)

Then I discovered that you have to specify the architecture to cmake. So I did cmake -A x64 --clean . I then got more adventurous and did a parallel build via cmake --build . --config Release --parallel 8 , discovering the parameters via some google searching. The build succeeded, using all my cores, BUT the pyd files ended up in subdirectories called Release of pandas/_libs/ and pandas/_libs/tslibs/ etc.

If I don't include --config Release it looks for a debugging version of the python library and fails.

Don't know ANYTHING about cmake to figure out how to get the files out of those Release subdirectories into the right place.

Also not sure that the --clean is removing previous files correctly (e.g., .obj files).

@WillAyd
Copy link
Member Author

WillAyd commented Jun 16, 2022

Awesome! I think you can tweak those settings via some CMAKE variables:

https://stackoverflow.com/questions/45368513/how-to-prevent-cmake-from-creating-debug-release-folders-under-executable-output

@WillAyd
Copy link
Member Author

WillAyd commented Jun 16, 2022

FYI the idiomatic approach to CMake is to do an "out of source" build, where we would have a dedicated build sub folder and all required elements would be copied into that. In such a case, I don't think the Release / Debug sub folders generated by VS would matter much.

However, we today do in source builds of pandas. So want to make sure we emulate that as best as possible for initial lift and shift

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jun 16, 2022

However, we today do in source builds of pandas. So want to make sure we emulate that as best as possible for initial lift and shift

Once you figure that out for Linux, I hope it will be straightforward to do for Windows. I'll wait to hear about your progress.

@WillAyd
Copy link
Member Author

WillAyd commented Jun 16, 2022

Sounds good. Thanks for the feedback so far - very helpful

@mroeschke
Copy link
Member

I'd be supportive of trying CMake. Generally hoping

  1. Not too much deviations across platforms Linux/MacOS/Windows
  2. Near identical setup & invocation for local development & CI system

@rgommers
Copy link
Contributor

cmake is used by Arrow and scikit-learn,

Not the latter - scikit-learn uses setuptools, and is most likely to move to Meson (work is planned to start in the coming month). scikit-image and others scikits I can think off all still use either setuptools or numpy.distutils

@WillAyd
Copy link
Member Author

WillAyd commented Jun 22, 2022

Ah thanks for clarifying. I thought the scikit-build package was associated with scikit-learn

@jbrockmendel
Copy link
Member

I think we can greatly simplify our building system using this

What's the criterion for judging this? Is it just LOC?

I think there's an implicit "avoiding setuptools is desirable" consensus that I'm out of the loop on.

@eli-schwartz
Copy link
Contributor

eli-schwartz commented Jun 22, 2022

I think there's an implicit "avoiding setuptools is desirable" consensus that I'm out of the loop on.

Setuptools invokes the compiler in an ad-hoc fashion, it's difficult to parallelize and very difficult to cross-compile. Those are among the bigger reasons SciPy switched from distutils to Meson. The same argument would generally apply to switching from distutils/setuptools to cmake, disregarding personal preferences between Meson and cmake.

@WillAyd
Copy link
Member Author

WillAyd commented Jun 22, 2022

To help clarify some more I think this is also worth drafting up via the new PDEP process @datapythonista is working on. This PR is part research, discussion and implementation at this point, so probably a good fit for that new process

@lithomas1
Copy link
Member

lithomas1 commented Jun 23, 2022

To help clarify some more I think this is also worth drafting up via the new PDEP process @datapythonista is working on. This PR is part research, discussion and implementation at this point, so probably a good fit for that new process

Yeah, I think it is worth pursing a PDEP. There's a lot of insight in the comments of this PR that I don't want to get lost, and are helpful when someone wants to revisit the build system or figure out rationale behind certain decisions. GitHub tends to handle PRs with lots of comments/activity super horribly, and kind of randomly collapses comments that could be important in the middle.

I think it is also worth pursuing a parallel PR using meson as the build backend. I am willing to do that if no one beats me to it.

EDIT: Will start tomorrow.

@rgommers
Copy link
Contributor

I think there's an implicit "avoiding setuptools is desirable" consensus that I'm out of the loop on.

@jbrockmendel for a bit more context:

  • There is indeed a consensus of people working on build stuff for scientific packages (and beyond) that distutils and everything that is related (setuptools, numpy.distutils, etc.) is very badly designed and basically unfixable. Tools like Cython and Pythran just all monkeypatch the internals to add the functionality they need, nothing is extensible. The setuptools devs are not going to fix this - and realistically, they can't.
  • There have been multiple attempts before at moving away from setuptools, for example NumPy and SciPy had two working alternative build systems based on Scons and then Waf. The problem was always that tools like Pip assumed setuptools, and only more recently it's become possible to really move away via PEP 517, PEP 518, PEP 621 & co.
  • The planned removal of distutils from Python 3.12 set the timer for a bunch of stuff; numpy.distutils is not compatible with the vendored setuptools.distutils, so we are in a hurry to move - about a year left (max, much faster would be better to not block other packages working with 3.12-dev). That said, Pandas is not in that position, because it works fine with latest setuptools, so you can do things at your leisure.

For new projects, the only situation where you'd still want to prefer setuptools is for packages that do contain a few C/Cython extensions, but need almost nothing from a build system (cross-compiling, good performance, etc.). For pure Python packages there are much better options like Flit, or perhaps Poetry/Hatch/PDM (there's a long list, because it's now fairly easy to write such a thing for pure Python code). And if you do have serious needs from a build system - which includes Pandas and all packages at the center of the PyData ecosystem I'd say - then I'd say you want either Meson or CMake.

@lithomas1
Copy link
Member

Hello everyone, I haven't really commented much the progress on using Meson as pandas' build system, but I now can compile basically all the c extensions. I'm still working on enabling some more extensions and cleaning up a lot of the boilerplate code, but I will try opening a draft PR next week.

The next step would be to build wheels/figure out how to install locally. I'm still working through some gnarly issues in meson-python for that.

All in all, we should probably start thinking of drafting up the PDEP sometime soon.

@simonjayhawkins
Copy link
Member

The next step would be to build wheels

can you confirm we would still be migrating to cibuildwheel #44027 to build wheels? #44027 (comment)

@lithomas1
Copy link
Member

lithomas1 commented Jul 25, 2022

The next step would be to build wheels

can you confirm we would still be migrating to cibuildwheel #44027 to build wheels? #44027 (comment)

Both meson and this Cmake implementation use PEP 517 build hooks, so cibuildwheel would build the wheels the same way(through build).

@lithomas1
Copy link
Member

My WIP meson work can be found here lithomas1#19.

@WillAyd
Copy link
Member Author

WillAyd commented Aug 4, 2022

@Dr-Irv any insight on the typing failure in CI? Does this stem from pandas-stubs having an entry for pandas._libs.json? Not sure how those should be synced

Otherwise I think this PR is in a usable state. Started on PDEP but lost a little motivation, and this was pretty close so decided to finish this instead

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Aug 4, 2022

@Dr-Irv any insight on the typing failure in CI? Does this stem from pandas-stubs having an entry for pandas._libs.json? Not sure how those should be synced

I'm hazarding a guess here, but I think that you didn't include any of the .pyi files in the build. This is not from the pandas-stubs project. Rather, in pandas/_libs (and underneath that), there are multiple .pyi files that should be included in the distribution.

@WillAyd
Copy link
Member Author

WillAyd commented Aug 4, 2022

Thanks @Dr-Irv makes sense. Was overthinking that one

This is all green now if its something we'd still entertain (the Docker failure can't pass until this is merged into main). There's a lot more we can do to improve our build system (moving to out of source builds comes to mind first and foremost) but I think this represents a reasonable lift and shift

@lithomas1
Copy link
Member

Hey @WillAyd, I'm happy to help you draft up the PDEP if you need it.
It'd probably be best to start off on something like a Google doc, as I expect there will be quite a few revisions needed before its ready for review as a formal PDEP.
I just discovered the pandas-core mailing list, so you can find my email through there if you need me.

@WillAyd
Copy link
Member Author

WillAyd commented Aug 6, 2022

OK sure just created #47988 for that

@WillAyd
Copy link
Member Author

WillAyd commented Aug 22, 2022

After discussion seems like Meson is the way to go. Let's close this one

@WillAyd WillAyd closed this Aug 22, 2022
@WillAyd WillAyd mentioned this pull request Nov 9, 2022
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants