-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pyproject.toml with build system dependencies for PEP-517 installs #3758
Conversation
This should fix missing modules when installed with `pip install --use-pep517 horovod`
Now this won't work, because the build defaults to using PEP-517, and it also misses the |
I think the build fails because it tries to build the wheel using PEP-517 (because it finds the pyproject.toml) with I think the solution is to define a custom wrapper for the setuptools build system (see this in the setuptools docs), which can specify additional dynamic build-dependencies by providing a Unfortunately I was unable to set up the build locally to reproduce the error in CI, and I don't have more time to work on this for a while. |
Yes, some dynamic identification of build dependencies would be necessary. I did some research in #3483 (comment) and follow up comments and then it looked like it wouldn't work well to have differing build requirements depending on extras, but of course there may be a way. If none of the frameworks TensorFlow, PyTorch, or MXNet are available at build time, Horovod is installed in a minimal configuration without its C++ core and cannot be used for distributed training. I suspect that's what you built locally.
|
Based on this POC the dynamic build-dependencies might work this way, but there are a few problems:
I failed to set up the local build for the project and this makes it hard to test anything. I also feel inadequate to write the correct dynamic dependency list code, because I'm mostly clueless about the project's build process. |
Thanks for giving this another shot, @itsdani! The ppc64le build in the CI (which runs without explicit approval) is a bit particular in how the environment is set up, see https://github.com/horovod/horovod/blob/master/Jenkinsfile.ppc64le But I think this highlights that it's actually pretty important to use precisely the tensorflow (and/or pytorch, mxnet) package that's installed in the environment which Horovod is added to: Horovod needs to be compiled accordingly. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I've just rerun into this issue; locally, I've managed to get it working by changing some of the environment variables. Then, you can use this in the build backend to correctly build with the necessary libs. This is less nice than pulling from the lock files but enables PEP517, which enforces this isolation. I'm curious if there is a way to pull this from a lock file due to the many varieties; we'd be solving a problem excessively. Another potential way would be changing the build_ext from on install to on the first run, i.e. check the extension is available; if not, build it. This should only impact PyTorch and TF, as MxNet is at the end of its life. |
I think you just built it without tensorflow, horovod/horovod/tensorflow/CMakeLists.txt Lines 7 to 12 in 7e4d993
The solution to this issue would be declaring build dependencies explicitly instead of using them from the existing environment (a.k.a. isolation). The problem with that is that these dependencies might not be trivial to build, and requirements are currently governed by env-vars like |
Locally, I've altered the CMake to ensure it matches a version string. I don't think extras would work; in my opinion, they are only for Python optional extras intended to say if you want this functionality, install this lib in isolation. Not for defining build extras. Since we already require people to say explicitly they want tensorflow or pytorch links, I don't think it's much more to ask them to define the versions instead of a binary flag. I'll open my MR soon so you can see the changes. |
Ah, I see, extras might be only run-time dependencies and we need build-time deps 😞 I'm still unsure about this "versions in the env-var" thing, I feel like this still contradicts having an isolated build. Although I have to admit I'm interested in your solution 🤞 |
The PR: #3991 Having tried many ways, this was the only pragmatic approach I could make work. It still fulfils an isolated build but tells the build venv which versions of the necessary DL libraries to install, ensuring we don't get linking issues when the finished build is moved from the isolated build venv into its end site-packages location (i.e. the none isolated poetry venv site-packages). |
Ah very nice, I understand now, the custom build backend reports the required dependencies based on the env-var values, and then it can be used in the isolated environment. I think I missed this part in your original comment. Maybe we should move the conversation back to the issue instead of this closed PR-draft, if you want feedback from the others 😅 |
This should fix missing modules errors when installed with
pip install --use-pep517 horovod
Description
Fixes #3697.