[ci] fix CUDA 11.8 builds (fixes #6466) #6465
So, right before building a …
So there's only one difference... on the …
Seeing an error about …
I see this in the logs of the …
added a …
So something is getting left behind, probably a result of the changes from #6458. Here's how the container is being run:
Going to try cleaning this up at the beginning of builds:
I tried changing the version for … I know that because even after changing the version number to …
full `pip freeze`
Alright, I think this is ready for review! Whenever this is merged, CI will be working again.
Nice, thanks for the extensive description/debugging log!
Thanks for the quick review!
Fixes #6466
The CUDA 11.8 `wheel` (gcc) CI job is failing like this:

This fixes that by switching all `pip install --user` calls in CI jobs to simply `pip install`.

Notes for Reviewers
In short, the problem is a combination of the following:

- `pip install --user` in CI jobs is installing `lightgbm` into a location like `/github/home/.local/lib/python3.10/site-packages`
- `/github/home` is a volume mount from the self-hosted runner we use (#6465 (comment))
- `pip install` in subsequent `wheel` CI jobs is refusing to install the wheel newly built in that CI run, and is instead using one from a different job (which might use a different CUDA version or compiler)

This PR fixes that by switching all uses of `pip install --user` to simply `pip install` in LightGBM's CI jobs. That results in `pip` installing `lightgbm` into `${CONDA}/envs/test-env/lib/python${ver}/site-packages`. On CUDA jobs, `${CONDA}` is at `/tmp/miniforge`, a location that doesn't have a mount back to the host... so nothing is left over from build to build 😁

How do you know `lightgbm` was being installed in `/github/home`?

You can see the absolute path from which `lightgbm` is being loaded in the test logs, like this: (example failed build)
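As a complement to reading the test logs, here is a minimal Python sketch for checking, from inside a job, whether a package would be imported from the per-user directory that `pip install --user` writes to (rather than from the conda environment). It is not part of this PR. The helper names (`user_site_dir`, `import_location`, `loaded_from_user_site`) are hypothetical, and it assumes only the standard library calls `site.getusersitepackages()` and `importlib.util.find_spec()`.

```python
# Hypothetical helpers (not part of this PR): report whether a package would be
# imported from the per-user site directory that `pip install --user` targets.
import importlib.util
import os
import site
from typing import Optional


def user_site_dir() -> str:
    """Per-user site-packages directory for this interpreter (the --user target)."""
    return site.getusersitepackages()


def import_location(module_name: str) -> Optional[str]:
    """Absolute path `import module_name` would load from, or None if not found.

    find_spec() locates a top-level module without executing its code
    (for a dotted name, parent packages are still imported).
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return os.path.realpath(spec.origin)


def loaded_from_user_site(module_name: str) -> bool:
    """True if module_name resolves to a file under the user site directory."""
    location = import_location(module_name)
    if location is None:
        return False
    user_dir = os.path.realpath(user_site_dir())
    try:
        return os.path.commonpath([location, user_dir]) == user_dir
    except ValueError:
        # e.g. the two paths are on different drives on Windows
        return False


if __name__ == "__main__":
    print("user site dir:", user_site_dir())
    print("lightgbm found at:", import_location("lightgbm"))
    print("loaded from user site:", loaded_from_user_site("lightgbm"))
```

Assuming CPython's default `site` behavior outside a virtual environment (a conda env is not a venv), the user site directory is added to `sys.path` ahead of the environment's own `site-packages`, so a stale copy left under a mounted home directory can shadow a fresh install. That is the situation a check like this is meant to surface.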