Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] Skip Arrow tests on AppVeyor, use Intel macOS runners, upgrade to XCode 14.3 on macOS jobs, disable MacOS MPI jobs #6425

Merged
merged 19 commits into from
Apr 29, 2024

Conversation

borchero
Copy link
Collaborator

@borchero borchero commented Apr 24, 2024

Motivation

Fix failing CI on main.

This PR does two things:

  • Skip Arrow tests in AppVeyor builds using an old Visual Studio version. The idea that pyarrow only works with newer versions of Visual Studio comes from this comment in the pyarrow feedstock:

    You'll need at least VS2019 to successfully link the DLLs we're producing here

  • Bump the version of Xcode used in MacOS GitHub runners (macOS-latest seems to have been updated recently).

@borchero borchero self-assigned this Apr 24, 2024
@borchero borchero marked this pull request as ready for review April 24, 2024 00:54
@borchero
Copy link
Collaborator Author

borchero commented Apr 24, 2024

This is still WIP, I just want to get CI running.

@borchero
Copy link
Collaborator Author

At least for compiler MSVC, using Visual Studio 2019 would fix the issue. I'm also trying an older pyarrow build for completeness.

@jameslamb
Copy link
Collaborator

jameslamb commented Apr 24, 2024

Thanks for investigating this! In case it helps, I searched this error message:

  File "C:\Miniconda3-x64\envs\test-env\Scripts\pytest-script.py", line 9 in <module>
Windows fatal exception: code 0xc0000139

And found this relevant-looking discussion that's also about pytest + math packages + Windows: numba/numba#8223.


re: some of the comments you got on conda-forge/arrow-cpp-feedstock#1374 ...

here's the main script that runs for Appveyor jobs in this repo: https://github.com/microsoft/LightGBM/blob/master/.ci/test_r_package_windows.ps1.

(edit: wrong link, use this one) https://github.com/microsoft/LightGBM/blob/master/.ci/test_windows.ps1

You can add any commands you want (like conda info or conda env list) and push a commit here to get a new build and new logs.

You should also be able to log in with your GitHub account at https://ci.appveyor.com/project/guolinke/lightgbm/history and cancel / restart any builds. If that doesn't work for you, let me know and we can get you that access.

@borchero
Copy link
Collaborator Author

All right, two observations:

@jameslamb
Copy link
Collaborator

Thank you! Could you try keeping visual studio 2015 and just skipping the arrow tests on appveyor, to see if that helps?

Testing on visual studio 2015 is a main motivation for having appveyor here. We get coverage of newer versions on GitHub Actions and Azure DevOps windows jobs.

@jameslamb jameslamb changed the title [ci] Use Visual Studio 2019 for AppVeyor builds [ci] Skip Arrow tests on AppVeyor, upgrade to XCode 14.3 on macOS jobs Apr 24, 2024
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you SO MUCH for working through these issues! I really really appreciate it.

It looks to me like this is working great... I see the arrow tests running where we expect them to (e.g. on Azure DevOps) and being skipped on AppVeyor (build link).

In the future when you work through these issues that are blocking all CI, feel free to kill any CI jobs that are blocking you. For example, #6426 came in while you were working on this, and I suspect you might have waited the 30-ish minutes for its AppVeyor jobs to run. Just wanted to say that it's totally fine to do that. My approach to that is like "those other PRs aren't going to be mergeable until I fix CI anyway, so they'll have to re-run anyway".

tests/python_package_test/test_arrow.py Outdated Show resolved Hide resolved
.ci/setup.sh Outdated Show resolved Hide resolved
@borchero
Copy link
Collaborator Author

Since CI still doesn't work: I noticed that, very recently, macos-latest runners were upgraded from MacOS 12 to MacOS 14 and now use Apple Silicon M1 chips (i.e. an arm architecture), see docs. This means, however, that many tests won't work on macos-latest (e.g. Python 3.7 is already EOL and not available for Apple Silicon on conda-forge). I am downgrading to macos-13 now and we can go back to macos-latest in the future.

@jameslamb
Copy link
Collaborator

Thanks for the investigation! 100% agree with changest all the macos-latest to the latest intel version.

GitHub not having arm64 mac support for YEARS and then giving us just 3 months before making it the default is.... sigh 😫

@borchero
Copy link
Collaborator Author

Unfortunately, cmake fails to find the brew-installed OpenMPI package now although the dylib is located on the exact same path as before. Do you have an idea how to fix this @jameslamb?

@borchero borchero changed the title [ci] Skip Arrow tests on AppVeyor, upgrade to XCode 14.3 on macOS jobs [ci] Skip Arrow tests on AppVeyor, use Intel macOS runners, upgrade to XCode 14.3 on macOS jobs Apr 25, 2024
@jameslamb
Copy link
Collaborator

I see this from brew install open-mpi.

==> Pouring open-mpi--5.0.3.ventura.bottle.tar.gz
🍺  /usr/local/Cellar/open-mpi/5.0.3: 2,615 files, 74.4MB

And then this later on

-- Could NOT find MPI_C (missing: MPI_C_WORKS) 
CMake Error at /usr/local/Cellar/cmake/3.29.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS) 
  Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND)
-- Configuring incomplete, errors occurred!
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.29.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/Cellar/cmake/3.29.2/share/cmake/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  CMakeLists.txt:135 (find_package)

(build link)

LightGBM is relying on the FindMPI script that ships with CMake. Here's the content of that script in case it helps: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/FindMPI.cmake.

I see you're trying the CMAKE_PREFIX fix mentioned at https://stackoverflow.com/a/71115879/3986677 ... at a minimum, I think that path needs to be into /usr/local/Cellar/open-mpi/5.0.3. But first, I recommend trying removing that CMAKE_PREFIX stuff and running this:

brew link open-mpi

ref: https://github.com/Homebrew/brew/blob/master/docs/FAQ.md#what-does-keg-only-mean

Sorry, will have more time to help later.

@jameslamb
Copy link
Collaborator

ugh that still didn't work.

I vote we just remove the MPI jobs on macOS for now. Delete the job definitions from the GitHub Actions config and document the need to get them working again in a new issue, linking back to this PR.

MPI jobs are working on Linux (build link), that's enough to cover most of the relevant code in the library.

Let's not let these MPI macOS tests block CI for the whole project.

@jameslamb
Copy link
Collaborator

@shiyu1994 can you please fix the VM where the CUDA tests run?

All CUDA CI jobs have been failing for the last 2 days. They hit something like this:

Failed to restart docker.service: Connection timed out
See system logs and 'systemctl status docker.service' for details.
Error: Process completed with exit code 1.

(build link)

Then hang until GitHub Actions kills them for hitting the 60 minute timeout.

@borchero borchero changed the title [ci] Skip Arrow tests on AppVeyor, use Intel macOS runners, upgrade to XCode 14.3 on macOS jobs [ci] Skip Arrow tests on AppVeyor, use Intel macOS runners, upgrade to XCode 14.3 on macOS jobs, disable MacOS MPI jobs Apr 25, 2024
@jameslamb
Copy link
Collaborator

Thanks to @shiyu1994 for fixing the CUDA CI!

I've retriggered it here. I'll merge this once it passes. After that, let's get #6274 updated and merged.

Then let's do a release 😁 . I'll put up a release PR in the next day or 2.

@@ -17,7 +17,9 @@ if [[ $OS_NAME == "macos" ]]; then
sudo xcode-select -s /Applications/Xcode_11.7.app/Contents/Developer || exit 1
fi
else # gcc
sudo xcode-select -s /Applications/Xcode_14.1.app/Contents/Developer || exit 1
# Check https://github.com/actions/runner-images/tree/main/images/macos for available
# versions of Xcode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to say earlier... thanks very much for adding this comment on an otherwise kind of magical hard-coded string here 😁

@jameslamb jameslamb merged commit 92a8741 into master Apr 29, 2024
38 checks passed
@jameslamb jameslamb deleted the app-veyor-vs-2019 branch April 29, 2024 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants