Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Python CMake #14565

Merged
merged 36 commits into from
Dec 11, 2023
Merged

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Dec 4, 2023

Description

This PR makes a number of changes to streamline the CMake code in the Python build. There are additional potential improvements that may be possible but I'd prefer to wait on until after the scikit-build-core migration because I suspect that there may be subtle differences (particularly around the install rules). The major improvements here are:

  • nvcomp installation no longer needs as much special casing thanks to Add missing nvcomp targets rapids-cmake#496 adding missing targets
  • get_arrow.cmake now determines how to find arrow by checking for 1) the presence of Python, 2) the presence of pyarrow, and 3) the presence of libarrow.so.* inside the pyarrow directory. This approach works for both pip and conda packages as well as pure C++ builds and removes the need for manual determination of where to look by the builder. The assumption baked in is that the developer has installed the desired pyarrow package when building.
  • The CUDF_BUILD_WHEELS variable is now removed. All Python builds that don't find the C++ library all go down the same path now. This is particularly relevant for doing something like a local pip install. This is the desired behavior because if you're building the Python package in that kind of environment you always want the same behaviors. The different case is when the Python build finds the C++ build (e.g. in a conda environment where libcudf is a separate package).
  • The logic for linking to pyarrow headers is now centralized in a single function since it needs to be used by every Cython target.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added CMake CMake build issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 4, 2023
@vyasr vyasr self-assigned this Dec 4, 2023
@vyasr vyasr requested review from a team as code owners December 4, 2023 21:38
@github-actions github-actions bot added Python Affects Python cuDF API. ci labels Dec 4, 2023
Comment on lines +81 to +87
# TODO: This install is currently overzealous. We should only install the libraries that are
# downloaded by CPM during the build, not libraries that were found on the system. However, in
# practice right this would only be a problem is if libcudf was not found but some of the
# dependencies were, and we have no real use cases where that happens.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to try and fix this right now. I would like to manage the scikit-build-core transition first, then revisit the best way for us to handle install rules. Some anecdotal experiments suggest that the configure/build paths might be a bit different in scikit-build-core, so I don't want to do redundant work.

@vyasr vyasr marked this pull request as draft December 5, 2023 05:12
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 5, 2023
Copy link

copy-pr-bot bot commented Dec 5, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@vyasr vyasr marked this pull request as ready for review December 5, 2023 23:38
cpp/cmake/thirdparty/get_arrow.cmake Show resolved Hide resolved
cpp/cmake/thirdparty/get_arrow.cmake Outdated Show resolved Hide resolved
cpp/cmake/thirdparty/get_arrow.cmake Show resolved Hide resolved
@@ -432,7 +446,35 @@ if(NOT DEFINED CUDF_VERSION_Arrow)
)
endif()

# If the pyarrow package contains libarrow we want to link against it directly instead of searching
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should be explicitly opt-in.
If you are building libcudf on a machine with a system installed pyarrow you would always use that with no way to opt out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh you're right, but this is unfortunate. It does mean we continue to have to specify nonzero information from the CLI when invoking certain (wheel) builds, but I guess that's not the worst.

@raydouglass raydouglass merged commit fc0f181 into rapidsai:branch-24.02 Dec 11, 2023
66 of 67 checks passed
@vyasr vyasr deleted the fix/simplify_python_cmake branch December 11, 2023 19:55
karthikeyann pushed a commit to karthikeyann/cudf that referenced this pull request Dec 12, 2023
This PR makes a number of changes to streamline the CMake code in the Python build. There are additional potential improvements that may be possible but I'd prefer to wait on until after the scikit-build-core migration because I suspect that there may be subtle differences (particularly around the install rules). The major improvements here are:
- nvcomp installation no longer needs as much special casing thanks to rapidsai/rapids-cmake#496 adding missing targets
- get_arrow.cmake now determines how to find arrow by checking for 1) the presence of Python, 2) the presence of pyarrow, and 3) the presence of libarrow.so.* inside the pyarrow directory. This approach works for both pip and conda packages as well as pure C++ builds and removes the need for manual determination of where to look by the builder. The assumption baked in is that the developer has installed the desired pyarrow package when building.
- The `CUDF_BUILD_WHEELS` variable is now removed. All Python builds that don't find the C++ library all go down the same path now. This is particularly relevant for doing something like a local `pip install`. This is the desired behavior because if you're building the Python package in that kind of environment you always want the same behaviors. The different case is when the Python build finds the C++ build (e.g. in a conda environment where libcudf is a separate package).
- The logic for linking to pyarrow headers is now centralized in a single function since it needs to be used by every Cython target.

Authors:
   - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
   - https://github.com/brandon-b-miller
   - Robert Maynard (https://github.com/robertmaynard)
   - Ray Douglass (https://github.com/raydouglass)
@jameslamb jameslamb mentioned this pull request Aug 21, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants