[python-package] Add support for NumPy 2.0, test against nightly versions of dependencies (fixes #6454) #6467

Merged: jameslamb merged 15 commits into master from fix/numpy-2.0 on Jun 13, 2024

jameslamb (Collaborator) commented May 29, 2024

Fixes #6454.

This PR makes lightgbm compatible with NumPy 2.x. It:

  • replaces np.array(..., copy=False) calls (which are no longer supported in NumPy 2.x) with np.asarray(...)
  • adds a CI job testing lightgbm against nightlies of matplotlib, numpy, pandas, pyarrow, scipy, and scikit-learn

It also fixes some other small things I noticed thanks to the new CI job:

  • makes it clearer when test_arrow.py tests are failing because pyarrow is installed but cffi is not
  • skips one test on pandas>=3.0 (which is not released yet), to deal with a change to pd.DataFrame(...)'s copying behavior when built from a NumPy array

Notes for Reviewers

What does it mean that np.array(..., copy=False) is "no longer supported"?

In NumPy 2.x, it raises an exception whenever the requested array cannot be created without a copy, for example when a dtype conversion is needed:

import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float32)
np.array(X, dtype=np.float64, copy=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unable to avoid copy while creating an array as requested.
If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
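
For comparison, here is a minimal sketch of how the replacement call behaves. It only uses np.asarray() and np.shares_memory(); the variable names are illustrative and not taken from lightgbm's code.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float32)

# Requested dtype matches the input: no conversion is needed,
# so np.asarray() hands back the very same array object.
same = np.asarray(X, dtype=np.float32)
no_copy_made = same is X

# Requested dtype differs: a copy cannot be avoided, and np.asarray()
# makes one instead of raising like np.array(..., copy=False) does in NumPy 2.x.
converted = np.asarray(X, dtype=np.float64)
copy_made = not np.shares_memory(converted, X)

print(no_copy_made, copy_made, converted.dtype)
```

This is the "copy only when needed" behavior that np.array(..., copy=False) had in NumPy 1.x.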

So is lightgbm going to start creating unnecessary copies again on pandas input if you use NumPy 2.0?

No.

pandas is going to start creating copies on NumPy input under some conditions. Follow the conversation at pandas-dev/pandas#58913.
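
One way to check what a given pandas version does is to test whether a DataFrame still shares memory with the array it was built from. The helper below is a rough sketch: frame_shares_memory is a hypothetical name, and the result is expected to vary with the installed pandas version and its copy settings, so no expected output is shown.

```python
import numpy as np

try:
    import pandas as pd
except ImportError:  # pandas is optional for this sketch
    pd = None


def frame_shares_memory(arr):
    """Return True/False if pandas is installed, or None if it is not.

    Hypothetical helper for illustration only; not part of lightgbm or pandas.
    """
    if pd is None:
        return None
    df = pd.DataFrame(arr)
    # to_numpy() exposes the frame's values; shares_memory() reports whether
    # they still overlap with the original array's buffer.
    return bool(np.shares_memory(df.to_numpy(), arr))


print(frame_shares_memory(np.ones((3, 2), dtype=np.float64)))
```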

cc @jmoralez, since you did so much work on this e.g. in #4927 and #5612

jameslamb added feature and removed breaking labels Jun 4, 2024
@@ -20,6 +20,10 @@
else:
    import pyarrow as pa  # type: ignore

    assert (
        lgb.compat.PYARROW_INSTALLED is True
    ), "'pyarrow' and its dependencies must be installed to run the arrow tests"
jameslamb (Collaborator, Author) commented Jun 4, 2024

I found myself in a situation where pyarrow was installed but cffi was not.

That caused this import to fail:

from pyarrow.cffi import ffi as arrow_cffi

which led to all of the pyarrow names being replaced with placeholders like this:

class pa_ChunkedArray:  # type: ignore
    """Dummy class for pa.ChunkedArray."""

    def __init__(self, *args: Any, **kwargs: Any):
        pass

which led to every Arrow test failing in a hard-to-understand way.

pytest tests/python_package_test/test_arrow.py
==== short test summary info ====
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params0] - AssertionError: assert False
... 
FAILED tests/python_package_test/test_arrow.py::test_predict_ranking - TypeError: Wrong type(ChunkedArray) for label.
==== 139 failed in 11.77s ====

As of this change, it'll fail with a slightly clearer error message (and earlier, without running all the tests).

E   AssertionError: 'pyarrow' and its dependencies must be installed to run the arrow tests
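
The failure mode described above can be sketched roughly as follows. This is a simplified illustration of an optional-import fallback, assuming a single try/except around both imports; it is not a copy of lightgbm's compat module.

```python
from typing import Any

try:
    import pyarrow as pa
    # This import needs the separate 'cffi' package. If it is missing, the
    # whole block falls through to the placeholders even though pyarrow
    # itself is installed.
    from pyarrow.cffi import ffi as arrow_cffi

    pa_ChunkedArray = pa.ChunkedArray
    PYARROW_INSTALLED = True
except ImportError:
    PYARROW_INSTALLED = False

    class pa_ChunkedArray:  # type: ignore
        """Dummy class for pa.ChunkedArray."""

        def __init__(self, *args: Any, **kwargs: Any):
            pass


# Checking the flag up front turns many confusing downstream failures
# into a single clear message.
if not PYARROW_INSTALLED:
    print("'pyarrow' and its dependencies must be installed to run the arrow tests")
```

With the placeholder in place, isinstance() checks against real Arrow objects fail, which matches the "Wrong type(ChunkedArray)" errors in the test summary.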

@@ -89,6 +116,7 @@ jobs:
        run: |
          docker run \
            --rm \
            --env CMAKE_BUILD_PARALLEL_LEVEL=${{ env.CMAKE_BUILD_PARALLEL_LEVEL }} \
jameslamb (Collaborator, Author) commented

This is one that I missed in #6458. I noticed it while replicating some of the configuration for the new CI job here.

jameslamb changed the title from "WIP: [python-package] Add support for NumPy 2.0 (fixes #6454)" to "WIP: [python-package] Add support for NumPy 2.0, test against nightly versions of dependencies (fixes #6454)" Jun 4, 2024
jameslamb changed the title from "WIP: [python-package] Add support for NumPy 2.0, test against nightly versions of dependencies (fixes #6454)" to "[python-package] Add support for NumPy 2.0, test against nightly versions of dependencies (fixes #6454)" Jun 4, 2024
jameslamb marked this pull request as ready for review June 4, 2024 05:03
jameslamb mentioned this pull request Jun 12, 2024
33 tasks
guolinke (Collaborator) left a comment

Thank you!

jameslamb (Collaborator, Author) commented

Thanks very much for the reviews! I'm very excited to now have this CI job testing against nightlies of lightgbm's dependencies. I hope it won't be too noisy, and will help us catch issues before any lightgbm users do.

jameslamb merged commit 1e7ebc5 into master Jun 13, 2024
40 checks passed
jameslamb deleted the fix/numpy-2.0 branch June 13, 2024 04:22

Successfully merging this pull request may close these issues.

[python-package] NumPy 2.0 support
3 participants