Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Mention copy=True for __array__ method in the migration guide. #26097

Merged
merged 1 commit into from Apr 2, 2024

Conversation

mtsokol
Copy link
Member

@mtsokol mtsokol commented Mar 21, 2024

Addresses #25941 (comment)

Hi @ngoldbaum,

With this PR, explicit copy argument is always passed to __array__.
NPY_ARRAY_ENSURECOPY check was missing from flags processing, to cover all supported values for copy:

* @param copy Specifies the copy behavior
* NOTE: For copy == -1 it passes `op.__array__(copy=None)`,
* for copy == 0, `op.__array__(copy=False)`, and
* for copy == 1, `op.__array__(copy=True).

@seberg
Copy link
Member

seberg commented Mar 21, 2024

You need to be very careful here. Right now:

  • The ensure-copy flag seems still set, so I assume another copy is guaranteed to be made even if copy=True was passed and honored.
  • If copy=True is passed but not yet supported, you retry without it, but that code doesn't enforce a copy at all.
  • (I am not sure that the other code paths, e.g. conversion from buffers, ensure the copy correctly either, or if recursive calls in the discovery disable copy since a copy has to be made in either case.)

This would also significantly escalate the amount of DeprecationWarnings we see, so I am not sure I think it is worth to backporting (if that was the plan)?

@ngoldbaum
Copy link
Member

ngoldbaum commented Mar 21, 2024

If copy=True is passed but not yet supported, you retry without it, but that code doesn't enforce a copy at all.

EDIT: deleted a bad idea I had here originally

Also would you be up for adding docs for implementers on how to handle the copy keyword? For copy=True, we should tell implementers they need to make a copy if copy=True because in the future numpy will stop making copies. If for some reason it's impossible to copy or copying doesn't make sense, then an error has to be raised. Here "copying" specifically means that __array__ should return a new numpy array that owns its own data. That means any array-like defining __array__ that stores its data outside of a numpy array has to always copy in __array__ and therefore should probably always error if copy=False is passed.

@ngoldbaum
Copy link
Member

This would also significantly escalate the amount of DeprecationWarnings we see, so I am not sure I think it is worth to backporting (if that was the plan)?

Ah this is a good point. If you run the scipy or scikit-learn tests with this PR applied, do you see a bunch of new deprecation warnings? I think I agree with Sebastian here, we’re going to explicitly pass a copy keyword argument more often with this change, since before if someone did:

np.array(arraylike_that_doesnt_implement_copy, copy=True)

The copy keyword wouldn’t get passed in to __array__ at all, because we were implicitly passing None by ignoring True, and for the default case we don’t pass the keyword at all.

So another way to generating a bunch of new unnecessary downstream warnings or errors would be to only print the deprecation warning if someone explicitly passed in copy=False. For copy=True most libraries were already doing the correct thing, and if they don’t make a copy then numpy will. In the future we could deprecate not having copy and dtype at all, and not just when copy=False is passed once there’s more widespread availability in downstream implementations.

@ngoldbaum
Copy link
Member

It occurs to me we should also update the release notes discussion for this change if we want it to be in the 2.0 release.

@rgommers
Copy link
Member

I had the impression that the comment this PR is trying to address was quite minor, and non-critical for 2.0. What was critical for 2.0 was updating the docs. And then for 2.1 we can work on the change needed to ensure that there will never be two copies. Does that sound right?

@seberg
Copy link
Member

seberg commented Mar 23, 2024

I had the impression that the comment this PR is trying to address was quite minor, and non-critical for 2.0

Right, I agree. The only thing that may be nice for 2.0 is adding a single half sentence to the documentation saying: copy=True must ensure a copy is returned (however, as of 2.0, NumPy never passes True!).
Anything more is unnecessary and would create more churn (and in the first PR we said we should avoid exactly that churn to force quick copy= adoption).

However, there is something else here (should maybe make a new issue):

class m:
    def __array__(self):
        return np.arange(3)

np.array(m(), copy=False)

Gives:

<ipython-input-3-a3c23c535a07>:1: UserWarning: __array__ should implement 'dtype' and 'copy' keywords
  np.array(m(), copy=False)
Out[3]: array([0, 1, 2])

Which seems wrong. It must fail (the copy=False was clearly not honored) with an error that suggest that the user probably meant asarray (copy=None), but if not the __array__ needs to implement the copy kwarg.

I am not even sure we need the warning in NumPy 2.0 (i.e. this PR needs it but copy=False fails anyway, and copy=None doesn't pass it), but if we have it would be better as a DeprecationWarning?

@mtsokol mtsokol force-pushed the fix-copy-arg-array branch 2 times, most recently from 764eaa1 to 1814aa8 Compare March 27, 2024 13:48
@mtsokol mtsokol changed the title BUG: Always pass explicit copy arg to __array__ DOC: Mention copy=True for __array__ method in the migration guide. Mar 27, 2024
@mtsokol
Copy link
Member Author

mtsokol commented Mar 27, 2024

The only thing that may be nice for 2.0 is adding a single half sentence to the documentation saying: copy=True must ensure a copy is returned

Ok, I removed original code changes and mentioned "copy=True must return a copy" in the migration guide in the __array__ section. Also, it's already explained in arrays.classes.rst in the __array__ entry.

@charris charris added 04 - Documentation 09 - Backport-Candidate PRs tagged should be backported labels Mar 27, 2024
@mtsokol mtsokol removed the 00 - Bug label Mar 28, 2024
@ngoldbaum
Copy link
Member

Sorry for missing this last week, let's pull this in.

@ngoldbaum ngoldbaum merged commit 6d4c2c4 into numpy:main Apr 2, 2024
4 checks passed
@charris charris changed the title DOC: Mention copy=True for __array__ method in the migration guide. DOC: Mention copy=True for __array__ method in the migration guide. Apr 2, 2024
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Apr 2, 2024
@mtsokol mtsokol deleted the fix-copy-arg-array branch April 2, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants