Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run CI on python=3.12 #8605

Merged
merged 6 commits into from
Jan 17, 2024
Merged

run CI on python=3.12 #8605

merged 6 commits into from
Jan 17, 2024

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Jan 12, 2024

@keewis
Copy link
Collaborator Author

keewis commented Jan 12, 2024

@Illviljan, it appears on python=3.12 the trick with _arrayfunction does not work for pint anymore:

from xarray.namedarray._typing import _arrayfunction
import pint

ureg = pint.UnitRegistry()
q = ureg.Quantity([0, 1, 2], "m")
isinstance(q, _arrayfunction)  # True on `python < 3.12`, False on `python=3.12`

I assume this is due to the caveat about the performance improvements in this section? Do you have any idea what we have to do to fix this (here or in pint)?

Edit: it believe the reason is that pint defines several properties as dynamic (i.e. ureg.Quantity(pyscalar, unit).ndim exists but raises), which with the recent changes to runtime_checkable will not be checked properly pint apparently forgot to explicitly define the dtype property, it is only available through __getattr__

@keewis
Copy link
Collaborator Author

keewis commented Jan 12, 2024

as it turns out, my edit above is wrong: Quantity.dtype is only available through the __getattr__ fallback, explicitly defining it as a property makes the check work again. I'll put in a PR to fix this in pint, but in the meantime we need to figure out how to deal with this, as it causes a number of our tests to fail.

@Illviljan
Copy link
Contributor

Was the dtype fix in pint enough to solve all the errors, @keewis?

I didn't go through all the errors but a few seems to be due to triggering a reduction like .mean, which ends up in .reduce which then ends up in from_array:

return from_array(dims, data, attrs=self._attrs)

Because NamedArray.data should always be the same duckarray type (numpy, dask, etc) regardless of the operation, I'd argue the safety checks done in from_array is redundant and should be replaced with self._new instead.

That's my high level point of view at least...

I've been looking into .reduce before and the problem if I remember correctly is that you somehow can get ArrayLike values out of the reductions. I have some draft code how to clean .reduce up in #8344, but that one still had from_array unfortunately.

@keewis
Copy link
Collaborator Author

keewis commented Jan 17, 2024

yes, that was enough to fix every issue (at least on my machine, not entirely sure about windows / macos). To unblock this we decided in the meeting today to remove pint from the 3.12 environment until there's at least one released version that does not cause the tests to fail.

Because NamedArray.data should always be the same duckarray type (numpy, dask, etc) regardless of the operation, I'd argue the safety checks done in from_array is redundant and should be replaced with self._new instead.

We need to be careful about removing runtime checks, as this can easily become unsafe – though if we can ensure for normal usage that we don't work on data that has never been checked I would agree, checking more than once is not necessary. As an example for when this can go wrong, xr.namedarray.core.NamedArray("x", range(10)) currently raises an AttributeError because range objects don't have the shape attribute (I've been meaning to open an issue for this, but didn't get around to it until now).

The reason is that `pint` currently does not explicitly define
`dtype`, causing our runtime checkable protocols not to match.
@dcherian
Copy link
Contributor

dcherian commented Jan 17, 2024

Because NamedArray.data should always be the same duckarray type (numpy, dask, etc) regardless of the operation

Unfortunately not true for our lazy indexing wrappers.

EDIT: I guess those live in _data

@Illviljan
Copy link
Contributor

Illviljan commented Jan 17, 2024

We need to be careful about removing runtime checks, as this can easily become unsafe – though if we can ensure for normal usage that we don't work on data that has never been checked I would agree, checking more than once is not necessary. As an example for when this can go wrong, xr.namedarray.core.NamedArray("x", range(10)) currently raises an AttributeError because range objects don't have the shape attribute (I've been meaning to open an issue for this, but didn't get around to it until now).

My point of view here is technically NamedArray hasn't 'removed' any runtime checks as it's not public yet. Now NamedArray has a performant initialization with the promise that the user inputs a valid array function or array api. If the user is uncertain about the data, then use the slower from_array / asarray functions. Cubed uses this approach which I think was the correct decision. This was discussed a bit in #8294.

Variable and DataArray both suffers performance from the overuse of as_compatible_data in the init, which is hard for the user to avoid.

Because NamedArray.data should always be the same duckarray type (numpy, dask, etc) regardless of the operation

Unfortunately not true for our lazy indexing wrappers.

EDIT: I guess those live in _data

The lazy wrappers needs more love to be considered array_functions/array_api is what I'm reading!
But the necessary tweaks to keep them working can still be on the Variable level, which I think it mostly is right now.
It would be nice to try and keep NamedArray pure, without inheriting a bunch of strange workarounds from Variable for a short while at least.

@keewis
Copy link
Collaborator Author

keewis commented Jan 17, 2024

My point of view here is technically NamedArray hasn't 'removed' any runtime checks as it's not public yet.

As long as that was a intentional decision with runtime usage in mind I'm fine with the current state.

@keewis keewis merged commit e91ee89 into pydata:main Jan 17, 2024
29 checks passed
@keewis keewis deleted the python-3.12 branch January 17, 2024 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add py3.12 CI and update pyproject.toml
4 participants