New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a bound TypeVar
for DataArray
and Dataset
methods
#8208
Conversation
I _think_ we should be using a `TypeVar(..., bound=...)` for our typing, since otherwise subtypes aren't allowed. I don't see a compelling reason to have both, though I might be missing something. So this changes all `T_DataWithCoords` to `T_Xarray`, and changes that type to bound on a union of `DataArray` & `Dataset`. This covers the bulk of the API which offer transformations of either a `DataArray` or `Dataset` and return that object. This does require some manual casts; I think because when there's a concrete path for both `DataArray` & `Dataset`, mypy doesn't unify them back together. It's also possible I'm missing something. One alternative — a minor change — would be to bound on `DataWithCoords`, rather than the union. A quick comparison showed both required some casts / ignores.
for more information, see https://pre-commit.ci
Here's a partial description of why we need |
window: Mapping[Any, int] | None = None, | ||
window_type: str = "span", | ||
**window_kwargs, | ||
) -> RollingExp[T_DataWithCoords]: | ||
) -> RollingExp[T_Xarray]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) -> RollingExp[T_Xarray]: | |
) -> RollingExp[Self]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this doesn't work, similar to the reason outlined below — T_Xarray
isn't compatible with T_DataWithCoords
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this doesn't work, similar to the reason outlined below —
T_Xarray
isn't compatible withT_DataWithCoords
...
I think thats why originally I was typing RollingExp using T_DataWithCoords
@@ -800,7 +801,10 @@ def __getitem__(self, key: GroupKey) -> T_Xarray: | |||
""" | |||
Get DataArray or Dataset corresponding to a particular group label. | |||
""" | |||
return self._obj.isel({self._group_dim: self.groups[key]}) | |||
return cast( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's weird that these casts are now necessary...
On reflection, a problem with this came up in #7780 (comment):
|
Co-authored-by: Michael Niklas <mick.niklas@gmail.com>
Just some insight between a TypeVar with bound to union and simply both classes as args: T1 = TypeVar("T1", int, str)
T2 = TypeVar("T2", bound=Union[int, str])
def f1(x: T1, y: T1) -> T1:
return x + y
def f2(x: T2, y: T2) -> T2:
return x + y # mypy complains here mypy will complain twice, once for unsupported operand + between str and int and once for the return type (like your issue above). If you require that the types of x and y are the same, you need T1 (whjich is T_Xarray). Now of course in xarray DataArrays and Datasets behave quite nicely among themselves and you can basically do any operation on any combination of them. So it is quite natural to type them similar to T2 (which is T_DataArrayOrSet) But subclasses should work with the first approach class FStr(str):
pass
fstr = FStr("asd")
f1(fstr, fstr) # is allowed |
OK, I'm not sure how I came to such a misunderstanding. Thanks a lot for your explanation, very clear and helpful. Let me see whether I can resurrect anything useful out of this PR; at the very least I can add some comments on when to use which |
From the chars of pydata#8208
Also from the chars of pydata#8208 -- this uses the TypeVars we define, which hopefully sets a standard and removes ambiguity for how to type functions. It also allows subclasses. Where possible, it uses `T_Xarray`, otherwise `T_DataArrayOrSet` where it's not possible for `mypy` to narrow the concrete type down.
whats-new.rst
Edit: I added a comment outlining a problem with this
I think we should be using a
TypeVar(..., bound=...)
for our typing, since otherwise subtypes aren't allowed. I don't see a compelling reason to have all ofT_DataWithCoords
andT_DataArrayOrSet
andT_Xarray
, though I might be missing something.So this unifies all
T_DataWithCoords
,T_DataArrayOrSet
andT_Xarray
toT_Xarray
, and changes that type to be bound on a union ofDataArray
&Dataset
(similar to the existingT_DataArrayOrSet
). This covers the bulk of our API — functions which transform either aDataArray
orDataset
and return the input type.This does require some manual casts; I think because when there's a concrete path for both
DataArray
&Dataset
, mypy doesn't unify them back together. It's also possible I'm missing something.One alternative — a minor change — would be to bound on
DataWithCoords
, like the existingT_DataWithCoords
, rather than the union. A quick comparison showed each required some additional casts / ignores over the other.