Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mypy does not understand output of binary operations #7780

Open
4 tasks done
Illviljan opened this issue Apr 23, 2023 · 8 comments
Open
4 tasks done

mypy does not understand output of binary operations #7780

Illviljan opened this issue Apr 23, 2023 · 8 comments

Comments

@Illviljan
Copy link
Contributor

What happened?

When doing operations on numpy arrays and xarray variables mypy does not understand that the output is always a xarray variable regardless of the order. See example.

What did you expect to happen?

mypy to pass for the example code.

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr


x = np.array([1, 2, 4])
v = xr.Variable(["x"], x)

# numpy first:
xv = x * v
xv.values  # error: "ndarray[Any, dtype[bool_]]" has no attribute "values"  [attr-defined]
if isinstance(xv, xr.Variable):
    xv.values

# variable first:
vx = v * x
vx.values
if isinstance(vx, xr.Variable):
    vx.values

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

Seen in #7741

Environment

xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.16 (main, Mar 8 2023, 10:39:24) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
libhdf5: 1.10.6
libnetcdf: None

xarray: 2023.4.2
pandas: 2.0.0
numpy: 1.23.5
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.4.0
distributed: 2023.4.0
matplotlib: 3.5.3
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.1
pip: 23.1.1
conda: 23.3.1
pytest: 7.3.1
mypy: 1.2.0
IPython: 8.12.0
sphinx: 6.1.3

@Illviljan Illviljan added bug needs triage Issue that has not been reviewed by xarray team member topic-typing labels Apr 23, 2023
@dcherian dcherian removed the needs triage Issue that has not been reviewed by xarray team member label Apr 24, 2023
@headtr1ck
Copy link
Collaborator

headtr1ck commented Sep 17, 2023

That's interesting.
I just realized that the _typed_ops.pyi type definitions are wrong.
e.g. DataArray.__add__ will return NotImplemented on Datasets which in turn should then call Dataset.__radd__.
In type definitions, it should simply not accept Datasets at all, and mypy will realize this and use Dataset.__radd__ instead.
So all these overloads are useless.

But not sure if this solves the problem, I have not tested it yet!

In a mini test example this worked for me:

class A:
    i: int

    def __init__(self, i: int) -> None:
        self.i = i

    def __add__(self, other: A | int) -> A:
        if isinstance(other, A):
            return A(self.i + other.i)
        if isinstance(other, int):
            return A(self.i + other)
        return NotImplemented

    def __radd__(self, other: A | int) -> A:
        return self + other

    def __str__(self):
        return f"A({self.i})"

class B:
    ii: dict[str, int]

    def __init__(self, ii: dict[str, int]) -> None:
        self.ii = ii

    def __add__(self, other: A | B | int) -> B:
        if isinstance(other, B):
            return B(
                {k: i + other.ii.get(k, 0) for k, i in self.ii.items()}
                | {k: o for k, o in other.ii.items() if k not in self.ii}
            )
        if isinstance(other, A):
            return B({k: i + other.i for k, i in self.ii.items()})
        if isinstance(other, int):
            return B({k: i + other for k, i in self.ii.items()})
        return NotImplemented

    def __radd__(self, other: A | B | int) -> B:
        return self + other

    def __str__(self):
        return f"B({self.ii})"

a = A(1)
b = B({"a": 5})

print(a + 1)
print(1 + a)
print(a + a)

print(b + 5)
print(5 + b)
print(b + b)

print(a + b)
print(b + a)

reveal_type(a + 1)
reveal_type(1 + a)
reveal_type(a + a)

reveal_type(b + 5)
reveal_type(5 + b)
reveal_type(b + b)

reveal_type(a + b)
reveal_type(b + a)

Note how A does not know anything about B in code and in typing.

@headtr1ck
Copy link
Collaborator

I have entered quite the rabbit hole here.
This is not only a typing issue anymore :/

Somehow the operators used in xarray are not doing what the python standard dictates: return NotImplemented when you get an unknown type. Instead we pass everything down to the underlying data types.
This results in weird error messages like:

import xarray as xr
xr.DataArray([1, 2, 3]) + {"asd"}

raises:

TypeError: unsupported operand type(s) for +: 'int' and 'set'

but it should be "TypeError: unsupported operand type(s) for +: 'xarray.core.DataArray' and 'set'"

It seems that we have done this only for xarray objects but not all other types.

Back to static typing:
I still don't know how to best type the other argument of these ops...
In principle this method is supposed to accept ANYTHING, so the type should be object and then I might have to add some overloads for the compatible data and their return types.
However, -> NotImplemented or Literal[NotImplemented] or similar does not work. It seems that mypy is happy if I remove this overload entirely (i.e. do not use other: object at all and only use the compatible one), I only get some complaints about the Mapping class which could be ignored.

@headtr1ck
Copy link
Collaborator

headtr1ck commented Sep 18, 2023

Ok, I could improve the typing a bit of the ops but I fear that numpy is too dominant here...

Numpy types its __add__ with a type that includes SupportsArray, a Protocol that defines an __array__ method.
image

In other words: a xarray.Variable falls into this type and the type checker will always think that np.ndarray.__add__(self, xr.Variable(...)) -> np.ndarray just as you wrote.

So this particular problem is a numpy upstream problem :/ Feel free to open a PR over there, but I don't see a quick fix for this on numpys side...

@max-sixty

This comment was marked as resolved.

@headtr1ck

This comment was marked as resolved.

@max-sixty

This comment was marked as resolved.

@headtr1ck
Copy link
Collaborator

Related:

[1, 2 ,3] == DataArray([1, 2, 3], dims=["t"])

will be inferred to bool. Same for anything basically, including np.ndarray.
There is no way to prevent this currently. Requires a typeshed update.

Caused by python/typeshed#8217 / https://discuss.python.org/t/make-type-hints-for-eq-of-primitives-less-strict/34240

@max-sixty
Copy link
Collaborator

Is there anything we can do here or should we close?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants