Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better introspection for comparing numpy arrays #5347

Closed
timhoffm opened this issue May 30, 2019 · 15 comments
Closed

Better introspection for comparing numpy arrays #5347

timhoffm opened this issue May 30, 2019 · 15 comments
Labels
topic: reporting related to terminal output and user-facing messages and errors topic: rewrite related to the assertion rewrite mechanism type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature

Comments

@timhoffm
Copy link
Contributor

timhoffm commented May 30, 2019

With plain python lists I can do:

a = [1, 2, 3]
b = [1, 2, 4]
assert a == b

That does not work for numpy arrays

        a = np.array([1, 2, 3])
        b = np.array([1, 2, 4])
>       assert a == b
E       ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

There are two ways out:

  • Using numpy.testing

    np.testing.assert_equal(a, b)
    

    This has the advantage of giving good context information on the error.
    But it is far from the simple assert assert a == b paradigm in pytest.

  • Alternatively we can aggregate the the check to a single bool

    assert np.all(a == b)
    

    This looks much nicer, but pytest cannot tell anymore where the error is:

    >       assert np.all(a == b)
    E       assert False
    E        +  where False = <function all at 0x7f57cd65eea0>(array([1, 2, 3]) == array([1, 2, 4])
    E        +    where <function all at 0x7f57cd65eea0> = np.all
    

Question: Would it in principle be possible with pytest's introspection capabilities to resolve the contents of np.all() and give a better error message? In a frist step maybe something like

assert np.all(a == b)
  At index 2: False

This of course, would require an assumption on the semantics of np.all.

Alternative idea: Could pytest be made to rewrite assert a == b to np.testing.assert_equal(a, b) if either a or b is a numpy array? Possibly not a good idea because it introduces a different semantic compared to a plain a == b.

@asottile
Copy link
Member

asottile commented Jun 2, 2019

== is not an equality check for numpy arrays

>>> x = np.array([1, 2, 3])
>>> y = np.array([1, 2, 3])
>>> x == y
array([ True,  True,  True])

@Zac-HD
Copy link
Member

Zac-HD commented Jun 3, 2019

IMO the idiomatic solution is to use np.testing, and if you want more detailed feedback the right place to generate more detailed messages would be upstream in Numpy.

(we've run into similar issues in Hypothesis; numpy arrays are just complex enough to need special handling and papering over the differences only makes it worse in the long run 😭)

@twmr
Copy link
Contributor

twmr commented Jun 3, 2019

Does pytest allow to write e.g. a numpy testing plugin that extends the assert statement bytecode rewriter? This plugin would also need to provide a new version of the pytest.approx function, which exposes the functionality of the numpy.testing.assert_* functions.

@Zac-HD Zac-HD added topic: reporting related to terminal output and user-facing messages and errors type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature labels Jun 5, 2019
@nicoddemus
Copy link
Member

nicoddemus commented Jun 6, 2019

numpy testing plugin that extends the assert statement bytecode rewriter?

That's an interesting proposal. We would need to figure out an API and hooks in order to do that though. This is interesting as it would allow all kinds of extensions, like the one proposed here and #3479.

@asottile
Copy link
Member

asottile commented Jun 6, 2019

even with that, I don't think the one proposed here is a good proposal, numpy overrides == to implement a projection -- if that code were suddenly in a test and didn't act as a projection that would be pretty surprising (maybe not as surprising as numpy doing that in the first place, but still a deviation from how the code actually executes)

@timhoffm
Copy link
Contributor Author

timhoffm commented Jun 8, 2019

I agree that rewriting a == b is a bit dangerous. OTOH the specific expression assert a == b does currently not have a defined meaning. But nevertheless it's probably not good to make a == b context dependent. And it still does not solve the approx/allclose cases. This was just an idea i wanted to add to the discussion to have the space of possible solutions covered.

Independently of this is the question if pytest could introspect assert np.all(a == b) or assert np.allclose(a, b). Essentially, that could replace the need for explicit numpy.testing.* functions. Instead of

assert a.dtype == b.dtype
nptest.assert_allclose(a, b)

one could then write

assert a.dtype == b.dtype
assert np.allclose(a, b)

which restores the pattern of starting all test conditions with assert.

@samueljsb
Copy link
Contributor

The error is a numpy error, which you'd get when asserting the truthiness of an array regardless of whether you're using pytest or not (this falls out of the numpy implementation of __eq__ for arrays). Is pytest core really the right place to change that behaviour? The error message suggests a solution: use assert all(a==b).

@timhoffm
Copy link
Contributor Author

timhoffm commented Aug 5, 2019

@samueljsb As I've written in the original message assert np.all(a == b) is one way of writing this. But it has the disadvantage of a not quite instructive error message.

Additional problem: If the dimensions are not the same, numpy currently just returns False for a == b. Thus, tests can fail with an obscure error message:

>       assert all(a == b)
E       TypeError: 'bool' object is not iterable

Fundamental conclusion: Numpy arrays are too complex for attempting any simple assert * with introspection from the pytest side. The comparison logic and message must come from numpy.

These are implemented in numpy.testing. At present

assert a.dtype == b.dtype
nptest.assert_allclose(a, b)

is the right way to go.

Possible enhancement: A plugin system for assert statement bytecode rewriting as proposed in #5347 (comment) could allow to write

assert a.dtype == b.dtype
assert np.allclose(a, b)

The rewrite rule assert np.allclose --> nptest.assert_allclose would have to be provided by numpy. This way users would not have to learn about np.testing explicitly. They would use normal numpy comparison functions but would still receive more instructive error messages.

@MachinesJesus
Copy link

MachinesJesus commented Jun 17, 2020

Informative dev issue. Thanks!

I ended up just doing:

assert np.testing.assert_allclose(a, expected_a) is None

seems to work for now, don't know if you guys have fixed it or have a better patch. Will be using this a lot.

@timhoffm
Copy link
Contributor Author

timhoffm commented Jun 17, 2020

@MachinesJesus that‘s rather obscuring what is happening. assert_allclose always returns None. So the assert never fails. It‘s the assert_allclose that raises the exception and formats the error message. When using that you don‘t need the outer assert.

@MachinesJesus
Copy link

Hi @timhoffm, Thanks yes I tested and found unnecessary. Will keep watching this for a better solution. It would be great to see what value of the array it failed on. Chris

@pckroon
Copy link

pckroon commented Aug 24, 2020

Hi all,

I'm having a similar issue, except that my numpy arrays are buried deep in some custom classes (which have dicts, which have dicts, which may contain arrays). Calling e.g. np.testing methods doesn't help me.

When trying to create a custom reporter (pytest_assertrepr_compare) I can't figure out how to compare/report the difference between the array containing dicts without having to reimplement the existing reporting mechanism for dicts.

@AdityaSavara
Copy link

AdityaSavara commented Sep 6, 2020

Hello, this issue of using nested objects is at least partly covered by the customCompare and compareNested functions inside https://github.com/AdityaSavara/UnitTesterSG

It even intentionally allows for comparing lists to arrays etc. because in scientific applications we may not be trying to compare apples to apples, and just want to know that the inside is the same.

I think it would be great if the custom functions from UnitTesterSG made it into pytest in some form. Right now, UnitTesterSG is made to be compatible with pytest. A typical unit test file is shown here: https://github.com/AdityaSavara/UnitTesterSG/blob/master/test12/test_12.py

@nicoddemus
Copy link
Member

I think we can close this now that pytest.approx supports numpy arrays, so one should use:

a = np.array([1, 2, 3])
b = np.array([1, 2, 4])
assert a == pytest.approx(b)

@Liquidmasl
Copy link

I think we can close this now that pytest.approx supports numpy arrays, so one should use:

a = np.array([1, 2, 3])
b = np.array([1, 2, 4])
assert a == pytest.approx(b)

While this is a working workaround for comparing arrays while still seeing the difference, this opens a whole can of worms of other issues, the approx really takes it self seriously with the "approximate":

np.array([1,2,3]) == approx(np.array([1,2,3]))
Out[22]: True
[] == approx(np.array([]))
Out[23]: True
0 == approx(np.array([]))
Out[24]: True
"" == approx(np.array([]))
Out[25]: True
(0,) == approx(np.array([]))
Out[26]: False
(0) == approx(np.array([]))
Out[27]: True
(0) == approx(np.zeros([2,3]))
Out[29]: True

I am struggling with numpys overwritten == everyday it feels like!
especially because numpy.testing assertion methods dont use assert but instead raise an assertion error, this circumvents pytests better difference display.
Started writing my own assertion method that can take "any" object, but numpy makes it very difficult.

>>> assert np.array([]) ==  np.array([])
raises AssertionError
>>> assert np.array([0,1]) == np.array([0,1])
raises ValueError

I can catch the value error but I obviously cant catch the assertion error.

Anyway, I digress,
I would still be very happy about some kind of pytest addon that supports numpy assertions. Or at least a possibility to write one.
Also open for implementation suggestions for a general assertion method that actually uses pytests assertion (or its output)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: reporting related to terminal output and user-facing messages and errors topic: rewrite related to the assertion rewrite mechanism type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature
Projects
None yet
Development

No branches or pull requests

10 participants