
TST: _lib: improve array API assertions #19186

Merged
merged 20 commits into from Sep 11, 2023


lucascolley
Member

@lucascolley commented Sep 5, 2023

Reference issue

#18930 (comment) @mdhaber

What does this implement/fix?

  • The assertions are renamed with the xp_ prefix to avoid conflicts with np.testing.
  • An assert actual.dtype == desired.dtype check is added. For PyTorch, this is already checked by default in assert_close and assert_equal.
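A minimal NumPy-only sketch of what the added dtype check does, assuming value comparison is delegated to np.testing.assert_array_equal. The name xp_assert_equal_sketch is hypothetical; this is not the SciPy implementation.

```python
import numpy as np


def xp_assert_equal_sketch(actual, desired):
    """Hypothetical sketch: compare values, then require identical dtypes."""
    # Value comparison is delegated to NumPy's existing assertion.
    np.testing.assert_array_equal(actual, desired)
    # The extra check: equal values are not enough if the dtypes differ.
    if actual.dtype != desired.dtype:
        raise AssertionError(
            f"dtypes do not match: actual {actual.dtype}, desired {desired.dtype}"
        )


# Passes: same values and same dtype.
xp_assert_equal_sketch(np.asarray([1.0, 2.0]), np.asarray([1.0, 2.0]))

# Fails only because of the dtype, although the values are equal.
try:
    xp_assert_equal_sketch(np.asarray([1, 2], dtype=np.float32),
                           np.asarray([1, 2], dtype=np.float64))
except AssertionError as exc:
    print(exc)  # dtypes do not match: actual float32, desired float64
```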

@mdhaber
Contributor

mdhaber commented Sep 5, 2023

LGTM but let's see what others think of the name and whether we can add a shape check.

@lucascolley
Member Author

whether we can add a shape check

Ah, I forgot to mention: I think that this already happens. I checked through the docs, e.g. https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_allclose.html, where it is explicitly stated. I think that it's implied for torch too (https://pytorch.org/docs/stable/testing.html)?

@mdhaber
Contributor

mdhaber commented Sep 5, 2023

It doesn't for NumPy when a value is a scalar or 0d array.

import numpy as np
from scipy._lib._array_api import assert_equal
assert_equal(np.asarray(0), np.asarray([0, 0]))  # passes
assert_equal(np.asarray([0, 0]), np.asarray(0))  # passes
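NumPy's own np.testing.assert_array_equal shows the same behaviour: a 0-D operand is treated like a scalar and compared against every element, so only a separate shape comparison catches the mismatch. A minimal sketch; check_shape is a hypothetical helper name:

```python
import numpy as np

# Both calls pass: the 0-D array is compared against every element.
np.testing.assert_array_equal(np.asarray(0), np.asarray([0, 0]))
np.testing.assert_array_equal(np.asarray([0, 0]), np.asarray(0))


def check_shape(actual, desired):
    """Hypothetical helper: fail if the shapes differ."""
    if actual.shape != desired.shape:
        raise AssertionError(
            f"Array shapes are not equal: actual shape = {actual.shape}, "
            f"desired shape = {desired.shape}"
        )


try:
    check_shape(np.asarray(0), np.asarray([0, 0]))
except AssertionError as exc:
    print(exc)  # Array shapes are not equal: actual shape = (), desired shape = (2,)
```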

It is not uncommon to use this style to check whether all elements of an array are 0, NaN, or some other constant, but often the shape check is forgotten. The most likely problem this could hide, I think, is when there is a singleton dimension that gets squeezed away or added inadvertently.

Related to this is whether the output is a 0d array or scalar.

assert_equal(np.asarray(0), np.asarray(0)[()])  # passes
assert_equal(np.asarray(0)[()], np.asarray(0))  # passes

Fortunately, this already fails when the actual value is a Python float:

assert_equal(0, np.asarray(0)[()])  # fails - good!

but it would be nice if the reverse would fail, too.

assert_equal(np.asarray(0)[()], 0)  # passes

I'd advocate for all of these failing by default. There are many issues in current code that would have been caught by stricter tests like this.
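The scalar vs. 0-D cases above cannot be told apart by comparing values: indexing a 0-D NumPy array with () returns a NumPy scalar (an np.generic instance), not an ndarray. A minimal NumPy-only sketch of a type check; check_array_type is a hypothetical name, not SciPy's implementation:

```python
import numpy as np

zero_d = np.asarray(0)      # 0-D ndarray
scalar = np.asarray(0)[()]  # NumPy scalar (e.g. np.int64, platform dependent)

print(isinstance(zero_d, np.ndarray), isinstance(scalar, np.ndarray))  # True False
print(isinstance(scalar, np.generic))                                  # True


def check_array_type(actual, desired):
    """Hypothetical helper: a 0-D array, a NumPy scalar and a Python
    scalar are not treated as interchangeable."""
    if type(actual) is not type(desired):
        raise AssertionError(
            f"Types do not match: {type(actual).__name__} "
            f"and {type(desired).__name__}")


# Every pair from the examples above is now rejected.
for actual, desired in [(zero_d, scalar), (scalar, zero_d), (0, scalar), (scalar, 0)]:
    try:
        check_array_type(actual, desired)
    except AssertionError as exc:
        print(exc)
```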

@tylerjereddy
Contributor

https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html has a strict keyword that gets part of the way to what Matt wants, I think? I still like the "vectorized by default," but I suppose we could do something like that here if we really need it?

@lucascolley
Member Author

sorry for the force push, I accidentally pushed a commit to the wrong branch.

Makes sense Matt, I can change it if we decide exactly what we want.

scipy/_lib/_array_api.py Outdated
scipy/_lib/_array_api.py Outdated
@mdhaber added the labels enhancement (A new feature or improvement), array types (Items related to array API support and input array validation, see gh-18286) and scipy._lib on Sep 6, 2023
@lucascolley
Member Author

In addition to the naming and the shape check, there is one more option to consider, discussed in gh-18930: checking inside these functions whether the correct namespace is returned.

@rgommers
Member

rgommers commented Sep 7, 2023

+1 for the stricter checks, and +1 for the renames.

@lucascolley
Member Author

@mdhaber does Tyler's comment #19186 (comment) help address your strictness concerns?

Also happy to go with your decision on whether to include the namespace checks in this PR.

@mdhaber
Contributor

mdhaber commented Sep 8, 2023

@lucascolley The comment is certainly related - assert_array_equal with strict=True would do what I think our xp_assert_equal should do by default (minus the additional array type check, which I wouldn't expect from a function developed before the Array API). The fact that assert_array_equal has this option doesn't resolve the concern, though. Rather, it gives support to the idea that these stricter checks should be added. (For more support, see the discussion in numpy/numpy#21595, where these checks were added to NumPy.)

#19186 (comment) indicated +1 on the strictness checks, so I'd suggest for all three assertions (perhaps in a separate, private function, so the code can be shared between all three xp_asserts):

  • add the shape check
  • add the array type check (this should also fail if the reference is a NumPy scalar but the result is a 0d array or vice-versa)
  • ensure that the AssertionError message identifies which check failed
  • (optional) add parameters to turn them off individually (e.g. check_dtype=True, check_shape=True, check_xp=True), or these can be added later if the need arises
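A minimal NumPy-only sketch of the shared private helper proposed above, with the check_dtype, check_shape and check_xp switches and a message that names the failing check. The names _assert_matching and assert_equal_strict are hypothetical, and comparing Python types only stands in for a real namespace check:

```python
import numpy as np


def _assert_matching(actual, desired, *, check_dtype=True, check_shape=True,
                     check_xp=True):
    """Hypothetical shared helper: run the strict checks, report which failed."""
    if check_xp and type(actual) is not type(desired):
        # Also catches a NumPy scalar vs. a 0-D array, in either order.
        raise AssertionError(
            f"Array types do not match: {type(actual).__name__} "
            f"and {type(desired).__name__}")
    if check_shape and np.shape(actual) != np.shape(desired):
        raise AssertionError(
            f"Shapes do not match: {np.shape(actual)} and {np.shape(desired)}")
    actual_dtype = np.asarray(actual).dtype
    desired_dtype = np.asarray(desired).dtype
    if check_dtype and actual_dtype != desired_dtype:
        raise AssertionError(
            f"dtypes do not match: {actual_dtype} and {desired_dtype}")


def assert_equal_strict(actual, desired, **checks):
    """Hypothetical xp_assert_equal-style wrapper built on the helper."""
    _assert_matching(actual, desired, **checks)
    np.testing.assert_array_equal(actual, desired)


try:
    assert_equal_strict(np.asarray(0), np.asarray([0, 0]))
except AssertionError as exc:
    print(exc)  # Shapes do not match: () and (2,)

# Individual checks can be switched off.
assert_equal_strict(np.asarray(0), np.asarray([0, 0]), check_shape=False)
```

Keeping the checks in one helper means xp_assert_equal, xp_assert_close and xp_assert_less-style wrappers would only differ in the final value comparison.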

As for why these should be strict by default: these checks are not performed strictly by default now, yet the manual addition of the strict checks is often neglected, and there are several inconsistencies/shortcomings/bugs in SciPy that might have been caught had the tests been stricter. So even if these stricter checks are only performed where these xp_asserts are used, I think we would be better off than we are now. (And actually, when developers see these stricter assertions fail, they might be reminded of the other places the strict checks are needed.)

@rgommers
Member

rgommers commented Sep 8, 2023

(And actually, when developers see these stricter assertions fail, they might be reminded of the other places the strict checks are needed.)

To confirm that: this was my exact experience with the one test on Lucas' fft branch that I tried to refactor.

@lucascolley
Member Author

Sorry, the rebase is messy since I forgot to run git submodule update; fixing now.

@lucascolley
Member Author

lucascolley commented Sep 8, 2023

@mdhaber how does this look? Here are the AssertionError messages:

In [1]: from scipy._lib._array_api import xp_assert_equal

In [2]: import numpy as np

In [3]: import torch

In [4]: xp_assert_equal(np.asarray(0), np.asarray([0, 0]))
AssertionError: Array shapes are not equal: actual shape = (), desired shape = (2,)

In [5]: xp_assert_equal(np.asarray([0, 0]), np.asarray(0))
AssertionError: Array shapes are not equal: actual shape = (2,), desired shape = ()

In [6]: xp_assert_equal(np.asarray(0), np.asarray(0)[()])
AssertionError: Desired a numpy scalar, but 0-D array given.

In [7]: xp_assert_equal(np.asarray(0)[()], np.asarray(0))
AssertionError: Desired a 0-D array, but numpy scalar given.

In [8]: xp_assert_equal(0, np.asarray(0)[()])
ValueError: Inputs should be arrays with a dtype attribute, python scalars and arrays are not accepted.

In [9]: xp_assert_equal(np.asarray(0)[()], 0)
ValueError: Inputs should be arrays with a dtype attribute, python scalars and arrays are not accepted.

In [10]: xp_assert_equal(np.asarray([1, 2]), torch.asarray([1, 2]))
AssertionError: Namespaces do not match: scipy._lib.array_api_compat.array_api_compat.numpy and scipy._lib.array_api_compat.array_api_compat.torch

In [11]: xp_assert_equal(np.asarray([1, 2], dtype=np.float32), np.asarray([1, 2], dtype=np.float64))
AssertionError: Desired dtype: float64, actual dtype: float32

In [12]: xp_assert_equal(np.asarray([1, 2], dtype=np.float64), np.asarray([1, 2], dtype=np.float32))
AssertionError: Desired dtype: float32, actual dtype: float64

In [13]: xp_assert_equal(np.asarray([1, 2]), np.asarray([1, 2]))
# passes

scipy/_lib/_array_api.py Outdated
scipy/_lib/_array_api.py Outdated
Comment on lines 196 to 205
try:
    actual.dtype
    desired.dtype
except AttributeError:
    raise ValueError("Inputs should be arrays with a dtype attribute, "
                     "python scalars and arrays are not accepted.")
Contributor


This is even more strict than I anticipated, perhaps uncomfortably strict even for me... but I think I can get over it. I'll reply again after thinking a bit. I'm guessing that I'll suggest making this check part of check_xp. If the user disables check_xp, desired can at least be made a NumPy type automatically, e.g. desired = np.asarray(desired)[()].

Contributor


Since the purpose of check_xp is to allow the developer to disable strict type checking, I would assume that with check_xp=False, a Python float would be OK.

However, all the checks after this rely on dtype and shape attributes, so either they would fail, or the Python float would need to become a float (or 0d array) of the appropriate library.

Maybe if check_xp=False, then:

xp = array_namespace(actual)
desired = xp.asarray(desired)[()]

?

Member Author


  • will xp.asarray(desired)[()] work for torch too?
  • will it just be xp.asarray(desired) if we have a list instead of a scalar? If so, what function is used to check what we have?
  • What about when actual is a 0-D array?

Contributor

@mdhaber Sep 8, 2023


will xp.asarray(desired)[()] work for torch too?

It won't fail. This will always convert desired to an array except if desired is 0d and xp is NumPy. In that case, it will convert desired to a NumPy scalar.

will it just be xp.asarray(desired) if we have a list instead of a scalar? If so, what function is used to check what we have?

[()] doesn't extract items out of an array with more than 0 dimensions, so I don't think you need to have a special case for lists vs scalars.

What about when actual is a 0-D array?

I don't understand. array_namespace(actual) will still work even if actual is a 0d array, right?
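With NumPy standing in for xp, the behaviour described in these answers is easy to confirm: [()] turns a 0-D array into a NumPy scalar and leaves an array with one or more dimensions as an array, so lists and scalars need no separate code path. This sketch does not cover other array libraries.

```python
import numpy as np

xp = np  # stand-in namespace for this sketch

for desired in [1.5, [1.0, 2.0]]:
    converted = xp.asarray(desired)[()]
    # 1.5        -> float64 scalar, shape ()
    # [1.0, 2.0] -> ndarray of shape (2,); nothing is extracted
    print(repr(desired), "->", type(converted).__name__, xp.shape(converted))
```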

Contributor


If it helps, I guess [()] is not needed because this only happens after the check_xp block, and [()] would change neither the shape nor dtype.

Member Author


assert_equal(0, np.asarray(0)[()]) # fails - good!

I'm unable to reproduce this failure from your example above... are you sure?

I don't know how to go about making this fail either.

Member Author


After b99f0e7, we now have improved behaviour with check_xp=False, e.g. xp_assert_equal(np.asarray([1, 2]), torch.asarray([1, 2]), check_xp=False) and xp_assert_equal(0, np.asarray(0)[()]) pass.

I have included actual = xp.asarray(actual) to enable dtype checks when actual is a built-in type.

However, I am struggling with built-in types for check_xp=True. xp_assert_equal(np.asarray(0)[()], 0), xp_assert_equal(0, np.asarray(0)[()]) and xp_assert_equal(np.asarray([1, 2]), [1, 2]) all pass.

Also, the error message for xp_assert_equal(1, [1, 2]) is "Desired an ndarray, but scalar given.", which is not quite correct: we desire a list. This call gets through the rest of the function because array_namespace now returns array_api_compat.numpy for all array-likes.

Feel free to commit any changes that you need to get this working Matt!

Contributor


I'm unable to reproduce this failure from your example above... are you sure?

No, I can't reproduce that. Not sure what I was doing before...

scipy/_lib/_array_api.py Outdated
lucascolley and others added 7 commits September 8, 2023 18:09
[skip cirrus] [skip circle]
[skip cirrus] [skip circle]

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
[skip cirrus] [skip circle]

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
@lucascolley changed the title from "TST: improve array API assertions" to "TST: _lib: improve array API assertions" on Sep 9, 2023
[skip cirrus] [skip circle]
@lucascolley
Member Author

lucascolley commented Sep 9, 2023

This looks almost done now, cc @tylerjereddy @rgommers if you'd like to have a look.

Edit: the CI failure is real, having a look now.

@lucascolley
Member Author

lucascolley commented Sep 9, 2023

@mdhaber it looks like _lazywhere may need to be updated to satisfy the expected 0D-array / scalar behaviour? Either that or there's a bug in the new stuff here.

Edit: 0c3849f has fixed the CI failures 🎉

Contributor

@mdhaber left a comment


I'm happy with this, but it could use another review since I've been pretty heavily involved. Some notable points are:

  • function signatures
  • changes to _lib._array_api.cov (unrelated to the rest of the PR, but I happened to notice some apparently dead code while working on this.)
  • When check_shape=True, _check_scalar raises an AssertionError if a function returns a NumPy 0d array instead of a NumPy scalar. (This is specific to NumPy; 0d arrays of any other type are allowed/enforced.) This is to encourage consistency with NumPy, which returns a scalar instead of a 0d array at almost every opportunity. Notable examples include:
    • np.mean([1, 2, 3])
    • np.asarray(1)*2
    • np.asarray(1) + np.asarray(1)
    • np.sin(np.asarray(1))

I know that not all of SciPy follows this convention, but most code I have worked with in stats, optimize, special, and integrate does, so I think we should include this assertion by default to work toward better self-consistency and consistency with NumPy in new, Array API compatible features.
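The NumPy examples listed above can be checked directly. This sketch only prints the result types of those four expressions; it is an observation about NumPy's behaviour, not a SciPy test:

```python
import numpy as np

examples = {
    "np.mean([1, 2, 3])": np.mean([1, 2, 3]),
    "np.asarray(1)*2": np.asarray(1) * 2,
    "np.asarray(1) + np.asarray(1)": np.asarray(1) + np.asarray(1),
    "np.sin(np.asarray(1))": np.sin(np.asarray(1)),
}
for expr, value in examples.items():
    # A NumPy scalar is an np.generic instance, never an np.ndarray.
    print(f"{expr}: {type(value).__name__}, "
          f"0-D array? {isinstance(value, np.ndarray)}")
```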

@@ -12,6 +12,7 @@
import warnings

import numpy as np
from numpy.testing import assert_
Contributor


We have a preference for not using this assert_ and instead just using the plain pytest assert at this point, though I forget where we documented that.

In any case, please don't make that change at this point, I'm just doing a hopefully-final pass...

Member Author


#19186 (comment) makes sense, but we have a line break, so I think we want assert_? Or msg=... on the line above might be better.

Contributor


I'd just split the message assignment yeah, but not worth it at this point of code review IMO.

Contributor

@mdhaber Sep 11, 2023


Yup, I noted this earlier, but let it pass to avoid the explicit line breaks. I'm also a fan of assigning to a variable (I almost always do that with pytest.raises(match=message)); so that's fine too.
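A small sketch of the style discussed here: assign the message to a variable first, then use a plain assert. check_dtype is a hypothetical helper, not SciPy code.

```python
import numpy as np


def check_dtype(actual, desired):
    """Hypothetical helper illustrating the message-variable style."""
    # Building the message first keeps the assert on one line, so neither
    # numpy.testing.assert_ nor an explicit line continuation is needed.
    message = f"dtypes do not match: {actual.dtype} and {desired.dtype}"
    assert actual.dtype == desired.dtype, message


check_dtype(np.asarray([1.0]), np.asarray([2.0]))  # same dtype: passes
```

One difference to keep in mind: plain assert statements are stripped when Python runs with -O, whereas numpy.testing.assert_ raises explicitly.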

@tylerjereddy
Contributor

tylerjereddy commented Sep 11, 2023

I'll merge this shortly, probably squash-merge given 20 commits and some 1-liners in there, etc.

Things I checked/other thoughts:

  1. Full test suite with no array API stuff on.
  2. Full test suite with array API + cuda. (there's a known torch failure already fixed upstream)
  3. Full test suite with array API on host.
  4. There is a comment above that mentions a few cases we may want to make some adjustment for, including this one that doesn't pass (raise an exception) for current non-conformant NumPy (but does for all other backends)--perhaps we should open a separate issue for those few cases if one isn't open already (I don't think I know the answer to what we want for all those cases though...):
--- a/scipy/_lib/tests/test_array_api.py
+++ b/scipy/_lib/tests/test_array_api.py
@@ -106,3 +106,9 @@ class TestArrayAPI:
             with pytest.raises(AssertionError, match="Types do not match."):
                 xp_assert_equal(xp.asarray(0.), xp.float64(0))
             xp_assert_equal(xp.float64(0), xp.asarray(0.))
+
+
+    @array_api_compatible
+    def test_coercion_stringency(self, xp):
+        with pytest.raises(AssertionError):
+            xp_assert_equal(xp.asarray([1, 2]), [1, 2])
  5. So far, upstream (NumPy) doesn't seem to object too much to adding similar stringency options to their current assertions, like Matt's PR here: ENH: add parameter strict to assert_allclose numpy/numpy#24680 (and they already did it elsewhere, as discussed in previous comments).
  6. xp_assert_close isn't tested/used yet--I did note when substituting it in a few assert_allclose() places that it doesn't support float input with default arguments--maybe that's an "ok" usage for the conventional NumPy assertion machinery I suppose. (AttributeError: 'float' object has no attribute 'dtype'). Tests should be easy enough to write like this if we want:
--- a/scipy/_lib/tests/test_array_api.py
+++ b/scipy/_lib/tests/test_array_api.py
@@ -3,7 +3,8 @@ import pytest
 
 from scipy.conftest import array_api_compatible
 from scipy._lib._array_api import (
-    _GLOBAL_CONFIG, array_namespace, as_xparray, copy, xp_assert_equal, is_numpy
+    _GLOBAL_CONFIG, array_namespace, as_xparray, copy, xp_assert_equal, is_numpy,
+    xp_assert_close,
 )
 import scipy._lib.array_api_compat.array_api_compat.numpy as np_compat
 
@@ -106,3 +107,9 @@ class TestArrayAPI:
             with pytest.raises(AssertionError, match="Types do not match."):
                 xp_assert_equal(xp.asarray(0.), xp.float64(0))
             xp_assert_equal(xp.float64(0), xp.asarray(0.))
+
+    @array_api_compatible
+    def test_assert_close(self, xp):
+        with pytest.raises(AssertionError):
+            xp_assert_close(xp.asarray([0.0001, 0.0001]), xp.asarray([0.00009, 0.00009]))
+        xp_assert_close(xp.asarray([0.0001, 0.0001]), xp.asarray([0.00009, 0.00009]), atol=1e-05)

I imagine a few rough spots will get ironed out when these assertions start getting used more heavily.
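For reference, the numbers in the proposed test_assert_close behave as follows with NumPy's np.testing.assert_allclose (defaults rtol=1e-07, atol=0). This NumPy-only sketch makes no assumption about xp_assert_close's own defaults.

```python
import numpy as np

actual = np.asarray([0.0001, 0.0001])
desired = np.asarray([0.00009, 0.00009])

# Default tolerances: |actual - desired| = 1e-5 is far above rtol * |desired|.
try:
    np.testing.assert_allclose(actual, desired)
except AssertionError:
    print("fails with default tolerances")

# Passes once |actual - desired| <= atol + rtol * |desired| with atol=1e-05.
np.testing.assert_allclose(actual, desired, atol=1e-05)
```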

@tylerjereddy merged commit 5ed6043 into scipy:main Sep 11, 2023
@mdhaber
Contributor

mdhaber commented Sep 11, 2023

it doesn't support float input with default arguments

This was intentional, given the discussion in #19157 (comment). My understanding from that was that the developer is responsible for ensuring that the inputs to the function are of the appropriate array type. However, if check_namespace is turned off, it should accept plain floats and do the conversion automatically.

@lucascolley
Member Author

Thanks for the thorough review Tyler, and all of the help on this one Matt!

@lucascolley deleted the array-api-assertions branch September 11, 2023 22:35
@tylerjereddy
Contributor

tylerjereddy commented Sep 11, 2023

However, if check_namespace is turned off, it should accept plain floats and do the conversion automatically.

Maybe, although if they are both floats you're still hosed it seems:

--- a/scipy/spatial/tests/test_hausdorff.py
+++ b/scipy/spatial/tests/test_hausdorff.py
@@ -6,6 +6,7 @@ import pytest
 from scipy.spatial.distance import directed_hausdorff
 from scipy.spatial import distance
 from scipy._lib._util import check_random_state
+from scipy._lib._array_api import xp_assert_close
 
 
 class TestHausdorff:
@@ -146,7 +147,7 @@ class TestHausdorff:
         # verify fix for gh-11332
         actual = directed_hausdorff(u=A, v=B, seed=seed)
         # check distance
-        assert_allclose(actual[0], expected[0])
+        xp_assert_close(actual[0], expected[0], check_namespace=False)

Since that functionality probably isn't even planned for array API support anytime soon it doesn't matter much. On the flip side, I was perhaps wondering if longer-term we'd want to consider completely switching over to that set of assertions everywhere (to avoid picking between the plain NumPy ones in some places) and/or having the error message explain what to do in that case, or using pytest.approx(), etc...

@mdhaber
Contributor

mdhaber commented Sep 11, 2023

although if they are both floats you're still hosed it seems

Sure. That's just a consequence of the way #19157 (comment) went.

On the flip side, I was perhaps wondering if longer-term we'd want to consider completely switching over

Sounds great to me.

lucascolley added a commit to lucascolley/scipy that referenced this pull request Sep 11, 2023
[skip cirrus] [skip circle]
@j-bowhay added this to the 1.12.0 milestone Sep 12, 2023
Labels: array types, enhancement, scipy._lib
5 participants