TST: _lib: improve array API assertions #19186
Conversation
LGTM but let's see what others think of the name and whether we can add a shape check.
Ah, forgot to mention: I think that this already happens. I checked through all the docs, e.g. https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_allclose.html, where it is explicitly stated. I think that it's implied for
It doesn't for NumPy when a value is a scalar or 0d array.

```python
from scipy._lib._array_api import assert_equal

assert_equal(np.asarray(0), np.asarray([0, 0]))  # passes
assert_equal(np.asarray([0, 0]), np.asarray(0))  # passes
```

It is not uncommon to use this style to check whether all elements of an array are 0, NaN, or some other constant, but often the shape check is forgotten. The most likely problem this could hide, I think, is when there is a singleton dimension that gets squeezed away or added inadvertently.

Related to this is whether the output is a 0d array or scalar.

```python
assert_equal(np.asarray(0), np.asarray(0)[()])  # passes
assert_equal(np.asarray(0)[()], np.asarray(0))  # passes
```

Fortunately, this already fails when the actual value is a Python float:

```python
assert_equal(0, np.asarray(0)[()])  # fails - good!
```

but it would be nice if the reverse would fail, too.

```python
assert_equal(np.asarray(0)[()], 0)  # passes
```

I'd advocate for all of these failing by default. There are many issues in current code that would have been caught by stricter tests like this.
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html has a
force-pushed from ed7fe76 to 80df770 (compare)
Sorry for the force push - I accidentally pushed a commit to the wrong branch. Makes sense, Matt; I can change it if we decide exactly what we want.
In addition to the naming and shape check, we have the option of checking whether the correct namespace is returned inside these functions to consider, as discussed in gh-18930.
+1 for the stricter checks, and +1 for the renames.
@mdhaber does Tyler's comment #19186 (comment) help address your strictness concerns? Also happy to go with your decision on whether or not to include the namespace checks in this PR.
@lucascolley The comment is certainly related - #19186 (comment) indicated +1 on the strictness checks, so I'd suggest them for all three assertions (perhaps in a separate, private function, so the code can be shared between all three).
As for why these should be strict by default: these checks are not performed strictly by default now, yet the manual addition of the strict checks is often neglected, and there are several inconsistencies/shortcomings/bugs in SciPy that might have been caught had the tests been stricter. So even if these stricter checks are only performed where these
To confirm that: this was my exact experience with the one test on Lucas'
force-pushed from 3876a0d to 58621a9 (compare)
Sorry, the rebase is messy since I failed to
force-pushed from 58621a9 to 74a0d6e (compare)
@mdhaber how does this look? Here are the
scipy/_lib/_array_api.py

```python
try:
    actual.dtype
    desired.dtype
except AttributeError:
    raise ValueError("Inputs should be arrays with a dtype attribute, "
                     "python scalars and arrays are not accepted.")
```
This is even more strict than I anticipated - perhaps uncomfortably strict even for me... but I think I can get over it. I'll reply again after thinking a bit. I'm guessing that I'll suggest for this check to be part of `check_xp`. If the user disables `check_xp`, at least `desired` can automatically be made a NumPy type like `desired = np.asarray(desired)[()]`.
Since the purpose of `check_xp` is to allow the developer to disable strict type checking, I would assume that with `check_xp=False`, a Python float would be OK.

However, all the checks after this rely on `dtype` and `shape` attributes, so either they would fail, or the Python float would need to become a float (or 0d array) of the appropriate library.

Maybe if `check_xp=False`, then:

```python
xp = array_namespace(actual)
desired = xp.asarray(desired)[()]
```

?
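As a sanity check on the coercion suggested here, below is a hypothetical sketch that substitutes plain NumPy for `array_namespace`; the helper name `coerce_desired` is invented for illustration and is not part of the PR:

```python
import numpy as np

def coerce_desired(actual, desired):
    # Stand-in for: xp = array_namespace(actual)
    xp = np
    return xp.asarray(desired)[()]

# A scalar "desired" becomes a NumPy scalar (0d array unwrapped by [()])...
assert isinstance(coerce_desired(np.asarray(1.0), 2), np.generic)
# ...while a list stays an array with its shape intact.
assert coerce_desired(np.asarray(1.0), [1, 2]).shape == (2,)
```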
1. Will `xp.asarray(desired)[()]` work for `torch` too?
2. Will it just be `xp.asarray(desired)` if we have a list instead of a scalar? If so, what function is used to check what we have?
3. What about when `actual` is a 0-D array?
> Will `xp.asarray(desired)[()]` work for `torch` too?

It won't fail. This will always convert `desired` to an array except if `desired` is 0d and `xp` is NumPy. In that case, it will convert `desired` to a NumPy scalar.

> Will it just be `xp.asarray(desired)` if we have a list instead of a scalar? If so, what function is used to check what we have?

`[()]` doesn't extract items out of an array with more than 0 dimensions, so I don't think you need to have a special case for lists vs scalars.

> What about when `actual` is a 0-D array?

I don't understand. `array_namespace(actual)` will still work even if `actual` is a 0d array, right?
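The `[()]` behaviour described in this reply can be verified directly with NumPy:

```python
import numpy as np

a0 = np.asarray(3.0)         # 0d array
a1 = np.asarray([3.0, 4.0])  # 1d array

# [()] extracts a NumPy scalar only from a 0d array...
assert isinstance(a0[()], np.generic)
# ...and is effectively a no-op for anything with ndim >= 1.
assert isinstance(a1[()], np.ndarray) and a1[()].shape == (2,)
```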
If it helps, I guess `[()]` is not needed because this only happens after the `check_xp` block, and `[()]` would change neither the shape nor dtype.
> `assert_equal(0, np.asarray(0)[()])  # fails - good!`

I'm unable to reproduce this failure from your example above... are you sure? I don't know how to go about making this fail either.
After b99f0e7, we now have improved behaviour with `check_xp=False`; e.g. `xp_assert_equal(np.asarray([1, 2]), torch.asarray([1, 2]), check_xp=False)` and `xp_assert_equal(0, np.asarray(0)[()])` pass.

I have included `actual = xp.asarray(actual)` to enable dtype checks when `actual` is a built-in type.

However, I am struggling with built-in types for `check_xp=True`: `xp_assert_equal(np.asarray(0)[()], 0)`, `xp_assert_equal(0, np.asarray(0)[()])`, and `xp_assert_equal(np.asarray([1, 2]), [1, 2])` all pass.

Also, the error message for `xp_assert_equal(1, [1, 2])` is `Desired an ndarray, but scalar given.`, which is not quite correct - we desire a list. This passes the rest of the function since `array_namespace` returns `array_api_compat.numpy` for all array-likes now.

Feel free to commit any changes that you need to get this working, Matt!
> I'm unable to reproduce this failure from your example above... are you sure?

No, I can't reproduce that. Not sure what I was doing before...
[skip cirrus] [skip circle]
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
Co-authored-by: Lucas Colley <51488791+lucascolley@users.noreply.github.com>
TST: _lib: update to improve array API assertions [skip cirrus] [skip circle]
force-pushed from 4e3fa2d to 227a89f (compare)
This looks almost done now, cc @tylerjereddy @rgommers if you'd like to have a look. Edit: the CI failure is real, having a look now.
I'm happy with this, but it could use another review since I've been pretty heavily involved. Some notable points are:

- function signatures
- changes to `_lib._array_api.cov` (unrelated to the rest of the PR, but I happened to notice some apparently dead code while working on this)
- When `check_shape=True`, `_check_scalar` raises an `AssertionError` if a function returns a NumPy 0d array instead of a NumPy scalar. (This is specific to NumPy; 0d arrays of any other type are allowed/enforced.) This is to encourage consistency with NumPy, which returns a scalar instead of a 0d array at almost every opportunity. Notable examples include `np.mean([1, 2, 3])`, `np.asarray(1)*2`, `np.asarray(1) + np.asarray(1)`, and `np.sin(np.asarray(1))`.

I know that not all of SciPy follows this convention, but most code I have worked with in `stats`, `optimize`, `special`, and `integrate` does, so I think we should include this assertion by default to work toward better self-consistency and consistency with NumPy in new, Array API compatible features.
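To see the convention this check encourages, each of the expressions cited above does indeed return a NumPy scalar (an instance of `np.generic`) rather than a 0d array:

```python
import numpy as np

# NumPy returns a scalar, not a 0d array, at almost every opportunity:
assert isinstance(np.mean([1, 2, 3]), np.generic)
assert isinstance(np.asarray(1) * 2, np.generic)
assert isinstance(np.asarray(1) + np.asarray(1), np.generic)
assert isinstance(np.sin(np.asarray(1)), np.generic)
```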
```diff
@@ -12,6 +12,7 @@
 import warnings

+import numpy as np
 from numpy.testing import assert_
```
We have a preference for not using this `assert_` and instead just using the plain pytest `assert` at this point, though I forget where we documented that.

In any case, please don't make that change at this point; I'm just doing a hopefully-final pass...
#19186 (comment) makes sense, but we have a line break, so I think we want `assert_`? Or `msg=...` on the line above might be better.
I'd just split the message assignment, yeah, but it's not worth it at this point of code review IMO.
Yup, I noted this earlier, but let it pass to avoid the explicit line breaks. I'm also a fan of assigning to a variable (I almost always do that with `pytest.raises(match=message)`); so that's fine too.
I'll merge this shortly, probably squash-merge given 20 commits and some 1-liners in there, etc. Things I checked/other thoughts:
```diff
--- a/scipy/_lib/tests/test_array_api.py
+++ b/scipy/_lib/tests/test_array_api.py
@@ -106,3 +106,9 @@ class TestArrayAPI:
         with pytest.raises(AssertionError, match="Types do not match."):
             xp_assert_equal(xp.asarray(0.), xp.float64(0))
         xp_assert_equal(xp.float64(0), xp.asarray(0.))
+
+    @array_api_compatible
+    def test_coercion_stringency(self, xp):
+        with pytest.raises(AssertionError):
+            xp_assert_equal(xp.asarray([1, 2]), [1, 2])
```
```diff
--- a/scipy/_lib/tests/test_array_api.py
+++ b/scipy/_lib/tests/test_array_api.py
@@ -3,7 +3,8 @@ import pytest
 from scipy.conftest import array_api_compatible
 from scipy._lib._array_api import (
-    _GLOBAL_CONFIG, array_namespace, as_xparray, copy, xp_assert_equal, is_numpy
+    _GLOBAL_CONFIG, array_namespace, as_xparray, copy, xp_assert_equal, is_numpy,
+    xp_assert_close,
 )
 import scipy._lib.array_api_compat.array_api_compat.numpy as np_compat
@@ -106,3 +107,9 @@ class TestArrayAPI:
         with pytest.raises(AssertionError, match="Types do not match."):
             xp_assert_equal(xp.asarray(0.), xp.float64(0))
         xp_assert_equal(xp.float64(0), xp.asarray(0.))
+
+    @array_api_compatible
+    def test_assert_close(self, xp):
+        with pytest.raises(AssertionError):
+            xp_assert_close(xp.asarray([0.0001, 0.0001]), xp.asarray([0.00009, 0.00009]))
+        xp_assert_close(xp.asarray([0.0001, 0.0001]), xp.asarray([0.00009, 0.00009]), atol=1e-05)
```
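For context on why the first call in `test_assert_close` fails while the second passes: `numpy.testing.assert_allclose` (which these wrappers build on) checks `|actual - desired| <= atol + rtol * |desired|`, with defaults `rtol=1e-7` and `atol=0`. A standalone sketch with plain NumPy, using a slightly looser `atol` than the test above to stay clear of floating-point edge cases:

```python
import numpy as np
from numpy.testing import assert_allclose

# 0.0001 vs 0.00009 is a ~10% relative error, far above the default rtol=1e-7:
try:
    assert_allclose(0.0001, 0.00009)
    raised = False
except AssertionError:
    raised = True
assert raised

# A loose absolute tolerance absorbs the ~1e-5 absolute difference:
assert_allclose(0.0001, 0.00009, atol=2e-5)
```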
I imagine a few rough spots will get ironed out when these assertions start getting used more heavily.
This was intentional, given the discussion in #19157 (comment). My understanding from that was that the developer is responsible for ensuring that the inputs to the function are of the appropriate array type. However, if
Thanks for the thorough review, Tyler, and all of the help on this one, Matt!
Maybe, although if they are both floats you're still hosed, it seems:

```diff
--- a/scipy/spatial/tests/test_hausdorff.py
+++ b/scipy/spatial/tests/test_hausdorff.py
@@ -6,6 +6,7 @@ import pytest
 from scipy.spatial.distance import directed_hausdorff
 from scipy.spatial import distance
 from scipy._lib._util import check_random_state
+from scipy._lib._array_api import xp_assert_close

 class TestHausdorff:
@@ -146,7 +147,7 @@ class TestHausdorff:
         # verify fix for gh-11332
         actual = directed_hausdorff(u=A, v=B, seed=seed)
         # check distance
-        assert_allclose(actual[0], expected[0])
+        xp_assert_close(actual[0], expected[0], check_namespace=False)
```

Since that functionality probably isn't even planned for array API support anytime soon, it doesn't matter much. On the flip side, I was perhaps wondering if longer-term we'd want to consider completely switching over to that set of assertions everywhere (to avoid picking between the plain NumPy ones in some places) and/or having the error message explain what to do in that case, or using
Sure. That's just a consequence of the way #19157 (comment) went.
Sounds great to me.
Reference issue

#18930 (comment) @mdhaber

What does this implement/fix?

- `xp_` prefix to avoid conflict with `np.testing`.
- `assert actual.dtype == desired.dtype` is added. This is already checked by default for PyTorch in `assert_close` and `assert_equal`.
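Putting the description together, here is a rough sketch of the strict comparison this PR works toward, using plain NumPy only; the real implementation works across Array API namespaces and also checks the namespace itself, and the helper name below is invented for illustration:

```python
import numpy as np

def xp_assert_equal_sketch(actual, desired):
    # dtype and shape must match before any values are compared
    assert actual.dtype == desired.dtype, "dtypes do not match"
    assert actual.shape == desired.shape, "shapes do not match"
    np.testing.assert_array_equal(actual, desired)

xp_assert_equal_sketch(np.asarray([1, 2]), np.asarray([1, 2]))  # passes

# The shape check catches the broadcasting trap discussed in the conversation:
try:
    xp_assert_equal_sketch(np.asarray(0), np.asarray([0, 0]))
    raised = False
except AssertionError:
    raised = True
assert raised
```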