TST/MAINT: cluster: use new array API assertions #19251
Conversation
LGTM, thanks Lucas 🚀
I will let Matt have a look
CI seems to complain on Windows:

Okay, looks like we should make sure we have …

Edit: let's see if CI is happy after 9af46c8.
At first glance, I'd think something is not right whenever the first argument of an `xp_assert` is explicitly converted to something else. Could you comment on those cases?
Using `xp.asarray` on the second argument is fine. However, there are some places where there is explicit dtype conversion to `float64` that doesn't look like it would be needed. For instance, if the calculation of the reference value starts with Python floats, it would be surprising if the dtype of the result were other than `float64`, and I'd want to know why it were not.

If `float32` is supported, are there any tests?

I see some places with integer output. Should these functions ever produce `int32`, and if so, are any of these cases tested?
A lot of these tests were giving different dtypes (32-bit or 64-bit) for different namespaces. In particular, I remember …

All this complexity means that it would probably be a good idea to test every supported dtype, although this seems like quite a bit of work. This is mostly due to the 32 vs 64-bit stuff, but also compounded by the fact that lots of our tests are called as …

I wouldn't ask you to add that here. But yes, we should test whatever we claim to support.
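A minimal sketch of what testing every supported dtype could look like; plain NumPy and `np.mean` are stand-ins here for the parametrized `xp` namespace and the `scipy.cluster` routines under discussion:

```python
import numpy as np


def check_dtype_preserved(dtype):
    # Stand-in for a real cluster routine: for every dtype we claim to
    # support, the result's dtype should match the input's dtype.
    x = np.asarray([1.0, 2.0, 3.0], dtype=dtype)
    res = np.mean(x)
    assert res.dtype == dtype, f"expected {dtype}, got {res.dtype}"


# Loop over the dtypes the test suite claims to support.
for dt in (np.float32, np.float64):
    check_dtype_preserved(dt)
```

In the real test suite this loop would typically become a pytest parametrization, so each dtype shows up as its own test case.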
Would you mind swapping these? I don't mean to ask you to do more than you meant to sign up for. But if we're going to update these to use …

I completely agree 👍. No worries, I am happy to take this on. I might not be able to get round to this immediately, but hopefully in the next week or two. There is going to be a lot of work to upgrade the tests for larger submodules to meet these standards. I will add it on to the list of follow-ups for …

This work is definitely worthwhile in my opinion, since future changes to the namespaces we work with are likely to cause dtype issues. Having these tests in place should make the DX a lot better down the line. Perhaps the solution for now will include a lot of …
Commits 9af46c8 to cb1364d
@mdhaber I've remade this PR with the changes to the assertions, namespaces and dtypes split into separate commits now - hopefully that makes it easier to review.
I've fixed the `zeros` issue you pointed out above. For the rest of the dtype problems, I've added `check_dtype=False` for now, and commented inline with exactly what the mismatches are. Please could you point me in the right direction for these? Quite a few seem to be torch defaulting to `float32`. I'm hesitant to make any more changes since I'm not sure what the desired way to fix the issues is.
Alternatively, I would be happy for this to merge and to open an issue for the dtype discrepancies to be sorted out separately.
scipy/cluster/tests/test_vq.py (Outdated)

```python
@skip_if_array_api_gpu
@array_api_compatible
def test_kmeans_diff_convergence(self, xp):
    # Regression test for gh-8727
    obs = xp.asarray([-3, -1, 0, 1, 1, 8], dtype=xp.float64)
    res = kmeans(obs, xp.asarray([-3., 0.99]))
    assert_allclose(res[0], xp.asarray([-0.4, 8.]))
    assert_allclose(res[1], 1.0666666666666667)
    xp_assert_close(res[0], xp.asarray([-0.4, 8.]), check_dtype=False)
```
`res[0]` is `float64`, but `xp.asarray([-0.4, 8.])` is `float32` for torch.
Same here, adding an explicit `dtype=xp.float64` for the "expected" value seems more robust, and less likely to let the future reader wonder why dtypes can't be checked here.
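A sketch of the suggested pattern, with NumPy standing in for the parametrized `xp` namespace and the values borrowed from the test above: pinning the expected array's dtype keeps the comparison independent of the namespace's default.

```python
import numpy as np

xp = np  # stand-in for the parametrized array namespace

# Without an explicit dtype, torch would create this array as float32
# while NumPy creates float64; pinning the dtype removes the need for
# check_dtype=False in the assertion.
expected = xp.asarray([-0.4, 8.], dtype=xp.float64)
assert expected.dtype == xp.float64
```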
This is what I thought the solution was (seems pretty simple), but Matt seemed unsure:

> One of the main dtype issues is just that the default dtype of torch, even when a double is provided, is single precision. Not sure what the best way to handle that is.

I didn't want to make any tests too strict by requiring a specific dtype, where we may be okay with a different one being returned as well. If you are happy to just say 'expect what we have currently to test for regressions' then I can include that here 👍
The test requiring `res` to have `float64` dtype when the input to `kmeans` is also `float64` seems like a perfectly reasonable requirement and good to test. I'd probably consider anything else a bug.
Oops. The main thing I meant to disagree with was changing the dtype of the actual result.

Before, it looked like dtypes were being specified wherever it was needed to force the test to pass. See #19251 (comment). When the torch default being `float32` came up (#19251 (comment)), the suggestion ended up being to force the dtype of the expected result to be `float64` - but we did it where the array was created rather than each place it was used. In retrospect, I suppose the original code would have been fine in that case. There was some added confusion because I didn't know that torch would default to `float32` even when a double was provided.
No worries. It's also not just PyTorch - all deep-learning-focused libraries will default to `float32`, because that's much better supported on GPU/TPU, and more than enough precision for deep learning.
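A quick illustration of the contrast (a sketch; the torch side is left as a comment since torch may not be installed):

```python
import numpy as np

# NumPy promotes Python floats to float64 by default...
x = np.asarray([-0.4, 8.])
print(x.dtype)  # float64

# ...whereas deep-learning libraries such as torch default to single
# precision for the same input:
# import torch; torch.asarray([-0.4, 8.]).dtype  # torch.float32
```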
Right. I knew it was useful, I just didn't know it was the default.
Here was the overall comment btw. #19251 (review)
[skip cirrus] [skip circle]
This looks much better. It is fine with me to merge as-is (maybe with the one change discussed with that really loose tolerance) and open an issue about dtypes.
One of the main dtype issues is just that the default dtype of torch, even when a double is provided, is single precision. Not sure what the best way to handle that is. But it doesn't need to be done here.
@tupui see Matt's comment above. I can open the issue for dtypes once this merges (assuming the rest still looks okay). We should leave my inline comments related to the dtypes unresolved, for reference from the issue.

This looks quite good to me, only two minor comments.
@rgommers e09ce5e has removed most of the `check_dtype=False` instances. They remain in one test, where CI was catching a variation between …

Edit: CI has caught a few more …
Only the …
Most tests were tightened and are passing, and comments were added for the few remaining `check_dtype=False` instances. So this LGTM now - let's give this a go. Thanks @lucascolley, @mdhaber and @tupui!

Opened gh-19319 to document the checks which are turned off, as requested.
Reference issue

Follow-up to gh-19186 for gh-18668.

What does this implement/fix?

The new assertions `xp_assert_close` and `xp_assert_equal` are used in `cluster`, such that all of our array-API-converted code demonstrates the preferred methods of testing.

There are a few other changes in here, like changing `assert_equal(correspond(Z, y2), False)` to `assert not correspond(Z, y2)`, and some minor PEP-8 bits. I hope that's okay, but please let me know if you'd like those removed.

Additional information

@mdhaber how does this look? `python dev.py test -s cluster -b all` passes for me, but there are a few tricky bits in here with dtypes, scalars and tolerances which could do with a look over. I think that I've covered everything, but there's a chance that I've missed something, or included an unwanted change.

cc: @tupui