New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST make sure test_pca_sparse passes on all random seeds #28861
TST make sure test_pca_sparse passes on all random seeds #28861
Conversation
test_pca_sparse
test_pca_sparse
test_pca_sparse
test_pca_sparse
test_pca_sparse
Maybe we could check that |
I checked and they're always a lot larger. We're only looking for a max of 10 components. Even with the min density of 0.01, all 10 components always have the same approx explained variance of 0.4%, for all seeds. The kind of randomness I mention here has a very very small impact. It changes the final digit once in while. I think we can ignore it for now, the new tols are robust to that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM then. Thanks for the PR.
Fixes #28857
The array that we're comparing have a very wide range of values, from 1e-8 to 1e0. Having a same rtol for all is kind of problematic. In this PR I introduced an additional atol, useful for the very low values. In the plot below I show the absolute difference of the components vs the absolute value of the components to illustrate the need of an atol:
We see that for very small values of the components, the absolute diff doesn't foolow the same trend as for larger values. It looks like a plateau.
The following plot shows the relative diff of the components vs the absolute values of the components. It's clearly not constant or even upper bounded (if we were to extrapolate for even smaller values). This goes against what we assume when we write
assert_allclose(X1, X2, rtol=constant)
.This is symptomatic of a general issue that we have through the whole project for comparing arrays element-wise. I made a quick fix here to make the CI green, but I think that it should be improved in general (I've been thinking about that for a while: it's not an easy problem and I haven't found a satisfying solution yet).