TST make sure test_pca_sparse passes on all random seeds #28861

jeremiedbb · 2024-04-19T12:23:32Z

The arrays that we're comparing have a very wide range of values, from 1e-8 to 1e0. Using the same rtol for all of them is problematic. In this PR I introduced an additional atol, which helps for the very small values. In the plot below I show the absolute difference of the components vs the absolute value of the components to illustrate the need for an atol:

We see that for very small values of the components, the absolute diff doesn't follow the same trend as for larger values: it looks like a plateau.

The following plot shows the relative diff of the components vs the absolute value of the components. It's clearly not constant, nor even bounded above (if we were to extrapolate to even smaller values). This goes against what we assume when we write assert_allclose(X1, X2, rtol=constant).
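A minimal sketch of why a constant rtol cannot work here, under an assumed error model (illustrative, not measured from test_pca_sparse): the absolute difference is a small fraction of the value plus a constant floor of 1e-10, mimicking the plateau described above. The relative difference then behaves like floor / |x|, which grows without bound as |x| shrinks.

```python
import numpy as np

# Assumed error model (illustrative only): a 1e-7 relative error plus a
# constant 1e-10 absolute floor that dominates for tiny values.
magnitudes = np.logspace(0, -12, num=13)  # |x| from 1e0 down to 1e-12
rel_err, abs_floor = 1e-7, 1e-10
abs_diff = rel_err * magnitudes + abs_floor  # plateaus near 1e-10
rel_diff = abs_diff / magnitudes  # equals rel_err + abs_floor / |x|

for m, a, r in zip(magnitudes, abs_diff, rel_diff):
    print(f"|x|={m:.0e}  abs diff={a:.1e}  rel diff={r:.1e}")
```

Reading the table from top to bottom, the absolute diff flattens out while the relative diff keeps growing, so any fixed rtol is eventually exceeded by the smallest entries.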

This is symptomatic of a general issue throughout the whole project when comparing arrays element-wise. I made a quick fix here to make the CI green, but I think it should be improved in general (I've been thinking about it for a while: it's not an easy problem and I haven't found a satisfying solution yet).
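The quick fix (an atol on top of the rtol) can be sketched with np.testing.assert_allclose, which accepts an element when |actual - desired| <= atol + rtol * |desired|. The arrays and tolerance values below are made up for illustration; they are not the ones used in the PR.

```python
import numpy as np

# Made-up reference values spanning 1e-8 to 1e0, like the PCA components.
desired = np.logspace(-8, 0, num=9)
# Made-up "computed" values: 1e-7 relative error plus a 1e-10 absolute floor.
actual = desired * (1 + 1e-7) + 1e-10


def passes(rtol, atol=0.0):
    """Return True if np.testing.assert_allclose accepts actual vs desired."""
    try:
        np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol)
    except AssertionError:
        return False
    return True


# rtol only: the 1e-8 entry has a relative error of about 1e-2, so this fails.
rtol_only = passes(rtol=1e-6)
# A small atol covers the absolute floor of the tiny entries; the large
# entries are still effectively checked by rtol.
rtol_and_atol = passes(rtol=1e-6, atol=1e-9)
print(rtol_only, rtol_and_atol)  # False True
```

The delicate part is choosing atol: it should sit just above the noise floor of the smallest entries, because for every entry with |desired| < atol / rtol the atol term dominates and the check becomes looser.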


github-actions · 2024-04-19T12:25:24Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 928f723. Link to the linter CI: here


jeremiedbb · 2024-04-22T12:03:39Z

There's some randomness that I can't explain (and can't reproduce locally). The failing tests are not exactly the same in b2d30f9 and in e0448a3. The test failing in #28857 doesn't always fail either.

ogrisel · 2024-04-22T15:10:21Z

Maybe we could check that (pca.explained_variance_ > np.finfo(X.dtype).eps).all() before running any other assertion. If n_components is large enough and density is small enough, it's possible that the matrix rank of the data is lower than n_components and therefore, some components will be random.
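The suggested guard can be sketched as follows. To keep the sketch dependency-free, it computes the explained variance with NumPy's SVD (s**2 / (n_samples - 1), the same quantity scikit-learn's PCA reports as explained_variance_) instead of fitting a PCA. The helper names are hypothetical, and a dense rank-5 matrix stands in for sparse data whose rank falls below n_components.

```python
import numpy as np


def all_components_informative(explained_variance, dtype):
    """Hypothetical helper: True if every component's variance exceeds eps."""
    return bool((explained_variance > np.finfo(dtype).eps).all())


def explained_variance_svd(X, n_components):
    """PCA explained variance via a dense SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    return singular_values[:n_components] ** 2 / (X.shape[0] - 1)


rng = np.random.default_rng(0)
n_samples, n_features, n_components = 100, 30, 10

# Full-rank data: all 10 leading components carry real variance.
X_full = rng.standard_normal((n_samples, n_features))
# Rank-5 data: components 6 to 10 only capture numerical noise.
X_low = rng.standard_normal((n_samples, 5)) @ rng.standard_normal((5, n_features))

ev_full = explained_variance_svd(X_full, n_components)
ev_low = explained_variance_svd(X_low, n_components)
print(all_components_informative(ev_full, X_full.dtype))  # True
print(all_components_informative(ev_low, X_low.dtype))  # False
```

In the actual test, the same check would be applied to pca.explained_variance_ before the other assertions, so that rank-deficient draws are detected instead of producing spurious mismatches on random components.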

jeremiedbb · 2024-04-23T10:45:33Z

I checked and they're always a lot larger. We're only looking for at most 10 components. Even with the minimum density of 0.01, all 10 components always have approximately the same explained variance of 0.4%, for all seeds.

The kind of randomness I mention here has a very, very small impact: it changes the final digit once in a while. I think we can ignore it for now; the new tolerances are robust to it.

ogrisel

LGTM then. Thanks for the PR.

jeremiedbb added 2 commits April 19, 2024 12:56

[all random seeds]

7dce925

test_pca_sparse

[azure parallel] [all random seeds]

fce98e1

test_pca_sparse

jeremiedbb added 3 commits April 19, 2024 17:41

[azure parallel] [all random seeds]

b2d30f9

test_pca_sparse

[azure parallel] [all random seeds]

e0448a3

test_pca_sparse

safer tolerances

1d062b5

jeremiedbb added the No Changelog Needed label Apr 22, 2024

[azure parallel] [all random seeds]

928f723

test_pca_sparse

jeremiedbb marked this pull request as ready for review April 22, 2024 11:45

jeremiedbb changed the title ~~[WIP] Check test_pca_sparse~~ TST make sure test_pca_sparse passes on all random seeds Apr 22, 2024

ogrisel approved these changes Apr 24, 2024


jeremiedbb added this to the 1.5 milestone Apr 25, 2024

glemaitre approved these changes Apr 26, 2024


glemaitre merged commit fa6ddba into scikit-learn:main Apr 26, 2024
32 checks passed

