New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] MNT Fixes for PCA with n_components='mle' #16841
Conversation
This looks good to me, thanks for fixing! |
Thanks for checking @lschwetlick . If you have time / feel like it, it'd be great if you could check with your notebook if the results make sense now? (but no worries otherwise) |
Well... not sure if this is maybe quite edge case-y but
-> [-inf nan -inf -inf -inf -inf] but then funnily enough it gives the right number of dimensions (1) because apparently argmax treats nan>-inf. Its giving nan because if all subsequent values in the spectrum are 0 then in line 76 |
I'm also still slightly confused to be honest: we dont test full rank because the method isn't defined for testing full rank. But what if the number of dimensions is full rank? |
Thanks @lschwetlick , I fixed that and added a test (though indeed it's an unlikely case I think)
I guess users use mle when they want some dimensionality reduction, but just don't know by how much. So that's fine not to consider rank==n_features (we can't anyway). There's always the solution to directly provide n_components=n_features. |
Alright cool, other than that my notebook and I are happy :) |
thanks a lot @NicolasHug and @lschwetlick for the team work ! |
* Fixed off by one in MLE and better handling of small eigenvalues * light update tests * pep8 * Added test + threhsold on small log
Fixes #16730
Closes #16546
infer_dimension()
now returns a value in[1, n_features - 1]
. rank = 0 isn't possible anymore (it's a bug fix, as this would transform into an empty array).rank
as passed to_assess_dimension()
is actually the rank, not an index as previously. This fixes the infamous off-by-one error. Note also that the method isn't defined if we pass it rank == n_features (division by zero)