Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG before #12069] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues #12145

Merged
merged 123 commits into from Nov 14, 2019

Conversation

@smarie
Copy link
Contributor

smarie commented Sep 24, 2018

This PR fixes #12140 by performing some numerical and conditioning checks on the eigenvalues found from kernel decomposition.

This PR is dependent on #12143, that solves an issue happening in transform when zero eigenvalues are present in the kernel.

smarie added 3 commits Sep 24, 2018
 - Added a few comments to clarify `_fit_transform`, `fit_transform` and `transform`.
 - Uniformized the numpy coding style for `fit` and `fit_transform`.
…rnel conditioning. Fixes #12140
@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Sep 24, 2018

Needs review: the only failing check is coverage. To improve it we would need to produce tests where the kernel matrix has artificial defects. Would it be ok ? How to check that warnings are properly raised ?

… It is used in `KernelPCA`. Added corresponding test `test_errors_and_warnings` for KernelPCA.
@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Sep 25, 2018

While waiting for the review I proposed a way to fix the coverage and improve maintainability: the method to check the kernel eigenvalues (check_kernel_eigenvalues) is now completely independent, in the validation submodule. It comes with some doctests.

In addition I created a test for kPCA to check that warnings and errors are raised correctly in case of bad conditioning.

smarie added 7 commits Sep 25, 2018
…ange failure in Travis.
…ed ('randomized' is not available in this branch !)
…o always check the inner check_kernel_eigenvalues method before the kPCA fit.
…aced during call to fit() - we now execute the test directly on `_fit_transform` instead, so that the matrix is untouched.
…e still errors of this kind.
@smarie smarie changed the title KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues [MRG] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues Sep 25, 2018
smarie added 3 commits Oct 8, 2018
…s where there is no zero division, to avoid numpy warnings.
…o zero division numpy warning.
smarie added 4 commits Oct 11, 2018
…ture to perform fine-grain warning assertion
# Conflicts:
#	sklearn/decomposition/tests/test_kernel_pca.py
@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Oct 17, 2018

I had some afterthoughts on one of the warnings: indeed it is normal to find quasi-zero eigenvalues when the number of samples is high enough (my intuition would be, in the case of a gaussian kernel, that it is when this number is larger than the underlying distribution's manifold dimensionality, but maybe in this paper there are better explanations).

For this reason I will push a new commit where there is no warning by default about zero eigenvalues.

smarie added 2 commits Oct 17, 2018
…all: this is most probably a common case especially when the number of samples gets high. Removing the warning by default.
…more, this is normal behaviour as the number of samples increases.
@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Oct 17, 2018

All set ! Ready to merge once #12143 will have been.

@smarie smarie changed the title [MRG] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues [MRG after #12143] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues Oct 17, 2018
@jnothman jnothman changed the title [MRG after #12143] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues [MRG] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues Feb 28, 2019
sklearn/utils/validation.py Outdated Show resolved Hide resolved
@NicolasHug NicolasHug dismissed their stale review Nov 7, 2019

Too many changes since review....

@NicolasHug

This comment has been minimized.

Copy link
Contributor

NicolasHug commented Nov 7, 2019

@adrinjalali are you ok with:

  • having only one parameter to control the warnings
  • leaving that false by default
  • discussing everything else in other PRs / issues

We are clearly over engineering this PR by trying too hard to future-proof it. Let's keep things simple.
(I would even be OK to remove all the warnings. The ValueErrors are already an improvement).

…g` is now `False` by default
@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Nov 7, 2019

a precision concerning your last sentences @NicolasHug "one parameter to control all the warnings" and "I would be ok to remove all the warnings": please have a look at my previous comments, there is a huge difference between the warning for significant negative values (which could even be transformed into a ValueError) and the warning for small non-zeros.

@adrinjalali

This comment has been minimized.

Copy link
Member

adrinjalali commented Nov 7, 2019

I'm happy with a significant negative value to be a ValueError, but the rest should all be controlled by one single parameter. And for the rest, I'm happy with your proposal @NicolasHug

@adrinjalali

This comment has been minimized.

Copy link
Member

adrinjalali commented Nov 7, 2019

Also, @smarie I really appreciate your persistence and you following the suggestions so promptly. I'm sorry it's being dragged for longer than we all would prefer :)

@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Nov 7, 2019

Perfect. I make the changes right now and push what should finally be the final proposal. Almost there :) !

Sylvain MARIE
 * Renamed `small_nonzeros_warning` into `enable_warnings`.
 * now consistent warnings are raised for all three cases (imaginary parts, negative, small non zero), and the parameter disables all of them.
 * improved string formatting using `%g` instead of `%f` or other things
@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Nov 7, 2019

Ready for a last round. That "simple" last change was actually quite impactant but I think that the result is now straightforward and consistent. I updated the docstring, please have a look.

In details:

  • Renamed small_nonzeros_warning into enable_warnings.
  • now consistent warnings are raised for all 3 cases of non-significant things set to zero (new imaginary parts, negative values, small non-zero values). Basically everytime the method modifies the eigenvalues, there is a potential warning telling why. The enable_warnings parameter enables/disables all of them at once, as requested.
  • improved string formatting using %g instead of %f or other things
  • updated the tests and got rid of the useless extra test, that is now included in the other
Sylvain MARIE added 4 commits Nov 7, 2019
…th the others. Now adopting the same message everywhere.
…ot copied back. Added it.
…y parts are all zeros, to convert to float dtype
…messages explicitly state what is happening (setting to zero the imag/negative/small value).
@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Nov 8, 2019

All seems ok now, stopping edits and waiting for final review @adrinjalali @NicolasHug .

Copy link
Contributor

NicolasHug left a comment

Thanks @smarie, only nitpicks from me but LGTM

sklearn/utils/tests/test_validation.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_validation.py Show resolved Hide resolved
Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Nov 8, 2019

Thanks very much for the review. @adrinjalali you're up next (if you have a few minutes to devote to this) :)

@NicolasHug

This comment has been minimized.

Copy link
Contributor

NicolasHug commented Nov 12, 2019

Copy link
Member

adrinjalali left a comment

LGTM, thanks a lot @smarie and @NicolasHug

sklearn/decomposition/tests/test_kernel_pca.py Outdated Show resolved Hide resolved
smarie and others added 2 commits Nov 14, 2019
Sylvain MARIE
@NicolasHug NicolasHug merged commit 3432140 into scikit-learn:master Nov 14, 2019
21 checks passed
21 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc artifact Link to 0/doc/_changed.html
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 96.11%)
Details
codecov/project 97.22% (+1.1%) compared to dbe4591
Details
scikit-learn.scikit-learn Build #20191114.31 succeeded
Details
scikit-learn.scikit-learn (Linting) Linting succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
Details
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
Details
scikit-learn.scikit-learn (Linux_Runs pylatest_conda_mkl) Linux_Runs pylatest_conda_mkl succeeded
Details
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl_no_openmp) macOS pylatest_conda_mkl_no_openmp succeeded
Details
@NicolasHug

This comment has been minimized.

Copy link
Contributor

NicolasHug commented Nov 14, 2019

Merged!
Thank you so much @smarie for your consistent work on this

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Nov 14, 2019

@smarie

This comment has been minimized.

Copy link
Contributor Author

smarie commented Nov 15, 2019

So cool ! I will now be able to finalize #12069
Thanks so much for the hard reviewing task !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.