[MRG before #12069] KernelPCA: raise Errors and Warnings according to eigenvalue decomposition numerical/conditioning issues #12145

Merged
merged 123 commits into from Nov 14, 2019
Changes from 120 commits
Commits
d84f675
Minor edits for clarity:
smarie Sep 24, 2018
27b4de5
Fixed #12141.
smarie Sep 24, 2018
cb7cd15
Added exceptions and warnings in case of numerical issues and bad ke…
smarie Sep 24, 2018
0c4ebaa
Added a `check_kernel_eigenvalues` validation method (with doctests).…
smarie Sep 25, 2018
5285a4e
Added an additional check in the test, in order to understand the str…
smarie Sep 25, 2018
66ce769
Fixed test_errors_and_warnings: a wrong solver name had been introduc…
smarie Sep 25, 2018
98ae1ac
kPCA test_errors_and_warnings: Added a few extra checks in the test t…
smarie Sep 25, 2018
12d1a6f
Fixed test `test_errors_and_warnings`. Our identity centerer was repl…
smarie Sep 25, 2018
381fbb0
Fixed doctest in `check_kernel_eigenvalues`
smarie Sep 25, 2018
7d98502
Final doctest fixes for `check_kernel_eigenvalues` (locally validated…
smarie Sep 25, 2018
ea92a52
pytest doc fix: added normalize whitespace everywhere since there wer…
smarie Sep 25, 2018
9d6ee75
Added test corresponding to issue #12141
smarie Oct 8, 2018
3206a43
Following code review from @NicolasHug : now only scaling eigenvector…
smarie Oct 9, 2018
9fbbbb6
Following code review from @NicolasHug : now checking that there is n…
smarie Oct 9, 2018
b09a1e7
Following code review from @NicolasHug : now using pytest warning cap…
smarie Oct 11, 2018
a3900f4
Merge branch 'master' into kPCA_fix_issue_12141
smarie Oct 12, 2018
7fe9f06
Merge was wrong - removing useless import
smarie Oct 12, 2018
72d88ef
Merge branch 'kPCA_fix_issue_12141' into kPCA_fix_issue_12140
smarie Oct 12, 2018
6124783
Afterthoughts on the need to warn when gram matrix eigenvalues are sm…
smarie Oct 17, 2018
2112f5d
Fixed test: no warning is raised in presence of small eigenvalues any…
smarie Oct 17, 2018
ae546ac
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Mar 1, 2019
b62b91c
Improved zero array initialization according to code review https://g…
Mar 1, 2019
fccc19f
Merge branch 'kPCA_fix_issue_12141' of https://github.com/smarie/scik…
Mar 1, 2019
d598f17
Merge branch 'kPCA_fix_issue_12141' of https://github.com/smarie/scik…
Mar 1, 2019
3fdec0e
Improved import style according to code review https://github.com/sci…
Mar 1, 2019
b8782a7
Updated what's new for this PR
Mar 2, 2019
b98f503
Merge branch 'kPCA_fix_issue_12141' of https://github.com/smarie/scik…
Mar 2, 2019
f34883e
Updated what's new for this PR
Mar 2, 2019
44d27fb
Update doc/whats_new/v0.21.rst
thomasjpfan Mar 14, 2019
6684f8d
grammar fix
Mar 14, 2019
b47d53a
Referenced the issue
Mar 14, 2019
b8a8f89
Grammar fix
Mar 14, 2019
5f5604d
Issue number replaced with PR number
Mar 14, 2019
7f40c3e
renamed `nz` `nonzeros` following review
Mar 15, 2019
d085cb5
Fixed 80 characters max issue + improved comment as per review
Mar 15, 2019
7a45ea2
Fixed comment as per review
Mar 15, 2019
a6c6ee1
Fixed comment as per review
Mar 19, 2019
4f647ab
Fixed test as per review
Mar 19, 2019
1e4377d
Merge branch 'kPCA_fix_issue_12141' into kPCA_fix_issue_12140
Mar 19, 2019
98b5eee
Fixed `check_kernel_eigenvalues` to return a copy (as per code review)
Mar 19, 2019
234a4d1
Fixed `check_kernel_eigenvalues` docstring as per code review
Mar 19, 2019
2a1b4cf
Added name in authors list
Mar 19, 2019
04b486e
Fixed flake8 error
Mar 19, 2019
9d3c423
Merge remote-tracking branch 'original/master' into kPCA_fix_issue_12140
Mar 22, 2019
493fe25
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Mar 22, 2019
2e326b2
Fixed issue number > PR number
Mar 22, 2019
a817f7a
Removed 1 blank line
Mar 27, 2019
fecd5d3
As per code review: since `assert_warns` and `assert_no_warnings` are…
Mar 27, 2019
6edf1f6
As per code review: renamed `check_kernel_eigenvalues` into `check_ps…
Mar 27, 2019
2ad6377
Minor edit: line too long
Mar 27, 2019
cf2fe2b
Added tests for `check_psd_eigenvalues`
Mar 27, 2019
75c234a
Fixed warning message for bad conditioning: now discarding exact zero…
Mar 27, 2019
5669d3f
Added `check_psd_eigenvalues` to `classes.rst` for auto documentation
Apr 2, 2019
bd99db0
Fixed one-liner docstring as per pep257.
Apr 2, 2019
265be2b
Using double barticks in rst docstring as per review
Apr 2, 2019
cb9993c
Using double barticks in rst docstring as per review
Apr 2, 2019
4dfe22d
Using double barticks in rst docstring as per review
Apr 2, 2019
96033ba
Update sklearn/utils/validation.py
NicolasHug Apr 2, 2019
8e78f01
Merge branch 'kPCA_fix_issue_12140' of https://github.com/smarie/scik…
Apr 2, 2019
5f1db2e
End with period (code review)
Apr 2, 2019
d40a292
Update sklearn/utils/validation.py
NicolasHug Apr 2, 2019
b256bcb
max -> maximum (as per review)
Apr 2, 2019
0a39b0d
Merge branch 'kPCA_fix_issue_12140' of https://github.com/smarie/scik…
Apr 2, 2019
c4b764d
renamed `input` into `lambdas` and `output` into `expected_lambdas`, …
Apr 2, 2019
4dca96c
using `match` as per review
Apr 2, 2019
256bb16
changed comment to match expression
Apr 2, 2019
21b3ed3
Removed explicit floats as per review
Apr 2, 2019
dd7a5a6
As per code review: cleaned the kPCA kernel conditioning test as most…
Apr 2, 2019
b35bcfb
fixed line length
Apr 2, 2019
a7c3616
reverted double barticks fix
Apr 3, 2019
43ea82a
minor test update as per review: added an assert
Apr 3, 2019
f6ee568
fixed last test update
Apr 3, 2019
55e63da
fixed doctest
Apr 3, 2019
03ded15
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Apr 4, 2019
ef07045
added double backticks as per review
Apr 5, 2019
ba576e4
Removed double backticks in unrelated places
Apr 5, 2019
48bf481
using `assert np.all` instead of `assert_array_equal` as per review
Apr 5, 2019
4f3fb72
All thresholds are now variables. Also, added a section about precision.
Apr 5, 2019
dd1ec9e
Update sklearn/utils/validation.py
jnothman Apr 9, 2019
f394570
Fixed brackets as per review
Apr 10, 2019
153c7ec
changed double precision check as per review
Apr 11, 2019
fda29c8
Fixed double precision check as per review, and added corresponding t…
Apr 23, 2019
2cee2ea
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Apr 23, 2019
3bc1b6c
Fixed line length
Apr 23, 2019
25a6908
Fixed unused import
Apr 23, 2019
ac18f82
Update sklearn/utils/tests/test_validation.py
glemaitre Apr 26, 2019
1e0fc68
Update sklearn/utils/validation.py
glemaitre Apr 26, 2019
e5d1db4
Update sklearn/utils/validation.py
glemaitre Apr 26, 2019
e3d5ec0
Update sklearn/utils/validation.py
glemaitre Apr 26, 2019
2f4e4e4
Update sklearn/utils/validation.py
glemaitre Apr 26, 2019
9a0753f
`PSDSpectrumWarning` renamed `PositiveSpectrumWarning` as per review
Apr 26, 2019
187c378
Hopefully fixed the line length warning
Apr 26, 2019
e301a34
Fixed line length
Apr 26, 2019
f5d21e4
`check_psd_eigenvalues` is now private: renamed to `_check_psd_eigenv…
May 22, 2019
be576d7
Merge remote-tracking branch 'original/master' into kPCA_fix_issue_12140
Jul 1, 2019
9442d88
Update sklearn/utils/validation.py
smarie Jul 2, 2019
907e9b2
Update doc/whats_new/v0.21.rst
smarie Jul 2, 2019
788d40d
Fixed lint issue introduced by code review
Jul 2, 2019
c969de7
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pr…
NicolasHug Oct 24, 2019
71dd21f
moved whatsnew to 22
NicolasHug Oct 24, 2019
cb9b29c
simplify double precision detection
NicolasHug Oct 24, 2019
4615bc7
reduced diff
NicolasHug Oct 24, 2019
56a37d7
update whatsnew
NicolasHug Oct 24, 2019
d0cb033
reduce diff again
NicolasHug Oct 24, 2019
a9ad422
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pr…
NicolasHug Nov 1, 2019
aa21b9f
Addressed most comments
NicolasHug Nov 1, 2019
97eaba5
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pr…
NicolasHug Nov 5, 2019
072910a
update error message
NicolasHug Nov 5, 2019
f87ec5a
Fixed error message (2) and updated doctests accordingly
Nov 6, 2019
a2d0193
Renamed `warn_on_zeros` to `bad_conditioning_warning` as per review, …
Nov 6, 2019
a433b9a
Fixed comment for clarity on why we use `_check_psd_eigenvalues`
Nov 6, 2019
bd13289
Minor docstring update
Nov 6, 2019
25c5251
`bad_conditioning_warning` renamed to `small_nonzeros_warning` becaus…
Nov 7, 2019
13e2ba6
fixed linter error
Nov 7, 2019
71ed9fd
`_check_psd_eigenvalues` : as per code review, `small_nonzeros_warnin…
Nov 7, 2019
78936b5
(as per code review)
Nov 7, 2019
7fdbbd5
Fixed doctest + One of the error messages was still not consistent wi…
Nov 7, 2019
d06e71a
One test case (small pos eigenvals) had been mistakenly removed and n…
Nov 7, 2019
b58f1a0
Reverted bad line: `np.real(lambdas)` should also be done if imaginar…
Nov 7, 2019
581aca0
Two very minor improvements of the warning messages: now all warning …
Nov 8, 2019
4d05d0a
Update sklearn/utils/tests/test_validation.py
smarie Nov 8, 2019
a1c64c4
Update sklearn/decomposition/tests/test_kernel_pca.py
smarie Nov 14, 2019
b49f6ab
Fixed docstring
Nov 14, 2019
7 changes: 7 additions & 0 deletions doc/whats_new/v0.22.rst
@@ -178,6 +178,13 @@ Changelog
:mod:`sklearn.cross_decomposition`
..................................

- |Enhancement| :class:`decomposition.KernelPCA` now properly checks the
  eigenvalues found by the solver for numerical or conditioning issues. This
  ensures consistency of results across solvers (different choices for
  ``eigen_solver``), including approximate solvers such as ``'randomized'`` and
  ``'lobpcg'`` (see :issue:`12068`).
  :pr:`12145` by :user:`Sylvain Marié <smarie>`

- |Fix| Fixed a bug where :class:`cross_decomposition.PLSCanonical` and
  :class:`cross_decomposition.PLSRegression` were raising an error when fitted
  with a target matrix `Y` in which the first column was constant.
7 changes: 6 additions & 1 deletion sklearn/decomposition/_kernel_pca.py
@@ -9,7 +9,8 @@

from ..utils import check_random_state
from ..utils.extmath import svd_flip
-from ..utils.validation import check_is_fitted, check_array
+from ..utils.validation import (check_is_fitted, check_array,
+                                _check_psd_eigenvalues)
from ..exceptions import NotFittedError
from ..base import BaseEstimator, TransformerMixin
from ..preprocessing import KernelCenterer
@@ -211,6 +212,10 @@ def _fit_transform(self, K):
                                                maxiter=self.max_iter,
                                                v0=v0)

        # make sure that the eigenvalues are ok and fix numerical issues
        self.lambdas_ = _check_psd_eigenvalues(self.lambdas_,
                                               enable_warnings=False)

        # flip eigenvectors' sign to enforce deterministic output
        self.alphas_, _ = svd_flip(self.alphas_,
                                   np.empty_like(self.alphas_).T)
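To see why the spectrum is sanitized right after the solver returns, it helps to look at what happens when eigenvalues are later passed through a square root, as KernelPCA does when scaling eigenvectors. The following is a minimal NumPy-only sketch, not the estimator's code: the array `lambdas` and the helper `clip_spectrum` are made up for illustration, and the `1e-12` ratio simply mirrors the `small_pos_ratio` threshold introduced in this PR.

```python
import numpy as np

# Hypothetical solver output: one eigenvalue is tiny but positive and one is
# slightly negative, both plausible round-off artifacts for a PSD matrix.
lambdas = np.array([4.0, 1.0, 1e-17, -1e-17])

# Without any fix, the square root of the negative entry is NaN.
with np.errstate(invalid="ignore"):
    raw_scale = np.sqrt(lambdas)


def clip_spectrum(values, small_pos_ratio=1e-12):
    """Illustrative helper: zero out negative and negligible eigenvalues."""
    fixed = np.array(values, dtype=float)  # work on a copy
    fixed[fixed < 0] = 0
    fixed[(fixed > 0) & (fixed < small_pos_ratio * fixed.max())] = 0
    return fixed


fixed_scale = np.sqrt(clip_spectrum(lambdas))
print(raw_scale)    # the last entry is nan
print(fixed_scale)  # every entry is finite
```

Setting the offending entries to exactly zero also lets downstream code detect and skip them with a simple `> 0` mask, rather than dividing by a value that is only noise.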
17 changes: 17 additions & 0 deletions sklearn/decomposition/tests/test_kernel_pca.py
@@ -11,6 +11,7 @@
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.utils.validation import _check_psd_eigenvalues


def test_kernel_pca():
@@ -270,3 +271,19 @@ def test_nested_circles():
    # The data is perfectly linearly separable in that space
    train_score = Perceptron(max_iter=5).fit(X_kpca, y).score(X_kpca, y)
    assert train_score == 1.0


def test_kernel_conditioning():
    """Test that ``_check_psd_eigenvalues`` is correctly called."""

    # create a pathological X leading to small non-zero eigenvalue
    X = [[5, 1],
         [5+1e-8, 1e-8],
         [5+1e-8, 0]]
    kpca = KernelPCA(kernel="linear", n_components=2,
                     fit_inverse_transform=True)
    kpca.fit(X)

    # check that the small non-zero eigenvalue was correctly set to zero
    assert kpca.lambdas_.min() == 0
    assert np.all(kpca.lambdas_ == _check_psd_eigenvalues(kpca.lambdas_))
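The pathological `X` used in this test can be inspected without scikit-learn. The sketch below is NumPy only and rests on one assumption: that a linear kernel followed by kernel centering is equivalent to the Gram matrix of the column-centered data. The variable names are illustrative. It shows that every eigenvalue except the largest is round-off noise, which is the situation `_check_psd_eigenvalues` is meant to clean up.

```python
import numpy as np

X = np.array([[5, 1],
              [5 + 1e-8, 1e-8],
              [5 + 1e-8, 0]])

# Centered linear-kernel Gram matrix (assumed equivalent to K = X X^T
# followed by kernel centering).
Xc = X - X.mean(axis=0)
K = Xc @ Xc.T

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals = np.linalg.eigvalsh(K)
largest = eigvals[-1]
others = eigvals[:-1]

print("largest eigenvalue:", largest)
print("relative size of the others:", np.abs(others).max() / largest)
```

The sign of the tiny eigenvalues is not guaranteed and may differ between platforms or solvers, which is one reason the check handles both small negative and small positive values.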
15 changes: 14 additions & 1 deletion sklearn/exceptions.py
@@ -12,7 +12,8 @@
           'FitFailedWarning',
           'NonBLASDotWarning',
           'SkipTestWarning',
-           'UndefinedMetricWarning']
+           'UndefinedMetricWarning',
+           'PositiveSpectrumWarning']


class NotFittedError(ValueError, AttributeError):
@@ -171,3 +172,15 @@ class UndefinedMetricWarning(UserWarning):
    .. versionchanged:: 0.18
       Moved from sklearn.base.
    """


class PositiveSpectrumWarning(UserWarning):
    """Warning raised when the eigenvalues of a PSD matrix have issues

    This warning is typically raised by ``_check_psd_eigenvalues`` when the
    eigenvalues of a positive semidefinite (PSD) matrix such as a gram matrix
    (kernel) present significant negative eigenvalues, or bad conditioning i.e.
    very small non-zero eigenvalues compared to the largest eigenvalue.

    .. versionadded:: 0.22
    """
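Because the new category derives from `UserWarning`, callers can record or escalate it with the standard `warnings` machinery. The sketch below uses a stand-in class, `DemoSpectrumWarning`, and a toy `check` function (both hypothetical, defined only for this example) so that it runs without scikit-learn. With the real library one would presumably target `sklearn.exceptions.PositiveSpectrumWarning` instead.

```python
import warnings


class DemoSpectrumWarning(UserWarning):
    """Stand-in for a dedicated warning category (hypothetical)."""


def check(values, enable_warnings=False):
    """Toy check: replace negative entries with 0, optionally warning."""
    if enable_warnings and min(values) < 0:
        warnings.warn("negative values replaced with 0", DemoSpectrumWarning)
    return [max(v, 0) for v in values]


# Record warnings of the dedicated category.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DemoSpectrumWarning)
    result = check([5, -1e-9], enable_warnings=True)

# Escalate the same category to an error, e.g. in a strict test suite.
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error", DemoSpectrumWarning)
    try:
        check([5, -1e-9], enable_warnings=True)
    except DemoSpectrumWarning:
        escalated = True

print(result, len(caught), escalated)
```

A dedicated category keeps such filters narrow: other `UserWarning`s emitted by the same call are not affected by a filter on the subclass.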
79 changes: 78 additions & 1 deletion sklearn/utils/tests/test_validation.py
@@ -41,13 +41,15 @@
    check_non_negative,
    _num_samples,
    check_scalar,
    _check_psd_eigenvalues,
    _deprecate_positional_args,
    _check_sample_weight,
    _allclose_dense_sparse,
    FLOAT_DTYPES)

import sklearn

-from sklearn.exceptions import NotFittedError
+from sklearn.exceptions import NotFittedError, PositiveSpectrumWarning
from sklearn.exceptions import DataConversionWarning

from sklearn.utils._testing import assert_raise_message
@@ -937,6 +939,81 @@ def test_check_scalar_invalid(x, target_name, target_type, min_val, max_val,
    assert type(raised_error.value) == type(err_msg)


_psd_cases_valid = {
    'nominal': ((1, 2), np.array([1, 2]), None, ""),
    'nominal_np_array': (np.array([1, 2]), np.array([1, 2]), None, ""),
    'insignificant_imag': ((5, 5e-5j), np.array([5, 0]),
                           PositiveSpectrumWarning,
                           "There are imaginary parts in eigenvalues "
                           "\\(1e\\-05 of the maximum real part"),
    'insignificant neg': ((5, -5e-5), np.array([5, 0]),
                          PositiveSpectrumWarning, ""),
    'insignificant neg float32': (np.array([1, -1e-6], dtype=np.float32),
                                  np.array([1, 0], dtype=np.float32),
                                  PositiveSpectrumWarning,
                                  "There are negative eigenvalues \\(1e\\-06 "
                                  "of the maximum positive"),
    'insignificant neg float64': (np.array([1, -1e-10], dtype=np.float64),
                                  np.array([1, 0], dtype=np.float64),
                                  PositiveSpectrumWarning,
                                  "There are negative eigenvalues \\(1e\\-10 "
                                  "of the maximum positive"),
    'insignificant pos': ((5, 4e-12), np.array([5, 0]),
                          PositiveSpectrumWarning,
                          "the largest eigenvalue is more than 1e\\+12 "
                          "times the smallest"),
}


@pytest.mark.parametrize("lambdas, expected_lambdas, w_type, w_msg",
                         list(_psd_cases_valid.values()),
                         ids=list(_psd_cases_valid.keys()))
@pytest.mark.parametrize("enable_warnings", [True, False])
def test_check_psd_eigenvalues_valid(lambdas, expected_lambdas, w_type, w_msg,
                                     enable_warnings):
    # Test that ``_check_psd_eigenvalues`` returns the right output for valid
    # input, possibly raising the right warning

    if not enable_warnings:
        w_type = None
        w_msg = ""

    with pytest.warns(w_type, match=w_msg) as w:
        assert_array_equal(
            _check_psd_eigenvalues(lambdas, enable_warnings=enable_warnings),
            expected_lambdas
        )
    if w_type is None or not enable_warnings:
        assert not w


_psd_cases_invalid = {
    'significant_imag': ((5, 5j), ValueError,
                         "There are significant imaginary parts in eigenv"),
    'all negative': ((-5, -1), ValueError,
                     "All eigenvalues are negative \\(maximum is -1"),
    'significant neg': ((5, -1), ValueError,
                        "There are significant negative eigenvalues"),
    'significant neg float32': (np.array([3e-4, -2e-6], dtype=np.float32),
                                ValueError,
                                "There are significant negative eigenvalues"),
    'significant neg float64': (np.array([1e-5, -2e-10], dtype=np.float64),
                                ValueError,
                                "There are significant negative eigenvalues"),
}


@pytest.mark.parametrize("lambdas, err_type, err_msg",
                         list(_psd_cases_invalid.values()),
                         ids=list(_psd_cases_invalid.keys()))
def test_check_psd_eigenvalues_invalid(lambdas, err_type, err_msg):
    # Test that ``_check_psd_eigenvalues`` raises the right error for invalid
    # input

    with pytest.raises(err_type, match=err_msg):
        _check_psd_eigenvalues(lambdas)


def test_check_sample_weight():
    # check array order
    sample_weight = np.ones(10)[::2]
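The expected messages in the `_psd_cases_valid` and `_psd_cases_invalid` tables escape parentheses and signs because the `match` argument of `pytest.warns` and `pytest.raises` is treated as a regular expression searched within the message. Here is a small stdlib-only sketch of how the `%g` formatting and the escaped patterns line up. The message template is adapted from the diff, while the variable names are illustrative.

```python
import re

min_eig, max_eig = -5e-5, 5.0
message = ("There are negative eigenvalues (%g of the "
           "maximum positive)." % (-min_eig / max_eig))

# Hand-escaped pattern, in the style used by the test cases.
pattern = ("There are negative eigenvalues "
           "\\(1e\\-05 of the maximum positive\\)")
escaped_match = re.search(pattern, message)

# re.escape builds an equivalent pattern from the literal text.
literal = "There are negative eigenvalues (1e-05 of the maximum positive)"
auto_match = re.search(re.escape(literal), message)

# Unescaped parentheses form a regex group, so the literal "(" in the
# message is never matched.
unescaped_match = re.search(literal, message)

# The conditioning warning formats 1 / small_pos_ratio the same way.
ratio_text = "%g" % (1 / 1e-12)

print(message)
print(escaped_match is not None, auto_match is not None, unescaped_match)
print(ratio_text)
```

Using `re.escape` on the literal text is an alternative to escaping by hand when the expected message contains many special characters.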
166 changes: 165 additions & 1 deletion sklearn/utils/validation.py
@@ -6,6 +6,7 @@
# Lars Buitinck
# Alexandre Gramfort
# Nicolas Tresegnie
# Sylvain Marie
# License: BSD 3 clause

from functools import wraps
@@ -22,7 +23,7 @@

from .fixes import _object_dtype_isnan
from .. import get_config as _get_config
-from ..exceptions import NonBLASDotWarning
+from ..exceptions import NonBLASDotWarning, PositiveSpectrumWarning
from ..exceptions import NotFittedError
from ..exceptions import DataConversionWarning

@@ -1020,6 +1021,169 @@ def check_scalar(x, name, target_type, min_val=None, max_val=None):
        raise ValueError('`{}`= {}, must be <= {}.'.format(name, x, max_val))


def _check_psd_eigenvalues(lambdas, enable_warnings=False):
    """Check the eigenvalues of a positive semidefinite (PSD) matrix.

    Checks the provided array of PSD matrix eigenvalues for numerical or
    conditioning issues and returns a fixed validated version. This method
    should typically be used if the PSD matrix is user-provided (e.g. a
    Gram matrix) or computed using a user-provided dissimilarity metric
    (e.g. kernel function), or if the decomposition process uses approximation
    methods (randomized SVD, etc.).

    It checks for three things:

    - that there are no significant imaginary parts in eigenvalues (more than
      1e-5 times the maximum real part). If this check fails, it raises a
      ``ValueError``. Otherwise all non-significant imaginary parts that may
      remain are set to zero. This operation is traced with a
      ``PositiveSpectrumWarning`` when ``enable_warnings=True``.

    - that eigenvalues are not all negative. If this check fails, it raises a
      ``ValueError``

    - that there are no significant negative eigenvalues with absolute value
      more than 1e-10 (1e-6) and more than 1e-5 (5e-3) times the largest
      positive eigenvalue in double (simple) precision. If this check fails,
      it raises a ``ValueError``. Otherwise all negative eigenvalues that may
      remain are set to zero. This operation is traced with a
      ``PositiveSpectrumWarning`` when ``enable_warnings=True``.

    Finally, all the positive eigenvalues that are too small (with a value
    smaller than the maximum eigenvalue divided by 1e12) are set to zero.
    This operation is traced with a ``PositiveSpectrumWarning`` when
    ``enable_warnings=True``.

    Parameters
    ----------
    lambdas : array-like of shape (n_eigenvalues,)
        Array of eigenvalues to check / fix.

    enable_warnings : bool, default=False
        When this is set to ``True``, a ``PositiveSpectrumWarning`` will be
        raised when there are imaginary parts, negative eigenvalues, or
        extremely small non-zero eigenvalues. Otherwise no warning will be
        raised. In both cases, imaginary parts, negative eigenvalues, and
        extremely small non-zero eigenvalues will be set to zero.

    Returns
    -------
    lambdas_fixed : ndarray of shape (n_eigenvalues,)
        A fixed validated copy of the array of eigenvalues.

    Examples
    --------
    >>> _check_psd_eigenvalues([1, 2]) # nominal case
    array([1, 2])
    >>> _check_psd_eigenvalues([5, 5j]) # significant imag part
    Traceback (most recent call last):
        ...
    ValueError: There are significant imaginary parts in eigenvalues (1
        of the maximum real part). Either the matrix is not PSD, or there was
        an issue while computing the eigendecomposition of the matrix.
    >>> _check_psd_eigenvalues([5, 5e-5j]) # insignificant imag part
    array([5., 0.])
    >>> _check_psd_eigenvalues([-5, -1]) # all negative
    Traceback (most recent call last):
        ...
    ValueError: All eigenvalues are negative (maximum is -1). Either the
        matrix is not PSD, or there was an issue while computing the
        eigendecomposition of the matrix.
    >>> _check_psd_eigenvalues([5, -1]) # significant negative
    Traceback (most recent call last):
        ...
    ValueError: There are significant negative eigenvalues (0.2 of the
        maximum positive). Either the matrix is not PSD, or there was an issue
        while computing the eigendecomposition of the matrix.
    >>> _check_psd_eigenvalues([5, -5e-5]) # insignificant negative
    array([5., 0.])
    >>> _check_psd_eigenvalues([5, 4e-12]) # bad conditioning (too small)
    array([5., 0.])

    """

    lambdas = np.array(lambdas)
    is_double_precision = lambdas.dtype == np.float64

    # note: the minimum value available is
    #  - single-precision: np.finfo('float32').eps = 1.2e-07
    #  - double-precision: np.finfo('float64').eps = 2.2e-16

    # the various thresholds used for validation
    # we may wish to change the value according to precision.
    significant_imag_ratio = 1e-5
    significant_neg_ratio = 1e-5 if is_double_precision else 5e-3
    significant_neg_value = 1e-10 if is_double_precision else 1e-6
    small_pos_ratio = 1e-12

    # Check that there are no significant imaginary parts
    if not np.isreal(lambdas).all():
        max_imag_abs = np.abs(np.imag(lambdas)).max()
        max_real_abs = np.abs(np.real(lambdas)).max()
        if max_imag_abs > significant_imag_ratio * max_real_abs:
            raise ValueError(
                "There are significant imaginary parts in eigenvalues (%g "
                "of the maximum real part). Either the matrix is not PSD, or "
                "there was an issue while computing the eigendecomposition "
                "of the matrix."
                % (max_imag_abs / max_real_abs))

        # warn about imaginary parts being removed
        if enable_warnings:
            warnings.warn("There are imaginary parts in eigenvalues (%g "
                          "of the maximum real part). Either the matrix is not"
                          " PSD, or there was an issue while computing the "
                          "eigendecomposition of the matrix. Only the real "
                          "parts will be kept."
                          % (max_imag_abs / max_real_abs),
                          PositiveSpectrumWarning)

    # Remove all imaginary parts (even if zero)
    lambdas = np.real(lambdas)

    # Check that there are no significant negative eigenvalues
    max_eig = lambdas.max()
    if max_eig < 0:
        raise ValueError("All eigenvalues are negative (maximum is %g). "
                         "Either the matrix is not PSD, or there was an "
                         "issue while computing the eigendecomposition of "
                         "the matrix." % max_eig)

    else:
        min_eig = lambdas.min()
        if (min_eig < -significant_neg_ratio * max_eig
                and min_eig < -significant_neg_value):
            raise ValueError("There are significant negative eigenvalues (%g"
                             " of the maximum positive). Either the matrix is "
                             "not PSD, or there was an issue while computing "
                             "the eigendecomposition of the matrix."
                             % (-min_eig / max_eig))
        elif min_eig < 0:
            # Remove all negative values and warn about it
            if enable_warnings:
                warnings.warn("There are negative eigenvalues (%g of the "
                              "maximum positive). Either the matrix is not "
                              "PSD, or there was an issue while computing the"
                              " eigendecomposition of the matrix. Negative "
                              "eigenvalues will be replaced with 0."
                              % (-min_eig / max_eig),
                              PositiveSpectrumWarning)
            lambdas[lambdas < 0] = 0

    # Check for conditioning (small positive non-zeros)
    too_small_lambdas = (0 < lambdas) & (lambdas < small_pos_ratio * max_eig)
    if too_small_lambdas.any():
        if enable_warnings:
            warnings.warn("Badly conditioned PSD matrix spectrum: the largest "
                          "eigenvalue is more than %g times the smallest. "
                          "Small eigenvalues will be replaced with 0."
                          "" % (1 / small_pos_ratio),
                          PositiveSpectrumWarning)
        lambdas[too_small_lambdas] = 0

    return lambdas


def _check_sample_weight(sample_weight, X, dtype=None):
    """Validate sample weights.
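The negative-eigenvalue rule in `_check_psd_eigenvalues` combines a relative and an absolute threshold, both chosen from the dtype of the input. As a rough illustration, the four float32/float64 cases from the tests can be classified with a standalone sketch. This is not the library function: `is_significant_negative` is a hypothetical helper that reuses the threshold values from the diff and takes plain Python floats rather than typed arrays.

```python
import numpy as np


def is_significant_negative(min_eig, max_eig, dtype):
    """Hypothetical helper mirroring the two-threshold rule in the diff."""
    is_double = np.dtype(dtype) == np.float64
    neg_ratio = 1e-5 if is_double else 5e-3
    neg_value = 1e-10 if is_double else 1e-6
    # Both conditions must hold: large relative to the top eigenvalue
    # AND large in absolute terms.
    return bool(min_eig < -neg_ratio * max_eig and min_eig < -neg_value)


# (min_eig, max_eig, dtype) triples taken from the test tables.
cases = {
    "float32 insignificant": (-1e-6, 1.0, np.float32),
    "float32 significant": (-2e-6, 3e-4, np.float32),
    "float64 insignificant": (-1e-10, 1.0, np.float64),
    "float64 significant": (-2e-10, 1e-5, np.float64),
}
verdicts = {name: is_significant_negative(*args)
            for name, args in cases.items()}

for name, verdict in verdicts.items():
    print(name, "->", "error" if verdict else "clip to zero")
```

Requiring both conditions means a negative eigenvalue is tolerated when it is either tiny in absolute terms or tiny relative to the spectrum, and the looser float32 thresholds reflect the larger machine epsilon noted in the code comments.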