
DotProduct Kernel not non-negative definite after rounding errors #8252

Closed
chrishmorris opened this issue Jan 31, 2017 · 17 comments

Comments

@chrishmorris

Description

GPR will fail in the Cholesky decomposition if the kernel matrix has negative eigenvalues. The Cholesky decomposition itself is numerically stable. Unfortunately, rounding errors in the kernel computation can cause this error, and this occurs with the DotProduct kernel.

Steps/Code to Reproduce

from sklearn import gaussian_process
import sklearn.gaussian_process.kernels as kernels

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct(),
    optimizer='fmin_l_bfgs_b',
    random_state=None)
model.fit(X, Y)  # X, Y: the training data (see the dataset linked below)

Expected Results

Mathematically speaking, the dot product is a kernel, i.e. symmetric and non-negative definite: for any vector v, v.T K v = ||X.T v||^2 >= 0. So this should succeed.

Actual Results

A LinAlgError may be thrown in the Cholesky decomposition. The error is not in this routine. Rather, the value of kernels.DotProduct()(X) can have small negative eigenvalues due to rounding errors.

An example of values for which this fails is at http://pims.structuralbiology.eu/X.csv
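
A minimal sketch of the numerical effect, using synthetic data of the same shape instead of the linked CSV: the Gram matrix of a tall, thin X is rank-deficient, and its smallest eigenvalues, which should be exactly zero, often come out slightly negative in floating point.

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(150, 3)                 # tall, thin design matrix
K = np.dot(X, X.T)                   # DotProduct kernel matrix, rank <= 3
print(np.linalg.eigvalsh(K).min())   # often a tiny negative value, around -1e-15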

Versions

Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-centos-7.3.1611-Core
('Python', '2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.11.2')
('SciPy', '0.18.1')
('Scikit-Learn', '0.18')

@jnothman
Member

Thanks for reporting the issue and providing a dataset. A full code example that breaks would really help someone trying to fix it.

@chrishmorris
Author

chrishmorris commented Feb 9, 2017

Here is code to reproduce. The result is:

LinAlgError: 114-th leading minor not positive definite

import csv
import urllib2  # Python 2; see below for a Python 3 equivalent

from sklearn import gaussian_process
import sklearn.gaussian_process.kernels as kernels

f = urllib2.urlopen('http://pims.structuralbiology.eu/X.csv')
X = [[float(i) for i in row] for row in csv.reader(f, delimiter=',')]
f.close()
model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct(),
    optimizer='fmin_l_bfgs_b',
    random_state=None
)
model.fit(X, [0.0] * len(X))
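
For anyone on Python 3 (the original report ran on Python 2.7), an equivalent sketch using urllib.request:

import csv
import io
import urllib.request

from sklearn import gaussian_process
import sklearn.gaussian_process.kernels as kernels

with urllib.request.urlopen('http://pims.structuralbiology.eu/X.csv') as f:
    text = io.TextIOWrapper(f, encoding='utf-8')
    X = [[float(i) for i in row] for row in csv.reader(text, delimiter=',')]

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct(),
    optimizer='fmin_l_bfgs_b',
    random_state=None
)
model.fit(X, [0.0] * len(X))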

@lesteve
Member

lesteve commented Feb 9, 2017

I can reproduce with a conda environment using OpenBLAS (although I got a complaint about the 140-th leading minor and not the 114-th one). I cannot reproduce the error with a conda environment using MKL.

For the record, readability counts, a lot! Please use triple backquotes, a.k.a. fenced code blocks, to format error messages and code snippets.

Also, it does not seem to matter in this particular case, but setting random_state to an integer (rather than None) will make the behaviour deterministic.

@lesteve
Member

lesteve commented Feb 15, 2017

I think that the underlying problem is that cholesky is supposed to work for positive-definite matrices. In this case np.dot(X, X.T) is positive semi-definite: X has shape (150, 3), so np.dot(X, X.T) has rank 3 at most. I am not an expert on this, but I would think there are ways to tackle it that may already exist in the Scientific Python ecosystem.
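
To make the rank deficiency concrete, a sketch with synthetic data of the same shape as the linked CSV:

import numpy as np

X = np.random.RandomState(0).rand(150, 3)
K = np.dot(X, X.T)
print(np.linalg.matrix_rank(K))  # 3, far below the matrix size of 150
np.linalg.cholesky(K)            # raises LinAlgError: Matrix is not positive definite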

@chrishmorris
Author

chrishmorris commented Feb 15, 2017

Yes, you correctly say that the Cholesky decomposition assumes a positive definite matrix. Any valid kernel for GPR must be non-negative definite. From the point of view of abstract mathematics, DotProduct meets this criterion, so scikit-learn correctly offers it in the kernels package.

The defect here is that when I put these together as above, rounding errors create some negative eigenvalues. So a mathematically correct regression failed due to numerical issues.

The defect is not in the Cholesky decomposition; it is the rounding errors in DotProduct. It is not sufficient for this kernel to compute the individual dot products to the available precision. To be usable for GPR, it must also ensure that the resulting matrix as a whole is non-negative definite.

I appreciate that this might be difficult. There is a possible workaround, which is to supply an identity kernel: one that has a covariance of 0 between different entries and 1 on the diagonal. Then people who hit this issue could add in a small multiple of the identity kernel as follows:

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct() + 0.001 * kernels.Identity(),  # kernels.Identity() is proposed, not yet implemented
    optimizer='fmin_l_bfgs_b',
    random_state=None
)

@lesteve
Member

lesteve commented Feb 15, 2017

The defect here is that when I put these together as above, rounding errors create some negative eigenvalues. So a mathematically correct regression failed due to numerical issues.

np.dot(X, X.T) has some zero eigenvalues. Even if there were no floating point errors and the smallest eigenvalues were exactly zero instead of negative, cholesky would still give an error, because np.dot(X, X.T) is not positive definite. Try this for example:

import numpy as np

np.linalg.cholesky([[1., 0.], [0., 0.]])  # LinAlgError: Matrix is not positive definite

There is a possible workaround, which is to supply an identity kernel, one that has a covariance of 0 between different entries and 1 on the diagonal. Then people who hit this issue could add in a small multiple of the identity kernel as follows:

This seems like a reasonable work-around although I am not an expert in Gaussian processes at all.

@chrishmorris
Author

@lesteve has identified another defect. Kernels must be positive semidefinite, i.e. have no negative eigenvalues, but they can have zero eigenvalues. There are implementations of Cholesky-style factorizations that can handle this (see the sketch after this comment), and that is what is required for GPR.

Just to be clear, the Identity kernel I suggest is not implemented yet. This is a request to implement it. It would also supply a workaround for the issue @lesteve has identified. It is analogous to adding one to a series before taking reciprocals, to avoid divide-by-zero errors.
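
One example of such a factorization, as a sketch: scipy.linalg.ldl (added in SciPy 1.0, newer than the version in this report) computes an LDL^T factorization that tolerates the zero eigenvalues plain Cholesky rejects.

import numpy as np
from scipy.linalg import ldl  # requires SciPy >= 1.0

K = np.array([[1., 0.], [0., 0.]])          # positive semi-definite but singular
# np.linalg.cholesky(K) would raise LinAlgError here
lu, d, perm = ldl(K, lower=True)            # LDL^T factorization succeeds
print(np.allclose(lu.dot(d).dot(lu.T), K))  # True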

@lesteve
Member

lesteve commented Feb 15, 2017

Just to be clear, the Identity kernel I suggest is not implemented yet.

It seems like kernels.DotProduct() + kernels.WhiteKernel(noise_level=0.001) does what you want.
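
For concreteness, a minimal sketch of that workaround; WhiteKernel adds noise_level to the diagonal of the training kernel matrix, which is exactly the role of the proposed identity kernel:

from sklearn import gaussian_process
from sklearn.gaussian_process import kernels

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct() + kernels.WhiteKernel(noise_level=0.001),
    optimizer='fmin_l_bfgs_b',
    random_state=None
)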

@lesteve
Member

lesteve commented Feb 15, 2017

ping @jmetzen who reimplemented the gaussian_process subpackage for 0.18. He may have an informed opinion about this.

@jmetzen
Member

jmetzen commented Feb 15, 2017

Thanks for reporting this. @lesteve is right that adding a WhiteKernel with a certain noise level would add a diagonal matrix to the kernel. However, there is also the parameter alpha of GaussianProcessRegressor, which does exactly this and is even easier to use (please see here). It defaults to alpha=1e-10, which should in theory ensure that the kernel is positive definite. If you encounter numerical problems, increasing alpha by a few orders of magnitude should be sufficient. However, the current behavior of throwing a LinAlgError is clearly not very user-friendly. At the very least, a useful error suggesting an increase of alpha should be thrown. If someone volunteers, it should be relatively easy to add (just catch the LinAlgError and throw an appropriate exception with a useful error string).
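
A sketch of the alpha-based fix applied to the reproduction above (1e-5 is an illustrative value, not a recommendation):

from sklearn import gaussian_process
import sklearn.gaussian_process.kernels as kernels

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct(),
    alpha=1e-5,      # added to the diagonal of the kernel matrix during fitting
    optimizer='fmin_l_bfgs_b',
    random_state=0   # an integer makes the behaviour deterministic, as noted above
)
model.fit(X, [0.0] * len(X))  # X as loaded in the reproduction snippet above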

@lesteve
Member

lesteve commented Feb 16, 2017

At the very least, a useful error with the suggestion of increasing alpha should be thrown. If someone volunteers, it should be relatively easy to add (just catch LinAlgError and throw an appropriate Exception with a useful error string).

This would make sense indeed.
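
A sketch of what the suggested error handling could look like, assuming a kernel matrix K and jitter alpha as already computed during fit; safe_cholesky is a hypothetical helper for illustration, not scikit-learn code:

import numpy as np
from numpy.linalg import LinAlgError

def safe_cholesky(K, alpha):
    # Hypothetical helper illustrating the proposed catch-and-rethrow pattern.
    try:
        return np.linalg.cholesky(K + alpha * np.eye(K.shape[0]))
    except LinAlgError:
        raise ValueError(
            "The kernel matrix is not positive definite. Try increasing "
            "the 'alpha' parameter of GaussianProcessRegressor."
        )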

@lesteve
Member

lesteve commented Feb 16, 2017

@chrishmorris does using alpha in GaussianProcessRegressor work for your use case? If so, can you please close this issue? I'll create another one about the confusing error message.

@chrishmorris
Author

Yes, thank you!

@lesteve
Member

lesteve commented Feb 17, 2017

OK great, thanks for following up. I opened #8384 to improve the error message.

@chrishmorris
Author

I also suggest that in http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html, where it says:

alpha : float or array-like, optional (default: 1e-10)

    Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to increased noise level in the observations and reduce potential numerical issue during fitting. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing to specify the noise level directly as a parameter is mainly for convenience and for consistency with Ridge.

this be modified to read:

alpha : float or array-like, optional (default: 1e-10)

    Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to increased noise level in the observations. This can also prevent a potential numerical issue during fitting, by ensuring that the calculated values form a positive definite matrix. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. 
Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing to specify the noise level directly as a parameter is mainly for convenience and for consistency with Ridge.

@lesteve
Member

lesteve commented Feb 17, 2017

PR welcome!

Your change amounts to adding this (it is quite hard to figure out visually; next time it is probably better to post a diff or directly open a PR), right?

by ensuring that the calculated values form a positive definite matrix

@jmetzen what do you think?

@jmetzen
Member

jmetzen commented Feb 17, 2017

Yes, adding this to the docstring would make sense. It can be added in #8384.
