
DotProduct Kernel not non-negative definite after rounding errors #8252

Closed
chrishmorris opened this issue Jan 31, 2017 · 17 comments

Comments

@chrishmorris

Description

GPR will fail in the Cholesky decomposition if the kernel matrix has negative eigenvalues. The Cholesky decomposition itself is numerically stable. Unfortunately, rounding errors in the kernel computation can cause this error, and this occurs with the DotProduct kernel.

Steps/Code to Reproduce

from sklearn import gaussian_process
import sklearn.gaussian_process.kernels as kernels

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct(),
    optimizer='fmin_l_bfgs_b',
    random_state=None)
model.fit(X, Y)  # X, Y: the training data (see the dataset linked below)

Expected Results

Mathematically speaking, the dot product is a kernel, i.e. symmetric and non-negative definite: for any vector v, v.T K v = ||X.T v||^2 >= 0. So this should succeed.

Actual Results

A LinAlgError may be thrown in the Cholesky decomposition. The error is not in this routine. Rather, the value of kernels.DotProduct()(X) can have small negative eigenvalues due to rounding errors.

An example of values for which this fails is at http://pims.structuralbiology.eu/X.csv
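
A minimal sketch of the numerical effect, using synthetic data of the same shape instead of the linked CSV: the Gram matrix of a tall, thin X is rank-deficient, and its smallest eigenvalues, which should be exactly zero, often come out slightly negative in floating point.

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(150, 3)                 # tall, thin design matrix
K = np.dot(X, X.T)                   # DotProduct kernel matrix, rank <= 3
print(np.linalg.eigvalsh(K).min())   # often a tiny negative value, around -1e-15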

Versions

Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-centos-7.3.1611-Core
('Python', '2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.11.2')
('SciPy', '0.18.1')
('Scikit-Learn', '0.18')

@jnothman
Member

Thanks for reporting the issue and providing a dataset. A full code example that breaks would really help someone trying to fix it.

@chrishmorris
Author

chrishmorris commented Feb 9, 2017

Here is code to reproduce. The result is:

LinAlgError: 114-th leading minor not positive definite

import csv
import urllib2  # Python 2; see below for a Python 3 equivalent

from sklearn import gaussian_process
import sklearn.gaussian_process.kernels as kernels

f = urllib2.urlopen('http://pims.structuralbiology.eu/X.csv')
X = [[float(i) for i in row] for row in csv.reader(f, delimiter=',')]
f.close()
model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct(),
    optimizer='fmin_l_bfgs_b',
    random_state=None
)
model.fit(X, [0.0] * len(X))
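
For anyone on Python 3 (the original report ran on Python 2.7), an equivalent sketch using urllib.request:

import csv
import io
import urllib.request

from sklearn import gaussian_process
import sklearn.gaussian_process.kernels as kernels

with urllib.request.urlopen('http://pims.structuralbiology.eu/X.csv') as f:
    text = io.TextIOWrapper(f, encoding='utf-8')
    X = [[float(i) for i in row] for row in csv.reader(text, delimiter=',')]

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct(),
    optimizer='fmin_l_bfgs_b',
    random_state=None
)
model.fit(X, [0.0] * len(X))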

@lesteve
Member

lesteve commented Feb 9, 2017

I can reproduce with a conda environment using OpenBLAS (although I got a complaint about the 140-th leading minor and not the 114-th one). I cannot reproduce the error with a conda environment using MKL.

For the record, readability counts, a lot! Please use triple backquotes, a.k.a. fenced code blocks, to format error messages and code snippets.

Also, it does not seem to matter in this particular case, but setting random_state to an integer (rather than None) will make the behaviour deterministic.

@lesteve
Member

lesteve commented Feb 15, 2017

I think that the underlying problem is that cholesky is supposed to work for positive-definite matrices. In this case np.dot(X, X.T) is positive semi-definite: X has shape (150, 3), so np.dot(X, X.T) has rank 3 at most. I am not an expert on this, but I would think there are ways to tackle it that may already exist in the Scientific Python ecosystem.
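
To make the rank deficiency concrete, a sketch with synthetic data of the same shape as the linked CSV:

import numpy as np

X = np.random.RandomState(0).rand(150, 3)
K = np.dot(X, X.T)
print(np.linalg.matrix_rank(K))  # 3, far below the matrix size of 150
np.linalg.cholesky(K)            # raises LinAlgError: Matrix is not positive definite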

@chrishmorris
Author

chrishmorris commented Feb 15, 2017

Yes, you correctly say that the Cholesky decomposition assumes a positive definite matrix. Any valid kernel for GPR must be non-negative definite. From the point of view of abstract mathematics, DotProduct meets this criterion, so scikit-learn correctly offers it in the kernels package.

The defect here is that when I put these together as above, rounding errors create some negative eigenvalues. So a mathematically correct regression failed due to numerical issues.

The defect is not in the Cholesky decomposition; it is the rounding errors in DotProduct. It is not sufficient for this kernel to compute the individual dot products to the available precision. To be usable for GPR, it must also ensure that the resulting matrix as a whole is non-negative definite.

I appreciate that this might be difficult. There is a possible workaround, which is to supply an identity kernel: one that has a covariance of 0 between different entries and 1 on the diagonal. Then people who hit this issue could add in a small multiple of the identity kernel as follows:

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct() + 0.001 * kernels.Identity(),  # kernels.Identity() is proposed, not yet implemented
    optimizer='fmin_l_bfgs_b',
    random_state=None
)

@lesteve
Member

lesteve commented Feb 15, 2017

The defect here is that when I put these together as above, rounding errors create some negative eigenvalues. So a mathematically correct regression failed due to numerical issues.

np.dot(X, X.T) has some zero eigenvalues. Even if there were no floating point errors and the smallest eigenvalues were exactly zero instead of negative, cholesky would still give an error, because np.dot(X, X.T) is not positive definite. Try this for example:

import numpy as np

np.linalg.cholesky([[1., 0.], [0., 0.]])  # LinAlgError: Matrix is not positive definite

There is a possible workaround, which is to supply an identity kernel, one that has a covariance of 0 between different entries and 1 on the diagonal. Then people who hit this issue could add in a small multiple of the identity kernel as follows:

This seems like a reasonable work-around although I am not an expert in Gaussian processes at all.

@chrishmorris
Author

@lesteve has identified another defect. Kernels must be positive semidefinite, i.e. have no negative eigenvalues, but they can have zero eigenvalues. There are implementations of Cholesky-style factorizations that can handle this (see the sketch after this comment), and that is what is required for GPR.

Just to be clear, the Identity kernel I suggest is not implemented yet. This is a request to implement it. It would also supply a workaround for the issue @lesteve has identified. It is analogous to adding one to a series before taking reciprocals, to avoid divide-by-zero errors.
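
One example of such a factorization, as a sketch: scipy.linalg.ldl (added in SciPy 1.0, newer than the version in this report) computes an LDL^T factorization that tolerates the zero eigenvalues plain Cholesky rejects.

import numpy as np
from scipy.linalg import ldl  # requires SciPy >= 1.0

K = np.array([[1., 0.], [0., 0.]])          # positive semi-definite but singular
# np.linalg.cholesky(K) would raise LinAlgError here
lu, d, perm = ldl(K, lower=True)            # LDL^T factorization succeeds
print(np.allclose(lu.dot(d).dot(lu.T), K))  # True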

@lesteve
Member

lesteve commented Feb 15, 2017

Just to be clear, the Identity kernel I suggest is not implemented yet.

It seems like kernels.DotProduct() + kernels.WhiteKernel(noise_level=0.001) does what you want.
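
For concreteness, a minimal sketch of that workaround; WhiteKernel adds noise_level to the diagonal of the training kernel matrix, which is exactly the role of the proposed identity kernel:

from sklearn import gaussian_process
from sklearn.gaussian_process import kernels

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct() + kernels.WhiteKernel(noise_level=0.001),
    optimizer='fmin_l_bfgs_b',
    random_state=None
)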

@lesteve
Member

lesteve commented Feb 15, 2017

ping @jmetzen who reimplemented the gaussian_process subpackage for 0.18. He may have an informed opinion about this.

@jmetzen
Member

jmetzen commented Feb 15, 2017

Thanks for reporting this. @lesteve is right that adding a WhiteKernel with a certain noise level would add a diagonal matrix to the kernel. However, there is also the parameter alpha of GaussianProcessRegressor, which does exactly this and is even easier to use (please see here). It defaults to alpha=1e-10, which should in theory ensure that the kernel is positive definite. If you encounter numerical problems, increasing alpha by a few orders of magnitude should be sufficient. However, the current behavior of throwing a LinAlgError is clearly not very user-friendly. At the very least, a useful error suggesting an increase of alpha should be thrown. If someone volunteers, it should be relatively easy to add (just catch the LinAlgError and throw an appropriate exception with a useful error string).
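
A sketch of the alpha-based fix applied to the reproduction above (1e-5 is an illustrative value, not a recommendation):

from sklearn import gaussian_process
import sklearn.gaussian_process.kernels as kernels

model = gaussian_process.GaussianProcessRegressor(
    kernel=kernels.DotProduct(),
    alpha=1e-5,      # added to the diagonal of the kernel matrix during fitting
    optimizer='fmin_l_bfgs_b',
    random_state=0   # an integer makes the behaviour deterministic, as noted above
)
model.fit(X, [0.0] * len(X))  # X as loaded in the reproduction snippet above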

@lesteve
Member

lesteve commented Feb 16, 2017

At the very least, a useful error with the suggestion of increasing alpha should be thrown. If someone volunteers, it should be relatively easy to add (just catch LinAlgError and throw an appropriate Exception with a useful error string).

This would make sense indeed.
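
A sketch of what the suggested error handling could look like, assuming a kernel matrix K and jitter alpha as already computed during fit; safe_cholesky is a hypothetical helper for illustration, not scikit-learn code:

import numpy as np
from numpy.linalg import LinAlgError

def safe_cholesky(K, alpha):
    # Hypothetical helper illustrating the proposed catch-and-rethrow pattern.
    try:
        return np.linalg.cholesky(K + alpha * np.eye(K.shape[0]))
    except LinAlgError:
        raise ValueError(
            "The kernel matrix is not positive definite. Try increasing "
            "the 'alpha' parameter of GaussianProcessRegressor."
        )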

@lesteve
Member

lesteve commented Feb 16, 2017

@chrishmorris does using alpha in GaussianProcessRegressor work for your use case? If so, can you please close this issue? I'll create another one about the confusing error message.

@chrishmorris
Author

Yes, thank you!

@lesteve
Member

lesteve commented Feb 17, 2017

OK great, thanks for following up. I opened #8384 to improve the error message.

@chrishmorris
Author

I also suggest that in http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html, where it says:

alpha : float or array-like, optional (default: 1e-10)

    Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to increased noise level in the observations and reduce potential numerical issue during fitting. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing to specify the noise level directly as a parameter is mainly for convenience and for consistency with Ridge.

this be modified to read:

alpha : float or array-like, optional (default: 1e-10)

    Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to increased noise level in the observations. This can also prevent a potential numerical issue during fitting, by ensuring that the calculated values form a positive definite matrix. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. 
Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing to specify the noise level directly as a parameter is mainly for convenience and for consistency with Ridge.

@lesteve
Member

lesteve commented Feb 17, 2017

PR welcome!

Your change amounts to adding this (it is quite hard to figure out visually; next time it is probably better to post a diff or directly open a PR), right?

by ensuring that the calculated values form a positive definite matrix

@jmetzen what do you think?

@jmetzen
Member

jmetzen commented Feb 17, 2017

Yes, adding this to the docstring would make sense. It can be added in #8384.
