fmin_cobyla used in gaussian_process.py does not like np.inf #291

Closed
goretkin opened this Issue Aug 2, 2011 · 10 comments

goretkin commented Aug 2, 2011

The reduced_likelihood_function() in gaussian_process/gaussian_process.py returns np.inf when the Cholesky decomposition of R raises an np.linalg.LinAlgError. When the objective function of scipy.optimize.fmin_cobyla returns np.inf, the optimizer passes np.nan to the objective function on the next call (I raised this issue on scipy-user). That np.nan causes an exception when the Cholesky decomposition is attempted again, because this time R is full of np.nan.

There seem to be three routes to fixing this issue:

  1. Return something other than np.inf when the Cholesky decomposition raises a LinAlgError.
  2. Have reduced_likelihood_function() handle a NaN argument gracefully, rather than raising an exception, so that fmin_cobyla can proceed.
  3. Use an optimizer other than fmin_cobyla -- one that handles np.inf values better. (A rough sketch of a guard in the spirit of 2 and 3 follows this list.)
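
A rough sketch of such a guard, wrapping whatever objective is handed to fmin_cobyla; objective, x0 and cons below are placeholders, not scikit-learn internals:

import numpy as np
from scipy.optimize import fmin_cobyla

def guarded(objective, bad_value=1e25):
    # Return a version of `objective` that never hands inf back to the
    # optimizer and never evaluates the real objective on NaN parameters.
    def wrapped(theta):
        theta = np.asarray(theta, dtype=float)
        if not np.all(np.isfinite(theta)):
            return bad_value   # optimizer fed us NaN: report a bad but finite score
        value = objective(theta)
        if not np.isfinite(value):
            return bad_value   # failed Cholesky upstream: clamp inf to a finite penalty
        return value
    return wrapped

# usage: fmin_cobyla(guarded(objective), x0, cons)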
Owner

GaelVaroquaux commented Aug 2, 2011

On Tue, Aug 02, 2011 at 02:43:10PM -0700, goretkin wrote:

The reduced_likelihood_function() in gaussian_process/gaussian_process.py returns np.inf when the Cholesky decomposition of R raises an np.linalg.LinAlgError. When the objective function of scipy.optimize.fmin_cobyla returns np.inf, the optimizer passes np.nan to the objective function on the next call (I raised this issue on scipy-user). That np.nan causes an exception when the Cholesky decomposition is attempted again, because this time R is full of np.nan.

I don't know this code, but having a quick look at it, it seems that it
is trying to tell the optimizer: "get the hell out of there". We could
pass a large negative constant, that would then have to be an option of
the function/estimator.

Does that seem reasonable? Could you try that on your problem and send
us a pull request if it works?
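
A minimal, self-contained sketch of that idea; cholesky_or_floor and LIKELIHOOD_FLOOR are hypothetical names for illustration, not scikit-learn code:

import numpy as np
from scipy import linalg

LIKELIHOOD_FLOOR = -1e25   # would become an option of the estimator

def cholesky_or_floor(R):
    # On a failed Cholesky, report a very bad but *finite* likelihood value
    # instead of inf, so fmin_cobyla never sees a non-finite objective.
    try:
        return linalg.cholesky(R, lower=True), None
    except linalg.LinAlgError:
        return None, LIKELIHOOD_FLOOR

# Example with a singular "correlation" matrix:
C, fallback = cholesky_or_floor(np.ones((3, 3)))
print(fallback)   # -1e+25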

goretkin commented Aug 2, 2011

I think that the main problem is that the default value of "nugget", which is added to the diagonal elements of R (l2-penalization style), is too small for my dataset. The default value for nugget is 10 * MACHINE_EPSILON. Making it something like 1000 * MACHINE_EPSILON works well for me. I know I'm sounding like one of those guys who doesn't understand numerical stability and just adds (1e-6)*identity to a matrix so the algorithms don't barf... investigation to come.
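
Concretely, the workaround is just passing a larger nugget when constructing the estimator (nugget is a constructor parameter of sklearn.gaussian_process.GaussianProcess; the theta values below are placeholders, and the right multiple of machine epsilon is problem dependent):

import numpy as np
from sklearn.gaussian_process import GaussianProcess

# Default nugget is 10 * eps; a larger value adds more jitter to the
# diagonal of R and keeps the Cholesky factorization from failing.
nugget = 1000 * np.finfo(np.double).eps
gp = GaussianProcess(theta0=1e-1, thetaL=1e-2, thetaU=1e0, nugget=nugget)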

Owner

GaelVaroquaux commented Aug 3, 2011

I don't know the Gaussian Process code, but if you have very few samples
compared to your dimensionality, you will have to regularize more to
compute your covariance in general. We even have code to choose the
amount of regularization:
http://scikit-learn.sourceforge.net/stable/modules/covariance.html#shrunk-covariance
(Ledoit-Wolf and OAS).

Now how this could plug into the GP code, I do not know.
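
For reference, a minimal illustration of what those shrinkage estimators compute -- this is not a proposal for how to wire them into the GP code:

import numpy as np
from sklearn.covariance import LedoitWolf, OAS

# Few samples relative to the dimensionality, where shrinkage matters most.
X = np.random.RandomState(0).randn(30, 20)
lw = LedoitWolf().fit(X)
oas = OAS().fit(X)
print(lw.shrinkage_, oas.shrinkage_)   # data-driven amount of regularization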

goretkin commented Aug 3, 2011

My input dimensionality is 1, and I have hundreds of samples. When I have more samples (and they are packed closer together), I need to apply more regularization. However many samples I choose, they are regularly spaced, so R is constant along each diagonal (i.e. Toeplitz). The more samples there are, the less the entries change from one diagonal to the next. I can imagine that in the limit the matrix approaches a matrix of all 1s, which may be the reason there are problems when I have many samples.
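
A quick numerical illustration of that argument, assuming a squared-exponential correlation and a plausible (not necessarily my) value of theta -- the condition number of R blows up as the points get denser:

import numpy as np

theta = 2.5e-2
for n in (50, 200, 500):
    x = np.linspace(0, 7, n)
    # Squared-exponential correlation matrix with no nugget on the diagonal.
    R = np.exp(-theta * (x[:, None] - x[None, :]) ** 2)
    print(n, np.linalg.cond(R))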

Owner

amueller commented Jun 5, 2012

@goretkin could you give a minimal example? That would be quite helpful. I'd then "fix" it by giving a more helpful error message.

goretkin commented Jun 6, 2012

This raises it for me.

import numpy as np
import sklearn.gaussian_process as gpm

# 500 regularly spaced 1-D inputs and a smooth target.
xs = np.linspace(0, 7, 500).reshape(-1, 1)
ys = np.sin(xs) * np.exp(-xs / 2)

gp = gpm.GaussianProcess(verbose=True, theta0=2.5e-2, thetaL=2e-2, thetaU=3e-2)
gp.fit(X=xs, y=ys)

The trick is choosing a theta0 so that the objective function doesn't return 0 at first (otherwise it raises Exception("Bad point. Try increasing theta0.")), but then, when the optimizer moves theta around, the objective starts returning infinity.

In [51]: gp.reduced_likelihood_function(theta=[2.5e-2])[0]
Out[51]: -4.0893616701893984e-07
Owner

amueller commented Jun 7, 2012

I added a new error message in 2cc0af8.
Do you think that is helpful? I guess someone who knows more about what is going on should have a look.
If you think you can help us improve the stability, any contribution would be very welcome.

Owner

amueller commented Oct 17, 2012

Does anyone have any ideas about what we should do with this issue?

Owner

GaelVaroquaux commented Oct 17, 2012

Does anyone have any ideas about what we should do with this issue?

Keep it open, as it is a real problem, but we lack manpower to fix it :(

@amueller amueller modified the milestone: 0.15.1, 1.0 Jul 18, 2014

Owner

amueller commented Feb 25, 2015

Untagging the milestone. This will be solved with the new GP implementation.

amueller closed this Feb 25, 2015

amueller removed this from the 0.16 milestone Feb 25, 2015
