
Manually adding points and changing the utility function #337

Closed
sguysc opened this issue Jul 22, 2022 · 17 comments


@sguysc

sguysc commented Jul 22, 2022

I'm using the suggest-register paradigm, and I'm doing two things that I think break the optimization; I'd like to know how to do them correctly.

At some point, I want to suggest a parameter myself, externally. So what I do is:

my_pnt_from_an_oracle = 0
next_point = {'x': my_pnt_from_an_oracle}
optimizer.register(params=next_point, target=f(my_pnt_from_an_oracle))
_ = optimizer.suggest(utility)  # I also call this, although I'm not sure it is necessary

After that, I also want to change the utility to exploitation rather than exploration, because I happen to know that the point I'm manually adding is supposed to be at least a local maximum, and also that it is definitely higher than all the points suggested so far. So I change:

utility = UtilityFunction(kind="ei", kappa=0, xi=1e-4)  # xi was 10 originally

But now, whenever I run the suggest-register combo, I don't get any suggestions near my_pnt_from_an_oracle that try to find a better maximum there. I suspect the point may not be registered correctly, although it does appear in optimizer.res.

Any suggestions on how to add a point and change the utility function in the middle of a run?

@bwheelz36
Collaborator

Hey - at first glance, you have the target and the suggested point as the same thing, e.g. point 0 has target 0. Depending on your objective function, that may not be a very good target value, in which case it's not surprising the optimiser doesn't look around that point any more. You still have to evaluate the real target value at your suggestion, and it still has to be good compared to previously probed points for the optimizer to keep exploring there.

Overall, there should be only one line of difference between probing your own suggestion and using the optimizer's internal suggestion. The snippet below won't run by itself, but I hope it illustrates the idea.

if (self.Nsuggestions is not None) and (self.SuggestionsProbed < self.Nsuggestions):
    # evaluate any suggested solutions first
    next_point_to_probe = self.Suggestions[self.SuggestionsProbed]
    self.SuggestionsProbed += 1
else:
    next_point_to_probe = self.optimizer.suggest(utility)
target = self.BlackBoxFunction(next_point_to_probe)
self.optimizer.register(params=next_point_to_probe, target=target)
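
For completeness, here is a self-contained sketch of the same pattern. The objective black_box, the bounds, and the single manual suggestion are placeholders I made up, not part of the original question:

from bayes_opt import BayesianOptimization, UtilityFunction

def black_box(x):
    return -(x - 2) ** 2  # placeholder objective; true maximum at x = 2

optimizer = BayesianOptimization(f=None, pbounds={'x': (-5, 5)}, verbose=0, random_state=1)
utility = UtilityFunction(kind="ucb", kappa=2, xi=0.0)
manual_suggestions = [{'x': 2.0}]  # points we want probed before the optimizer's own

for i in range(20):
    if manual_suggestions:
        next_point = manual_suggestions.pop(0)  # probe our own suggestion first
    else:
        next_point = optimizer.suggest(utility)  # otherwise use the internal suggestion
    target = black_box(next_point['x'])  # always evaluate the real objective here
    try:
        optimizer.register(params=next_point, target=target)
    except KeyError:
        pass  # the optimizer may re-suggest an already-probed point; skip duplicates

print(optimizer.max)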

@sguysc
Author

sguysc commented Jul 26, 2022

I didn't follow the explanation of what was wrong with my example, so let me clarify: I sample N points using suggest-register normally. If, after those N points, I don't have an objective value larger than some value V, then I manually add a point, say 0, which I know for a fact has a larger objective value than all of the N points sampled so far. It might not be the true maximum over the whole range, but it is greater than the other sampled points.

So I would want the optimizer to investigate the vicinity of the added point, at least, depending on the exploration-exploitation knobs. I think I essentially have the same code as you; the only difference is that I also change the utility function at the same time. Could that be it? I'll dig into the code more and see if I spot something.

P.S.: I just thought - maybe you missed that I call the target function with f(my_pnt_from_an_oracle)? f is my BlackBoxFunction.

@bwheelz36
Collaborator

At the optimizer.register step, you need to register the location of the point, and the value of the objective function at that point.

In the example you gave, you are using the same value (0) for both.

@sguysc
Author

sguysc commented Jul 26, 2022

This is what I'm doing; it's not registering the same value for both (f is the objective):

optimizer.register(params={'x': 0}, target=f(0))

But my f only accepts scalars and not dictionaries.
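
(If it helps, a scalar-only f can also be wrapped to take keyword params; a one-line sketch with a hypothetical wrapper name:

f_wrapped = lambda **params: f(params['x'])  # adapts scalar f to dict-style params

but calling f directly, as above, works just as well.)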

@bwheelz36
Collaborator

bwheelz36 commented Jul 26, 2022

OK, my apologies - I didn't see the f there.
In that case, I think what you are doing is correct. Is there any chance you could set up a simple example that replicates the issue?

@bwheelz36
Collaborator

OK. Again, apologies for not understanding your initial question; I was trying to answer on my phone.

I've just had a look into this and I agree the behavior is not as expected. Below is a standalone example I wrote to investigate this.
A few points:

  • I think the point is being correctly registered. You can check the predicted value at the suggestion via the optimizer._gp.predict call near the end of the example; following the registration of the suggested point, the prediction closely matches the real value.
  • It seems to me that when I set length scales in the gaussian process model, the behavior is more like what I expect: I get a clustering of points around the suggestion. I think this may be related to Choice of length_scale in GP kernel - wow! #287; the predictions of the gaussian process model can be highly dependent on the gaussian process settings. You can turn the setting of length scales on/off using the set_gp_params parameter.
    I think what is happening here is this: when no length scales are set, the gp model uses one length scale for all parameters. When they are set on an individual basis, each parameter is modeled independently. In both cases, the length scales are optimized behind the scenes - but setting them as an array results in much better predictions. When the predictions are off, it is not surprising that the suggest utility doesn't suggest what we think it should. What perhaps IS surprising is that the optimization still works quite well!
  • As I suggested in Choice of length_scale in GP kernel - wow! #287, maybe we should change the default behaviour to always use an array of length scales instead of a single value.

I would really appreciate it if you could play around with this example and check whether you agree with the conclusions I draw above...

from bayes_opt import BayesianOptimization
from bayes_opt import UtilityFunction
from scipy.optimize import rosen
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process.kernels import Matern

def run_optimization(use_suggestion=False, iterations=100, set_gp_params=True):
    """
    optimize the rosenbrock function; the correct answer is x=1, y=1.

    :param use_suggestion: False: ignore suggestion, True: use suggestion
    :param iterations: iterations to run
    :param set_gp_params: if true, will set length scales; if false, uses the defaults
    """
    optimizer = BayesianOptimization(
        f=None,
        pbounds={'x': (-1, 1), 'y': (-1, 1)},
        verbose=1,
        random_state=1
    )
    if set_gp_params:
        optimizer.set_gp_params(alpha=0,
                                kernel=Matern(length_scale=[.2, .2]))

    utility = UtilityFunction(kind="ucb", kappa=2, xi=1e-4)
    suggestion = {'x': 1, 'y': 1}  # this is the correct answer
    objective_values = []
    x_values = []
    y_values = []
    for i in range(iterations):
        if use_suggestion:
            target = -1 * rosen(list(suggestion.values()))
            optimizer.register(params=suggestion, target=target)
            objective_values.append(target)
            x_values.append(suggestion['x'])
            y_values.append(suggestion['y'])
            use_suggestion = False

        next_point = optimizer.suggest(utility)
        target = -1*rosen(list(next_point.values()))
        # -1 because we seek max not min
        try:
            optimizer.register(params=next_point, target=target)
            objective_values.append(target)
            x_values.append(next_point['x'])
            y_values.append(next_point['y'])
        except KeyError:
            '''
            the optimiser has a tendency to try and probe points it already has probed
            when it gets very confident about where the maximum is. I'm just skipping these
            for this example
            '''
            pass
    print(optimizer.max)
    suggested_point_array = np.array(list(suggestion.values())).reshape(1, -1)
    print(f'predicted value at true maximum is {optimizer._gp.predict(suggested_point_array)}')
    return objective_values, x_values, y_values

if __name__ == '__main__':
    print('running no-suggestion sim')
    no_sug_convergence, no_sug_x, no_sug_y = run_optimization(use_suggestion=False)
    print('running with-suggestion sim')
    sug_convergence, sug_x, sug_y = run_optimization(use_suggestion=True)

    fig, axs = plt.subplots(nrows=1, ncols=2, figsize=[8,5])
    axs[0].plot(no_sug_convergence)
    axs[0].plot(sug_convergence)
    axs[0].legend(['without suggestion', 'with suggestion'])

    axs[1].scatter(no_sug_x, no_sug_y)
    axs[1].scatter(sug_x, sug_y)
    axs[1].grid()
    axs[1].legend(['without suggestion', 'with suggestion'])
    plt.show()

@bwheelz36
Collaborator

PS: changing kappa to e.g. 0.1 makes the behavior a lot clearer. I'm not entirely confident that my theory about the GP parameters is correct...

@sguysc
Author

sguysc commented Jul 26, 2022

I will check it out. I also wonder whether the acquisition function type has anything to do with it; I use EI rather than UCB, and then xi is the relevant parameter.
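
(For reference, the standard forms - which I believe match this library's util.py: UCB(x) = mu(x) + kappa * sigma(x), while EI(x) = (mu(x) - y_max - xi) * Phi(z) + sigma(x) * phi(z), with z = (mu(x) - y_max - xi) / sigma(x) and y_max the best target so far. In both cases, a smaller kappa or xi means more exploitation.)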

@sguysc
Author

sguysc commented Jul 27, 2022

OK, I've dug into the code and I have a preliminary observation (I'm not sure this is the only situation where it happens, but it is what I see now): after I sample N points as described and add the extra point manually, asking for the next suggestion makes the acquisition step fit the GP to the current data (util.py). The problem is that it fails to fit a good function: the predicted mean and std are both constant. Because of that, it suggests a point essentially at random (the last value in x_seeds is the one that gets accepted).

Any suggestions on how to deal with a fit that comes out as a constant line (the data and targets are not constant)?
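
(A quick way to check for this - a sketch assuming the GP has just been fitted and the bounds are (0, 6.283) as in my problem:)

import numpy as np

# probe the fitted GP on a dense grid; a (near-)zero spread in the
# posterior mean means the fit has collapsed to a constant
x = np.linspace(0, 6.283, 1000).reshape(-1, 1)
mu, sigma = optimizer._gp.predict(x, return_std=True)
if np.ptp(mu) < 1e-9:
    print('GP posterior mean is flat - the fit has likely failed')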

@bwheelz36
Collaborator

That sounds strange - could you post some code to reproduce it?

@sguysc
Author

sguysc commented Jul 27, 2022

This is trying to emulate what I have:

import matplotlib.pyplot as plt
import numpy as np
from bayes_opt import BayesianOptimization, UtilityFunction

data_target = np.array([[ 4.11527487e+00, -5.25998354e-02],
			[ 9.87746489e-02, -4.13503230e-02],
			[ 3.57381547e+00, -5.25998354e-02],
			[ 1.60683672e+00, -5.25998354e-02],
			[ 5.26929603e+00, -6.18080378e-02],
			[ 9.08232311e-01, -5.25998354e-02],
			[ 2.39112060e-01, -3.52116406e-02],
			[ 4.29874823e+00, -4.23911691e-02],
			[ 9.88334539e-01, -5.25998354e-02],
			[ 3.10196661e+00, -5.25998354e-02],
			[ 7.87734943e-01, -5.25998354e-02],
			[ 8.27486483e-01, -5.16990900e-02],
			[ 3.80523113e+00, -5.25998354e-02],
			[ 7.03828442e-01, -2.22452462e-02],
			[ 5.83227361e+00, -3.09045196e-02],
			[ 4.65850859e+00, -2.24894464e-02],
			[ 5.35498380e+00, -4.55683947e-02],
			[ 5.15052521e+00, -2.59802818e-02],
			[ 3.43894560e+00, -5.25998354e-02],
			[ 3.40215825e+00, -5.25998354e-02],
			[ 0.00000000e+00, -4.35041189e-03],  # <- manually added point
			[ 6.28318531e+00, -4.35041189e-03]]) # <- manually added point

pbounds = {'x': (0, 6.283)}
optimizer = BayesianOptimization(f=None, pbounds=pbounds, verbose=0, random_state=None)
utility = UtilityFunction(kind="ei", kappa=0, xi=10) 

# this includes all the data I currently have, what was suggested from the optimizer, and the manually added points.
for i in range(data_target.shape[0]):
	next_point = {'x': data_target[i,0]}
	optimizer.register(params=next_point, target=data_target[i,1])

utility = UtilityFunction(kind="ei", kappa=0, xi=0.001) 
optimizer._gp.fit(optimizer._space.params, optimizer._space.target) #or: data_target[:,0:1], data_target[:,1:])

# to "understand" what's happening beneath in the suggestion function
x = np.linspace(0, 6.28, 1000).reshape(-1,1)
mu, sigma = optimizer._gp.predict(x, return_std=True)

plt.scatter(data_target[:,0], data_target[:,1], marker='*')
plt.scatter(x, mu, marker='.')
plt.show()
# or, just see what is the suggestion
newpnt = optimizer.suggest(utility)
print(newpnt['x'])
# the maximum should occur near either of the bounds.

Every time I run this snippet, I get different results. When the GP cannot predict correctly, the suggestion is random; when it does predict correctly, I get sound suggestions most of the time.

@bwheelz36
Collaborator

If you put optimizer._gp.fit(optimizer.space.params, optimizer.space.target) inside the for loop, it should work. I don't entirely know why, to be honest; I would have guessed that what you are doing is fine. The different results are likely because you are not setting random_state.

[attached figure: Figure_1]

@sguysc
Author

sguysc commented Jul 27, 2022

For me it doesn't. If I run this several times:

import matplotlib.pyplot as plt
import numpy as np
from bayes_opt import BayesianOptimization, UtilityFunction
from sklearn.preprocessing import StandardScaler

use_scalar = False

def closest(x, y, K):
    return y[min(range(len(x)), key = lambda i: abs(x[i]-K))]

scalerx = StandardScaler()
scalery = StandardScaler()

data_target = np.array([[ 4.11527487e+00, -5.25998354e-02],
					   [ 9.87746489e-02, -4.13503230e-02],
					   [ 3.57381547e+00, -5.25998354e-02],
					   [ 1.60683672e+00, -5.25998354e-02],
					   [ 5.26929603e+00, -6.18080378e-02],
					   [ 9.08232311e-01, -5.25998354e-02],
					   [ 2.39112060e-01, -3.52116406e-02],
					   [ 4.29874823e+00, -4.23911691e-02],
					   [ 9.88334539e-01, -5.25998354e-02],
					   [ 3.10196661e+00, -5.25998354e-02],
					   [ 7.87734943e-01, -5.25998354e-02],
					   [ 8.27486483e-01, -5.16990900e-02],
					   [ 3.80523113e+00, -5.25998354e-02],
					   [ 7.03828442e-01, -2.22452462e-02],
					   [ 5.83227361e+00, -3.09045196e-02],
					   [ 4.65850859e+00, -2.24894464e-02],
					   [ 5.35498380e+00, -4.55683947e-02],
					   [ 5.15052521e+00, -2.59802818e-02],
					   [ 3.43894560e+00, -5.25998354e-02],
					   [ 3.40215825e+00, -5.25998354e-02],
					   [ 0.00000000e+00, -4.35041189e-03],  # <- manually added point
					   [ 6.28318531e+00, -4.35041189e-03]]) # <- manually added point


pbounds = {'x': (0, 6.283)}
optimizer = BayesianOptimization(f=None, pbounds=pbounds, verbose=0, random_state=None)
utility = UtilityFunction(kind="ei", kappa=0, xi=10) 

# this includes all the data I currently have, what was suggested from the optimizer, and the manually added points.
for i in range(data_target.shape[0]):
    next_point = {'x': data_target[i,0]}
    optimizer.register(params=next_point, target=data_target[i,1])
    if use_scalar:
        # optimizer._gp.set_params(kernel__length_scale=15)
        scalerx.fit(optimizer._space.params)
        scalery.fit(optimizer._space.target.reshape(-1, 1))  # StandardScaler needs 2-D input
        X = scalerx.transform(optimizer._space.params)
        Y = scalery.transform(optimizer._space.target.reshape(-1, 1))
    else:
        X = optimizer._space.params
        Y = optimizer._space.target
        
    optimizer._gp.fit(X, Y) 
    
utility = UtilityFunction(kind="ei", kappa=0, xi=0.001) 
# optimizer._gp.fit(optimizer._space.params, optimizer._space.target) #or: data_target[:,0:1], data_target[:,1:])
optimizer._gp.fit(X, Y) 

# or, to "understand" what's happening beneath
x_pnts = np.linspace(0, 6.28, 1000).reshape(-1, 1)
if use_scalar:
    x = scalerx.transform(x_pnts)
else:
    x = x_pnts

mu, sigma = optimizer._gp.predict(x, return_std=True)
if use_scalar:
    mu = scalery.inverse_transform(mu.reshape(-1, 1))

plt.scatter(data_target[:, 0], data_target[:, 1], marker='*')
plt.scatter(x_pnts, mu, marker='.')

newpnt_scaled = optimizer.suggest(utility)
if use_scalar:
    newpnt = scalerx.inverse_transform(np.array(newpnt_scaled['x']).reshape(-1, 1))
else:
    newpnt = np.array(newpnt_scaled['x']).reshape(-1, 1)

print('suggested point:', newpnt)
plt.scatter(newpnt, closest(x_pnts, mu, newpnt), marker='s', color='r')

plt.show()

[attached figure: flat_prediction - the GP posterior mean comes out flat]

When I run this several times, the flat fit happens. On a single run, the results could be what you showed.

(I also tried playing around with scalers, to see if that would help.)

@sguysc
Author

sguysc commented Jul 27, 2022

I guess the problem is with the random_state of the GaussianProcessRegressor. If you're unlucky, it may do a "bad" fit, and then your predictions will be bad too. This is a problem whenever you call suggest, not just when you add a point manually.
But what is the remedy?

@bwheelz36
Collaborator

OK:

  1. I'm glad the issue wasn't actually whether the fit was inside the for loop or not, because that had me confused.
  2. If you set random_state=1 (or some other number) you should get consistent results - be they consistently good or consistently bad.
  3. The problem is that the gaussian process is struggling to fit your data. I fixed this by adding the following line after the optimizer is created (it requires from sklearn.gaussian_process.kernels import Matern):

optimizer.set_gp_params(kernel=Matern(length_scale=[0.1]))

With this I get consistently good fits for random_state = 0, 1, and 2. My understanding is that the length scale should be optimized for best fit anyway, so I'm slightly surprised that this made the difference - you could investigate why that is if you have any time! One way to investigate is sketched below.
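
A sketch, assuming X and Y hold the probed points and targets (e.g. optimizer._space.params and optimizer._space.target):

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# refit from several initial length scales; if the optimized log marginal
# likelihoods differ, the hyperparameter search is landing in local optima
for ls in [0.1, 1.0, 10.0]:
    gp = GaussianProcessRegressor(kernel=Matern(length_scale=ls, nu=2.5),
                                  n_restarts_optimizer=5, random_state=0)
    gp.fit(X, Y)
    print(ls, gp.log_marginal_likelihood_value_, gp.kernel_)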

Now that your GP is working, we can get back to the initial question of the clustering around suggestions!

@sguysc
Author

sguysc commented Jul 29, 2022

Yes, another thing that helped me is:

optimizer.set_gp_params(n_restarts_optimizer=25)

You're right about the length scale: it is weird that the initial length_scale matters, because if I look at optimizer._gp.kernel_ after the fit, it has a different length_scale.
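
(For anyone else checking this, both attributes exist on the sklearn regressor:)

print(optimizer._gp.kernel)   # the kernel passed in - the starting point for the fit
print(optimizer._gp.kernel_)  # the kernel after hyperparameter optimization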

I think I'm withdrawing my question, now that I understand the problem is that I don't get a good GP fit - or, if I do, it overfits the observations.

The reason is that the GP is going to have high uncertainty between the observations anyway, and the problem is essentially ill-posed, so the suggestions are random or almost random. The optimizer won't necessarily go to the maximum-value point I added; it goes wherever the acquisition function points.

Also, it is not just when adding points manually; it's going to happen every time I can't get a good fit. I'll have to think about what I can do about it, but I don't think there's any problem with your package.

p.s. - thank you for your help and time!!

@bwheelz36
Collaborator

If you are having issues with overfitting, you can also set the alpha parameter in the call to set_gp_params; higher values will regularize more heavily and prevent overfitting.
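
For instance (the value 1e-3 here is just an illustration, not a recommendation):

optimizer.set_gp_params(alpha=1e-3)  # larger alpha = stronger regularisation / noise tolerance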

It's good that you checked that the gaussian process model is actually fitting your data. I think we should maybe generate a plot or something like that to make it easier for people to realise when the fit has failed...
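
Something like this sketch, for 1-D problems (the bounds are assumed to be passed in by the caller):

import numpy as np
from matplotlib import pyplot as plt

def plot_gp_fit(optimizer, lower, upper):
    """Plot the fitted GP mean +/- std against the probed points (1-D only)."""
    x = np.linspace(lower, upper, 1000).reshape(-1, 1)
    mu, sigma = optimizer._gp.predict(x, return_std=True)
    plt.fill_between(x.ravel(), mu - sigma, mu + sigma, alpha=0.3, label='mean +/- std')
    plt.plot(x.ravel(), mu, label='GP mean')
    plt.scatter(optimizer._space.params, optimizer._space.target, marker='*', label='observations')
    plt.legend()
    plt.show()

e.g. plot_gp_fit(optimizer, 0, 6.283) for the example above.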
