Proper parallel optimisation #347
Hi. Just to make sure I understand: essentially the goal is to collect …

It appears to me that the issue arises when you register the results: …

Also, you seem to be using the kappa decay functionality to generate an …

Apart from this, have you searched these issues for 'parallel'? I don't have much (well, any) experience running this library in parallel, but it appears that there are some other examples out there...

---
That is not exactly what is happening. Rather, I have a pool of workers, and when one of them finishes, I register its result and generate a new suggestion for it to evaluate. The number of suggestions is the same, but the registration is delayed by `num_workers` iterations. The loop depends on the number of registered results, so it always checks the same number of suggestions.

As explained in other issues, generating multiple suggestions at the same time doesn't work, since the algorithm is deterministic. That is why a new observation is always registered before asking for a new suggestion.

---
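The pool-of-workers loop described above can be sketched with `concurrent.futures`. This is only an illustration of the scheduling pattern, not the library's API: `suggest` is a random-search stand-in for `optimizer.suggest()`, and the `registered` list stands in for `optimizer.register()`. A real expensive objective would more likely use a `ProcessPoolExecutor`.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def evaluate(x):
    # stand-in for the expensive, single-threaded objective function
    return -(x - 2.0) ** 2


def async_optimize(n_iter=16, n_workers=4, seed=1):
    """Keep n_workers evaluations in flight; whenever one finishes,
    register its result and immediately submit a fresh suggestion for the
    freed worker.  In the real loop, suggest() would be called only after
    registering, since the suggestion is deterministic given the data."""
    rng = random.Random(seed)
    suggest = lambda: rng.uniform(-5.0, 5.0)  # stand-in for optimizer.suggest()
    registered = []                           # stand-in for optimizer.register()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        submitted = min(n_workers, n_iter)
        in_flight = {pool.submit(evaluate, suggest()) for _ in range(submitted)}
        while len(registered) < n_iter:
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                registered.append(fut.result())
                if submitted < n_iter:
                    # a worker just freed up: suggest and submit a new point
                    in_flight.add(pool.submit(evaluate, suggest()))
                    submitted += 1
    return registered
```

The invariant is that at most `n_workers` evaluations are ever in flight, and a new suggestion is generated only after a result has been registered, matching the loop described above.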
This is correct if kappa doesn't change, but you can certainly obtain n suggestions with n different kappas.

---
Apologies, I misinterpreted your code. I created a simplified version of your example, and I get consistent results when `random_state` is set:

```python
from bayes_opt.bayesian_optimization import BayesianOptimization
from bayes_opt.util import UtilityFunction
from scipy.optimize import rosen
from sklearn.gaussian_process.kernels import Matern


def multi_worker_optimize(num_iter: int, bounds: dict[str, tuple[float, float]], num_workers=0):
    # setting random_state should ensure consistent results
    optimizer = BayesianOptimization(f=None, pbounds=bounds, verbose=0, random_state=1)
    utility = UtilityFunction(kind="ucb", kappa=5, xi=0.0)
    # this line should improve the GP fitting:
    optimizer.set_gp_params(kernel=Matern(length_scale=[0.5, 0.5]), n_restarts_optimizer=20)
    result_queue = []
    while len(optimizer.res) < num_iter:
        sample = optimizer.suggest(utility)
        loss = -1 * rosen(list(sample.values()))
        result_queue.append((sample, loss))
        if len(result_queue) >= num_workers:
            try:
                optimizer.register(*result_queue.pop(0))
                utility.update_params()
            except KeyError:
                pass
    return optimizer


bounds = {'x': (-5, 5), 'y': (-5, 5)}
n_iter = 100

# test with the multi-worker approach
for n_workers in [1, 2, 3, 4]:
    multi_opt = multi_worker_optimize(n_iter, bounds, 1)
    print(f'Pooled worker approach with {n_workers} workers: best value of {multi_opt.max["target"]}')
```

---
I tried that; it doesn't make a difference. Moreover, I am not talking about consistent results, I'm talking about the optimisation successfully finding values close to the global optimum. That should be the case no matter what the initial seed is. I'm also running it multiple times to remove the possibility of lucky guesses.
If you run the code you provided, then it is expected that you receive the same result, since you fixed the seed and hardcoded the number of workers to 1 in the call to `multi_worker_optimize`.

---
oops - good point - I was playing around with that and forgot to change it back. I've had another look at this, again using the rosen function with the n_workers parameter fixed.
The below plot shows the true rosen function, the rosen function predicted by the gaussian process model after being trained for 200 iterations, the GP standard deviation, and the absolute difference between true and predicted. I'll post the code for this below, but note that in the valley region both uncertainty (SD) and absolute difference are low. You can do similar things for your multi variable function but obviously visualization is not as straightforward. The two points shown are the true maximum and the point found by the optimizer.
🤷‍♂️ this is with kappa=10, so the algorithm will tend to prioritize exploring over homing in on the global optimum. If you put the kappa decay back in, I get 0.0096. But in general this algorithm is good at exploring and not getting stuck, though not necessarily as good at homing in on local optima.

```python
from scipy.optimize import rosen
from matplotlib import pyplot as plt
import numpy as np


def plot_rosen(optimizer, bounds):
    """
    Compare the optimizer's gaussian process predictor with the true function.
    :param optimizer: an instance of bayes_opt.bayesian_optimization.BayesianOptimization, post training
    :param bounds: the bounds dict passed to BayesianOptimization
    """
    extent = (bounds['x'][0], bounds['x'][1], bounds['y'][0], bounds['y'][1])
    x = np.linspace(bounds['x'][0], bounds['x'][1], 100)
    y = np.linspace(bounds['y'][0], bounds['y'][1], 100)
    X, Y = np.meshgrid(x, y)
    x_sample = np.array([X.flatten(), Y.flatten()])
    real_rosen = -1 * rosen(x_sample)
    real_rosen = np.reshape(real_rosen, X.shape)
    # now get the _gp estimate
    predicted_rosen, predicted_rosen_std = optimizer._gp.predict(x_sample.T, return_std=True)
    predicted_rosen = np.reshape(predicted_rosen, X.shape)
    predicted_rosen_std = np.reshape(predicted_rosen_std, X.shape)
    # the true optimum of the negated rosen function is at (1, 1)
    true_opt = (1, 1)
    found_opt = (optimizer.max['params']['x'], optimizer.max['params']['y'])

    fig, axs = plt.subplots(nrows=2, ncols=2, figsize=[12, 5])
    im1 = axs[0, 0].imshow(real_rosen, vmin=-100, vmax=0, extent=extent, origin='lower')
    axs[0, 0].scatter(*true_opt)
    axs[0, 0].scatter(*found_opt)
    axs[0, 0].set_title('real rosen')
    axs[0, 0].set_xlabel('x')
    axs[0, 0].set_ylabel('y')
    plt.colorbar(im1, ax=axs[0, 0])
    im2 = axs[0, 1].imshow(predicted_rosen, vmin=-100, vmax=0, extent=extent, origin='lower')
    axs[0, 1].scatter(*true_opt)
    axs[0, 1].scatter(*found_opt)
    axs[0, 1].set_title('GP predicted rosen')
    axs[0, 1].set_xlabel('x')
    axs[0, 1].set_ylabel('y')
    plt.colorbar(im2, ax=axs[0, 1])
    im3 = axs[1, 0].imshow(predicted_rosen_std, vmin=0, vmax=500, extent=extent, origin='lower')
    axs[1, 0].scatter(*true_opt)
    axs[1, 0].scatter(*found_opt)
    axs[1, 0].set_title('GP standard deviation')
    axs[1, 0].set_xlabel('x')
    axs[1, 0].set_ylabel('y')
    plt.colorbar(im3, ax=axs[1, 0])
    im4 = axs[1, 1].imshow(abs(real_rosen - predicted_rosen), vmin=0, vmax=100, extent=extent, origin='lower')
    axs[1, 1].scatter(*true_opt)
    axs[1, 1].scatter(*found_opt)
    axs[1, 1].set_title('abs difference')
    axs[1, 1].set_xlabel('x')
    axs[1, 1].set_ylabel('y')
    plt.colorbar(im4, ax=axs[1, 1])
    plt.tight_layout()
    plt.show()
```

---
OK, thank you. I've done some tests and here are the results for my example:
These are averaged over 10 runs with different seeds, to remove the effect of lucky guesses. In this toy example, I would say anything above 4.3 is satisfactory. 8 workers require 700 iterations instead of 500, which still equates to a ~470% speed-up in wall-clock time.

---
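To spell out the arithmetic behind that speed-up claim (the iteration counts are the ones quoted in the comment above; the assumption is that objective evaluations dominate the runtime and run concurrently):

```python
# wall-clock cost is roughly (iterations / concurrent workers)
serial_iters = 500    # iterations needed with 1 worker
parallel_iters = 700  # iterations needed with 8 workers
workers = 8

speedup = serial_iters / (parallel_iters / workers)
print(f"{speedup:.2f}x")  # about 5.7x, i.e. roughly a 470% speed-up
```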
One other thought, and I'm not sure if this will help or not: possibly a 'flatter' objective function (perhaps by taking the log, or even just capping the minimum value) may perform better. The very large gradients involved with these high-range objective functions might 'confuse' the algorithm. I'm not sure whether that's the case, but it's something you could check.

---
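As a sketch of that suggestion (the function names here are illustrative, not part of the library), the two flattening variants could look like:

```python
import numpy as np
from scipy.optimize import rosen


def log_rosen(x):
    # log1p compresses rosen's huge dynamic range while preserving the
    # location of the optimum (rosen >= 0, so log1p is well defined)
    return -np.log1p(rosen(x))


def capped_rosen(x, cap=100.0):
    # capping bounds the values (and hence gradients) the GP sees far
    # from the optimum
    return -min(rosen(x), cap)
```

Either transform keeps the maximiser at (1, 1) but presents the GP with a much smaller range of target values.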
I might have an idea of why performance degrades. I think if there are too many points being evaluated concurrently, the algorithm might propose a point very similar to one of those still in the queue. This would essentially reduce the effective number of iterations.

---
I've finally run the experiments and the results are as I expected.
To fix this, I suggest that when a sample is suggested and put into the queue, the GP standard deviation at that point is set to 0, so that the algorithm doesn't suggest new points around that value. @bwheelz36 Any idea how this is best accomplished?

---
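The effect being asked for can be sketched with a plain scikit-learn GP (this is not the library's API, just an illustration): registering a pending sample as a dummy observation collapses the predictive standard deviation at that location, so a UCB-style acquisition (mean + kappa * std) stops favouring that region.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# two real observations of some 1-D objective
X = np.array([[0.0], [2.0]])
y = np.array([0.0, -4.0])
gp = GaussianProcessRegressor().fit(X, y)

# a pending sample that a worker is still evaluating
pending = np.array([[1.0]])
_, std_before = gp.predict(pending, return_std=True)

# register it with a dummy target (here, the GP's own mean prediction)
dummy_y = gp.predict(pending)
gp_with_dummy = GaussianProcessRegressor().fit(
    np.vstack([X, pending]), np.append(y, dummy_y)
)
_, std_after = gp_with_dummy.predict(pending, return_std=True)

# std_after is numerically zero while std_before is not, so the
# acquisition no longer proposes points near the pending sample
print(std_before[0], std_after[0])
```

Using the GP's own mean as the dummy target (rather than an arbitrary constant) is the "kriging believer" flavour of this idea, which comes up again later in the thread.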
Here is the code I used for the experiment, just in case:

```python
from __future__ import annotations

import pickle
from typing import Callable

import matplotlib.pyplot as plt
import numpy as np
from bayes_opt.bayesian_optimization import BayesianOptimization
from bayes_opt.util import UtilityFunction
from scipy.optimize import rosen
from tqdm import tqdm


def _closest_distance(point, points):
    return min(np.linalg.norm(point - p) for p in points if p is not point)


def optimize(
    func: Callable[..., float], num_iter: int, bounds: dict[str, tuple[float, float]], num_workers=0
):
    init_samples = int(np.sqrt(num_iter))
    optimizer = BayesianOptimization(f=None, pbounds=bounds, verbose=0)
    init_kappa = 10
    kappa_decay = (0.1 / init_kappa) ** (1 / (num_iter - init_samples))
    utility = UtilityFunction(
        kind="ucb", kappa=init_kappa, xi=0.0, kappa_decay=kappa_decay, kappa_decay_delay=init_samples
    )
    init_queue = [optimizer.suggest(utility) for _ in range(init_samples)]
    result_queue = []
    tbar = tqdm(total=num_iter, leave=False)
    while len(optimizer.res) < num_iter:
        sample = init_queue.pop(0) if init_queue else optimizer.suggest(utility)
        loss = func(list(sample.values())) * -1
        result_queue.append((sample, loss))
        if len(result_queue) >= num_workers:
            try:
                optimizer.register(*result_queue.pop(0))
                utility.update_params()
                tbar.update()
            except KeyError:
                pass
    return optimizer.res


bounds = {"x": [-5, 5], "y": [-5, 5]}
all_results = {}
for num_workers in tqdm([1, 2, 4, 8, 16], desc="Checking num_workers"):
    results = optimize(rosen, 200, bounds, num_workers)
    samples = [res["params"] for res in results]
    all_results[num_workers] = samples

with open("results.pkl", "wb") as f:
    pickle.dump(all_results, f)
with open("results.pkl", "rb") as f:
    all_results = pickle.load(f)

fig, axs = plt.subplots(2, 2)
fig.set_figheight(8)
fig.set_figwidth(8)
axs = [item for sublist in axs for item in sublist]
for idx, (num_workers, samples) in enumerate(all_results.items()):
    if num_workers > 8:
        continue
    samples = [np.array(list(sample.values())) for sample in samples]
    axs[idx].scatter(*zip(*samples), s=1)
    axs[idx].set_title(f"{num_workers=}")
    avg_min_distance = np.mean([_closest_distance(sample, samples) for sample in samples])
    print(f"{num_workers=}, mean_min_distance={avg_min_distance:.3f}")
fig.tight_layout()
fig.savefig("results.png")
```

---
Hey, thanks for following this up, I'll reopen this issue.

Could you do something like `kappa_list = np.linspace(0, 10, n_workers)` and then basically have each worker use a different kappa value, or something like that?

---
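The per-worker-kappa idea can be sketched without the library, using a plain scikit-learn GP and a UCB acquisition maximised over a candidate grid (everything here is illustrative, not the library's API):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

# a few observed samples of a toy 1-D objective (maximise -x^2)
X = rng.uniform(-5, 5, size=(6, 1))
y = -X[:, 0] ** 2
gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

# evaluate the UCB acquisition (mean + kappa * std) on a candidate grid
candidates = np.linspace(-5, 5, 1001).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)

# one suggestion per worker: small kappa exploits, large kappa explores
n_workers = 4
kappa_list = np.linspace(0.1, 10, n_workers)
suggestions = [float(candidates[np.argmax(mean + k * std), 0]) for k in kappa_list]
```

Because each worker maximises a different acquisition surface, the suggestions can differ even though the underlying GP (and hence the deterministic suggestion step) is shared.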
I think the problem is that when using …

---
Ok, so is it possible to "de-register" an observation? I can quite easily register a dummy observation when a sample is suggested, and then de-register and re-register once the actual result is available.

---

Might be a good alternative approach, but I would like to try improving the quality of suggestions first.

---

I looked at the code and it seems that it should be rather trivial to implement de-register/re-register functionality, since the GP is fitted anew for each suggestion. Will try to make an MR.

---
I think I have a working solution: #365

```python
from __future__ import annotations

import pickle
from typing import Callable

import matplotlib.pyplot as plt
import numpy as np
from bayes_opt.bayesian_optimization import BayesianOptimization
from bayes_opt.util import UtilityFunction
from scipy.optimize import rosen
from tqdm import tqdm


def _closest_distance(point, points):
    return min(np.linalg.norm(point - p) for p in points if p is not point)


def optimize(
    func: Callable[..., float], num_iter: int, bounds: dict[str, tuple[float, float]],
    num_workers=0, with_dummies=False
):
    init_samples = int(np.sqrt(num_iter))
    optimizer = BayesianOptimization(f=None, pbounds=bounds, verbose=0)
    init_kappa = 10
    kappa_decay = (0.1 / init_kappa) ** (1 / (num_iter - init_samples))
    utility = UtilityFunction(
        kind="ucb", kappa=init_kappa, xi=0.0, kappa_decay=kappa_decay, kappa_decay_delay=init_samples
    )
    init_queue = [optimizer.suggest(utility) for _ in range(init_samples)]
    result_queue = []
    tbar = tqdm(total=num_iter, leave=False)
    while len(optimizer.res) < num_iter:
        sample = init_queue.pop(0) if init_queue else optimizer.suggest(utility)
        if with_dummies and num_workers > 1:
            optimizer.register_dummy(sample, -1)
        loss = func(list(sample.values())) * -1
        result_queue.append((sample, loss))
        if len(result_queue) >= num_workers:
            try:
                optimizer.register(*result_queue.pop(0))
                utility.update_params()
                tbar.update()
            except KeyError:
                pass
    return optimizer.res


bounds = {"x": [-5, 5], "y": [-5, 5]}
all_results = {}
for num_workers in tqdm([1, 2, 4, 8], desc="Checking num_workers"):
    results = optimize(rosen, 150, bounds, num_workers, with_dummies=True)
    all_results[num_workers] = results

with open("results.pkl", "wb") as f:
    pickle.dump(all_results, f)
with open("results.pkl", "rb") as f:
    all_results = pickle.load(f)

fig, axs = plt.subplots(2, 2)
fig.set_figheight(8)
fig.set_figwidth(8)
axs = [item for sublist in axs for item in sublist]
for idx, (num_workers, results) in enumerate(all_results.items()):
    if num_workers > 8:
        continue
    samples = [np.array(list(res["params"].values())) for res in results]
    axs[idx].scatter(*zip(*samples), s=1)
    axs[idx].set_title(f"{num_workers=}")
    avg_min_distance = np.mean([_closest_distance(sample, samples) for sample in samples])
    best_result = max(res["target"] for res in results)
    print(f"{num_workers=}, mean_min_distance={avg_min_distance:.3f}, {best_result=:.3f}")
fig.tight_layout()
fig.savefig("results.png")
```

Results without change (`with_dummies=False`):

Results with change (`with_dummies=True`):

As can be seen, there is quite a big improvement.

---
Looks really promising! Sorry, I think it is going to take me a few days to look at this properly though :-(

---
Wouldn't it be better to set the target of dummy points to what the regressor would expect?

---
@cestpasphoto shameless plug, but in PR #447 you can find this under the name …

That being said, during my limited testing I found both methods to work well, though I read in a paper that the lying approach seems to perform better, if you manage to choose a suitable value.

---
I have a single-threaded function which I want to optimise.
I'm trying to write a wrapper that would handle multiple runs at the same time, but I'm noticing considerable degradation in results as I increase the number of parallel evaluations.
Here is the rough logic of what I'm doing:
The `result_queue` variable is simulating evaluation across multiple processes. Here are the results:
As can be seen, the more processes I use, the worse the final result is. I don't understand why that would happen; even if a few suggestions are registered late, the results should not differ so much.
What am I missing?