replace non-unique point error with warning? #353
Comments
Two (naive) questions that I have:
How do the GPRs handle receiving different target values for the same parameters? Does it just take the mean? Are the covariances adjusted at that point (or for all the points)? (A quick way to check this empirically is sketched below.)
I assume -- based on the name -- that this is an integer counter storing how often points that were already probed have been re-probed?
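One way to probe the first question empirically is to fit sklearn's GaussianProcessRegressor (which bayes_opt uses under the hood) on duplicated inputs directly. A minimal sketch, independent of bayes_opt itself, with made-up toy values:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# The same input observed twice with different targets, plus one other point.
X = np.array([[0.0], [0.0], [1.0]])
y = np.array([0.9, 1.1, 2.0])

# A small explicit alpha keeps the kernel matrix invertible despite the duplicated row.
gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3).fit(X, y)

mean, std = gpr.predict(np.array([[0.0]]), return_std=True)
# In this toy case the posterior mean at x=0 lands close to the average of 0.9
# and 1.1, i.e. the duplicated observations are both used rather than discarded.
print(mean, std)
```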
Hey @till-m
correct!
This is actually something I am currently trying to figure out. I've been playing around with 10 runs of the same simulation with varying levels of noise, and seeing how best to model it. The distribution of the different data sets is shown below (n here means n particles, but that's not really important for the purpose of this discussion). If we run the bayesian_optimization code with the default parameters, repeatedly pass it the same point with different (noisy) values, and simply replace the previous value, the resulting fit does not model the noise well. However, we can construct a custom kernel like this:

```python
from bayes_opt import BayesianOptimization
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

optimizer = BayesianOptimization(f=None, pbounds=pbounds, verbose=2, random_state=1)

k1 = Matern(length_scale=[3, 0.2, 0.2])  # Matern is the default kernel
k2 = WhiteKernel()
kernel = k1 + k2
optimizer.set_gp_params(kernel=kernel)
```

then we can model noise very effectively. Note that when noise_level_bounds is not set, the hyperparameters will be fit to the data at each iteration. Alternatively, one can set

```python
k2 = WhiteKernel(noise_level=1, noise_level_bounds='fixed')
```

in which case the noise level will remain fixed. I think this is what I would want to do in my case, where I have an independent estimate of the noise that I am reasonably confident of. However, I am currently unclear on the exact relation between the noise_level parameter and the modelled noise; it seems like even with the same value I can get different noise estimates depending on the input data. I am currently trying to put together a noisy-optimization example for my own code, which uses this code as one of its backends. If I get that working, I could also work up a similar example for this repo.
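As a side note, the noise_level fitting can also be seen in a standalone, sklearn-only sketch (so the duplicate-point restriction in bayes_opt doesn't get in the way); the toy function and noise level here are made up for illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(1)
X = np.repeat(np.linspace(0, 10, 8), 5).reshape(-1, 1)       # each point observed 5 times
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=len(X))   # noisy repeated targets

gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5) + WhiteKernel()).fit(X, y)

# With noise_level_bounds left at the default, the fitted noise_level should
# land somewhere near the true noise variance (0.3 ** 2 = 0.09).
print(gpr.kernel_)
```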
Note - we could also increase the alpha parameter to model noise without using a white kernel:

```python
k1 = Matern(length_scale=[3, 0.2, 0.2])
kernel = k1
optimizer.set_gp_params(kernel=kernel, alpha=1.1)
```

However, then I'm even more unclear on the relationship between the value of alpha and the known noise...
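For reference, the sklearn documentation describes alpha as a value added to the diagonal of the kernel matrix during fitting, which suggests a fixed alpha behaves much like a WhiteKernel with a fixed noise_level, except that the WhiteKernel's noise also shows up in the predicted standard deviation while alpha does not. A small sklearn-only sketch comparing the two, with a made-up known noise variance:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))
noise_var = 0.09  # hypothetical known noise variance
y = np.sin(X).ravel() + rng.normal(scale=noise_var**0.5, size=30)

# Option 1: fixed WhiteKernel carrying the known noise variance.
gp_white = GaussianProcessRegressor(
    kernel=Matern(nu=2.5) + WhiteKernel(noise_level=noise_var, noise_level_bounds="fixed"),
).fit(X, y)

# Option 2: plain Matern kernel, with the noise variance passed as alpha
# (added to the diagonal of the training kernel matrix).
gp_alpha = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=noise_var).fit(X, y)

X_test = np.linspace(0, 10, 5).reshape(-1, 1)
# The posterior means should agree closely...
print(gp_white.predict(X_test))
print(gp_alpha.predict(X_test))
# ...but the predicted std differs: the WhiteKernel noise is included in it,
# whereas alpha is not.
print(gp_white.predict(X_test, return_std=True)[1])
print(gp_alpha.predict(X_test, return_std=True)[1])
```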
Hi @bwheelz36, thanks for this, I didn't expect such a detailed explanation. I agree that it would be good if this were part of the documentation, so if you find the time, a notebook would be a great addition. I assume that you need to write your own optimization loop to get around the caching that happens internally? In any case, I agree with your proposed handling of the initial problem of repeatedly sampling the same point.
I hadn't actually thought of that, I always run in the 'advanced' mode of suggest/probe/register.
I will update with a notebook and make the change suggested above once I'm more confident in what I'm doing!
The test failure in #368 seems to occur by design: there is a test to ensure that no duplicate points are registered, i.e. loading from logs is idempotent. I'm guessing this was in fact the primary reason for disallowing duplicate points. Maybe adding a keyword ...
Hey @till-m, I don't think I understand the problem (also I had to google what idempotent meant 😆).
Sorry, I guess I explained the problem badly. The ...
ok @till-m, this should be fixed in #372.
I haven't done anything to the log read-in, because this should already be handled by the change in TargetSpace.
@till-m - I have been having another look at this. The relevant code currently looks like this:

```python
try:
    return self._cache[_hashable(x)]
except KeyError:
    # this is the normal behavior that occurs when
    # the point has not been seen before
    ...
```

This means that if the point x has ever been seen before, the code simply returns the previous value; it doesn't attempt to register the point again. This seems like odd behaviour to me. I think we should remove this, so that the behaviour is either: ...
As has been noted in #158, the code will crash if it is asked to probe a point it has already probed.
This condition occurs not infrequently when using the utility functions. In addition, when working with very noisy data, it is completely valid to probe the same point more than once.
Describe the solution you'd like
I suggest simply replacing the error with a warning, and maybe adding a parameter called '_n_repeated_points_probed'; that would allow users to handle the issue on their end. Thoughts?
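A rough sketch of the kind of change being proposed (the class and method here are simplified stand-ins rather than the actual bayes_opt TargetSpace; only the warn-instead-of-raise idea and the counter are the point):

```python
import warnings


class TargetSpaceSketch:
    """Simplified stand-in for the registration logic under discussion."""

    def __init__(self):
        self._params = []
        self._targets = []
        self._seen = set()
        self._n_repeated_points_probed = 0  # counter proposed above

    def register(self, params, target):
        key = tuple(params)
        if key in self._seen:
            # Previously this situation raised an error; warning and counting
            # instead lets the caller decide how to handle repeated points.
            self._n_repeated_points_probed += 1
            warnings.warn(
                f"Point {params} has been probed before "
                f"({self._n_repeated_points_probed} repeats so far)."
            )
        self._seen.add(key)
        self._params.append(params)
        self._targets.append(target)
```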
Are you able and willing to implement this feature yourself and open a pull request?