replace non-unique point error with warning? #353

Closed
1 task done
bwheelz36 opened this issue Sep 12, 2022 · 11 comments

Comments
@bwheelz36
Collaborator

As has been noted in #158, the code will crash if it is asked to probe a point it has already probed.

This condition comes up fairly often when using the utility functions. In addition, when working with very noisy data, it is completely valid to probe the same point more than once.

Describe the solution you'd like
I suggest simply replacing the error with a warning, plus perhaps a parameter called '_n_repeated_points_probed' that would allow users to handle the issue on their end. Thoughts?
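A minimal sketch of what I mean (the class and names here are placeholders, not the real TargetSpace):

import warnings

class SpaceSketch:
    """Toy stand-in for the real target space, just to illustrate the proposal."""

    def __init__(self):
        self._data = {}
        self._n_repeated_points_probed = 0  # the proposed counter

    def register(self, key, target):
        if key in self._data:
            # warn instead of raising, and keep count so users can react
            warnings.warn(f"Point {key} has already been probed.")
            self._n_repeated_points_probed += 1
        self._data.setdefault(key, []).append(target)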

Are you able and willing to implement this feature yourself and open a pull request?

  • Yes, I can provide this feature.
@till-m
Member

till-m commented Sep 12, 2022

Two (naive) questions that I have:

In addition, when working with very noisy data, it is completely valid to probe the same point more than once.

How does the GPR handle receiving different target values for the same parameters? Does it just take the mean? Are the covariances adjusted at that point (or for all the points)?
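One could check this with plain sklearn directly, e.g. (a toy with made-up data and an arbitrary alpha):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# the same x registered twice with different noisy targets
X = np.array([[0.0], [0.0], [1.0]])
y = np.array([0.9, 1.1, 2.0])

# alpha adds noise variance to the kernel diagonal, which keeps the
# duplicated rows from making the kernel matrix singular
gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=0.1)
gpr.fit(X, y)

mean, std = gpr.predict(np.array([[0.0]]), return_std=True)
print(mean, std)  # the posterior mean lands near the average of 0.9 and 1.1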

a parameter called '_n_repeated_points_probed'

I assume -- based on the name -- that this is an integer counter storing how often points that were already probed have been re-probed?

@bwheelz36
Collaborator Author

Hey @till-m

"I assume -- based on the name -- that this is an integer counter storing how often points that were already probed have been re-probed?"

correct!

"How do the GPR's handle receiving different target values for the same parameters? Does it just take the mean? Are the covariances adjusted at that point (or for all the points)?"

This is actually something I am currently trying to figure out. I've been playing around with 10 runs of the same simulation at varying levels of noise, to see how best to model it. The distribution of the different data sets is below (n here means n particles, but that's not really important for the purpose of this discussion).
[Figure: distributions of the noisy data sets for different n]

If we run the bayesian_optimization code with the default parameters, repeatedly pass it the same point with different (noisy) values, and simply replace the raise statement for repeated points with a pass, it runs without issue but doesn't do a good job of handling the noise:

[Figure: posterior fit with the default kernel; the noise is not modelled well]
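For reference, the loop in that experiment is essentially the following (a sketch: noisy_f and the bounds are made-up stand-ins, and on an unpatched install the register call raises the duplicate-point error this issue is about):

import numpy as np
from bayes_opt import BayesianOptimization, UtilityFunction

def noisy_f(x):
    return -x ** 2 + np.random.normal(scale=0.5)  # noisy 1-D toy objective

optimizer = BayesianOptimization(f=None, pbounds={'x': (-3, 3)}, verbose=0, random_state=1)
utility = UtilityFunction(kind='ucb', kappa=2.5)

for _ in range(30):
    point = optimizer.suggest(utility)  # may re-suggest an already-probed point
    optimizer.register(params=point, target=noisy_f(**point))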

However, we can construct a custom kernel like this:

from bayes_opt import BayesianOptimization
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

pbounds = {'x': (0, 10), 'y': (0, 1), 'z': (0, 1)}  # placeholder bounds for the three parameters

optimizer = BayesianOptimization(f=None, pbounds=pbounds, verbose=2, random_state=1)

k1 = Matern(length_scale=[3, 0.2, 0.2])  # Matern is the default kernel
k2 = WhiteKernel()
kernel = k1 + k2  # Matern for the signal plus WhiteKernel to model the noise
optimizer.set_gp_params(kernel=kernel)

then we can model noise very effectively:

[Figure: posterior fit with the Matern + WhiteKernel kernel; the noise is modelled well]

Note that when noise_level_bounds is not set, the hyperparameters will be fit to the data at each iteration.
We can also set them to be fixed:

k2 = WhiteKernel(noise_level=1, noise_level_bounds='fixed')

in which case the noise level will remain fixed. I think this is what I would want to do in my case, where I have an independent estimate of the noise that I am reasonably confident of. However, I am currently unclear as to the exact relation between the noise_level parameter and the modeled noise; it seems like even with the same value I can get different noise estimates depending on the input data.
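One way to see what the GP actually settled on is to inspect the fitted kernel (a quick sketch; note that _gp is a private attribute, so this is only for poking around):

# refit the internal GP on everything observed so far, then inspect it
optimizer._gp.fit(optimizer.space.params, optimizer.space.target)
print(optimizer._gp.kernel_)                 # e.g. Matern(...) + WhiteKernel(noise_level=...)
print(optimizer._gp.kernel_.k2.noise_level)  # the fitted (or fixed) noise variance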

I am currently trying to put together a noisy optimization example for my own code which uses this code as one of its backends. If I get that working I could also work up a similar example for this repo.

@bwheelz36
Collaborator Author

Note - we could also increase the alpha parameter to model noise without using a white kernel:

k1 = Matern(length_scale=[3, 0.2, 0.2])
kernel = k1  # no WhiteKernel this time
optimizer.set_gp_params(kernel=kernel, alpha=1.1)  # alpha adds noise directly to the kernel diagonal

However, I'm then even more unclear on the relationship between the value of alpha and the known noise...
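My tentative understanding is that alpha is simply added to the diagonal of the training kernel matrix, so a fixed alpha should match a fixed WhiteKernel as far as the posterior mean goes; the difference is that a WhiteKernel's noise_level can be fitted by maximum likelihood and is also included in the predicted standard deviations. A quick comparison with plain sklearn (made-up data):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

X = np.array([[0.0], [0.0], [1.0], [2.0]])
y = np.array([0.9, 1.1, 2.0, 0.5])

gp_alpha = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=0.2).fit(X, y)
gp_white = GaussianProcessRegressor(
    kernel=Matern(nu=2.5) + WhiteKernel(noise_level=0.2, noise_level_bounds='fixed'),
).fit(X, y)  # the default alpha=1e-10 is negligible here

Xt = np.linspace(-1, 3, 5).reshape(-1, 1)
print(np.allclose(gp_alpha.predict(Xt), gp_white.predict(Xt)))  # True: same posterior mean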

@till-m
Member

till-m commented Sep 13, 2022

Hi @bwheelz36,

Thanks for this; I didn't expect such a detailed explanation. I agree that it would be good if this were part of the documentation, so if you find the time, a notebook would be a great addition. I assume that you need to write your own optimization loop to get around the caching that happens internally?

In any case, I agree with your proposed handling of the initial problem of repeatedly sampling the same point.

@bwheelz36
Collaborator Author

"I assume that you need to write your own optimization loop to get around the caching that happens internally?"

I hadn't actually thought of that; I always run in the 'advanced' mode of suggest/probe/register.

@bwheelz36
Collaborator Author

I will update with a notebook and make the change suggested above once I'm more confident in what I'm doing!

@till-m
Member

till-m commented Oct 13, 2022

The test failure in #368 seems to occur by design: there is a test to ensure that no duplicate points are registered, i.e. that loading from logs is idempotent. I'm guessing this was in fact the primary reason for disallowing duplicate points. Maybe adding a keyword allow_duplicate=False to register is the right way to go here?
Additionally, maybe we should replace the KeyError with a more specific (custom?) error type, so that we can intentionally catch it at the appropriate places without worrying about catching unrelated KeyErrors.
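Something along these lines, perhaps (a sketch only; the names are placeholders, not a final API):

class NotUniqueError(Exception):
    """Raised when a duplicate point is registered and duplicates are disallowed."""

class RegistrySketch:
    """Toy stand-in for TargetSpace, just to show the proposed signature."""

    def __init__(self):
        self._cache = {}

    def register(self, key, target, allow_duplicate=False):
        if key in self._cache and not allow_duplicate:
            raise NotUniqueError(f"Data point {key} is not unique.")
        self._cache.setdefault(key, []).append(target)

load_logs could then catch NotUniqueError specifically and leave genuine KeyErrors alone.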

@bwheelz36
Collaborator Author

Hey @till-m, I don't think I understand the problem (also I had to google what idempotent meant 😆).
The custom error does sound more appropriate, albeit relatively low priority?

@till-m
Member

till-m commented Oct 19, 2022

Sorry, I guess I explained the problem badly. The KeyError that was raised when registering a duplicate point was caught in the load_logs function to skip over points that had already been registered. That means that loading the same log twice didn't register any new points (and there is a test specifically for this). I'm not entirely sure whether we want to keep this behaviour, but if we do, then (dis)allowing duplicates should be an optional thing in the register function. Hope that makes sense.
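In other words, the current behaviour amounts to something like this (paraphrased, not the exact source; assuming the logger's one-JSON-record-per-line format):

import json

def load_logs(optimizer, log_path):
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            try:
                optimizer.register(params=record['params'], target=record['target'])
            except KeyError:
                pass  # duplicate point from an earlier load: skip it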

@bwheelz36
Collaborator Author

ok @till-m, this should be fixed in #372.

  • added keyword allow_duplicate_points. The default is False, so the existing behavior of the code shouldn't change
  • added a parameter to keep track of duplicate points
  • added tests that duplicate points are not allowed by default, and are allowed when allow_duplicate_points=True

I haven't done anything to the log read-in, because this should be handled already by the change in TargetSpace.
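Usage ends up looking like this (a sketch based on the bullet points above; see #372 for the actual implementation):

from bayes_opt import BayesianOptimization

optimizer = BayesianOptimization(
    f=None, pbounds={'x': (-3, 3)}, random_state=1, allow_duplicate_points=True
)
optimizer.register(params={'x': 1.0}, target=0.5)
optimizer.register(params={'x': 1.0}, target=0.7)  # warns instead of raising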

@bwheelz36
Collaborator Author

@till-m - I have been having another look at this.
I noticed that this error never occurred when using the maximize approach, only when using the suggest/probe/register approach.
After some digging, I found that within target_space.probe there is this line:

try:
    return self._cache[_hashable(x)]
except KeyError:
    # this is the normal behavior that occurs when 
    # the point has not been seen before
    ...

This means if the point x has ever been seen before, the code simply returns the previous value. It doesn't attempt to register the point again.

This seems like odd behaviour to me. I think we should remove it, so that the behaviour is either (see the sketch after this list):

  • if allow_duplicate_points=False an error is thrown (the error that is already in the code)
  • else the duplicate points are registered in the normal way
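i.e. probe would reduce to something like this (a sketch paraphrasing the existing TargetSpace code, not the exact source):

def probe(self, params):
    x = self._as_array(params)
    params = dict(zip(self._keys, x))
    target = self.target_func(**params)
    # register() raises if allow_duplicate_points=False and x was seen before;
    # otherwise the duplicate observation is recorded like any other
    self.register(x, target)
    return target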
