BOLFI + NUTS #135
Conversation
vuolleko commented Apr 20, 2017 (edited)
- Implementation of MCMC sampling using the No-U-Turn Sampler + convergence diagnostics
- Implementation of BOLFI on top of BayesianOptimization
- Result object for BOLFI, including some integrated plotting
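The convergence-diagnostics code itself is not quoted in this thread. As a hedged sketch of the kind of check such a diagnostic typically performs on NUTS chains, here is a minimal Gelman–Rubin potential scale reduction factor (the function name and details are my own illustration, not the PR's code):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for m chains of length n.

    chains : np.ndarray of shape (m, n), one row per chain.
    Values close to 1 suggest the chains have mixed.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # between-chain variance (scaled by chain length)
    B = n * chain_means.var(ddof=1)
    # mean within-chain variance
    W = chains.var(axis=1, ddof=1).mean()
    # pooled posterior-variance estimate
    var_est = (n - 1) / n * W + B / n
    return np.sqrt(var_est / W)
```

Well-mixed chains give values near 1, while chains stuck in different regions give values well above 1.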
elfi/methods/posteriors.py (outdated)

  self.threshold = minval
  logger.info("Using minimum value of discrepancy estimate mean (%.4f) as threshold" % (self.threshold))
- self.priors = [None] * model.input_dim
+ self.priors = priors or [None] * model.input_dim
  self.ML, self.ML_val = stochastic_optimization(self._neg_unnormalized_loglikelihood_density, self.model.bounds, max_opt_iters)
  self.MAP, self.MAP_val = stochastic_optimization(self._neg_unnormalized_logposterior_density, self.model.bounds, max_opt_iters)
The above two will require some execution time and I'm not sure these are always needed. I think it would be cleaner to have this class only define the posterior distribution. Users can then optimize the distribution if they like.
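If the class only defines the posterior, the optimization moves to user code. A rough multi-start sketch of what that could look like (the `posterior_logpdf` callable and the helper name are hypothetical, and `scipy.optimize` stands in for ELFI's own `stochastic_optimization`):

```python
import numpy as np
from scipy.optimize import minimize

def maximize_posterior(posterior_logpdf, bounds, n_starts=10, seed=0):
    """Find a MAP point by multi-start local optimization within bounds.

    posterior_logpdf : callable mapping a parameter vector to a scalar log-density.
    bounds : list of (lower, upper) pairs, one per parameter.
    """
    rng = np.random.RandomState(seed)
    best_x, best_val = None, np.inf
    for _ in range(n_starts):
        # random restart inside the bounding box
        x0 = np.array([rng.uniform(lo, hi) for lo, hi in bounds])
        res = minimize(lambda x: -posterior_logpdf(x), x0,
                       method='L-BFGS-B', bounds=bounds)
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x, -best_val
```

The same pattern works for the ML point by passing the log-likelihood instead of the log-posterior.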
elfi/methods/methods.py (outdated)

  @@ -635,40 +635,59 @@ class BOLFI(InferenceMethod):
      """
-     def __init__(self, model, batch_size=1, discrepancy=None, bounds=None, **kwargs):
-         """
+     def get_posterior(self, threshold=None):
Perhaps this could be renamed to something that better reflects the fact that we do quite a bit of work here to infer the posterior. E.g. infer_posterior
?
  n_accepted += 1

  logger.info("{}: Total acceptance ratio: {:.3f}".format(__name__, float(n_accepted) / n_samples))
  return samples[1:, :]
Nice work :)
elfi/methods/posteriors.py (outdated)

  @@ -10,7 +10,7 @@
  logger = logging.getLogger(__name__)

  class Posterior():
Do we really need a class for the Posterior distribution yet?
True, this may just be confusing here.
elfi/methods/posteriors.py (outdated)

      """
      grad = self._grad_unnormalized_loglikelihood_density(x) + self._grad_logprior_density(x)
      return grad[0]

  def __getitem__(self, idx):
      return tuple([[v]*len(idx) for v in self.MAP])

  def _unnormalized_loglikelihood_density(self, x):
I think _unnormalized_loglikelihood would be more proper, since likelihood is a function.
elfi/methods/posteriors.py (outdated)

  x : np.array
  norm : boolean
      Whether to normalize (unsupported).
Maybe just remove these normalization flags? I don't know of any sufficiently general way to normalize these.
elfi/methods/posteriors.py (outdated)

  pdf = sp.stats.norm.pdf(term)
  cdf = sp.stats.norm.cdf(term)

  return factor * pdf / cdf

  def _unnormalized_likelihood_density(self, x):
Same here about density.
elfi/methods/posteriors.py (outdated)

- return sp.stats.norm.logcdf(self.threshold, mean, std)
+ return sp.stats.norm.logcdf(self.threshold, mean, np.sqrt(var))

  def _grad_unnormalized_loglikelihood_density(self, x):
Same here about density.
elfi/methods/methods.py (outdated)

  @@ -618,14 +618,14 @@ def _report_batch(self, batch_index, params, distances):
  logger.debug(str)

- class BOLFI(InferenceMethod):
+ class BOLFI(BayesianOptimization):
There is a test for BOLFI in test_inference.py. That should be updated now that this is implemented (it's currently skipped).
  # some heuristics to choose kernel parameters based on the initial data
  length_scale = (np.max(x) - np.min(x)) / 3.
  kernel_var = (np.max(y) / 3.)**2.
  bias_var = kernel_var / 4.
I think there needs to be some amount of data available before these make sense. E.g. if we only have one observation, these may not work very well.
True, but what would be better?
I think just using ones; of course one cannot say whether they would be better, but e.g. length_scale here would become 0 if there were just one observation.
Ok, so now there's a check for really small length_scale, and a general check for reasonably large initialization set.
Very good!
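The guarded heuristic the thread converges on could look roughly like this; the threshold names (MIN_LENGTH_SCALE, MIN_INIT_POINTS) and exact values are hypothetical, only the three formulas come from the quoted snippet:

```python
import numpy as np

# Hypothetical guard values; the PR uses its own checks.
MIN_LENGTH_SCALE = 1e-6
MIN_INIT_POINTS = 2

def init_kernel_params(x, y):
    """Heuristic kernel parameters from initial evidence, with sanity checks.

    x : parameter values of the initial evidence
    y : corresponding discrepancy values
    """
    x, y = np.asarray(x), np.asarray(y)
    if len(x) < MIN_INIT_POINTS:
        raise ValueError("Need at least {} initial points for the heuristic"
                         .format(MIN_INIT_POINTS))
    # guard against a degenerate (e.g. duplicated) initialization set
    length_scale = max((np.max(x) - np.min(x)) / 3., MIN_LENGTH_SCALE)
    kernel_var = (np.max(y) / 3.)**2.
    bias_var = kernel_var / 4.
    return length_scale, kernel_var, bias_var
```

With a single observation (or identical points) the raw heuristic would give a zero length scale, which the guard prevents.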
elfi/methods/methods.py (outdated)

  """Bayesian Optimization for Likelihood-Free Inference (BOLFI).

  Approximates the discrepancy function by a stochastic regression model.
  Discrepancy model is fit by sampling the discrepancy function at points decided by
  the acquisition function.

- The implementation follows that of Gutmann & Corander, 2016.
+ The implementation (mostly) follows that of Gutmann & Corander, 2016.
Maybe list differences?
elfi/methods/methods.py (outdated)

- def __init__(self, model, batch_size=1, discrepancy=None, bounds=None, **kwargs):
-     """
+ def get_posterior(self, threshold=None):
+     """Returns the posterior.
...Returns an approximation for the posterior based on a GP regression.
Great work! Some minor comments.
Updated the branch from 6e827de to 4201cb5.
Lot of great work done!
elfi/bo/acquisition.py (outdated)

  """Returns the next batch of acquisition points.

  Parameters
  ----------
  n_values : int
      Number of values to return.
  pending_locations : None or numpy 2d array
-     If given, asycnhronous acquisition functions may
+     If given, asynchronous acquisition functions may
Just noticed this now, but actually this applies to sync mode as well when max_parallel_batches > 1 (and furthermore we don't have an async mode atm :)
elfi/bo/acquisition.py (outdated)

  @@ -42,15 +42,15 @@ def evaluate(self, x, t=None):
      """
      return NotImplementedError

- def acquire(self, n_values, pending_locations=None, t=None):
+ def acquire(self, n_values, pending_locations=None, t=0):
Why t=0?
It could be None as well, but then there should be a check. Otherwise it crashes if t is not given.
Yes, check is fine. I guess it was None at first because not all methods use it. The check can be implemented in the evaluate function though.
But what's wrong with 0?
Nothing much, just to be safe. Since it is supposed to be an iteration id number coming from an algorithm, setting it to a fixed valid value can cause hidden bugs: you cannot know whether it was truly set by the caller or was just 0 by accident.
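The convention the thread settles on can be sketched as a standalone function (this is an illustration of the sentinel pattern, not ELFI's actual method): keep `t=None` as the default and validate it explicitly, so a forgotten argument fails loudly instead of silently behaving like iteration 0.

```python
def acquire(n_values, pending_locations=None, t=None):
    """Return the next batch of acquisition points (sketch).

    t is the iteration number supplied by the driving algorithm; None is a
    sentinel meaning "the caller forgot to pass it", which is an error.
    """
    if t is None:
        raise ValueError("the iteration number t must be given by the caller")
    if t < 0:
        raise ValueError("t must be a non-negative iteration number")
    # ... would compute and return n_values acquisition points here ...
    return t
```

With `t=0` as the default, a caller that forgets the argument gets plausible-looking results for the wrong iteration; with the sentinel, the mistake surfaces immediately.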
elfi/bo/acquisition.py (outdated)

  @@ -95,23 +97,55 @@ class LCBSC(AcquisitionBase):
      would be t**(2d + 2).
      """

- def __init__(self, *args, delta=.1, **kwargs):
+ def __init__(self, *args, delta=.1, noise_cov=0., **kwargs):
What about moving this to the parent class acquisition? That way we could avoid having to implement this in other subclasses.
Ok, if similar noise structure can be expected for all acquisition methods?
elfi/bo/acquisition.py (outdated)

  super(LCBSC, self).__init__(*args, **kwargs)
  if delta <= 0 or delta >= 1:
      raise ValueError('Parameter delta must be in the interval (0,1)')
  self.delta = delta
  if isinstance(noise_cov, float) or isinstance(noise_cov, int):
isinstance(noise_cov, (float, int)). This does not, however, detect NumPy scalars, but that may not be an issue here.
Oh, good, didn't know that.
I suppose we should generally add more argument checking for user-facing routines.
elfi/bo/acquisition.py
Outdated
#class BolfiAcquisition(SecondDerivativeNoiseMixin, LCB): | ||
# """Default acquisition function for BOLFI. | ||
# """ | ||
# pass | ||
|
We can just remove these I think.
  self._rbf_woodbury_inv = self._gp.posterior.woodbury_inv
  self._rbf_woodbury_chol = self._gp.posterior.woodbury_chol
  self._rbf_x2sum = np.sum(self._gp.X**2., 1)[None, :]
  self._rbf_is_cached = True
When the GP model is updated and hence its hyperparameters change, should these be updated as well?
They will be, since self.is_sampling=False causes self._rbf_is_cached=False.
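The invalidation pattern described here can be sketched generically; FakeGP and the class/method names below are stand-ins to show the structure, not ELFI's actual code:

```python
class FakeGP:
    """Stand-in for the GPy model; just counts how often the expensive
    posterior matrices are recomputed."""
    def __init__(self):
        self.n_computes = 0

    def compute_posterior_matrices(self):
        self.n_computes += 1
        return {'computed': self.n_computes}


class CachedSampler:
    """Cache expensive posterior matrices (e.g. the Woodbury inverse)
    while sampling, recomputing them only after the model changes."""

    def __init__(self, gp):
        self._gp = gp
        self._is_cached = False
        self._matrices = None

    def model_updated(self):
        # Analogue of setting is_sampling=False: drop the cache so the next
        # access recomputes from the new data/hyperparameters.
        self._is_cached = False

    def matrices(self):
        if not self._is_cached:
            self._matrices = self._gp.compute_posterior_matrices()
            self._is_cached = True
        return self._matrices
```

The key invariant is that every code path that mutates the GP also flips the flag, which is exactly what the is_sampling toggle guarantees here.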
elfi/bo/gpy_regression.py (outdated)

  kernel = self.gp_params.get('kernel')
  self._kernel_is_default = False

  noise_var = self.gp_params.get('noise_var') or np.max(y)**2. / 100.
  mean_function = self.gp_params.get('mean_function')
Are the noise_var and mean_function also taken care of when computing the gradient? It seems that only the kernel is checked to set the self._kernel_is_default flag, but does that also cover these?
True, they may not be constant. I'll add a requirement for them to be None for using the direct gradients.
elfi/methods/methods.py (outdated)

  bounds : list
      The region where to estimate the posterior for each parameter in
      model.parameters.
      `[(lower, upper), ... ]`
  initial_evidence : int, dict
      Number of initial evidence or a precomputed batch dict containing parameter
      and discrepancy values
- n_evidence : int
+ n_acq: int
I noticed I had some inconsistencies here with n_evidence and n_acq. I wonder if we should only use n_evidence; it would be easy for the user to know immediately how many evidence points he will have in the end (handy for estimating the required time to run the experiment). Also, initial_evidence could then be None by default and be set by a heuristic depending on the dimensions, rather than being a fixed quantity.
We also then wouldn't need _n_initial_runs, since the only thing that matters is n_evidence. In essence, all the evidence you gather will turn into initial_evidence for the future when you wish to continue where you are now.
Yes, sounds reasonable.
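The bookkeeping this proposal implies is simple; as a sketch (function names and the exact heuristic are hypothetical, only the n_evidence-as-total-budget semantics come from the discussion):

```python
def n_acquisitions(n_evidence, n_initial):
    """With n_evidence as the single user-facing budget, the number of
    acquisition points is simply what remains after the initial evidence."""
    if n_evidence < n_initial:
        raise ValueError("n_evidence must be at least the number of initial points")
    return n_evidence - n_initial

def default_n_initial(n_dims):
    # Hypothetical dimension-dependent heuristic, as suggested above;
    # the real default would be chosen by the library.
    return max(5, 2 * n_dims)
```

Continuing a run then needs no separate counter: the points gathered so far simply become the initial evidence of the next call.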
A lot of work put into this, great job!
elfi/methods/methods.py (outdated)

  """Bayesian Optimization for Likelihood-Free Inference (BOLFI).

  Approximates the discrepancy function by a stochastic regression model.
  Discrepancy model is fit by sampling the discrepancy function at points decided by
  the acquisition function.

- The implementation follows that of Gutmann & Corander, 2016.
+ The implementation loosely follows that of Gutmann & Corander, 2016.
What about: "the method implements the framework introduced in ..."
  Only needed if model is an ElfiModel
- discrepancy_model : GPRegression, optional
+ target_model : GPyRegression, optional
Maybe we could say here that in BOLFI the target model is the discrepancy model.
tests/functional/test_inference.py (outdated)

  # We should be able to carry out the inference in less than six batches
  assert res.populations[-1].n_batches < 6

  @slow
  @pytest.mark.usefixtures('with_all_clients')
- def test_bayesian_optimization():
+ def test_BOLFI():
I think we should have a separate test for Bayesian optimization; now the method is untested. Maybe test_bayesian_optimization_and_bolfi. I was thinking to first run the BO test as it was, but then just give the GP model to BOLFI and do the sampling with BOLFI. Then a separate quick test for running everything with BOLFI.
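The proposed split might look roughly like this. FakeBO and FakeBOLFI are stand-ins used purely to show the structure; the real test would use elfi's actual classes and fixtures:

```python
class FakeBO:
    """Stand-in for elfi.BayesianOptimization."""
    def __init__(self):
        self.target_model = None

    def infer(self, n_evidence):
        # pretend to fit a GP on n_evidence points
        self.target_model = {'n_evidence': n_evidence}


class FakeBOLFI:
    """Stand-in for elfi.BOLFI; can reuse an already-fitted target model."""
    def __init__(self, target_model=None):
        self.target_model = target_model

    def sample(self, n_samples):
        assert self.target_model is not None, "needs a fitted model"
        return [0.0] * n_samples


def test_bayesian_optimization_and_bolfi():
    # Phase 1: run the BO test as before.
    bo = FakeBO()
    bo.infer(n_evidence=100)
    assert bo.target_model is not None

    # Phase 2: hand the fitted GP to BOLFI and only test the sampling.
    bolfi = FakeBOLFI(target_model=bo.target_model)
    samples = bolfi.sample(n_samples=10)
    assert len(samples) == 10
```

A separate, quicker end-to-end test would then run everything through BOLFI alone with a small budget.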