BOLFI + NUTS #135

Merged: 23 commits, May 13, 2017

Conversation

@vuolleko (Member) commented Apr 20, 2017

  • Implementation of MCMC sampling using the No-U-Turn Sampler + convergence diagnostics
  • Implementation of BOLFI on top of BayesianOptimization
  • Result object for BOLFI, including some integrated plotting
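
For orientation, a minimal hedged sketch of how these pieces might be used together. The `BOLFI` constructor and `get_posterior` signatures come from this diff; the fitting and sampling call names (`infer`, `sample`) and the model setup are assumptions for illustration only:

```python
import elfi

# 'model' is assumed to be an ElfiModel with a discrepancy node 'd'.
bolfi = elfi.BOLFI(model, batch_size=1, discrepancy='d',
                   bounds=[(-2., 2.)])          # constructor signature from this diff

bolfi.infer(n_acq=200)                          # fitting call name is assumed
posterior = bolfi.get_posterior(threshold=0.1)  # signature from this diff
samples = bolfi.sample(1000)                    # NUTS-based sampling; name assumed
```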

self.threshold = minval
logger.info("Using minimum value of discrepancy estimate mean (%.4f) as threshold" % (self.threshold))
- self.priors = [None] * model.input_dim
+ self.priors = priors or [None] * model.input_dim
self.ML, self.ML_val = stochastic_optimization(self._neg_unnormalized_loglikelihood_density, self.model.bounds, max_opt_iters)
self.MAP, self.MAP_val = stochastic_optimization(self._neg_unnormalized_logposterior_density, self.model.bounds, max_opt_iters)
Contributor

The above two will require some execution time and I'm not sure these are always needed. I think it would be cleaner to have this class only define the posterior distribution. Users can then optimize the distribution if they like.
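
A hedged sketch of that separation, reusing the optimizer call and density names visible in this diff (the `Posterior` constructor arguments and the iteration count are assumptions):

```python
# The class would only define the (unnormalized) posterior...
post = Posterior(model, threshold=0.05)

# ...and the user runs the potentially expensive optimizations on demand,
# with the same optimizer and density functions as in this diff.
ML, ML_val = stochastic_optimization(
    post._neg_unnormalized_loglikelihood_density, post.model.bounds, 1000)
MAP, MAP_val = stochastic_optimization(
    post._neg_unnormalized_logposterior_density, post.model.bounds, 1000)
```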

@@ -635,40 +635,59 @@ class BOLFI(InferenceMethod):

"""

def __init__(self, model, batch_size=1, discrepancy=None, bounds=None, **kwargs):
"""
def get_posterior(self, threshold=None):
Contributor

Perhaps this could be renamed to something that better reflects the fact that we do quite a bit of work here to infer the posterior, e.g. infer_posterior?

n_accepted += 1

logger.info("{}: Total acceptance ratio: {:.3f}".format(__name__, float(n_accepted) / n_samples))
return samples[1:, :]
Contributor

Nice work :)

@@ -10,7 +10,7 @@
logger = logging.getLogger(__name__)

class Posterior():
@jlintusaari (Contributor) commented Apr 20, 2017

Do we necessarily need a class for the Posterior distribution yet?

Member Author

True, this may just be confusing here.

"""
grad = self._grad_unnormalized_loglikelihood_density(x) + self._grad_logprior_density(x)
return grad[0]

def __getitem__(self, idx):
return tuple([[v]*len(idx) for v in self.MAP])

def _unnormalized_loglikelihood_density(self, x):
Contributor

I think _unnormalized_loglikelihood would be more proper since likelihood is a function.

x : np.array
norm : boolean
Whether to normalize (unsupported).

@jlintusaari (Contributor) commented Apr 20, 2017

Maybe just remove these normalization flags? I don't know if there is any general (enough) way to normalize these.

pdf = sp.stats.norm.pdf(term)
cdf = sp.stats.norm.cdf(term)

return factor * pdf / cdf

def _unnormalized_likelihood_density(self, x):
Contributor

Same here about density.

- return sp.stats.norm.logcdf(self.threshold, mean, std)
+ return sp.stats.norm.logcdf(self.threshold, mean, np.sqrt(var))

def _grad_unnormalized_loglikelihood_density(self, x):
Contributor

Same here about density.
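
For context on the `np.sqrt(var)` fix above: `scipy.stats.norm` is parameterized by the standard deviation, not the variance, so the GP's predictive variance must be square-rooted before it is passed as the scale. The quantity being computed is, roughly in the notation of Gutmann & Corander (2016),

$$\log \tilde{L}(\theta) = \log \Phi\!\left(\frac{\epsilon - \mu(\theta)}{\sqrt{v(\theta)}}\right),$$

where $\mu(\theta)$ and $v(\theta)$ are the GP posterior mean and variance of the discrepancy at $\theta$, and $\epsilon$ is the threshold.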

@@ -618,14 +618,14 @@ def _report_batch(self, batch_index, params, distances):
logger.debug(str)


- class BOLFI(InferenceMethod):
+ class BOLFI(BayesianOptimization):
Contributor

There is a test for BOLFI in test_inference.py. That should be updated now as this is implemented (it's currently skipped).

# some heuristics to choose kernel parameters based on the initial data
length_scale = (np.max(x) - np.min(x)) / 3.
kernel_var = (np.max(y) / 3.)**2.
bias_var = kernel_var / 4.
@jlintusaari (Contributor) commented Apr 20, 2017

I think a certain amount of data needs to be available before these make sense. E.g. if we only have one observation, these may not work very well.

Member Author

True, but what would be better?

@jlintusaari (Contributor) commented Apr 28, 2017

I think just using ones. Of course one cannot say whether they would be better, but e.g. length_scale here would become 0 if there were just one observation.

Member Author

Ok, so now there's a check for a really small length_scale, and a general check for a reasonably large initialization set.

Contributor

Very good!
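
A sketch of the guards agreed on above; the exact cutoff values are assumptions:

```python
import numpy as np

MIN_INIT_POINTS = 10      # assumed minimum size of the initialization set
MIN_LENGTH_SCALE = 1e-6   # assumed floor for the length scale

if len(x) < MIN_INIT_POINTS:
    raise ValueError("Too few initialization points for the kernel heuristics")

# Heuristics based on the initial data, with a floor so that a degenerate
# data set (e.g. a single observation) cannot collapse the length scale to 0.
length_scale = max((np.max(x) - np.min(x)) / 3., MIN_LENGTH_SCALE)
kernel_var = (np.max(y) / 3.)**2.
bias_var = kernel_var / 4.
```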

"""Bayesian Optimization for Likelihood-Free Inference (BOLFI).

Approximates the discrepancy function by a stochastic regression model.
Discrepancy model is fit by sampling the discrepancy function at points decided by
the acquisition function.

- The implementation follows that of Gutmann & Corander, 2016.
+ The implementation (mostly) follows that of Gutmann & Corander, 2016.
Contributor

Maybe list differences?

def __init__(self, model, batch_size=1, discrepancy=None, bounds=None, **kwargs):
"""
def get_posterior(self, threshold=None):
"""Returns the posterior.
@jlintusaari (Contributor) commented Apr 20, 2017

...Returns an approximation for the posterior based on a GP regression.

@jlintusaari (Contributor) left a comment

Great work! Some minor comments.

@jlintusaari (Contributor) left a comment

A lot of great work done!

"""Returns the next batch of acquisition points.

Parameters
----------
n_values : int
Number of values to return.
pending_locations : None or numpy 2d array
- If given, asycnhronous acquisition functions may
+ If given, asynchronous acquisition functions may
Contributor

Just noticed this now, but this actually applies to sync mode as well when max_parallel_batches > 1 (and furthermore we don't have an async mode atm :)

@@ -42,15 +42,15 @@ def evaluate(self, x, t=None):
"""
return NotImplementedError

- def acquire(self, n_values, pending_locations=None, t=None):
+ def acquire(self, n_values, pending_locations=None, t=0):
Contributor

Why t=0?

Member Author

It could be None as well, but then there should be a check. Otherwise it crashes if t is not given.

Contributor

Yes, a check is fine. I guess it was None at first because not all methods use it. The check can be implemented in the evaluate function though.

Member Author

But what's wrong with 0?

Contributor

Nothing much, just to be safe. Since it is supposed to be an iteration id coming from an algorithm, setting it to a fixed valid value can cause hidden bugs: you cannot know whether it was truly set by the caller or was just 0 by accident.
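
A sketch of the alternative discussed in this thread: keep t=None as the default and validate it where it is actually consumed, as suggested above (method bodies elided):

```python
class AcquisitionBase:
    def acquire(self, n_values, pending_locations=None, t=None):
        # Pass t through untouched; no fixed fallback value here.
        ...

    def evaluate(self, x, t=None):
        # Validate where t is actually consumed, so a missing t fails loudly
        # instead of silently behaving like iteration 0.
        if t is None:
            raise ValueError("The iteration index t must be given")
        ...
```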

@@ -95,23 +97,55 @@ class LCBSC(AcquisitionBase):
would be t**(2d + 2).
"""

- def __init__(self, *args, delta=.1, **kwargs):
+ def __init__(self, *args, delta=.1, noise_cov=0., **kwargs):
Contributor

What about moving this to the parent acquisition class? That way we could avoid having to implement it again in other subclasses.

Member Author

Ok, if a similar noise structure can be expected for all acquisition methods?

super(LCBSC, self).__init__(*args, **kwargs)
if delta <= 0 or delta >= 1:
raise ValueError('Parameter delta must be in the interval (0,1)')
self.delta = delta
if isinstance(noise_cov, float) or isinstance(noise_cov, int):
Contributor

isinstance(noise_cov, (float, int)). This does not, however, detect numpy scalars, but that may not be an issue here.

Member Author

Oh, good, didn't know that.

I suppose we should generally add more argument checking for user-facing routines.
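
On the numpy-scalar caveat above: `numbers.Number` covers both built-in and numpy scalar types, so a slightly more general check could look like this (interpreting a scalar as isotropic noise is an assumed semantic):

```python
import numbers

import numpy as np

if isinstance(noise_cov, numbers.Number):
    # A plain or numpy scalar becomes an isotropic noise covariance;
    # model.input_dim follows the naming used elsewhere in this diff.
    noise_cov = np.eye(model.input_dim) * noise_cov
```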

#class BolfiAcquisition(SecondDerivativeNoiseMixin, LCB):
# """Default acquisition function for BOLFI.
# """
# pass

Contributor

We can just remove these I think.

self._rbf_woodbury_inv = self._gp.posterior.woodbury_inv
self._rbf_woodbury_chol = self._gp.posterior.woodbury_chol
self._rbf_x2sum = np.sum(self._gp.X**2., 1)[None, :]
self._rbf_is_cached = True
Contributor

When the GP model is updated and hence its hyperparameters change, should these be updated as well?

@vuolleko (Member Author) commented May 11, 2017

They will be, since self.is_sampling=False causes self._rbf_is_cached=False.
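
In other words, the cache is tied to sampling mode. A hedged sketch of the pattern (the `update` hook is illustrative; the cached attributes are the ones from the diff):

```python
import numpy as np

def update(self, x, y):
    # Any update to the GP drops out of sampling mode and invalidates the cache.
    self.is_sampling = False
    self._rbf_is_cached = False
    # ... refit the GP with the new evidence ...

def _cache_rbf_terms(self):
    # Recomputed lazily on the next gradient evaluation while sampling.
    self._rbf_woodbury_inv = self._gp.posterior.woodbury_inv
    self._rbf_woodbury_chol = self._gp.posterior.woodbury_chol
    self._rbf_x2sum = np.sum(self._gp.X**2., 1)[None, :]
    self._rbf_is_cached = True
```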

kernel = self.gp_params.get('kernel')
self._kernel_is_default = False

noise_var = self.gp_params.get('noise_var') or np.max(y)**2. / 100.
mean_function = self.gp_params.get('mean_function')
Contributor

Are noise_var and mean_function also taken care of when computing the gradient? It seems that only the kernel is checked to set the self._kernel_is_default flag, but does that also cover these?

Member Author

True, they may not be constant. I'll add a requirement that they be None when using the direct gradients.

bounds : list
The region where to estimate the posterior for each parameter in
model.parameters.
`[(lower, upper), ... ]`
initial_evidence : int, dict
Number of initial evidence or a precomputed batch dict containing parameter
and discrepancy values
- n_evidence : int
+ n_acq: int
Contributor

I noticed I had some inconsistencies here with n_evidence and n_acq. I wonder if we should only use n_evidence; it would then be easy for the user to know immediately how many evidence points they will have in the end (handy for estimating the time required to run the experiment). initial_evidence could also then default to None and be set by a heuristic depending on the dimensions, rather than being a fixed quantity.

Contributor

We also then wouldn't need _n_initial_runs, since the only thing that matters is n_evidence. In essence, all the evidence you gather becomes initial_evidence for the future, when you wish to continue from where you are now.

Member Author

Yes, sounds reasonable.
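
A sketch of those proposed semantics, with hypothetical names: n_evidence is the total budget, initial_evidence defaults to a dimension-based heuristic, and everything already gathered counts toward the budget:

```python
def fit(self, n_evidence):
    # Total evidence budget: initial points plus acquisitions.
    if self.initial_evidence is None:
        # Heuristic default scaling with dimension (assumed form).
        self.initial_evidence = 10 * self.model.input_dim
    # Acquire only what is still missing from the budget; evidence gathered
    # in an earlier run simply acts as initial evidence for this one.
    n_acq = max(n_evidence - self.n_gathered_evidence, 0)
    ...
```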

@jlintusaari (Contributor) left a comment

A lot of work put into this, great job!

"""Bayesian Optimization for Likelihood-Free Inference (BOLFI).

Approximates the discrepancy function by a stochastic regression model.
Discrepancy model is fit by sampling the discrepancy function at points decided by
the acquisition function.

- The implementation follows that of Gutmann & Corander, 2016.
+ The implementation loosely follows that of Gutmann & Corander, 2016.
@jlintusaari (Contributor) commented May 12, 2017

What about:

the method implements the framework introduced in ...

Only needed if model is an ElfiModel
- discrepancy_model : GPRegression, optional
+ target_model : GPyRegression, optional
Contributor

Maybe we could say here that in BOLFI the target model is the discrepancy model.


# We should be able to carry out the inference in less than six batches
assert res.populations[-1].n_batches < 6


@slow
@pytest.mark.usefixtures('with_all_clients')
- def test_bayesian_optimization():
+ def test_BOLFI():
Contributor

I think we should have a separate test for Bayesian optimization; now the method is untested. Maybe test_bayesian_optimization_and_bolfi: first run the BO test as it was, but then give the GP model to BOLFI and do the sampling with BOLFI. Then a separate quick test for running everything with BOLFI.
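
A sketch of that split, with assumed fixture and call names (ma2, infer, sample); target_model follows the naming from this diff:

```python
@slow
@pytest.mark.usefixtures('with_all_clients')
def test_bayesian_optimization_and_bolfi(ma2):
    # First exercise BayesianOptimization on its own, as before...
    bo = elfi.BayesianOptimization(ma2['d'], initial_evidence=20, batch_size=5)
    bo.infer(n_evidence=40)

    # ...then hand the fitted GP to BOLFI and do only the sampling part here.
    bolfi = elfi.BOLFI(ma2['d'], target_model=bo.target_model)
    post = bolfi.get_posterior(threshold=0.1)
    samples = bolfi.sample(200)
```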
