# Optimize Acquisition Functions using CMA-ES

In this tutorial, we show how to use an external optimizer (in this case [CMA-ES](https://en.wikipedia.org/wiki/CMA-ES)) to optimize botorch acquisition functions. CMA-ES is a zeroth-order optimizer: it uses only function evaluations and does not require gradient information. This makes it useful when gradient information about the function to be optimized is unavailable.

In botorch, we typically do have gradient information available (thanks, autograd!), and one is generally better off using this information rather than ignoring it. However, for certain custom models or acquisition functions, we may not be able to backpropagate through the acquisition function and/or model. In such instances, using a zeroth-order optimizer is appropriate.

For this example we use the [PyCMA](https://github.com/CMA-ES/pycma) implementation of CMA-ES. PyCMA is easily installed via pip by running `pip install cma`.

### Starting point

Let's assume for the purpose of this tutorial that
- `acq_function` is an instance of a botorch `AnalyticAcquisitionFunction`, for instance `UpperConfidenceBound`.
- `X_init` is a `d`-dim torch Tensor that we use as the initial condition for the CMA-ES algorithm.

**Note:** Relative to sequential evaluations, parallel evaluations of the acquisition function are extremely fast in botorch (due to automatic parallelization across batch dimensions). To exploit this, we use the "ask/tell" interface to `cma`, which lets us batch-evaluate the whole CMA-ES population in parallel.
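To see the ask/tell pattern in isolation, here is a minimal sketch built on a toy evolution strategy. It is deliberately *not* CMA-ES: the class `ToyAskTellES`, its update rule, and the stand-in objective `batch_objective` are all hypothetical names made up for illustration. The only point is the control flow: the optimizer proposes a whole population, the caller scores it in a single batched call, and the scores are handed back.

```python
import random


class ToyAskTellES:
    """Toy evolution strategy with an ask/tell interface (illustrative, NOT CMA-ES)."""

    def __init__(self, x0, sigma0, popsize, max_iter, seed=0):
        self.mean = list(x0)
        self.sigma = sigma0
        self.popsize = popsize
        self.max_iter = max_iter
        self.iteration = 0
        self.best_x, self.best_f = None, float("inf")
        self._rng = random.Random(seed)

    def stop(self):
        return self.iteration >= self.max_iter

    def ask(self):
        # Sample a population around the current mean, clipped to [0, 1]^d.
        return [
            [min(1.0, max(0.0, m + self.sigma * self._rng.gauss(0.0, 1.0))) for m in self.mean]
            for _ in range(self.popsize)
        ]

    def tell(self, xs, ys):
        # Rank candidates by objective value (lower is better).
        ranked = sorted(zip(ys, xs), key=lambda pair: pair[0])
        if ranked[0][0] < self.best_f:
            self.best_f, self.best_x = ranked[0]
        # Move the mean to the average of the best quarter and shrink the step size.
        elite = [x for _, x in ranked[: max(1, self.popsize // 4)]]
        self.mean = [sum(coords) / len(elite) for coords in zip(*elite)]
        self.sigma *= 0.9
        self.iteration += 1


def batch_objective(xs):
    # Stand-in for a batched acquisition call: one call scores the whole population.
    return [sum((xi - 0.5) ** 2 for xi in x) for x in xs]


es = ToyAskTellES(x0=[0.8, 0.2], sigma0=0.25, popsize=20, max_iter=30)
while not es.stop():
    xs = es.ask()             # optimizer proposes candidates
    ys = batch_objective(xs)  # caller evaluates all of them at once
    es.tell(xs, ys)           # optimizer updates its internal state

best_x, best_f = es.best_x, es.best_f
```

Because the objective is evaluated outside the optimizer, the caller is free to decide how the population is scored, which is what makes a single batched acquisition function call possible.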

In this example we use an initial standard deviation $\sigma_0 = 0.25$ and a population size $\lambda = 100$.
We also constrain the input `X` to the unit cube `[0, 1]^d`.
See `cma`'s [API Reference](http://cma.gforge.inria.fr/apidocs-pycma/cma.evolution_strategy.CMAEvolutionStrategy.html) for more information on these options.
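For a sense of scale, the sketch below compares the population size used here with the commonly cited CMA-ES default, $\lambda = 4 + \lfloor 3 \ln d \rfloor$. It also spells out the unit-cube constraint as per-coordinate bounds. The dimension `d = 6` and the helper `default_popsize` are assumptions for illustration only. The snippet does not call `cma`, so check the options against the API reference before relying on them.

```python
import math


def default_popsize(d):
    # Commonly cited CMA-ES default: lambda = 4 + floor(3 * ln(d))
    return 4 + int(3 * math.log(d))


d = 6  # hypothetical input dimension, chosen only for this example

# Options dict in the [lower, upper] form, where each entry is a scalar or a
# list with one value per coordinate; for the unit cube this matches [0, 1].
inopts = {
    "bounds": [[0.0] * d, [1.0] * d],
    "popsize": 100,
}

print(default_popsize(d), inopts["popsize"])  # prints: 9 100
```

A population of 100 is much larger than this default for small `d`. That is affordable here because the whole population is scored in one batched call.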

With these settings, we can optimize the acquisition function as follows:

In [None]:
import numpy as np
import torch
import cma

# convert the initial condition to a numpy array
x0 = X_init.detach().cpu().double().numpy()

# create the CMA-ES optimizer
es = cma.CMAEvolutionStrategy(
    x0=x0,
    sigma0=0.25,
    inopts={"bounds": [0, 1], "popsize": 100},
)

# speed things up by telling pytorch not to build a compute graph in the background
with torch.no_grad():

    # Run the optimization loop using the ask/tell interface -- this uses
    # PyCMA's default settings, see the PyCMA documentation for how to modify these
    while not es.stop():
        xs = es.ask()  # ask for new points to evaluate (a list of numpy arrays)
        # stack into a `popsize x d` Tensor for evaluating the acquisition function
        X = torch.from_numpy(np.stack(xs)).to(device=X_init.device, dtype=X_init.dtype)
        # evaluate the acquisition function (analytic ones expect `b x 1 x d` input)
        Y = -acq_function(X.unsqueeze(-2))  # the optimizer assumes we're minimizing
        y = Y.view(-1).double().cpu().numpy()  # convert result into a numpy array
        es.tell(xs, y)  # return the result to the optimizer

# convert result back to torch Tensor
best_x = torch.from_numpy(es.best.x)
# make sure we have the right device and data type
best_x = best_x.to(device=X_init.device, dtype=X_init.dtype)