Other Optimization Methods

At its core, Krotov’s method is a gradient-based optimization method, and most directly compares to GRadient Ascent Pulse Engineering (GRAPE) KhanejaJMR05, another gradient-based method widely used in quantum control GlaserEPJD2015. We therefore compare to GRAPE first, and highlight the differences from gradient-free methods further below.

GRadient Ascent Pulse Engineering (GRAPE)

The GRAPE method considers the direct gradient $\partial J_T / \partial \epsilon_j$ with respect to each control parameter $\epsilon_j$. The value of the control parameter is then updated in the direction of the gradient, although in all practical applications, a numerical estimate of the Hessian $\partial^2 J_T / (\partial \epsilon_j \partial \epsilon_{j'})$ should also be included in the calculation of the update. The L-BFGS-B quasi-Newton method ByrdSJSC1995,ZhuATMS97 is most commonly used for this purpose.

The control parameter $\epsilon_j$ may be the value of a control field in a particular time interval. When the control field is a discretization of a time-continuous control, e.g. using the piecewise-constant scheme as in figkrotovscheme, and for typical functionals, the numerical effort required for the calculation of the gradient $\partial J_T / \partial \epsilon_j$ is nearly identical to the effort for a single iteration of Krotov’s method. In both cases, a forward and a backward propagation over the entire time grid is carried out, which almost entirely determines the numerical effort. This requirement results from the derivative of the complex overlaps $\tau_k$ between the propagated states $\{\ket{\phi_k(T)}\}$ and the target states $\{\ket{\phi_k^{\tgt}}\}$, on which the standard functionals are based, cf. Eq. tauk. The relevant term in the gradient is then KhanejaJMR05

$$\begin{split} \frac{\partial \tau_k}{\partial \epsilon_j} &= \frac{\partial}{\partial \epsilon_j} \big\langle \phi_k^{\tgt} \big\vert \Op{U}^{(i)}_{n_t-1} \dots \Op{U}^{(i)}_{j} \dots \Op{U}^{(i)}_{1} \big\vert \phi_k \big\rangle \\ &= \underbrace{\big\langle \phi_k^{\tgt} \big\vert \Op{U}^{(i)}_{n_t-1} \dots \Op{U}^{(i)}_{j+1}}_{\bra{\chi^{(i)}_k(t_{j+1})}} \, \frac{\partial\Op{U}^{(i)}_{j}}{\partial\epsilon_j} \, \underbrace{\Op{U}^{(i)}_{j-1} \dots \Op{U}^{(i)}_{1} \big\vert \phi_k \big\rangle}_{\ket{\phi^{(i)}_k(t_j)}}\,, \end{split}$$

with $\Op{U}^{(i)}_j$ the time evolution operator for the time interval $j$, using the guess controls in iteration (i) of the optimization. We end up with backward-propagated states $\ket{\chi^{(i)}_k(t_{j+1})}$ and forward-propagated states $\ket{\phi^{(i)}_k(t_j)}$. This is to be compared with the first-order update equation krotov_first_order_update for Krotov’s method.
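
For concreteness, the following is a minimal sketch of this scheme for a single state-to-state transition in a toy two-level system, driven by a piecewise-constant pulse. The physical parameters and the functional $J_T = 1 - |\tau|^2$ are illustrative assumptions, and the derivative of $\Op{U}^{(i)}_j$ is approximated to first order in the time step, so this is not a production GRAPE implementation::

    import numpy as np
    import scipy.linalg
    import scipy.optimize

    # Toy two-level system (all values are illustrative assumptions)
    sigma_z = np.diag([1.0, -1.0]).astype(complex)
    sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
    H0, H1 = 0.5 * sigma_z, sigma_x            # drift / control Hamiltonian
    psi0 = np.array([1, 0], dtype=complex)     # initial state
    psi_tgt = np.array([0, 1], dtype=complex)  # target state
    nt, dt = 50, 0.1                           # 50 time intervals of length dt

    def J_T_and_gradient(eps):
        """J_T = 1 - |tau|^2 and dJ_T/d(eps_j), via the scheme above."""
        U = [scipy.linalg.expm(-1j * (H0 + e * H1) * dt) for e in eps]
        fw = [psi0]                  # forward-propagated states phi(t_j)
        for U_j in U:
            fw.append(U_j @ fw[-1])
        bw = [psi_tgt]               # backward-propagated states chi(t_j)
        for U_j in reversed(U):
            bw.append(U_j.conj().T @ bw[-1])
        bw.reverse()
        tau = np.vdot(psi_tgt, fw[-1])
        grad = np.empty(nt)
        for j in range(nt):
            # dU_j/d(eps_j) ≈ -i dt H1 U_j, to first order in dt
            dtau = np.vdot(bw[j + 1], -1j * dt * H1 @ U[j] @ fw[j])
            grad[j] = -2 * (tau.conjugate() * dtau).real
        return 1 - abs(tau) ** 2, grad

    # L-BFGS-B combines the gradient with a quasi-Newton Hessian estimate
    result = scipy.optimize.minimize(
        J_T_and_gradient, 0.5 * np.ones(nt), jac=True, method="L-BFGS-B"
    )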

In this example of (discretized) time-continuous controls, GRAPE and Krotov’s method can generally be used interchangeably. The benefits of Krotov’s method compared to GRAPE are EitanPRA11:

  • Krotov’s method mathematically guarantees monotonic convergence in the continuous limit.
  • Krotov’s method does not require a line search to determine the ideal magnitude of the pulse update in the direction of the gradient.
  • The sequential nature of Krotov’s update scheme, where information from earlier times enters the update at later times, cf. figkrotovscheme, results in faster convergence than the concurrent update in GRAPE MachnesPRA11,JaegerPRA14. However, this advantage disappears as the optimization approaches the optimum EitanPRA11.
  • The choice of functional $J_T$ in Krotov’s method enters only in the boundary condition for the backward-propagated states, Eq. chi_boundary, while the update equation otherwise stays the same. In contrast, for functionals with a non-trivial dependence on the overlaps $\tau_k = \Braket{\phi_k^\tgt}{\phi_k(T)}$, the evaluation of the gradient in GRAPE may deviate significantly from its usual form, requiring a problem-specific implementation from scratch.

GRAPE has a significant advantage if the controls are not time-continuous, but physically piecewise constant (“bang-bang control”). The calculation of the GRAPE gradient is unaffected by this, whereas Krotov’s method can break down when the controls are not approximately continuous. QuTiP contains an implementation of GRAPE limited to this use case.

GRAPE (with the second-order derivative estimates of the L-BFGS-B algorithm) has been shown to converge faster than Krotov’s method when the optimization is close to the optimum. This is because Krotov’s method only considers the first-order gradient with respect to the control field EitanPRA11, and this derivative becomes vanishingly small close to the optimum. This is true even for the Krotov update with the additional non-linear term, discussed in SecondOrderKrotov. There, “second order” refers to the expansion of the functional with respect to the states, not to the order of the derivative.

GRAPE in QuTiP

An implementation of GRAPE is included in QuTiP, see the section on Quantum Optimal Control in the QuTiP docs. It is used via the qutip.control.pulseoptim.optimize_pulse function. However, some of the design choices in QuTiP's GRAPE effectively limit the routine to applications with physically piecewise-constant pulses (where GRAPE has an advantage over Krotov's method, as discussed in the previous section).
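
As a minimal sketch (assuming the QuTiP 4 interface, and with arbitrary physical parameters), a state-to-state GRAPE optimization in QuTiP might look as follows, here via the closely related optimize_pulse_unitary convenience function for Hamiltonian-based dynamics::

    import qutip
    from qutip.control import pulseoptim

    H_d = 0.5 * qutip.sigmaz()   # drift Hamiltonian (assumed)
    H_c = [qutip.sigmax()]       # list of control Hamiltonians
    psi0 = qutip.basis(2, 0)     # initial state
    psi_tgt = qutip.basis(2, 1)  # target state

    result = pulseoptim.optimize_pulse_unitary(
        H_d, H_c, psi0, psi_tgt,
        num_tslots=10,           # piecewise-constant pulse, few segments
        evo_time=5.0,            # total time T
        fid_err_targ=1e-4,       # stop at this fidelity error
        init_pulse_type="RND",   # guess pulse from a preset list of types
    )
    print(result.fid_err, result.final_amps.shape)

Note how init_pulse_type restricts the guess pulse to a preset list of shapes, one of the design choices discussed below.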

For discretized time-continuous pulses, the implementation of Krotov's method in .optimize_pulses has the following advantages over qutip.control.pulseoptim.optimize_pulse:

  • Krotov's method can optimize for more than one control field at the same time (hence the name of the routine .optimize_pulses, compared to qutip.control.pulseoptim.optimize_pulse).
  • Krotov's method optimizes a list of .Objective instances simultaneously. The optimization for multiple simultaneous objectives in QuTiP's GRAPE implementation is limited to optimizing a quantum gate. Other uses of simultaneous objectives, such as optimizing for robustness, are not available.
  • Krotov's method can start from an arbitrary set of guess controls. In the GRAPE implementation, guess pulses can only be chosen from a specific set of options (including "random"). Again, this makes sense for a control field that is piecewise constant with relatively few switching points, but is very disadvantageous for time-continuous controls.
  • Krotov's method has complete flexibility in which propagation method is used (via the propagator argument to .optimize_pulses), while QuTiP's GRAPE only allows choosing between a fixed set of time-propagation methods. Supplying a problem-specific propagator is not possible.

Thus, QuTiP's GRAPE implementation and the implementation of Krotov's method in this package complement each other, but do not allow for a direct comparison.
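
For contrast with the GRAPE sketch above, a skeleton of an optimization with .optimize_pulses follows. The physical model and parameter values are again arbitrary assumptions, but the structure (a list of objectives, an arbitrary guess control, a pluggable propagator, and a chi_constructor encoding the choice of $J_T$) reflects the points above::

    import numpy as np
    import qutip
    import krotov

    tlist = np.linspace(0, 5, 500)

    def guess_control(t, args):
        return 1e-2  # an arbitrary guess; any time-continuous shape works

    H = [0.5 * qutip.sigmaz(), [qutip.sigmax(), guess_control]]

    objectives = [
        krotov.Objective(
            initial_state=qutip.basis(2, 0), target=qutip.basis(2, 1), H=H
        )
    ]

    result = krotov.optimize_pulses(
        objectives,
        pulse_options={H[1][1]: dict(lambda_a=5.0, update_shape=lambda t: 1.0)},
        tlist=tlist,
        propagator=krotov.propagators.expm,          # any propagator may be plugged in
        chi_constructor=krotov.functionals.chis_ss,  # boundary condition for J_T
        iter_stop=10,
    )

Additional objectives or control fields would simply be appended to objectives, H, and pulse_options.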

Gradient-free optimization

In situations where the controls can be reduced to a relatively small number of controllable parameters (typically less than 20), gradient-free optimization becomes feasible. The most straightforward use case is a control with an analytic shape (maybe due to the constraints of an experimental setup) that leaves only a few free parameters. As an example, consider control pulses that are restricted to Gaussian pulses, so that the only free parameters are the peak amplitude and the pulse width. The control parameters are not required to be parameters of a time-dependent control, but may also be static parameters in the Hamiltonian, e.g. the polarization of the laser beams utilized in an experiment HornNJP2018.

A special case of gradient-free optimization is the Chopped RAndom Basis (CRAB) method DoriaPRL11,CanevaPRA2011. As the name implies, the essence of CRAB is the specific choice of parametrization in terms of a low-dimensional random basis. Thus, it can be used when the parametrization is not as “obvious” as in the case of direct free parameters in the pulse shape discussed above. The optimization itself is normally performed with the Nelder-Mead simplex method, based on this parametrization, although any other gradient-free method could be used as well.
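
As an illustration of the idea (a sketch of the parametrization only, not of QuTiP’s CRAB implementation), a control field might be expanded into a handful of harmonics with randomized frequencies, with the expansion coefficients as the optimization parameters::

    import numpy as np

    rng = np.random.default_rng(seed=1)
    T = 5.0          # total pulse duration (assumed)
    n_basis = 3      # number of harmonics in the "chopped" basis

    # principal frequencies 2*pi*n/T, each shifted by a random amount
    freqs = (
        2 * np.pi * (np.arange(1, n_basis + 1) + rng.uniform(-0.5, 0.5, n_basis)) / T
    )

    def crab_control(t, coeffs):
        """Pulse value at time t, from 2*n_basis coefficients (a_n, b_n)."""
        a, b = coeffs[:n_basis], coeffs[n_basis:]
        return sum(
            a[n] * np.cos(freqs[n] * t) + b[n] * np.sin(freqs[n] * t)
            for n in range(n_basis)
        )

The six coefficients would then be optimized with, e.g., the Nelder-Mead simplex method.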

An implementation of CRAB is included in QuTiP, see QuTiP's documentation of CRAB, and uses the same qutip.control.pulseoptim.optimize_pulse interface as the GRAPE method discussed above (GrapeInQutip) with the same limitations.

Gradient-free optimization does not require backward propagation, only a forward propagation of the initial states and the evaluation of an arbitrary functional $J_T$. It also does not require the storage of states. However, the number of iterations can grow extremely large, especially with an increasing number of control parameters. Thus, despite its cheaper iterations, an optimization with a gradient-free method is not necessarily more efficient overall than a gradient-based optimization with its much faster convergence. For only a few parameters, however, it can be highly efficient.

This makes gradient-free optimization useful for “pre-optimization”, that is, for finding guess controls that are then further optimized with a gradient-based method GoerzEPJQT2015. A further benefit of gradient-free optimization is that it can be applied to any functional, even if $\partial J_T / \partial \bra{\phi_k}$ or $\partial J_T / \partial \epsilon_j$ cannot be calculated.

A possible drawback of gradient-free optimization is that it is also prone to getting stuck in local minima. To some extent, this can be mitigated by trying different guess pulses, by re-parametrization RachPRA2015, or by using some of the global methods available in the NLopt package NLOpt.

Generally, gradient-free optimization can easily be realized directly in QuTiP or any other software package for the simulation of quantum dynamics, in two steps (a sketch follows below the list):

  • Write a function that takes an array of optimization parameters as input and returns a figure of merit. This function would, e.g., construct a numerical control pulse from the control parameters, simulate the dynamics using qutip.mesolve.mesolve, and evaluate a figure of merit (like the overlap with a target state).
  • Pass the function to scipy.optimize.minimize for gradient-free optimization.
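
A minimal sketch of these two steps, for the earlier example of a Gaussian pulse whose peak amplitude and width are the only free parameters (all physical parameters here are illustrative assumptions)::

    import numpy as np
    import qutip
    import scipy.optimize

    tlist = np.linspace(0, 10, 200)
    H0 = 0.5 * qutip.sigmaz()     # static Hamiltonian (assumed)
    H1 = qutip.sigmax()           # control operator (assumed)
    psi0 = qutip.basis(2, 0)      # initial state
    psi_tgt = qutip.basis(2, 1)   # target state

    def figure_of_merit(params):
        """Infidelity 1 - |<target|psi(T)>|^2 for a Gaussian pulse."""
        amplitude, width = params
        pulse = amplitude * np.exp(
            -((tlist - tlist[-1] / 2) ** 2) / (2 * width ** 2)
        )
        result = qutip.mesolve([H0, [H1, pulse]], psi0, tlist)
        return 1 - abs(psi_tgt.overlap(result.states[-1])) ** 2

    optimized = scipy.optimize.minimize(
        figure_of_merit, x0=[1.0, 1.5], method="Nelder-Mead"
    )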

The scipy.optimize.minimize function allows choosing between different optimization methods, with the Nelder-Mead simplex method being a standard choice for gradient-free optimization. There are also more advanced methods, such as Subplex in NLopt NLOpt, that may be worth exploring for improvements in numerical efficiency, and for additional functionality such as support for non-linear constraints.
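
For instance, reusing the hypothetical figure_of_merit function from the previous sketch, the Subplex algorithm is available through NLopt’s Python interface (the parameter bounds are illustrative assumptions)::

    import numpy as np
    import nlopt

    opt = nlopt.opt(nlopt.LN_SBPLX, 2)   # Subplex, 2 parameters
    # NLopt objectives receive (x, grad); grad is unused by
    # gradient-free algorithms
    opt.set_min_objective(lambda x, grad: figure_of_merit(x))
    opt.set_lower_bounds([0.0, 0.1])
    opt.set_upper_bounds([10.0, 5.0])
    opt.set_maxeval(1000)                # cap the number of evaluations
    x_opt = opt.optimize(np.array([1.0, 1.5]))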

Choosing an optimization method

[Figure figoctdecisiontree: Decision tree for the choice of an optimization method]

Whether to use a gradient-free optimization method, GRAPE, or Krotov’s method depends on the size of the problem, the requirements on the control pulse, and the optimization functional. Gradient-free methods should be used if the number of independent control parameters is smaller than 20, or if the functional is of a form that does not allow gradients to be calculated easily. It is always a good idea to use a gradient-free method to obtain improved guess pulses for use with a gradient-based method GoerzEPJQT2015.

GRAPE should be used if the control parameters are discrete, such as on a coarse-grained time grid, and if the derivative of $J_T$ with respect to each control parameter is easily computable. Note that the implementation provided in QuTiP is limited to state-to-state transitions and quantum gates, even though the method is generally applicable to a wider range of objectives.

Krotov’s method should be used if the control is near-continuous, and if the derivative of $J_T$ with respect to the states, Eq. chi_boundary, can be calculated. When these conditions are met, Krotov’s method gives excellent convergence. However, as discussed in the section GRAPE, it is often observed to slow down when getting close to the minimum of $J_T$, since the first-order derivative vanishes close to the optimum. For the “best of both worlds”, it can be beneficial to switch from Krotov’s method to GRAPE with L-BFGS-B in the final stage of the optimization MachnesPRA11. It has also been proposed to modify Krotov’s method to include information from the quasi-Hessian EitanPRA11.

The decision tree in figoctdecisiontree can guide the choice of an optimization method. The key factor in deciding between gradient-free and gradient-based methods is the number of control parameters. For gradient-free optimization, CRAB’s random parametrization is useful when there is no obviously better parametrization of the control; if, e.g., the control is restricted to an analytic pulse shape, one would instead directly optimize the free parameters of that pulse shape. For gradient-based methods, the decision between GRAPE and Krotov’s method depends mainly on whether the pulses are approximately time-continuous (up to discretization), or of bang-bang type.