Krotov’s Method

Optimization functional

Quantum optimal control methods formalize the problem of finding "control fields" that achieve some physical objective, using the time evolution of a quantum system, including possible constraints. The most direct example is a state-to-state transition, that is, for a known quantum state at time zero to evolve to a specific target state at final time T, controlling, e.g. a chemical reaction TannorJCP1985. Another example is the realization of quantum gates, the building blocks of a quantum computer. In this case, the states forming a computational basis must transform according to a unitary transformation NielsenChuang. The control fields might be the amplitudes of a laser pulse, for the control of a molecular system, RF fields for nuclear magnetic resonance, or microwave fields for superconducting circuits. There may be multiple independent controls involved in the dynamics, such as different color lasers used in the excitation of a Rydberg atom, or different polarization components of an electric field.

The quantum control methods build on a rich field of classical control theory BellmanBook,PontryaginBook. This includes Krotov’s method Krotov.book, which was originally formulated to optimize the soft landing of a spacecraft from orbit to the surface of a planet KonnovARC99, before being applied to quantum mechanical problems SklarzPRA2002. Fundamentally, they rely on the variational principle, that is, the minimization of a functional $J[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}]$ that includes any required constraints via Lagrange multipliers. The condition for minimizing J is then $\nabla_{\phi_k, \epsilon_l} J = 0$. In rare cases, the variational calculus can be solved in closed form, based on Pontryagin’s maximum principle PontryaginBook. Numerical methods are required in any other case. These start from an initial guess control (or set of guess controls, if there are multiple controls), and calculate an update to these controls that will decrease the value of the functional. The updated controls then become the guess for the next iteration of the algorithm, until the value of the functional is sufficiently small, or convergence is reached.

Mathematically, Krotov’s method, when applied to quantum systems Tannor92,ReichJCP12, minimizes a functional of the most general form

$$J[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}] = J_T(\{\ket{\phi_k^{(i)}(T)}\}) + \sum_l \int_0^T g_a(\epsilon_l^{(i)}(t)) \dd t + \int_0^T g_b(\{\phi^{(i)}_k(t)\}) \dd t\,,$$

where the $\{\ket{\phi_k^{(i)}(T)}\}$ are the time-evolved initial states $\{\ket{\phi_k}\}$ under the (guess) controls {ϵl(i)(t)} of the i’th iteration. In the simplest case of a single state-to-state transition, the index k vanishes. For the example of a two-qubit quantum gate, $\{\ket{\phi_k}\}$ would be the logical basis states $\ket{00}$, $\ket{01}$, $\ket{10}$, and $\ket{11}$. The sum over l vanishes if there is only a single control. For open system dynamics, the states $\{\ket{\phi_k}\}$ may be density matrices.
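As a concrete illustration, consider how such an optimization problem might be set up with the krotov package (which builds on QuTiP). The following sketch defines a single state-to-state objective for a two-level system; the Hamiltonian and the guess field are invented for illustration and not taken from any particular example:

```python
import numpy as np
import qutip
import krotov

# Illustrative two-level system: drive a |0> -> |1> transition.
# The control enters as H = H0 + eps(t) * H1.
def guess_field(t, args):
    """Guess control field (purely illustrative)."""
    return 0.5 * np.exp(-40.0 * (t / 5.0 - 0.5) ** 2)

H0 = 0.5 * qutip.sigmaz()            # drift Hamiltonian
H1 = qutip.sigmax()                  # control Hamiltonian, couples to eps(t)
H = [H0, [H1, guess_field]]          # nested-list format used by QuTiP/krotov

objectives = [
    krotov.Objective(
        initial_state=qutip.basis(2, 0),   # |phi_k> at t = 0
        target=qutip.basis(2, 1),          # |phi_k^tgt>
        H=H,
    )
]
```

For a two-qubit gate, there would be one Objective per logical basis state, all sharing the same Hamiltonian; for open system dynamics, the initial and target states would be density matrices (see Optimization in Liouville space below).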

The functional consists of three parts:

  • A final-time functional JT. This is the "main" part of the functional, and we can usually think of J as being an auxiliary functional in the optimization of JT.
  • A running cost on the control fields, ga. The most commonly used expression (and the only one currently supported by the krotov package) is PalaoPRA2003


    $$g_a(\epsilon_l^{(i+1)}(t)) = \frac{\lambda_{a,l}}{S_l(t)} \Delta\epsilon_l^2(t)\,, \quad \Delta\epsilon_l(t) \equiv \epsilon_l^{(i+1)}(t) - \epsilon_l^{(i)}(t)\,.$$

    This introduces two parameters for each control, the (inverse) Krotov "step width" λa, l and the update-shape function Sl(t) ∈ [0, 1]. Δϵl(t) is the update of the control in a single iteration of the optimization algorithm. As we will see below, λa, l determines the overall magnitude of Δϵl(t), and Sl(t) can be used to ensure boundary conditions on ϵl(i + 1)(t); a short code sketch of how these two parameters are specified follows this list.

  • An optional state-dependent running cost, gb, can be employed, e.g., for penalizing population in a subspace PalaoPRA2008. This is rarely used, as there are other methods to achieve the same effect, like using a non-Hermitian Hamiltonian to remove population from the forbidden subspace during the time evolution. Currently, the krotov package only supports gb ≡ 0.
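As referenced in the ga item above, λa, l and Sl(t) are specified per control. In the krotov package, this happens via the pulse_options mapping that is later passed to krotov.optimize_pulses. The following is a minimal sketch with an invented guess field and a simple sine-squared update shape (the numerical value of lambda_a is arbitrary):

```python
import numpy as np

def guess_field(t, args):
    """Guess control field (illustrative)."""
    return 0.5 * np.exp(-40.0 * (t / 5.0 - 0.5) ** 2)

def S(t, T=5.0, t_r=0.5):
    """Update shape S(t) in [0, 1]: zero at t=0 and t=T, with sine-squared ramps."""
    if t < t_r:
        return np.sin(np.pi * t / (2 * t_r)) ** 2
    if t > T - t_r:
        return np.sin(np.pi * (T - t) / (2 * t_r)) ** 2
    return 1.0

# lambda_a and the update shape are set per control; the key is the control
# itself, as it appears in the objectives' Hamiltonians (here: guess_field)
pulse_options = {guess_field: dict(lambda_a=5.0, update_shape=S)}
```

Every control that occurs in the objectives needs exactly one such entry; the mapping is then passed as the pulse_options argument of krotov.optimize_pulses.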

The most commonly used final-time functionals (cf. krotov.functionals) optimize for a set of initial states $\{\ket{\phi_k}\}$ to evolve to a set of target states $\{\ket{\phi_k^\tgt}\}$. The functionals can then be expressed in terms of the complex overlaps of the target states with the final-time states under the given control. Thus,

$$\tau_k = \Braket{\phi_k^\tgt}{\phi_k(T)}$$

in Hilbert space, or

$$\tau_k = \langle\!\langle \Op{\rho}_k^{\tgt} \vert \Op{\rho}_k(T) \rangle\!\rangle \equiv \tr\left[\Op{\rho}_k^{\tgt\,\dagger} \Op{\rho}_k(T) \right]$$

in Liouville space.
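In code, these overlaps are ordinary inner products. A minimal NumPy sketch with invented example states:

```python
import numpy as np

# Hilbert space: tau_k = <phi_k^tgt | phi_k(T)>
phi_T = np.array([0.6, 0.8j])                  # forward-propagated state at t = T
phi_tgt = np.array([0.0, 1.0])                 # target state
tau = np.vdot(phi_tgt, phi_T)                  # np.vdot conjugates its first argument

# Liouville space: tau_k = tr[rho_k^tgt(dagger) rho_k(T)]
rho_T = np.array([[0.2, 0.1j], [-0.1j, 0.8]])
rho_tgt = np.array([[0.0, 0.0], [0.0, 1.0]])
tau_L = np.trace(rho_tgt.conj().T @ rho_T)
```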

The following functionals JT can be formed from these complex overlaps, taking into account that any optimization functional JT must be real. They differ by the way they treat the phases φk in the physical optimization goal $\ket{\phi_k(T)} \overset{!}{=} e^{i\varphi_k}\ket{\phi_k^{\tgt}}$ PalaoPRA2003:

  • Optimize for simultaneous state-to-state transitions, with completely arbitrary phases φk,


    $$J_{T,\text{ss}} = 1- \frac{1}{N} \sum_{k=1}^{N} \Abs{\tau_k}^2\,,$$

    cf. krotov.functionals.J_T_ss.

  • Optimize for simultaneous state-to-state transitions, with an arbitrary global phase, i.e., φk = φglobal for all k with arbitrary φglobal,


    $$J_{T,\text{sm}} = 1- \frac{1}{N^2} \Abs{\sum_{k=1}^{N} \tau_k}^2 = 1- \frac{1}{N^2} \sum_{k=1}^{N} \sum_{k'=1}^{N} \tau_{k'}^* \tau_{k}\,,$$

    cf. krotov.functionals.J_T_sm.

  • Optimize for simultaneous state-to-state transitions, with a global phase of zero, i.e., φk = 0 for all k,


    $$J_{T,\text{re}} = 1-\frac{1}{N} \Re \left[\, \sum_{k=1}^{N} \tau_k \,\right]\,,$$

    cf. krotov.functionals.J_T_re.
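The three functionals differ only in how the overlaps τk are combined. As a direct transcription of the formulas above (with arbitrary example values for the τk):

```python
import numpy as np

taus = np.array([0.9 + 0.1j, 0.8 - 0.2j, 0.95, 0.85 + 0.05j])   # example overlaps
N = len(taus)

J_T_ss = 1 - np.sum(np.abs(taus) ** 2) / N      # phases phi_k completely arbitrary
J_T_sm = 1 - abs(np.sum(taus)) ** 2 / N ** 2    # arbitrary global phase
J_T_re = 1 - np.sum(taus).real / N              # global phase zero
```

The krotov package provides these functionals as krotov.functionals.J_T_ss, J_T_sm, and J_T_re, which operate on the propagated states and objectives rather than on precomputed overlaps.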

Update equation

Krotov’s method is based on a rigorous examination of the conditions for calculating the updated fields ϵl(i + 1)(t) such that $J(\{\ket{\phi_k^{(i+1)}(t)}\}, \{\epsilon_l^{(i+1)}\}) \leq J(\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}\})$ is true by construction Krotov.book,KonnovARC99,PalaoPRA2003,ReichJCP12,SklarzPRA2002. It achieves this by adding a vanishing quantity to the functional that disentangles the implicit dependence of $\{\ket{\phi_k}\}$ and {ϵl(t)} in the variational calculus. Specifically, the derivation formulates an auxiliary functional $L[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}, \Phi]$ that is equivalent to $J[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}]$, but includes an arbitrary scalar potential Φ. The freedom in this scalar potential is then used to formulate a condition to ensure monotonic convergence,

$$\begin{aligned} \left.\frac{\partial g_a}{\partial \epsilon}\right\vert_{\epsilon^{(i+1)}(t)} = 2 \Im \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle\,. \end{aligned}$$

For ga as in Eq. g_a, this condition becomes the Krotov update equation Tannor92,PalaoPRA2003,SklarzPRA2002,

$$\begin{aligned} \Delta\epsilon(t) = \frac{S(t)}{\lambda_a} \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,, \end{aligned}$$

with the equation of motion for the forward propagation of $\ket{\phi_k}$ under the optimized controls ϵ(i + 1)(t) of the iteration (i),

$$\frac{\partial}{\partial t} \Ket{\phi_k^{(i+1)}(t)} = -\frac{\mathrm{i}}{\hbar} \Op{H}^{(i+1)} \Ket{\phi_k^{(i+1)}(t)}\,.$$

For the moment, we have assumed unitary dynamics; the generalization to open system dynamics will be discussed later in this section. The co-states $\ket{\chi_k^{(i)}(t)}$ are propagated backwards in time under the guess controls of iteration (i), i.e., the optimized controls from the previous iteration, as

$$\frac{\partial}{\partial t} \Ket{\chi_k^{(i)}(t)} = -\frac{\mathrm{i}}{\hbar} \Op{H}^{\dagger\,(i)} \Ket{\chi_k^{(i)}(t)} + \left.\frac{\partial g_b}{\partial \Bra{\phi_k}}\right\vert_{\phi^{(i)}(t)}\,,$$

with the boundary condition

$$\Ket{\chi_k^{(i)}(T)} = - \left.\frac{\partial J_T}{\partial \Bra{\phi_k}} \right\vert_{\phi^{(i)}(T)}\,.$$
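For example, for JT, re the boundary condition evaluates to $\ket{\chi_k(T)} = \frac{1}{2N} \ket{\phi_k^\tgt}$, independent of the forward-propagated states. A minimal sketch of such a "chi constructor" (the krotov package ships equivalent helpers with a richer interface, e.g. krotov.functionals.chis_re):

```python
import numpy as np

def chis_re(target_states):
    """Boundary co-states for J_T_re: chi_k(T) = (1/(2N)) |phi_k^tgt>."""
    N = len(target_states)
    return [tgt / (2 * N) for tgt in target_states]

# illustrative targets for N = 2 simultaneous state-to-state transitions
targets = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
chis_T = chis_re(targets)
```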

Here, and in the following, we have dropped the index l of the controls and the associated λa, l and Sl(t); all equations are valid for each individual control.

Frequently, the control field ϵ(t) is required to be zero at t = 0 and t = T, so that the field switches on and off smoothly. To ensure that the update maintains this behavior, S(t) ∈ [0, 1] is chosen as a function with the same boundary conditions. A typical example is a flattop function (cf. krotov.shapes.flattop),

$$\begin{aligned} S(t) = \begin{cases} B(t; t_0=0, t_1=2 t_{\text{on}}) & \text{for} \quad 0 < t < t_{\text{on}} \\ 1 & \text{for} \quad t_{\text{on}} \le t \le T - t_{\text{off}} \\ B(t; t_0=T-2 t_{\text{off}}, t_1=T) & \text{for} \quad T - t_{\text{off}} < t < T\,, \end{cases} \end{aligned}$$

with the Blackman shape B(t; t0, t1) (cf. krotov.shapes.blackman), which is similar to a Gaussian, but exactly zero at t = t0, t1.
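A hand-rolled implementation of this update shape, following the equation above directly (the krotov package provides equivalent, more general helpers; the functions below are only a sketch, using the common a = 0.16 Blackman coefficient):

```python
import numpy as np

def blackman(t, t0, t1, a=0.16):
    """Blackman shape B(t; t0, t1): Gaussian-like, exactly zero at t = t0 and t = t1."""
    x = (t - t0) / (t1 - t0)
    return 0.5 * (1 - a - np.cos(2 * np.pi * x) + a * np.cos(4 * np.pi * x))

def flattop(t, T, t_on, t_off):
    """Update shape S(t): Blackman ramps of duration t_on/t_off, flat (= 1) in between."""
    if 0 <= t < t_on:
        return blackman(t, 0.0, 2 * t_on)
    if t_on <= t <= T - t_off:
        return 1.0
    if T - t_off < t <= T:
        return blackman(t, T - 2 * t_off, T)
    return 0.0
```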

The scaling factor λa controls the overall magnitude of the pulse update, thereby taking the role of an (inverse) "step size". Values that are too large will change ϵ(i)(t) by only a small amount in every iteration, causing slow convergence. Values that are too small will result in numerical instability, see the section Choice of λₐ below.

The coupled equations krotov_first_order_update-chi_boundary can be generalized to open system dynamics by replacing Hilbert space states with density matrices, $\Op{H}$ with $i \Liouville$, and brakets with Hilbert-Schmidt products, ⟨ ⋅ | ⋅ ⟩ → ⟨​⟨ ⋅ | ⋅ ⟩​⟩. In full generality, $\Op{H}$ in Eq. krotov_first_order_update is the operator H on the right-hand side of whatever equation of motion governs the forward propagation of the states, written in the form iℏϕ̇ = Hϕ, cf. Eq. fw_eqm, see krotov.mu. Note also that the backward propagation Eq. bw_eqm uses the adjoint operator, which is relevant both for a dissipative Liouvillian BartanaJCP93,OhtsukiJCP99,GoerzNJP2014 and a non-Hermitian Hamiltonian MullerQIP11,GoerzQST2018.

Optimization of non-linear problems or non-convex functionals

The condition krotov_proto_update and the update Eq. krotov_first_order_update are based on a first-order expansion of the auxiliary potential Φ with respect to the states. This first order is sufficient if the equation of motion is linear ($\Op{H}$ does not depend on the states $\ket{\phi_k(t)}$), the functional JT is convex (all the "standard" functionals for quantum control are convex), and no state-dependent constraints are used (gb ≡ 0). When these conditions are not fulfilled, it is still possible to derive an optimization algorithm with monotonic convergence via a second term in Eq. krotov_proto_update KonnovARC99,ReichJCP12,

$$\begin{aligned} \begin{split} \left.\frac{\partial g_a}{\partial \epsilon}\right\vert_{\epsilon^{(i+1)}(t)} & = 2 \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right. \\ & \qquad \left. + \frac{1}{2} \sigma(t) \Bigg\langle \Delta\phi_k(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,, \end{split} \end{aligned}$$

with

$$\ket{\Delta \phi_k(t)} \equiv \ket{\phi_k^{(i+1)}(t)} - \ket{\phi_k^{(i)}(t)}\,.$$

This second term is the "non-linear" or "second order" contribution. The corresponding update equation, assuming Eq. g_a, is

$$\begin{aligned} \begin{split} \Delta\epsilon(t) & = \frac{S(t)}{\lambda_a} \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right. \\ & \qquad \qquad \quad \left. + \frac{1}{2} \sigma(t) \Bigg\langle \Delta\phi_k(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,. \end{split} \end{aligned}$$

The prefactor σ(t) to the second order update is a scalar function that must be chosen appropriately to ensure monotonic convergence.

As shown in Ref. ReichJCP12, it is possible to numerically approximate σ(t). In Refs. WattsPRA2015,GoerzPRA2015, non-convex final-time functionals that depend higher than quadratically on the states are considered, for a standard equation of motion given by a linear Schrödinger equation. In this case,


$$\sigma(t) \equiv -\max\left(\varepsilon_A, 2A + \varepsilon_A\right)\,,$$

where εA is a small non-negative number that can be used to enforce strict inequality in the second order optimality condition. The optimal value for A in each iteration can be approximated numerically as ReichJCP12

$$A = \frac{\sum_{k=1}^{N} 2 \Re\left[ \langle \chi_k(T) \vert \Delta\phi_k(T) \rangle \right] + \Delta J_T} {\sum_{k=1}^{N} \Abs{\Delta\phi_k(T)}^2} \,,$$

cf. krotov.second_order.numerical_estimate_A, with


$$\Delta J_T \equiv J_T(\{\phi_k^{(i+1)}(T)\}) - J_T(\{\phi_k^{(i)}(T)\})\,.$$
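Written out, the estimate is a short computation over the final-time states. The following NumPy sketch implements the formula for A and the resulting σ (the arrays are invented placeholders; within the package, krotov.second_order.numerical_estimate_A performs this kind of estimate):

```python
import numpy as np

def estimate_A(chis_T, delta_phis_T, delta_J_T):
    """Estimate A from chi_k(T), Delta phi_k(T), and Delta J_T (equation above)."""
    numerator = sum(2 * np.real(np.vdot(chi, dphi))
                    for chi, dphi in zip(chis_T, delta_phis_T)) + delta_J_T
    denominator = sum(np.linalg.norm(dphi) ** 2 for dphi in delta_phis_T)
    return numerator / denominator

eps_A = 0.0                          # small non-negative parameter epsilon_A
A = estimate_A(chis_T=[np.array([0.1, 0.05j])],
               delta_phis_T=[np.array([0.02, -0.01j])],
               delta_J_T=-1e-3)
sigma = -max(eps_A, 2 * A + eps_A)   # sigma(t), constant in t for this estimate
```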

See the example in /notebooks/07_example_PE.ipynb.

Note

Even when the second order update equation is mathematically required to guarantee monotonic convergence, very often an optimization with the first-order update equation krotov_first_order_update will give converging results. Since the second order update requires more numerical resources (calculation and storage of the states $\ket{\Delta\phi_k(t)}$), you should always try the optimization with the first-order update equation first.

Time discretization

Figure: Sequential update scheme in Krotov’s method on a time grid.

The derivation of Krotov’s method assumes time-continuous control fields. Only in this case is monotonic convergence mathematically guaranteed. However, for practical numerical applications, we have to consider controls on a discrete time grid with nt points running from t = 0 to t = T, with a time step dt. The states are defined on the points of the time grid, while the controls are assumed to be constant on the intervals of the time grid. See the notebook Time Discretization in Quantum Optimal Control for details. This discretization yields the numerical scheme shown in the figure above. It proceeds as follows PalaoPRA2003 (a schematic code sketch is given after the list):

  1. Construct the states $\ket{\chi^{(i)}_k(T)}$ according to Eq. chi_boundary. These typically depend on the states $\{\ket{\phi^{(i)}_k(T)}\}$ forward-propagated under the optimized pulse from the previous iteration, that is, the guess pulse in the current iteration.
  2. Perform a backward-propagation using Eq. bw_eqm as the equation of motion over the entire time grid. The resulting state at each point in the time grid must be stored in memory.
  3. Starting from the known initial states $\ket{\phi_k} = \ket{\phi_k(t=0)}$, calculate the pulse update for the first time step according to Eq. krotov_first_order_update, with t = dt/2 on the left-hand side (representing the first interval in the time grid, on which the control pulse is defined), and t = 0 on the right-hand side (representing the first point on the time grid). This approximation of t ≈ t + dt/2 is what constitutes the "time discretization" mathematically. It resolves the seeming contradiction in the time-continuous Eq. krotov_first_order_update that the calculation of ϵ(i + 1)(t) requires knowledge of the states $\ket{\phi_k^{(i+1)}(t)}$ obtained from a propagation under ϵ(i + 1)(t).
  4. Use the updated field ϵ(i + 1)(dt/2) for the first interval to propagate $\ket{\phi_k(t=0)}$ for a single time step to $\ket{\phi_k^{(i+1)}(t=\dd t)}$, with Eq. fw_eqm as the equation of motion. The updates then proceed sequentially, until the final forward-propagated state $\ket{\phi^{(i+1)}_k(T)}$ is reached.
  5. The updated control field becomes the guess control for the next iteration of the algorithm, starting again at step 1. The optimization continues until the value of the functional JT falls below some predefined threshold, or convergence is reached, i.e., ΔJT approaches zero so that no further significant improvement of JT is to be expected.
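The following schematic Python sketch summarizes one iteration of this scheme for a single control. It is pseudocode-like: the propagation routines, the chi constructor, and mu (the application of ∂H/∂ε to a state) are placeholder callables, not part of any actual API:

```python
import numpy as np

def krotov_iteration(eps_guess, phis0, phis_T_guess, tlist, lambda_a, S,
                     prop_bw_step, prop_fw_step, calc_chis_T, mu):
    """One iteration of the sequential Krotov update (first order, schematic).

    eps_guess:     guess control on the nt-1 intervals of tlist
    phis0:         initial states |phi_k(0)>
    phis_T_guess:  |phi_k(T)> from the previous iteration's forward propagation
    prop_*_step, calc_chis_T, mu: placeholder callables (not a real API)
    """
    nt = len(tlist)
    dt = tlist[1] - tlist[0]

    # Steps 1 and 2: boundary co-states and backward propagation under the
    # guess control; the co-states at every grid point must be stored.
    chis = [None] * nt
    chis[-1] = calc_chis_T(phis_T_guess)          # chi_k(T) from -dJ_T/d<phi_k|
    for n in reversed(range(nt - 1)):
        chis[n] = [prop_bw_step(chi, eps_guess[n], dt) for chi in chis[n + 1]]

    # Steps 3 and 4: sequential pulse update and forward propagation
    eps_opt = np.empty(nt - 1)
    phis = list(phis0)
    for n in range(nt - 1):
        t_mid = tlist[n] + dt / 2                  # controls live on the intervals
        grad = sum(np.imag(np.vdot(chi, mu(phi)))  # Im <chi_k | dH/deps | phi_k>
                   for chi, phi in zip(chis[n], phis))
        eps_opt[n] = eps_guess[n] + (S(t_mid) / lambda_a) * grad
        phis = [prop_fw_step(phi, eps_opt[n], dt) for phi in phis]

    return eps_opt, phis   # updated control; |phi_k(T)> for the next iteration
```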

For multiple objectives, the scheme can run in parallel, and each objective contributes a term to the update. Summation of these terms yields the sum in Eq. krotov_first_order_update. See krotov.parallelization for details. For a second-order update, the forward propagated states from step 4, both for the current iteration and the previous iteration, must be stored in memory over the entire time grid.

Choice of λₐ

The monotonic convergence of Krotov's method is only guaranteed in the continuous limit; a coarse time step must be compensated by larger values of the inverse step size λa, slowing down convergence. Generally, choosing λa too small will lead to numerical instabilities and unphysical features in the optimized pulse. A lower limit for λa can be determined from the requirement that the change Δϵ(t) should be at most of the same order of magnitude as the guess pulse ϵ(i)(t) for that iteration. The Cauchy-Schwarz inequality applied to the update equation yields

$$\Norm{\Delta \epsilon(t)}_{\infty} \le \frac{\Norm{S(t)}}{\lambda_a} \sum_{k} \Norm{\ket{\chi_k (t)}}_{\infty} \Norm{\ket{\phi_k (t)}}_{\infty} \Norm{\frac{\partial \Op{H}}{\partial \epsilon}}_{\infty} \stackrel{!}{\le} \Norm{\epsilon^{(i)}(t)}_{\infty}\,,$$

where $\norm{\partial \Op{H}/\partial \epsilon}_{\infty}$ denotes the supremum norm (with respect to time) of the operator norms of the operators $\partial \Op{H}/\partial \epsilon$ obtained at time t. Since S(t) ∈ [0, 1] and $\ket{\phi_k}$ is normalized, the condition for λa becomes

$$\lambda_a \ge \frac{1}{\Norm{\epsilon^{(i)}(t)}_{\infty}} \left[ \sum_{k} \Norm{\ket{\chi_k(t)}}_{\infty} \right] \Norm{\frac{\partial \Op{H}}{\partial \epsilon}}_{\infty}\,.$$

From a practical point of view, the best strategy is to start the optimization with a comparatively large value of λa, and after a few iterations lower λa as far as possible without introducing numerical instabilities. The value of λa may be adjusted dynamically with respect to the rate of convergence. Generally, the optimal choice of λa requires some trial and error.

Rotating wave approximation

When using the rotating wave approximation (RWA), it is important to remember that the target states are usually defined in the lab frame, not in the rotating frame. This is relevant for the construction of $\ket{\chi_k(T)}$. When doing a simple optimization, such as a state-to-state or a gate optimization, the easiest approach is to transform the target states to the rotating frame before calculating $\ket{\chi_k(T)}$. This is both straightforward and numerically efficient.

Another solution would be to transform the result of the forward propagation $\ket{\phi_k(T)}$ from the rotating frame to the lab frame, then to construct $\ket{\chi_k(T)}$ in the lab frame, and finally to transform $\ket{\chi_k(T)}$ back to the rotating frame before starting the backward propagation.

When the RWA is used, the control fields are complex-valued. In this case, the Krotov update equation is valid for both the real and the imaginary part independently. The most straightforward implementation of the method, however, is for real controls only, requiring that any complex control Hamiltonian be rewritten as two independent control Hamiltonians, one for the real part and one for the imaginary part of the control field. For example,

$$\epsilon^*(t) \Op{a} + \epsilon(t) \Op{a}^\dagger = \epsilon_{\text{re}}(t) (\Op{a} + \Op{a}^\dagger) + \epsilon_{\text{im}}(t) (i \Op{a}^\dagger - i \Op{a})$$

with two independent control fields ϵre(t) = ℜ[ϵ(t)] and ϵim(t) = ℑ[ϵ(t)].
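In QuTiP’s nested-list format, this splitting simply yields two control terms. A minimal sketch with invented operators and guess fields:

```python
import numpy as np
import qutip

a = qutip.destroy(10)              # e.g. a driven oscillator/cavity mode

def eps_re(t, args):
    """Real part of the complex control (illustrative guess)."""
    return 0.1 * np.cos(2 * np.pi * t)

def eps_im(t, args):
    """Imaginary part of the complex control (illustrative guess)."""
    return 0.1 * np.sin(2 * np.pi * t)

H0 = a.dag() * a                   # drift Hamiltonian
H = [
    H0,
    [a + a.dag(), eps_re],         # couples to Re[eps(t)]
    [1j * (a.dag() - a), eps_im],  # couples to Im[eps(t)]
]
```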

See the example in /notebooks/02_example_lambda_system_rwa_complex_pulse.ipynb.

Optimization in Liouville space

The control equations have been written in the notation of Hilbert space. However, they are equally valid for a gate optimization in Liouville space, by replacing Hilbert space states with density matrices, $\Op{H}$ with $i \Liouville$ (cf. krotov.mu), and inner products with Hilbert-Schmidt products, ⟨ ⋅ | ⋅ ⟩ → ⟨​⟨ ⋅ | ⋅ ⟩​⟩, cf., e.g., Ref. GoerzNJP2014.
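Concretely, with QuTiP this means building the objectives from a Liouvillian super-operator and density matrices. The following is a rough sketch with an invented decay rate and guess field; it is not taken from the linked example:

```python
import numpy as np
import qutip
import krotov

def guess_field(t, args):
    """Illustrative guess control."""
    return 0.1 * np.sin(np.pi * t / 5.0)

H0 = 0.5 * qutip.sigmaz()
H1 = qutip.sigmax()
c_ops = [np.sqrt(0.01) * qutip.sigmam()]    # decay with (illustrative) rate 0.01

# Liouvillian in nested-list format: the drift part contains the dissipator
L = [qutip.liouvillian(H0, c_ops), [qutip.liouvillian(H1), guess_field]]

objectives = [
    krotov.Objective(
        initial_state=qutip.ket2dm(qutip.basis(2, 1)),   # rho(0) = |1><1|
        target=qutip.ket2dm(qutip.basis(2, 0)),          # rho_tgt = |0><0|
        H=L,   # for open-system dynamics, the "H" slot holds the Liouvillian
    )
]
```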

See the example in /notebooks/04_example_dissipative_qubit_reset.ipynb.