Improve usage overview docs (#76)

* Add some references * Suppress warning related to notebook index file * Improve equation number location * Improve inverse problem documentation (relevant to #64) * Add todo sections * Add description of x-update ADMM solvers * Added description of step size methods in optimizer docs * Formatting fix * Edits for consistency * CSS improvement * Typo fixes * Remove _k on the regularizer * Update overview.rst Implement @Michael-T-McCann's suggestion. Co-authored-by: Michael-T-McCann <michael.thompson.mccann@gmail.com> Co-authored-by: crstngc <cristina.cgarcia@gmail.com>
lanl · Nov 5, 2021 · 384a2bf · 384a2bf
1 parent ff97e6d
commit 384a2bf
Show file tree

Hide file tree

Showing 6 changed files with 315 additions and 26 deletions.
diff --git a/docs/source/_static/scico.css b/docs/source/_static/scico.css
@@ -75,6 +75,17 @@ dl.citation > dd {
 }
 
 
+/* Fix equation number location */
+
+.math {
+  text-align: left;
+}
+
+.eqno {
+  float: right;
+}
+
+
 /* Style for autosummary API docs */
 
 dl.field-list.simple {

diff --git a/docs/source/classes.rst b/docs/source/classes.rst
@@ -1,3 +1,5 @@
+.. _classes:
+
 Important Classes
 =================
 
@@ -6,3 +8,4 @@ Important Classes
 
    operator
    functional
+   optimizer
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -26,6 +26,7 @@ SCICO Documentation
    :hidden:
 
    api
+   examples/index.ipynb
 
 
 Indices

diff --git a/docs/source/optimizer.rst b/docs/source/optimizer.rst
@@ -0,0 +1,234 @@
+.. _optimizer:
+
+Optimization Algorithms
+=======================
+
+
+.. raw:: html
+
+    <style type='text/css'>
+    div.document ul blockquote {
+       margin-bottom: 8px !important;
+    }
+    div.document li > p {
+       margin-bottom: 4px !important;
+    }
+    div.document ul > li {
+      list-style: square outside !important;
+      margin-left: 1em !important;
+      margin-bottom: 1em !important;
+    }
+    section {
+      padding-bottom: 1em;
+    }
+    ul {
+      margin-bottom: 1em;
+    }
+    </style>
+
+
+
+ADMM
+----
+
+The Alternating Direction Method of Multipliers (ADMM) :cite:`glowinski-1975-approximation` :cite:`gabay-1976-dual`
+is an algorithm for minimizing problems of the form
+
+.. math::
+   :label: eq:admm_prob
+
+   \argmin_{\mb{x}, \mb{z}} \; f(\mb{x}) + g(\mb{z}) \; \text{such that}
+   \; \acute{A} \mb{x} + \acute{B} \mb{z} = \mb{c} \;,
+
+where :math:`f` and :math:`g` are convex (but not necessarily smooth)
+functions, :math:`\acute{A}` and :math:`\acute{B}` are linear operators,
+and :math:`\mb{c}` is a constant vector. (For a thorough introduction and
+overview, see :cite:`boyd-2010-distributed`.)
+
+The SCICO ADMM solver, :class:`.ADMM`, solves problems of the form
+
+.. math::
+   \argmin_{\mb{x}} \; f(\mb{x}) + \sum_{i=1}^N g_i(C_i \mb{x}) \;,
+
+where :math:`f` and the :math:`g_i` are instances of :class:`.Functional`,
+and the :math:`C_i` are :class:`.LinearOperator`, by defining
+
+.. math::
+   g(\mb{z}) = \sum_{i=1}^N g_i(\mb{z}_i) \qquad \mb{z}_i = C_i \mb{x}
+
+in :eq:`eq:admm_prob`, corresponding to defining
+
+.. math::
+  \acute{A} = \left( \begin{array}{c} C_0 \\ C_1 \\ C_2 \\
+	   \vdots \end{array} \right)  \quad
+  \acute{B} = \left( \begin{array}{cccc}
+	      -I & 0 & 0 & \ldots \\
+	      0 & -I & 0 & \ldots \\
+	      0 &  0  & -I & \ldots \\
+	      \vdots & \vdots & \vdots & \ddots
+	      \end{array} \right) \quad
+  \mb{z} = \left( \begin{array}{c} \mb{z}_0 \\ \mb{z}_1 \\ \mb{z}_2 \\
+	   \vdots \end{array} \right)  \quad
+  \mb{c} = \left( \begin{array}{c} 0 \\ 0 \\ 0 \\
+	   \vdots \end{array} \right) \;.
+
+In :class:`.ADMM`, :math:`f` is a :class:`.Functional`, typically a :class:`.Loss`, corresponding to the forward model of an imaging problem,
+and the :math:`g_i` are :class:`.Functional`, typically corresponding to a
+regularization term or constraint. Each of the :math:`g_i` must have a
+proximal operator defined. It is also possible to set ``f = None``, which corresponds to defining :math:`f = 0`, i.e. the zero function.
+
+
+Subproblem Solvers
+^^^^^^^^^^^^^^^^^^
+
+The most computational expensive component of the ADMM iterations is typically
+the :math:`\mb{x}`-update,
+
+    .. math::
+       :label: eq:admm_x_step
+
+        \argmin_{\mb{x}} \; f(\mb{x}) + \sum_i \frac{\rho_i}{2}
+        \norm{\mb{z}^{(k)}_i - \mb{u}^{(k)}_i - C_i \mb{x}}_2^2 \;.
+
+
+The available solvers for this problem are:
+
+* :class:`.admm.GenericSubproblemSolver`
+
+  This is the default subproblem solver as it is applicable in all cases. It
+  it is only suitable for relatively small-scale problems as it makes use of
+  :func:`.solver.minimize`, which wraps :func:`scipy.optimize.minimize`.
+
+
+* :class:`.admm.LinearSubproblemSolver`
+
+  This subproblem solver can be used when :math:`f` takes the form
+  :math:`\norm{\mb{A} \mb{x} - \mb{y}}^2_W`. It makes use of the conjugate
+  gradient method, and is significantly more efficient than
+  :class:`.admm.GenericSubproblemSolver` when it can be used.
+
+* :class:`.admm.CircularConvolveSolver`
+
+  This subproblem solver can be used when :math:`f` takes the form
+  :math:`\norm{\mb{A} \mb{x} - \mb{y}}^2_W` and :math:`\mb{A}` and all
+  the :math:`C_i` s are circulant (i.e., diagonalizable in a Fourier basis).
+
+
+For more details of these solvers and how to specify them, see the API
+reference page for :mod:`scico.admm`.
+
+
+
+PGM
+---
+
+The Proximal Gradient Method (PGM) :cite:`daubechies-2004-iterative`
+:cite:`beck-2010-gradient` and Accelerated Proximal Gradient Method (AcceleratedPGM) :cite:`beck-2009-fast` are algorithms for minimizing
+problems of the form
+
+.. math::
+   \argmin_{\mb{x}} f(\mb{x}) + g(\mb{x})
+
+where :math:`g` is convex and :math:`f` is smooth and convex. The
+corresponding SCICO solvers are :class:`PGM` and :class:`AcceleratedPGM`
+respectively. In most cases :class:`AcceleratedPGM` is expected to provide
+faster convergence. In both of these classes, :math:`f` and :math:`g` are
+both of type :class:`.Functional`, where :math:`f` must be differentiable,
+and :math:`g` must have a proximal operator defined.
+
+While ADMM provides significantly more flexibility than PGM, and often
+converges faster, the latter is preferred when solving the ADMM
+:math:`\mb{x}`-step is very computationally expensive, such as in the case of
+:math:`f(\mb{x}) = \norm{\mb{A} \mb{x} - \mb{y}}^2_W` where :math:`A` is
+large and does not have any special structure that would allow an efficient
+solution of :eq:`eq:admm_x_step`.
+
+
+
+Step Size Options
+^^^^^^^^^^^^^^^^^
+
+The step size (usually referred to in terms of its reciprocal, :math:`L`) for the gradient descent in :class:`PGM` can be adapted via
+Barzilai-Borwein methods (also called spectral methods) and iterative
+line search methods.
+
+The available step size policy classes are:
+
+* :class:`BBStepSize`
+
+  This implements the step size adaptation based on the Barzilai-Borwein
+  method :cite:`barzilai-1988-stepsize`. The step size :math:`\alpha` is
+  estimated as
+
+  .. math::
+     \mb{\Delta x} = \mb{x}_k - \mb{x}_{k-1} \; \\
+     \mb{\Delta g} = \nabla f(\mb{x}_k) - \nabla f (\mb{x}_{k-1}) \; \\
+     \alpha = \frac{\mb{\Delta x}^T \mb{\Delta g}}{\mb{\Delta g}^T
+     \mb{\Delta g}} \;\;.
+
+  Since the PGM solver uses the reciprocal of the step size, the value
+  :math:`L = 1 / \alpha` is returned.
+
+
+* :class:`AdaptiveBBStepSize`
+
+  This implements the adaptive Barzilai-Borwein method as introduced in
+  :cite:`zhou-2006-adaptive`. The adaptive step size rule computes
+
+  .. math::
+     \mb{\Delta x} = \mb{x}_k - \mb{x}_{k-1} \; \\
+     \mb{\Delta g} = \nabla f(\mb{x}_k) - \nabla f (\mb{x}_{k-1}) \; \\
+     \alpha^{\mathrm{BB1}} = \frac{\mb{\Delta x}^T \mb{\Delta x}}
+     {\mb{\Delta x}^T \mb{\Delta g}} \; \\
+     \alpha^{\mathrm{BB2}} = \frac{\mb{\Delta x}^T \mb{\Delta g}}
+     {\mb{\Delta g}^T \mb{\Delta g}} \;\;.
+
+  The determination of the new step size is made via the rule
+
+  .. math::
+     \alpha = \left\{ \begin{matrix} \alpha^{\mathrm{BB2}} \;, &
+     \mathrm{~if~} \alpha^{\mathrm{BB2}} / \alpha^{\mathrm{BB1}}
+     < \kappa \; \\
+     \alpha^{\mathrm{BB1}} \;, & \mathrm{~otherwise} \end{matrix}
+     \right . \;\;,
+
+  with :math:`\kappa \in (0, 1)`.
+
+  Since the PGM solver uses the reciprocal of the step size, the value
+  :math:`L = 1 / \alpha` is returned.
+
+
+* :class:`LineSearchStepSize`
+
+  This implements the line search strategy described in :cite:`beck-2009-fast`.
+  This strategy estimates :math:`L` such that
+  :math:`f(\mb{x}) \leq \hat{f}_{L}(\mb{x})` is satisfied with
+  :math:`\hat{f}_{L}` a quadratic approximation to :math:`f` defined as
+
+  .. math::
+     \hat{f}_{L}(\mb{x}, \mb{y}) = f(\mb{y}) + \nabla f(\mb{y})^H
+     (\mb{x} - \mb{y}) + \frac{L}{2} \left\| \mb{x} - \mb{y}
+     \right\|_2^2 \;\;,
+
+  with :math:`\mb{x}` the potential new update and :math:`\mb{y}` the
+  current solution or current extrapolation (if using :class:`AcceleratedPGM`).
+
+
+* :class:`RobustLineSearchStepSize`
+
+  This implements the robust line search strategy described in
+  :cite:`florea-2017-robust`. This strategy estimates :math:`L` such that
+  :math:`f(\mb{x}) \leq \hat{f}_{L}(\mb{x})` is satisfied with
+  :math:`\hat{f}_{L}` a quadratic approximation to :math:`f` defined as
+
+  .. math::
+     \hat{f}_{L}(\mb{x}, \mb{y}) = f(\mb{y}) + \nabla f(\mb{y})^H
+     (\mb{x} - \mb{y}) + \frac{L}{2} \left\| \mb{x} - \mb{y} \right\|_2^2 \;\;,
+
+  with :math:`\mb{x}` the potential new update and :math:`\mb{y}` the
+  auxiliary extrapolation state. Note that this should only be used
+  with :class:`AcceleratedPGM`.
+
+
+For more details of these step size managers and how to specify them, see
+the API reference page for :mod:`scico.pgm`.
diff --git a/docs/source/overview.rst b/docs/source/overview.rst
@@ -61,23 +61,23 @@ please cite :cite:`pfister-2021-scico`
 in the source distribution).
 
 
-Anatomy of an Inverse Problem
------------------------------
+Inverse Problems
+----------------
 
-SCICO is designed to solve problems of the form
+SCICO is designed to solve inverse problems such as
 
 .. math::
 
-   \argmin_{\mb{x}} \; f(\mb{x}) + \sum_{k=1}^K g_k(C_k (\mb{x})),
+   \argmin_{\mb{x}} \; f(\mb{x}) + g(C \mb{x}) \;,
 
-with :math:`\mb{x} \in \mathbb{R}^{n}` representing the reconstruction,
-e.g., a 1D, 2D, 3D, or 3D+time image,
-:math:`C_k: \mathbb{R}^{n} \to \mathbb{R}^{m_k}`
-being regularization operators,
+where :math:`\mb{x} \in \mathbb{R}^{n}` represents the reconstruction,
+e.g., a 1D signal, 2D image, 3D volume, or 3D+time volume sequence,
+:math:`C: \mathbb{R}^{n} \to \mathbb{R}^{m}`
+is a regularization operator,
 and :math:`f: \mathbb{R}^{n} \to \mathbb{R}`
-and :math:`g_k: \mathbb{R}^{m_k} \to \mathbb{R}`
-being functionals associated with the data fidelity
-and regularization, respectively.
+and :math:`g: \mathbb{R}^{m} \to \mathbb{R}`
+are functionals associated with the data fidelity
+and regularization term, respectively.
 
 In a typical computational imaging problem,
 we have a `forward model` that models the data acquisition process.
@@ -89,7 +89,7 @@ in some appropriate sense.
 This discrepency is measured in our data fidelity, or `loss`, function
 
 .. Math::
-   f(\mb{x}) = L(A(\mb{x}), \mb{y}) \,,
+   f(\mb{x}) = L(A(\mb{x}), \mb{y}) \;,
 
 where :math:`L` is a `functional` that maps a vector (or array)
 into a scalar.
@@ -104,15 +104,17 @@ and the functional :math:`L` in a single object.
 A library of functionals and losses are implemented
 in :mod:`.functional` and :mod:`.loss`, respectively.
 
-The functionals :math:`g_r(C_r (\cdot))` are regularization functionals.
-The :math:`C_r` are operators, usually linear operators.
-Together,
-these terms encourage solutions that are "simple" in some sense.
+The functionals :math:`g(\cdot)` or :math:`g(C (\cdot))`
+are regularization functionals, and the :math:`C` are operators,
+usually linear operators.
+Together, these terms encourage solutions that are "simple" in some sense.
 A popular choice is to let :math:`g = \norm{ \cdot }_1`
-and :math:`C` be a :class:`.FiniteDifferece` linear operator;
-this combination promotes piecewise smooth solutions
+and :math:`C` be a :class:`.FiniteDifferece` linear operator,
+which promotes piecewise smooth solutions
 to the inverse problem.
 
+For more detail in these classes, see :ref:`classes`.
+
 
 Usage Examples
 --------------