Templated Optimize() #113

Closed
wants to merge 46 commits into from

Changes from 37 commits (46 commits total)

Commits
e7c8515
Intermediate attempts.
rcurtin Jan 15, 2019
a8a00fd
Adapt some of the problem functions.
rcurtin Mar 21, 2019
cf13217
Allow L-BFGS to have different objective and gradient types.
rcurtin Mar 21, 2019
00f1b40
Re-add different L-BFGS tests.
rcurtin Mar 21, 2019
a6b7f8c
Make lines fit a little bit better.
rcurtin Mar 21, 2019
d11940c
Add MatType and GradType to decomposable functions.
rcurtin Mar 22, 2019
00e1dc3
Fix static checks to work with MatType/GradType.
rcurtin Mar 23, 2019
bb2d368
Actually these checks are not working right yet.
rcurtin Mar 26, 2019
4e7e65a
Fix incorrect type.
rcurtin Mar 26, 2019
fdb42c9
Templatize SGDTestFunction.
rcurtin Mar 26, 2019
a1dd582
Adapt SGD to have templated MatType and GradType. Works.
rcurtin Mar 26, 2019
fb076d6
Add utility Any class.
rcurtin Mar 27, 2019
fcafc11
Adapt AdaDelta and AdaGrad and associated test problems.
rcurtin Mar 27, 2019
261f71d
Update BigBatchSGD, Eve, FTML, Padam, and part of IQN.
rcurtin Apr 20, 2019
3f4ee7a
Refactor nearly all optimizers.
rcurtin Apr 30, 2019
57b1507
Add and update AugLagrangian and LRSDP tests.
rcurtin Apr 30, 2019
6e6bece
Fix failing compilation and tests.
rcurtin Apr 30, 2019
e6b7bbc
Uncomment FunctionType checks.
rcurtin May 11, 2019
ae269d8
Add FTML test.
rcurtin May 11, 2019
9cc1d24
Add checks for allowed MatTypes, etc.
rcurtin May 11, 2019
9a4f01c
Adapt optimizers to use new checks.
rcurtin May 11, 2019
31b6551
Add FTML test.
rcurtin May 11, 2019
44a51bb
Update SDP tests.
rcurtin May 11, 2019
3c2acbb
Update documentation where needed.
rcurtin May 11, 2019
cd2f801
Re-add and adapt FunctionTest.
rcurtin May 11, 2019
71acbe4
Merge remote-tracking branch 'origin/master' into templated_optimize
rcurtin May 11, 2019
743ea3a
Adapt SPSA.
rcurtin May 12, 2019
03ac6d9
Merge remote-tracking branch 'origin/master' into templated_optimize
rcurtin May 12, 2019
98e2714
Adapt AdaptiveStepsize to use a Policy to store MatTypes.
rcurtin May 12, 2019
3c3bdd4
Disable tests that will fail when we have a too-old Armadillo version.
rcurtin May 13, 2019
39c965f
Fix notes from Conrad's review.
rcurtin May 13, 2019
d7cb084
Fix documentation: also document new template parameters.
rcurtin May 15, 2019
1881760
Use parent members directly.
rcurtin May 15, 2019
70f32d2
Adapt DE optimizer (must have missed it).
rcurtin May 15, 2019
381eff1
Update deprecated definition to match Armadillo.
rcurtin May 15, 2019
92e2417
Merge remote-tracking branch 'origin/master' into templated_optimize
rcurtin May 15, 2019
1a0b628
Update QHAdam and QHSGD.
rcurtin May 15, 2019
09db596
Merge remote-tracking branch 'origin/master' into templated_optimize
rcurtin Jun 3, 2019
e8b0aa2
Use ElemType.
rcurtin Jun 4, 2019
5454e99
Tune test to prevent failures.
rcurtin Jun 4, 2019
01f3dd8
Fix use of ElemType---could this have caused slow convergence?
rcurtin Jun 4, 2019
e51b5c0
Remove unnecessary parts of file (or some of them at least).
rcurtin Jun 4, 2019
00e2680
Recomment accidentally-committed random seed setting.
rcurtin Jun 4, 2019
3c72dae
Merge branch 'master' into templated_optimize
rcurtin Jul 10, 2019
a8a2ff0
Minor type fixing.
rcurtin Jul 10, 2019
00c0c79
This change caused more problems than it fixed.
rcurtin Jul 10, 2019
92 changes: 91 additions & 1 deletion doc/function_types.md
@@ -32,7 +32,7 @@ Each of these optimizers has an `Optimize()` function that is called as
`Optimize()` is called, `x` will hold the final result of the optimization
(that is, the best `x` found that minimizes `f(x)`).

#### Example: Linear Regression
#### Example: squared function optimization

An example program that implements the objective function f(x) = 2 |x|^2 is
shown below, using the simulated annealing optimizer.
@@ -950,3 +950,93 @@ int main()
std::cout << "SDP optimized with objective " << obj << "." << std::endl;
}
```

## Alternate matrix types

All of the examples above (and throughout the rest of the documentation)
generally assume that the matrix being optimized has type `arma::mat`. But
ensmallen's optimizers are capable of optimizing more types than just dense
Armadillo matrices. In fact, the full signature of each optimizer's
`Optimize()` method is this:

```
template<typename FunctionType, typename MatType>
typename MatType::elem_type Optimize(FunctionType& function,
                                     MatType& coordinates);
```

The return type, `typename MatType::elem_type`, is just the numeric type held by
the given matrix type. So, for `arma::mat`, the return type is just `double`.
In addition, optimizers for differentiable functions have a third template
parameter, `GradType`, which specifies the type of the gradient. `GradType` can
be manually specified in the situation where, e.g., a sparse gradient is
desired.

It is easy to write a function to optimize, e.g., an `arma::fmat`. Here is an
example, adapted from the `SquaredFunction` example from the
[arbitrary function documentation](#example__squared_function_optimization).

```c++
#include <ensmallen.hpp>

class SquaredFunction
{
 public:
  // This returns f(x) = 2 |x|^2.
  float Evaluate(const arma::fmat& x)
  {
    return 2 * std::pow(arma::norm(x), 2.0);
  }

  void Gradient(const arma::fmat& x, arma::fmat& gradient)
  {
    gradient = 4 * x;
  }
};

int main()
{
  // The minimum is at x = [0 0 0]. Our initial point is chosen to be
  // [1.0, -1.0, 1.0].
  arma::fmat x("1.0 -1.0 1.0");

  // Create an L-BFGS optimizer with default options.  The ens::L_BFGS type
  // can be replaced with any suitable ensmallen optimizer that is able to
  // handle differentiable functions.
  ens::L_BFGS optimizer;
  SquaredFunction f; // Create function to be optimized.
  optimizer.Optimize(f, x); // The optimizer will infer arma::fmat!

  std::cout << "Minimum of squared function found with L-BFGS is "
            << x;
}
```

Note that we have simply changed the `SquaredFunction` to accept `arma::fmat`
instead of `arma::mat` as parameters to `Evaluate()`, and the return type has
accordingly been changed from `double` to `float`. It would even be possible
to optimize functions with sparse coordinates by having `Evaluate()` take a
sparse matrix (e.g., `arma::sp_mat`).

If you want to represent the gradient as a sparse type, the `Gradient()`
function must be modified to take a sparse matrix (e.g., `arma::sp_mat` or
similar), and you can then call `optimizer.Optimize<SquaredFunction,
arma::mat, arma::sp_mat>(f, x);` to perform the optimization while using a
sparse matrix type to represent the gradient. Use a sparse `MatType` or
`GradType` *only* when it is known that the objective matrix and/or gradients
will be sparse; otherwise the code may run very slowly!
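
As a concrete sketch (hypothetical code adapted from the example above, not
taken verbatim from the library), a function with a sparse gradient and the
explicit template arguments might look like this:

```c++
#include <ensmallen.hpp>

class SquaredFunction
{
 public:
  // f(x) = 2 |x|^2, as before.
  double Evaluate(const arma::mat& x)
  {
    return 2 * std::pow(arma::norm(x), 2.0);
  }

  // The gradient is now represented as a sparse matrix.  (For this particular
  // function the gradient is not actually sparse; this is only to show the
  // types involved.)
  void Gradient(const arma::mat& x, arma::sp_mat& gradient)
  {
    gradient = arma::sp_mat(4 * x);
  }
};

int main()
{
  arma::mat x("1.0 -1.0 1.0");
  ens::L_BFGS optimizer;
  SquaredFunction f;

  // GradType cannot be inferred from the arguments, so all three template
  // parameters are given explicitly.
  optimizer.Optimize<SquaredFunction, arma::mat, arma::sp_mat>(f, x);

  std::cout << "Result: " << x;
}
```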

ensmallen will automatically infer `MatType` from the call to `Optimize()`, and
check that the given `FunctionType` has all of the necessary functions for the
given `MatType`, failing with a `static_assert` error at compile time if not.
If you would like to disable these checks, define the macro
`ENS_DISABLE_TYPE_CHECKS` before including ensmallen:

```
#define ENS_DISABLE_TYPE_CHECKS
#include <ensmallen.hpp>
```

This can be useful for situations where you know that the checks should be
ignored. Be aware, however, that with the checks disabled, the code may still
fail to compile, and with far more confusing and difficult error messages!
24 changes: 14 additions & 10 deletions doc/optimizers.md
@@ -1345,14 +1345,8 @@ programs.

#### Constructors

* `PrimalDualSolver<`_`SDPType`_`>(`_`sdp`_`)`
* `PrimalDualSolver<`_`SDPType`_`>(`_`sdp, initialX, initialYSparse, initialYDense, initialZ`_`)`

The _`SDPType`_ template parameter specifies the type of SDP to solve. The
`SDP<arma::mat>` and `SDP<arma::sp_mat>` classes are available for use; these
represent SDPs with dense and sparse `C` matrices, respectively. The `SDP<>`
class is detailed in the [semidefinite program
documentation](#semidefinite-programs).
* `PrimalDualSolver<>(`_`maxIterations`_`)`
* `PrimalDualSolver<>(`_`maxIterations, tau, normXzTol, primalInfeasTol, dualInfeasTol`_`)`

#### Attributes

@@ -1377,17 +1371,27 @@ optionally return the converged values for the dual variables.
/**
 * Invoke the optimization procedure, returning the converged values for the
 * primal and dual variables.
 */
double Optimize(arma::mat& X,
template<typename SDPType>
double Optimize(SDPType& s,
                arma::mat& X,
                arma::vec& ySparse,
                arma::vec& yDense,
                arma::mat& Z);

/**
 * Invoke the optimization procedure, and only return the primal variable.
 */
double Optimize(arma::mat& X);
template<typename SDPType>
double Optimize(SDPType& s, arma::mat& X);
```

The _`SDPType`_ template parameter specifies the type of SDP to solve. The
`SDP<arma::mat>` and `SDP<arma::sp_mat>` classes are available for use; these
represent SDPs with dense and sparse `C` matrices, respectively. The `SDP<>`
class is detailed in the [semidefinite program
documentation](#semidefinite-programs). _`SDPType`_ is automatically inferred
when `Optimize()` is called with an SDP.
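
As a minimal sketch of the new calling convention (the tiny SDP constructed
here is purely illustrative, and the initial `X` is assumed to be usable as a
starting point):

```c++
#include <ensmallen.hpp>

int main()
{
  // Illustrative 2x2 SDP with one sparse constraint:
  //   minimize tr(C X) subject to tr(A0 X) = 1, X positive semidefinite.
  ens::SDP<arma::sp_mat> sdp(2, 1, 0);
  sdp.C() = arma::speye(2, 2);
  sdp.SparseA()[0] = arma::speye(2, 2);
  sdp.SparseB()[0] = 1.0;

  ens::PrimalDualSolver<> solver(1000 /* maxIterations */);

  arma::mat X(2, 2, arma::fill::eye); // Initial primal point.

  // SDPType (here SDP<arma::sp_mat>) is inferred from the call.
  const double obj = solver.Optimize(sdp, X);

  std::cout << "Objective: " << obj << std::endl;
}
```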

#### See also:

* [Primal-dual interior-point methods for semidefinite programming](http://www.dtic.mil/dtic/tr/fulltext/u2/1020236.pdf)
2 changes: 2 additions & 0 deletions include/ensmallen.hpp
@@ -59,6 +59,8 @@
#include "ensmallen_bits/ens_version.hpp"
#include "ensmallen_bits/log.hpp" // TODO: should move to another place

#include "ensmallen_bits/utility/any.hpp"

#include "ensmallen_bits/problems/problems.hpp" // TODO: should move to another place

#include "ensmallen_bits/ada_delta/ada_delta.hpp"
12 changes: 9 additions & 3 deletions include/ensmallen_bits/ada_delta/ada_delta.hpp
@@ -83,14 +83,20 @@ class AdaDelta
* API consistency at compile time.
*
* @tparam DecomposableFunctionType Type of the function to optimize.
* @tparam MatType Type of matrix to optimize with.
* @tparam GradType Type of matrix to use to represent function gradients.
* @param function Function to optimize.
* @param iterate Starting point (will be modified).
* @return Objective value of the final point.
*/
template<typename DecomposableFunctionType>
double Optimize(DecomposableFunctionType& function, arma::mat& iterate)
template<typename DecomposableFunctionType,
         typename MatType,
         typename GradType = MatType>
typename MatType::elem_type Optimize(DecomposableFunctionType& function,
                                     MatType& iterate)
{
  return optimizer.Optimize(function, iterate);
  return optimizer.Optimize<DecomposableFunctionType,
      MatType, GradType>(function, iterate);
}

Review comment (Member): Looks like the new template parameters aren't listed
in the method documentation, perhaps this is intentional?

Reply (Member, Author): Ah, no, it was just an oversight, thanks. Fixed in
d7cb084.

//! Get the step size.
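
(As a usage sketch, with a hypothetical separable objective that is not part of
this PR: the templated `Optimize()` above lets `AdaDelta` run directly on
`arma::fmat` coordinates, with `MatType` inferred and the objective returned as
`float`.)

```c++
#include <ensmallen.hpp>

// Hypothetical separable objective: f(x) = sum_i (x[i] - target[i])^2.
class SeparableSquaredError
{
 public:
  SeparableSquaredError(const arma::fvec& target) : target(target) { }

  size_t NumFunctions() const { return target.n_elem; }
  void Shuffle() { } // Terms are independent; nothing to do.

  float Evaluate(const arma::fmat& x, const size_t begin,
                 const size_t batchSize)
  {
    float sum = 0;
    for (size_t i = begin; i < begin + batchSize; ++i)
      sum += std::pow(x[i] - target[i], 2.0f);
    return sum;
  }

  void Gradient(const arma::fmat& x, const size_t begin, arma::fmat& gradient,
                const size_t batchSize)
  {
    gradient.zeros(x.n_rows, x.n_cols);
    for (size_t i = begin; i < begin + batchSize; ++i)
      gradient[i] = 2 * (x[i] - target[i]);
  }

 private:
  arma::fvec target;
};

int main()
{
  SeparableSquaredError f(arma::fvec("1.0 2.0 3.0"));
  arma::fmat coordinates(3, 1, arma::fill::zeros);

  ens::AdaDelta optimizer; // Default hyperparameters.
  const float result = optimizer.Optimize(f, coordinates);
  std::cout << "Final objective: " << result << "\n";
}
```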
114 changes: 65 additions & 49 deletions include/ensmallen_bits/ada_delta/ada_delta_update.hpp
@@ -51,49 +51,6 @@ class AdaDeltaUpdate
// Nothing to do.
}

/**
* The Initialize method is called by SGD Optimizer method before the start of
* the iteration update process. In AdaDelta update policy, the mean squared
* and the delta mean squared gradient matrices are initialized to the zeros
* matrix with the same size as gradient matrix (see ens::SGD<>).
*
* @param rows Number of rows in the gradient matrix.
* @param cols Number of columns in the gradient matrix.
*/
void Initialize(const size_t rows, const size_t cols)
{
// Initialize empty matrices for mean sum of squares of parameter gradient.
meanSquaredGradient = arma::zeros<arma::mat>(rows, cols);
meanSquaredGradientDx = arma::zeros<arma::mat>(rows, cols);
}

/**
* Update step for SGD. The AdaDelta update dynamically adapts over time using
* only first order information. Additionally, AdaDelta requires no manual
* tuning of a learning rate.
*
* @param iterate Parameters that minimize the function.
* @param stepSize Step size to be used for the given iteration.
* @param gradient The gradient matrix.
*/
void Update(arma::mat& iterate,
            const double stepSize,
            const arma::mat& gradient)
{
// Accumulate gradient.
meanSquaredGradient *= rho;
meanSquaredGradient += (1 - rho) * (gradient % gradient);
arma::mat dx = arma::sqrt((meanSquaredGradientDx + epsilon) /
    (meanSquaredGradient + epsilon)) % gradient;

// Accumulate updates.
meanSquaredGradientDx *= rho;
meanSquaredGradientDx += (1 - rho) * (dx % dx);

// Apply update.
iterate -= (stepSize * dx);
}

//! Get the smoothing parameter.
double Rho() const { return rho; }
//! Modify the smoothing parameter.
@@ -104,18 +61,77 @@ class AdaDeltaUpdate
//! Modify the value used to initialise the mean squared gradient parameter.
double& Epsilon() { return epsilon; }

/**
* The UpdatePolicyType policy classes must contain an internal 'Policy'
* template class with two template arguments: MatType and GradType. This is
* instantiated at the start of the optimization, and holds parameters
* specific to an individual optimization.
*/
template<typename MatType, typename GradType>
class Policy
{
public:
/**
* This constructor is called by the SGD optimizer method before the start
* of the iteration update process. In AdaDelta update policy, the mean
* squared and the delta mean squared gradient matrices are initialized to
* the zeros matrix with the same size as gradient matrix (see ens::SGD<>).
*
* @param parent AdaDeltaUpdate object.
* @param rows Number of rows in the gradient matrix.
* @param cols Number of columns in the gradient matrix.
*/
Policy(AdaDeltaUpdate& parent, const size_t rows, const size_t cols) :
    parent(parent)
{
meanSquaredGradient.zeros(rows, cols);
meanSquaredGradientDx.zeros(rows, cols);
}

/**
* Update step for SGD. The AdaDelta update dynamically adapts over time
* using only first order information. Additionally, AdaDelta requires no
* manual tuning of a learning rate.
*
* @param iterate Parameters that minimize the function.
* @param stepSize Step size to be used for the given iteration.
* @param gradient The gradient matrix.
*/
void Update(MatType& iterate,
            const double stepSize,
            const GradType& gradient)
{
// Accumulate gradient.
meanSquaredGradient *= parent.rho;
meanSquaredGradient += (1 - parent.rho) * (gradient % gradient);
GradType dx = arma::sqrt((meanSquaredGradientDx + parent.epsilon) /
    (meanSquaredGradient + parent.epsilon)) % gradient;

// Accumulate updates.
meanSquaredGradientDx *= parent.rho;
meanSquaredGradientDx += (1 - parent.rho) * (dx % dx);

// Apply update.
iterate -= (stepSize * dx);
}

private:
// The instantiated parent class.
AdaDeltaUpdate& parent;

// The mean squared gradient matrix.
GradType meanSquaredGradient;

// The delta mean squared gradient matrix.
GradType meanSquaredGradientDx;
};

private:
// The smoothing parameter.
double rho;

// The epsilon value used to initialise the mean squared gradient parameter.
double epsilon;

// The mean squared gradient matrix.
arma::mat meanSquaredGradient;

// The delta mean squared gradient matrix.
arma::mat meanSquaredGradientDx;
};

} // namespace ens
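
(As an aside, a minimal update policy following the convention described in the
docstring above might look like the sketch below. This is illustrative and not
part of this diff; it is essentially the shape of a plain SGD update.)

```c++
// Outer class: holds hyperparameters (none for plain SGD).
class VanillaUpdate
{
 public:
  // Inner class: holds per-optimization state, instantiated by the optimizer.
  template<typename MatType, typename GradType>
  class Policy
  {
   public:
    // Called once before optimization starts; no state is needed here.
    Policy(VanillaUpdate& /* parent */,
           const size_t /* rows */,
           const size_t /* cols */) { }

    // Take a plain gradient descent step.
    void Update(MatType& iterate,
                const double stepSize,
                const GradType& gradient)
    {
      iterate -= stepSize * gradient;
    }
  };
};
```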
12 changes: 9 additions & 3 deletions include/ensmallen_bits/ada_grad/ada_grad.hpp
@@ -79,14 +79,20 @@ class AdaGrad
* objective value is returned.
*
* @tparam DecomposableFunctionType Type of the function to optimize.
* @tparam MatType Type of matrix to optimize with.
* @tparam GradType Type of matrix to use to represent function gradients.
* @param function Function to optimize.
* @param iterate Starting point (will be modified).
* @return Objective value of the final point.
*/
template<typename DecomposableFunctionType>
double Optimize(DecomposableFunctionType& function, arma::mat& iterate)
template<typename DecomposableFunctionType,
         typename MatType,
         typename GradType = MatType>
typename MatType::elem_type Optimize(DecomposableFunctionType& function,
                                     MatType& iterate)
{
  return optimizer.Optimize(function, iterate);
  return optimizer.Optimize<DecomposableFunctionType,
      MatType, GradType>(function, iterate);
}

//! Get the step size.