Templated Optimize() #113

Closed
wants to merge 46 commits into from

Changes from 37 commits (46 commits total)

Commits
e7c8515
Intermediate attempts.
rcurtin Jan 15, 2019
a8a00fd
Adapt some of the problem functions.
rcurtin Mar 21, 2019
cf13217
Allow L-BFGS to have different objective and gradient types.
rcurtin Mar 21, 2019
00f1b40
Re-add different L-BFGS tests.
rcurtin Mar 21, 2019
a6b7f8c
Make lines fit a little bit better.
rcurtin Mar 21, 2019
d11940c
Add MatType and GradType to decomposable functions.
rcurtin Mar 22, 2019
00e1dc3
Fix static checks to work with MatType/GradType.
rcurtin Mar 23, 2019
bb2d368
Actually these checks are not working right yet.
rcurtin Mar 26, 2019
4e7e65a
Fix incorrect type.
rcurtin Mar 26, 2019
fdb42c9
Templatize SGDTestFunction.
rcurtin Mar 26, 2019
a1dd582
Adapt SGD to have templated MatType and GradType. Works.
rcurtin Mar 26, 2019
fb076d6
Add utility Any class.
rcurtin Mar 27, 2019
fcafc11
Adapt AdaDelta and AdaGrad and associated test problems.
rcurtin Mar 27, 2019
261f71d
Update BigBatchSGD, Eve, FTML, Padam, and part of IQN.
rcurtin Apr 20, 2019
3f4ee7a
Refactor nearly all optimizers.
rcurtin Apr 30, 2019
57b1507
Add and update AugLagrangian and LRSDP tests.
rcurtin Apr 30, 2019
6e6bece
Fix failing compilation and tests.
rcurtin Apr 30, 2019
e6b7bbc
Uncomment FunctionType checks.
rcurtin May 11, 2019
ae269d8
Add FTML test.
rcurtin May 11, 2019
9cc1d24
Add checks for allowed MatTypes, etc.
rcurtin May 11, 2019
9a4f01c
Adapt optimizers to use new checks.
rcurtin May 11, 2019
31b6551
Add FTML test.
rcurtin May 11, 2019
44a51bb
Update SDP tests.
rcurtin May 11, 2019
3c2acbb
Update documentation where needed.
rcurtin May 11, 2019
cd2f801
Re-add and adapt FunctionTest.
rcurtin May 11, 2019
71acbe4
Merge remote-tracking branch 'origin/master' into templated_optimize
rcurtin May 11, 2019
743ea3a
Adapt SPSA.
rcurtin May 12, 2019
03ac6d9
Merge remote-tracking branch 'origin/master' into templated_optimize
rcurtin May 12, 2019
98e2714
Adapt AdaptiveStepsize to use a Policy to store MatTypes.
rcurtin May 12, 2019
3c3bdd4
Disable tests that will fail when we have a too-old Armadillo version.
rcurtin May 13, 2019
39c965f
Fix notes from Conrad's review.
rcurtin May 13, 2019
d7cb084
Fix documentation: also document new template parameters.
rcurtin May 15, 2019
1881760
Use parent members directly.
rcurtin May 15, 2019
70f32d2
Adapt DE optimizer (must have missed it).
rcurtin May 15, 2019
381eff1
Update deprecated definition to match Armadillo.
rcurtin May 15, 2019
92e2417
Merge remote-tracking branch 'origin/master' into templated_optimize
rcurtin May 15, 2019
1a0b628
Update QHAdam and QHSGD.
rcurtin May 15, 2019
09db596
Merge remote-tracking branch 'origin/master' into templated_optimize
rcurtin Jun 3, 2019
e8b0aa2
Use ElemType.
rcurtin Jun 4, 2019
5454e99
Tune test to prevent failures.
rcurtin Jun 4, 2019
01f3dd8
Fix use of ElemType---could this have caused slow convergence?
rcurtin Jun 4, 2019
e51b5c0
Remove unnecessary parts of file (or some of them at least).
rcurtin Jun 4, 2019
00e2680
Recomment accidentally-committed random seed setting.
rcurtin Jun 4, 2019
3c72dae
Merge branch 'master' into templated_optimize
rcurtin Jul 10, 2019
a8a2ff0
Minor type fixing.
rcurtin Jul 10, 2019
00c0c79
This change caused more problems than it fixed.
rcurtin Jul 10, 2019
92 changes: 91 additions & 1 deletion doc/function_types.md
@@ -32,7 +32,7 @@ Each of these optimizers has an `Optimize()` function that is called as
`Optimize()` is called, `x` will hold the final result of the optimization
(that is, the best `x` found that minimizes `f(x)`).

#### Example: Linear Regression
#### Example: squared function optimization

An example program that implements the objective function f(x) = 2 |x|^2 is
shown below, using the simulated annealing optimizer.
@@ -950,3 +950,93 @@ int main()
std::cout << "SDP optimized with objective " << obj << "." << std::endl;
}
```

## Alternate matrix types

All of the examples above (and throughout the rest of the documentation)
generally assume that the matrix being optimized has type `arma::mat`. But
ensmallen's optimizers are capable of optimizing more types than just dense
Armadillo matrices. In fact, the full signature of each optimizer's
`Optimize()` method is this:

```
template<typename FunctionType, typename MatType>
typename MatType::elem_type Optimize(FunctionType& function,
                                     MatType& coordinates);
```

The return type, `typename MatType::elem_type`, is just the numeric type held by
the given matrix type. So, for `arma::mat`, the return type is just `double`.
In addition, optimizers for differentiable functions have a third template
parameter, `GradType`, which specifies the type of the gradient. `GradType` can
be manually specified in the situation where, e.g., a sparse gradient is
desired.

It is easy to write a function to optimize, e.g., an `arma::fmat`. Here is an
example, adapted from the `SquaredFunction` example from the
[arbitrary function documentation](#example__squared_function_optimization).

```c++
#include <ensmallen.hpp>

class SquaredFunction
{
 public:
  // This returns f(x) = 2 |x|^2.
  float Evaluate(const arma::fmat& x)
  {
    return 2 * std::pow(arma::norm(x), 2.0);
  }

  void Gradient(const arma::fmat& x, arma::fmat& gradient)
  {
    gradient = 4 * x;
  }
};

int main()
{
  // The minimum is at x = [0 0 0]. Our initial point is chosen to be
  // [1.0, -1.0, 1.0].
  arma::fmat x("1.0 -1.0 1.0");

  // Create an L-BFGS optimizer with default options.  The ens::L_BFGS type
  // can be replaced with any suitable ensmallen optimizer that is able to
  // handle differentiable functions.
  ens::L_BFGS optimizer;
  SquaredFunction f; // Create function to be optimized.
  optimizer.Optimize(f, x); // The optimizer will infer arma::fmat!

  std::cout << "Minimum of squared function found with L-BFGS is "
            << x;
}
```

Note that we have simply changed the `SquaredFunction` to accept `arma::fmat`
instead of `arma::mat` as parameters to `Evaluate()`, and the return type has
accordingly been changed from `double` to `float`. It would even be possible
to optimize functions with sparse coordinates by having `Evaluate()` take a
sparse matrix (e.g., `arma::sp_mat`).

If you want to represent the gradient as a sparse type, the `Gradient()`
function must be modified to take a sparse matrix (e.g., `arma::sp_mat` or
similar), and you can then call `optimizer.Optimize<SquaredFunction,
arma::mat, arma::sp_mat>(f, x);` to perform the optimization while using a
sparse matrix type to represent the gradient. Use a sparse `MatType` or
`GradType` *only* when it is known that the objective matrix and/or gradients
will be sparse; otherwise the code may run very slowly!
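
As a concrete sketch (hypothetical code adapted from the example above, not
taken verbatim from the library), a function with a sparse gradient and the
explicit template arguments might look like this:

```c++
#include <ensmallen.hpp>

class SquaredFunction
{
 public:
  // f(x) = 2 |x|^2, as before.
  double Evaluate(const arma::mat& x)
  {
    return 2 * std::pow(arma::norm(x), 2.0);
  }

  // The gradient is now represented as a sparse matrix.  (For this particular
  // function the gradient is not actually sparse; this is only to show the
  // types involved.)
  void Gradient(const arma::mat& x, arma::sp_mat& gradient)
  {
    gradient = arma::sp_mat(4 * x);
  }
};

int main()
{
  arma::mat x("1.0 -1.0 1.0");
  ens::L_BFGS optimizer;
  SquaredFunction f;

  // GradType cannot be inferred from the arguments, so all three template
  // parameters are given explicitly.
  optimizer.Optimize<SquaredFunction, arma::mat, arma::sp_mat>(f, x);

  std::cout << "Result: " << x;
}
```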

ensmallen will automatically infer `MatType` from the call to `Optimize()`, and
check that the given `FunctionType` has all of the necessary functions for the
given `MatType`, failing with a `static_assert` error at compile time if not.
If you would like to disable these checks, define the macro
`ENS_DISABLE_TYPE_CHECKS` before including ensmallen:

```
#define ENS_DISABLE_TYPE_CHECKS
#include <ensmallen.hpp>
```

This can be useful for situations where you know that the checks should be
ignored. Be aware, however, that with the checks disabled, the code may still
fail to compile, and with far more confusing and difficult error messages!
24 changes: 14 additions & 10 deletions doc/optimizers.md
@@ -1345,14 +1345,8 @@ programs.

#### Constructors

* `PrimalDualSolver<`_`SDPType`_`>(`_`sdp`_`)`
* `PrimalDualSolver<`_`SDPType`_`>(`_`sdp, initialX, initialYSparse, initialYDense, initialZ`_`)`

The _`SDPType`_ template parameter specifies the type of SDP to solve. The
`SDP<arma::mat>` and `SDP<arma::sp_mat>` classes are available for use; these
represent SDPs with dense and sparse `C` matrices, respectively. The `SDP<>`
class is detailed in the [semidefinite program
documentation](#semidefinite-programs).
* `PrimalDualSolver<>(`_`maxIterations`_`)`
* `PrimalDualSolver<>(`_`maxIterations, tau, normXzTol, primalInfeasTol, dualInfeasTol`_`)`

#### Attributes

@@ -1377,17 +1371,27 @@ optionally return the converged values for the dual variables.
/**
 * Invoke the optimization procedure, returning the converged values for the
 * primal and dual variables.
 */
double Optimize(arma::mat& X,
template<typename SDPType>
double Optimize(SDPType& s,
                arma::mat& X,
                arma::vec& ySparse,
                arma::vec& yDense,
                arma::mat& Z);

/**
 * Invoke the optimization procedure, and only return the primal variable.
 */
double Optimize(arma::mat& X);
template<typename SDPType>
double Optimize(SDPType& s, arma::mat& X);
```

The _`SDPType`_ template parameter specifies the type of SDP to solve. The
`SDP<arma::mat>` and `SDP<arma::sp_mat>` classes are available for use; these
represent SDPs with dense and sparse `C` matrices, respectively. The `SDP<>`
class is detailed in the [semidefinite program
documentation](#semidefinite-programs). _`SDPType`_ is automatically inferred
when `Optimize()` is called with an SDP.
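
As a minimal sketch of the new calling convention (the tiny SDP constructed
here is purely illustrative, and the initial `X` is assumed to be usable as a
starting point):

```c++
#include <ensmallen.hpp>

int main()
{
  // Illustrative 2x2 SDP with one sparse constraint:
  //   minimize tr(C X) subject to tr(A0 X) = 1, X positive semidefinite.
  ens::SDP<arma::sp_mat> sdp(2, 1, 0);
  sdp.C() = arma::speye(2, 2);
  sdp.SparseA()[0] = arma::speye(2, 2);
  sdp.SparseB()[0] = 1.0;

  ens::PrimalDualSolver<> solver(1000 /* maxIterations */);

  arma::mat X(2, 2, arma::fill::eye); // Initial primal point.

  // SDPType (here SDP<arma::sp_mat>) is inferred from the call.
  const double obj = solver.Optimize(sdp, X);

  std::cout << "Objective: " << obj << std::endl;
}
```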

#### See also:

* [Primal-dual interior-point methods for semidefinite programming](http://www.dtic.mil/dtic/tr/fulltext/u2/1020236.pdf)
2 changes: 2 additions & 0 deletions include/ensmallen.hpp
@@ -59,6 +59,8 @@
#include "ensmallen_bits/ens_version.hpp"
#include "ensmallen_bits/log.hpp" // TODO: should move to another place

#include "ensmallen_bits/utility/any.hpp"

#include "ensmallen_bits/problems/problems.hpp" // TODO: should move to another place

#include "ensmallen_bits/ada_delta/ada_delta.hpp"
12 changes: 9 additions & 3 deletions include/ensmallen_bits/ada_delta/ada_delta.hpp
@@ -83,14 +83,20 @@ class AdaDelta
* API consistency at compile time.
*
* @tparam DecomposableFunctionType Type of the function to optimize.
* @tparam MatType Type of matrix to optimize with.
* @tparam GradType Type of matrix to use to represent function gradients.
* @param function Function to optimize.
* @param iterate Starting point (will be modified).
* @return Objective value of the final point.
*/
template<typename DecomposableFunctionType>
double Optimize(DecomposableFunctionType& function, arma::mat& iterate)
template<typename DecomposableFunctionType,
         typename MatType,
         typename GradType = MatType>
typename MatType::elem_type Optimize(DecomposableFunctionType& function,
                                     MatType& iterate)
{
  return optimizer.Optimize(function, iterate);
  return optimizer.Optimize<DecomposableFunctionType,
      MatType, GradType>(function, iterate);
}

Review comment (Member): Looks like the new template parameters aren't listed
in the method documentation, perhaps this is intentional?

Reply (Member, Author): Ah, no, it was just an oversight, thanks. Fixed in
d7cb084.

//! Get the step size.
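
(As a usage sketch, with a hypothetical separable objective that is not part of
this PR: the templated `Optimize()` above lets `AdaDelta` run directly on
`arma::fmat` coordinates, with `MatType` inferred and the objective returned as
`float`.)

```c++
#include <ensmallen.hpp>

// Hypothetical separable objective: f(x) = sum_i (x[i] - target[i])^2.
class SeparableSquaredError
{
 public:
  SeparableSquaredError(const arma::fvec& target) : target(target) { }

  size_t NumFunctions() const { return target.n_elem; }
  void Shuffle() { } // Terms are independent; nothing to do.

  float Evaluate(const arma::fmat& x, const size_t begin,
                 const size_t batchSize)
  {
    float sum = 0;
    for (size_t i = begin; i < begin + batchSize; ++i)
      sum += std::pow(x[i] - target[i], 2.0f);
    return sum;
  }

  void Gradient(const arma::fmat& x, const size_t begin, arma::fmat& gradient,
                const size_t batchSize)
  {
    gradient.zeros(x.n_rows, x.n_cols);
    for (size_t i = begin; i < begin + batchSize; ++i)
      gradient[i] = 2 * (x[i] - target[i]);
  }

 private:
  arma::fvec target;
};

int main()
{
  SeparableSquaredError f(arma::fvec("1.0 2.0 3.0"));
  arma::fmat coordinates(3, 1, arma::fill::zeros);

  ens::AdaDelta optimizer; // Default hyperparameters.
  const float result = optimizer.Optimize(f, coordinates);
  std::cout << "Final objective: " << result << "\n";
}
```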
114 changes: 65 additions & 49 deletions include/ensmallen_bits/ada_delta/ada_delta_update.hpp
@@ -51,49 +51,6 @@ class AdaDeltaUpdate
// Nothing to do.
}

/**
* The Initialize method is called by SGD Optimizer method before the start of
* the iteration update process. In AdaDelta update policy, the mean squared
* and the delta mean squared gradient matrices are initialized to the zeros
* matrix with the same size as gradient matrix (see ens::SGD<>).
*
* @param rows Number of rows in the gradient matrix.
* @param cols Number of columns in the gradient matrix.
*/
void Initialize(const size_t rows, const size_t cols)
{
// Initialize empty matrices for mean sum of squares of parameter gradient.
meanSquaredGradient = arma::zeros<arma::mat>(rows, cols);
meanSquaredGradientDx = arma::zeros<arma::mat>(rows, cols);
}

/**
* Update step for SGD. The AdaDelta update dynamically adapts over time using
* only first order information. Additionally, AdaDelta requires no manual
* tuning of a learning rate.
*
* @param iterate Parameters that minimize the function.
* @param stepSize Step size to be used for the given iteration.
* @param gradient The gradient matrix.
*/
void Update(arma::mat& iterate,
            const double stepSize,
            const arma::mat& gradient)
{
// Accumulate gradient.
meanSquaredGradient *= rho;
meanSquaredGradient += (1 - rho) * (gradient % gradient);
arma::mat dx = arma::sqrt((meanSquaredGradientDx + epsilon) /
    (meanSquaredGradient + epsilon)) % gradient;

// Accumulate updates.
meanSquaredGradientDx *= rho;
meanSquaredGradientDx += (1 - rho) * (dx % dx);

// Apply update.
iterate -= (stepSize * dx);
}

//! Get the smoothing parameter.
double Rho() const { return rho; }
//! Modify the smoothing parameter.
@@ -104,18 +61,77 @@ class AdaDeltaUpdate
//! Modify the value used to initialise the mean squared gradient parameter.
double& Epsilon() { return epsilon; }

/**
* The UpdatePolicyType policy classes must contain an internal 'Policy'
* template class with two template arguments: MatType and GradType. This is
* instantiated at the start of the optimization, and holds parameters
* specific to an individual optimization.
*/
template<typename MatType, typename GradType>
class Policy
{
public:
/**
* This constructor is called by the SGD optimizer method before the start
* of the iteration update process. In AdaDelta update policy, the mean
* squared and the delta mean squared gradient matrices are initialized to
* the zeros matrix with the same size as gradient matrix (see ens::SGD<>).
*
* @param parent AdaDeltaUpdate object.
* @param rows Number of rows in the gradient matrix.
* @param cols Number of columns in the gradient matrix.
*/
Policy(AdaDeltaUpdate& parent, const size_t rows, const size_t cols) :
    parent(parent)
{
meanSquaredGradient.zeros(rows, cols);
meanSquaredGradientDx.zeros(rows, cols);
}

/**
* Update step for SGD. The AdaDelta update dynamically adapts over time
* using only first order information. Additionally, AdaDelta requires no
* manual tuning of a learning rate.
*
* @param iterate Parameters that minimize the function.
* @param stepSize Step size to be used for the given iteration.
* @param gradient The gradient matrix.
*/
void Update(MatType& iterate,
            const double stepSize,
            const GradType& gradient)
{
// Accumulate gradient.
meanSquaredGradient *= parent.rho;
meanSquaredGradient += (1 - parent.rho) * (gradient % gradient);
GradType dx = arma::sqrt((meanSquaredGradientDx + parent.epsilon) /
    (meanSquaredGradient + parent.epsilon)) % gradient;

// Accumulate updates.
meanSquaredGradientDx *= parent.rho;
meanSquaredGradientDx += (1 - parent.rho) * (dx % dx);

// Apply update.
iterate -= (stepSize * dx);
}

private:
// The instantiated parent class.
AdaDeltaUpdate& parent;

// The mean squared gradient matrix.
GradType meanSquaredGradient;

// The delta mean squared gradient matrix.
GradType meanSquaredGradientDx;
};

private:
// The smoothing parameter.
double rho;

// The epsilon value used to initialise the mean squared gradient parameter.
double epsilon;

// The mean squared gradient matrix.
arma::mat meanSquaredGradient;

// The delta mean squared gradient matrix.
arma::mat meanSquaredGradientDx;
};

} // namespace ens
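
(As an aside, a minimal update policy following the convention described in the
docstring above might look like the sketch below. This is illustrative and not
part of this diff; it is essentially the shape of a plain SGD update.)

```c++
// Outer class: holds hyperparameters (none for plain SGD).
class VanillaUpdate
{
 public:
  // Inner class: holds per-optimization state, instantiated by the optimizer.
  template<typename MatType, typename GradType>
  class Policy
  {
   public:
    // Called once before optimization starts; no state is needed here.
    Policy(VanillaUpdate& /* parent */,
           const size_t /* rows */,
           const size_t /* cols */) { }

    // Take a plain gradient descent step.
    void Update(MatType& iterate,
                const double stepSize,
                const GradType& gradient)
    {
      iterate -= stepSize * gradient;
    }
  };
};
```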
12 changes: 9 additions & 3 deletions include/ensmallen_bits/ada_grad/ada_grad.hpp
@@ -79,14 +79,20 @@ class AdaGrad
* objective value is returned.
*
* @tparam DecomposableFunctionType Type of the function to optimize.
* @tparam MatType Type of matrix to optimize with.
* @tparam GradType Type of matrix to use to represent function gradients.
* @param function Function to optimize.
* @param iterate Starting point (will be modified).
* @return Objective value of the final point.
*/
template<typename DecomposableFunctionType>
double Optimize(DecomposableFunctionType& function, arma::mat& iterate)
template<typename DecomposableFunctionType,
         typename MatType,
         typename GradType = MatType>
typename MatType::elem_type Optimize(DecomposableFunctionType& function,
                                     MatType& iterate)
{
  return optimizer.Optimize(function, iterate);
  return optimizer.Optimize<DecomposableFunctionType,
      MatType, GradType>(function, iterate);
}

//! Get the step size.