CCSA/MMA improvements: docs, rho_init, initial_step (#561)
* update algorithms manual to generally recommend CCSAQ over MMA

* add rho_init param to CCSA

* rm obscure reference to non-exported name

* support initial step size in CCSA/MMA

* support initial_step in tests
stevengj committed Aug 9, 2024
1 parent 95172af commit 0f9cc09
Showing 7 changed files with 45 additions and 19 deletions.
9 changes: 5 additions & 4 deletions doc/docs/NLopt_Algorithms.md
@@ -305,18 +305,21 @@ My implementation of the globally-convergent method-of-moving-asymptotes (MMA) a

This is an improved CCSA ("conservative convex separable approximation") variant of the original MMA algorithm published by Svanberg in 1987, which has become popular for topology optimization. (*Note:* "globally convergent" does *not* mean that this algorithm converges to the global optimum; it means that it is guaranteed to converge to *some* local minimum from any feasible starting point.)

At each point **x**, MMA forms a local approximation using the gradient of *f* and the constraint functions, plus a quadratic "penalty" term to make the approximations "conservative" (upper bounds for the exact functions). The precise approximation MMA forms is difficult to describe in a few words, because it includes nonlinear terms consisting of poles at some distance from *x* (outside of the current trust region), almost a kind of Padé approximant. The main point is that the approximation is both convex and separable, making it trivial to solve the approximate optimization by a dual method. Optimizing the approximation leads to a new candidate point **x**. The objective and constraints are evaluated at the candidate point. If the approximations were indeed conservative (upper bounds for the actual functions at the candidate point), then the process is restarted at the new **x**. Otherwise, the approximations are made more conservative (by increasing the penalty term) and re-optimized.
At each point **x**, CCSA forms a local approximation using the gradient of *f* and the constraint functions, plus a quadratic "penalty" term to make the approximations "conservative" (upper bounds for the exact functions). The precise approximation MMA forms is difficult to describe in a few words, because it includes nonlinear terms consisting of poles at some distance from *x* (outside of the current trust region), almost a kind of Padé approximant. The main point is that the approximation is both convex and separable, making it trivial to solve the approximate optimization by a dual method. Optimizing the approximation leads to a new candidate point **x**. The objective and constraints are evaluated at the candidate point. If the approximations were indeed conservative (upper bounds for the actual functions at the candidate point), then the process is restarted at the new **x**. Otherwise, the approximations are made more conservative (by increasing the penalty term) and re-optimized.

(If you contact [Professor Svanberg](https://people.kth.se/~krille/), he has been willing in the past to graciously provide you with his original code, albeit under restrictions on commercial use or redistribution. The MMA implementation in NLopt, however, is completely independent of Svanberg's, whose code we have not examined; any bugs are my own, of course.)

I also implemented another CCSA algorithm from the same paper, `NLOPT_LD_CCSAQ`: instead of constructing local MMA approximations, it constructs simple quadratic approximations (or rather, affine approximations plus a quadratic penalty term to stay conservative). This is the ccsa_quadratic code. It seems to have similar convergence rates to MMA for most problems, which is not surprising as they are both essentially similar. However, for the quadratic variant I implemented the possibility of [preconditioning](NLopt_Reference.md#preconditioning-with-approximate-hessians): including a user-supplied Hessian approximation in the local model. It is easy to incorporate this into the proof in Svanberg's paper, and to show that global convergence is still guaranteed as long as the user's "Hessian" is positive semidefinite, and in practice it can greatly improve convergence if the preconditioner is a good approximation for the real Hessian (at least for the eigenvectors of the largest eigenvalues).
I also implemented another CCSA algorithm from the same paper, `NLOPT_LD_CCSAQ`: instead of constructing local MMA approximations, it constructs simple quadratic approximations (or rather, affine approximations plus a quadratic penalty term to stay conservative). It seems to have similar convergence rates to MMA for most problems, which is not surprising as they are both essentially similar. My CCSAQ implementation seems to behave better in some ways than the MMA variant, in fact, so I generally recommend trying `NLOPT_LD_CCSAQ` first rather than `NLOPT_LD_MMA`.

For the quadratic variant I also implemented the possibility of [preconditioning](NLopt_Reference.md#preconditioning-with-approximate-hessians): including a user-supplied Hessian approximation in the local model. It is easy to incorporate this into the proof in Svanberg's paper, and to show that global convergence is still guaranteed as long as the user's "Hessian" is positive semidefinite, and in practice it can greatly improve convergence if the preconditioner is a good approximation for the real Hessian (at least for the eigenvectors of the largest eigenvalues).
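
To make "conservative approximation" concrete, the separable model that the quadratic (CCSAQ) variant builds around the current iterate is, roughly, an affine expansion of each function plus a quadratic penalty scaled by per-coordinate trust-region widths. The following is only a sketch of the form in Svanberg's 2002 CCSA paper; the exact scaling in the NLopt code may differ in detail:

```latex
% Conservative separable model of an objective/constraint g_i near the iterate x^k:
% affine term from the gradient plus a quadratic penalty weighted by rho_i,
% with per-coordinate scales sigma_j (a sketch following Svanberg 2002).
\tilde{g}_i^{(k)}(x) = g_i(x^k) + \nabla g_i(x^k)^T (x - x^k)
  + \frac{\rho_i^{(k)}}{2} \sum_{j=1}^{n} \left( \frac{x_j - x_j^k}{\sigma_j^{(k)}} \right)^2
```

The inner iterations increase ρ whenever the model fails to be an upper bound at the candidate point; the `rho_init` parameter described below seeds these ρ values, and the initial step size seeds the trust-region half-widths σ.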

The `NLOPT_LD_MMA` and `NLOPT_LD_CCSAQ` algorithms support the following internal parameters, which can be
specified using the [`nlopt_set_param` API](NLopt_Reference#algorithm-specific-parameters):

* `inner_maxeval`: If > 0, gives the maximum number of "inner" iterations of the algorithm where it tries to ensure that its approximations are "conservative"; defaults to `0` (no limit). It can be useful to specify a finite number (e.g. `5` or `10`) for this parameter if inaccuracies in your gradient or objective function are preventing the algorithm from making progress.
* `dual_algorithm` (defaults to `NLOPT_LD_MMA`), `dual_ftol_rel` (defaults to `1e-14`), `dual_ftol_abs` (defaults to `0`), `dual_xtol_rel` (defaults to `0`), `dual_xtol_abs` (defaults to `0`), `dual_maxeval` (defaults to `100000`): These specify how the algorithm internally solves the "dual" optimization problem for its approximate objective. Because this subsidiary solve requires no evaluations of the user's objective function, it is typically fast enough that we can solve it to high precision without worrying too much about the details. However, in high-dimensional problems you may notice that MMA/CCSA is taking a long time between optimization steps, in which case you may want to increase `dual_ftol_rel` or make other changes. If these parameters are not specified, NLopt takes them from the [subsidiary-optimizer algorithm](NLopt_Reference#localsubsidiary-optimization-algorithm) if that has been specified, and otherwise uses the defaults indicated here.
* `verbosity`: If > 0, causes the algorithm to print internal status information on each iteration.
* `rho_init`: if specified, should be a rough upper bound for the second derivative (the biggest eigenvalue of the Hessian of the objective or constraints); defaults to `1.0`. CCSA/MMA will adaptively adjust this as the optimization progresses, so even if `rho_init` is completely wrong the algorithm will still converge. A `rho_init` that is too large will cause the algorithm to take overly small steps at the beginning, while a `rho_init` that is too small will cause it to take overly large steps (and have to backtrack) at the beginning. Similarly, you can also use the "initial stepsize" option ([NLopt reference](NLopt_Reference.md#initial-step-size)) to control the maximum size of the initial steps (half the diameter of the trust region). A short usage sketch follows this list.
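
As an illustrative sketch (the numeric values are arbitrary, not recommendations), these parameters are set through the generic `nlopt_set_param` call before `nlopt_optimize`, and the initial step through `nlopt_set_initial_step1`:

```c
#include <nlopt.h>

/* Sketch: tuning CCSAQ's internal parameters for a 2-variable problem.
   Objective/constraint setup is omitted; the values are illustrative only. */
nlopt_opt opt = nlopt_create(NLOPT_LD_CCSAQ, 2);
nlopt_set_param(opt, "inner_maxeval", 10);    /* cap the "conservative" inner loop */
nlopt_set_param(opt, "rho_init", 0.5);        /* rough bound on the largest curvature */
nlopt_set_param(opt, "dual_ftol_rel", 1e-12); /* loosen the dual sub-solve slightly */
nlopt_set_param(opt, "verbosity", 1);         /* print per-iteration status */
nlopt_set_initial_step1(opt, 0.1);            /* half-width of the initial trust region */
```

The corresponding C++ calls (`opt.set_param(...)`, `opt.set_initial_step(...)`) appear in the `t_tutorial.cxx` change further down.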

### SLSQP

@@ -396,5 +399,3 @@ The subsidiary optimization algorithm is specified by the `nlopt_set_local_optim
The augmented Lagrangian method is specified in NLopt as `NLOPT_AUGLAG`. We also provide a variant, `NLOPT_AUGLAG_EQ`, that only uses penalty functions for equality constraints, while inequality constraints are passed through to the subsidiary algorithm to be handled directly; in this case, the subsidiary algorithm must handle inequality constraints (e.g. MMA or COBYLA).

While NLopt uses an independent re-implementation of the Birgin and Martínez algorithm, those authors provide their own free-software implementation of the method as part of the [TANGO](http://www.ime.usp.br/~egbirgin/tango/) project, and implementations can also be found in [semi-free](http://www.gnu.org/philosophy/categories.html#semi-freeSoftware) packages like [LANCELOT](http://www.numerical.rl.ac.uk/lancelot/blurb.html).
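
A minimal sketch of how the augmented-Lagrangian wrapper is combined with a subsidiary algorithm via `nlopt_set_local_optimizer` (the dimension and tolerance values here are illustrative):

```c
#include <nlopt.h>

/* Sketch: AUGLAG_EQ outer wrapper with MMA as the subsidiary algorithm, so
   inequality constraints are handled directly and equalities via penalties. */
nlopt_opt opt   = nlopt_create(NLOPT_AUGLAG_EQ, 5);
nlopt_opt local = nlopt_create(NLOPT_LD_MMA, 5);
nlopt_set_xtol_rel(local, 1e-6);        /* stopping tolerance for the inner solves */
nlopt_set_local_optimizer(opt, local);  /* opt stores its own copy of local */
nlopt_destroy(local);                   /* our handle can be freed afterwards */
```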


11 changes: 7 additions & 4 deletions src/algs/mma/ccsa_quadratic.c
@@ -218,7 +218,8 @@ nlopt_result ccsa_quadratic_minimize(
double *x, /* in: initial guess, out: minimizer */
double *minf,
nlopt_stopping *stop,
nlopt_opt dual_opt, int inner_maxeval, unsigned verbose)
nlopt_opt dual_opt, int inner_maxeval, unsigned verbose, double rho_init,
const double *sigma_init)
{
nlopt_result ret = NLOPT_SUCCESS;
double *xcur, rho, *sigma, *dfdx, *dfdx_cur, *xprev, *xprevprev, fcur;
@@ -322,14 +323,16 @@ nlopt_result ccsa_quadratic_minimize(
}

for (j = 0; j < n; ++j) {
if (nlopt_isinf(ub[j]) || nlopt_isinf(lb[j]))
if (sigma_init && sigma_init[j] > 0)
sigma[j] = sigma_init[j];
else if (nlopt_isinf(ub[j]) || nlopt_isinf(lb[j]))
sigma[j] = 1.0; /* arbitrary default */
else
sigma[j] = 0.5 * (ub[j] - lb[j]);
}
rho = 1.0;
rho = rho_init;
for (i = 0; i < m; ++i) {
rhoc[i] = 1.0;
rhoc[i] = rho_init;
dual_lb[i] = y[i] = 0.0;
dual_ub[i] = HUGE_VAL;
}
11 changes: 7 additions & 4 deletions src/algs/mma/mma.c
@@ -148,7 +148,8 @@ nlopt_result mma_minimize(unsigned n, nlopt_func f, void *f_data,
double *x, /* in: initial guess, out: minimizer */
double *minf,
nlopt_stopping *stop,
nlopt_opt dual_opt, int inner_maxeval, unsigned verbose)
nlopt_opt dual_opt, int inner_maxeval, unsigned verbose, double rho_init,
const double *sigma_init)
{
nlopt_result ret = NLOPT_SUCCESS;
double *xcur, rho, *sigma, *dfdx, *dfdx_cur, *xprev, *xprevprev, fcur;
@@ -198,14 +199,16 @@ nlopt_result mma_minimize(unsigned n, nlopt_func f, void *f_data,
dd.gcval = gcval;

for (j = 0; j < n; ++j) {
if (nlopt_isinf(ub[j]) || nlopt_isinf(lb[j]))
if (sigma_init && sigma_init[j] > 0)
sigma[j] = sigma_init[j];
else if (nlopt_isinf(ub[j]) || nlopt_isinf(lb[j]))
sigma[j] = 1.0; /* arbitrary default */
else
sigma[j] = 0.5 * (ub[j] - lb[j]);
}
rho = 1.0;
rho = rho_init;
for (i = 0; i < m; ++i) {
rhoc[i] = 1.0;
rhoc[i] = rho_init;
dual_lb[i] = y[i] = 0.0;
dual_ub[i] = HUGE_VAL;
}
7 changes: 4 additions & 3 deletions src/algs/mma/mma.h
@@ -40,7 +40,8 @@ nlopt_result mma_minimize(unsigned n, nlopt_func f, void *f_data,
double *x, /* in: initial guess, out: minimizer */
double *minf,
nlopt_stopping *stop,
nlopt_opt dual_opt, int inner_maxeval, unsigned verbose);
nlopt_opt dual_opt, int inner_maxeval, unsigned verbose, double rho_init,
const double *sigma_init);

nlopt_result ccsa_quadratic_minimize(
unsigned n, nlopt_func f, void *f_data,
@@ -52,11 +53,11 @@ nlopt_result ccsa_quadratic_minimize(
double *x, /* in: initial guess, out: minimizer */
double *minf,
nlopt_stopping *stop,
nlopt_opt dual_opt, int inner_maxeval, unsigned verbose);
nlopt_opt dual_opt, int inner_maxeval, unsigned verbose, double rho_init,
const double *sigma_init);

#ifdef __cplusplus
} /* extern "C" */
#endif /* __cplusplus */

#endif

13 changes: 10 additions & 3 deletions src/api/optimize.c
@@ -656,8 +656,16 @@ static nlopt_result nlopt_optimize_(nlopt_opt opt, double *x, double *minf)
case NLOPT_LD_MMA:
case NLOPT_LD_CCSAQ:
{
int inner_maxeval = (int)nlopt_get_param(opt, "inner_maxeval",0);
int verbosity = (int)nlopt_get_param(opt, "verbosity",0);
double rho_init = nlopt_get_param(opt, "rho_init",1.0);
nlopt_opt dual_opt;
nlopt_result ret;

if (!(rho_init > 0) || nlopt_isinf(rho_init))
RETURN_ERR(NLOPT_INVALID_ARGS, opt, "rho_init must be positive and finite");
verbosity = verbosity < 0 ? 0 : verbosity;

#define LO(param, def) (opt->local_opt ? opt->local_opt->param : (def))
dual_opt = nlopt_create((nlopt_algorithm)nlopt_get_param(opt, "dual_algorithm", LO(algorithm, nlopt_local_search_alg_deriv)),
nlopt_count_constraints(opt->m, opt->fc));
@@ -669,11 +677,10 @@ static nlopt_result nlopt_optimize_(nlopt_opt opt, double *x, double *minf)
nlopt_set_xtol_abs1(dual_opt, nlopt_get_param(opt, "dual_xtol_abs", 0.0));
nlopt_set_maxeval(dual_opt, nlopt_get_param(opt, "dual_maxeval", LO(maxeval, 100000)));
#undef LO

if (algorithm == NLOPT_LD_MMA)
ret = mma_minimize(n, f, f_data, opt->m, opt->fc, lb, ub, x, minf, &stop, dual_opt, (int)nlopt_get_param(opt, "inner_maxeval",0), (unsigned)nlopt_get_param(opt, "verbosity",0));
ret = mma_minimize(n, f, f_data, opt->m, opt->fc, lb, ub, x, minf, &stop, dual_opt, inner_maxeval, (unsigned)verbosity, rho_init, opt->dx);
else
ret = ccsa_quadratic_minimize(n, f, f_data, opt->m, opt->fc, opt->pre, lb, ub, x, minf, &stop, dual_opt, (int)nlopt_get_param(opt, "inner_maxeval",0), (unsigned)nlopt_get_param(opt, "verbosity",0));
ret = ccsa_quadratic_minimize(n, f, f_data, opt->m, opt->fc, opt->pre, lb, ub, x, minf, &stop, dual_opt, inner_maxeval, (unsigned)verbosity, rho_init, opt->dx);
nlopt_destroy(dual_opt);
return ret;
}
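
A usage sketch of the validation above (a fragment assuming an existing `opt`, `x`, and `minf`; the value and error-handling style are illustrative): a non-positive `rho_init` is rejected before any objective evaluations happen:

```c
/* Sketch: an invalid rho_init is caught up front by nlopt_optimize. */
nlopt_set_param(opt, "rho_init", -1.0);   /* not positive -> NLOPT_INVALID_ARGS */
if (nlopt_optimize(opt, x, &minf) == NLOPT_INVALID_ARGS)
    fprintf(stderr, "%s\n", nlopt_get_errmsg(opt));
```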
4 changes: 4 additions & 0 deletions test/t_tutorial.cxx
@@ -51,6 +51,10 @@ int main() {
return EXIT_FAILURE;
}

// set a couple of other parameters
opt.set_param("rho_init", 0.5);
opt.set_initial_step(0.1);

std::vector<double> x(2);
x[0] = 1.234; x[1] = 5.678;
double minf;
9 changes: 8 additions & 1 deletion test/testopt.c
@@ -49,6 +49,7 @@ extern "C" int feenableexcept(int EXCEPTS);

static nlopt_algorithm algorithm = NLOPT_GN_DIRECT_L;
static double ftol_rel = 0, ftol_abs = 0, xtol_rel = 0, xtol_abs = 0, minf_max_delta;
static double initial_step = 0;
static int maxeval = 1000, iterations = 1, center_start = 0;
static double maxtime = 0.0;
static double xinit_tol = -1;
@@ -235,6 +236,8 @@ static int test_function(int ifunc)
nlopt_set_xtol_abs(opt, xtabs);
nlopt_set_maxeval(opt, maxeval);
nlopt_set_maxtime(opt, maxtime);
if (initial_step != 0)
nlopt_set_initial_step1(opt, initial_step);
ret = nlopt_optimize(opt, x, &minf);
printf("finished after %g seconds.\n", nlopt_seconds() - start);
printf("return code %d from nlopt_minimize\n", ret);
@@ -290,6 +293,7 @@ static void usage(FILE * f)
" -a <n> : use optimization algorithm <n>\n"
" -o <n> : use objective function <n>\n"
" -0 <x> : starting guess within <x> + (1+<x>) * optimum\n"
" -S <dx>: initial step size dx (default: none)\n"
" -b <dim0,dim1,...>: eliminate given dims by equating bounds\n");
fprintf(f,
" -c : starting guess at center of cell\n"
@@ -321,7 +325,7 @@ int main(int argc, char **argv)
feenableexcept(FE_INVALID);
#endif

while ((c = getopt(argc, argv, "hLvVCc0:r:a:o:i:e:t:x:X:f:F:m:b:")) != -1)
while ((c = getopt(argc, argv, "hLvVCc0:r:a:o:i:e:t:x:X:f:F:m:b:S:")) != -1)
switch (c) {
case 'h':
usage(stdout);
Expand Down Expand Up @@ -389,6 +393,9 @@ int main(int argc, char **argv)
center_start = 0;
xinit_tol = atof(optarg);
break;
case 'S':
initial_step = atof(optarg);
break;
case 'b':{
const char *s = optarg;
while (s && *s) {
