
ENH: optimize: ``trust-constr`` optimization algorithms [GSoC 2017] #8328

Merged

merged 19 commits into scipy:master from antonior92:ipsolver on Apr 1, 2018

Conversation

@antonior92 (Member) commented Jan 25, 2018

Hi All,

The thread in Pull Request #7729 became a little too large and GitHub is having a hard time loading the page. Hence I decided to create this new pull request with the same content.

This pull request contains two solvers I implemented for my GSoC project:

  1. An implementation of a Byrd-Omojokun trust-region SQP method that solves
     equality-constrained optimization problems of the type:

         minimize fun(x)
         subject to: constr(x) = 0

     The underlying principles of this method are described in the following blog post (here).

  2. An implementation of a trust-region interior-point method that solves
     general nonlinear programming problems:

         minimize fun(x)
         subject to: constr_ineq(x) <= 0
                     constr_eq(x)    = 0

     This method uses the previously described SQP method as a substep. Its underlying principles can be found in another blog post (here).

Both methods are based on the projected CG algorithm (described here) and are appropriate for large and sparse problems.

I tested both solvers on several problems from the CUTEst problem collection (the results are described here). While the performance (in number of iterations) is not yet as good as in the original paper, it comes pretty close and could probably be improved with some fine-tuning of the solver's internal parameters.
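For reference, a minimal sketch of how the new solver is invoked through minimize, assuming the interface as it ended up after this PR (method='trust-constr' together with the NonlinearConstraint and Bounds classes; at the time of this discussion some of these names were still in flux):

import numpy as np
from scipy.optimize import minimize, NonlinearConstraint, Bounds

def fun(x):
    return (x[0] - 1) ** 2 + (x[1] - 2.5) ** 2

# One nonlinear inequality constraint, x0**2 + x1**2 <= 1, plus simple bounds.
constr = NonlinearConstraint(lambda x: x[0] ** 2 + x[1] ** 2, -np.inf, 1.0)
bounds = Bounds([0, 0], [np.inf, np.inf])

res = minimize(fun, x0=[0.5, 0.5], method='trust-constr',
               constraints=[constr], bounds=bounds)
print(res.x, res.optimality)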

@antonior92 (Member, Author) commented Jan 25, 2018

@nmayorov and I are doing some final refactoring of the code, and we expect the pull request to be ready for the SciPy 1.1 release. The main steps we have agreed upon are:

  • Refactor new constraint classes;
  • Improve solver return options;
  • Improve documentation and write a tutorial about the new method.
@nmayorov (Contributor) commented Jan 25, 2018

@pv @rgommers

Need assistance. We decided that the old bounds format in minimize should be removed in the future, and that we should issue a DeprecationWarning here. Our substitute is BoxConstraint(lb, ub), and when the old format is removed we can possibly simplify it to (lb, ub) as in least_squares.

So the questions:

  1. Does it sound right to you?
  2. What should we write in the DeprecationWarning? E.g., in which version do we want to remove it?
  3. Meanwhile, should we just fix all the tests to use the new format so they don't trigger this warning?
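For reference, the two formats under discussion look roughly like this (a sketch; BoxConstraint is the name proposed here, which shows up later in the thread as the Bounds class):

import numpy as np
from scipy.optimize import minimize

fun = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2

# Old sequence format: one (min, max) pair per variable; None means unbounded.
res = minimize(fun, [0.5, 0.5], bounds=[(0, 1), (0, None)])

# The proposed substitute keeps the two sides as separate arrays,
# e.g. BoxConstraint([0, 0], [1, np.inf]): the same (lb, ub) convention
# already used by least_squares.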
@andyfaff (Contributor) commented Jan 25, 2018

> Need assistance. We decided that the old bounds format in minimize should be removed in the future, and that we should issue a DeprecationWarning here. Our substitute is BoxConstraint(lb, ub), and when the old format is removed we can possibly simplify it to (lb, ub) as in least_squares.

This is something that's going to need wider discussion on scipy-dev, as the way of providing bounds has implications for several other functions/classes in optimize, and a large amount of user code might have to change to accommodate this.

@nmayorov (Contributor) commented Jan 25, 2018

Actually, now that I'm thinking about it, I'm not sure this deprecation is necessary. The only bonus is the possibility of removing the BoxConstraint class in the future. But again it's a confusing road: first remove the sequence format and then reintroduce it with a different convention. This is not bad, but maybe not worth it.

@antonior92 (Member, Author) commented Jan 25, 2018

I think it would be better to keep both modes, without the deprecation. This pull request already has many complicated features. We may consider the deprecation later on, if appropriate.

@nmayorov nmayorov force-pushed the antonior92:ipsolver branch 2 times, most recently from 31707d7 to 1a9a511 Jan 25, 2018

        self.H = sps.csr_matrix((self.n, self.n))

    def change_x(self, x):
        if not np.array_equal(x, self.x):

@antonior92 (Member, Author) commented Jan 29, 2018

Hi Nikolay,

I know this is quite minor, but in order to make the implementation more similar to VectorFunction, I think it would be better if we define:

def update_x(self, x):
    self.x = x
    self.f_updated = False

and:

def fun(self, x):
    if not np.array_equal(x, self.x):
        self.update(x)
...
def jac(self, x):
    if not np.array_equal(x, self.x):
        self.update(x)
...
def hess(self, x):
    if not np.array_equal(x, self.x):
        self.update(x)
...
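For context, a self-contained sketch of this lazy-evaluation pattern (hypothetical class, reduced to the objective value only): stored results are recomputed only when x actually changes.

import numpy as np

class LazyScalarFunction:
    def __init__(self, f):
        self._f = f
        self.x = None
        self.f_updated = False

    def update_x(self, x):
        self.x = np.asarray(x, dtype=float)
        self.f_updated = False

    def update(self, x):
        self.update_x(x)

    def fun(self, x):
        if not np.array_equal(x, self.x):
            self.update(x)
        if not self.f_updated:
            # Recompute and cache the objective value.
            self._fval = self._f(self.x)
            self.f_updated = True
        return self._fval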

@nmayorov nmayorov force-pushed the antonior92:ipsolver branch from 1a9a511 to d416cf9 Jan 29, 2018

@antonior92 (Member, Author) commented Jan 30, 2018

Hi @nmayorov,

I have a proposal for modifying the canonical constraints. It involves defining a base class for VectorFunction, LinearVectorFunction and IdentityVectorFunction. I will call this base class VectorFunctionBase:

class VectorFunctionBase:

    def __init__(self):
        ...

    def fun(self, x):  # Returns vector
        ...

    def jac(self, x):  # Returns matrix (sparse or dense)
        ...

    def hess(self, x):  # Returns matrix (sparse or dense) or LinearOperator
        ...

We will create a class that is basically a concatenation of VectorFunctionBase objects:

class ConcatenationOfVectorFunctionBase:

    def __init__(self):
        ...

    def fun(self, x):  # Returns list of vectors
        ...

    def jac(self, x):  # Returns list of matrices (sparse or dense)
        ...

    def hess(self, x):  # Returns list of matrices (sparse or dense) or LinearOperators
        ...

And once the user has passed the constraints we would form three structures:

  1. One object containing the ConcatenationOfVectorFunctionBase.
  2. One object containing a list of enforce_feasibility information for each constraint.
  3. One object containing a list of kind for each constraint.

We could then follow this sequence:

I. Convert all constraints to constraints of the type c_eq(x) == 0 or c_ineq(x) >= 0. We would use a flag ineq_eq_flag to indicate the type of each constraint, and have a new ConcatenationOfVectorFunctionBase object for the modified constraints.

(concatOfVectorFunction, enforce_feasibility, kind) -> (newConcatOfVectorFunction, ineq_eq_flag)

II. Concatenate the inequality constraints and the equality constraints, obtaining a ConcatenationOfVectorFunctionBase containing only two functions, one for the equality and one for the inequality constraints.

(concatOfVectorFunction, enforce_feasibility, kind) -> newConcatOfVectorFunctionWithOnly2Constr

ConcatenationOfVectorFunctionBase would assume the role of our CanonicalConstraint.

It has some advantages:

  1. ConcatenationOfVectorFunctionBase has a clearer meaning than CanonicalConstraint;
  2. It would allow us to keep considering the possibility that an inequality "interval" constraint can be an equality constraint when lb[i] == ub[i], which, as I have said, is useful for compatibility with the CUTEst and IPOPT interfaces.
  3. It would probably make the code simpler and easier to read.

Just to make it clear why I think this class would simplify things: its API would guarantee that all constraints are evaluated at once. Hence, consider:

concatOfVectorFunction -> newConcatOfVectorFunction

Two of the new constraints from newConcatOfVectorFunction may be functions of a single constraint from concatOfVectorFunction, without us needing to call that constraint from concatOfVectorFunction twice.
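To make the proposal concrete, a minimal hypothetical sketch of the concatenation class (illustrative names, not the final API): evaluating at a point calls every wrapped function exactly once and returns lists, so derived constraint sets can reuse the results without re-evaluating any member.

class ConcatenationOfVectorFunctions:
    def __init__(self, functions):
        self.functions = list(functions)

    def fun(self, x):
        # One call per wrapped function; returns a list of vectors.
        return [f.fun(x) for f in self.functions]

    def jac(self, x):
        # Returns a list of matrices (sparse or dense).
        return [f.jac(x) for f in self.functions]

    def hess(self, x):
        # Returns a list of matrices or LinearOperators.
        return [f.hess(x) for f in self.functions]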

@ilayn (Member) commented Feb 1, 2018

Maybe you can rebase the commits where you feel confident, so that the main steps are easier to track? For example, every n commits where the last one has passed the tests, etc.

@nmayorov nmayorov force-pushed the antonior92:ipsolver branch 2 times, most recently from 3c901de to c838b1f Feb 3, 2018

@nmayorov (Contributor) commented Feb 5, 2018

We have sort of a problem here: a "flickering" test failure on the Travis OSX configuration; see here for an example.

My guess on what is happening here is described by @pv in #8208: something in the complicated linear algebra routines, which are used a lot in this solver, is not deterministic. I also noted the warning

RuntimeWarning: internal gelsd driver lwork query error, required iwork dimension not returned. This is likely the result of LAPACK bug 0038, fixed in LAPACK 3.2.2 (released July 21, 2010). Falling back to 'gelss' driver.
    overwrite_b=overwrite)

It only appears in this configuration and may indicate that the LAPACK version used there is different from other configurations.

@antonior92 (Member, Author) commented Feb 5, 2018

Hi Nikolay,

I think something along the lines of #8208 is a good guess. Nevertheless, it is worth pointing out that scipy.linalg.lstsq, which is generating the warning, is not used in the portion of the code where we are getting the error (I think scipy.linalg.lstsq is used only in test_projections.py).

@andyfaff (Contributor) commented Feb 14, 2018

I'm just trying to refactor the scalar minimizers (f: R^n -> R) in optimize into class-based form; see #8414. That has required the creation of a Function object, which I think will end up as part of a public interface.
There is obviously a lot of overlap here with ScalarFunction in scipy/scipy/optimize/_differentiable_functions.py.

I have a few comments:

  1. Will ScalarFunction end up being made public? If so, then we should probably find some way of merging them. I like the different ways of doing numerical differentiation, but I'm less of a fan of other bits.
  2. From a discussion with @jaimefrio, he pointed out that it's a bad idea to save state (such as the number of function evaluations) in such an object, because the same ScalarFunction may be reused across different minimizers, etc. The optimizer itself should be tracking the function evaluations.
  3. All of the ScalarFunction parameters should probably be optional (and most of the init could be streamlined). Making them optional leads to a nicer design and would enable easy inheritance from the object - for example, users may be able to provide an analytic func, grad, and hess, meaning they could easily override those methods (computation time may be saved if the user has fancy tricks).
  4. Across scipy.optimize, grad or jacobian are used in different locations; it would be good if the entire module settled on one. (It seems to be more heterogeneous among the solvers used by minimize.)
  5. I think the class-based design in #8414, with the ability to iterate a single step at a time, is the way of the future, and will address many open issues. I've translated BFGS, LBFGSB and NelderMead so far, and it's a relatively simple transition - you just have to break the iteration loop out into a __next__ method (see the sketch after this list). I propose that all new minimizers be class based and iterable.
  6. It would be good if a solid consensus was reached on how to describe upper/lower bounds, e.g. [(lower0, upper0)....] or np.r_[all_lower, all_upper].
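A minimal sketch of the iterable design mentioned in item 5 (plain gradient descent stands in for the real solvers; the class and names are illustrative, not the actual #8414 API):

import numpy as np

class GradientDescent:
    def __init__(self, fun, grad, x0, step=0.1):
        self.fun, self.grad = fun, grad
        self.x = np.asarray(x0, dtype=float)
        self.step = step

    def __iter__(self):
        return self

    def __next__(self):
        # One iteration of the solver: a single gradient step.
        self.x = self.x - self.step * self.grad(self.x)
        return self.x

solver = GradientDescent(lambda x: x @ x, lambda x: 2 * x, x0=[1.0, -2.0])
for _, x in zip(range(50), solver):
    pass
print(x)  # approaches the minimizer [0, 0]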

Anyway, let me know what you think of the design I'm working on, and my comments, and let's see if we can converge somewhat.

@antonior92 (Member, Author) commented Feb 14, 2018

Hi @andyfaff,

Thanks a lot for your comments. I answer them below:

  1. In my opinion, we should keep ScalarFunction and VectorFunction as private classes. I don't know what @nmayorov thinks about this, but I think of these classes as a common implementation of features like finite-difference differentiation and quasi-Newton approximations. In the long term, I think it would be an easy way to make these features available to all the solvers. For now, it is simply the cleanest way I found of making these features available to the solver we are implementing.
  2. I think if this class is not made public it should not be much of a problem, since the user cannot pass it twice to minimize. Furthermore, it can be hard to keep an accurate count of function evaluations when using finite-difference and quasi-Newton approximations.
  3. Again, I think this will be a problem only if we make the class public.
  4. I totally agree with you there. This is kind of messy right now.
  5. I like the idea of introducing a little more structure to be shared across different solvers. I will study #8414 a little further and try to evaluate how we could include these modifications.
  6. I generally prefer np.r_[all_lower, all_upper]. But we now allow both options in minimize using the class Bounds. We considered deprecating [(lower0, upper0)....], but we thought it would be better to leave that for later.

Anyway, I think it is important to create some structure for sharing features like finite differences and Hessian approximations across more than one solver, and I think ScalarFunction and VectorFunction would be good for that, especially in the heterogeneous space of scipy solvers.
I don't think, however, that our implementation is a good public interface.

About the common structure you are trying to give to all the other solvers, I generally think it is a very good idea. But I don't think we should hold back this PR for too long and try to solve too many problems with it. Ideally I would like to keep the goal of merging it for SciPy 1.1.

@antonior92 (Member, Author) commented Feb 14, 2018

> About the common structure you are trying to give to all the other solvers, I generally think it is a very good idea. But I don't think we should hold back this PR for too long and try to solve too many problems with it. Ideally I would like to keep the goal of merging it for SciPy 1.1.

Just adding something to this comment that I think I have not made entirely clear. While I think we should not try to solve too many problems with this PR, I also think it should sketch the structures that will later be used for other solvers as well.

Hence, we need to coordinate our work better and try to reach some consensus on the general structure we will adopt, in order to avoid implementing everything twice.

@nmayorov (Contributor) commented Feb 19, 2018

Hey Andrew,

Overall I think it's good to introduce some object-oriented design to numerical optimization, and if scipy is the first library to provide that, it will be great.

In fact, I think @antonior92 made the first steps in this direction here by introducing the constraint and HessianUpdateStrategy classes. I don't see a good reason not to do the same for the objective function. My feeling is that these classes should be "thin" (just the definition of an object) or interfaces, to keep the ability to use them with a broad variety of solvers.

Also, since numerical optimization algorithms operate iteratively, adding the ability to advance them one step further looks logical and doesn't contradict anything. But adding this ability to solvers already implemented in other languages might be difficult.

For this pull request, Antonio and I decided that we should aim to finish it for the current minimize framework, and we want to merge it within 2 weeks. After that, we are both interested in discussing/developing the new framework you are thinking about and working on at the moment.

@nmayorov (Contributor) left a comment

Some comments regarding the return.

    Gradient of the objective function at the solution.
lagrangian_grad : ndarray, shape (n,)
    Gradient of the Lagrangian function at the solution.
nfev : integer

@nmayorov (Contributor) commented Feb 19, 2018

As a different strategy, we may count nfev only for conceptual function evaluations and not when estimating the gradient by finite differences, and always increment ngev when computing the gradient. I don't know which is more useful for a user.

The same goes for grad/hess pairs.

@andyfaff (Contributor) commented Feb 19, 2018

Judging from past experience, the user would like to know the total number of function evaluations, including those used in the numerical calculation of grad and hess.

@antonior92 (Member, Author) commented Feb 20, 2018

A few commits ago I was using conceptual nfev/ngev/nhess... I changed it because I think it makes more sense to give the real number of function/gradient/Hessian evaluations. It has a clear meaning, it is easier to document, and I think it is more useful for the user.

Besides that, I think the number of iterations niter gives some information about the algorithm's progress, and makes "conceptual function evaluations" more or less redundant.
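To illustrate the distinction, a small sketch (not the scipy internals): on an n-dimensional problem, one 2-point finite-difference gradient estimate costs n extra function evaluations on top of the single "conceptual" one.

import numpy as np

nfev = 0

def f(x):
    global nfev
    nfev += 1
    return np.sum(x ** 2)

x0 = np.ones(3)
f0 = f(x0)  # one "conceptual" objective evaluation
h = 1e-8
grad = np.array([(f(x0 + h * e) - f0) / h for e in np.eye(3)])
print(nfev)  # 4: the conceptual evaluation plus 3 for the gradient estimate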

    Total execution time.
trust_radius : float
    Trust radius at the last iteration.
penalty : float

@nmayorov (Contributor) commented Feb 19, 2018

Is the penalty function explained anywhere?

@antonior92 (Member, Author) commented Feb 20, 2018

No. I will explain it somewhere.

penalty : float
    Penalty function at last iteration.
tolerance : float
    Tolerance for barrier subproblem at the last iteration.

@nmayorov (Contributor) commented Feb 19, 2018

Perhaps it should be set to None for equality_constrained_sqp. And I also suggest not informing users about these specific names; it's better to explain that it is "Only for problems with inequality constraints."

@antonior92 (Member, Author) commented Feb 20, 2018

Ok, I changed the documentation to "Only for problems with inequality constraints." Much better indeed.

    Termination message.
method : {'equality_constrained_sqp', 'tr_interior_point'}
    Optimization method used.
constr_violation : float

@nmayorov (Contributor) commented Feb 19, 2018

I would argue that this and optimality are quite important fields, so they should be moved upwards.

Overall, consider the importance of the different fields and place them accordingly.

@antonior92 (Member, Author) commented Feb 20, 2018

Ok, I tried to order the fields by importance while also grouping them by similarity. Take a look and see what you think.

constr : list of ndarray
    List of constraint values at the solution. This list will give
    the values in the same order used in `constraints`. When bound
    constraints are present one extra element will be included (in

@nmayorov (Contributor) commented Feb 19, 2018

I believe we can remove Bounds from this, because it is simply x. I think it means less explanation and is more logical.

But if you think it's too much trouble, then let's keep them.

@antonior92 (Member, Author) commented Feb 20, 2018

I have struggled with this already.

The point is that v will have to include an element corresponding to the bounds. So I chose to keep constr, jac, v, constr_nfev, constr_njev, and constr_nhev all the same length.

I am not entirely sure about it though. Maybe we could add a single explanation, "When bound constraints are present one extra element will be included (at the end of the list) to account for the values of these constraints.", to all these structures.

@nmayorov (Contributor) commented Feb 23, 2018

> I am not entirely sure about it though. Maybe we could add a single explanation, "When bound constraints are present one extra element will be included (at the end of the list) to account for the values of these constraints.", to all these structures.

Sounds like a good idea.

@antonior92 (Member, Author) commented Feb 23, 2018

Do you know how to do it nicely? I tried, but it just messed things up. I never know exactly what to do to make scipy documentation look nice.

    `constraints`. When bound constraints are present one extra element
    will be included (at the end of the list) to account for the values
    of these constraints' Jacobian matrices.
v : list of ndarray

@nmayorov (Contributor) commented Feb 19, 2018

I think it needs some clarification for the case of an interval constraint.

It seems to me that v < 0 indicates that the upper bound is active and v > 0 that the lower bound is active. Am I correct here? This should be included in the doc.

Considering this, I believe it makes sense to include Bounds, to indicate which is active and which is not. But perhaps a clearer approach is to include them separately (well, maybe not).

@antonior92 (Member, Author) commented Feb 20, 2018

Actually it is the opposite :) v > 0 for an active upper bound and v < 0 for an active lower bound. I just added a note about it.

Besides that, I think it is better to keep it as it is, without separating them.

@nmayorov (Contributor) commented Feb 23, 2018

I see now, there are several places where the sign changes compared to "Numerical Optimization".

@nmayorov (Contributor) commented Feb 19, 2018

I'm a little bit concerned that Returns is displayed before Options in the HTML. One way I was able to solve it is to name Options as Parameters. The only difference is that it then doesn't add this note:

> See also
> For documentation for the rest of the parameters, see scipy.optimize.minimize

Should we use Parameters?

@rgommers (Member) commented Mar 31, 2018

The rebased set of commits looks great, Antonio.

Would you mind adding a short description under New features at https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.1.0?

@nmayorov (Contributor) commented Mar 31, 2018

Hey Antônio! Don't bother with thanks for now. Great work finishing it! Add the description to the release notes and then let's merge it.

@rgommers (Member) commented Mar 31, 2018

@nmayorov merge it now if you're happy! The longer it's in master before branching, the better. The release notes are a wiki entry, so not in this PR.

@rgommers rgommers removed the needs-work label Mar 31, 2018

antonior92 and others added 13 commits Nov 15, 2017
MAINT: optimize: only modify the variable state inside update_state
Updating `state` in several places was confusing and made it
difficult to change the stop criteria and callback logic. This way is cleaner.
ENH: optimize: Always update Hessian / reduce min_curvature default
Two improvements:

1. Reduce `min_curvature` default for BFGS `skip_update`

The default value of `min_curvature` was causing errors in some
situations. Too many steps were getting rejected and this was
causing the solver to fail. I noticed this when trying to modify the
HessianUpdateStrategy.

2. Always update the Hessian when using HessianUpdateStrategy

This approach is more solid because it keeps improving the Hessian
estimation even for failed iterations.
ENH: optimize: new way of computing lagr. grad and constr violation
Other minor modifications include:
1. FIX: optimize: fix wrong printing mode.

2. ENH: optimize: add counters to VectorFunction

I also removed some unused attributes that I should have removed in
a previous commit.

3. FIX: optimize: remove debug prints

4. ENH: optimize: improve the number-of-evaluations counters

Now each constraint has its own counter, and these new counters
take into account finite-difference and quasi-Newton approximations.

These new quantities are used as stop criteria.

API: optimize: create verbose level 3

FIX: optimize: pep8 corrections
MAINT: optimize: Refactor differentiable functions slightly
Also refactor progress report for trustregion_constr.
ENH: optimize: fix parameter update in tr_interior_point
1. Fix the place of the parameter update in tr_interior_point
2. Fix the trust-region update logic (i.e. the way the trust radius is updated from one solution of the interior-point subproblem to the next).
ENH: optimize: HessianUpdateStrategy made robust for special cases
1. Add a guarantee that ys > 0 (sketched after this commit message). I
thought the other restrictions should be enough, but I was still getting
errors. This additional constraint fixed the problem.

2. Reinitialize for an indefinite Hessian matrix. For the BFGS update,
due to roundoff errors, an indefinite matrix may appear. We correct this
by reinitializing the quasi-Newton update when this happens.

3. Rename 'damped_bfgs' -> 'damp_update'

4. Always initialize with a positive definite matrix

5. Use 'zeros_like' instead of 'empty_like'. This helps avoid some
really awkward behaviours in the case of overflows.

6. Skip the update in the case 'delta_grad == 0'
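To illustrate the curvature-condition logic mentioned in item 1, a simplified sketch (not the actual scipy implementation): the BFGS update of a Hessian approximation B preserves positive definiteness only when the curvature s^T y is sufficiently positive, so updates violating that are skipped.

import numpy as np

def bfgs_update(B, s, y, min_curvature=1e-8):
    sy = s @ y
    # Skip the update when the curvature condition s^T y > 0 is not
    # safely satisfied; otherwise B may lose positive definiteness.
    if sy <= min_curvature * np.linalg.norm(s) * np.linalg.norm(y):
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy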

@antonior92 antonior92 force-pushed the antonior92:ipsolver branch from c12db20 to c343842 Mar 31, 2018

@antonior92 (Member, Author) commented Mar 31, 2018

I just wrote better messages for each commit. I think we are ready to merge now...

@antonior92 (Member, Author) commented Mar 31, 2018

I am getting some unrelated test errors from scipy.special.

It is strange because I was getting "all green" CI tests before and have not actually modified the code, only the commit messages.

@rgommers (Member) commented Mar 31, 2018

> I am getting some unrelated test errors from scipy.special.

That's due to a change in numpy master, you can ignore that here. This is good to merge as is.

@antonior92 (Member, Author) commented Mar 31, 2018

Ok! I will merge it then. @nmayorov, is that ok with you?

@nmayorov (Contributor) commented Mar 31, 2018

Yes, go ahead!

@antonior92 antonior92 merged commit 9fd893b into scipy:master Apr 1, 2018

3 of 4 checks passed:

  • continuous-integration/travis-ci/pr: The Travis CI build failed
  • ci/circleci build: Your tests passed on CircleCI!
  • codecov/project: 76.28% (target 1%)
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
@rgommers (Member) commented Apr 1, 2018

Woohoo! Awesome job Antonio and Nikolay.

@antonior92 antonior92 deleted the antonior92:ipsolver branch May 29, 2018
