MAINT,ENH: linprog, highs and callback mechanisms #19451

Open
HaoZeke opened this issue Oct 29, 2023 · 9 comments
Labels: enhancement, maintenance, scipy.optimize

Comments

HaoZeke (Contributor) commented Oct 29, 2023

SciPy callbacks

Associated PR: #19420 (WIP).

Some comments on the additions required for SciPy. In some sense the callbacks in HiGHS are more functional (they allow, e.g., interrupts) than those of SciPy: SciPy only needs the OptimizeResult to be available for consumption by the callback, i.e. it is essentially a logging callback.

In general, an OptimizeResult has:

    x : ndarray
        The solution of the optimization.
    success : bool
        Whether or not the optimizer exited successfully.
    status : int
        Termination status of the optimizer. Its value depends on the
        underlying solver. Refer to `message` for details.
    message : str
        Description of the cause of the termination.
    fun, jac, hess: ndarray
        Values of objective function, its Jacobian and its Hessian (if
        available). The Hessians may be approximations, see the documentation
        of the function in question.
    hess_inv : object
        Inverse of the objective function's Hessian; may be an approximation.
        Not available for all solvers. The type of this attribute may be
        either np.ndarray or scipy.sparse.linalg.LinearOperator.
    nfev, njev, nhev : int
        Number of evaluations of the objective functions and of its
        Jacobian and Hessian.
    nit : int
        Number of iterations performed by the optimizer.
    maxcv : float
        The maximum constraint violation.

As will be seen, however, not all of these are set (in either the documentation or the implementation).
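
For concreteness, a minimal logging callback against the legacy interface looks like the sketch below. The fields it reads are the documented ones above; `method="simplex"` is used only because it is one of the deprecated legacy methods whose callback contracts are described next.

    import numpy as np
    from scipy.optimize import linprog

    def log_progress(res):
        # res is a per-iteration scipy.optimize.OptimizeResult snapshot
        print(f"nit={res.nit} phase={res.phase} status={res.status} fun={res.fun}")

    c = np.array([-1.0, 4.0])
    A_ub = np.array([[-3.0, 1.0], [1.0, 2.0]])
    b_ub = np.array([6.0, 4.0])

    # "simplex" is deprecated; it is used here only as the legacy example.
    linprog(c, A_ub=A_ub, b_ub=b_ub, callback=log_progress, method="simplex")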

simplex callbacks

  • Are called at each solver step
    callback : callable, optional
        If a callback function is provided, it will be called within each
        iteration of the algorithm. The callback must accept a
        `scipy.optimize.OptimizeResult` consisting of the following fields:

            x : 1-D array
                Current solution vector
            fun : float
                Current value of the objective function
            success : bool
                True only when a phase has completed successfully. This
                will be False for most iterations.
            slack : 1-D array
                The values of the slack variables. Each slack variable
                corresponds to an inequality constraint. If the slack is zero,
                the corresponding constraint is active.
            con : 1-D array
                The (nominally zero) residuals of the equality constraints,
                that is, ``b - A_eq @ x``
            phase : int
                The phase of the optimization being executed. In phase 1 a basic
                feasible solution is sought and the T has an additional row
                representing an alternate objective function.
            status : int
                An integer representing the exit status of the optimization::

                     0 : Optimization terminated successfully
                     1 : Iteration limit reached
                     2 : Problem appears to be infeasible
                     3 : Problem appears to be unbounded
                     4 : Serious numerical difficulties encountered

            nit : int
                The number of iterations performed.
            message : str
                A string descriptor of the exit status of the optimization.

In the implementation, we see:

            solution[:] = 0
            solution[basis[:n]] = T[:n, -1]
            x = solution[:m]
            x, fun, slack, con = _postsolve(
                x, postsolve_args
            )
            res = OptimizeResult({
                'x': x,
                'fun': fun,
                'slack': slack,
                'con': con,
                'status': status,
                'message': message,
                'nit': nit,
                'success': status == 0 and complete,
                'phase': phase,
                'complete': complete,
                })
            callback(res)

revised_simplex

As seen in _linprog_rs.py

    callback : callable, optional
        If a callback function is provided, it will be called within each
        iteration of the algorithm. The callback function must accept a single
        `scipy.optimize.OptimizeResult` consisting of the following fields:

            x : 1-D array
                Current solution vector.
            fun : float
                Current value of the objective function ``c @ x``.
            success : bool
                True only when an algorithm has completed successfully,
                so this is always False as the callback function is called
                only while the algorithm is still iterating.
            slack : 1-D array
                The values of the slack variables. Each slack variable
                corresponds to an inequality constraint. If the slack is zero,
                the corresponding constraint is active.
            con : 1-D array
                The (nominally zero) residuals of the equality constraints,
                that is, ``b - A_eq @ x``.
            phase : int
                The phase of the algorithm being executed.
            status : int
                For revised simplex, this is always 0 because if a different
                status is detected, the algorithm terminates.
            nit : int
                The number of iterations performed.
            message : str
                A string descriptor of the exit status of the optimization.

The implementation:

        res = OptimizeResult(
            {
                "x": x_o,
                "fun": fun,
                "slack": slack,
                "con": con,
                "nit": iteration,
                "phase": phase,
                "complete": False,
                "status": status,
                "message": "",
                "success": False,
            }
        )
        callback(res)

interior-point

    callback : callable, optional
        If a callback function is provided, it will be called within each
        iteration of the algorithm. The callback function must accept a single
        `scipy.optimize.OptimizeResult` consisting of the following fields:

            x : 1-D array
                Current solution vector
            fun : float
                Current value of the objective function
            success : bool
                True only when an algorithm has completed successfully,
                so this is always False as the callback function is called
                only while the algorithm is still iterating.
            slack : 1-D array
                The values of the slack variables. Each slack variable
                corresponds to an inequality constraint. If the slack is zero,
                the corresponding constraint is active.
            con : 1-D array
                The (nominally zero) residuals of the equality constraints,
                that is, ``b - A_eq @ x``
            phase : int
                The phase of the algorithm being executed. This is always
                1 for the interior-point method because it has only one phase.
            status : int
                For revised simplex, this is always 0 because if a different
                status is detected, the algorithm terminates.
            nit : int
                The number of iterations performed.
            message : str
                A string descriptor of the exit status of the optimization.

Implementation:

        x_o, fun, slack, con = _postsolve(x / tau, postsolve_args)
        res = OptimizeResult(
            {
                "x": x_o,
                "fun": fun,
                "slack": slack,
                "con": con,
                "nit": iteration,
                "phase": 1,
                "complete": False,
                "status": 0,
                "message": "",
                "success": False,
            }
        )

Mapping to highspy

highspy uses HighsCallbackDataOut, which has:

typedef struct {
  int log_type;  // cast of HighsLogType
  double running_time;
  HighsInt simplex_iteration_count;
  HighsInt ipm_iteration_count;
  double objective_function_value;
  int64_t mip_node_count;
  double mip_primal_bound;
  double mip_dual_bound;
  double mip_gap;
  double* mip_solution;
} HighsCallbackDataOut;

Which means that (entries marked "missing" are not present in HighsCallbackDataOut):

  • x (missing): current solution vector (the struct only carries mip_solution, not a general x)
  • fun : already present
  • success (missing): can probably be set to False without much trouble (as in _rs and _ip)
  • slack (missing): the (nominally positive) values of the slack, b_ub - A_ub @ x
  • con (missing): the (nominally zero) residuals of the equality constraints, b_eq - A_eq @ x
  • phase (missing): _ip always sets this to 1
  • nit : already present
  • message : already supported (via the callback interface, not in HighsCallbackDataOut)
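
To make the gap concrete, a shim reconstructing the legacy OptimizeResult from a HighsCallbackDataOut-like object could look as follows. This is a hypothetical sketch, not existing SciPy or highspy code: `data_out` merely duck-types the struct above, the current `x` is assumed to be obtainable from somewhere, and the slack/con recomputation is exactly the per-call cost discussed below.

    import numpy as np
    from scipy.optimize import OptimizeResult

    def make_legacy_shim(user_callback, c, A_ub, b_ub, A_eq, b_eq):
        """Hypothetical adapter: only fun and nit map directly onto
        HighsCallbackDataOut fields; everything else is recomputed."""
        def shim(data_out, x):
            res = OptimizeResult({
                "x": x,
                "fun": c @ x,              # or data_out.objective_function_value
                "slack": b_ub - A_ub @ x,  # recomputed on every call
                "con": b_eq - A_eq @ x,    # likewise
                "nit": data_out.simplex_iteration_count,
                "phase": 1,
                "status": 0,
                "message": "",
                "success": False,
                "complete": False,
            })
            user_callback(res)
        return shim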

However, adding things which are only ever going to be used for logging seems like a pointless slowdown. Additionally, it seems (see #9536) that the design of linprog callbacks could in general do with some more discussion.

Alternatives

There is a similar issue open over at HiGHS (ERGO-Code/HiGHS#911), but from a solver perspective the existing implementation seems to be pretty good.

If it is acceptable, HiGHS could include a SciPy-compatible set of callbacks (guarded by a compile flag) building on what they have, or we could maintain such a set in our copy of HiGHS.

Possible next steps

The choices are:

  • Deprecate the callback interface completely
  • Expose only highspy style callbacks

Anything in between requires changes to HiGHS. I believe this issue is linked in part to the question "what is the scope of SciPy's linear solvers?", and so #15915 (comment), where @fuglede describes his usage of SciPy in the context of also having highspy / PuLP / PyOMO etc., is of use.

Seeking comments from anyone using linprog, but also perhaps @rgommers, @mckib2, @mdhaber.

Personally, I think it would be best to transparently pass callback functionality through to highspy, as that has the lowest ratio of maintenance burden to additional features. Unless the existing interface is in wide use, I would suggest first deprecating the existing (non-HiGHS) methods (#15707) and then adding back highspy-compatible callbacks; or we could remove the existing methods while introducing highspy-compatible callbacks.

mdhaber (Contributor) commented Oct 30, 2023

Unless it is in wide use, then, I would suggest first deprecating the existing methods #15707 (non-HiGHs) then adding back highspy compatible callbacks.

All non-HiGHS methods are already deprecated. They produce a warning stating that they "will be removed in SciPy 1.11.0". We missed that deadline only because we didn't think we should remove these methods without a compatible callback interface. If we aren't going to provide a callback interface that is compatible with what they have offered, we will probably need to restart the deprecation cycle or continue to offer one of these methods.

However, to add things which are only ever going to be used for logging seems like a pointless slowdown though.

Why is there necessarily a slowdown? It is possible to design things such that there is essentially zero overhead if a callback function does not need to be called.

Anything in between requires changes to highs

I thought we were the reason they were adding a callback interface to begin with. Is that not right?

they allow, e.g. interrupts

We can add that as we have done for minimize: if the Python callback raises a StopIteration error, we can intercept that and return a nonzero status to HiGHS.
Are there other features of the HiGHS callbacks that you don't think we can support in a backward compatible way?
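
A minimal sketch of that interception, assuming a wrapper sits between HiGHS and the user's Python callback; `data_in` stands in for the HighsCallbackDataIn side of the HiGHS callback interface, whose user_interrupt field is how an interrupt is signalled:

    def interruptible(user_callback):
        """Sketch: convert StopIteration from the user's callback into
        HiGHS's interrupt flag instead of letting it propagate."""
        def wrapper(res, data_in):
            try:
                user_callback(res)
            except StopIteration:
                data_in.user_interrupt = 1  # ask HiGHS to stop at its next check
        return wrapper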

mckib2 (Contributor) commented Oct 30, 2023

Unless it is in wide use, then, I would suggest first deprecating the existing methods #15707 (non-HiGHs) then adding back highspy compatible callbacks.

All non-HiGHS methods are already deprecated. They produce a warning stating that they "will be removed in SciPy 1.11.0". We missed that deadline only because we didn't think we should remove these methods without a compatible callback interface. If we aren't going to provide a callback interface that is compatible with what they have offered, we will probably need to restart the deprecation cycle or continue to offer one of these methods.

However, to add things which are only ever going to be used for logging seems like a pointless slowdown though.

Why is there necessarily a slowdown? It is possible to design things such that there is essentially zero overhead if a callback function does not need to be called.

@HaoZeke's comment seems to be in reference to the amount of data being passed around, i.e., copying data to and from structs is easy but does have some overhead cost and we'd need to justify that cost. slack and con seem to me to be the big offenders here as they would likely need to be recomputed and stored each time the callback is invoked, or (at best) copied from some internal HiGHS representation. Passing by mutable reference is likely unacceptable here.

Anything in between requires changes to highs

I thought we were the reason they were adding a callback interface to begin with. Is that not right?

We are probably on their map, but they seem to have other drivers and I don't see any evidence that we had a lot of input into their design other than expressing what we were currently doing in the legacy SciPy linprog implementations. (I haven't been able to follow very closely though, so this could be wrong).

they allow, e.g. interrupts

We can add that as we have done for minimize: if the Python callback raises a StopIteration error, we can intercept that and return a nonzero status to HiGHS.

Reading their callback documentation, I'm wondering if it's possible to emulate per-iteration callbacks for simplex and interior point via the Simplex and IPM Interrupt callbacks until the old-style callbacks can be deprecated fully. The interrupt callbacks seem to be guaranteed to be run every iteration. I have this in mind, e.g., Simplex:

  • register a Simplex Interrupt Callback
    • put the SciPy user's callback in a wrapper that can calculate things (e.g., slack, con) and grab more information (e.g., current solution value if there is one) before invoking SciPy callback
  • pass the parent Highs object (and anything else we need) as user_callback_data (this is how we can get any solution info if it's available)

There are two assumptions above (a rough sketch follows this list):

  • the in-progress simplex solution is easily available from top level objects such as the parent Highs obj
  • callbacks are being run in a way which allows thread-safe access to whatever is in user_callback_data (probably a good question for HiGHS devs: what are the concurrency guarantees for callbacks?)
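
A rough sketch of that emulation idea, with heavy caveats: the context object, the accessor on the parent Highs object, and the return-value convention are all assumptions about what a binding could expose, not the actual highspy API.

    class CallbackContext:
        """Would travel through user_callback_data (hypothetical plumbing)."""
        def __init__(self, scipy_callback, highs):
            self.scipy_callback = scipy_callback
            self.highs = highs  # parent Highs object, per the idea above

    def on_simplex_interrupt(data_out, ctx):
        # Whether reading from ctx.highs mid-solve is safe is exactly the
        # thread-safety question above; current_solution is a placeholder.
        x = getattr(ctx.highs, "current_solution", None)
        if x is not None:
            ctx.scipy_callback({"x": x, "nit": data_out.simplex_iteration_count})
        return False  # False meaning "do not interrupt" (assumed convention)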

Are there other features of the HiGHS callbacks that you don't think we can support in a backward compatible way?

A good rule of thumb seems to be that if you do what Gurobi does, everything else is at least pretty close to that. So it might be worth taking a look at that documentation and seeing what differences there are.

ev-br (Member) commented Oct 30, 2023

From 30000 feet and without knowing details, I wonder what is the value of having scipy.optimize.linprog in the long run at all if all it does is delegate to highspy.

HaoZeke (Contributor, Author) commented Oct 31, 2023

From 30000 feet and without knowing details, I wonder what is the value of having scipy.optimize.linprog in the long run at all if all it does is delegate to highspy.

I had this confusion initially as well @ev-br, and the scope of where SciPy's linprog fits into the ecosystem is discussed by @fuglede in #15915 (comment), which I think is important enough to reproduce here for context:

Some scattered thoughts:

Already in its current form, highspy also provides some amount of access to the state of the solver (for instance, one thing I found useful was having ERGO-Code/HiGHS#1264 available in highspy, which enables sharing solver state across process boundaries).

If any of this would be relevant in SciPy, I think it would be if we could provide zero-overhead abstractions to some of the functionality that's already available, but even then, I imagine that that would quickly become very HiGHS-specific and might as well live in highspy itself: IMO, the main value-adds of linprog over highspy at the moment are the small things, not having to manually convert inequality constraints to equality constraints, the fact that CSC arrays are the native citizens of highspy, so you can draw on what's already available in scipy.sparse, and most importantly that it's all zero-overhead: (Very) high-level libraries like PuLP/PyOMO are super useful for experimentation but very often you run into the issue that generating and passing state to the underlying solver becomes the bottleneck, with relatively short time spent by the solver itself. (I found it quite curious, for instance, that only rather recently did Gurobi add a feature to pass LPs as sparse arrays instead of having to go through their modelling API.)

And even if the more stateful operations were available here, we would still want the stateless one to be available: When you don't need stateful operations, using them adds a good deal of overhead (compare e.g. the first few rows of the "SciPy (defaults)"/"highspy (defaults)" columns in ERGO-Code/HiGHS#1198 (comment)); I've had use cases where I'll creatively switch between stateless linprog and stateful highspy based on model size. If one could do something in that domain, I think that would be useful, but even then, probably highspy itself (or HiGHS, really) would be a better home for any improvements.

So that's all if all we care about is performance (which is the main reason for preferring SciPy over e.g. PuLP/PyOMO at the moment). Another value-add could be if it's possible to design a (preferably zero-overhead) API that caters to a broader set of users: My impression is that things like PuLP/PyOMO are more accessible than the lower-level APIs in SciPy or highspy; to the point where some users would find it difficult to go to a lower level even when performance requirements ask for it. Now I don't think you want SciPy to compete with PuLP/PyOMO directly (right?), but if something could be done to introduce a broader class of users to the higher-performance lower-level APIs somehow, then that could be good.

My understanding from that is that the hierarchy is:

  • PuLP / PyOMO (high level)
  • Scipy (middle)
  • Highs + highspy (lowest level)

Which seems reasonable. There are other reasons to have SciPy's linprog interface: it is intuitive to users and has been in use for a long time. From a practical perspective, simplex and the older linprog methods simply do not compare to the performance and dedicated maintenance of HiGHS, and the team (@jajhall) has been very responsive and open to working with SciPy (including assisting with relevant bugs). Similarly pragmatic is the fact that users would reasonably expect a linear solver suite embedded in SciPy, and HiGHS is the best choice for that now.

Another thing (though maybe I am not the best person to talk about this) is that the callbacks were originally a special feature of the simplex solver and were only later expanded to the others, so there is precedent for first accommodating one implementation and later considering expansion (if ever a need arises).

HaoZeke (Contributor, Author) commented Oct 31, 2023

Unless it is in wide use, then, I would suggest first deprecating the existing methods #15707 (non-HiGHs) then adding back highspy compatible callbacks.

All non-HiGHS methods are already deprecated. They produce a warning stating that they "will be removed in SciPy 1.11.0". We missed that deadline only because we didn't think we should remove these methods without a compatible callback interface. If we aren't going to provide a callback interface that is compatible with what they have offered, we will probably need to restart the deprecation cycle or continue to offer one of these methods.

However, to add things which are only ever going to be used for logging seems like a pointless slowdown though.

Why is there necessarily a slowdown? It is possible to design things such that there is essentially zero overhead if a callback function does not need to be called.

@HaoZeke's comment seems to be in reference to the amount of data being passed around, i.e., copying data to and from structs is easy but does have some overhead cost and we'd need to justify that cost. slack and con seem to me to be the big offenders here as they would likely need to be recomputed and stored each time the callback is invoked, or (at best) copied from some internal HiGHS representation. Passing by mutable reference is likely unacceptable here.

Yup. Also, currently there is one data structure used for all the callback kinds; the slowdown here is that a callback meant only to interrupt the solver shouldn't need to do the (potentially expensive) work of filling a struct which may or may not even be used for logging (depending on whether the logging callback is active). I might be wrong, but generally, large data transfer / computation purely for logging is almost always only for development / small problems; for production cases it eventually becomes difficult to interpret anyway.

Anything in between requires changes to highs

I thought we were the reason they were adding a callback interface to begin with. Is that not right?

We are probably on their map, but they seem to have other drivers and I don't see any evidence that we had a lot of input into their design other than expressing what we were currently doing in the legacy SciPy linprog implementations. (I haven't been able to follow very closely though, so this could be wrong).

There are other drivers, but I think the HiGHS devs are very open to reasonable changes, as long as they are also likely to be of use in production systems (i.e. they don't slow down the solver). This is why, for example, per-iteration callbacks were a non-starter (I have since opened a discussion about including a compile-time-guarded path just for SciPy, but it would increase maintainer burden / might cause debugging issues).

they allow, e.g. interrupts

We can add that as we have done for minimize: if the Python callback raises a StopIteration error, we can intercept that and return a nonzero status to HiGHS.

Reading their callback documentation, I'm wondering if it's possible to emulate per-iteration callbacks for simplex and interior point via the Simplex and IPM Interrupt callbacks until the old-style callbacks can be deprecated fully. The interrupt callbacks seem to be guaranteed to be run every iteration. I have this in mind, e.g., Simplex:

  • register a Simplex Interrupt Callback

    • put the SciPy user's callback in a wrapper that can calculate things (e.g., slack, con) and grab more information (e.g., current solution value if there is one) before invoking SciPy callback
  • pass the parent Highs object (and anything else we need) as user_callback_data (this is how we can get any solution info if it's available)

There are two assumptions above:

  • the in-progress simplex solution is easily available from top level objects such as the parent Highs obj
  • callbacks are being run in a way which allows thread-safe access to whatever is in user_callback_data (probably a good question for HiGHS devs: what are the concurrency guarantees for callbacks?)

In general HiGHS considers thread safety, so if there aren't any guarantees yet they'd be open to fixing that. However, AFAIK the in-progress simplex solution is not part of the parent Highs object; I might be wrong about that, though, and @jajhall would be best suited to answer.

Are there other features of the HiGHS callbacks that you don't think we can support in a backward compatible way?

A good rule of thumb seems to be that if you do what Gurobi does, everything else is at least pretty close to that. So might be worth taking a look at that documentation is and seeing what differences there are.

Yeah, it seems like most of the linear optimization literature / software has different paradigms but generally converges (across C++ / Julia / Java / MATLAB / other implementations) to Gurobi as a baseline.

jajhall commented Oct 31, 2023

From 10,000m, using our callback mechanism, HiGHS could add a per-iteration callback to simplex with zero overhead for those who don't want it. However, even if it were activated, the only information it would make available is execution time and the primal or dual objective function value, depending on whether primal or dual simplex is being used.

Generally it would be a call from the dual simplex algorithm applied to a presolved LP with perturbed costs, so it wouldn't even know the primal objective value without the unacceptable overhead of computing it. And, it would be at a point that is, at best, only dual feasible. So, if the incumbent solution in HiGHS were converted into the corresponding solution for the original LP, it would normally only be a dual solution, and who amongst your users would interpret that? The primal solution that is known in dual simplex - like the dual solution known in primal simplex - is infeasible until optimality.

Surely the current per-factorization callbacks in simplex come frequently enough for any sane user.

In short I'd ditch per-iteration callbacks, and explain the futility of them to any user who objects.

Generally, we are prepared to add things that are useful to major users, but nothing that would impact meaningfully on performance.

We are adding callbacks because (1) they are things that optimization solvers offer and (2) because they are required by an upcoming major user of HiGHS.

IMO, the main value-adds of linprog over highspy at the moment are the small things, not having to manually convert inequality constraints to equality constraints, the fact that CSC arrays are the native citizens of highspy, so you can draw on what's already available in scipy.sparse, and most importantly that it's all zero-overhead:

I do wonder at the value of these advantages when the underlying linprog solver is so much slower than HiGHS. I find it baffling that anything in the SciPy - HiGHS interface could ever be remotely as expensive as solving an LP or MIP. I'm also puzzled by the reference to converting inequality constraints to equalities: HiGHS handles general boxed constraints, so whatever forms of constraints SciPy uses can be communicated without modification. Yes, linprog may be "zero overhead", but I'd be amazed if the HiGHS overhead compares remotely with its superior performance.

mckib2 (Contributor) commented Oct 31, 2023

I find it baffling that anything in the SciPy - HiGHS interface could ever be remotely as expensive as solving an LP or MIP.

Python modeling interfaces can be very slow (which is why I'm not a huge fan of them, but data scientists sure seem to be). IIRC, Google's OR-Tools is a big offender, where most of the time can be spent building models using classes and abstractions before anything is converted to actual coefficients and constraint matrices and the solver is invoked. SciPy is different in the Python library ecosystem in this regard, providing light, MATLAB-linprog-like constraint abstractions. I think the comment about converting inequality constraints to equalities is misleading and should instead point to the (trivial but annoying) change of constraint formulation from the SciPy/MATLAB-style Ax == b_e, Ax <= b_l, Ax >= b_g to something like l_i <= (Ax)_i <= u_i that HiGHS expects.
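
For illustration, a minimal sketch of that reformulation, stacking the separate SciPy-style blocks into the single row-bound form that HiGHS consumes (function and variable names are ad hoc, not from either codebase):

    import numpy as np
    import scipy.sparse as sp

    def to_row_bounds(A_ub, b_ub, A_eq, b_eq):
        """Stack A_ub @ x <= b_ub and A_eq @ x == b_eq into L <= A @ x <= U."""
        A = sp.vstack([sp.csc_matrix(A_ub), sp.csc_matrix(A_eq)], format="csc")
        row_lower = np.concatenate([np.full(len(b_ub), -np.inf), b_eq])
        row_upper = np.concatenate([b_ub, b_eq])
        return A, row_lower, row_upper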

+1 for relaxing the requirement that SciPy linprog simplex callbacks run every iteration and instead running them at factorization points.

jajhall commented Oct 31, 2023

Interesting comments on speed of modelling languages, but how is this relevant to HiGHS: we just deal in finalised matrices - or are you referring to building models by calls to HiGHS to add individual variables and constraints? JuMP seems to be efficient. Is this because it's written in Julia?

the (trivial but annoying) change of constraint formulation from the SciPy/MATLAB-style Ax == b_e, Ax <= b_l, Ax >= b_g to something like l_i <= (Ax)_i <= u_i that HiGHS expects

If it's the concatenation of three matrices of constraint coefficients into one that's irritating - surely it's not meaningfully expensive - then I can add an alternative method for passing an LP that has these restrictions on the general boxed constraints and then do the concatenation at negligible cost in C++

Use of general boxed constraints is more efficient, particularly when modellers want distinct, finite lower and upper bounds. With the SciPy/MATLAB style, an extra matrix row is required.
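
To make the extra-row point concrete (an illustrative toy, not code from either project): a single boxed row such as 1 <= x0 + x1 <= 3 must become two inequality rows in the SciPy/MATLAB style.

    import numpy as np

    # One HiGHS boxed row, 1 <= x0 + x1 <= 3, becomes two A_ub rows:
    A_ub = np.array([[ 1.0,  1.0],    #  x0 + x1 <=  3
                     [-1.0, -1.0]])   # -x0 - x1 <= -1
    b_ub = np.array([3.0, -1.0])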

mckib2 (Contributor) commented Nov 7, 2023

(At the risk of getting distracted from the core issue at hand...)

Interesting comments on speed of modelling languages, but how is this relevant to HiGHS: we just deal in finalised matrices - or are you referring to building models by calls to HiGHS to add individual variables and constraints? JuMP seems to be efficient. Is this because it's written in Julia?

I don't think there's anything relevant to HiGHS here, just a comment on modeling overhead incurred by some Python libraries that either integrate HiGHS or use other solvers. JuMP could be more efficient at runtime here due to its compilation steps (able to statically transform a modeling representation into finalized matrices). I just pulled up a presentation we gave back in 2021 comparing availability and performance of LP solvers in the Python ecosystem -- it looks like just OR-Tools had the model-building problem, which added seconds of runtime to a benchmark run against various sizes of dense constraint matrices:

[Figure: 2021 benchmark comparing LP solver model-build and solve times across Python libraries]

I can't comment on whether it's still an issue in OR-Tools, and it's likely beside the point in this issue -- presented only for your curiosity :) A counterargument here would be that modeling overhead might be worth it if it's useful to the user and you're not constantly rebuilding the model.
