
optimize: Change Hessian function signature? #593

Open · btracey opened this issue Sep 20, 2018 · 16 comments
Labels: stability affects API stability · Milestone: v1

@btracey (Member) commented Sep 20, 2018

I've been thinking for a while that we might want to change the signature of the Hessian method. The current signature forces the Hessian to be a *SymDense, while for many problems the Hessian is sparse, block structured, or otherwise specially structured. Directly, this limits the size of problems that can be solved due to memory constraints. Indirectly, it also significantly limits the optimizations available. For instance, right now Newton calls Solve, which means the matrix always goes through a Dense routine. There could be simplifications available: Newton really only requires a SolveGradToer, something like

type SolveGradToer interface {
    Symmetric() int
    SolveGradTo(v []float64) error
}

which is easily implemented by SymDense, but also enables taking advantage of sparse structure (or approximate solves).
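
As a minimal illustration (hypothetical, not gonum API), a diagonal Hessian can satisfy that interface with O(n) storage and an O(n) solve, never forming a dense matrix:

import "errors"

// diagHessian is a hypothetical diagonal Hessian satisfying the proposed
// SolveGradToer interface.
type diagHessian struct {
	d []float64 // diagonal entries of the Hessian
}

func (h diagHessian) Symmetric() int { return len(h.d) }

// SolveGradTo overwrites v with H⁻¹v, solving the Newton system in place.
func (h diagHessian) SolveGradTo(v []float64) error {
	if len(v) != len(h.d) {
		return errors.New("dimension mismatch")
	}
	for i, di := range h.d {
		if di == 0 {
			return errors.New("singular Hessian")
		}
		v[i] /= di
	}
	return nil
}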

cc @vladimir-ch

btracey added the stability affects API stability label Sep 20, 2018
@btracey (Member, Author) commented Dec 2, 2018

I think we should change the call to be Hessian: func(x []float64) Symmetric. We can then do a type switch in Newton, for instance, to see if the returned matrix satisfies SolveVecTo. This would also solve #223.
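
A minimal sketch of what that switch could look like inside Newton; the SolveVecTo case is an assumed interface (mirroring mat.Cholesky.SolveVecTo), not an existing gonum type:

import (
	"errors"

	"gonum.org/v1/gonum/mat"
)

// newtonStep solves H*step = grad, specializing when the Hessian carries
// its own solver. A hypothetical helper, not optimize code.
func newtonStep(hess mat.Symmetric, grad mat.Vector, step *mat.VecDense) error {
	switch h := hess.(type) {
	case interface {
		SolveVecTo(dst *mat.VecDense, b mat.Vector) error
	}:
		// Sparse or structured Hessians solve without densifying.
		return h.SolveVecTo(step, grad)
	default:
		// Dense fallback: the current Cholesky-based path.
		var chol mat.Cholesky
		if ok := chol.Factorize(hess); !ok {
			return errors.New("newton: Hessian not positive definite")
		}
		return chol.SolveVecTo(step, grad)
	}
}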

@vladimir-ch (Member)

SGTM. If we make this change, then I would also change the Grad function from

Grad func(grad []float64, x []float64)

to

Grad func(x []float64) []float64

btracey added a commit that referenced this issue Jan 3, 2019

optimize: Change initialization, remove Needser, and update Problem function calls

We need a better way to express the Hessian function call so that sparse Hessians can be provided. This change updates the Problem function definitions to allow an arbitrary Symmetric matrix. With this change, we need to change how Location is used so that we do not allocate a SymDense. Once Location is changed, we no longer need Needser to allocate the appropriate memory, and can shift that to initialization, further simplifying the interfaces.

A 'fake' Problem is passed to Method to continue to make it impossible for the Method to call the functions directly.

Fixes #727, #593.
btracey added a commit that referenced this issue Feb 1, 2019

optimize: Change initialization, remove Needser, and update Problem function calls (#779)
@kortschak (Member)

Is this closed by #955?

@btracey (Member, Author) commented Apr 19, 2019

Yes. The dream is dead.

@kortschak (Member)

Are there other approaches you could use to get what you want?

@btracey (Member, Author) commented Apr 19, 2019

Not within the current design of optimize, as far as I can see. The problem is that anything that uses the Hessian but wants to stay sparse needs to know more than just that it's a Symmetric: it needs more specialized behaviors, like MulVecTo or SolveWithRegularize. As far as I can see, this is incompatible with optimize, which by design has only a couple of ways of specifying functions. The upside of this design is that the methods are guaranteed to be consistent in how they treat Settings, since they all use identical code to do so. It also gives a common place to do the evaluations, which is otherwise tricky to do concurrently.

There are a couple of ways to fix this. One is to add a lot of fields to Problem containing all the methods one might want; this, of course, does not scale. The second is to ditch the idea of Minimize altogether, and instead say that every Method has its own entry function where it can ask for a type that satisfies a certain interface (Functioner, FunGrader, FunGradHesser, etc.), as sketched below. I feel like the current design is quite usable, and I'm not sure how we would keep Settings common and useful under that scheme, but maybe there's a clean design I'm missing.
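
A hedged sketch of that second option; the FunGradHesser name and the method-specific Minimize are hypothetical:

import (
	"errors"

	"gonum.org/v1/gonum/mat"
)

// FunGradHesser states at compile time exactly what a Newton-like method
// needs, instead of runtime checks inside a shared Minimize.
type FunGradHesser interface {
	Func(x []float64) float64
	Grad(grad, x []float64)
	Hess(dst *mat.SymDense, x []float64)
}

// Minimize is the hypothetical per-method entry function. A problem missing
// one of the routines is a compile error rather than a runtime panic.
func Minimize(p FunGradHesser, x0 []float64) ([]float64, error) {
	// ... method-specific loop and its own Settings analogue ...
	return nil, errors.New("not implemented")
}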

A third solution, and my current thinking on the matter, is to generalize the "back-end" of optimize, and then provide different front ends (of which optimize is one). This is #925. The general idea is that we (or someone) could have a different package like sparsenewton. sparsenewton could have its own Problem, which would be something like:

type Problem struct {
    Func               func(x []float64) float64
    Grad               func(g, x []float64)
    SolveRegularizedTo func(dst, x []float64, gamma float64)
}

its own Settings, but it could then call into the same backend to handle concurrency. Another potential front end would be something with constrained optimization, where the Jacobian of the constraints could be sparse. Block-structured constraint Jacobians are very common, but I don't know how that could work in optimize, for the same reason that Hessian is difficult: you need to know what operations you're going to perform on that sparse matrix.

If there is a different way of doing this that I'm missing, let me know. It would be great to just have one entry point for everything, but I don't see the design.

@kortschak (Member)

Yes, I think the solution is to step back a little bit from the current abstraction if you can (I don't know enough about this to know if this is possible).

@btracey (Member, Author) commented Apr 19, 2019

What do you mean by "step back"?

@kortschak (Member)

Try to find a more abstracted description of the work to be done. At the moment you have the Func, Grad, and Hess functions, but the Hess function ties you to a specific implementation. What I understood from what you wrote above is that you might have an approach to abstracting that away to a greater degree.

@btracey (Member, Author) commented Apr 21, 2019

The approach above (the third solution) was to leave optimize the way it is (after #955), and then provide different front ends. I don't see a way to abstract optimize directly.

@kortschak (Member)

OK

@btracey (Member, Author) commented Apr 21, 2019

(but again, if there is an alternate design I'm missing, let me know)

@kortschak (Member)

Now that the Hess method has the signature Hess(dst *mat.SymDense, x []float64) and doesn't return anything, it could easily be made Hess(dst mat.MutableSymmetric, x []float64), and the type could also have a method that returns the correct size and type of mat.MutableSymmetric, to be called in evaluate when needed (where we will call mat.NewSymDense(len(x), nil) after #955 is merged). Then loc.Hessian would be a mat.MutableSymmetric. The Hess method could immediately type assert to the concrete type, since it will know what type it should be.

@btracey (Member, Author) commented Apr 21, 2019

  1. How does Minimize know the type to pass into Hess at the start of the optimization? It cannot pass in nil, and does not start with any examples. Do we add a HessianFactory to Settings, and use *SymDense if that's not specified? A similar problem arises if the Method needs to retain information about the Hessian at a location. Right now the method can set the field to nil and Minimize will allocate a new one.

  2. It's not clear to me how we make use of this functionality. Does the Method use interface assertions and panic on a mismatch? That can work, but it's not clear to me that it's better than having a different front end that is explicit (and compile-time safe) about what it uses and needs.

  3. This gets trickier with constraints. There are a few ways to specify them, and I don't know that it's possible to handle them all with one abstraction. I posted thoughts on #925 (optimize, all: Re-use of the optimize implementation) that I think are mostly clean, but it's a Dense-based approach. I don't see how to generalize it to allow for larger optimization problems. The ability to evaluate constraints independently seems incompatible with having a single sparse Jacobian for the constraints.

@kortschak (Member)

Do we add a HessianFactory to Settings, and use *SymDense if that's not specified?

I was thinking that it would go in Problem since it would be a partner with Hess.

// evaluate evaluates the routines specified by the Operation at loc.X, and stores
// the answer into loc. loc.X is copied into x before evaluating in order to
// prevent the routines from modifying it.
func evaluate(p *Problem, loc *Location, op Operation, x []float64) {
	if !op.isEvaluation() {
		panic(fmt.Sprintf("optimize: invalid evaluation %v", op))
	}
	copy(x, loc.X)
	if op&FuncEvaluation != 0 {
		loc.F = p.Func(x)
	}
	if op&GradEvaluation != 0 {
		if loc.Gradient == nil {
			loc.Gradient = make([]float64, len(x))
		}
		p.Grad(loc.Gradient, x)
	}
	if op&HessEvaluation != 0 {
		if loc.Hessian == nil {
			loc.Hessian = p.NewHessian(len(x))
		}
		p.Hess(loc.Hessian, x)
	}
}

Since p.NewHessian and p.Hess are both provided by the same implementation, they will expect the same type, and the type assertion can always be expected to succeed.
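
A sketch of the user side of that contract, with a hypothetical diagonal type; because both functions commit to the same concrete type, the assertion in Hess can never fail (NewHessian is the proposed field, not current API):

import "gonum.org/v1/gonum/mat"

// diagSym is a hypothetical mat.MutableSymmetric with diagonal-only storage.
type diagSym struct{ d []float64 }

func (s *diagSym) Dims() (r, c int) { return len(s.d), len(s.d) }
func (s *diagSym) At(i, j int) float64 {
	if i == j {
		return s.d[i]
	}
	return 0
}
func (s *diagSym) T() mat.Matrix  { return s } // symmetric: its own transpose
func (s *diagSym) Symmetric() int { return len(s.d) }
func (s *diagSym) SetSym(i, j int, v float64) {
	if i != j {
		panic("diagSym: off-diagonal set")
	}
	s.d[i] = v
}

// The paired constructor and filler both commit to *diagSym.
func newHessian(dim int) mat.MutableSymmetric {
	return &diagSym{d: make([]float64, dim)}
}

func hess(dst mat.MutableSymmetric, x []float64) {
	h := dst.(*diagSym) // safe: evaluate passes only what newHessian returned
	for i := range x {
		h.d[i] = 2 // e.g. the Hessian of ∑xᵢ² is 2I
	}
}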

I really don't know enough about how optimize works to make necessarily sensible suggestions, but this seemed plausible; I don't know whether it would be better or worse than an alternative.

The last point, 3, is outside the scope of this. If you think it's worth expanding things to handle that, then maybe it's not worth trying to shoehorn this in.

@vladimir-ch (Member)

It's difficult to comment on all of the recent discussion, so I'll just cherry-pick a couple of points.

sparsenewton could have its own Problem, which would be something like:

type Problem struct {
    ...
    SolveRegularizedTo func(dst, x []float64, gamma float64)
}

I don't think this would work. First, it pushes work onto the user that they shouldn't be doing, because it's an algorithmic detail of the modified Newton method. Second, there would have to be feedback to the Method to indicate whether the value of gamma was large enough (and the solve succeeded) or not.

... generalize the "back-end" of optimize, and then provide different front ends (of which optimize is one). This is #925.

Maybe it's because I don't see far enough ahead, but I'd like to first have a couple of concrete implementations that use different storage types for the Hessian (or the Jacobian of the constraints) before generalizing into a backend and frontend for optimize. How would this separation do away with the tension between the types that Methods use and the types that users fill in?

Do we add a HessianFactory to Settings, and use *SymDense if that's not specified?

I don't think that Hessian creation should go into Settings. The way I see it, Methods dictate what type the Hessian should be. The current Newton dictates that it must be a *SymDense because it modifies its diagonal and calls Cholesky on the result. If we had NewtonCG, it would probably dictate that it must be just a MulVecToer (or it could simply pick one concrete sparse storage type). So the restriction on the type comes from the Method, and the restriction can take various forms (a concrete type, possibly an interface). Then there is Problem, where the user provides problem-specific functions. They should have a way to work with concrete types so that their (repeated) assembly is efficient.

I haven't thought about the details, but at least conceptually it follows that Problem should be method specific and that the Problem.Hess function should work with the same type that the Method works with. The user would have to adapt to it, but that's fine: they have an objective function, they choose the Method based on how well it exploits any special structure, and then they work with a type that can represent that structure.
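
As a concrete sketch of that direction: a hypothetical NewtonCG front end would dictate only a Hessian-vector product, and its Problem would be typed accordingly (the MulVecToer signature below is assumed, not gonum API):

import "gonum.org/v1/gonum/mat"

// MulVecToer is the only Hessian capability the inner conjugate-gradient
// loop of a hypothetical NewtonCG needs: applying H to a vector.
type MulVecToer interface {
	mat.Symmetric
	MulVecTo(dst *mat.VecDense, x mat.Vector) // dst = H * x (assumed signature)
}

// Problem for the hypothetical newtoncg package. Hess returns whatever
// sparse or matrix-free representation the user prefers, as long as it
// can do a Hessian-vector product.
type Problem struct {
	Func func(x []float64) float64
	Grad func(grad, x []float64)
	Hess func(x []float64) MulVecToer
}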

kortschak added this to the v1 milestone Oct 22, 2019