
optimize: Change Hessian function signature? #593

Open · btracey opened this issue Sep 20, 2018 · 16 comments
Labels: stability affects API stability · Milestone: v1

@btracey (Member) commented Sep 20, 2018

I've been thinking for a while that we might want to change the signature of the Hessian method. The current signature forces the Hessian to be a *SymDense, while for many problems the Hessian is sparse, block structured, or otherwise specially structured. Directly, this limits the size of problems that can be solved due to memory constraints. Indirectly, it also significantly limits the optimizations available. For instance, right now Newton calls Solve, which means the matrix always goes through a Dense routine. There could be simplifications available: Newton really only requires a SolveGradToer, something like

type SolveGradToer interface {
    Symmetric() int
    SolveGradTo(v []float64) error
}

which is easily implemented by SymDense, but also enables taking advantage of sparse structure (or approximate solves).
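
As a minimal illustration (hypothetical, not gonum API), a diagonal Hessian can satisfy that interface with O(n) storage and an O(n) solve, never forming a dense matrix:

import "errors"

// diagHessian is a hypothetical diagonal Hessian satisfying the proposed
// SolveGradToer interface.
type diagHessian struct {
	d []float64 // diagonal entries of the Hessian
}

func (h diagHessian) Symmetric() int { return len(h.d) }

// SolveGradTo overwrites v with H⁻¹v, solving the Newton system in place.
func (h diagHessian) SolveGradTo(v []float64) error {
	if len(v) != len(h.d) {
		return errors.New("dimension mismatch")
	}
	for i, di := range h.d {
		if di == 0 {
			return errors.New("singular Hessian")
		}
		v[i] /= di
	}
	return nil
}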

cc @vladimir-ch

btracey added the stability affects API stability label Sep 20, 2018
@btracey (Member, Author) commented Dec 2, 2018

I think we should change the call to be Hessian: func(x []float64) Symmetric. We can then do a type switch in Newton, for instance, to see if the returned matrix satisfies SolveVecTo. This would also solve #223.
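
A minimal sketch of what that switch could look like inside Newton; the SolveVecTo case is an assumed interface (mirroring mat.Cholesky.SolveVecTo), not an existing gonum type:

import (
	"errors"

	"gonum.org/v1/gonum/mat"
)

// newtonStep solves H*step = grad, specializing when the Hessian carries
// its own solver. A hypothetical helper, not optimize code.
func newtonStep(hess mat.Symmetric, grad mat.Vector, step *mat.VecDense) error {
	switch h := hess.(type) {
	case interface {
		SolveVecTo(dst *mat.VecDense, b mat.Vector) error
	}:
		// Sparse or structured Hessians solve without densifying.
		return h.SolveVecTo(step, grad)
	default:
		// Dense fallback: the current Cholesky-based path.
		var chol mat.Cholesky
		if ok := chol.Factorize(hess); !ok {
			return errors.New("newton: Hessian not positive definite")
		}
		return chol.SolveVecTo(step, grad)
	}
}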

@vladimir-ch (Member)

SGTM. If we make this change, then I would also change the Grad function from

Grad func(grad []float64, x []float64)

to

Grad func(x []float64) []float64

btracey added a commit that referenced this issue Jan 3, 2019

optimize: Change initialization, remove Needser, and update Problem function calls

We need a better way to express the Hessian function call so that sparse Hessians can be provided. This change updates the Problem function definitions to allow an arbitrary Symmetric matrix. With this change, we need to change how Location is used so that we do not allocate a SymDense. Once Location is changed, we no longer need Needser to allocate the appropriate memory, and can shift that to initialization, further simplifying the interfaces.

A 'fake' Problem is passed to Method to continue to make it impossible for the Method to call the functions directly.

Fixes #727, #593.
btracey added a commit that referenced this issue Feb 1, 2019

optimize: Change initialization, remove Needser, and update Problem function calls (#779)
@kortschak (Member)

Is this closed by #955?

@btracey (Member, Author) commented Apr 19, 2019

Yes. The dream is dead.

@kortschak (Member)

Are there other approaches you could use to get what you want?

@btracey (Member, Author) commented Apr 19, 2019

Not within the current design of optimize, as far as I can see. The problem is that anything that uses the Hessian but wants to stay sparse needs to know more than just that it's a Symmetric: it needs more specialized behaviors, like MulVecTo or SolveWithRegularize. As far as I can see, this is incompatible with optimize, which by design has only a couple of ways of specifying functions. The upside of this design is that the methods are guaranteed to be consistent in how they treat Settings, since they all use identical code to do so. It also gives a common place to do the evaluations, which is otherwise tricky to do concurrently.

There are a couple of ways to fix this. One is to add a lot of fields to Problem containing all the methods one might want; this, of course, does not scale. The second is to ditch the idea of Minimize altogether, and instead say that every Method has its own entry function where it can ask for a type that satisfies a certain interface (Functioner, FunGrader, FunGradHesser, etc.), as sketched below. I feel like the current design is quite usable, and I'm not sure how we would keep Settings common and useful under that scheme, but maybe there's a clean design I'm missing.
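
A hedged sketch of that second option; the FunGradHesser name and the method-specific Minimize are hypothetical:

import (
	"errors"

	"gonum.org/v1/gonum/mat"
)

// FunGradHesser states at compile time exactly what a Newton-like method
// needs, instead of runtime checks inside a shared Minimize.
type FunGradHesser interface {
	Func(x []float64) float64
	Grad(grad, x []float64)
	Hess(dst *mat.SymDense, x []float64)
}

// Minimize is the hypothetical per-method entry function. A problem missing
// one of the routines is a compile error rather than a runtime panic.
func Minimize(p FunGradHesser, x0 []float64) ([]float64, error) {
	// ... method-specific loop and its own Settings analogue ...
	return nil, errors.New("not implemented")
}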

A third solution, and my current thinking on the matter, is to generalize the "back-end" of optimize, and then provide different front ends (of which optimize is one). This is #925. The general idea is that we (or someone) could have a different package like sparsenewton. sparsenewton could have its own Problem, which would be something like:

type Problem struct {
    Func               func(x []float64) float64
    Grad               func(g, x []float64)
    SolveRegularizedTo func(dst, x []float64, gamma float64)
}

its own Settings, but it could then call into the same backend to handle concurrency. Another potential front end would be something with constrained optimization, where the Jacobian of the constraints could be sparse. Block-structured constraint Jacobians are very common, but I don't know how that could work in optimize, for the same reason that Hessian is difficult: you need to know what operations you're going to perform on that sparse matrix.

If there is a different way of doing this that I'm missing, let me know. It would be great to just have one entry point for everything, but I don't see the design.

@kortschak (Member)

Yes, I think the solution is to step back a little bit from the current abstraction if you can (I don't know enough about this to know if this is possible).

@btracey (Member, Author) commented Apr 19, 2019

What do you mean by "step back"?

@kortschak (Member)

Try to find a more abstracted description of the work to be done. At the moment you have the Func, Grad, and Hess functions, but the Hess function ties you to a specific implementation. What I understood from what you wrote above is that you might have an approach to abstracting that away to a greater degree.

@btracey (Member, Author) commented Apr 21, 2019

The approach above (the third solution) was to leave optimize the way it is (after #955), and then provide different front ends. I don't see a way to abstract optimize directly.

@kortschak (Member)

OK

@btracey (Member, Author) commented Apr 21, 2019

(but again, if there is an alternate design I'm missing, let me know)

@kortschak (Member)

Now that the Hess method has the signature Hess(dst *mat.SymDense, x []float64) and doesn't return anything, it could easily be made Hess(dst mat.MutableSymmetric, x []float64), and the type could also have a method that returns the correct size and type of mat.MutableSymmetric, to be called in evaluate when needed (where we will call mat.NewSymDense(len(x), nil) after #955 is merged). Then loc.Hessian would be a mat.MutableSymmetric. The Hess method could immediately type assert to the concrete type, since it will know what type it should be.

@btracey (Member, Author) commented Apr 21, 2019

  1. How does Minimize know the type to pass into Hess at the start of the optimization? It cannot pass in nil, and does not start with any examples. Do we add a HessianFactory to Settings, and use *SymDense if that's not specified? A similar problem arises if the Method needs to retain information about the Hessian at a location. Right now the method can set the field to nil and Minimize will allocate a new one.

  2. It's not clear to me how we make use of this functionality. Does the Method use interface assertions and panic on a mismatch? That can work, but it's not clear to me that it's better than having a different front end that is explicit (and compile-time safe) about what it uses and needs.

  3. This gets trickier with constraints. There are a few ways to specify them, and I don't know that it's possible to handle them all with one abstraction. I posted thoughts on #925 (optimize, all: Re-use of the optimize implementation) that I think are mostly clean, but it's a Dense-based approach. I don't see how to generalize it to allow for larger optimization problems. The ability to evaluate constraints independently seems incompatible with having a single sparse Jacobian for the constraints.

@kortschak (Member)

Do we add a HessianFactory to Settings, and use *SymDense if that's not specified?

I was thinking that it would go in Problem since it would be a partner with Hess.

// evaluate evaluates the routines specified by the Operation at loc.X, and stores
// the answer into loc. loc.X is copied into x before evaluating in order to
// prevent the routines from modifying it.
func evaluate(p *Problem, loc *Location, op Operation, x []float64) {
	if !op.isEvaluation() {
		panic(fmt.Sprintf("optimize: invalid evaluation %v", op))
	}
	copy(x, loc.X)
	if op&FuncEvaluation != 0 {
		loc.F = p.Func(x)
	}
	if op&GradEvaluation != 0 {
		if loc.Gradient == nil {
			loc.Gradient = make([]float64, len(x))
		}
		p.Grad(loc.Gradient, x)
	}
	if op&HessEvaluation != 0 {
		if loc.Hessian == nil {
			loc.Hessian = p.NewHessian(len(x))
		}
		p.Hess(loc.Hessian, x)
	}
}

Since p.NewHessian and p.Hess are both provided by the same implementation, they will expect the same type, and the type assertion can always be expected to succeed.
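
A sketch of the user side of that contract, with a hypothetical diagonal type; because both functions commit to the same concrete type, the assertion in Hess can never fail (NewHessian is the proposed field, not current API):

import "gonum.org/v1/gonum/mat"

// diagSym is a hypothetical mat.MutableSymmetric with diagonal-only storage.
type diagSym struct{ d []float64 }

func (s *diagSym) Dims() (r, c int) { return len(s.d), len(s.d) }
func (s *diagSym) At(i, j int) float64 {
	if i == j {
		return s.d[i]
	}
	return 0
}
func (s *diagSym) T() mat.Matrix  { return s } // symmetric: its own transpose
func (s *diagSym) Symmetric() int { return len(s.d) }
func (s *diagSym) SetSym(i, j int, v float64) {
	if i != j {
		panic("diagSym: off-diagonal set")
	}
	s.d[i] = v
}

// The paired constructor and filler both commit to *diagSym.
func newHessian(dim int) mat.MutableSymmetric {
	return &diagSym{d: make([]float64, dim)}
}

func hess(dst mat.MutableSymmetric, x []float64) {
	h := dst.(*diagSym) // safe: evaluate passes only what newHessian returned
	for i := range x {
		h.d[i] = 2 // e.g. the Hessian of ∑xᵢ² is 2I
	}
}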

I really don't know enough about how optimize works to make necessarily sensible suggestions, but this seemed plausible; I don't know whether it would be better or worse than an alternative.

The last point, 3, is outside the scope of this. If you think it's worth expanding things to handle that, then maybe it's not worth trying to shoehorn this in.

@vladimir-ch (Member)

It's difficult to comment on all of the recent discussion, so I'll just cherry-pick a couple of points.

sparsenewton could have its own Problem, which would be something like:

type Problem struct {
    ...
    SolveRegularizedTo func(dst, x []float64, gamma float64)
}

I don't think this would work. First, it pushes work onto the user that they shouldn't be doing, because it's an algorithmic detail of the modified Newton method. Second, there would have to be feedback to the Method to indicate whether the value of gamma was large enough (and the solve succeeded) or not.

... generalize the "back-end" of optimize, and then provide different front ends (of which optimize is one). This is #925.

Maybe it's because I don't see far enough ahead, but I'd like to first have a couple of concrete implementations that use different storage types for the Hessian (or the Jacobian of the constraints) before generalizing into a backend and frontend for optimize. How would this separation do away with the tension between the types that Methods use and the types that users fill in?

Do we add a HessianFactory to Settings, and use *SymDense if that's not specified?

I don't think that Hessian creation should go into Settings. The way I see it, Methods dictate what type the Hessian should be. The current Newton dictates that it must be a *SymDense because it modifies its diagonal and calls Cholesky on the result. If we had NewtonCG, it would probably dictate that it must be just a MulVecToer (or it could simply pick one concrete sparse storage type). So the restriction on the type comes from the Method, and the restriction can take various forms (a concrete type, possibly an interface). Then there is Problem, where the user provides problem-specific functions. They should have a way to work with concrete types so that their (repeated) assembly is efficient.

I haven't thought about the details, but at least conceptually it follows that Problem should be method specific and that the Problem.Hess function should work with the same type that the Method works with. The user would have to adapt to it, but that's fine: they have an objective function, they choose the Method based on how well it exploits any special structure, and then they work with a type that can represent that structure.
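
As a concrete sketch of that direction: a hypothetical NewtonCG front end would dictate only a Hessian-vector product, and its Problem would be typed accordingly (the MulVecToer signature below is assumed, not gonum API):

import "gonum.org/v1/gonum/mat"

// MulVecToer is the only Hessian capability the inner conjugate-gradient
// loop of a hypothetical NewtonCG needs: applying H to a vector.
type MulVecToer interface {
	mat.Symmetric
	MulVecTo(dst *mat.VecDense, x mat.Vector) // dst = H * x (assumed signature)
}

// Problem for the hypothetical newtoncg package. Hess returns whatever
// sparse or matrix-free representation the user prefers, as long as it
// can do a Hessian-vector product.
type Problem struct {
	Func func(x []float64) float64
	Grad func(grad, x []float64)
	Hess func(x []float64) MulVecToer
}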

kortschak added this to the v1 milestone Oct 22, 2019