
[WIP] use backend everywhere i/o only for autodiff #245

Draft · wants to merge 8 commits into master
Conversation

@sweichwald (Member) commented Jun 29, 2023

Hi!

Status quo: one needs to decorate the cost function to indicate which autodiff backend pymanopt should use to obtain the gradient/Hessian of the cost function. Pymanopt solvers/manifolds (retractions, gradient-norm stopping criteria, ...), however, are implemented in numpy and rely on casting from/to numpy arrays.

Idea: use the backend throughout to avoid moving back and forth between the backend and numpy. This requires adapting the manifolds to rely on the backend for computing SVDs etc., and thinly adding the corresponding functionality to the backends.

Signed-off-by: Sebastian Weichwald <git@sweichwald.de>
@coveralls

Coverage Status

coverage: 87.558%. remained the same when pulling f33f135 on backend-everywhere into acb52b2 on master.

@sweichwald (Member, Author)
Toying around with examples/backend-everywhere.py -b pytorch is revealing:
Currently, we decorate the cost function to indicate the backend, which is then used to obtain gradient/hessian if autodiff capability is provided by the backend (in our case, that's all backends but numpy). The problem/manifold/solver instances are not in the know about the backend, agnostic to it, and just presume they can feed numpy arrays to cost/gradient/hessian and retrieve numpy arrays back. If that will be no longer the case, manifold.euclidean_to_riemannian_gradient etc will trip as they presume numpy arrays and will fail e.g. when calling numpy multi tools on pytorch tensors. The question is, where and how we would want to set the backend so that, for example, in manifolds/group.py:48 we can choose to call the appropriate multi tools?
We could check the dtype in each of the multi-tools functions and call the appropriate subroutines, but such checks may creep into other places and lead to separate backend-switch implementations scattered around the codebase (the current cost-function decorators, the multi tools, ...). This would add to the maintenance burden.
Alternatively, we could have a global switch for all of pymanopt, so that users would essentially load pymanopt in one of the backend flavours (and cost functions would no longer need to be annotated explicitly). This may hinder modular use of pymanopt components in larger projects.
I am leaning towards a) decorating the cost function (while dropping the need to decorate the gradient and Hessian, instead enforcing that they be aligned with the cost function), and b) adding a backend (defaulting to numpy) to the manifold constructor. This way, the problem instance can use the appropriate backend for autodiff, and perhaps even set the appropriate backend on the problem's manifold, while we still support isolated use of only the manifolds, say, to do some Riemannian geometry on the Stiefel manifold using pytorch tensors.
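For concreteness, option b) could look roughly like the following sketch. The class name, signature, and method bodies here are hypothetical illustrations, not pymanopt's actual API:

```python
import numpy as np


class SphereSketch:
    """Hypothetical manifold whose constructor takes a backend argument."""

    def __init__(self, n, backend="numpy"):
        self._n = n
        self._backend = backend

    def projection(self, point, vector):
        # Project `vector` onto the tangent space of the sphere at `point`.
        if self._backend == "numpy":
            return vector - np.inner(point, vector) * point
        # A pytorch branch would perform the same computation on tensors.
        raise NotImplementedError(f"backend {self._backend!r} not wired up")
```

The same manifold instance could then be used standalone with pytorch tensors by constructing it with `backend="pytorch"`, independently of any Problem.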

Thoughts?

(I propose to focus on and test with the numpy/jax/pytorch backends first, and adapt the others in the end.)

PS. Currently, one could in principle pass a jax cost, numpy gradient, and pytorch Hessian to the problem constructor; I don't think this flexibility is needed and propose to restrict explicitly to one backend per problem instance.

@sweichwald sweichwald marked this pull request as draft June 30, 2023 10:43
@nkoep (Member) commented Jul 1, 2023

Hey everyone! Great to see some influx of activity on this project again :)

Unfortunately I couldn't make it to the meeting you had, or I would have voiced my opinion and concerns about resolving the backend issue there already. To be blunt, I strongly advise against adding any information about the computational backend to manifold constructors or the Problem class. We had a similar setup initially, when we were still focused on graph libraries like TF (at the time) and theano, and I believe this led to a lot of maintenance issues. As far as the autodiff backend is concerned, we have since worked hard to isolate this information in the function decorators, but that only takes care of derivatives, which mainly involves packing and unpacking gradients and HVPs w.r.t. the point representation of the manifolds involved.

Adding information about the computational backend to manifolds would be a prime example of a leaky abstraction. These classes shouldn't have a notion of the particular execution context at all. (Of course you could argue that by implementing manifold numerics in numpy we already do that, but we had to use something.) In my view, that type of abstraction should happen at a higher level, which means an approach similar to geomstats (see https://github.com/geomstats/geomstats/blob/master/geomstats/_backend/README.md), as I outlined in an old email. Instead of writing such an abstraction from scratch, though, it might be better to first look into something like https://github.com/jonasrauber/eagerpy and see how much mileage we'd get out of that, potentially by forking the project if necessary and feasible. Implementing the internal numerics on top of eagerpy would also resolve the need to request the backend explicitly, which addresses one of the issues you raised, @sweichwald: that explicit backend selection makes it hard to switch the backend at runtime. Given the shared pain points, this could also be a good opportunity to pool resources and collaborate further with geomstats.

@sweichwald (Member, Author)

Hi all 😎 and thanks, @nkoep!

These classes shouldn't have a notion of the particular execution context at all. (Of course you could argue that by implementing manifold numerics in numpy we already do that, but we had to use something.)

Yes, I agree that Manifold and Problem should not care about the computational backend, and in just choosing numpy we are already somewhat in conflict with this principle. Perhaps a relatively direct way to address this is dispatching the manifold numerics based on dtype, instead of "just choosing numpy" (and instead of passing backend information to the constructors; agreed, that's not the way to go).

To me, the way Python Optimal Transport handles this strikes a good balance between (relatively lean) maintenance overhead and easy entry for new contributors (a focus of our last meeting); see their docs on the multi-lib backend, which at its core is essentially a dispatch on dtype. Being able to change the backend at runtime is helpful for quick prototyping in interactive live sessions, which, for me, rules out implementations that force users to choose the backend globally.

I am hesitant to add a dependency on eagerpy for this; I prefer to keep dependencies to a minimum, avoid having to maintain a fork or wait for contributions to make it into eagerpy, and reduce the risk of API changes or discontinuation of external tools breaking pymanopt's core functions. It is likely that most of our multi tools are not generic enough to make it into eagerpy, so we would have to maintain backend-specific implementations of some functions anyway.

To sum up: What are everyone's thoughts about following a dispatch-on-dtype scheme similar to what is implemented in POT?
If we implement a fallback that casts to/from numpy whenever a function is not yet implemented for the current computational context, we could even facilitate a non-breaking, gradual transition.
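Such a POT-style dispatch with a numpy fallback might look like the following sketch; the function name `sym` and the registration layout are illustrative only:

```python
from functools import singledispatch

import numpy as np


@singledispatch
def sym(matrix):
    # Fallback for types without a registered implementation: cast to
    # numpy, compute there, and return a numpy array.
    as_np = np.asarray(matrix)
    return 0.5 * (as_np + as_np.T)


@sym.register
def _(matrix: np.ndarray):
    # Native numpy path; a pytorch implementation would be registered
    # analogously and operate on torch tensors directly, avoiding the
    # round-trip through numpy.
    return 0.5 * (matrix + matrix.T)
```

Backends would then be migrated one function at a time: until a native implementation is registered, the numpy fallback keeps everything working.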

@nkoep (Member) commented Jul 2, 2023

Yeah, that seems like a feasible approach, though I think calling some sort of .get_backend() function every time it's needed would make the code very verbose. I do like the idea of a generic "numerics" module, though. We can probably leverage the stdlib's singledispatch decorator for that. I whipped up a small PR that demonstrates the idea: #246. As you pointed out, Seb, this would not require users to specify the backend at all, as it is fully determined by the first argument, which is almost always an array or tensor type.

What's also nice is that we won't even have to bother checking for the availability of a backend and raising an informative error later. If, e.g., tensorflow isn't installed, we won't register a backend for it, but a user also won't be able to define a problem that would require us to dispatch on tf.Tensor, so we completely eliminate that error path. Lastly, a user's code can still be written entirely in the framework of their choice without having to look up the API of another abstraction layer, as we only need to use the numerics layer internally.
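A minimal sketch of that guarded-registration pattern (function names are illustrative; see #246 for the actual proposal): each backend registers its implementation only if its library imports successfully, so unavailable backends never even enter the dispatch table.

```python
from functools import singledispatch

import numpy as np


@singledispatch
def transpose(array):
    raise TypeError(f"no numerics backend registered for {type(array)}")


@transpose.register
def _(array: np.ndarray):
    return array.T


try:
    import torch

    @transpose.register
    def _(array: torch.Tensor):
        return array.t()
except ImportError:
    # torch is not installed: nothing is registered, but then the user
    # cannot create torch tensors either, so this error path vanishes.
    pass
```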

@antoinecollas (Contributor)

I agree with @sweichwald. We must handle this "backend everywhere" in a way that keeps maintenance low and entry easy for newcomers. What @nkoep has proposed in #246 seems to be the way to go!

@kellertuer

I think what @nkoep proposes in his PR is basically what I had in mind as well during our discussion on Thursday. Well done!

@NicolasBoumal (Contributor) commented Jul 4, 2023

Adding information about the computational backend to manifolds would be a prime example of a leaky abstraction. These classes shouldn't have a notion of the particular execution context at all.

I view this somewhat differently. I'll explain below, and please jump in (also if I'm missing something about Python idioms).

We have

  • Problem descriptions, which combine a manifold and a cost function, together with some derivatives.
  • Solvers, which receive a problem description and operate with it.

Solvers should never ever care what backend is in use. They use the manifold structure to create and manipulate points and tangent vectors, and via the cost they get to call $f(x)$, $\nabla f(x)$, $\nabla^2 f(x)[v]$, ... They don't need to know whether those things are arrays, tensors, structures, objects, etc. They may need to do some math of their own, but that would be separate (for that, they can use numpy or whatever, without side effects).
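As a toy illustration of that separation (all names here are made up, not pymanopt's API): a solver step written purely against the problem/manifold interface never touches backend types directly.

```python
def gradient_descent_step(manifold, problem, point, step_size):
    # The gradient is whatever backend-typed object the problem returns;
    # the solver only hands it back to the manifold's retraction.
    gradient = problem.riemannian_gradient(point)
    return manifold.retraction(point, -step_size * gradient)
```

The same step works unchanged whether `point` is a numpy array, a torch tensor, or something else, as long as manifold and problem agree on the representation.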

Problem descriptions are different. The user provides a cost function. That function, by definition, computes with points on the manifold. Hence, the person who writes the cost function must know how those points are represented. In principle, it is the manifold which decides that (after all, it is also the manifold implementation which decides whether we represent, e.g., rotations as 3x3 orthogonal matrices or as quaternions). Likewise, if the user implements the Riemannian gradient by hand, they must know how tangent vectors are supposed to be represented.

Now, often, points and tangent vectors are represented with standard things such as arrays, and all reasonable backends offer support for arrays. So it makes sense to implement manifolds in a way that they can operate with arrays from various backends. A singledispatch system can help with that. But when that's not convenient / desirable,° the way I see it, it should still be possible to implement a manifold in PyManopt which explicitly caters to a specific backend, in which case users of that manifold will need to write their cost function for that backend. This notably allows users to optimize code for a production backend.

If a manifold is compatible with several backends, then the person who implements the cost function needs to say which one they want when they summon the manifold, and write the cost function for that backend. Then, AD from that backend can take care of figuring out the gradient and Hessian.

° E.g., because there is a feature that's really important to you that only exists in one backend.

EDIT. TL;DR: In my view, the manifold should decide the backend (possibly via a decorator or flag if it's compatible with more than one), and AD should be applied to a problem description (manifold + cost), not to a function alone. Since the manifold specifies the backend, and the cost function caters to the manifold, AD will use that same backend. Parts of that view may very well be too far from current implementation to be practical now.

@antoinecollas (Contributor)

What @nkoep proposed in #246 are general manifolds: given a cost function defined with a specific backend, the chosen manifold can perform its operations with that backend. But I think it does not prevent defining a manifold for a specific backend. You can even overload a method (e.g. the retraction) and replace it with a custom function that is specific to a backend.

To avoid spending too much time implementing/maintaining manifolds, we should implement them with a generic backend (nx in #246). This way, the user specifies a cost function (and thus a backend) and a manifold. The manifold uses the backend specified by the cost function. And if the user wants to optimize a manifold for a specific backend, they overload the desired methods.
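A sketch of that per-backend override (class names are hypothetical, and plain numpy stands in for the generic `nx` layer of #246):

```python
import numpy as np


class GenericSphere:
    """Hypothetical generically implemented manifold; in #246 its
    numerics would go through the dispatching nx layer."""

    def retraction(self, point, tangent_vector):
        target = point + tangent_vector
        return target / np.linalg.norm(target)


class CustomRetractionSphere(GenericSphere):
    """Subclass that overloads just the retraction, e.g. with a
    hand-optimized or backend-specific version."""

    def retraction(self, point, tangent_vector):
        target = point + tangent_vector
        return target / np.sqrt(np.sum(target * target))
```

Everything else (projections, inner products, ...) is inherited from the generic implementation, so the maintenance cost of the specialization stays local to the overloaded method.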

@NicolasBoumal (Contributor) commented Jul 4, 2023

Agreed with your follow up @antoinecollas, and then zooming in on:

[...] the user specifies a cost function (and thus a backend) and a manifold. The manifold uses the backend specified by the cost function.

My suggestion/question is: would it not make sense to specify the backend already when instantiating the manifold (independently of the cost)? This could be done with a decorator, or some other type of parameter/flag. This way, the constructor of the manifold can already prepare things for that backend (if need be).

(If the worry is that this would force the user to declare the backend twice, once for the manifold and once for the cost function, I think this can also be arranged, though maybe not in an easily backward-compatible way.)

@antoinecollas (Contributor)

All the backends (except maybe tensorflow) compute on the fly, so there is no need to tell the manifold in advance which backend it will use.
