[WIP] use backend everywhere i/o only for autodiff #245
base: master
Conversation
Toying around with this; thoughts? (I propose to focus on and test with the numpy/jax/pytorch backends first, and adapt the others at the end.)

PS. Currently, one could in principle pass a jax cost, a numpy gradient, and a pytorch Hessian to the problem constructor. I don't think this flexibility is needed and propose to explicitly restrict each problem instance to one backend.
Hey everyone! Great to see some influx of activity on this project again :) Unfortunately I couldn't make it to the meeting you had, or I would have voiced my opinion and concerns about resolving the backend issue there already.

To be blunt, I strongly advise against adding any information about the computational backend to manifold constructors. Doing so would be a prime example of a leaky abstraction: these classes shouldn't have a notion of the particular execution context at all. (Of course you could argue that by implementing manifold numerics in numpy we already do that, but we had to use something.)

In my view, that type of abstraction should happen at a higher level, which means an approach similar to geomstats (see https://github.com/geomstats/geomstats/blob/master/geomstats/_backend/README.md) as I outlined in an old email. Instead of writing such an abstraction from scratch, though, it might be better to first look into something like https://github.com/jonasrauber/eagerpy and see how much mileage we'd get out of that, potentially by forking the project if necessary and feasible. Implementing internal numerics on top of eagerpy would also resolve the issue of having to request the backend explicitly, which addresses one of the concerns you raised, @sweichwald, namely that it would make it hard to switch the backend at runtime.

Given the shared pain points, this could also represent a good opportunity to pool resources and collaborate further with geomstats.
Hi all 😎 and thanks @nkoep !
Yes, I do agree. To me, how Python Optimal Transport (POT) handles this strikes a good balance between (relatively lean) maintenance overhead and easy entry for new contributors (a focus of our last meeting); see their docs on the multi-lib backend. At its core, it is essentially a dispatch on dtype.

Being able to change the backend during runtime is helpful for quick prototyping in interactive live sessions, which, for me, rules out implementations that force users to choose the backend globally.

I am hesitant to add a dependency on eagerpy for this; I prefer to keep dependencies to a minimum, avoid having to maintain a fork or wait for contributions to make it into eagerpy, and reduce the risk of API changes or discontinuation of external tools breaking pymanopt's core functions. It's likely that most of our multi-backend tools are not generic enough to make it into eagerpy, so we would have to maintain backend-specific implementations of some functions anyway.

To sum up: what are everyone's thoughts on following a dispatch-on-dtype scheme similar to what is implemented in POT?
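To make the proposal concrete, here is a minimal sketch of what such a dispatch-on-dtype scheme could look like. All class and function names below (`NumpyBackend`, `get_backend`, `_BACKENDS`) are hypothetical illustrations, not POT's or pymanopt's actual API; the point is only that the backend is inferred from the type of the arrays passed in, so it can change freely between calls at runtime:

```python
# Hypothetical sketch of a POT-style backend dispatch: the backend is
# selected from the type of the input arrays, not from a global setting.
import numpy as np


class NumpyBackend:
    array_type = np.ndarray

    def norm(self, x):
        return np.linalg.norm(x)


# jax/pytorch backends would append themselves here when available.
_BACKENDS = [NumpyBackend()]


def get_backend(*arrays):
    """Return the first registered backend matching the array types."""
    for backend in _BACKENDS:
        if all(isinstance(a, backend.array_type) for a in arrays):
            return backend
    raise TypeError(f"No backend for {[type(a) for a in arrays]}")


x = np.ones(4)
backend = get_backend(x)
print(backend.norm(x))  # 2.0
```

A call site never names a backend explicitly; passing jax or pytorch arrays instead of numpy arrays would select a different backend on the next call.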
Yeah, that seems like a feasible approach. What's also nice is that we won't even have to bother checking for the availability of a backend and raising an informative error later. If, e.g., tensorflow isn't installed, we won't register a backend for it, but a user also won't be able to define a problem that would even require us to dispatch on it.
I agree with @sweichwald. We must handle this "backend everywhere" in a way that keeps maintenance low and entry easy for newcomers. What @nkoep has proposed in #246 seems to be the way to go!
I think what @nkoep proposes in his PR is basically what I had in mind as well during our discussion on Thursday. Well done!
I view this somewhat differently. I'll explain below, and please jump in (also if I'm missing something about Python idioms). We have solvers and problem descriptions.

Solvers should never ever care what backend is in use. They use the manifold structure to create and manipulate points and tangent vectors, and they evaluate the cost (and its derivatives) through the problem object.

Problem descriptions are different. The user provides a cost function. That function, by definition, computes with points on the manifold. Hence, the person who writes the cost function must know how those points are represented. In principle, it is the manifold which decides that (after all, it is also the manifold implementation which decides whether we represent, e.g., rotations as 3×3 orthogonal matrices or as quaternions). Likewise, if the user implements the Riemannian gradient by hand, then they must know how tangent vectors are supposed to be represented.

Now, points and tangent vectors are often represented with standard objects such as arrays, and all reasonable backends offer support for arrays. So it makes sense to implement manifolds in a way that lets them operate with arrays from various backends. A singledispatch system can help with that. But when that's not convenient or desirable,° the way I see it, it should still be possible to implement a manifold in pymanopt which explicitly caters to a specific backend, in which case users of that manifold will need to write their cost function for that backend. This notably allows users to optimize code for a production backend.

If a manifold is compatible with several backends, then the person who implements the cost function needs to say which one they want when they instantiate the manifold, and write the cost function for that backend. Then, AD from that backend can take care of figuring out the gradient and Hessian.

° E.g., because there is a feature that's really important to you that only exists in one backend.
TL;DR: In my view, the manifold should decide the backend (possibly via a decorator or flag if it's compatible with more than one), and AD should be applied to a problem description (manifold + cost), not to a function alone. Since the manifold specifies the backend, and the cost function caters to the manifold, AD will use that same backend. Parts of that view may very well be too far from the current implementation to be practical now.
What @nkoep proposed in #246 are general manifolds, i.e., given a cost function defined with a specific backend, the chosen manifold can perform its operations in that backend. But I think it does not prevent defining a manifold for a specific backend. You can even overload a method (e.g. the retraction) and replace it with a custom function that is specific to a backend. To avoid spending too much time implementing and maintaining manifolds, we have to implement them against a general backend.
Agreed with your follow-up, @antoinecollas. Zooming in on one point:
My suggestion/question is: would it not make sense to specify the backend already when instantiating the manifold (independently from the cost)? This could be done with a decorator, or some other type of parameter or flag. This way, the constructor of the manifold can already prepare things for that backend (if need be). (If the worry is that this would force the user to declare the backend twice, once for the manifold and once for the cost function, I think this can also be arranged, but maybe not in an easily backward-compatible way.)
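For illustration only, a manifold taking a backend flag at construction time could look roughly like this. The `Sphere` class, its `backend` parameter, and the `projection` method here are a hypothetical sketch, not pymanopt's actual class:

```python
# Hypothetical sketch: the manifold is told its backend when it is
# instantiated, so the constructor can prepare backend-specific state.
import numpy as np


class Sphere:
    def __init__(self, n, backend="numpy"):
        if backend != "numpy":  # jax/pytorch would be wired in here
            raise ValueError(f"Unsupported backend: {backend!r}")
        self.n = n
        self.backend = backend

    def projection(self, point, vector):
        # Project `vector` onto the tangent space at `point`.
        return vector - np.inner(point, vector) * point


manifold = Sphere(3, backend="numpy")
p = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 2.0, 3.0])
print(manifold.projection(p, v))  # [0. 2. 3.]
```

A user writing a cost function for this manifold would then know from the flag which array library their function must accept.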
All the backends (except maybe tensorflow) compute on the fly, so there is no need to tell the manifold in advance which backend it will use.
Hi!
Status quo: One needs to decorate the cost function to indicate which autodiff backend pymanopt ought to use to obtain the gradient/Hessian of the cost function. Pymanopt solvers/manifolds (retractions, gradient-norm stopping criteria, ...), however, are implemented in numpy and rely on casting from/to numpy arrays.
Idea: Use the backend throughout to avoid moving back and forth between the backend and numpy. This requires adapting the manifolds to rely on the backend to compute SVDs etc., thinly adding corresponding functionality to the backends.
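A minimal sketch of such a thin backend function, assuming a hypothetical free function `svd` that manifolds would call instead of `np.linalg.svd` directly, so intermediate results stay in whatever library produced the input:

```python
# Hypothetical sketch: a thin wrapper that dispatches the SVD to the
# library that produced the input matrix, keeping results in-backend.
import numpy as np


def svd(matrix):
    """Compute a thin SVD using the matrix's own backend."""
    if isinstance(matrix, np.ndarray):
        return np.linalg.svd(matrix, full_matrices=False)
    try:
        import torch
    except ImportError:
        pass
    else:
        if isinstance(matrix, torch.Tensor):
            return torch.linalg.svd(matrix, full_matrices=False)
    raise TypeError(f"No backend handles {type(matrix)!r}")


u, s, vt = svd(np.eye(3))
print(s)  # [1. 1. 1.]
```

A manifold written against this wrapper never casts to numpy: a pytorch tensor goes in, pytorch tensors come out, so autodiff and device placement are preserved end to end.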