Added pytorch autograd backend (minimal changes) #66
Conversation
Nice, thanks for the PR and the accompanying examples! This looks good to me -- only the flake8 tests and some minor pedantic remarks to be fixed, then it should be ready to be merged :-)
@@ -73,6 +73,7 @@ def precon(x, d):
        self._backends = list(
            filter(lambda b: b.is_available(), [
Prefer alphabetical order
One needs to test for `PytorchBackend`'s availability before `AutogradBackend`'s, because both check that the objective is callable and only the former tests the argument.
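For illustration, here is a minimal, hypothetical sketch of why the probing order matters; only the class names and `is_available` follow the diff above, the rest is stand-in scaffolding rather than pymanopt's actual backend code:

```python
# Hypothetical sketch, not pymanopt's actual backend code: both backends
# report availability for any callable objective, but only PytorchBackend
# also inspects the `arg` marker, so it must be probed first.
import torch


class AutogradBackend:
    def __init__(self, objective, arg=None):
        self._objective = objective

    def is_available(self):
        # Matches every callable objective, regardless of `arg`.
        return callable(self._objective)


class PytorchBackend:
    def __init__(self, objective, arg=None):
        self._objective = objective
        self._arg = arg

    def is_available(self):
        # Additionally requires the torch.Tensor marker argument.
        return callable(self._objective) and isinstance(self._arg, torch.Tensor)


def select_backend(objective, arg=None):
    # If AutogradBackend were listed first, it would match every callable
    # objective and PytorchBackend could never be selected.
    candidates = [PytorchBackend(objective, arg), AutogradBackend(objective, arg)]
    return next(b for b in candidates if b.is_available())
```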
Again related to #55 and #56.
@nkoep Option 1 outlined in #55 resolves the most pressing issues for now, while it can be seen as groundwork for Option 2 should the need for this more intrusive rework arise.
I agree: time permitting, resolving #55/#56 as outlined and then merging #66 is preferable. (In case none of us finds the time, I think it is fair to merge this in the meantime and revisit #55/#56 at a later point.)
It is clear that using `arg=torch.Tensor()` to indicate that one wants the pytorch backend is a hack. To make things dirtier, one side effect of this hack is to provide a container for the cache. There are surely cleaner ways to do all this, at the expense of more work as usual. In the end, this is your project and you must decide what makes sense for you.
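For concreteness, a hedged usage sketch of the marker described above; the `Sphere` manifold and the cost function are illustrative choices, while `arg=torch.Tensor()` is the mechanism this PR proposes:

```python
import torch
from pymanopt import Problem
from pymanopt.manifolds import Sphere

manifold = Sphere(3)


def cost(x):
    # Under this PR, x arrives as a torch tensor, so the cost is
    # expressed with torch operations to keep it differentiable.
    return torch.sum(x ** 4)


# The extra torch.Tensor() argument selects the pytorch backend and
# doubles as the container that caches intermediate computations.
problem = Problem(manifold=manifold, cost=cost, arg=torch.Tensor())
```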
I'm not sure I agree with merging this just yet before coming to a final decision regarding #55 and #56. From a cursory glance, it seems the caching mechanism suggested here should be compatible with the rudimentary pytorch backend proposed here: https://github.com/pymanopt/pymanopt/pull/56/files#diff-25bf4ee53fa80a478acbac699286855e
I agree that we should move on with (deciding on) #55/#56 and a more fundamental rework of the backend for the mid-/long-term, which we have not gotten around to yet. In my opinion, a viable way to offer pytorch support in the short term is merging this PR, which integrates well with our current backend. This way we are not bottlenecked by the mid-/long-term plans of reworking the backend more fundamentally. It appears the merge would not hinder any future reworks, while it offers some pytorch support for the current version. At this point, this may help keep more pytorch users engaged, which will also be helpful once the fundamental backend change comes.
Extra points of potential value:
Alas this involves changing the solvers.
Fair enough. My only worry was that adding a pytorch backend now, and then immediately breaking the API once we move forward with #55, wouldn't be ideal either. The thing is that there's ultimately nothing holding #55 back barring a decision on which method to adopt. The actual implementation work is very minor.
Co-Authored-By: leonbottou <leon@bottou.org>
Ping?
Hey Leon,

sorry for the long delay. I will continue work on the backend rewrite, probably starting tomorrow. Once that is in better shape I'll merge your changes or at least kindly ask you to refactor them on top of the new architecture. I expect the required changes to your code to be minimal though. Again, sorry for not being more responsive.
Niklas: please do not take my previous message as a request for immediate action. Pymanopt really helped us understand our problem. We have since made analytical progress and no longer rely on manifold optimization at all, so there is no urgency on our side. I was just curious to know whether the idea of a pytorch backend was still alive. Anyway, this is not a big change. The only nontrivial issues were the caching and the backend selection method. My patch does nothing for the latter...
Oh, no worries, Leon. It's been bugging me for a while that I had to abandon the rewrite somewhere in the middle. Adding support for pytorch is still very much of interest. Thanks for clarifying the situation though.
This contains a pytorch backend for pymanopt without changing anything else. The selection of the pytorch backend depends on adding `arg=torch.Tensor()` when creating the `Problem` instance, as illustrated in the additional examples. The pytorch tape-based differentiation requires us to compute the cost whenever we want the gradient, and to compute the gradient whenever we want the Hessian. To avoid all these duplicate computations, the additional `arg=torch.Tensor()` serves in fact as a container to cache all the computations performed for the latest value of `x`. Other than that, the implementation is pretty straightforward.
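To make the caching idea concrete, here is a minimal sketch, assuming a cost function written against torch tensors; the class `CachedPytorchCost` and its interface are hypothetical, not the PR's actual code:

```python
import numpy as np
import torch


class CachedPytorchCost:
    """Caches the forward pass for the latest x so that requesting the
    gradient does not recompute the cost (hypothetical sketch)."""

    def __init__(self, cost):
        self._cost = cost
        self._x = None      # latest point, as a numpy array
        self._x_t = None    # the corresponding torch tensor
        self._value = None  # cached cost tensor (with its tape)
        self._grad = None   # cached gradient tensor

    def _forward(self, x):
        if self._x is None or not np.array_equal(x, self._x):
            # New point: rerun the forward pass, invalidate the gradient.
            self._x = np.array(x, copy=True)
            self._x_t = torch.tensor(x, requires_grad=True)
            self._value = self._cost(self._x_t)
            self._grad = None

    def cost(self, x):
        self._forward(x)
        return self._value.item()

    def grad(self, x):
        self._forward(x)
        if self._grad is None:
            # Tape-based differentiation reuses the cached forward pass;
            # create_graph=True here would additionally enable
            # Hessian-vector products derived from this gradient.
            (self._grad,) = torch.autograd.grad(self._value, self._x_t)
        return self._grad.numpy()
```

With this pattern, a solver that evaluates the cost and then the gradient at the same point performs only one forward pass, which is the duplicate computation the description above refers to.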