Relation to capstan and cassette.jl #1
So the idea with Zygote is to play around with source-to-source AD techniques, which are currently not in Capstan's focus. Jarrett is focused on a fairly different (and currently incompatible) implementation strategy, mainly because he has concerns about Zygote's ability to scale to larger programs. Right now, Zygote and Capstan will have to exist as alternatives, but Capstan is intended to become more pluggable, so that whatever parts of Zygote work well can be integrated in the future. @jrevels might wish to say if that's a fair summary, or elaborate on his concerns.
Got it, thanks. I assume you've also seen Google's Tangent? It also does source-to-source AD.
Yup, Tangent is a good mental model for how Zygote is working as well. The main difficulty with doing this in Python is that you have to pretend you're working with a sufficiently-static subset of Python where you can resolve what everything actually does – otherwise you'd have to look up gradient definitions at runtime. Hence Tangent's various limitations with respect to closures, classes etc. Zygote is fully dynamic, which works because we can actually look up gradients dynamically (via dispatch) and have Julia resolve that at compile time where possible (as usual). Another difference is that Zygote works on SSA form IR rather than Julia source code. We don't aim for readable output in any way, but are able to support a lot of the language in a uniform way (e.g. arbitrary control flow including |
@MikeInnes is Zygote going to be an alternative AD backend for Flux, just like what you expect for Capstan and Cassette?
Yes, it will be a drop-in replacement for Flux's current AD, and may eventually become the default.
How far are we along this path? I would like to use Zygote instead of Tracker in Flux, but so far I haven't found a way of collecting parameters from more complex, i.e., chained models. But maybe I'm missing something obvious.
On Flux master you can do:

    m = Chain(...)
    gradient(params(m)) do
      m(input)
    end

I haven't documented this because it doesn't have great coverage of Flux yet. But feel free to try it and open issues for anything you run into, and that'll help me prioritise fixes.
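For anyone trying this out, here is a minimal sketch of the same implicit-parameter pattern using Zygote alone, so it does not depend on any particular Flux version. The arrays `W`, `b`, `x` and the function `loss` are made up for illustration and are not from the thread; `gradient` and `Zygote.Params` are Zygote's own. It assumes the function being differentiated returns a scalar, hence the `sum`.

```julia
using Zygote

# Hypothetical stand-ins for a layer's parameters and an input (not from the thread).
W = [1.0 2.0 3.0; 4.0 5.0 6.0]
b = [0.5, -0.5]
x = [1.0, 0.0, -1.0]

# A scalar loss that closes over W and b instead of taking them as arguments.
loss() = sum(W * x .+ b)

# Collect the arrays to differentiate with respect to, analogous to params(m).
ps = Zygote.Params([W, b])

# Same do-block form as above: the block is the function being differentiated.
gs = gradient(ps) do
    loss()
end

# The result is indexed by the parameter arrays themselves.
gs[W]  # same shape as W; each row equals x
gs[b]  # same shape as b; all ones
```

The gradients are looked up by array identity, so `gs[W]` only works with the exact array that was placed in `ps`, not with a copy of it.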
Given it's been almost 2 years since Zygote became the default AD in Flux, this should be safe to close :)
How does this relate to capstan and cassette?
Is this an alternative, replacement or stopgap?