Relation to capstan and cassette.jl #1

Closed
datnamer opened this issue Aug 12, 2018 · 8 comments

Comments

@datnamer

How does this relate to capstan and cassette?

Is this an alternative, replacement or stopgap?

@MikeInnes
Member

So the idea with Zygote is to play around with source-to-source AD techniques which are currently not in Capstan's focus. Jarrett is focused on a fairly different (and currently incompatible) implementation strategy, mainly because he has concerns about Zygote's ability to scale to handle larger programs. Right now, Zygote and Capstan will have to exist as alternatives, but Capstan is intended to become more pluggable so that it'll become possible to integrate whatever parts of Zygote work well in future.

@jrevels might wish to say if that's a fair summary, or elaborate on his concerns.

@datnamer
Author

Got it, thanks.

I assume you've also seen Google's Tangent? It also does source-to-source AD.

@MikeInnes
Member

Yup, Tangent is a good mental model for how Zygote is working as well. The main difficulty with doing this in Python is that you have to pretend you're working with a sufficiently-static subset of Python where you can resolve what everything actually does – otherwise you'd have to look up gradient definitions at runtime. Hence Tangent's various limitations with respect to closures, classes etc.

Zygote is fully dynamic, which works because we can actually look up gradients dynamically (via dispatch) and have Julia resolve that at compile time where possible (as usual).

Another difference is that Zygote works on SSA form IR rather than Julia source code. We don't aim for readable output in any way, but are able to support a lot of the language in a uniform way (e.g. arbitrary control flow including @goto).
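As a rough illustration of the two points above (ordinary control flow, and gradient rules found through dispatch), here is a minimal sketch. It assumes a current Zygote release, where custom rules are written with `Zygote.@adjoint`; `pow` and `mysquare` are hypothetical names made up for this example and are not part of Zygote.

```julia
using Zygote

# Plain Julia control flow: the loop's trip count depends on a runtime value.
function pow(x, n)
    r = one(x)
    while n > 0
        r *= x
        n -= 1
    end
    return r
end

# d/dx x^3 = 3x^2, so the derivative at x = 2.0 is 12.0.
dpow = gradient(x -> pow(x, 3), 2.0)

# Hypothetical function with a hand-written rule. The rule is attached to
# `mysquare` itself, so it is found through ordinary method dispatch.
mysquare(x) = x * x
Zygote.@adjoint mysquare(x) = mysquare(x), Δ -> (2x * Δ,)

dsq = gradient(mysquare, 3.0)
```

`gradient` returns a tuple with one entry per argument, so both results here are one-element tuples.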

@johnnychen94
Contributor

johnnychen94 commented Aug 20, 2018

@MikeInnes is Zygote going to be an alternative AD backend for Flux, just like what you expect for Capstan and Cassette?

@MikeInnes
Member

Yes, it will be a drop-in replacement for Flux's current AD, and may eventually become the default.

@DoktorMike

How far along this path are we? I would like to use Zygote instead of Tracker in Flux, but so far I haven't found a way of collecting parameters from more complex (i.e., chained) models. But maybe I'm missing something obvious.

@MikeInnes
Member

On Flux master you can call params(m) and pass the result to gradient, following Zygote's API, e.g.

m = Chain(...)
# params(m) collects the model's trainable arrays; the do-block is the loss
gradient(params(m)) do
  m(input)
end

I haven't documented this yet because it doesn't have great coverage of Flux yet. But feel free to try it and open issues for anything you run into, and that'll help me prioritise fixes.
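For readers without Flux at hand, the same calling pattern can be sketched with Zygote alone. This is only an illustration under stated assumptions: `W`, `b` and `x` are made-up arrays standing in for a model's parameters and input, and `Zygote.Params` plays the role that `params(m)` plays above.

```julia
using Zygote

# Stand-ins for a model's trainable arrays and an input (made up for this sketch).
W = [1.0 2.0; 3.0 4.0]
b = [0.5, -0.5]
x = [1.0, 2.0]

# Collect the arrays to differentiate with respect to.
ps = Zygote.Params([W, b])

# The do-block is a zero-argument function computing a scalar loss.
gs = gradient(ps) do
    sum(W * x .+ b)
end

# The result is indexed by the parameter arrays themselves.
dW = gs[W]
db = gs[b]
```

Here the loss is linear in `W` and `b`, so `dW` has `x` in every row and `db` is all ones.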

MikeInnes pushed a commit that referenced this issue Jul 8, 2019
updating to new master
bors bot pushed a commit that referenced this issue Apr 5, 2020
Merge upstream changes.
bors bot pushed a commit that referenced this issue May 28, 2020
…es (#1)

* ChainRules pullbacks always have 1 input JuliaDiff/ChainRulesCore.jl#152

* swap to version of chainrules that don't use multiarg pullbacks

* update tests

* make so don't need custom rule anymore

* add comment

* Update src/compiler/chainrules.jl

Co-authored-by: willtebbutt <wt0881@my.bristol.ac.uk>

@ToucheSir
Member

Given it's been almost 2 years since Zygote became the default AD in Flux, this should be safe to close :)
