Crash on 0.7.0-rc3.0 #2
Zygote requires (edit: now simply runs faster on) Julia built from source including Mike's patches; I have a Docker image with such a build here.
We should ideally not be crashing Julia's codegen here, as opposed to just throwing an error, but this is probably easily fixed by adding the right gradient definition (of which we have very few right now).
Just to confirm, in case it's useful: this does happen on James's image too.
I'm experiencing a segfault with the following code on Julia v0.7 / Zygote v0.1. I can't really tell whether it's related to this issue.

```julia
using ForwardDiff, Zygote
using LinearAlgebra: norm  # needed for norm below

w = randn(3, 3)
x = randn(3)
f(w, x) = w * x

function loss(w, x)
    ∇xf(x) = ForwardDiff.jacobian(x -> f(w, x), x)
    # Loss function contains nested differentiation of the model with respect to the input
    w -> sum(abs2.(w)) + norm(∇xf(x))
end

l = loss(w, x)
g = Zygote.gradient(l, w)
```
I modified the example in the README to add a loss function, and it crashed with a segfault too:

```julia
using Zygote: @grad, Params, gradient

W, b = rand(2, 3), rand(2);
predict(x) = W*x .+ b;
loss(x, y) = sum(abs2, y .- predict(x))

g = gradient(() -> sum(loss([1, 2, 3], [1, 1])), Params([W, b]))
g[W], g[b]
```
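For reference, the gradients Zygote should produce for this loss follow from standard calculus: for `L = sum(abs2, y .- (W*x .+ b))`, we get `dL/dW = -2(y - ŷ)xᵀ` and `dL/db = -2(y - ŷ)`. A NumPy sketch (not part of the thread; the data values are hypothetical stand-ins for the `rand()` calls above) checking this against finite differences:

```python
import numpy as np

# Hypothetical data standing in for the rand() values in the snippet above.
rng = np.random.default_rng(0)
W, b = rng.random((2, 3)), rng.random(2)
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0])

def loss(W, b):
    return np.sum((y - (W @ x + b)) ** 2)

# Closed-form gradients of the squared-error loss.
r = y - (W @ x + b)          # residual
gW = -2.0 * np.outer(r, x)   # dL/dW
gb = -2.0 * r                # dL/db

# Independent central finite-difference check on b.
eps = 1e-6
fd_b = np.array([
    (loss(W, b + eps * np.eye(2)[i]) - loss(W, b - eps * np.eye(2)[i])) / (2 * eps)
    for i in range(2)
])
assert np.allclose(gb, fd_b, atol=1e-5)
```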
I added a simple gradient def for this. I also just realised that our broadcast grad is not type-stable, so this benchmark comes out a bit slower than it needs to be.
Current release Julia works now (I'll remove any need for a source build this week, though there'll still be a flag for type-system abuse).

```julia
(v1.0) pkg> add Zygote#master IRTools#master

julia> using Zygote, BenchmarkTools

julia> function logsumexp(x::Array{Float64,1})
           A = maximum(x)
           ema = exp.(x .- A)
           sema = sum(ema)
           log(sema) + A
       end
logsumexp (generic function with 1 method)

julia> const x = rand(100);

julia> @btime logsumexp(x)
  1.515 μs (1 allocation: 896 bytes)
5.119135839482861

julia> @btime Zygote.gradient(logsumexp, x);
  4.882 μs (35 allocations: 7.61 KiB)
```

Almost all of the time is spent in broadcast; I imagine there's yet more we could do to be cleverer here, though it's hard to imagine how we'd get as good as the hand-written version without special knowledge.
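As an aside on the `logsumexp` definition being benchmarked: subtracting the maximum before exponentiating is what keeps the computation numerically stable. A NumPy sketch (not from the thread) showing why the naive form fails where the max-subtracted form doesn't:

```python
import numpy as np

def logsumexp(x):
    # Same max-subtraction trick as the Julia definition above:
    # exp(x - A) stays in range even when exp(x) would overflow.
    A = np.max(x)
    return np.log(np.sum(np.exp(x - A))) + A

x = np.array([1000.0, 1000.0])
with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(x)))  # exp(1000) overflows, so this is inf
stable = logsumexp(x)                  # the true value, log(2) + 1000
```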
The overhead for small problems still seems high (so in my tests StaticArrays aren't fast yet), but...

```julia
julia> @btime logsumexp_both(x);
  960.263 ns (3 allocations: 1.78 KiB)

julia> @btime Zygote.gradient(logsumexp, x);
  2.737 μs (35 allocations: 7.61 KiB)

julia> @btime ReverseDiff.gradient!($results, $compiled_tape, $inputs);
  15.900 μs (0 allocations: 0 bytes)

julia> @btime ForwardDiff.gradient!($res, logsumexp, x, $cfg);
  25.158 μs (10 allocations: 87.50 KiB)
```

This was on regular 1.0. This is awesome to see, because it is (a) significantly faster, and (b) Zygote didn't require me to redefine logsumexp to be generic.
Nice! I did notice one of the earlier StaticArrays examples not inferring properly. Feel free to open a new issue for anything like that.
Very nice. Note that my initial code wasn't quite comparing apples to apples: logsumexp_both computes the function value too. For timing, we should compare Zygote.gradient to:

```julia
function grad_logsumexp(x::Array{Float64,1})
    A = maximum(x)
    ema = exp.(x .- A)
    sema = sum(ema)
    # l = log(sema) + A
    return ema ./ sema
end
```
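The closed form in grad_logsumexp is the softmax of x, since d/dxᵢ log Σⱼ exp(xⱼ) = exp(xᵢ)/Σⱼ exp(xⱼ). A NumPy sketch (not from the thread) verifying this against central finite differences:

```python
import numpy as np

def logsumexp(x):
    A = np.max(x)
    return np.log(np.sum(np.exp(x - A))) + A

rng = np.random.default_rng(1)
x = rng.standard_normal(10)

# Analytic gradient, as in grad_logsumexp: the softmax of x.
ema = np.exp(x - np.max(x))
grad = ema / np.sum(ema)

# Independent check via central finite differences.
eps = 1e-6
I = np.eye(len(x))
fd = np.array([
    (logsumexp(x + eps * I[i]) - logsumexp(x - eps * I[i])) / (2 * eps)
    for i in range(len(x))
])
assert np.allclose(grad, fd, atol=1e-6)
```

Softmax gradients also sum to one, which makes a handy extra sanity check.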
The attached .jl file produces the attached log when run with:

```
julia .\zg-logsumexp.jl
```

Apologies, I haven't tested with 1.0. The Zygote pkg is added from master in this repo.

zg.log