
Crash on 0.7.0-rc3.0 #2

Closed

awf opened this issue Aug 15, 2018 · 10 comments

Comments

@awf commented Aug 15, 2018

The attached .jl file produces the attached log when run with
julia .\zg-logsumexp.jl

Apologies, I haven't tested with 1.0. The Zygote package is added from master in this repo.

zg.log

using Test
using Printf
using Zygote
using BenchmarkTools

# Numerically stable log-sum-exp: shift by the maximum before exponentiating.
function logsumexp(x::Array{Float64,1})
  A = maximum(x)
  ema = exp.(x .- A)
  sema = sum(ema)
  log(sema) + A
end

r = rand(5);
@test logsumexp(r) ≈ log(sum(exp.(r))) atol=1.0e-8
@printf("logsumexp: %f = %f\n", logsumexp(r), log(sum(exp.(r))));

# logsumexp returning both the value and the gradient
# (the "Jacobian" of a scalar-valued function is its gradient vector).
function logsumexp_both(x::Array{Float64,1})
  A = maximum(x)
  ema = exp.(x .- A)
  sema = sum(ema)
  l = log(sema) + A
  Jacobian = ema ./ sema
  return (l, Jacobian)
end

x = rand(100)

"Timing logsumexp_both" |> println
@btime logsumexp_both(x)

"Timing zygote" |> println
@btime Zygote.gradient(logsumexp, x)

@jekbradbury commented Aug 15, 2018

Zygote requires (edit: now merely works faster on) a Julia built from source including Mike's patches; I have a Docker image with such a build here: docker pull gcr.io/personal-204923/julia:zygote

@MikeInnes (Member)

Ideally we shouldn't be crashing Julia's codegen here, as opposed to just throwing an error, but this is probably easily fixed by adding the right gradient definition (we have very few of those right now; * and + on scalars is enough to check that the chain rule works :)). I'll take a look shortly.
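
As a minimal sketch of the kind of chain-rule check described above, using only scalar * and + (the functions f and g here are illustrative, not part of Zygote):

using Zygote

f(x) = 3x + 2
g(x) = f(x) * x          # g(x) = 3x^2 + 2x, so g'(x) = 6x + 2

Zygote.gradient(g, 1.0)  # chain rule should give (8.0,)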

@awf (Author) commented Aug 16, 2018

Just to confirm, in case it's useful: this also happens on James's image.

@baggepinnen (Contributor) commented Aug 18, 2018

I'm experiencing a segfault with the following code on Julia v0.7 / Zygote v0.1. I can't tell whether it's related to this issue:

using ForwardDiff, Zygote
using LinearAlgebra  # for norm

w = randn(3,3)
x = randn(3)
f(w,x) = w*x

# Returns a loss closure; it contains nested differentiation of the model
# with respect to the input. (Note: the closure's `w` shadows the outer `w`
# captured by ∇xf.)
function loss(w,x)
    ∇xf(x) = ForwardDiff.jacobian(x->f(w,x), x)
    w -> sum(abs2.(w)) + norm(∇xf(x))
end

l = loss(w,x)
g = Zygote.gradient(l,w)

@ngphuoc commented Aug 21, 2018

I modified the example in the README to add a loss function, and it crashed with a segfault too:

using Zygote: @grad, Params, gradient

W, b = rand(2, 3), rand(2);

predict(x) = W*x .+ b;

loss(x,y) = sum(abs2, y .- predict(x))

g = gradient(() -> sum(loss([1,2,3], [1, 1])), Params([W, b]))

g[W], g[b]

@MikeInnes (Member)

I added a simple gradient definition for maximum, which fixes the original example. I really need to bring over all the definitions that Flux has.

I also just realised that our broadcast grad is not type-stable, so this benchmark comes out a bit slower than it needs to be.
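
For the curious, here is a sketch of what such a gradient definition for maximum might look like, written with the @grad macro imported earlier in this thread (the exact macro name and signature vary across Zygote versions, so treat this as illustrative rather than Zygote's actual definition):

using Zygote: @grad

# Only the largest element affects the output, so the pullback scatters
# the incoming sensitivity Δ into a one-hot vector at the argmax.
@grad function maximum(x::AbstractVector)
    i = argmax(x)
    maximum(x), Δ -> begin
        g = zero(x)
        g[i] = Δ
        (g,)
    end
end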

@MikeInnes (Member)

On current release Julia (I'll remove any need for a source build this week, though there'll still be a flag for type-system abuse):

(v1.0) pkg> add Zygote#master IRTools#master

julia> using Zygote, BenchmarkTools

julia> function logsumexp(x::Array{Float64,1})
         A = maximum(x)
         ema = exp.(x .- A)
         sema = sum(ema)
         log(sema) + A
       end
logsumexp (generic function with 1 method)

julia> const x = rand(100);

julia> @btime logsumexp(x)
  1.515 μs (1 allocation: 896 bytes)
5.119135839482861

julia> @btime Zygote.gradient(logsumexp, x);
  4.882 μs (35 allocations: 7.61 KiB)

Almost all of the time is spent in broadcast; I imagine there's still more we could do to be clever here, though it's hard to see how we'd match the hand-written version without special knowledge.

@chriselrod commented Sep 11, 2018

The overhead for small problems still seems high (so my tests with StaticArrays aren't fast yet), but...

julia> @btime logsumexp_both(x);
  960.263 ns (3 allocations: 1.78 KiB)

julia> @btime Zygote.gradient(logsumexp, x);
  2.737 μs (35 allocations: 7.61 KiB)

julia> @btime ReverseDiff.gradient!($results, $compiled_tape, $inputs);
  15.900 μs (0 allocations: 0 bytes)

julia> @btime ForwardDiff.gradient!($res, logsumexp, x, $cfg);
  25.158 μs (10 allocations: 87.50 KiB)

This was on regular 1.0.

This is awesome to see, because it is (a) significantly faster, and (b) Zygote didn't require me to redefine logsumexp to be generic.

@MikeInnes (Member) commented Sep 11, 2018

Nice!

I did notice one of the earlier StaticArrays examples not inferring properly. Feel free to open a new issue for anything like that.
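
One quick, Zygote-agnostic way to check for inference problems like that is @code_warntype (with logsumexp and x as defined above):

using InteractiveUtils  # provides @code_warntype outside the REPL
@code_warntype Zygote.gradient(logsumexp, x)  # Any/Union-typed entries flag inference failures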

@awf (Author) commented Sep 13, 2018

Very nice. Note that my initial code wasn't quite comparing apples to apples: logsumexp_both computes the function value too. For timing, we should compare Zygote.gradient to:

# Hand-written gradient of logsumexp (value computation omitted).
function grad_logsumexp(x::Array{Float64,1})
  A = maximum(x)
  ema = exp.(x .- A)
  sema = sum(ema)
  # l = log(sema) + A
  return ema ./ sema
end
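
An apples-to-apples timing would then look something like this (with x and logsumexp defined as above; numbers will vary by machine):

@btime grad_logsumexp(x)
@btime Zygote.gradient(logsumexp, x)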

bors bot pushed a commit that referenced this issue on Apr 22, 2020: "Merge upstream changes."