Spurious printing of "DiffRules._abs_deriv" when LoopVectorization.jl loaded #26

Closed
MasonProtter opened this issue Aug 18, 2020 · 5 comments · Fixed by JuliaSIMD/LoopVectorization.jl#172

Comments

@MasonProtter

MasonProtter commented Aug 18, 2020

julia> using Tullio

julia> let v = [-1, 0, 1, 2]
           @tullio out := abs(v[i])
       end
4

julia> using LoopVectorization

julia> let v = [-1, 0, 1, 2]
           @tullio out := abs(v[i])
       end
DiffRules._abs_deriv
4

julia> typeof(ans)
Int64

julia> let v = [-1, 0, 1, 2]
           @tullio out := abs(v[i])
       end
DiffRules._abs_deriv
4

Is this maybe some println debugging that didn't get removed?

@mcabbott
Owner

mcabbott commented Aug 18, 2020

What's going on is that it tries to expand @avx within a try-catch block, and for the gradient it fails:

julia> let v = [-1, 0, 1, 2]
           @tullio out := abs(v[i])  verbose=true
       end
┌ Info: symbolic gradients
│   inbody =1-element Array{Any,1}:
└     :(𝛥v[i] = 𝛥v[i] + conj(conj(𝛥ℛ[1]) * DiffRules._abs_deriv(v[i])))
DiffRules._abs_deriv
┌ Warning: LoopVectorization failed  (symbolic gradient)
│   err =
│    LoadError: "Expression not recognized." in expression starting at /Users/me/.julia/packages/Tullio/MpOl9/src/macro.jl:856
└ @ Tullio ~/.julia/packages/Tullio/MpOl9/src/macro.jl:864

And just before it fails, it helpfully prints out the problematic expression:
https://github.com/chriselrod/LoopVectorization.jl/blob/c21c174f0f6676ea9098e632d9bd79e5fb51e885/src/graphs.jl#L606

It seems a little un-Julian to need try/catch, but it is otherwise hard to predict what will or won't work. I'm not even sure why it dislikes _abs_deriv, which is from here
https://github.com/JuliaDiff/DiffRules.jl/blob/c97ee0b8a7431a7d707e3a6bcb66de76fdc240b1/src/rules.jl#L68
but it certainly doesn't work:

julia> let v = [-1, 0, 1, 2]
           @tullio a[i] := Tullio.DiffRules._abs_deriv(v[i]) 
       end
ERROR: TypeError: non-boolean (VectorizationBase.Mask{4,UInt8}) used in boolean context
Stacktrace:
 [1] _abs_deriv at /Users/me/.julia/packages/DiffRules/5QwtC/src/rules.jl:72 [inlined]
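
A minimal sketch of how this kind of TypeError can arise, assuming the derivative rule is written with a `?:` branch (not confirmed here). `FakeMask`, `is_negative`, `branchy_deriv`, `lane` and `branchfree_deriv` are hypothetical names for illustration only, not part of DiffRules, LoopVectorization or VectorizationBase. The point is that a branch needs a single Bool, while a vectorized comparison yields one bit per lane:

```julia
# Hypothetical stand-in for a SIMD mask such as VectorizationBase.Mask{4,UInt8}:
# one bit per lane instead of a single Bool.
struct FakeMask
    bits::UInt8
end

# On a scalar the comparison gives a Bool; on a 4-lane "vector" it gives a mask.
is_negative(x::Number) = x < 0
is_negative(x::NTuple{4,Int}) =
    FakeMask(reduce(|, ntuple(i -> UInt8(x[i] < 0) << (i - 1), 4)))

# Branching version (hypothetical): `?:` requires a real Bool.
branchy_deriv(x) = is_negative(x) ? -one(x) : one(x)

# Branch-free version: pick a value per lane according to the mask bits.
lane(m::FakeMask, i) = ((m.bits >> (i - 1)) & 0x01) == 0x01
branchfree_deriv(x::NTuple{4,Int}) =
    ntuple(i -> lane(is_negative(x), i) ? -1 : 1, 4)

v = (-1, 0, 1, 2)
branchy_deriv(-1)        # fine on a scalar
branchfree_deriv(v)      # fine on the 4-lane tuple
err = try
    branchy_deriv(v)     # the condition is a FakeMask, not a Bool
catch e
    e
end
```

Here `err` is a `TypeError` of the same kind as above, because the branch condition is a mask rather than a Bool; the branch-free variant never asks for one.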

@mcabbott
Owner

This particular case is silenced by 3d6d677.

@MasonProtter
Author

Hm, won't there be a performance overhead from the try-catch?

@mcabbott
Owner

mcabbott commented Aug 18, 2020

Sure, but I don't think it's the biggest concern. On a fresh session, it takes about this long to run the macro:

julia> @time using Tullio
  0.355330 seconds (1.22 M allocations: 61.950 MiB)

julia> @time @macroexpand @tullio C[i,j] = A[i,k] * B[k,j];
  4.052740 seconds (12.47 M allocations: 629.727 MiB, 4.78% gc time)

julia> @time @macroexpand @tullio C[i,j] = A[i,k] * B[k,j];
  0.001012 seconds (2.01 k allocations: 123.000 KiB)

and with LoopVectorization:

julia> @time using Tullio, LoopVectorization
  2.222334 seconds (4.79 M allocations: 264.077 MiB, 3.54% gc time)

julia> @time @macroexpand @tullio C[i,j] = A[i,k] * B[k,j];
  6.466098 seconds (16.44 M allocations: 832.955 MiB, 5.04% gc time)

julia> @time @macroexpand @tullio C[i,j] = A[i,k] * B[k,j];
  0.002154 seconds (4.49 k allocations: 294.984 KiB)

I would love this to be quicker, but don't know how. Slightly sad comparison:

julia> @time using Einsum
  0.035711 seconds (121.89 k allocations: 6.455 MiB)

julia> @time @macroexpand @einsum C[i,j] = A[i,k] * B[k,j];
  0.276000 seconds (394.87 k allocations: 19.989 MiB, 4.82% gc time)

julia> @time @macroexpand @einsum C[i,j] = A[i,k] * B[k,j];
  0.000290 seconds (458 allocations: 29.516 KiB)

(Edit -- replaced with @macroexpand times, perhaps a better measure. All on Julia 1.5.0.)

If I'm doing this right, the try-catch itself costs about 60μs, compared to 1ms. I guess the cost of expanding @avx until it fails is the big cost, which could be avoided if I detected when to do this. Earlier versions had some code for this...

@MasonProtter
Author

Oh, I didn't realize the try-catch was being run at macroexpansion time; I thought it was at runtime. My mistake! I'm not at all concerned about the macroexpansion or compile times of Tullio.
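
The expansion-time versus runtime distinction can be sketched roughly as follows. This is an illustration under stated assumptions, not Tullio's actual code: `fancy_transform`, `@try_fancy` and `ATTEMPTS` are hypothetical names, and the transform merely mimics a rewrite step that rejects some expressions.

```julia
# Counts how many times the (hypothetical) transform is attempted.
const ATTEMPTS = Ref(0)

# Hypothetical stand-in for an expensive rewrite such as expanding `@avx`;
# it rejects calls to `abs`, mimicking "Expression not recognized."
function fancy_transform(ex)
    ATTEMPTS[] += 1
    if ex isa Expr && ex.head == :call && ex.args[1] == :abs
        error("Expression not recognized.")
    end
    return ex
end

# The try/catch runs while the macro is being expanded, not in the returned code.
macro try_fancy(ex)
    body = try
        fancy_transform(ex)
    catch
        ex   # fall back to the untransformed expression
    end
    return esc(body)
end

# Expansion happens once, when this definition is lowered.
f(x) = @try_fancy abs(x)

for _ in 1:1000
    f(-3)   # runs the plain fallback code; no try/catch is executed here
end
```

In this sketch the transform is attempted once, when `f` is defined, however many times `f` is later called, so the try/catch adds to macro expansion time but not to the cost of each call.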
