Massive performance degradation between v0.2.8 and v0.2.9 with LoopVectorization.jl #50

Closed
MasonProtter opened this issue Nov 13, 2020 · 3 comments

@MasonProtter

I think something is not being communicated correctly to LoopVectorization.jl as of Tullio version 0.2.9. First, the timings on v0.2.8:

julia> using Tullio, LoopVectorization, BenchmarkTools, LinearAlgebra
[ Info: Precompiling Tullio [bc48ee85-29a4-5162-ae0b-a64e1601d4bc]

julia> tmul!(C, A, B) = @tullio C[i, j] = A[i, k] * B[k, j]
tmul! (generic function with 1 method)

julia> foreach((2, 10, 50, 100)) do N
           A, B = rand(N, N + 1), rand(N + 1, N + 2)
           @show N
           @btime tmul!(C, $A, $B) setup=(C=zeros($N, $N+2)) # Matmul with Tullio.jl
           @btime  mul!(C, $A, $B) setup=(C=zeros($N, $N+2)) # Matmul with OpenBLAS
       end
N = 2
  51.804 ns (0 allocations: 0 bytes)
  123.709 ns (0 allocations: 0 bytes)
N = 10
  210.035 ns (0 allocations: 0 bytes)
  371.549 ns (0 allocations: 0 bytes)
N = 50
  10.550 μs (0 allocations: 0 bytes)
  13.939 μs (0 allocations: 0 bytes)
N = 100
  25.340 μs (49 allocations: 3.19 KiB)
  39.860 μs (0 allocations: 0 bytes)

(@v1.5) pkg> st Tullio LoopVectorization
Status `~/.julia/environments/v1.5/Project.toml`
  [bdcacae8] LoopVectorization v0.8.26
  [bc48ee85] Tullio v0.2.8

Now, after restarting Julia and upgrading to v0.2.9:

(@v1.5) pkg> add Tullio@v0.2.9
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Resolving package versions...
Updating `~/.julia/environments/v1.5/Project.toml`
  [bc48ee85] ↑ Tullio v0.2.8 ⇒ v0.2.9
Updating `~/.julia/environments/v1.5/Manifest.toml`
  [bc48ee85] ↑ Tullio v0.2.8 ⇒ v0.2.9

julia> using Tullio, LoopVectorization, BenchmarkTools, LinearAlgebra
[ Info: Precompiling Tullio [bc48ee85-29a4-5162-ae0b-a64e1601d4bc]

julia> tmul!(C, A, B) = @tullio C[i, j] = A[i, k] * B[k, j]
tmul! (generic function with 1 method)

julia> foreach((2, 10, 50, 100)) do N
           A, B = rand(N, N + 1), rand(N + 1, N + 2)
           @show N
           @btime tmul!(C, $A, $B) setup=(C=zeros($N, $N+2)) # Matmul with Tullio.jl
           @btime  mul!(C, $A, $B) setup=(C=zeros($N, $N+2)) # Matmul with OpenBLAS
       end
N = 2
  51.125 ns (0 allocations: 0 bytes)
  129.749 ns (0 allocations: 0 bytes)
N = 10
  847.338 ns (0 allocations: 0 bytes)
  372.568 ns (0 allocations: 0 bytes)
N = 50
  111.719 μs (0 allocations: 0 bytes)
  13.920 μs (0 allocations: 0 bytes)
N = 100
  261.787 μs (49 allocations: 3.19 KiB)
  38.220 μs (0 allocations: 0 bytes)

(@v1.5) pkg> st Tullio LoopVectorization
Status `~/.julia/environments/v1.5/Project.toml`
  [bdcacae8] LoopVectorization v0.8.26
  [bc48ee85] Tullio v0.2.9
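
To put a number on the regression, the Tullio timings from the two sessions above can be divided directly. This is only arithmetic on the figures already reported; the variable names are illustrative and not part of either package.

```julia
# Tullio timings copied from the two REPL sessions above, in nanoseconds.
sizes  = (2, 10, 50, 100)
v0_2_8 = (51.804, 210.035, 10_550.0, 25_340.0)
v0_2_9 = (51.125, 847.338, 111_719.0, 261_787.0)

# Slowdown factor of v0.2.9 relative to v0.2.8 for each matrix size.
slowdown = Dict(N => new / old for (N, old, new) in zip(sizes, v0_2_8, v0_2_9))

for N in sizes
    println("N = ", N, ": ", round(slowdown[N]; digits = 2), "x")
end
```

The N = 2 case is unchanged, while N = 10 is roughly 4x slower and N = 50 and N = 100 are roughly 10x slower.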
@mcabbott (Owner)

Thanks for the report, and for narrowing it down.

I think I see what's going on now. My fix for #46 was to interpolate many basic functions into the generated code, so that a call refers to the function object rather than to a symbol:

julia> :(zero(T)) |> dump
Expr
  head: Symbol call
  args: Array{Any}((2,))
    1: Symbol zero
    2: Symbol T

julia> :($zero(T)) |> dump
Expr
  head: Symbol call
  args: Array{Any}((2,))
    1: zero (function of type typeof(zero))
    2: Symbol T
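
The same difference can be checked programmatically without reading `dump` output. The sketch below uses only Base Julia; `head_kind` is a hypothetical helper introduced for illustration.

```julia
ex_sym = :(zero(T))     # first argument of the call is the Symbol :zero
ex_fun = :($zero(T))    # first argument is the function object itself

# Hypothetical helper: report what sits in the "function" slot of a call expression.
head_kind(ex::Expr) = ex.args[1] isa Symbol ? :symbol : :value

println(head_kind(ex_sym))        # symbol
println(head_kind(ex_fun))        # value
println(ex_fun.args[1] === zero)  # true: it is Base's zero, not a name for it
println(ex_sym == ex_fun)         # false: the two expressions are not equal
```

Any code that inspects the expression and expects a `Symbol` in that slot will take a different path for the second form.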

And LoopVectorization doesn't like that at all:

using LoopVectorization  # the macro is called @avx in the v0.8 series used here

lex1 = quote
    @avx for i in 1:10
        acc = zero(T)
        for j in 1:10
            acc += A[i,j]
        end
        B[i] = acc
    end
end

lex2 = quote
    @avx for i in 1:10
        acc = $zero(T)
        for j in 1:10
            acc += A[i,j]
        end
        B[i] = acc
    end
end

macroexpand(Main, lex1) # ok

macroexpand(Main, lex2) # MethodError: no method matching instruction!(::LoopVectorization.LoopSet, ::typeof(zero))

The error during macro expansion is caught, so you only get the slow Base version, without LoopVectorization:

julia> using Tullio, LoopVectorization

julia> A, B, C = rand(9,9), rand(9,9), rand(9,9);

julia> @tullio C[i, j] = A[i, k] * B[k, j] grad=false verbose=1;
┌ Warning: LoopVectorization failed 
│   err =
│    LoadError: MethodError: no method matching instruction!(::LoopVectorization.LoopSet, ::typeof(zero))
...
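
The catch-and-fall-back behaviour described above can be illustrated with a toy macro. This is a sketch of the general pattern only, not Tullio's actual implementation: `@picky` and `expand_or_fallback` are hypothetical names, and `@picky` merely mimics a macro that accepts symbols but not function objects in call position.

```julia
# Hypothetical stand-in for a macro that only understands calls written with a Symbol.
macro picky(ex)
    (ex isa Expr && ex.head === :call && ex.args[1] isa Symbol) ||
        error("picky: unsupported call head")
    return esc(ex)
end

# Try to expand the fast expression; on any expansion error, use the slow one instead.
function expand_or_fallback(fast::Expr, slow::Expr; verbose::Bool = false)
    try
        return (code = macroexpand(@__MODULE__, fast), fast = true)
    catch err
        verbose && @warn "fast path failed, falling back" err
        return (code = slow, fast = false)
    end
end

slow = :(zero(Float64))
ok  = expand_or_fallback(:(@picky zero(Float64)), slow)
bad = expand_or_fallback(:(@picky $zero(Float64)), slow)

println(ok.fast)   # true
println(bad.fast)  # false: the interpolated function object made expansion throw
```

With `verbose = false` the fallback is silent, which is why the slowdown only shows up in benchmarks.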

I'm not sure which side the fix should be on. Perhaps I should undo some of these interpolations in the meantime.
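
One way to "undo" such interpolations is to walk the expression and replace function objects by their names before handing it to a macro that expects symbols. The following is a minimal sketch under that assumption; `uninterpolate` is a hypothetical helper, not something Tullio or LoopVectorization is known to provide.

```julia
# Hypothetical helper: recursively replace function objects in an expression by their names.
uninterpolate(x) = x                          # leave literals, symbols, etc. untouched
uninterpolate(f::Function) = nameof(f)        # zero -> :zero
uninterpolate(ex::Expr) = Expr(ex.head, map(uninterpolate, ex.args)...)

ex    = :(acc = $zero(T))     # contains the function object zero
clean = uninterpolate(ex)     # contains the Symbol :zero again

println(clean.args[2].args[1] isa Symbol)  # true
println(clean == :(acc = zero(T)))         # true
```

Note that `nameof` discards the module the function came from, so the name is again resolved wherever the code ends up being evaluated. That may reintroduce whatever the interpolation was meant to prevent, so this is a trade-off rather than a clean fix.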

mcabbott pushed a commit that referenced this issue Nov 14, 2020
@MasonProtter (Author)

Ah, of course. Yeah, this is a good example of a way in which LoopVectorization.jl is operating kinda at the wrong level of abstraction. Hopefully we can remedy this with the new compiler technology, but we'll have to wait and see...

@mcabbott (Owner)

I'm going to mark this as closed, since I think v0.2.10 fixes it.

I'll also make a note to check that any further work on #46 doesn't break this!
