Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The initialization for user defined reduction function is always zero(T) #24

Closed
N5N3 opened this issue Aug 17, 2020 · 7 comments · Fixed by #25
Closed

The initialization for user defined reduction function is always zero(T) #24

N5N3 opened this issue Aug 17, 2020 · 7 comments · Fixed by #25

Comments

@N5N3
Copy link

N5N3 commented Aug 17, 2020

Lead to

a = fill(100,4,4)
@tullio avx = false (&) b[i] := sin(a[i,j])+2 >= 0 

4-element Array{Bool,1}:
 0 # should be 1
 0 # should be 1
 0 # should be 1
 0 # should be 1

Also some time user want keep the initial value for defined reduction function like += and *=
Maybe a synthesis like below is fine.

@tullio (fun_for_reduction,fun_for_initialization,keep) b[i] = sin(a[i,j])+2 >= 0
@mcabbott
Copy link
Owner

There should indeed be a way to set this. I wasn't sure whether to make it a function like zero, or a value like init=0.0, the pattern from reduce etc. Writing @tullio (&,false) b[i] := ... isn't so ugly either.

Another possibility is that there ought to be a function zero_under(+, Float64) == 0.0 but I don't know of one.

It could also catch a few more functions from Base, are there other candidates besides & and |?

mcabbott pushed a commit that referenced this issue Aug 17, 2020
@N5N3
Copy link
Author

N5N3 commented Aug 17, 2020

as far as i know, the reducedim.jl in Base only has:

for (Op, initfun) in ((:(typeof(add_sum)), :zero), (:(typeof(mul_prod)), :one))
    @eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && fill!(a, $(initfun)(T)); a)
end

for Op in (:(typeof(max)), :(typeof(min)))
    @eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && copyfirst!(a, src); a)
end

for (Op, initval) in ((:(typeof(&)), true), (:(typeof(|)), false))
    @eval initarray!(a::AbstractArray, ::$(Op), init::Bool, src::AbstractArray) = (init && fill!(a, $initval); a)
end

I opened this issue because I tried use StructArrays.jl and Tullio.jl together to merge three "@tullio"s into one, like:

        Result = StructArray((X1_sum, X2_sum, F_sum))
        @tullio Result[i] += (
            abs2(x1_fft[i, j]),
            abs2(x2_fft[i, j]),
            x1_fft[i, j] * conj(x2_fft[i, j]),
        )

instead of

        @tullio X1_sum[i] += abs2(x1_fft[i,j])
        @tullio X2_sum[i] += abs2(x2_fft[i,j])
        @tullio F_sum[i] += x1_fft[i,j] * conj(x2_fft[i,j])

for some acceleration, at the cost of "Base pollution" with dispatched Base.:+ and Base.zero.
I tried use accumu_fun(a,b) instead of +(a,b), but i find the dispatched zero(T) is unavoided, and there seems no way to turn on the keep.

@mcabbott
Copy link
Owner

I see, and after defining Base.zero(T::Type{<:Tuple}) = map(zero, T.parameters) (why isn't that in Base?) and similar +, do you see acceleration from doing these loops together?

To be clear, there is no feature to control this at the moment, neither initialisation nor keeping values (except for += and *=). It's something I haven't got around to building.

@N5N3
Copy link
Author

N5N3 commented Aug 17, 2020

module test
using Tullio
using LoopVectorization
using StructArrays
# dispatch the Base.:+ and Base.zero
const ST = Tuple{T,T,Complex{T}} where {T}
Base.:+(x::ST, y::ST) = (x[1] + y[1], x[2] + y[2], x[3] + y[3])
Base.zero(::Type{Tuple{T,T,Complex{T}}}) where {T} =
    (zero(T), zero(T), zero(Complex{T}))
# 3 @tullio
func1(a,b,x,y,z) = begin
    @tullio x[i] = abs2(a[i,j])
    @tullio y[i] = abs2(b[i,j])
    @tullio z[i] = a[i,j] * conj(b[i,j])
end
# merged
func2(a,b,x,y,z) = begin
    result = StructArray((x,y,z))
    @tullio result[i] = (abs2(a[i,j]),abs2(b[i,j]),a[i,j]*conj(b[i,j]))
end
end
a = randn(ComplexF64,4000,4000)
b = randn(ComplexF64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(ComplexF64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 127.037 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 62.509 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 46.956 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 27.686 ms (50 allocations: 3.64 KiB)

On my Laptop, it offered about 2x speed-up, as the cost of getindex is halved.
On the other hand, I think @avx will not work for the merged version:

# dispatch has modified
a = randn(Float64,4000,4000)
b = randn(Float64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(Float64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 19.890 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 44.259 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 19.234 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 14.042 ms (50 allocations: 3.64 KiB)

@mcabbott
Copy link
Owner

Thanks, I see. And this persists even for small sizes. Agree that @avx doesn't like the tuples, but it doesn't like Vector{ComplexF64} either.

@mcabbott
Copy link
Owner

With the PR, you can write:

func3(a,b,x,y,z) = begin
    res = StructArray((x,y,z))
    ++(p,q) = map(+,p,q)
    @tullio (++) res[i] = (abs2(a[i,j]), abs2(b[i,j]), a[i,j]*conj(b[i,j]))  init=(0.0, 0.0, 0.0+0im)
end

@N5N3
Copy link
Author

N5N3 commented Aug 18, 2020

It seems that the redunction towards scalar(see thread_scalar()), might use init several times (depend on how many threads it used and whether KEEP is on at first), while thread_halves() will use init at most once.

Thus, a arbitrary user defined init might be dangerous for thread_scalar(), like init = 5 for reduction :+, at least a test like redfun(init,init) == init is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants