The initialization for user defined reduction function is always zero(T) #24

N5N3 · 2020-08-17T01:54:56Z

Lead to

a = fill(100,4,4)
@tullio avx = false (&) b[i] := sin(a[i,j])+2 >= 0 

4-element Array{Bool,1}:
 0 # should be 1
 0 # should be 1
 0 # should be 1
 0 # should be 1

Also some time user want keep the initial value for defined reduction function like += and *=
Maybe a synthesis like below is fine.

@tullio (fun_for_reduction,fun_for_initialization,keep) b[i] = sin(a[i,j])+2 >= 0

mcabbott · 2020-08-17T05:39:29Z

There should indeed be a way to set this. I wasn't sure whether to make it a function like zero, or a value like init=0.0, the pattern from reduce etc. Writing @tullio (&,false) b[i] := ... isn't so ugly either.

Another possibility is that there ought to be a function zero_under(+, Float64) == 0.0 but I don't know of one.

It could also catch a few more functions from Base, are there other candidates besides & and |?

#24

N5N3 · 2020-08-17T06:24:59Z

as far as i know, the reducedim.jl in Base only has:

for (Op, initfun) in ((:(typeof(add_sum)), :zero), (:(typeof(mul_prod)), :one))
    @eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && fill!(a, $(initfun)(T)); a)
end

for Op in (:(typeof(max)), :(typeof(min)))
    @eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && copyfirst!(a, src); a)
end

for (Op, initval) in ((:(typeof(&)), true), (:(typeof(|)), false))
    @eval initarray!(a::AbstractArray, ::$(Op), init::Bool, src::AbstractArray) = (init && fill!(a, $initval); a)
end

I opened this issue because I tried use StructArrays.jl and Tullio.jl together to merge three "@tullio"s into one, like:

        Result = StructArray((X1_sum, X2_sum, F_sum))
        @tullio Result[i] += (
            abs2(x1_fft[i, j]),
            abs2(x2_fft[i, j]),
            x1_fft[i, j] * conj(x2_fft[i, j]),
        )

instead of

        @tullio X1_sum[i] += abs2(x1_fft[i,j])
        @tullio X2_sum[i] += abs2(x2_fft[i,j])
        @tullio F_sum[i] += x1_fft[i,j] * conj(x2_fft[i,j])

for some acceleration, at the cost of "Base pollution" with dispatched Base.:+ and Base.zero.
I tried use accumu_fun(a,b) instead of +(a,b), but i find the dispatched zero(T) is unavoided, and there seems no way to turn on the keep.

mcabbott · 2020-08-17T07:12:14Z

I see, and after defining Base.zero(T::Type{<:Tuple}) = map(zero, T.parameters) (why isn't that in Base?) and similar +, do you see acceleration from doing these loops together?

To be clear, there is no feature to control this at the moment, neither initialisation nor keeping values (except for += and *=). It's something I haven't got around to building.

N5N3 · 2020-08-17T07:28:23Z

module test
using Tullio
using LoopVectorization
using StructArrays
# dispatch the Base.:+ and Base.zero
const ST = Tuple{T,T,Complex{T}} where {T}
Base.:+(x::ST, y::ST) = (x[1] + y[1], x[2] + y[2], x[3] + y[3])
Base.zero(::Type{Tuple{T,T,Complex{T}}}) where {T} =
    (zero(T), zero(T), zero(Complex{T}))
# 3 @tullio
func1(a,b,x,y,z) = begin
    @tullio x[i] = abs2(a[i,j])
    @tullio y[i] = abs2(b[i,j])
    @tullio z[i] = a[i,j] * conj(b[i,j])
end
# merged
func2(a,b,x,y,z) = begin
    result = StructArray((x,y,z))
    @tullio result[i] = (abs2(a[i,j]),abs2(b[i,j]),a[i,j]*conj(b[i,j]))
end
end
a = randn(ComplexF64,4000,4000)
b = randn(ComplexF64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(ComplexF64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 127.037 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 62.509 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 46.956 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 27.686 ms (50 allocations: 3.64 KiB)

On my Laptop, it offered about 2x speed-up, as the cost of getindex is halved.
On the other hand, I think @avx will not work for the merged version:

# dispatch has modified
a = randn(Float64,4000,4000)
b = randn(Float64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(Float64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 19.890 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 44.259 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 19.234 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 14.042 ms (50 allocations: 3.64 KiB)

mcabbott · 2020-08-17T07:41:46Z

Thanks, I see. And this persists even for small sizes. Agree that @avx doesn't like the tuples, but it doesn't like Vector{ComplexF64} either.

mcabbott · 2020-08-17T17:47:55Z

With the PR, you can write:

func3(a,b,x,y,z) = begin
    res = StructArray((x,y,z))
    ++(p,q) = map(+,p,q)
    @tullio (++) res[i] = (abs2(a[i,j]), abs2(b[i,j]), a[i,j]*conj(b[i,j]))  init=(0.0, 0.0, 0.0+0im)
end

N5N3 · 2020-08-18T09:40:22Z

It seems that the redunction towards scalar(see thread_scalar()), might use init several times (depend on how many threads it used and whether KEEP is on at first), while thread_halves() will use init at most once.

Thus, a arbitrary user defined init might be dangerous for thread_scalar(), like init = 5 for reduction :+, at least a test like redfun(init,init) == init is needed.

mcabbott pushed a commit that referenced this issue Aug 17, 2020

init for redfun and & or

396b3b6

#24

mcabbott mentioned this issue Aug 17, 2020

Allow init=0.0, and updating with any reduction #25

Merged

mcabbott closed this as completed in #25 Aug 23, 2020

mcabbott mentioned this issue Aug 24, 2020

Scalar reductions #29

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The initialization for user defined reduction function is always zero(T) #24

The initialization for user defined reduction function is always zero(T) #24

N5N3 commented Aug 17, 2020

mcabbott commented Aug 17, 2020

N5N3 commented Aug 17, 2020 •

edited

mcabbott commented Aug 17, 2020

N5N3 commented Aug 17, 2020 •

edited

mcabbott commented Aug 17, 2020

mcabbott commented Aug 17, 2020

N5N3 commented Aug 18, 2020 •

edited

The initialization for user defined reduction function is always zero(T) #24

The initialization for user defined reduction function is always zero(T) #24

Comments

N5N3 commented Aug 17, 2020

mcabbott commented Aug 17, 2020

N5N3 commented Aug 17, 2020 • edited

mcabbott commented Aug 17, 2020

N5N3 commented Aug 17, 2020 • edited

mcabbott commented Aug 17, 2020

mcabbott commented Aug 17, 2020

N5N3 commented Aug 18, 2020 • edited

N5N3 commented Aug 17, 2020 •

edited

N5N3 commented Aug 17, 2020 •

edited

N5N3 commented Aug 18, 2020 •

edited