New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The initialization for user defined reduction function is always zero(T) #24
Comments
There should indeed be a way to set this. I wasn't sure whether to make it a function like Another possibility is that there ought to be a function It could also catch a few more functions from Base, are there other candidates besides |
as far as i know, the reducedim.jl in Base only has: for (Op, initfun) in ((:(typeof(add_sum)), :zero), (:(typeof(mul_prod)), :one))
@eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && fill!(a, $(initfun)(T)); a)
end
for Op in (:(typeof(max)), :(typeof(min)))
@eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && copyfirst!(a, src); a)
end
for (Op, initval) in ((:(typeof(&)), true), (:(typeof(|)), false))
@eval initarray!(a::AbstractArray, ::$(Op), init::Bool, src::AbstractArray) = (init && fill!(a, $initval); a)
end I opened this issue because I tried use StructArrays.jl and Tullio.jl together to merge three "@tullio"s into one, like: Result = StructArray((X1_sum, X2_sum, F_sum))
@tullio Result[i] += (
abs2(x1_fft[i, j]),
abs2(x2_fft[i, j]),
x1_fft[i, j] * conj(x2_fft[i, j]),
) instead of @tullio X1_sum[i] += abs2(x1_fft[i,j])
@tullio X2_sum[i] += abs2(x2_fft[i,j])
@tullio F_sum[i] += x1_fft[i,j] * conj(x2_fft[i,j]) for some acceleration, at the cost of "Base pollution" with dispatched |
I see, and after defining To be clear, there is no feature to control this at the moment, neither initialisation nor keeping values (except for |
module test
using Tullio
using LoopVectorization
using StructArrays
# dispatch the Base.:+ and Base.zero
const ST = Tuple{T,T,Complex{T}} where {T}
Base.:+(x::ST, y::ST) = (x[1] + y[1], x[2] + y[2], x[3] + y[3])
Base.zero(::Type{Tuple{T,T,Complex{T}}}) where {T} =
(zero(T), zero(T), zero(Complex{T}))
# 3 @tullio
func1(a,b,x,y,z) = begin
@tullio x[i] = abs2(a[i,j])
@tullio y[i] = abs2(b[i,j])
@tullio z[i] = a[i,j] * conj(b[i,j])
end
# merged
func2(a,b,x,y,z) = begin
result = StructArray((x,y,z))
@tullio result[i] = (abs2(a[i,j]),abs2(b[i,j]),a[i,j]*conj(b[i,j]))
end
end
a = randn(ComplexF64,4000,4000)
b = randn(ComplexF64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(ComplexF64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 127.037 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 62.509 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 46.956 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 27.686 ms (50 allocations: 3.64 KiB) On my Laptop, it offered about 2x speed-up, as the cost of # dispatch has modified
a = randn(Float64,4000,4000)
b = randn(Float64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(Float64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 19.890 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 44.259 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 19.234 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 14.042 ms (50 allocations: 3.64 KiB) |
Thanks, I see. And this persists even for small sizes. Agree that |
With the PR, you can write: func3(a,b,x,y,z) = begin
res = StructArray((x,y,z))
++(p,q) = map(+,p,q)
@tullio (++) res[i] = (abs2(a[i,j]), abs2(b[i,j]), a[i,j]*conj(b[i,j])) init=(0.0, 0.0, 0.0+0im)
end |
It seems that the redunction towards scalar(see Thus, a arbitrary user defined |
Lead to
Also some time user want keep the initial value for defined reduction function like += and *=
Maybe a synthesis like below is fine.
The text was updated successfully, but these errors were encountered: