setindex!() performance (#10)
Are your blocks constant size? The use of views in …
No, and they're not even square in general. The blocks in our case represent the mapping from configuration derivative to velocity for a single joint, so the blocks can be of varying sizes and can be non-square for joints with a different number of configurations than velocities.
Are they like in your example …? If not, that is, if they are arbitrarily sized, I don't believe there is any way around the fact that … `view(A, Block(N,M))[band(1)] .= (1:N).^(-1)` … in this case, the code could be designed to do the lookups once. (And it could even be done on a GPU with no CPU memory access.) Are you sure that your code needs to call …? Also, look in BlockArrays.jl at …
No, that's just pulled from the readme. We don't know anything about the sizes of the blocks ahead of time, but we do know that the resulting matrix will have block lower and upper bandwidths of 0 (i.e. there is only a single, contiguous diagonal band of blocks); each block can be of varying size. The matrices we're dealing with are also small enough (less than 50x50) that sparsity isn't a huge concern.

My current solution is a struct which stores the parent as a dense matrix and then stores a vector of views into that parent. We don't have to do any fancy lookup to find the right block because there's just the one diagonal band of blocks, so I can just store the views in diagonal order and look them up linearly. The only two operations that I currently perform on the block-diagonal matrix are: …
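The view-caching approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual RigidBodyDynamics.jl code; the `BlockDiagonalCache` and `diagblock` names are made up for this sketch:

```julia
# Sketch: a dense parent matrix plus pre-constructed views of each diagonal
# block, stored at construction time so that block lookup never builds a
# new view. Blocks may be non-square (rowsizes[k] != colsizes[k]).
struct BlockDiagonalCache{T} <: AbstractMatrix{T}
    parent::Matrix{T}
    blocks::Vector{SubArray{T,2,Matrix{T},Tuple{UnitRange{Int},UnitRange{Int}},false}}
end

function BlockDiagonalCache{T}(rowsizes::AbstractVector{<:Integer},
                               colsizes::AbstractVector{<:Integer}) where T
    parent = zeros(T, sum(rowsizes), sum(colsizes))
    V = SubArray{T,2,Matrix{T},Tuple{UnitRange{Int},UnitRange{Int}},false}
    blocks = V[]
    i = j = 1
    for (m, n) in zip(rowsizes, colsizes)
        push!(blocks, view(parent, i:i+m-1, j:j+n-1))
        i += m
        j += n
    end
    BlockDiagonalCache{T}(parent, blocks)
end

Base.size(B::BlockDiagonalCache) = size(B.parent)
Base.getindex(B::BlockDiagonalCache, i::Int, j::Int) = B.parent[i, j]

# The only lookup is a linear index into the stored vector of views,
# since there is a single diagonal band of blocks.
diagblock(B::BlockDiagonalCache, k::Integer) = B.blocks[k]
```

Because the views alias the dense parent, writing through `diagblock(B, k)` mutates `B.parent` directly, and no view is constructed per access.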
It looks like …
There’s still a lookup, as you have to access the view, whose location is stored in memory. So I don’t think there’s anything about your setup that makes it inherently faster, though storing the views will indeed avoid allocation. I think the fastest option is to look up the starting index and stride and access the data directly, without creating any views. This is actually a really easy change.
Sorry, yes, I totally agree. My approach will still be slower than directly accessing a plain Matrix because of the cost of looking up the existing view. In my particular case that's OK, as I can amortize a single lookup over all of the indices which are being set within the given view. And yes, I think looking up the index and stride directly would be faster (and non-allocating) compared to constructing a new view each time.
Try this:

```julia
@inline function foosetindex!(A::BlockBandedMatrix{T}, v, i::Int, j::Int) where T
    @boundscheck checkbounds(A, i, j)
    bi = BlockBandedMatrices.global2blockindex(A.block_sizes, (i, j))
    ind = A.block_sizes.block_starts[bi.I...]
    st = A.block_sizes.block_strides[bi.I[2]]
    @inbounds V.data[ind + bi.α[1]-1 + (bi.α[2]-1)*st] = convert(T, v)::T
    return v
end
```

I get:

```julia
julia> @btime foosetindex!(m, 2, 5, 6)
  2.755 μs (4 allocations: 160 bytes)
2
```
That doesn't look quite right: you're accessing a non-existent variable `V` (the argument is named `A`). It also looks like there's something strange going on with `global2blockindex`:

```julia
julia> s = m.block_sizes;

julia> @btime BlockBandedMatrices.global2blockindex($s, $(5, 6))
  42.521 ns (2 allocations: 128 bytes)
```
Ah, I got it. There's a subtle bug that occurs in several places in BlockArrays.jl. In many of the functions, you do:

```julia
function foo(x)
    Expr(:meta, :inline)
    _do_stuff(x)
end
```

Changing that to an `@inline` annotation …
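The bug pattern can be reproduced in isolation. This is a hedged sketch; `buggy_inc` and `fixed_inc` are made-up names, not functions from BlockArrays.jl:

```julia
# Outside of a @generated function, `Expr(:meta, :inline)` is not a compiler
# annotation: it is an ordinary constructor call that builds (and allocates)
# an Expr object at runtime on every invocation, then discards it.
function buggy_inc(x)
    Expr(:meta, :inline)   # runtime Expr construction, no inlining effect
    x + 1
end

# The `@inline` macro is the supported way to request inlining.
@inline fixed_inc(x) = x + 1
```

Both functions return the same results; the difference is the pointless per-call `Expr` allocation (and the missing inlining hint) in the first version.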
Awesome! Though by "you" you mean … Can you make a PR?
Haha, sorry, didn't mean to accuse 😉. And yup, working on a PR now.
Hi! We're considering using the `BlockBandedMatrix` type in RigidBodyDynamics.jl to store a block-diagonal Jacobian matrix, but it looks like the current performance of `setindex!()` is a little too slow for our needs. More importantly, we're trying to write our in-place algorithms in a way that doesn't allocate at all, and the current implementation of `setindex!()` for `BlockBandedMatrix` allocates memory for a new `view()` at each call. For example:

…

as compared to

```julia
  1.604 ns (0 allocations: 0 bytes)
```

for a plain `Matrix{Float64}`.

Do you have any thoughts or plans on how to improve the performance? My current solution involves a different `AbstractMatrix` type which stores all of its block views at construction time, so looking up a block doesn't require allocating a new `view`. Could we do the same thing in `BlockBandedMatrix`?
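The allocation concern can be illustrated generically. This is a sketch with made-up helper names; whether the temporary view is actually heap-allocated depends on the Julia version and on escape analysis, which is why benchmarks like the ones above are the reliable check:

```julia
# Direct scalar assignment into a dense Matrix: no allocation.
function set_direct!(A::Matrix, v, i::Int, j::Int)
    @inbounds A[i, j] = v
    return v
end

# Routing the same write through a freshly constructed view, which is
# roughly what a view-based setindex! fallback does on every call.
function set_via_view!(A::Matrix, v, i::Int, j::Int)
    V = view(A, i:i, j:j)   # a new SubArray wrapper per call
    @inbounds V[1, 1] = v
    return v
end
```

Caching the views at construction time, as described above, moves the wrapper construction out of the hot loop entirely.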