Broadcasting and scalar indexing in the case of CuArrays #42

Closed
corentin-dev opened this issue Feb 1, 2022 · 12 comments · Fixed by #43 or #45

Comments

@corentin-dev
Contributor

Hi,
Good progress has been made in terms of GPU support. However, some problems still arise when using PencilArrays. For example, the following code fails:

## MPI
using MPI

using PencilArrays

MPI.Init()
comm = MPI.COMM_WORLD
rank_ = MPI.Comm_rank(comm)
size_ = MPI.Comm_size(comm)

topo_dims2 = [0]
MPI.Dims_create!(size_,topo_dims2)
topo2 = MPITopology(comm, Tuple(topo_dims2))
pen2_x = Pencil(topo2, (128,128))

xp2p = PencilArray(pen2_x, Array{Complex{Float32}}(undef, size_local(pen2_x)));

rx,ry = axes(global_view(xp2p));
nxl,nyl = size_local(xp2p);
xxp = Array(reshape(LinRange(0,1,128)[rx],nxl,1));
yyp = Array(reshape(LinRange(0,1,128)[ry],1,nyl));
@. xp2p = sin(2π * xxp)*sin(2π * yyp);

## CUDA
using CUDA

# create pencil
pen2g_x = Pencil(CuArray, topo2, (128,128))
# create array
xp2g = PencilArray{Complex{Float32}}(undef, pen2g_x);

# restriction
rx,ry = axes(global_view(xp2g));
nxl,nyl = size_local(xp2g);
xxg = CuArray(reshape(LinRange(0,1,128)[rx],nxl,1));
yyg = CuArray(reshape(LinRange(0,1,128)[ry],1,nyl));
@. xp2g = sin(2π * xxg)*sin(2π * yyg);

The error mentions materialize!:

 [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArrays ~/.julia/packages/GPUArrays/umZob/src/host/indexing.jl:53
  [3] getindex(::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, ::Int64, ::Int64)
    @ GPUArrays ~/.julia/packages/GPUArrays/umZob/src/host/indexing.jl:86
  [4] getindex
    @ ~/.julia/packages/GPUArrays/umZob/src/host/indexing.jl:107 [inlined]
  [5] _broadcast_getindex
    @ ./broadcast.jl:636 [inlined]
  [6] _getindex
    @ ./broadcast.jl:667 [inlined]
  [7] _getindex
    @ ./broadcast.jl:666 [inlined]
  [8] _broadcast_getindex
    @ ./broadcast.jl:642 [inlined]
  [9] _getindex
    @ ./broadcast.jl:667 [inlined]
 [10] _broadcast_getindex
    @ ./broadcast.jl:642 [inlined]
 [11] _getindex
    @ ./broadcast.jl:666 [inlined]
 [12] _broadcast_getindex
    @ ./broadcast.jl:642 [inlined]
 [13] getindex
    @ ./broadcast.jl:597 [inlined]
 [14] macro expansion
    @ ./broadcast.jl:961 [inlined]
 [15] macro expansion
    @ ./simdloop.jl:77 [inlined]
 [16] copyto!
    @ ./broadcast.jl:960 [inlined]
 [17] copyto!
    @ ./broadcast.jl:913 [inlined]
 [18] materialize!
    @ ./broadcast.jl:871 [inlined]
 [19] materialize!(dest::PencilArray{ComplexF32, 2, CuArray{ComplexF32, 2, CUDA.Mem.DeviceBuffer}, 2, 0, Pencil{2, 1, NoPermutation, CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}}}, bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(sin), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(*), Tuple{Int64, Irrational{:π}}}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(sin), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(*), Tuple{Int64, Irrational{:π}}}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}}}}}})
 [20] @ Base.Broadcast ./broadcast.jl:868

I am not sure why it fails. The same kind of line works with CuArray instead of PencilArray.
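For comparison, here is a minimal sketch of the equivalent broadcast with plain CuArrays, which does not hit the scalar-indexing error (assuming CUDA.jl and a working GPU; this is illustrative, not from the original report):

```julia
using CUDA  # requires a functional NVIDIA GPU

# Same shapes as in the failing example, but with plain CuArrays:
xg = CuArray(reshape(LinRange(0, 1, 128), 128, 1))
yg = CuArray(reshape(LinRange(0, 1, 128), 1, 128))
out = CUDA.zeros(Complex{Float32}, 128, 128)
# Broadcasting fuses into a single GPU kernel; no scalar indexing occurs.
@. out = sin(2π * xg) * sin(2π * yg)
```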

Thanks a lot!
Corentin

@jipolanco
Owner

Hi, indeed it seems to be a problem with broadcasting.

I've made some progress trying to fix this but it's quite tricky, especially when combined with dimension permutations! I'll try to push a fix soon.

@corentin-dev
Contributor Author

Thanks a lot!

@corentin-dev
Contributor Author

corentin-dev commented Feb 3, 2022

I don't know if you are aware, but this commit changed the order for broadcasting:

# size nx ny nz
plan = PencilFFTPlan(pen_array, Transforms.FFT())
x = LinRange(0,2*pi,nx)
y = LinRange(0,2*pi,ny)
z = LinRange(0,2*pi,nz)

# temporary arrays
pen_hat = allocate_output(plan)

# axes
pen_hat_glob = global_view(pen_hat)

# ranges
rx_hat, ry_hat, rz_hat = axes(pen_hat_glob)
nxl_hat, nyl_hat, nzl_hat = size_local(pen_hat)

# range compatible to all types of Array
rrx_hat = rx_hat[begin]:rx_hat[end]
rry_hat = ry_hat[begin]:ry_hat[end]
rrz_hat = rz_hat[begin]:rz_hat[end]

# reshape hat versions of positions
# what used to work
x_hat = reshape(x[rrx_hat],nxl_hat,1,1)
y_hat = reshape(y[rry_hat],1,nyl_hat,1)
z_hat = reshape(z[rrz_hat],1,1,nzl_hat)
@. pen_hat = x_hat * pen_hat

# what works now
x_hat = reshape(x[rrx_hat],1,1,nxl_hat)
y_hat = reshape(y[rry_hat],1,nyl_hat,1)
z_hat = reshape(z[rrz_hat],nzl_hat,1,1)
@. pen_hat = x_hat * pen_hat

The workaround works, but maybe you have some suggestions on how to do it better?
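As a plain-Julia aside, the numpy-style singleton-dimension broadcasting used above can be sketched without MPI, PencilArrays, or a GPU (a minimal illustration, not part of the original report):

```julia
# Arrays with singleton dimensions of shapes (nx,1,1), (1,ny,1) and
# (1,1,nz) broadcast together to a full (nx,ny,nz) array.
nx, ny, nz = 4, 3, 2
x = reshape(collect(1.0:nx), nx, 1, 1)
y = reshape(collect(1.0:ny), 1, ny, 1)
z = reshape(collect(1.0:nz), 1, 1, nz)
a = @. x * y * z
@assert size(a) == (nx, ny, nz)
@assert a[2, 3, 1] == 2 * 3 * 1   # element (i,j,k) is x[i]*y[j]*z[k]
```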

@jipolanco
Owner

I guess on the last line of your example you meant something like

@. pen_hat = x_hat * y_hat * z_hat * pen_hat

which uses the three *_hat arrays?

I'm not that surprised by this change of behaviour, but I agree it's not ideal, and the original version of your example should be the correct one. Sorry about that; I'm not that familiar with numpy-style broadcasting using singleton dimensions, but I see that it can be very convenient and especially useful on GPUs. I'm reopening this issue and I'll try to have this fixed.

Note that the order changes only when working with the output of transforms in PencilFFTs. This is done to make sure that 1D FFTs are always done contiguously in memory. This can be disabled by passing permute_dims = Val(false) to PencilFFTPlan (docs are here). I'm not sure how this will impact performance, or whether it will actually work on CuArrays.

Another workaround (which will stop working if this issue is fixed) is to do something like:

x_hat = reshape(x[rrx_hat], permutation(pen_hat) * (nxl_hat,1,1))
y_hat = reshape(y[rry_hat], permutation(pen_hat) * (1,nyl_hat,1))
z_hat = reshape(z[rrz_hat], permutation(pen_hat) * (1,1,nzl_hat))

which will automatically permute the array dimensions according to the order of indices in pen_hat.
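Conceptually, applying a permutation to a size tuple can be pictured with this hypothetical sketch (the actual permutation is whatever permutation(pen_hat) returns; (3, 2, 1) is just an assumed example for a 3D transform):

```julia
# Reorder a dimensions tuple according to a permutation: the i-th
# permuted entry takes the value at position perm[i] of the original.
perm = (3, 2, 1)
dims = (8, 1, 1)                          # e.g. (nxl_hat, 1, 1)
permuted = ntuple(i -> dims[perm[i]], 3)
@assert permuted == (1, 1, 8)
```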


As a side note, instead of using global_view to get the range of local indices, it would be more convenient to use range_local, which directly returns basic ranges of the form (i1:i2, j1:j2, k1:k2). I'll try to mention that somewhere in the docs.

@jipolanco jipolanco reopened this Feb 4, 2022
@jipolanco
Owner

Hi Corentin, I've been playing around with an alternative way of working with grids which may "solve" this issue. The idea is to add a localgrid function that lets you easily work with the local part of the grid (associated with the local MPI process). I've been wanting to do this for a while, and now it seemed like a good opportunity :)

The details are in #45. Basically, in your example, this would allow you to do:

grid = localgrid(pen_hat, (x, y, z))
@. pen_hat = (grid.x + grid.y + grid.z) * pen_hat

Note that one doesn't need to care about possible index permutations and things like that. Everything is taken care of in the grid object. I still want to finish some things before merging #45, but of course let me know if you have any comments or ideas!

@corentin-dev
Contributor Author

This would be really (really!) convenient!
Broadcasting is really useful for writing generic code without explicit indexing.
I guess grid.x will have the same shape as x?
Will it work with names like grid = localgrid(pen_hat, (ξx, ξy, ξz)) and grid.ξx, etc.? It should work with the second notation, grid[1], in any case.

@corentin-dev
Contributor Author

Just to be sure, will localgrid return an array of type typeof(x) (or interpretable as such)?

@jipolanco
Owner

I guess grid.x will have the same shape as x?

Yes. The difference is that during broadcasting it would be reshaped similarly to what you did in your example.

Will it work with names like this grid = localgrid(pen_hat, (ξx,ξy,ξz)) and grid.ξx, etc...?

No, in that case it would still be grid.x, etc... It's the same convention used by StaticArrays (e.g. SVector(2, 3, 1).x == 2). I guess changing the name would be possible using macros, but for now I prefer to keep things simple...

Just to be sure, will localgrid return an array of type typeof(x) (or interpretable as such)?

Not sure if I understand the question, but localgrid returns a LocalRectilinearGrid (also defined in #45), which can be indexed as if it was an N-dimensional array. For example grid[2, 3, 5] returns a coordinate (x, y, z).
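The indexing behaviour can be pictured with a minimal plain-Julia sketch (names and layout here are illustrative, not the actual LocalRectilinearGrid implementation in #45):

```julia
# Each dimension stores its own 1D coordinate vector; indexing the grid
# at (i, j, k) combines one coordinate per dimension into a point.
xs = range(0, 1; length = 8)
ys = range(0, 2; length = 8)
zs = range(0, 3; length = 8)
grid_point(i, j, k) = (xs[i], ys[j], zs[k])
@assert grid_point(1, 1, 1) == (0.0, 0.0, 0.0)
```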

@corentin-dev
Contributor Author

corentin-dev commented Feb 7, 2022

Just to be sure, will localgrid return an array of type typeof(x) (or interpretable as such)?

Not sure if I understand the question, but localgrid returns a LocalRectilinearGrid (also defined in #45), which can be indexed as if it was an N-dimensional array. For example grid[2, 3, 5] returns a coordinate (x, y, z).

My concern is that it stays a CuArray. I guess I'll just try it and see what it gives me!

@jipolanco
Owner

Right. It should stay a CuArray, since no new arrays are created. But let me know if that's not the case!

@jipolanco jipolanco linked a pull request Feb 8, 2022 that will close this issue
@jipolanco
Owner

I just merged #45, which will be included in v0.15 (being registered right now). So I'm closing this for now, but let me know if there are still issues!

@corentin-dev
Contributor Author

Everything seems to work for broadcasting. I'll open another issue with CuArrays due to CuPtr and Isend.
