You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Output of uname -a: Linux 4.13.0-36-generic #40-Ubuntu SMP Fri Feb 16 20:07:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
MPI library being used: mpich 3.2.1
Machine architecture and number of physical cores: 12
Version of CMake: 3.10.2
Observed Behavior
I was playing around trying to reproduce the behavior of a MPI_Gatherv, and the following example was failing in strange ways (see the comments), as if the data was shifted.
! caf -o gather gather.f90
! cafrun -np 2 ./gather
! this example mimics a mpi_gatherv with root=1, and variable size chunks
! gather into specified locations from all processes in a group
program gather
implicit none
type gvec ! a global vector
real, allocatable :: a(:)
end type
type(gvec) :: gc[*]
real, allocatable :: gv(:), tmp(:)
integer:: me, nimg, gsize, i, lo, hi
logical:: fail =.false., workaround =.false.
me = this_image()
nimg = num_images()
allocate(gc % a(2* me)) ! variable size datain container
gc % a = [(me * i, i=1, 2* me)] ! assignement
! img 1: [1, 2]
! img 2: [2, 4, 6, 8]
print*, me, 'gc % a=', gc % a, ' size=', size(gc % a)
! collect the global vector size by summing local sizes
gsize =size(gc % a)
call co_sum(gsize, result_image=1)
sync all
if (me == 1) thenif (gsize /=6) error stop1
allocate(gv(gsize)) ! allocate a global vector of size 6 on img 1
lo =1do i =1, nimg
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! WORKAROUND, uncomment to pass this test
if (workaround) thenif (allocated(tmp)) deallocate(tmp)
allocate(tmp(size(gc[i] % a))) ! ISSUE: remote access is done twice
end if
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! note: automatic reallocation of tmp
! bounds are correct, but data seems to be shifted by 1 (as if tmp was 0 based indexing)
tmp = gc[i] % a
! print*, size(tmp) ! <== uncomment to pass the test !! WHAT ?
! print*, tmp, lbound(tmp), ubound(tmp), size(tmp)
! img 1: [1, 2] lb=1 ub=2
! img 2: [2, 4, 6, 8] lb=1 ub=4
! print*, tmp(lbound(tmp)), tmp(ubound(tmp)) ! <== segfault
! print*, tmp(1:size(tmp)) ! correct data, left shifted ???
! img1: [1, 2]
! img1: [2, 4, 6, 8]
hi = lo +size(tmp) -1if (workaround) then
gv(lo:hi) = tmp
else
! gv(lo:hi) = tmp(0:size(tmp)-1) ! seems OC is 'left shifting' the data, hence the 0 lb
gv(lo:hi) = tmp
end if
lo = hi +1 ! start of next chunk
end doprint*, 'gv=', gv, ' sum=', sum(gv)
if (abs(sum(gv) -23.) > epsilon(0.)) fail =.true.end if
sync all
! CMake test output handler
call co_broadcast(fail, source_image=1)
if (fail) thenwrite(*, *) 'Test failed!'
error stop5elsewrite(*, *) 'Test passed.'end ifend program
Expected Behavior
gv= 1.0 2.0 2.0 4.0 6.0 8.0 sum= 23.0
Steps to Reproduce
caf -o gather gather.f90
cafrun -np 2 ./gather
gv= 2.0 0.0 4.0 6.0 8.0 0.0 sum= 20.0
Question
So I spent a few hours on the reallocation step implied by the tmp = gc[i] % a line, malloc calls are correct, bounds/stride are also correct, but the offset is never initialized when the realloc_required flag is true (which is the case here: tmp is never allocated explicitly).
So a possible patch could be something like mpi_caf.c
// around line 4650
if (extent_mismatch)
{
dst->dim[dst_cur_dim].lower_bound = 1;
dst->dim[dst_cur_dim]._ubound = delta;
dst->dim[dst_cur_dim]._stride = size;
if (realloc_required)
dst->offset = -1; // <== initialize offset to 0xffffffffffffffff
}
With this patch the test passes as expected. Is this patch correct ?
The text was updated successfully, but these errors were encountered:
@neok-m4700 That patch looks correct to me. If I'm mistaken, perhaps @vehre or @afanfa could weigh in. I've merged the PR with the patch so if/when either Andre or Alessandro confirm this is correct, let's close the issue.
Defect/Bug Report
OpenCoarrays Version:
2.0
Fortran Compiler:
gfortran 7.3.0
C compiler used for building lib:
gcc 7.3.0
Installation method:
manual
Output of
uname -a
:Linux 4.13.0-36-generic #40-Ubuntu SMP Fri Feb 16 20:07:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
MPI library being used:
mpich 3.2.1
Machine architecture and number of physical cores:
12
Version of CMake:
3.10.2
Observed Behavior
I was playing around trying to reproduce the behavior of a MPI_Gatherv, and the following example was failing in strange ways (see the comments), as if the data was shifted.
Expected Behavior
gv= 1.0 2.0 2.0 4.0 6.0 8.0 sum= 23.0
Steps to Reproduce
gv= 2.0 0.0 4.0 6.0 8.0 0.0 sum= 20.0
Question
So I spent a few hours on the reallocation step implied by the
tmp = gc[i] % a
line,malloc
calls are correct, bounds/stride are also correct, but the offset is never initialized when therealloc_required
flag is true (which is the case here: tmp is never allocated explicitly).So a possible patch could be something like
mpi_caf.c
With this patch the test passes as expected. Is this patch correct ?
The text was updated successfully, but these errors were encountered: