Defect: test 6 - allocate_as_barrier_proc (Failed) on Debian9/gcc-6.3/openmpi-3.1.0 #538

Closed
schloegl opened this issue May 19, 2018 · 10 comments
@schloegl

schloegl commented May 19, 2018

When trying to install OpenCoarrays 2.0.0 on Debian 9 using gcc 6.3 and openmpi 3.1.0, running make test failed with:

 5/49 Test  #5: allocate_as_barrier ..................   Passed    1.69 sec
      Start  6: allocate_as_barrier_proc
 6/49 Test  #6: allocate_as_barrier_proc .............***Failed  Required regular expression not found.Regex=[Test passed.
]  2.13 sec
      Start  7: get_array
 7/49 Test  #7: get_array ............................   Passed    0.95 sec
      Start  8: get_self
...
49/49 Test #49: test-installation-scripts.sh .........   Passed    0.46 sec

98% tests passed, 1 tests failed out of 49

Total Test time (real) =  44.68 sec

The following tests FAILED:
          6 - allocate_as_barrier_proc (Failed)
Errors while running CTest
Makefile:127: recipe for target 'test' failed
make: *** [test] Error 8

gcc/gfortran 6.3 from Debian 9 (stretch) was used for compiling OpenCoarrays as well as OpenMPI 3.1.0. gcc -v / gfortran -v produce this report:

$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18+deb9u1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) 

openmpi 3.1.0 was compiled from source with the following configuration flags:

       ./configure --prefix=/mnt/nfs/clustersw/$(lsb_release -is)/$(lsb_release -cs)/openmpi/${VER} \
                --enable-orterun-prefix-by-default  \
                --enable-mpi-cxx \
                --with-pmi \
                --with-pmix \
                --with-sge \
                --with-slurm \
                --with-pmi-libdir=/usr/lib/x86_64-linux-gnu  \

OpenCoarrays 2.0.0 was built with:

        mkdir opencoarrays-build
        cd opencoarrays-build
        export FC=/mnt/nfs/clustersw/Debian/stretch/openmpi/3.1.0/bin/mpifort
        export CC=/mnt/nfs/clustersw/Debian/stretch/openmpi/3.1.0/bin/mpicc
        cmake ${BUILDDIR}/OpenCoarrays-${VER} \
          -DCMAKE_INSTALL_PREFIX=/mnt/nfs/clustersw/$(lsb_release -is)/$(lsb_release -cs)/${PKG}/${VER}
        make
        make test 

This is where the failed tests (as described above) were observed.

Defect/Bug Report

  • OpenCoarrays Version: 2.0.0

  • Fortran Compiler: Debian, 6.3.0

  • C compiler used for building lib: 6.3.0

  • Installation method: cmake (see above)

  • Output of uname -a: Linux bea81 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux

  • MPI library being used: openmpi 3.1.0

  • Machine architecture and number of physical cores: x86_64, 24

  • Version of CMake: 3.7.2, preinstalled

Observed Behavior

98% tests passed, 1 tests failed out of 49

Total Test time (real) =  44.68 sec

The following tests FAILED:
          6 - allocate_as_barrier_proc (Failed)

Expected Behavior

   100% tests passed

Steps to Reproduce

Install openmpi 3.1.0 on Debian 9, set the environment variables CC and FC to mpicc and mpifort, and compile OpenCoarrays 2.0.0 accordingly. Running make test then produces the failed test.
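
For context on the failure message: CTest prints "Required regular expression not found" when a test declares a pass pattern (the PASS_REGULAR_EXPRESSION test property) and the program output does not match it. The sketch below imitates that check in shell. The function name `output_reports_pass` and the sample strings are illustrative assumptions, not part of OpenCoarrays or CTest, and the match is done on the literal text rather than as a regular expression.

```shell
# Succeeds only if the text on stdin contains the literal string
# "Test passed.", the pattern shown in the CTest failure message.
output_reports_pass() {
  grep -q -F 'Test passed.'
}

# Illustrative inputs (not real test output):
printf 'Test passed.\n' | output_reports_pass && echo "would pass"
printf 'some error text\n' | output_reports_pass || echo "would fail: pattern not found"
```

A test that crashes or hangs on some images before printing its final message would fail this kind of check even if nothing else is reported.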

@schloegl
Author

The above report is based on release 2.0.0 of OpenCoarrays.

The problem seems to be fixed in the repository (tested on commit 4d47cf2). I cannot say which commit fixed it. It would be good if an official release containing the fix became available soon.

@rouson
Member

rouson commented May 23, 2018

@schloegl Thanks for confirming that the fix is in the repository. That commit has been merged into the master branch and will therefore appear in the next release. Although we don't have an exact release schedule (or really even an inexact one!), we have several good reasons to release soon so the wait shouldn't be too long. There is a weekly OpenCoarrays videoconference that happens on Fridays so possibly we can get the next release out at the end of this week during or after the videoconference.

@zbeekman
Collaborator

zbeekman commented May 24, 2018

This may be a duplicate of #415. I still see non-deterministic test failures of allocate_as_barrier_proc every once in a while. I have no idea what the source is, so it's quite possible that you just hit this bug once and then it went away. AFAIK, there have not been any/many commits to the library itself since 2.0.0.

EDIT: I lied, there are library changes so it IS possible that this bug (and #415) are now resolved. (Library code changes since 2.0.0: 2.0.0...master#diff-775a104746a19927fb80aad9b039981f)

@schloegl
Author

schloegl commented May 28, 2018

Seeing there is a new release 2.1.0, I've been trying to compile and test OpenCoarrays again. Because I had difficulties reproducing the earlier results, I also ran the tests again on OpenCoarrays 2.0.0 (commit 4d47cf2). Moreover, I extended the tests to gcc/gfortran 7.3.0 and 8.1.0, and openmpi-3.1.0 was recompiled with the corresponding compiler as well. The log files are attached:

log-200.txt
log-210.txt

The log files show these results:

OpenCoarrays: 2.0.0 (commit 4d47cf2)

gcc/gfortran 6.3.0:

6 - allocate_as_barrier_proc (Failed)

gcc/gfortran 7.3.0:

100% tests passed, 0 tests failed out of 67

gcc/gfortran 8.1.0:

compilation failed

OpenCoarrays: 2.1.0 (commit 8737658)

gcc/gfortran 6.3.0:

      6 - allocate_as_barrier_proc (Failed)
     51 - ISO_Fortran_binding_tests (Failed)

gcc/gfortran 7.3.0:

     66 - ISO_Fortran_binding_tests (Failed)

gcc/gfortran 8.1.0:

compilation failed

Summary

  • The ISO_Fortran_binding_tests test fails with OpenCoarrays 2.1.0.
  • The allocate_as_barrier_proc test fails with gcc/gfortran 6.3.0 but not with 7.3.0.
  • Only gcc 7.3.0 with OpenCoarrays 2.0.0 appears to have no failed tests.
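
A comparison like this can be tabulated from saved CTest logs. The following is a minimal sketch, assuming the logs contain failure lines in the format quoted in this report (`6 - allocate_as_barrier_proc (Failed)`). The function name `failed_tests` is hypothetical and the sample input is copied from the results listed here.

```shell
# Print the names of failed tests found in CTest output on stdin.
# Matches lines of the form "<number> - <name> (Failed)".
failed_tests() {
  sed -n 's/^[[:space:]]*[0-9][0-9]* - \(.*\) (Failed)[[:space:]]*$/\1/p'
}

# Sample input; the last line is a summary line and is ignored.
printf '%s\n' \
  '      6 - allocate_as_barrier_proc (Failed)' \
  '     51 - ISO_Fortran_binding_tests (Failed)' \
  '100% tests passed, 0 tests failed out of 67' | failed_tests
```

With the attached logs this could be run as `failed_tests < log-210.txt`, assuming that file holds the raw CTest output.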

@schloegl schloegl reopened this May 28, 2018
@zbeekman zbeekman changed the title from "test 6 - allocate_as_barrier_proc (Failed) on Debian9/gcc-6.3/openmpi-3.1.0" to "Defect: test 6 - allocate_as_barrier_proc (Failed) on Debian9/gcc-6.3/openmpi-3.1.0" Jul 1, 2018
@zbeekman zbeekman self-assigned this Jul 1, 2018
@zbeekman
Collaborator

zbeekman commented Jul 1, 2018

I don't understand why ISO_Fortran_binding is failing for you if you're using GCC as the C compiler with 2.1.0. The other issues I believe I understand: allocate_as_barrier_proc is fixed in GFortran >= 7.x, and the issue with 8.1 appears to be due to a bug in the GFortran 8.1 release, which requires this patch to fix.

allocate_as_barrier_proc will likely stay broken on the 6.x branch of GFortran, but I will turn off the test there since it's fixed on master. After PR #557 is merged, can you try again (and this time run CTest with the --output-on-failure flag)?
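
The version dependence described here can be made explicit when interpreting results. The sketch below is an illustration, not the check OpenCoarrays itself uses: it extracts the major version from a string such as "6.3.0" and reports whether allocate_as_barrier_proc is expected to pass (GFortran >= 7). Both function names are hypothetical.

```shell
# Extract the leading major version number from a version string like "6.3.0".
gfortran_major() {
  printf '%s\n' "${1%%.*}"
}

# Return success if allocate_as_barrier_proc is expected to pass,
# i.e. the GFortran major version is 7 or newer.
barrier_proc_expected_to_pass() {
  [ "$(gfortran_major "$1")" -ge 7 ]
}

for v in 6.3.0 7.3.0; do
  if barrier_proc_expected_to_pass "$v"; then
    echo "gfortran $v: allocate_as_barrier_proc expected to pass"
  else
    echo "gfortran $v: allocate_as_barrier_proc expected to fail"
  fi
done
```

In practice the argument could be taken from the compiler, for example `"$(gfortran -dumpversion)"`.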

@schloegl
Author

schloegl commented Jul 3, 2018

I compiled from commit 8ef32a8 with gcc/7.3, openmpi/3.1.0, and the "allocate_as_barrier_proc" test failure did not appear. The test "#67: ISO_Fortran_binding_tests" still fails; attached is the error log produced with
CTEST_OUTPUT_ON_FAILURE=1 ctest

ctest.txt
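
For reference, setting CTEST_OUTPUT_ON_FAILURE=1 has the same effect as passing `--output-on-failure` on the command line, and `-R` restricts the run to tests whose names match a regular expression. A hedged sketch, assuming it is run from the OpenCoarrays build directory; the helper `tests_to_regex` and the output file name `ctest-output.txt` are hypothetical.

```shell
# Join test names into an anchored alternation usable with `ctest -R`.
# The function body runs in a subshell so the IFS change stays local.
tests_to_regex() (
  IFS='|'
  printf '^(%s)$\n' "$*"
)

regex=$(tests_to_regex allocate_as_barrier_proc ISO_Fortran_binding_tests)
echo "$regex"

# Only attempt the run where ctest and a configured build tree are present.
if command -v ctest >/dev/null 2>&1 && [ -f CTestTestfile.cmake ]; then
  ctest --output-on-failure -R "$regex" | tee ctest-output.txt
fi
```

Anchoring the pattern matters here: an unanchored `-R allocate_as_barrier` would also select allocate_as_barrier_proc.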

@zbeekman
Collaborator

zbeekman commented Jul 4, 2018 via email

@schloegl
Author

schloegl commented Jul 4, 2018

I've been using the official release of gcc 7.3.0 (Jan 25, 2018, https://gcc.gnu.org/gcc-7/ ), built from source.

When using gcc 6.3, I'm using the version from debian/stretch ( https://tracker.debian.org/pkg/gcc-6 ); currently this is "gcc-Version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)".

allocate_as_barrier_proc seems to fail only with gcc 6.3 (from debian/stable) but seems fine when compiling with gcc 7.3.

Attached are the log files from cmake and ctest for gcc-7, as requested.

cmake.log
ctest.log

Since "allocate_as_barrier_proc will likely stay broken on the 6.x", perhaps this issue should be closed as the title refers to gcc-6.3, and perhaps a separate issue should be opened for the test "66 - ISO_Fortran_binding_tests (Failed)".

@zbeekman
Collaborator

zbeekman commented Jul 5, 2018

> allocate_as_barrier_proc seems to fail only with gcc 6.3 (from debian/stable) but seems fine when compiling with gcc 7.3.

Great! As I said, I don't think we'll get around to fixing this on GCC 6, and as soon as we get some important GCC/GFortran 8 patches included in important GCC/GFortran *nix packages, we'll probably deprecate support for GCC 6.x and earlier.

> Since "allocate_as_barrier_proc will likely stay broken on the 6.x", perhaps this issue should be closed as the title refers to gcc-6.3, and perhaps a separate issue should be opened for the test "66 - ISO_Fortran_binding_tests (Failed)".

We are aware of the problem with ISO_Fortran_binding and have a fix. Unless you object, I'm going to close this issue because ISO_Fortran_binding is tracked elsewhere (#554 and others).

@zbeekman
Collaborator

zbeekman commented Jul 5, 2018

If you disagree with closing this, or encounter a new issue, feel free to reopen or post a new issue.

@zbeekman zbeekman closed this as completed Jul 5, 2018