vader compile issue #5814

Closed
jsquyres opened this issue Oct 1, 2018 · 8 comments

Comments

@jsquyres
Member

jsquyres commented Oct 1, 2018

As reported by @siegmargross in https://www.mail-archive.com/users@lists.open-mpi.org/msg32704.html, he's having a compile issue with the vader BTL.

@hjelmn can you have a look?

loki openmpi-master-201809290304-73075b8-Linux.x86_64.64_cc 151 head -7 config.log | tail -1
 $ ../openmpi-master-201809290304-73075b8/configure --prefix=/usr/local/openmpi-master_64_cc --libdir=/usr/local/openmpi-master_64_cc/lib64 --with-jdk-bindir=/usr/local/jdk-10.0.1/bin --with-jdk-headers=/usr/local/jdk-10.0.1/include JAVA_HOME=/usr/local/jdk-10.0.1 LDFLAGS=-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 CC=cc CXX=CC FC=f95 CFLAGS=-m64 -mt CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp --disable-mpi-fortran --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java --with-valgrind=/usr/local/valgrind --with-hwloc=internal --without-verbs --with-wrapper-cflags=-std=c11 -m64 -mt --with-wrapper-cxxflags=-m64 --with-wrapper-fcflags=-m64 --with-wrapper-ldflags=-mt --enable-debug


loki openmpi-master-201809290304-73075b8-Linux.x86_64.64_cc 151 tail -33 log.make.Linux.x86_64.64_cc
 CC       btl_vader_sendi.lo
 CC       btl_vader_get.lo
"../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_fifo.h", line 157: operand cannot have void type: op "="
"../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_fifo.h", line 157: assignment type mismatch:
       long "=" void
cc: acomp failed for ../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_sendi.c
Makefile:1929: recipe for target 'btl_vader_sendi.lo' failed
make[2]: *** [btl_vader_sendi.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
"../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_fifo.h", line 157: operand cannot have void type: op "="
"../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_fifo.h", line 157: assignment type mismatch:
       long "=" void
cc: acomp failed for ../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_send.c
"../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_fifo.h", line 157: operand cannot have void type: op "="
"../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_fifo.h", line 157: assignment type mismatch:
       long "=" void
Makefile:1929: recipe for target 'btl_vader_send.lo' failed
make[2]: *** [btl_vader_send.lo] Error 1
cc: acomp failed for ../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_module.c
Makefile:1929: recipe for target 'btl_vader_module.lo' failed
make[2]: *** [btl_vader_module.lo] Error 1
"../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_fifo.h", line 157: operand cannot have void type: op "="
"../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_fifo.h", line 157: assignment type mismatch:
       long "=" void
cc: acomp failed for ../../../../../openmpi-master-201809290304-73075b8/opal/mca/btl/vader/btl_vader_component.c
Makefile:1929: recipe for target 'btl_vader_component.lo' failed
make[2]: *** [btl_vader_component.lo] Error 1
make[2]: Leaving directory '/export2/src/openmpi-master/openmpi-master-201809290304-73075b8-Linux.x86_64.64_cc/opal/mca/btl/vader'
Makefile:2392: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/export2/src/openmpi-master/openmpi-master-201809290304-73075b8-Linux.x86_64.64_cc/opal'
Makefile:1910: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1
loki openmpi-master-201809290304-73075b8-Linux.x86_64.64_cc 152

Siegmar specifically stated that he tested master, but I'm guessing that this affects all of 2.1.x, 3.0.x, 3.1.x, and 4.0.x.

@hjelmn
Member

hjelmn commented Oct 1, 2018

Well, somehow my local master was very stale. The line is really:

    fifo->fifo_head = fifo->fifo_tail = VADER_FIFO_FREE;

This points to something being off with the definition of the atomic types with this platform/compiler combination.

@hjelmn
Member

hjelmn commented Oct 1, 2018

@siegmargross Can you post a gist with the config.log?

@hjelmn
Member

hjelmn commented Oct 1, 2018

Something is really off here. Might still be a compiler bug. The head and tail are of type opal_atomic_intptr_t. This is defined as either:

typedef volatile intptr_t opal_atomic_intptr_t;

or

typedef _Atomic intptr_t opal_atomic_intptr_t;

depending on whether C11 atomics are in use. Given what @siegmargross has provided above, it looks like C11 atomics are indeed in use. The config.log will hopefully confirm.
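
A minimal sketch of how a conditional typedef like this can be selected at compile time. The macro name `HYPOTHETICAL_USE_C11_ATOMICS` is an assumption for illustration only and is not the symbol Open MPI's configure actually defines; `store_and_load` is likewise a made-up helper.

```c
#include <stdint.h>

/* HYPOTHETICAL_USE_C11_ATOMICS is an illustrative macro, not the real
 * Open MPI configure symbol. __STDC_NO_ATOMICS__ is the standard C11
 * macro a compiler defines when it does not provide <stdatomic.h>. */
#if defined(HYPOTHETICAL_USE_C11_ATOMICS) && !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
typedef _Atomic intptr_t opal_atomic_intptr_t;
#else
typedef volatile intptr_t opal_atomic_intptr_t;
#endif

/* With either typedef, a plain assignment from an intptr_t should compile. */
static intptr_t store_and_load(intptr_t value)
{
    opal_atomic_intptr_t slot;

    slot = value;   /* atomic store in the C11 case, volatile store otherwise */
    return slot;    /* atomic load in the C11 case, volatile load otherwise */
}
```

Building the same file with and without `-DHYPOTHETICAL_USE_C11_ATOMICS` is one way to see whether a failure is specific to the `_Atomic` variant.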

So the line in question has types:

   _Atomic intptr_t  = _Atomic intptr_t = intptr_t

Which from my reading of ISO C11 is valid.
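
The reason the chained form is valid: in C11 the type of an assignment expression is the type the left operand would have after lvalue conversion, and lvalue conversion yields the non-atomic version of an atomic type. So the inner `y = z` should have plain type `intptr_t`, never `void`. A small sketch that checks this at compile time with `_Generic`; the function name is made up for illustration, and this assumes a compiler with working C11 atomics and `_Generic` support.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Returns 1 if the expression `y = z` has the plain (non-atomic) type
 * intptr_t, 0 otherwise. The controlling expression of _Generic is not
 * evaluated, so no store actually happens here. */
static int inner_assignment_is_plain_intptr(void)
{
    _Atomic intptr_t y = 0;
    intptr_t z = 0;

    return _Generic((y = z), intptr_t: 1, default: 0);
}
```

A compiler that reports `void` for the inner assignment, as in the error above, would be disagreeing with this rule.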

Might be worth trying to compile this small C program and see if it barfs:

#include <stdlib.h>
#include <stdatomic.h>

int main (int argc, char *argv[]) {
    _Atomic intptr_t test;
    intptr_t x = 0;

    test = x;

    return 0;
}

@jsquyres
Member Author

jsquyres commented Oct 1, 2018

@siegmargross Nathan posted a sample program (via editing his prior comment), so you didn't get the mail about it. Can you check #5814 (comment) and compile/run the sample program he proposed and see what happens?

@jsquyres
Member Author

jsquyres commented Oct 2, 2018

@siegmargross replied in https://www.mail-archive.com/users@lists.open-mpi.org/msg32706.html; filing here to keep all the information together.


Hi Jeff, hi Nathan,

the compilers (Sun C 5.15, Sun C 5.14, Sun C 5.13) don't like the code.

loki tmp 110 cc -V
cc: Studio 12.6 Sun C 5.15 Linux_i386 2017/05/30
loki tmp 111 \cc -std=c11 atomic_test.c
"atomic_test.c", line 5: warning: no explicit type given
"atomic_test.c", line 5: syntax error before or at: test
"atomic_test.c", line 8: undefined symbol: test
"atomic_test.c", line 8: undefined symbol: x
cc: acomp failed for atomic_test.c
loki tmp 112
loki tmp 111 cc -V
cc: Studio 12.5 Sun C 5.14 Linux_i386 2016/05/31
loki tmp 112 \cc -std=c11 atomic_test.c
"atomic_test.c", line 5: warning: no explicit type given
"atomic_test.c", line 5: syntax error before or at: test
"atomic_test.c", line 8: undefined symbol: test
"atomic_test.c", line 8: undefined symbol: x
cc: acomp failed for atomic_test.c
loki tmp 113
loki tmp 108 cc -V
cc: Sun C 5.13 Linux_i386 2014/10/20
loki tmp 109 \cc -std=c11 atomic_test.c
"atomic_test.c", line 2: cannot find include file: <stdatomic.h>
"atomic_test.c", line 5: warning: _Atomic is a keyword in ISO C11
"atomic_test.c", line 5: undefined symbol: _Atomic
"atomic_test.c", line 5: syntax error before or at: intptr_t
"atomic_test.c", line 6: undefined symbol: intptr_t
"atomic_test.c", line 8: undefined symbol: test
"atomic_test.c", line 8: undefined symbol: x
cc: acomp failed for atomic_test.c
loki tmp 110

I have attached the file config.log.gist from my master build, although I
didn't know what the gist is. Let me know if you need something different
from that file. By the way, I was able to build the upcoming version 4.0.0.

loki openmpi-v4.0.x-201809290241-a7e275c-Linux.x86_64.64_cc 124 grep Error log.*
log.make-install.Linux.x86_64.64_cc: /usr/bin/install -c -m 644 mpi/man/man3/MPI_Compare_and_swap.3 mpi/man/man3/MPI_Dims_create.3 mpi/man/man3/MPI_Dist_graph_create.3 mpi/man/man3/MPI_Dist_graph_create_adjacent.3 mpi/man/man3/MPI_Dist_graph_neighbors.3 mpi/man/man3/MPI_Dist_graph_neighbors_count.3 mpi/man/man3/MPI_Errhandler_create.3 mpi/man/man3/MPI_Errhandler_free.3 mpi/man/man3/MPI_Errhandler_get.3 mpi/man/man3/MPI_Errhandler_set.3 mpi/man/man3/MPI_Error_class.3 mpi/man/man3/MPI_Error_string.3 mpi/man/man3/MPI_Exscan.3 mpi/man/man3/MPI_Iexscan.3 mpi/man/man3/MPI_Fetch_and_op.3 mpi/man/man3/MPI_File_c2f.3 mpi/man/man3/MPI_File_call_errhandler.3 mpi/man/man3/MPI_File_close.3 mpi/man/man3/MPI_File_create_errhandler.3 mpi/man/man3/MPI_File_delete.3 mpi/man/man3/MPI_File_f2c.3 mpi/man/man3/MPI_File_get_amode.3 mpi/man/man3/MPI_File_get_atomicity.3 mpi/man/man3/MPI_File_get_byte_offset.3 mpi/man/man3/MPI_File_get_errhandler.3 mpi/man/man3/MPI_File_get_group.3 mpi/man/man3/MPI_File_get_info.3 mpi/man/man3/MPI_File_get_position.3 mpi/man/man3/MPI_File_get_position_shared.3 mpi/man/man3/MPI_File_get_size.3 mpi/man/man3/MPI_File_get_type_extent.3 mpi/man/man3/MPI_File_get_view.3 mpi/man/man3/MPI_File_iread.3 mpi/man/man3/MPI_File_iread_at.3 mpi/man/man3/MPI_File_iread_all.3 mpi/man/man3/MPI_File_iread_at_all.3 mpi/man/man3/MPI_File_iread_shared.3 mpi/man/man3/MPI_File_iwrite.3 mpi/man/man3/MPI_File_iwrite_at.3 mpi/man/man3/MPI_File_iwrite_all.3 '/usr/local/openmpi-4.0.0_64_cc/share/man/man3'
log.make.Linux.x86_64.64_cc:  GENERATE mpi/man/man3/MPI_Error_class.3
log.make.Linux.x86_64.64_cc:  GENERATE mpi/man/man3/MPI_Error_string.3
loki openmpi-v4.0.x-201809290241-a7e275c-Linux.x86_64.64_cc 125

config.log


@hjelmn suggested in a followup that Siegmar should add #include <stdint.h> and try again.

@jsquyres
Member Author

jsquyres commented Oct 2, 2018

From @siegmargross ...


it works for Sun C 5.14 and Sun C 5.15.

loki tmp 111 cc atomic_test.c
loki tmp 112 a.out
loki tmp 113 cc -V
cc: Studio 12.5 Sun C 5.14 Linux_i386 2016/05/31
loki tmp 114 exit
loki tmp 113 cc -V
cc: Studio 12.6 Sun C 5.15 Linux_i386 2017/05/30
loki tmp 114 cc atomic_test.c
loki tmp 115 a.out
loki tmp 116 more atomic_test.c
#include <stdlib.h>
#include <stdint.h>

int main (int argc, char *argv[]) {
   _Atomic intptr_t test;
   intptr_t x = 0;

   test = x;

   return 0;
}
loki tmp 117

@jsquyres
Member Author

jsquyres commented Oct 2, 2018

In talking to @hjelmn on the 2018-10-02 webex, we wonder if his simple test is sufficient -- he's going to make a slightly more complicated test (more like the actual OMPI source code) and ask @siegmargross to try it.

@hjelmn
Member

hjelmn commented Oct 2, 2018

Compiler bug! We can work around it I think:

$ cat foo.c
#include <stdlib.h>
#include <stdint.h>
#include <stdatomic.h>

int main (int argc, char *argv[]) {
  _Atomic intptr_t x, y;
  intptr_t z = 0;

  x = y = z;

  return 0;
}
$ cc foo.c -o a.out
"foo.c", line 9: operand cannot have void type: op "="
"foo.c", line 9: assignment type mismatch:
	long "=" void
cc: acomp failed for foo.c
$
#include <stdlib.h>
#include <stdint.h>
#include <stdatomic.h>

int main (int argc, char *argv[]) {
  _Atomic intptr_t x, y;
  intptr_t z = 0;

  x = z;
  y = z;

  return 0;
}
$ cc foo.c -o a.out
$
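
Applied to the FIFO initialization, the same workaround could look roughly like the sketch below. The struct, the function, and the `SKETCH_FIFO_FREE` value are stand-ins invented for illustration; the real definitions (including `VADER_FIFO_FREE`) live in btl_vader_fifo.h and may differ.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Stand-in for VADER_FIFO_FREE; the actual value is an assumption here. */
#define SKETCH_FIFO_FREE ((intptr_t) -2)

/* Stand-in for the FIFO structure: only the two atomic fields matter. */
typedef struct {
    _Atomic intptr_t fifo_head;
    _Atomic intptr_t fifo_tail;
} sketch_fifo_t;

static void sketch_fifo_init(sketch_fifo_t *fifo)
{
    /* Before: fifo->fifo_head = fifo->fifo_tail = SKETCH_FIFO_FREE;
     * The chained form is what the affected compiler rejected. */
    fifo->fifo_tail = SKETCH_FIFO_FREE;
    fifo->fifo_head = SKETCH_FIFO_FREE;
}
```

Both fields end up with the same value either way; only the form of the statement changes.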

hjelmn added a commit to hjelmn/ompi that referenced this issue Oct 2, 2018
This commit works around an Oracle C compiler bug in 5.15 (not sure
when it was introduced). The bug is triggered when we chain
assignments of atomic variables. Ex:

_Atomic intptr_t x, y;
intptr_t z = 0;

x = y = z;

Will produce a compiler error of the form:

operand cannot have void type: op "="
assignment type mismatch:
	long "=" void

To work around the issue we are removing the chain assignment and
setting the head and tail on different lines.

Fixes open-mpi#5814

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hjelmn added a commit that referenced this issue Oct 3, 2018
jsquyres pushed a commit to jsquyres/ompi that referenced this issue Oct 3, 2018
(cherry picked from commit dfa8d3a)
jsquyres pushed a commit to jsquyres/ompi that referenced this issue Oct 3, 2018
(cherry picked from commit dfa8d3a)
jsquyres pushed a commit to jsquyres/ompi that referenced this issue Oct 3, 2018
(cherry picked from commit dfa8d3a)
jsquyres pushed a commit to jsquyres/ompi that referenced this issue Oct 3, 2018
(cherry picked from commit dfa8d3a)