mpi-f: link the opal-pal.la library directly #610
libmpi_mpifh.la makes some direct calls to libopen-pal.la. In many (most?) cases, having libmpi_mpifh link against libmpi is sufficient (because libmpi pulls in libopen-pal). But when building RPMs on SLES, some compiler/linker flags are used that seem to make this implicit linking not sufficient -- we get missing opal symbols when creating libmpi_mpifh.la. So link in open-pal directly (vs. indirectly), which solves the problem.
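One way to see whether a built library links open-pal directly or only indirectly is to inspect its dynamic section for a NEEDED entry. The following is a minimal sketch, not part of this PR: the helper name `has_needed` is hypothetical, and the library path in the usage comment is an assumption about the build tree.

```shell
#!/bin/sh
# has_needed NAME: read `readelf -d` output on stdin and succeed if a
# NEEDED entry starts with NAME (for example "libopen-pal.so").
# Hypothetical helper for illustration; not part of Open MPI.
has_needed() {
    grep 'NEEDED' | grep -F -q "[$1"
}

# Usage (the path is an assumption about where libtool puts the library):
#   readelf -d ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so \
#       | has_needed libopen-pal.so && echo "direct dependency recorded"
```

If the check fails for libmpi_mpifh.so but succeeds for libmpi.so, the Fortran library is relying on libmpi to pull in open-pal, which is the indirect case described above.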
|
|
Refer to this link for build results (access rights to CI server needed): |
|
@jsquyres - thanks for the PR. I checked it on sles11sp4 and now it fails here:
from pregister_datarep_f.c:24:
../../../../../ompi/op/op.h: In function 'ompi_op_is_valid':
../../../../../ompi/op/op.h:484: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result
../../../../../ompi/op/op.h:492: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result
../../../../../ompi/op/op.h:496: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result
FCLD libmpi_mpifh_pmpi.la
make[3]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h/profile'
make[3]: Entering directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h'
CCLD libmpi_mpifh.la
make[3]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h'
make[2]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h'
Making all in mpi/fortran/use-mpi-tkr
make[2]: Entering directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/use-mpi-tkr'
PPFC mpi.lo
FC mpi_comm_spawn_multiple_f90.lo
FC mpi_testall_f90.lo
FC mpi_testsome_f90.lo
FC mpi_waitall_f90.lo
FC mpi_waitsome_f90.lo
FC mpi_wtick_f90.lo
FC mpi_wtime_f90.lo
FCLD libmpi_usempi.la
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_bottom_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_unweighted_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_weights_empty_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_in_place_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: warning: libopen-pal.so.0, needed by ../../../../ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so, not found (try using -rpath or -rpath-link)
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: .libs/mpi.o(.debug_info+0x6e): unresolvable R_X86_64_64 relocation against symbol `mpi_fortran_argv_null_'
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make[2]: *** [libmpi_usempi.la] Error 1
make[2]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/use-mpi-tkr'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.86472 (%build) |
|
and make output with V=1:
/bin/sh ../../../../libtool --tag=FC --mode=compile gfortran -I../../../../ompi/include -I../../../../ompi/include -I. -I. -I. -I../../../../ompi/mpi/fortran/use-mpi-tkr -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c -o mpi_wtime_f90.lo mpi_wtime_f90.f90
libtool: compile: gfortran -I../../../../ompi/include -I../../../../ompi/include -I. -I. -I. -I../../../../ompi/mpi/fortran/use-mpi-tkr -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c mpi_wtime_f90.f90 -fPIC -o .libs/mpi_wtime_f90.o
/bin/sh ../../../../libtool --tag=FC --mode=link gfortran -I../../../../ompi/include -I../../../../ompi/include -I. -I. -I. -I../../../../ompi/mpi/fortran/use-mpi-tkr -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -version-info 0:0:0 -o libmpi_usempi.la -rpath /usr/lib64 mpi.lo mpi_comm_spawn_multiple_f90.lo mpi_testall_f90.lo mpi_testsome_f90.lo mpi_waitall_f90.lo mpi_waitsome_f90.lo mpi_wtick_f90.lo mpi_wtime_f90.lo ../../../../ompi/mpi/fortran/mpif-h/libmpi_mpifh.la -lrt -lm -lutil -lrt -lm -lutil
libtool: link: gfortran -shared -fPIC .libs/mpi.o .libs/mpi_comm_spawn_multiple_f90.o .libs/mpi_testall_f90.o .libs/mpi_testsome_f90.o .libs/mpi_waitall_f90.o .libs/mpi_waitsome_f90.o .libs/mpi_wtick_f90.o .libs/mpi_wtime_f90.o -Wl,-rpath -Wl,/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h/.libs ../../../../ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so -lrt -lutil -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../.. -lgfortranbegin -lgfortran -lm -lc -lgcc_s -O2 -m64 -pthread -Wl,-soname -Wl,libmpi_usempi.so.0 -o .libs/libmpi_usempi.so.0.0.0
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_bottom_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_unweighted_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_weights_empty_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_in_place_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: warning: libopen-pal.so.0, needed by ../../../../ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so, not found (try using -rpath or -rpath-link)
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: .libs/mpi.o(.debug_info+0x6e): unresolvable R_X86_64_64 relocation against symbol `mpi_fortran_argv_null_'
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make[2]: *** [libmpi_usempi.la] Error 1
make[2]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/use-mpi-tkr'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.60222 (%build)
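In a log this long, the "needed by ... not found" warnings are easy to miss among the alignment warnings. A small sketch for pulling them out of a saved build log follows; the function name `missing_libs` and the file name `build.log` are hypothetical, and the pattern assumes the GNU ld message wording shown above.

```shell
#!/bin/sh
# missing_libs: read a build log on stdin and print, once each, every
# library that ld reported as "needed by ..., not found".
# Hypothetical helper; matches the GNU ld wording seen in this log.
missing_libs() {
    sed -n 's/.*warning: \([^ ,]*\), needed by .*, not found.*/\1/p' | sort -u
}

# Usage (build.log is an assumed file name):
#   missing_libs < build.log
```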
|
|
@miked-Mellanox what version of gcc is this? It looks like these are the relevant compiler flags:
-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables
I'm AFK for the next several hours. If you compile OMPI with these same flags on SLES and/or RHEL, do you see the same failure?
BTW, "/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: final link failed: Nonrepresentable section on output" is a new one for me...
|
|
I know that those flags are added automatically by the rpmbuild environment. :-) I'm asking if you can replicate the problem outside of the rpmbuild environment -- particularly with gcc on RHEL (because it's quite difficult for me to get access to SLES machines). |
|
tried w/ rhel 6.5 gcc using same flags - it passes. |
|
Refer to this link for build results (access rights to CI server needed): |
|
boo |
|
Ah, I see that with this PR patch, you are advancing to the So now it's trying to link Here's the final link line from your You can see that Can you post the full/complete output of "make V=1" for the build in a gist? I'd like to see how other libraries were linked, etc. |
|
|
I do believe that this is the first time you've supplied the configure/make command lines. :-) Your second command line seems incorrect: you shouldn't override CFLAGS at "make" time. When you do that, you eliminate the "-pthread" value in CFLAGS, so pthread_atfork is not found. Instead, set the desired CFLAGS when you run configure, and then just run "make" (without setting CFLAGS):
$ ./configure CFLAGS="-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables" ...
$ make -j 32
...
# Completes successfully
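The effect of overriding CFLAGS at "make" time can be reproduced with a throwaway Makefile. This is only a sketch: the Makefile is a stand-in rather than anything generated by Open MPI's configure, and the "-pthread" in it is an assumed example of a flag that configure would have added.

```shell
#!/bin/sh
# Sketch: a CFLAGS value given on the make command line replaces the
# Makefile's own value entirely, so flags recorded there are lost.
tmp=$(mktemp -d)

# Stand-in Makefile; "-pthread" plays the role of a configure-added flag.
printf 'CFLAGS = -O2 -pthread\nshow:\n\t@echo $(CFLAGS)\n' > "$tmp/Makefile"

default_flags=$(cd "$tmp" && make -s show)
override_flags=$(cd "$tmp" && make -s show CFLAGS="-O2 -g")

echo "default:  $default_flags"
echo "override: $override_flags"

rm -rf "$tmp"
```

The second invocation no longer carries "-pthread", which matches the symptom described above. Passing the flags to configure instead lets configure append whatever it needs to them.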
|
@jsquyres - those command lines were run directly from the shell, and the configure flags were not the same as the ones coming from the src.rpm. It still fails when run as part of rpmbuild, but that uses different configure args. Sorry for the confusion.
|
If the issue is the same -- i.e., that rpmbuild is overriding CFLAGS at "make" time (vs. setting them at "configure" time) -- then you're going to get the same symptom: pthreads symbols won't be found. |
|
rpmbuild uses CFLAGS in the correct way (it works on all distros except sles11sp4).
|
If rpmbuild uses CFLAGS the right way, then I'm unable to replicate the issue. Specifically:
$ ./configure CFLAGS="-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables" ...
$ make -j 32
...
# Completes successfully
Can you post the complete rpmbuild output somewhere, and make sure it uses V=1 to build?
|
@jsquyres - somehow this PR looks "closed". Is that intentional? The log file can be found here: ftp://bgate.mellanox.com/upload/bb.txt
|
I closed it because a prior comment made it sound like rpmbuild was acting incorrectly (overriding CFLAGS at I updated the PR to include the same kind of fixes for the 2 use-mpi dirs; see if that works for you. That should fix the "not able to find libopal" errors. But there's a relocation issue also shown in your output that I don't know how to fix. If that still happens with the latest version of this PR, I'm open to suggestions on how to fix it.
|
GitHub won't let me re-open this PR, so I opened a new one against the same branch: #711. Continue the discussion there.
libmpi_mpifh.la makes some direct calls to libopen-pal.la. In many (most?) cases, having libmpi_mpifh link against libmpi is sufficient (because libmpi pulls in libopen-pal). But when building RPMs on SLES, some compiler/linker flags are used that seem to make this implicit linking not sufficient -- we get missing opal symbols when creating libmpi_mpifh.la. So link in open-pal directly (vs. indirectly), which solves the problem.
@rhc54 This is not worth holding up v1.8.6. I'm just putting that milestone on it because it's the next logical release.
@miked-mellanox Please review.
@miked-mellanox Also, I'd like to see the full output of "make V=1" from your rpmbuild without this PR (you'll need to insert that "V=1" in the "make all" of the rpm build somehow). I'd like to see which gcc/linker flags are triggering this kind of behavior. Can you send that output?