@jsquyres
Member

libmpi_mpifh.la makes some direct calls to libopen-pal.la. In many (most?) cases, having libmpi_mpifh link against libmpi is sufficient (because libmpi pulls in libopen-pal). But when building RPMs on SLES, some compiler/linker flags are used that seem to make this implicit linking not sufficient -- we get missing opal symbols when creating libmpi_mpifh.la. So link in open-pal directly (vs. indirectly), which solves the problem.

@rhc54 This is not worth holding up v1.8.6. I'm just putting that milestone on it because it's the next logical release.

@miked-mellanox Please review.

@miked-mellanox Also, I'd like to see the full output of "make V=1" from your rpmbuild without this PR (you'll need to insert that "V=1" in the "make all" of the rpm build somehow). I'd like to see which gcc/linker flags are being used that are triggering this kind of behavior. Can you send that output?

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/569/
Test PASSed.

@mike-dubman
Member

@jsquyres - thanks for PR, checked it on sles11sp4 and now it fails here:

                 from pregister_datarep_f.c:24:
../../../../../ompi/op/op.h: In function 'ompi_op_is_valid':
../../../../../ompi/op/op.h:484: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result
../../../../../ompi/op/op.h:492: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result
../../../../../ompi/op/op.h:496: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result
  FCLD     libmpi_mpifh_pmpi.la
make[3]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h/profile'
make[3]: Entering directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h'
  CCLD     libmpi_mpifh.la
make[3]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h'
make[2]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h'
Making all in mpi/fortran/use-mpi-tkr
make[2]: Entering directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/use-mpi-tkr'
  PPFC     mpi.lo
  FC       mpi_comm_spawn_multiple_f90.lo
  FC       mpi_testall_f90.lo
  FC       mpi_testsome_f90.lo
  FC       mpi_waitall_f90.lo
  FC       mpi_waitsome_f90.lo
  FC       mpi_wtick_f90.lo
  FC       mpi_wtime_f90.lo
  FCLD     libmpi_usempi.la
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_bottom_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_unweighted_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_weights_empty_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_in_place_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: warning: libopen-pal.so.0, needed by ../../../../ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so, not found (try using -rpath or -rpath-link)
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: .libs/mpi.o(.debug_info+0x6e): unresolvable R_X86_64_64 relocation against symbol `mpi_fortran_argv_null_'
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make[2]: *** [libmpi_usempi.la] Error 1
make[2]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/use-mpi-tkr'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.86472 (%build)

@mike-dubman
Member

and make output with V=1:

/bin/sh ../../../../libtool  --tag=FC   --mode=compile gfortran -I../../../../ompi/include -I../../../../ompi/include -I. -I. -I. -I../../../../ompi/mpi/fortran/use-mpi-tkr  -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c -o mpi_wtime_f90.lo  mpi_wtime_f90.f90
libtool: compile:  gfortran -I../../../../ompi/include -I../../../../ompi/include -I. -I. -I. -I../../../../ompi/mpi/fortran/use-mpi-tkr -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c mpi_wtime_f90.f90  -fPIC -o .libs/mpi_wtime_f90.o
/bin/sh ../../../../libtool  --tag=FC   --mode=link gfortran -I../../../../ompi/include -I../../../../ompi/include -I. -I. -I. -I../../../../ompi/mpi/fortran/use-mpi-tkr  -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -version-info 0:0:0   -o libmpi_usempi.la -rpath /usr/lib64 mpi.lo mpi_comm_spawn_multiple_f90.lo mpi_testall_f90.lo mpi_testsome_f90.lo mpi_waitall_f90.lo mpi_waitsome_f90.lo mpi_wtick_f90.lo mpi_wtime_f90.lo  ../../../../ompi/mpi/fortran/mpif-h/libmpi_mpifh.la -lrt -lm -lutil   -lrt -lm -lutil
libtool: link: gfortran -shared  -fPIC  .libs/mpi.o .libs/mpi_comm_spawn_multiple_f90.o .libs/mpi_testall_f90.o .libs/mpi_testsome_f90.o .libs/mpi_waitall_f90.o .libs/mpi_waitsome_f90.o .libs/mpi_wtick_f90.o .libs/mpi_wtime_f90.o   -Wl,-rpath -Wl,/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h/.libs ../../../../ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so -lrt -lutil -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../.. -lgfortranbegin -lgfortran -lm -lc -lgcc_s  -O2 -m64   -pthread -Wl,-soname -Wl,libmpi_usempi.so.0 -o .libs/libmpi_usempi.so.0.0.0
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_bottom_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_unweighted_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_weights_empty_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: Warning: alignment 8 of symbol `mpi_fortran_in_place_' in /usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/.libs/libmpi.so.0 is smaller than 16 in .libs/mpi.o
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: warning: libopen-pal.so.0, needed by ../../../../ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so, not found (try using -rpath or -rpath-link)
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: .libs/mpi.o(.debug_info+0x6e): unresolvable R_X86_64_64 relocation against symbol `mpi_fortran_argv_null_'
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make[2]: *** [libmpi_usempi.la] Error 1
make[2]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/use-mpi-tkr'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.60222 (%build)

@jsquyres
Member Author

jsquyres commented Jun 1, 2015

@miked-Mellanox what version of gcc is this?

It looks like these are the relevant compiler flags:

-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables

I'm AFK for the next several hours. If you compile OMPI with these same flags on SLES and/or RHEL, do you see the same failure?

BTW, "/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: final link failed: Nonrepresentable section on output" - that's a new one for me...

@mike-dubman
Member

  • it is gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
  • These flags are added automatically by the rpmbuild environment

@jsquyres
Member Author

jsquyres commented Jun 2, 2015

I know that those flags are added automatically by the rpmbuild environment. :-) I'm asking if you can replicate the problem outside of the rpmbuild environment -- particularly with gcc on RHEL (because it's quite difficult for me to get access to SLES machines).

@mike-dubman
Member

Tried with RHEL 6.5 gcc using the same flags: it passes.

@lanl-ompi
Contributor

Refer to this link for build results (access rights to CI server needed):
http://jenkins.open-mpi.org/job/ompi_master_pr_distcheck/14/
Test PASSed.

@lanl-ompi
Contributor

Refer to this link for build results (access rights to CI server needed):
http://jenkins.open-mpi.org/job/ompi_master_pr_cle5.2up02/103/
Test PASSed.

@mike-dubman
Member

boo

@jsquyres
Member Author

Ah, I see that with this PR patch, you are advancing to the ompi/mpi/fortran/use-mpi-tkr tree (the first reported problem was in ompi/mpi/fortran/mpif-h). So that's good -- we solved one problem (but I still want to understand why it is necessary).

So now it's trying to link libmpi_usempi.so, and it's complaining that it is unable to find libopen-pal.so.0, and therefore there are some unresolved symbols. _Is this the same thing that was happening with the originally-reported problem?_

Here's the final link line from your make V=1 output, above, but broken up into multiple lines to make it easier to read:

libtool: link: gfortran -shared  -fPIC  .libs/mpi.o .libs/mpi_comm_spawn_multiple_f90.o
 .libs/mpi_testall_f90.o .libs/mpi_testsome_f90.o .libs/mpi_waitall_f90.o .libs/mpi_waitsome_f90.o
 .libs/mpi_wtick_f90.o .libs/mpi_wtime_f90.o  
 -Wl,-rpath -Wl,/usr/src/packages/BUILD/openmpi-2.0.0a1/ompi/mpi/fortran/mpif-h/.libs
 ../../../../ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so 
 -lrt -lutil 
 -L/usr/lib64/gcc/x86_64-suse-linux/4.3
 -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../lib64
 -L/lib/../lib64
 -L/usr/lib/../lib64
 -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/lib
 -L/usr/lib64/gcc/x86_64-suse-linux/4.3/../../.. -lgfortranbegin
 -lgfortran -lm -lc -lgcc_s  -O2 -m64  
 -pthread
 -Wl,-soname -Wl,libmpi_usempi.so.0 -o .libs/libmpi_usempi.so.0.0.0

You can see that libopen-pal.so* is not listed, meaning that it's relying on libmpi_mpifh.so to bring it in implicitly.
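
One way to check which libraries a shared object names directly (as opposed to inheriting through another library) is to look at its DT_NEEDED entries. The following is a minimal sketch, assuming GNU binutils' readelf is installed; the helper name needed_libs is made up for illustration, and the library path is the one that appears in the build log above.

```shell
# Hypothetical helper: print only the sonames from `readelf -d` output on stdin.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Path taken from the build log; adjust for your own build tree.
lib=ompi/mpi/fortran/mpif-h/.libs/libmpi_mpifh.so
if [ -f "$lib" ]; then
    # Each line printed here is a library that $lib asks the loader for directly.
    readelf -d "$lib" | needed_libs
fi
```

If libopen-pal shows up in that list but the linker still reports it as "not found", the dependency is recorded and the problem is the search path (the -rpath / -rpath-link hint in the ld warning), not a missing entry.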

Can you post the full/complete output of "make V=1" for the build in a gist? I'd like to see how other libraries were linked, etc.
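
For a log of this size it can help to pull out just the linker diagnostics before reading the rest. This is a rough sketch, not part of the PR: the function name ld_diagnostics and the file name build.log are assumptions, and the patterns are based only on the ld and collect2 lines quoted earlier in this thread.

```shell
# Hypothetical filter: keep ld/collect2 lines from a build log on stdin,
# dropping the symbol-alignment warnings that are noise for this problem.
ld_diagnostics() {
    grep -E '/ld: |collect2: ' | grep -v 'Warning: alignment'
}

# Assumed capture step (run separately):  make V=1 2>&1 | tee build.log
if [ -f build.log ]; then
    ld_diagnostics < build.log
fi
```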

@mike-dubman
Member

  • The "./configure; make V1=1" output is here (it passes) ftp://bgate.mellanox.com/upload/build1.log
  • The "./configure; make -j 16 CFLAGS="-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables" V=1" output is here:
    ftp://bgate.mellanox.com/upload/build2.log

@jsquyres
Member Author

I do believe that this is the first time you've supplied the configure/make command lines. :-)

Your second command line seems incorrect -- you shouldn't be overriding the CFLAGS at "make" time. When you do that, you're eliminating the "-pthread" value that configure put in CFLAGS, which causes pthread_atfork to not be found. Instead, you should set the desired CFLAGS when you run configure, and then just run "make" (without setting CFLAGS):

$ ./configure CFLAGS="-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables"  ...
$ make -j 32
...
# Completes successfully
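
The effect described above can be reproduced in isolation. The sketch below uses a throwaway Makefile as a stand-in for one generated by configure (it is not Open MPI's), and assumes a make implementation on the PATH; it shows that a variable given on the make command line replaces the Makefile's value outright rather than being appended to it.

```shell
# Throwaway Makefile standing in for a configure-generated one (hypothetical).
mk=$(mktemp)
printf 'CFLAGS = -O2 -pthread\nshow:\n\t@echo "$(CFLAGS)"\n' > "$mk"

configured=$(make -s -f "$mk" show)              # value baked into the Makefile
overridden=$(make -s -f "$mk" show CFLAGS="-g")  # command-line value replaces it entirely

echo "configured: $configured"
echo "overridden: $overridden"
rm -f "$mk"
```

In the second invocation the -pthread flag is gone, which is the same mechanism that drops configure's flags when CFLAGS is passed to "make".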

@jsquyres jsquyres closed this Jun 26, 2015
@jsquyres jsquyres deleted the pr/possible-sles-rpm-linker-fix branch June 26, 2015 04:43
@mike-dubman
Member

@jsquyres - the command lines were used directly from shell and configure flags were not the same as coming from src.rpm.

It still fails when built as part of rpmbuild, but rpmbuild uses different configure args.
I will extract those args, try again, and report back.

Sorry for the confusion.

@jsquyres
Member Author

If the issue is the same -- i.e., that rpmbuild is overriding CFLAGS at "make" time (vs. setting them at "configure" time) -- then you're going to get the same symptom: pthreads symbols won't be found.

@mike-dubman
Member

rpmbuild uses CFLAGS in the correct way (it works for all distros except sles11sp4).
I tried to mimic rpmbuild's behavior without running rpmbuild and made a mistake.

@jsquyres
Member Author

If rpmbuild uses CFLAGS the right way, then I'm unable to replicate the issue. Specifically:

$ ./configure CFLAGS="-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables"  ...
$ make -j 32
...
# Completes successfully

Can you post the complete rpmbuild output somewhere, and make sure it uses V=1 to build?

@mike-dubman
Member

@jsquyres - somehow this PR looks "closed" - is that intentional?

The log file can be found here: ftp://bgate.mellanox.com/upload/bb.txt

@jsquyres jsquyres restored the pr/possible-sles-rpm-linker-fix branch July 11, 2015 02:20
@jsquyres
Member Author

I closed it because a prior comment made it sound like rpmbuild was acting incorrectly (overriding CFLAGS at make time rather than setting them at configure time).

I updated the PR to include the same kind of fixes for the 2 use-mpi dirs; see if that works for you. That should fix the "not able to find libopal" errors. But there's a relocation issue also shown in your output that I don't know how to fix. If that still happens with the latest version of this PR, I'm open to suggestions on how to fix it.

@jsquyres
Member Author

GitHub won't let me re-open this PR, so I opened a new one against the same branch: #711. Continue the discussion there.

jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Sep 19, 2016
…figure-fix

fortran ignore TKR: update for strange Intel 2016 compiler suite behavior