v1.10 build failing on some platforms with link error: missing -pthread #1699

@jsquyres

On RHEL 6.5, it looks like the addition of

OPAL_SEARCH_LIBS_CORE([clock_gettime], [rt])

in open-mpi/ompi-release#1181 caused -pthread to disappear from CFLAGS (found via git bisect). That PR was merged after last night's v1.10 nightly snapshot tarball was made, so the failure didn't show up in last night's MTT results.

Specifically, before this commit, config.log shows:

configure:63325: checking if C compiler and POSIX threads work as is
configure:63372: gcc -o conftest -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing   conftest.c -lm -lutil  >&5
/tmp/ccRfWycq.o: In function `main':
conftest.c:(.text+0xb6): undefined reference to `__pthread_register_cancel'

and then configure later adds -pthread to CFLAGS. This seems to be the correct course of action.

But after that commit, config.log shows:

configure:63545: checking if C compiler and POSIX threads work as is
configure:63592: gcc -o conftest -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing   conftest.c -lrt -lm -lutil  >&5
configure:63592: $? = 0
configure:63608: result: yes

I.e., configure concludes that it does not need any additional CFLAGS to compile pthreads things, so it doesn't add -pthread to CFLAGS.

The only difference between the two test commands is the addition of -lrt, which came from the OPAL_SEARCH_LIBS_CORE addition to configure.ac.
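To compare this check on another machine, it may help to pull just that block out of config.log. The following is a minimal sketch, not part of the Open MPI build system: `show_pthread_check` is a hypothetical helper name, and it assumes the check's "checking" message is worded exactly as in the excerpts above and that the block ends at the next "result:" line.

```shell
# Hypothetical helper: print the "POSIX threads work as is" check from a
# config.log, from its "checking" line through the next "result:" line.
# Defaults to ./config.log when no file is given.
show_pthread_check() {
    awk '
        /checking if C compiler and POSIX threads work as is/ { show = 1 }
        show { print }
        show && /result:/ { exit }
    ' "${1:-config.log}"
}
```

Running `show_pthread_check config.log` in a good and a bad build tree, and comparing the two outputs, should make it easy to see whether -lrt is present on the test command line and whether the result flipped to "yes".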


I did not follow the entire cause-and-effect to see what exactly is happening, but I can see the following difference in the resulting builds:

Before (i.e., works properly):

libtool: link: gcc -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -o .libs/opal_wrapper opal_wrapper.o  ../../../opal/.libs/libopen-pal.so -lnuma -ldl -lrt -lm -lutil -pthread -Wl,-rpath -Wl,/home/jsquyres/bogus/lib

After (i.e., fails with a missing pthread_atfork symbol):

libtool: link: gcc -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -o .libs/opal_wrapper opal_wrapper.o  ../../../opal/.libs/libopen-pal.so -lnuma -ldl -lrt -lm -lutil -Wl,-rpath -Wl,/home/jsquyres/bogus/lib

The only difference between those two is the missing -pthread in the second (bad) one.
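Long link lines like these are hard to compare by eye. As a rough sketch, the comparison can be done mechanically by listing the tokens that appear in one command but not the other. `flags_only_in_first` is a hypothetical helper name, and it assumes each command is a single line whose arguments contain no embedded whitespace.

```shell
# Hypothetical helper: print each whitespace-separated token that appears
# in the first command line ($1) but not in the second ($2), one per line,
# in first-seen order and without repeats.
flags_only_in_first() {
    printf '%s\n' "$1" "$2" | awk '
        NR == 1 { n = NF; for (i = 1; i <= NF; i++) first[i] = $i }
        NR == 2 { for (i = 1; i <= NF; i++) seen[$i] = 1 }
        END {
            for (i = 1; i <= n; i++) {
                if (!(first[i] in seen) && !(first[i] in printed)) {
                    print first[i]
                    printed[first[i]] = 1
                }
            }
        }
    '
}
```

Passing the "before" link line as the first argument and the "after" line as the second should print only -pthread. Swapping the arguments should print nothing, since the bad command adds no flags of its own.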

This may not be happening on all platforms -- e.g., @rhc54 has cited that it is not happening for him on CentOS 7. I'm guessing that the pthread libraries are getting pulled in indirectly somehow on those platforms...?
