libhwloc causes application core dump by corrupting the parent's environ when dlopened and then dlclosed #533
Comments
Hello. Your problem seems exactly identical to #514, which was fixed in hwloc 2.7.1 by using setenv() instead of putenv() (except on Windows).
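For readers unfamiliar with the difference, here is a minimal sketch of why that change matters. It assumes a POSIX system; the variable names DEMO_PUTENV and DEMO_SETENV are made up for the demo and have nothing to do with hwloc. putenv() stores the caller's pointer in the environment, so the entry is only valid as long as that memory is, whereas setenv() keeps its own copy.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the environment entry aliases the caller's buffer. */
int putenv_aliases_buffer(void)
{
    /* Static storage, similar in spirit to a static string inside a library. */
    static char entry[] = "DEMO_PUTENV=1";
    size_t last = strlen(entry) - 1;

    entry[last] = '1';              /* reset so the function can be called twice */
    if (putenv(entry) != 0)
        return -1;

    /* putenv stored the pointer, so editing the buffer edits the environment. */
    entry[last] = '2';
    const char *seen = getenv("DEMO_PUTENV");
    return seen != NULL && strcmp(seen, "2") == 0;
}

/* Returns 1 if setenv kept a private copy of the value. */
int setenv_copies_value(void)
{
    char value[] = "1";

    if (setenv("DEMO_SETENV", value, 1) != 0)
        return -1;

    /* Changing the source buffer must not affect the environment. */
    value[0] = '2';
    const char *seen = getenv("DEMO_SETENV");
    return seen != NULL && strcmp(seen, "1") == 0;
}

/* Runs both checks and aborts via assert if the platform behaves differently. */
void run_demo(void)
{
    assert(putenv_aliases_buffer() == 1);
    assert(setenv_copies_value() == 1);
    puts("putenv aliases the buffer; setenv copies the value");
}
```

Calling run_demo() from a main() is enough to try it. When the buffer handed to putenv() lives inside a shared library, unmapping the library with dlclose() leaves the environment pointing at memory that no longer exists, which matches the crash reported in this issue.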
I will look forward to 2.7.1. I wish you had updated your current stable version 2.6.0 at that time.
Usually people either use distribution packages that are so old that backporting is either impossible or not needed (e.g. 2.2 in RHEL8 isn't affected by this issue), or things like spack or guix that use the latest release (2.7.0 was considered stable at that time). Hence I don't backport to older release series unless somebody asks. But I could do a 2.6.1 if a distribution wants an official release instead of backporting this specific fix.
Thank you, I do understand. I believe the source of my confusion was here: https://www.open-mpi.org/projects/hwloc/doc/
Oh right, thanks, I indeed need to think about uniformizing this!
I am closing this since it seems identical to #514 (fixed in 2.7.1).
What version of hwloc are you using?
hwloc 2.6.0
Which operating system and hardware are you running on?
Cray / Suse sles 15.2 hpc clusters
rhel7.9 hpc clusters
Details of the problem
Using libhwloc with a shared library that is first dlopened and then dlclosed causes the parent application to core dump. The core dump is caused by the use of putenv within topology.c, which refers to a local static string that becomes invalid after the library is dlclosed, while the parent's environ still points to it. Below is a simple reproducer:
#!/bin/bash
set -x
cat <<EOF >test4.c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <errno.h>
int main (int argc, char *argv[]) {
void *handle1 = dlopen ("/opt/pmix/hwloc/gcc4x/2.6.0/lib64/libhwloc.so.15", RTLD_NOW);
printf ("handle1=%p\n", handle1);
printf ("error=%s\n", dlerror ());
printf ("ZES_ENABLE_SYSMAN=%s\n", getenv ("ZES_ENABLE_SYSMAN"));
printf ("ptr to ZES_ENABLE_SYSMAN=%p\n", getenv ("ZES_ENABLE_SYSMAN"));
int dlrc = dlclose (handle1);
printf ("dlclose returns %d\n", dlrc);
char *my_var = "MULBERRY_BUSH";
printf ("searching environment table for %s\n", my_var);
getenv (my_var);
return 0;
}
EOF
gcc -c -g -O0 test4.c
gcc -g -O0 test4.o -ldl
./a.out
All done
....................
XXXXX@tt-rfe1:~/> sh test4.sh
handle1=0x14d82a0
error=(null)
ZES_ENABLE_SYSMAN=1
ptr to ZES_ENABLE_SYSMAN=0x7f9e53b3ffa5
dlclose returns 0
searching environment table for MULBERRY_BUSH
test4.sh: line 27: 23603 Segmentation fault (core dumped) ./a.out
.............
hwloc_strdup.patch.txt
hwlocissue521.patch.txt
To work around the issue I did two things. First, I backported the patch from hwloc 2.7.1 to avoid calling putenv if oneAPI LevelZero support was disabled or not found by autotools. I also changed the putenv call to use a dynamically allocated string created with strdup. Note that this should probably be reworked to use setenv rather than putenv.
At some point I may update to hwloc 2.7.1, but the issues with using putenv, as well as their consequences, will remain in the new version and will need to be corrected there as well.
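As a rough illustration of the two workarounds described above, here is a sketch under stated assumptions. It is not the code from the attached patches or from hwloc itself; the helper names export_default and export_default_putenv are hypothetical, and the second one builds the string with malloc and snprintf rather than strdup, which has the same effect of moving the entry onto the heap.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set NAME=VALUE only if NAME is not already set.
 * setenv copies both strings, so nothing in the environment points back
 * into the calling library. */
int export_default(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    return setenv(name, value, 0);   /* 0 = do not overwrite an existing value */
}

/* Hypothetical fallback where only putenv is available: hand putenv a
 * heap-allocated "NAME=VALUE" string. The string is deliberately never
 * freed on success, because the environment now refers to it. */
int export_default_putenv(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);

    if (getenv(name) != NULL)
        return 0;                    /* keep the user's setting */

    size_t len = strlen(name) + 1 + strlen(value) + 1;
    char *entry = malloc(len);
    if (entry == NULL)
        return -1;

    snprintf(entry, len, "%s=%s", name, value);
    if (putenv(entry) != 0) {
        free(entry);
        return -1;
    }
    return 0;
}
```

In the reproducer's terms, a library would call something like export_default("ZES_ENABLE_SYSMAN", "1") once during initialization. Either variant keeps the entry valid after dlclose, since heap memory and setenv's copy are not unmapped with the library; the putenv variant trades that for a small one-time allocation that is never released.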