libhwloc causes application core dump by corrupting the parent's environ when dlopened and then dlclosed #533

Closed
sheepherder82 opened this issue May 26, 2022 · 6 comments

@sheepherder82

What version of hwloc are you using?

hwloc 2.6.0

Which operating system and hardware are you running on?

Cray / SUSE SLES 15.2 HPC clusters
RHEL 7.9 HPC clusters

Details of the problem

Using libhwloc with a shared library that is first dlopened and then dlclosed causes the parent application to core dump. The core dump is caused by the use of putenv within topology.c, which refers to a local static string that becomes invalid after the library is dlclosed. Below is a simple reproducer.

#!/bin/bash

set -x
cat <<EOF >test4.c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <errno.h>

int main (int argc, char *argv[]) {

void *handle1 = dlopen ("/opt/pmix/hwloc/gcc4x/2.6.0/lib64/libhwloc.so.15", RTLD_NOW);
printf ("handle1=%p\n", handle1);
printf ("error=%s\n", dlerror ());
printf ("ZES_ENABLE_SYSMAN=%s\n", getenv ("ZES_ENABLE_SYSMAN"));
printf ("ptr to ZES_ENABLE_SYSMAN=%p\n", getenv ("ZES_ENABLE_SYSMAN"));
int dlrc = dlclose (handle1);
printf ("dlclose returns %d\n", dlrc);
char *my_var = "MULBERRY_BUSH";
printf ("searching environment table for %s\n", my_var);
getenv (my_var);
return 0;
}
EOF
gcc -c -g -O0 test4.c
gcc -g -O0 test4.o -ldl
./a.out

All done

....................
XXXXX@tt-rfe1:~/> sh test4.sh

+ cat
+ gcc -c -g -O0 test4.c
+ gcc -g -O0 test4.o -ldl
+ ./a.out
handle1=0x14d82a0
error=(null)
ZES_ENABLE_SYSMAN=1
ptr to ZES_ENABLE_SYSMAN=0x7f9e53b3ffa5
dlclose returns 0
searching environment table for MULBERRY_BUSH
test4.sh: line 27: 23603 Segmentation fault (core dumped) ./a.out

.............
hwloc_strdup.patch.txt
hwlocissue521.patch.txt

To work around the issue I did two things. First, I backported the patch from hwloc 2.7.1 that avoids calling putenv if oneAPI LevelZero support is disabled or not found by autotools. Second, I changed the putenv call to use a dynamically allocated string created with strdup. Note that this should probably be reworked to use setenv rather than putenv.

At some point I may update to hwloc 2.7.1, but the issues with using putenv, and their consequences, will remain in the new version and will need to be corrected there as well.

@bgoglin
Contributor

bgoglin commented May 26, 2022

Hello. Your problem seems exactly identical to #514 which was fixed by using setenv() instead of putenv() (except on Windows) in hwloc 2.7.1.

@sheepherder82
Author

I will look forward to 2.7.1. I wish you had updated your current stable version, 2.6.0, at that time.

@bgoglin
Contributor

bgoglin commented May 26, 2022

Usually people either use distribution packages that are so old that backporting is either impossible or not needed (e.g. 2.2 in RHEL8 isn't affected by this issue) or things like spack or guix that use the latest release (2.7.0 was considered stable at that time). Hence I don't backport to older release series unless somebody asks. But I could do a 2.6.1 if a distribution wants an official release instead of backporting this specific fix.

@sheepherder82
Author

Thank you, I do understand. I believe the source of my confusion was here: https://www.open-mpi.org/projects/hwloc/doc/
It might be better if you removed the attributes ("new", "stable", "old", etc.) or synchronized them with the sidebar if possible.

@bgoglin
Contributor

bgoglin commented May 27, 2022

Oh right, thanks, I indeed need to think about making this uniform!

@bgoglin
Contributor

bgoglin commented Jun 9, 2022

I am closing this since it seems identical to #514 (fixed in 2.7.1).
