Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__data_start/_end mismatch with -Bsymbolic-functions #321

Closed
nathanal opened this issue Jul 20, 2020 · 21 comments
Closed

__data_start/_end mismatch with -Bsymbolic-functions #321

nathanal opened this issue Jul 20, 2020 · 21 comments

Comments

@nathanal
Copy link

nathanal commented Jul 20, 2020

I'm not sure this is fixable outside of disabling -Bsymbolic-functions (i.e. adding export DEB_LDFLAGS_MAINT_STRIP=-Wl,-Bsymbolic-functions to debian/rules, as suggested for grpc/grpc#11195 (comment)), but thought I'd ask here first before opening a report in https://bugs.launchpad.net/ubuntu/+source/libgc.

In Ubuntu 20.04, libgc is compiled with -Wl,-Bsymbolic-functions. Compiling the example file (loop.c) from https://www.hboehm.info/gc/simple_example.html with a dynamically linked libgc will have mismatch __data_start/_end. GC_data_start will be in loop, but data_end will be in libgc.so. This is noticable if you modify os_dep.c so GC_init_linux_data_start prints out these values before comparing them.

  1. apt source libgc
  2. In os_dep.c:446, add printf("__data_start %p _end %p\n", (void *)GC_data_start, (void *)data_end);
  3. build debuild -i -us -uc
  4. install dpkg -i ../libgc*.deb
  5. Take example from https://www.hboehm.info/gc/simple_example.html
  6. Compile example gcc -I/usr/include -L/usr/lib/x86_64-linux-gnu loop.c -lgc -o loop
  7. Run (possibly in gdb, breaking at exit and examining info proc mappings)

I'm not familiar with libgc, I imagine this might also cause problems as porting.md mentions that the collector will trace all memory between DATASTART and DATAEND for root pointers.

In a real world case: for the Inkscape appimage, libgc is mapped in an address before libinkscape_base.so , resulting in a crash at startup in GC_init_linux_data_start (ABORT_ARG2("Wrong __data_start/_end pair"...) if libgc is enabled, downstream issue: https://gitlab.com/inkscape/inkscape/-/issues/1637

Edit: fixed steps to replicate. Previously suggested self proc mappings instead of info proc mappings
Update: also simpler steps to replicate are the install instructions in the readme, but replace configure with ./configure --prefix=/home/USER/gc --disable-threads CFLAGS="-Wl,-Bsymbolic-functions", w/ appropriate modifications to compiling/running loop

@ivmai
Copy link
Owner

ivmai commented Jul 20, 2020

GC_data_start will be in loop, but data_end will be in libgc.so

Could you please try with passing -DIGNORE_PROG_DATA_START to CFLAGS, e.g. gcc -DIGNORE_PROG_DATA_START -I/usr/include -L/usr/lib/x86_64-linux-gnu loop.c -lgc -o loop

@ivmai
Copy link
Owner

ivmai commented Jul 20, 2020

Related issue (for Android): #259

@nathanal
Copy link
Author

Unfortunately, running gcc -DIGNORE_PROG_DATA_START -I/usr/include -L/usr/lib/x86_64-linux-gnu loop.c -lgc -o loop did not change behavior.

It does fix it if compiling libgc with -DIGNORE_PROG_DATA_START

@ivmai
Copy link
Owner

ivmai commented Jul 21, 2020

Unfortunately, running gcc -DIGNORE_PROG_DATA_START -I/usr/include -L/usr/lib/x86_64-linux-gnu loop.c -lgc -o loop did not change behavior.

Sorry, I meant compiling of libgc.so with the -D option.

It does fix it if compiling libgc with -DIGNORE_PROG_DATA_START

Good. Could you please check if it works for the Inkscape appimage?
If yes, I will decide on the patch.

@nathanal
Copy link
Author

Yup, confirmed that compiling with -DIGNORE_PROG_DATA_START, Inkscape runs successfully.

Digging into it a tiny bit more, the Inkscape appimage (and normal Inkscape build) passes through

bdwgc/os_dep.c

Lines 456 to 461 in 4d1a516

if (GC_no_dls) {
/* Not needed, avoids the SIGSEGV caused by */
/* GC_find_limit which complicates debugging. */
GC_data_start = data_end; /* set data root size to 0 */
return;
}

loop skips that early return. When run inside gdb, loop has a segfault in GC_find_limit_with_bound but runs normally after continuing.

ivmai added a commit that referenced this issue Aug 1, 2020
Issue #321 (bdwgc).

If garbage collector code is compiled with -Wl,-Bsymbolic-functions
option and put into a shared library on Linux, then __data_start may
belong to the client executable file while _end belongs to the shared
library.  Thus, the fast approach of detecting data region boundaries
using [__]data_start and _end values directly is not reliable.

This commit disables this approach on Linux (and Hurd) unless
-D USE_PROG_DATA_START is passed to CFLAGS during libgc build (i.e.,
the client guarantees that either -Bsymbolic-functions is not passed
and/or libgc code is linked statically into the client binary).

* os_dep.c [(LINUX || HURD) && !USE_PROG_DATA_START]
(GC_init_linux_data_start): Skip use of [__]data_start (regardless of
IGNORE_PROG_DATA_START); add comment.
@ivmai
Copy link
Owner

ivmai commented Aug 1, 2020

I have created the patch (f9b288c) but it causes fails of ASan builds and a build with -D GC_NO_SIGSETJMP

@ivmai
Copy link
Owner

ivmai commented Aug 2, 2020

I've fixed ASan builds fail.

@nathanal
Copy link
Author

nathanal commented Aug 2, 2020

Thanks for looking into this and patching this up! How'd you fix the ASan builds?

@ivmai
Copy link
Owner

ivmai commented Aug 3, 2020

By adding an attribute to suppress ASan warnings (I don't think there's a better way because of the nature of the function):
GC_ATTR_NO_SANITIZE_ADDR
STATIC ptr_t GC_find_limit_with_bound(ptr_t p, GC_bool up, ptr_t bound)

I have this patch locally, I will push it along with the fix for GC_NO_SISETJMP issue.

ivmai added a commit that referenced this issue Aug 4, 2020
(fix of commit f9b288c)

Issue #321 (bdwgc).

* os_dep.c [NEED_FIND_LIMIT || USE_PROC_FOR_LIBRARIES && THREADS]
(GC_find_limit_with_bound): Add GC_ATTR_NO_SANITIZE_ADDR attribute.
ivmai added a commit that referenced this issue Aug 6, 2020
…d VDB

(fix of commit f9b288c)

Issue #321 (bdwgc).

* include/private/gcconfig.h [GC_NO_SIGSETJMP && MPROTECT_VDB
&& !DARWIN && !MSWIN32 && !MSWINCE] (MPROTECT_VDB): Undefine macro.
@ivmai ivmai closed this as completed Aug 6, 2020
ivmai added a commit that referenced this issue Aug 20, 2021
(a cherry-pick of commits f9b288c, b6173fd, 1465f95 from 'master')

Issue #321 (bdwgc).

If garbage collector code is compiled with -Wl,-Bsymbolic-functions
option and put into a shared library on Linux, then __data_start may
belong to the client executable file while _end belongs to the shared
library.  Thus, the fast approach of detecting data region boundaries
using [__]data_start and _end values directly is not reliable.

This commit disables this approach on Linux (and Hurd) unless
-D USE_PROG_DATA_START is passed to CFLAGS during libgc build (i.e.,
the client guarantees that either -Bsymbolic-functions is not passed
and/or libgc code is linked statically into the client binary).

* include/private/gcconfig.h [GC_NO_SIGSETJMP && MPROTECT_VDB
&& !DARWIN && !MSWIN32 && !MSWINCE] (MPROTECT_VDB): Undefine macro.
* os_dep.c [(LINUX || HURD) && !USE_PROG_DATA_START]
(GC_init_linux_data_start): Skip use of [__]data_start (regardless of
IGNORE_PROG_DATA_START); add comment.
* os_dep.c [NEED_FIND_LIMIT || USE_PROC_FOR_LIBRARIES && THREADS]
(GC_find_limit_with_bound): Add GC_ATTR_NO_SANITIZE_ADDR attribute.
@dimpase
Copy link
Contributor

dimpase commented Feb 15, 2022

Is there a quick test one can do in ./configure, using autoconf macros, for this bug?
Indeed, it's in recent Ubuntu releases except Jammy, but we'd rather prefer a generic test, not chasing versions...

@ivmai
Copy link
Owner

ivmai commented Feb 15, 2022

Is there a quick test one can do in ./configure, using autoconf macros, for this bug?

What do you mean? Do you want your project configure to detect whether the above fix is applied or not?

@dimpase
Copy link
Contributor

dimpase commented Feb 15, 2022

Do you want your project configure to detect whether the above fix is applied or not?

yes, exactly this.

@ivmai
Copy link
Owner

ivmai commented Feb 15, 2022

I think you could detect the absence of the fix by e.g.: grep "Wrong __data_start/_end pair" libgc.so

@dimpase
Copy link
Contributor

dimpase commented Feb 15, 2022

Well, ./configure doesn't know the full path to libgc.so, typically. While few years ago I wrote the following to
get the full path to libz.so, and this would work for gc, with obvious changes,

#include <stdio.h>
#define __USE_GNU
#include <dlfcn.h>
#ifdef __APPLE__
#define EXT "dylib"
#else
#define EXT "so"
#endif
int main() {
void *z;
Dl_info  info;
int res;
void *ha;
z = dlopen("libz."EXT,  RTLD_LAZY); /* load zlib */
ha = dlsym(z, "inflateEnd");       /* get address of inflateEnd in libz */
res = dladdr(ha, &info);           /* get info for the function */
printf("%s\n", info.dli_fname);
return res;
}

and then indeed, I can grep, the .so file,
I'd rather have a saner way. Can the grepping be done already in C code? After all, we know the addresses...

@ivmai
Copy link
Owner

ivmai commented Feb 15, 2022

You can also write some test code using libgc.so and call GC_init, and then check the failure reason.

@dimpase
Copy link
Contributor

dimpase commented Feb 15, 2022

Sure, I can probably do this, except that I don't quite see what to check for. Something with the values of GC_data_start and data_end ?

@ivmai
Copy link
Owner

ivmai commented Feb 15, 2022

You can also write some test code using libgc.so and call GC_init, and then check the failure reason.

I mean libgc.so should crash in certain scenario (if linked with -Bsymbolic-functions). But this might be not guaranteed.
So, it looks like the test based grepping is better.

@ivmai
Copy link
Owner

ivmai commented Feb 15, 2022

Can the grepping be done already in C code?

You mean load libgc.do to memory and search there using some public function address as a hint. Yes, possible.

@dimpase
Copy link
Contributor

dimpase commented Feb 15, 2022

I think you could detect the absence of the fix by e.g.: grep "Wrong __data_start/_end pair" libgc.so

Unfortunately this appears to reject all the 8.0.4 installations of libgc, whether they are broken as in Ubuntu, or not.

@ivmai
Copy link
Owner

ivmai commented Feb 15, 2022

In a real world case: for the Inkscape appimage, libgc is mapped in an address before libinkscape_base.so, resulting in a crash at startup in GC_init_linux_data_start (ABORT_ARG2("Wrong __data_start/_end pair"...) if libgc is enabled

Maybe you could make a small reproducible test case and then insert it to configure?

@dimpase
Copy link
Contributor

dimpase commented Feb 15, 2022

I'll see what I can do.
Any idea how much data one needs to allocate to trigger GC_data_start > data_end ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants