ASan needs to keep track of all the libraries loaded during the process lifetime #89

Closed
ramosian-glider opened this issue Aug 31, 2015 · 34 comments

Comments

@ramosian-glider
Member

Originally reported on Google Code with ID 89

In the following situation:

> malloc or free gets called from xyz.dylib
> xyz.dylib gets unloaded
> a bug happens and we want to report the stack trace of malloc/free which has xyz.dylib in it.

we need to restore the library layout at the stack collection time in order to symbolize
it correctly.

Possible solution:

> We keep an epoch counter that is incremented for each dlopen and 
> dlclose (we also write down the [un]loaded library and the slide value 
> each time we do that). For each stack we just sacrifice one frame to 
> keep the corresponding counter. When symbolizing, it's easy to replay 
> the sequence of dlopen/dlclose events and find out which libraries 
> were loaded.

Reported by ramosian.glider on 2012-07-18 09:21:08
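
A minimal sketch of the replay idea proposed above, in C. The `module_event` struct and the `module_at_epoch` function are hypothetical illustrations, not ASan internals. The sketch assumes every dlopen/dlclose appends one event (name, load address, size) to a log, and that each stack trace stores the number of events seen so far (the epoch) at collection time.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per dlopen/dlclose call; the index in the log is the epoch.
 * Hypothetical layout, for illustration only. */
typedef struct {
    int         is_load;  /* 1 = dlopen, 0 = dlclose */
    const char *name;     /* library path or name */
    uintptr_t   base;     /* load address (slide) */
    size_t      size;     /* size of the mapped image */
} module_event;

/* Replay events [0, epoch) and return the name of the module that
 * contained `pc` at that point in time, or NULL if none did. */
const char *module_at_epoch(const module_event *log, size_t epoch, uintptr_t pc)
{
    const char *found = NULL;
    for (size_t i = 0; i < epoch; i++) {
        const module_event *e = &log[i];
        int covers = pc >= e->base && pc - e->base < e->size;
        if (!covers)
            continue;
        /* A load makes the range valid; an unload invalidates it again. */
        found = e->is_load ? e->name : NULL;
    }
    return found;
}
```

With a log of "load xyz.dylib, unload xyz.dylib, load abc.dylib at the same address", the same program counter resolves to xyz.dylib at epoch 1, to nothing at epoch 2, and to abc.dylib at epoch 3.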

@ramosian-glider
Member Author

Reported by pbos@webrtc.org on 2015-04-23 09:21:22

  • Blocking: #3402

@obfuscated

Any chance for this getting fixed some day?

@google google deleted a comment from ramosian-glider Aug 10, 2017
@kcc
Contributor

kcc commented Aug 10, 2017

@obfuscated we are not working on this. What exactly do you need?

@obfuscated

@kcc I'm getting unknown modules lines in the callstacks printed by the leak report from asan.

Our application uses plugins (dlopened .so files on linux) quite extensively, so the leak reports aren't too useful. Also there are some leaks that I want to suppress, but I'm not sure I can, because of the lines.

I'm using clang-3.9.1, centos 6, linux.

@kcc
Contributor

kcc commented Aug 10, 2017

Understood.
Yes, this is the exact problem discussed here and no, we don't have plans to address it in near future, sorry.
The reasons are that a) this is not a very common use case among other users and b) the implementation is unlikely to be simple and we already have enough complexity to maintain.

I think there could also be a simple workaround on your side: don't dlclose anything when testing under asan/lsan

@obfuscated

OK, going with the no-dlclose workaround. I remembered that I've done this for valgrind, but there is another place in our code that does dlclose calls, so patching these resolved the problem.

I've stopped seeing the leaks for the global variables in dlopened shared libraries, but I guess this is expected.

@kcc
Contributor

kcc commented Aug 14, 2017

> I've stopped seeing the leaks for the global variables in dlopened shared libraries, but I guess this is expected.

Depends on what exactly you mean here

@obfuscated

I'm still investigating, so I'm not sure if it is a problem of the tool, my setup or real bug in the application.

@jteplitz

Is this still unlikely to be fixed? Given the prevalence of shared libraries in most production systems, it seems very likely that people will trip over this.

We're running into this problem working with the CUDA libraries for example. They have a leak in an internal driver library that they dlopen and dlclose themselves.

@kcc
Contributor

kcc commented Feb 16, 2018

We are not planning any work in this space; my previous comment (Aug 10, 2017) still holds.

If you have some CUDA-specific problem I suggest you open a separate issue -- we may be able to find a specialized solution.

@arr2036

arr2036 commented Apr 4, 2018

I just ran into this too. It's a common problem for different leak detectors. Our solution (like @kcc suggests and @obfuscated implemented) was to simply not dlclose() any handles when running under a leak detector.

For valgrind the solution was:

#ifdef HAVE_VALGRIND_H
#  include <valgrind.h>
#else
#  define RUNNING_ON_VALGRIND 0
#endif

static int fr_dlfree(dl_t *module)
{
	...
	/*
	 *	Only dlclose() handle if we're *NOT* running under valgrind
	 *	as it unloads the symbols valgrind needs.
	 */
	if (!RUNNING_ON_VALGRIND) dlclose(module->handle);
	...
}

There are a couple of approaches for LSan detection in this Stack Overflow post.

It would be nice if there were a similar function/macro available to the one valgrind provides, so that we could do this in a way that didn't rely on implementation details.
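
One way to approximate such a macro for ASan is compile-time detection, sketched below in C. It relies on GCC's `__SANITIZE_ADDRESS__` macro and Clang's `__has_feature(address_sanitizer)`; the names `RUNNING_ON_ASAN` and `plugin_close` are hypothetical. Note the assumption: this only tells you whether this translation unit was built with -fsanitize=address, so it would not notice a preloaded runtime or standalone LSan.

```c
#include <dlfcn.h>

/* Compile-time ASan detection: GCC defines __SANITIZE_ADDRESS__,
 * Clang exposes __has_feature(address_sanitizer). */
#if defined(__SANITIZE_ADDRESS__)
#  define RUNNING_ON_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define RUNNING_ON_ASAN 1
#  endif
#endif
#ifndef RUNNING_ON_ASAN
#  define RUNNING_ON_ASAN 0
#endif

/* Hypothetical wrapper: skip dlclose() in ASan builds so the module
 * stays mapped and its frames can still be symbolized at exit.
 * Returns 0 on success, like dlclose(). */
int plugin_close(void *handle)
{
    if (RUNNING_ON_ASAN)
        return 0;
    return dlclose(handle);
}
```

Call sites would then use `plugin_close(module->handle)` in place of a direct `dlclose()`, mirroring the `RUNNING_ON_VALGRIND` check above.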

@morehouse
Contributor

@kcc: If we still don't plan to fix this, can we close this bug?

@kcc
Contributor

kcc commented Jun 5, 2018

Not going to work on this any time soon. Closing for now, will reopen if there is
both high user pressure and resources on our side.

@kcc kcc closed this as completed Jun 5, 2018
@JDSteve

JDSteve commented Jun 18, 2018

I have this same problem in multiple projects and a good example of how to reproduce this problem was provided here https://stackoverflow.com/questions/44627258/addresssanitizer-and-loading-of-dynamic-libraries-at-runtime-unknown-module

Valgrind over the same code appears to be working fine for me, and tracking down these leaks with ASan isn't straightforward, so a fix for this dynamic linking issue to give better symbols would be appreciated.

@serboupal

Just in case it's helpful:
When this problem occurs to me I simply do a LD_PRELOAD of a fake dlclose that does nothing and then I don't have to fill my code with #ifdef.. etc.

#include <stdio.h>
int dlclose(void *handle) {
	;
}

LD_PRELOAD="/usr/lib/libasan.so ../fake-dlclose/dlclose.so" ./run

@ertheis

ertheis commented Sep 7, 2018

Just ran into this, mentioning it here in case resources ever become available for a fix

@davidcl

davidcl commented Feb 4, 2019

One option for getting a reliable stack trace on Linux is to load the library with RTLD_NODELETE, e.g. dlopen("foo.so", RTLD_NOW | RTLD_NODELETE). This keeps the library loaded at exit and lets ASan resolve symbols and report memory leaks correctly.
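
A minimal sketch of that approach in C, assuming the application has a single place where plugins are loaded. The helper name `load_plugin_nodelete` is hypothetical. `dlopen()` needs one of `RTLD_LAZY` or `RTLD_NOW` in addition to `RTLD_NODELETE`; `RTLD_NOW` is used here only as an example.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical loader helper: RTLD_NODELETE keeps the library mapped
 * even after a later dlclose(), so its frames remain resolvable when
 * the leak report is printed at exit. */
void *load_plugin_nodelete(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_NODELETE);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}
```

Existing `dlclose()` calls can stay as they are; with this flag they no longer unmap the library.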

@Timmmm

Timmmm commented Apr 18, 2019

@bungow Thanks that worked, but you might want to return 0;!

// clang++ compiles this as C++, so extern "C" is needed to keep the symbol name unmangled.
extern "C" int dlclose(void*) { return 0; }
clang++ --shared dlclose.c -o libdlclose.so
LD_PRELOAD="./libdlclose.so" ./my_command

Note the ./, otherwise it won't find it.

@Timmmm

Timmmm commented Apr 18, 2019

Also, since there is a decent workaround, perhaps at the point where it prints <unknown module> it could instead print <unknown module; see https://..../dlopen_workaround.html>?

starseeker added a commit to BRL-CAD/brlcad that referenced this issue Dec 8, 2021
Oof.  Writing down the details on this one, as I doubt I'll remember...

The start was turning on both the AddressSanitizer and Qt.  When doing
so, most programs suddenly began reporting memory leaks.  However, the
report was rather cryptic, being four entries similar to this one:

==1262202==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 21 byte(s) in 1 object(s) allocated from:
    #0 0x498087 in posix_memalign (build/src/libdm/tests/dm_test+0x498087)
    #1 0x7f4e90e53675 in alloc brlcad/src/libbu/malloc.c:129:10
    #2 0x7f4e90e53400 in bu_malloc brlcad/src/libbu/malloc.c:167:12
    #3 0x7f4e90ee1c02 in bu_strdupm brlcad/src/libbu/str.c:165:17
    #4 0x7f4e87dcd3e9  (<unknown module>)
    #5 0x7f4e87dcdea5  (<unknown module>)
    #6 0x7f4e9650eb89  (/lib64/ld-linux-x86-64.so.2+0x11b89)

I wasn't immediately sure if the unknown module error was related to the
plugin loading, and spent a lot of time trying to find a string memory
issue in dm_init.cpp.  After that proved fruitless, other than to
confirm that the error disappeared if I removed the bu_dlclose-ing of
the handles saved at initialization, I began to look for ways to decode
the "unknown module" entries.  That led to the following issue:
google/sanitizers#89 which describes the
problem ASan has with dynamic libraries.  Comments indicated that it
would see real issues, but won't report which dynamic library they come
from (and there are no plans to fix this anytime soon... grr...).  In
fairness, valgrind can see the error too but also has the same reporting
problem; it appears to be ubiquitous.

Fortunately, we have an alternative due to the way our plugin system and
test apps work - we can simply add and remove .so files to the directory
and see how the error reporting changes to zero in on which file(s) are
triggering the problem.  Doing so quickly made apparent what should have
been obvious in retrospect - two of the errors each were coming from the
Qt and swrast plugins, which had been off in earlier testing.

Since there was no backtrace beyond the bu_strdupm call itself, and
there were two errors per file, the suspect was the bu_strdup calls
initializing the "char *" names for the fb structures.  The C files use
static strings (which is why the non-Qt plugins didn't show the issue),
but C++ doesn't tolerate the type mismatch.  The original hack
workaround was just to bu_strdup and create a (char *) string, but as
the leak detectors correctly note this also means there's no way to
clean up the allocated memory.

As far as I can tell there is no reason for these strings to be editable
(char *) strings - the dm container's equivalents are static.  This
commit removes any logic assuming if_name is dynamic, and also removes
the bu_strdup hack from Qt and swrast.
@diorcety

diorcety commented Feb 3, 2022

Valgrind supports this feature since https://bugs.kde.org/show_bug.cgi?id=79362 with the parameter --keep-debuginfo=yes

kamil-holubicki added a commit to kamil-holubicki/wsrep-API that referenced this issue Apr 4, 2022
bill-torpey added a commit to nyfix/OpenMAMA that referenced this issue Oct 4, 2022
…nitizers#89)

- build for debug w/UBSAN to avoid ub being optimized away
bill-torpey added a commit to nyfix/OpenMAMA that referenced this issue Oct 6, 2022
…nitizers#89)

- build for debug w/UBSAN to avoid ub being optimized away

(cherry picked from commit c7868e8)
fquinner pushed a commit to finos/OpenMAMA that referenced this issue Oct 24, 2022
…nitizers#89)

- build for debug w/UBSAN to avoid ub being optimized away
@fergushenderson

I ran into this issue, and the work-arounds of not unloading the library didn't work for me,
since the leak was only detected when the library was unloaded.

FYI: here's links to the patches in Valgrind which support the equivalent feature in Valgrind:
https://sourceware.org/git/?p=valgrind.git;a=commit;h=cceed053ce876560b9a7512125dd93c7fa059778
https://sourceware.org/git/?p=valgrind.git;a=commit;h=f8ae2f95d6d717aa6d3923635b9f6f87af9b7cf1

bors bot added a commit to godot-rust/gdext that referenced this issue Feb 26, 2023
133: Fix use-after-free and 3 memory leaks; enforce AddressSanitizer in CI r=Bromeon a=Bromeon

Changes implementation of the `Gd` smart pointer to cache the instance ID in each object. To my knowledge, instance IDs are the only reliable way to check for object validity in Godot, as raw object pointers may become dangling.

In addition, this PR fixes 3 memory leaks around arrays and dictionaries. Those occurred due to `from_sys_init()` using `T::default()` in too many places. This essentially led to allocating a default-constructed object, which is then immediately overwritten by the `init` function, which _also_ allocates a new object. I refactored the `GodotFfi::from_sys_init()` to **never** call `default()`, and add an explicit `from_sys_init_default()` for cases where this is desired. This also cuts down on some of the boilerplate.

---

Both use-after-free and memory leaks were discovered using AddressSanitizer/LeakSanitizer. Great tooling from the C++ world, which also proves useful for us, as miri can't be used in FFI contexts. From now on, UB or leaks detected by ASan/LSan will cause a hard error in CI. The tools are not 100% bullet-proof; they didn't detect the following UAF case in a short test, but they are still of great value as a more systematic counter to memory errors.

<details><summary>use-after-free, false negative</summary>

```rs
let mut boks = Box::new(44);
let ptr = std::ptr::addr_of_mut!(*boks);
println!("deref={}", unsafe { *ptr }); // output: deref=44
drop(boks);
println!("deref={}", unsafe { *ptr }); // output: deref=1719666059
```

</details>

On a side note, getting these working correctly in CI was a bit of a marathon because ASan/LSan don't have stacktraces for dynamically loaded libraries (a known wontfix problem, see google/sanitizers#89). Additionally, false positives for memory leaks were reported: a simple `println!` would cause 1024 bytes of non-reclaimable memory. Therefore, I had to compile a special version of nightly Godot that disables dynamic library unloading via `dlclose`, to keep the stacktrace around, and this seemed to fix the false-positive issue as well. Although likely unrelated, what I also found during research was rust-lang/rust#19776.

Fixes #89.

Co-authored-by: Jan Haller <bromeon@gmail.com>
@0Xellos

0Xellos commented Sep 11, 2023

I have a specific case of this problem where all workarounds mentioned here fail: There's a Java connector for a native library libfoo.so loaded with JNA and I want to detect any leaks caused by libfoo while running tests for that connector (./gradlew build, it's a larger Java project by other people). The problem is in system library liblcms2.so.2, which seemingly unloads at random, giving some leaks which are nicely marked as caused by that library and some which are only marked as "unknown module". I know it's this library because if I don't suppress liblcms2, it's always the same leaks, just some/all/none become "unknown module" in repeated runs. I don't know how to fix it.

  • preloading a fake dlclose has no effect
  • preloading a library that dlopens liblcms2 with RTLD_NODELETE has no effect

Ideally, I'd like to suppress all leaks which don't involve libfoo, including those coming from "unknown modules" only. I haven't found a way to do that, though - leak suppressions don't offer something like negation.

The whole thing is running in an Ubuntu 18.04 Docker container. The full command is LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libasan.so.4" ASAN_OPTIONS="handle_segv=0" LSAN_OPTIONS="suppressions=lsan.supp" ./gradlew clean build, where the file lsan.supp contains

leak:libjvm
leak:libjli
leak:libz
leak:liblcms2
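
The same suppressions can also be compiled into a binary through LSan's `__lsan_default_suppressions` hook, sketched below in C with the patterns from the list above. This assumes the function is linked into the main executable or a library that is already present at sanitizer startup; it is uncertain whether that holds for the JNA setup described here, and it does not provide the negation ("everything except libfoo") asked for.

```c
/* Default LeakSanitizer suppressions, one "leak:<pattern>" per line.
 * LSan looks this symbol up when it initializes; the patterns below
 * are copied from the lsan.supp file in this comment. */
const char *__lsan_default_suppressions(void)
{
    return "leak:libjvm\n"
           "leak:libjli\n"
           "leak:libz\n"
           "leak:liblcms2\n";
}
```

This avoids having to pass LSAN_OPTIONS="suppressions=..." on every invocation, at the cost of rebuilding when the list changes.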

@ericriff

> Just in case it's helpful: When this problem occurs to me I simply do a LD_PRELOAD of a fake dlclose that does nothing and then I don't have to fill my code with #ifdef.. etc.
>
> #include <stdio.h>
> int dlclose(void *handle) {
> 	;
> }
>
> LD_PRELOAD="/usr/lib/libasan.so ../fake-dlclose/dlclose.so" ./run

Another option I found is to override dlopen() to inject RTLD_NODELETE.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

// Override dlopen() and inject RTLD_NODELETE so the library
// stays mapped after dlclose().
// This helps with ASan traces that show <unknown module>.
void* dlopen(const char* filename, int flags){
    typedef void* (*dlopen_t)(const char*, int);
    dlopen_t original_dlopen = (dlopen_t)dlsym(RTLD_NEXT, "dlopen");

    printf("Intercepted a dlopen call, injecting RTLD_NODELETE\n");
    flags |= RTLD_NODELETE;
    return original_dlopen(filename, flags);
}

Build with

gcc-10 -fpic --shared interceptor.c -o libinterceptor.so -ldl

Then pre-load it after asan. Asan must be loaded first. You'll have to adjust the following paths.

LD_PRELOAD=/lib/x86_64-linux-gnu/libasan.so.6.0.0:/home/eriff/dlopeninterceptor/libinterceptor.so ./app

@syzop

syzop commented Apr 19, 2024

@ericriff that may work for some people, but unfortunately making dlclose() a no-op, or not calling dlclose() at all, or using RTLD_NODELETE on dlopen() means we no longer see some of the memory leaks, as antoneliasson mentioned in #89 (comment).

The situation is unchanged. For us it is still a pain, because our software has 200+ dynamically loaded modules (.so) and when a (small) memory leak is reported, we have no idea which one is leaking.
