[question] (!asan_init_is_running && "ASan init calls itself!") #52728

Closed
FrancescElies opened this issue Dec 15, 2021 · 23 comments
Labels
compiler-rt:asan Address sanitizer worksforme Resolved as "works for me"

Comments

@FrancescElies

In Azure DevOps I get the following error:
AddressSanitizer: CHECK failed: asan_rtl.cpp:394 "((!asan_init_is_running && "ASan init calls itself!")) != (0)" (0x0, 0x0) (tid=5336).
Sadly I cannot provide you a link to the failing build.

I googled and found google/sanitizers#682, but so far no luck.

I checked where this comes from in the code. Is it some sort of protection against multiple threads?

The strange thing is that it works fine locally. Any ideas what I might be doing wrong?

@vitalybuka
Collaborator

Maybe you have multiple threads at this point?
Ideally asan is initialized from .preinit_array when there are no other threads.
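
To make the failing check easier to picture, here is a minimal Python sketch of a re-entrancy guard of the kind the error message suggests. It is not the compiler-rt code: the class `InitGuard` and its fields are hypothetical names chosen to echo `asan_init_is_running`. A flag is set while initialization runs, and any second entry before the first one finishes trips the check.

```python
class InitGuard:
    """Toy model of a one-time initializer that refuses to be re-entered."""

    def __init__(self):
        self.inited = False
        self.init_is_running = False

    def init(self, body):
        # Already initialized: nothing to do.
        if self.inited:
            return
        # A second entry while the first is still in progress is a bug.
        if self.init_is_running:
            raise RuntimeError("init calls itself!")
        self.init_is_running = True
        try:
            body()  # any call in here that re-enters init() trips the check
        finally:
            self.init_is_running = False
        self.inited = True


guard = InitGuard()
guard.init(lambda: None)  # normal case: completes and marks the guard initialized

nested = InitGuard()
try:
    # init is re-entered from inside its own body
    nested.init(lambda: nested.init(lambda: None))
    reentry_detected = False
except RuntimeError:
    reentry_detected = True
```

In this toy model the re-entry happens on a single thread; a second thread entering while the flag is set would hit the same check.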

@vitalybuka
Collaborator

what is the version?

@FrancescElies
Author

Thanks for your quick reply.

clang version 13.0.0
Target: x86_64-pc-windows-msvc
Thread model: posix

Our setup is a bit complex. We are loading a dll compiled with asan into
python and doing some testing there.

To my understanding one needs to make sure the asan magic is loaded first.
On linux one would use LD_PRELOAD to make sure this happens; on
Windows that's not possible as far as I know.

The workaround we arrived at after reading around:

  • Compile our dll linked with clang_rt.asan_dll_thunk-x86_64.lib
  • Build an executable linked with clang_rt.asan-x86_64.lib,
    clang_rt.asan_cxx-x86_64.lib and clang_rt.asan-preinit-x86_64.lib.
    This executable is now responsible for executing python. If I
    understand correctly, this way I make sure the asan stuff is loaded first.

I hope I am managing to explain this properly, I am still struggling to
digest this.

In the docs I could not find much info about the *.lib files mentioned
above, only scattered info in different blogs.

The strange thing is that it works locally but CI throws that error. I
am seeking to understand what the problem might be.

Here are the links I am basing my comments on.

@FrancescElies
Author

@vitalybuka two questions

  • I saw 11 days ago the following commit d71775c was done. Might this help me with my issue?

  • Where do I find the docs for what the following libs are and do?

    • linked into the dll
      • clang_rt.asan_dll_thunk-x86_64.lib: import library which allows an ASan-instrumented DLL to use the static ASan library that is linked into the main executable
    • linked into the main executable
      • clang_rt.asan-x86_64.lib: static runtime linked into the main executable
      • clang_rt.asan_cxx-x86_64.lib: import library which adds support for new and delete
      • clang_rt.asan-preinit-x86_64.lib: not sure what it does or whether I need it; at the moment I link it into the main executable

Are the previous descriptions I wrote correct?

@vitalybuka
Collaborator

vitalybuka commented Dec 20, 2021

> @vitalybuka two questions
>
>   • I saw 11 days ago the following commit d71775c was done. Might this help me with my issue?

Probably not.
SANITIZER_START_BACKGROUND_THREAD_IN_ASAN_INTERNAL is defined only for ARM THUMB.
I am aware of only a few ARM THUMB bots which need this.

>   • Where do I find the docs for what the following libs are and do?
>
>     • linked into the dll
>       • clang_rt.asan_dll_thunk-x86_64.lib import library which allows an ASAN instrumented DLL to use the static ASan library which is linked into the main executable.
>     • linked into the main executable
>       • clang_rt.asan-x86_64.lib: static runtime linked into the main executable
>       • clang_rt.asan_cxx-x86_64.lib: import library which adds support for new and delete
>       • clang_rt.asan-preinit-x86_64.lib: not sure what it does and if I needed, at the moment I linked it into the main executable

You don't need the last one; it's for asan as a shared lib. clang_rt.asan-x86_64.lib already includes it:

add_compiler_rt_runtime(clang_rt.asan

That's how clang links it:

const SanitizerArgs &SanArgs = TC.getSanitizerArgs(Args);

> Are the previous descriptions I wrote correct?

yes
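
The library split confirmed above can be written down as a small lookup. This is only a sketch of the thread's conclusion: the dictionary `ASAN_LIBS` and the helper `asan_libs_for` are hypothetical names, and it assumes the static-runtime setup discussed here, leaving out `clang_rt.asan-preinit` because the reply says it is not needed.

```python
# Which ASan runtime libraries go into which binary, per the discussion above.
ASAN_LIBS = {
    "dll": ["asan_dll_thunk"],    # instrumented DLL forwards to the runtime in the exe
    "exe": ["asan", "asan_cxx"],  # main executable carries the static runtime
}


def asan_libs_for(kind, arch="x86_64"):
    """Return the library file names for a 'dll' or 'exe' target."""
    return [f"clang_rt.{name}-{arch}.lib" for name in ASAN_LIBS[kind]]


print(asan_libs_for("dll"))
print(asan_libs_for("exe"))
```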

@FrancescElies
Author

Thanks for your reply. I hoped taking out clang_rt.asan-preinit-x86_64.lib might solve my issue, but it didn't.

Any suggestions on where I could look? I am pretty sure the code I am running is the same in CI and on our machines (using the same scripts), yet I might still be overlooking something in the way we run the code locally and in CI.

@vitalybuka
Collaborator

Maybe a full stack trace will help, but because it's init it likely can't produce stack traces.
So maybe a debugger? Can you connect with gdb on the CI host?
Another possibility is reproducing locally, using docker or chroot if necessary to match the CI environment.
Maybe take a core dump on the remote host and debug locally?

@FrancescElies
Author

@vitalybuka sorry for my late reply. As you suspected it can't produce a stack trace.

AddressSanitizer: CHECK failed: asan_rtl.cpp:394 "((!asan_init_is_running && "ASan init calls itself!")) != (0)" (0x0, 0x0) (tid=6768)
    <empty stack>
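
Since the runtime cannot print a stack here, the CHECK line is the only structured information in the CI log. Below is a small sketch for pulling its fields out, assuming the message keeps exactly the shape shown above; the regex and the helper `parse_check_failure` are illustrative and not part of any sanitizer tooling.

```python
import re

# Matches the single-line CHECK failure as it appears in the log above.
CHECK_RE = re.compile(
    r'AddressSanitizer: CHECK failed: (?P<file>[\w.]+):(?P<line>\d+) '
    r'"(?P<cond>.*)" \(0x[0-9a-f]+, 0x[0-9a-f]+\) \(tid=(?P<tid>\d+)\)'
)


def parse_check_failure(log_line):
    """Return file, line, condition and thread id, or None if no match."""
    m = CHECK_RE.search(log_line)
    if m is None:
        return None
    return {
        "file": m["file"],
        "line": int(m["line"]),
        "cond": m["cond"],
        "tid": int(m["tid"]),
    }


sample = (
    'AddressSanitizer: CHECK failed: asan_rtl.cpp:394 '
    '"((!asan_init_is_running && "ASan init calls itself!")) != (0)" '
    '(0x0, 0x0) (tid=6768)'
)
info = parse_check_failure(sample)
```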

Connecting a debugger is not really possible, since this is running on Azure DevOps MS-hosted machines. I will try to run it on a self-hosted machine, where I can do more.
Core dump: it seems that under Windows one could use procdump. I'll give it a try.

I'll write back as soon as I have more info.

@FrancescElies
Author

@vitalybuka I switched to an Azure DevOps self-hosted machine, and there AddressSanitizer does not complain. I am not sure what might be different there, nor whether this helps.

@EugeneZelenko added the compiler-rt:asan (Address sanitizer) label and removed the new issue label on Jan 15, 2022
@vitalybuka
Collaborator

My guess: the glibc version.

@vitalybuka
Collaborator

I guess it's not actionable with the current info, so I'll close.
If you can provide a reproducer, e.g. in docker or some other hermetic way, please comment and maybe we will find a fix.

@FrancescElies
Author

Fair request, I'll write back as soon as I get a reprex working.

Is glibc also being used in windows?

@vitalybuka
Collaborator

Windows should not use glibc.

@FrancescElies
Author

Hi @vitalybuka

I pulled together a reprex. A docker setup would probably be good for you to test locally, but I'm not sure that's possible for windows.

Summary

  • The code

  • CI steps to reproduce the issue

    • Build dll with asan (links clang_rt.asan_dll_thunk-x86_64.lib)
    • Build python runner (links clang_rt.asan-x86_64.lib & clang_rt.asan_cxx-x86_64.lib)
    • Call a python test file via the python runner. This acrobatics is needed because if I directly call a python test that loads a dll with asan, asan will complain about python doing a bad-free, since asan was loaded after python had already done some mallocs.
  • A failing build

    [image: failing build]

    Locally, though, I get the right output (local and CI have the same msvc version, 16.11.9).

    [image: local run]

  • The different compiler calls to build the dll with asan and the python test runner can be found here
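
The bad-free mentioned in the CI steps above can be illustrated with a toy model. This is a simplification, not how ASan is implemented: the class `TrackingAllocator` is hypothetical and only models the idea that an allocator which starts tracking late has no record of earlier allocations, so freeing them looks invalid.

```python
class TrackingAllocator:
    """Toy allocator that only knows about blocks it handed out itself."""

    def __init__(self):
        self._live = set()
        self._next = 0x1000

    def malloc(self):
        addr = self._next
        self._next += 0x10
        self._live.add(addr)
        return addr

    def free(self, addr):
        # A block the tracker never saw cannot be validated.
        if addr not in self._live:
            raise RuntimeError(f"bad-free: unknown address {addr:#x}")
        self._live.remove(addr)


early_block = 0x500  # stands for memory python allocated before the runtime loaded

tracker = TrackingAllocator()  # the runtime starts tracking only from here on
late_block = tracker.malloc()
tracker.free(late_block)       # fine: the tracker saw this allocation

try:
    tracker.free(early_block)  # unknown to the tracker, reported as a bad free
    bad_free = False
except RuntimeError:
    bad_free = True
```

Starting python from a runner that already contains the runtime corresponds to creating the tracker before any allocation happens.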

Repository

asan-init-calls-itself/src:
    __main__.py     -> builds mylib.dll and the python test runner
    mylib.c         -> a dead simple exported function
    mylib.dll       -> the dll with asan
    mylib.h         -> the header
    py_file_run.c   -> python runner c code, this is needed to make sure asan lib is loaded before python is started (no LD_PRELOAD in windows)
    py_file_run.exe -> the runner executable
    test.py         -> a simple test
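
For orientation, a test along the lines of `test.py` might load the DLL with `ctypes`. The thread does not show the file, so this is a guess at its shape: the exported function name `add` and its signature are hypothetical, and the call is only expected to work when the process was started through the runner so that the ASan runtime is already present.

```python
import ctypes
from pathlib import Path


def load_mylib(dll_path):
    """Load the ASan-instrumented DLL; raises OSError if it cannot be loaded."""
    return ctypes.CDLL(str(dll_path))


def call_exported(lib):
    # 'add' is a placeholder for whatever mylib.h actually exports.
    lib.add.argtypes = [ctypes.c_int, ctypes.c_int]
    lib.add.restype = ctypes.c_int
    return lib.add(1, 2)


dll = Path("mylib.dll")
if dll.exists():  # only attempt the call when the DLL has been built
    print(call_exported(load_mylib(dll)))
```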

Questions

  • I hope I managed to explain it properly; if not I am happy to clarify.
  • Is this example good enough? I am aware it is not a dead simple example, but I am willing to try to reduce it even more, maybe by removing compiler flags (though they are probably not the issue here).
  • If the example is good, how should we proceed?

Thanks!

@FrancescElies
Author

@vitalybuka
Could this problem have something to do with OS virtualization?

I could reproduce the issue in Azure DevOps and Appveyor (two different CI providers).

I can provide an appveyor machine that one can log into, e.g. this one, though by the time you read this the machine probably won't be available anymore.

Over there I can attach to the process with lldb. Any suggestions on what I should be looking for?

@vitalybuka
Collaborator

Jumping back and forth between issues, I missed that this is Windows.
If you can consistently reproduce, and this is a new thing, maybe you can git bisect to a particular patch?

@vitalybuka reopened this on Jan 27, 2022
@vitalybuka
Collaborator

lldb: symbolized stack trace could be nice, but you wrote that it's empty

@FrancescElies
Author

FrancescElies commented Jan 27, 2022

Sorry for the confusion, this is not what I meant. The stack trace was empty, but I had not yet attached with lldb (lldb does not work out of the box on windows when multiple python versions are installed; it needed a workaround).

This trace doesn't tell me much but hopefully it will help you. See below: it seems to encounter a problem during __sanitizer_install_malloc_and_free_hooks.

lldb output:

PS C:\projects\asan-init-calls-itself> .\scripts\lldb-start.ps1
(lldb) target create "./src/py_file_run.exe"
Current executable set to 'C:\projects\asan-init-calls-itself\src\py_file_run.exe' (x86_64).
(lldb) settings set -- target.run-args  "C:\\Python39-x64" ".\\test.py"
(lldb) r
Process 5996 launched: 'C:\projects\asan-init-calls-itself\src\py_file_run.exe' (x86_64)
Process 5996 stopped
* thread #1, stop reason = Exception 0x80000003 encountered at address 0x7ff7d007f426
    frame #0: 0x00007ff7d007f427 py_file_run.exe`__sanitizer_install_malloc_and_free_hooks + 12135
py_file_run.exe`__sanitizer_install_malloc_and_free_hooks:
->  0x7ff7d007f427 <+12135>: xorl   %eax, %eax
    0x7ff7d007f429 <+12137>: retq
    0x7ff7d007f42a <+12138>: nop
    0x7ff7d007f42c <+12140>: sbbl   %esp, %edi
(lldb) c
Process 5996 resuming
(lldb) Process 5996 stopped
* thread #1, stop reason = Exception 0x80000003 encountered at address 0x7ff7d007f426
    frame #0: 0x00007ff7d007f427 py_file_run.exe`__sanitizer_install_malloc_and_free_hooks + 12135
py_file_run.exe`__sanitizer_install_malloc_and_free_hooks:
->  0x7ff7d007f427 <+12135>: xorl   %eax, %eax
    0x7ff7d007f429 <+12137>: retq
    0x7ff7d007f42a <+12138>: nop
    0x7ff7d007f42c <+12140>: sbbl   %esp, %edi
(lldb) c
Process 5996 resuming
(lldb) Process 5996 stopped
* thread #1, stop reason = Exception 0x80000003 encountered at address 0x7ff7d007f426
    frame #0: 0x00007ff7d007f427 py_file_run.exe`__sanitizer_install_malloc_and_free_hooks + 12135
py_file_run.exe`__sanitizer_install_malloc_and_free_hooks:
->  0x7ff7d007f427 <+12135>: xorl   %eax, %eax
    0x7ff7d007f429 <+12137>: retq
    0x7ff7d007f42a <+12138>: nop
    0x7ff7d007f42c <+12140>: sbbl   %esp, %edi
(lldb) c
Process 5996 resuming
Process 5996 exited with status = 0 (0x00000000)
(lldb) c
error: Process must be launched.

@FrancescElies
Author

FrancescElies commented Jan 27, 2022

I suspect the issue might have to do with the OS version differing between CI and our local machines.

See the versions below.
💥 --> test always fails
✅ --> can't reproduce the issue, tests are ok.

Where                    OS Version      Result
CI - Azure DevOps 2019   10.0.17763.0    💥
CI - Azure DevOps 2022   10.0.20348      💥
CI - Appveyor            6.2.9200.0      💥
PC (a)                   10.0.18363.0    ✅
PC (b)                   10.0.19043.0    ✅

@FrancescElies
Author

FrancescElies commented Jan 31, 2022

ℹ I managed to get my hands on a 10.0.17763.0 machine locally (via hyper-v).

Over there I can run the tests without a problem, although Azure DevOps 2019 with the same version fails.

I will try my luck in actions/runner-images#4978

@FrancescElies
Author

@vitalybuka
Two questions:

  1. When we link clang_rt.asan_dll_thunk-x86_64.lib, clang_rt.asan-x86_64.lib & clang_rt.asan_cxx-x86_64.lib, should we link the libs provided by llvm (the ones in the folder lib/clang/13.0.0/lib/windows) or the ones msvc provides (e.g. C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\lib\x64\clang_rt.asan-x86_64.lib)?

  2. If we link those libs directly, do we still need to pass the -fsanitize=address flag to clang, or should we remove it in that case?

@FrancescElies
Author

It seems that passing the flags -g -gdwarf-4 has an impact on "asan init calls itself": in this build, without those flags, it fails, while in this other build, where I add them back, it runs successfully.

@FrancescElies
Author

The problem seems to be gone from the CI machines. I can't tell you why, but we don't care about this issue anymore, thus closing.

@vitalybuka thanks for your help

@EugeneZelenko added the worksforme (Resolved as "works for me") label on Jun 29, 2022