Skip to content

Conversation

@definelicht
Copy link
Collaborator

Use disallowlists to deselect attributes on arguments and return values that are not yet supported, but allow the rest to be present when inlining.

To achieve fast lookup in isLegalToInline the sets are cached on the inliner interface object during construction. By using sets of StringAttr this should just amount to a pointer lookup.

@definelicht definelicht requested a review from gysit March 22, 2023 16:49
Copy link
Collaborator

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that works but I wonder what Alex or so says. I assume they probably don't like that the interface has state. But since we can recompute it when ever the interface is recreated it should be fine.

The solution with a thread_local static variable I proposed below may be nice as well (assuming it works :)). It initializes the set exactly once per thread. Not sure when the actual interface is instantiated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible alternative may be to have a static function with a static and thread local member:

static bool isDisallowedArgumentAttr(NamedAttribute attr) {
  thread_local DenseSet<StringAttr> disallowedAttrs = {
    StringAttr::get(attr.getContext(), LLVM::LLVMDialect::getAlignAttrName()),
    ...
  };
  return disallowedAttrs.contains(attr.getName());
}

That would result in one copy of the set per thread if I understand correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why thread_local and not just static?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the initialization of a static is not thread safe and it may happen in multiple inliner threads at the same time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static should be at load time, I think? Not really an expert here, to be honest

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reporting back: This does not seem to work; the variables are indeed only initialized once, but contains never finds anything, probably because they have been initialized with StringAttrs from a different context, and thus the pointer comparison fails. So I think this would only work by doing string comparisons. Hence the current solution seems better. But I understand the controversy... Unfortunately it seems to me like there's no way of doing this without keeping state on the specific dialect instantiation 🤕

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static should be at load time, I think? Not really an expert here, to be honest

in C++ a static variable is initialized when the function is called for the first time AFAIK.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in C++ a static variable is initialized when the function is called for the first time AFAIK.

Yes, apparently this is the case in C++11 and onwards. It's thread safe, so that's not a problem. The problem is that StringAttrs are tried to a specific context, and the context can change throughout execution, so the lookup does not work when a new object is instantiated with a different context, because it will not re-initialize the maps.

The current solution with having the maps on the object itself is the "correct" one as far as I can tell 🙂

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reporting back: This does not seem to work; the variables are indeed only initialized once, but contains never finds anything, probably because they have been initialized with StringAttrs from a different context, and thus the pointer comparison fails. So I think this would only work by doing string comparisons. Hence the current solution seems better. But I understand the controversy... Unfortunately it seems to me like there's no way of doing this without keeping state on the specific dialect instantiation 🤕

Ok that's a shame. We could probably refill the array once the context changes.

gysit pushed a commit that referenced this pull request Mar 23, 2023
This change prevents rare deadlocks observed for specific macOS/iOS GUI
applications which issue many `dlopen()` calls from multiple different
threads at startup and where TSan finds and reports a race during
startup.  Providing a reliable test for this has been deemed infeasible.

Although I've only observed this deadlock on Apple platforms,
conceptually the cause is not confined to Apple code so the fix lives in
platform-independent code.

Deadlock scenario:
```
Thread 2                    | Thread 4
ReportRace()                |
Lock internal TSan mutexes  |
  &ctx->slot_mtx            |
                            | dlopen() interceptor
                            | OnLibraryLoaded()
                            | MemoryMappingLayout::DumpListOfModules()
                            | calls dyld API, which takes internal lock
                            | lock() interceptor
                            | TSan tries to take internal mutexes again
                            |   &ctx->slot_mtx
call into symbolizer        |
MemoryMappingLayout::DumpListOfModules()
calls dyld API, which hangs on trying to take lock
```
Resulting in:
* Thread 2 has internal TSan mutex, blocked on dyld lock
* Thread 4 has dyld lock, blocked on internal TSan mutex

The fix prevents this situation by not intercepting any of the calls
originating from `MemoryMappingLayout::DumpListOfModules()`.

Stack traces for deadlock between ReportRace() and dlopen() interceptor:
```
thread #2, queue = 'com.apple.root.default-qos'
  frame #0: libsystem_kernel.dylib
  frame #1: libclang_rt.tsan_osx_dynamic.dylib`::wrap_os_unfair_lock_lock_with_options(lock=<unavailable>, options=<unavailable>) at tsan_interceptors_mac.cpp:306:3
  frame #2: dyld`dyld4::RuntimeLocks::withLoadersReadLock(this=0x000000016f21b1e0, work=0x00000001814523c0) block_pointer) at DyldRuntimeState.cpp:227:28
  frame #3: dyld`dyld4::APIs::_dyld_get_image_header(this=0x0000000101012a20, imageIndex=614) at DyldAPIs.cpp:240:11
  frame #4: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::CurrentImageHeader(this=<unavailable>) at sanitizer_procmaps_mac.cpp:391:35
  frame #5: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(this=0x000000016f2a2800, segment=0x000000016f2a2738) at sanitizer_procmaps_mac.cpp:397:51
  frame #6: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::DumpListOfModules(this=0x000000016f2a2800, modules=0x00000001011000a0) at sanitizer_procmaps_mac.cpp:460:10
  frame #7: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::ListOfModules::init(this=0x00000001011000a0) at sanitizer_mac.cpp:610:18
  frame #8: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::FindModuleForAddress(unsigned long) [inlined] __sanitizer::Symbolizer::RefreshModules(this=0x0000000101100078) at sanitizer_symbolizer_libcdep.cpp:185:12
  frame #9: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::FindModuleForAddress(this=0x0000000101100078, address=6465454512) at sanitizer_symbolizer_libcdep.cpp:204:5
  frame #10: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::SymbolizePC(this=0x0000000101100078, addr=6465454512) at sanitizer_symbolizer_libcdep.cpp:88:15
  frame #11: libclang_rt.tsan_osx_dynamic.dylib`__tsan::SymbolizeCode(addr=6465454512) at tsan_symbolize.cpp:106:35
  frame #12: libclang_rt.tsan_osx_dynamic.dylib`__tsan::SymbolizeStack(trace=StackTrace @ 0x0000600002d66d00) at tsan_rtl_report.cpp:112:28
  frame #13: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedReportBase::AddMemoryAccess(this=0x000000016f2a2a90, addr=4381057136, external_tag=<unavailable>, s=<unavailable>, tid=<unavailable>, stack=<unavailable>, mset=0x00000001012fc310) at tsan_rtl_report.cpp:190:16
  frame #14: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ReportRace(thr=0x00000001012fc000, shadow_mem=0x000008020a4340e0, cur=<unavailable>, old=<unavailable>, typ0=1) at tsan_rtl_report.cpp:795:9
  frame #15: libclang_rt.tsan_osx_dynamic.dylib`__tsan::DoReportRace(thr=0x00000001012fc000, shadow_mem=0x000008020a4340e0, cur=Shadow @ x22, old=Shadow @ 0x0000600002d6b4f0, typ=1) at tsan_rtl_access.cpp:166:3
  frame #16: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(void *) at tsan_rtl_access.cpp:220:5
  frame #17: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(void *) [inlined] __tsan::MemoryAccess(thr=0x00000001012fc000, pc=<unavailable>, addr=<unavailable>, size=8, typ=1) at tsan_rtl_access.cpp:442:3
  frame llvm#18: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(addr=<unavailable>) at tsan_interface.inc:34:3
  <call into TSan from from instrumented code>

thread #4, queue = 'com.apple.dock.fullscreen'
  frame #0:  libsystem_kernel.dylib
  frame #1:  libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::FutexWait(p=<unavailable>, cmp=<unavailable>) at sanitizer_mac.cpp:540:3
  frame #2:  libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Semaphore::Wait(this=<unavailable>) at sanitizer_mutex.cpp:35:7
  frame #3:  libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Mutex::Lock(this=0x0000000102992a80) at sanitizer_mutex.h:196:18
  frame #4:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=<unavailable>, mu=0x0000000102992a80) at sanitizer_mutex.h:383:10
  frame #5:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=<unavailable>, mu=0x0000000102992a80) at sanitizer_mutex.h:382:77
  frame #6:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() at tsan_rtl.h:708:10
  frame #7:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __tsan::TryTraceFunc(thr=0x000000010f084000, pc=0) at tsan_rtl.h:751:7
  frame #8:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __tsan::FuncExit(thr=0x000000010f084000) at tsan_rtl.h:798:7
  frame #9:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor(this=0x000000016f3ba280) at tsan_interceptors_posix.cpp:300:5
  frame #10: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor(this=<unavailable>) at tsan_interceptors_posix.cpp:293:41
  frame #11: libclang_rt.tsan_osx_dynamic.dylib`::wrap_os_unfair_lock_lock_with_options(lock=0x000000016f21b1e8, options=OS_UNFAIR_LOCK_NONE) at tsan_interceptors_mac.cpp:310:1
  frame #12: dyld`dyld4::RuntimeLocks::withLoadersReadLock(this=0x000000016f21b1e0, work=0x00000001814525d4) block_pointer) at DyldRuntimeState.cpp:227:28
  frame #13: dyld`dyld4::APIs::_dyld_get_image_vmaddr_slide(this=0x0000000101012a20, imageIndex=412) at DyldAPIs.cpp:273:11
  frame #14: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(__sanitizer::MemoryMappedSegment*) at sanitizer_procmaps_mac.cpp:286:17
  frame #15: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(this=0x000000016f3ba560, segment=0x000000016f3ba498) at sanitizer_procmaps_mac.cpp:432:15
  frame #16: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::DumpListOfModules(this=0x000000016f3ba560, modules=0x000000016f3ba618) at sanitizer_procmaps_mac.cpp:460:10
  frame #17: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::ListOfModules::init(this=0x000000016f3ba618) at sanitizer_mac.cpp:610:18
  frame llvm#18: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::LibIgnore::OnLibraryLoaded(this=0x0000000101f3aa40, name="<some library>") at sanitizer_libignore.cpp:54:11
  frame llvm#19: libclang_rt.tsan_osx_dynamic.dylib`::wrap_dlopen(filename="<some library>", flag=<unavailable>) at sanitizer_common_interceptors.inc:6466:3
  <library code>
```

rdar://106766395

Differential Revision: https://reviews.llvm.org/D146593
@definelicht definelicht requested a review from Dinistro March 23, 2023 13:53
@definelicht definelicht force-pushed the attr-disallowlist branch 4 times, most recently from d71d39e to 9a2d860 Compare March 23, 2023 15:08
@definelicht definelicht requested a review from gysit March 23, 2023 15:08
Copy link
Collaborator

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@definelicht definelicht force-pushed the attr-disallowlist branch 2 times, most recently from 645e83d to 1d8f095 Compare March 23, 2023 15:54
Use a disallowlist approach to deselect attributes on arguments and
return values that are not yet supported, but allow the rest to be
present when inlining.

To achieve fast lookup in `isLegalToInline`, the set of unsupported
function attributes is cached on the inliner interface object during
construction. By using a set of StringAttrs this should just amount to a
pointer lookup.

Depends on D146633

Differential Revision: https://reviews.llvm.org/D146729
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants