Reland [asan][windows] Eliminate the static asan runtime on windows #107899

Merged
merged 23 commits into from
Sep 9, 2024

barcharcraz
Contributor

This reapplies 8fa66c6 ([asan][windows] Eliminate the static asan runtime on windows) for a second time.

That PR bounced off the tests because it caused failures in the other sanitizer runtimes. These have been fixed by building only interception, sanitizer_common, and asan with /MD, and continuing to build the rest of the runtimes with /MT. This does mean that any use of the static ubsan/fuzzer/etc. runtimes mixes different runtime library linkages in the same app. The interception, sanitizer_common, and asan runtimes are designed for this, but it does result in some linker warnings.

Additionally, it turns out when building in release-mode with LLVM_ENABLE_PDBs the build system forced /OPT:ICF. This totally breaks asan's "new" method of doing "weak" functions on windows, and so /OPT:NOICF was explicitly added to asan's link flags.

barcharcraz and others added 23 commits September 6, 2024 11:00
The profiling runtime is still built with /MT, as it does not work with
/MD (and is not well supported on Windows).

Co-authored-by: Amy Wishnousky <amyw@microsoft.com>
Instead build only ONE asan runtime on windows.

File Overview:

* asan_malloc_win_thunk.cpp
	Contains interceptors for malloc/free for applications using the static CRT.
	These are intercepted with an oldnames style library that takes precedence
	over the CRT because of linker search order. This is used instead of
	the library interception routines used for other functions so that we
	can intercept mallocs that happen before our interceptors run. Some
	other CRT functions are also included here, because they are provided by the same
	objects as allocation functions in the debug CRT.
* asan_win_common_runtime_thunk.cpp
	just the common init code for both static and dynamic CRTs
* asan_win_static_runtime_thunk.cpp
	static CRT specific initialization
* asan_win_dynamic_runtime_thunk.cpp
	dynamic crt initialization, most of the content that was here
	has been moved to the common runtime thunk
* asan_win_dll_thunk.cpp
	This used to provide functionality to redirect
	calls from DLLs to asan instrumented functions in the main library,
	but this never worked that well and was a nightmare. It's gone now
* sanitizer_common/sanitizer_common_interface.inc:
	The added functions are for the thunks to be able to delegate to the
	asan runtime DLL in order to override functions that live in the application
	executable at initialization. The ASAN dll can't do this itself because it
	doesn't have a way to get the addresses of these functions.
* sanitizer_common/sanitizer_win_immortalize:
	This is just an implementation of call_once that doesn't require the CRT
	or C++ stdlib. We need this because we need to do this sort of thing
	before the CRT has initialized. This infrastructure is kinda ugly, we're sorry.
* sanitizer_common/sanitizer_win_interception.cpp:
	Provides the interface inside the sanitizer runtime DLL that instrumented apps
	call to intercept stuff.
* sanitizer_common/sanitizer_win_thunk_interception.cpp:
	This is the code to set up and run the static initializers and/or TLS
	initializers. It is implemented basically how any initializers are on Windows;
	these ones are registered before all the CRT initializers.
* sanitizer_common/sanitizer_win_thunk_interception.h
	INTERCEPT_LIBRARY_FUNCTION and REGISTER_WEAK_FUNCTION
	are the called initializers for each relevant function inside the instrumented
	binary. Note the optimizer is disabled for weak function registration routines
	because it detects that the two functions being compared have different names
	and deduces they must be different, so no actual codegen for the if is emitted,
	causing an infinite loop. Clang does this in debug mode as well as release mode,
	and the cast to uintptr_t is required to suppress it in debug mode.
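
The weak function registration described in the last item can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `register_weak`, `register_if_overridden`, `g_weak_overrides`, and the two `*_options_impl` functions are hypothetical stand-ins, and a `std::map` takes the place of whatever table the runtime DLL actually keeps. The point is the guard: the local function is registered only when its address differs from the default's, and the addresses are compared as integers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a weak function's signature.
using WeakFn = const char *(*)();

// Hypothetical table kept by the "runtime": name -> address of the override.
std::map<std::string, std::uintptr_t> g_weak_overrides;

// Hypothetical analogue of the runtime's registration entry point.
int register_weak(const char *name, std::uintptr_t override_addr) {
  assert(name != nullptr);
  g_weak_overrides[name] = override_addr;
  return 0;
}

// Default implementation, as the runtime DLL would provide it.
const char *default_options_impl() { return ""; }

// An override, as the instrumented application would provide it.
const char *user_options_impl() { return "verbosity=1"; }

// Register `local` only when it is not the default. The addresses are
// compared as integers, mirroring the uintptr_t cast mentioned above.
bool register_if_overridden(const char *name, WeakFn local, WeakFn fallback) {
  const auto local_addr = reinterpret_cast<std::uintptr_t>(local);
  const auto fallback_addr = reinterpret_cast<std::uintptr_t>(fallback);
  if (local_addr == fallback_addr)
    return false;  // same function: nothing to override
  register_weak(name, local_addr);
  return true;
}
```

Calling `register_if_overridden("opts", default_options_impl, default_options_impl)` leaves the table untouched, while passing `user_options_impl` as the local function records an override. If a compiler or linker decides the two addresses can never be equal, or folds them into one, the guard stops meaning what it says, which is the failure mode this PR works around.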

Co-Authored-By: Amy Wishnousky <amyw@microsoft.com>
Now that the static runtime is gone the required libraries are different.

Note that /MD or -D_DLL causes the dynamic runtime to get linked,
this is a little gross but emulates the behavior of MSVC.

"Real" msvc will link the debug build of asan if you specify /DEBUG or _DEBUG,
but that's not really necessary and requires building multiple
configurations from the same tree.
weak functions are registered after the asan runtime initializes.

This means __asan_default_options isn't available during asan
runtime initialization. Thus we split asan flag processing into
two parts and register a callback for the second part that's
executed after __asan_default_options is registered.
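
A minimal sketch of that two-part flag processing, assuming a toy flag set and a toy option parser. Everything here (`Flags`, `InitFlagsEarly`, `ApplyDefaultOptions`, `RegisterDefaultOptions`, `ExampleDefaultOptions`) is a hypothetical name chosen for illustration, not the runtime's real interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy flag set; the real runtime has many more flags.
struct Flags {
  int verbosity = 0;
};

using DefaultOptionsFn = const char *(*)();

Flags g_flags;
DefaultOptionsFn g_default_options = nullptr;
// Callbacks to run once the weak function has been registered.
std::vector<std::function<void()>> g_on_weak_registered;

// Part 2: apply the user's default options. Only valid after registration.
void ApplyDefaultOptions() {
  if (g_default_options == nullptr)
    return;
  const std::string opts = g_default_options();
  // Toy parsing: a substring check stands in for a real flag parser.
  if (opts.find("verbosity=1") != std::string::npos)
    g_flags.verbosity = 1;
}

// Part 1: runs during runtime initialization, before any weak function is
// registered, so it sets built-in defaults and defers the rest.
void InitFlagsEarly() {
  g_flags = Flags{};
  g_on_weak_registered.push_back([] { ApplyDefaultOptions(); });
}

// Called later, when the instrumented binary registers its weak function.
void RegisterDefaultOptions(DefaultOptionsFn fn) {
  assert(fn != nullptr);
  g_default_options = fn;
  for (auto &callback : g_on_weak_registered)
    callback();
  g_on_weak_registered.clear();
}

// Example user-provided default options.
const char *ExampleDefaultOptions() { return "verbosity=1"; }
```

After `InitFlagsEarly()` the flags hold only built-in defaults. Once `RegisterDefaultOptions(ExampleDefaultOptions)` runs, the deferred callback fires and `g_flags.verbosity` becomes 1.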

Co-Authored-By: Amy Wishnousky <amyw@microsoft.com>
ALLOCATION_FUNCTION_ATTRIBUTE wasn't used elsewhere, and was just one
attribute, so abstracting it through a macro wasn't doing much good now that it's
not conditional on runtime type.

We're always in the dynamic runtime now so eliminate the preprocessor conditional.

The new exported functions are the interface used by the intercepted malloc/free family
in the instrumented binary to call the asan versions inside the dll runtime.

Co-authored-by: Amy Wishnousky <amyw@microsoft.com>
…runtime thunks

and the fact we're "always" using the dynamic asan runtime.

python formatting
These test changes are separated from the test changes in
0360f32 because they require
various functional changes to asan.

This reverts commit f54e0b4.
… config

After the Windows static runtime is removed, these mean static = static CRT and dynamic = dynamic CRT, both using the dynamic asan runtime.
This is required because, now that there's only one asan runtime DLL,
the dynamic/static CRT wholearchived runtime thunk "flavor"
is determined by passing the /MD or /MDd options, or by defining -D_DLL.
…which is required for weak exports to function correctly
…ion, and asan. Keep the other sanitizers as /MT since they don't support OneDLL
@barcharcraz barcharcraz merged commit 53a81d4 into llvm:main Sep 9, 2024
10 checks passed
@llvm-ci
Collaborator

llvm-ci commented Sep 10, 2024

LLVM Buildbot has detected a new failure on builder bolt-aarch64-ubuntu-clang running on bolt-worker-aarch64 while building clang,compiler-rt at step 5 "build-clang-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/128/builds/609

Here is the relevant piece of the build log for the reference
Step 5 (build-clang-bolt) failure: build (failure)
...
BOLT-INFO: padding code to 0x16400000 to accommodate hot text
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0x190d88d4
BOLT-INFO: clear procedure is 0x190d74d0
BOLT-INFO: patched build-id (flipped last bit)
BOLT-INFO: setting __bolt_runtime_start to 0x190d8840
BOLT-INFO: setting __bolt_runtime_fini to 0x190d88d4
BOLT-INFO: setting __hot_start to 0x8c00000
BOLT-INFO: setting __hot_end to 0x162f67d4
558.554 [2/1/3040] Generating BOLT profile for Clang
-- Testing: 1 tests, 1 workers --
FAIL: Clang Perf Training :: cxx/hello_world.cpp (1 of 1)
******************** TEST 'Clang Perf Training :: cxx/hello_world.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/worker/buildbot-aarch64/bolt-aarch64-ubuntu-clang/build/bin/clang-bolt.inst --driver-mode=g++  -c /home/worker/buildbot-aarch64/llvm-project/clang/utils/perf-training/cxx/hello_world.cpp
# executed command: /home/worker/buildbot-aarch64/bolt-aarch64-ubuntu-clang/build/bin/clang-bolt.inst --driver-mode=g++ -c /home/worker/buildbot-aarch64/llvm-project/clang/utils/perf-training/cxx/hello_world.cpp
# .---command stderr------------
# | free(): invalid pointer
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
# | Stack dump:
# | 0.	Program arguments: /home/worker/buildbot-aarch64/bolt-aarch64-ubuntu-clang/build/bin/clang-bolt.inst --driver-mode=g++ -c /home/worker/buildbot-aarch64/llvm-project/clang/utils/perf-training/cxx/hello_world.cpp
# | 1.	<eof> parser at end of file
# | Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
# | 0  clang-bolt.inst 0x0000aaaacd7cc344 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 68
# | 1  clang-bolt.inst 0x0000aaaacd7c679c llvm::sys::RunSignalHandlers() + 156
# | 2  clang-bolt.inst 0x0000aaaacd61310c
# | 3  linux-vdso.so.1 0x0000ffff8341f7dc __kernel_rt_sigreturn + 0
# | 4  libc.so.6       0x0000ffff82f7f200
# | 5  libc.so.6       0x0000ffff82f3a67c raise + 28
# | 6  libc.so.6       0x0000ffff82f27130 abort + 228
# | 7  libc.so.6       0x0000ffff82f73308
# | 8  libc.so.6       0x0000ffff82f8957c
# | 9  libc.so.6       0x0000ffff82f8b2c4
# | 10 libc.so.6       0x0000ffff82f8dc84 free + 176
# | 11 clang-bolt.inst 0x0000aaaaccfc6264 llvm::MCContext::~MCContext() + 4552
# | 12 clang-bolt.inst 0x0000aaaacb9758e8
# | 13 clang-bolt.inst 0x0000aaaacc9928b0 llvm::PMTopLevelManager::~PMTopLevelManager() + 432
# | 14 clang-bolt.inst 0x0000aaaacc9932a0
# | 15 clang-bolt.inst 0x0000aaaacc98528c llvm::legacy::PassManager::~PassManager() + 132
# | 16 clang-bolt.inst 0x0000aaaaceb9266c clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) + 8236
# | 17 clang-bolt.inst 0x0000aaaaced85e44 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 2884
# | 18 clang-bolt.inst 0x0000aaaad37f570c clang::ParseAST(clang::Sema&, bool, bool) + 3212
# | 19 clang-bolt.inst 0x0000aaaacf6d9524 clang::FrontendAction::Execute() + 540
# | 20 clang-bolt.inst 0x0000aaaacf5b56f8 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 3128
# | 21 clang-bolt.inst 0x0000aaaacf8d05c8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 1400
# | 22 clang-bolt.inst 0x0000aaaaca61335c cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 16900

@mstorsjo
Member

This PR seems to have broken ASAN in mingw configurations. The symptoms seem to be that the ASAN DLL just locks up, hard, when the process is loaded. (Or more precisely, has entered some infinite loop.)

I tried attaching to such a hung process with windbg, and it's showing this:

Break-in sent, waiting 30 seconds...
WARNING: Break-in timed out, suspending.
         This is usually caused by another thread holding the loader lock.

And a backtrace that looks like this:

libclang_rt_asan_dynamic_x86_64!_asan_default_options__dll
libclang_rt_asan_dynamic_x86_64!_asan_default_options__dll+0xa17
libclang_rt_asan_dynamic_x86_64!_sanitizer_register_weak_function+0xd2
stacksmash_asan+0x1dad
stacksmash_asan+0x16b7
stacksmash_asan+0x20e3
ntdll!LdrpCallInitRoutine+0x6f
ntdll!...

(Side note; the PR seems to have added a bunch of new files with CRLF newlines - it'd be nice to clean this up.)

To repro the issue for yourself, you can do the following:

  1. Download and unzip https://github.com/mstorsjo/llvm-mingw/releases/download/20240903/llvm-mingw-20240903-ucrt-x86_64.zip, and add the llvm-mingw-<version>-ucrt-x86_64\bin directory to your %PATH% within a terminal
  2. Configure a build of compiler-rt with the following parameters: cmake ..\compiler-rt -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER_TARGET=x86_64-w64-windows-gnu -DCOMPILER_RT_DEFAULT_TARGET_ONLY=TRUE -DCOMPILER_RT_USE_BUILTINS_LIBRARY=TRUE -DSANITIZER_CXX_ABI=libc++ -DCMAKE_INSTALL_PREFIX=c:\code\llvm-mingw-20240903-ucrt-x86_64\lib\clang\19
  3. Build and install (on top of the newly unpacked toolchain), ninja install
  4. Copy the newly built asan DLL into the current directory, copy /y lib\windows\libclang_rt.asan_dynamic-x86_64.dll .
  5. Compile a trivial hello world, like https://github.com/mstorsjo/llvm-mingw/blob/master/test/hello.c (really any snippet will do), with asan clang hello.c -o hello.exe -fsanitize=address
  6. Try to run hello.exe, which hangs.

It's possible to reproduce the same by building and running the whole compiler-rt testsuite as well, but that's a bit trickier to set up for this configuration.

Surprising side note; when I tried the repro procedure above, building compiler-rt with a slightly older Clang release, an 18.x version, the built ASAN seems to not hang. I'm not sure if this is a functional difference, or if it just so happens to link things slightly differently so whatever issue there is seems to just not happen.

This commit is kinda complex, as it not only removes the static asan configuration (which never was involved in mingw use cases before), but I guess also sets things up so the dynamic asan can be used even when linking the CRT statically?

This issue has caused my nightly builds to start failing: https://github.com/mstorsjo/llvm-mingw/actions/runs/10803005560

@aeubanks
Contributor

I think this is also somehow affecting asan tests on macos: https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/2034/

I was able to reproduce this locally: build/cmake/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/asan-arm64-Darwin/allocator_returns_null.cpp fails at this commit and succeeds at the previous one.

@aeubanks
Contributor

previous passing output

    2: ================================================================= 
    3: ==7382==ERROR: AddressSanitizer: requested allocation size 0x10000000001 (0x10000001008 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0) 
    4:  #0 0x0001032a9268 in _Znwm+0x6c (libclang_rt.asan_osx_dynamic.dylib:arm64+0x61268) 
    5:  #1 0x000102cd775c in main allocator_returns_null.cpp:82 
    6:  #2 0x00018cc07150 (<unknown module>) 
    7:  #3 0x3a08fffffffffffc (<unknown module>) 
    8:  
    9: ==7382==HINT: if you don't care about these errors you may set allocator_may_return_null=1 
   10: SUMMARY: AddressSanitizer: allocation-size-too-big allocator_returns_null.cpp:82 in main 
   11: ==7382==ABORTING 

failing output with this PR

             2: ================================================================= 
             3: ==6635==ERROR: AddressSanitizer: requested allocation size 0x10000000001 (0x10000001008 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0) 
             4:  #0 0x0001029b60ec in malloc+0x70 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x520ec) 
             5:  #1 0x00018cf4abd0 in operator new(unsigned long)+0x1c (libc++abi.dylib:arm64+0x16bd0) 
             6:  #2 0xde678001023f375c (<unknown module>) 
             7:  #3 0x00018cc07150 (<unknown module>) 
             8:  #4 0x9c0d7ffffffffffc (<unknown module>) 
             9:  
            10: ==6635==HINT: if you don't care about these errors you may set allocator_may_return_null=1 
            11: SUMMARY: AddressSanitizer: allocation-size-too-big (libc++abi.dylib:arm64+0x16bd0) in operator new(unsigned long)+0x1c 
            12: ==6635==ABORTING 

@@ -136,7 +160,7 @@ append_list_if(MINGW "${MINGW_LIBRARIES}" ASAN_DYNAMIC_LIBS)
 add_compiler_rt_object_libraries(RTAsan_dynamic
   OS ${SANITIZER_COMMON_SUPPORTED_OS}
   ARCHS ${ASAN_SUPPORTED_ARCH}
-  SOURCES ${ASAN_SOURCES} ${ASAN_CXX_SOURCES}
+  SOURCES ${ASAN_SOURCES}

this line seems to be the problem, reverting it fixes the tests

This should be safe to revert, and I'll make a PR reverting it (and probably just commit it if it doesn't get any reviews before I go to sleep).

I've opened #108329 which should revert this particular edit (which was arguably a driveby change anyway)

@barcharcraz
Contributor Author

> This PR seems to have broken ASAN in mingw configurations. The symptoms seem to be that the ASAN DLL just locks up, hard, when the process is loaded. (Or more precisely, has entered some infinite loop.)

After investigating, it is indeed mangled code due to the UB in register_weak_. I had to step through from the very beginning of process init, as the debuginfo generated by -gcodeview doesn't seem to point to the right place. For __asan_default_options__dll, the codegen for the weak registration function looks as follows:

00007ff6`90501440 4883ec28                 sub     rsp, 28h
00007ff6`90501444 488d0d752c0000           lea     rcx, [hello!.refptr._newmode+0x38 (7ff6905040c0)]
00007ff6`9050144b 488d159e1f0000           lea     rdx, [hello!__asan_default_options__dll (7ff6905033f0)]
00007ff6`90501452 e8e9060000               call    hello!_ZN11__sanitizer13register_weakEPKcy (7ff690501b40)
00007ff6`90501457 89442424                 mov     dword ptr [rsp+24h], eax
00007ff6`9050145b 8b442424                 mov     eax, dword ptr [rsp+24h]
00007ff6`9050145f 4883c428                 add     rsp, 28h
00007ff6`90501463 c3                       ret     

Note that there is no comparison: the call to __sanitizer::register_weak is made unconditionally, even though in this case the local function should be the same as the default implementation from the DLL. Worse, it doesn't even seem to load the correct pointer for the local function, so it ends up "intercepting" some random memory!

I've fixed this in #108327

barcharcraz added a commit that referenced this pull request Sep 12, 2024
the new/delete code was removed from RTAsan_dynamic in
#107899, but that broke things
on macos. This reverts the offending change.
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
aeubanks added a commit that referenced this pull request Sep 12, 2024
Windows doesn't have a static runtime after #107899.
aarongable pushed a commit to chromium/chromium that referenced this pull request Sep 16, 2024
With llvm/llvm-project#107899, we need a different set of libraries. Handle both the old and new list for now.

Bug: 365980757, 343734021
Change-Id: I94e9e347698bed8040479bff91d25b40c801eeae
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5858358
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Arthur Eubanks <aeubanks@google.com>
Reviewed-by: Hans Wennborg <hans@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1355864}
@zmodem
Collaborator

zmodem commented Sep 18, 2024

After this change we're running into #34485 ("ASan strtol interceptor breaks errno on Windows"):

C:\src\chromium\src>type a.cc
#include <stdio.h>
#include <stdlib.h>

int main() {
  errno = 0;
  long result = strtol("2147483648", nullptr, 10);
  printf("errno: %d\n", errno);
  printf("result: %ld\n", result);
  return 0;
}

C:\src\chromium\src>"third_party\llvm-build\Release+Asserts\bin\clang-cl.exe" a.cc /Feout\Release\a.exe && out\Release\a.exe
errno: 34
result: 2147483647

C:\src\chromium\src>"third_party\llvm-build\Release+Asserts\bin\clang-cl.exe" -fsanitize=address a.cc /Feout\Release\a.exe && out\Release\a.exe
errno: 0
result: 2147483647

That's an old bug, but it wasn't a problem in Chromium until this runtime change.

It sounds like Visual Studio users also hit this before: https://developercommunity.visualstudio.com/t/Cannot-detect-strtol-range-errors-with-A/10412869 Do you know what the fix for that was?

barcharcraz added a commit that referenced this pull request Sep 19, 2024
(nfc) 

#107899 Added some files with
CRLF line endings. Mixed line endings are somewhat gross, so I've
converted them all to unix.
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
zmodem added a commit to zmodem/llvm-project that referenced this pull request Sep 20, 2024
This fixes two problems with asan's interception of `strtol` on Windows:

1. In the dynamic runtime, the `strtol` interceptor calls out to ntdll's
   `strtol` to perform the string conversion. Unfortunately, that
   function doesn't set `errno`. This has been a long-standing problem
   (llvm#34485), but it was not an issue when using the static runtime.
   After the static runtime was removed recently (llvm#107899), the problem
   became more urgent.

2. A module linked against the static CRT will have a different instance
   of `errno` than the ASan runtime, since that's now always linked
   against the dynamic CRT. That means even if the ASan runtime sets
   `errno` correctly, the calling module will not see it.

This patch fixes the first problem by making the `strtol` interceptor
call out to `strtoll` instead, and do 32-bit range checks on the result.

I can't think of any reasonable way to fix the second problem, so we
should stop intercepting `strtol` in the static runtime thunk. I checked
the list of functions in the thunk, and `strtol` and `strtoll` are the
only ones that set `errno`. (`strtoll` was already missing, probably by
mistake.)
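
The first fix can be sketched as below. This is a portable illustration rather than the interceptor from the patch: `strtol32` is a hypothetical helper that returns `std::int32_t` to model the 32-bit `long` of Windows, parses with `strtoll`, and applies the 32-bit range check itself.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parse with strtoll, then clamp to the 32-bit range
// that `long` has on Windows, reporting overflow through errno.
std::int32_t strtol32(const char *nptr, char **endptr, int base) {
  assert(nptr != nullptr);
  const long long wide = std::strtoll(nptr, endptr, base);
  if (wide > INT32_MAX) {
    errno = ERANGE;  // too large for a 32-bit long
    return INT32_MAX;
  }
  if (wide < INT32_MIN) {
    errno = ERANGE;  // too small for a 32-bit long
    return INT32_MIN;
  }
  return static_cast<std::int32_t>(wide);
}
```

With `errno` cleared beforehand, `strtol32("2147483648", nullptr, 10)` returns 2147483647 and leaves `errno` set to `ERANGE`, matching the non-ASan output reported earlier in this thread. Values that overflow `long long` are already clamped by `strtoll`, so they fall into the same branches.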
zmodem added a commit that referenced this pull request Sep 22, 2024
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Sep 26, 2024
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
MichelleCDjunaidi added a commit to MichelleCDjunaidi/llvm-project that referenced this pull request Oct 25, 2024
commit 56905dab7da50bccfcceaeb496b206ff476127e1
Author: JinjinLi868 <lijinjin.868@bytedance.com>
Date:   Tue Sep 10 10:47:33 2024 +0800

    [clang] fix half && bfloat16 convert node expr codegen (#89051)

    Data type conversion between fp16 and bf16 will generate fptrunc and
    fpextend nodes, but they are actually bitcast nodes.

commit ffcff4af59712792712b33648f8ea148b299c364
Author: Yingwei Zheng <dtcxzyw2333@gmail.com>
Date:   Tue Sep 10 10:38:21 2024 +0800

    [ValueTracking] Infer is-power-of-2 from assumptions. (#107745)

    This patch tries to infer is-power-of-2 from assumptions. I don't see
    that this kind of assumption exists in my dataset.
    Related issue: https://github.com/rust-lang/rust/issues/129795

    Close https://github.com/llvm/llvm-project/issues/58996.

commit eb0e4b1415800e34b86319ce1d57ad074d5ca202
Author: Petr Hosek <phosek@google.com>
Date:   Mon Sep 9 19:21:59 2024 -0700

    [Fuzzer] Passthrough zlib CMake paths into the test (#107926)

    We shouldn't assume that we're using system zlib installation.

commit 761bf333e378b52614cf36cd5db2837d5e4e0ae4
Author: Yuxuan Chen <ych@fb.com>
Date:   Mon Sep 9 18:57:39 2024 -0700

    [LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897)

    After landing https://github.com/llvm/llvm-project/pull/99285 we found
    that the call graph update was causing the following crash when
    expensive checks are turned on
    ```
    llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(TargetRC)) && "New call edge is not trivial!"' failed.
    ```
    I have to admit I believe that the call graph update process I did for
    that patch could be wrong.

    After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced
    that `CoroAnnotationElidePass` can be a FunctionPass and rely on the
    adaptor to update the call graph for us, so long as we properly
    invalidate the caller's analyses.

    After this patch,
    `llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer
    fails under expensive checks.

commit 7a8e9dfe5cc6f049f918e528ef476d9e7aada8a5
Author: Jordan Rupprecht <rupprecht@google.com>
Date:   Mon Sep 9 20:34:43 2024 -0500

    [bazel][libc][NFC] Add missing layering deps (#107947)

    After 277371943fa48f2550df02870951f5e5a77efef5

    e.g.

    ```
    external/llvm-project/libc/test/src/math/smoke/NextTowardTest.h:12:10: error: module llvm-project//libc/test/src/math/smoke:nexttowardf_test does not depend on a module exporting 'src/__support/CPP/bit.h'
    ```

commit 1ca411ca451e0e86caf9207779616f32ed9fd908
Author: wanglei <wanglei@loongson.cn>
Date:   Tue Sep 10 09:28:15 2024 +0800

    [LoongArch] Codegen for concat_vectors with LASX

    Fixes: #107355

    Reviewed By: SixWeining

    Pull Request: https://github.com/llvm/llvm-project/pull/107523

commit e64a1c00c1d612dccd976c06fdac85afa3b06fbe
Author: Mircea Trofin <mtrofin@google.com>
Date:   Mon Sep 9 18:25:50 2024 -0700

    Fix unintended extra commit in PR #107499

commit f7479b5ff43261a20258743da5fa583a0c729564
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 18:24:07 2024 -0700

    [NFC][TableGen] Simplify DirectiveEmitter using range for loops (#107909)

    Make constructors that take const Record * implicit, allowing us to
    simplify some range based loops to use that class instance as the loop
    variable.

    Change remaining constructor calls to use () instead of {} to construct
    objects.

commit a111f9119a5ec77c19a514ec09454218f739454f
Author: Yingwei Zheng <dtcxzyw2333@gmail.com>
Date:   Tue Sep 10 09:19:39 2024 +0800

    [LoongArch][ISel] Check the number of sign bits in `PatGprGpr_32` (#107432)

    After https://github.com/llvm/llvm-project/pull/92205, LoongArch ISel
    selects `div.w` for `trunc i64 (sdiv i64 3202030857, (sext i32 X to
    i64)) to i32`. It is incorrect since `3202030857` is not a signed 32-bit
    constant. It will produce wrong result when `X == 2`:
    https://alive2.llvm.org/ce/z/pzfGZZ
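
    As a rough illustration only (plain C++, not the ISel pattern itself; the
    helper names are made up for this sketch), the mismatch for `X == 2` can
    be reproduced by comparing a 64-bit divide followed by truncation against
    a 32-bit divide on a blindly truncated constant:
    ```cpp
    #include <cassert>
    #include <cstdint>

    // True if v is representable as a signed 32-bit integer.
    inline bool fitsInI32(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }

    // Reference semantics: trunc i64 (sdiv i64 c, (sext i32 x to i64)) to i32.
    inline int32_t div64ThenTrunc(int64_t c, int32_t x) {
      assert(x != 0 && "division by zero");
      return static_cast<int32_t>(c / static_cast<int64_t>(x));
    }

    // What a 32-bit divide computes if c is truncated without first checking
    // that it is already sign-extended from 32 bits. The narrowing cast wraps
    // modulo 2^32 (guaranteed since C++20, and the behaviour of mainstream
    // compilers before that). Callers must also avoid INT32_MIN / -1.
    inline int32_t truncThenDiv32(int64_t c, int32_t x) {
      assert(x != 0 && "division by zero");
      return static_cast<int32_t>(c) / x;
    }
    ```
    With `c = 3202030857` and `x = 2` the two helpers disagree, because `c`
    does not fit in a signed 32-bit integer; for constants where `fitsInI32`
    holds they agree.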

    This patch adds additional `sexti32` checks to operands of
    `PatGprGpr_32`.
    Alive2 proof: https://alive2.llvm.org/ce/z/AkH5Mp

    Fix #107414.

commit f3b4e47b34e59625e2c8420ce8bf789373177d6d
Author: Longsheng Mou <longshengmou@gmail.com>
Date:   Tue Sep 10 09:19:22 2024 +0800

    [mlir][linalg][NFC] Drop redundant rankReductionStrategy (#107875)

    This patch drops the redundant rankReductionStrategy in
    `populateFoldUnitExtentDimsViaSlicesPatterns` and fixes comment typos.

commit 3b2261809471a018de50e745c0d475b048c66fd4
Author: Mircea Trofin <mtrofin@google.com>
Date:   Mon Sep 9 18:16:24 2024 -0700

    [ctx_prof] Insert the ctx prof flattener after the module inliner (#107499)

    This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore.

    Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)

commit b0d2411b53a0b55baf6d6dc7986d285ce59807fa
Author: Alex MacLean <amaclean@nvidia.com>
Date:   Mon Sep 9 17:37:09 2024 -0700

    [NVPTX] Support copysign PTX instruction (#107800)

    Lower `fcopysign` SDNodes into `copysign` PTX instructions where
    possible. See [PTX ISA: 9.7.3.2. Floating Point Instructions: copysign]
    (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-copysign).

commit 81ef8e2fdbdfac4e186e12a874242b294d05d4e0
Author: Vitaly Buka <vitalybuka@google.com>
Date:   Mon Sep 9 17:00:06 2024 -0700

    [NFC][sanitizer] Extract GetDTLSRange (#107934)

commit ae02211eaef305f957b419e5c39499aa472b956e
Author: vporpo <vporpodas@google.com>
Date:   Mon Sep 9 16:52:54 2024 -0700

    [SandboxIR] Implement UndefValue (#107628)

    This patch implements sandboxir::UndefValue mirroring llvm::UndefValue.

commit 33c1325a73c4bf6bacdb865c2550038afe4377d2
Author: Anton Korobeynikov <anton@korobeynikov.info>
Date:   Mon Sep 9 16:34:41 2024 -0700

    [PAC] Make __is_function_overridden pauth-aware on ELF platforms (#107498)

    Apparently, there are two almost identical implementations: one for
    MachO and another one for ELF. The ELF bits somehow slipped while
    https://github.com/llvm/llvm-project/pull/84573 was reviewed.

    The particular implementation is identical to MachO case.

commit 88bd507dc2dd9c235b54d718cf84e4ef80d94bc9
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Mon Sep 9 11:07:38 2024 -0700

    [X86] Handle shifts + and in `LowerSELECTWithCmpZero`

    Shifts are handled the same way as sub, where rhs == 0 is the identity.
    `and` is the inverted case, where:
        `SELECT (AND(X,1) == 0), (AND Y, Z), Y`
            -> `(AND Y, (OR NEG(AND(X, 1)), Z))`
    with -1 as the identity.
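
    A quick way to convince oneself of the `and` case is a standalone C++
    sketch like the one below. It only models the arithmetic on 32-bit
    values; the function names are invented for this example and it is not
    the DAG lowering code.
    ```cpp
    #include <cassert>
    #include <cstdint>

    // SELECT (AND(X,1) == 0), (AND Y, Z), Y
    inline uint32_t selectForm(uint32_t x, uint32_t y, uint32_t z) {
      return (x & 1u) == 0 ? (y & z) : y;
    }

    // (AND Y, (OR NEG(AND(X, 1)), Z))
    inline uint32_t foldedForm(uint32_t x, uint32_t y, uint32_t z) {
      // NEG(AND(X, 1)) is 0 when the low bit is clear and all-ones (-1) when
      // it is set, so the OR yields either Z or the AND identity.
      const uint32_t mask = 0u - (x & 1u);
      return y & (mask | z);
    }

    // Exhaustively compares both forms on a small range of inputs.
    inline void checkSmallInputs() {
      for (uint32_t x = 0; x < 8; ++x)
        for (uint32_t y = 0; y < 16; ++y)
          for (uint32_t z = 0; z < 16; ++z)
            assert(selectForm(x, y, z) == foldedForm(x, y, z));
    }
    ```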

    Closes #107910

commit d148a1a40461ed27863f4b17ac2bd5914499f413
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Mon Sep 9 11:07:36 2024 -0700

    [X86] Add tests for shifts + and in `LowerSELECTWithCmpZero`; NFC

commit 26b786ae2f15bfbf6f0925856a788ae0bfb2f8c1
Author: Artem Belevich <tra@google.com>
Date:   Mon Sep 9 16:15:00 2024 -0700

    [NVPTX] Restrict combining to properly aligned v16i8 vectors. (#107919)

    Fixes generation of invalid loads leading to misaligned access errors.
    The bug got exposed by SLP vectorizer change ec360d6 which allowed SLP
    to produce `v16i8` vectors.

    Also updated the tests to use automatic check generator.

commit f12e10b513686a12f20f0c897dcc9ffc00cbce09
Author: vporpo <vporpodas@google.com>
Date:   Mon Sep 9 15:41:30 2024 -0700

    [SandboxVec] Implement Pass class (#107617)

    This patch implements the Pass base class and the FunctionPass sub-class
    that operate on Sandbox IR.

commit bdf02249e7f8f95177ff58c881caf219699acb98
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 14:33:21 2024 -0700

    [TableGen] Change CGIOperandList::OperandInfo::Rec to const pointer (#107858)

    Change CGIOperandList::OperandInfo::Rec and CGIOperandList::TheDef to
    const pointer.

    This is part of an effort to have better const correctness in TableGen
    backends:

    https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

commit a9a5a18a0e99b0251c0fe6ce61c5e699bf6b379b
Author: Tim Gymnich <tgymnich@icloud.com>
Date:   Mon Sep 9 23:27:27 2024 +0200

    [SPIRV] Add sign intrinsic part 1 (#101987)

    partially fixes #70078
    - Added `int_spv_sign` intrinsic in `IntrinsicsSPIRV.td`
    - Added lowering and map to `int_spv_sign` in
    `SPIRVInstructionSelector.cpp`
    - Added SPIR-V backend test case in
    `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/sign.ll`
    - https://github.com/llvm/llvm-project/pull/101988
    - https://github.com/llvm/llvm-project/pull/101989

commit 66e9078f827383f77c1c239f6c09f2b07a963649
Author: Steven Wu <stevenwu@apple.com>
Date:   Mon Sep 9 14:12:12 2024 -0700

    [LTO] Fix a use-after-free in legacy LTO C APIs (#107896)

    Fix a bug where `lto_runtime_lib_symbols_list` returns the address of a
    local variable that is freed when it goes out of scope. This is a
    regression from #98512, which rewrote the runtime libcall function
    lists into a SmallVector.
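
    For readers unfamiliar with this class of bug, a generic sketch of the
    pattern follows. It is not the actual LTO code: `buildList` and
    `symbolList` are hypothetical names, and static storage is just one
    possible way to keep the buffer alive after the call returns.
    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical stand-in for computing the list of symbol names.
    inline std::vector<const char *> buildList() { return {"memcpy", "memset"}; }

    // Buggy shape (left as a comment on purpose): the vector is destroyed on
    // return, so the returned pointer dangles.
    //   const char *const *bad(std::size_t *n) {
    //     auto v = buildList(); *n = v.size(); return v.data();
    //   }

    // Safe shape: the storage has static lifetime and outlives every call.
    inline const char *const *symbolList(std::size_t *n) {
      assert(n != nullptr);
      static const std::vector<const char *> list = buildList();
      *n = list.size();
      return list.data();
    }
    ```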

    rdar://135559037

commit d9a996020394a8181d17e4f0a0fc89d59371f9af
Author: ChiaHungDuan <chiahungduan@google.com>
Date:   Mon Sep 9 13:59:03 2024 -0700

    [scudo] Add fragmentation info for each memory group (#107475)

    This information helps with tuning the heuristic of selecting memory
    groups to release the unused pages.

commit 6f8d2781f604cfcf9ea6facecc0bea8e4d682e1e
Author: Sterling-Augustine <56981066+Sterling-Augustine@users.noreply.github.com>
Date:   Mon Sep 9 20:49:49 2024 +0000

    [SandboxIR] Add missing VectorType functions (#107650)

    Fills in many missing functions from VectorType

commit 53a81d4d26f0409de8a0655d7af90f2bea222a12
Author: Charlie Barto <chbarto@microsoft.com>
Date:   Mon Sep 9 13:41:08 2024 -0700

    Reland [asan][windows] Eliminate the static asan runtime on windows (#107899)

    This reapplies 8fa66c6ca7272268747835a0e86805307b62399c ([asan][windows]
    Eliminate the static asan runtime on windows) for a second time.

    That PR bounced off the tests because it caused failures in the other
    sanitizer runtimes. These have been fixed by building only interception,
    sanitizer_common, and asan with /MD, and continuing to build the rest of
    the runtimes with /MT. This does mean that any use of the static
    ubsan/fuzzer/etc. runtimes mixes different runtime library linkages in
    the same app. The interception, sanitizer_common, and asan runtimes are
    designed for this; however, it does result in some linker warnings.

    Additionally, it turns out when building in release-mode with
    LLVM_ENABLE_PDBs the build system forced /OPT:ICF. This totally breaks
    asan's "new" method of doing "weak" functions on windows, and so
    /OPT:NOICF was explicitly added to asan's link flags.

    ---------

    Co-authored-by: Amy Wishnousky <amyw@microsoft.com>

commit 34034381b7d54da864f8794f578d9c501d6d4f3b
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 9 21:35:59 2024 +0100

    [VPlan] Consistently use VTC for vector trip count in vplan-printing.ll.

    The inconsistency surfaced in
    https://github.com/llvm/llvm-project/pull/95305. Split off to reduce the
    diff.

commit 3f22756f391e20040fa3581206b77c409433bd9f
Author: Justin Bogner <mail@justinbogner.com>
Date:   Mon Sep 9 13:21:22 2024 -0700

    [DirectX] Lower `@llvm.dx.typedBufferLoad` to DXIL ops

    The `@llvm.dx.typedBufferLoad` intrinsic is lowered to `@dx.op.bufferLoad`.
    There's some complexity here in translating to scalarized IR, which I've
    abstracted out into a function that should be useful for samples, gathers, and
    CBuffer loads.

    I've also updated the DXILResources.rst docs to match what I'm doing here and
    the proposal in llvm/wg-hlsl#59. I've removed the content about stores and raw
    buffers for now with the expectation that it will be added along with the work.

    Note that this change includes a bit of a hack in how it deals with
    `getOverloadKind` for the `dx.ResRet` types - we need to adjust how we deal
    with operation overloads to generate a table directly rather than proxy through
    the OverloadKind enum, but that's left for a later change here.

    Part of #91367

    Pull Request: https://github.com/llvm/llvm-project/pull/104252

commit 985600dcd3fcef4095097bea5b556e84c8143a7f
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 13:09:53 2024 -0700

    [TableGen] Migrate CodeGenHWModes to use const RecordKeeper (#107851)

    Migrate CodeGenHWModes to use const RecordKeeper and const Record
    pointers.

    This is part of an effort to have better const correctness in TableGen
    backends:

    https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

commit b3d2d5039b9b8aa10a86c593387f200b15c02aef
Author: Alexey Bataev <a.bataev@outlook.com>
Date:   Mon Sep 9 12:32:45 2024 -0700

    [SLP][NFC]Reorder code for better structural complexity, NFC

commit e62bf7cd0beb530bc0842bb7aa8ff162607a82b9
Author: Sean Perry <perry@ca.ibm.com>
Date:   Mon Sep 9 15:24:16 2024 -0400

    [z/OS] Set the default arch for z/OS to be arch10 (#89854)

    The default arch level on z/OS is arch10. Update the code so z/OS has
    arch10 without changing the default for zLinux.

commit 98815f7878c3240e27f516e331255532087f5fcb
Author: c8ef <c8ef@outlook.com>
Date:   Tue Sep 10 03:13:29 2024 +0800

    [clang][docs] Add clang-tutor to External Clang Examples (#107665)

commit 3681d8552fb9e6cb15e9d45849ff2e34a25c518e
Author: Nikita Popov <nikita.ppv@gmail.com>
Date:   Mon Sep 9 21:10:12 2024 +0200

    Revert "[Clang][Sema] Use the correct lookup context when building overloaded 'operator->' in the current instantiation (#104458)"

    This reverts commit 3cdb30ebbc18fa894d3bd67aebcff76ce7c741ac.

    Breaks clang bootstrap.

commit ab82f83dae065a9aa4716618524eddf4aad5fcf0
Author: Mingming Liu <mingmingl@google.com>
Date:   Mon Sep 9 11:53:07 2024 -0700

    [LTO][NFC] Fix forward declaration (#107902)

    Fix after https://github.com/llvm/llvm-project/pull/107792

commit 6776d65ceaea84fe815845da3c41b2f1621521fb
Author: NoumanAmir-10xe <66777536+NoumanAmir657@users.noreply.github.com>
Date:   Mon Sep 9 23:49:22 2024 +0500

    [libc++] Implement LWG3953 (#107535)

    Closes #105303

commit eec1ee8ef10820c61c03b00b68d242d8c87d478a
Author: Abhina Sree <Abhina.Sreeskantharajan@ibm.com>
Date:   Mon Sep 9 14:37:53 2024 -0400

    [SystemZ][z/OS] Enable lit testing for z/OS (#107631)

    This patch fixes various errors to enable llvm-lit to run on z/OS

commit 78c1009c3e54e59b6177deb4d74dd3a3083a3f01
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 11:35:13 2024 -0700

    [NFC][TableGen] DirectiveEmitter code cleanup (#107775)

    Eliminate unnecessary llvm:: prefix as this code is in llvm namespace.
    Use ArrayRef<> instead of std::vector references when appropriate.
    Use .empty() instead of .size() == 0.

commit 99ea357f7b5e7e01e42b8d68dd211dc304b3115b
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 11:34:53 2024 -0700

    [MLGO] Fix logging verbosity in scripts (#107818)

    This patch fixes issues related to logging verbosity in the MLGO python
    scripts. This was an oversight when converting from absl.logging to the
    python logging API as absl natively supports a --verbosity flag to set
    the desired logging level. This patch adds a flag to support similar
    functionality in Python's logging library and additionally updates
    docstrings where relevant to point to the new values.

commit a7c26aaf2eca61cd5d885194872471c63d68f3bc
Author: Zequan Wu <zequanwu@google.com>
Date:   Mon Sep 9 11:34:13 2024 -0700

    Revert "[Coverage] Ignore unused functions if the count is 0." (#107901)

    Reverts llvm/llvm-project#107661

    Breaks llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp

commit 02fff933d0eff71db8ff44f4acf1641bb1ad4d38
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 18:28:23 2024 +0000

    [MLGO] Remove unused imports

    Remove unused imports from python files in the MLGO library.

commit 048e46ad53bedef076df868524f0a15eb7cbd38c
Author: Brian Cain <bcain@quicinc.com>
Date:   Mon Sep 9 13:27:13 2024 -0500

    [clang, hexagon] Update copyright, license text (#107161)

    When this file was first contributed - `28b01c59c93d ([hexagon] Add
    {hvx,}hexagon_{protos,circ_brev...}, 2021-06-30)` - I incorrectly
    included a QuIC copyright statement with "All rights reserved". I should
    have contributed this file with the `Apache+LLVM exception` license.

commit b1b9b7b853fc4301aedd9ad6b7c22b75f5546b94
Author: Eduard Satdarov <sath@yandex-team.ru>
Date:   Mon Sep 9 21:17:53 2024 +0300

    [libc++] Cache file attributes during directory iteration (#93316)

    This patch adds caching of file attributes during directory iteration
    on Windows. This improves the performance when working with files being
    iterated on in a directory.

commit 09b231cb38755e1bd122dbab9c57c4847bf64204
Author: Mingming Liu <mingmingl@google.com>
Date:   Mon Sep 9 11:16:58 2024 -0700

    Re-apply "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107792)

    Fix the use-after-free bug and re-apply
    https://github.com/llvm/llvm-project/pull/106193
    * Without the fix, the string referenced by `objSym.Name` could be
    destroyed even if string saver keeps a copy of the referenced string.
    This caused use-after-free.
    * The fix ([latest
    commit](https://github.com/llvm/llvm-project/pull/107792/commits/9776ed44cfb26172480145aed8f59ba78a6fa2ea))
    updates `objSym.Name` to reference (via `StringRef`) the string saver's
    copy.
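
    The lifetime issue can be sketched in isolation as below. This is an
    assumption-laden model, not lld code: `Saver`, `Sym`, and `internName`
    are hypothetical, with `std::string_view` standing in for `StringRef`
    and a `std::deque` standing in for the string saver.
    ```cpp
    #include <cassert>
    #include <deque>
    #include <memory>
    #include <string>
    #include <string_view>

    // Minimal "string saver": owns copies whose addresses stay stable,
    // because std::deque never relocates existing elements on push_back.
    class Saver {
      std::deque<std::string> storage_;

    public:
      std::string_view save(std::string_view s) {
        storage_.emplace_back(s);
        return storage_.back();
      }
    };

    struct Sym {
      std::string_view name; // non-owning reference, like StringRef
    };

    // Re-point the name at the saver-owned copy so it no longer depends on
    // the lifetime of the original buffer.
    inline void internName(Sym &sym, Saver &saver) { sym.name = saver.save(sym.name); }

    inline void demoInterning() {
      Saver saver;
      auto buffer = std::make_unique<std::string>("foo");
      Sym sym{*buffer};
      internName(sym, saver);
      buffer.reset();            // the original string is destroyed ...
      assert(sym.name == "foo"); // ... but the saver's copy is still valid
    }
    ```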

    Test:
    1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible
    with `-DLLVM_USE_SANITIZER=Address` and gone with the fix.
    2. Run all tests by following
    https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes.
    * Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at
    `@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined
    [here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30)
    * With the fix, the [multi-stage
    test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh)
    passes stage2 {asan, ubsan, masan}. This is also the test used by
    https://lab.llvm.org/buildbot/#/builders/169

    **Original commit message**

    `StringMap<T>` creates a [copy of the
    string](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMapEntry.h#L55-L58)
    for entry insertions and intentionally keeps copies [since the
    implementation optimizes string memory
    usage](https://github.com/llvm/llvm-project/blob/d4c519e7b2ac21350ec08b23eda44bf4a2d3c974/llvm/include/llvm/ADT/StringMap.h#L124).
    On the other hand, linker keeps copies of symbol names [1] in
    `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].

    This change proposes to optimize away string copies inside
    [LTO::GlobalResolutions](https://github.com/llvm/llvm-project/blob/24e791b4164986a1ca7776e3ae0292ef20d20c47/llvm/include/llvm/LTO/LTO.h#L409),
    which will make LTO indexing more memory efficient for ELF. There are
    similar opportunities for other (COFF, wasm, MachO) formats.

    The optimization takes place for lld (ELF) only. For the rest of use
    cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep
    copies and use global resolution key for de-duplication.

    Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
    more memory efficient, we see a ~20% peak memory usage reduction in a
    binary where peak memory usage needs to go down. Thanks to the
    optimization in
    https://github.com/llvm/llvm-project/commit/329ba523ccbbe68a12434926c92fd9a86494d958,
    the max (as opposed to the sum) of `ComputeCrossModuleImport` or
    `GlobalResolution` shows up in peak memory usage.
    * Regarding correctness, the set of
    [resolved](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/lib/LTO/LTO.cpp#L739)
    [per-module
    symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L188-L191)
    is a subset of
    [llvm::lto::InputFile::Symbols](https://github.com/llvm/llvm-project/blob/80c47ad3aec9d7f22e1b1bdc88960a91b66f89f1/llvm/include/llvm/LTO/LTO.h#L120).
    And bitcode symbol parsing saves symbol name when iterating
    `obj->symbols` in `BitcodeFile::parse` already. This change updates
    `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
    * Presumably the undefined symbols in an LTO unit (copied in this patch
    into the linker's unique saver) are a small set compared with the set of
    symbols
    worthwhile trade-off. Benchmarking this change alone shows measurable
    memory savings across various benchmarks.

    [1] ELF
    https://github.com/llvm/llvm-project/blob/1cea5c2138bef3d8fec75508df6dbb858e6e3560/lld/ELF/InputFiles.cpp#L1748
    [2]
    https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2863
    [3]
    https://github.com/llvm/llvm-project/blob/ef7b18a53c0d186dcda1e322be6035407fdedb55/lld/ELF/Driver.cpp#L2995

commit 277371943fa48f2550df02870951f5e5a77efef5
Author: lntue <35648136+lntue@users.noreply.github.com>
Date:   Mon Sep 9 14:15:46 2024 -0400

    [libc][bazel] Update bazel overlay for math functions and their tests. (#107862)

commit 4a501a4556bb191bd6eb5398a7330a28437e5087
Author: Artem Belevich <tra@google.com>
Date:   Mon Sep 9 11:14:41 2024 -0700

    [CUDA/HIP] propagate -cuid to a host-only compilation. (#107483)

    Right now we're bailing out too early, and `-cuid` does not get set for
    the host-only compilations.

commit 6850410562123b6e4fbb039e7ba4a2325b994b84
Author: Zequan Wu <zequanwu@google.com>
Date:   Mon Sep 9 11:14:21 2024 -0700

    [Coverage] Ignore unused functions if the count is 0. (#107661)

    Relax the condition to ignore the case when count is 0.

    This fixes a bug on
    https://github.com/llvm/llvm-project/commit/381e9d2386facea7f2acc0f8c16a6d0731267f80.
    This was reported at
    https://discourse.llvm.org/t/coverage-from-multiple-test-executables/81024/.

commit 5f74671c85877e03622e8d308aee15ed73ccee7c
Author: Tarun Prabhu <tarun@lanl.gov>
Date:   Mon Sep 9 12:10:16 2024 -0600

    [flang][Driver] Support -Xlinker in flang (#107472)

    Partially addresses: https://github.com/llvm/llvm-project/issues/89888

commit 0f349b7a9cde0080e626f6cfd362885341eb63b4
Author: Sarah Spall <spall@users.noreply.github.com>
Date:   Mon Sep 9 11:07:20 2024 -0700

    [HLSL] Implement support for HLSL intrinsic - select (#107129)

    Implement support for HLSL intrinsic select.
    This would close issue #75377

commit 34e3007c69eb91c16f23f20548305a2fb8feb75e
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 10:51:52 2024 -0700

    [ARM] Fix a warning

    This patch fixes:

      llvm/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h:214:5: error: default
      label in switch which covers all enumeration values
      [-Werror,-Wcovered-switch-default]

commit 6cc0138ca3dbdb21f4c4a5fa39cf05c38da4bb75
Author: Chris B <chris.bieneman@me.com>
Date:   Mon Sep 9 12:34:50 2024 -0500

    Fix implicit conversion rank ordering (#106811)

    DXC prefers dimension-preserving conversions over precision-losing
    conversions. This means a double4 -> float4 conversion is preferred over
    a double4 -> double3 or double4 -> double conversion.

commit cd8229bb4bfa4de45528ce101d9dceb9be8bff9e
Author: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
Date:   Mon Sep 9 10:32:35 2024 -0700

    [flang][cuda] Support c_devptr in c_f_pointer intrinsic (#107470)

    This is an extension of CUDA Fortran. The iso_c_binding intrinsic can
    accept a `TYPE(c_devptr)` as its first argument. This patch relaxes the
    semantic check to accept it and updates the lowering to unwrap the cptr
    field from the c_devptr.

commit 7543d09b852695187d08aa5d56d50016fea8f706
Author: Andrew Ng <andrew.ng@sony.com>
Date:   Mon Sep 9 18:18:41 2024 +0100

    [llvm-ml] Fix RIP-relative addressing for ptr operands (#107618)

    Fixes #54773

commit 7f90479b2300b3758fd90015a2e6e7e94cfcf1e7
Author: Leandro Lupori <leandro.lupori@linaro.org>
Date:   Mon Sep 9 14:09:45 2024 -0300

    [flang][OpenMP] Don't abort when default is used on an invalid directive (#107586)

    The previous assert was not considering programs with semantic errors.

    Fixes https://github.com/llvm/llvm-project/issues/107495
    Fixes https://github.com/llvm/llvm-project/issues/93437

commit 95831f012d76558fe78f5f3e71b1003a773384e5
Author: David Green <david.green@arm.com>
Date:   Mon Sep 9 18:04:38 2024 +0100

    [ARM] Add a default unreachable case to AddrModeToString. NFC

    Fixes #107739

commit c36c462cc719d47aa2408bca91a028300b2be6d4
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 09:44:37 2024 -0700

    [LTO] Simplify calculateCallGraphRoot (NFC) (#107765)

    The function returns an instance of FunctionSummary populated by
    calculateCallGraphRoot regardless of whether Edges is empty or not.

commit 7d371725cdf993d16f6debf74cf740c3aea84f9b
Author: Mingming Liu <mingmingl@google.com>
Date:   Mon Sep 9 09:43:47 2024 -0700

    [NFCI][BitcodeReader]Read real GUID from VI as opposed to storing it in map (#107735)

    Currently, `ValueIdToValueInfoMap` [1] stores `std::tuple<ValueInfo,
    GlobalValue::GUID /* original GUID */, GlobalValue::GUID /* real GUID*/
    >`. This change updates the stored value type to `std::pair<ValueInfo,
    GlobalValue::GUID /* original GUID */>`, and reads real GUID from
    ValueInfo.

    When an entry is inserted into `ValueIdToValueInfoMap`, ValueInfo is
    created or inserted using real GUID [2]. ValueInfo keeps a pointer to
    GlobalValueMap [3], using either `GUID` or `{GUID, Name}` [4] when
    reading per-module summaries to create a combined summary.

    [1] owned by per module-summary bitcode reader
    https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L947-L950
    [2]
    [first](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7130-L7133),
    [second](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7221-L7222),
    [third](https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7622-L7623)
    [3]
    https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1427-L1431
    [4]
    https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1631
    and
    https://github.com/llvm/llvm-project/blob/caebb4562ce634a22f7b13480b19cffc2a6a6730/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1621

    ---------

    Co-authored-by: Kazu Hirata <kazu@google.com>

commit 60f052edc66a5b5b346635656f231930c436a008
Author: Petr Hosek <phosek@google.com>
Date:   Mon Sep 9 09:43:02 2024 -0700

    [CMake] Passthrough variables for packages to subbuilds (#107611)

    These packages are imported by LLVMConfig.cmake and so we should be
    passing through the necessary variables from the parent build into the
    subbuilds.

    We use `CMAKE_CACHE_DEFAULT_ARGS` so subbuilds can override these
    variables if needed.

commit 5c8fd1eece8fff69871cef57a2363dc0f734a7d1
Author: Sam Clegg <sbc@chromium.org>
Date:   Mon Sep 9 09:28:08 2024 -0700

    [lld][WebAssembly] Fix use of uninitialized stack data with --wasm64 (#107780)

    In the case of `--wasm64` we were setting the type of the init expression
    to be 64-bit but were only setting the low 32-bits of the value (by
    assigning to Int32).
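
    The effect can be modelled outside lld with a small C++ sketch. Everything
    here is hypothetical (`Value`, `setInt32`, `setInt64`, `getInt64`), and
    the 0xAA fill only simulates stale stack contents so that the example
    itself has no undefined behaviour:
    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstring>

    // Stand-in for an init-expression value: 8 bytes of storage that start
    // out holding "whatever was on the stack" (simulated with 0xAA).
    struct Value {
      unsigned char bytes[8];
      Value() { std::memset(bytes, 0xAA, sizeof bytes); }
    };

    inline void setInt32(Value &v, int32_t x) { std::memcpy(v.bytes, &x, sizeof x); }
    inline void setInt64(Value &v, int64_t x) { std::memcpy(v.bytes, &x, sizeof x); }
    inline int64_t getInt64(const Value &v) {
      int64_t x;
      std::memcpy(&x, v.bytes, sizeof x);
      return x;
    }

    inline void demoInitExpr() {
      Value narrow, wide;
      setInt32(narrow, 1024); // only 4 of the 8 bytes are written
      setInt64(wide, 1024);   // all 8 bytes are written
      assert(getInt64(wide) == 1024);
      assert(getInt64(narrow) != 1024); // stale bytes leak into the 64-bit read
    }
    ```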

    Fixes: https://github.com/emscripten-core/emscripten/issues/22538

commit 95753ffa49f57c284a4682a8ca03e05d59f2c112
Author: LLVM GN Syncbot <llvmgnsyncbot@gmail.com>
Date:   Mon Sep 9 16:13:05 2024 +0000

    [gn build] Port ea2da571c761

commit db6051dae085c35020c1273ae8d38508c9958bc7
Author: Pavel Skripkin <paskripkin@gmail.com>
Date:   Mon Sep 9 19:12:38 2024 +0300

    [analyzer] fix crash on binding to symbolic region with `void *` type (#107572)

    As reported in
    https://github.com/llvm/llvm-project/pull/103714#issuecomment-2295769193.
    CSA crashes when trying to bind a value to a symbolic region of type
    `void *`. This happens when such a region gets passed as an inline asm
    input and the engine tries to bind `UnknownVal` to that region.

    Fix it by changing type from void to char before calling
    `GetElementZeroRegion`

commit 3cdb30ebbc18fa894d3bd67aebcff76ce7c741ac
Author: Krystian Stasiowski <sdkrystian@gmail.com>
Date:   Mon Sep 9 12:06:45 2024 -0400

    [Clang][Sema] Use the correct lookup context when building overloaded 'operator->' in the current instantiation (#104458)

    Currently, clang erroneously rejects the following:
    ```
    struct A
    {
        template<typename T>
        void f();
    };

    template<typename T>
    struct B
    {
        void g()
        {
            (*this)->template f<int>(); // error: no member named 'f' in 'B<T>'
        }

        A* operator->();
    };
    ```

    This happens because `Sema::ActOnStartCXXMemberReference` does not adjust the `ObjectType` parameter when `ObjectType` is a dependent type (except when the type is a `PointerType` and the class member access is the `->` form). Since the (possibly adjusted) `ObjectType` parameter (`B<T>` in the above example) is passed to `Parser::ParseOptionalCXXScopeSpecifier`, we end up looking up `f` in `B` rather than `A`.

    This patch fixes the issue by identifying cases where the type of the object expression `T` is a dependent, non-pointer type and:
    - `T` is the current instantiation and lookup for `operator->` finds a member of the current instantiation, or
    - `T` has at least one dependent base class, and `operator->` is not found in the current instantiation

    and using `ASTContext::DependentTy` as the type of the object expression when the optional _nested-name-specifier_ is parsed.

    Fixes #104268.

commit eba6160deec5a32e4b31c2a446172d0e388195c9
Author: Tarun Prabhu <tarun@lanl.gov>
Date:   Mon Sep 9 09:57:49 2024 -0600

    [flang][Driver] Support --no-warnings option (#107455)

    Because of the way visibility is implemented in Options.td, options that
    are aliases do not inherit the visibility of the option being aliased.
    Therefore, explicitly set the visibility of the alias to be the same as
    the aliased option.

    This partially addresses
    https://github.com/llvm/llvm-project/issues/89888

commit 914ab366c24cf494a798ce3a178686456731861a
Author: sstipanovic <146831748+sstipanovic@users.noreply.github.com>
Date:   Mon Sep 9 17:54:30 2024 +0200

    [AMDGPU] Overload image atomic swap to allow float as well. (#107283)

    LLPC can generate llvm.amdgcn.image.atomic.swap intrinsic with data
    argument as float type as well as float return type. This went unnoticed
    until CreateIntrinsic with implicit mangling was used.

commit ea2da571c761066542f8d2273933d2523279e631
Author: Tyler Nowicki <tyler.nowicki@amd.com>
Date:   Mon Sep 9 11:50:27 2024 -0400

    [Coroutines] Move the SuspendCrossingInfo analysis helper into its own header/source (#106306)

    * Move the SuspendCrossingInfo analysis helper into its own
    header/source

    See RFC for more info:
    https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057

    Co-authored-by: tnowicki <tnowicki.nowicki@amd.com>

commit 1651014960b90bd1398f61bec0866d4a187910ef
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 08:47:42 2024 -0700

    [TableGen] Change SetTheory set/vec to use const Record * (#107692)

    Change SetTheory::RecSet/RecVec to use const Record pointers.

commit e46f03bc31a61a903416f1d3c68063ab75aebe6e
Author: Teresa Johnson <tejohnson@google.com>
Date:   Mon Sep 9 08:17:41 2024 -0700

    [MemProf] Remove unnecessary data structure (NFC) (#107643)

    Recent change #106623 added the CallToFunc map, but I subsequently
    realized the same information is already available for the calls being
    examined in the StackIdToMatchingCalls map we're iterating through.

commit 86e5c5468ae3fcd65b23fd7b3cb0182e676829bd
Author: Nicolas van Kempen <nvankemp@gmail.com>
Date:   Mon Sep 9 11:15:28 2024 -0400

    [clang-tidy][run-clang-tidy] Fix minor shutdown noise (#105724)

    On my new machine, the script outputs some shutdown noise:
    ```
    Ctrl-C detected, goodbye.
    Traceback (most recent call last):
      File "/home/nvankempen/llvm-project/./clang-tools-extra/clang-tidy/tool/run-clang-tidy.py", line 626, in <module>
        asyncio.run(main())
      File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
        return loop.run_until_complete(main)
      File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
        self.run_forever()
      File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
        self._run_once()
      File "/usr/lib/python3.10/asyncio/base_events.py", line 1871, in _run_once
        event_list = self._selector.select(timeout)
      File "/usr/lib/python3.10/selectors.py", line 469, in select
        fd_event_list = self._selector.poll(timeout, max_ev)
    KeyboardInterrupt
    ```

    This fixes it. Also remove an unused typing import.
    Relevant documentation:
    https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption

commit 763bc9249cf0b7da421182e24716d9a569fb5184
Author: Jakub Kuderski <jakub@nod-labs.com>
Date:   Mon Sep 9 11:12:26 2024 -0400

    [mlir][amdgpu] Align Chipset with TargetParser (#107720)

    Update the Chipset struct to follow the `IsaVersion` definition from
    llvm's `TargetParser`. This is a follow up to
    https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012.

    * Add the stepping version. Note: This may break downstream code that
    compares against the minor version directly.
    * Use comparisons with full Chipset version where possible.

    Note that we can't use the code in `TargetParser` directly because the
    chipset utility is outside of `mlir/Target` that re-exports llvm's
    target library.
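
    As a rough illustration of why comparing the full version is safer than
    comparing the minor field alone, the sketch below orders a hypothetical
    `ChipsetSketch` struct lexicographically over (major, minor, stepping).
    The struct, its field names, and the helper functions are assumptions
    made for this example; they are not the actual MLIR `Chipset` API.

    ```cpp
    #include <cassert>
    #include <tuple>

    // Hypothetical stand-in for a chipset version; not the real MLIR type.
    struct ChipsetSketch {
      unsigned majorVersion;
      unsigned minorVersion;
      unsigned steppingVersion;
    };

    // Lexicographic order over (major, minor, stepping).
    inline bool operator<(const ChipsetSketch &a, const ChipsetSketch &b) {
      return std::tie(a.majorVersion, a.minorVersion, a.steppingVersion) <
             std::tie(b.majorVersion, b.minorVersion, b.steppingVersion);
    }

    // Full-version check: true when `c` is `minimum` or newer.
    inline bool isAtLeast(const ChipsetSketch &c, const ChipsetSketch &minimum) {
      return !(c < minimum);
    }

    // Minor-only check: it ignores the major and stepping fields entirely.
    inline bool minorIsAtLeast(const ChipsetSketch &c, unsigned minorVersion) {
      return c.minorVersion >= minorVersion;
    }

    // Small self-check showing where the two checks disagree.
    inline void checkChipsetSketch() {
      const ChipsetSketch older{9, 4, 0}, newer{10, 1, 0};
      assert(isAtLeast(newer, older));
      assert(!minorIsAtLeast(newer, older.minorVersion));
    }
    ```

    In this sketch a newer major version with a smaller minor number passes
    the full comparison but fails the minor-only one.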

commit 6cc3bf7d1d343f910b40cee24d4cda873a6ddd55
Author: Quinn Dawkins <quinn.dawkins@gmail.com>
Date:   Mon Sep 9 11:05:37 2024 -0400

    [mlir][tensor] Add canonicalization to fold consecutive tensor.pad ops (#107302)

    `tensor.pad(tensor.pad)` with the same constant padding value can be
    combined into a single pad that pads to the sum of the high and low
    padding amounts.
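
    The arithmetic behind this fold can be sketched outside of MLIR. The
    example below is a minimal model, assuming static, non-negative padding
    amounts and the same constant padding value in both pads; `PadSketch`,
    `paddedShape`, and `foldPads` are hypothetical names, not part of the
    tensor dialect.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical per-dimension padding amounts; not an MLIR data structure.
    struct PadSketch {
      std::vector<int64_t> low;
      std::vector<int64_t> high;
    };

    // Shape obtained by applying `pad` to `shape`.
    inline std::vector<int64_t> paddedShape(const std::vector<int64_t> &shape,
                                            const PadSketch &pad) {
      assert(shape.size() == pad.low.size() && shape.size() == pad.high.size());
      std::vector<int64_t> result;
      for (std::size_t i = 0; i < shape.size(); ++i)
        result.push_back(pad.low[i] + shape[i] + pad.high[i]);
      return result;
    }

    // Fold outer(inner(x)) into a single pad by summing the amounts. This
    // assumes both pads use the same constant padding value.
    inline PadSketch foldPads(const PadSketch &inner, const PadSketch &outer) {
      assert(inner.low.size() == inner.high.size() &&
             inner.low.size() == outer.low.size() &&
             inner.high.size() == outer.high.size());
      PadSketch folded;
      for (std::size_t i = 0; i < inner.low.size(); ++i) {
        folded.low.push_back(inner.low[i] + outer.low[i]);
        folded.high.push_back(inner.high[i] + outer.high[i]);
      }
      return folded;
    }
    ```

    Applying the two pads in sequence and applying the folded pad once give
    the same result shape in this model.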

commit ea9204505cf1099b98b1fdcb898f0bd35e463984
Author: Lei Huang <lei@ca.ibm.com>
Date:   Mon Sep 9 11:01:22 2024 -0400

    Fix codegen for transparent_union function params (#104816)

    Update codegen for func param with transparent_union attr to be that of
    the first union member.

    This is a followup to #101738 to fix non-ppc codegen and closes #76773.

commit 6634d44e5e6079e19efe54c2de35e2e63108b085
Author: Amy Wang <kai.ting.wang@huawei.com>
Date:   Mon Sep 9 10:57:13 2024 -0400

    [MLIR][Transform] Allow stateInitializer and stateExporter for applyTransforms (#101186)

    This is discussed in RFC:

    https://discourse.llvm.org/t/rfc-making-the-constructor-of-the-transformstate-class-protected/80377

commit 111932d5cae0199d9c59669b37232a011f8b8757
Author: Luke Lau <luke@igalia.com>
Date:   Mon Sep 9 22:45:44 2024 +0800

    [RISCV] Fix same mask vmerge peephole discarding false operand (#107827)

    This fixes the issue raised in
    https://github.com/llvm/llvm-project/pull/106108#discussion_r1749677510

    True's passthru needs to be equivalent to vmerge's false, but we also
    allow true's passthru to be undef.

    However if it's undef then we need to replace it with false, otherwise
    we end up discarding the false operand entirely.

    The changes in fixed-vectors-strided-load-store-asm.ll undo the changes
    in #106108 where we introduced this miscompile.

commit 2d338bed00b2bba713bceb4915400063b95929b2
Author: Tobias Stadler <mail@stadler-tobias.de>
Date:   Mon Sep 9 16:30:44 2024 +0200

    [CodeGen] Refactor DeadMIElim isDead and GISel isTriviallyDead (#105956)

    Merge GlobalISel's isTriviallyDead and DeadMachineInstructionElim's
    isDead code and remove all unnecessary checks from the hot path by
    looping over the operands before doing any other checks.

    See #105950 for why DeadMIElim needs to remove LIFETIME markers even
    though they probably shouldn't generally be considered dead.

    x86 CTMark O3: -0.1%
    AArch64 GlobalISel CTMark O0: -0.6%, O2: -0.2%

commit a2f659c1349cb70c09b183eb214e2a24cf04c2c6
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 07:15:12 2024 -0700

    [StructurizeCFG] Avoid repeated hash lookups (NFC) (#107797)

commit ab95ed5ce0b099913eb5c9b03fef7f322c24acd2
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 07:14:40 2024 -0700

    [IPO] Avoid repeated hash lookups (NFC) (#107796)

commit 3940a1ba1454afec916be86385bb2031526e3e13
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 07:13:52 2024 -0700

    [Float2Int] Avoid repeated hash lookups (NFC) (#107795)

commit 563dc226fe17f7638d02a957d1b2870dfa968f01
Author: Kazu Hirata <kazu@google.com>
Date:   Mon Sep 9 07:13:27 2024 -0700

    [Analysis] Avoid repeated hash lookups (NFC) (#107794)

commit 620b8d994b8abdcf31271d9f4db7e7422fc9bd65
Author: Samuel Thibault <samuel.thibault@ens-lyon.org>
Date:   Mon Sep 9 15:53:33 2024 +0200

    [hurd] Fix accessing f_type field of statvfs (#71851)

    f4719c4d2cda ("Add support for GNU Hurd in Path.inc and other places")
    made llvm use an internal __f_type name for the f_type field (which it
    is not supposed to since accessing double-underscore names is explicitly
    not supported by standards). In glibc 2.39 this field was renamed to
    f_type so application can now access the field as the standard says.

commit eaac4a26136ca8e3633bf91795343cd060d7af87
Author: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
Date:   Mon Sep 9 15:35:28 2024 +0200

    [AMDGPU] Document & Finalize GFX12 Memory Model (#98599)

    Documents the memory model implemented as of #98591, with some
    fixes/optimizations to the implementation.

commit 1a5a1e97817c9a3db4d1f9795789c99790cf88e2
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 9 14:26:08 2024 +0100

    [VPlan] Assert that VFxUF is always used.

    Add assertion to ensure invariant discussed in
    https://github.com/llvm/llvm-project/pull/95305.

commit 1f2a634c44dedef11f590956f297b2c7a1659fcf
Author: Sergey Kachkov <sergey.kachkov@syntacore.com>
Date:   Wed Sep 4 17:42:03 2024 +0300

    Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)

    Motivating example: https://godbolt.org/z/eb97zrxhx
    Here we have 2 induction variables in the loop: one corresponds to the
    i variable (add rdx, 4), the other to res (add rax, 2). The second
    induction variable can be removed by rewriteLoopExitValues() method
    (final value of res at loop exit is unroll_iter * -2); however, this
    doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
    ```
    ; Preheader:
    for.body.preheader.new:                           ; preds = %for.body.preheader
      %unroll_iter = and i64 %N, -4
      br label %for.body

    ; Loop:
    for.body:                                         ; preds = %for.body, %for.body.preheader.new
      %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
      %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
      %inc.3 = add nuw i64 %i.07, 4
      %lsr.iv.next = add nsw i64 %lsr.iv, -2
      %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
      br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

    ; Exit blocks
    for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
      %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
      %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
      %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
      br label %for.end.loopexit.unr-lcssa
    ```
    rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses:
    one in LCSSA phi node, the other - in induction phi node. Here we have 3
    uses of this value because of duplicated lcssa nodes, so the transform
    doesn't apply and leads to an extra add operation inside the loop. The
    proposed solution is to accumulate inserted instructions that will
    require LCSSA form update into SetVector and then call
    formLCSSAForInstructions for this SetVector once, so the same
    instructions are not processed twice.

    Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation
    when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are
    moved into separate duplicated-phis.ll test with explicit x86 target requirement
    to fix bots which are not building this target.
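
    The "accumulate first, process once" idea can be sketched with standard
    containers. This is only an illustration under assumptions:
    `OrderedSetSketch` is a simplified stand-in for llvm::SetVector, integer
    ids stand in for instructions, and `processOnce` is a hypothetical
    helper, not the LSR code.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <unordered_set>
    #include <vector>

    // Minimal insertion-ordered set; a simplified stand-in for llvm::SetVector.
    template <typename T> class OrderedSetSketch {
      std::vector<T> order;
      std::unordered_set<T> seen;

    public:
      // Returns true only the first time a value is inserted.
      bool insert(const T &value) {
        if (!seen.insert(value).second)
          return false;
        order.push_back(value);
        return true;
      }
      const std::vector<T> &items() const { return order; }
    };

    // Queue every request first, then visit each distinct item exactly once.
    inline std::vector<int> processOnce(const std::vector<int> &requests) {
      OrderedSetSketch<int> worklist;
      for (int id : requests)
        worklist.insert(id);
      std::vector<int> processed;
      for (int id : worklist.items())
        processed.push_back(id); // stands in for the per-instruction update
      return processed;
    }

    inline void checkProcessOnce() {
      assert((processOnce({7, 3, 7, 3}) == std::vector<int>{7, 3}));
    }
    ```

    Because duplicates are dropped at insertion time, an item requested
    several times is still handled a single time, in first-seen order.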

commit 17f0c5dfaab8bc72e19cb68e73b0944e5ee27b88
Author: Sergey Kachkov <sergey.kachkov@syntacore.com>
Date:   Fri Aug 30 16:00:42 2024 +0300

    [LSR][NFC] Add pre-commit test

commit aa158bf40285925d3c019d9e697cd2c88421297a
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 9 14:10:12 2024 +0100

    [LV] Update tests to replace some code with loop varying instructions.

    Update some tests with loop-invariant instructions, where hoisting them
    out of the loop changes the vectorization decision. This should preserve
    their original spirit when making further improvements.

commit e25eb1433110d94d16fd69e5aca9bdf72259263d
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 9 13:05:54 2024 +0100

    [ConstraintElim] Add tests for loops with chained header conditions.

commit 1199e5b9ce5a001445463ba8da1f70fa4558fbcc
Author: Nikita Popov <npopov@redhat.com>
Date:   Mon Sep 9 12:45:48 2024 +0200

    [MemCpyOpt] Add more tests for memcpy passed to readonly arg (NFC)

commit cf8fb4320f1be29c55909adf5ff8ad47e02b2dbe
Author: Momchil Velikov <momchil.velikov@arm.com>
Date:   Mon Sep 9 13:34:41 2024 +0100

    [AArch64] Implement NEON vamin/vamax intrinsics (#99041)

    This patch implements the intrinsics of the form

        floatNxM_t vamin[q]_fN(floatNxM_t vn, floatNxM_t vm);
        floatNxM_t vamax[q]_fN(floatNxM_t vn, floatNxM_t vm);

    as defined in https://github.com/ARM-software/acle/pull/324

    ---------

    Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>

commit 32cef07885e112d05bc2b1c285f40e353d80e18f
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 05:27:38 2024 -0700

    [LLDB][TableGen] Migrate lldb-tblgen to use const RecordKeeper (#107536)

    Migrate LLDB TableGen backend to use const RecordKeeper.

    This is a part of effort to have better const correctness in TableGen
    backends:

    https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

commit cca54e347ac34912cdfb9983533c61836db135e0
Author: Martin Storsjö <martin@martin.st>
Date:   Mon Sep 9 15:08:19 2024 +0300

    Revert "Reapply "[Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer" (#97308)"

    This reverts commit 45c8766973bb3bb73dd8d996231e114dcf45df9f
    and 049512e39d96995cb373a76cf2d009a86eaf3aab.

    This change triggers failed asserts on inputs like this:

        struct a {
        } constexpr b;
        class c {
        public:
          c(a);
        };
        class B {
        public:
          using d = int;
          struct e {
            enum { f } g;
            int h;
            c i;
            d j{};
          };
        };
        B::e k{B::e::f, int(), b};

    Compiled like this:

        clang -target x86_64-linux-gnu -c repro.cpp
        clang: ../../clang/lib/CodeGen/CGExpr.cpp:3105: clang::CodeGen::LValue
        clang::CodeGen::CodeGenFunction::EmitDeclRefLValue(const clang::DeclRefExpr*):
        Assertion `(ND->isUsed(false) || !isa<VarDecl>(ND) || E->isNonOdrUse() ||
        !E->getLocation().isValid()) && "Should not use decl without marking it used!"' failed.

commit 7a930ce327fdbc5c77b50ee6304645084100c037
Author: Jeremy Morse <jeremy.morse@sony.com>
Date:   Mon Sep 9 12:54:45 2024 +0100

    [DWARF] Emit a minimal line-table for totally empty functions (#107267)

    In degenerate but legal inputs, we can have functions that have no source
    locations at all -- all the DebugLocs attached to instructions are empty.
    LLVM didn't produce any source location for the function; with this patch
    it will at least emit the function-scope source location. Demonstrated by
    empty-line-info.ll

    The XCOFF test modified has similar symptoms -- with this patch, the size
    of the ".dwline" section grows a bit, thus shifting some of the file
    internal offsets, which I've updated.

commit 959d84044a70da08923fe221f999f4e406094ee9
Author: pvanhout <pierre.vanhoutryve@amd.com>
Date:   Mon Sep 9 13:50:48 2024 +0200

    [AMDGPU] Remove unused SplitGraph::Node::getFullCost

commit b8b8fbe19dea2825b801c4738ff78dbf26aae430
Author: Rahul Joshi <rjoshi@nvidia.com>
Date:   Mon Sep 9 04:18:55 2024 -0700

    [NFC][TableGen] Migrate LLVM Attribute Emitter to const RecordKeeper (#107698)

    Migrate LLVM Attribute Emitter to const RecordKeeper.

commit d84d9559bdc7aeb4ce14c251f6a3490c66db8d3a
Author: Nicolas van Kempen <nvankemp@gmail.com>
Date:   Mon Sep 9 07:12:46 2024 -0400

    [clang][analyzer] Fix #embed crash (#107764)

    Fix #107724.

commit 09c00b6f0463f6936be5d2100f9d47c0077700f8
Author: Benjamin Kramer <benny.kra@googlemail.com>
Date:   Mon Sep 9 13:03:38 2024 +0200

    [bazel] Add missing dependencies for 345cc47ba7a28811ae4ec7d113059ccb39c500a3

commit 049512e39d96995cb373a76cf2d009a86eaf3aab
Author: yronglin <yronglin777@gmail.com>
Date:   Mon Sep 9 19:01:11 2024 +0800

    [NFC][clang] Fix clang version in the test for the implementation of cwg1815 (#107838)

    This PR fixes the clang version in
    https://github.com/llvm/llvm-project/pull/97308 .

    Signed-off-by: yronglin <yronglin777@gmail.com>

commit 345cc47ba7a28811ae4ec7d113059ccb39c500a3
Author: Daniil Fukalov <dfukalov@gmail.com>
Date:   Mon Sep 9 12:44:03 2024 +0200

    [NFC] Add explicit #include llvm-config.h where its macros are used, lldb part. (#107603)

    (this is lldb part)

    Without these explicit includes, removing other headers that implicitly
    include llvm-config.h may have non-trivial side effects. For example,
    `clangd` may report even `llvm-config.h` as "not used" when it defines
    a macro that is explicitly used with #ifdef. The problem is amplified
    by different build configs, which use different sets of macros.

commit dbd81ba2e85c2f244f22c983d96a106eae65c06a
Author: Mikhail Goncharov <goncharov.mikhail@gmail.com>
Date:   Mon Sep 9 11:47:47 2024 +0200

    complete rename of __orc_rt namespace

    for 3e04ad428313dde40c779af6d675b162e150125e

    it's bizarre that none of the buildbots were broken, only the bazel build
    https://buildkite.com/llvm-project/upstream-bazel/builds/109623#0191d5d0-2b3e-4ee7-b8dd-1e2580977e9b

commit 663e9cec9c96169aa4e72ab9b6bf08b2d6603093
Author: Artem Kroviakov <71938912+akroviakov@users.noreply.github.com>
Date:   Mon Sep 9 11:49:16 2024 +0200

    [Func][GPU] Use SymbolUserOpInterface in func::ConstantOp  (#107748)

    This PR enables `func::ConstantOp` creation and usage for device
    functions inside GPU modules.
    The current main returns an error for referencing device functions via
    `func::ConstantOp`, because during the `ConstantOp` verification it only
    checks symbols in `ModuleOp` symbol table, which, of course, does not
    contain device functions that are defined in `GPUModuleOp`. This PR
    proposes a more general solution.

    Co-authored-by: Artem Kroviakov <artem.kroviakov@tum.de>

commit aa21ce4a792c170074193c32e8ba8dd35e57c628
Author: Jonas Rickert <Jonas.Rickert@amd.com>
Date:   Mon Sep 9 11:48:13 2024 +0200

    [mlir] Do not set lastToken in AsmParser's resetToken function and add a unit test for AsmParsers's locations (#105529)

    This changes the function `resetToken` to not update `lastToken`.

    The member `lastToken` is the last token that was consumed by the
    parser.
    Resetting the lexer position to a different position does not cause any
    token to be consumed, so `lastToken` should not be updated.
    Setting it to `curToken` can cause the scopeLoc.end location of
    `OperationDefinition` to be off-by-one, pointing to the
    first token after the operation.

    An example for an operation for which the scopeLoc.end location was
    wrong before is:
    ```
    %0 = torch.vtensor.literal(dense_resource<__elided__> : tensor<768xbf16>) : !torch.vtensor<[768],bf16>
    ```
    Here the scope end loc always pointed to the next token.

    This also adds a test for the Locations of `OperationDefinitions`.
    Without the change to `resetToken` the test fails, with the scope end
    location for `llvm.mlir.undef` pointing to the `func.return` in the next
    line.
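
    A toy model of the bookkeeping described above, using token indices
    instead of real tokens. `ParserSketch` and its members are hypothetical
    and much simpler than the MLIR parser; `resetTokenOld` models the
    earlier behavior of setting `lastToken` to `curToken`, as described in
    this message.

    ```cpp
    #include <cassert>
    #include <cstddef>

    // Toy parser state using token indices; a hypothetical model only.
    struct ParserSketch {
      std::size_t curToken = 0;  // next token to be consumed
      std::size_t lastToken = 0; // most recently consumed token

      void consumeToken() {
        lastToken = curToken;
        ++curToken;
      }

      // Earlier behavior: repositioning also overwrote lastToken.
      void resetTokenOld(std::size_t pos) {
        curToken = pos;
        lastToken = curToken;
      }

      // Behavior after the change: only the current position moves.
      void resetToken(std::size_t pos) { curToken = pos; }
    };

    // Tokens 0..2 form one operation; token 3 starts the next one.
    inline std::size_t lastTokenAfterReset(bool useOldBehavior) {
      ParserSketch parser;
      for (int i = 0; i < 3; ++i)
        parser.consumeToken();
      if (useOldBehavior)
        parser.resetTokenOld(3);
      else
        parser.resetToken(3);
      return parser.lastToken;
    }

    inline void checkResetToken() {
      assert(lastTokenAfterReset(true) == 3);  // one past the operation
      assert(lastTokenAfterReset(false) == 2); // last token of the operation
    }
    ```

    In this model the old reset leaves `lastToken` on the first token after
    the operation, which mirrors the off-by-one end location.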

commit b98aa6fb1d5f5fa904ce6d789a8fa4a245a90ee6
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Mon Sep 9 10:29:04 2024 +0100

    [X86] LowerABD - lower i8/i16 cases directly to CMOV(SUB(X,Y),SUB(Y,X)) pattern

    Better codegen (shorter dependency chain for better ILP) than via the TRUNC(ABS(SUB(EXT(LHS),EXT(RHS)))) expansion
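
    The equivalence of the two forms can be checked exhaustively for the
    unsigned 8-bit case. The sketch below is plain C++, not the X86
    lowering itself; the function names are made up for this example, and
    only the unsigned variant is modeled.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstdlib>

    // Reference form: widen, subtract, take the absolute value, truncate.
    inline uint8_t abdViaExtend(uint8_t x, uint8_t y) {
      int32_t diff = static_cast<int32_t>(x) - static_cast<int32_t>(y);
      return static_cast<uint8_t>(std::abs(diff));
    }

    // Select form: compute both narrow differences, pick one by comparison.
    inline uint8_t abdViaSelect(uint8_t x, uint8_t y) {
      uint8_t xy = static_cast<uint8_t>(x - y); // wraps modulo 256
      uint8_t yx = static_cast<uint8_t>(y - x);
      return x > y ? xy : yx;
    }

    // Exhaustively compare both forms over all 8-bit input pairs.
    inline bool abdFormsAgree() {
      for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
          if (abdViaExtend(static_cast<uint8_t>(x), static_cast<uint8_t>(y)) !=
              abdViaSelect(static_cast<uint8_t>(x), static_cast<uint8_t>(y)))
            return false;
      return true;
    }

    inline void checkAbdForms() { assert(abdFormsAgree()); }
    ```

    The two subtractions in the select form are independent of each other,
    which is where the shorter dependency chain comes from.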

commit d57be195e37f9c11a26e8e3fe8da5ef62bb921af
Author: Lukacma <Marian.Lukac@arm.com>
Date:   Mon Sep 9 10:28:01 2024 +0100

    [AArch64] replace SVE intrinsics with no active lanes with zero (#107413)

    This patch extends https://github.com/llvm/llvm-project/pull/73964 and
    optimises SVE intrinsics into zero constants when predicate is zero.

commit 476b1a661f6846537d232e9a3bc5a68c5f15efb3
Author: Jerry-Ge <jerry.ge@arm.com>
Date:   Mon Sep 9 02:26:39 2024 -0700

    [TOSA] Update input name for Sin and Cos operators (#107606)

    Update the dialect input names from input to input1 for Sin/Cos for
    consistency.

    Signed-off-by: Jerry Ge <jerry.ge@arm.com>

commit da11ede57d034767a6f5d5e211c06c1c3089d7fd
Author: vabridgers <58314289+vabridgers@users.noreply.github.com>
Date:   Mon Sep 9 03:47:39 2024 -0500

    [analyzer] Remove overzealous "No dispatcher registered" assertion (#107294)

    Random testing revealed it's possible to crash the analyzer with the
    command line invocation:

    clang -cc1 -analyze -analyzer-checker=nullability empty.c

    where the source file, empty.c is an empty source file.

    ```
    clang: <root>/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp:56:
       void clang::ento::CheckerManager::finishedCheckerRegistration():
         Assertion `Event.second.HasDispatcher && "No dispatcher registered for an event"' failed.

    PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/

    Stack dump:
    0.      Program arguments: clang -cc1 -analyze -analyzer-checker=nullability nullability-nocrash.c
     ...
                 clang::AnalyzerOptions&, clang::Preprocessor const&,
                 llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>,
                 std::allocator<char>>>, llvm::ArrayRef<std::function<void (clang::ento::CheckerRegistry&)>>)
    ```

    This commit removes the assertion which failed here, because it was
    logically incorrect: it required that if an Event is handled by some
    (enabled) checker, then there must be an **enabled** checker which can
    emit that kind of Event. It should be OK to disable the event-producing
    checkers but enable an event-consuming checker which has different
    responsibilities in addition to handling the events.

    Note that this assertion was in an `#ifndef NDEBUG` block, so this
    change does not impact the non-debug builds.

    Co-authored-by: Vince Bridgers <vince.a.bridgers@ericsson.com>

commit 04742f34b343af87dda93edacbb06f6e98a1d80f
Author: Nikita Popov <npopov@redhat.com>
Date:   Mon Sep 9 10:24:54 2024 +0200

    [SCCP] Add test for nonnull argument inference (NFC)

commit 3b1146e050657f40954e8e1f977837f884df2488
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 01:27:22 2024 -0700

    [llvm-exegesis] Use MCRegister instead of unsigned to hold registers (#107820)

commit 74ad2540523ec78122ba5a32e35e0b65ee27b7b3
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 08:10:11 2024 +0000

    [Github][MLGO] Fix mlgo-utils path in new-prs-labeler

    This patch (hopefully) fixes the mlgo-utils path in new-prs-labeler so
    that it actually matches all files in that directory. Currently it is
    not catching the files as they are relatively deeply nested within the
    folder.

commit 3e04ad428313dde40c779af6d675b162e150125e
Author: Lang Hames <lhames@gmail.com>
Date:   Mon Sep 9 17:59:47 2024 +1000

    [ORC-RT] Remove double underscore from the orc_rt namespace.

    We should use `orc_rt` as the public C++ API namespace for the ORC runtime and
    control symbol visibility to hide implementation details, rather than rely on
    the '__' prefix.

commit d5f6f30664ed53ef27d949fad0ce3994ea9988dd
Author: Aiden Grossman <aidengrossman@google.com>
Date:   Mon Sep 9 07:49:54 2024 +0000

    [MLGO] Add spaces at the end of lines in multiline string

    This patch adds spaces at the end of lines in multiline strings in the
    extract_ir script. Without this patch, the warning/info messages will be
    printed without spaces between words when there is a line break in the
    source which looks/reads weird.

commit 8549b324bc1f450f4477f46f18db67439dbf6d75
Author: Younan Zhang <zyn7109@gmail.com>
Date:   Mon Sep 9 15:09:43 2024 +0800

    [Clang] Don't assert non-empty packs for FunctionParmPackExprs (#107561)

    `FunctionParmPackExpr`s are peculiar in that they have to be of
    unexpanded dependency while they don't introduce any unexpanded packs.
    So this patch rules them out in the non-empty pack assertion in
    `DiagnoseUnexpandedParameterPack()`.

    There was a fix #69224, but that turned out to be insufficient.

    I also moved the separate tests to a pre-existing file.

    Fixes https://github.com/llvm/llvm-project/issues/86361

commit 022b3c27e27832f27c61683095899227c26e0cca
Author: Piyou Chen <piyou.chen@sifive.com>
Date:   Mon Sep 9 15:07:39 2024 +0800

    [Clang][RISCV] Recognize unsupport target feature by supporting isValidFeatureName (#106495)

    This patch makes unsupported target attributes emit a warning and ignore
    the target attribute during semantic checks. The changes include:

    1. Adding the RISCVTargetInfo::isValidFeatureName function.
    2. Rejecting non-full-arch strings in the handleFullArchString function.
    3. Adding test cases to demonstrate the warning behavior.

commit 9347b66cfcd9acf84dbbd500b6344041c587f6a9
Author: Pierre van Houtryve <pierre.vanhoutryve@amd.com>
Date:   Mon Sep 9 09:06:34 2024 +0200

    Reland "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)" (#107076)

    Relands #104763 with
    - Fixes for EXPENSIVE_CHECKS test failure (due to sorting operator
    failing if the input is shuffled first)
     - Fix for broken proposal selection
     - c3cb27370af40e491446164840766478d3258429 included

    Original commit description below
    ---

    Major rewrite of the AMDGPUSplitModule pass in order to better support
    it long-term.

    Highlights:
    - Removal of the "SML" logging system in favor of just using CL options
    and LLVM_DEBUG, like any other pass in LLVM.
    - The SML system started from good intentions, but it was too flawed and
    messy to be of any real use. It was also a real pain to use and made the
    code more annoying to maintain.
     - Graph-based module representation with DOTGraph printing support
    - The graph represents the module accurately, with bidirectional, typed
    edges between nodes (a node usually represents one function).
    - Nodes are assigned IDs starting from 0, which allows us to represent a
    set of nodes as a BitVector. This makes comparing 2 sets of nodes to
    find common dependencies a trivial task. Merging two clusters of nodes
    together is also really trivial.
     - No more defaulting to "P0" for external calls
    - Roots that can reach non-copyable dependencies (such as external
    calls) are now grouped together in a single "cluster" that can go into
    any partition.
     - No more defaulting to "P0" for indirect calls
    - New representation for module splitting proposals that can be graded
    and compared.
    - Graph-search algorithm that can explore multiple branches/assignments
    for a cluster of functions, up to a maximum depth.
    - With the default max depth of 8, we can create up to 256 propositions
    to try and find the best one.
    - We can still fall back to a greedy approach upon reaching max depth.
    That greedy approach uses almost identical heuristics to the previous
    version of the pass.

    All of this gives us a lot of room to experiment with new heuristics or
    even entirely different splitting strategies if we need to. For
    instance, the graph representation has room for abstract nodes, e.g. if
    we need to represent some global variables or external constraints. We
    could also introduce more edge types to model other types of relations
    between nodes, etc.

    I also designed the graph representation & the splitting strategies to
    be as fast as possible, and it seems to have paid off. Some quick tests
    showed that we spend pretty much all of our time in the CloneModule
    function, with the actual splitting logic being <1% of the runtime.
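
    To make the bit-set idea from the highlights concrete, here is a small
    sketch that uses std::bitset as a stand-in for llvm::BitVector. The
    capacity, type alias, and function names are hypothetical. The
    `maxProposals` helper assumes each search level doubles the number of
    candidates, which matches the 256 figure for depth 8 but is an
    assumption of this example.

    ```cpp
    #include <bitset>
    #include <cassert>
    #include <cstddef>

    constexpr std::size_t kMaxNodesSketch = 64; // hypothetical capacity
    using NodeSetSketch = std::bitset<kMaxNodesSketch>;

    // Nodes present in both sets: a single bitwise AND.
    inline NodeSetSketch commonDependencies(const NodeSetSketch &a,
                                            const NodeSetSketch &b) {
      return a & b;
    }

    // Merging two clusters: a single bitwise OR.
    inline NodeSetSketch mergeClusters(const NodeSetSketch &a,
                                       const NodeSetSketch &b) {
      return a | b;
    }

    // Upper bound on proposals when every level doubles the candidates.
    // Only meaningful for depth < 64.
    constexpr unsigned long long maxProposals(unsigned depth) {
      return 1ull << depth;
    }

    inline void checkNodeSets() {
      NodeSetSketch a, b;
      a.set(0).set(1).set(2); // node ids start from 0
      b.set(2).set(3);
      assert(commonDependencies(a, b).count() == 1);
      assert(mergeClusters(a, b).count() == 4);
      assert(maxProposals(8) == 256);
    }
    ```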

commit bdcbfa7fb4ac6f23262095c401d28309d689225e
Author: LLVM GN Syncbot <llvmgnsyncbot@gmail.com>
Date:   Mon Sep 9 06:28:13 2024 +0000

    [gn build] Port a416267a5f3f

commit a416267a5f3fffb3d1e9d8d53245aef8169c5ddb
Author: Yuxuan Chen <ych@fb.com>
Date:   Sun Sep 8 23:09:40 2024 -0700

    [LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (#99285)

    This patch is episode three of the middle end implementation for the
    coroutine HALO improvement project published on discourse:
    https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

    After we attribute the calls to some coroutines as "coro_elide_safe" in
    the C++ FE and create a `noalloc` ramp function, we use a new middle
    end pass to move the call to coroutines to the noalloc variant.

    This pass should be run after CoroSplit. For each node we process in
    CoroSplit, we look for its callers and replace the attributed ones in
    presplit coroutines to the noalloc one. The transformed `noalloc` ramp
    function will also require a frame pointer to a block of memory it can
    use as an activation frame. We allocate this on the caller's frame with
    an alloca.

    Please note that we cannot safely transform such attributed calls in
    post-split coroutines due to memory lifetime reasons. The CoroSplit pass
    is responsible for creating the coroutine frame spills for all the
    allocas in the coroutine. Therefore it will be unsafe to create new
    allocas like this one in post-split coroutines. This happens relatively
    rarely because CGSCC performs the passes on the callees before the
    caller. However, if multiple coroutines coexist in one SCC, this
    situation does happen (and prevents us from having potentially unbounded
    frame size due to recursion).

    You can find episode 1: Clang FE of this patch series at
    https://github.com/llvm/llvm-project/pull/99282
    Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283

commit 234cc81625030e934651d6ae0ace66e37138ba4a
Author: Yuxuan Chen <ych@fb.com>
Date:   Sun Sep 8 23:09:20 2024 -0700

    [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (#99283)

    This patch is episode two of the coroutine HALO improvement project
    published on discourse:
    https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

    Previously CoroElide depended on inlining, and its analysis did not work
    very well with code generated by the C++ frontend due to the existence of
    many customization points. Issues have been reported upstream about how
    ineffective the original CoroElide was in real world applications.

    For C++ users, this set of patches aims to fix this problem by providing
    library authors and users deterministic HALO behaviour for some
    well-behaved coroutine `Task` types. The stack begins with a library
    side attribute on the `Task` class that guarantees no unstructured
    concurrency when coroutines are `co_await`ed directly as a prvalue.
    This attribute on Task types gives us lifetime guarantees and
    makes the C++ FE capable of telling the ME which coroutine calls are
    elidable. We convey such information from the FE through the attribute
    `coro_elide_safe`.

    This patch modifies CoroSplit to create a variant of the coroutine ramp
    function that 1) does not use a heap-allocated frame and instead takes an
    additional parameter as the pointer to the frame. This parameter is
    attributed with `dereferenceable` and `align` to convey size and alignment
    requirements for the frame. 2) always stores the cleanup address instead
    of the destroy address for `coro.destroy()` actions.

    In a later patch, we will have a new pass that runs right after
    CoroSplit to find usages of the callee coroutine attributed
    `coro_elide_safe` in presplit coroutine callers, allocate the frame on
    its "stack", and transform those usages to call the `noalloc` ramp
    function variant.

    (note I put quotes on the word "stack" here, because for presplit
    coroutine, any alloca will be spilled into the frame when it's being
    split)

    The C++ Frontend attribute implementation that works with this change
    can be found at https://github.com/llvm/llvm-project/pull/99282
    The pass that makes use of the new `noalloc` split can be found at
    https://github.com/llvm/llvm-project/pull/99285

commit e17a39bc314f97231e440c9e68d9f46a9c07af6d
Author: Yuxuan Chen <ych@fb.com>
Date:   Sun Sep 8 23:08:58 2024 -0700

    [Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282)

    This patch is the frontend implementation of the coroutine elide
    improvement project detailed in this discourse post:
    https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

    This patch proposes a C++ struct/class attribute
    `[[clang::coro_await_elidable]]`. This notion of await elidable task
    gives developers and library authors a certainty that coroutine heap
    elision happens in a predictable way.

    Originally, after we lower a coroutine to LLVM IR, CoroElide is
    responsible for analysis of whether an elision can happen. Take this as
    an example:
    ```
    Task foo();
    Task bar() {
      co_await foo();
    }
    ```
    For CoroElide to happen, the ramp function of `foo` must be inlined into
    `bar`. This inlining happens after `foo` has been split but `bar` is
    usually still a presplit coroutine. If `foo` is indeed a coroutine, the
    inlined `coro.id` intrinsic of `foo` is visible within `bar`. CoroElide
    then runs an analysis to figure out whether the SSA value of
    `coro.begin()` of `foo` gets destroyed before `bar` terminates.

    `Task` types are rarely simple enough for the destroy logic of the task
    to reference the SSA value from `coro.begin()` directly. Hence, the pass
    is very ineffective for even the most trivial C++ Task types. Improving
    CoroElide by implementing more powerful analyses is possible, however it
    doesn't give us the…