clang-cl can not support function targets #53520

Closed
carewolf opened this issue Feb 1, 2022 · 40 comments · Fixed by #75711
Labels: clang:headers (Headers provided by Clang, e.g. for intrinsics), confirmed (Verified by a second party)

Comments

@carewolf

carewolf commented Feb 1, 2022

The immintrin.h header has a bug: when _MSC_VER is defined, it does not include the sub-architecture intrinsics headers unless that sub-architecture is enabled for the current compilation.

This is inconsistent with MSVC, which always defines the intrinsics regardless of the active architecture, and also inconsistent with regular (GCC-style) Clang, which defines them for use in functions with sub-architecture target attributes. As a result, there is no way to use sub-architecture intrinsics with clang-cl, since neither the GCC style nor the MSVC style works.

This has forced us to disable many optimizations in Qt with clang-cl, see https://bugreports.qt.io/browse/QTBUG-88081, https://bugreports.qt.io/browse/QTBUG-88434 and https://bugreports.qt.io/browse/QTBUG-98253

If you can't support the MSVC style, I suggest at least allowing the GCC style, so that the intrinsics work in functions with a target attribute.

@zero9178
Member

zero9178 commented Feb 1, 2022

This is indeed a nasty divergence from MSVC, and IIRC one that is quite hard to fix, as it's also an issue in LLVM, not just Clang. There is a workaround, however: you can use the /clang: flag to pass GNU-style command-line options. For example, you can pass /clang:-mrdseed to enable that sub-arch feature.

@zero9178 added the clang:codegen, clang:headers and confirmed labels and removed the new issue label Feb 1, 2022
@llvmbot
Collaborator

llvmbot commented Feb 1, 2022

@llvm/issue-subscribers-clang-codegen

@carewolf
Author

carewolf commented Feb 1, 2022

That won't help. As I see it, the problem is that immintrin.h does not include non-current sub-arch headers when _MSC_VER is defined. If that check were removed, it still wouldn't work the way MSVC does, but it would at least work in functions with an appropriate target function attribute (Clang/GCC style).

@zero9178
Member

zero9178 commented Feb 1, 2022

I am not quite sure how my suggestion above differs from the Clang/GCC style. Using the GCC-style -m options defines the various feature test macros and therefore causes the sub-arch headers to be included despite _MSC_VER being defined. As an example, based on Qt source, I wrote:

#include <immintrin.h>

int main()
{
        unsigned int value;
        _rdrand32_step(&value);
}

and then compiled it using clang-cl test.cpp /clang:-mrdrnd. As far as I can tell, this should work with all the intrinsics.

But yes, this won't match MSVC behaviour, which is clearly a bug. The above would simply serve as a workaround.
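A minimal sketch of why the -m and /clang: flags help while a target attribute alone does not: feature-test macros such as __AVX2__ are fixed for the whole translation unit by the command line, and a target attribute does not change what the preprocessor sees inside the function. The names kAvx2MacroAtFileScope, AVX2_TARGET and avx2_macro_inside_target_function are hypothetical; the sketch assumes GCC or Clang (including clang-cl) on x86 and degrades to a plain function elsewhere.

```cpp
// Feature-test macros such as __AVX2__ are set once per translation unit by the
// command line (-mavx2, /arch:AVX2, /clang:-mavx2, ...).
#if defined(__AVX2__)
constexpr bool kAvx2MacroAtFileScope = true;
#else
constexpr bool kAvx2MacroAtFileScope = false;
#endif

// Hypothetical helper: apply target("avx2") only where GCC/Clang on x86 accept it.
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86))
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

// The attribute changes code generation for this function only. The preprocessor
// has already run, so __AVX2__ looks exactly the same here as at file scope.
AVX2_TARGET bool avx2_macro_inside_target_function() {
#if defined(__AVX2__)
    return true;
#else
    return false;
#endif
}
```

Because of this, a header guard written as "#if ... defined(__AVX2__)" evaluates the same way for every function in the file, so it cannot know that a particular function was declared with target("avx2").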

@carewolf
Author

carewolf commented Feb 1, 2022

I think we are talking past each other. I am talking about using intrinsics without passing the corresponding command-line flags, for runtime CPU feature detection.

MSVC style:

#include <immintrin.h>
void foo_avx2(int *dst, int value) {
 // no target attribute and no /arch flag
 _mm256_storeu_si256((__m256i *)dst, _mm256_set1_epi32(value));
}

clang/gcc style:

#include <immintrin.h>
__attribute__((__target__("arch=haswell"))) // or __attribute__((__target__("avx2"))); GCC and Clang accept both
void foo_avx2(int *dst, int value) {
 _mm256_storeu_si256((__m256i *)dst, _mm256_set1_epi32(value));
}

This is for runtime detection of CPU features, not for anything enabled at compile time.

In MSVC, intrinsics always work without any target (somehow); in Clang/GCC, they work if an appropriate function target has been declared where they are used. In clang-cl, neither works, because immintrin.h checks the compile-time flags before defining the functions.
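A minimal sketch of the GCC/Clang runtime-dispatch pattern described above, assuming GCC or regular (non-cl) Clang on x86. On clang-cl the AVX2 branch is deliberately compiled out, because that is exactly the configuration in which immintrin.h skips the AVX2 declarations. The names fill, fill_scalar, fill_avx2 and HAVE_AVX2_DISPATCH are hypothetical; __builtin_cpu_supports is the GCC/Clang builtin for checking CPU features at run time.

```cpp
#include <cstddef>
#include <cstdint>

// Baseline loop that runs on any CPU.
static void fill_scalar(std::int32_t *dst, std::size_t n, std::int32_t value) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = value;
}

// GCC and regular Clang on x86 declare the AVX2 intrinsics even without -mavx2;
// clang-cl (which defines _MSC_VER) does not, which is the bug in this issue.
#if (defined(__GNUC__) || defined(__clang__)) && !defined(_MSC_VER) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_DISPATCH 1

// Only this function is compiled for AVX2; the rest of the file stays baseline.
__attribute__((target("avx2")))
static void fill_avx2(std::int32_t *dst, std::size_t n, std::int32_t value) {
    const __m256i v = _mm256_set1_epi32(value);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_si256(reinterpret_cast<__m256i *>(dst + i), v);
    fill_scalar(dst + i, n - i, value);  // remaining 0-7 elements
}
#endif

// Picks the implementation at run time, so one binary still runs on older CPUs.
void fill(std::int32_t *dst, std::size_t n, std::int32_t value) {
#ifdef HAVE_AVX2_DISPATCH
    if (__builtin_cpu_supports("avx2")) {
        fill_avx2(dst, n, value);
        return;
    }
#endif
    fill_scalar(dst, n, value);
}
```

The same structure extends to more levels (for example an AVX-512 variant checked before AVX2), with the scalar loop as the final fallback.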

@wangwenx190

Please fix this bug in clang-cl, this issue is really annoying.

@ThiagoIze

I think this has already been explained by Allan, but just to make sure it's clear. This bug is preventing us from writing a single Windows binary that can be optimized for various architectures. For instance, we'd like the binary to work on older CPUs that don't have AVX and on newer machines with AVX we can run AVX optimized functions instead of the generic SSE2 functions. We might even have optimizations for AVX-512.

Fixing this can give applications a 4x speedup on Windows while still allowing the application to run on older CPUs. It's not everyday a compiler can do something that gives a 4x speedup, so I think this should be a high priority to fix.

@wangwenx190

@ThiagoIze is right, this is performance critical for many applications, please prioritize this issue.

@wangwenx190

Any progress on this?

@wangwenx190

🤔

@thiagomacieira

Does anyone know what the limitation is? If I copy the definitions from avxintrin.h:

#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx"), __min_vector_width__(256)))
typedef int __v8si __attribute__ ((__vector_size__ (32)));
typedef long long __m256i __attribute__((__vector_size__(32), __aligned__(32)));
typedef long long __m256i_u __attribute__((__vector_size__(32), __aligned__(1)));

static __inline __m256i __DEFAULT_FN_ATTRS
_mm256_set_epi32(int __i0, int __i1, int __i2, int __i3,
                 int __i4, int __i5, int __i6, int __i7)
{
  return __extension__ (__m256i)(__v8si){ __i7, __i6, __i5, __i4, __i3, __i2, __i1, __i0 };
}
static __inline __m256i __DEFAULT_FN_ATTRS
_mm256_set1_epi32(int __i)
{
  return _mm256_set_epi32(__i, __i, __i, __i, __i, __i, __i, __i);
}
static __inline void __DEFAULT_FN_ATTRS
_mm256_storeu_si256(__m256i_u *__p, __m256i __a)
{
  struct __storeu_si256 {
    __m256i_u __v;
  } __attribute__((__packed__, __may_alias__));
  ((struct __storeu_si256*)__p)->__v = __a;
}

__attribute__((target("avx2"))) void fill(void *ptr, int n)
{
    __m256i v = _mm256_set1_epi32(n);
    _mm256_storeu_si256((__m256i *)ptr, v);
}

it compiles and works just fine (LLVM 15.0.6):

$ clang-cl -c -O2 test.cpp
$ objdump -dr test.obj    

test.obj:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <?fill@@YAXPEAXH@Z>:
   0:   c5 f9 6e c2             vmovd  %edx,%xmm0
   4:   c4 e2 7d 58 c0          vpbroadcastd %xmm0,%ymm0
   9:   c5 fe 7f 01             vmovdqu %ymm0,(%rcx)
   d:   c5 f8 77                vzeroupper
  10:   c3                      ret

qtprojectorg pushed a commit to qt/qtbase that referenced this issue Dec 15, 2022
clang-cl's intrinsics support is broken, it doesn't declare the AVX2
intrinsics if they are disabled and this doesn't match GCC or MSVC
behavior: llvm/llvm-project#53520

This fix allows disabling x86 intrinsics during configuration of a
clang-cl build.

clang-cl build is still not guaranteed to work with enabled x86 intrinsics.

Change-Id: Icd295f6b4d868adf10bcd425d5280c56b43cb9f7
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
@wangwenx190

I've encountered another similar issue, which prevents me from building Qt using clang-cl: https://bugreports.qt.io/browse/QTBUG-113231. However, I've fixed it, and the workaround is really simple. For some unknown reason, clang-cl can't find some intrinsic functions, so I modified immintrin.h to include the corresponding headers unconditionally, which solved the compilation error. I'm not sure whether what I'm doing is correct, but at least the compilation now goes smoothly without any errors, and the generated binary also seems to work fine.

@carewolf
Author

carewolf commented May 1, 2023

> I modified immintrin.h to let the corresponding headers be included unconditionally. [...] I'm not sure what I'm doing is correct or not, but at least the compilation now goes smoothly without any errors and the generated binary file also seem to work fine.

No, I suspect that is the correct solution. The stupid conditional includes are the issue, and it makes no sense that they are there. It's just that nobody in the project has tried doing it, so it will probably take somebody outside of LLVM to fix this bug.

@thiagomacieira

Those were added for a reason. The question is whether that reason is still valid. I suspect it isn't: the reason must have been that the __attribute__((target(xxxx))) didn't work in previous versions and that has since been corrected.

@wangwenx190

These header guards were added in 379a195, which has been included since LLVM 3.9.0 (2016).

@AaronBallman
Collaborator

> Those were added for a reason. The question is whether that reason is still valid. I suspect it isn't: the reason must have been that the __attribute__((target(xxxx))) didn't work in previous versions and that has since been corrected.

They were added because including this header without them induces ~10-30% compile time overhead, which you often have no say in because it's included by system headers.

CC @nico for awareness

@thiagomacieira

I prefer to pay the penalty of 10 to 30% slowness compared to not being able to compile code that Clang-non-CL and MSVC compile.

@AaronBallman
Collaborator

It's something we need to solve, but it's not acceptable to introduce that amount of compile time regression when solving it.

@thiagomacieira

I agree it's something to solve, but disagree that the cost is unacceptable. As I said, this is the difference between "good compiler generates really good code" (hopefully) and "broken compiler, don't even report bugs to us". If there is a good chance that the compilation time slowness will get sufficiently solved in the short term, then the delay is acceptable. Conversely, if there's no chance of that happening soon (too difficult, no one working on it, etc.), then an indefinite delay is not acceptable.

I also don't know of much code that includes immintrin.h and family in public headers. They're usually included from private headers used exclusively by implementations, so the compilation-time cost is contained. More importantly, those implementation files are also the ones that want to use the header and right now can't.

@ADKaster
Contributor

ADKaster commented May 4, 2023

> I also don't know much code that includes immintrin.h and family in public headers.

According to this comment by @StephanTLavavej on microsoft/STL, immintrin.h is included by intrin.h, which seems to be included by a core STL header.

microsoft/STL#3285 (comment)

If I'm reading that right, every TU that uses the C++ standard library on Windows includes this header? That seems to suggest that a performance impact as high as 10-30% would be quite unacceptable...

Unless I'm misreading the STL code, of course. 😅

@thiagomacieira

Ah, I see. They're probably using intrinsics for and similar functionality. I think I've seen other uses too in their headers, like in https://github.com/microsoft/STL/blob/091cad2eaaa5bc25873eb7261cae57ab123592f3/stl/inc/bit#L144-L145.

libstdc++ and libc++ usually use the __builtin_ type intrinsics which are always pre-defined and forego including the intrinsic headers. That same STL header has such a case:
https://github.com/microsoft/STL/blob/091cad2eaaa5bc25873eb7261cae57ab123592f3/stl/inc/bit#L35-L37
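A minimal sketch of that approach, assuming GCC or Clang (including clang-cl, which accepts the same builtins): __builtin_popcount, __builtin_ctzll and __builtin_clzll are predefined by the compiler, so helpers like these need no intrinsics header at all. The helper names are hypothetical, and the sketch does not attempt to cover MSVC itself.

```cpp
#include <cstdint>

// Builtins are predefined by GCC and Clang; no <intrin.h>/<immintrin.h> needed.
inline int popcount32(std::uint32_t x) {
    return __builtin_popcount(x);
}

// __builtin_ctzll and __builtin_clzll are undefined for 0, so handle 0 explicitly.
inline int countr_zero64(std::uint64_t x) {
    return x == 0 ? 64 : __builtin_ctzll(x);
}

inline int countl_zero64(std::uint64_t x) {
    return x == 0 ? 64 : __builtin_clzll(x);
}
```

For example, popcount32(0xF0F0u) is 8 and countr_zero64(8) is 3. In C++20, std::popcount and std::countr_zero from <bit> expose the same operations portably.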

Anyway, this does mean the impact of changing immintrin.h is much higher than I'd thought.

@AaronBallman
Collaborator

> Anyway, this does mean the impact of changing immintrin.h is much higher than I'd thought.

Yup, that's why I was saying the slowdown was not acceptable -- it impacts roughly everything compiled on Windows, which makes this tricky to resolve. That said, I think we need a solution of some kind.

@thiagomacieira

BTW, do you know if the slowdown is caused by the presence of the __attribute__ or if it is the number of functions defined in that header in the first place? In other words, does one suffer from this slow-down when using /arch:AVX512?

@AaronBallman
Collaborator

> BTW, do you know if the slowdown is caused by the presence of the __attribute__ or if it is the number of functions defined in that header in the first place? In other words, does one suffer from this slow-down when using /arch:AVX512?

It's the size of the header file, I believe.

PS F:\source\llvm-project> Measure-Command { .\llvm\out\build\x64-Debug\bin\clang-cl.exe /c "C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.cpp" }
C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.cpp(27,5): error: unknown type name '__m256i'
    __m256i v = _mm256_set1_epi32(n);
    ^
C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.cpp(27,17): error: use of undeclared identifier
      '_mm256_set1_epi32'; did you mean '_mm_set1_epi32'?
    __m256i v = _mm256_set1_epi32(n);
                ^~~~~~~~~~~~~~~~~
                _mm_set1_epi32
F:\source\llvm-project\llvm\out\build\x64-Debug\lib\clang\17\include\emmintrin.h(3618,46): note: '_mm_set1_epi32'
      declared here
static __inline__ __m128i __DEFAULT_FN_ATTRS _mm_set1_epi32(int __i) {
                                             ^
C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.cpp(28,28): error: unknown type name '__m256i'
    auto dst = static_cast<__m256i *>(ptr);
                           ^
C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.cpp(30,26): error: arithmetic on a pointer to void
        _mm256_storeu_si256(dst + i / sizeof(v), v);
                            ~~~ ^
C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.cpp(31,19): error: arithmetic on a pointer to void
    fill_tail(dst + i / sizeof(v), len - i, n);
              ~~~ ^
5 errors generated.


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 4
Milliseconds      : 480
Ticks             : 44800486
TotalDays         : 5.18524143518519E-05
TotalHours        : 0.00124445794444444
TotalMinutes      : 0.0746674766666667
TotalSeconds      : 4.4800486
TotalMilliseconds : 4480.0486



PS F:\source\llvm-project> Measure-Command { .\llvm\out\build\x64-Debug\bin\clang-cl.exe /c /arch:AVX512 "C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.cpp" }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 16
Milliseconds      : 690
Ticks             : 166907541
TotalDays         : 0.000193180024305556
TotalHours        : 0.00463632058333333
TotalMinutes      : 0.278179235
TotalSeconds      : 16.6907541
TotalMilliseconds : 16690.7541

@thiagomacieira

Comparing to regular Clang:

$ clang --version
clang version 16.0.2
Target: x86_64-w64-windows-gnu
Thread model: posix
InstalledDir: C:/msys/mingw64/bin
$ time sh -c 'for ((i=0;i<10;++i)); do clang -c test.cpp; done'
sh -c 'for ((i=0;i<10;++i)); do clang -c test.cpp; done'  0.06s user 0.20s system 5% cpu 4.663 total
$ time sh -c 'for ((i=0;i<10;++i)); do clang -c -march=sapphirerapids test.cpp; done'
sh -c 'for ((i=0;i<10;++i)); do clang -c -march=sapphirerapids test.cpp; done  0.00s user 0.15s system 3% cpu 4.573 total

There's no slow-down when adding the option, but it wouldn't be expected anyway because the entire header gets parsed.

Unfortunately, this is a different build of LLVM and a different C library, so the numbers aren't directly comparable.

@LuoYuanke
Contributor

LuoYuanke commented May 4, 2023

Maybe allow the user to specify a macro that includes all intrinsics headers:

#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||      \
    defined(__AVX2__) || defined(_INCLUDE_ALL_INTRINSICS)
#include <avx2intrin.h>
#endif

@thiagomacieira

That has the problem that it wouldn't work if immintrin.h has already been included, which it would be if MS STL is using it. An upgrade to STL may break existing code because of that.
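To make the proposed condition concrete, here is a hypothetical model (not code from Clang) of the guard as a plain C++ function, with one flag per macro. It shows that clang-cl skips avx2intrin.h unless AVX2 is enabled for the whole file, modules are on, or the proposed _INCLUDE_ALL_INTRINSICS opt-in is set. Because immintrin.h has its own include guard, only the configuration in effect at its first inclusion matters, which is why the opt-in would be ignored once the MS STL has already pulled the header in.

```cpp
// Hypothetical model of the proposed guard; each field mirrors one macro.
struct GuardConfig {
    bool msc_ver;      // _MSC_VER defined (clang-cl)
    bool sce;          // __SCE__ defined
    bool modules;      // __has_feature(modules)
    bool avx2;         // __AVX2__ defined for the whole translation unit
    bool include_all;  // proposed opt-in macro _INCLUDE_ALL_INTRINSICS
};

// Mirrors:
// #if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||
//     defined(__AVX2__) || defined(_INCLUDE_ALL_INTRINSICS)
constexpr bool includes_avx2intrin(GuardConfig c) {
    return !(c.msc_ver || c.sce) || c.modules || c.avx2 || c.include_all;
}

// Regular Clang: always included. clang-cl without /arch:AVX2: skipped.
static_assert(includes_avx2intrin({false, false, false, false, false}), "GCC-style Clang");
static_assert(!includes_avx2intrin({true, false, false, false, false}), "plain clang-cl");
```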

@wangwenx190

I removed all the header guards in that file, and compilation is indeed significantly slower when building Qt. However, it still seems to be faster than MSVC, or at least not slower. I think even if it became 1000x slower, it would still be a lot better than compilation errors...

@thiagomacieira

Significantly slower than what? The code doesn't compile at all without that change, so there isn't much you can compare to.

@wangwenx190

> Significantly slower than what? The code doesn't compile at all without that change, so there isn't much you can compare to.

In fact, before clang-cl reports that error, it can still compile hundreds of files, mostly the bundled third-party libraries, Qt's bootstrap library and some command-line tools such as moc. Before I patched immintrin.h, the file names flashed by very fast and the compiled file count increased really quickly; after I patched it, compilation slowed down so much that I could see it with my own eyes, but I think it's not worse than MSVC.

@bebuch

bebuch commented May 19, 2023

I ran into this today too. A fix would be great.

@echristo
Contributor

@rnk as well.

@bebuch

bebuch commented Nov 3, 2023

Should this issue have a bug label? After all, the behavior does not correspond to the expected behavior and it prevents the use of clang-cl under Windows for various libraries. (Qt for example.)

@MaxEW707
Contributor

MaxEW707 commented Dec 14, 2023

This is still broken in mainline clang.

#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||      \
    defined(__AVX__)
#include <avxintrin.h>
#endif

I just ran into this when porting our internal projects to clang-cl.
This breaks the assumptions that cross-platform software makes about the SSE/AVX includes, since MSVC, GCC and Clang all behave the same way for these includes; only clang-cl differs.
As a side note, the __SCE__ compiler also works as expected, since we can guarantee those platforms have AVX support and thus __AVX__ is always defined, but that is as much as I'll comment on there.

Also, not all projects use the STL; some actively avoid vendor STLs for one of the reasons outlined above: the insane number of header includes.
While I understand the reasoning about compile times with the MSVC STL (we also use many tricks internally to wrangle compile times), it isn't a valid assumption for all software projects that use Clang, since use of the STL isn't ubiquitous.

In the interim, I can work around it by using the __builtin_* functions where I can, but that isn't a solution as a whole.

Anyway, when I am free this weekend, assuming there isn't already a PR up to fix this, I am going to put one up to push forward a solution that satisfies both the concerns around the MSVC STL and users who need to do run-time detection of CPU support for SSE/AVX.

@rnk
Collaborator

rnk commented Dec 14, 2023

+@zmodem @nico

We should revisit this, it is unfortunate that the only way to use Intel intrinsics with clang-cl is to add additional command line flags. Both GCC and MSVC can call these intrinsics with only local source changes, either via target attributes or simply directly calling various AVX intrinsics.

I believe Intel's last proposal for addressing the compile time concerns was to ship a module map for Clang builtin intrinsic headers, but I think that hasn't advanced because folks are concerned about establishing a hard dependency on Clang header modules. They interfere with pre-processing, crash reproduction, and distributed build systems, and are not entirely aligned with C++ standard modules.

Perhaps another avenue for addressing the compile time concerns would be to go down the path of providing an intrin0.h file similar to MSVC, which declares the minimal set of intrinsics that the MSVC STL needs, and then we could allow immintrin.h to be the expensive, catch-all, umbrella header that Intel seems to want it to be.

@MaxEW707
Contributor

MaxEW707 commented Dec 15, 2023

> Perhaps another avenue for addressing the compile time concerns would be to go down the path of providing an intrin0.h file similar to MSVC

That sounds good to me. I'll try to give it a whirl this weekend and get a PR up if someone else doesn't beat me to it :).

> then we could allow immintrin.h to be the expensive, catch-all, umbrella header that Intel seems to want it to be.

This is actually a major compile-time cost for us. Headers include all previous headers, such as emmintrin.h including xmmintrin.h. Most of the untangling there can be accomplished by forward-declaring the vector types, which is easy with Clang thanks to the vector size attribute.
For example, emmintrin.h only needs __m128 for the conversion functions, so xmmintrin.h is included just to get the vector typedef, which in turn includes mmintrin.h and mm_malloc.h.
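A minimal sketch of that forward-declaration idea, assuming GCC or Clang (including clang-cl): the vector_size extension lets a header name a 128-bit vector type, and even do element-wise arithmetic on it, without including xmmintrin.h or emmintrin.h. The type and function names are hypothetical stand-ins; this is not how Clang's headers are currently organized.

```cpp
// Hypothetical stand-ins for __m128 / __m128i, declared with the GCC/Clang
// vector extension instead of by including xmmintrin.h / emmintrin.h.
typedef float vec4f __attribute__((__vector_size__(16), __aligned__(16)));
typedef long long vec2i64 __attribute__((__vector_size__(16), __aligned__(16)));

// Element-wise operators work on these types without any intrinsics header.
inline vec4f add4(vec4f a, vec4f b) {
    return a + b;
}

// Vector types can also be subscripted like arrays.
inline long long sum_lanes(vec2i64 v) {
    return v[0] + v[1];
}
```

A header that only needs the type in function signatures can stop at the typedefs and leave the heavier intrinsics headers to the source files that actually call intrinsics.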

This is especially true in games, where a lot of these headers end up being included from your math library, which is mostly header-only so that functions have a chance to be inlined; that basically bloats every source file. I am in the process of writing our own SSE headers internally to combat some of this, but the fewer platform/toolchain-specific ifdefs that are required, the better, in my opinion.

The PR I was going to get up would allow users to include clang specific isolated headers such as avxintrin.h directly.
The intel headers can still do whatever intel desires but if a user targets clang then they can pick exactly what they desire without all the transitive includes.

The git blame for, https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/avxintrin.h#L11, shows that these checks were added since gcc has this behaviour.
However removing these checks still allows code following gcc semantics to compile on clang.
The inverse isn't true where code built solely for clang will not immediately work on gcc but I think that is a fine concession considering all the already clang specific attributes and behaviours.

My 2c.

@carewolf
Author

I would prefer if we could use different defines or compile time flags. In a project supporting multiple compilers that is preferable over having different includes.

Could we just have a define that unlocks non-target intrinsics? -D_CLANG_CL_INTRINSICS?

rnk added a commit that referenced this issue Jan 12, 2024
Placing the class id at offset 0 should make `isa` and `dyn_cast` faster
by eliminating the field offset (previously 0x10) from the memory
operand, saving encoding space on x86, and, in theory, an add micro-op.
You can see the load encodes one byte smaller here:
https://godbolt.org/z/Whvz4can9

The compile time tracker shows some modestly positive results on the
`cycle` metric and in the final clang binary size metric:
https://llvm-compile-time-tracker.com/compare.php?from=33b54f01fe32030ff60d661a7a951e33360f82ee&to=2530347a57401744293c54f92f9781fbdae3d8c2&stat=cycles
Clicking through to the per-library size breakdown shows that
instcombine size reduces by 0.68%, which is meaningful, and I believe
instcombine is known to be a hotspot.

It is, however, potentially noise. I still think we should do this,
because notionally, the class id really acts as the vptr of the Value,
and conventionally the vptr is always at offset 0.
  [OpenMP] Fix or disable NVPTX tests failing currently (llvm#77844)
  github-automation: Use the llvm/llvm-project repo for backport pull requests (llvm#71727)
  [CMake][Release] Add option for enabling LTO to cache file (llvm#77035)
  [llvm-ifs] Treat unknown symbol types as error. (llvm#75872)
  [clang][FatLTO][UnifiedLTO] Pass -enable-matrix to the LTO driver
  [gn build] Port ae1c1ed
  [CodeGen] Allow `CodeGenPassBuilder` to add module pass after function pass (llvm#77084)
  [clang] Mark clang-format-ignore.cpp as unsupported on Windows
  [llvm-driver] Fix usage of `InitLLVM` on Windows (llvm#76306)
  [LLD] Fix llvm-driver cmake integration for LLD (llvm#76305)
  [compiler-rt][fuchsia] Preallocate a vmar for sanitizer internals (llvm#75256)
  [analyzer] NFC: Don't regenerate duplicate HTML reports.
  Revert "[mlir][arith] Add overflow flags support to arith ops (llvm#77211)"
  Revert "[mlir][arith][nfc] Fix typos (llvm#77700)"
  Revert "[mlir][spirv] Lower `arith` overflow flags to corresponding SPIR-V op decorations (llvm#77714)"
  Set the default value for MaxAtomicSizeInBitsSupported to 0.
  [mlir] Add op printing flag to skip regions (llvm#77726)
  [BOLT] Delta-encode function start addresses in BAT (llvm#76902)
  [BOLT] Delta-encode offsets in BAT (llvm#76900)
  [libc++] Re-export libc++abi symbols on Apple platforms when using system-libcxxabi (llvm#77218)
  [asan] Enable StackSafetyAnalysis by default
  [flang][openacc] Carry device dependent info for routine in the module file
  [StackSafetyAnalysis] Bail out if MemIntrinsic length is -1 (llvm#77837)
  [flang][openacc] Do not accept static and num for gang clause on routine dir (llvm#77673)
  Add sync-up for floating-point working group (llvm#71885)
  [Fuchsia] Add stage2 cmake options
  Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V."
  [flang][openacc] Apply mutually exclusive clauses restriction to routine (llvm#77802)
  Revert "[Flang][Parser] Add missing dependencies to CMakeLists.txt (llvm#77483)"
  [mlir][openacc][flang] Simplify gang, vector and worker representation (llvm#77667)
  [Libomptarget] Fix GPU Dtors referencing possibly deallocated image (llvm#77828)
  [ASan][libc++] Initialize `__r_` variable with lambda (llvm#77394)
  Revert "[asan] Enable StackSafetyAnalysis by default"
  [Dialect] Fix a warning
  [BOLT] Encode BAT using ULEB128 (llvm#76899)
  [BOLT] Add BOLT Address Translation documentation (llvm#76899)
  [flang] Fix a warning
  [Format] Fix a warning
  [flang] Fix a warning
  [clang-format] TableGen keywords support. (llvm#77477)
  [BOLT][NFC] Print BAT section size (llvm#76897)
  [AArch64][SVE2] Generate XAR (llvm#77160)
  [mlir][memref] Transpose: allow affine map layouts in result, extend folder (llvm#76294)
  [mlir][affine] Add dependency on `UBDialect` for `PoisonAttr` (llvm#77691)
  Add more ZA modes (llvm#77361)
  [NFC] Remove trailing whitespace
  [LSR] Require non-zero step when considering wrap around for term folding (llvm#77809)
  [lldb] Fix MaxSummaryLength target property type (llvm#72233)
  [mlir][spirv] Lower `arith` overflow flags to corresponding SPIR-V op decorations (llvm#77714)
  [OpenACC] Implement 'use_device' clause parsing
  [Libomptarget] Fix JIT on the NVPTX target by calling ptx manually (llvm#77801)
  [RISCV] Add test for strided gather with recursive disjoint or. NFC
  [OpenACC] Implement 'copy' Clause
  [lld][ELF] Allow Arm PC-relative relocations in PIC links (llvm#77304)
  [SLP] Add a set of tests with non-power-of-2 operations.
  [AArch64] Fix missing `pfalse` diagnostic (llvm#77746)
  [AMDGPU] Add new GFX12 image atomic float instructions (llvm#76946)
  [CloneFunction][DebugInfo] Avoid cloning DILocalVariables of inlined functions (llvm#75385)
  [flang][driver] Fix exec.f90 test with shared libs
  [MemProf] Add missing <unordered_map> include to fix buildbot (llvm#77788)
  [SEH] Redirect test output to /dev/null (llvm#77784)
  [libc][NFC] Use 16-byte indices for _mmXXX_shuffle_epi8 (llvm#77781)
  [runtimes] Use LLVM libunwind from libc++abi by default (llvm#77687)
  [RISCV] Add test for strided gather with disjoint or. NFC
  [flang] Handle missing LOGIN_NAME_MAX definition in runtime (llvm#77775)
  [SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (llvm#72679)
  [MemProf] Handle missing tail call frames (llvm#75823)
  [SLP][NFC]Add a test for final vector with minbitwidth, NFC.
  [AArch64] Enable certain instruction aliases for SVE/SME (llvm#77745)
  [PowerPC] Add test for llvm#77748 (NFC)
  [TOSA] Fix -Wdangling-gsl and -Wunused-variable in TosaToLinalg.cpp (NFC)
  [IndVars] Add additional test for preserving NSW.
  [OpenMP] Add missing weak definitions of missing variables (llvm#77767)
  [InstrRef] Add debug hint for not reachable blocks from entry (llvm#77725)
  [BranchFolding] Fix missing predecessors of landing-pad (llvm#77608)
  [clang][AArch64] Fix incorrect rebase (llvm#77769)
  [flang] Fix fveclib on Darwin (llvm#77605)
  [flang][driver] Add support for -isysroot in the frontend (llvm#77365)
  [AArch64] Add missing field 'GuardedControlStack' initializer (NFC)
  [AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (llvm#77628)
  [clang][AArch64] Add a -mbranch-protection option to enable GCS (llvm#75486)
  [ClangFormat] Fix formatting bugs. (llvm#76245)
  [sanitizer] Fix asserts in asan and tsan in pthread interceptors. (llvm#75394)
  [TOSA] FFT2D operator (llvm#77005)
  [flang] FDATE extension implementation: get date and time in ctime format (llvm#71222)
  [Flang][Parser] Add missing dependencies to CMakeLists.txt (llvm#77483)
  Revert "[SelectionDAG] Add space-optimized forms of OPC_CheckPredicate (llvm#73488)"
  [GlobalISel][Localizer] Allow localization of a small number of repeated phi uses. (llvm#77566)
  [X86][test] Pre-commit test for llvm#77731
  [clang][analyzer] Support 'tello' and 'fseeko' in the StreamChecker (llvm#77580)
  [libc] Fix buggy AVX2 / AVX512 `memcmp` (llvm#77081)
  [clang]not lookup name containing a dependent type (llvm#77587)
  [mlir][ArmSME][test] Make use of arm_sme.streaming_vl (NFC) (llvm#77322)
  [mlir] Improve `GreedyPatternRewriteDriver` and pass documentation (llvm#77614)
  [DWARFLinker][NFC] Move common code into the base library: Utils.h (llvm#77604)
  [clang][ASTImporter] Improve import of friend class templates. (llvm#74627)
  Revert "[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (llvm#72679)"
  [libc] Add memcmp / bcmp fuzzers (llvm#77741)
  [STLExtras] Add out-of-line definition of friend operator== for C++20 (llvm#72348)
  [AArch64] MI Scheduler: create more LDP/STP pairs (llvm#77565)
  [DAG] Fold (sext (sext_inreg x)) -> (sext (trunc x)) if the trunc is free (llvm#77616)
  [AArch64LoadStoreOptimizer] Debug messages to track decision making. NFC (llvm#77593)
  [clang] Fix color consistency in C paper tracking web page
  [clang] Improve colors in status tracking web pages.
  [NFC][OpenMP][Flang] Add smoke test for omp target parallel (llvm#77579)
  [clangd] Handle lambda scopes inside Node::getDeclContext() (llvm#76329)
  [GlobalIsel] Combine select to integer minmax (second attempt). (llvm#77520)
  [Clang] Set writable and dead_on_unwind attributes on sret arguments (llvm#77116)
  [InstCombine] Fix worklist management in select fold (llvm#77738)
  [AMDGPU] Update tests for GFX12 errors and unsupported instructions (llvm#77624)
  [AMDGPU] Don't send DEALLOC_VGPRs after calls (llvm#77439)
  [RISCV] Deduplicate version struct in RISCVISAInfo. NFC (llvm#77645)
  [clang][Interp] Implement __builtin_addressof (llvm#77303)
  [flang][hlfir] Support box in user defined assignments (llvm#77578)
  [SelectionDAG] Add space-optimized forms of OPC_CheckPredicate (llvm#73488)
  [SelectionDAG] Add space-optimized forms of OPC_CheckPatternPredicate (llvm#73319)
  [SelectionDAG] Add space-optimized forms of OPC_CheckComplexPat (llvm#73310)
  [InstCombine] Handle a bitreverse idiom which ends with a bswap (llvm#77677)
  [Clang] Implement the 'counted_by' attribute (llvm#76348)
  [RISCV] Allow vsetvlis with same register AVL in doLocalPostpass (llvm#76801)
  [Target] Use getConstantOperandAPInt (NFC)
  [clang-query] Use StringRef::ltrim (NFC)
  [Target] Use isNullConstant (NFC)
  [X86][CodeGen] Support lowering for NDD ADD/SUB/ADC/SBB/OR/XOR/NEG/NOT/INC/DEC/IMUL (llvm#77564)
  [mlir][arith][nfc] Fix typos (llvm#77700)
  [clang-format] Fix crash involving array designators (llvm#77045)
  [clang-format]: Split alignment of declarations around assignment (llvm#69340)
  [clang-format] Don't apply severe penalty if no possible column formats (llvm#76675)
  [X86][MC] Fix wrong action when encoding enqcmd/enqcmds (llvm#77571)
  [Instrumentation] Use a range-based for loop (NFC)
  [Clang][doc] Add blank line before lists (llvm#77573)
  [libc++][test] Replace uses of `_LIBCPP_ABI_MICROSOFT` in tests (llvm#77233)
  [Pass] Remove trailing whitespace in `PassRegistry.def` NFC (llvm#77710)
  Revert "[Clang] Implement the 'counted_by' attribute (llvm#76348)"
  Revert "[CommandLine][NFCI] Do not add 'All' to 'RegisteredSubCommands' (llvm#77041)"
  [mlir][mesh] fix ProcessMultiIndexOp building (llvm#77676)
  [clang][analyzer] Fix incorrect range of 'ftell' in the StdLibraryFunctionsChecker (llvm#77576)
  [mlir][verifyMemref] Fix bug and support more types for verifyMemref (llvm#77682)
  [RISCV][AMDGPU] Mark test/CodeGen/Generic/live-debug-label.ll XFAIL for RISCV and AMDGPU (llvm#77631)
  [docs][IRPGO]Document two binary formats for instrumentation-based profiles, with a focus on IRPGO. (llvm#76105)
  [MLIR][LLVM] DI Expression Rewrite & Legalization (llvm#77541)
  [Clang] Update 'counted_by' documentation
  [llvm-exegesis] Update validation counters enum
  [Clang] Implement the 'counted_by' attribute (llvm#76348)
  [llvm-exegesis] Fix validation counters
  [llvm][lld] Support R_RISCV_GOT32_PCREL (llvm#72587)
  [llvm][lld] Support R_AARCH64_GOTPCREL32 (llvm#72584)
  [llvm-exegesis] Add tablegen support for validation counters (llvm#76652)
  [libc++] Rename local variable to avoid shadowing error (llvm#77672)
  [Flang][OpenMP][Offloading][Test] Adjust slightly incorrect tests now cmake configuration works
  [clang][NFC] Improve comments in C++ DR test suite (llvm#77670)
  [mlir][sparse] allow unknown ops in one-shot bufferization in mini-pipeline (llvm#77688)
  [LLD] [MinGW] Add support for more ThinLTO specific options (llvm#77387)
  [CommandLine][NFCI] Do not add 'All' to 'RegisteredSubCommands' (llvm#77041)
  [RISCV] Use any_extend for type legalizing atomic_compare_swap with Zacas. (llvm#77669)
  [LV] Use value_or to simplify code. NFC (llvm#77030)
  [HLSL][Docs] Add documentation for HLSL functions (llvm#75397)
  [RISCV] Support isel for Zacas for XLen and i32. (llvm#77666)
  [MLIR][Presburger] Implement computation of generating function for unimodular cones (llvm#77235)
  [mlir][mesh] fix unused variable error
  [clang][Interp] Fix discarded integral and floating casts (llvm#77295)
  [asan] Enable StackSafetyAnalysis by default
  [SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (llvm#72679)
  [scudo] Condition variable can be disabled by setting the flag to off (llvm#77532)
  [CMake] Deprecate GCC_INSTALL_PREFIX (llvm#77537)
  [libc++][NFC] Fix typo in comments
  [mlir] Change end of OperationDefinition. (llvm#77273)
  [compiler-rt][profile] remove unneeded freebsd hack. (llvm#77209)
  [ci] Set timeout for individual tests and report slowest tests (llvm#76300)
  [SLP][NFC]Replace constant by some meaningfull values to make test more relevant, NFC.
  [libc++] Remove _LIBCPP_C_HAS_NO_GETS (llvm#77346)
  [OpenACC] Implement 'var' parsing correctly, support array sections (llvm#77617)
  [ADT] Make StringRef std::string_view conversion operator constexpr. NFC (llvm#77506)
  [lldb] Add color support to StreamString (llvm#77380)
  [libc++][NFC] Add comment in test to explain the presence of some assertions
  [MLIR][NVVM]: Update setmaxregister NVVM Op (llvm#77594)
  [Libomptarget] Do not abort on failed plugin init (llvm#77623)
  [Flang] Support -mrvv-vector-bits flag (llvm#77588)
  [mlir] allow inlining complex ops (llvm#77514)
  [MLIR][Tensor] Fix checks for `fold-into-pack-and-unpack.mlir` (llvm#77622)
  [RISCV] Re-implement Zacas MC layer support to make it usable for CodeGen. (llvm#77418)
  [Clang][LLVM][AArch64]SVE2.1 update the intrinsics according to acle[1] (llvm#76844)
  [AArch64][SME] Fix definition of uclamp/sclamp instructions. (llvm#77619)
  [mlir][tosa]Fix Rescale shift attr data type (llvm#71084)
  Objective C: use C++ exceptions on MinGW+GNUstep (llvm#77255)
  [bazel] Port 79aa776
  [mlir][tensor] Enhance pack/unpack simplification for identity outer_dims_perm cases. (llvm#77409)
  [Flang][Parser] Add missing #include "flang/Common/idioms.h" (llvm#77484)
  [Libomptarget][NFC] Format in-line comments consistently (llvm#77530)
  [Libomptarget] Add error message back in after changes (llvm#77528)
  [Headers][X86] Reformat ia32intrin.h doc to match the other headers (llvm#77525)
  [libc++][docs] Document the libc++ Lit testing format naming scheme (llvm#73136)
  [RewriteStatepointsForGC] Remove unnecessary bitcasts (NFCI)
  [SLP][TTI]Improve detection of the insert-subvector pattern for SLP. (llvm#74749)
  [mlir][linalg] Add a test to demonstrate peeling + vectorisation (llvm#77590)
  [RISCV] Remove extraneous semicolons. NFC
  [LV] Re-add early exit in VPRecipeBuilder::createBlockInMask.
  [mlir][mesh] Add lowering of process multi-index op (llvm#77490)
  [TableGen] Support non-def operators in !getdagop (llvm#77531)
  [SimplifyCFG] Emit `rotl` directly in `ReduceSwitchRange` (llvm#77603)
  [mlir][tensor] Fold producer linalg transpose with consumer tensor pack (llvm#75658)
  [BranchFolding][SEH] Add test to track SEH CFG optimization (llvm#77598)
  [LoopFlatten] Recognise gep+gep (llvm#72515)
  [SystemZ] Fix 256-bit shifts when i128 is legal
  [Libomptarget] Do not run CPU tests if FFI was not found
  [lldb][ClangASTImporter][NFC] Remove redundant do-while loop (llvm#77596)
  [AMDGPU] Fix predicates for various True16 instructions. (llvm#77581)
  [mlir][math] Add math.acosh|asin|asinh|atanh op (llvm#77463)
  [X86] lower1BitShuffle - fold permute(setcc(x,y)) -> setcc(permute(x),permute(y)) for 32/64-bit element vectors
  [flang][doc] Correct spelling of CMake
  [NFC][TLI] order SLEEF and ArmPL mappings by alphabetical order (llvm#77500)
  [MLIR][SCF] Add checks to verify that the pipeliner schedule is correct. (llvm#77083)
  [InstCombine] Fold the `log2_ceil` idiom (llvm#76661)
  [flang] Document DEFAULT_SYSROOT usage on Darwin (llvm#77353)
  [emacs] Fix Emacs library formatting (llvm#76110)
  [X86] pr77459.ll - add missing AVX512 check prefixes
  [AArch64][SVE] Add optimisation for SVE intrinsics with no active lanes (llvm#73964)
  [InstCombine] Fold bitwise logic with intrinsics (llvm#77460)
  [SeparateConstOffsetFromGEP] Always emit i8 gep
  [AMDGPU] Fix broken sign-extended subword buffer load combine (llvm#77470)
  [clang-repl] Enable native CPU detection by default (llvm#77491)
  [SeparateConstOFfsetFromGEP] Regenerate test checks (NFC)
  [SLSR] Always generate i8 GEPs
  [SLSR] Regenerate test checks (NFC)
  [mlir][ArmSME] Add arm_sme.streaming_vl operation (llvm#77321)
  [lld][LoongArch] Handle extreme code model relocs according to psABI v2.30 (llvm#73387)
  [AArch64] Enable AArch64 loop idiom transform pass (llvm#77480)
  [flang] Add EXECUTE_COMMAND_LINE runtime and lowering intrinsics implementation (llvm#74077)
  [clang][coverage] Fix "if constexpr" and "if consteval" coverage report (llvm#77214)
  [Flang] Any and All elemental lowering (llvm#75776)
  Revert "[clang][dataflow] Add an early-out to `flowConditionImplies()` / `flowConditionAllows()`." (llvm#77570)
  [CodeGen][NewPM] Port AssignmentTrackingAnalysis to new pass manager (llvm#77550)
  [DWARFLinker] backport line table patch into the DWARFLinkerParallel. (llvm#77497)
  [LVI] Assert that only one value is pushed (NFC)
  [clang] [Driver] Treat MuslEABIHF as a hardfloat environment wrt multiarch directories (llvm#77536)
  [libunwind] Convert a few options from CACHE PATH to CACHE STRING (llvm#77534)
  [OpenMP] Allow setting OPENMP_INSTALL_LIBDIR (llvm#77533)
  [clang-format][NFC] Don't use clang-format style in config files
  [clang] Add tests for CWG1800-1804 (llvm#77509)
  [AMDGPU][True16] Support V_CEIL_F16. (llvm#73108)
  [Flang][OpenMP][MLIR] Add support for -nogpulib option (llvm#71045)
  [clang][analyzer] Add function 'ungetc' to StreamChecker. (llvm#77331)
  [RISCV][NFC] Remove unused CHECK prefixes to fix buildbots. NFC
  Changed Checks from TriviallyCopyable to TriviallyCopyConstructible  (llvm#77194)
  [GlobalISel] Lowering of {get,set,reset}_fpenv (llvm#75086)
  [WebAssembly] Correctly consider signext/zext arg flags at function declaration (llvm#77281)
  [clang][Interp][NFC] Make a few pointers const
  Reland "[clang-format] Optimize processing .clang-format-ignore files"
  [RISCV][GISel] IRTranslate and Legalize some instructions with scalable vector type
  [RISCV] Reorder RISCVInstrInfoA.td. NFC (llvm#77539)
  [RISCV][ISel] Use vaaddu with rounding mode rnu for ISD::AVGCEILU. (llvm#77473)
  [gn build] Port a828cda
  Revert "[PGO] Exposing PGO's Counter Reset and File Dumping APIs (llvm#76471)"
  Revert "[PGO] Fix `instrprof-api.c` on Windows (llvm#77508)"
  [docs] Fix formatting issues in MyFirstTypoFix (llvm#77527)
  Revert "[SEH][CodeGen] Add test to track CFG optimization bug for SEH" (llvm#77542)
  [PowerPC] Make verifier happy when lowering `llvm.trap` (llvm#77266)
  Revert "[X86][NFC] Remove dead code for "_REV" instructions"
  [clang][Parser] Pop scope prior VarDecl invalidating by invalid init (llvm#77434)
  [mlir][sparse][CRunnerUtils] Add shuffle in CRunnerUtils (llvm#77124)
  Make SANITIZER_MIN_OSX_VERSION a cache variable (llvm#74394)
  Fix -Wunused-variable in TestSimplifications.cpp (NFC)
  [clangd] Fix typo in function name in AST.cpp (llvm#77504)
  [lldb] [Mach-O] don't strip the end of the "kern ver str" LC_NOTE (llvm#77538)
  [libc] Disable Death Tests While Hermetic (llvm#77388)
  [mlir][arith] Add overflow flags support to arith ops (llvm#77211)
  [lldb-dap] Create a typescript extension for lldb-dap (llvm#75515)
  [libc++][NFC] Create and use test-defined simple_view concept (llvm#77334)
  [sanitizer] Select non-internal frames in ReportErrorSummary (llvm#77406)
  [Coverage] Mark coverage sections as metadata sections on COFF. (llvm#76834)
  [mlir][mesh] Add folding of ClusterShapeOp (llvm#77033)
  [LangRef] Tweak description of `@llvm.is.constant.*` (llvm#77519)
  [Instrumentation] Remove redundant LLVM_DEBUG (NFC)
  [Clang] Wide delimiters ('{{{') for expect strings (llvm#77326)
  LangRef: rint, nearbyint: mention that default rounding mode is assumed (llvm#77191)
  [lldb] Fix a warning
  [LLVM][NVPTX]: Add intrinsic for setmaxnreg (llvm#77289)
  [Libomptarget] Allow the CPU targets to be built without libffi (llvm#77495)
  [PGO] Fix `instrprof-api.c` on Windows (llvm#77508)
  [NFC][AMDGPU] Require `x86-registered-target` for `llvm/test/Transforms/MemCpyOpt/no-libcalls.ll`
  Fixed shared_ptr comparisons with nullptr_t when spaceship is unavailable. (llvm#76781)
  [libc++] Fix `regex_search` to match `$` alone with `match_default` flag (llvm#77256)
  [RISCV] Force relocations if initial MCSubtargetInfo contains FeatureRelax (llvm#77436)
  [libc++][test] try to directly create socket file in /tmp when filepath is too long (llvm#77058)
  [AMDGPU][MC] Use normal ELF syntax for section switching (llvm#77267)
  [acc] Fix OpenACC documentation (llvm#77502)
  [X86] Fold (iX bitreverse(bitcast(vXi1 X))) -> (iX bitcast(shuffle(X)))
  [Flang] Xfail hlfir test case on AIX (llvm#76802)
  [ELF,test] Set alignment of SHT_GROUP to 4
  [lldb] Fix Intel PT plugin compile errors (llvm#77252)
  [X86] Add test coverage for llvm#77459
  [DAG] Use FoldConstantArithmetic for unary bitops constant folding.
  [MC,ELF] .section: unconditionally print section flag 'G' after 'o'
  [lldb] DWARFDIE: Follow DW_AT_specification when computing CompilerCo… (llvm#77157)
  [MC] Parse SHF_LINK_ORDER argument before section group name (llvm#77407)
  [clang][modules] Objective-C test lacks support on AIX/zOS (llvm#77485)
  [clang][dataflow] Add an early-out to `flowConditionImplies()` / `flowConditionAllows()`. (llvm#77453)
  [libc++][CI] Moves CI badge to main README. (llvm#77247)
  [libc++] Implements P2517R1. (llvm#77239)
  Revert "[GVNSink] Skip debug intrinsics when identifying sinking candidates (llvm#77419)"
  [flang][openacc] Fix clauses check with device_type (llvm#77389)
  [mlir][openacc] Restore unit tests for device_type functions (llvm#77122)
  [Flang] Remove unnecessary static_assert
  [TextAPI] Skip adding empty attributes (llvm#77400)
  [Flang] Generate inline reduction loops for elemental count intrinsics (llvm#75774)
  [clang-tidy] Improve performance of misc-const-correctness (llvm#72705)
  [RISCV] Refactor GPRF64 register class to make it usable for Zacas. (llvm#77408)
  [clang] Improve bit-field in ref NTTP diagnostic (llvm#71077)
  AMDGPU: Drop amdgpu-no-lds-kernel-id attribute in LDS lowering (llvm#71481)
  [mlir][Vector] Add nontemporal attribute, mirroring memref (llvm#76752)
  [openmp][AIX] Add AIX to __kmp_set_stack_info() (llvm#77421)
  AMDGPU: Break vop3p handling out of vop3 base patterns (llvm#77472)
  [bazel] Fix compiler-rt build after 07c9189
  libclc: generic: add half implementation for erf/erfc (llvm#66901)
  [GVNSink] Skip debug intrinsics when identifying sinking candidates (llvm#77419)
  [lldb][libc++] Adds some C++20 calendar data formatters. (llvm#76983)
  [lldb][Type] Add TypeQuery::SetLanguages API (llvm#75926)
  [gn] Make sync script print github URLs
  [gn] port 07c9189
  [gn] port 07c9189 (DWARFLinker/Classic)
  [clang]use correct this scope to evaluate noexcept expr (llvm#77416)
  [mlir][gpu] Use DenseI32Array for NVVM's maxntid and reqntid (NFC) (llvm#77466)
  [libc++] Allow running the test suite with optimizations (llvm#68753)
  [PGO] Exposing PGO's Counter Reset and File Dumping APIs (llvm#76471)
  Disable autolink_private_module.m for z/OS & AIX
  [acc] OpenACC dialect design philosophy and details (llvm#75548)
  [MLIR][NVVM] Add missing `;` when lowering stmatrix Op (llvm#77471)
  [llvm/unittests] Reset the IsSSA property when using finalizeBundle() (llvm#77469)
  [DAG] XformToShuffleWithZero - use dyn_cast instead of isa/cast pair. NFCI.
  [GISel] Add RegState::Define to temporary defs in apply patterns (llvm#77425)
  [SEH][CodeGen] Add test to track CFG optimization bug for SEH (llvm#77441)
  [SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (llvm#77455)
  [PhaseOrdering] Regenerate test checks (NFC)
  [JumpThreading] Regenerate test checks (NFC)
  [clang][Sema][NFC] Make a few parameters const
  [AArch64] Fix regression introduced by c714846… (llvm#77467)
  [mlir] Add global and program memory space handling to the data layout subsystem (llvm#77367)
  [Flang][Driver] Enable gpulibc/nogpulibc options for Flang, which allows linking of GPU LIBC for the fortran and OpenMP runtime (llvm#77135)
  [LoongArch] Implement LoongArchRegisterInfo::canRealignStack() (llvm#76913)
  [LoongArch] Pre-commit test for llvm#76913. NFC
  [RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (llvm#76710)
  [mlir] add a chapter on matchers to the transform dialect tutorial (llvm#76725)
  [mlir] introduce transform.collect_matching (llvm#76724)
  [AMDGPU][NFC] Update left over tests for COV5 (llvm#76984)
  [AMDGPU] Make isScalarLoadLegal a member of AMDGPURegisterBankInfo. NFC.
  [NewPM] Update `CodeGenPreparePass` reference in `CodeGenPassBuilder.h` (llvm#77446)
  [CodeGen] Fix friend declaration in SSPLayoutAnalysis (llvm#77447)
  [mlir][docs] Fix a broken passes documentation (llvm#77402)
  [X86] Emit Warnings for frontend options to enable knl/knm specific ISAs. (llvm#75580)
  [AArch64] Add an AArch64 pass for loop idiom transformations (llvm#72273)
  [flang] Fix fir::isPolymorphic for TYPE(*) assumed-size arrays (llvm#77339)
  [CodeGen] Fix -Wmismatched-tags in StackProtector.h (NFC)
  [CodeGen] Port `StackProtector` to new pass manager (llvm#75334)
  [ARM] arm_acle.h add Coprocessor Instrinsics (llvm#75440)
  AMDGPU: Regenerate test checks
  [clang] Update cxx_dr_status.html (llvm#77372)
  [LV] Create block in mask up-front if needed. (llvm#76635)
  [GISel] Infer the type of an immediate when there is one element in TEC (llvm#77399)
  [mlir][bufferization][NFC] Clean up Bazel build files (llvm#77429)
  [AMDGPU] Flip the default value of maybeAtomic. NFCI. (llvm#75220)
  AMDGPU: Make v32bf16 a legal type (llvm#76679)
  [CodeGen] Port `GCLowering` to new pass manager (llvm#75305)
  [AST] Teach TextNodeDumper to print the "implicit" bit for coroutine AST nodes (llvm#77311)
  [bazel] update build for 2357e89
  Set dllstorage on ObjectiveC ivar offsets (llvm#77385)
  [NFC] [lld] [MTE] Rename MemtagDescriptors to MemtagGlobalDescriptors (llvm#77300)
  [AMDGPU] Add GFX12 S_WAIT_* instructions (llvm#77336)
  [mlir][ArmSME] Add `arm_sme.intr.cnts(b|h|w|d)` intrinsics (llvm#77319)
  [RISCV] Deduplicate RISCVISAInfo::toFeatures/toFeatureVector. NFC (llvm#76942)
  [DWARFLinker][DWARFLinkerParallel][NFC] Refactor DWARFLinker&DWARFLinkerParallel to have a common library. Part 1. (llvm#75925)
  [flang] add folding support for quad bessels (llvm#77314)
  [clang] Fix assertion failure when initializing union with FAM (llvm#77298)
  AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (llvm#74197)
  [mlir][vector] Don't treat memrefs with empty stride as non-contiguous (llvm#76848)
  [mlir][Bazel] Adjust BUILD.bazel file for b43c504
  [MC][RISCV] Check hasEmitNops before call shouldInsertExtraNopBytesForCodeAlign (llvm#77236)
  [LoongArch] Support R_LARCH_{ADD,SUB}_ULEB128 for .uleb128 and force relocs when sym is not in section (llvm#76433)
  [Documentation] fix invalid links in documentation (llvm#76502)
  [BinaryFormat][LoongArch] Define psABI v2.30 relocs (llvm#77039)
  [RISCV] Add documentation in the LangRef on GHC CC (llvm#72762)
  Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (llvm#77182)
  [X86][test] Add test to check ah is not allocatable for register class gr8_norex2
  [RISCV][ISel] Use vaaddu with rounding mode rdn for ISD::AVGFLOORU. (llvm#76550)
  AMDGPU: Avoid instantiating PatFrag with null_frag (llvm#77271)
  [mlir] Use StringRef::ltrim (NFC)
  [OpenMP] Patch for Support to loop bind clause : Checking Parent Region (llvm#76938)
  [ELF] -r: fix crash when SHF_LINK_ORDER linked-to section has a larger index
  [Instrumentation] Remove -pgo-instr-old-cfg-hashing (llvm#77357)
  Make clang report invalid target versions. (llvm#75373)
  [CMake] Add support for building on illumos (llvm#74930)
  [ELF] Support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 in SHF_ALLOC sections (llvm#77261)
  [mlir] Declare promised interfaces for the ConvertToLLVM extension (llvm#76341)
  [test][sanitizer] Check summary function and a single stack frame
  [RISCV] Use getELen() instead of hardcoded 64 in lowerBUILD_VECTOR. (llvm#77355)
  Set MaxAtomicSizeInBitsSupported for remaining targets. (llvm#75703)
  [RISCV] Add support predicating for ANDN/ORN/XNOR with short-forward-branch-opt. (llvm#77077)
  [doc][StackMaps] Fix typo
  Improve modeling of 'getcwd' in the StdLibraryFunctionsChecker (llvm#77040)
  [Analysis] Use StringRef::rtrim (NFC)
  [msan] Unwind stack before fatal reports (llvm#77168)
  [Sema] Use StringRef::ltrim (NFC)
  [test][hwasan] Test function name in summaries llvm#77391 (llvm#77397)
  [CostModel][X86] Fix fpext conversion cost for 16 elements (llvm#76278)
  [mlir][complex] Support Fastmath flag for complex.mulf (llvm#74554)
  [libc] temporarily set -Wno-shorten-64-to-32 (llvm#77396)
  [libc] fix up llvm#77384
  [ELF] OVERLAY: support optional start address and LMA
  [libc] fix -Wconversion (llvm#77384)
  [clang-tidy]unused using decls only check cpp files (llvm#77335)
  [AArch64][SVE2] Add pattern for BCAX (llvm#77159)
  Revert "[ASan][libc++] String annotations optimizations fix with lambda (llvm#76200)"
  [RISCV] Add branch+c.mv macrofusion for sifive-p450. (llvm#76169)
  [Clang][NFC] Fix out-of-bounds access (llvm#77193)
  [NVPTX] remove incorrect NVPTX intrinsic transformations (llvm#76870)
  [mlir][spirv] Drop support for SPV_NV_cooperative_matrix (llvm#76782)
  [Libomptarget] Remove extra cache for offloading entries (llvm#77012)
  [libc++abi] Handle catch null pointer-to-object (llvm#68076)
  [CommandLine] Do not print empty categories with '--help-hidden' (llvm#77043)
  [LLD] [MinGW] Sync --thinlto-cache-dir option details with ELF (llvm#77010)
  [libc] fix more -Wmissing-brace (llvm#77382)
  [lldb] Change interface of StructuredData::Array::GetItemAtIndexAsInteger (llvm#71993)
  [lldb] Deprecate SBBreakpoint::AddName in favor of AddNameWithErrorHandling (llvm#71228)
  [RISCV] Remove tab character from RISCVRegisterInfo.td. NFC
  [mlir][TilingInterface] Allow controlling what fusion is done within tile and fuse (llvm#76871)
  [libc] make off_t 32b for 32b arm (llvm#77350)
  Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (llvm#77371)
  [Driver] Add the --gcc-triple option (llvm#73214)
  [OpenMP][libomptarget] Enable automatic unified shared memory executi… (llvm#75999)
  [OpenACC] Implement 'self' clause parsing
  [AccelTable][nfc] Add helper function to cast AccelTableData (llvm#77100)
  [NFC][msan] Switch allocator interface to use BufferedStackTrace (llvm#77363)
  [llvm-exegesis] Align loop MBB in loop repetitor (llvm#77264)
  [Libomptarget] Remove unnecessary CMake definition of endiannness (llvm#77205)
  [AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (llvm#75825)
  [BOLT] Update test case after llvm#77253
  [clang][modules] Remove `_Private` suffix from framework auto-link hints. (llvm#77120)
  [RISCV] Mark VFIRST and VCPOP as SignExtendingOpW (llvm#77022)
  [lldb][NFCI] Change return type of BreakpointIDList::GetBreakpointIDAtIndex (llvm#77166)
  [lldb][NFCI] Remove BreakpointIDList::InsertStringArray (llvm#77161)
  [MLIR] Handle materializeConstant failure in GreedyPatternRewriteDriver (llvm#77258)
  [libc] fix -Wmissing-braces (llvm#77345)
  [RISCV] Use COPY to create artificial 64-bit uses in RISCVOptWInstrs's tests
  [DAG] SimplifyDemandedBits - don't fold sext(x) -> aext(x) if we lose an 0/-1 allsignbits mask (llvm#77296)
  [libc++] Remove usage of internal string function in sstream (llvm#75858)
  Revert "[GitHub] Fix slow sccache install on macOS by upgrading macOS version (llvm#77165)" (llvm#77270)
  [polly][ScheduleOptimizer] Reland Fix long compile time(hang) reported in polly (llvm#77280)
  [Sema] Clean up -Wc++11-narrowing-const-reference code after llvm#76094. NFC (llvm#77278)
  [X86] ftrunc.ll - replace X32 checks with X86. NFC.
  [X86] vector-shuffle-mmx.ll - replace X32 checks with X86. NFC.
  [X86] legalize-shl-vec.ll - replace X32 checks with X86. NFC.
  [X86] inline-sse.ll - replace X32 checks with X86. NFC.
  [X86] lea-2.ll - replace X32 checks with X86. NFC.
  [X86] i64-mem-copy.ll - replace X32 checks with X86. NFC.
  [X86] combine-bextr.ll - replace X32 checks with X86. NFC.
  Reapply "[libc] build with -Werror (llvm#73966)" (llvm#74506)
  [ASan][libc++] String annotations optimizations fix with lambda (llvm#76200)
  Replace print-at-pass-number cl::opt with print-before-pass-number (llvm#76211)
  [libc] set -Wno-frame-address for thread.cpp (llvm#77140)
  [AArch64][GlobalISel] Allow anyexting loads from 32b -> 64b to be legal.
  [X86] Check if machine loop is passed while getting loop alignment (llvm#77283)
  [libc++][doc] Marks LWG3257 as complete (llvm#77237)
  [Libomptarget][NFC] Fix unhandled allocator enum value
  [Clang] Fix IsOverload for function templates (llvm#77323)
  [clang][ASTImporter] Only reorder fields of RecordDecls (llvm#77079)
  [SLP]Fix PR76850: do the analysis of the submask.
  Revert "[Flang][OpenMP] Disable declarate target tests on Windows" (llvm#77324)
  [OpenACC] Implement 'if' clause
  [RISCV] Fix collectNonISAExtFeature returning negative extension features (llvm#76962)
  [InstSimplify] Consider bitcast as potential cross-lane operation
  [InstSimplify] Add test for llvm#77320 (NFC)
  [lldb][DWARFIndex][nfc] Factor out fully qualified name query (llvm#76977)
  [lldb][DWARFASTParserClang] GetClangDeclForDIE: don't create VarDecl for static data members (llvm#77155)
  [AMDGPU] Add new cache flushing instructions for GFX12 (llvm#76944)
  [clang][Sema][NFC] Clean up BuildOverloadedCallExpr
  [Flang][OpenMP] Disable declarate target tests on Windows (llvm#77306)
  [mlir][gpu] Use `known_block_size` to set `maxntid` for NVVM target (llvm#77301)
  [openmp][AIX]Initial changes for porting to AIX (llvm#76841)
  [X86] avx2-nontemporal.ll - replace X32 checks with X86. NFC.
  [X86] avx2-gather.ll - replace X32 checks with X86. NFC.
  [X86] vector-lzcnt-256.ll / vector-tzcnt-256.ll - replace X32 checks with X86. NFC.
  [X86] vec_extract - replace X32 checks with X86. NFC.
  [lldb][test] Skip part of nested expressions test on Windows
  [lldb][test] Skip DWARF inline source file test on Windows
  [SCCP] Check whether the default case is reachable (llvm#76295)
  [flang] Remove duplicate tests. (llvm#77059)
  AMDGPU: Make v8bf16/v16bf16 legal types (llvm#76678)
  [RemoveDIs][NFC] Update SelectionDAG test to check RemoveDIs mode too
  [X86] Emit NDD2NonNDD entris in the EVEX comprerssion table, NFCI
  [Flang] Remove unused triple variable. NFC (llvm#77275)
  [RISCV][NFC] Fix gcc -Wparentheses warning in RISCVISelDAGToDAG.cpp warning: RISCVISelDAGToDAG.cpp:767: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]   767 |          AM == ISD::POST_INC && "Unexpected addressing mode");
  [clang] Fix a crash when referencing the result if the overload fails (llvm#77288)
  [SPIR-V] Add pre-headers to loops. (llvm#75844)
  [lld] [MTE] Allow android note for static executables. (llvm#77078)
  [clang-format] Break after string literals with trailing line breaks (llvm#76795)
  [MLIR][Bufferizer][NFC]  Simplify some codes. (llvm#77254)
  [Sema][test] Split format attribute test cases for _Float16
  [MSSA] Don't require clone creation to succeed (llvm#76819)
  [Clang] Fix reference to sve in rvv driver test comment. NFC
  [ConstraintElim] Support signed induction variables (llvm#77103)
  [libcxx] Require qemu-system-arm for armv7m builder (llvm#77067)
  [TLI] replace-with-veclib works with FRem Instruction. (llvm#76166)
  [X86] Support EVEX compression for EGPR (llvm#77202)
  [VFABI] Reject demangled variants with unexpected number of params. (llvm#76855)
  [mlir] Add explicit call to flush
  [GlobalISel][IRTranslator] Port switch binary tree search optimization. (llvm#77279)
  [mlir] Apply ClangTidy performance finding
  [clang][Interp] Fix nullptr array dereferencing (llvm#75798)
  [mlir][llvm] Do not inline variadic functions (llvm#77241)
  [MLIR][LLVM] Add distinct identifier to the DISubprogram attribute (llvm#77093)
  NFC: Another pre-commit test change.
  Update pre-committed test. Accidentally committed the wrong version, this one properly demonstrates the upcoming change.
  [NFC] [Modules] Add a test case for selecting specializations with aliased template args
  [MLIR][LLVM] Add distinct identifier to DICompileUnit attribute (llvm#77070)
  [AArch64][NFC] Pre-commit IR translator switch lowering test.
  [X86][NFC] Remove duplicate comments in X86CompressEVEX.cpp
  [ELF] Improve OVERLAY tests
  [gn] port 92e2431
  [GlobalISel][NFC]Delete the comments of XXLegalizerInfo (llvm#76918)
  Revert "[CMake] Include riscv32-unknown-elf runtimes in Fuchsia toolchain (llvm#76849)"
  [RISCV][NFC] Move Zawrs/Zacas implementation to RISCVInstrInfoZa.td (llvm#76940)
  [PowerPC] Precommit test for lowering llvm.trap on ppc64le. NFC.
  [CMake] Include riscv32-unknown-elf runtimes in Fuchsia toolchain (llvm#76849)
  [Sema] Warning for _Float16 passed to format specifier '%f' (llvm#74439)
  [PowerPC] make LR/LR8 CTR/CTR8 aliased (llvm#76926)
  [NFC] Remove trailing whitespace in `llvm/lib/Target/AMDGPU/VOP2Instructions.td`
  [InstrProfiling] No runtime registration for ELF, COFF, Mach-O and XCOFF (llvm#77225)
  [ELF,test] Add eh-frame-nonzero-offset-riscv.s for llvm#65966
  [NFC][ObjectSize] Make method public
  [clang] [MinGW] Don't look for a GCC in path if the install base has a proper mingw sysroot (llvm#76949)
  [RISCV] Merge machine operand flag MO_PLT into MO_CALL (llvm#77253)
  [RISCV] Omit "@plt" in assembly output "call foo@plt" (llvm#72467)
  [lld][ELF][X86] Add missing X86_64_TPOFF64 case in switches (llvm#77208)
  [libc] Attempt to fix incorrect pathin on Linux builds
  [libc] Fix GPU tests not running after recent patches (llvm#77248)
  Reapply "[libc++][streams] P1759R6: Native handles and file streams" (llvm#77190)
  NFC: Extract switch lowering binary tree splitting code from DAG into SwitchLoweringUtils.
  [PatternMatch] Fix typo in comment (NFC) (llvm#77240)
  [OpenMP][Obvious] Fix test failing on BE architectures
  [libc++][doc] Minor release notes style fixes.
  [NFC][libc++] Formats tuple.
  [VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe.
  [LV] Add test showing overly aggressive dropping of inbounds.
  [libc++][test] Improves suspurious clang diagnostics. (llvm#77234)
  [mlir][IR] `DominanceInfo`: Add function to query dominator of a range of block (llvm#77098)
  [mlir][Interfaces][NFC] Move region loop detection to `RegionBranchOpInterface` (llvm#77090)
  [InstCombine] Relax the one-use constraints for `icmp pred (binop X, Z), (binop Y, Z)` (llvm#76384)
  Revert "[NFC][ObjectSizeOffset] Add template stuff for Visual Studio"
  [NFC][ObjectSizeOffset] Add template stuff for Visual Studio
  [MLIR][Presburger] Implement IntegerRelation::mergeAndAlignSymbols (llvm#76736)
  [MLIR][Presburger] Fix ParamPoint to be column-wise instead of row-wise (llvm#77232)
  [MLIR][Presburger] Definitions for basic functions related to cones (llvm#76650)
  [Clang][NFC] Fix trailing whitespace in ReleaseNotes.rst
  [AArch64] Fix condition for combining UADDV and Add. (llvm#76809)
  [RISCV] Don't attempt PRE if available info is SEW/LMUL ratio only (llvm#77063)
  [asan,test] Improve tests to ensure instrumentation even in the presence of StackSafetyAnalysis
  [mlir][spirv] Use assemblyFormat to define atomic op assembly (llvm#76323)
  Revert "[RISCV] Refactor subreg indices. (llvm#77173)"
  [test] Test StackSafetyAnalysis handles MemIntrinsic even in the presence of __asan_memcpy
  [docs] Small spelling fix ("if <...>`than` -> if <...> `then`") (llvm#77215)
  [mlir][python] add MemRefTypeAttr attr builder (llvm#76371)
  [gn build] Manually port ba3ef33
  [mlir][spirv] Support alias/restrict function argument decorations (llvm#76353)
  [RISCV] Refactor subreg indices. (llvm#77173)

Change-Id: Ia3ccd09f1f198cf7988afb60658d35b573507050
Signed-off-by: greenforce-auto-merge <greenforce-auto-merge@users.noreply.github.com>
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this issue Jan 28, 2024
Placing the class id at offset 0 should make `isa` and `dyn_cast` faster
by eliminating the field offset (previously 0x10) from the memory
operand, saving encoding space on x86, and, in theory, an add micro-op.
You can see the load encodes one byte smaller here:
https://godbolt.org/z/Whvz4can9

The compile time tracker shows modestly positive results on the
`cycles` metric and on the final clang binary size metric:
https://llvm-compile-time-tracker.com/compare.php?from=33b54f01fe32030ff60d661a7a951e33360f82ee&to=2530347a57401744293c54f92f9781fbdae3d8c2&stat=cycles
Clicking through to the per-library size breakdown shows that
instcombine shrinks by 0.68%, which is meaningful, and I believe
instcombine is known to be a hotspot.

It is, however, potentially noise. I still think we should do this,
because notionally, the class id really acts as the vptr of the Value,
and conventionally the vptr is always at offset 0.
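
The layout argument above can be sketched in C++. This is a simplified, hypothetical stand-in, not LLVM's actual `llvm::Value`: `ValueLike`, `InstructionLike`, the kind values, and the helper names are invented for illustration. It places the id as the first member, checks that with `offsetof`, and implements an `isa`/`dyn_cast`-style check that reads only that first field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified Value-like base class (NOT llvm::Value).
// The class id is the first member, playing the role a vptr would.
struct ValueLike {
  std::uint8_t SubclassID;  // identifies the dynamic kind of the object
  std::uint8_t Flags;       // hypothetical extra state
  std::uint32_t NumUses;    // hypothetical payload
};

// ValueLike is standard-layout, so offsetof is well defined here.
static_assert(offsetof(ValueLike, SubclassID) == 0,
              "class id should live at offset 0");

// Hypothetical kind values.
enum : std::uint8_t { ArgumentKind = 1, InstructionKind = 2 };

// Hypothetical "subclass" carrying its own data after the base.
struct InstructionLike : ValueLike {
  int Opcode;
};

// isa-style check: compares only the field at offset 0 of the object.
inline bool isInstruction(const ValueLike *V) {
  return V->SubclassID == InstructionKind;
}

// dyn_cast-style helper: returns nullptr when the kind does not match.
inline const InstructionLike *dynCastInstruction(const ValueLike *V) {
  return isInstruction(V) ? static_cast<const InstructionLike *>(V) : nullptr;
}
```

Because `SubclassID` sits at offset 0, the load in `isInstruction` can address the object pointer directly with no displacement; whether that saves bytes or micro-ops in practice depends on the compiler and target, as the commit message itself notes.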
AaronBallman pushed a commit that referenced this issue Mar 19, 2024
…for clang-cl (#75711)

Fixes #53520.

#### Description ####

Provide `intrin0.h` as the minimal set of intrinsics that the MSVC
STL requires.
The `intrin0.h` header matches the latest header provided by MSVC 1939,
which also includes some extra intrinsics that the MSVC STL does not use.

Inside `BuiltinHeaders.def` I kept the header description as `intrin.h`.
If you want me to change those to `intrin0.h` for the moved intrinsics,
let me know.

This should now allow `immintrin.h` to be used with function targets for
runtime CPU detection of SIMD instruction sets, without worrying about
the compile-time overhead of the MSVC STL including `intrin.h` under clang.
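
To make the use case concrete, here is a minimal sketch of function-target dispatch, the pattern this change is meant to unblock for clang-cl. It is an illustration, not code from this PR: `sum_scalar`, `sum_avx2`, `sum`, and `HAVE_X86_DISPATCH` are hypothetical names. It assumes an x86 compiler with GCC-style `__attribute__((target("avx2")))` and `__builtin_cpu_supports`, and falls back to the scalar path on other targets. The source does not say whether `__builtin_cpu_supports` links under clang-cl, so a `__cpuid`-based check may be needed there instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical guard: only attempt AVX2 dispatch on x86 targets.
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

// Portable baseline, compiled for the default target.
static std::int32_t sum_scalar(const std::int32_t *p, std::size_t n) {
  std::int32_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += p[i];
  return total;
}

#ifdef HAVE_X86_DISPATCH
// Only this function is compiled with AVX2 enabled; the rest of the
// translation unit keeps the baseline target.
__attribute__((target("avx2"))) static std::int32_t
sum_avx2(const std::int32_t *p, std::size_t n) {
  __m256i acc = _mm256_setzero_si256();
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {  // 8 x int32 per 256-bit vector
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(p + i));
    acc = _mm256_add_epi32(acc, v);
  }
  std::int32_t lanes[8];
  _mm256_storeu_si256(reinterpret_cast<__m256i *>(lanes), acc);
  std::int32_t total = 0;
  for (int k = 0; k < 8; ++k) total += lanes[k];  // horizontal reduction
  for (; i < n; ++i) total += p[i];               // scalar tail
  return total;
}
#endif

// Runtime dispatch: choose the AVX2 path only if the CPU supports it.
static std::int32_t sum(const std::int32_t *p, std::size_t n) {
#ifdef HAVE_X86_DISPATCH
  if (__builtin_cpu_supports("avx2")) return sum_avx2(p, n);
#endif
  return sum_scalar(p, n);
}
```

The key point is that `immintrin.h` must declare the AVX2 intrinsics even though the translation unit as a whole is not built with AVX2 enabled; the issue reports that clang-cl's headers did not do this before.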

I still need to figure out how best to update the MSVC STL to detect the
presence of clang's `intrin0.h` and use it instead of `intrin.h`.

#### Testing ####

Built clang locally and ran the test suite. I still need to do a pass
over the existing unit tests for the MS intrinsics to make sure there
aren't any gaps; I wanted to get this PR up for discussion first.

Modified the latest MSVC STL from GitHub to point to `intrin0.h` for clang.

Wrote some test files that include MSVC STL headers relying on
intrinsics, such as `atomic`, `bit`, and `vector`. Built the unit tests
for x86, ARM, AArch64, and x64.

#### Benchmarks ####

The following include times were measured for the x64 target with the
modified headers in this PR, using `clang-cl.exe -ftime-trace` and taking
the wall time spent parsing `intrin.h` and `intrin0.h`.

`intrin.h` takes ~897ms to parse.
`intrin0.h` takes ~1ms to parse.

If anything else is required, or a different approach is preferred, let
me know. I would very much like to get this over the finish line so we
can use function targets with clang-cl.
chencha3 pushed a commit to chencha3/llvm-project that referenced this issue Mar 23, 2024
…for clang-cl (llvm#75711)

Fixes llvm#53520.
devnexen pushed a commit to devnexen/llvm-project that referenced this issue Mar 23, 2024
…for clang-cl (llvm#75711)

Fixes llvm#53520.
SquallATF pushed a commit to SquallATF/llvm-project that referenced this issue Mar 27, 2024
…for clang-cl (llvm#75711)

Fixes llvm#53520.
SquallATF pushed a commit to SquallATF/llvm-project that referenced this issue Apr 4, 2024
…for clang-cl (llvm#75711)

Fixes llvm#53520.
SquallATF pushed a commit to SquallATF/llvm-project that referenced this issue Apr 21, 2024
…for clang-cl (llvm#75711)

Fixes llvm#53520.