merge main #2

jjmarr-amd · 2025-09-22T14:31:10Z

No description provided.

Summary: The added bit counting builtins for vectors used `cttz` and `ctlz`, which is consistent with the LLVM naming convention. However, these are clang builtins and implement exactly the `__builtin_ctzg` and `__builtin_clzg` behavior. It is confusing to people familiar with other builtins that these are the only bit counting intrinsics named differently. This includes the additional operation for the undefined zero case, which was added as a `clzg` extension.

…maller set of dependencies (#155929) Define lit testsuite for FileCheck and TableGen with smaller set of dependencies. This uses the new `SKIP` argument to `add_lit_testsuites` that was added in #157176.

…ectors. (#159331) The current implementation assumes ConstantInt return values are scalar, which is not true when use-constant-int-for-fixed-length-splat is enabled.

…159757) Fix two older FIXME items from the `functions.cpp` test.

Just directly check x86_64. isArch64Bit just adds extra steps around this.

#159712) #121943 rewrote `__atomic_test_and_set` and `__atomic_clear` to be lowered through AtomicExpr, but StmtPrinter::VisitAtomicExpr still treated them like other atomic builtins with a Val1 operand. This led to incorrect pretty-printing when dumping the AST. Skip Val1 for these two builtins, as is already done for atomic loads.
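
For context on why Val1 is skipped, a small sketch of how these GCC-compatible builtins are called from user code. This is not the StmtPrinter change itself, and the wrapper names are hypothetical. `__atomic_test_and_set` and `__atomic_clear` take only a pointer and a memory order, the same operand shape as `__atomic_load_n`, whereas `__atomic_exchange_n` carries an extra value operand.

```cpp
// Hypothetical wrappers showing operand shapes only.
// test_and_set / clear: (pointer, order), like a load.
bool testAndSet(bool* flag) { return __atomic_test_and_set(flag, __ATOMIC_SEQ_CST); }
void clearFlag(bool* flag) { __atomic_clear(flag, __ATOMIC_SEQ_CST); }
int loadInt(int* p) { return __atomic_load_n(p, __ATOMIC_SEQ_CST); }

// exchange: (pointer, value, order); the value is the extra operand.
int exchangeInt(int* p, int value) {
  return __atomic_exchange_n(p, value, __ATOMIC_SEQ_CST);
}
```

`testAndSet` returns the previous state, so calling it twice on a cleared flag returns `false` and then `true`.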

…9572) In this commit: (1) Added new pass manager support for `ReachingDefAnalysis`. (2) Added a printer pass. (3) Made the old pass manager use `ReachingDefInfoWrapperPass`.

Replace the target uses of PointerLikeRegClass with RegClassByHwMode

Reapply #131804 and #159289. Fixed CMake link issue. --------- Co-authored-by: DeNiCoN <denicon1234@gmail.com> Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>

AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.

Change-Id: Id229f849b1d8552bbe59d6e18114042ef1614fad

…59398) The result type of the vector extend intrinsics generated by the BUILD_VECTOR lowering code should match how they are actually defined. Currently, the result type defaults to the operand type there. This can conflict with calls to the same intrinsic from other paths.

…9606) Based on testing on processors that use pointer metadata, and with all the work done to delay calls to FixDataAddress, this is no longer necessary. Note that, with debugserver in particular, this is an NFC change: the code path here is for frame zero, and debugserver will strip metadata when reading fp from frame zero anyway.

This should eventually be done using `lnt` instead, but for the time being this makes it easy to visualize historical data without having an instance of `lnt` running.

) The atomic_wait benchmarks are great, but they tend to overload the system they're running on. For that reason, we can't run them on our CI infrastructure on a regular basis. Instead of removing them, make them unsupported outside of dry-running, which allows keeping the benchmarks around and ensuring they don't rot, but doesn't run them along with the other benchmarks. If we need to investigate atomic_wait performance, it's trivial to mark the benchmark as supported and run it for local investigations. This is an alternative to #158289.

When built with assertions, there will be output like the following that needs to be filtered out, similar to the other ones: `'Build config: +assertions'`

#157435) First added in #153585 for Darwin only. All Linux AArch64 systems also have Top Byte Ignore enabled in userspace, so the test "just works" there. FreeBSD has very recently gained Top Byte Ignore support: freebsd/freebsd-src@4c6c27d. However, it's so recent that I don't want to assume it'll be available on any random FreeBSD system out there. There isn't really a good place to put this test, so I put it in the top level of API, next to the other non-address bit test that didn't have a good home either.
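
As a rough illustration of the bits involved, assuming the usual AArch64 definition of Top Byte Ignore (bits 63:56 of a data address are ignored by loads and stores), here is the tag arithmetic a debugger has to undo. `withTag` and `stripTag` are hypothetical helpers, not LLDB APIs, and the sketch only manipulates integers; it never dereferences anything.

```cpp
#include <cstdint>

// Bits 55:0 carry the address; bits 63:56 are the ignored "top byte".
constexpr std::uint64_t kAddressMask = 0x00FF'FFFF'FFFF'FFFFull;

// Place an 8-bit tag in the top byte of an address.
std::uint64_t withTag(std::uint64_t addr, std::uint8_t tag) {
  return (addr & kAddressMask) | (static_cast<std::uint64_t>(tag) << 56);
}

// Remove the non-address bits, recovering the plain address.
std::uint64_t stripTag(std::uint64_t addr) { return addr & kAddressMask; }
```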

The GNU Fortran library function FNUM(u) returns the UNIX file descriptor that corresponds to an open Fortran unit number, if any; otherwise -1. This implementation is a library extension only, not an intrinsic.

Reverts #158161 due to reported failures on remote Linux and Swift buildbots.

This patch adds a new %{readfile:<file name>} substitution to lit. This is needed for porting a couple of tests to lit's internal shell. These tests all use subshells to pass some option to a command and are not feasible to run within the internal shell without this functionality. Reviewers: petrhosek, jh7370, ilovepi, cmtice Reviewed By: jh7370, cmtice Pull Request: #158441
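
A minimal sketch of what such a substitution does, under stated assumptions: lit itself is written in Python, and its actual implementation is not shown here. The hypothetical `expandReadfile` replaces each `%{readfile:<file name>}` in a command with the file's contents. Stripping trailing newlines and throwing on an unreadable file are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: expand every %{readfile:<path>} in a RUN-style command
// into the contents of <path>. Trailing newlines are stripped (an assumption).
std::string expandReadfile(const std::string& command) {
  static const std::regex pattern(R"(%\{readfile:([^}]+)\})");
  std::string result;
  std::size_t last = 0;  // end of the previous match within `command`
  for (std::sregex_iterator it(command.begin(), command.end(), pattern), end;
       it != end; ++it) {
    const std::smatch& match = *it;
    const auto start = static_cast<std::size_t>(match[0].first - command.begin());
    result.append(command, last, start - last);  // text before the match
    std::ifstream in(match[1].str());
    if (!in) throw std::runtime_error("cannot read " + match[1].str());
    std::ostringstream contents;
    contents << in.rdbuf();
    std::string text = contents.str();
    while (!text.empty() && (text.back() == '\n' || text.back() == '\r'))
      text.pop_back();
    result += text;
    last = static_cast<std::size_t>(match[0].second - command.begin());
  }
  result.append(command, last, std::string::npos);  // text after the last match
  return result;
}
```

With a file `opts.txt` containing `-O2 -g`, the command `clang %{readfile:opts.txt} -c a.c` expands to `clang -O2 -g -c a.c`, with no subshell involved.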

Planning to add to the list in #159791, so format it. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

…0019) add `\` to avoid a blank first line

…ison" issue (#159786) Despite the difference in the order of fcmp operands, `%lhs, %rhs` and `%rhs, %lhs`, the generated assembly remains the same. This is a baseline test for #159723.

If a COPY uses Reg but only in an implicit operand then the new implementation ignores it but the old implementation would have treated it as a copy of Reg. Probably this case never occurs in practice. Other than that, this patch is NFC. Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

Make the actual use context less ugly.

…ntTypes.cpp (NFC)

…plementation (#158075) Move the logic for building "out-of-thin-air" source materializations during op replacements from `replaceOp` to `findOrBuildReplacementValue`. That function already builds source materializations and can handle the case where an op result is dropped. This commit is in preparation for turning `replaceOp` into a non-virtual function. (It is sufficient for `replaceAllUsesWith` and `eraseOp` to be virtual.)

When building with the latest MSVC on Windows, this fixes some compile-time warnings from last week's integration in #157885:

```
[321/5941] Building CXX object lib\Support\LSP\CMakeFiles\LLVMSupportLSP.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\lib\Support\LSP\Transport.cpp(123): warning C4930: 'std::lock_guard<std::mutex> responseHandlersLock(llvm::lsp::MessageHandler::ResponseHandlerTy)': prototyped function not called (was a variable definition intended?)
[384/5941] Building CXX object unittests\Support\LSP\CMakeFiles\LLVMSupportLSPTests.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\unittests\Support\LSP\Transport.cpp(190): warning C4804: '+=': unsafe use of type 'bool' in operation
```
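
The C4930 warning describes C++'s "most vexing parse": when the parenthesized argument names a type, the line declares a function instead of constructing a lock. The sketch below reproduces the pattern with a hypothetical counting lockable (`CountingMutex`) and a hypothetical `HandlerTy` standing in for `ResponseHandlerTy`. It is not the actual Transport.cpp code, whose original source line isn't shown here.

```cpp
#include <mutex>
#include <type_traits>

// Hypothetical BasicLockable that counts calls, so locking is observable.
struct CountingMutex {
  int locks = 0;
  int unlocks = 0;
  void lock() { ++locks; }
  void unlock() { ++unlocks; }
};

// Hypothetical stand-in for a type name such as ResponseHandlerTy.
struct HandlerTy {};

int lockWithTypo(CountingMutex& m) {
  // Bug: with a type name in the parentheses, this declares a function named
  // `guard` that takes a HandlerTy. No lock_guard object exists, nothing locks.
  std::lock_guard<CountingMutex> guard(HandlerTy);
  static_assert(std::is_function_v<decltype(guard)>);
  return m.locks;  // still 0
}

int lockCorrectly(CountingMutex& m) {
  // Fix: pass the mutex object. Braces also rule out a function declaration.
  std::lock_guard<CountingMutex> guard{m};
  return m.locks;  // 1 while the guard is alive; unlocked when it goes away
}
```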

This used to happen during global destruction, after `main()` had exited. Previously, we were re-creating the `llvm::TimerGlobals` object at this point.

…mp' (#159813) Moves the implementation of the `cert-err52-cpp` check into `modernize` module and gives it a clearer name: `modernize-avoid-setjmp-longjmp`. This is part of the cleanup described in #157287. Closes #157297

This PR introduces support for the SPIR-V extension `SPV_KHR_bfloat16`. This extension extends the `OpTypeFloat` instruction to enable the use of bfloat16 types with cooperative matrices and dot products. TODO: Per the `SPV_KHR_bfloat16` extension, only a limited number of instructions can use the bfloat16 type. For example, arithmetic instructions like `FAdd` or `FMul` can't operate on `bfloat16` values. Therefore, a future patch should either emit an error or fall back to FP32 for arithmetic in cases where bfloat16 must not be used. Reference Specification: https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_bfloat16.asciidoc
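
To make the TODO's "fall back to FP32" option concrete, here is a sketch of what such a fallback computes numerically: widen bfloat16 to binary32, do the arithmetic there, and narrow the result back. It is not the SPIR-V backend's code. The helper names are hypothetical, and round-to-nearest-even on narrowing is an assumption of this sketch.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// bfloat16 is the upper 16 bits of an IEEE-754 binary32 value.
float bf16ToFloat(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof f);
  return f;
}

// Narrow binary32 to bfloat16 with round-to-nearest-even; NaN maps to a quiet NaN.
std::uint16_t floatToBf16(float f) {
  if (std::isnan(f)) return 0x7FC0;
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  std::uint32_t roundingBias = 0x7FFF + ((bits >> 16) & 1);  // ties go to even
  return static_cast<std::uint16_t>((bits + roundingBias) >> 16);
}

// Hypothetical FP32 fallback for an add that bfloat16 cannot perform directly.
std::uint16_t bf16AddViaF32(std::uint16_t a, std::uint16_t b) {
  return floatToBf16(bf16ToFloat(a) + bf16ToFloat(b));
}
```

For example, `0x3F80` (1.0) plus `0x4000` (2.0) yields `0x4040` (3.0).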

std::realloc is declared there

Add DAGCombiner patterns for pairs of 2-operand min/max instructions to be fused into a single 3-operand min/max instruction for f32s (only for PTX 8.8+ and sm100+).

This patch introduces a new pass, SPIRVCBufferAccess, which is responsible for translating accesses to HLSL constant buffer (cbuffer) global variables into accesses to the proper SPIR-V resource. The pass operates by:

1. Identifying all cbuffers via the `!hlsl.cbs` metadata.
2. Replacing all uses of cbuffer member global variables with `llvm.spv.resource.getpointer` intrinsics.
3. Cleaning up the original global variables and metadata.

This approach allows subsequent passes, like SPIRVEmitIntrinsics, to correctly fold GEPs into a single OpAccessChain instruction. The patch also includes a comprehensive set of lit tests to cover various scenarios:

- Basic cbuffer access: direct loads and GEPs.
- Unused and partially unused cbuffers.

This implements the SPIR-V version of https://github.com/llvm/wg-hlsl/blob/main/proposals/0016-constant-buffers.md#lowering-to-buffer-load-intrinsics.

… (NFC) (#155825) Since the size of the last dimension of TMA is no longer fixed at 128 bytes, remove the kMaxTMALastdimByte.

* Fix infinite recursion with nested structs.
* Drop `::getExtensions` function from derived types, so that there's only one entry point that queries type extensions.
* Move all extension logic to a new helper class -- this way the `::getExtensions` functions can't diverge across concrete types and 'convenience types' like `CompositeType`.

We should also fix `::getCapabilities` in a similar way and move the testcase to `vce-deduction.mlir`.

Issue: #159963

Add tests with pointer-based loop guards.

Summary: This patch exposes `__builtin_masked_gather` and `__builtin_masked_scatter` to clang. These map to the underlying intrinsics relatively cleanly, needing only a level of indirection to turn a base pointer and a vector of indices into a vector of pointers.
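
A scalar model of the lane semantics, offered as a sketch. The builtins' exact signatures aren't given here, so `maskedGather`/`maskedScatter` and their parameter order are hypothetical. The sketch assumes the behavior of LLVM's `llvm.masked.gather`/`llvm.masked.scatter`: enabled lanes access `base[index[i]]`, disabled lanes touch no memory, and a gather fills disabled lanes from a pass-through vector.

```cpp
#include <array>
#include <cstddef>

// Gather: each enabled lane loads base[index[i]]; disabled lanes take passthru.
template <typename T, std::size_t N>
std::array<T, N> maskedGather(const std::array<bool, N>& mask,
                              const std::array<std::ptrdiff_t, N>& index,
                              const T* base, const std::array<T, N>& passthru) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = mask[i] ? base[index[i]] : passthru[i];
  return out;
}

// Scatter: each enabled lane stores values[i] to base[index[i]], in lane order.
template <typename T, std::size_t N>
void maskedScatter(const std::array<bool, N>& mask,
                   const std::array<std::ptrdiff_t, N>& index, T* base,
                   const std::array<T, N>& values) {
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i]) base[index[i]] = values[i];
}
```

The base-pointer-plus-indices form is what the extra indirection provides. An implementation would compute `base + index[i]` per lane to form the vector of pointers the intrinsics consume.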

They're not formatted correctly anymore, since clang-format was updated.

…ry(A,X, XOR(B,C)) and ternary(A,X, OR(B,C)) (#157909) Adds support for ternary equivalent operations of the form:

- `ternary(A, X, xor(B,C))` where `X=[and(B,C)| nor(B,C)| or(B,C)| B | C]`.
- `ternary(A, X, or(B,C))` where `X = [and(B,C)| eqv(B,C)| not(B)| not(C)| nand(B,C)| B | C]`.

The following are the patterns involved and the imm values:

```
ternary(A, and(B,C), xor(B,C))  97
ternary(A, B, xor(B,C))         99
ternary(A, C, xor(B,C))         101
ternary(A, or(B,C), xor(B,C))   103
ternary(A, nor(B,C), xor(B,C))  104
ternary(A, and(B,C), or(B,C))   113
ternary(A, B, or(B,C))          115
ternary(A, C, or(B,C))          117
ternary(A, eqv(B,C), or(B,C))   121
ternary(A, not(C), or(B,C))     122
ternary(A, not(B), or(B,C))     124
ternary(A, nand(B,C), or(B,C))  126
```

For example, `xxeval XT, XA, XB, XC, 97` performs the ternary operation `XA ? and(XB, XC) : xor(XB, XC)` and places the result in `XT`.

This is the continuation of:

- [[PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C))](#141733 (comment))
- [[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X,B) and ternary(A,X,C).](#152956 (comment))

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
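
The imm values in the table can be derived mechanically. The encoding isn't spelled out here, so this sketch assumes the `xxeval` immediate is the truth table of the three inputs, with input combination `abc` (A as the high bit) selecting bit `7 - abc`, so that A=B=C=0 maps to the most significant bit. Under that assumption, it reproduces all twelve values listed above. `ternary` and `truthTableImm` are hypothetical helpers.

```cpp
#include <cstdint>
#include <functional>

using BinFn = std::function<bool(bool, bool)>;
using TernFn = std::function<bool(bool, bool, bool)>;

// ternary(A, X, Y): take X(B, C) where A is set, otherwise Y(B, C).
TernFn ternary(BinFn x, BinFn y) {
  return [x, y](bool a, bool b, bool c) { return a ? x(b, c) : y(b, c); };
}

// Encode f as an 8-bit truth table: input abc selects bit (7 - abc),
// so f(0,0,0) lands in the most significant bit.
std::uint8_t truthTableImm(const TernFn& f) {
  std::uint8_t imm = 0;
  for (int abc = 0; abc < 8; ++abc) {
    const bool a = (abc & 4) != 0, b = (abc & 2) != 0, c = (abc & 1) != 0;
    if (f(a, b, c)) imm |= static_cast<std::uint8_t>(1u << (7 - abc));
  }
  return imm;
}
```

For `ternary(A, and(B,C), xor(B,C))`, the A=0 half is xor's table `0110` and the A=1 half is and's table `0001`. Together they give `01100001`, which is 97.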

Summary: The changes made in #156057 allow the alignment value to be increased. We assume effectively infinite alignment when the pointer argument is invalid / null. The problem is that, for whatever reason, the masked load / store functions use i32 for their alignment value, which means this gets truncated to zero. Add a special check for this; long term, we probably want to just remove this argument entirely.
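
A small sketch of the truncation described above, assuming that the "effectively infinite" alignment is 2^32. The exact check the patch adds isn't described here. The clamp shown is one hypothetical option, not necessarily what the patch does, and all names are illustrative.

```cpp
#include <cstdint>

// Assumed "effectively infinite" alignment (2^32); does not fit in 32 bits.
constexpr std::uint64_t kAssumedMaxAlignment = std::uint64_t{1} << 32;

// Narrowing to 32 bits keeps only the low bits, so 2^32 becomes 0.
std::uint32_t truncateAlignment(std::uint64_t align) {
  return static_cast<std::uint32_t>(align);
}

// Hypothetical guard: clamp to the largest power of two that fits in 32 bits
// before narrowing, so a huge alignment never turns into 0.
std::uint32_t clampAlignment(std::uint64_t align) {
  constexpr std::uint64_t kMaxPow2In32Bits = std::uint64_t{1} << 31;
  return static_cast<std::uint32_t>(align > kMaxPow2In32Bits ? kMaxPow2In32Bits : align);
}
```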

We compile our monorepo with `/D_MBCS`, and flang-rt compilation breaks because it explicitly uses `wchar_t` (i.e. not TCHAR). Use STARTUPINFOW / CreateProcessW explicitly so the code works regardless of global settings.

Need this as `mlir/dialects/transform/smt.py` imports it:

```py
from .._transform_smt_extension_ops_gen import *
from .._transform_smt_extension_ops_gen import _Dialect
```

jhuber6 and others added 30 commits September 19, 2025 07:00

[LLVM] Specialize test suites for TableGen and FileCheck to use smaller set of dependencies (#155929)

13605ab

Define lit testsuite for FileCheck and TableGen with smaller set of dependencies. This uses the new `SKIP` argument to `add_lit_testsuites` that was added in #157176.

[AMDGPU] Fix the magic number RegisterClass for SReg_32 in test (#159761)

eed99d5

[LLVM][CodeGen] Update PPCFastISel::SelectRet for ConstantInt based vectors. (#159331)

b7e4edc

The current implementation assumes ConstantInt return values are scalar, which is not true when use-constant-int-for-fixed-length-splat is enabled.

[clang][bytecode] Typecheck called function pointers more thoroughly (#159757)

68c9ddb

Fix two older FIXME items from the `functions.cpp` test.

SPARC: Use RegClassByHwMode instead of PointerLikeRegClass (#158271)

9113592

X86: Avoid using isArch64Bit for 64-bit checks (#157412)

cc680fc

Just directly check x86_64. isArch64Bit just adds extra steps around this.

[X86] Add test coverage for #159670 (#159767)

188c7ed

[CodeGen][NewPM] Port ReachingDefAnalysis to new pass manager. (#159572)

5621464

In this commit: (1) Added new pass manager support for `ReachingDefAnalysis`. (2) Added a printer pass. (3) Made the old pass manager use `ReachingDefInfoWrapperPass`.

X86: Switch to RegClassByHwMode (#158274)

2654b51

Replace the target uses of PointerLikeRegClass with RegClassByHwMode

reapply "[clang-tidy] support query based custom check" (#159547)

584af2f

Reapply #131804 and #159289. Fixed CMake link issue. --------- Co-authored-by: DeNiCoN <denicon1234@gmail.com> Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>

Mips: Switch to RegClassByHwMode (#158273)

084872a

[flang][OpenMP] Remove no longer used OmpLoopDirective, NFC (#159576)

e2467cb

[mlir][rocdl] Add gfx1250+ cvt scale intrinsics (#159649)

cd0f191

[gn] port 584af2f (clang-tidy custom)

c7054d9

[AMDGPU] Precommit test for memory intrinsics CGP handling

ac8f3cd

Change-Id: Id229f849b1d8552bbe59d6e18114042ef1614fad

PPC: Replace PointerLikeRegClass with RegClassByHwMode (#158777)

acc156d

[libc++] Add a utility to visualize historical benchmark data locally

00333ed

This should eventually be done using `lnt` instead, but for the time being this makes it easy to visualize historical data without having an instance of `lnt` running.

Fix perf-helper.py (#159745)

50ef746

When built with assertions, there will be output like the following that needs to be filtered out, similar to the other ones: `'Build config: +assertions'`

[flang] Implement FNUM() (#159433)

80fa3bd

The GNU Fortran library function FNUM(u) returns the UNIX file descriptor that corresponds to an open Fortran unit number, if any; otherwise -1. This implementation is a library extension only, not an intrinsic.

[LLVM] Exclude specialized lit test suites from check-all (#159781)

8109c3a

Revert "RISCV unwinding enable" (#159790)

19bc0d6

Reverts #158161 due to reported failures on remote Linux and Swift buildbots.

[llvm-readobj][NFC] Format ElfMachineType array definition (#159793)

69465eb

Planning to add to the list in #159791, so format it. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

HerrCai0907 and others added 28 commits September 22, 2025 18:33

[clang-tidy][NFC] let multi-line string first line does not wrap (#160019)

6884cc7

Add `\` to avoid a blank first line.

[X86] Baseline test for "invalid operand order for fp16 vector comparison" issue (#159786)

ab76686

Despite the difference in the order of fcmp operands, `%lhs, %rhs` and `%rhs, %lhs`, the generated assembly remains the same. This is a baseline test for #159723.

Regalloc: Add operator >= to EvictionCost (#160070)

c077822

Make the actual use context less ugly.

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in NormalizeQuantTypes.cpp (NFC)

2ab5186

[mlir] Fix bazel after d8b84be. (#160078)

7c8b3f3

[docs][OpenMP] Claim compound directive handling (#160077)

ec5460b

[Driver] Enable __float128 support on X86 on Hurd (#160045)

47211c4

[mlir] Fix bazel after 2bcccdd. (#160081)

e9db38c

Add missing #include <cstdlib> (#157840)

bb79448

std::realloc is declared there

[NVPTX] Add 3-operand fmin/fmax DAGCombines (#159729)

0dc2148

Add DAGCombiner patterns for pairs of 2-operand min/max instructions to be fused into a single 3-operand min/max instruction for f32s (only for PTX 8.8+ and sm100+).

[mlir][nvgpu] Delete nvgpu dialect unused variable kMaxTMALastdimByte (NFC) (#155825)

dfd50f9

Since the size of the last dimension of TMA is no longer fixed at 128 bytes, remove the kMaxTMALastdimByte.

[IndVars,LV] Add tests with pointer-based loop guards.

f8a7f36

Add tests with pointer-based loop guards.

[gn build] Port ac69f9d

45a0843

[libc++][NFC] Reformat some deduction guides (#160085)

e40bbba

They're not formatted correctly anymore, since clang-format was updated.

[AMDGPU] Skip debug uses in SIPeepholeSDWA (#160092)

3cb2174

[AMDGPU] Skip debug uses in SIInstrInfo::foldImmediate (#160102)

b7a848e

jjmarr-amd merged this pull request into jjmarr-amd:main Sep 22, 2025
12 of 13 checks passed

124 participants
