forked from llvm/llvm-project
merge main #2
Merged
Summary: The added bit counting builtins for vectors used `cttz` and `ctlz`, which is consistent with the LLVM naming convention. However, these are clang builtins and implement exactly the `__builtin_ctzg` and `__builtin_clzg` behavior. It is confusing to people familiar with the other builtins that these are the only bit counting intrinsics named differently. This includes the additional operand for the undefined zero case, which was added as a `clzg` extension.
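A minimal scalar sketch of the `ctzg` / `clzg` semantics described above may help: count zero bits, but return a caller-supplied fallback when the input is zero instead of leaving the result undefined. The helper names `ctzOr` and `clzOr` are hypothetical, the code deliberately does not call the builtins, and it models a single 32-bit value rather than a vector.

```cpp
#include <cstdint>

// Hypothetical helper: count trailing zero bits of x,
// returning `fallback` when x == 0 (the otherwise undefined case).
constexpr int ctzOr(std::uint32_t x, int fallback) {
  if (x == 0)
    return fallback;
  int n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical helper: count leading zero bits of x,
// returning `fallback` when x == 0.
constexpr int clzOr(std::uint32_t x, int fallback) {
  if (x == 0)
    return fallback;
  int n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

static_assert(ctzOr(8u, -1) == 3, "0b1000 has three trailing zeros");
static_assert(clzOr(1u, -1) == 31, "1 has 31 leading zeros in 32 bits");
static_assert(ctzOr(0u, 32) == 32, "zero input yields the fallback");
```

Passing the bit width as the fallback, as in the last check, is one common way to get a total function out of a count that is otherwise undefined at zero.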
…ectors. (#159331) The current implementation assumes ConstantInt return values are scalar, which is not true when use-constant-int-for-fixed-length-splat is enabled.
…159757) Fix two older FIXME items from the `functions.cpp` test.
Just directly check x86_64. isArch64Bit just adds extra steps around this.
#159712) #121943 rewrote `__atomic_test_and_set` and `__atomic_clear` to be lowered through AtomicExpr. StmtPrinter::VisitAtomicExpr still treated them like other atomic builtins with a Val1 operand. This led to incorrect pretty-printing when dumping the AST. Skip Val1 for these two builtins, as is done for atomic loads.
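For context, both builtins take only a pointer and a memory order, with no value operand, which is why printing a Val1 for them is wrong. The sketch below shows their call shape with GCC or Clang; the wrapper and demo function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical wrappers around the GCC/Clang builtins.
// Note the argument lists: pointer + memory order, no value operand.
inline bool acquireFlag(bool *flag) {
  // Sets *flag and returns whether it was already set.
  return __atomic_test_and_set(flag, __ATOMIC_SEQ_CST);
}

inline void releaseFlag(bool *flag) {
  // Clears *flag.
  __atomic_clear(flag, __ATOMIC_SEQ_CST);
}

inline void atomicFlagDemo() {
  bool flag = false;
  const bool first = acquireFlag(&flag);  // flag was clear
  const bool second = acquireFlag(&flag); // flag was already set
  releaseFlag(&flag);
  const bool third = acquireFlag(&flag);  // clear again after release
  assert(!first);
  assert(second);
  assert(!third);
}
```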
…9572) In this commit: (1) Added new pass manager support for `ReachingDefAnalysis`. (2) Added a printer pass. (3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
Replace the target uses of PointerLikeRegClass with RegClassByHwMode
AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.
Change-Id: Id229f849b1d8552bbe59d6e18114042ef1614fad
…59398) The result type of the vector extend intrinsics generated by the BUILD_VECTOR lowering code should match how they are actually defined. Currently the result type is defaulting to the operand type there. This can conflict with calls to the same intrinsic from other paths.
…9606) Based on testing on processors that use pointer metadata, and with all the work done to delay calls to FixDataAddress, this is no longer necessary. Note that, with debugserver in particular, this is an NFC change: the code path here is for frame zero, and debugserver will strip metadata when reading fp from frame zero anyway.
This should eventually be done using `lnt` instead, but for the time being this makes it easy to visualize historical data without having an instance of `lnt` running.
) The atomic_wait benchmarks are great, but they tend to overload the system they're running on. For that reason, we can't run them on our CI infrastructure on a regular basis. Instead of removing them, make them unsupported outside of dry-running, which allows keeping the benchmarks around and ensuring they don't rot, but doesn't run them along with the other benchmarks. If we need to investigate atomic_wait performance, it's trivial to mark the benchmark as supported and run it for local investigations. This is an alternative to #158289.
When building with assertions, there will be an output line like the following that needs to be filtered out, similar to the other ones: `'Build config: +assertions'`
#157435) First added in #153585 for Darwin only. All Linux AArch64 systems also have Top Byte Ignore enabled in userspace, so the test "just works" there. FreeBSD has very recently gained Top Byte Ignore support: freebsd/freebsd-src@4c6c27d. However, it's so recent that I don't want to assume it'll be available on any random FreeBSD system out there. There isn't really a good place to put this test, so I put it in the top level of API, next to the other non-address bit test that didn't have a good home either.
The GNU Fortran library function FNUM(u) returns the UNIX file descriptor that corresponds to an open Fortran unit number, if any; otherwise -1. This implementation is a library extension only, not an intrinsic.
Reverts #158161 Due to reported failures on remote Linux and Swift buildbots.
This patch adds a new %{readfile:<file name>} substitution to lit. This is needed for porting a couple of tests to lit's internal shell. These tests all use subshells to pass some option to a command, which is not feasible within the internal shell without this functionality. Reviewers: petrhosek, jh7370, ilovepi, cmtice Reviewed By: jh7370, cmtice Pull Request: #158441
Planning to add to the list in #159791, so format it. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…0019) add `\` to avoid a blank first line
If a COPY uses Reg but only in an implicit operand then the new implementation ignores it but the old implementation would have treated it as a copy of Reg. Probably this case never occurs in practice. Other than that, this patch is NFC. Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Make the actual use context less ugly.
…ntTypes.cpp (NFC)
…plementation (#158075) Move the logic for building "out-of-thin-air" source materializations during op replacements from `replaceOp` to `findOrBuildReplacementValue`. That function already builds source materializations and can handle the case where an op result is dropped. This commit is in preparation for turning `replaceOp` into a non-virtual function. (It is sufficient for `replaceAllUsesWith` and `eraseOp` to be virtual.)
When building with latest MSVC on Windows, this fixes some compile-time warnings from last week's integration in #157885:
```
[321/5941] Building CXX object lib\Support\LSP\CMakeFiles\LLVMSupportLSP.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\lib\Support\LSP\Transport.cpp(123): warning C4930: 'std::lock_guard<std::mutex> responseHandlersLock(llvm::lsp::MessageHandler::ResponseHandlerTy)': prototyped function not called (was a variable definition intended?)
[384/5941] Building CXX object unittests\Support\LSP\CMakeFiles\LLVMSupportLSPTests.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\unittests\Support\LSP\Transport.cpp(190): warning C4804: '+=': unsafe use of type 'bool' in operation
```
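The C4930 warning is the classic pattern where a guard declaration names a type in its parentheses, so the compiler parses it as a function prototype and no lock is taken. The sketch below illustrates that general pattern only; it is an assumption about the kind of mistake involved, not the actual code in `Transport.cpp`, and `CountingMutex` is a hypothetical stand-in for `std::mutex` so the effect can be observed.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical lockable type that records how often it was locked.
struct CountingMutex {
  int lockCount = 0;
  void lock() { ++lockCount; }
  void unlock() {}
};

inline int lockCountAfterGuard() {
  CountingMutex mutex;
  // Buggy form, kept as a comment: a *type* in the parentheses makes this
  // a declaration of a function named `guard`, so nothing is locked.
  //   std::lock_guard<CountingMutex> guard(CountingMutex);
  {
    // Correct form: pass the mutex *object*, which constructs a real guard.
    std::lock_guard<CountingMutex> guard(mutex);
  }
  return mutex.lockCount;
}
```

With the correct form the mutex is locked exactly once inside the scope; with the buggy form the count would stay at zero.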
This used to happen in the global destruction, after `main()` has exited. Previously, we were re-creating the `llvm::TimerGlobals` object at this point.
This PR introduces the support for the SPIR-V extension `SPV_KHR_bfloat16`. This extension extends the `OpTypeFloat` instruction to enable the use of bfloat16 types with cooperative matrices and dot products. TODO: Per the `SPV_KHR_bfloat16` extension, there are a limited number of instructions that can use the bfloat16 type. For example, arithmetic instructions like `FAdd` or `FMul` can't operate on `bfloat16` values. Therefore, a future patch should be added to either emit an error or fall back to FP32 for arithmetic in cases where bfloat16 must not be used. Reference Specification: https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_bfloat16.asciidoc
std::realloc is declared there
Add DAGCombiner patterns for pairs of 2-operand min/max instructions to be fused into a single 3-operand min/max instruction for f32s (only for PTX 8.8+ and sm100+).
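The identity behind this fusion can be sketched in scalar form: a nested pair of 2-operand maxima produces the same value as one 3-operand maximum. The helper names `max2` and `max3` are hypothetical, and the sketch ignores NaN and signed-zero handling, which a real combine has to respect.

```cpp
// Hypothetical scalar helpers; NaN and signed-zero semantics are ignored.
constexpr float max2(float a, float b) { return a > b ? a : b; }

constexpr float max3(float a, float b, float c) {
  return max2(a, max2(b, c));
}

// The nested 2-operand form and the fused 3-operand form agree.
static_assert(max2(max2(1.0f, 5.0f), 3.0f) == max3(1.0f, 5.0f, 3.0f),
              "max(max(a, b), c) == max3(a, b, c)");
static_assert(max2(1.0f, max2(5.0f, 3.0f)) == max3(1.0f, 5.0f, 3.0f),
              "max(a, max(b, c)) == max3(a, b, c)");
```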
This patch introduces a new pass, SPIRVCBufferAccess, which is responsible for translating accesses to HLSL constant buffer (cbuffer) global variables into accesses to the proper SPIR-V resource.

The pass operates by:
1. Identifying all cbuffers via the `!hlsl.cbs` metadata.
2. Replacing all uses of cbuffer member global variables with `llvm.spv.resource.getpointer` intrinsics.
3. Cleaning up the original global variables and metadata.

This approach allows subsequent passes, like SPIRVEmitIntrinsics, to correctly fold GEPs into a single OpAccessChain instruction.

The patch also includes a comprehensive set of lit tests to cover various scenarios:
- Basic cbuffer access direct load and GEPs.
- Unused and partially unused cbuffers.

This implements the SPIR-V version of https://github.com/llvm/wg-hlsl/blob/main/proposals/0016-constant-buffers.md#lowering-to-buffer-load-intrinsics.
… (NFC) (#155825) Since the size of the last dimension of TMA is no longer fixed at 128 bytes, remove the kMaxTMALastdimByte.
* Fix infinite recursion with nested structs.
* Drop `::getExtensions` function from derived types, so that there's only one entry point that queries type extensions.
* Move all extension logic to a new helper class -- this way the `::getExtensions` functions can't diverge across concrete types and 'convenience types' like `CompositeType`.

We should also fix `::getCapabilities` in a similar way and move the testcase to `vce-deduction.mlir`.

Issue: #159963
Add tests with pointer-based loop guards.
Summary: This patch exposes `__builtin_masked_gather` and `__builtin_masked_scatter` to clang. These map to the underlying intrinsic relatively cleanly, needing only a level of indirection to take a vector of indices and a base pointer to a vector of pointers.
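As a rough scalar model of what a masked gather and scatter compute, the sketch below adds each index to a base pointer to form the per-lane address, then loads or stores only the lanes whose mask bit is set. All function names and parameter orders here are hypothetical and are not the signatures of the clang builtins; the pass-through operand is modelled on the behaviour of LLVM's masked gather intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a masked gather: active lanes load base[idx[i]],
// inactive lanes keep the pass-through value.
template <typename T, std::size_t N>
std::array<T, N> maskedGather(const T *base,
                              const std::array<std::size_t, N> &idx,
                              const std::array<bool, N> &mask,
                              const std::array<T, N> &passThru) {
  std::array<T, N> out = passThru;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      out[i] = base[idx[i]]; // base + index forms the per-lane pointer
  return out;
}

// Scalar model of a masked scatter: only active lanes store.
template <typename T, std::size_t N>
void maskedScatter(T *base, const std::array<std::size_t, N> &idx,
                   const std::array<bool, N> &mask,
                   const std::array<T, N> &values) {
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      base[idx[i]] = values[i];
}

inline void gatherScatterDemo() {
  int data[6] = {10, 11, 12, 13, 14, 15};
  const std::array<std::size_t, 4> idx = {{5, 0, 3, 1}};
  const std::array<bool, 4> mask = {{true, false, true, false}};
  const std::array<int, 4> passThru = {{-1, -1, -1, -1}};

  const std::array<int, 4> got = maskedGather(data, idx, mask, passThru);
  assert(got[0] == 15 && got[1] == -1 && got[2] == 13 && got[3] == -1);

  const std::array<int, 4> values = {{100, 200, 300, 400}};
  maskedScatter(data, idx, mask, values);
  assert(data[5] == 100 && data[3] == 300); // active lanes stored
  assert(data[0] == 10 && data[1] == 11);   // inactive lanes untouched
}
```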
They're not formatted correctly anymore, since clang-format was updated.
…ry(A,X, XOR(B,C)) and ternary(A,X, OR(B,C)) (#157909)

Adds support for ternary equivalent operations of the form
- `ternary(A, X, xor(B,C))` where `X=[and(B,C)| nor(B,C)| or(B,C)| B | C]`.
- `ternary(A, X, or(B,C))` where `X = [and(B,C)| eqv(B,C)| not(B)| not(C)| nand(B,C)| B | C]`.

The following are the patterns involved and the imm values:
```
ternary(A, and(B,C), xor(B,C))     97
ternary(A, B, xor(B,C))            99
ternary(A, C, xor(B,C))            101
ternary(A, or(B,C), xor(B,C))      103
ternary(A, nor(B,C), xor(B,C))     104
ternary(A, and(B,C), or(B,C))      113
ternary(A, B, or(B,C))             115
ternary(A, C, or(B,C))             117
ternary(A, eqv(B,C), or(B,C))      121
ternary(A, not(C), or(B,C))        122
ternary(A, not(B), or(B,C))        124
ternary(A, nand(B,C), or(B,C))     126
```
E.g. `xxeval XT, XA, XB, XC, 97` performs the ternary operation `XA ? and(XB, XC) : xor(XB, XC)` and places the result in `XT`.

This is the continuation of:
- [[PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C))](#141733 (comment))
- [[PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X,B) and ternary(A,X,C).](#152956 (comment))

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
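The immediates listed above can be reproduced by treating the 8-bit value as a truth table over (A, B, C). The sketch below assumes that the row A=B=C=0 maps to the most significant bit; this ordering is inferred from the listed values rather than taken from the ISA text, and all helper names are hypothetical.

```cpp
#include <cstdint>

// Boolean function of the two inputs B and C.
using BinOp = bool (*)(bool, bool);

constexpr bool opAnd(bool b, bool c) { return b && c; }
constexpr bool opOr(bool b, bool c) { return b || c; }
constexpr bool opXor(bool b, bool c) { return b != c; }
constexpr bool opNor(bool b, bool c) { return !(b || c); }
constexpr bool opNand(bool b, bool c) { return !(b && c); }
constexpr bool opEqv(bool b, bool c) { return b == c; }
constexpr bool opB(bool b, bool) { return b; }
constexpr bool opC(bool, bool c) { return c; }
constexpr bool opNotB(bool b, bool) { return !b; }
constexpr bool opNotC(bool, bool c) { return !c; }

// Build the 8-bit immediate for "A ? x(B,C) : y(B,C)".
constexpr std::uint8_t ternaryImm(BinOp x, BinOp y) {
  unsigned imm = 0;
  for (unsigned row = 0; row < 8; ++row) {
    const bool a = (row & 4u) != 0;
    const bool b = (row & 2u) != 0;
    const bool c = (row & 1u) != 0;
    const bool result = a ? x(b, c) : y(b, c);
    // Assumed ordering: row 0 (A=B=C=0) is the most significant bit.
    imm |= static_cast<unsigned>(result) << (7u - row);
  }
  return static_cast<std::uint8_t>(imm);
}

static_assert(ternaryImm(opAnd, opXor) == 97, "ternary(A, and, xor)");
static_assert(ternaryImm(opNand, opOr) == 126, "ternary(A, nand, or)");
```

Under this ordering the low four bits encode `X` (the A=1 rows) and the high four bits encode the `xor` or `or` fallback (the A=0 rows), which is why every `xor` pattern starts at 96 and every `or` pattern at 112.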
Summary: The changes made in #156057 allow the alignment value to be increased. We assert effectively infinite alignment when the pointer argument is invalid / null. The problem is that, for whatever reason, the masked load / store functions use i32 for their alignment value, which means this gets truncated to zero. Add a special check for this; long term we probably want to just remove this argument entirely.
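The truncation itself is easy to reproduce in isolation. The sketch below assumes the "effectively infinite" alignment is 2^32, which is an assumption about the value involved, and the clamping helper is a hypothetical guard shown for illustration, not necessarily what the patch does.

```cpp
#include <cstdint>

// Assumption: the "effectively infinite" alignment is 2^32.
constexpr std::uint64_t kAssumedMaxAlign = 1ULL << 32;

// Plain narrowing to 32 bits, as happens when the operand is an i32.
constexpr std::uint32_t truncateAlign(std::uint64_t align) {
  return static_cast<std::uint32_t>(align);
}

// Hypothetical guard: clamp to the largest power of two that fits in
// 32 bits instead of letting the value wrap to zero.
constexpr std::uint32_t clampAlign(std::uint64_t align) {
  return align > UINT32_MAX ? (1u << 31) : static_cast<std::uint32_t>(align);
}

static_assert(truncateAlign(kAssumedMaxAlign) == 0, "2^32 wraps to zero");
static_assert(clampAlign(kAssumedMaxAlign) == (1u << 31), "clamped, not zero");
static_assert(clampAlign(16) == 16, "ordinary alignments pass through");
```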
We compile our monorepo with `/D_MBCS`, and flang-rt compilation breaks because it explicitly uses `wchar_t` (i.e. not TCHAR). Use the STARTUPINFOW / CreateProcessW variants explicitly so that the code works regardless of global settings.
jjmarr-amd pushed a commit that referenced this pull request on Sep 25, 2025
Need this as `mlir/dialects/transform/smt.py` imports it:
```py
from .._transform_smt_extension_ops_gen import *
from .._transform_smt_extension_ops_gen import _Dialect
```
No description provided.