

jjmarr-amd
Owner

No description provided.

jhuber6 and others added 30 commits September 19, 2025 07:00
Summary:
The added bit counting builtins for vectors used `cttz` and `ctlz`,
which is consistent with the LLVM naming convention. However, these are
clang builtins and implement exactly the `__builtin_ctzg` and
`__builtin_clzg` behavior. It is confusing to people familiar with other
builtins that these are the only bit counting intrinsics named
differently. This includes the additional operation for the undefined
zero case, which was added as a `clzg` extension.
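A rough Python model of the scalar semantics these vector builtins follow, applied lane by lane. The helper names `ctzg`, `clzg`, and `elementwise` are hypothetical. The optional `fallback` mirrors the second argument of `__builtin_ctzg`/`__builtin_clzg`, which is returned for a zero input. Without it the builtin's result is undefined, so this sketch raises instead.

```python
def ctzg(x: int, width: int, fallback=None) -> int:
    """Count trailing zeros in the low `width` bits of x (hypothetical helper)."""
    x &= (1 << width) - 1
    if x == 0:
        if fallback is None:
            raise ValueError("ctzg of zero is undefined without a fallback")
        return fallback
    # x & -x isolates the lowest set bit; its bit_length() - 1 is its index.
    return (x & -x).bit_length() - 1


def clzg(x: int, width: int, fallback=None) -> int:
    """Count leading zeros in a `width`-bit value, with the same zero handling."""
    x &= (1 << width) - 1
    if x == 0:
        if fallback is None:
            raise ValueError("clzg of zero is undefined without a fallback")
        return fallback
    return width - x.bit_length()


def elementwise(fn, lanes, width, fallback=None):
    """Apply a scalar bit-count function to every lane of a 'vector' (a list)."""
    return [fn(v, width, fallback) for v in lanes]


print(elementwise(clzg, [0, 1, 0x80], 8, fallback=8))  # [8, 7, 0]
print(elementwise(ctzg, [0, 1, 0x80], 8, fallback=8))  # [8, 0, 7]
```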
…maller set of dependencies (#155929)

Define lit testsuite for FileCheck and TableGen with smaller set of
dependencies. This uses the new `SKIP` argument to `add_lit_testsuites`
that was added in #157176.
…ectors. (#159331)

The current implementation assumes ConstantInt return values are scalar,
which is not true when use-constant-int-for-fixed-length-splat is
enabled.
…159757)

Fix two older FIXME items from the `functions.cpp` test.
Check for x86_64 directly; `isArch64Bit` just adds extra
steps around this.
#159712)

#121943 rewrote `__atomic_test_and_set` and `__atomic_clear` to be
lowered through `AtomicExpr`.

`StmtPrinter::VisitAtomicExpr` still treated them like other atomic
builtins with a `Val1` operand. This led to incorrect pretty-printing when
dumping the AST.

Skip `Val1` for these two builtins, as is already done for atomic loads.
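The shape of the fix can be sketched in Python as a printer that omits the `Val1` slot for builtins that take only a pointer and a memory order. The `print_atomic` helper and the `NO_VAL1` set are illustrative assumptions, not clang's actual code or its exact list of builtins.

```python
# Builtins whose AtomicExpr has no Val1 operand (illustrative subset, assumed).
NO_VAL1 = {"__atomic_load_n", "__atomic_test_and_set", "__atomic_clear"}


def print_atomic(name, ptr, order, val1=None):
    """Pretty-print an atomic builtin call as source text (hypothetical helper)."""
    args = [ptr]
    # Loads, test_and_set, and clear take only a pointer and a memory order,
    # so a Val1 slot must be skipped rather than printed.
    if name not in NO_VAL1 and val1 is not None:
        args.append(val1)
    args.append(order)
    return f"{name}({', '.join(args)})"


print(print_atomic("__atomic_clear", "&flag", "__ATOMIC_SEQ_CST", val1="<bogus>"))
print(print_atomic("__atomic_store_n", "&x", "__ATOMIC_RELEASE", val1="1"))
```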
…9572)

In this commit:
  (1) Added new pass manager support for `ReachingDefAnalysis`.
  (2) Added a printer pass.
  (3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
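For context on what such an analysis answers: a reaching-definition query asks which earlier instruction last defined a register that is read at a given point. A minimal single-block Python sketch of that idea follows. It ignores control flow entirely and is not the interface of LLVM's `ReachingDefAnalysis`; the `reaching_defs` helper and the `(defs, uses)` encoding are hypothetical.

```python
def reaching_defs(block):
    """For each instruction in a straight-line block, map every register it
    reads to the index of the instruction whose definition reaches it
    (-1 if the value is live into the block).

    `block` is a list of (defs, uses) tuples of register names.
    """
    last_def = {}
    result = []
    for idx, (defs, uses) in enumerate(block):
        # Uses are resolved against definitions made before this instruction.
        result.append({reg: last_def.get(reg, -1) for reg in uses})
        for reg in defs:
            last_def[reg] = idx
    return result


block = [
    (["r0"], []),            # 0: r0 = ...
    (["r1"], ["r0"]),        # 1: r1 = f(r0)
    (["r0"], ["r0", "r2"]),  # 2: r0 = g(r0, r2)
    ([], ["r0", "r1"]),      # 3: use r0, r1
]
print(reaching_defs(block))
# [{}, {'r0': 0}, {'r0': 0, 'r2': -1}, {'r0': 2, 'r1': 1}]
```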
Replace the target uses of PointerLikeRegClass with RegClassByHwMode
reapply #131804 and #159289
Fixed cmake link issue.

---------

Co-authored-by: DeNiCoN <denicon1234@gmail.com>
Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The `__strlen` routine is a millicode implementation;
we use millicode for the `strlen` function instead of a library call to
improve performance.
Change-Id: Id229f849b1d8552bbe59d6e18114042ef1614fad
…59398)

The result type of the vector extend intrinsics generated by the
BUILD_VECTOR lowering code should match how they are actually defined.
Currently, the result type there defaults to the operand type. This
can conflict with calls to the same intrinsic from other paths.
…9606)

Based on testing on processors that use pointer metadata, and with all
the work done to delay calls to FixDataAddress, this is no longer
necessary.

Note that, with debugserver in particular, this is an NFC change: the
code path here is for frame zero, and debugserver will strip metadata
when reading fp from frame zero anyway.
This should eventually be done using `lnt` instead, but for the time
being this makes it easy to visualize historical data without having
an instance of `lnt` running.
)

The atomic_wait benchmarks are great, but they tend to overload the
system they're running on. For that reason, we can't run them on our CI
infrastructure on a regular basis.

Instead of removing them, make them unsupported outside of dry-running,
which allows keeping the benchmarks around and ensuring they don't rot,
but doesn't run them along with the other benchmarks. If we need to
investigate atomic_wait performance, it's trivial to mark the benchmark
as supported and run it for local investigations.

This is an alternative to #158289.
When built with assertions, the output contains a line like the following,
which needs to be filtered out like the other ones.

`'Build config: +assertions'`
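A minimal Python sketch of this kind of line filtering, assuming the output is post-processed line by line before comparison. The `filter_output` helper and the `IGNORED` list are hypothetical stand-ins, not the test's actual mechanism; only the assertions line comes from the text above.

```python
import re

# Patterns for lines to drop before comparing output (hypothetical list).
IGNORED = [re.compile(r"Build config: \+assertions")]


def filter_output(text):
    """Return `text` with every line matching an ignored pattern removed."""
    kept = [line for line in text.splitlines()
            if not any(p.search(line) for p in IGNORED)]
    return "\n".join(kept)


print(filter_output("result: ok\n'Build config: +assertions'\ndone"))
```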
#157435)

First added in #153585 for Darwin only. All Linux AArch64 systems also
have Top Byte Ignore enabled in userspace so the test "just works"
there.

FreeBSD has very recently gained Top Byte Ignore support:
freebsd/freebsd-src@4c6c27d

However, it is so recent that I don't want to assume it'll be available on
any random FreeBSD system out there.

There isn't really a good place to put this test, so I put it in the top
level of API, next to the other non-address bit test that didn't have a
good home either.
The GNU Fortran library function FNUM(u) returns the UNIX file
descriptor that corresponds to an open Fortran unit number, if any;
otherwise -1.

This implementation is a library extension only, not an intrinsic.
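The contract described above (the descriptor for an open unit, otherwise -1) can be modeled in Python as follows. The `units` table standing in for the Fortran runtime's unit table and the `fnum` helper are hypothetical; they illustrate the behavior, not flang-rt's implementation.

```python
import tempfile


def fnum(units, unit):
    """Return the OS file descriptor for an open unit, else -1.

    `units` is a hypothetical dict mapping unit numbers to open file objects.
    """
    f = units.get(unit)
    if f is None or f.closed:
        return -1
    return f.fileno()


with tempfile.TemporaryFile() as f:
    units = {10: f}
    print(fnum(units, 10) == f.fileno())  # True
    print(fnum(units, 99))                # -1
```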
Reverts #158161

Due to reported failures on remote Linux and Swift buildbots.
This patch adds a new `%{readfile:<file name>}` substitution to lit. This
is needed for porting a couple of tests to lit's internal shell. These
tests all use subshells to pass some option to a command and are not
feasible to run within the internal shell without this functionality.
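The idea of the substitution, replacing the placeholder with a file's contents so no subshell is needed, can be sketched in Python. This is not lit's implementation: the `expand_readfile` helper is hypothetical, and stripping one trailing newline is an assumption made so the contents splice cleanly into a command line.

```python
import re

READFILE = re.compile(r"%\{readfile:([^}]+)\}")


def expand_readfile(command):
    """Replace each %{readfile:<path>} in `command` with that file's contents.

    Assumption: a single trailing newline is stripped from the contents.
    """
    def read(match):
        with open(match.group(1)) as fh:
            text = fh.read()
        return text[:-1] if text.endswith("\n") else text

    return READFILE.sub(read, command)
```

For example, if `flags.txt` contains `-DFOO=1`, then `cc %{readfile:flags.txt} x.c` expands to `cc -DFOO=1 x.c`.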

Reviewers: petrhosek, jh7370, ilovepi, cmtice

Reviewed By: jh7370, cmtice

Pull Request: #158441
Planning to add to the list in
#159791, so format it.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
HerrCai0907 and others added 28 commits September 22, 2025 18:33
…ison" issue (#159786)

Despite the difference in the order of fcmp operands (`%lhs, %rhs` vs. `%rhs, %lhs`), the generated assembly remains the same.

This is a baseline test for #159723
If a COPY uses Reg, but only in an implicit operand, then the new
implementation ignores it, whereas the old implementation would have
treated it as a copy of Reg. Probably this case never occurs in practice.
Other than that, this patch is NFC.

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Make the actual use context less ugly.
…plementation (#158075)

Move the logic for building "out-of-thin-air" source materializations
during op replacements from `replaceOp` to
`findOrBuildReplacementValue`. That function already builds source
materializations and can handle the case where an op result is dropped.

This commit is in preparation for turning `replaceOp` into a non-virtual
function. (It is sufficient for `replaceAllUsesWith` and `eraseOp` to be
virtual.)
When building with the latest MSVC on Windows, this fixes some compile-time
warnings from last week's integration in
#157885:
```
[321/5941] Building CXX object lib\Support\LSP\CMakeFiles\LLVMSupportLSP.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\lib\Support\LSP\Transport.cpp(123): warning C4930: 'std::lock_guard<std::mutex> responseHandlersLock(llvm::lsp::MessageHandler::ResponseHandlerTy)': prototyped function not called (was a variable definition intended?)
[384/5941] Building CXX object unittests\Support\LSP\CMakeFiles\LLVMSupportLSPTests.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\unittests\Support\LSP\Transport.cpp(190): warning C4804: '+=': unsafe use of type 'bool' in operation
```
This used to happen in the global destruction, after `main()` has
exited. Previously, we were re-creating the `llvm::TimerGlobals` object
at this point.

…mp' (#159813)

Moves the implementation of the `cert-err52-cpp` check into `modernize`
module and gives it a clearer name: `modernize-avoid-setjmp-longjmp`.

This is part of the cleanup described in #157287.
Closes #157297
This PR introduces support for the SPIR-V extension
`SPV_KHR_bfloat16`. This extension extends the `OpTypeFloat` instruction
to enable the use of bfloat16 types with cooperative matrices and dot
products.

TODO:
Per the `SPV_KHR_bfloat16` extension, there are a limited number of
instructions that can use the bfloat16 type. For example, arithmetic
instructions like `FAdd` or `FMul` can't operate on `bfloat16` values.
Therefore, a future patch should be added to either emit an error or
fall back to FP32 for arithmetic in cases where bfloat16 must not be
used.

Reference Specification:

https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_bfloat16.asciidoc
`std::realloc` is declared there.
Add DAGCombiner patterns for pairs of 2-operand min/max instructions to
be fused into a single 3-operand min/max instruction for f32s (only for
PTX 8.8+ and sm100+).
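The shape of this combine can be shown on a toy expression tree in Python: a pair of nested 2-operand ops of the same kind folds into one 3-operand op. The tuple encoding and the `fmin3`/`fmax3` names are hypothetical. The sketch also omits every legality check a real DAGCombiner pattern needs, such as the PTX 8.8+/sm100+ requirement above, single-use checks, and NaN semantics.

```python
FUSABLE = {"fmin": "fmin3", "fmax": "fmax3"}


def fuse_minmax(node):
    """Fold op(op(x, y), z) or op(x, op(y, z)) into op3(x, y, z).

    Nodes are tuples like ("fmin", lhs, rhs); anything else is a leaf.
    """
    if not (isinstance(node, tuple) and len(node) == 3 and node[0] in FUSABLE):
        return node
    op, lhs, rhs = node

    def same_op(n):
        return isinstance(n, tuple) and len(n) == 3 and n[0] == op

    if same_op(lhs):
        return (FUSABLE[op], lhs[1], lhs[2], rhs)
    if same_op(rhs):
        return (FUSABLE[op], lhs, rhs[1], rhs[2])
    return node  # mixed min/max pairs are left alone


print(fuse_minmax(("fmin", ("fmin", "a", "b"), "c")))  # ('fmin3', 'a', 'b', 'c')
```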
This patch introduces a new pass, SPIRVCBufferAccess, which is
responsible for translating accesses to HLSL constant buffer (cbuffer)
global variables into accesses to the proper SPIR-V resource.

The pass operates by:
1. Identifying all cbuffers via the `!hlsl.cbs` metadata.
2. Replacing all uses of cbuffer member global variables with
`llvm.spv.resource.getpointer` intrinsics.
3. Cleaning up the original global variables and metadata.

This approach allows subsequent passes, like SPIRVEmitIntrinsics, to
correctly fold GEPs into a single OpAccessChain instruction.
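The three steps can be traced on a toy module in Python. Everything here is a simplified stand-in: the module is a dict with hypothetical keys, instructions are `(opcode, operands)` tuples, and giving the `llvm.spv.resource.getpointer` replacement a `(handle, member index)` operand shape is an assumption, not the intrinsic's real signature.

```python
def lower_cbuffers(module):
    """Toy version of the cbuffer lowering steps.

    `module` keys (hypothetical): "globals" {name: type},
    "metadata" {"hlsl.cbs": [(handle, [member, ...])]},
    "code" [(opcode, [operands])].
    """
    # Step 1: identify cbuffers and their member globals via the metadata.
    replacement = {}
    for handle, members in module["metadata"].get("hlsl.cbs", []):
        for index, member in enumerate(members):
            replacement[member] = ("llvm.spv.resource.getpointer", handle, index)
    # Step 2: replace every use of a member global with a pointer query.
    module["code"] = [
        (op, [replacement.get(v, v) for v in operands])
        for op, operands in module["code"]
    ]
    # Step 3: clean up the original globals and the metadata.
    for member in replacement:
        module["globals"].pop(member, None)
    module["metadata"].pop("hlsl.cbs", None)
    return module
```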

The patch also includes a comprehensive set of lit tests to cover
various scenarios:
- Basic cbuffer access via direct loads and GEPs.
- Unused and partially unused cbuffers.

This implements the SPIR-V version of

https://github.com/llvm/wg-hlsl/blob/main/proposals/0016-constant-buffers.md#lowering-to-buffer-load-intrinsics.
… (NFC) (#155825)

Since the size of the last dimension of TMA is no longer fixed at 128
bytes, remove `kMaxTMALastdimByte`.
* Fix infinite recursion with nested structs.
* Drop `::getExtensions` function from derived types, so that there's
only one entry point that queries type extensions.
* Move all extension logic to a new helper class -- this way the
`::getExtensions` functions can't diverge across concrete types and
'convenience types' like `CompositeType`.
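One common way to give every type kind a single extension query that cannot recurse forever is an explicit worklist with a visited set. The Python sketch below shows that technique; the `Type` class, the `ExtensionCollector` helper, and the extension names are hypothetical, and the source does not say this is how the actual fix is written.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Type:
    """Hypothetical type node: its own extensions plus nested element types."""
    name: str
    extensions: set = field(default_factory=set)
    elements: list = field(default_factory=list)


class ExtensionCollector:
    """Single entry point for querying extensions, shared by all type kinds."""

    def collect(self, ty):
        found, visited = set(), set()
        worklist = [ty]
        while worklist:
            t = worklist.pop()
            if id(t) in visited:  # a cycle through nested types stops here
                continue
            visited.add(id(t))
            found |= t.extensions
            worklist.extend(t.elements)
        return found
```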

We should also fix `::getCapabilities` in a similar way and move the
testcase to `vce-deduction.mlir`.

Issue: #159963
Add tests with pointer-based loop guards.
Summary:
This patch exposes `__builtin_masked_gather` and
`__builtin_masked_scatter` to clang. These map to the underlying
intrinsic relatively cleanly, needing only a level of indirection to
take a vector of indices and a base pointer to a vector of pointers.
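A Python model of the semantics, with memory as a list and addresses as list positions. The "level of indirection" is the step that turns a base plus per-lane indices into per-lane addresses. Helper names and argument order are this sketch's own choices, not the builtins' signatures, and scaling indices by element size is omitted.

```python
def lane_addresses(base, indices):
    """Turn a base address plus per-lane indices into per-lane addresses."""
    return [base + i for i in indices]


def masked_gather(memory, mask, base, indices, passthru):
    """Active lanes load from their address; inactive lanes keep passthru."""
    addrs = lane_addresses(base, indices)
    return [memory[a] if m else p for m, a, p in zip(mask, addrs, passthru)]


def masked_scatter(memory, mask, base, indices, values):
    """Active lanes store their value to their address; others do nothing."""
    for m, a, v in zip(mask, lane_addresses(base, indices), values):
        if m:
            memory[a] = v
    return memory


print(masked_gather([10, 20, 30, 40, 50], [1, 0, 1], 1, [2, 0, 1], [-1, -1, -1]))
# [40, -1, 30]
```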
They're not formatted correctly anymore, since clang-format was updated.
…ry(A,X, XOR(B,C)) and ternary(A,X, OR(B,C)) (#157909)

Adds support for ternary equivalent operations of the form:
- `ternary(A, X, xor(B,C))` where `X = [and(B,C) | nor(B,C) | or(B,C) | B | C]`.
- `ternary(A, X, or(B,C))` where `X = [and(B,C) | eqv(B,C) | not(B) | not(C) | nand(B,C) | B | C]`.

The following are the patterns involved and the imm values:

```
ternary(A,  and(B,C),   xor(B,C))	97
ternary(A,  B,          xor(B,C))	99
ternary(A,  C,          xor(B,C))	101
ternary(A,  or(B,C),    xor(B,C))	103
ternary(A,  nor(B,C),   xor(B,C))	104

ternary(A,  and(B,C),   or(B,C))	113
ternary(A,  B,          or(B,C))	115
ternary(A,  C,          or(B,C))	117
ternary(A,  eqv(B,C),   or(B,C))	121
ternary(A,  not(C),     or(B,C))	122
ternary(A,  not(B),     or(B,C))	124
ternary(A,  nand(B,C),  or(B,C))	126
```
For example, `xxeval XT, XA, XB, XC, 97` performs the ternary operation
`XA ? and(XB, XC) : xor(XB, XC)` and places the result in `XT`.
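The immediates in the table above can be derived by treating the 8-bit operand as a truth table. The Python sketch below assumes the most significant bit holds the result for A=B=C=0 and the bit index (from the MSB) is 4*A + 2*B + C. Under that convention it reproduces every value listed. The helper and operator names are illustrative.

```python
from itertools import product


def xxeval_imm(fn):
    """Encode a 3-input boolean function as an 8-bit truth-table immediate.

    Assumed convention: MSB holds fn(0, 0, 0), LSB holds fn(1, 1, 1).
    """
    imm = 0
    for a, b, c in product((0, 1), repeat=3):
        imm = (imm << 1) | (fn(a, b, c) & 1)
    return imm


def ternary(when_a, when_not_a):
    """Build A ? when_a(B, C) : when_not_a(B, C) as a 3-input function."""
    return lambda a, b, c: when_a(b, c) if a else when_not_a(b, c)


# Two-input building blocks over single bits.
AND = lambda b, c: b & c
OR = lambda b, c: b | c
XOR = lambda b, c: b ^ c
NOR = lambda b, c: 1 - (b | c)
EQV = lambda b, c: 1 - (b ^ c)
NAND = lambda b, c: 1 - (b & c)
B = lambda b, c: b
C = lambda b, c: c
NOT_B = lambda b, c: 1 - b
NOT_C = lambda b, c: 1 - c

print(xxeval_imm(ternary(AND, XOR)))  # 97
print(xxeval_imm(ternary(NAND, OR)))  # 126
```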

This is the continuation of:
- [PowerPC] Exploit xxeval instruction for ternary patterns - ternary(A, X, and(B,C)) (#141733)
- [PowerPC] Exploit xxeval instruction for operations of the form ternary(A,X,B) and ternary(A,X,C). (#152956)

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
Summary:
The changes made in #156057
allow the alignment value to be increased. We assert effectively
infinite alignment when the pointer argument is invalid / null. The
problem is that, for whatever reason, the masked load / store functions
use i32 for their alignment value, which means this gets truncated to
zero.

Add a special check for this. Long term, we probably want to just remove
this argument entirely.
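The truncation can be illustrated with plain arithmetic, assuming the "effectively infinite" alignment is 2^32. The low 32 bits of 2^32 are all zero, so storing it in an i32 operand yields 0. The `to_i32` and `alignment_survives_i32` helpers are hypothetical; the guard only shows the kind of check that detects the problem, not the patch's actual check.

```python
MAX_ALIGNMENT = 1 << 32  # assumed value of the "effectively infinite" alignment


def to_i32(value):
    """Keep only the low 32 bits, as storing into an i32 operand does."""
    return value & 0xFFFFFFFF


def alignment_survives_i32(align):
    """Hypothetical guard: True when `align` is unchanged by i32 truncation."""
    return to_i32(align) == align


print(to_i32(MAX_ALIGNMENT))                  # 0
print(to_i32(16))                             # 16
print(alignment_survives_i32(MAX_ALIGNMENT))  # False
```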
We compile our monorepo with `/D_MBCS`, and flang-rt compilation breaks
because it explicitly uses `wchar_t` (i.e., not `TCHAR`).

Use `STARTUPINFOW`/`CreateProcessW` explicitly so the code works
regardless of global settings.
@jjmarr-amd jjmarr-amd merged this pull request into jjmarr-amd:main Sep 22, 2025
12 of 13 checks passed
jjmarr-amd pushed a commit that referenced this pull request Sep 25, 2025
Need this as `mlir/dialects/transform/smt.py` imports it:

```py
from .._transform_smt_extension_ops_gen import *
from .._transform_smt_extension_ops_gen import _Dialect
```