[X86] Add test coverage for f16/bf16 fabs/fneg load-store tests
Future extension to #118680
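At the bit level, fabs and fneg on f16 and bf16 only touch the sign bit, which is bit 15 in both formats. That is why a load/fabs/store or load/fneg/store sequence can be done with integer AND/XOR on the raw 16-bit value. The patch itself only adds tests. The following is a minimal C++ illustration on raw `uint16_t` encodings, and the helper names are hypothetical, not taken from the X86 backend:

```cpp
#include <cassert>
#include <cstdint>

// Both IEEE half (f16) and bfloat16 keep the sign in bit 15,
// so the same mask works for either format.
constexpr uint16_t kSignMask16 = 0x8000;

// fabs: clear the sign bit, leaving exponent and mantissa untouched.
constexpr uint16_t fabs16Bits(uint16_t bits) {
  return static_cast<uint16_t>(bits & ~kSignMask16);
}

// fneg: flip the sign bit (this also flips the sign of zeros and NaNs).
constexpr uint16_t fneg16Bits(uint16_t bits) {
  return static_cast<uint16_t>(bits ^ kSignMask16);
}

// f16: 1.0 = 0x3C00, -1.0 = 0xBC00. bf16: 1.0 = 0x3F80, -1.0 = 0xBF80.
static_assert(fabs16Bits(0xBC00) == 0x3C00, "f16 fabs(-1.0) == 1.0");
static_assert(fneg16Bits(0x3F80) == 0xBF80, "bf16 fneg(1.0) == -1.0");
```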
Revert "[clang-format] Add cmake target clang-format-style-options fo…
[InstCombine] Fold `icmp spred (X *nsw Z), (Y *nsw Z) -> icmp pred Z, 0` if `scmp(X, Y)` is known (#118726)
```
icmp spred (X *nsw Z), (Y *nsw Z) -> icmp swap(spred) Z, 0 if X s< Y
icmp spred (X *nsw Z), (Y *nsw Z) -> icmp spred Z, 0 if X s> Y
```
Alive2: https://alive2.llvm.org/ce/z/F2D0GE
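The two rules come from how a common factor `Z` affects a signed comparison. A positive `Z` preserves the order of the products, a negative `Z` reverses it, and `Z == 0` makes them equal. Alongside the Alive2 proof, the rules can be checked exhaustively at a small bit width. The following C++ sketch works over 6-bit signed values (range [-32, 31]) and models `nsw` by skipping triples whose products overflow. The enum and function names are illustrative only:

```cpp
#include <cassert>

// Signed predicates covered by the fold.
enum class Pred { SLT, SLE, SGT, SGE };

bool evalPred(Pred p, int a, int b) {
  switch (p) {
  case Pred::SLT: return a < b;
  case Pred::SLE: return a <= b;
  case Pred::SGT: return a > b;
  case Pred::SGE: return a >= b;
  }
  return false;
}

// The predicate obtained by swapping the operands (slt <-> sgt, sle <-> sge).
Pred swapPred(Pred p) {
  switch (p) {
  case Pred::SLT: return Pred::SGT;
  case Pred::SLE: return Pred::SGE;
  case Pred::SGT: return Pred::SLT;
  case Pred::SGE: return Pred::SLE;
  }
  return p;
}

// Checks, for all 6-bit X != Y and Z whose products fit in 6 bits:
//   icmp P (X*Z), (Y*Z) == icmp swap(P) Z, 0   if X s< Y
//   icmp P (X*Z), (Y*Z) == icmp P Z, 0         if X s> Y
bool checkFold(Pred p) {
  const int kMin = -32, kMax = 31;
  auto fits = [&](int v) { return v >= kMin && v <= kMax; };
  for (int x = kMin; x <= kMax; ++x)
    for (int y = kMin; y <= kMax; ++y) {
      if (x == y)
        continue;  // X == Y (scmp == 0) is outside these two rules
      for (int z = kMin; z <= kMax; ++z) {
        if (!fits(x * z) || !fits(y * z))
          continue;  // with nsw, an overflowing product would be poison
        bool lhs = evalPred(p, x * z, y * z);
        bool rhs = x < y ? evalPred(swapPred(p), z, 0) : evalPred(p, z, 0);
        if (lhs != rhs)
          return false;
      }
    }
  return true;
}
```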
[LLD][COFF] Add basic ARM64X dynamic relocations support (#118035)
This modifies the machine field in the hybrid view to be AMD64, aligning it with expectations from ARM64EC modules. While this provides initial support, additional relocations will be necessary for full functionality. Many of these cases depend on implementing separate namespace support first. Move clearing of the .reloc section from addBaserels to assignAddresses to ensure it is always cleared, regardless of the relocatable configuration. This change also clarifies the reasoning for adding the dynamic relocations chunk in that location.
[Sched] Skip MemOp with unknown size when clustering (#118443)
In #83875, we changed the type of `Width` to `LocationSize`. To get the cluster bytes, we use `LocationSize::getValue()` to calculate the value. But when `Width` is an unknown-size `LocationSize`, the assertion "Getting value from an unknown LocationSize!" is triggered. This patch simply skips MemOps with unknown size to fix this issue, keeping the logic the same as before. This issue was found when implementing the software pipeliner for RISC-V in #117546. The pipeliner may clone some memory operations with `BeforeOrAfterPointer` size.
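The fix amounts to checking whether a width is known before summing cluster bytes. The following is a minimal C++ model of that guard. It uses `std::optional<uint64_t>` as a stand-in for LLVM's `LocationSize`, where an unknown size is `std::nullopt`. `MemOpInfo`, `collectClusterCandidates`, and `clusterBytes` are hypothetical names, not the scheduler's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a memory operation considered for clustering.
struct MemOpInfo {
  int id;
  std::optional<uint64_t> width;  // std::nullopt models an unknown LocationSize
};

// Keeps only memory operations whose width is known, so later code can read
// the value safely (the analogue of calling LocationSize::getValue()).
std::vector<MemOpInfo> collectClusterCandidates(const std::vector<MemOpInfo> &ops) {
  std::vector<MemOpInfo> result;
  for (const MemOpInfo &op : ops) {
    if (!op.width.has_value())
      continue;  // skip unknown sizes instead of asserting
    result.push_back(op);
  }
  return result;
}

// Sums the bytes of a cluster; every width is known by construction.
uint64_t clusterBytes(const std::vector<MemOpInfo> &ops) {
  uint64_t bytes = 0;
  for (const MemOpInfo &op : ops)
    bytes += *op.width;
  return bytes;
}
```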
[NFC] Fix uninitialized scalar field in constructor. (#118324)
A non-static class field was not initialized in the constructor.
[flang][test] Recognize !$acc and !$omp spelled with capital letters (#118666)
If there are any continuation lines in the source, they will be printed by the unparser with capital letters (at least in the case of OpenMP). To avoid having them stripped out, recognize their spellings using capital letters as well.
Co-authored-by: Michael Kruse <github@meinersbur.de>
[Matrix] Fix crash in liftTranspose when instructions are folded.
Builder.Create(F)Add may constant-fold its inputs and return a constant instead of an instruction. Account for that instead of crashing.
[flang] fix private pointers and default initialized variables (#118494)
Both OpenMP privatization and DO CONCURRENT LOCAL lowering were incorrect for pointers and for derived types with default initialization. For pointers, the descriptor was not established with the rank/type code/element size, leading to undefined behavior if any inquiry was made to it prior to a pointer assignment (and if/when using the runtime for pointer assignments, the descriptor must have been established). For derived types with default initialization, the copies were not default initialized.
[SystemZ] SIMM32 is a signed constant (#118634)
A follow-up to PR #117181: SIMM32 must use getSignedTargetConstant(), too.
[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144)
If the gep is nusw (usually via inbounds) and the offset is non-negative, we can infer nuw. Proof: https://alive2.llvm.org/ce/z/ihztLy
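The inference can also be checked exhaustively in a toy setting. The following C++ sketch models an 8-bit address space with a single unscaled offset. Here `nusw` is taken to mean that adding the offset, read as signed, to the unsigned base stays within [0, 255]. `nuw` is taken to mean that adding the offset, read as unsigned, stays within [0, 255]. This simplified model is an assumption of the sketch, not LangRef's full definition:

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every 8-bit base and offset, "nusw" (plus a non-negative
// offset when requireNonNeg is set) implies "nuw" in this simplified model.
bool checkImplication(bool requireNonNeg) {
  for (int base = 0; base <= 255; ++base) {
    for (int off = -128; off <= 127; ++off) {
      // nusw: unsigned base + signed offset does not leave [0, 255].
      int signedSum = base + off;
      bool nusw = signedSum >= 0 && signedSum <= 255;
      // nuw: base + offset, both read as unsigned 8-bit values, stays <= 255.
      int unsignedSum = base + static_cast<uint8_t>(off);
      bool nuw = unsignedSum <= 255;
      bool nneg = off >= 0;
      if (nusw && (nneg || !requireNonNeg) && !nuw)
        return false;  // found a counterexample
    }
  }
  return true;
}
```

With `requireNonNeg` cleared, the check fails (for example base 1, offset -1), which suggests why the non-negativity of the offset is needed.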
[Support] Use macro var args to allow templates within DEBUG_WITH_TYPE (#117614)
Use variadic args with DEBUG_WITH_TYPE("name", ...) macros to resolve a compilation failure that occurs when using a comma within the last macro argument. Commas come up when instantiating templates such as SmallMapVector that require multiple template args.
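The underlying preprocessor rule is that a macro argument ends at the first comma not enclosed in parentheses. As a result, `std::map<int, int>` inside an argument is split in two. Declaring the last parameter as `...` and forwarding `__VA_ARGS__` rejoins the pieces. The following is a minimal C++ sketch. `MY_DEBUG_WITH_TYPE` and `isDebugTypeEnabled` are hypothetical stand-ins, not LLVM's actual definitions:

```cpp
#include <cassert>
#include <cstring>
#include <map>

// Hypothetical filter: only the "sketch" debug type is enabled here.
inline bool isDebugTypeEnabled(const char *type) {
  return std::strcmp(type, "sketch") == 0;
}

// Taking the statement as `...` means commas inside it (for example in
// template argument lists) no longer split it into extra macro arguments.
#define MY_DEBUG_WITH_TYPE(TYPE, ...)                                          \
  do {                                                                         \
    if (isDebugTypeEnabled(TYPE)) {                                            \
      __VA_ARGS__;                                                             \
    }                                                                          \
  } while (false)

// Uses a two-argument template inside the macro; with a fixed two-parameter
// macro, the comma in std::map<int, int> would be a preprocessing error.
inline int countEntries() {
  int count = 0;
  MY_DEBUG_WITH_TYPE("sketch", std::map<int, int> m; m[1] = 2;
                     count = static_cast<int>(m.size()));
  return count;
}
```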
[MLIR][EmitC] arith-to-emitc: Fix lowering of fptoui (#118504)
`arith.fptoui %arg0 : f32 to i16` was lowered to
```
%0 = emitc.cast %arg0 : f32 to ui32
emitc.cast %0 : ui32 to i16
```
and is now lowered to
```
%0 = emitc.cast %arg0 : f32 to ui16
emitc.cast %0 : ui16 to i16
```
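In C terms, the new lowering converts the float directly to an unsigned integer of the destination width and then reinterprets it as signless `i16`. The following C++ sketch shows roughly what the `emitc.cast` pair denotes. It assumes the input is in range for `uint16_t`, because out-of-range float-to-integer conversions are undefined behavior in C and C++. The function name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors:  %0 = emitc.cast %arg0 : f32 to ui16
//           emitc.cast %0 : ui16 to i16
// Precondition (assumed): 0 <= x < 65536.
int16_t fptouiF32ToI16(float x) {
  uint16_t asUnsigned = static_cast<uint16_t>(x);  // truncates toward zero
  // Same 16 bits reinterpreted as signed; values above 32767 wrap to negative
  // numbers (modular conversion is guaranteed since C++20).
  return static_cast<int16_t>(asUnsigned);
}
```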
[SCCP] Regenerate test checks (NFC)
The checks generated by an old UTC version fail on this test due to missing signature matching, so regenerate them with a newer version.
[AMDGPU] Refine AMDGPULateCodeGenPrepare class. NFC. (#118792)
Use references instead of pointers for most state and initialize it all in the constructor, and similarly for the LiveRegOptimizer class.
[ASTWriter] Do not allocate source location space for module maps used only for textual headers (#116374)
This is a follow-up to #112015, and it further reduces the unnecessary duplication of source locations. We do not need to allocate source location space in the serialized PCMs for module maps used only to find textual headers. Those module maps are never referenced from anywhere in the serialized ASTs and are re-read in other compilations. This change should not affect the correctness of Clang compilations or clang-scan-deps in any way. We do need the InputFile entry in the serialized AST because clang-scan-deps relies on it. The previous patch introduced a mechanism to do exactly that. We found this change necessary to finally remove all duplication of the module maps we use internally in our build system.
[InstCombine] Move gep of phi fold into separate function
This makes sure that an early return during this fold doesn't end up skipping later gep folds.
[LoopVectorize] Restore cost check lines in test (NFC)
Accidentally dropped these while updating the test.
[OpenACC] Implement 'gang' clause for Combined Constructs
This one is a bit complicated because of some interesting interactions: 'gang' Sema is required to look at its containing compute construct, except in the case of a combined construct, where the two are the same. This resulted in a large refactor of the checking code for CheckGangExpr, plus some additional work on the diagnostics for its interaction with 'num_gangs' and 'vector'/'worker'.
Skip escaped newlines before checking for whitespace in Lexer::getRawToken. (#117548)
The Lexer used in getRawToken is not told to keep whitespace, so when it skips over escaped newlines, it also ignores whitespace, regardless of getRawToken's IgnoreWhiteSpace parameter. Instead of letting this case fall through to lexing, check for whitespace after skipping over any escaped newlines.
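The reordered check can be sketched in a simplified C++ form. The code first steps over any backslash-newline sequences and then tests whether the next character is whitespace. The helper names are hypothetical. The sketch also ignores details the real Lexer handles, such as `\r\n` line endings and whitespace between the backslash and the newline:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Advances past consecutive backslash-newline sequences starting at pos.
std::size_t skipEscapedNewlines(const std::string &buf, std::size_t pos) {
  while (pos + 1 < buf.size() && buf[pos] == '\\' && buf[pos + 1] == '\n')
    pos += 2;
  return pos;
}

// Returns true if, once escaped newlines are skipped, the next character is
// whitespace; a caller that must not ignore whitespace can bail out here
// instead of handing the position to a lexer that silently skips it.
bool startsWithWhitespace(const std::string &buf, std::size_t pos) {
  pos = skipEscapedNewlines(buf, pos);
  return pos < buf.size() &&
         std::isspace(static_cast<unsigned char>(buf[pos])) != 0;
}
```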
[RISCV] Update matchSplatAsGather to convert vectors if they have different sizes (#117878)
This patch updates the matchSplatAsGather function so that it can handle vectors of different sizes. The goal is to improve the codegen for @llvm.experimental.vector.match on RISCV: currently, a scalar extract and splat are used instead of vrgather, and this patch changes that.
[NFC][SystemZ] Use SExt for signed constants (#118803)
Use SExt instead of ZExt in XForms which produce a signed value. This is only to make it clear that the XForm handles a signed value.
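The change is NFC, but the two names describe different operations whenever the immediate's top bit is set. In that case, zero-extending and sign-extending the same 16-bit pattern produce different 64-bit values. The following is a small C++ illustration. The function names are illustrative, not SystemZ XForm names:

```cpp
#include <cassert>
#include <cstdint>

// Zero extension: the upper 48 bits are filled with 0.
int64_t zext16To64(uint16_t imm) { return static_cast<int64_t>(imm); }

// Sign extension: the upper 48 bits copy bit 15. Converting the pattern to
// int16_t first yields the signed value (modular conversion since C++20).
int64_t sext16To64(uint16_t imm) {
  return static_cast<int64_t>(static_cast<int16_t>(imm));
}
```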
[NFC] Complete proper copying and resource cleanup in classes. (#118655)
Provide, where missing, a copy constructor, a copy assignment operator, or a destructor to prevent potential resource-management issues.
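For a class that owns a raw resource, the usual C++ "rule of three" applies. If the class needs a user-defined destructor, it almost certainly also needs a copy constructor and a copy assignment operator. Otherwise, the implicitly generated copies share the resource and free it twice. The following is a minimal sketch with a hypothetical `Buffer` class, not one of the classes touched by the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class owning a heap-allocated array.
class Buffer {
public:
  explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}

  // Deep copy, so each object owns its own array.
  Buffer(const Buffer &other) : size_(other.size_), data_(new int[other.size_]) {
    std::copy(other.data_, other.data_ + size_, data_);
  }

  // Copy-and-swap: the by-value parameter is a copy, so self-assignment is safe
  // and the old array is released by the parameter's destructor.
  Buffer &operator=(Buffer other) {
    std::swap(size_, other.size_);
    std::swap(data_, other.data_);
    return *this;
  }

  ~Buffer() { delete[] data_; }

  int &operator[](std::size_t i) { return data_[i]; }
  std::size_t size() const { return size_; }

private:
  std::size_t size_;
  int *data_;
};
```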
[Clang] Fix -Wunused-private-field false negative with defaulted comp…
[RISCV][NFC] Don't set UnrollAndJamInnerLoopThreshold in getUnrollingPreferences (#118572)
This has no effect since it's the default value used in llvm::gatherUnrollingPreferences.
[flang][test] Change re.I to flags=re.I in re.sub
Follow-up to da6099c. Passed positionally, `re.I` was taken as `count`, not `flags`.
[InstCombine] Prevent infinite loop with two shifts (#118806)
The pattern `(C2 << X) << C1` is usually transformed into `(C2 << C1) << X`, essentially swapping `X` and `C1`. However, this should only be done when `C1` is an immediate constant; otherwise, this can lead to both constants being swapped forever. This fixes #118798.
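The rewrite is valid because two left shifts compose by adding their amounts, so the order of the amounts does not matter. The same symmetry means that a rule that swaps them unconditionally can fire again on its own output. This is why the patch limits it to an immediate `C1`. The following brute-force C++ check confirms `(C2 << X) << C1 == (C2 << C1) << X` on 8-bit values with in-range shift amounts. It illustrates only the identity, not the InstCombine guard, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// 8-bit shift-left; in LLVM IR an amount >= 8 would be poison, so it is excluded.
uint8_t shl8(uint8_t v, unsigned amt) {
  assert(amt < 8);
  return static_cast<uint8_t>(v << amt);  // keep only the low 8 bits
}

// Checks that swapping the two shift amounts never changes the result.
bool shiftsCommute() {
  for (unsigned c2 = 0; c2 < 256; ++c2)
    for (unsigned x = 0; x < 8; ++x)
      for (unsigned c1 = 0; c1 < 8; ++c1) {
        uint8_t lhs = shl8(shl8(static_cast<uint8_t>(c2), x), c1);
        uint8_t rhs = shl8(shl8(static_cast<uint8_t>(c2), c1), x);
        if (lhs != rhs)
          return false;
      }
  return true;
}
```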
[AMDGPU][True16][CodeGen] uaddsat/usubsat sdag for true16 format (#118708)
Add uaddsat and usubsat SDAG codegen patterns for the True16 format using V_ADD/SUB_NC_U16.
[libc][docgen] update to POSIX.1-2024 (#118717)
The recently ratified POSIX.1-2024 is newer than POSIX.1-2017.
[ProfileData] Add InstrProfWriter::writeBinaryIds (NFC) (#118754)
The patch makes InstrProfWriter::writeImpl less monolithic by adding InstrProfWriter::writeBinaryIds to serialize binary IDs. This way, InstrProfWriter::writeImpl can simply call the new function instead of handling all the details within writeImpl.
[RISCV] Clear vill for whole vector register moves in vsetvli insertion (#118283)
This is an alternative to #117866 that works by demanding a valid vtype instead of using a separate pass. The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block. To do this, we first need to transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible. Note that we also need to check for vector copies in computeVLVTYPEChanges; otherwise, the pass will completely skip over functions that only have vector copies and nothing else. This is one part of a fix for #114518. We still need to check whether there are other cases where vector copies/whole register moves are inserted after vsetvli insertion.
[flang][cuda] Use async id for device stream allocation (#118733)
When a stream is specified, use cudaMallocAsync with that stream.
[libc] revert all process_mrelease changes (#118650)
Revert, as its test is unstable (#118057).
[lldb] Fix the SocketTest failure on unsupported hosts (#118673)
The test `SocketTest::TCPListen0MultiListenerGetListeningConnectionURI` is failing on hosts that do not map `localhost` to both an IPv4 and an IPv6 address, for example in this build: https://lab.llvm.org/buildbot/#/builders/195/builds/1909. To fix this, I added a helper that checks whether the host has an /etc/hosts entry for both IPv4 and IPv6; otherwise, we skip the test.
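The following rough C++ sketch shows one way such a check could work. It operates on the text of a hosts file rather than reading `/etc/hosts` directly, so it can be tested. The function name is hypothetical, and so is the simple address classification, where an address containing a colon counts as IPv6 and any other address counts as IPv4. Neither is the helper added in the patch:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if `localhost` is listed for at least one IPv4 and one IPv6 address.
bool localhostHasIPv4AndIPv6(const std::string &hostsText) {
  bool hasV4 = false, hasV6 = false;
  std::istringstream lines(hostsText);
  std::string line;
  while (std::getline(lines, line)) {
    line = line.substr(0, line.find('#'));  // drop comments
    std::istringstream fields(line);
    std::string address, name;
    if (!(fields >> address))
      continue;  // blank or comment-only line
    while (fields >> name) {
      if (name != "localhost")
        continue;
      if (address.find(':') != std::string::npos)
        hasV6 = true;  // e.g. ::1
      else
        hasV4 = true;  // e.g. 127.0.0.1
    }
  }
  return hasV4 && hasV6;
}
```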
[flang] Assume matching shapes in elemental assignment with non-realloc lhs. (#118552)
The optimized bufferization pass cannot optimize very simple cases of elemental assignments because of a suboptimal order of checks. This patch relies on the fact that in a legal program the lhs and rhs of an assignment have matching shapes when the lhs is not an allocatable and the rhs is the result of an elemental array operation.
[flang] Expand SUM(DIM=CONSTANT) into an hlfir.elemental. (#118556)
An array SUM with the specified constant DIM argument may be expanded into hlfir.elemental with a reduction loop inside it processing all elements of the specified dimension. The expansion allows further optimization of the cases like `A=SUM(B+1,DIM=1)` in the optimized bufferization pass (given that it can prove there are no read/write conflicts).
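Conceptually, an `hlfir.elemental` computes each result element independently. The expansion places the DIM reduction loop inside that per-element computation. The following C++ sketch shows what `A = SUM(B+1, DIM=1)` computes for a 2-D `B`, written as the kind of fused loop nest that this form makes possible. In this loop nest, `B+1` is evaluated on the fly rather than stored in a temporary. The nested `std::vector` layout (Fortran arrays are column-major) and the function name are assumptions of the sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// b[i][j] holds the Fortran element B(i+1, j+1); the outer vector runs over
// dimension 1. Returns A with A(j) = sum over i of (B(i, j) + 1).
std::vector<double> sumPlusOneDim1(const std::vector<std::vector<double>> &b) {
  std::size_t n1 = b.size();
  std::size_t n2 = b.empty() ? 0 : b[0].size();
  std::vector<double> a(n2);
  for (std::size_t j = 0; j < n2; ++j) {  // one "elemental" iteration per A(j)
    double acc = 0.0;
    for (std::size_t i = 0; i < n1; ++i)  // reduction over dimension 1
      acc += b[i][j] + 1.0;               // B+1 computed on the fly
    a[j] = acc;
  }
  return a;
}
```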
[mlir] Add ValueBoundsOpInterfaceImpl for scf.forall (#118817)
Adds a ValueBoundsOpInterface implementation for scf.forall ops. The implementation supports bounding induction variables, results, and block args of the forall op. Induction variables are given upper and lower bounds based on the lower and upper loop bounds, and dimensions of the results and init block arguments are constrained to be equal to the matching dims of the shared_outs operand.
Signed-off-by: Max Dawkins <maxdawkins19@gmail.com>
Co-authored-by: Max Dawkins <maxdawkins19@gmail.com>
[AArch64][SME] Fix bug on SMELd1St1 (#118109)
Patch [1] updated the intrinsic interface for ld1/st1. According to Arm's documentation, "If the intrinsic also has a vnum argument, the ZA slice number is calculated by adding vnum to slice." However, "vnum" was not handled by the current implementation; this patch fixes that. [1] ee31ba0
[RISCV][GISel] Enable support for ArrayType arguments if the element type is also supported.
This allows us to handle small coerced structs that are passed as [2 x i64]. This is one of the last big reasons for -O0 fallbacks in some of my testing.