Revert commit ba8565fbcb975e2d067ce3ae5a7dbaae4953edd3 #3

- [Libomptarget] Make the references to 'malloc' and 'free' weak. - [Libomptarget][NFC] Use C++ style attributes instead

…ment (llvm#69405) When the code is built with -mbackchain, it is possible to retrieve the caller's frame and return addresses. GCC already can do this, add this support to Clang as well. Use RISCVTargetLowering and GCC's s390_return_addr_rtx() as inspiration. Add tests based on what GCC is emitting.

Use of llvm::Optional was migrated to std::optional. This included a change in the constructor of ArrayRef. However, there are still 2 places in the SubtargetEmitter which use llvm::None, causing a compile error when emitted.

Specializing for 8-bit integers to ensure values are printed as integers Fixes llvm#69310
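As an illustration of why 8-bit integers need special handling, here is a minimal sketch that assumes the values are printed through a C++ stream, where `operator<<` treats one-byte integer types as characters. `formatInteger` is a hypothetical helper, not the function touched by this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper: formats an integer as decimal text. One-byte integer
// types are widened to int first, because operator<< would otherwise print
// them as characters.
template <typename T>
std::string formatInteger(T value) {
  static_assert(std::is_integral_v<T>, "integer types only");
  std::ostringstream os;
  if constexpr (sizeof(T) == 1)
    os << static_cast<int>(value);
  else
    os << value;
  return os.str();
}

void demoFormatInteger() {
  assert(formatInteger<std::uint8_t>(65) == "65"); // not "A"
  assert(formatInteger<std::int8_t>(-3) == "-3");
  assert(formatInteger<std::int32_t>(1000) == "1000");
}
```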

Fix 'Bullet list ends without a blank line; unexpected unindent.'

stack-uar.c is flaky (1 in 256 executions) because the random tag may be zero (llvm#69221). This patch works around the issue in the same way as deep-recursion.c (llvm@aa4dfd3), by falling back to a neighboring object, which must have a different (non-zero) tag. This patch also does a minor cleanup of the aforementioned deep-recursion.c, for consistency with stack-uar.c. Co-authored-by: Thurston Dang <thurston@google.com>

The user of CodeExtractor should be able to specify that the aggregate argument should be passed as a pointer in address space 0. CodeExtractor is used to generate outlined functions required by the OpenMP runtime. The arguments of the outlined functions for OpenMP GPU code are in address space 0, but address space 0 does not need to be the default address space for a GPU device. That's why the user of CodeExtractor needs to be able to specify that the allocated aggregate parameter is passed as a pointer in address space 0.

Document clang support for function pointers and virtual functions with HIP

…ion (llvm#68628) When MergeFuncs creates a thunk, it does not modify the function in place, but creates a new one altogether. If type metadata is not properly forwarded to this new function, LowerTypeTests will be unable to put this thunk into the dispatch table. The fix here is to just forward the type metadata to the newly created functions.

The current test cases to guard against speculative execution can actually be safely speculated because the denominator is known to be not 0 or -1, and isSafeToSpeculativelyExecuteWithOpcode will account for this. This adds some more test cases and rejigs some existing ones to use an unknown variable instead.
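The condition can be sketched as follows. This is a simplified, hypothetical model rather than the actual isSafeToSpeculativelyExecuteWithOpcode logic: it only looks at whether the divisor is a known constant, and it conservatively rejects -1 instead of also reasoning about the dividend.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model: the divisor is either a known constant or unknown
// (std::nullopt). A signed division may be hoisted past a guard only if it
// cannot trap or overflow for any dividend.
bool isSafeToSpeculateSignedDiv(std::optional<std::int64_t> knownDivisor) {
  if (!knownDivisor)
    return false; // unknown value: it might be 0 or -1 at run time
  if (*knownDivisor == 0)
    return false; // division by zero
  if (*knownDivisor == -1)
    return false; // INT_MIN / -1 overflows
  return true;
}

void demoSpeculation() {
  assert(isSafeToSpeculateSignedDiv(42));            // constant, not 0 or -1
  assert(!isSafeToSpeculateSignedDiv(std::nullopt)); // unknown variable
}
```

A test that divides by a constant such as 42 therefore does not exercise the guard, which is why an unknown variable is the more useful divisor in such tests.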

This makes it a lot easier to make wide ranging changes like I am about to do in https://llvm.org/D150610.

…ctions (llvm#69407) This will make it easier to implement new(nothrow) without calling the throwing version of new when exceptions are disabled. See https://llvm.org/D150610 for the full discussion.

The MVETRUNC operation can perform the same truncate of two vectors, without requiring lane inserts/extracts from every vector lane. This moves the concat i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra stack space for fewer instructions.

…n to disable memcpy instrumentation (llvm#69240) Deploying llvm#67766 to a large internal codebase uncovers many bugs (many are probably benign but need cleaning up). There are also issues in high-profile open-source projects like v8. Add a cl::opt to disable builtin instrumentation for -fsanitize=alignment to help large codebase users. In the long term, this cl::opt option may still be useful to debug -fsanitize=alignment instrumentation on builtins, so we probably want to keep it around.

This patch fixes: compiler-rt/lib/builtins/int_to_fp_impl.inc:36:10: error: expression is not an integer constant expression; folding it to a constant is a GNU extension [-Werror,-Wgnu-folding-constant]

This enables reading block sparse from file using libgen! (and soon also direct IR codegen)

When compiling for Darwin, sigset is not initialized. When -Werror,-Wuninitialized-const-reference are enabled we see the error: asan_interceptors.cpp:260:38: error: variable 'sigset' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference] This fixes the error

We can use std::pop_heap first and then retrieve the top priority item with pop_back_val, saving one line of code.
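A minimal sketch of the pattern, using `std::vector<int>` as a stand-in for the real container. `std::vector` has no `pop_back_val`, so the last step is spelled as `back()` followed by `pop_back()`; `popTop` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Removes and returns the highest-priority element of a max-heap stored in a
// vector. std::pop_heap moves the top element to the back, where it can be
// read and removed in one place.
int popTop(std::vector<int> &heap) {
  assert(!heap.empty());
  std::pop_heap(heap.begin(), heap.end());
  int top = heap.back();
  heap.pop_back();
  return top;
}

void demoPopTop() {
  std::vector<int> heap = {3, 9, 4, 7};
  std::make_heap(heap.begin(), heap.end());
  assert(popTop(heap) == 9);
  assert(popTop(heap) == 7);
  assert(heap.size() == 2);
}
```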

function, NFC.

InlinerOrder::front was removed by: commit d3b95ec Author: Kazu Hirata <kazu@google.com> Date: Sun Sep 18 08:49:44 2022 -0700 This patch removes a mention of front.

Example: dimLevelType = [ "compressed", "compressed" ] to map = (d0, d1) -> (d0 : compressed, d1 : compressed)

With the legality checks in place it is now safe to do. S_MOV_B64 shall not be used with wide literals, thus updating the test.

…nder with undef elements. (llvm#69482) Division/remainder by undef is immediate UB across the entire vector.

This was broken by b71edfa.

…-pypi-publish (llvm#69438)

Skip TrailingAnnotation when looking for TrailingReturnArrow. Fixes llvm#69234.

…lvm#69510) Broke https://lab.llvm.org/buildbot/#/builders/181/builds/24470. Could we build the example/tutorial code in the submit checks? This breakage wasn't caught at submit time.

…#69495) Reduce memory usage by only extract unit DIEs when cloning clang modules. We don't need the full debug info yet at this stage. This reduces peak memory usage of dsymutil when linking the swift driver by multiple gigabytes. rdar://117156180

During flang handling of semantics of OpenACC private/firstprivate/reduction clauses (including the implicitly private loop IV), a new scoped symbol was being created. This could lead to ambiguity in the lowered FIR - aka having multiple fir.declare for the same symbol. Because lowering of OpenACC does not materialize the meaning of the private clauses (by actually creating a scoped local symbol), it does not make sense to create a new symbol in semantics either. I updated the acc-symbols01.f90 test to reflect this updated approach. Technically, the test could be removed, but it made sense to keep in place to highlight this intentional decision.

This patch fixes variable names in the style guide. Specifically, names in the form xyz_abc are changed to the form xyzAbc Signed-off-by: Tai Ly <tai.ly@arm.com>

Currently for any wasm target, llvm will make a pass that removes irreducible control flow. (See [here](https://llvm.org/doxygen/WebAssemblyFixIrreducibleControlFlow_8cpp.html)). This can result in O(NumBlocks * NumNestedLoops * NumIrreducibleLoops + NumLoops * NumLoops) build time, which has resulted in exceedingly long build times when testing. This PR introduces a hidden flag to skip this pass, which brings some of our build times down from 30 minutes to ~6 seconds.

…67863) This patch consists of porting SDISel patterns of SHXADD instructions to GISel. Note that `non_imm12`, a predicate that was implemented with `PatLeaf`, is now turned into a `PatFrag` of `<op>_with_non_imm12` where `op` is the operator that uses the `non_imm12` operand, as GISel doesn't have an equivalent of `PatLeaf` at this moment.

Summary: When a PCM file is loaded, it can go wrong in various ways. The current diagnostic only produces the name of the malformed PCM, not why it is malformed. Expand the diagnostic to display what went wrong! There is only one call site for this diagnostic, and it already passes the error message: https://github.com/llvm/llvm-project/blob/main/clang/lib/Serialization/ASTReader.cpp#L4763-L4764 Test Plan: The modified LIT test. --------- Co-authored-by: Nuri Amari <nuriamari@fb.com>

…vm#69396) An oversight in https://reviews.llvm.org/D148836 since this is a different code path.

AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267 AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343 RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-options-support/73672

…dling (llvm#69095) - Add in some other linker command line options that the other BSD's handle - Make use of AddFilePathLibArgs() - Handle OpenMP

This is needed when we run out of registers.

…tion (llvm#69540) This makes sure - GEN MAP dim=2 lvl=4 (d0, d1) -> (d0 floordiv 2, d1 floordiv 2, d0 mod 2, d1 mod 2) -- (d0, d1, d2, d3) -> (d0 * 2 + d2, d1 * 2 + d3) is indeed encoded as MAP-REF (dim=2, lvl=4) isperm=0 d2l = [ d0/2 d1/2 d0%2 d1%2 ] ld2 = [ l2+2*l0 l3+2*l1 ]
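The 2x2 block mapping above can be checked with a small round-trip sketch. The type and function names are hypothetical, and coordinates are assumed non-negative so that C++ `/` and `%` agree with `floordiv` and `mod`.

```cpp
#include <array>
#include <cassert>

using Dims = std::array<int, 2>; // (d0, d1)
using Lvls = std::array<int, 4>; // (l0, l1, l2, l3)

// (d0, d1) -> (d0 floordiv 2, d1 floordiv 2, d0 mod 2, d1 mod 2)
Lvls dimToLvl(Dims d) { return {d[0] / 2, d[1] / 2, d[0] % 2, d[1] % 2}; }

// (l0, l1, l2, l3) -> (l0 * 2 + l2, l1 * 2 + l3)
Dims lvlToDim(Lvls l) { return {l[0] * 2 + l[2], l[1] * 2 + l[3]}; }

void demoBlockMap() {
  const Dims d = {5, 2};
  const Lvls expected = {2, 1, 1, 0};
  assert(dimToLvl(d) == expected);    // block (2, 1), offset (1, 0)
  assert(lvlToDim(dimToLvl(d)) == d); // the two maps are inverses
}
```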

…lvm#68474) The current implementation of matchFunnelShift only allows the opposite shift pattern. Refactor it to allow more patterns.

…ions and referenced by non-ALLOC bbf7b9d accidentally caused a regression that is fixed by llvm#69425. Add test to prevent regression.

With this patch, all CFRs can be used for register allocation.

PR llvm#67391 improved atomic codegen by handling memory ordering specified by the `cmpxchg` instruction. An acquire barrier needs to be generated when memory ordering includes an acquire operation. This PR improves the codegen further by only handling the failure ordering.
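For readers less familiar with the two orderings of a compare-exchange, the source-level counterpart looks roughly like this. It is an illustrative sketch only: `tryClaim` is a hypothetical function, and the code a given target emits for it is not shown or claimed here.

```cpp
#include <atomic>
#include <cassert>

// Tries to change `slot` from `expected` to `desired`. The third argument is
// the ordering used when the exchange succeeds, the fourth is the ordering
// used when the comparison fails and only a load has happened.
bool tryClaim(std::atomic<int> &slot, int expected, int desired) {
  return slot.compare_exchange_strong(expected, desired,
                                      std::memory_order_acq_rel,  // success
                                      std::memory_order_acquire); // failure
}

void demoTryClaim() {
  std::atomic<int> slot{0};
  assert(tryClaim(slot, 0, 1));  // 0 -> 1 succeeds
  assert(!tryClaim(slot, 0, 2)); // slot is 1 now, so the comparison fails
  assert(slot.load() == 1);
}
```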

I have run into assertion failures quite often when calling this method via `DenseElementsAttr::get`, and I think this would help, at the very least, by printing out the bit width size mismatches, rather than a plain assertion failure. I included all the other cases in the method for completeness

The test is meant to execute on the native target and only initializes the native target. However, it then gets the default target triple instead of the process (host) triple. This fails in cases where the native target and the default target are not the same. The test was added here: https://reviews.llvm.org/D154100

In the future this should probably be autogenerated so it defines library version. See: Discussion in #libc https://discord.com/channels/636084430946959380/636732994891284500/1163979080979460176

llvm#67285) As the comment on `InvalidateInstructionCache` says: before the JIT can run a block of code that has been emitted, it must invalidate the instruction cache on some platforms. I think this applies to LoongArch, as LoongArch has a weak memory model. But I'm not able to write a test to demonstrate this issue. Perhaps self-modifying code should be written?

…8920) This allows running these quick checks faster than in our Buildkite pipeline, which has much more latency. This will also avoid blocking the rest of the testing pipeline in case the generated-files checks are failing.

…ee (llvm#68109) std::atomic is implemented with the following (confusing!) hierarchy of types:
    std::atomic<T> : std::__atomic_base<T> { ... };
    std::__atomic_base<T> { std::__cxx_atomic_impl<T> __impl; };
    std::__cxx_atomic_impl<T> { _Atomic(T) __val; };
Inside std::__atomic_base, we implement the is_lock_free() and is_always_lock_free() functions. However, we used to implement them inconsistently:
- is_always_lock_free() is based on whether __cxx_atomic_impl<T> is always lock free (using the builtin), which means that we include any potential padding added by _Atomic(T) into the determination.
- is_lock_free() was based on whether T is lock free (using the builtin), which meant that we did not take into account any potential padding added by _Atomic(T).
It is important to note that the padding added by _Atomic(T) can turn a type that wouldn't be lock free into a lock free type, for example by making its size become a power of two. The inconsistency of how the two functions were implemented could lead to cases where is_always_lock_free() would return true, but is_lock_free() would then return false. This is the case for example of the following type, which is always lock free on arm64 but was incorrectly reported as !is_lock_free() before this patch:
    struct Foo { float x[3]; };
This patch switches the determination of is_lock_free() to be based on __cxx_atomic_impl<T> instead to match how we determine is_always_lock_free(). rdar://115324353
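The invariant this patch restores can be written as a small check using only the public `std::atomic` interface (C++17). `lockFreeQueriesAgree` is a hypothetical helper. Calling it with `Foo` is left to the reader, since some toolchains need an atomic support library at link time for types of that size.

```cpp
#include <atomic>
#include <cassert>

struct Foo { float x[3]; }; // the example type from the description

// If a type is reported as always lock free at compile time, the run-time
// query on an object of that type must agree.
template <typename T>
bool lockFreeQueriesAgree() {
  std::atomic<T> a{};
  return !std::atomic<T>::is_always_lock_free || a.is_lock_free();
}

void demoLockFree() {
  assert(lockFreeQueriesAgree<int>());
  assert(lockFreeQueriesAgree<char>());
}
```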

X86 supports this calling convention but I don't find any special handling, so I think we can just handle it via CC_RISCV. This should fix llvm#69197.

…llvm#69386) Fixes llvm#69367

…lvm#67023) Same as https://reviews.llvm.org/D147089 but for std::nth_element

…d assignment operators Differential Revision: https://reviews.llvm.org/D154499 Co-authored-by: Louis Dionne <ldionne.2@gmail.com>

Avoids hitting assertions due to unsupported convolution patterns. See iree-org/iree#15207 (comment)

…7726)

…e binop is a shift. (llvm#69349) The RHS of a shift can have a different type than the LHS. If there are undefs in the vector, we need the undef added to the RHS to match the type of any shift amounts that are also added to the vector. For now just don't add shifts if their RHS and LHS don't match.

…8336)

Just like what other targets have done. And this will make DAG mutations like MacroFusion take effect.

If the first operand of ADDI is a frame index, then it won't have a data dependency on a preceding LUI. So it is impossible to apply the DAG mutation to these two instructions.

Currently the PR formatting job only runs clang-format. There isn't a lot of utility in running it if there aren't any C/C++ changes as there will be nothing to format. This isn't super noisy currently as the job doesn't fail if there aren't any C/C++ changes, but it's a bit of a waste. In addition, this patch names the code formatting job "Check C++ Formatting" to make it clear that this job only checks C/C++ formatting rather than Python formatting/other languages.

Recently, support for building the LLVM documentation within Github actions landed, allowing for easy testing of the docs both pre and post landing. This patch extends that functionality to clang and adds in additional support to the docs Github workflow to only build the docs for the subproject whose documentation has been touched.

This reverts commit fd31112. There are two different crash reports on llvm@fd31112

…lvm#69556)" This reverts commit 80b2aac. I mistakenly assumed this job didn't also do python formatting (should've grepped for more than just black in the python portion of this script). Pulling it out for now to get python formatting working again while the patch is iterated further.

llvm#68572) …… (llvm#68394)" The new warnings are now under a separate flag `-Wthread-safety-reference-return`, which is on by default under `-Wthread-safety-reference`. - People can opt out via `-Wthread-safety-reference -Wnothread-safety-reference-return`. This reverts commit 859f2d0.

Control flow does not necessarily continue past guard intrinsics, so don't mark them as willreturn. This fixes the miscompile in the sdiv-guard.ll test.

…m#69314) replaceValuesPerBlockEntry() only handled simple and coerced load values, however the load may also be referenced by a select value. Additionally, I suspect that the previous code might have been incorrect if a load had an offset, as it always constructed the AvailableValue from scratch. Fixes llvm#69301.

) Fixes llvm#69369. Fixes clangd/clangd#1700.

The keyword is intended for debugging purpose. It prints a message to stderr. This patch is based on code originally written by Adam Nemet, and on the feedback received by the reviewers in https://reviews.llvm.org/D157492.

This patch fixes: compiler-rt/lib/builtins/cpu_model.c:590:5: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] by adding a missing "break;".

* `dump`, added in llvm#68793 * `!repr`, added in llvm#68716

…lvm#68897) * `dump`, added in llvm#68793 * `!repr`, added in llvm#68716 The keyword `assert` was missing, so I have added that too.

…d x, c)) with Zicond. (llvm#69563) It's only beneficial when cond is a setcc with an integer equality condition code. For other cases, it has the same instruction count as the original.

This PR adds the `nvvm.stmatrix` Op to the NVVM dialect. The Op collectively stores one or more matrices across all threads in a warp to the given address location in shared memory.

Add tests with argmem variations.

…tor (llvm#69010) Adds a new `__builtin_vectorelements()` function which returns the number of elements for a given vector either at compile-time for fixed-sized vectors, e.g., created via `__attribute__((vector_size(N)))` or at runtime via a call to `@llvm.vscale.i32()` for scalable vectors, e.g., SVE or RISCV V. The new builtin follows a similar path as `sizeof()`, as it essentially does the same thing but for the number of elements in vector instead of the number of bytes. This allows us to re-use a lot of the existing logic to handle types etc. A small side addition is `Type::isSizelessVectorType()`, which we need to distinguish between sizeless vectors (SVE, RISCV V) and sizeless types (WASM). This is the [corresponding discussion](https://discourse.llvm.org/t/new-builtin-function-to-get-number-of-lanes-in-simd-vectors/73911).
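A hedged usage sketch for fixed-size vectors follows. It assumes a GCC- or Clang-compatible compiler (for `vector_size`) and falls back to the `sizeof` ratio when the builtin is unavailable. `Int4` and `laneCount` are hypothetical names, and the expected value assumes a 4-byte `int`.

```cpp
#include <cassert>
#include <cstddef>

// Compilers without __has_builtin are treated as not having the builtin.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

typedef int Int4 __attribute__((vector_size(16))); // GCC/Clang extension

// Number of elements in the vector: the new builtin where available,
// otherwise the byte-size ratio that the builtin is meant to replace.
std::size_t laneCount(const Int4 &v) {
#if __has_builtin(__builtin_vectorelements)
  return __builtin_vectorelements(v);
#else
  return sizeof(v) / sizeof(v[0]);
#endif
}

void demoLaneCount() {
  Int4 v = {1, 2, 3, 4};
  assert(laneCount(v) == 4); // 16 bytes of 4-byte ints
}
```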

…s. (llvm#69329) A recent commit (llvm#69190) broke the bazel builds. Turns out that Bazel uses symlinks for providing the test files, which the path expansion of the module loading mechanism did not handle correctly. This PR fixes that. It also reorganizes the tests better: It puts all `.mlir` files that are included by some other test into a common `include` folder. This greatly simplifies the definition of the dependencies between the different `.mlir` files in Bazel's `BUILD` file. The commit also adds a comment to all included files explaining why these aren't tested themselves directly and uses the `%{fs-sep}` expansion for paths more consistently. Finally, it uncomments all but one of the tests excluded in Bazel because they seem to run now. (The remaining one includes a file that is itself a test, so it would have to live *in* and *outside* of the `include` folder.)

…lvm#68962) The _mm_cmpistri instruction can be used to quickly parse identifiers. With this patch activated, clang pre-processes <iostream> 1.8% faster, and sqlite3.c amalgamation 1.5% faster, based on time measurements and number of executed instructions as measured by valgrind. The introduction of an extra helper function in the regular case has no impact on performance, see https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&to=12bcb016cde4579ca7b75397762098c03eb4f264&stat=instructions:u --------- Co-authored-by: serge-sans-paille <sguelton@mozilla.com>

…ize tests. (llvm#69329)" This reverts commit f681225. That commit changed the organization of the tests of the transform dialect interpreter but did not take into account some tests that were added in the meantime.

As described in: ARM-software/acle#257 Patch by : Sander de Smalen<sander.desmalen@arm.com> Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D151199

…nize tests. (llvm#69329)" This reverts commit c122b97 but fixes tests that were added between submitting llvm#69329 for review and landing it for the first time.

Building helloworld.c currently errors with "undefined symbol: __llvm_libc_syscall" See: llvm#67032

…variables" This reverts commit 3353f7d. Fixed test bug (unspecified order of arg evaluation)

…fault_mem_order REQUIRES clause This patch creates the `OmpRewriteMutator` pass that runs at the end of `RewriteParseTree()`. This pass is intended to make OpenMP-specific mutations to the PFT after name resolution. In the case of the `atomic_default_mem_order` clause of the REQUIRES directive, name resolution results in populating global symbols with information about the REQUIRES clauses that apply to that scope. The new rewrite pass is then able to use this information in order to explicitly set the memory order of ATOMIC constructs for which that is not already specified. Given that this rewrite happens before semantics checks, the check of the order in which ATOMIC constructs without explicit memory order and REQUIRES directives with `atomic_default_mem_order` appear is moved earlier into the rewrite pass. Otherwise, these problems would not be caught by semantics checks, since the PFT would be modified by that stage. This is patch 4/5 of a series splitting D149337 to simplify review. Depends on D157983. Differential Revision: https://reviews.llvm.org/D158096

To fix that ticket we only needed to address the V_LSHLREV_B16 case, but I did it for all insts just in case. Fixes llvm#66899

Small fix for failing tests after merge of llvm#69010. The tests need `REQUIRES` to ensure that the correct headers are available. I've also added a generic x86 build which does not need headers, so there is at least one run per test.

When the FPU was selected with "+(no)fp(.dp)" extensions in "-march" or "-mcpu" options, the FPU used for multilib selection was still the default one for given architecture or CPU.

We already save the information about signedness ourselves.

As described in: ARM-software/acle#257 Patch by : David Sherwood <david.sherwood@arm.com> Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D151307

Similar to what we already do for add/sub + saturation variants. Scalar support will be added in a future patch covering the other variants at the same time. Alive2: https://alive2.llvm.org/ce/z/rBDrNE Fixes llvm#69080

…vm#69539) The immediate legality checks are now embedded into the isOperandLegal(). It is not needed to check it again.

This is done by lowering v16i8 loads into LoadV4 operations with i32 results instead of letting ReplaceLoadVector split it into smaller loads during legalization. This is done at dag-combine1 time, so that vector operations with i8 elements can be optimised away instead of being needlessly split during legalization, which involves storing to the stack and loading it back.

In llvm#69582, I accidentally disabled all tests for the change introduced in llvm#69010. This change should use the correct `REQUIRES` syntax to en-/disable target-specific tests.

…llvm#66924) The issue llvm#55208 noticed that std::rint is vectorized by the SLPVectorizer, but a very similar function, std::lrint, is not. std::lrint corresponds to ISD::LRINT in the SelectionDAG, and std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now, neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant, and the LangRef makes this clear in the documentation of llvm.lrint.* and llvm.llrint.*. This patch extends the LangRef to include vector variants of llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of scalarizing it for all targets. However, this patch would be devoid of motivation unless we show the utility of these new vector variants. Hence, the RISCV target has been chosen to implement a custom lowering to the vfcvt.x.f.v instruction. The patch also includes a CostModel for RISCV, and a trivial follow-up can potentially enable the SLPVectorizer to vectorize std::lrint and std::llrint, fixing llvm#55208. The patch includes tests, obviously for the RISCV target, but also for the X86, AArch64, and PowerPC targets to justify the addition of the vector variants to the LangRef.
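For context, the kind of source loop that motivates vector variants of llvm.lrint.* looks like the sketch below. `roundAll` is a hypothetical function, and whether a given compiler and target actually vectorize it is not claimed here. The expected values assume the default round-to-nearest-even mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Rounds each input to the nearest integer using the current rounding mode.
// Every iteration is an independent std::lrint call.
void roundAll(const double *in, long *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::lrint(in[i]);
}

void demoRoundAll() {
  const double in[4] = {0.5, 1.5, 2.5, -1.5};
  long out[4] = {};
  roundAll(in, out, 4);
  // Ties go to the even neighbour in the default rounding mode.
  assert(out[0] == 0);
  assert(out[1] == 2);
  assert(out[2] == 2);
  assert(out[3] == -2);
}
```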

…lvm#67432)' This reverts revert 1950507. An issue was fixed in bea3684 and some newly appeared tests updated.

…n that they never create undef/poison We need to assume that we generate poison if the assertions failed Fixes llvm#66603

For this code:
    struct O { int &&j; };
    O o1(0);
The generated AST for the initializer of o1 is:
    VarDecl 0x62100006ab08 <array.cpp:119:3, col:9> col:5 o1 'O':'O' parenlistinit
    `-ExprWithCleanups 0x62100006b250 <col:7, col:9> 'O':'O'
      `-CXXParenListInitExpr 0x62100006b210 <col:7, col:9> 'O':'O'
        `-MaterializeTemporaryExpr 0x62100006b1f0 <col:8> 'int' xvalue
          `-IntegerLiteral 0x62100006abd0 <col:8> 'int' 0
Before this patch, we create a local temporary variable for the MaterializeTemporaryExpr and destroy it again when destroying the EvalEmitter we create to interpret the initializer. However, since O::j is a reference, this reference now points to a local variable that doesn't exist anymore. Differential Revision: https://reviews.llvm.org/D156453

…vm#69518) Summary: The `libcgpu.a` file provides its own implementation of `__assert_fail`. This adds a test to make sure it's usable in OpenMP offloading as expected. Currently this requires linking `libcgpu.a` before the OpenMP device RTL however. We also disable the test on the CPU as the format of the string will be different.

…and omp_target_memset_async() (llvm#68706) Implement a slow-path version of omp_target_memset*() There is a TODO to implement a fast path that uses an on-device kernel instead of the host-based memory fill operation. This may require some additional plumbing to have kernels in libomptarget.so

Solves llvm#68315

The check for dladdr1 for shared libc is too strict. Depending on how the system is set up, we sometimes pick up the non-generic lib name with the version string in it. Update the check for libc to account for the version string.

…ests (llvm#69558) Clean up usage of `DECLARE_SPECIAL_CONSTANTS` in global scope.

Now all wmma store builtins have src param marked const. Reviewers: Tra

…9121) The update stems from the discussion in https://discourse.llvm.org/t/adding-flang-specific-header-files-to-clang/72442 This is my second attempt at this. My first attempt was in pull request llvm#68756. I decided to put ISO_Fortran_binding.h in a place where it would be accessible with the include: "#include<ISO_Fortran_binding.h>" rather than "#include<fortran/ISO_Fortran_binding.h>" because this is what gfortran implements. Note that the file is also installed into ".../include/flang", so if a user wanted to access the file from a compiler other than clang, it would be available. I added a test in ".../flang/test/Examples". To make the test work, I also needed to put ISO_Fortran_binding.h into the build area. Although the flang project depends on clang, clang may not always be available in a flang build. For example, when building just the "check-flang" target, the "clang" executable may not be available at the time the new test gets run. To account for this, I made the test's script check for the existence of the "clang" executable. If "clang" is not available, it simply prints "PASS". If it is available, it fully builds and executes the test. On success, this will also print "PASS"

Patch by : David Sherwood <david.sherwood@arm.com> As described in: ARM-software/acle#257 Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D151433

…lvm#66784) Teach the LiveIntervals path in isPlainlyKilled to handle physical registers, to get equivalent functionality with the LiveVariables path. Test this by adding -early-live-intervals RUN lines to a handful of tests that would fail without this.

llvm#69512) Instead of raising an error for a misplaced `end loop directive`, just warn about it and ignore it. This directive is an extension and is optional.

…ds (llvm#69087) It's not a verifier enforced property that implicit_def may only have one operand. Fixes assertions after the coalescer adds implicit-defs to arbitrary instructions to preserve super register liveness. For some reason I'm unable to reproduce this as a MIR test running only the allocator for the x86 test. Not sure it's worth keeping around.

clang-cl is a driver mode that accepts options of MSVC cl.exe as a drop-in replacement for cl.exe. Currently clang-cl accepts mixed clang style options and cl style options. To let clang-cl accept a clang-style option, one just needs to add the CLOption visibility to that option. Currently nvcc can pass cl style options to cl.exe, which allows nvcc to compile C++ and CUDA programs with mixed nvcc and cl style options. On the other hand, clang cannot use mixed clang and cl style options to compile CUDA/HIP programs. This patch adds the CLOption visibility to the options needed to compile CUDA/HIP programs. This allows clang-cl to compile CUDA/HIP programs with mixed clang and cl style options.

…Format (llvm#68876) This is just a slight specialization of `TypesMatchWith` that returns success if an optional parameter is missing. There may be other places this could help e.g.: https://github.com/llvm/llvm-project/blob/eb21049b4b904b072679ece60e73c6b0dc0d1ebf/mlir/include/mlir/Dialect/X86Vector/X86Vector.td#L58-L59 ...but I'm leaving those to avoid some churn. This constraint will be handy for us in some later patches, it's a formalization of a short circuiting trick with the `comparator` of the `TypesMatchWith` constraint (devised for llvm#69195).
```
TypesMatchWith<
  "padding type matches element type of result (if present)",
  "result", "padding",
  "::llvm::cast<VectorType>($_self).getElementType()",
  // This returns true if no padding is present, or it's present with a type
  // that matches the element type of `result`.
  "!getPadding() || std::equal_to<>()">
```
This is a little non-obvious, so after this patch you can instead do:
```
OptionalTypesMatchWith<
  "padding type matches element type of result (if present)",
  "result", "padding",
  "::llvm::cast<VectorType>($_self).getElementType()">
```

As described in: ARM-software/acle#257 Patch by: David Sherwood <david.sherwood@arm.com> Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D151439

…lvm#69491) stack-uas.c and stack-history-length.c both have -hwasan-record-stack-history=libcall, which makes the stack base tag fully randomized. They may therefore sometimes have a zero tag for a stack allocated variable, resulting in a false negative (llvm#69221 (comment)). This patch applies the same workaround as used for deep-recursion.c (llvm@aa4dfd3) and stack-uar.c (llvm@ddf1de2): have two adjacent stack-allocated variables, and use whichever is not zero-tagged. These are the last remaining test cases that use -hwasan-record-stack-history=libcall. stack-uas flakiness spotted in the wild: https://lab.llvm.org/buildbot/#/builders/269/builds/549/steps/11/logs/stdio stack-history-length: https://lab.llvm.org/buildbot/#/builders/269/builds/537 Co-authored-by: Thurston Dang <thurston@google.com>

…#69389) Updates:
1. Verification of block sparsity.
2. Verification that a singleton level type can only follow compressed or loose_compressed levels, and that all level types after a singleton are singleton.
3. Added getBlockSize function.
4. Added an invalid encoding test for an incorrect lvlToDim map provided by the user.

- Be explicit about which program resource register is supported by which target
  - RSRC1
    - FP16_OVFL is GFX9+
    - WGP_MODE is GFX10+
    - MEM_ORDERED is GFX10+
    - FWD_PROGRESS is GFX10+
  - RSRC3
    - INST_PREF_SIZE is GFX11+
    - TRAP_ON_START is GFX11+
    - TRAP_ON_END is GFX11+
    - IMAGE_OP is GFX11+
- Do not emit GFX11+ fields when disassembling GFX10 code objects
- Tighten enforcement of reserved bits in disassembler
--------- Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>

As described in: ARM-software/acle#257 Patch by: Kerry McLaughlin <kerry.mclaughlin@arm.com> Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D151461

…ation (llvm#69494) Previously we were just matching against a fixed list of VP intrinsics that we knew couldn't be speculated, but we can reuse the logic in isSafeToSpeculativelyExecuteWithOpcode. This also allows speculation in more cases, e.g. when the divisor is known to be non-zero. Unfortunately we can't reuse the exact same function call for VP intrinsics with functional intrinsics instead of opcodes, because isSafeToSpeculativelyExecute needs an instruction that already exists. So this just copies the logic by peeking into the function attributes of the intrinsic.

…:fixupIndex (llvm#69505) In ef762e5e7292, I shifted around where errors were reported when failing to parse and/or validate DWARFUnitHeaders. When we are doing so in DWARFContext::fixupIndex, the actual error message isn't prefixed with `warning:` like it would be elsewhere (because of the way `logAllUnhandledErrors` is implemented).

Rename lldb-vscode to lldb-dap. This change is largely mechanical. The following substitutions cover the majority of the changes in this commit: s/VSCODE/DAP/ s/VSCode/DAP/ s/vscode/dap/ s/g_vsc/g_dap/ Discourse RFC: https://discourse.llvm.org/t/rfc-rename-lldb-vscode-to-lldb-dap/74075/

Regression caused by e880e8a Due to aux-target mismatch. Add -target option to fix aux-target. https://lab.llvm.org/buildbot/#/builders/230/builds/20138

…69496) This function was returning failure when any of the intersection sets was empty, but this is actually legitimate in "matrix times vector" cases, where some of the operands have lower dimensionality, implying unit-dimension semantics for the "missing" dimensions. Example:

```mlir
func.func @transpose_extend_batch_matmul(
    %vec: tensor<32x128xi16>,
    %mat: tensor<11008x32x128xi4>) -> tensor<11008x32xi32> {
  %c0_i32 = arith.constant 0 : i32
  %cst_0 = arith.constant 0.000000e+00 : f32
  %0 = tensor.empty() : tensor<11008x32xi32>
  %1 = linalg.fill ins(%c0_i32 : i32) outs(%0 : tensor<11008x32xi32>) -> tensor<11008x32xi32>
  %2 = tensor.empty() : tensor<11008xf32>
  %3 = linalg.fill ins(%cst_0 : f32) outs(%2 : tensor<11008xf32>) -> tensor<11008xf32>
  %batch_matmul_result = linalg.generic {
      indexing_maps = [affine_map<(d0, d1, d2) -> (d1, d2)>,
                       affine_map<(d0, d1, d2) -> (d0, d1, d2)>,
                       affine_map<(d0, d1, d2) -> (d0, d1)>],
      iterator_types = ["parallel", "parallel", "reduction"]}
      ins(%vec, %mat : tensor<32x128xi16>, tensor<11008x32x128xi4>)
      outs(%1 : tensor<11008x32xi32>) {
  ^bb0(%in: i16, %in_3: i4, %out: i32):
    %19 = arith.extsi %in : i16 to i32
    %20 = arith.extui %in_3 : i4 to i32
    %21 = arith.muli %19, %20 : i32
    %22 = arith.addi %21, %out : i32
    linalg.yield %22 : i32
  } -> tensor<11008x32xi32>
  return %batch_matmul_result : tensor<11008x32xi32>
}
```

Here, we were returning failure because `ac` is empty. With this PR, we return this useful information:

```
batch: [ 1 ]
m: [ ]
n: [ 0 ]
k: [ 2 ]
```
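The classification of dimensions in the example can be reproduced with plain set operations. This is a rough sketch under assumptions: it models each indexing map as the list of loop dimensions it uses, and the function name `infer_contraction_dims` and its set-based rules are an illustration, not the MLIR implementation.

```python
def infer_contraction_dims(lhs, rhs, out, iterator_types):
    """Classify loop dims as batch/m/n/k from the dims each operand uses.

    lhs, rhs, out: loop dimension indices appearing in each indexing map.
    iterator_types: "parallel" or "reduction" for each loop dimension.
    An empty group is allowed; it models a unit ("missing") dimension.
    """
    a, b, c = set(lhs), set(rhs), set(out)
    par = {i for i, t in enumerate(iterator_types) if t == "parallel"}
    red = {i for i, t in enumerate(iterator_types) if t == "reduction"}
    return {
        "batch": sorted((a & b & c) & par),  # used by all three operands
        "m": sorted(((a & c) - b) & par),    # lhs and output only
        "n": sorted(((b & c) - a) & par),    # rhs and output only
        "k": sorted(((a & b) - c) & red),    # reduced away from the output
    }


# Indexing maps from the example: (d1, d2), (d0, d1, d2), (d0, d1).
dims = infer_contraction_dims(
    [1, 2], [0, 1, 2], [0, 1], ["parallel", "parallel", "reduction"]
)
```

For the example above, `dims` holds `batch = [1]`, `m = []`, `n = [0]`, `k = [2]`, which matches the output quoted in the commit message; the empty `m` group is the case that previously caused a failure.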

…rrect order Otherwise you will likely get crashes.

This patch addresses the missed review comment from PR llvm#67063. It renames LIT flag "--disable-gtest-sharding" to "--no-gtest-sharding" and corrects the code style issue.

…ult (llvm#69474) At the moment, all alloc-like functions are assumed to return non-null pointers, if their return value is only used in a compare. This is based on being allowed to substitute the allocation function with one that doesn't fail to allocate the required memory. aligned_alloc however must also return null if the required alignment cannot be satisfied, so I don't think the same reasoning as above can be applied to it. This patch adds a bail-out for aligned_alloc calls to isAllocSiteRemovable.

See-also: llvm#69548

…69527)

Eliding the vReg to NZCV conversion instruction for G_UADDE/... is illegal if it causes the carry generating instruction to become dead because ISel will just remove the dead instruction. I accidentally introduced this here: https://reviews.llvm.org/D153164. As far as I can tell, this is not exposed on the default clang settings, because on O0 there is always a G_AND between boolean defs and uses, so the optimization doesn't apply. Thus, when I tried to commit https://reviews.llvm.org/D159140, which removes these G_ANDs on O0, I broke some UBSan tests. We fix this by recursively selecting the previous (NZCV-setting) instruction before continuing selection for the current instruction.

To calculate the LMUL with the same SEW/LMUL ratio when providing EEW.
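Keeping the SEW/LMUL ratio fixed means the new LMUL must satisfy EEW / LMUL' = SEW / LMUL, so LMUL' = (EEW / SEW) * LMUL. A small sketch of that arithmetic follows; the function name `lmul_for_eew` is hypothetical, and fractional LMUL values are modeled with `Fraction`.

```python
from fractions import Fraction


def lmul_for_eew(sew: int, lmul: Fraction, eew: int) -> Fraction:
    """Return the LMUL that keeps SEW/LMUL constant when using EEW."""
    return Fraction(eew, sew) * lmul


def ratio(width: int, lmul: Fraction) -> Fraction:
    """Element width to LMUL ratio, used here only as a sanity check."""
    return Fraction(width) / lmul
```

For example, with SEW=32 and LMUL=1, an EEW of 16 gives LMUL 1/2, and both settings have a ratio of 32.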

…dden relocatable object file definition 1981b1b improved the check.

As mentioned in llvm#69619, C23 6.7.2.2p5 explicitly prohibits using a _BitInt as an underlying type to an enumeration. While we had this in the _ExtInt implementation, the justification for that limitation in C is compelling, so this is being removed to be compatible with the C23 standard. Fixes: llvm#69619

CompileUnit::SetSupportFiles had two overloads, one that took an lvalue reference and one that took an rvalue reference. This removes both and replaces them with a single overload that takes the FileSpecList by value and moves it into the member variable. Because we're storing the value as a member, this covers both cases. If the new FileSpecList was passed by lvalue reference, we'd copy it into the member anyway. If it was passed as an rvalue reference, we'll have created a new instance using its move constructor and then immediately move it again into our member. In either case the number of copies remains unchanged.

…f `libm` (llvm#66034) This patch populates the GPU version of `libm` with missing vendor entrypoints. The vendor math entrypoints are disabled by default but can be enabled with the CMake option `LIBC_GPU_VENDOR_MATH=ON`.

Using `true` as a no-op unfortunately does not work on windows, which fails libcxx lit tests on windows. Lit provides the `:` internal shell builtin which is equivalent to `true`.

This adds test coverage for a crash exposed by d311126349b8fe1684d62154a9fa5a7bbb0b713.

This allows some basic variadic operands in rewrites. There were some workarounds employed (like "aliasing" the attribute). Couldn't find a way to do this directly with properties.

… reference roles (llvm#69370) Without this patch, in expressions like `foo += 1` the reference `foo` has no read and write roles. This happens because `CompoundAssignOperator` is also a `BinaryOperator`, so handling `CompoundAssignOperator` in the `else` branch is dead code.

There's only one use and it eventually converts the pointer into a reference. Simplify things and always use references.

I could probably break this commit into more pieces. --- This patch adds libc++ support for Android L (Android 5.0+) and up, tested using the Android team's current compiler, a recent version of the AOSP sysroot, and the x86[-64] Android Emulator. CMake and Lit Configuration: Add runtimes/cmake/android/Arch-${ARCH}.cmake files that configure CMake to cross-compile to Android without using CMake's built-in NDK support (which only works with an actual packaged NDK). Add libcxx/cmake/caches/AndroidNDK.cmake that builds and tests libc++ (and libc++abi) for Android. This file configures libc++ to match what the NDK distributes, e.g.: - libc++_shared.so (includes libc++abi objects, there is no libc++abi.so). libunwind is linked statically but not exported. - libc++_static.a (does not include libc++abi) and libc++abi.a - `std::__ndk1` namespace - All the libraries are built with `__ANDROID_API__=21`, even when they are linked to something targeting a higher API level. (However, when the Android LLVM team builds these components, they do not use these CMake cache files. Instead they use Python scripts to configure the builds. See https://android.googlesource.com/toolchain/llvm_android/.) Add llvm-libc++[abi].android-ndk.cfg.in files that test the Android NDK's libc++_shared.so. These files can target old or new Android devices. The Android LLVM team uses these test files to test libc++ for both arm/arm64 and x86/x86_64 architectures. The Android testing mode works by setting %{executor} to adb_run.py, which uses `adb push` and `adb shell` to run tests remotely. adb_run.py always runs tests as the "shell" user even on an old emulator where "adb unroot" doesn't work. The script has workarounds for old Android devices. The script uses a Unix domain socket on the host (--job-limit-socket) to restrict concurrent adb invocations. 
Compiling the tests is a major part of libc++ testing run-time, so it's desirable to exploit all the host cores without overburdening the test devices, which can have far fewer cores. BuildKite CI: Add a builder to run-buildbot, `android-ndk-*`, that uses Android Clang and an Android sysroot to build libc++, then starts an Android emulator container to run tests. Run the emulator and an adb server in a separate Docker container (libcxx-ci-android-emulator), and create a separate Docker image for each emulator OS system image. Set ADB_SERVER_SOCKET to connect to the container's adb server. Running the only adb server inside the container makes cleanup more reliable between test runs, e.g. the adb client doesn't create a `~/.android` directory and the adb server can be restarted along with the emulator using docker stop/run. (N.B. The emulator insists on connecting to an adb server and will start one itself if it can't connect to one.) The suffix to the android-ndk-* job is a label that concisely specifies an Android SDK emulator image. e.g.: - "system-images;android-21;default;x86" ==> 21-def-x86 - "system-images;android-33;google_apis;x86_64" ==> 33-goog-x86_64 Fixes: llvm#69270 Differential Revision: https://reviews.llvm.org/D139147
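The job suffix convention can be illustrated with a short helper. This is a sketch based only on the two examples given above; the abbreviation table and the function name `emulator_label` are assumptions, and the real CI scripts may derive the label differently.

```python
# Abbreviations inferred from the two examples in the commit message.
TAG_ABBREVIATIONS = {"default": "def", "google_apis": "goog"}


def emulator_label(package: str) -> str:
    """Turn an SDK system-image package name into a short job label."""
    _kind, platform, tag, abi = package.split(";")
    api_level = platform.split("-", 1)[1]  # "android-21" -> "21"
    return f"{api_level}-{TAG_ABBREVIATIONS[tag]}-{abi}"
```

A tag that is not in the table raises `KeyError`, which keeps an unknown image type from silently producing a misleading label.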

…trings (llvm#69543) Prior to this patch, differing metadata operands to two otherwise identical instructions was not enough to consider the instructions different in the eyes of the function comparator. This breaks LLVM virtual function elimination, among other features. In this patch, we handle the case where two associated operands are MDStrings of different value. This patch does not differentiate more complex metadata operands. --------- Co-authored-by: Nuri Amari <nuriamari@fb.com>

Summary: The `fgets` function as implemented is currently not functional when called from multiple threads. This is because we rely on repeatedly polling the character to detect EOF. This doesn't work when there are multiple threads that may wish to poll the characters. This patch pulls out the logic into a standalone RPC call to handle this in a single operation such that calling it from multiple threads functions as expected. It also makes it less slow because we no longer make N RPC calls for N characters.

Summary: This patch partially implements the `rand` function on the GPU. This is partial because the GPU currently doesn't support thread local storage or static initializers. To implement this on the GPU, I use 1/8th of the local / shared memory quota to treat the shared memory as thread local storage. This is done by simply allocating enough storage for each thread in the block and indexing into this based off of the thread id. The downside to this is that it does not initialize `srand` correctly to be `1` as the standard says; it is also wasteful. In the future we should figure out a way to support TLS on the GPU so that this can be completely common and less resource intensive.

… acc serial (llvm#69622) For portability with other compilers, just issue a portability warning instead of a hard error when `num_gangs`, `num_workers` or `vector_length` are present on an `!$acc serial` directive

The method DWARFDebugInfoEntry::Extract needs to skip over all the data in the debug_info / debug_types section for each DIE. It had the logic to do so hardcoded inside a loop, when it already exists in a neatly isolated function.

Add a Dockerfile for a new Docker image, libcxx-builder-android, that extends libcxx-builder with support for testing Android. The image includes these things: * An Android Clang compiler and sysroot. * The Android platform-tools (e.g. adb), so that an Android buildbot can run programs on an Android device. At container startup, copy these platform tools to an "android-platform-tools" Docker volume to share them with an emulator container. This copying ensures that the emulator and libcxx-builder containers avoid mismatched adb versions. * Docker, so that an Android buildbot can manage a sibling Docker container that runs the Android emulator. Add an Android-specific run-buildbot-container script for local development. Currently using this script requires building libcxx-build-android and an emulator image locally. Fixes: llvm#69270 Differential Revision: https://reviews.llvm.org/D155271

This PR replaces the mixin `OpView` extension mechanism with the standard inheritance mechanism. Why? Firstly, mixins are not very pythonic (inheritance is usually used for this), a little convoluted, and too "tight" (can only be used in the immediately adjacent `_ext.py`). Secondly, mixins are now blocking a correct implementation of "value builders" (see [here](llvm#68764)) where the problem becomes how to choose the correct base class that the value builder should call. This PR looks big/complicated but appearances are deceiving; 4 things were needed to make this work: 1. Drop `skipDefaultBuilders` in `OpPythonBindingGen::emitDefaultOpBuilders` 2. Former mixin extension classes are converted to inherit from the generated `OpView` instead of being "mixins" a. extension classes that simply were calling into an already generated `super().__init__` continue to do so b. (almost all) extension classes that were calling `self.build_generic` because of a lack of default builder being generated can now also just call `super().__init__` 3. To handle the [lone single use-case](https://sourcegraph.com/search?q=context%3Aglobal+select_opview_mixin&patternType=standard&sm=1&groupBy=repo) of `select_opview_mixin`, namely [linalg](https://github.com/llvm/llvm-project/blob/main/mlir/python/mlir/dialects/_linalg_ops_ext.py#L38), only a small change was necessary in `opdsl/lang/emitter.py` (thanks to the emission/generation of default builders/`__init__`s) 4. since the `extend_opview_class` decorator is removed, we need a way to register extension classes as the desired `OpView` that `op.opview` conjures into existence; so we do the standard thing and just enable replacing the existing registered `OpView` i.e., `register_operation(_Dialect, replace=True)`. Note, the upgrade path for the common case is to change an extension to inherit from the generated builder and decorate it with `register_operation(_Dialect, replace=True)`. 
In the slightly more complicated case where `super().__init__(self.build_generic(...))` is called in the extension's `__init__`, this needs to be updated to call `__init__` in `OpView`, i.e., the grandparent (see updated docs). Note also that `<DIALECT>_ext.py` files/modules will no longer be automatically loaded. Note that the PR has 3 base commits that look funny, but this was done for the purpose of tracking the line history of moving the `<DIALECT>_ops_ext.py` class into `<DIALECT>.py` and updating (commit labeled "fix").

Such that RISCVOptWInstrs can eliminate the redundant sign extend.

This function is unused and unimplemented.

…69630)

This may improve the waiting of `Region->MMLock` while trying to refill the freelist. Instead of always waiting on the completion of `populateFreeListAndPopBatch()` or `releaseToOSMaybe()`, `pushBlocks()` also refills the freelist. This increases the chance of earlier return from `popBatches()`. The support of condition variable hasn't been done for all platforms. Therefore, add another `popBatchWithCV()` and it can be configured in the allocator configuration by setting `Primary::UseConditionVariable` and the desired `ConditionVariableT`. Reviewed By: cferris Differential Revision: https://reviews.llvm.org/D156146

WebGPU does not currently support extended arithmetic, which is an issue when we want to lower from SPIR-V. This commit adds a pattern to transform and emulate spirv.IAddCarry with spirv.IAdd operations. Fixes llvm#65154
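One common way to emulate an add-with-carry using only a wrapping add is to compare the wrapped sum against an operand: unsigned overflow happened exactly when the sum is smaller than either input. The sketch below shows that technique for 32-bit values; it is an illustration of the idea and not necessarily the exact sequence of operations the pattern emits.

```python
MASK32 = 0xFFFFFFFF


def iadd(a: int, b: int) -> int:
    """Wrapping 32-bit add, the only primitive assumed to be available."""
    return (a + b) & MASK32


def iadd_carry(a: int, b: int):
    """Emulate add-with-carry: return (wrapped sum, carry bit)."""
    total = iadd(a, b)
    carry = 1 if total < a else 0  # wrapped around iff the sum shrank
    return total, carry
```

For instance, adding 1 to `0xFFFFFFFF` wraps to 0 with a carry of 1, while `1 + 2` yields 3 with no carry.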

llvm#68853 enabled a lot of nice cleanup. Note, I made sure each of the touched extensions had tests.

…ion. It can take raw pointers without triggering a warning. Also retire the support for makeRef and makeWeakPtr as they have been removed from WebKit.

This caused llc to assume the wrong target triple and broke some internal AS sanitizer bots.

… diff. (llvm#69652)

…ternal (llvm#69657) The symbol in the bind clause on acc routine refers to a function or a subroutine. This patch avoids raising an error when the function or subroutine is declared later in the code or is external. This is in line with normal procedure name resolution in Fortran code.

Mark tests as necessary to accommodate Android L (5.0 / API 21) and up. Add three Android lit features: - android - android-device-api=(21,22,23,...) - LIBCXX-ANDROID-FIXME (for failures that need follow-up work) Enable an AIX workaround in filesystem_test_helper.h for the broken chmod on older Android devices. Mark failing test with XFAIL or UNSUPPORTED: - Mark modules tests as UNSUPPORTED, matching other configurations. - Mark a gdb test as UNSUPPORTED. - XFAIL tests for old devices that lack an API (fmemopen). - XFAIL various FS tests (because SELinux blocks FIFO and hard linking, because fchmodat is broken on old devices). - XFAIL various locale tests (because Bionic has limited locale support). (Also XFAIL an re.traits test.) - XFAIL some print.fun tests because the error exception has no system error string. - Mark std::{cin,wcin} tests UNSUPPORTED because they hang with adb_run.py on old devices. - Mark a few tests UNSUPPORTED because they allocate too much memory. - notify_one.pass.cpp is flaky on Android. - XFAIL libc++abi demangler test because of Android's special long double on x86[-64]. N.B. The `__ANDROID_API__` macro specifies a minimum required API level at build-time, whereas the android-device-api lit feature is the detected API level of the device at run-time. The android-device-api value will be >= `__ANDROID_API__`. This commit was split out from https://reviews.llvm.org/D139147. Fixes: llvm#69270

We don't have a pre-commit CI bot running Android tests yet, so this is still WIP.

…lvm#69549) This patch "constant propagates" LLVMContext::MD_mem_parallel_loop_access into wherever ParallelLoopAccessMDKind is used.

Exclude it from Darwin since /Users will be treated as a MSVC option. http://45.33.8.238/macm1/71368/step_7.txt

SiFive Int8 Matrix Multiplication Extensions Specification https://sifive.cdn.prismic.io/sifive/c4f0e51d-4dd3-402a-98bc-1ffad6011259_int8-matmul-spec.pdf

The immediate argument should be a target constant (`timm`).

…inatingCondition (llvm#67282) The second operand of a sdiv/udiv has to be non-null, as division by zero is UB. Proofs: https://alive2.llvm.org/ce/z/WttZbb Fixes llvm#64240.

To simplify some code.

…rallOp (llvm#67883) The `getSingle(IterationVar|UpperBound|LowerBound|Step)` methods of `LoopLikeOpInterface` are useful to quickly query the iteration space of unidimensional loops. Until now, `scf::ForallOp` always fell back to the default implementation of these methods, returning `std::nullopt`. This patch implements those methods, returning the respective bounds or steps in the special case of `rank == 1`.

VarLenCodeEmitterGen produced code that did not compile if using alternative encoding in different HwModes. It's not possible to assign unsigned **Index = Index_<mode>[][2] = { ... }; As a fix, Index and InstBits were removed in favor of mode-specific getInstBits_<mode> functions, since this is the only place the arrays are accessed. Handling of HwModes is now concentrated in the VarLenCodeEmitterGen::run method, reducing the overall amount of code and enabling other types of alternative encodings not related to HwModes. Added a test for VarLenCodeEmitterGen HwModes. Make sure that HwModes are supported in the same way they are supported for the standard CodeEmitter. It should be possible to define instructions with universal encoding across modes, distinct encodings for each mode, or only define encodings for some modes. Fixed indentation in generated code.

In https://reviews.llvm.org/D125075, we switched to using FastPreTileConfig in O0 and abandoned X86PreAMXConfigPass. We can safely remove the related X86PreAMXConfigPass code.

The declaration was added without a corresponding function definition by: commit 0d8cb8b Author: David Blaikie <dblaikie@gmail.com> Date: Thu May 5 18:09:34 2022 +0000

Currently, all the analysis functions provided by `MCInstrAnalysis` work on a single instruction. On some targets, this limits the kind of instructions that can be successfully analyzed as common constructs may need multiple instructions. For example, a typical call sequence on RISC-V uses an auipc+jalr pair. In order to analyse the jalr inside `evaluateBranch`, information about the corresponding auipc is needed. Similarly, AArch64 uses adrp+ldr pairs to access globals. This patch proposes to add state to `MCInstrAnalysis` to support these use cases. Two new virtual methods are added: - `updateState`: takes an instruction and its address. This method should be called by clients on every instruction and allows targets to store whatever information they need to analyse future instructions. - `resetState`: clears the state whenever it becomes irrelevant. Clients could call this, for example, when starting to disassemble a new function. Note that the default implementations do nothing so this patch is NFC. No actual state is stored inside `MCInstrAnalysis`; deciding the structure of the state is left to the targets. This patch also modifies llvm-objdump to use the new interface. This patch is an alternative to [D116677](https://reviews.llvm.org/D116677) and the idea of storing state in `MCInstrAnalysis` was first discussed there.

We should be able to merge the offset later.

…using the interface to generate `scf::forall`. (llvm#67083) Similar to `scf::tileUsingSCFForOp` that is a method that tiles operations that implement the `TilingInterface`, using `scf.for` operations, this method introduces tiling of operations using `scf.forall`. Most of this implementation is derived from `linalg::tileToForallOp` method. Eventually that method will either be deprecated or moved to use the method introduced here.

…ison operators (llvm#69373) If an iterator passed to std::uninitialized_copy & friends provided an unconstrained comparison operator, we would trigger an ambiguous overload resolution because we used to compare against __unreachable_sentinel in our implementation. This patch fixes that by only comparing the output iterator when it is actually required, i.e. in the <ranges> versions of the algorithms. Fixes llvm#69334

A new ComplexPattern `AddrRegImmLsb00000` is added, which is like `AddrRegImm` except that if the least significant 5 bits aren't all zeros, we fall back to offset 0.
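The fallback rule can be sketched as a small selection function. This is a simplified model under assumptions: operands are represented as plain Python values, the immediate range check that a real selector would also perform is omitted, and the name `select_addr_reg_imm_lsb00000` is only illustrative.

```python
def select_addr_reg_imm_lsb00000(base, offset: int):
    """Return (base operand, immediate) for a base + offset address.

    If the low 5 bits of the offset are zero, the offset is folded into
    the immediate. Otherwise the whole address is used as the base
    (modeled here as an explicit add) and the immediate is 0.
    """
    if offset & 0b11111 == 0:
        return base, offset
    return ("add", base, offset), 0
```

An offset of 64 is folded as an immediate, while an offset of 68 (low bits `00100`) falls back to offset 0 with the addition kept in the base.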

This patch implements `MCInstrAnalysis` state in order to be able to analyze auipc+jalr pairs inside `evaluateBranch`. This is implemented as follows: - State: array of currently known GPR values; - Whenever an auipc is detected in `updateState`, update the state value of RD with the immediate; - Whenever a jalr is detected in `evaluateBranch`, check if the state holds a value for RS1 and use that to compute its target. Note that this is similar to how binutils implements it and the output of llvm-objdump should now mostly match the one of GNU objdump. This patch also updates the relevant llvm-objdump patches and adds a new one testing the output for interleaved auipc+jalr pairs.
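The three steps above can be modeled in a few lines. The sketch below is an illustration, not the LLVM code: instructions are plain dictionaries, the class and method names are Python renderings of `updateState`, `resetState` and `evaluateBranch`, the stored value is assumed to be `pc + (imm << 12)` following the architectural meaning of auipc, and dropping a register's value on any other write is an additional assumption.

```python
MASK64 = (1 << 64) - 1


class RiscvBranchState:
    """Tracks known GPR values so a later jalr target can be computed."""

    def __init__(self):
        self.gpr = {}  # register name -> known 64-bit value

    def reset_state(self):
        """Forget everything, e.g. when a new function starts."""
        self.gpr.clear()

    def update_state(self, inst, addr):
        """Call on every instruction, in address order."""
        if inst["op"] == "auipc":
            self.gpr[inst["rd"]] = (addr + (inst["imm"] << 12)) & MASK64
        elif inst.get("rd") is not None:
            # Assumption: any other write makes the old value unknown.
            self.gpr.pop(inst["rd"], None)

    def evaluate_branch(self, inst, addr):
        """Return the jalr target if rs1 is known, else None."""
        if inst["op"] == "jalr" and inst["rs1"] in self.gpr:
            return (self.gpr[inst["rs1"]] + inst["imm"]) & MASK64
        return None


state = RiscvBranchState()
state.update_state({"op": "auipc", "rd": "x1", "imm": 0x2}, 0x1000)
target = state.evaluate_branch({"op": "jalr", "rd": "x1", "rs1": "x1", "imm": 0x10}, 0x1004)
```

Here the auipc at `0x1000` records `0x3000` for `x1`, so the following jalr resolves to `0x3010`; after `reset_state()` the same jalr would return `None`.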

This is primarily only useful when debugging. It's generally assumed that users will have their custom flags applied if it's specified in their CMake cache files. Addresses llvm#68393 (comment)

…rs (llvm#69571) When inferring readonly/writeonly on arguments, if the argument is passed to a call, we should only check the ArgMem effects implied by the call -- we don't care whether the call reads/writes non-arg memory (captured pointers are not relevant here, because they will abort the analysis entirely). This also fixes a regression that was introduced when moving to MemoryEffects: The code was still checking the old WriteOnly attribute on functions, which no longer exists.

This patch fixes the nesting of TosaValidation pass added in TosaToLinalg pipeline.

…rallelOp (llvm#68511) This adds implementations for `getSingleIterationVar`, `getSingleLowerBound`, `getSingleUpperBound`, `getSingleStep` of `LoopLikeOpInterface` to `scf::ParallelOp`. Until now, the implementations for these methods defaulted to returning `std::nullopt`, even in the special case where the parallel Op only has one dimension. Related: llvm#67883

…t-fixes (llvm#69453) Adding an additional parameter to run_clang_tidy.py to accept a directory where the clang-tidy fixes are saved to. This directory can then be used to run `clang-apply-replacements`. Closes llvm#69450

On Linux this contains a single register that determines memory tagging and tagged address ABI settings.

…(NFC)

…pp (NFC)

…sion.cpp (NFC)

… (NFC)

…cpp (NFC)

…te.cpp (NFC)

perf2bolt launches a few perf script commands and stores the output in temporary files before processing the output and cleaning them up before it exits. The command `perf script --show-mmap-events` outputs PERF_RECORD_MMAP2 and instruction tracing data, but when processed it only looks for PERF_RECORD_MMAP2 and the instruction tracing data is ignored. This is fine for small amounts of instruction trace data, but when I've recorded Arm ETM or Intel PT AUX I get lots of it. By adding `--no-itrace`, it will just show the PERF_RECORD_MMAP2 records and will save on time running the `perf script`, disk space storing the output, and time parsing the output. It is the same for `perf script --show-task-events`, where BOLT is only interested in the PERF_RECORD_COMM and PERF_RECORD_FORK records.

### Data

| Perf Record | Perf Data Size | MMap Size | MMap No Itrace Size |
|---|---|---|---|
| perf record -e cs_etm/@tmc_etr0/u | 137K | 4468K | 0.632K |
| perf record -e intel_pt//u | 890K | 33378K | 0.673K |
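As a rough illustration of the change, the helper below builds the two `perf script` command lines with `--no-itrace` added. It is a sketch under assumptions: the function name and the `perf.data` path are hypothetical, and the real perf2bolt invocation passes further options that are not reproduced here.

```python
def perf_script_args(perf_data: str, event_flag: str) -> list:
    """Build a perf script command that skips instruction trace decoding."""
    return ["perf", "script", event_flag, "--no-itrace", "-i", perf_data]


# The two event dumps mentioned in the commit message.
mmap_cmd = perf_script_args("perf.data", "--show-mmap-events")
task_cmd = perf_script_args("perf.data", "--show-task-events")
```

The resulting lists can be handed to `subprocess.run` on a machine where `perf` is installed.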

ISO_Fortran_binding.h was only added in gcc 10.0. Flang should be buildable with older versions. Remove the test until there is a safe way to check that the compiler can run the test (that it is clang from the build, for instance). Fixes bot failure https://lab.llvm.org/buildbot/#/builders/181/builds/24526 Also in: https://lab.llvm.org/buildbot/#/builders/160 https://lab.llvm.org/buildbot/#/builders/268 https://lab.llvm.org/buildbot/#/builders/181

As described in: ARM-software/acle#257 Patch by: Rosie Sumpter <rosie.sumpter@arm.com> Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D151709

This makes the docs a little nicer to read, as these otherwise show up as "«unnamed»". The extra include is needed as naming means getters are generated, and the getters use the LLVM types.

…#69192) In TOSA MLIR dialect, fix the definition of the Clamp op to accept fp16 & bf16 datatype for the min_fp and max_fp attributes. Add ClampOp verifier to check attributes types compatibility. Add related test cases in Tosa/ops.mlir. Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>

…LFIR (llvm#69441) The code in `copyHostAssociateVar` is using `createSomeArrayAssignment` for arrays, which uses the soon-to-be legacy expression lowering. Update the copy to use hlfir.assign instead. I used the temporary_lhs flag to mimic the current behavior, but maybe user defined assignment should be called when needed. This flag also prevents any finalizers from being called on the LHS if the LHS type has finalizers (which would occur otherwise in normal intrinsic assignment). Again, I am not sure what the OpenMP spec wants here. Also, I added special handling for ALLOCATABLE; the current code seems broken to me since it is basically copying the descriptor, which would lead to a memory leak given the TEMP was previously allocated with the shape of the variable in createHostAssociateVarClone. So copying the DATA instead seemed like the right thing to do.

Power10 does not support Hardware Transactional Memory instructions. Remove to keep consistency.

) As requested in (llvm#66521) I confirmed a crash with "return" instead of "continue" in setVectorizedCallDecision's fmuladd reduction recognition.

Type extension is currently handled in FIR by inlining the parent's components as the first members of the record type. This is not correct from a memory layout point of view since the storage size of the parent type may be bigger than the sum of the sizes of its components (due to alignment requirements). To avoid making FIR types target dependent and fix this issue, make the parent component a single component with the parent type at the beginning of the record type. This also simplifies addressing since the parent component is now a "normal" component that can be designated with hlfir.designate. StructureComponent lowering however is a bit more complex since the symbols in the structure component may refer to subcomponents of parent types. Notes: 1. The fix is only done in HLFIR for now, a similar fix should be done in ConvertExpr.cpp to fix the path without HLFIR (I will likely still do it in a new patch since it would be an annoying bug to investigate for people testing flang without HLFIR). 2. The private component extra mangling is useless after this patch. I will remove it after 1. 3. The "parent component" TODO in constant CTOR is free to implement for HLFIR after this patch, but I would rather remove it and test it in a different patch.

…69615) This adds a flag to the `TransformDialectInterpreter` that relaxes the requirement for only a single top-level transform op. This is useful for supporting transforms that take transform IR as payload. This also aligns the function `findTopLevelTransform` [here](llvm@7b0f4c9#diff-551f92bb609487ccf981daf9571f0f1b1703ab2330560a388a5f0d133e520be4L59) with its documentation: In the presence of multiple top-level transform ops it now correctly returns the first of them after reporting the error instead of returning a `nullptr`.

This uses the fast-check allowlist added in the previous commit. This is behind a config option to allow users/developers to enable checks we haven't timed yet, and to allow the --check-tidy-time flag to work. Fixes clangd/clangd#1337 Differential Revision: https://reviews.llvm.org/D138505

Add AfterPlacementNew option to SpaceBeforeParensOptions to have more control on placement new expressions. Fixes llvm#41501 Relates to llvm#54703 Differential Revision: https://reviews.llvm.org/D127270

Fix two issues: * If a constant is used in another constant, we need to insert newly created instructions to worklist so that constant used in them will be converted. * Set debug info of original instruction to newly created instructions.

A recent change modified the parameter tileSize from Value to OpFoldResult. Therefore we should call getAsOpFoldResult before passing on the tileSize. Adjust a test regarding this new behavior.

Drop code inserting pointer casts. Check pointer types instead of address spaces.

Fixes llvm#67761 Trying `getDimSize()` before checking for 0-ranked-tensors throws assert errors. This PR ensures that it is checked for. Or should we throw an error if we have a 0-ranked-tensor in a tosa operation?

@PiotrZSL

…lvm#69700) If PyYAML is not installed, the `-export-fixes` can be used to specify a directory (not a file). Mentioning @PiotrZSL @dyung Follows llvm#69453

This patch adds a "nice-to-have" feature to lit: it prints the total number of discovered tests at the beginning. It is convenient to see the total number of tests and avoid scrolling up to the beginning of the log. Further, this patch also prints the percentage of tests. Reviewed By: RoboTux, jdenny-ornl Co-authored-by: Madhur A <madhura@nvidia.com>

This reverts commit ba8565f.

Commits on Oct 18, 2023

Commits on Oct 19, 2023

Commits on Oct 20, 2023