forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMDGPU stepthomas atomic csub no rtn forms ver2 #1
Closed
stepthomas
wants to merge
467
commits into
main
from
AMDGPU-stepthomas-atomic-csub-no-rtn-forms-ver2
Closed
AMDGPU stepthomas atomic csub no rtn forms ver2 #1
stepthomas
wants to merge
467
commits into
main
from
AMDGPU-stepthomas-atomic-csub-no-rtn-forms-ver2
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ability macros Adding additional instantiations to the dylib isn't actually an ABI break as long as programs targeting an older dylib don't start to depend on them. Making additional instantiations a matter of availability allows us to add them without an ABI break. Reviewed By: #libc, ldionne, Mordante Spies: arichardson, ldionne, Mordante, libcxx-commits Differential Revision: https://reviews.llvm.org/D154796
- Add test coverage for sext/zext boolean additions - [InstCombine] Fold comparison of adding two z/sext booleans Fixes llvm#64859.
Reviewed By: #libc, ldionne Spies: ldionne, Mordante, libcxx-commits Differential Revision: https://reviews.llvm.org/D155411
…stinations (llvm#68074) Extend `bufferization.materialize_in_destination` to support memref destinations. This op can now be used to indicate that a tensor computation should materialize in a given buffer (that may have been allocated by another component/runtime). The op still participates in "empty tensor elimination". Example: ```mlir func.func @test(%out: memref<10xf32>) { %t = tensor.empty() : tensor<10xf32> %c = linalg.generic ... outs(%t: tensor<10xf32>) -> tensor<10xf32> bufferization.materialize_in_destination %c in restrict writable %out : (tensor<10xf32>, memref<10xf32>) -> () return } ``` After "empty tensor elimination", the above IR can bufferize without an allocation: ```mlir func.func @test(%out: memref<10xf32>) { linalg.generic ... outs(%out: memref<10xf32>) return } ``` This change also clarifies the meaning of the `restrict` unit attribute on `bufferization.to_tensor` ops.
This reverts commit 0687e4d. Causes LLDB failures: https://reviews.llvm.org/D101206#4653253
Some large AVR programs (for devices without long jump) may exceed 128KiB, and lld should give explicit errors other than generate wrong executables silently.
…mber detection (llvm#68405) Fixes misleading comment introduced in `f74aaca63202cabb512c78fe19196ff348d436a8`
…llvm#67120) The transfrom interpreter accepts an argument to a "library" file with named sequences. This patch exteneds this functionality such that (1) several such individual files are accepted and (2) folders can be passed in, in which all `*.mlir` files are loaded.
…ests Made these TODO instead of negative
…ffles Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold. It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask First stage towards addressing Issue llvm#67803
…s from SK_PermuteSingleSrc
This patch tries to canonicalise add + gep to gep + gep. Co-authored-by: Paul Walker <paul.walker@arm.com> Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D155688
…m#68403) Instead of RDSVL * RDSVL.
…erErrors member variable correctly based on the PP diagnostics. (llvm#68127)"" This reverts commit a6acf3f and relands a50e63b. The original revert was done by mistake.
…8382) While working on llvm#68377 inspecting `Allocate()` calls, I found out that there are couple of places where we forget to use placement-new to create objects in the allocated memory.
Second try...
/llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h:948:33: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Werror,-Wsign-compare] (Index + Mask.size()) <= NumSrcElts) { ~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
…ing a simple transform dialect interpreter pass (llvm#68330)
These were missed as I didn't expect clang codegen to be updated
…lvm#65852) This patch folds the pattern `a ne/eq (zext/sext (a ne/eq c))` into a boolean constant or a compare. Clang vs GCC: https://godbolt.org/z/4ro817WE8 Proof for `zext`: https://alive2.llvm.org/ce/z/6z9NRF Proof for `sext`: https://alive2.llvm.org/ce/z/tv5wuE Fixes llvm#65073.
This reverts commit 3f8ef57.
Introduced by 5b657f5 that moved LICM after AMDGPUCodeGenPrepare. Some instructions are no longer sunk during ir optimizations but in machine-sinking instead. If vgpr instruction used sgpr defined inside the cycle is sunk outside of the cycle we end up with not-handled case of temporal divergence. Add test for theoretical case when SALU instruction (represents uniform value) is sunk outside of the cycle. Add a test when SALU instruction can be sunk if it edits lane mask.
Temporal divergence that was present in input or introduced in IR transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies by changing sgpr source instr to vgpr instr. After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare, machine-sinking can introduce temporal divergence by sinking instructions outside of the cycle. Add isSafeToSink callback in TargetInstrInfo.
…er (llvm#67284) For now, data location expression is hard coded to little endian. We are going to support sanitizers on AIX which is big endian. Support big endian too in the data location expression parser of llvm-symbolizer.
llvm#68501) …ailabl externally A workaround for llvm#60996 As the title suggested, we can avoid emitting available externally functions which is marked as noinline already. Such functions should contribute nothing for optimizations. The update for docs will be sent seperately if this got approved.
1. Map R16-R31 to DWARF registers 130-145. 2. Make R16-R31 caller-saved registers. 3. Make R16-31 allocatable only when feature EGPR is supported 4. Make R16-31 availabe for instructions in legacy maps 0/1 and EVEX space, except XSAVE*/XRSTOR RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4 Explanations for some seemingly unrelated changes: inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir: The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is the encoding for the register class in the enum generated by tablegen. This encoding will change any time a new register class is added. Since the number is part of the input, this means it can become stale. seh-directive-errors.s: R16-R31 makes ".seh_pushreg 17" legal musttail-varargs.ll: It seems some LLVM passes use the number of registers rather the number of allocatable registers as heuristic.
1. Use `Ext.PrimaryVT` in `PatSetCC_m ` 2. Merge `PatFprFprDynFrm` from Zfh/Zhinx two locations into `PatFprFprDynFrm_m`.
…vm#68578) Fixes llvm#68481, In the following scenario, the conversion fails: 1. resultType of tosa.slice is UnrankedTensorType 2. tosa.slice.getsize().size() < resultType.getRank()
This restores the pre-b9383a86b8f behavior. Most platforms / compilers don't add relocations for CIEs, however they're not prohibited and we want objects that contain them to remain loadable.
llvm#68320) This function has several overloads that allow to specify the symbol that should be renamed and the scope for that renaming in different ways. The overloads were inconsistent in the following way (quoted strings are `StringAttr`s, other variables are `Operation *`): * `replaceAllSymbolUses(symbolOp, "new_symbol", scopeOp)` would traverse into the nested regions of `scopeOp` and hence rename the symbol inside of `scopeOp`. * `replaceAllSymbolUses("symbol", "new_symbol", scopeOp)` would *not* traverse into the nested regions of `scopeOp` and hence *not* rename the symbol. The underlying behavior was spread over different places and is somewhat hard to understand. The two overloads above mainly differed by what `collectSymbolScopes` computed, which is itself overloaded. If `scopeOp` is a top-level module, then the overload on `(Operation *, Operation *)`, which is used in the first of the above cases, computes a scope where the body region of the module is the `limit`; however, the overload on `(StringAttr, Operation *)` computed the module op itself as the `limit`. Later, `walkSymbolTable` would walk the body of the module if it was given as a region but it would *not* enter the regions of the module op because that op has a symbol table (which was assumed to be a *different* scope). The fix in this commit is change the behavior of `collectSymbolScopes` such that the `(StringAttr, Operation *)` overload returns a scope for each region in the `limit` argument.
The -fsanitize=alignment implementation follows the model that we allow forming unaligned pointers but disallow accessing unaligned pointers. See [RFC: Enforcing pointer type alignment in Clang](https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html) for detail. memcpy is a memory access and we require an `int *` argument to be aligned. Similar to https://reviews.llvm.org/D9673 , emit -fsanitize=alignment check for arguments of builtin memcpy and memmove functions to catch misaligned load like: ``` // Check the alignment of a but ignore the alignment of b void unaligned_load(int *a, void *b) { memcpy(a, b, sizeof(*a)); } ``` For a reference parameter, we emit a -fsanitize=alignment check as well, which can be optimized out by InstCombinePass. We rely on the call site `TCK_ReferenceBinding` check instead. ``` // The alignment check of a will be optimized out. void unaligned_load(int &a, void *b) { memcpy(&a, b, sizeof(a)); } ``` The diagnostic message looks like ``` runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *' ``` We could use a better message for memcpy, but we don't do it for now as it would require a new check name like misaligned-pointer-use, which is probably not necessary. *RFC: Enforcing pointer type alignment in Clang* is not well documented, but this patch does not intend to change the that. Technically builtin memset functions can be checked for -fsanitize=alignment as well, but it does not seem too useful.
1. The generated file contained a lot of duplicate switch cases, e.g.: ``` switch (Syntax) { case AttributeCommonInfo::Syntax::AS_GNU: return llvm::StringSwitch<int>(Name) ... .Case("error", 1) .Case("warning", 1) .Case("error", 1) .Case("warning", 1) ``` 2. Some attributes were listed in wrong places, e.g.: ``` case AttributeCommonInfo::Syntax::AS_CXX11: { if (ScopeName == "") { return llvm::StringSwitch<int>(Name) ... .Case("warn_unused_result", LangOpts.CPlusPlus11 ? 201907 : 0) ``` `warn_unused_result` is a non-standard attribute and should not be available as [[warn_unused_result]]. 3. Some attributes had the wrong version, e.g.: ``` case AttributeCommonInfo::Syntax::AS_CXX11: { } else if (ScopeName == "gnu") { return llvm::StringSwitch<int>(Name) ... .Case("fallthrough", LangOpts.CPlusPlus11 ? 201603 : 0) ``` [[gnu::fallthrough]] is a non-standard spelling and should not have the standard version. Instead, __has_cpp_attribute should return 1 for it. There is another issue with attributes that share spellings, e.g.: ``` .Case("interrupt", true && (T.getArch() == llvm::Triple::arm || ...) ? 1 : 0) .Case("interrupt", true && (T.getArch() == llvm::Triple::avr) ? 1 : 0) ... .Case("interrupt", true && (T.getArch() == llvm::Triple::riscv32 || ...) ? 1 : 0) ``` As can be seen, __has_attribute(interrupt) would only return true for ARM targets. This patch does not address this issue. Differential Revision: https://reviews.llvm.org/D159393
…tof` (llvm#65246) Fixes llvm#64619 Clang warns diagnostic for non-standard layout types in `offsetof` only if they are in evaluated context. With this patch, you'll also get diagnostic if you use `offsetof` on non-standard layout types in any other contexts
… on downstream projects that may define additional opcodes.
I would put this into the implementation of verify for tosa.slice
…68571) On MinGW targets, the .ctors section is always used for constructors. When using the .ctors section, the constructors need to be emitted in reverse order to get them execute in the right order. (Constructors with a specific priority are sorted separately by the linker later.) In LLVM, in CodeGen/AsmPrinter/AsmPrinter.cpp, there's code that reverses them before writing them out, executed when using the .ctors section. This logic is done whenever TM.Options.UseInitArray is set to false. Thus, make sure to set UseInitArray to false for this target. This fixes llvm#55938.
…vm#68287) A number of useful constants can be encoded with a 64-bit ORR followed by a 64-bit EOR, including all remaining repeated byte patterns, some useful repeated 16-bit patterns, and some irregular masks. This patch prioritizes that encoding over three or four instruction encodings. Encoding with MOV + MOVK or ORR + MOVK is still preferred for fast literal generation and readability respectively. The method devises three candidate values, and checks if both Candidate and (Imm ^ Candidate) are valid logical immediates. If so, Imm is materialized with: ``` ORR Xd, XZR, #(Imm ^ Candidate) EOR Xd, Xd, #(Candidate) ``` The method has been exhaustively tested to ensure it can solve all possible values (excluding 0, ~0, and plain logical immediates, which are handled earlier).
llvm#67791) The primary ISA-independent justification for using PC-relative addressing is that it makes code position-independent and therefore allows sharing of .text pages between processes. When not sharing .text pages, we can use absolute relocations instead, which will possibly prevent a bubble introduced by s_getpc_b64. Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>
llvm#68646) Fixes:llvm#68542 It‘s meaningless to diagnose further error for invalid function declaration.
Similar to D159254, this fixes the order of WriteAdr operands on post/pre-inc loads/stores in the Cortex-A510 scheduling model. I will add the same for other models too, this will be the most impactful due to it being the default cpu scheduling model. Closes llvm#68518
When converting to ConstantRange, we should treat undef like a full range. Fixes llvm#68381.
These will be used in future pass to ensure that loads/stores of masks are legal (as the LLVM backend does not support this for any type smaller than an svbool, which is vector<[16]xi1>). Depends on llvm#68399
for real
…C_CSUB instructions The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have encodings for non-value-returning forms, although actually using them isn't supported by hardware. However, these encodings aren't supported by the backend, meaning that they can't even be assembled or disassembled. Add support for the non-returning encodings, but gate actually using them in instruction selection behind a new feature FeatureAtomicCSubNoRtnInsts, which no target uses. This does allow the non-returning instructions to be tested manually and llvm.amdgcn.atomic.csub.ll is extended to cover them. The feature does not gate assembling or disassembling them, this is now not an error, and encoding and decoding tests have been adapted accordingly.
stepthomas
pushed a commit
that referenced
this pull request
Oct 10, 2023
This reverts commit a1e81d2. Revert "Fix test hip-offload-compress-zlib.hip" This reverts commit ba01ce6. Revert due to sanity fail at https://lab.llvm.org/buildbot/#/builders/5/builds/37188 https://lab.llvm.org/buildbot/#/builders/238/builds/5955 /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment 0xaaaae2d90e7c: note: pointer points here bc 00 00 00 94 dc 29 9a 89 fb ca 2b 78 9c 8b 8f 77 f6 71 f4 73 8f f7 77 73 f3 f1 77 74 89 77 0a ^ #0 0xaaaaba125f70 in clang::CompressedOffloadBundle::decompress(llvm::MemoryBuffer const&, bool) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25 #1 0xaaaaba126150 in clang::OffloadBundler::ListBundleIDsInFile(llvm::StringRef, clang::OffloadBundlerConfig const&) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1089:7 Will reland after fixing it.
stepthomas
deleted the
AMDGPU-stepthomas-atomic-csub-no-rtn-forms-ver2
branch
October 10, 2023 09:46
stepthomas
restored the
AMDGPU-stepthomas-atomic-csub-no-rtn-forms-ver2
branch
October 10, 2023 10:33
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
FunctionLevel
([mlgo][coro] Assign coro split-ed functions aFunctionLevel
llvm/llvm-project#68263)wgmma.descriptor
towarpgroup.descriptor
(NFC) ([MLIR][NVGPU] Change namewgmma.descriptor
towarpgroup.descriptor
(NFC) llvm/llvm-project#67526)GenerateWarpgroupDescriptor
([mlir] Change the class name of theGenerateWarpgroupDescriptor
llvm/llvm-project#68286)warpgroup.mma
Op (NFC) ([mlir][nvgpu] Improve nvgpu->nvvm transformation ofwarpgroup.mma
Op (NFC) llvm/llvm-project#67325)nvgpu.wargroup.mma.store
Op for Hopper GPUs ([MLIR][NVGPU] Introducenvgpu.wargroup.mma.store
Op for Hopper GPUs llvm/llvm-project#65441)empty-tensor-elimination
around self-copies ([mlir]: fix a issue and refine some code (#67977) llvm/llvm-project#68129)(X +/- Y) & Y
into~X & Y
when Y is a power of 2 llvm/llvm-project#67915. NFC.dump_alias_sets
option to transform op ([mlir][bufferization] Adddump_alias_sets
option to transform op llvm/llvm-project#68289)tensor.empty
bufferizes to allocation ([mlir][tensor][bufferize]tensor.empty
bufferizes to allocation llvm/llvm-project#68201)libcgpu.a
libraryPredicateUsesOperands = 1
([TableGen][GISel] Fix incorrect binding of predicate operands uponPredicateUsesOperands = 1
llvm/llvm-project#68125)getelementptr
([mlir][llvm] Fix elem type passing intogetelementptr
llvm/llvm-project#68136)bind_front
andbind_back
([libcxx] replaces SFINAE with requires-expressions inbind_front
andbind_back
llvm/llvm-project#68249)LLVM_UNREACHABLE_OPTIMIZE=OFF
forRelease
builds ([clang] Correct behavior ofLLVM_UNREACHABLE_OPTIMIZE=OFF
forRelease
builds llvm/llvm-project#68284)getelementptr
([mlir][llvm] Fix elem type passing intogetelementptr
llvm/llvm-project#68136)"--time-trace
([LLD][COFF] Add support for--time-trace
llvm/llvm-project#68236)from
after llvm@7876899MaterializeInDestinationOp
: Support memref destinations ([mlir][bufferization]MaterializeInDestinationOp
: Support memref destinations llvm/llvm-project#68074)(X +/- Y) & Y
into~X & Y
when Y is a power of 2 llvm/llvm-project#67915. NFC.a ne/eq (zext/sext (a ne/eq c))
([InstCombine] Simplify the patterna ne/eq (zext/sext (a ne/eq c))
llvm/llvm-project#65852)gc.result
into account llvm/llvm-project#68439lower_unpack
when dynamic dimensions are involved ([mlir] Fixlower_unpack
when dynamic dimensions are involved llvm/llvm-project#68423)SLocEntry
search intoASTReader
([clang][modules] MoveSLocEntry
search intoASTReader
llvm/llvm-project#66966)VarTemplateDecl
in record ([clang][ASTImporter] Fix crash when importVarTemplateDecl
in record llvm/llvm-project#67522)MaterializeInDestinationOp
: Support memref destinations llvm/llvm-project#68074 ([mlir][bufferization] Follow up for #68074 llvm/llvm-project#68488)checkModuleIsAvailable
should use a const & parameter instead of pointer ([clang][Modules]checkModuleIsAvailable
should use a const & parameter instead of pointer llvm/llvm-project#67902)createTask
,createTeams
([OpenMPIRBuilder] Remove wrapper function increateTask
,createTeams
llvm/llvm-project#67723)ExceptionSpecAnalyzer
s handling of conditional noexcept expressions ([clang-tidy] ImproveExceptionSpecAnalyzer
s handling of conditional noexcept expressions llvm/llvm-project#68359)ArgMax
operator ([mlir][tosa] Add verifier forArgMax
operator llvm/llvm-project#68410)elect.sync
Op ([mlir][nvvm] Introduceelect.sync
Op llvm/llvm-project#68323)arith.select
([mlir][arith] Canonicalization patterns forarith.select
llvm/llvm-project#67809)movmsk
; PR67287bufferizeOp
function signature ([mlir][bufferization][NFC] SimplifybufferizeOp
function signature llvm/llvm-project#68625)StorageSpecifierToLLVMPass
from bufferization pipeline ([mlir][sparse] ExtractStorageSpecifierToLLVMPass
from bufferization pipeline llvm/llvm-project#68635)offsetof
([Clang] Fix missing diagnostic for non-standard layout type inoffsetof
llvm/llvm-project#65246)