forked from llvm/llvm-project
Rebase xtheadvector on llvm/llvm-project:main #1
Merged
Conversation
Make sure that VPInstructions with OR opcodes are properly registered as disjoint ops. Fixes llvm#87378.
…lvm#87296) We can't just check whether it is a splat constant or not. We should also check whether the values match.
This fixes a test broken in 3d469c0 (fast-forwarded): ../clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl
When a nested parallel region ends, the runtime calls __kmp_join_call(). During this call, the primary thread of the nested parallel region will reset its tid (retval of omp_get_thread_num()) to what it was in the outer parallel region. A data race occurs with the current code when another worker thread from the nested inner parallel region tries to steal tasks from the primary thread's task deque. The worker thread reads the tid value directly from the primary thread's data structure and may read the wrong value. This change just uses the calculated victim_tid from execute_tasks() directly in the steal_task() routine rather than reading tid from the data structure. Fixes: llvm#87307
Summary: This patch adds a temporary implementation that uses a struct-based interface in lieu of varargs support. Once varargs support exists we will move this implementation to the "real" printf implementation. Conceptually, this patch has the client copy over its format string and arguments to the server. The server will then scan the format string searching for any specifiers that are actually a string. If one is a string, we send the pointer back to the client to tell it to copy the pointed-to string over. This copied value will then replace the pointer when the final formatting is done. This will require a built-in extension to the varargs support to get access to the underlying struct. The varargs used on the GPU will simply be a struct wrapped in a varargs ABI.
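To make the struct-based stand-in for varargs concrete, here is a minimal, hypothetical sketch: the client packs its arguments into one flat buffer and the server unpacks them positionally while scanning the format string. The names (`PackedArgs`, `push`, `pop`) are illustrative only, not the actual libc RPC interface.

```c++
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical flat argument buffer standing in for varargs: arguments are
// appended byte-wise on the client and read back in the same order on the
// server. memcpy is used so unaligned offsets are safe.
struct PackedArgs {
  alignas(8) unsigned char data[64];
  std::size_t size = 0;

  template <typename T> void push(T v) {
    std::memcpy(data + size, &v, sizeof(T));
    size += sizeof(T);
  }

  template <typename T> T pop(std::size_t &off) const {
    T v;
    std::memcpy(&v, data + off, sizeof(T));
    off += sizeof(T);
    return v;
  }
};
```

The server-side reader must pop with the same types, in the same order, that the format string implies — exactly the positional contract a real varargs ABI would provide.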
…RE_PAUTH` (llvm#85231) This adds support for `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` feature (as defined in ARM-software/abi-aa#240) handling in llvm-readobj and llvm-readelf. The following constants for supported platforms are also introduced: - `AARCH64_PAUTH_PLATFORM_INVALID = 0x0` - `AARCH64_PAUTH_PLATFORM_BAREMETAL = 0x1` - `AARCH64_PAUTH_PLATFORM_LLVM_LINUX = 0x10000002` For the llvm_linux platform, output of the tools contains descriptions of PAuth features which are enabled/disabled depending on the version value. Version value bits correspond to the following `LangOptions` defined in llvm#85232: - bit 0: `PointerAuthIntrinsics`; - bit 1: `PointerAuthCalls`; - bit 2: `PointerAuthReturns`; - bit 3: `PointerAuthAuthTraps`; - bit 4: `PointerAuthVTPtrAddressDiscrimination`; - bit 5: `PointerAuthVTPtrTypeDiscrimination`; - bit 6: `PointerAuthInitFini`. Support for `.note.AARCH64-PAUTH-ABI-tag` is dropped since it's deleted from the spec in ARM-software/abi-aa#250.
…#85638) Summary: This patch adds an implementation of `printf` that's provided by the GPU C library runtime. This `printf` is currently implemented using the same wrapper handling that OpenMP sets up. This will be removed once we have proper varargs support. This `printf` differs from the one CUDA offers in that it is synchronous and uses a finite buffer size. Additionally, we support pretty much every format specifier except `%n`. Depends on llvm#85331
Summary: The RPC server build for the GPU support needs to be built from the "projects" phase of the LLVM build. That means it is built with the same compilers that LLVM supports, which currently means GCC 7.4 in most cases. A previous patch removed the `LIBC_HAS_BUILTIN` indirection we used, which regressed the case where we use the `libc` source externally. The files that we need here are `converter.cpp` and `writer.cpp`, which are compatible with C++17, so there aren't issues with the code itself. However, older GCC does not have this builtin, which makes the checks fail. This patch adds a simple wrapper that correctly ignores everything when using a compiler that doesn't support it.
This patch adds SWIG cmake flags to the stage2 build in Fuchsia Clang configuration.
…llvm#87420) The pseudo mnemonic could have other uses.
…vancing iterator (llvm#84126) Currently, the bounds check in `std::ranges::advance(it, n, s)` is done _before_ `n` is checked. This results in one extra, unneeded bounds check. Thus, `std::ranges::advance(it, 1, s)` currently is _not_ simply equivalent to: ```c++ if (it != s) { ++it; } ``` This difference in behavior matters when the check involves some "expensive" logic. For example, the `==` operator of `std::istreambuf_iterator` may actually have to read the underlying `streambuf`. Swapping around the checks in the `while` results in the expected behavior.
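A minimal, hypothetical sketch (not the libc++ source) of the fixed check order: `n` is tested before the sentinel comparison, so advancing by 1 performs exactly one `!=` test, matching `if (it != s) { ++it; }`. The counting sentinel models an "expensive" comparison like `std::istreambuf_iterator`'s.

```c++
#include <cassert>

// Hypothetical iterator/sentinel pair whose `!=` is expensive; the sentinel
// counts how many comparisons were made.
struct CountingSentinel {
  const int* end;
  mutable int comparisons = 0;
};

inline bool operator!=(const int* it, const CountingSentinel& s) {
  ++s.comparisons; // model an "expensive" comparison
  return it != s.end;
}

// Bounded advance with the fixed check order: `n > 0` is tested *before*
// the sentinel comparison, so no extra comparison happens once the
// requested distance has been covered.
template <class It, class Sent>
It bounded_advance(It it, long n, const Sent& s) {
  while (n > 0 && it != s) {
    ++it;
    --n;
  }
  return it;
}
```

With the old order (`it != s && n > 0`), advancing by 1 would cost two sentinel comparisons; with this order it costs one.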
… continue iteration of object files (llvm#87344) This patch introduces a new `IterationMarker` enum (happy to take alternative name suggestions), which callbacks, like the one in `SymbolFileDWARFDebugMap::ForEachSymbolFile`, can return in order to indicate whether the caller should continue iterating or bail. For now this patch just changes the `ForEachSymbolFile` callback to use this new enum. In the future we could change the various `DWARFIndex::GetXXX` callbacks to do the same. This makes the callbacks easier to read and hopefully reduces the chance of bugs like llvm#87177.
The tosa-infer-shapes pass inserts tensor.cast operations to mediate refined result types with consumers whose types cannot be refined. This process interferes with how types are refined in tosa.while_loop body regions, where types are propagated speculatively (to determine the types of the tosa.yield terminator) and then reverted. The new tensor.cast operations result in a crash due to not having types associated with them for the reversion process. This change modifies the shape propagation behavior so that the introduction of tensor.cast operations behaves better with this type reversion process. The new behavior is to only introduce tensor.cast operations once we wish to commit the newly computed types to the IR. This is an example causing the crash:
```mlir
func.func @while_dont_crash(%arg0 : tensor<i32>) -> (tensor<*xi32>) {
  %0 = tosa.add %arg0, %arg0 : (tensor<i32>, tensor<i32>) -> tensor<*xi32>
  %1 = tosa.while_loop (%arg1 = %0) : (tensor<*xi32>) -> tensor<*xi32> {
    %2 = "tosa.const"() <{value = dense<3> : tensor<i32>}> : () -> tensor<i32>
    %3 = tosa.greater_equal %2, %arg1 : (tensor<i32>, tensor<*xi32>) -> tensor<*xi1>
    tosa.yield %3 : tensor<*xi1>
  } do {
  ^bb0(%arg1: tensor<*xi32>):
    // Inferrable operation whose type will refine to tensor<i32>
    %3 = tosa.add %arg1, %arg1 : (tensor<*xi32>, tensor<*xi32>) -> tensor<*xi32>
    // Non-inferrable use site, will require the cast:
    //   tensor.cast %3 : tensor<i32> to tensor<*xi32>
    //
    // The new cast operation will result in accessing undefined memory through
    // originalTypeMap in the C++ code.
    "use"(%3) : (tensor<*xi32>) -> ()
    tosa.yield %3 : tensor<*xi32>
  }
  return %1 : tensor<*xi32>
}
```
The `tensor.cast` operation inserted in the loop body causes a failure in the code which resets the types after propagation through the loop body:
```c++
// The types inferred in the block assume the operand types specified for
// this iteration. We need to restore the original types to ensure that
// future iterations only use the already specified types, not possible
// types from previous iterations.
for (auto &block : bodyRegion) {
  for (auto arg : block.getArguments())
    arg.setType(originalTypeMap[arg]);
  for (auto &op : block)
    for (auto result : op.getResults())
      result.setType(originalTypeMap[result]); // problematic access
}
```
--------- Co-authored-by: Spenser Bauman <sabauma@fastmail>
…64_FEATURE_PAUTH`" (llvm#87434) Reverts llvm#85231 See build failure https://lab.llvm.org/buildbot/#/builders/186/builds/15631
llvm#87155) Both `std::distance` and `ranges::distance` are inefficient for non-sized ranges. Also, calculating the length using the `int` type is seriously problematic. This patch avoids using `distance` and avoids calculating the length of non-sized ranges. Fixes llvm#86833.
llvm#87312) …ner (llvm#87030)" Fix broken test. This reverts commit b8ead21.
Summary: This is more hacky, but I want to get the bot green before we work on a better solution.
…arse tensors (llvm#87305) `linalg.generic` ops with sparse tensors do not necessarily bufferize to element-wise access, because insertions into a sparse tensor may change the layout of (or reallocate) the underlying sparse data structures.
…lvm#87308) Use a single insert for the non-mask case instead of a push_back followed by an insert that may contain 0 registers.
…` instead of `SourceLocation` (llvm#87427) For pragma diagnostic mappings, we always write/read `SourceLocation` with offset 0. This is equivalent to just writing a `FileID`, which is exactly what this patch starts doing. Originally reviewed here: https://reviews.llvm.org/D137213
Improve the test gnu-ifunc-nonpreemptible.s to check IRELATIVE offsets. Ensure that IRELATIVE offsets are ordered to improve locality.
…MP (llvm#85638)" This reverts commit 2cf8118. Failing tests, revert until I can fix it
- The opcodes of mina.fmt and max.fmt are documented incorrectly; the object code compiled from the same assembly with LLVM behaves differently than code compiled with GCC and Binutils. - Modify the opcodes to match Binutils. The actual opcodes are as follows:
```
bits {5,3} \ bits {2,0} of func | ... | 100 | 101  | 110 | 111
--------------------------------+-----+-----+------+-----+------
                            010 | ... | min | mina | max | maxa
```
silvanshade force-pushed the xtheadvector branch from c038259 to 2130e2f on April 3, 2024 02:20
commit d89914f (Author: Kazu Hirata <kazu@google.com>, Date: Wed Apr 3 21:48:38 2024 -0700) changed RecordWriterTrait to a template class with IndexedVersion as a template parameter. This patch changes the class back to a non-template one while retaining the ability to serialize multiple versions. The reason I changed RecordWriterTrait to a template class was because, even if RecordWriterTrait had IndexedVersion as a member variable, RecordWriterTrait::EmitKeyDataLength, being a static function, would not have access to the variable. Since OnDiskChainedHashTableGenerator calls EmitKeyDataLength as:
```c++
const std::pair<offset_type, offset_type> &Len =
    InfoObj.EmitKeyDataLength(Out, I->Key, I->Data);
```
we can make EmitKeyDataLength a member function, but we have one problem. InstrProfWriter::writeImpl calls:
```c++
void insert(typename Info::key_type_ref Key, typename Info::data_type_ref Data) {
  Info InfoObj;
  insert(Key, Data, InfoObj);
}
```
which default-constructs RecordWriterTrait without a specific version number. This patch fixes the problem by adjusting InstrProfWriter::writeImpl to call the other form of insert instead:
```c++
void insert(typename Info::key_type_ref Key, typename Info::data_type_ref Data,
            Info &InfoObj)
```
To prevent an accidental invocation of the default constructor of RecordWriterTrait, this patch deletes the default constructor.
…87567) LLVMgold.so can be used with GNU ar, gold, ld, and nm to process LLVM bitcode files. Install it in LLVM_INSTALL_TOOLCHAIN_ONLY=on builds like we install libLTO.so. Suggested by @emelife Fix llvm#84271
…motableOpInterface` (llvm#86792) Add `requiresReplacedValues` and `visitReplacedValues` methods to `PromotableOpInterface`. These methods allow `PromotableOpInterface` ops to transforms definitions mutated by a `store`. This change is necessary to correctly handle the promotion of `LLVM_DbgDeclareOp`. --------- Co-authored-by: Théo Degioanni <30992420+Moxinilian@users.noreply.github.com>
…d/or x,y), C)`; NFC
In `(icmp eq (and x,y), C)` all 1s in `C` must also be set in both `x`/`y`. In `(icmp eq (or x,y), C)` all 0s in `C` must also be 0 in both `x`/`y`. Closes llvm#87143
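The two bit-level implications above can be sanity-checked exhaustively over 8-bit values. This is a hypothetical self-check, not the InstCombine code; the function name is made up.

```c++
#include <cassert>

// If (x & y) == C, every 1-bit of C is set in both x and y:
//   (x & C) == C and (y & C) == C.
// If (x | y) == C, every 0-bit of C is 0 in both x and y:
//   (x | C) == C and (y | C) == C.
// Verify both facts for all pairs of 8-bit values.
inline bool check_and_or_bit_facts() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      unsigned c_and = x & y;
      if ((x & c_and) != c_and || (y & c_and) != c_and)
        return false;
      unsigned c_or = x | y;
      if ((x | c_or) != c_or || (y | c_or) != c_or)
        return false;
    }
  }
  return true;
}
```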
There is one notable "regression". This patch replaces the bespoke `or disjoint` logic with a direct match. This means we fail some simplifications during `instsimplify`. All the cases we fail in `instsimplify` we do handle in `instcombine`, as we add `disjoint` flags. Other than that, just some basic cases. See proofs: https://alive2.llvm.org/ce/z/_-g7C8 Closes llvm#86083
It matches up with other attribute-adding member functions and helps simplify InterfaceFile assignment for InstallAPI.
Depends on llvm#87545 Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in `.note.gnu.property` section depending on `aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm module flags.
…ther/scatter Noticed while starting triage for llvm#87640
…#86894) The existing heuristics were assuming that every core behaves like an Apple A7, where any extend/shift costs an extra micro-op... but in reality, nothing else behaves like that. On some older Cortex designs, shifts by 1 or 4 cost extra, but all other shifts/extensions are free. On all other cores, as far as I can tell, all shifts/extensions for integer loads are free (i.e. the same cost as an unshifted load). To reflect this, this patch: - Enables aggressive folding of shifts into loads by default. - Removes the old AddrLSLFast feature, since it applies to everything except A7 (and even if you are explicitly targeting A7, we want to assume extensions are free because the code will almost always run on a newer core). - Adds a new feature AddrLSLSlow14 that applies specifically to the Cortex cores where shifts by 1 or 4 cost extra. I didn't add support for AddrLSLSlow14 on the GlobalISel side because it would require a bunch of refactoring to work correctly. Someone can pick this up as a followup.
Summary: This synchronization should be done before we handle the logic relating to closing the port. This isn't majorly important now but it would break if we ever decided to run a server on the GPU.
I believe I've got the tests properly configured to only run on Linux x86(_64), as I don't have a Linux AArch64/Arm device to diagnose what's going wrong with the tests (I suspect there's some issue with generating `.note.gnu.build-id` sections...) The actual code fixes have now been reviewed 3 times: llvm#79181 (moved shell tests to API tests), llvm#85693 (Changed some of the testing infra), and llvm#86812 (didn't get the tests configured quite right). The Debuginfod integration for symbol acquisition in LLDB now works with the `executable` and `debuginfo` Debuginfod network requests working properly for normal, `objcopy --only-keep-debug` stripped, split-dwarf, and `objcopy --only-keep-debug` stripped *plus* split-dwarf symbols/binaries. The reasons for the multiple attempts have been tests on platforms I don't have access to (Linux AArch64/Arm + MacOS x86_64). I believe I've got the tests properly disabled for everything except for Linux x86(_64) now. I've built & tested on MacOS AArch64 and Linux x86_64. --------- Co-authored-by: Kevin Frei <freik@meta.com>
Since newer releases are out, Clang 16 is no longer actively supported. However, the FreeBSD runner is still using it, so some tests still guard against Clang 16.
upstream change: c532ba4
upstream change: c532ba4
upstream change: 4400018
silvanshade force-pushed the xtheadvector branch from 2130e2f to 439395d on April 4, 2024 18:52
Oh thanks very much!! 👍
This PR rebases the xtheadvector branch.