Rebase xtheadvector on llvm/llvm-project:main #1

silvanshade · 2024-04-03T02:03:05Z

This PR rebases the xtheadvector branch.

Make sure that VPInstructions with OR opcodes are properly registered as disjoint ops. Fixes llvm#87378.

…lvm#87296) We can't just check whether it is a splat constant or not. We should also check that the values match.

This fixes a test broken in 3d469c0. fast-forwarded. ../clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl

When a nested parallel region ends, the runtime calls __kmp_join_call(). During this call, the primary thread of the nested parallel region will reset its tid (retval of omp_get_thread_num()) to what it was in the outer parallel region. A data race occurs with the current code when another worker thread from the nested inner parallel region tries to steal tasks from the primary thread's task deque. The worker thread reads the tid value directly from the primary thread's data structure and may read the wrong value. This change just uses the calculated victim_tid from execute_tasks() directly in the steal_task() routine rather than reading tid from the data structure. Fixes: llvm#87307

Summary: This patch adds a temporary implementation that uses a struct-based interface in lieu of varargs support. Once varargs support exists we will move this implementation to the "real" printf implementation. Conceptually, this patch has the client copy over its format string and arguments to the server. The server will then scan the format string searching for any specifiers that are actually a string. If it is a string then we will send the pointer back to the client to tell it to copy the string over. This copied value will then replace the pointer when the final formatting is done. This will require a built-in extension to the varargs support to get access to the underlying struct. The varargs used on the GPU will simply be a struct wrapped in a varargs ABI.

…RE_PAUTH` (llvm#85231) This adds support for `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` feature (as defined in ARM-software/abi-aa#240) handling in llvm-readobj and llvm-readelf.

The following constants for supported platforms are also introduced:

- `AARCH64_PAUTH_PLATFORM_INVALID = 0x0`
- `AARCH64_PAUTH_PLATFORM_BAREMETAL = 0x1`
- `AARCH64_PAUTH_PLATFORM_LLVM_LINUX = 0x10000002`

For the llvm_linux platform, output of the tools contains descriptions of PAuth features which are enabled/disabled depending on the version value. Version value bits correspond to the following `LangOptions` defined in llvm#85232:

- bit 0: `PointerAuthIntrinsics`;
- bit 1: `PointerAuthCalls`;
- bit 2: `PointerAuthReturns`;
- bit 3: `PointerAuthAuthTraps`;
- bit 4: `PointerAuthVTPtrAddressDiscrimination`;
- bit 5: `PointerAuthVTPtrTypeDiscrimination`;
- bit 6: `PointerAuthInitFini`.

Support for `.note.AARCH64-PAUTH-ABI-tag` is dropped since it's deleted from the spec in ARM-software/abi-aa#250.
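The bit-to-feature mapping listed above can be illustrated with a small decoder. This is only a sketch: `decodePAuthVersion` and `kPAuthFeatureNames` are hypothetical names made up for illustration, not the helpers llvm-readobj actually uses; only the bit positions and option names come from the description.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Feature names indexed by bit position (bit 0 .. bit 6), as listed in the
// description. The array name is hypothetical.
static const char *const kPAuthFeatureNames[] = {
    "PointerAuthIntrinsics",
    "PointerAuthCalls",
    "PointerAuthReturns",
    "PointerAuthAuthTraps",
    "PointerAuthVTPtrAddressDiscrimination",
    "PointerAuthVTPtrTypeDiscrimination",
    "PointerAuthInitFini",
};

// Hypothetical helper: returns the names of all features whose bit is set
// in the version value, in ascending bit order.
std::vector<std::string> decodePAuthVersion(std::uint64_t version) {
  std::vector<std::string> enabled;
  for (unsigned bit = 0; bit < 7; ++bit)
    if (version & (std::uint64_t{1} << bit))
      enabled.push_back(kPAuthFeatureNames[bit]);
  return enabled;
}
```

Under these assumptions, a version value of `0x5` (bits 0 and 2) would decode to `PointerAuthIntrinsics` and `PointerAuthReturns`.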

…#85638) Summary: This patch adds an implementation of `printf` that's provided by the GPU C library runtime. This `printf` is currently implemented using the same wrapper handling that OpenMP sets up. This will be removed once we have proper varargs support. This `printf` differs from the one CUDA offers in that it is synchronous and uses a finite size. Additionally, we support pretty much every format specifier except the `%n` option. Depends on llvm#85331

…vm#86624) Fixes llvm#86559.

Summary: The RPC server build for the GPU support needs to be built from the "projects" phase of the LLVM build. That means it is built with the same compiler that LLVM supports, which currently is GCC 7.4 in most cases. A previous patch removed the `LIBC_HAS_BUILTIN` indirection we used, which regressed the case where we used the `libc` source externally. The files that we need to use here are `converter.cpp` and `writer.cpp`, which currently are compatible with C++17, so there aren't issues with the code itself. However, older GCC does not have this builtin, which makes the checks fail. This patch just adds in a simple wrapper that allows it to correctly ignore everything if using a compiler that doesn't support it.
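A wrapper of the kind described might look roughly like the following. This is a sketch, not the libc code: the macro name `SKETCH_HAS_BUILTIN` and the function `addOverflows` are hypothetical. The idea is that `__has_builtin` is never spelled directly in an `#if` unless the compiler actually defines it.

```cpp
#include <climits>

// Hypothetical wrapper macro: expands to __has_builtin(x) where the compiler
// provides it, and to 0 on compilers (such as older GCC) that do not.
#if defined(__has_builtin)
#define SKETCH_HAS_BUILTIN(x) __has_builtin(x)
#else
#define SKETCH_HAS_BUILTIN(x) 0
#endif

// Hypothetical user of the wrapper: takes the builtin path only when the
// compiler reports the builtin, otherwise falls back to portable code.
inline bool addOverflows(int a, int b, int *out) {
#if SKETCH_HAS_BUILTIN(__builtin_add_overflow)
  return __builtin_add_overflow(a, b, out);
#else
  long long wide = static_cast<long long>(a) + b;
  // Wrap through unsigned arithmetic; the final conversion is
  // implementation-defined before C++20 when the sum is out of range.
  *out = static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
  return wide > INT_MAX || wide < INT_MIN;
#endif
}
```

Both branches report overflow the same way, so callers do not need to know which one was compiled in.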

This patch adds SWIG cmake flags to the stage2 build in Fuchsia Clang configuration.

…llvm#87420) Pseudo Mnemonic could be of other uses.

…vancing iterator (llvm#84126) Currently, the bounds check in `std::ranges::advance(it, n, s)` is done _before_ `n` is checked. This results in one extra, unneeded bounds check. Thus, `std::ranges::advance(it, 1, s)` currently is _not_ simply equivalent to:

```c++
if (it != s) {
  ++it;
}
```

This difference in behavior matters when the check involves some "expensive" logic. For example, the `==` operator of `std::istreambuf_iterator` may actually have to read the underlying `streambuf`. Swapping around the checks in the `while` results in the expected behavior.
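The effect of the check order can be shown with a hand-written model of the loop. This is a sketch rather than the libc++ source: `CountingIter`, `CountingBound`, and the two `advance*` functions are hypothetical stand-ins that count how often the iterator is compared against the bound.

```cpp
// Hypothetical iterator: only tracks a position.
struct CountingIter {
  int pos = 0;
  CountingIter &operator++() { ++pos; return *this; }
};

// Hypothetical bound: counts every comparison made against it.
struct CountingBound {
  int end = 0;
  mutable int comparisons = 0;
};

inline bool operator!=(const CountingIter &it, const CountingBound &s) {
  ++s.comparisons;
  return it.pos != s.end;
}

// Old order: the bound is compared before n is tested, so one extra
// comparison happens after n has already reached 0.
inline void advanceBoundFirst(CountingIter &it, int n, const CountingBound &s) {
  while (it != s && n > 0) {
    ++it;
    --n;
  }
}

// New order: n is tested first, so the bound comparison is skipped
// as soon as n reaches 0.
inline void advanceCountFirst(CountingIter &it, int n, const CountingBound &s) {
  while (n > 0 && it != s) {
    ++it;
    --n;
  }
}
```

Advancing by 1 toward a bound at position 5 performs two bound comparisons with the old order and one with the new order; the iterator ends at position 1 in both cases.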

… continue iteration of object files (llvm#87344) This patch introduces a new `IterationMarker` enum (happy to take alternative name suggestions), which callbacks, like the one in `SymbolFileDWARFDebugMap::ForEachSymbolFile`, can return in order to indicate whether the caller should continue iterating or bail. For now this patch just changes the `ForEachSymbolFile` callback to use this new enum. In the future we could change the various `DWARFIndex::GetXXX` callbacks to do the same. This makes the callbacks easier to read and hopefully reduces the chance of bugs like llvm#87177.
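A callback that returns such an enum might be used as in the sketch below. The enumerator names `Continue` and `Stop` and the function `forEachFile` are assumptions made for illustration; the description only names the `IterationMarker` enum and `ForEachSymbolFile`.

```cpp
#include <functional>
#include <string>
#include <vector>

// Enumerator names are assumed; the description only names the enum itself.
enum class IterationMarker { Continue, Stop };

// Hypothetical stand-in for a ForEachSymbolFile-style loop. Returns how many
// entries were visited before the callback asked to stop.
inline int forEachFile(
    const std::vector<std::string> &files,
    const std::function<IterationMarker(const std::string &)> &callback) {
  int visited = 0;
  for (const std::string &file : files) {
    ++visited;
    if (callback(file) == IterationMarker::Stop)
      break;
  }
  return visited;
}
```

Compared with a `bool` return value, the named enumerators make it harder to mix up whether `true` means "keep going" or "stop", which is the class of bug the description refers to.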

The tosa-infer-shapes pass inserts tensor.cast operations to mediate refined result types with consumers whose types cannot be refined. This process interferes with how types are refined in tosa.while_loop body regions, where types are propagated speculatively (to determine the types of the tosa.yield terminator) and then reverted. The new tensor.cast operations result in a crash because they have no types associated with them for the reversion process.

This change modifies the shape propagation behavior so that the introduction of tensor.cast operations behaves better with this type reversion process. The new behavior is to only introduce tensor.cast operations once we wish to commit the newly computed types to the IR.

This is an example causing the crash:

```mlir
func.func @while_dont_crash(%arg0 : tensor<i32>) -> (tensor<*xi32>) {
  %0 = tosa.add %arg0, %arg0 : (tensor<i32>, tensor<i32>) -> tensor<*xi32>

  %1 = tosa.while_loop (%arg1 = %0) : (tensor<*xi32>) -> tensor<*xi32> {
    %2 = "tosa.const"() <{value = dense<3> : tensor<i32>}> : () -> tensor<i32>
    %3 = tosa.greater_equal %2, %arg1 : (tensor<i32>, tensor<*xi32>) -> tensor<*xi1>
    tosa.yield %3 : tensor<*xi1>
  } do {
  ^bb0(%arg1: tensor<*xi32>):
    // Inferrable operation whose type will refine to tensor<i32>
    %3 = tosa.add %arg1, %arg1 : (tensor<*xi32>, tensor<*xi32>) -> tensor<*xi32>

    // Non-inferrable use site, will require the cast:
    //     tensor.cast %3 : tensor<i32> to tensor<*xi32>
    //
    // The new cast operation will result in accessing undefined memory through
    // originalTypeMap in the C++ code.
    "use"(%3) : (tensor<*xi32>) -> ()
    tosa.yield %3 : tensor<*xi32>
  }

  return %1 : tensor<*xi32>
}
```

The `tensor.cast` operation inserted in the loop body causes a failure in the code which resets the types after propagation through the loop body:

```c++
// The types inferred in the block assume the operand types specified for
// this iteration. We need to restore the original types to ensure that
// future iterations only use the already specified types, not possible
// types from previous iterations.
for (auto &block : bodyRegion) {
  for (auto arg : block.getArguments())
    arg.setType(originalTypeMap[arg]);
  for (auto &op : block)
    for (auto result : op.getResults())
      result.setType(originalTypeMap[result]); // problematic access
}
```

---------

Co-authored-by: Spenser Bauman <sabauma@fastmail>

…64_FEATURE_PAUTH`" (llvm#87434) Reverts llvm#85231 See build failure https://lab.llvm.org/buildbot/#/builders/186/builds/15631

llvm#87155) Both `std::distance` and `ranges::distance` are inefficient for non-sized ranges. Also, calculating the range length using the `int` type is seriously problematic. This patch avoids using `distance` and calculating the length of non-sized ranges. Fixes llvm#86833.

llvm#87312) …ner (llvm#87030)" Fix broken test. This reverts commit b8ead21.

Summary: This is more hacky, but I want to get the bot green before we work on a better solution.

…arse tensors (llvm#87305) `linalg.generic` ops with sparse tensors do not necessarily bufferize to element-wise access, because insertions into a sparse tensor may change the layout of (or reallocate) the underlying sparse data structures.

…lvm#87308) Use a single insert for the non-mask case instead of a push_back followed by an insert that may contain 0 registers.

…` instead of `SourceLocation` (llvm#87427) For pragma diagnostic mappings, we always write/read `SourceLocation` with offset 0. This is equivalent to just writing a `FileID`, which is exactly what this patch starts doing. Originally reviewed here: https://reviews.llvm.org/D137213

Improve the test gnu-ifunc-nonpreemptible.s to check IRELATIVE offsets. Ensure that IRELATIVE offsets are ordered to improve locality.

…MP (llvm#85638)" This reverts commit 2cf8118. Failing tests, revert until I can fix it

- The opcodes of mina.fmt and max.fmt are documented incorrectly, so object code compiled from the same assembly with LLVM behaves differently from code compiled with GCC and Binutils.
- Modify the opcodes to match Binutils.

The actual opcodes are as follows:

| {5,3} / bits {2,0} of func | ... | 100 | 101 | 110 | 111 |
|---|---|---|---|---|---|
| 010 | ... | min | mina | max | maxa |

commit d89914f

    Author: Kazu Hirata <kazu@google.com>
    Date: Wed Apr 3 21:48:38 2024 -0700

changed RecordWriterTrait to a template class with IndexedVersion as a template parameter. This patch changes the class back to a non-template one while retaining the ability to serialize multiple versions.

The reason I changed RecordWriterTrait to a template class was because, even if RecordWriterTrait had IndexedVersion as a member variable, RecordWriterTrait::EmitKeyDataLength, being a static function, would not have access to the variable.

Since OnDiskChainedHashTableGenerator calls EmitKeyDataLength as:

    const std::pair<offset_type, offset_type> &Len =
        InfoObj.EmitKeyDataLength(Out, I->Key, I->Data);

we can make EmitKeyDataLength a member function, but we have one problem. InstrProfWriter::writeImpl calls:

    void insert(typename Info::key_type_ref Key,
                typename Info::data_type_ref Data) {
      Info InfoObj;
      insert(Key, Data, InfoObj);
    }

which default-constructs RecordWriterTrait without a specific version number. This patch fixes the problem by adjusting InstrProfWriter::writeImpl to call the other form of insert instead:

    void insert(typename Info::key_type_ref Key,
                typename Info::data_type_ref Data, Info &InfoObj)

To prevent an accidental invocation of the default constructor of RecordWriterTrait, this patch deletes the default constructor.
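The pattern described here (a non-template trait that carries its version as a member, with the default constructor deleted so a version must always be supplied) can be sketched in isolation. `VersionedTrait`, `TableGenerator`, and the per-version header sizes are hypothetical simplifications, not the real `RecordWriterTrait` or `OnDiskChainedHashTableGenerator`.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a versioned writer trait.
class VersionedTrait {
public:
  VersionedTrait() = delete; // callers must choose a version explicitly
  explicit VersionedTrait(unsigned version) : version_(version) {}

  // A member function rather than a static one, so it can read version_.
  std::pair<unsigned, unsigned> emitKeyDataLength(const std::string &key,
                                                  const std::string &data) const {
    unsigned header = version_ >= 2 ? 8u : 4u; // made-up per-version header size
    return {static_cast<unsigned>(key.size()),
            static_cast<unsigned>(data.size()) + header};
  }

private:
  unsigned version_;
};

// Deleting the default constructor turns an accidental `Info InfoObj;`
// into a compile-time error.
static_assert(!std::is_default_constructible<VersionedTrait>::value,
              "a version must always be supplied");

// Hypothetical generator exposing only the insert form that takes an
// already-constructed trait object.
template <typename Info> class TableGenerator {
public:
  void insert(const std::string &key, const std::string &data, Info &info) {
    lengths_.push_back(info.emitKeyDataLength(key, data));
  }
  const std::vector<std::pair<unsigned, unsigned>> &lengths() const {
    return lengths_;
  }

private:
  std::vector<std::pair<unsigned, unsigned>> lengths_;
};
```

In this sketch the same generator can serialize entries for several versions, because the version travels with the trait object passed to `insert` instead of being baked into the type.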

@emelife

…87567) LLVMgold.so can be used with GNU ar, gold, ld, and nm to process LLVM bitcode files. Install it in LLVM_INSTALL_TOOLCHAIN_ONLY=on builds like we install libLTO.so. Suggested by @emelife Fix llvm#84271

…motableOpInterface` (llvm#86792) Add `requiresReplacedValues` and `visitReplacedValues` methods to `PromotableOpInterface`. These methods allow `PromotableOpInterface` ops to transform definitions mutated by a `store`. This change is necessary to correctly handle the promotion of `LLVM_DbgDeclareOp`. --------- Co-authored-by: Théo Degioanni <30992420+Moxinilian@users.noreply.github.com>

…d/or x,y), C)`; NFC

In `(icmp eq (and x,y), C)` all 1s in `C` must also be set in both `x`/`y`. In `(icmp eq (or x,y), C)` all 0s in `C` must also be clear in both `x`/`y`. Closes llvm#87143
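Both implications can be checked by brute force over small bit widths. The sketch below enumerates all 8-bit `x` and `y`; the function names are hypothetical and this is unrelated to how ValueTracking itself is implemented.

```cpp
// If (x & y) == C, every 1 bit of C must be 1 in both x and y.
inline bool andImplicationHolds() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      unsigned c = x & y; // the C for which the equality holds
      if ((x & c) != c || (y & c) != c)
        return false;
    }
  return true;
}

// If (x | y) == C, every 0 bit of C must be 0 in both x and y.
inline bool orImplicationHolds() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      unsigned c = x | y;          // the C for which the equality holds
      unsigned zeros = ~c & 0xFFu; // positions where C has a 0 bit
      if ((x & zeros) != 0 || (y & zeros) != 0)
        return false;
    }
  return true;
}
```

Each function returns `true` only if no counterexample exists among all 65,536 pairs.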

…icate`; NFC

There is one notable "regression". This patch replaces the bespoke `or disjoint` logic with a direct match. This means we fail some simplification during `instsimplify`. All the cases we fail in `instsimplify` we do handle in `instcombine` as we add `disjoint` flags. Other than that, just some basic cases. See proofs: https://alive2.llvm.org/ce/z/_-g7C8 Closes llvm#86083

It matches up with other _attribute_ adding member functions and helps simplify InterfaceFile assignment for InstallAPI.

Depends on llvm#87545 Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in `.note.gnu.property` section depending on `aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm module flags.

…ther/scatter Noticed while starting triage for llvm#87640

…#86894) The existing heuristics were assuming that every core behaves like an Apple A7, where any extend/shift costs an extra micro-op... but in reality, nothing else behaves like that. On some older Cortex designs, shifts by 1 or 4 cost extra, but all other shifts/extensions are free. On all other cores, as far as I can tell, all shifts/extensions for integer loads are free (i.e. the same cost as an unshifted load).

To reflect this, this patch:

- Enables aggressive folding of shifts into loads by default.
- Removes the old AddrLSLFast feature, since it applies to everything except A7 (and even if you are explicitly targeting A7, we want to assume extensions are free because the code will almost always run on a newer core).
- Adds a new feature AddrLSLSlow14 that applies specifically to the Cortex cores where shifts by 1 or 4 cost extra.

I didn't add support for AddrLSLSlow14 on the GlobalISel side because it would require a bunch of refactoring to work correctly. Someone can pick this up as a followup.

Summary: This synchronization should be done before we handle the logic relating to closing the port. This isn't majorly important now but it would break if we ever decided to run a server on the GPU.

I believe I've got the tests properly configured to only run on Linux x86(_64), as I don't have a Linux AArch64/Arm device to diagnose what's going wrong with the tests (I suspect there's some issue with generating `.note.gnu.build-id` sections...) The actual code fixes have now been reviewed 3 times: llvm#79181 (moved shell tests to API tests), llvm#85693 (Changed some of the testing infra), and llvm#86812 (didn't get the tests configured quite right). The Debuginfod integration for symbol acquisition in LLDB now works with the `executable` and `debuginfo` Debuginfod network requests working properly for normal, `objcopy --only-keep-debug` stripped, split-dwarf, and `objcopy --only-keep-debug` stripped *plus* split-dwarf symbols/binaries. The reasons for the multiple attempts have been tests on platforms I don't have access to (Linux AArch64/Arm + MacOS x86_64). I believe I've got the tests properly disabled for everything except for Linux x86(_64) now. I've built & tested on MacOS AArch64 and Linux x86_64. --------- Co-authored-by: Kevin Frei <freik@meta.com>

Following the latest release, Clang 16 is no longer actively supported. However, the FreeBSD runner still uses it, so some tests still guard against Clang 16.

upstream change: c532ba4

upstream change: 4400018

Reference: XUANTIE-RV/thead-extension-spec@2420d05

kata-ark · 2024-04-16T08:30:44Z

Oh thanks very much!! 👍

fhahn and others added 27 commits April 2, 2024 21:48

[VPlan] Make sure OR VPInstructions are treated as disjoint ops.

6261c53

Make sure that VPInstructions with OR opcodes are properly registered as disjoint ops. Fixes llvm#87378.

[mlir][tensor] Fix tensor::PackOp fold() handling of padding value (l…

c3e3d59

…lvm#87296) We can't just check whether it is a splat constant or not. We should also check that the values match.

[HLSL] Fix broken spir-v test

f119a4f

This fixes a test broken in 3d469c0. fast-forwarded. ../clang/test/CodeGenHLSL/builtins/wave_get_lane_index_subcall.hlsl

[LV] Add test depending on target to RISCV subdirectory.

89271b4

[clang-format] Fix a regression in annotating TrailingReturnArrow (ll…

a7f4576

…vm#86624) Fixes llvm#86559.

[Fuchsia] Add SWIG flags to Fuchsia Clang stage2 build (llvm#87421)

68217a5

This patch adds SWIG cmake flags to the stage2 build in Fuchsia Clang configuration.

Use setup_host_tool for clang-ast-dump, fixes 76707

b4adb42

[CodeGen][NFC] Make an opt<> static

633bc3b

AMDGPU: Use PseudoInstr instead of Pseudo Mnemonic for SIMCInstr, NFC (…

12c7371

…llvm#87420) Pseudo Mnemonic could be of other uses.

[libc] Include 'config.h' from the printf structs for builtins

0492e1e

Revert "[PAC][llvm-readobj][AArch64][ELF] Support `GNU_PROPERTY_AARCH…

c45861f

…64_FEATURE_PAUTH`" (llvm#87434) Reverts llvm#85231 See build failure https://lab.llvm.org/buildbot/#/builders/186/builds/15631

Reapply "[CodeGen] Fix register pressure computation in MachinePipeli… (

ea4a119

llvm#87312) …ner (llvm#87030)" Fix broken test. This reverts commit b8ead21.

[libc] Move include so it covers the other files

3ae5c77

Summary: This is more hacky, but I want to get the bot green before we work on a better solution.

[RISCV] Slightly simplify RVVArgDispatcher::constructArgInfos. NFC (l…

3b19cd7

…lvm#87308) Use a single insert for the non-mask case instead of a push_back followed by an insert that may contain 0 registers.

[ELF] Sort IRELATIVE by offset

01e2274

Improve the test gnu-ifunc-nonpreemptible.s to check IRELATIVE offsets. Ensure that IRELATIVE offsets are ordered to improve locality.

Revert "[Libomptarget] Add RPC-based printf implementation for Open…

943f39d

…MP (llvm#85638)" This reverts commit 2cf8118. Failing tests, revert until I can fix it

silvanshade mentioned this pull request Apr 3, 2024

[llvm][mc][riscv] MC support of T-Head vector extension (xtheadvector) llvm/llvm-project#84447

Open

silvanshade force-pushed the xtheadvector branch from c038259 to 2130e2f Compare April 3, 2024 02:20

aeubanks and others added 27 commits April 4, 2024 17:05

[gn build] Port 8bb9443

13e7572

[gn build] Port fd38366

258dd64

[CMake] Install LLVMgold.so for LLVM_INSTALL_TOOLCHAIN_ONLY=on (llvm#…

b9ec4ab

…87567) LLVMgold.so can be used with GNU ar, gold, ld, and nm to process LLVM bitcode files. Install it in LLVM_INSTALL_TOOLCHAIN_ONLY=on builds like we install libLTO.so. Suggested by @emelife Fix llvm#84271

[ValueTracking] Add tests for computing known bits from `(icmp eq (an…

02b49d1

…d/or x,y), C)`; NFC

[ValueTracking] Infer known bits from (icmp eq (and/or x,y), C)

05cff99

In `(icmp eq (and x,y), C)` all 1s in `C` must also be set in both `x`/`y`. In `(icmp eq (or x,y), C)` all 0s in `C` must also be clear in both `x`/`y`. Closes llvm#87143

[ValueTracking] Add tests for deducing more conditions in `isTruePred…

74447cf

…icate`; NFC

[TextAPI] Reorder addRPath parameters (llvm#87601)

515d3f7

It matches up with other _attribute_ adding member functions and helps simplify InterfaceFile assignment for InstallAPI.

[AArch64][PAC][MC][ELF] Support PAuth ABI compatibility tag (llvm#85236)

d97d560

Depends on llvm#87545 Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in `.note.gnu.property` section depending on `aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm module flags.

[CostModel][X86] Add costkinds test coverage for masked load/store/ga…

53fe94a

…ther/scatter Noticed while starting triage for llvm#87640

[SLP]Add a test with the incorrect casting for external user, NFC.

5ae143d

[libc] Move thread sync when closing port earlier

8004ce2

Summary: This synchronization should be done before we handle the logic relating to closing the port. This isn't majorly important now but it would break if we ever decided to run a server on the GPU.

[libc++][CI] Updates to Clang 19. (llvm#85301)

b798c2a

Following the latest release, Clang 16 is no longer actively supported. However, the FreeBSD runner still uses it, so some tests still guard against Clang 16.

add t-head vector definitions

afe954c

fix LLVM MC and related tests

dd9cc9c

upstream change: c532ba4

fix RISCVInstPrinter for XTHeadVector

4ff7b2a

upstream change: c532ba4

port MC test from GCC

faf45e3

upstream change: 4400018

fix unit test

bc4c99c

rename some old names

56bf7c2

clang format

bb79d78

add 5 assembly pseudoinstructions for XTheadVector extension.

713d571

Reference: XUANTIE-RV/thead-extension-spec@2420d05

add tests about th.vmsge.vx

b92ff85

[RISCV] [xtheadvector] Fix for renamed schedule classes (f14224d)

439395d

silvanshade force-pushed the xtheadvector branch from 2130e2f to 439395d Compare April 4, 2024 18:52

kata-ark merged commit e059cb3 into kata-ark:main Apr 16, 2024
