Implement grouped conv interface #80870

srcarroll · 2024-02-06T16:39:45Z

No description provided.

This reverts commit 2ec122d.

Casting the result of `Section.getAddressWithOffset()` goes wrong if we are on a 32-bit platform whose addresses are regarded as signed; in that case, just doing

```
(uint64_t)Section.getAddressWithOffset(...)
```

or

```
reinterpret_cast<uint64_t>(Section.getAddressWithOffset(...))
```

will result in sign-extension.

We use these expressions when constructing branch stubs, which is before we know the final load address, so we can just switch to the `Section.getLoadAddressWithOffset(...)` method instead.

Doing that is also more consistent, since when calculating relative offsets for relocations, we use the load address anyway, so the code currently only works because `Section.Address` is equal to `Section.LoadAddress` at this point.

Fixes llvm#94478.
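The sign-extension problem can be illustrated without a 32-bit target. The following is a minimal sketch, not the RuntimeDyld code: the helper names `widenSigned` and `widenUnsigned` and the sample address are hypothetical, and the 32-bit address is modelled as a plain `uint32_t`.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 32-bit address the way a platform with signed addresses would:
// the value is reinterpreted as signed first, so bit 31 is copied into the
// upper 32 bits of the result.
std::uint64_t widenSigned(std::uint32_t addr32) {
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(addr32)));
}

// Widen the same address as an unsigned quantity: the upper 32 bits stay zero.
std::uint64_t widenUnsigned(std::uint32_t addr32) {
  return static_cast<std::uint64_t>(addr32);
}
```

For a hypothetical address such as `0x80001000`, `widenUnsigned` yields `0x0000000080001000` while `widenSigned` yields `0xFFFFFFFF80001000`. Addresses below `0x80000000` are unaffected, which is why the bug only shows up for addresses in the upper half of the 32-bit range.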

…s on LA64 (llvm#93813) Materializing constants on LoongArch is simpler if the constant is sign extended from i32. By default i32 constant operands of phis are zero extended. This patch adds a hook to allow LoongArch to override this for i32. We have an existing isSExtCheaperThanZExt, but it operates on EVT which we don't have at these places in the code.

…fmin (llvm#91936)

Implements fmaxf16 and fminf16, which are two missing functions listed here: llvm#93566

This patch makes all errors start with a lowercase letter and removes trailing periods and newlines. This fixes inconsistencies between error messages and facilitates concatenating them.

…ath functions (llvm#94535) llvm#93566

…ges (llvm#94259)

This patch changes the crashlog image loading default behaviour to load images not only from the crashed thread but also from the application specific backtrace thread. This patch also moves the Application Specific Backtrace / Last Exception Backtrace tag from the thread queue field to the thread name.

rdar://128276576

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

…n object files (llvm#94487) Follow up to llvm#92042

Follow-up to llvm#86912.

The motivation of the patch series is that, for a module interface unit `X`, when the modules that `X` depends on change in ways that are not relevant to `X`, we hope the BMI of `X` won't change. For this specific patch, we hope that irrelevant declaration changes won't change the BMI of `X`.

**However**, I found the patch itself is not very useful in practice, since adding or removing declarations will change the state of identifiers and types in most cases.

That said, for the simplest example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the newly introduced declaration `int getA(int)` doesn't introduce new identifiers and types, and the BMI of `onlyUseB` can stay unchanged.

While it doesn't look great, the patch should be the base of the patch to erase the transitive change for identifiers and types, since I don't know how we can introduce new types and identifiers without introducing new declarations. Given how tight the relationship between declarations, types and identifiers is, I think we can only reach the ideal state after we have made the series for all three entities.

The design of the patch is similar to llvm#86912, which extends the 32-bit DeclID to 64 bits and uses the higher bits to store the module file index and the lower bits to store the local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID, since we try to use the higher 16 bits to store the module ID in the prefix of the Decl class. Previously, we used 32 bits to store the module ID and 32 bits to store the DeclID. I don't want to allocate additional space, so I tried to keep the total at the same 64 bits. A potentially interesting thing here is the relationship between the module ID and the module file index. I feel we can get the module file index from the module ID, but I didn't prove it or implement it, since I want to make the patch itself as small as possible. We can do that in the future if we want.

Another change in the patch is the new concept of a Decl Index, which means the index into the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration was simply the Decl ID minus PREDEFINED_DECL_NUMs, so there are some places where the two were used ambiguously. This patch tries to separate the two concepts.

As with llvm#86912, the change will increase the on-disk PCM file sizes. As declaration IDs may be the most numerous IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change brings a 6.6% increase in the on-disk PCM size. No compile-time performance regression was observed. Given the benefits in the motivating example, I think the cost is worthwhile.
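The idea of storing a module file index in the higher bits and a local ID in the lower bits can be sketched as follows. This is an illustration only, not the clang implementation: the 16/32 split of the 48 bits, the constants, and the function names `packDeclID`, `moduleFileIndexOf` and `localIDOf` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of a 48-bit ID: the upper 16 bits hold the module file
// index and the lower 32 bits hold the local declaration ID.
constexpr unsigned kLocalBits = 32;
constexpr unsigned kIndexBits = 16;
constexpr std::uint64_t kLocalMask = (std::uint64_t{1} << kLocalBits) - 1;
constexpr std::uint64_t kIndexMask = (std::uint64_t{1} << kIndexBits) - 1;

// Combine a module file index and a local ID into one 64-bit value whose top
// 16 bits stay zero (free for other uses in this sketch).
constexpr std::uint64_t packDeclID(std::uint32_t moduleFileIndex,
                                   std::uint32_t localID) {
  return ((std::uint64_t{moduleFileIndex} & kIndexMask) << kLocalBits) |
         std::uint64_t{localID};
}

// Recover the module file index from the higher bits.
constexpr std::uint32_t moduleFileIndexOf(std::uint64_t id) {
  return static_cast<std::uint32_t>((id >> kLocalBits) & kIndexMask);
}

// Recover the local declaration ID from the lower bits.
constexpr std::uint32_t localIDOf(std::uint64_t id) {
  return static_cast<std::uint32_t>(id & kLocalMask);
}
```

With this layout, a declaration with local ID 42 in module file 3 packs to `0x000000030000002A`, and an ID from module file 0 is numerically equal to its local ID.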

…ave Zvfbfmin" (llvm#94565) Reverts llvm#91936 Premerge bots are broken.

…vm#92746)

This patch adds support for the GNU extension intrinsic GETCWD (llvm#84203). Usage information and an example have been added to `flang/docs/Intrinsics.md`. The patch contains both the lowering and the runtime code and works on both Windows and Linux.

| System  | Implementation |
|---------|----------------|
| Windows | _getcwd        |
| Linux   | getcwd         |
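The platform split in the table can be sketched as a small wrapper. This is not the flang runtime code; the function name `currentWorkingDirectory` and the fixed buffer size are assumptions for illustration. Only the two system calls named in the table are used.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <direct.h> // _getcwd
#else
#include <unistd.h> // getcwd
#endif

// Return the current working directory, or an empty string on failure.
// The 4096-byte buffer is an arbitrary choice for this sketch.
std::string currentWorkingDirectory() {
  char buffer[4096];
#ifdef _WIN32
  char *result = _getcwd(buffer, static_cast<int>(sizeof(buffer)));
#else
  char *result = getcwd(buffer, sizeof(buffer));
#endif
  return result ? std::string(result) : std::string();
}
```

Both calls return a null pointer when the path does not fit in the buffer, which this sketch reports as an empty string rather than retrying with a larger buffer.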

…86512)

This patch implements a `__is_bitwise_cloneable` builtin in clang. The builtin is used as a guard to check that a type can be safely bitwise copied by memcpy. It's functionally similar to `__is_trivially_copyable`, but covers a wider range of types (e.g. classes with virtual functions). The compiler guarantees that after the copy, the destination object has the same object representation as the source object. It is up to the user to guarantee that program semantic constraints are satisfied.

Context: https://discourse.llvm.org/t/extension-for-creating-objects-via-memcpy

…lvm#93814) Although i32 type is illegal in the backend, LA64 has pretty good support for i32 types by using W instructions. By adding n32 to the DataLayout string, middle end optimizations will consider i32 to be a native type. One known effect of this is enabling LoopStrengthReduce on loops with i32 induction variables. This can be beneficial because C/C++ code often has loops with i32 induction variables due to the use of `int` or `unsigned int`. If this patch exposes performance issues, those are better addressed by tuning LSR or other passes.

This commit enhances the docstring of `translateModuleToLLVMIR` as a followup to llvm#94445

…lvm#94522)

As the comment already indicates, only replacement with undef is problematic, as it introduces an additional use of undef. Use the correct ValueTracking helper.

If we're only checking for undef, then also only look for undef elements in the vector (rather than undef and poison).

…#91715)

- There is no restriction on a loop with controlled convergent operations when the relevant tokens are defined and used within the loop.
- When a token defined outside a loop is used inside (also called a loop convergence heart), unrolling is allowed only in the absence of remainder or runtime checks.
- When a token defined inside a loop is used outside, such a loop is said to be "extended". This loop can only be unrolled by also duplicating the extended part lying outside the loop. Such unrolling is disabled for now.
- Clean up loop hearts: When unrolling a loop with a heart, duplicating the heart will introduce multiple static uses of a convergence control token in a cycle that does not contain its definition. This violates the static rules for tokens, and needs to be cleaned up into a single occurrence of the intrinsic.
- Spell out the initializer for UnrollLoopOptions to improve readability.

Original implementation [D85605] by Nicolai Haehnle <nicolai.haehnle@amd.com>.

…lvm#93806) The m_ZExtOrSelf() family of matchers currently incorrectly calls std::forward twice on the same value. However, just removing those causes other complications, because then template arguments get incorrectly inferred to const references instead of the underlying value types. Things become a mess. Instead, just completely remove the use of std::forward and rvalue references from SDPatternMatch. I don't think they really provide value in this context, especially as they're not used consistently in the first place.

…Y` are known signed/unsigned

Several transforms:

1) If known `Y < 0`:
    - slt -> ult: https://alive2.llvm.org/ce/z/9zt2iK
    - sle -> ule: https://alive2.llvm.org/ce/z/SPoPNF
    - sgt -> ugt: https://alive2.llvm.org/ce/z/IGNxAk
    - sge -> uge: https://alive2.llvm.org/ce/z/joqTvR
2) If known `Y >= 0`:
    - `(X & PosY) s> X --> X s< 0`
        - https://alive2.llvm.org/ce/z/7e-5BQ
    - `(X & PosY) s> X --> X s< 0`
        - https://alive2.llvm.org/ce/z/jvT4Gb
3) If known `X < 0`:
    - `(NegX & Y) s> NegX --> Y s>= 0`
        - https://alive2.llvm.org/ce/z/ApkaEh
    - `(NegX & Y) s<= NegX --> Y s< 0`
        - https://alive2.llvm.org/ce/z/oRnfHp

Closes llvm#94417
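The folds that are spelled out in full above (cases 2 and 3) can be checked exhaustively for 8-bit values. The sketch below is an independent brute-force check, not part of the patch; the function names `checkPosYFold` and `checkNegXFolds` are hypothetical, and the 8-bit values are held in `int` so that `&` on the sign-extended operands matches an 8-bit `and` followed by sign extension.

```cpp
#include <cassert>
#include <cstdint>

// Case 2: for every 8-bit X and every 8-bit Y >= 0,
// (X & Y) s> X holds exactly when X s< 0.
bool checkPosYFold() {
  for (int x = -128; x <= 127; ++x)
    for (int y = 0; y <= 127; ++y) {
      int andVal = x & y;
      if ((andVal > x) != (x < 0))
        return false;
    }
  return true;
}

// Case 3: for every 8-bit X < 0 and every 8-bit Y,
// (X & Y) s> X holds exactly when Y s>= 0, and
// (X & Y) s<= X holds exactly when Y s< 0.
bool checkNegXFolds() {
  for (int x = -128; x < 0; ++x)
    for (int y = -128; y <= 127; ++y) {
      int andVal = x & y;
      if ((andVal > x) != (y >= 0))
        return false;
      if ((andVal <= x) != (y < 0))
        return false;
    }
  return true;
}
```

Both functions return `true`. An exhaustive 8-bit check is only a sanity test; the alive2 links above are the proofs for arbitrary bit widths.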

Cleanup for llvm#94504

…UEs (llvm#94458)

`SelectionDAGBuilder::handleDebugValue` has a parameter `Order` which represents the insert-at position for the new DBG_VALUE. Prior to this patch `SelectionDAGBuilder::SDNodeOrder` is used instead of the `Order` parameter.

The only code-paths where `Order != SDNodeOrder` are the two calls to `handleDebugValue` from `salvageUnresolvedDbgValue`. `salvageUnresolvedDbgValue` is called from `resolveOrClearDbgInfo` and `dropDanglingDebugInfo`. The former is called after SelectionDAG completes one block.

Some dbg.values can't be lowered to DBG_VALUEs right away. These get recorded as 'dangling' - their order-number is saved - and get salvaged later through `dropDanglingDebugInfo`, or if we've still got dangling debug info once the whole block has been emitted, through `resolveOrClearDbgInfo`. Their saved order-number is passed to `handleDebugValue`.

Prior to this patch, DBG_VALUEs inserted using these functions are inserted at the "current" `SDNodeOrder` rather than the intended position that is passed to the function.

Fix and add test.

Change the target triple to remove some unnecessary instructions.

This change is an implementation of the llvm#87367 investigation on supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This PR is just for Tan.

Now that the x86 tan backend has landed (llvm#90503), we can add other backends since the shared pieces are in tree now.

Changes:
- `llvm/include/llvm/Analysis/VecFuncs.def` - vectorization of tan for arm64 backends.
- `llvm/lib/Target/AArch64/AArch64FastISel.cpp` - Add tan to the libcall table
- `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp` - Add tan expansion for f128, f16, and vector/neon operations
- `llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp` - define `G_FTAN` as a legal arm64 instruction

resolves llvm#94755

Summary: The utilities `nvptx-arch` and `amdgpu-arch` are used to support `--offload-arch=native` among other utilities in clang. However, these rely on the GPU drivers to query the features. In certain cases these drivers can become locked up, which will lead to indefinite hangs on any compiler jobs running in the meantime. This patch adds a ten second timeout period for these utilities before it kills the job and errors out.

@CharKeaney

All post-Increment load/store, register-register load/store spec: https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @NandniJamnadas, @PaoloS02, @serkm, @simonpcook, @xingmingjie, @realqhc

This PR depends on llvm#90260.

We changed the order in which functions are outlined in Machine Outliner. The formula for priority is found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size.

| run (CTMark/)    | baseline (1) | priority (2) | diff (1 -> 2) |
|------------------|--------------|--------------|---------------|
| lencod           | 349624       | 349264       | -0.1030%      |
| SPASS            | 219672       | 219480       | -0.0874%      |
| kc               | 271956       | 251200       | -7.6321%      |
| sqlite3          | 223920       | 223708       | -0.0947%      |
| 7zip-benchmark   | 405364       | 402624       | -0.6759%      |
| bullet           | 139820       | 139500       | -0.2289%      |
| consumer-typeset | 295684       | 290196       | -1.8560%      |
| pairlocalalign   | 72236        | 72092        | -0.1993%      |
| tramp3d-v4       | 189572       | 189292       | -0.1477%      |

This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).

Parameter "Version" is confusing in deserializeV012 and deserializeV3 because we also have member variable "Version". Fortunately, parameter "Version" and member variable "Version" always have the same value because IndexedMemProfReader::deserialize initializes the member variable and passes it to deserializeV012 and deserializeV3. This patch removes the parameter.

This patch integrates CallStackRadixTreeBuilder into the V3 format, reducing the profile size to about 27% of the V2 profile size.

- Serialization: writeMemProfCallStackArray just needs to write out the radix tree array prepared by CallStackRadixTreeBuilder. Mappings from CallStackIds to LinearCallStackIds are moved by new function CallStackRadixTreeBuilder::takeCallStackPos.
- Deserialization: Deserializing a call stack is the same as deserializing an array encoded in the obvious manner -- the length followed by the payload, except that we need to follow a pointer to the parent to take advantage of common prefixes once in a while. This patch teaches LinearCallStackIdConverter how to handle those pointers.
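The "length followed by the payload, with an occasional pointer to follow" idea can be sketched with a simplified encoding. Everything here is an assumption made for illustration and is not the actual V3 format: frames are non-negative `int32_t` values, a negative entry `-k` means "skip forward `k` slots and keep reading there", and the names `decodeCallStack` and `exampleTable` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode the call stack that starts at table[start]. The first slot holds the
// number of frames. Each following slot is either a frame ID (>= 0) or a
// negative jump -k, which moves the read position k slots forward so that the
// remaining frames can be shared with another call stack.
std::vector<std::int32_t> decodeCallStack(const std::vector<std::int32_t> &table,
                                          std::size_t start) {
  std::vector<std::int32_t> frames;
  std::int32_t remaining = table[start];
  std::size_t pos = start + 1;
  frames.reserve(static_cast<std::size_t>(remaining));
  for (; remaining > 0; --remaining) {
    std::int32_t entry = table[pos];
    if (entry < 0) {
      // Follow the jump, then read the frame stored at the new position.
      pos += static_cast<std::size_t>(-entry);
      entry = table[pos];
    }
    frames.push_back(entry);
    ++pos;
  }
  return frames;
}

// Two call stacks that share the frames {20, 30}:
//   slot 0: stack of length 3 -> 40, then jump 3 slots forward to reach 20, 30
//   slot 3: stack of length 3 -> 10, 20, 30
std::vector<std::int32_t> exampleTable() { return {3, 40, -3, 3, 10, 20, 30}; }
```

Decoding at slot 0 yields `{40, 20, 30}` and decoding at slot 3 yields `{10, 20, 30}`, so the shared frames are stored only once. The sketch does no bounds checking and assumes a well-formed table.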

The "Emulated" sub-directories under "ArmSVE" and "ArmSME" have been removed. Associated tests have been moved up a directory and now include the "REQUIRES" constraint for the arm-emulator.

Allow KnownBits to represent "always poison" values via conflict. close: llvm#94436

…#94646) These tests pass on 64-bit. They were fixed by 5fdd094 on 32-bit. So XFAIL only for 32-bit before clang 19.

If we are extracting the even lanes and the odd lanes and adding them, we can use an addp instruction.
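A scalar model makes the equivalence concrete. This is a sketch on four-lane arrays, not the AArch64 patterns themselves; the type alias `Vec4` and the helper functions are hypothetical models of the lane behaviour of `uzp1`, `uzp2`, a lane-wise add, and `addp`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<std::int32_t, 4>;

// uzp1: the even-indexed lanes of the concatenation x:y.
Vec4 uzp1(const Vec4 &x, const Vec4 &y) { return {x[0], x[2], y[0], y[2]}; }

// uzp2: the odd-indexed lanes of the concatenation x:y.
Vec4 uzp2(const Vec4 &x, const Vec4 &y) { return {x[1], x[3], y[1], y[3]}; }

// Lane-wise addition.
Vec4 add(const Vec4 &a, const Vec4 &b) {
  return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

// addp: sums of adjacent lane pairs of the concatenation x:y.
Vec4 addp(const Vec4 &x, const Vec4 &y) {
  return {x[0] + x[1], x[2] + x[3], y[0] + y[1], y[2] + y[3]};
}
```

For `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}`, both `add(uzp1(x, y), uzp2(x, y))` and `addp(x, y)` give `{3, 7, 30, 70}`.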

llvm#94550)

For regex patterns that produce zero-length matches, there is one (imaginary) match in-between every character in the sequence being searched (as well as before the first character and after the last character). It's easiest to demonstrate using replacement: `std::regex_replace("abc"s, std::regex(""), "!")` should produce `!a!b!c!`, where each exclamation mark makes a zero-length match visible.

Currently our implementation doesn't correctly set the prefix of each zero-length match, "swallowing" the characters separating the imaginary matches -- e.g. when going through zero-length matches within `abc`, the corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this patch they will all be empty (`{'', '', '', ''}`). This happens in the implementation of `regex_iterator::operator++`.

Note that the Standard spells out quite explicitly that the prefix might need to be adjusted when dealing with zero-length matches in [`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr):

> In all cases in which the call to `regex_search` returns `true`, `match.prefix().first` shall be equal to the previous value of `match[0].second`... It is unspecified how the implementation makes these adjustments.

[Reproduction example](https://godbolt.org/z/8ve6G3dav)

```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string str = "abc";
  std::regex empty_matching_pattern("");

  { // The underlying problem is that `regex_iterator::operator++` doesn't update
    // the prefix correctly.
    std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
    std::cout << "\"";
    for (; i != e; ++i) {
      const std::ssub_match& prefix = i->prefix();
      std::cout << prefix.str();
    }
    std::cout << "\"\n";
    // Before the patch: ""
    // After the patch: "abc"
  }

  { // `regex_replace` makes the problem very visible.
    std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
    std::cout << "\"" << replaced << "\"\n";
    // Before the patch: "!!!!"
    // After the patch: "!a!b!c!"
  }
}
```

Fixes llvm#64451

rdar://119912002

Re-apply llvm#87550 with fixes.

Details: Some tests in fuchsia failed because of the newly added assertion. This was because `GetExceptionBreakpoint()` could be called before `g_dap.debugger` was initialized. The fix here is to lazily populate the list in `GetExceptionBreakpoint()` rather than assuming it has already been initialized. (There is some nuisance here because we can't simply populate it in `DAP::DAP()`, which is a global ctor and is called before `SBDebugger::Initialize()` is called.)
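The lazy-population pattern described above can be sketched in isolation. This is not the lldb-dap code: the class `ExceptionBreakpointRegistry`, the accessor `registry()`, the populate counter, and the placeholder names in the list are all hypothetical, and the sketch is single-threaded.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a list that must not be built in a global
// constructor, because what it depends on is initialized later.
class ExceptionBreakpointRegistry {
public:
  // Build the list on first use instead of in the constructor.
  const std::vector<std::string> &get() {
    if (!populated_) {
      names_ = {"placeholder_catch", "placeholder_throw"};
      populated_ = true;
      ++populateCount_;
    }
    return names_;
  }

  // How many times the list has been built (for demonstration only).
  int populateCount() const { return populateCount_; }

private:
  bool populated_ = false;
  std::vector<std::string> names_;
  int populateCount_ = 0;
};

// A global-like accessor: constructing the registry does no real work.
ExceptionBreakpointRegistry &registry() {
  static ExceptionBreakpointRegistry instance;
  return instance;
}
```

Constructing the registry is cheap and safe at any time; the list is built exactly once, on the first call to `get()`, by which point the caller is expected to have finished any required initialization.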

This patch reverts 9b832b7 (llvm#87111): - [libc++] Deprecated `shared_ptr` Atomic Access APIs as per P0718R2 - [libc++] Implemented P2869R3: Remove Deprecated `shared_ptr` Atomic Access APIs from C++26 As explained in [1], the suggested replacement in P2869R3 is `__cpp_lib_atomic_shared_ptr`, which libc++ does not yet implement. Let's not deprecate the old way of doing things before the new way of doing things exists. [1]: llvm#87111 (comment)

…rep expression (and remove an unused argument)

Add SHAPE runtime API (will be used for assumed-rank, lowering is generating other cases inline). I tried to make it in a way where there is no dynamic allocation in the runtime/deallocation expected to be inserted by inline code for arrays that we know are small (lowering will just always stack allocate a rank 15 array to avoid dynamic stack allocation or heap allocation).

…lag (llvm#94749)

) Summary: AMDGPU supports a `target-id` feature which is used to qualify targets with different incompatible features. These are both rules and target features. Currently, we pass `-target-cpu` twice when offloading to OpenMP, and do not pass the target-id features at all. The effect was that passing something like `--offload-arch=gfx90a:xnack+` would show up as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`. Thus ignoring the xnack completely and passing it twice. This patch fixes that to pass it once and then separate it like how HIP does.

…m#94592) As discussed in llvm#94443, this PR changes the wording to be more correct.

…lvm#94756)

Otherwise, older copies of LLD may not understand the latest bitcode versions (for example, if we increase `ModuleSummaryIndex::BitCodeSummaryVersion`) Related to llvm#90692 (comment)

…lvm#94538) It also moves the test near other similar test cases.

banach-space self-requested a review February 7, 2024 19:22

dsandersllvm and others added 29 commits June 7, 2024 14:01

Test commit

abd0bf1

Revert "Test commit"

bf572a5

This reverts commit 2ec122d.

[clang-format] Don't format comments in SkipMacroDefinitionBody (llvm…

28e57cd

…#94425) Fixes llvm#94326.

[RISCV] Support select/merge like ops for bf16 vectors when have Zvfb…

5ddc841

…fmin (llvm#91936)

[libc][math][c23] Implement fmaxf16 and fminf16 function (llvm#94131)

95e1431

Implements fmaxf16 and fminf16, which are two missing functions listed here: llvm#93566

[lldb] Fix inconsistencies in DWARFExpression errors (llvm#94554)

9e25be5

This patch makes all errors start with a lowercase letter and removes trailing periods and newlines. This fixes inconsistencies between error messages and facilitates concatenating them.

[libc][math][c23] Add {nextafter,nexttoward,nextup,nextdown}f16 C23 m…

f2165ae

…ath functions (llvm#94535) llvm#93566

[NFC] Remove unused value (llvm#94439)

b22873d

[WebAssembly] Set IS_64 flag correctly on __indirect_function_table i…

725b792

…n object files (llvm#94487) Follow up to llvm#92042

Revert "[RISCV] Support select/merge like ops for bf16 vectors when h…

c719881

…ave Zvfbfmin" (llvm#94565) Reverts llvm#91936 Premerge bots are broken.

[MLIR][LLVM] Improve module translation comment (NFC) (llvm#94577)

8ffa33f

This commit enhances the docstring of `translateModuleToLLVMIR` as a followup to llvm#94445

[clang] NFCI: Make ASTContext optional in the AST text dumper again (l…

b6c4da3

…lvm#94522)

[InstCombine] Add more tests for select equivalence fold (NFC)

ab331bb

[InstCombine] Only requite not-undef in select equiv fold

d836ae8

As the comment already indicates, only replacement with undef is problematic, as it introduces an additional use of undef. Use the correct ValueTracking helper.

[ValueTracking] Make undef element check more precise

942e935

If we're only checking for undef, then also only look for undef elements in the vector (rather than undef and poison).

[ARM] vabd.ll - regenerate test checks

2b0061c

Cleanup for llvm#94504

[ARM] vaba.ll - regenerate test checks

43a52d5

Cleanup for llvm#94504

[MC][RISCV] relocations.s - add missing opcode to test check

39027b5

davemgreen and others added 29 commits June 7, 2024 14:02

[ARM] Clean up neon_vabd.ll, vaba.ll and vabd.ll tests a bit. NFC

b9d3565

Change the target triple to remove some unnecessary instructions.

[AArch64] Add addp from shuffles tests. NFC

50bec57

[flang] lower SIZE and SIZEOF for assumed-ranks (llvm#94684)

8d913d5

[mlir][vector] Remove Emulated Sub-directory (llvm#94742)

ad12734

The "Emulated" sub-directories under "ArmSVE" and "ArmSME" have been removed. Associated tests have been moved up a directory and now include the "REQUIRES" constraint for the arm-emulator.

[gn] port 33a6ce1 (check-clang obj2yaml dep)

7f5aeb1

[gn] port cb7690a (ntdll dep)

aae32f6

[KnownBits] Remove hasConflict() assertions (llvm#94568)

5b14f6d

Allow KnownBits to represent "always poison" values via conflict. close: llvm#94436

[libc++][test][AIX] Only XFAIL atomic tests for before clang 19 (llvm…

1508a3d

…#94646) These tests pass on 64-bit. They were fixed by 5fdd094 on 32-bit. So XFAIL only for 32-bit before clang 19.

[AArch64] Add patterns for add(uzp1(x,y), uzp2(x, y)) -> addp.

53615ae

If we are extracting the even lanes and the odd lanes and adding them, we can use an addp instruction.

[LV] Add test with dead load and vector pointer.

3a93ccc

[Reassociate] shifttest.ll - generate test checks to replace custom g…

fd08cef

…rep expression (and remove an unused argument)

[flang][OpenMP] Add --openmp-enable-delayed-privatization-staging f…

8a0529a

…lag (llvm#94749)

Fixed grammatical error in "enum specifier" error msg llvm#94443 (llv…

624a743

…m#94592) As discussed in llvm#94443, this PR changes the wording to be more correct.

[clang] always use resolved arguments for default argument deduction (l…

f7d4ecb

…lvm#94756)

Check if LLD is built when checking if lto_supported (llvm#92752)

d931adf

Otherwise, older copies of LLD may not understand the latest bitcode versions (for example, if we increase `ModuleSummaryIndex::BitCodeSummaryVersion`) Related to llvm#90692 (comment)

[mlir][vector][NFC] Make function name more meaningful in lit tests. (l…

ae23164

…lvm#94538) It also moves the test near other similar test cases.

update

b928554

Merge branch 'main' into implement-grouped-conv-interface

df0747c

srcarroll closed this Jun 8, 2024
