[GEN] Update GENX branch to LLVM `657ec73` by whitneywhtsang · Pull Request #14230 · intel/llvm

whitneywhtsang · 2024-06-20T00:48:48Z

No description provided.

The setAtom call introduced by e17bc02 was due to my misunderstanding of flushPendingLabels (see https://discourse.llvm.org/t/mc-removing-aligned-bundling-support/79518). When evaluating `.quad x-y`, MCExpr.cpp:AttemptToFoldSymbolOffsetDifference gives different results at parse time and layout time because the `if (FA->getAtom() == FB.getAtom())` condition in isSymbolRefDifferenceFullyResolvedImpl only works when `setAtom` with a non-null pointer has been called. Calling setAtom in flushPendingLabels does not help anything.

@aengelke

Mach-O's `.subsections_via_symbols` mechanism associates a fragment with an atom (a non-temporary defined symbol). The current approach (`MCFragment::Atom`) wastes space for other object file formats. After #95077, `MCFragment::LayoutOrder` is only used by `AttemptToFoldSymbolOffsetDifference`. While it could be removed, we might explore future uses for `LayoutOrder`. @aengelke suggests one use case: move `Atom` into MCSection. This works because Mach-O doesn't support `.subsection`, and `LayoutOrder`, as the index into the fragment list, is unchanged. This patch moves MCFragment::Atom to MCSectionMachO::Atoms. `getAtom` may be called at parse time before `Atoms` is initialized, so a bounds check is needed to keep the hack working. Pull Request: llvm/llvm-project#95341

Hello! Currently, watchpoints don't work on Windows (this can be reproduced with the existing tests). This patch fixes the related issues so that the tests and watchpoints start working. Here is the list of tests that are fixed by this patch (on Windows, checked in **release/18.x** branch):

- commands/watchpoints/hello_watchpoint/TestMyFirstWatchpoint.py
- commands/watchpoints/multiple_hits/TestMultipleHits.py
- commands/watchpoints/multiple_threads/TestWatchpointMultipleThreads.py
- commands/watchpoints/step_over_watchpoint/TestStepOverWatchpoint.py
- commands/watchpoints/unaligned-watchpoint/TestUnalignedWatchpoint.py
- commands/watchpoints/watchpoint_commands/TestWatchpointCommands.py
- commands/watchpoints/watchpoint_commands/command/TestWatchpointCommandLLDB.py
- commands/watchpoints/watchpoint_commands/command/TestWatchpointCommandPython.py
- commands/watchpoints/watchpoint_commands/condition/TestWatchpointConditionCmd.py
- commands/watchpoints/watchpoint_count/TestWatchpointCount.py
- commands/watchpoints/watchpoint_disable/TestWatchpointDisable.py
- commands/watchpoints/watchpoint_size/TestWatchpointSizes.py
- python_api/watchpoint/TestSetWatchpoint.py
- python_api/watchpoint/TestWatchpointIgnoreCount.py
- python_api/watchpoint/TestWatchpointIter.py
- python_api/watchpoint/condition/TestWatchpointConditionAPI.py
- python_api/watchpoint/watchlocation/TestTargetWatchAddress.py

---------

Co-authored-by: Jason Molenda <jmolenda@apple.com>

…Axis (#95059) In unsplitLastAxisInResharding, the wrong argument was passed when calling targetShardingInUnsplitLastAxis. There weren't any tests to uncover this. I added one in mesh-spmdization.mlir for the Linalg dialect and one in resharding-spmdization.mlir for the Mesh dialect.

Use the packaging [1] module for parsing version numbers, instead of pkg_resources which is distributed with setuptools. I recently switched over to using the latter, knowing it was deprecated (in favor of the packaging module) because it comes with Python out of the box. Newer versions of setuptools have removed `pkg_resources` so we have to use packaging. [1] https://pypi.org/project/packaging/

This was reverted in llvm/llvm-project#95435 because it broke Android static hwasan binaries. This reland limits the change to !SANITIZER_ANDROID. Original commit message: When set to non-zero, the HWASan runtime will map the shadow base at the specified constant address. This is particularly useful in conjunction with the existing compiler option 'hwasan-mapping-offset', which bakes a hardcoded constant address into the instrumentation. --------- Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>

…60 (#94004) lowerInvokeable wasn't updating the returned chain after emitting the lowerEndEH, which caused SwiftErrorVal-handling code to re-set the DAG root, and thus accidentally skip the EH_LABEL node it was supposed to have added. After fixing that, a few places needed to be adjusted that assume the specific shape of the returned DAG. Fixes: #64826 Fixes: rdar://113994760

…95275) macOS 15.0 and iOS 18.0 added a new sysctl to fetch a bitvector of all the hw.optional.arm.FEAT_*'s in one go. Using this has a perf advantage over doing multiple round-trips to the kernel and back, but since it's not present in older OSes, we still need the slow fallback.

In MachineBlockPlacement, the function getFirstUnplacedBlock is inefficient because in most cases (for usual loop CFG), this function fails to find a candidate, and its complexity becomes O(#(loops in function) * #(blocks in function)). This makes the compilation of very long functions slow. This update reduces it to O(k * #(blocks in function)) where k is the maximum loop nesting depth, by iterating through the BlockFilter instead.

I made a mistake when I tried to make the code handle the backtick character like the hash character. The code did not recognize the delay control structure. It caused net names in the declaration to be aligned to the type name instead of the first net name.

new
```Verilog
wire logic #0 mynet, //
              mynet1;
```
old
```Verilog
wire logic #0 mynet, //
     mynet1;
```

…t of dynamic classes (#75912) Close llvm/llvm-project#70585 and reflect itanium-cxx-abi/cxx-abi#170. The significant change in this patch is: for dynamic classes attached to module units, we generate the vtable in the attached module unit directly, and the key function for such classes is meaningless.

… (#95521) Sinking currently only supports instructions that have zero or one uses. Extend this to handle instructions with any number of uses, as long as all uses are consistent (i.e. the "same" for all sinking candidates). After #94462 this is basically just a matter of looping over all uses instead of checking the first one only.

…based on' (#95650) As discussed in https://discourse.llvm.org/t/getelementptr-inbounds-inbounds-of-which-allocation/79024, we need the pointer to be inbounds of *the* allocated object the pointer is based on, not just any allocated object.

…(#95558) Expand all constant expressions that use fat pointers upfront, so that the rewriting logic only has to deal with instructions and not the constant expression variants as well. My primary motivation is to remove the creation of illegal constant expressions (mul and shl) from this pass, but this also cuts down quite a bit on the amount of duplicate logic.

This MIR test case is added to observe how VGPR lanes are consumed by SGPR spills during the si-lower-sgpr-spills pass of the AMDGPU pass pipeline. In this pass, stack slots are mapped to available VGPR lanes for spilling, which removes the need for the stack slots. Currently, each new SGPR spill goes into new VGPR lanes, because each spill is mapped from the distinct stack slot assigned to it during the SGPR allocation pass. This can be seen clearly in the added test case.

…e info and fix diagnostic locations (#191658)

This is the initial fix of llvm/llvm-project#191442. Following the discussion here llvm/llvm-project#115418 (comment).

- Fix #21040
- Fix #52659
- Fix #115418
- Fix #14230
- Fix #21133

### Description

This PR introduces a new AST node, `ExplicitInstantiationDecl`, to systematically fix the long-standing issue of missing or incorrect source location information for explicit template instantiations.

#### Background & The Problem

Historically, Clang's AST lacked a dedicated node to represent the lexical occurrence of an explicit instantiation statement. Instead, `Sema` tried to shoehorn this information into existing specialization nodes (e.g., `FunctionDecl`, `VarTemplateSpecializationDecl`) or simply returned `nullptr`. This resulted in fragmented behavior across the seven instantiable entity types:

* Function & Member Function Templates: Returned `nullptr`, completely losing `SourceRange` and `NestedNameSpecifier` information.
* Member Functions & Static Data Members: Mutated existing nodes in-place. Consequently, multiple `template` or `extern template` declarations in the same file would overwrite each other's source locations.
* Variable Templates: Suffered from `dyn_cast` bugs and dropped NNS information.

#### Design Trade-offs Evaluated

Before settling on the current design, I evaluated a mixed redeclaration-chain approach (similar to how explicit *specializations* are handled, creating new `FunctionDecl` nodes and stitching them into the redecl chain). However, this approach had significant flaws:

1. Inconsistency: It couldn't be cleanly applied to member functions or static data members due to `DeclContext` constraints (e.g., a member function shouldn't lexically reside in a namespace `DeclContext`, but placing it in the class context would pollute member lookup).
2. Fragility: It required bypassing standard `FoldingSet` mechanisms (`setFunctionTemplateSpecialization`).
3. Lookup Pollution: Injecting new `NamedDecl` nodes purely for instantiations risked breaking downstream `ASTMatcher`s and altering name lookup behavior.

To avoid these pitfalls, this PR introduces `ExplicitInstantiationDecl` as a **purely lexical annotation node**.

**Key Design Characteristics:**

1. Inherits from `Decl`, not `NamedDecl`: This is the most crucial design choice. Much like `StaticAssertDecl` or `FriendDecl`, this node lives in a `DeclContext` (making it traversable by `RecursiveASTVisitor` and visible in AST dumps) but remains completely invisible to C++ name lookup. It does not interfere with overload resolution or lookup tables.
2. Unified Representation: A single node type now covers all seven entity types. It holds a pointer (`Specialization`) to the underlying instantiated declaration, unifying how functions, variables, classes, and members are handled.
3. Lexical Fidelity: The node resides in the enclosing namespace or Translation Unit where the explicit instantiation was actually written, perfectly preserving the `SourceRange`, `NestedNameSpecifierLoc`, and the exact locations of the `template` and `extern` keywords.

Assisted-by: Claude Code (Anthropic), used for test writing and checking test results
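For readers less familiar with the construct, the statements that such a node would record are ordinary C++ explicit instantiations. The sketch below shows a few of them for made-up templates (`twice`, `Box`, and `use_instantiations` are hypothetical names, not taken from Clang or its tests). It illustrates the source-level syntax only and makes no claim about how the PR implements the node.

```cpp
#include <cassert>

// Hypothetical templates, used only to illustrate explicit instantiation syntax.
template <typename T> T twice(T v) { return v + v; }

template <typename T> struct Box {
  T value;
  T get() const { return value; }
  static T zero;
};
template <typename T> T Box<T>::zero = T();

// Explicit instantiation definitions: each statement is a separate lexical
// occurrence with its own `template` keyword and source range.
template int twice<int>(int);        // function template
template struct Box<long>;           // class template (and its members)
template int Box<int>::get() const;  // member function of a class template
template int Box<int>::zero;         // static data member of a class template

// Explicit instantiation declaration: note the additional `extern` keyword.
// It promises a definition in another translation unit, so this sketch
// deliberately never uses the members of Box<short>.
extern template struct Box<short>;

// Small usage check of the entities instantiated above.
inline void use_instantiations() {
  assert(twice<int>(21) == 42);
  Box<long> b{7};
  assert(b.get() == 7);
  assert(Box<int>::zero == 0);
}
```

Each `template ...;` line above is written once, at namespace scope, which is the kind of location information the description says was previously lost or overwritten.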

jayfoad and others added 30 commits June 13, 2024 20:20

[llvm-project] Fix typo "seperate" (#95373)

d4a0154

[mlir][TilingInterface] Update documentation for TilingInterface.td…

c7b5be8

…. (#95178)

[LV] Add extra cost model tests with truncated inductions.

52d29eb

Extra test cases that caused revert of llvm/llvm-project#92555

[libc] Fix build breaks caused by f16sqrtf changes (#95459)

ba7d5eb

See Buildbot failures: - https://lab.llvm.org/buildbot/#/builders/78/builds/13 - https://lab.llvm.org/buildbot/#/builders/182/builds/7

[ProfileData] Migrate to getValueArrayForSite (#95457)

4158773

This patch is a collection of one-liner migrations to getValueArrayForSite.

[mlir][bzl] Add missing dep

93181db

The file was added to MLIRBindingsPythonCoreNoCAPI but objects weren't. Signed-off-by: Jacques Pienaar <jpienaar@google.com>

[llvm-profdata] Clean up traverseAllValueSites (NFC) (#95467)

1365ce2

If NV == 0, nothing interesting happens after the "if" statement. We should just "continue" to the next value site. While I am at it, this patch migrates a use of getValueForSite to getValueArrayForSite.

[mlir][spirv] Implement SPIR-V lowering for vector.deinterleave (#9…

597cde1

…5313) 1. Added a conversion for `vector.deinterleave` to the `VectorToSPIRV` pass. 2. Added LIT tests for the new conversion. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>

[HWASan] disable hwasan_symbolize_stack_uas on x86

8fa7cf0

[libc][stdlib] Fix UB in freelist (#95330)

3106a23

Some of the freelist code uses type punning which is UB in C++, namely because we read from a union member that is not the active union member.
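As a rough illustration of the problem class, and not of the actual change in this commit, the sketch below stores a header in raw storage with `std::memcpy` instead of reading an inactive union member. `FreeNode`, `write_header`, `read_header`, and `round_trip_size` are hypothetical names and do not come from LLVM libc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical free-list header; this is not the type used in LLVM libc.
struct FreeNode {
  FreeNode *next;
  std::size_t size;
};

// The UB-prone pattern is a union such as
//   union Block { FreeNode node; unsigned char bytes[sizeof(FreeNode)]; };
// where code writes one member and then reads the other: in C++ only the
// most recently written member is active.

// Well-defined alternative: copy the object representation with memcpy.
inline void write_header(unsigned char *block, const FreeNode &hdr) {
  std::memcpy(block, &hdr, sizeof(FreeNode));
}

inline FreeNode read_header(const unsigned char *block) {
  FreeNode hdr;
  std::memcpy(&hdr, block, sizeof(FreeNode));
  return hdr;
}

// Round-trips a header through raw storage and returns the recovered size.
inline std::size_t round_trip_size(std::size_t size) {
  static_assert(sizeof(FreeNode) <= 64, "storage below must fit a header");
  unsigned char storage[64] = {};
  write_header(storage, FreeNode{nullptr, size});
  FreeNode hdr = read_header(storage);
  assert(hdr.next == nullptr);
  return hdr.size;
}
```

Compilers typically lower a fixed-size `memcpy` like this to plain loads and stores, so the well-defined form usually costs nothing at runtime.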

[BOLT] Fix duplicate diagnostic message (#95167)

1ebda11

Print .altinstructions parsing stats only once.

[flang] Address missed cases for REDUCE change, fixes shared lib buil…

c54f5f6

…d (#95481) My recent change that distinguishes pass-by-reference from pass-by-value reduction operation functions missed the "CppReduceComplex" cases, and also broke the shared library build-bots. Fix.

[Transforms] Migrate to a new version of getValueProfDataFromInst (#9…

602634d

…5477) Note that the version of getValueProfDataFromInst that returns bool has been "deprecated" since: commit 1e15371 Author: Mingming Liu <mingmingl@google.com> Date: Mon Apr 1 15:14:49 2024 -0700

[HWASan] comment why hwasan_symbolize_stack_uas is arm64 only

41c50f0

[X86][AsmParser] Avoid duplicated code in MatchAndEmit(ATT/Intel)Inst…

50ead2e

…ruction, NFC And VEXEncoding_* are renamed to OpcodePrefix_*. This is in preparation for the coming pseudo rex/rex2 prefixes support.

[RISCV] Don't use SEW=16 .vf instructions to move scalar bf16 into a …

cb021f5

…vector. The instructions are only defined to operate on f16 data. If the scalar FPR register isn't properly nan-boxed, these instructions will create a fp16 nan not a bf16 nan in the vector register.

[Transforms] Migrate to a new version of getValueProfDataFromInst (#9…

836ca5b

…5485) Note that the version of getValueProfDataFromInst that returns bool has been "deprecated" since: commit 1e15371 Author: Mingming Liu <mingmingl@google.com> Date: Mon Apr 1 15:14:49 2024 -0700

[X86][MC] Add missing support for pseudo rex/rex2 prefix in assembler

eb1248f

This fixes llvm/llvm-project#95417

[RISCV] Remove unused check prefixes. NFC

5745851

[RISCV] Remove duplicate bf16 testing. NFC

a7a1195

The bf16 test cases were copied to other files without the Zvfh/Zvfhmin options. Remove the duplication by adding a few Zvfh command lines to the bf16 files and deleting the bf16 tests from the test files for f16/f32/f64.

StreamChecker.cpp: Use isa<> (for #93408) [-Wunused-but-set-variable]

43bd7ae

owenca and others added 22 commits June 16, 2024 13:50

[clang-format] Handle AttributeMacro before access modifiers (#95634)

a106131

Closes #95094.

[clang-format] Add DiagHandler parameter to format::getStyle() (#91317)

fe9aef0

It allows control of error output for the function. Closes #94205. --------- Co-authored-by: Owen Pan <owenpiano@gmail.com>

[clang][NFC] Update C++ DR issues list

d340f62

[Transforms] Replace incorrect uses of m_Deferred with m_Specific (#9…

fbac697

…5719) The values have been bound already, so use m_Specific.

[clang-format][NFC] Suppress diagnostic noise in GetStyleOfFile test

527e732

[DebugInfo][Reassociate] Fix missing debug location drop (#95355)

470d59d

Fix #95343 .

[LLDB] Remove dead code (NFC) (#95713)

e4e350e

The dead code is caught by PVS studio analyzer - https://pvs-studio.com/en/blog/posts/cpp/1126/, fragment N12. Warning message - V523 The 'then' statement is equivalent to the 'else' statement. Options.cpp 1212

[InstSimplify] Implement simple folds for ucmp/scmp intrinsics (#…

b7b3d17

…95601) This patch adds folds for the cases where both operands are the same or where it can be established that the first operand is less than, equal to, or greater than the second operand.
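To make the folded cases concrete, here is a small reference model of the 3-way comparison semantics (-1, 0, or 1) in plain C++. It is a sketch of the intrinsics' documented results, not the InstSimplify code; the 32-bit operand width and the helper name `check_same_operand_fold` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: -1 if a < b, 0 if a == b, 1 if a > b.
// scmp compares as signed integers, ucmp as unsigned integers.
constexpr int scmp(std::int32_t a, std::int32_t b) { return (a > b) - (a < b); }
constexpr int ucmp(std::uint32_t a, std::uint32_t b) { return (a > b) - (a < b); }

// Identical operands: the result is always 0.
static_assert(scmp(7, 7) == 0, "x scmp x is 0");
static_assert(ucmp(7u, 7u) == 0, "x ucmp x is 0");

// A known ordering fixes the result. The same bit pattern orders
// differently depending on signedness.
static_assert(scmp(-1, 0) == -1, "-1 is less than 0 when signed");
static_assert(ucmp(0xFFFFFFFFu, 0u) == 1, "all-ones is greater than 0 when unsigned");

// Checks the same-operand case over a few sample values, including extremes.
inline void check_same_operand_fold() {
  const std::int32_t samples[] = {INT32_MIN, -1, 0, 1, INT32_MAX};
  for (std::int32_t x : samples) {
    assert(scmp(x, x) == 0);
    assert(ucmp(static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(x)) == 0);
  }
}
```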

[InstCombine] simplify average of lsb (#95684)

1d4e857

close: #94737 alive2: https://alive2.llvm.org/ce/z/WF_7mX In this patch, we combine `(X + Y) / 2` into `(X & Y)` only when both X and Y are less than or equal to 1.
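The restriction to values of at most 1 can be checked exhaustively. The sketch below assumes unsigned arithmetic and uses hypothetical helper names (`average`, `fold_holds_up_to`, `check_fold`); it is a brute-force check of the identity, not the InstCombine implementation.

```cpp
#include <cassert>

// Unsigned average, written the way the commit message states it.
constexpr unsigned average(unsigned x, unsigned y) { return (x + y) / 2; }

// Returns true when (x + y) / 2 == (x & y) for every pair with x, y <= bound.
inline bool fold_holds_up_to(unsigned bound) {
  for (unsigned x = 0; x <= bound; ++x)
    for (unsigned y = 0; y <= bound; ++y)
      if (average(x, y) != (x & y))
        return false;
  return true;
}

inline void check_fold() {
  // All four pairs from {0, 1} satisfy the identity.
  assert(fold_holds_up_to(1));
  // Allowing 2 breaks it: (0 + 2) / 2 is 1, but 0 & 2 is 0.
  assert(!fold_holds_up_to(2));
}
```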

[clang][AArch64] Add validation for Global Register Variable. (#94271)

5fe7f73

Fixes: #76426

[RISCV] Remove getOffsetOfLocalArea() (#93765)

94a6b9c

For RISC-V, it's always 0 and I don't see any reason we will change it in the future.

[InstCombine] Prefer source over result element type (NFC)

9a86d0a

For single-index GEPs the source and result element types are the same, but using the source type is semantically more correct.

[SelectionDAG] Add support for the 3-way comparison intrinsics [US]CM…

995835f

…P (#91871) This PR adds initial support for the `scmp`/`ucmp` 3-way comparison intrinsics in the SelectionDAG. Some of the expansions/lowerings are not optimal yet.

[mlir][ArmSVE] Lower predicate-sized vector.create_masks to whilelt (…

657ec73

…#95531) This produces better/more canonical codegen than the generic LLVM lowering, which is a pattern the backend currently does not recognize. See: llvm/llvm-project#81840.

Merge commit '657ec7320d8a28171755ba0dd5afc570a5a16791'

6e905d1

[GEN] Update libGenISAIntrinsics

66bcd20

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>

whitneywhtsang self-assigned this Jun 20, 2024

whitneywhtsang added the genx Pull requests or issues for genx branch label Jun 20, 2024

whitneywhtsang requested a review from a team June 20, 2024 00:49

victor-eds approved these changes Jun 20, 2024

View reviewed changes

whitneywhtsang merged commit 66bcd20 into intel:genx Jun 20, 2024

whitneywhtsang deleted the merge branch June 20, 2024 15:30

whitneywhtsang mentioned this pull request Jun 22, 2024

Merge OpenAI Triton till June 21st intel/intel-xpu-backend-for-triton#1298

Closed

whitneywhtsang merged 755 commits into intel:genx from whitneywhtsang:merge


20 participants
