[GEN] Update GENX branch to LLVM 657ec73 #14230

Merged: whitneywhtsang merged 755 commits into intel:genx from whitneywhtsang:merge on Jun 20, 2024

@whitneywhtsang
Contributor

No description provided.

jayfoad and others added 30 commits June 13, 2024 20:20
This patch is a collection of one-liner migrations to
getValueArrayForSite.
The file was added to MLIRBindingsPythonCoreNoCAPI but objects weren't.

Signed-off-by: Jacques Pienaar <jpienaar@google.com>
If NV == 0, nothing interesting happens after the "if" statement.  We
should just "continue" to the next value site.

While I am at it, this patch migrates a use of getValueForSite to
getValueArrayForSite.
The setAtom call introduced by e17bc02
was due to my misunderstanding of flushPendingLabels
(see https://discourse.llvm.org/t/mc-removing-aligned-bundling-support/79518).

When evaluating `.quad x-y`,
MCExpr.cpp:AttemptToFoldSymbolOffsetDifference gives different results
at parse time and layout time because the `if (FA->getAtom() ==
FB.getAtom())` condition in isSymbolRefDifferenceFullyResolvedImpl only
works when `setAtom` with a non-null pointer has been called. Calling
setAtom in flushPendingLabels does not help anything.
…5313)

1. Added a conversion for `vector.deinterleave` to the `VectorToSPIRV`
pass.
2. Added LIT tests for the new conversion.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Mach-O's `.subsections_via_symbols` mechanism associates a fragment with
an atom (a non-temporary defined symbol). The current approach
(`MCFragment::Atom`) wastes space for other object file formats.

After #95077, `MCFragment::LayoutOrder` is only used by
`AttemptToFoldSymbolOffsetDifference`. While it could be removed, we
might explore future uses for `LayoutOrder`.

@aengelke suggests one use case: move `Atom` into MCSection. This works
because Mach-O doesn't support `.subsection`, and `LayoutOrder`, as the
index into the fragment list, is unchanged.

This patch moves MCFragment::Atom to MCSectionMachO::Atoms. `getAtom`
may be called at parse time before `Atoms` is initialized, so a bounds
check is needed to keep the hack working.

Pull Request: llvm/llvm-project#95341
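
As a rough illustration of the side-table idea described in this commit message, the sketch below keeps per-fragment atoms in a section-owned list indexed by layout order, and returns `None` when the index is past the end of the table. It is a minimal Python model under stated assumptions, not the LLVM implementation: `Section`, `set_atom`, and `get_atom` are hypothetical names chosen for this example.

```python
from typing import List, Optional


class Section:
    """Toy model of a section that owns a per-fragment atom table."""

    def __init__(self) -> None:
        # atoms[i] is the atom for the fragment whose layout order is i.
        # The table may be empty or shorter than the fragment list, for
        # example when it is queried before it has been populated.
        self.atoms: List[Optional[str]] = []

    def set_atom(self, layout_order: int, atom: Optional[str]) -> None:
        # Grow the table on demand so any layout order can be recorded.
        if layout_order >= len(self.atoms):
            self.atoms.extend([None] * (layout_order + 1 - len(self.atoms)))
        self.atoms[layout_order] = atom

    def get_atom(self, layout_order: int) -> Optional[str]:
        # Bounds check: a query past the end of the table yields "no atom"
        # instead of raising IndexError.
        if layout_order >= len(self.atoms):
            return None
        return self.atoms[layout_order]


section = Section()
early = section.get_atom(3)   # queried before the table is initialized
section.set_atom(1, "_foo")
late = section.get_atom(1)    # queried after the atom was recorded
```

Formats that never record atoms leave the table empty, so they pay no per-fragment cost in this model.
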
Hello!

Currently, watchpoints don't work on Windows (this can be reproduced
with the existing tests). This patch fixes the related issues so that
the tests and watchpoints start working.

Here is the list of tests fixed by this patch (on Windows, checked on
the **release/18.x** branch):
- commands/watchpoints/hello_watchpoint/TestMyFirstWatchpoint.py
- commands/watchpoints/multiple_hits/TestMultipleHits.py
- commands/watchpoints/multiple_threads/TestWatchpointMultipleThreads.py
- commands/watchpoints/step_over_watchpoint/TestStepOverWatchpoint.py
- commands/watchpoints/unaligned-watchpoint/TestUnalignedWatchpoint.py
- commands/watchpoints/watchpoint_commands/TestWatchpointCommands.py
- commands/watchpoints/watchpoint_commands/command/TestWatchpointCommandLLDB.py
- commands/watchpoints/watchpoint_commands/command/TestWatchpointCommandPython.py
- commands/watchpoints/watchpoint_commands/condition/TestWatchpointConditionCmd.py
- commands/watchpoints/watchpoint_count/TestWatchpointCount.py
- commands/watchpoints/watchpoint_disable/TestWatchpointDisable.py
- commands/watchpoints/watchpoint_size/TestWatchpointSizes.py
- python_api/watchpoint/TestSetWatchpoint.py
- python_api/watchpoint/TestWatchpointIgnoreCount.py
- python_api/watchpoint/TestWatchpointIter.py
- python_api/watchpoint/condition/TestWatchpointConditionAPI.py
- python_api/watchpoint/watchlocation/TestTargetWatchAddress.py

---------

Co-authored-by: Jason Molenda <jmolenda@apple.com>
Some of the freelist code uses type punning which is UB in C++, namely
because we read from a union member that is not the active union member.
Print .altinstructions parsing stats only once.
…Axis (#95059)

In unsplitLastAxisInResharding, the wrong argument was passed when calling
targetShardingInUnsplitLastAxis. There weren't any tests to uncover this.
I added one in mesh-spmdization.mlir for the Linalg dialect and one in
resharding-spmdization.mlir for the Mesh dialect.
…d (#95481)

My recent change that distinguishes pass-by-reference from pass-by-value
reduction operation functions missed the "CppReduceComplex" cases, and
also broke the shared library build-bots. Fix.
Use the packaging [1] module for parsing version numbers, instead of
pkg_resources which is distributed with setuptools. I recently switched
over to using the latter, knowing it was deprecated (in favor of the
packaging module) because it comes with Python out of the box. Newer
versions of setuptools have removed `pkg_resources` so we have to use
packaging.

[1] https://pypi.org/project/packaging/
…5477)

Note that the version of getValueProfDataFromInst that returns bool
has been "deprecated" since:

  commit 1e15371
  Author: Mingming Liu <mingmingl@google.com>
  Date:   Mon Apr 1 15:14:49 2024 -0700
This was reverted in llvm/llvm-project#95435
because it broke Android static hwasan binaries. This reland limits the
change to !SANITIZER_ANDROID.

Original commit message:
When set to non-zero, the HWASan runtime will map the shadow base at the
specified constant address.

This is particularly useful in conjunction with the existing compiler
option 'hwasan-mapping-offset', which bakes a hardcoded constant address
into the instrumentation.

---------

Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
…60 (#94004)

lowerInvokeable wasn't updating the returned chain after emitting the
lowerEndEH, which caused SwiftErrorVal-handling code to re-set the DAG
root, and thus accidentally skip the EH_LABEL node it was supposed to
have added. After fixing that, a few places that assume the specific
shape of the returned DAG needed to be adjusted.

Fixes: #64826
Fixes: rdar://113994760
…95275)

macOS 15.0 and iOS 18.0 added a new sysctl to fetch a bitvector of all
the hw.optional.arm.FEAT_*'s in one go. Using this has a perf advantage
over doing multiple round-trips to the kernel and back, but since it's
not present in older OSes, we still need the slow fallback.
…ruction, NFC

And VEXEncoding_* are renamed to OpcodePrefix_*.

This is in preparation for the coming pseudo rex/rex2 prefixes support.
…vector.

The instructions are only defined to operate on f16 data. If the
scalar FPR register isn't properly NaN-boxed, these instructions
will create an fp16 NaN, not a bf16 NaN, in the vector register.
…5485)

Note that the version of getValueProfDataFromInst that returns bool
has been "deprecated" since:

  commit 1e15371
  Author: Mingming Liu <mingmingl@google.com>
  Date:   Mon Apr 1 15:14:49 2024 -0700
In MachineBlockPlacement, the function getFirstUnplacedBlock is
inefficient because in most cases (for usual loop CFG), this function
fails to find a candidate, and its complexity becomes O(#(loops in
function) * #(blocks in function)). This makes the compilation of very
long functions slow. This update reduces it to O(k * #(blocks in
function)) where k is the maximum loop nesting depth, by iterating
through the BlockFilter instead.
The bf16 test cases were copied to other files without the Zvfh/Zvfhmin
options. Remove the duplication by adding a few Zvfh command lines to
the bf16 files and deleting the bf16 tests from the test files for f16/f32/f64.
owenca and others added 22 commits June 16, 2024 13:50
This allows control of the function's error output.

Closes #94205.

---------

Co-authored-by: Owen Pan <owenpiano@gmail.com>
…5719)

The values have been bound already, so use m_Specific.
I made a mistake when I tried to make the code handle the backtick
character like the hash character. The code did not recognize the delay
control structure. As a result, net names in the declaration were aligned
to the type name instead of the first net name.

new

```Verilog
wire logic #0 mynet, //
              mynet1;
```

old

```Verilog
wire logic #0 mynet, //
     mynet1;
```
…t of dynamic classes (#75912)

Close llvm/llvm-project#70585 and reflect
itanium-cxx-abi/cxx-abi#170.

The significant change in this patch is: for dynamic classes attached to
module units, we generate the vtable in the attached module unit
directly, and the key function for such classes is meaningless.
The dead code was caught by the PVS-Studio analyzer -
https://pvs-studio.com/en/blog/posts/cpp/1126/, fragment N12.

Warning message -
V523 The 'then' statement is equivalent to the 'else' statement.
Options.cpp 1212
…95601)

This patch adds folds for the cases where both operands are the same or
where it can be established that the first operand is less than, equal
to, or greater than the second operand.
close: #94737
alive2: https://alive2.llvm.org/ce/z/WF_7mX

In this patch, we combine `(X + Y) / 2` into `(X & Y)` only when both X
and Y are less than or equal to 1.
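
The condition on X and Y can be checked by brute force. The sketch below is a small Python model, assuming non-negative integers and a floor division by 2; the commit title is truncated, so the exact operation being folded is not visible here, and `avg_floor` and `fold_holds` are hypothetical helper names.

```python
def avg_floor(x: int, y: int) -> int:
    """Floor of the average of two non-negative integers."""
    return (x + y) // 2


def fold_holds(x: int, y: int) -> bool:
    """True when (x + y) / 2 and (x & y) agree for this pair."""
    return avg_floor(x, y) == (x & y)


# The fold is valid for every pair drawn from {0, 1}.
valid_in_range = all(fold_holds(x, y) for x in (0, 1) for y in (0, 1))

# Outside that range it can fail: (1 + 2) // 2 is 1, while 1 & 2 is 0.
counterexample_holds = fold_holds(1, 2)
```

Here `valid_in_range` is true and `counterexample_holds` is false, which is why the fold has to be guarded by the range check.
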
… (#95521)

Sinking currently only supports instructions that have zero or one uses.
Extend this to handle instructions with any number of uses, as long as
all uses are consistent (i.e. the "same" for all sinking candidates).

After #94462 this is basically just a matter of looping over all uses
instead of checking the first one only.
…based on' (#95650)

As discussed in
https://discourse.llvm.org/t/getelementptr-inbounds-inbounds-of-which-allocation/79024,
we need the pointer to be inbounds of *the* allocated object the pointer
is based on, not just any allocated object.
…(#95558)

Expand all constant expressions that use fat pointers upfront, so that
the rewriting logic only has to deal with instructions and not the
constant expression variants as well.

My primary motivation is to remove the creation of illegal constant
expressions (mul and shl) from this pass, but this also cuts down quite
a bit on the amount of duplicate logic.
This MIR test case is added to show the consumption of VGPR lanes used
for SGPR spills during the si-lower-sgpr-spills pass of the AMDGPU pass
pipeline. In this pass, stack slots are mapped to available VGPR lanes
for spilling, which removes the need for the stack slots.

Currently, each new SGPR spill goes into new VGPR lanes, because each
spill is mapped from the distinct stack slot assigned to it during the
SGPR allocation pass. This can be seen clearly in the added test case.
For RISC-V, it's always 0 and I don't see any reason we will
change it in the future.
For single-index GEPs the source and result element types are the
same, but using the source type is semantically more correct.
…P (#91871)

This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
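
For readers unfamiliar with the intrinsics, the following Python sketch models the result of a 3-way comparison: -1 if the first operand is less than the second, 0 if they are equal, and 1 if it is greater, with the operands read as signed (`scmp`) or unsigned (`ucmp`) integers. It only models the semantics on fixed-width bit patterns and says nothing about how SelectionDAG lowers them; the function names and the default 8-bit width are assumptions made for this example.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's-complement integer."""
    bits &= (1 << width) - 1
    if bits >> (width - 1):          # sign bit set
        return bits - (1 << width)
    return bits


def ucmp(a: int, b: int, width: int = 8) -> int:
    """3-way compare of two bit patterns read as unsigned integers."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    return (a > b) - (a < b)


def scmp(a: int, b: int, width: int = 8) -> int:
    """3-way compare of two bit patterns read as signed integers."""
    sa = to_signed(a, width)
    sb = to_signed(b, width)
    return (sa > sb) - (sa < sb)


# 0xFF is 255 when read as unsigned but -1 when read as signed (8 bits),
# so the two comparisons against 1 disagree.
unsigned_result = ucmp(0xFF, 0x01)
signed_result = scmp(0xFF, 0x01)
```
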
…#95531)

This produces better/more canonical codegen than the generic LLVM
lowering, which is a pattern the backend currently does not recognize.
See: llvm/llvm-project#81840.
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
@whitneywhtsang whitneywhtsang self-assigned this Jun 20, 2024
@whitneywhtsang whitneywhtsang added the genx Pull requests or issues for genx branch label Jun 20, 2024
@whitneywhtsang whitneywhtsang requested a review from a team June 20, 2024 00:49
@whitneywhtsang whitneywhtsang merged commit 66bcd20 into intel:genx Jun 20, 2024
@whitneywhtsang whitneywhtsang deleted the merge branch June 20, 2024 15:30
iclsrc pushed a commit that referenced this pull request Apr 22, 2026
…e info and fix diagnostic locations (#191658)

This is the initial fix for
llvm/llvm-project#191442, following the
discussion in
llvm/llvm-project#115418 (comment).

- Fix #21040
- Fix #52659
- Fix #115418
- Fix #14230
- Fix #21133

### Description

This PR introduces a new AST node, `ExplicitInstantiationDecl`, to
systematically fix the long-standing issue of missing or incorrect
source location information for explicit template instantiations.

#### Background & The Problem
Historically, Clang's AST lacked a dedicated node to represent the
lexical occurrence of an explicit instantiation statement. Instead,
`Sema` tried to shoehorn this information into existing specialization
nodes (e.g., `FunctionDecl`, `VarTemplateSpecializationDecl`) or simply
returned `nullptr`.

This resulted in fragmented behavior across the seven instantiable
entity types:
* Function & Member Function Templates: Returned `nullptr`, completely
losing `SourceRange` and `NestedNameSpecifier` information.
* Member Functions & Static Data Members: Mutated existing nodes
in-place. Consequently, multiple `template` or `extern template`
declarations in the same file would overwrite each other's source
locations.
* Variable Templates: Suffered from `dyn_cast` bugs and dropped NNS
information.

#### Design Trade-offs Evaluated

Before settling on the current design, I evaluated a mixed
redeclaration-chain approach (similar to how explicit *specializations*
are handled, creating new `FunctionDecl` nodes and stitching them into
the redecl chain). However, this approach had significant flaws:
1. Inconsistency: It couldn't be cleanly applied to member functions or
static data members due to `DeclContext` constraints (e.g., a member
function shouldn't lexically reside in a namespace `DeclContext`, but
placing it in the class context would pollute member lookup).
2. Fragility: It required bypassing standard `FoldingSet` mechanisms
(`setFunctionTemplateSpecialization`).
3. Lookup Pollution: Injecting new `NamedDecl` nodes purely for
instantiations risked breaking downstream `ASTMatcher`s and altering
name lookup behavior.

To avoid these pitfalls, this PR introduces `ExplicitInstantiationDecl`
as a **purely lexical annotation node**.

**Key Design Characteristics:**
1. Inherits from `Decl`, not `NamedDecl`: This is the most crucial
design choice. Much like `StaticAssertDecl` or `FriendDecl`, this node
lives in a `DeclContext` (making it traversable by `RecursiveASTVisitor`
and visible in AST dumps) but remains completely invisible to C++ name
lookup. It does not interfere with overload resolution or lookup tables.
2. Unified Representation: A single node type now covers all seven
entity types. It holds a pointer (`Specialization`) to the underlying
instantiated declaration, unifying how functions, variables, classes,
and members are handled.
3. Lexical Fidelity: The node resides in the enclosing namespace or
Translation Unit where the explicit instantiation was actually written,
perfectly preserving the `SourceRange`, `NestedNameSpecifierLoc`, and
the exact locations of the `template` and `extern` keywords.

Assisted-by: Claude Code (Anthropic) — used for test writing and
checking test results