
Conversation

@iclsrc
Collaborator

@iclsrc iclsrc commented Oct 10, 2024

adrian-prantl and others added 30 commits September 18, 2024 17:28
…epCandidate() (#109212)

These are helper functions to be used by the vectorizer's dependency
graph.
Resolve #94928

This PR adds `if (TD->getTemplateDecl())` to prevent `InnerD` becoming
`nullptr`, suggested by @firstmoonlight.
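
The guard can be illustrated with a minimal sketch. The types `TemplateDecl` and `TypeDeclLike` and the function `describeInner` are hypothetical stand-ins, not Clang's real AST classes; only the `getTemplateDecl()` null check mirrors the description above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the AST nodes; these are NOT Clang's classes.
struct TemplateDecl {
  std::string Name;
};

struct TypeDeclLike {
  const TemplateDecl *Inner = nullptr; // may legitimately be null
  const TemplateDecl *getTemplateDecl() const { return Inner; }
};

// Returns the inner template's name, or "<none>" when there is no inner
// declaration. The `if` guard keeps a null pointer from being dereferenced.
std::string describeInner(const TypeDeclLike *TD) {
  assert(TD && "caller must pass a valid declaration");
  if (const TemplateDecl *InnerD = TD->getTemplateDecl())
    return InnerD->Name;
  return "<none>";
}
```

Declaring `InnerD` inside the `if` condition limits its scope to the branch where it is known to be non-null.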

I also add the `-ast-dump-decl-types` option and the corresponding type `CHECK` lines to the
test case `clang/test/AST/ast-dump-concepts.cpp`.

---------

Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
This patch improves the documentation for JITLink by fixing some typos,
correcting indentation and fixing outdated code examples.
…uild. (#109078)" (#109207)

`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied by a clean-up of `long double` usage.

The additional change on top of #109078 is to use `cuda::std::complex`
only for the device compilation, otherwise the host compilation
fails because `libcudacxx` may not support `long double` specialization
at all (depending on the compiler).
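
A minimal sketch of selecting the complex type per compilation pass follows. The `sketch` namespace, the `Complex` alias and the `mul` helper are hypothetical, and the F18 runtime's actual macros are not shown here; the only fact relied on is that `__CUDA_ARCH__` is defined during the device pass of CUDA compilation. A real device function would also need the appropriate CUDA attributes.

```cpp
#include <cassert>

// __CUDA_ARCH__ is defined only while compiling device code, so the host
// pass keeps using std::complex (including its long double support).
#if defined(__CUDA_ARCH__)
#include <cuda/std/complex>
namespace sketch {
template <typename T> using Complex = cuda::std::complex<T>;
}
#else
#include <complex>
namespace sketch {
template <typename T> using Complex = std::complex<T>;
}
#endif

namespace sketch {
// Callers go through the alias and never name a concrete complex type.
inline Complex<float> mul(Complex<float> A, Complex<float> B) { return A * B; }
} // namespace sketch
```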
…109176)

The API is present, and we even have a test for it, but it isn't
documented, so probably no one knows that you can set requirements for your
scripted commands. This just adds docs and uses it appropriately in the
`framestats` example command.
… is marked Promote.

We have a special check that tries to determine if vector FP
operations are supported for the type to determine whether to
scalarize or not. If FP arithmetic would be promoted, don't unroll.

This improves Zvfhmin codegen on RISC-V.
Check that the destination of G_EXTRACT_SUBVECTOR is smaller than the
source. Improve wording of error messages.
- Improve messages.
- Remove redundant checks that are handled in generic code.
- Add check that the subvector is smaller than the vector.
- Add checks that subvector is smaller than the vector.
This revision adds vector predication smax, smin, umax and umin
intrinsic ops.
…ariable (#109213)

This patch adds new runtime entry points that perform the simple
allocation/deallocation of module allocatable variables with CUDA
attributes.
When the allocation is initiated on the host, the descriptor on the
device is synchronized. Both descriptors point to the same data on the
device.

This is the first PR of a stack.
… (#109195)

Change RegisterBankEmitter to use const RecordKeeper.

This is part of an effort to improve const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
Convert `cuf.allocate` and `cuf.deallocate` to the runtime entry points added
in #109213

Was reviewed in llvm/llvm-project#109214 but the
parent branch was closed for some reason.
Added tests to the validator and fixed issues stemming from previously skipping over BBs with single successors, which is incorrect. The added tests, in which the assertions are expected to trigger, would now catch this.
…ntable callsites (#109184)

Reinforcing properties ensured at instrumentation time.
Example: https://lab.llvm.org/buildbot/#/builders/169/builds/3381

The CI allowed the `llvm::append_range` instantiation, but
on the other hand it's quite unnecessary here.
The code was passing a physical register directly to getPressureSets
which expects a register unit.

Fix this by looping over the register units and calling getPressureSets
for each of them.

Found while trying to add a RegisterUnit class to stop storing register
units in `Register`. 0 is a valid register unit but not a valid
Register.
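
The per-unit loop can be sketched with a toy model. `ToyRegInfo`, `PhysReg`, `RegUnit` and `pressureSetsForPhysReg` are hypothetical names invented for illustration and are not LLVM's API; only the idea of calling `getPressureSets` once per register unit comes from the description above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using PhysReg = unsigned; // hypothetical: physical register number
using RegUnit = unsigned; // hypothetical: register unit number (0 is valid)

// Toy register info; NOT LLVM's TargetRegisterInfo.
struct ToyRegInfo {
  std::map<PhysReg, std::vector<RegUnit>> UnitsOfReg;
  std::map<RegUnit, std::vector<int>> SetsOfUnit;

  // Expects a register unit, not a physical register.
  const std::vector<int> &getPressureSets(RegUnit Unit) const {
    auto It = SetsOfUnit.find(Unit);
    assert(It != SetsOfUnit.end() && "unknown register unit");
    return It->second;
  }
};

// Collects the pressure sets of every unit that makes up a physical register
// instead of passing the register itself to getPressureSets.
std::set<int> pressureSetsForPhysReg(const ToyRegInfo &RI, PhysReg Reg) {
  std::set<int> Result;
  auto It = RI.UnitsOfReg.find(Reg);
  if (It == RI.UnitsOfReg.end())
    return Result; // unknown register: no units, no pressure sets
  for (RegUnit Unit : It->second)
    for (int PSet : RI.getPressureSets(Unit))
      Result.insert(PSet);
  return Result;
}
```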
Change variable name `o` to `OS` to match definition, and
`ClName` to `ClassName` for better clarity.

Cache RegBank reference in the class and do not pass
class members around to functions.
…r (#108094)

Make sure there is no data transfer generated when a device variable is
used in these intrinsic functions.
…er (#109194)

Change PseudoLoweringEmitter to use const RecordKeeper.

This is part of an effort to improve const correctness in TableGen backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…109189)

Change InstrInfoEmitter to use const RecordKeeper.

This is part of an effort to improve const correctness in TableGen backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…8663)

macOS 10.15 added a "full" x86_64 GPR thread state flavor, equivalent to
the normal one but with DS, ES, SS, and GSbase added. This flavor can
only be used with processes that install a custom LDT (functionality
that was also added in 10.15 and is used by apps like Wine to execute
32-bit code).

Along with allowing DS, ES, SS, and GSbase to be viewed/modified, using
the full flavor is necessary when debugging a thread executing 32-bit
code.
If thread_set_state() is used with the regular thread state flavor, the
kernel resets CS to the 64-bit code segment (see
[set_thread_state64()](https://github.com/apple-oss-distributions/xnu/blob/94d3b452840153a99b38a3a9659680b2a006908e/osfmk/i386/pcb.c#L723)),
which makes debugging impossible.

There's no way to detect whether the full flavor is available, so this patch
tries to use it and falls back to the regular one if it's not available.
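
The try-then-fall-back approach might look roughly like the sketch below. `Flavor`, `GetStateFn` and `readGPRState` are hypothetical names; the callback stands in for the real kernel call, which is not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>

enum class Flavor { Full, Regular }; // hypothetical flavor names

// Hypothetical callback: returns true if the kernel accepted the flavor.
using GetStateFn = std::function<bool(Flavor)>;

// Tries the full flavor first and falls back to the regular one.
// On success, Used records which flavor worked; on failure it is untouched.
bool readGPRState(const GetStateFn &GetState, Flavor &Used) {
  for (Flavor F : {Flavor::Full, Flavor::Regular}) {
    if (GetState(F)) {
      Used = F;
      return true;
    }
  }
  return false;
}
```

Recording which flavor succeeded lets later writes use the same flavor as the read.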

A downside is that this patch exposes the DS, ES, SS, and GSbase
registers for all x86_64 processes, even though they are not populated
unless the full thread state is available.
I'm not sure if there's a way to tell LLDB that a register is
unavailable. The classic GDB `g` command [allows returning
`x`](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Packets.html#Packets)
to denote unavailable registers, but it seems like the debug server uses
newer commands like `jThreadsInfo` and I'm not sure if those have the
same support.

Fixes #57591
(also filed as Apple FB11464104)

@jasonmolenda
@jsji
Contributor

jsji commented Oct 16, 2024

This is ready for review.

@jsji jsji self-assigned this Oct 16, 2024
jsji added 2 commits October 16, 2024 17:03
Before b7b28e7, AreSupportedUsers skipped MemTransferInst,
which could cause an unexpected assertion:
https://godbolt.org/z/z5d691fj1
b7b28e7 started allowing MemTransferInst, so
we should allow it in adjustByValArgAlignment too.

(cherry picked from commit 0bbdc76)
@jsji jsji requested a review from AlexeySachkov October 17, 2024 14:27
@jsji
Contributor

jsji commented Oct 17, 2024

@intel/llvm-gatekeepers Please issue a /merge. The dev CI and AMD failures are unrelated; they also fail on other PRs.
#15727 is a follow-up for the bindless image test.

@sarnex
Contributor

sarnex commented Oct 17, 2024

/merge

@bb-sycl
Contributor

bb-sycl commented Oct 17, 2024

Thu 17 Oct 2024 04:10:01 PM UTC --- Start to merge the commit into sycl branch. It will take several minutes.

@bb-sycl
Contributor

bb-sycl commented Oct 17, 2024

Thu 17 Oct 2024 04:14:51 PM UTC --- Merge the branch in this PR to base automatically. Will close the PR later.

@bb-sycl bb-sycl merged commit 20a7cd1 into sycl Oct 17, 2024
24 of 26 checks passed

Labels

disable-lint Skip linter check step and proceed with build jobs
