Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMPIRBuilder] Added createTeams #65767

Closed
wants to merge 33 commits into from
Closed

Conversation

shraiysh
Copy link
Member

@shraiysh shraiysh commented Sep 8, 2023

This patch adds a generator for the teams construct. The generated IR looks like the following:

current_fn() {
  ...
  call @__kmpc_fork_teams(ptr @ident, i32 num_args, ptr @outlined_omp_teams, ...args)
  ...
}
outlined_omp_teams(ptr %global_tid, ptr %bound_tid, ...args) {
  ; teams body
}

It does this by first generating the body in the current function. Then we outline the
body in a temporary function. We then create the @outlined_omp_teams function and embed
the temporary outlined function in this function. We then emit the call to runtime
function.

This patch adds a generator for the teams construct. The generated IR looks like the following:

```
current_fn() {
  ...
  call @__kmpc_fork_teams(ptr @Ident, i32 num_args, ptr @outlined_omp_teams, ...args)
  ...
}
outlined_omp_teams(ptr %global_tid, ptr %bound_tid, ...args) {
  ; teams body
}
```

It does this by first generating the body in the current function. Then we outline the
body in a temporary function. We then create the @outlined_omp_teams function and embed
the temporary outlined function in this function. We then emit the call to runtime
function.
@shraiysh shraiysh requested a review from a team as a code owner September 8, 2023 14:57
yronglin and others added 2 commits September 8, 2023 22:59
…lvm#65762)

This whitespaces breaking `Format` CI.
```
$ trap 'kill -- $$' INT TERM QUIT; clang/utils/ci/run-buildbot check-format
+ set -o pipefail
+ unset LANG
+ unset LC_ALL
+ unset LC_COLLATE
++ basename clang/utils/ci/run-buildbot
+ PROGNAME=run-buildbot
+ [[ 1 == 0 ]]
+ [[ 1 -gt 0 ]]
+ case ${1} in
+ BUILDER=check-format
+ shift
+ [[ 0 -gt 0 ]]
++ git rev-parse --show-toplevel
+ MONOREPO_ROOT=/var/lib/buildkite-agent/builds/linux-56-fdf668759-vtv88-1/llvm-project/clang-ci
+ BUILD_DIR=/var/lib/buildkite-agent/builds/linux-56-fdf668759-vtv88-1/llvm-project/clang-ci/build/check-format
+ INSTALL_DIR=/var/lib/buildkite-agent/builds/linux-56-fdf668759-vtv88-1/llvm-project/clang-ci/build/check-format/install
+ cmake --version
cmake version 3.23.3
CMake suite maintained and supported by Kitware (kitware.com/cmake).
+ ninja --version
1.10.1
+ case "${BUILDER}" in
+ grep -rnI '[[:blank:]]$' clang/lib clang/include clang/docs
clang/include/clang/Basic/riscv_vector.td:580:        let Name = op # eew # "_v", IRName = op, MaskedIRName = op # "_mask",
clang/include/clang/Basic/riscv_vector.td:1786:    let RequiredFeatures = ["ZvfhminOrZvfh"] in
```

Signed-off-by: yronglin <yronglin777@gmail.com>
This reverts the following commits:

 - 5d7b75e
  [NFC][memprof] Temporarly remove RTSanitizerCommonSymbolizerInternal

 - edb211c
   [NFC][memprof] Temporarly remove RTSanitizerCommonSymbolizerInternal

 - 4d14b4a
   [sanitizer] Add CMake flag to build with internal symbolizer

They break macOS nodes because CMake can't evaluate generator expressions:

  Error evaluating generator expression:

    $<TARGET_OBJECTS:RTSanitizerCommonSymbolizerInternal.osx>
    $<TARGET_OBJECTS:RTSanitizerCommonSymbolizerInternal.ios>
    $<TARGET_OBJECTS:RTSanitizerCommonSymbolizerInternal.iossim>
MacDue and others added 6 commits September 8, 2023 16:11
llvm#65738)

Due to a copy-paste error in 32b7c1f this was incorrectly mapped to
`TosaProfileEnum::MainTraining` rather than `TosaProfileEnum::Undefined`.
…rent SDNode (llvm#65765)

SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But in
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:

t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!

Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.

Recommit with a fix for a use-after-free bug in the first version of
this patch (llvm#65340) which was caught by asan.
This allows front ends to insert s_nops - this is most often when a
delay less
than s_sleep 1 is required.
The return value of the function is only used to get the debug location.
Directly return the debug location, as this avoids an extra null
check in the caller.
llvm::ArrayType::get(
llvm::PointerType::getUnqual(M.getContext()), Names.size()),
llvm::ArrayType::get(llvm::PointerType::getUnqual(M.getContext()),
Names.size()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you clang-formated the entire file. Either only format your changes, or pre-commit a NFC that formats the existing file. This way it clutters the changes with unrelated ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry. I assumed the file would be formatted. I will update this soon.

BodyGenCallbackTy BodyGenCB) {
if (!updateToLocation(Loc)) {
return Loc.IP;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: No braces

}
FunctionCallee WrapperFuncVal = M.getOrInsertFunction(
"outlined_omp_teams",
FunctionType::get(Builder.getVoidTy(), WrapperArgTys, false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this function generated? I assume it's internal, if so, we cannot use the name.

Builder.SetInsertPoint(ExitBB);

return Builder.saveIP();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from the parallel handling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to task handling. Parallel handling uses CodeExtractor to correct the arguments I think. I didn't quite understand that - CodeExtractor and OutlineInfo seem to do similar things.

@@ -2005,6 +2005,13 @@ class OpenMPIRBuilder {
/// \param Loc The insert and source location description.
void createTargetDeinit(const LocationDescription &Loc);

/// Generator for `#omp teams`
///
/// \param Loc The location where the task construct was encountered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit : task construct -> teams construct

Lancern and others added 9 commits September 8, 2023 11:47
This commit contains refactorings around __dynamic_cast without changing
its behavior. Some important changes include:

- Refactor __dynamic_cast into various small helper functions;
- Move dynamic_cast_stress.pass.cpp to libcxx/benchmarks and refactor
  it into a benchmark. The benchmark performance numbers are updated
  as well.

Differential Revision: https://reviews.llvm.org/D138006
Like concepts checking, a trailing return type of a lambda
in a dependent context may refer to captures in which case
they may need to be rebuilt, so the map of local decl
should include captures.

This patch reveal a pre-existing issue.
`this` is always recomputed by TreeTransform.

`*this` (like all captures) only become `const`
after the parameter list.

However, if try to recompute the value of `this` (in a parameter)
during template instantiation while determining the type of the call operator,
we will determine  it to be const (unless the lambda is mutable).

There is no good way to know at that point that we are in a parameter
or not, the easiest/best solution is to transform the type of this.

Note that doing so break a handful of HLSL tests.
So this is a prototype at this point.

Fixes llvm#65067
Fixes llvm#63675

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D159126
Note that these are non-canonical.  At IR, we will generally canonicalize to the intptrty width form if knownbits allows.
The first test case added in the LIT test demonstrates the problem.
Even though we did not consider the inner loop as a candidate for
the transformation due to the array_coor with a slice, we decided to
version the outer loop for the same function argument.
During the cloning of the outer loop we dropped the slicing completely
producing invalid code.

I restructured the code so that we record all arg uses that cannot be
transformed (regardless of the reason), and then fixup the usage
information across the loop nests. I also noticed that we may generate
redundant contiguity checks for the inner loops, so I fixed it
since it was easy with the new way of keeping the usage data.
…#65707)

This is a copy of the corresponding ArrayValueCopy analysis
for non-overlapping array slices. It is required to achieve
the same performance for Polyhedron/nf, though, additional
changes are needed in the alias analysis for disambiguating
host associated accesses.
htyu and others added 6 commits September 8, 2023 09:49
…tes (llvm#65685)

With `-fpseudo-probe-for-profiling`, the dwarf discriminator for a
callsite will be overwritten to pseudo probe related information for
that callsite. The probe information is encoded in a special format
(i.e., with all lowest three digits be one) in order to be distinguished
from regular dwarf discriminator. The special encoding format will be
decoded to zero by the regular discriminator logic. This means all
callsites would have a zero discriminator in both the sample profile and
the compiler, for classic AutoFDO. This is inconvenient in that no
decent classic AutoFDO can be generated from a pseudo probe build. I'm
mitigating the issue by allowing callsite probe id to be used as the
base dwarf discriminator for classic AutoFDO, since probe id is also
unique and can be used to differentiate callsites on the same source
line.
This patch adds a generator for the teams construct. The generated IR looks like the following:

```
current_fn() {
  ...
  call @__kmpc_fork_teams(ptr @Ident, i32 num_args, ptr @outlined_omp_teams, ...args)
  ...
}
outlined_omp_teams(ptr %global_tid, ptr %bound_tid, ...args) {
  ; teams body
}
```

It does this by first generating the body in the current function. Then we outline the
body in a temporary function. We then create the @outlined_omp_teams function and embed
the temporary outlined function in this function. We then emit the call to runtime
function.
@shraiysh shraiysh requested review from a team as code owners September 8, 2023 16:59
@github-actions github-actions bot added compiler-rt backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" libc vectorization PGO Profile Guided Optimizations flang Flang issues not falling into any other category labels Sep 8, 2023
@shraiysh
Copy link
Member Author

shraiysh commented Sep 8, 2023

Apologies for the ping, unfortunately I have messed up rebasing my branch. I will close this pull request to avoid future notifications for others in this thread. :)

@shraiysh shraiysh closed this Sep 8, 2023
@shraiysh shraiysh deleted the teams_ompbuilder branch September 8, 2023 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" compiler-rt flang Flang issues not falling into any other category libc PGO Profile Guided Optimizations vectorization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet