Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMPIRBuilder] Added createTeams #65785

Closed
wants to merge 1,131 commits into from
Closed

Conversation

shraiysh
Copy link
Member

@shraiysh shraiysh commented Sep 8, 2023

This patch adds a generator for the teams construct. The generated IR looks like the following:

current_fn() {
  ...
  call @__kmpc_fork_teams(ptr @ident, i32 num_args, ptr @outlined_omp_teams, ...args)
  ...
}
outlined_omp_teams(ptr %global_tid, ptr %bound_tid, ...args) {
  ; teams body
}

It does this by first generating the body in the current function. Then we outline the body in a temporary function. We then create the @outlined_omp_teams function and embed the temporary outlined function in this function. We then emit the call to runtime function.

@@ -6106,6 +6106,147 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(Module &M) {
}
}

OpenMPIRBuilder::InsertPointTy
OpenMPIRBuilder::createTeams(const LocationDescription &Loc,
BodyGenCallbackTy BodyGenCB) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why we now have basically the same logic 3 times, parallel, tasks, teams.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this should be a part of the OutlineInfo as a field of argument mapping or something. I will try to work on it as an independent PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a blocker for this patch? I would like to add that separately because I think it would take a bit longer to design and review. Can we please add support for teams with this independent of that improvement for common usage?

@shraiysh
Copy link
Member Author

Ping for review!

@kiranchandramohan
Copy link
Contributor

Could you clarify in the patch whether this is based on the parallel construct or the task construct? I believe task had a intermediate wrapper function. Also while reviewing the task construct there were some suggestions to not insert and delete terminators from Michael. I am not suggesting one or the other, but it will be easier to review if it is clarified on which one (among parallel or task) this is based. If there is a reason, that would also be good to know.

@shraiysh
Copy link
Member Author

It is based on the task construct. This also has an intermediate wrapper function - it is inlined at the end though (I can remove it if there's a reason to. I just chose it this way to avoid the extra function. I do not know about the terminators. I will try to look for his comments on the patch for task.

@kiranchandramohan
Copy link
Contributor

Copying the comments about temporary instructions below.

StaleCI->eraseFromParent();
-- 
Meinersbur: Please avoid temporary instructions. Use splitBB instead.
shraiysh: AFAIK, splitBB requires an instruction pointer. I have updated this to erase the temporary instruction immediately after I have split the basic blocks, but I cannot figure out a way to completely eliminate temporary instructions. Is this okay?
Meinersbur:
                BasicBlock *TaskExitBB = splitBB(Builder, "task.exit");
                BasicBlock *TaskBodyBB = splitBB(Builder, "task.body");
                BasicBlock *TaskAllocaBB = splitBB(Builder, "task.alloca");
 Note that the reverse order. After this, Builder will insert instructions before the terminator of Currbb (where you probably don't want to insert instructions here).

@kiranchandramohan
Copy link
Contributor

kiranchandramohan: Is this a change from clang codegen? Why not runtime_call(..., outlined_fn, ...)? Can you say that explicitly in the summary that this is different from Clang if it was changed for a particular reason?

shraiysh: The main reason for a difference is that Outlining generates a function call as shown in the input IR in the comment above. I could not find an easy way to change the arguments of a function after it has been constructed. Parallel does this by tweaking the CodeExtractor used inside outlining - and I did not want to touch outlining internally so that I can confidently rely on outlining to do its job.A nice way to have similar (but not same) behavior as clang would be to add the inline attribute to the outlined function. After an Inline pass, the wrapper function will become the outlined function and it will be almost identical to clang's codegen. I will update the summary to highlight this difference.

Meinersbur: Could you add the function signature differences between wrapper_fn and outlined_fn? I.e. what arguments does wrapper_fn throw away?
Btw, current Clang emits such a wrapper function with -g as well, so I don't see it as an issue. LLVM will always inline a private function with just a single call site, except in -O0 where always_inline would be needed.

@shraiysh
Copy link
Member Author

Thank you! I will fix these.

cor3ntin and others added 13 commits September 16, 2023 11:04
…th a constant rhs

This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs.
Comparison with GCC: https://godbolt.org/z/c5zPdP7j4

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158673
Extend folding for `2^n` euclidean division remainder operations
on signed integers by handling the specific instance in which one
`select` arm has already been replaced by 1.

Reported-By: HypheX

Fixes: llvm#66417.
…lvm#65861)

When a statement following a case label had to be broken into multiple
lines, the continuation parts were not indented correctly.

Old:

```Verilog
case (data)
  16'd0:
    result = // break here
    10'b0111111111;
endcase
```

New:

```Verilog
case (data)
  16'd0:
    result = // break here
        10'b0111111111;
endcase
```

Verilog case labels and the following statements are on the same
unwrapped line due to the difficulty of identifying them. So there was a
rule in `getNewLineColumn` to add a level of indentation to the part
following the case label. However, in case the line had to be broken
again, the code at the end of the function would see that the line was
already broken with the continuation part indented, so it would not
indent it more. Now `State.FirstIndent` is changed as well for the part
following the case label, so the logic for determining when to add a
continuation indentation works.
…lvm#66320)

This will cause the auto-labeler not to run on pull requests with more
than 10 commits. Usually larger pull requests like this are mistakes and
we want to avoid generating an excessive amount of notifications.

It may be possible for legitimate pull requests to have 10 or more
commits from people pushing fixup commits to addresss review comments.
However, these pull requests should already have the correct labels by
the time they grow to 10 commits.
When current state is `CFG_Finalized`, function `validateCFG()` should return true directly.

Reviewed By: maksfb, yota9, Kepontry

Differential Revision: https://reviews.llvm.org/D159410
…tion pointer on win32

See issue llvm#62594

This code does not work on win32:

  auto lstatic = []()  static  { return 0;  };
  int (*f2)(void) = lstatic;

Since a calling convention such as CC_X86ThisCall can rightly interfere with the implicit pointer to function conversion if erroneously marked on a static function, the fix entails checking the 'static' specifier on the lambda declarator prior to assigning it a calling convention of an non-static member (which pre-c++23 made sense).
wangpc-pp and others added 24 commits September 19, 2023 19:51
We should not optimize it in D158062. This adds the test coverage.

And unneeded attributes `nonnull` and `inbounds` are removed.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D159530
The size of ZA depends on the streaming vector length regardless
of the active mode. So in addition to vg (which reports the active
mode) we must send the client svg.

Otherwise the mechanics are the same as for non-streaming SVE.
Use the svg value to update the defined size of ZA, accounting
for the fact that ZA is not a single vector but a suqare matrix.

So if svg is 8, a single streaming vector would be 8*8 = 64 bytes.
ZA is that squared, so 64*64 = 4096 bytes.

Testing is included in a later patch.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D159504
An SME enabled program has the following extra state:
* Streaming mode or non-streaming mode.
* ZA enabled or disabled.
* The active vector length.

Covering the transition between all possible states and all other
possible states is not viable, therefore the testing added here is a cross
section of that, all of which found real bugs in LLDB and the Linux
Kernel during development.

Many of those transitions will not be possible via LLDB
(e.g. disabling ZA) and many more are possible but unlikely to be
used in normal use.

Added testing:
* TestSVEThreadedDynamic now checks for correct SVG values.
* New test TestZAThreadedDynamic creates 3 threads with different ZA sizes
  and states and switches between them verifying the register value
  (derived from the existing threaded SVE test).
* New test TestZARegisterSaveRestore starts in a given SME state, runs a
  set of expressions in various orders, then checks that the original
  state has been restored.
* TestArm64DynamicRegsets has ZA and SVG checks added, including writing
  to ZA to enable it.

Running these tests will as usual require QEMU as there is no
real SME hardware available at this time, and a very recent
kernel.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D159505
Instead of unsetting flags on the instruction, attempting the
fold, and the resetting the flags if it failed, add support to
simplifyWithOpReplaced() to ignore poison-generating flags/metadata
and collect all instructions where they may need to be dropped.

This allows us to perform the fold a) with poison-generating
metadata, which was previously not handled and b) poison-generating
flags/metadata that are not on the root instruction.

Proof for the ctpop case: https://alive2.llvm.org/ce/z/3H3HFs

Fixes llvm#62450.
…vm#66655)

Instead of checking whether the last operation might be a terminator,
always insert operations to the end of the block.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Update handling of math errno. This change updates the logic for
generation of math intrinics in place of math library function calls.
The previous logic https://reviews.llvm.org/D151834 was incorrectly
using intrinsics when math errno handling was needed at optimization
levels above -O0.
This also fixes issue mentioned in https://reviews.llvm.org/D151834 by
@uabelho
This is joint work with @andykaylor Andy.
Currently if Dexter encounters a parser error with a command, the resulting
error message will refer to the most recently declared file (i.e. the source
file it is testing) rather than the file containing the command itself. This
patch fixes this so that parser errors point towards the correct location.
…nfoOp and DataBoundsOp operations to the OpenMP Dialect

This patch adds two new operations:

The first is the DataBoundsOp, which is based on OpenACC's DataBoundsOp,
which holds stride, index, extent, lower bound and upper bounds
which will be used in future follow up patches to perform initial
array sectioning of mapped arrays, and Fortran pointer and
allocatable mapping. Similarly to OpenACC, this new OpenMP
DataBoundsOp also comes with a new OpenMP type, which
helps to restrict operations to accepting only
DataBoundsOp as an input or output where necessary
(or other related operations that implement this type as
a return).

The patch also adds the MapInfoOp which rolls up some of
the old map information stored in target
operations into this new operation, and adds new
information that will be utilised in the lowering of mapped
variables, e.g. the aforementioned DataBoundsOp, but also a
new ByCapture OpenMP MLIR attribute, and isImplicit boolean
attribute. Both the ByCapture and isImplicit arguments will
affect the lowering from the OpenMP dialect to LLVM-IR in
minor but important ways, such as shifting the final maptype
or generating different load/store combinations to maintain
semantics with the OpenMP standard and alignment with the
current Clang OpenMP output as best as possible.

This MapInfoOp operation is slightly based on OpenACC's
DataEntryOp, the main difference other than some slightly
different fields (e,g, isImplicit/MapType/ByCapture) is that
OpenACC's data operations "inherit" (the MLIR ODS
equivalent) from this operation, whereas in OpenMP operations
that utilise MapInfoOp's are composed of/contain them.

A series of these MapInfoOp (one per map clause list item) is
now held by target operations that represent OpenMP
directives that utilise map clauses, e.g. TargetOp. MapInfoOp's
do not have their own specialised lowering to LLVM-IR, instead
the lowering is dependent on the particular container of the
MapInfoOp's, e.g. TargetOp has its own lowering to LLVM-IR
which utilised the information stored inside of MapInfoOp's to
affect it's lowering and the end result of the LLVM-IR generated,
which in turn can differ for host and device.

This patch contains these operations, minor changes to the
printing and parsing to support them, changes to tests (only
those relevant to this segment of the patch, other test
additions and changes are in other dependent
patches in this series) and some alterations to the OpenMPToLLVM
rewriter to support the new OpenMP type and operations.

This patch is one in a series that are dependent on each
other:

https://reviews.llvm.org/D158734
https://reviews.llvm.org/D158735
https://reviews.llvm.org/D158737

Reviewers: kiranchandramohan, TIFitis, razvanlupusoru

Differential Revision: https://reviews.llvm.org/D158732
…ations and tie them to relevant Target operations

This patch builds on top of a prior patch in review which adds a new map
and bounds operation by modifying the OpenMP PFT lowering to support
these operations and generate them from the PFT.

A significant amount of the support for the Bounds operation is borrowed
from OpenACC's own current implementation and lowering, just ported
over to OpenMP.

The patch also adds very preliminary/initial support for lowering to
a new Capture attribute, which is stored on the new Map Operation,
which helps the later lowering from OpenMP -> LLVM IR by indicating
how a map argument should be handled. This capture type will
influence how a map argument is accessed on device and passed by
the host (different load/store handling etc.). It is reflective of a
similar piece of information stored in the Clang AST which performs a
similar role.

As well as some minor adjustments to how the map type (map bitshift
which dictates to the runtime how it should handle an argument) is
generated to further support more use-cases for future patches that
build on this work.

Finally it adds the map entry operation creation and tying it to the relevant
target operations as well as the addition of some new tests and alteration
of previous tests to support the new changes.

Depends on D158732

reviewers: kiranchandramohan, TIFitis, clementval, razvanlupusoru

Differential Revision: https://reviews.llvm.org/D158734
…Entry and declare target globals

This patch is a required change for the device side IR to
maintain apporpiate links for declare target variables to
their global variables for later lowering.

It is also a requirement to clone over map bounds and
entry operations to maintain the correct information for
later lowering of the IR.

It simply tries to clone over the relevant information
maintaining the appropriate links they would have
maintained prior to the pass, rather than redirecting
them to new function arguments which causes a
loss of information in the case of Declare Target
and map information.

Depends on D158734

reviewers: TIFitis, razvanlupusoru

Differential Revision: https://reviews.llvm.org/D158735
…to Bounds and MapEntry operations

This patch adjusts the lower to LLVM-IR inside of
OpenMPToLLVMIRTranslation to faciliate the changes made
to Target related operations to add the new Map related
operations. It also includes adjustments to tests to support
these changes, primarily modifying the MLIR as opposed to
the LLVM-IR, the LLVM-IR should be identical after this patch.

Depends on D158735

Reviewers: kiranchandramohan, TIFitis, razvanlupusoru

Differential Revision: https://reviews.llvm.org/D158737
Currently the naming scheme is a bit funky; the specializations are named
after the original function followed by an arbitrary decimal number. This
makes it hard to debug inlined specializations of recursive functions.
With this patch I am adding ".specialized." in between of the original
name and the suffix, which is now a single increment counter.
Lookup extended instruction numbers in the given instruction set so that
correct names are now emitted for GLSL.std.450 instructions as well as
OpenCL.std.

Add a single test to verify correct abs intrinsic names are emitted when
targetting logical SPIR-V.

Depends on D156424

Differential Revision: https://reviews.llvm.org/D159227
This fixes a bug where functions generated by the MLIR Math dialect, for
example ipowi, would fail to link with link.exe on Windows due to having
linkonce linkage but no associated comdat. Adding the comdat on ELF also
allows linkers to perform better garbage collection in the binary.

Simply adding comdats to all functions with this linkage type should
also cover future cases where linkonce or linkonce_odr functions might
be necessary.
Consider cleanup functions in thread safety analysis.

Differential Revision: https://reviews.llvm.org/D152504
Add Int16, Int64 and Float64 capabilities as always available for Vulkan
(since 1.0), and add tests covering most of the basic types from
clang/test/CodeGenHLSL/basic_types.hlsl except for half floats.

Depends on D156049
… a different module (llvm#66549)

`unw_getcontext` saves the caller's registers in the context. However,
if the caller of `unw_getcontext` is in a different module, the glue
code of `unw_getcontext` sets the TOC register (r2) with the new TOC
base and saves the original TOC register value in the stack frame. This
causes the incorrect TOC value is used when the caller steps up frames,
which fails libunwind LIT test case `unw_resume.pass.cpp`. This PR fixes
the problem by using the original TOC register value saved in the stack
if the caller is in a different module and enables `unw_resume.pass.cpp`
on AIX.
Subsequent PRs will add the scheduling model and support for macro
fusions.
Add a DAG combine to form a masked.store from a masked_strided_store intrinsic
with stride equal to element size. This is the store analogy to PR llvm#65674.

As seen in the tests, this does pickup a few cases that we'd previously missed
due to selection ordering.  We match strided stores early without going through
the recently added generic mscatter combines, and thus weren't recognizing the
unit strided store.
This patch adds a generator for the teams construct. The generated IR looks like the following:
```
current_fn() {
  ...
  call @__kmpc_fork_teams(ptr @Ident, i32 num_args, ptr @outlined_omp_teams, ...args)
  ...
}
outlined_omp_teams(ptr %global_tid, ptr %bound_tid, ...args) {
  ; teams body
}
```

It does this by first generating the body in the current function. Then we outline the
body in a temporary function. We then create the @outlined_omp_teams function and embed
the temporary outlined function in this function. We then emit the call to runtime
function.
@shraiysh shraiysh requested review from a team as code owners September 19, 2023 17:37
@shraiysh
Copy link
Member Author

Apologies for the ping. Errors while resolving merge conflict while rebasing. I will close this PR now to avoid future pings. :)

@shraiysh shraiysh closed this Sep 19, 2023
@shraiysh shraiysh deleted the teams_ompirbuilder_2 branch September 19, 2023 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet