[OpenMPIRBuilder] Added `createTeams` #65785

shraiysh · 2023-09-08T17:59:59Z

This patch adds a generator for the teams construct. The generated IR looks like the following:

current_fn() {
  ...
  call @__kmpc_fork_teams(ptr @ident, i32 num_args, ptr @outlined_omp_teams, ...args)
  ...
}
outlined_omp_teams(ptr %global_tid, ptr %bound_tid, ...args) {
  ; teams body
}

It does this by first generating the body in the current function. Then we outline the body in a temporary function. We then create the @outlined_omp_teams function and embed the temporary outlined function in this function. We then emit the call to runtime function.

jdoerfert · 2023-09-09T01:34:48Z

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

@@ -6106,6 +6106,147 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(Module &M) {
  }
 }

+OpenMPIRBuilder::InsertPointTy
+OpenMPIRBuilder::createTeams(const LocationDescription &Loc,
+                             BodyGenCallbackTy BodyGenCB) {


I still don't understand why we now have basically the same logic 3 times, parallel, tasks, teams.

IMO this should be a part of the OutlineInfo as a field of argument mapping or something. I will try to work on it as an independent PR.

Is this a blocker for this patch? I would like to add that separately because I think it would take a bit longer to design and review. Can we please add support for teams with this independent of that improvement for common usage?

shraiysh · 2023-09-13T19:10:36Z

Ping for review!

kiranchandramohan · 2023-09-15T17:28:58Z

Could you clarify in the patch whether this is based on the parallel construct or the task construct? I believe task had a intermediate wrapper function. Also while reviewing the task construct there were some suggestions to not insert and delete terminators from Michael. I am not suggesting one or the other, but it will be easier to review if it is clarified on which one (among parallel or task) this is based. If there is a reason, that would also be good to know.

shraiysh · 2023-09-15T17:33:13Z

It is based on the task construct. This also has an intermediate wrapper function - it is inlined at the end though (I can remove it if there's a reason to. I just chose it this way to avoid the extra function. I do not know about the terminators. I will try to look for his comments on the patch for task.

kiranchandramohan · 2023-09-15T17:47:48Z

Copying the comments about temporary instructions below.

StaleCI->eraseFromParent();
-- 
Meinersbur: Please avoid temporary instructions. Use splitBB instead.
shraiysh: AFAIK, splitBB requires an instruction pointer. I have updated this to erase the temporary instruction immediately after I have split the basic blocks, but I cannot figure out a way to completely eliminate temporary instructions. Is this okay?
Meinersbur:
                BasicBlock *TaskExitBB = splitBB(Builder, "task.exit");
                BasicBlock *TaskBodyBB = splitBB(Builder, "task.body");
                BasicBlock *TaskAllocaBB = splitBB(Builder, "task.alloca");
 Note that the reverse order. After this, Builder will insert instructions before the terminator of Currbb (where you probably don't want to insert instructions here).

kiranchandramohan · 2023-09-15T17:50:16Z

kiranchandramohan: Is this a change from clang codegen? Why not runtime_call(..., outlined_fn, ...)? Can you say that explicitly in the summary that this is different from Clang if it was changed for a particular reason?

shraiysh: The main reason for a difference is that Outlining generates a function call as shown in the input IR in the comment above. I could not find an easy way to change the arguments of a function after it has been constructed. Parallel does this by tweaking the CodeExtractor used inside outlining - and I did not want to touch outlining internally so that I can confidently rely on outlining to do its job.A nice way to have similar (but not same) behavior as clang would be to add the inline attribute to the outlined function. After an Inline pass, the wrapper function will become the outlined function and it will be almost identical to clang's codegen. I will update the summary to highlight this difference.

Meinersbur: Could you add the function signature differences between wrapper_fn and outlined_fn? I.e. what arguments does wrapper_fn throw away?
Btw, current Clang emits such a wrapper function with -g as well, so I don't see it as an issue. LLVM will always inline a private function with just a single call site, except in -O0 where always_inline would be needed.

shraiysh · 2023-09-15T19:50:36Z

Thank you! I will fix these.

…th a constant rhs This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs. Comparison with GCC: https://godbolt.org/z/c5zPdP7j4 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158673

…egerValue`. NFC. See also https://reviews.llvm.org/D156238#inline-1546774

Extend folding for `2^n` euclidean division remainder operations on signed integers by handling the specific instance in which one `select` arm has already been replaced by 1. Reported-By: HypheX Fixes: llvm#66417.

…lvm#65861) When a statement following a case label had to be broken into multiple lines, the continuation parts were not indented correctly. Old: ```Verilog case (data) 16'd0: result = // break here 10'b0111111111; endcase ``` New: ```Verilog case (data) 16'd0: result = // break here 10'b0111111111; endcase ``` Verilog case labels and the following statements are on the same unwrapped line due to the difficulty of identifying them. So there was a rule in `getNewLineColumn` to add a level of indentation to the part following the case label. However, in case the line had to be broken again, the code at the end of the function would see that the line was already broken with the continuation part indented, so it would not indent it more. Now `State.FirstIndent` is changed as well for the part following the case label, so the logic for determining when to add a continuation indentation works.

…lvm#66320) This will cause the auto-labeler not to run on pull requests with more than 10 commits. Usually larger pull requests like this are mistakes and we want to avoid generating an excessive amount of notifications. It may be possible for legitimate pull requests to have 10 or more commits from people pushing fixup commits to addresss review comments. However, these pull requests should already have the correct labels by the time they grow to 10 commits.

When current state is `CFG_Finalized`, function `validateCFG()` should return true directly. Reviewed By: maksfb, yota9, Kepontry Differential Revision: https://reviews.llvm.org/D159410

Fix llvm#65061 Reviewed By: maksfb, Amir Differential Revision: https://reviews.llvm.org/D159407

…tion pointer on win32 See issue llvm#62594 This code does not work on win32: auto lstatic = []() static { return 0; }; int (*f2)(void) = lstatic; Since a calling convention such as CC_X86ThisCall can rightly interfere with the implicit pointer to function conversion if erroneously marked on a static function, the fix entails checking the 'static' specifier on the lambda declarator prior to assigning it a calling convention of an non-static member (which pre-c++23 made sense).

We should not optimize it in D158062. This adds the test coverage. And unneeded attributes `nonnull` and `inbounds` are removed. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D159530

The size of ZA depends on the streaming vector length regardless of the active mode. So in addition to vg (which reports the active mode) we must send the client svg. Otherwise the mechanics are the same as for non-streaming SVE. Use the svg value to update the defined size of ZA, accounting for the fact that ZA is not a single vector but a suqare matrix. So if svg is 8, a single streaming vector would be 8*8 = 64 bytes. ZA is that squared, so 64*64 = 4096 bytes. Testing is included in a later patch. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D159504

An SME enabled program has the following extra state: * Streaming mode or non-streaming mode. * ZA enabled or disabled. * The active vector length. Covering the transition between all possible states and all other possible states is not viable, therefore the testing added here is a cross section of that, all of which found real bugs in LLDB and the Linux Kernel during development. Many of those transitions will not be possible via LLDB (e.g. disabling ZA) and many more are possible but unlikely to be used in normal use. Added testing: * TestSVEThreadedDynamic now checks for correct SVG values. * New test TestZAThreadedDynamic creates 3 threads with different ZA sizes and states and switches between them verifying the register value (derived from the existing threaded SVE test). * New test TestZARegisterSaveRestore starts in a given SME state, runs a set of expressions in various orders, then checks that the original state has been restored. * TestArm64DynamicRegsets has ZA and SVG checks added, including writing to ZA to enable it. Running these tests will as usual require QEMU as there is no real SME hardware available at this time, and a very recent kernel. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D159505

Instead of unsetting flags on the instruction, attempting the fold, and the resetting the flags if it failed, add support to simplifyWithOpReplaced() to ignore poison-generating flags/metadata and collect all instructions where they may need to be dropped. This allows us to perform the fold a) with poison-generating metadata, which was previously not handled and b) poison-generating flags/metadata that are not on the root instruction. Proof for the ctpop case: https://alive2.llvm.org/ce/z/3H3HFs Fixes llvm#62450.

…vm#66655) Instead of checking whether the last operation might be a terminator, always insert operations to the end of the block. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

@andykaylor

Update handling of math errno. This change updates the logic for generation of math intrinics in place of math library function calls. The previous logic https://reviews.llvm.org/D151834 was incorrectly using intrinsics when math errno handling was needed at optimization levels above -O0. This also fixes issue mentioned in https://reviews.llvm.org/D151834 by @uabelho This is joint work with @andykaylor Andy.

Currently if Dexter encounters a parser error with a command, the resulting error message will refer to the most recently declared file (i.e. the source file it is testing) rather than the file containing the command itself. This patch fixes this so that parser errors point towards the correct location.

…nfoOp and DataBoundsOp operations to the OpenMP Dialect This patch adds two new operations: The first is the DataBoundsOp, which is based on OpenACC's DataBoundsOp, which holds stride, index, extent, lower bound and upper bounds which will be used in future follow up patches to perform initial array sectioning of mapped arrays, and Fortran pointer and allocatable mapping. Similarly to OpenACC, this new OpenMP DataBoundsOp also comes with a new OpenMP type, which helps to restrict operations to accepting only DataBoundsOp as an input or output where necessary (or other related operations that implement this type as a return). The patch also adds the MapInfoOp which rolls up some of the old map information stored in target operations into this new operation, and adds new information that will be utilised in the lowering of mapped variables, e.g. the aforementioned DataBoundsOp, but also a new ByCapture OpenMP MLIR attribute, and isImplicit boolean attribute. Both the ByCapture and isImplicit arguments will affect the lowering from the OpenMP dialect to LLVM-IR in minor but important ways, such as shifting the final maptype or generating different load/store combinations to maintain semantics with the OpenMP standard and alignment with the current Clang OpenMP output as best as possible. This MapInfoOp operation is slightly based on OpenACC's DataEntryOp, the main difference other than some slightly different fields (e,g, isImplicit/MapType/ByCapture) is that OpenACC's data operations "inherit" (the MLIR ODS equivalent) from this operation, whereas in OpenMP operations that utilise MapInfoOp's are composed of/contain them. A series of these MapInfoOp (one per map clause list item) is now held by target operations that represent OpenMP directives that utilise map clauses, e.g. TargetOp. MapInfoOp's do not have their own specialised lowering to LLVM-IR, instead the lowering is dependent on the particular container of the MapInfoOp's, e.g. TargetOp has its own lowering to LLVM-IR which utilised the information stored inside of MapInfoOp's to affect it's lowering and the end result of the LLVM-IR generated, which in turn can differ for host and device. This patch contains these operations, minor changes to the printing and parsing to support them, changes to tests (only those relevant to this segment of the patch, other test additions and changes are in other dependent patches in this series) and some alterations to the OpenMPToLLVM rewriter to support the new OpenMP type and operations. This patch is one in a series that are dependent on each other: https://reviews.llvm.org/D158734 https://reviews.llvm.org/D158735 https://reviews.llvm.org/D158737 Reviewers: kiranchandramohan, TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158732

…ations and tie them to relevant Target operations This patch builds on top of a prior patch in review which adds a new map and bounds operation by modifying the OpenMP PFT lowering to support these operations and generate them from the PFT. A significant amount of the support for the Bounds operation is borrowed from OpenACC's own current implementation and lowering, just ported over to OpenMP. The patch also adds very preliminary/initial support for lowering to a new Capture attribute, which is stored on the new Map Operation, which helps the later lowering from OpenMP -> LLVM IR by indicating how a map argument should be handled. This capture type will influence how a map argument is accessed on device and passed by the host (different load/store handling etc.). It is reflective of a similar piece of information stored in the Clang AST which performs a similar role. As well as some minor adjustments to how the map type (map bitshift which dictates to the runtime how it should handle an argument) is generated to further support more use-cases for future patches that build on this work. Finally it adds the map entry operation creation and tying it to the relevant target operations as well as the addition of some new tests and alteration of previous tests to support the new changes. Depends on D158732 reviewers: kiranchandramohan, TIFitis, clementval, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158734

…Entry and declare target globals This patch is a required change for the device side IR to maintain apporpiate links for declare target variables to their global variables for later lowering. It is also a requirement to clone over map bounds and entry operations to maintain the correct information for later lowering of the IR. It simply tries to clone over the relevant information maintaining the appropriate links they would have maintained prior to the pass, rather than redirecting them to new function arguments which causes a loss of information in the case of Declare Target and map information. Depends on D158734 reviewers: TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158735

…to Bounds and MapEntry operations This patch adjusts the lower to LLVM-IR inside of OpenMPToLLVMIRTranslation to faciliate the changes made to Target related operations to add the new Map related operations. It also includes adjustments to tests to support these changes, primarily modifying the MLIR as opposed to the LLVM-IR, the LLVM-IR should be identical after this patch. Depends on D158735 Reviewers: kiranchandramohan, TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158737

Currently the naming scheme is a bit funky; the specializations are named after the original function followed by an arbitrary decimal number. This makes it hard to debug inlined specializations of recursive functions. With this patch I am adding ".specialized." in between of the original name and the suffix, which is now a single increment counter.

Lookup extended instruction numbers in the given instruction set so that correct names are now emitted for GLSL.std.450 instructions as well as OpenCL.std. Add a single test to verify correct abs intrinsic names are emitted when targetting logical SPIR-V. Depends on D156424 Differential Revision: https://reviews.llvm.org/D159227

This fixes a bug where functions generated by the MLIR Math dialect, for example ipowi, would fail to link with link.exe on Windows due to having linkonce linkage but no associated comdat. Adding the comdat on ELF also allows linkers to perform better garbage collection in the binary. Simply adding comdats to all functions with this linkage type should also cover future cases where linkonce or linkonce_odr functions might be necessary.

Consider cleanup functions in thread safety analysis. Differential Revision: https://reviews.llvm.org/D152504

Add Int16, Int64 and Float64 capabilities as always available for Vulkan (since 1.0), and add tests covering most of the basic types from clang/test/CodeGenHLSL/basic_types.hlsl except for half floats. Depends on D156049

… a different module (llvm#66549) `unw_getcontext` saves the caller's registers in the context. However, if the caller of `unw_getcontext` is in a different module, the glue code of `unw_getcontext` sets the TOC register (r2) with the new TOC base and saves the original TOC register value in the stack frame. This causes the incorrect TOC value is used when the caller steps up frames, which fails libunwind LIT test case `unw_resume.pass.cpp`. This PR fixes the problem by using the original TOC register value saved in the stack if the caller is in a different module and enables `unw_resume.pass.cpp` on AIX.

Subsequent PRs will add the scheduling model and support for macro fusions.

Add a DAG combine to form a masked.store from a masked_strided_store intrinsic with stride equal to element size. This is the store analogy to PR llvm#65674. As seen in the tests, this does pickup a few cases that we'd previously missed due to selection ordering. We match strided stores early without going through the recently added generic mscatter combines, and thus weren't recognizing the unit strided store.

@Ident

This patch adds a generator for the teams construct. The generated IR looks like the following: ``` current_fn() { ... call @__kmpc_fork_teams(ptr @Ident, i32 num_args, ptr @outlined_omp_teams, ...args) ... } outlined_omp_teams(ptr %global_tid, ptr %bound_tid, ...args) { ; teams body } ``` It does this by first generating the body in the current function. Then we outline the body in a temporary function. We then create the @outlined_omp_teams function and embed the temporary outlined function in this function. We then emit the call to runtime function.

…ct into teams_ompirbuilder_2

shraiysh · 2023-09-19T17:37:57Z

Apologies for the ping. Errors while resolving merge conflict while rebasing. I will close this PR now to avoid future pings. :)

shraiysh requested review from Meinersbur, skatrak, kiranchandramohan, jdoerfert, raghavendhra, DominikAdamski, dpalermo and kiranktp September 8, 2023 18:00

shraiysh requested a review from a team as a code owner September 8, 2023 18:00

github-actions bot added the flang:openmp label Sep 8, 2023

jdoerfert reviewed Sep 9, 2023

View reviewed changes

cor3ntin and others added 13 commits September 16, 2023 11:04

[Github] Do not label clang-format files as clang (llvm#66388)

f3796ac

[InstCombine] Use ConstantInt::getBool instead of `Constant::getInt…

5163319

…egerValue`. NFC. See also https://reviews.llvm.org/D156238#inline-1546774

[InstCombine] Handle constant arms in select of srem fold

ce5b88b

Extend folding for `2^n` euclidean division remainder operations on signed integers by handling the specific instance in which one `select` arm has already been replaced by 1. Reported-By: HypheX Fixes: llvm#66417.

[Github] Fix yml syntax

a59870a

[InstSimplify] Add pre-commit tests for PR65910. NFC.

bce6c53

[BOLT] Skip the validation of CFG after it is finalized

16fd879

When current state is `CFG_Finalized`, function `validateCFG()` should return true directly. Reviewed By: maksfb, yota9, Kepontry Differential Revision: https://reviews.llvm.org/D159410

[BOLT] Incorporate umask into the output file permission

473b9dd

Fix llvm#65061 Reviewed By: maksfb, Amir Differential Revision: https://reviews.llvm.org/D159407

[gn] port ac182de (RISCVGISel.td)

8049db0

[libc++] implement std::jthread

695138c

wangpc-pp and others added 24 commits September 19, 2023 19:51

[RISCV] Add tests for memory constraint A

61d819d

We should not optimize it in D158062. This adds the test coverage. And unneeded attributes `nonnull` and `inbounds` are removed. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D159530

[InstCombine] Add test for llvm#62450 (NFC)

f502ab7

[IR] SingleBlock::push_back operations to the back of the block (ll…

6d1c6ec

…vm#66655) Instead of checking whether the last operation might be a terminator, always insert operations to the end of the block. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

[bazel] Port for 571df01

00ecef9

[clang][TSA] Thread safety cleanup functions

cf8e189

Consider cleanup functions in thread safety analysis. Differential Revision: https://reviews.llvm.org/D152504

[SPIRV] Test basic float and int types (llvm#66282)

116f7a2

Add Int16, Int64 and Float64 capabilities as always available for Vulkan (since 1.0), and add tests covering most of the basic types from clang/test/CodeGenHLSL/basic_types.hlsl except for half floats. Depends on D156049

[mlgo] Fix tests post PR llvm#66334

a21d4ab

[RISCV] Recognize veyron-v1 processor in clang driver. (llvm#66703)

8f8f449

Subsequent PRs will add the scheduling model and support for macro fusions.

Merge branch 'teams_ompirbuilder_2' of github.com:shraiysh/llvm-proje…

2d3f29f

…ct into teams_ompirbuilder_2

shraiysh requested review from a team as code owners September 19, 2023 17:37

shraiysh closed this Sep 19, 2023

shraiysh deleted the teams_ompirbuilder_2 branch September 19, 2023 20:01

shraiysh mentioned this pull request Sep 19, 2023

[OMPIRBuilder] Added createTeams #66807

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[OpenMPIRBuilder] Added `createTeams` #65785

[OpenMPIRBuilder] Added `createTeams` #65785

shraiysh commented Sep 8, 2023

jdoerfert Sep 9, 2023

shraiysh Sep 9, 2023

shraiysh Sep 9, 2023

shraiysh commented Sep 13, 2023

kiranchandramohan commented Sep 15, 2023

shraiysh commented Sep 15, 2023

kiranchandramohan commented Sep 15, 2023

kiranchandramohan commented Sep 15, 2023

shraiysh commented Sep 15, 2023

shraiysh commented Sep 19, 2023

[OpenMPIRBuilder] Added createTeams #65785

[OpenMPIRBuilder] Added createTeams #65785

Conversation

shraiysh commented Sep 8, 2023

jdoerfert Sep 9, 2023

Choose a reason for hiding this comment

shraiysh Sep 9, 2023

Choose a reason for hiding this comment

shraiysh Sep 9, 2023

Choose a reason for hiding this comment

shraiysh commented Sep 13, 2023

kiranchandramohan commented Sep 15, 2023

shraiysh commented Sep 15, 2023

kiranchandramohan commented Sep 15, 2023

kiranchandramohan commented Sep 15, 2023

shraiysh commented Sep 15, 2023

shraiysh commented Sep 19, 2023

[OpenMPIRBuilder] Added `createTeams` #65785

[OpenMPIRBuilder] Added `createTeams` #65785