-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenMPIRBuilder] Added createTeams
#65785
Conversation
@@ -6106,6 +6106,147 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(Module &M) { | |||
} | |||
} | |||
|
|||
OpenMPIRBuilder::InsertPointTy | |||
OpenMPIRBuilder::createTeams(const LocationDescription &Loc, | |||
BodyGenCallbackTy BodyGenCB) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand why we now have basically the same logic 3 times, parallel, tasks, teams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this should be a part of the OutlineInfo
as a field of argument mapping or something. I will try to work on it as an independent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a blocker for this patch? I would like to add that separately because I think it would take a bit longer to design and review. Can we please add support for teams with this independent of that improvement for common usage?
Ping for review! |
Could you clarify in the patch whether this is based on the parallel construct or the task construct? I believe task had a intermediate wrapper function. Also while reviewing the task construct there were some suggestions to not insert and delete terminators from Michael. I am not suggesting one or the other, but it will be easier to review if it is clarified on which one (among parallel or task) this is based. If there is a reason, that would also be good to know. |
It is based on the task construct. This also has an intermediate wrapper function - it is inlined at the end though (I can remove it if there's a reason to. I just chose it this way to avoid the extra function. I do not know about the terminators. I will try to look for his comments on the patch for task. |
Copying the comments about temporary instructions below.
|
|
Thank you! I will fix these. |
…th a constant rhs This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs. Comparison with GCC: https://godbolt.org/z/c5zPdP7j4 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158673
Extend folding for `2^n` euclidean division remainder operations on signed integers by handling the specific instance in which one `select` arm has already been replaced by 1. Reported-By: HypheX Fixes: llvm#66417.
…lvm#65861) When a statement following a case label had to be broken into multiple lines, the continuation parts were not indented correctly. Old: ```Verilog case (data) 16'd0: result = // break here 10'b0111111111; endcase ``` New: ```Verilog case (data) 16'd0: result = // break here 10'b0111111111; endcase ``` Verilog case labels and the following statements are on the same unwrapped line due to the difficulty of identifying them. So there was a rule in `getNewLineColumn` to add a level of indentation to the part following the case label. However, in case the line had to be broken again, the code at the end of the function would see that the line was already broken with the continuation part indented, so it would not indent it more. Now `State.FirstIndent` is changed as well for the part following the case label, so the logic for determining when to add a continuation indentation works.
…lvm#66320) This will cause the auto-labeler not to run on pull requests with more than 10 commits. Usually larger pull requests like this are mistakes and we want to avoid generating an excessive amount of notifications. It may be possible for legitimate pull requests to have 10 or more commits from people pushing fixup commits to addresss review comments. However, these pull requests should already have the correct labels by the time they grow to 10 commits.
When current state is `CFG_Finalized`, function `validateCFG()` should return true directly. Reviewed By: maksfb, yota9, Kepontry Differential Revision: https://reviews.llvm.org/D159410
Fix llvm#65061 Reviewed By: maksfb, Amir Differential Revision: https://reviews.llvm.org/D159407
…tion pointer on win32 See issue llvm#62594 This code does not work on win32: auto lstatic = []() static { return 0; }; int (*f2)(void) = lstatic; Since a calling convention such as CC_X86ThisCall can rightly interfere with the implicit pointer to function conversion if erroneously marked on a static function, the fix entails checking the 'static' specifier on the lambda declarator prior to assigning it a calling convention of an non-static member (which pre-c++23 made sense).
We should not optimize it in D158062. This adds the test coverage. And unneeded attributes `nonnull` and `inbounds` are removed. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D159530
The size of ZA depends on the streaming vector length regardless of the active mode. So in addition to vg (which reports the active mode) we must send the client svg. Otherwise the mechanics are the same as for non-streaming SVE. Use the svg value to update the defined size of ZA, accounting for the fact that ZA is not a single vector but a suqare matrix. So if svg is 8, a single streaming vector would be 8*8 = 64 bytes. ZA is that squared, so 64*64 = 4096 bytes. Testing is included in a later patch. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D159504
An SME enabled program has the following extra state: * Streaming mode or non-streaming mode. * ZA enabled or disabled. * The active vector length. Covering the transition between all possible states and all other possible states is not viable, therefore the testing added here is a cross section of that, all of which found real bugs in LLDB and the Linux Kernel during development. Many of those transitions will not be possible via LLDB (e.g. disabling ZA) and many more are possible but unlikely to be used in normal use. Added testing: * TestSVEThreadedDynamic now checks for correct SVG values. * New test TestZAThreadedDynamic creates 3 threads with different ZA sizes and states and switches between them verifying the register value (derived from the existing threaded SVE test). * New test TestZARegisterSaveRestore starts in a given SME state, runs a set of expressions in various orders, then checks that the original state has been restored. * TestArm64DynamicRegsets has ZA and SVG checks added, including writing to ZA to enable it. Running these tests will as usual require QEMU as there is no real SME hardware available at this time, and a very recent kernel. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D159505
Instead of unsetting flags on the instruction, attempting the fold, and the resetting the flags if it failed, add support to simplifyWithOpReplaced() to ignore poison-generating flags/metadata and collect all instructions where they may need to be dropped. This allows us to perform the fold a) with poison-generating metadata, which was previously not handled and b) poison-generating flags/metadata that are not on the root instruction. Proof for the ctpop case: https://alive2.llvm.org/ce/z/3H3HFs Fixes llvm#62450.
…vm#66655) Instead of checking whether the last operation might be a terminator, always insert operations to the end of the block. Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Update handling of math errno. This change updates the logic for generation of math intrinics in place of math library function calls. The previous logic https://reviews.llvm.org/D151834 was incorrectly using intrinsics when math errno handling was needed at optimization levels above -O0. This also fixes issue mentioned in https://reviews.llvm.org/D151834 by @uabelho This is joint work with @andykaylor Andy.
Currently if Dexter encounters a parser error with a command, the resulting error message will refer to the most recently declared file (i.e. the source file it is testing) rather than the file containing the command itself. This patch fixes this so that parser errors point towards the correct location.
…nfoOp and DataBoundsOp operations to the OpenMP Dialect This patch adds two new operations: The first is the DataBoundsOp, which is based on OpenACC's DataBoundsOp, which holds stride, index, extent, lower bound and upper bounds which will be used in future follow up patches to perform initial array sectioning of mapped arrays, and Fortran pointer and allocatable mapping. Similarly to OpenACC, this new OpenMP DataBoundsOp also comes with a new OpenMP type, which helps to restrict operations to accepting only DataBoundsOp as an input or output where necessary (or other related operations that implement this type as a return). The patch also adds the MapInfoOp which rolls up some of the old map information stored in target operations into this new operation, and adds new information that will be utilised in the lowering of mapped variables, e.g. the aforementioned DataBoundsOp, but also a new ByCapture OpenMP MLIR attribute, and isImplicit boolean attribute. Both the ByCapture and isImplicit arguments will affect the lowering from the OpenMP dialect to LLVM-IR in minor but important ways, such as shifting the final maptype or generating different load/store combinations to maintain semantics with the OpenMP standard and alignment with the current Clang OpenMP output as best as possible. This MapInfoOp operation is slightly based on OpenACC's DataEntryOp, the main difference other than some slightly different fields (e,g, isImplicit/MapType/ByCapture) is that OpenACC's data operations "inherit" (the MLIR ODS equivalent) from this operation, whereas in OpenMP operations that utilise MapInfoOp's are composed of/contain them. A series of these MapInfoOp (one per map clause list item) is now held by target operations that represent OpenMP directives that utilise map clauses, e.g. TargetOp. MapInfoOp's do not have their own specialised lowering to LLVM-IR, instead the lowering is dependent on the particular container of the MapInfoOp's, e.g. TargetOp has its own lowering to LLVM-IR which utilised the information stored inside of MapInfoOp's to affect it's lowering and the end result of the LLVM-IR generated, which in turn can differ for host and device. This patch contains these operations, minor changes to the printing and parsing to support them, changes to tests (only those relevant to this segment of the patch, other test additions and changes are in other dependent patches in this series) and some alterations to the OpenMPToLLVM rewriter to support the new OpenMP type and operations. This patch is one in a series that are dependent on each other: https://reviews.llvm.org/D158734 https://reviews.llvm.org/D158735 https://reviews.llvm.org/D158737 Reviewers: kiranchandramohan, TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158732
…ations and tie them to relevant Target operations This patch builds on top of a prior patch in review which adds a new map and bounds operation by modifying the OpenMP PFT lowering to support these operations and generate them from the PFT. A significant amount of the support for the Bounds operation is borrowed from OpenACC's own current implementation and lowering, just ported over to OpenMP. The patch also adds very preliminary/initial support for lowering to a new Capture attribute, which is stored on the new Map Operation, which helps the later lowering from OpenMP -> LLVM IR by indicating how a map argument should be handled. This capture type will influence how a map argument is accessed on device and passed by the host (different load/store handling etc.). It is reflective of a similar piece of information stored in the Clang AST which performs a similar role. As well as some minor adjustments to how the map type (map bitshift which dictates to the runtime how it should handle an argument) is generated to further support more use-cases for future patches that build on this work. Finally it adds the map entry operation creation and tying it to the relevant target operations as well as the addition of some new tests and alteration of previous tests to support the new changes. Depends on D158732 reviewers: kiranchandramohan, TIFitis, clementval, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158734
…Entry and declare target globals This patch is a required change for the device side IR to maintain apporpiate links for declare target variables to their global variables for later lowering. It is also a requirement to clone over map bounds and entry operations to maintain the correct information for later lowering of the IR. It simply tries to clone over the relevant information maintaining the appropriate links they would have maintained prior to the pass, rather than redirecting them to new function arguments which causes a loss of information in the case of Declare Target and map information. Depends on D158734 reviewers: TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158735
…to Bounds and MapEntry operations This patch adjusts the lower to LLVM-IR inside of OpenMPToLLVMIRTranslation to faciliate the changes made to Target related operations to add the new Map related operations. It also includes adjustments to tests to support these changes, primarily modifying the MLIR as opposed to the LLVM-IR, the LLVM-IR should be identical after this patch. Depends on D158735 Reviewers: kiranchandramohan, TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158737
Currently the naming scheme is a bit funky; the specializations are named after the original function followed by an arbitrary decimal number. This makes it hard to debug inlined specializations of recursive functions. With this patch I am adding ".specialized." in between of the original name and the suffix, which is now a single increment counter.
Lookup extended instruction numbers in the given instruction set so that correct names are now emitted for GLSL.std.450 instructions as well as OpenCL.std. Add a single test to verify correct abs intrinsic names are emitted when targetting logical SPIR-V. Depends on D156424 Differential Revision: https://reviews.llvm.org/D159227
This fixes a bug where functions generated by the MLIR Math dialect, for example ipowi, would fail to link with link.exe on Windows due to having linkonce linkage but no associated comdat. Adding the comdat on ELF also allows linkers to perform better garbage collection in the binary. Simply adding comdats to all functions with this linkage type should also cover future cases where linkonce or linkonce_odr functions might be necessary.
Consider cleanup functions in thread safety analysis. Differential Revision: https://reviews.llvm.org/D152504
Add Int16, Int64 and Float64 capabilities as always available for Vulkan (since 1.0), and add tests covering most of the basic types from clang/test/CodeGenHLSL/basic_types.hlsl except for half floats. Depends on D156049
… a different module (llvm#66549) `unw_getcontext` saves the caller's registers in the context. However, if the caller of `unw_getcontext` is in a different module, the glue code of `unw_getcontext` sets the TOC register (r2) with the new TOC base and saves the original TOC register value in the stack frame. This causes the incorrect TOC value is used when the caller steps up frames, which fails libunwind LIT test case `unw_resume.pass.cpp`. This PR fixes the problem by using the original TOC register value saved in the stack if the caller is in a different module and enables `unw_resume.pass.cpp` on AIX.
Subsequent PRs will add the scheduling model and support for macro fusions.
Add a DAG combine to form a masked.store from a masked_strided_store intrinsic with stride equal to element size. This is the store analogy to PR llvm#65674. As seen in the tests, this does pickup a few cases that we'd previously missed due to selection ordering. We match strided stores early without going through the recently added generic mscatter combines, and thus weren't recognizing the unit strided store.
This patch adds a generator for the teams construct. The generated IR looks like the following: ``` current_fn() { ... call @__kmpc_fork_teams(ptr @Ident, i32 num_args, ptr @outlined_omp_teams, ...args) ... } outlined_omp_teams(ptr %global_tid, ptr %bound_tid, ...args) { ; teams body } ``` It does this by first generating the body in the current function. Then we outline the body in a temporary function. We then create the @outlined_omp_teams function and embed the temporary outlined function in this function. We then emit the call to runtime function.
…ct into teams_ompirbuilder_2
Apologies for the ping. Errors while resolving merge conflict while rebasing. I will close this PR now to avoid future pings. :) |
This patch adds a generator for the teams construct. The generated IR looks like the following:
It does this by first generating the body in the current function. Then we outline the body in a temporary function. We then create the @outlined_omp_teams function and embed the temporary outlined function in this function. We then emit the call to runtime function.