[SYCLomatic][Pulldown] Fix pulldown conflict from sycl to SYCLomatic#2127
Merged
zhimingwang36 merged 628 commits into oneapi-src:SYCLomatic on Jul 4, 2024
Conversation
…inearize (#92370) Building on top of [#88204](llvm/llvm-project#88204), this PR adds support for converting `vector.insert` into an equivalent `vector.shuffle` operation that operates on linearized (1-D) vectors.
The pass constructor can be generated automatically. This pass is module-level: it runs on all relevant intrinsic operations inside the module, regardless of which top-level operation they appear in.
…nd in dropUnitDims pass. (#93317)
Running the `mlir-opt --linalg-fold-unit-extent-dims` pass on the following IR
```
#map = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 + d4, d2 + d5, d6)>
#map1 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>
#map2 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>
module {
func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
%cst = arith.constant dense<1.000000e+00> : tensor<1x1x1x1xf32>
%0 = tensor.empty(%arg1) : tensor<?x1x61x1xf32>
%1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
^bb0(%in: f32, %in_0: f32, %out: f32):
%2 = arith.mulf %in, %in_0 : f32
%3 = arith.addf %out, %2 : f32
linalg.yield %3 : f32
} -> tensor<?x1x61x1xf32>
return %1 : tensor<?x1x61x1xf32>
}
}
```
produces an incorrect tensor.expand_shape operation:
```
error: 'tensor.expand_shape' op expected dimension 0 of collapsed type to be dynamic since one or more of the corresponding dimensions in the expanded type is dynamic
%1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
^
/mathworks/devel/sandbox/sayans/geckWorks/g3294570/repro.mlir:8:10: note: see current operation: %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
// -----// IR Dump After LinalgFoldUnitExtentDimsPass Failed (linalg-fold-unit-extent-dims) //----- //
#map = affine_map<(d0) -> (0, d0)>
#map1 = affine_map<(d0) -> ()>
#map2 = affine_map<(d0) -> (d0)>
"builtin.module"() ({
"func.func"() <{function_type = (tensor<1x?x?x1xf32>, index) -> tensor<?x1x61x1xf32>, sym_name = "main"}> ({
^bb0(%arg0: tensor<1x?x?x1xf32>, %arg1: index):
%0 = "arith.constant"() <{value = dense<1.000000e+00> : tensor<f32>}> : () -> tensor<f32>
%1 = "tensor.collapse_shape"(%arg0) <{reassociation = [[0, 1], [2, 3]]}> : (tensor<1x?x?x1xf32>) -> tensor<?x?xf32>
%2 = "tensor.empty"() : () -> tensor<61xf32>
%3 = "tensor.empty"() : () -> tensor<61xf32>
%4 = "linalg.generic"(%1, %0, %2, %3) <{indexing_maps = [#map, #map1, #map2, #map2], iterator_types = [#linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 3, 1>}> ({
^bb0(%arg2: f32, %arg3: f32, %arg4: f32, %arg5: f32):
%6 = "arith.mulf"(%arg2, %arg3) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
%7 = "arith.addf"(%arg4, %6) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
"linalg.yield"(%7) : (f32) -> ()
}) : (tensor<?x?xf32>, tensor<f32>, tensor<61xf32>, tensor<61xf32>) -> tensor<61xf32>
%5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
"func.return"(%5) : (tensor<?x1x61x1xf32>) -> ()
}) : () -> ()
}) : () -> ()
```
This happens because the dimension `d0` is determined to be a unit dim that can be dropped, based on the dimensions of operand `arg0` to `linalg.generic`. Later, when iterating over the `outs` operand, the dimension `d0` is again treated as a unit dim even though its corresponding shape is `Shape::kDynamic`. For the `linalg.generic` to be valid, `d0` of `outs` does need to be `1`, but the current implementation does not properly check this; the dimension is dropped, turning the `outs` operand into `tensor<61xf32>` in the example.
The fix is to also check that the dimension's shape is actually `1` before dropping it. The IR after the fix is:
```
#map = affine_map<()[s0, s1] -> (s0 * s1)>
#map1 = affine_map<(d0) -> (0, d0)>
#map2 = affine_map<(d0) -> ()>
module {
func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%cst = arith.constant dense<1.000000e+00> : tensor<f32>
%collapsed = tensor.collapse_shape %arg0 [[0, 1], [2, 3]] : tensor<1x?x?x1xf32> into tensor<?x?xf32>
%0 = tensor.empty(%arg1) : tensor<?x61xf32>
%1 = affine.apply #map()[%arg1, %c1]
%2 = tensor.empty(%1) : tensor<?x61xf32>
%3 = linalg.generic {indexing_maps = [#map1, #map2, #map1, #map1], iterator_types = ["parallel"]} ins(%collapsed, %cst, %0 : tensor<?x?xf32>, tensor<f32>, tensor<?x61xf32>) outs(%2 : tensor<?x61xf32>) {
^bb0(%in: f32, %in_0: f32, %in_1: f32, %out: f32):
%4 = arith.mulf %in, %in_0 : f32
%5 = arith.addf %in_1, %4 : f32
linalg.yield %5 : f32
} -> tensor<?x61xf32>
%expanded = tensor.expand_shape %3 [[0, 1], [2, 3]] output_shape [%c0, 1, 61, 1] : tensor<?x61xf32> into tensor<?x1x61x1xf32>
return %expanded : tensor<?x1x61x1xf32>
}
}
```
Clang has some unwritten rules about diagnostic wording regarding things like punctuation and capitalization. This patch documents those rules and adds some tablegen support for checking that diagnostics follow them. Specifically: tablegen now checks that a diagnostic does not start with a capital letter or end with punctuation, with the usual exceptions for proper nouns or diagnostics ending with a question. Now that the code base is clean of such issues, violations are reported as an error rather than a warning, to ensure that failure to follow these rules is either addressed by an author or a new exception is added to the checking logic.
Fixes #90941. Add support for ``[[msvc::noinline]]`` attribute, which is actually an alias of ``[[clang::noinline]]``.
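For illustration, a minimal sketch of the attribute in use (the function names here are made up):
```cpp
// [[msvc::noinline]] now behaves like [[clang::noinline]]: it asks the
// compiler not to inline calls to the annotated function.
[[msvc::noinline]] int add(int a, int b) { return a + b; }

int caller(int x) {
  return add(x, 1); // this call should not be inlined
}
```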
And as an extension in older language modes. Per https://eel.is/c++draft/lex.string#nt:d-char Fixes #93130
…(#91599) This reverts commit aa9d467.
…le::makeUniqueName()`. (#89057) E.g. during inlining a new symbol name can be duplicated, and then `ValueSymbolTable::makeUniqueName()` will add a unique suffix, exceeding the `non-global-value-max-name-size` restriction. Also changed the `unsigned` type of the option to `int`, since `ValueSymbolTable`'s constructor can use the value `-1` to mean unrestricted name size.
"const" being removed in this patch prevents the move semantics from being used in: AI.CallStack = Callback(IndexedAI.CSId); With this patch on an indexed MemProf Version 2 profile, the cycle count and instruction count go down by 13.3% and 26.3%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.
There was existing support for constant folding a `linalg.generic` that was actually a transpose. This commit adds support for the named op `linalg.transpose` as well, by making use of the `LinalgOp` interface.
This change updates the dataLayout string to ensure alignment with the latest LLVM TargetMachine configuration. The aim is to maintain consistency and prevent potential compilation issues related to memory address space handling.
fir.box_rank codegen was invalid: it assumed the rank field in the descriptor was an i32, which is not correct. Do not hard-code the type; use the named position to find the type, and convert as needed in the patterns.
CONFLICT (content): Merge conflict in clang/test/Driver/mlong-double-128.c
CONFLICT (content): Merge conflict in clang/test/Driver/mlong-double-64.c
Rename things in a couple of places to make the code a bit clearer.
…ing when parsing declaration DIEs. (#92328) This reapplies llvm/llvm-project@9a7262c (#90663) and adds llvm/llvm-project#91808 as a fix. It was causing tests on macOS to fail because `SymbolFileDWARF::GetForwardDeclCompilerTypeToDIE` returned the map owned by this symbol file. When there were two symbol files, two different maps were created for caching from compiler type to DIE, even if they were for the same module. The solution is to do the same as `SymbolFileDWARF::GetUniqueDWARFASTTypeMap`: query SymbolFileDWARFDebugMap first to get the shared underlying SymbolFile, so the map is shared among multiple SymbolFileDWARF instances.
…ounding ops. (#93356) The elements that aren't sNaNs need to get passed through this fadd instruction unchanged. With the agnostic mask policy they might be forced to all ones.
Summary: There was a bug here where we would initialize the plugin multiple times when there were multiple images. Fix it by putting the `is_initliaized` check later.
This adds:
- `mlir::tosa::populateTosaToLinalgTypeConversion`, which converts tensors of unsigned integers into tensors of signless integers
- modifications to the `tosa.reshape` lowering in TosaToTensor to use the type converter correctly

I chose to implement the type converter in `mlir/Conversion/TosaToLinalg/TosaToLinalg.h` instead of `mlir/Conversion/TosaToTensor/TosaToTensor.h` because I need the same type converter in the TosaToLinalg lowerings (future PR). Alternatively, I could duplicate the type converter so it exists in both TosaToLinalg and TosaToTensor. Let me know if you prefer that.
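A hedged sketch of what such conversion rules can look like with `mlir::TypeConverter` (the actual contents of `populateTosaToLinalgTypeConversion` may differ):
```cpp
#include "mlir/Transforms/DialectConversion.h"

// Sketch only: map unsigned integer element types to signless ones of the
// same width, and rebuild ranked tensor types around the converted element.
void addUnsignedToSignlessConversions(mlir::TypeConverter &converter) {
  // Fallback: leave types we don't care about untouched.
  converter.addConversion([](mlir::Type type) { return type; });
  converter.addConversion([](mlir::IntegerType type) -> mlir::Type {
    if (type.isUnsigned())
      return mlir::IntegerType::get(type.getContext(), type.getWidth());
    return type;
  });
  converter.addConversion(
      [&converter](mlir::RankedTensorType type) -> mlir::Type {
        mlir::Type element = converter.convertType(type.getElementType());
        return mlir::RankedTensorType::get(type.getShape(), element);
      });
}
```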
This patch fixes: clang/unittests/Interpreter/IncrementalProcessingTest.cpp:39:13: error: unused function 'HostSupportsJit' [-Werror,-Wunused-function]
These interfaces are LLVM interfaces, not Clang ones, but this worked because LLVM.h adds the interfaces to the clang namespace.
Skip the explicit `this` check in invalid scopes, due to the `null` type in lambdas with invalid captures or incomplete parameter lists during parsing. Fixes #91536
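As a hedged illustration (not necessarily the exact reproducer from #91536), the kind of ill-formed code involved looks like this:
```cpp
struct S {
  int x;
  void f() {
    // Ill-formed: 'y' is not a capturable entity, which used to leave the
    // lambda's scope with a null type while 'this->x' was still checked.
    auto l = [y]() { return this->x; };
  }
};
```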
This should address the post-commit failure introduced in intel/llvm#14353 Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
ESIMD kernels don't support noinline functions, so we can't support the sanitizer for them now. We can't check whether a kernel is an ESIMD kernel in the Unified Runtime, so we have to disable ASan completely when an ESIMD kernel is found in a device image.
Changes the semantics of [sycl_ext_oneapi_enqueue_barrier](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_enqueue_barrier.asciidoc) commands recorded from a queue into a SYCL-Graph. The current semantics are that a barrier without a wait-list will depend on all leaf nodes in the graph, and be a dependency of any subsequent node added to the graph. After discussion with users, this change updates the behavior of barriers to only depend on leaf nodes from the same recording queue, and only commands recorded from the same queue will have a dependency on the barrier.
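A hedged sketch of the recording pattern these semantics apply to (API spelling per the experimental graph extension; assumed, not taken from this PR):
```cpp
#include <sycl/sycl.hpp>
namespace sycl_ext = sycl::ext::oneapi::experimental;

int main() {
  sycl::queue q1;
  sycl::queue q2{q1.get_context(), q1.get_device()};
  sycl_ext::command_graph<sycl_ext::graph_state::modifiable> graph{
      q1.get_context(), q1.get_device()};

  graph.begin_recording(q1);
  graph.begin_recording(q2);

  q1.single_task([] {}); // leaf node recorded from q1
  q2.single_task([] {}); // leaf node recorded from q2

  // New semantics: this barrier depends only on leaves recorded from q1,
  // and only commands subsequently recorded from q1 depend on it.
  q1.ext_oneapi_submit_barrier();

  q1.single_task([] {}); // depends on the barrier
  q2.single_task([] {}); // unaffected by q1's barrier

  graph.end_recording();
  q1.ext_oneapi_graph(graph.finalize()).wait();
}
```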
…(#14370) Host device support was deprecated a long time ago, although our internal host task & host accessor implementation was still using it. This change eliminates it and removes the possibility of creating a host queue/context/device. This brings the following changes:
- Commands & Events can no longer guarantee that the queried Context != nullptr and Queue != nullptr, since no device queue/context is involved for host tasks. For a host task we store the submitted queue instance in the event, so that we can report exceptions to the user and properly handle dependencies. The submitted queue for a host task is guaranteed to be non-null.
- The connection command for cross-context dependencies is now attached to the queue of the new command (the one whose dependency is being analyzed). Previously it was related to the host queue only. No perf impact is expected.
- The stream flush command is now submitted to the same queue as the corresponding kernel (previously it was submitted to the host queue). This could bring a negative perf impact for stream usage with an in-order queue, but stream is not a perf-oriented feature.

ABI-breaking changes to remove the is_host methods and some SYCL_EXTERN stuff will be submitted separately.
---------
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
This PR adds a default constructor for `syclcompat::dim3`, and changes the type of members `x`, `y`, `z` from `const size_t` to `unsigned int`. The default constructor sets all 3 dimensions to `1`. This means patterns like this are now possible:
```cpp
syclcompat::dim3 myDim3;
myDim3.x = 32;
```
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
These tests pass now. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Unused param/func warnings Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…DIA and AMD backend in the clang-linker-wrapper for SYCL offloading (#14343) This PR adds third-party (nvptx, amd) AOT support for SYCL offloading in the clang-linker-wrapper tool. The following changes are included:
1. Code inside clang-linker-wrapper has been refactored to get it ready for non-SPIRV backends for SYCL offloading. The main changes are inside linkAndWrapDeviceFiles. Explanations are added as comments.
2. A new clang-linker-wrapper option to pass additional device library files for the NVPTX backend has been added.
3. An additional check to emit an error when device library files are unexpectedly empty has been added.
4. New tests have been added to cover the compilation flow for the new changes, and existing tests have been modified to react to the additional check correctly.

E2E tests and more extensive driver tests will be added in the next stage. Thanks
---------
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
SYCL language support on the part of the kernel_compiler is specified here: intel/llvm#11985 However, that specification is not presently realizable: we need more support from the FE and post-link tool to get the demangled names propagated through. It is usable before that, with the constraint of using extern "C" or knowing the mangled kernel name, and we have folks who want to test in the interim. I've refrained from updating the spec; instead, this interim support will be snuck into our experimental extension until it can be completed in full - then we'll update the spec to release it properly.
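A hedged sketch of the interim usage with an extern "C" kernel (API spelling per the experimental kernel_compiler extension; the in-string kernel definition syntax is an assumption):
```cpp
#include <sycl/sycl.hpp>
namespace syclex = sycl::ext::oneapi::experimental;

int main() {
  sycl::queue q;

  // extern "C" sidesteps C++ name mangling, so the kernel can be retrieved
  // by its plain name until demangled-name support is in place.
  auto src_bundle = syclex::create_kernel_bundle_from_source(
      q.get_context(), syclex::source_language::sycl, R"(
        #include <sycl/sycl.hpp>
        extern "C" SYCL_EXTERNAL void vec_inc(float *ptr, sycl::nd_item<1> it) {
          ptr[it.get_global_id(0)] += 1.0f;
        }
      )");

  auto exe_bundle = syclex::build(src_bundle);
  sycl::kernel k = exe_bundle.ext_oneapi_get_kernel("vec_inc");
  // k can now be launched via handler::set_args() and
  // handler::parallel_for(nd_range, k).
}
```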
To be re-enabled shortly. Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
Post-commit has unused variable warnings that need tidying.
intel/llvm#14397 The hang causes all subsequent jobs to fail because the install dir can't be deleted; we need to disable these ASAP. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Today we don't pass the gpu lit params so we see things running and failing in the nightly that are already disabled in pre/postcommit. Example: https://github.com/intel/llvm/actions/runs/9754298775/job/26921343546
```
FAIL: SYCL :: Regression/in_order_barrier_profiling.cpp (1746 of 2090)
******************** TEST 'SYCL :: Regression/in_order_barrier_profiling.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 1
/__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/Regression/in_order_barrier_profiling.cpp -o /__w/llvm/llvm/build-e2e/Regression/Output/in_order_barrier_profiling.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/Regression/in_order_barrier_profiling.cpp -o /__w/llvm/llvm/build-e2e/Regression/Output/in_order_barrier_profiling.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 2
env ONEAPI_DEVICE_SELECTOR=opencl:gpu /__w/llvm/llvm/build-e2e/Regression/Output/in_order_barrier_profiling.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=opencl:gpu /__w/llvm/llvm/build-e2e/Regression/Output/in_order_barrier_profiling.cpp.tmp.out
# .---command stderr------------
# | in_order_barrier_profiling.cpp.tmp.out: /__w/llvm/llvm/llvm/sycl/test-e2e/Regression/in_order_barrier_profiling.cpp:42: int main(): Assertion `KernelEnd <= BarrierStart' failed.
# `-----------------------------
# error: command failed with exit status: -6
```
and the test already has:
```
// UNSUPPORTED: level_zero || (linux && opencl && gpu-intel-gen12)
```
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…14243) Resolve an issue with -save-temps regarding a temp file being linked to itself. --------- Co-authored-by: Marcos Maronas <maarquitos14@users.noreply.github.com>
This moves the FPBuiltinFnSelection pass to the llvm/lib/Transforms/Scalar directory. This is needed to enable future changes that will run this pass as part of the main pipeline for device compilation. With this pass in CodeGen, the slibs build would fail.
Improve the kernel fusion end-to-end tests:
* Remove the flag to embed IR on the CUDA and AMD backends from the tests that abort execution before requiring access to IR.
* Check the number of kernel launches to make sure fusion did not fail silently.
---------
Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
intel/llvm#14339 moved the FPBuiltinFnSelection pass, but post-commit testing is failing due to the Transform/Scalar CMake not linking with TargetParser. This commit fixes this issue. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
…4264)
This PR adds a missing feature in SYCL hierarchical parallelism support.
Specifically, this PR adds support for the case when there are functions
between parallel_for_work_group and parallel_for_work_item in the call
stack.
For example:
```
void foo(sycl::group<1> group, ...) {
  group.parallel_for_work_item(range<1>(), [&](h_item<1> i) { ... });
}
// ...
cgh.parallel_for_work_group<class kernel>(
    range<1>(...), range<1>(...), [=](group<1> g) {
      foo(g, ...);
    });
```
---------
Signed-off-by: Sudarsanam, Arvind <arvind.sudarsanam@intel.com>
This aspect is not required because bfloat16 math functions are implemented for all devices via generic implementations. This PR updates this status in line with the main bfloat16 extension/doc. --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
ShengchenJ approved these changes Jul 4, 2024
Signed-off-by: chenwei.sun <chenwei.sun@intel.com>